Abstract
How many statistical tools do we have for inference from massive data?
A huge number, but only when we are ready to assume that the given database is
homogeneous, consisting of a large cohort of "similar" cases. Why do we need
the homogeneity assumption? To make 'learning from the experience of others'
or 'borrowing strength' possible. But what if we are dealing with a massive
database of heterogeneous cases (the norm in almost all modern
data-science applications, including neuroscience, genomics, healthcare, and
astronomy)? How many methods do we have for this situation? Very few, if not zero.
Why? Because it is not obvious how to gather strength when each piece of
information is fuzzy. The danger is that if we include irrelevant cases,
borrowing information may heavily damage the quality of the inference. This
raises some fundamental questions for big-data inference: When (not) to borrow?
From whom (not) to borrow? How (not) to borrow? These questions are at the heart
of the "Problem of Relevance" in statistical inference -- a puzzle that has
received too little attention since it was first posed nearly half a century ago.
Here we offer the first practical theory of relevance, with a precisely
describable statistical formulation and algorithm. Through examples, we
demonstrate how our new statistical perspective answers previously unanswerable
questions in a realistic and feasible way.
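
To make the stakes concrete, consider a minimal simulation sketch (illustrative only, and not the formulation proposed in the paper; the fixed pooling weight and all variable names below are our own assumptions). Each case's estimate is shrunk toward the grand mean of the whole database: when the cases are genuinely similar, borrowing strength reduces the error, but when a handful of irrelevant cases is pooled in, the same borrowing inflates the error for the cases we actually care about.

    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma, n_rep = 50, 1.0, 2000

    def mse_on_similar(true_means, similar, w=0.5):
        # MSE of a fixed-weight partial-pooling estimate, scored only on the
        # "similar" cases we care about, versus the no-borrowing raw estimate.
        err_pool = err_raw = 0.0
        for _ in range(n_rep):
            x = true_means + sigma * rng.standard_normal(n)
            pooled = w * x + (1 - w) * x.mean()  # borrow strength from ALL cases
            err_pool += np.mean((pooled[similar] - true_means[similar]) ** 2)
            err_raw += np.mean((x[similar] - true_means[similar]) ** 2)
        return err_pool / n_rep, err_raw / n_rep

    similar = np.arange(5, n)                  # the 45 cases of interest
    homog = np.full(n, 2.0)                    # every case truly alike
    heterog = homog.copy()
    heterog[:5] = 40.0                         # five irrelevant cases slip in

    for label, mu in [("homogeneous", homog), ("heterogeneous", heterog)]:
        pool, raw = mse_on_similar(mu, similar)
        print(f"{label:>13}: pooled MSE = {pool:.2f}, no-borrowing MSE = {raw:.2f}")

On a typical run, pooling cuts the error to roughly a quarter in the homogeneous setting and roughly quadruples it in the heterogeneous one, which is exactly the trade-off the questions above are probing.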