D. Hogg, J. Bovy, and D. Lang. (2010). cite arxiv:1008.4686
Comment: a chapter from a non-existent book.
Abstract
We go through the many considerations involved in fitting a model to data,
using as an example the fit of a straight line to a set of points in a
two-dimensional plane. Standard weighted least-squares fitting is only
appropriate when there is a dimension along which the data points have
negligible uncertainties, and another along which all the uncertainties can be
described by Gaussians of known variance; these conditions are rarely met in
practice. We consider cases of general, heterogeneous, and arbitrarily
covariant two-dimensional uncertainties, and situations in which there are bad
data (large outliers), unknown uncertainties, and unknown but expected
intrinsic scatter in the linear relationship being fit. Above all we emphasize
the importance of having a "generative model" for the data, even an approximate
one. Once there is a generative model, the subsequent fitting is non-arbitrary
because the model permits direct computation of the likelihood of the
parameters or the posterior probability distribution. Construction of a
posterior probability distribution is indispensable if there are "nuisance
parameters" to marginalize away.
%0 Generic
%1 Hogg2010
%A Hogg, David W.
%A Bovy, Jo
%A Lang, Dustin
%D 2010
%K analysis data fitting model
%T Data analysis recipes: Fitting a model to data
%U http://arxiv.org/abs/1008.4686
%X We go through the many considerations involved in fitting a model to data,
using as an example the fit of a straight line to a set of points in a
two-dimensional plane. Standard weighted least-squares fitting is only
appropriate when there is a dimension along which the data points have
negligible uncertainties, and another along which all the uncertainties can be
described by Gaussians of known variance; these conditions are rarely met in
practice. We consider cases of general, heterogeneous, and arbitrarily
covariant two-dimensional uncertainties, and situations in which there are bad
data (large outliers), unknown uncertainties, and unknown but expected
intrinsic scatter in the linear relationship being fit. Above all we emphasize
the importance of having a "generative model" for the data, even an approximate
one. Once there is a generative model, the subsequent fitting is non-arbitrary
because the model permits direct computation of the likelihood of the
parameters or the posterior probability distribution. Construction of a
posterior probability distribution is indispensable if there are "nuisance
parameters" to marginalize away.
@misc{Hogg2010,
abstract = { We go through the many considerations involved in fitting a model to data,
using as an example the fit of a straight line to a set of points in a
two-dimensional plane. Standard weighted least-squares fitting is only
appropriate when there is a dimension along which the data points have
negligible uncertainties, and another along which all the uncertainties can be
described by Gaussians of known variance; these conditions are rarely met in
practice. We consider cases of general, heterogeneous, and arbitrarily
covariant two-dimensional uncertainties, and situations in which there are bad
data (large outliers), unknown uncertainties, and unknown but expected
intrinsic scatter in the linear relationship being fit. Above all we emphasize
the importance of having a "generative model" for the data, even an approximate
one. Once there is a generative model, the subsequent fitting is non-arbitrary
because the model permits direct computation of the likelihood of the
parameters or the posterior probability distribution. Construction of a
posterior probability distribution is indispensable if there are "nuisance
parameters" to marginalize away.
},
added-at = {2010-10-05T14:49:44.000+0200},
author = {Hogg, David W. and Bovy, Jo and Lang, Dustin},
biburl = {https://www.bibsonomy.org/bibtex/235c6a5301597765397fbbd84c68f233e/ihuston},
description = {Data analysis recipes: Fitting a model to data},
interhash = {c1f75c2a8736890c2063fd12695a6cf0},
intrahash = {35c6a5301597765397fbbd84c68f233e},
keywords = {analysis data fitting model},
note = {cite arxiv:1008.4686
Comment: a chapter from a non-existent book},
timestamp = {2010-10-05T14:49:44.000+0200},
title = {Data analysis recipes: Fitting a model to data},
url = {http://arxiv.org/abs/1008.4686},
year = 2010
}