Is a model fitted to data or is data fitted to a model?

Question

Is there a conceptual or procedural difference between fitting a model to data and fitting data to model? An example of the first wording can be seen in https://courses.washington.edu/matlab1/ModelFitting.html, and of the second in https://reference.wolfram.com/applications/eda/FittingDataToLinearModelsByLeast-SquaresTechniques.html.

+1 I am not impressed by the second link, but I am entertained. — The Laconic, Mar 24 '19 at 02:38
_Many_ models fit the current data, but the data typically fit one model **best** — Agnius Vasiliauskas, Mar 25 '19 at 09:45
I asked myself the other day: is it the pants that don't fit me anymore? or is it me who doesn't fit into my pants now? it's a very important question because the answers determine the course of action: go to a tailor or go to a gym — Aksakal, Jan 10 '22 at 20:15

Matthew Drury · Accepted Answer · 2019-03-25T23:55:01.947

Pretty much every source or person I've ever interacted with except the Wolfram source you linked refers to the process as fitting a model to data. This makes sense, since the model is the dynamic object and the data is static (a.k.a. fixed and constant).

To put a point on it, I like Larry Wasserman's approach to this. In his telling, a statistical model is a collection of distributions. For example, the collection of all normal distributions:

$$ \{ \text{Normal}(\mu, \sigma) : \mu, \sigma \in R, \sigma > 0 \} $$

or the set of all Poisson distributions:

$$ \{ \text{Poisson}(\lambda) : \lambda \in R, \lambda > 0 \} $$

Fitting a distribution to data is any algorithm that combines a statistical model with a set of data (the data is fixed), and chooses exactly one of the distributions from the model as the one that "best" reflects the data.

The model is the thing that changes (sort of): we are collapsing it from an entire collection of possibilities into a single best choice. The data is just the data; nothing happens to it at all.
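
To make this concrete, here is a minimal sketch (an illustration, not part of the original answer) that uses maximum likelihood as the "best" criterion. For the normal model this picks the member whose parameters are the sample mean and the 1/n sample standard deviation; for the Poisson model it picks the member whose rate is the sample mean. The function names `fit_normal` and `fit_poisson` and the sample values are invented for the example.

```python
import numpy as np

def fit_normal(data):
    """Choose one Normal(mu, sigma) from the model by maximum likelihood."""
    data = np.asarray(data, dtype=float)
    mu_hat = data.mean()
    sigma_hat = data.std(ddof=0)  # the MLE uses the 1/n variance, not 1/(n-1)
    return mu_hat, sigma_hat

def fit_poisson(counts):
    """Choose one Poisson(lambda) from the model by maximum likelihood."""
    counts = np.asarray(counts, dtype=float)
    return counts.mean()

# Made-up observations; they are only read, never modified.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
before = data.copy()

mu_hat, sigma_hat = fit_normal(data)
lam_hat = fit_poisson(data)

print("chosen normal member:", mu_hat, sigma_hat)
print("chosen Poisson member:", lam_hat)
print("data unchanged:", np.array_equal(data, before))
```

Both functions take the same fixed array and return a single member of their respective collections, which is the sense in which the model, not the data, is what gets collapsed.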

score 18 · Answer 2 · answered Mar 24 '19 at 14:39

In the field of Rasch modelling it is common to fit the data to the model. The model is assumed to be correct and it is the analyst's job to find data which conform to it. The Wikipedia article on Rasch contains more details about the how and the why.

But I agree with others that, in general, in statistics we fit the model to the data: we can change the model, but it is felt to be bad form to select or modify the data.

score 7 · Answer 3 · answered Mar 24 '19 at 08:13

Typically, the observed data are fixed while the model is mutable (e.g. because parameters are estimated), so it is the model that is made to fit the data, not the other way around. (Usually people mean this case when they say either expression.)

When people say they fit data to a model, I find myself trying to figure out what the heck they did to the data.

[Now if you're transforming data, that would arguably be 'fitting data to a model', but people almost never say that for this case.]
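
As a rough sketch of what that transformation case could look like (an illustration with simulated data, not something from the original answer): right-skewed values are log-transformed so that a normal model describes them better. The `skewness` helper and the simulated sample are assumptions made for the example.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Simulated right-skewed data standing in for real observations.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

# Here it is the data that changes: the transformed values are what
# a normal model would then be fitted to.
transformed = np.log(raw)

skew_raw = skewness(raw)
skew_log = skewness(transformed)
print("skewness before transform:", skew_raw)
print("skewness after transform:", skew_log)
```

A normal distribution has zero skewness, so the drop in skewness after the log is one crude sign that the transformed data sit closer to the normal model than the raw data do.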

Glen_b

Removing outliers would also (arguably) be "fitting data to a model". – Federico Poloni Mar 24 '19 at 08:34
The phrasing might make sense if they're thinking of it as "fitting (data to a model)". That is, you're doing a process of fitting, and that process of fitting starts from data and transforms it to a model. I agree that's a less common/accurate interpretation versus the "(fitting X) to Y" parse, but I put it out there as a rationale as to why someone might logically say it. – R.M. Mar 24 '19 at 13:29
@FedericoPoloni Outliers are usually defined independently of the model that you later want to use. So even if we wanted to call it fitting data, it would not be to a model, but to something else. – BartoszKP Mar 24 '19 at 20:16
+1. There is a reason it's called "data" - it is what is *given*, see the Latin origin of the word: http://latindictionary.wikidot.com/verb:dare – Christoph Hanck Mar 25 '19 at 15:37

score 2 · Answer 4 · answered Mar 24 '19 at 21:34

Usually, we assume our data corresponds to the "real world", and making any modifications means we are moving away from modelling the "real world". For example, one needs to take care when removing outliers: even if it makes computation nicer, the outliers were still part of our data.

When testing a model or estimating properties of an estimator using bootstrap or other resampling techniques, we may simulate new data using an estimated model and our original data. This makes the assumption that the model is correct, and we are not modifying our original data.
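
A minimal sketch of the simulation idea, assuming a normal model and using a parametric bootstrap to estimate the standard error of the sample mean. The "observed" sample here is itself simulated as a stand-in for real data, and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the observed data; it is only read from below.
original = rng.normal(loc=10.0, scale=3.0, size=50)
snapshot = original.copy()

# Fit the assumed normal model to the original data.
mu_hat = original.mean()
sigma_hat = original.std(ddof=0)

# Parametric bootstrap: simulate new data sets from the fitted model
# and recompute the estimator (the sample mean) on each one.
n_boot = 2000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    simulated = rng.normal(loc=mu_hat, scale=sigma_hat, size=original.size)
    boot_means[b] = simulated.mean()

se_boot = boot_means.std(ddof=1)
se_formula = sigma_hat / np.sqrt(original.size)  # what the fitted model implies

print("bootstrap standard error:", se_boot)
print("model-implied standard error:", se_formula)
print("original data untouched:", np.array_equal(original, snapshot))
```

The simulated data sets are new objects generated from the fitted model; the estimate is only as trustworthy as the assumption that the normal model is correct.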

score 0 · Answer 5 · answered Jan 10 '22 at 19:58

This is an interesting topic, and I could not let it pass without saying a few words. From a scientific point of view, a model is by definition a conceptual representation of an event or process, universally accepted as a reference. With that in mind, my understanding is that any data is a dynamic entity, meaning it changes over time, unlike statistical models, which are static references developed to describe certain phenomena. The data at hand is also a representation of an event, but one that has yet to be validated because of its dynamic nature. The process by which the data is validated (compared with a model or reference already in existence) is what we call data modeling, or model fitting of the data. In conclusion, we should rather be saying "FITTING the DATA to A MODEL."
