Difference between regression analysis and curve fitting

Question

Can anybody please explain to me the real difference(s) between regression analysis and curve fitting (linear and nonlinear), with an example if possible?

It seems that both try to find a relationship between two variables (dependent vs independent) and then determine the parameter (or coefficient) associated with the models being proposed. For example, if I have a set of data like:

Y = [1.000 1.000 1.000 0.961 0.884 0.000] 
X = [1.000 0.063 0.031 0.012 0.005 0.000]

Can anybody suggest a correlation formula between these two variables? I am having difficulty understanding the difference between these two approaches. If you prefer to support your answer with other data sets, that's OK, since this one seems hard to fit (perhaps only for me).

The above data set represents the $x$ and $y$ axes of a receiver operating characteristic (ROC) curve, where $y$ is the true positive rate (TPR) and $x$ is the false positive rate (FPR).

I am trying to fit a curve through these points (or do a regression analysis, as per my original question; I am not sure which yet) to estimate the TPR for any particular FPR (or vice versa).

First, is it scientifically acceptable to find such a curve fitting function between two independent variables (TPR and FPR)?

Second, is it scientifically acceptable to find such a function if I know that the distributions of the actual negative and the actual positive cases are not normal?

Terms are (unfortunately) used differently by different people & in different contexts. Can you link to / provide an example where people are distinguishing between them? — gung - Reinstate Monica, May 08 '15 at 17:31
That's what I am trying to figure out, how they are different and how I can distinguish between them. — Ali Sultan, May 08 '15 at 17:34
Fair enough, but did somebody tell you they were supposed to be different? — gung - Reinstate Monica, May 08 '15 at 17:36
On this site some people have used "curve fitting" in senses that cannot be considered regression. For instance, some of them view estimating a density as a form of "curve fitting" to a histogram. — whuber, May 08 '15 at 18:39

Nick Cox · Answer 1 · 2015-05-08T18:06:29.140

26

I doubt that there is a clear and consistent distinction across statistically minded sciences and fields between regression and curve-fitting.

Regression without qualification implies linear regression and least-squares estimation. That doesn't rule out other or broader senses: indeed once you allow logit, Poisson, negative binomial regression, etc., etc. it gets harder to see what modelling is not regression in some sense.

Curve-fitting does literally suggest a curve that can be drawn on a plane or at least in a low-dimensional space. Regression is not so bounded and can predict surfaces in a space of several dimensions.

Curve-fitting may or may not use linear regression and/or least squares. It might refer to fitting a polynomial (power series) or a set of sine and cosine terms, or in some other way actually qualify as linear regression in the key sense of fitting a functional form that is linear in the parameters. Indeed, curve-fitting by nonlinear regression is regression too.
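To make the "linear in the parameters" point concrete, here is a minimal sketch, not taken from the answer itself. It fits a quadratic by ordinary least squares using the normal equations. The helper name `fit_polynomial` and the noise-free sample points are assumptions made purely for illustration. The curve is nonlinear in $x$, yet the coefficients are found by solving a linear system, which is why this kind of curve fitting is also linear regression.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X'X) b = X'y.

    Illustrative helper (hypothetical name), fine for small, well-scaled problems.
    """
    n = degree + 1
    # Design matrix columns are 1, x, x^2, ...: nonlinear in x, linear in b.
    design = [[x ** p for p in range(n)] for x in xs]
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in design) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(design, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The matrix is now diagonal, so each coefficient is a simple ratio.
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical noise-free points lying on y = 1 + 2x - 0.5x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]

# Coefficients are returned in increasing order of power: [b0, b1, b2].
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

With real, noisy data the recovered coefficients would only approximate the underlying ones, and a library routine would normally be preferred over hand-rolled elimination.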

The term curve-fitting could be used in a disparaging, derogatory, deprecatory or dismissive sense ("that's just curve fitting!") or (almost the complete opposite) it might refer to fitting a specific curve carefully chosen with specific physical (biological, economic, whatever) rationale or tailored to match particular kinds of initial or limiting behaviour (e.g. being always positive, bounded in one or both directions, monotone, with an inflexion, with a single turning point, oscillatory, etc.).

One of several fuzzy issues here is that the same functional form can be at best empirical in some circumstances and excellent theory in others. Newton taught that trajectories of projectiles can be parabolic, and so naturally fitted by quadratics, whereas a quadratic fitted to age dependency in the social sciences is often just a fudge that matches some curvature in the data. Exponential decay is a really good approximation for radioactive isotopes and a sometimes not too crazy guess for the way that land values decline with distance from a centre.

Your example gets no explicit guesses from me. Much of the point here is that with a very small set of data and precisely no information on what the variables are or how they are expected to behave it could be irresponsible or foolish to suggest a model form. Perhaps the data should rise sharply from (0, 0) and then approach (1, 1), or perhaps something else. You tell us!

Note. Neither regression nor curve-fitting is limited to single predictors or single parameters (coefficients).

edited May 08 '15 at 18:06

answered May 08 '15 at 17:43

Nick Cox


2

"Curve-fitting" connotes something a-theoretical (eg, lowess) to me. Economists sometimes deride a-theoretical function fitting as 'charting', which sounds similar to some usages of curve-fitting. I think that it (eg lowess) has both pros & cons, when understood correctly. It's hard to know how somebody meant the terms distinctly w/o more context, though. – gung - Reinstate Monica May 08 '15 at 17:55
1

@gung I think there is similar part-jocular, part-serious usage across several natural (and unnatural) sciences. One of the issues is that given enough parameters, you necessarily have a lot of wiggle room. I'm reminded of time series models that allow not just ARIMA but also sinusoidal terms and steps, ramps and spikes wherever the data suggest. – Nick Cox May 08 '15 at 18:00
I second @gung, curve fitting has a more nonparametric connotation, at least to me. – Christoph Hanck May 08 '15 at 18:22
1

@ChristophHanck Please don't bring "nonparametric" into this! The discussion is muddy enough already! – Nick Cox May 08 '15 at 18:33
1

@gung: Thinking of smoothing splines and RKHS methods in general as the backbone of "curve-fitting" for example I feel "curve fitting" to be much more theoretical than "regression". (+1 to NickCox for this answer) – usεr11852 May 09 '15 at 00:46
@usεr11852, I have no idea what to do w/ that comment. The idea that taking some data & plotting a lowess curve is "much more theoretical" than fitting a regression model to test specific hypotheses baffles me. – gung - Reinstate Monica May 10 '15 at 15:18
@gung: I did not focus on lowess only; I specifically mentioned smoothing splines (and RKHS methods) in general for curve fitting. The concepts of [splines](https://en.wikipedia.org/wiki/Spline_%28mathematics%29) and spline interpolation are much more mathematically involved (to me at least) than fitting a regression. Even if you focus only on lowess, one has to think *why* it works. After all, lowess is implementation-wise a series of regression models glued together, so it has to be a superset of a simple regression model. – usεr11852 May 10 '15 at 17:31
@usεr11852, splines & lowess are generalizations of regression & thus are more advanced in some sense. The *statistical* theory can be seen as further along, but statistical techniques are almost always applied to some actual data to advance understanding of a *substantive topic*. The use of those techniques is sometimes derided as 'curve-fitting' in that the analyst is giving up on understanding the nature of the relationship & just wants something that will account for whatever curve might exist. – gung - Reinstate Monica May 10 '15 at 17:56
When technical analysts look at a stock & say 'this has been going up & down, so it may breakout soon' w/o taking into account anything about the nature of the firm or their products (is their demographic customer base expanding, etc), this is derided as curve-fitting. They get something sufficiently wiggly to fit whatever might be there w/o thinking at all about what is known about business, growth, etc. There may be statistical theories behind splines, but the use in that context is definitely a-theoretical. – gung - Reinstate Monica May 10 '15 at 17:59
@gung: Thank you for clarifying what you mean by "a-theoretical". – usεr11852 May 10 '15 at 18:43

Aleksandr Blekh · Answer 2 · 2015-05-09T01:25:44.777

8

In addition to @NickCox's excellent answer (+1), I wanted to share my subjective impression on this somewhat fuzzy terminology topic. I think that a rather subtle difference between the two terms lies in the following. On one hand, regression often, if not always, implies an analytical solution (reference to regressors implies determining their parameters, hence my argument about analytical solution). On the other hand, curve fitting does not necessarily imply producing an analytical solution and IMHO often might be and is used as an exploratory approach.

edited May 09 '15 at 01:25

answered May 08 '15 at 17:59

Aleksandr Blekh


2

Can't something with an analytical solution be used for exploratory reasons too? I don't think I get the opposition you are making. – amoeba May 09 '15 at 13:08
@amoeba: Analytical solutions certainly can be used for exploratory research as well. However, the point I am making is about the **most popular** _implied essence_ of the terms in question. – Aleksandr Blekh May 09 '15 at 23:56

score 0 · Answer 3 · answered Feb 26 '20 at 08:45

As there already seems to be an adequate array of explanations of Regression Analysis vs Curve Fitting, I’ll leave that alone. However, there is an additional question buried in the OP’s original question. There’s very little ‘given data’, but he asked if someone could suggest a correlation formula, so I’ll add my 2 cents.

I don’t have any experience with ROC curves; however, plotting the data gives a strong indication that it’s a first-order system $\frac{1}{\tau \cdot s+1}$ (in Laplace terminology) responding to a step input $\frac{1}{s}$ (in Laplace terminology). Obviously, the time constant is very small, so steady state is reached extremely quickly. I’ll assume that the dependent variable is $y$ and the independent variable is $t$.

A general equation for a first-order process is $y = A\left[1 - B \cdot e^{-\frac{t}{\tau}}\right]$, where $\tau$ is the process time constant (at $t = 1 \cdot \tau$, approximately 63.2% of the response has completed).

Using your data, at $t=\infty$, $y=1$; therefore $A=1$. The general model now becomes $y = 1 - B \cdot e^{-\frac{t}{\tau}}$.

In addition, at $t=0$, $y=0$; therefore $B=1$. The model is now $y = 1 - e^{-\frac{t}{\tau}}$.

Regressing (or curve fitting) your data to this equation yields $y = 1 - e^{-\frac{t}{0.0023}}$. Explicitly, this says that in $\approx 0.0023\ \text{sec}$, 63.2% of the response has completed.
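For readers who want to reproduce a fit of this kind, here is a minimal sketch under stated assumptions. The answer does not say which fitting procedure was used. This version takes the question's $x$ values as $t$, minimises the unweighted sum of squared errors, and uses a golden-section search over a bracket chosen by hand. The function names `sse` and `golden_section_min` are illustrative only.

```python
import math

# Data from the question, with x (FPR) playing the role of t.
t = [1.000, 0.063, 0.031, 0.012, 0.005, 0.000]
y = [1.000, 1.000, 1.000, 0.961, 0.884, 0.000]


def sse(tau):
    """Sum of squared errors for the model y = 1 - exp(-t / tau)."""
    return sum((yi - (1.0 - math.exp(-ti / tau))) ** 2 for ti, yi in zip(t, y))


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimise f on [lo, hi], assuming a single minimum inside the bracket."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2.0


# The bracket [1e-4, 1e-1] is an assumption chosen by eye from the data.
tau_hat = golden_section_min(sse, 1e-4, 1e-1)
print(f"tau = {tau_hat:.4f}, SSE = {sse(tau_hat):.5f}")
```

The estimate should land in the neighbourhood of the time constant quoted above; small differences can come from the optimiser, any weighting, and rounding. With only two informative points on the rising part of the curve, the estimate is sensitive to each of them.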

I don’t have the benefit of knowing your specific noise or data variance, so I took the liberty of fitting a two-parameter model to see if it tightens the error. That yielded the model $y = 1 - 0.99972 \cdot e^{-\frac{t}{0.0023}}$.

I’d recommend taking more data on the early portion of the response and ignoring samples taken after the response lines out, as there’s nothing interesting to model at steady state. In addition, the magnitude of the $\exp()$ argument becomes extremely large the further you go out into steady state.
