
I've read many times on this site that high-order polynomials (generally of degree higher than three) shouldn't be used in linear regression, unless there is a substantial justification to do so.

I understand the issues about extrapolation (and prediction at the boundaries).

Since extrapolation isn't important to me...

  1. Are high order polynomials also a bad way of approximating the underlying function within the range of the data points? (i.e. interpolation)
  2. If so, what problems are arising?

I don't mind being redirected to a good book or paper about this. Thanks.

Marco Rudelli

5 Answers


I cover this in some detail in Chapter 2 of RMS (Regression Modeling Strategies). Briefly, besides extrapolation problems, ordinary polynomials have these problems:

  1. The shape of the fit in one region of the data is influenced by far away points
  2. Polynomials cannot fit threshold effects, e.g., a nearly flat curve that suddenly accelerates
  3. Polynomials cannot fit logarithmic-looking relationships, e.g., ones that get progressively flatter over a long interval
  4. Polynomials can't have a very rapid turn

These are reasons that regression splines are so popular, i.e., segmented polynomials tend to work better than unsegmented polynomials. You can also relax a continuity assumption for a spline if you want to have a discontinuous change point in the fit.
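
For concreteness, here is a minimal sketch (mine, not part of the original answer) of a natural-spline fit in R using the base splines package; the data frame `dat` is hypothetical and simply mimics a threshold effect like the one in point 2. Harrell's rms package also provides rcs() for restricted cubic splines.

library(splines)
# hypothetical data: nearly flat, then accelerating after x = 0.7 (a threshold effect)
set.seed(1)
dat <- data.frame(x = runif(100))
dat$y <- 0.1 + 2 * pmax(0, dat$x - 0.7) + rnorm(100, sd = 0.05)

fit_ns   <- lm(y ~ ns(x, df = 4), data = dat)   # natural (restricted cubic) spline, 4 df
fit_poly <- lm(y ~ poly(x, 4),    data = dat)   # ordinary degree-4 polynomial, same df

xx <- seq(0, 1, by = 0.01)
plot(dat$x, dat$y, pch = 19, cex = 0.5)
lines(xx, predict(fit_ns,   data.frame(x = xx)), col = "blue", lwd = 2)
lines(xx, predict(fit_poly, data.frame(x = xx)), col = "red",  lwd = 2, lty = 2)

With the same number of parameters, the spline can bend where the data bend without forcing curvature elsewhere, which is the practical point of the list above.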

Frank Harrell
  • I agree, but you missed that there are ***much*** better tools for fitting non-linear regressions/dispensing with linearity assumption, namely nonparametric regressions, GAMs, etc. :) – Alexis Oct 21 '21 at 03:59
  • Have to disagree. You'll be surprised how little difference there is between a much simpler natural spline fit and a more cumbersome and slower nonparametric fit as in GAM. – Frank Harrell Oct 21 '21 at 11:40
  • OP asks specifically about high-order polynomials. These four answers all seem to apply to polynomials generally, and not to high-order polynomials in particular. – Ceph Oct 21 '21 at 14:29
  • @FrankHarrell Natural splines are good as well, but if you prefer higher-order polynomial regression to nonparametric regression, all power to you. ;) – Alexis Oct 21 '21 at 16:22
  • Please read what I wrote more carefully. I don't like high-order regular polynomial regression. I prefer regression splines (especially restricted cubic splines AKA natural splines) over nonparametric regression, for many reasons. – Frank Harrell Oct 21 '21 at 17:01
  • We were laughing in physics class when I took out my old program because I couldn't figure out the correct equation and asked it to fit a ninth-order polynomial, intending to convert it to a Taylor series. It didn't work; the first terms of the Taylor series weren't even close to anything we could recognize. Later, with the correct equation in hand, we discovered the ninth-order polynomial fit the curve better than the correct equation. Oops. – Joshua Oct 23 '21 at 03:20
  • Probably it's just due to the short formulation, but as they are written, points 2-4 contradict the [Stone-Weierstrass theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem). – Federico Poloni Oct 23 '21 at 14:31
  • Don't mix short interval approximation accuracy with long interval goodness of fit. – Frank Harrell Oct 24 '21 at 12:52

Yes, polynomials are also problematic in interpolation, because of overfitting and high variability.

Here is an example. Assume your dependent variable $y$ is uniformly distributed on the interval $[0,1]$. You also have a "predictor" variable $x$, also uniformly distributed on the interval $[0,1]$. However, there is no relationship between the two. Thus, any regression of $y$ on any power of $x$ will be overfitting.

Now, assume we draw 20 data points from this data generating process and fit a polynomial of degree $n$ in $x$ for $n=1, 2, 3, 5$. Here are the fits:

[Figure sims_1.png: scatterplot of the 20 points with fitted curves for degrees 1, 2, 3 and 5]

As you see, the fit gets "wigglier" for higher $n$.

However, one key problem is that (of course) the fit will depend on the data we have randomly sampled from our data generating process. After all, we could have drawn 20 quite different pairs $(x,y)$. Let's repeat the exercise another three times, with new random samples each time. Below, the top row is the same as the previous plot, and the three other rows are just the fits based on new samples:

[Figure sims_2.png: the same four fits repeated for four random samples, one sample per row]

The main problem is visible when you compare the left column (linear fits) and the right column (degree-5 fits): the fit for a lower-order polynomial is much less variable, and much less dependent on the randomness in our data sampling, than the fit for the high-order polynomial. If we want to interpolate $y$ for some $x$ even somewhere in the middle of the interval, using a higher-order polynomial will yield a fit that is much more wobbly than a lower-order polynomial.

R code:

nn <- 20                     # number of observations per simulated dataset
xx <- seq(0, 1, by = .01)    # grid on which to plot the fitted curves

png("sims_1.png", width = 800, height = 200)
    opar <- par(mfrow = c(1, 4), mai = c(.3, .3, .3, .1), las = 1)
    set.seed(1)
    obs <- data.frame(x = runif(nn), y = runif(nn))   # x and y independent: no true relationship
    for ( exponent in c(1, 2, 3, 5) ) {
        model <- lm(y ~ poly(x, exponent), obs)       # polynomial fit of degree `exponent`
        plot(obs$x, obs$y, pch = 19, xlab = "", ylab = "",
             main = paste("Exponent:", exponent), xlim = c(0, 1), ylim = c(0, 1))
        lines(xx, predict(model, newdata = data.frame(x = xx)), col = "red", lwd = 2)
    }
    par(opar)
dev.off()

png("sims_2.png", width = 800, height = 800)
    opar <- par(mfrow = c(4, 4), mai = c(.3, .3, .3, .1), las = 1)
    for ( jj in 1:4 ) {
        set.seed(jj)                                  # a fresh random sample for each row
        obs <- data.frame(x = runif(nn), y = runif(nn))
        for ( exponent in c(1, 2, 3, 5) ) {
            model <- lm(y ~ poly(x, exponent), obs)
            plot(obs$x, obs$y, pch = 19, xlab = "", ylab = "",
                 main = paste("Exponent:", exponent), xlim = c(0, 1), ylim = c(0, 1))
            lines(xx, predict(model, newdata = data.frame(x = xx)), col = "red", lwd = 2)
        }
    }
    par(opar)
dev.off()
Stephan Kolassa
  • To be fair, this issue isn't really specific to high-order _polynomials_, is it? It's just a matter of having many parameters. – leftaroundabout Oct 23 '21 at 00:29
  • @leftaroundabout: yes indeed. Overfitting can happen by having spurious predictors, or by transforming those that *are* useful too exotically (e.g., by using all possible interactions). Per [John Madden's answer](https://stats.stackexchange.com/a/549220/1352), regularization is always a good idea. – Stephan Kolassa Oct 23 '21 at 08:21

If your goal is interpolation, you typically want the simplest function that describes your observations and avoids overfitting.

Since it is unusual to see physical laws and relationships that contain powers higher than 3, using higher-order polynomials without an intuitive justification is usually taken to be a sign of overfitting, as in the small illustration below.
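
A small, hedged illustration of that principle (simulated data, with AIC used only as one rough proxy for parsimony; cross-validation or other criteria would serve equally well):

# simulate a truly linear relationship, then compare polynomial degrees by AIC
set.seed(1)
x <- runif(30)
y <- 2 + 3 * x + rnorm(30, sd = 0.2)
sapply(1:6, function(d) AIC(lm(y ~ poly(x, d))))
# AIC is typically smallest at degree 1; higher degrees add parameters without real gain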

See also this similar question.

Adam Kells
  • The "higher than 3" applies if you have pre-transformed the variables in question correctly. For example there are many relationships that are linear once you've logged a variable. But if you don't know to log it, multiple parameters will need to be estimated to allow estimation of the whole transformed effect. – Frank Harrell Oct 21 '21 at 12:54

Runge's phenomenon can lead to high-degree polynomials being much wigglier than the variation actually suggested by the data. An appeal of splines, particularly natural splines, as a substitute for high-degree polynomials is that they allow nonmonotonicity and varying slopes without varying too wildly. I would be hard-pressed to find an example of real data for which a fourth- or higher-degree polynomial seems a more reasonable choice than a spline.
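
A textbook sketch of the effect (the classic Runge function $1/(1+25x^2)$ sampled at equally spaced points, not real data), comparing a degree-10 polynomial with a natural spline:

library(splines)
x  <- seq(-1, 1, length.out = 11)      # 11 equally spaced sample points
y  <- 1 / (1 + 25 * x^2)               # Runge's function
xx <- seq(-1, 1, by = 0.01)

fit_poly <- lm(y ~ poly(x, 10))        # degree-10 polynomial: interpolates the points
fit_ns   <- lm(y ~ ns(x, df = 6))      # natural spline with fewer effective parameters

plot(xx, predict(fit_poly, data.frame(x = xx)), type = "l", col = "red")  # oscillates near the ends
lines(xx, 1 / (1 + 25 * xx^2), lty = 2)                                   # true function
lines(xx, predict(fit_ns, data.frame(x = xx)), col = "blue")
points(x, y, pch = 19)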

Kodiologist
  • Asteroid trajectory data would be an example that should generally be better handled by a high-order polynomial, than by splines. (Of course, a more directly physics-motivated model would be even better, but it so happens that the solution to the physical ODEs is an analytic function with a well-behaved Taylor expansion, and that would then match the polynomial fit.) – leftaroundabout Oct 23 '21 at 00:37

As an inveterate contrarian, I feel the need to amend the premise that high order polynomials shouldn't be used for interpolation. I would argue that the correct statement is "high order polynomials make poor interpolants unless properly regularized".

Indeed, it is quite popular (at least in academic circles) to conduct interpolation with polynomials of infinite degree; this is referred to as Kriging when a particular regularizer is used (the so-called Reproducing Kernel Hilbert Space regularization, which, intuitively speaking, penalizes both a function's $L_2$ norm and its tendency to oscillate). And, of course, by infinite degree we mean that it is possible to construct a sequence of arbitrarily high-degree polynomials which converge to this reasonable interpolant.

Additionally, recent work has shown that increasing the degree to 2 or 3 times the size of the data (while keeping it finite), together with appropriate regularization, can still perform well. I can't find it just this moment, but this work was based on a similar discovery in the context of neural networks.

Generically, regularization prevents many of the issues brought up by other answerers.
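
A hedged sketch of what "properly regularized" can look like (base R only; data simulated in the same style as the answer above, and the penalty strength lambda picked by hand rather than tuned): ridge regression on a degree-15 orthogonal polynomial basis.

set.seed(1)
obs <- data.frame(x = runif(20), y = runif(20))

B  <- poly(obs$x, 15)                         # orthogonal polynomial basis of degree 15
X  <- cbind(1, B)                             # add an intercept column
lambda  <- 1
penalty <- diag(c(0, rep(lambda, ncol(B))))   # ridge penalty; intercept left unpenalized
beta <- solve(t(X) %*% X + penalty, t(X) %*% obs$y)

xx   <- seq(0, 1, by = 0.01)
Xnew <- cbind(1, predict(B, xx))              # evaluate the same basis at new points
plot(obs$x, obs$y, pch = 19, xlim = c(0, 1), ylim = c(0, 1))
lines(xx, Xnew %*% beta, col = "red", lwd = 2)

Larger lambda pulls the wiggly high-degree fit toward a flat line, while lambda = 0 recovers the unregularized least-squares fit and the instability shown in the other answers.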

John Madden
  • Good point, although I think Frank Harrell's answer does a reasonably good job explaining why polynomials would be a *worse* choice of basis than splines (i.e., separating the issue of model complexity/degrees of freedom from the way in which model complexity is decomposed) – Ben Bolker Oct 21 '21 at 22:37