
Say I have an outcome variable $Y_i$ and predictors $X_{i1}$ and $X_{i2}$ for some data point $i$. Wikipedia says that a model is linear when:

> the mean of the response variable is a linear combination of the parameters (regression coefficients) and the predictor variables.

I thought this meant that a model can be no more complicated than: $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2}$. However, upon further reading, I found out you could handle non-linear "interactions" of the predictors like in $Y_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1}\ X_{i2}$ by viewing $X_{i1}\ X_{i2}$ as just another predictor (which happens to be dependent on $X_{i1}$ and $X_{i2}$). This seems to mean that you can use any (linear or non-linear) function of the predictors like $\log(X_{i1} / X_{i2}^2)$ or whatever. Conceptually, this type of "recoding" seems like it should work for the coefficients as well.
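
For example, here is a small sketch of what I mean (Python/NumPy, with made-up coefficients and data, purely for illustration): the interaction model is fit by ordinary least squares once $X_{i1} X_{i2}$ is added as another column of the design matrix.

```python
# Illustrative sketch: an "interaction" model is still ordinary least squares,
# because x1 * x2 is supplied as just another column of the design matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.5 * x1 - 2.0 * x2 + 0.7 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: the third column is a (non-linear) function of the predictors.
X = np.column_stack([x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # roughly [1.5, -2.0, 0.7]
```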

So: What exactly are the limits of linear regression, given you can do this kind of manipulation?

Scortchi - Reinstate Monica
user3243135

2 Answers


The parameters need to enter the equation linearly. So something like $E(Y)=\beta_1 \cos(\beta_2 x_i + \beta_3)$ would not qualify, since $\beta_2$ and $\beta_3$ appear inside the cosine. But you can take functions of the independent variables as follows:

$E(Y)=\beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 e^{X_i}$

for example.

So the limits of linear regression are: the mean of the $Y$ values must be of the form parameter times (some function of the independent variables) plus parameter times (some other function of the independent variables), and so on.
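
As a rough illustration (a Python/NumPy sketch with arbitrary made-up coefficients): the second model above is fit by ordinary least squares, because each $\beta$ multiplies a fixed column built from $X_i$; the cosine model cannot be written that way and needs a non-linear fitter instead.

```python
# Sketch: E(Y) = b0 + b1*x + b2*x^2 + b3*exp(x) is linear in the betas,
# so it is fit by ordinary least squares on transformed columns of x.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=300)
y = 0.5 + 1.0 * x - 0.8 * x**2 + 0.3 * np.exp(x) + rng.normal(scale=0.1, size=300)

# Columns: intercept, x, x^2, exp(x) -- each enters with a single coefficient.
X = np.column_stack([np.ones_like(x), x, x**2, np.exp(x)])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # roughly [0.5, 1.0, -0.8, 0.3]

# By contrast, E(Y) = b1*cos(b2*x + b3) cannot be written as X @ beta for any
# fixed design matrix X, so it needs a non-linear fitting routine.
```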

Placidia
  • Sorry for my math-ineptitude but what is the official definition of "entering linearly" into an equation? – user3243135 May 31 '14 at 04:53
  • user2243865: The parameters enter linearly when the model, viewed as a function of the parameters, is a [linear function](http://en.wikipedia.org/wiki/Linear_function#As_a_linear_map) of them. In the second equation that's the case, but it isn't in the first equation. – Glen_b May 31 '14 at 07:38
  • Ah, ok, I was confused -- linear means it only needs to be "linear in" the coefficients. – user3243135 May 31 '14 at 08:17
  • user2243865: It's also "linear in the entered predictors". The term $\beta_3e^{x}$ is not linear in $x$, but it *is* linear in the thing $\beta_3$ is the coefficient for ($e^{x}$), which is what gets supplied as the fourth column of the X-matrix. In matrix notation, $E(Y)=X\beta$ is linear in both $\beta$ and $X$.... it's just that the columns of $X$ aren't necessarily linear in some variable, $x$, which is how linear models can fit some forms of nonlinear relationships. – Glen_b Jun 04 '14 at 00:55
  • Placidia: neither of your equations can be true if there's any noise (/error) in your observations (without which why would we need regression at all?). You either need an error term on the RHS or to put an expectation on the LHS. – Glen_b Jun 04 '14 at 00:58
  • Thanks for the correction. I was just thinking about the expression for the expected value. – Placidia Jun 04 '14 at 01:56

(Almost) Everything can be expressed as a linear model, if you don't restrict yourself to a finite number of parameters.

This is the basis of functional analysis and of kernel regression (as in SVMs with kernels). Take Fourier series, for instance: you can build an infinite sine/cosine series in which the amplitude at each frequency is a learned coefficient, and you can learn (almost) any function (any function whose square is integrable, which is a very weak condition).
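
Here is a minimal sketch of that idea (Python/NumPy, with an arbitrary target function chosen just for illustration): a truncated Fourier basis fit by ordinary least squares. The model is linear in its coefficients even though the fitted curve is not linear in $x$, and keeping more frequencies shrinks the approximation error.

```python
# Sketch: truncated Fourier features fit by ordinary least squares.
import numpy as np

x = np.linspace(0, 2 * np.pi, 500)
y = np.abs(np.sin(x)) ** 0.5             # an arbitrary non-linear target

K = 10                                    # number of frequencies kept
cols = [np.ones_like(x)]
for k in range(1, K + 1):
    cols += [np.cos(k * x), np.sin(k * x)]
F = np.column_stack(cols)                 # design matrix of Fourier features

coef, *_ = np.linalg.lstsq(F, y, rcond=None)
y_hat = F @ coef
print(np.max(np.abs(y - y_hat)))          # worst-case error of the truncated fit
```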

Kernel machines and functional analysis are a wonderful idea, and they make the world seem very beautiful: virtually everything is linear!

See http://en.wikipedia.org/wiki/Kernel_methods
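
To make "virtually everything is linear" a bit more concrete, here is a bare-bones kernel ridge regression sketch in plain NumPy (the RBF kernel, penalty, and target function are arbitrary illustrative choices): the fitted function is a linear combination of kernel columns, i.e. linear in the coefficients $\alpha$, even though it is highly non-linear in $x$.

```python
# Sketch: kernel ridge regression by hand. The model is
# f(x) = sum_i alpha_i * k(x, x_i), linear in the coefficients alpha.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, size=100))
y = np.sin(2 * x) / (1 + x**2) + rng.normal(scale=0.05, size=100)

gamma, lam = 2.0, 1e-3                                # kernel width, ridge penalty
K = np.exp(-gamma * (x[:, None] - x[None, :])**2)     # RBF Gram matrix
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)  # linear solve for alpha

y_hat = K @ alpha                                     # fitted values at the data
print(np.sqrt(np.mean((y - y_hat)**2)))               # training RMSE
```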

The classic statistical/probabilistic reference is Grace Wahba's *Spline Models for Observational Data*.

Joe
  • I've only heard of kernels in random and seemingly unrelated places, like for kernel density estimation and convolutions in general. Do you have any recommendations for literature from a more mathematical/probabilistic point-of-view? (as opposed to algorithm/computer science perspectives, which I have a harder time digesting). – user3243135 May 31 '14 at 07:24
  • I upvoted this for the sheer boldness of the assertion. Mathematics is full of approximation theorems of this sort. But the fact remains that non-linearity is real, with distinct characteristics of its own. An approximation is only good for as far as it goes. And as a practical matter, I do like to limit my models to a finite number of parameters. – Placidia Jun 01 '14 at 03:28
  • user2243865: I updated the post with your reference – Joe Jun 01 '14 at 19:53
  • Placidia: Actually, Fourier and functional analysis are not approximation results, but show that wide classes of functions (i.e. square integrable) can be represented *exactly* by infinite linear models. Approximation comes in when you regularize or take only the first n terms of the series, etc., and come close, so that you have a finite model with low approximation error. – Joe Jun 01 '14 at 19:54
  • @Joe, I get that. But I'm old fashioned, and when I studied pure mathematics, back in the day, infinite series were always addressed as the limits of finite series. So concretely, you are always dealing with a finite series. But even if you could take an infinite series, typically those convergence results only apply on a compact set. Outside that region, the approximand could differ a lot. Things like feedback loops, tipping points and singularities are essentially non-linear behaviours, which perhaps deserve non-linear models. – Placidia Jun 02 '14 at 00:47