Interpretation of simultaneous and independent ordinary least squares regression

Question

I'm using ordinary least squares to regress a noisy overdetermined system.

$$y = \beta_0 x_0 + \beta_1 x_1$$

For comparison, I'm also solving the independent equations

\begin{align} y &= \beta_0 x_0 \\ y &= \beta_1 x_1 \end{align}

I'm surprised to find that sometimes, when the independent solutions are all positive, some elements of the simultaneous solution are negative. What does it mean when that occurs? Can I conclude anything about my data set? In particular, can I conclude that it violates the no-autocorrelation assumption of OLS regression?

Should I use Feasible Generalized Least Squares?

Here is an example of a small data set for which the independent solutions are positive, but the simultaneous solution has negative elements.

#! /usr/bin/env runhaskell

-- Assumes hmatrix >= 0.17, where this single module exports
-- matrix, vector, (<\>), asColumn and toColumns.
import Numeric.LinearAlgebra

main :: IO ()
main = do

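    -- Fit y against each column of x on its own (one coefficient per fit).
    putStr "independent  β = "
    print $ (<\> y) . asColumn <$> toColumns x

    -- Fit y against both columns of x at once (no intercept column).
    putStr "simultaneous β = "
    print $ x <\> y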

    where

    x = matrix 2
        [ 1, 1
        , 2, 4
        , 3, 9
        ]

    y = vector
        [ 1
        , 2
        , 9
        ]

Output:

independent  β = [[2.285714285714285],[0.9183673469387755]]
simultaneous β = [-1.3684210526315748,1.4210526315789458]

score 2 · Accepted Answer · edited Apr 13 '17 at 12:44

The problem here isn't autocorrelation; it's multicollinearity (the correlation between your two regressors is about $.99$). You don't (necessarily) want to use GLS, but you may want to use a method like ridge regression. Lastly, you are suppressing the intercept, which is not a good idea (see: When is it OK to remove the intercept in lm()?, and When forcing intercept of 0 in linear regression is acceptable/advisable).
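
As a rough check of both points, here is a minimal sketch that uses only the Prelude (no hmatrix) to compute the correlation between the two columns of your example and a no-intercept ridge estimate. The helper names `dot`, `pearson` and `ridge2` and the penalty values are illustrative assumptions, not part of your code; in practice the penalty is usually chosen by cross-validation, and the predictors are typically standardized first.

```haskell
-- Sketch: correlation of the two regressors and a no-intercept ridge fit.
-- All helper names below are illustrative, not from the question.

-- Columns of the design matrix and the response from the question.
x0, x1, y :: [Double]
x0 = [1, 2, 3]
x1 = [1, 4, 9]
y  = [1, 2, 9]

-- Dot product of two equal-length lists.
dot :: [Double] -> [Double] -> Double
dot a b = sum (zipWith (*) a b)

-- Pearson correlation coefficient of two equal-length lists.
pearson :: [Double] -> [Double] -> Double
pearson a b = dot da db / sqrt (dot da da * dot db db)
  where
    mean v = sum v / fromIntegral (length v)
    da = map (subtract (mean a)) a
    db = map (subtract (mean b)) b

-- Ridge estimate for two regressors without an intercept:
-- solves (X'X + lambda * I) beta = X'y by Cramer's rule.
-- lambda = 0 reproduces the ordinary least squares solution.
ridge2 :: Double -> (Double, Double)
ridge2 lambda =
    ((a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det)
  where
    a00 = dot x0 x0 + lambda
    a01 = dot x0 x1
    a11 = dot x1 x1 + lambda
    b0  = dot x0 y
    b1  = dot x1 y
    det = a00 * a11 - a01 * a01

main :: IO ()
main = do
    putStrLn ("correlation(x0, x1) = " ++ show (pearson x0 x1))
    -- Arbitrary penalty values, chosen only for illustration.
    mapM_ report [0, 1, 5]
  where
    report l = putStrLn ("lambda = " ++ show l ++ "  beta = " ++ show (ridge2 l))
```

On this data the first coefficient is still negative at `lambda = 1` and becomes positive by `lambda = 5`, which shows how the penalty pulls the two strongly correlated coefficients back toward each other.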

To understand what is happening in your example (i.e., why the sign of the first regressor's coefficient flips when controlling for the second), see my answer here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?
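
For your specific numbers, the flip can also be traced by hand. The following is a sketch in the question's notation; the shorthand $s_{jk} = \sum_i x_{ij} x_{ik}$ and $s_{jy} = \sum_i x_{ij} y_i$ is introduced here only for convenience. Without an intercept, the independent fits are $\hat\beta_j = s_{jy} / s_{jj}$, while the simultaneous fit solves the $2 \times 2$ normal equations:

\begin{align} \hat\beta_0 &= \frac{s_{11} s_{0y} - s_{01} s_{1y}}{s_{00} s_{11} - s_{01}^2} \\ \hat\beta_1 &= \frac{s_{00} s_{1y} - s_{01} s_{0y}}{s_{00} s_{11} - s_{01}^2} \end{align}

With $s_{00} = 14$, $s_{01} = 36$, $s_{11} = 98$, $s_{0y} = 32$ and $s_{1y} = 90$:

\begin{align} \hat\beta_0 &= \frac{98 \cdot 32 - 36 \cdot 90}{14 \cdot 98 - 36^2} = \frac{-104}{76} \approx -1.368 \\ \hat\beta_1 &= \frac{14 \cdot 90 - 36 \cdot 32}{14 \cdot 98 - 36^2} = \frac{108}{76} \approx 1.421 \end{align}

The denominator is never negative (Cauchy-Schwarz), so the sign of $\hat\beta_0$ is the sign of $s_{11} s_{0y} - s_{01} s_{1y}$. That quantity can be negative even when $s_{0y} > 0$, provided the cross term $s_{01}$ is large, which is exactly the situation when the two regressors are highly correlated. The small denominator ($76$ against $s_{00} s_{11} = 1372$) is the same near-collinearity showing up as instability in the estimates.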
