Simple linear regression fit manually via matrix equations does not match lm() output

Question

I am trying to fit a linear model to my data set using matrices. I know I could use OLS without matrices, but I am doing this as a simple tutorial for myself to better understand both R and matrix notation.

This is the model I am trying to fit:

$$\bf Y=X\boldsymbol\beta+\varepsilon$$

where $\bf Y$ is an $n\times 1$ matrix, $\bf X$ is an $n\times k$ matrix (where $k$ is the number of $\beta$'s, which in this case is 2), $\boldsymbol \beta$ is a $k\times 1$ matrix and lastly our error term $\varepsilon$ is $n\times 1$. I understand this portion.

When I simply use the lm() command to fit my data, I get the following from the summary() command:

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0503 -1.4390  0.4921  1.0589  3.9446 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.9849     0.8219   3.632  0.00191 ** 
x             0.5612     0.1084   5.178 6.32e-05 ***

So summary() is telling me that the $\beta$ matrix is a $2\times 1$ matrix, with the first number (which is $\beta_0$) being 2.9849 and the second number (which is $\beta_1$) being 0.5612. My question is this:

We know that the least-squares estimate of $\boldsymbol\beta$ is:

$$\boldsymbol\beta=(\bf{X}^T\bf{X})^{-1}(\bf{X}^T\bf{Y})$$

and when I try to carry out this calculation by hand in R using `b=(t(x)*x)^-1*t(x)*y`, I get a $1\times 20$ vector (where $20$ is of course $n$, the number of observations). Why am I not getting a $2\times 1$ matrix like I should?

Welcome to our site! Please use the formatting tools available when you are editing so that your code is readable. Your "code" for $b$ obviously is not correct `R` code so it's impossible to tell what you're doing wrong. I believe there is `R` code posted on this site doing exactly what you're trying, so searches on relevant keywords like [tag:r] and [tag:regression] as well as on likely parts of the code (such as `solve`) might turn up some useful stuff for you. — whuber, Feb 09 '14 at 23:35
Thanks so much for your guidance! I rewrote the code that I am trying to evaluate and I have searched for "solve" and "ginv" although they are not working as they should, or most probably I am using them wrong! — nicefella, Feb 10 '14 at 00:03
You don't invert a matrix with `^-1` in R, and in fact shouldn't explicitly invert $X^TX$ at all. (Also you don't do matrix multiplication with `*`). You should solve $(X^TX)\hat\beta = (X^TY)$. See `?solve`, which does both solution of linear systems and inversion. That's still not the best function to use if you're trying to be accurate (QR decomposition is probably the most common way these days), but it will do for getting the ideas down. — Glen_b, Feb 10 '14 at 00:36

Sycorax · Accepted Answer · 2016-10-30T01:22:08.877

You've made two mistakes in your R code for b.

1. `solve` is used for matrix inversion. Raising X to the $-1$ power with `^-1` inverts each element of X, which can occasionally be useful, but is not what we want here.
2. R uses the operator `%*%` for matrix multiplication. The `*` operator does element-wise multiplication instead and requires your arrays to be conformable according to R's handling of vectors. Again, occasionally useful.
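
To see the difference concretely, here is a minimal sketch on a made-up $2\times 2$ matrix. The matrix `A` is purely illustrative and has nothing to do with the data in the question.

```r
# Hypothetical toy matrix, chosen only to illustrate the operators
A <- matrix(c(2, 1, 1, 3), nrow = 2)

elementwise_inverse <- A^-1       # reciprocal of each entry, NOT the matrix inverse
matrix_inverse      <- solve(A)   # the actual matrix inverse

elementwise_product <- A * A      # entry-by-entry product
matrix_product      <- A %*% A    # true matrix product

# Only the true inverse gives back the identity matrix
check <- A %*% matrix_inverse
print(round(check, 10))
```

Multiplying `A` by `elementwise_inverse` instead does not return the identity, which is the same kind of problem as in the code for `b` in the question.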

This is all explained in more detail in the R documentation, or the nigh-uncountable intro to R materials online, such as this one.

`b <- solve(t(X) %*% X) %*% t(X) %*% y` is the literal representation of the normal equations, while `b <- solve(crossprod(X), crossprod(X, y))` is faster and more idiomatic, and matches the output of `lm()`. For numerical reasons, though, you generally don't want to solve the normal equations by explicitly inverting $X^TX$ (and R won't warn you until you lose all significant figures). The `lm` method uses QR decomposition by default, which is more numerically stable. If you apply the correct "hand cranked" computation and it still differs from the R output, it's plausibly because of loss of numerical precision due to the explicit inversion of $X^TX$.
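
As a rough sketch, the approaches can be compared side by side on simulated data. The data, the seed, and the variable names `b_literal`, `b_solve`, `b_qr`, and `b_lm` are my own assumptions for illustration; this is not the data from the question. On a small, well-conditioned problem like this one, all four should agree to within floating-point error.

```r
set.seed(1)                        # simulated data, purely illustrative
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x + rnorm(n)
X <- cbind(1, x)                   # design matrix with a column of 1s for the intercept

b_literal <- solve(t(X) %*% X) %*% t(X) %*% y      # explicit inverse of X'X
b_solve   <- solve(crossprod(X), crossprod(X, y))  # solve the linear system directly
b_qr      <- qr.coef(qr(X), y)                     # least squares via QR decomposition
b_lm      <- coef(lm(y ~ x))                       # reference fit

print(cbind(literal = as.numeric(b_literal),
            solve   = as.numeric(b_solve),
            qr      = as.numeric(b_qr),
            lm      = as.numeric(b_lm)))
```

The differences only become visible when the columns of `X` are nearly collinear or badly scaled, which is where the QR route is the safer choice.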

Don't forget to add a column of $1$s for the intercept!

Thanks so much for your answer. However, when I run the command you have provided, I get a $1$x$1$ matrix, a scalar, which = 0.8770944. Surely this cannot be the correct answer since the lm function gives two estimates, one for the intercept and one for x! Thanks for your guidance. — nicefella, Feb 10 '14 at 00:36
I forgot to add the column of $1$s to my $X$ matrix. Thanks the code works well now :) — nicefella, Feb 10 '14 at 00:57

score 3 · Answer 2 · edited Apr 13 '17 at 12:44

You need to include a column of ones in your matrix x, which corresponds to the intercept. The lm() function does this for you automatically, but you need to add this yourself when you calculate the answer using the normal equations.
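
A small sketch of what the column of ones changes, using simulated data (the data and variable names are assumptions for illustration only). Without the column of ones the result is a $1\times 1$ matrix, the slope of a line forced through the origin; with it, the result is the $2\times 1$ matrix of intercept and slope.

```r
set.seed(1)                        # simulated data, purely illustrative
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x + rnorm(n)

# Without a column of ones: a single coefficient (regression through the origin)
X_no_intercept <- matrix(x, ncol = 1)
b_no_intercept <- solve(crossprod(X_no_intercept), crossprod(X_no_intercept, y))
print(dim(b_no_intercept))

# With a column of ones: intercept and slope
X <- cbind(1, x)
b <- solve(crossprod(X), crossprod(X, y))
print(dim(b))
```

The first fit corresponds to `lm(y ~ 0 + x)` and the second to `lm(y ~ x)`.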

See this previous question on this site: Using the normal equations to calculate coefficients in multiple linear regression. There is example code which you should be able to copy-and-paste directly.

This is a great introductory book that explains why you have to add the column of ones: http://www.amazon.com/Applied-Regression-Models-Edition-Student/dp/0073014664

You will also need to follow the advice given by @user777

Ah but of course! How could I have forgotten! Everything works fine now! Thanks for the links as well, the first one seems especially helpful :) — nicefella, Feb 10 '14 at 00:56

MBorg · Answer 3 · 2020-02-24T11:46:34.417

This R code can be used to calculate Y (a vector of y values, the fitted values) and Beta (a vector of the coefficients) via matrix regression for a given dataset which I called insert.dataset. This should work even if you add additional numeric variables to the formula.

library(matlib) # enables function inv() to calculate a matrix's inverse

model <- lm(formula = y ~ x, data = insert.dataset) # save linear model
beta0 <- rep(1, nrow(model$model)) # column of 1s representing coefficient beta0 (intercept)
X <- as.matrix(cbind(beta0, model$model[,-1])) # create X matrix, replacing column of outcomes with beta0

# Matrix equation to create Y (fitted values), using X and coefficients
Y <- X %*% model$coefficients
model$fitted.values # should be identical to Y

# Matrix equation to create Beta (coefficients), using X and Y
Beta <- inv(t(X) %*% X) %*% t(X) %*% model$fitted.values
model$coefficients # should be identical to Beta

It should be noted that the output from lm already has vectors representing Y (model$fitted.values) and Beta (model$coefficients), but some small modification is needed to obtain X (the matrix of observed predictor values). That modification replaces the column containing the observed y values with a column full of 1s (needed for multiplication with beta0); the missing column of 1s is why you are not getting the $2\times 1$ matrix that you want.
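
As an alternative sketch, base R's `model.matrix()` returns the design matrix of a fitted model with the intercept column already included, so X does not have to be assembled by hand. The data frame `toy` below is simulated and hypothetical, and I use `solve()` with `crossprod()` here rather than `inv()`.

```r
set.seed(1)                                # simulated data, purely illustrative
toy <- data.frame(x = runif(20, 0, 10))    # hypothetical data set
toy$y <- 3 + 0.5 * toy$x + rnorm(20)

model <- lm(y ~ x, data = toy)
X <- model.matrix(model)                   # columns: "(Intercept)" and "x"

beta <- solve(crossprod(X), crossprod(X, toy$y))  # coefficients from the observed y
fitted_by_hand <- X %*% beta                      # fitted values

print(cbind(by_hand = as.numeric(beta), lm = as.numeric(coef(model))))
```

This should also carry over to formulas with additional numeric predictors, since `model.matrix()` builds one column per term.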
