
Say I have n samples, each with p explanatory variables, collected in a matrix X. I also have the n corresponding response values in a vector y.

I want to understand how the explanatory variables contribute to the response variable.

We have two classical approaches:

1) I can get the effect sizes by linear regression, i.e., by solving $$ \arg\min_\beta \|y-X\beta\|_2^2 $$ and then analyzing the $\beta$ I get (see the sketch after this list).

2) Alternatively, I can run p hypothesis tests, where the i-th null hypothesis states that the i-th variable has nothing to do with y, and then analyze the p-values I get (also in the sketch below).
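
For concreteness, here is a minimal sketch of both approaches in Python with numpy/scipy; the toy data and names are illustrative, not from any particular package:

```python
import numpy as np
from scipy import stats

# Toy data, purely illustrative: n samples, p explanatory variables.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(size=n)

# Approach 1: effect sizes from argmin_beta ||y - X beta||_2^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Approach 2: p separate marginal tests; the i-th null hypothesis is
# "the i-th variable is (linearly) unrelated to y".
marginal_p = [stats.pearsonr(X[:, i], y)[1] for i in range(p)]
```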

With the 1st method, I lose the statistical guarantee because I don't have p-values. With the 2nd method, I test the explanatory variables only one at a time, so I lose the interactions between them.

Is there a way I can run the linear regression, get $\beta$, and then compute p-values for it?

Also, I know there are packages (e.g., in R) that run linear regression and report effect sizes together with p-values. But my question is more about how these are calculated. (For example, I need to know how it's calculated if I want to implement my own tool.)
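
For reference, here is a from-scratch sketch of the standard computation (the one that R's `lm()` or statsmodels report), under the classical assumptions of homoskedastic Gaussian errors and p < n; the function name is my own. If an intercept is wanted, X should already contain a column of ones.

```python
import numpy as np
from scipy import stats

def ols_with_pvalues(X, y):
    """OLS effect sizes plus classical t-test p-values (H0: beta_i = 0)."""
    n, p = X.shape
    # Effect sizes: beta_hat = (X^T X)^{-1} X^T y.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    # Unbiased residual variance estimate: sum(e_i^2) / (n - p).
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)
    # Estimated covariance of beta_hat is (X^T X)^{-1} * sigma2_hat;
    # standard errors are the square roots of its diagonal.
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    # Two-sided p-values from the t distribution with n - p df.
    t_stat = beta_hat / se
    p_values = 2 * stats.t.sf(np.abs(t_stat), df=n - p)
    return beta_hat, se, p_values
```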

EDIT:

Thanks for Matthew's comments about first calculating $\beta$'s covariance via $(X^TX)^{-1}$, but what if this is a high-dimensional problem with $p>n$, so that $X^TX$ cannot be inverted? (I assume most linear regression problems today are high-dimensional ones? At least the one I'm facing is.)

  • Possible duplicate of: http://stats.stackexchange.com/questions/44838/how-are-the-standard-errors-of-coefficients-calculated-in-a-regression That calculates standard errors under the assumption of homoskedasticity. Note that there are different types of standard errors (e.g. robust standard errors, clustered standard errors) based on different assumptions. – Matthew Gunn Dec 11 '16 at 05:09
  • @MatthewGunn Thanks. I am still confused: if I want to do it for all $\beta$ together, then $(X^TX)^{-1}\sigma^2$ is a $p\times p$ matrix, but shouldn't the variance of $\beta$ be a vector of length $p$? –  Dec 11 '16 at 05:14
  • $\hat{\mathrm{Var}}(\mathbf{b}) = \left( X^T X \right)^{-1} \hat{\sigma}^2$ is an estimated covariance matrix for your estimated coefficients $\mathbf{b}$. For example, $\mathrm{Var}(b_3)$ would be the 3rd element on the diagonal of the matrix, $\mathrm{Var}(b_4)$ would be the 4th element on the diagonal, etc. Note also that $\hat{\sigma}^2 = \frac{1}{n-p} \sum_i e_i^2$. – Matthew Gunn Dec 11 '16 at 06:06
  • @MatthewGunn Thank you very much. Is there anything I can do if $p$ is greater than $n$ (so that $X^TX$ cannot be inverted)? –  Dec 11 '16 at 19:57
  • Also, it seems I could get negative values out of that diagonal, but a variance term cannot be negative, right? Am I doing something wrong? @MatthewGunn –  Dec 11 '16 at 20:50
  • You cannot get negative terms because $X^T X$ is guaranteed to be positive semi-definite, and the inverse of a positive definite matrix is also positive definite. – Matthew Gunn Dec 11 '16 at 21:27
  • Two texts that I liked: [Linear Algebra Done Right](https://www.amazon.com/Linear-Algebra-Right-Undergraduate-Mathematics/dp/0387982582) by Axler and [Econometrics](https://www.amazon.com/Econometrics-Fumio-Hayashi/dp/0691010188) by Hayashi. – Matthew Gunn Dec 11 '16 at 21:29
  • I see. Thanks. But what if it's not full rank? –  Dec 11 '16 at 21:30
  • If $X^TX$ is not full rank, this is known as [multicollinearity](https://en.wikipedia.org/wiki/Multicollinearity) and occurs when you include variables that are linearly dependent (e.g. current year, year born, and age, because currentyear - yearborn = age). Regression packages typically then drop one of your variables so that $X^TX$ is invertible. Anyway, that's another topic. Another approach (esp. in a machine learning context) is [regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)), e.g. Tikhonov regularization (sketched after these comments). – Matthew Gunn Dec 11 '16 at 21:35
  • On your last comment, if your number of regressors $k$ is greater than your number of observations $n$, I don't see how you can possibly get reliable p-values for individual coefficients. I don't know what you can do instead, and I'm not an expert on the high-dimensional stuff. – Matthew Gunn Dec 11 '16 at 21:42
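
Picking up the regularization pointer from the comments, here is a minimal Tikhonov (ridge) sketch that stays well-defined when $p > n$; `lam` is an illustrative tuning parameter (in practice chosen by cross-validation), and note that this estimator is biased, so the classical t-test p-values above do not carry over to it.

```python
import numpy as np

def ridge(X, y, lam=1.0):
    """Tikhonov-regularized least squares: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    # X^T X + lam * I is positive definite (hence invertible) for lam > 0,
    # even when p > n or the columns of X are collinear.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```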
