Say I have $n$ samples, each with $p$ explanatory variables, namely $X$. I also have $n$ corresponding response variables, namely $y$.
I want to understand how the explanatory variables contribute to the response variable.
We have two classical approaches:
1). I can get the effect sizes by linear regression, i.e., by solving the following problem, $$ \arg\min_\beta ||y-X\beta||_2^2 $$ and then I can analyze the $\beta$ I get.
2). Alternatively, I can run $p$ hypothesis tests, where the $i$-th null hypothesis states that the $i$-th variable has nothing to do with $y$. Then I can analyze the p-values I get. (A minimal sketch of both approaches follows this list.)
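To make the two approaches concrete, here is a minimal sketch in Python (numpy/scipy; the data `X` and `y` below are hypothetical placeholders, and the marginal Pearson test in approach 2 is just one way to instantiate "has nothing to do with $y$"):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))                       # hypothetical design matrix
y = X @ rng.normal(size=p) + rng.normal(size=n)   # hypothetical response

# Approach 1: ordinary least squares, beta = argmin ||y - X beta||_2^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Approach 2: p separate marginal tests; the i-th test asks whether
# the i-th variable alone is (linearly) related to y
pvals_marginal = [stats.pearsonr(X[:, i], y)[1] for i in range(p)]
```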
Then, with the 1st method, I lose the statistical guarantee because I don't have p-values. But with the 2nd method, I only test these explanatory variables independently, so I lose the interaction between them.
Is there a way that I can run the linear regression, get $\beta$, and then compute p-values for it?
Also, I know there are software packages (e.g., in R) that run linear regression and report effect sizes together with p-values. But my question is more about how these p-values are calculated. (For example, I would need to know this if I wanted to implement my own tool.)
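Based on the comments so far, my understanding is that the classical recipe (which I believe is what R's `lm` summary does) is a t-test per coefficient: under Gaussian noise, $\hat\beta \sim N(\beta, \sigma^2 (X^TX)^{-1})$, so one estimates $\hat\sigma^2 = ||y - X\hat\beta||_2^2 / (n - p)$ and compares $t_i = \hat\beta_i / \mathrm{se}(\hat\beta_i)$ to a t-distribution with $n - p$ degrees of freedom. A sketch of that computation:

```python
import numpy as np
from scipy import stats

def ols_with_pvalues(X, y):
    """OLS fit plus a per-coefficient t-test, following the classical
    formulas (my understanding of what lm-style summaries compute)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)              # requires X^T X invertible
    beta = XtX_inv @ X.T @ y                      # OLS estimate
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)              # unbiased noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))       # standard error of each beta_i
    t = beta / se                                 # t-statistics
    pvals = 2 * stats.t.sf(np.abs(t), df=n - p)   # two-sided p-values
    return beta, se, pvals
```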
EDIT:
Thanks for Matthew's comments about first calculating $\beta$'s variance with $(X^TX)^{-1}$, but what if this is a high-dimensional problem with $p > n$, so that $X^TX$ cannot be inverted? (I assume many linear regression problems today are high-dimensional ones; at least the one I'm facing is.)
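To illustrate the obstruction (a toy example): when $p > n$, $X^TX$ is $p \times p$ but has rank at most $n < p$, so it is singular and the variance formula above cannot be applied as-is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                          # p > n: the high-dimensional setting
X = rng.normal(size=(n, p))

XtX = X.T @ X                          # p x p, but rank(X^T X) <= n < p
print(np.linalg.matrix_rank(XtX))      # 10, so XtX is singular
# np.linalg.inv(XtX) would fail (or be numerically meaningless),
# so the se(beta_i) formula above breaks down here.
```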