
When I perform a linear regression in some software packages (for example, Mathematica), I get p-values associated with the individual parameters in the model. For instance, the results of a linear regression that produces a fit $ax+b$ will have a p-value associated with $a$ and one associated with $b$.

  1. What do these p-values mean individually about those parameters?

  2. Is there a general way to compute parameters for any regression model?

  3. Can the p-value associated with each parameter be combined into a p-value for the whole model?

To keep this question mathematical in nature, I am seeking only the interpretation of p-values in terms of probabilities.

Henry B.
  • Gavin's answer in the question @cardinal linked to says it well. – J. M. is not a statistician Aug 29 '11 at 01:26
  • @zyx, there is nothing advanced about the OP's questions. These are *very* common questions for which, in my opinion, stats.SE is more appropriate---and to which the participants there are more attuned, as well. Math.SE and MO are both excellent resources for probability questions, but much less so for statistical ones. The OP's questions lean much more toward the latter. – cardinal Aug 29 '11 at 01:36
  • @cardinal: I've followed stats.SE since the start of the public beta. Out of 4800+ questions to date I was not able to locate *one* that asks or answers item 3 from the OP, which is odd if this is a "very common" query. Nor have I seen conceptually precise answers to item 1 on the few times it came up. I think these things should be posted to math.SE and MO periodically to engage the attention of a larger audience, not migrated within minutes to stats.SE. It doesn't hurt to *also* ask on stat.SE but turning the latter into the sole place where stats can be discussed is not helpful. – zyx Aug 29 '11 at 02:21
  • There is now a thread about math.SE to stats.SE migrations in meta.math.SE. – zyx Aug 29 '11 at 02:42
  • (Some comments referenced above were lost in migration. They are visible at the original math.SE posting, linked below next to the words "migrated from...") – zyx Aug 29 '11 at 06:43
  • @zyx: Yes, not sure what happened to the initial comments. Maybe when it's migrated one or both of the corresponding mods have a chance to strip some of them out. – cardinal Aug 29 '11 at 11:13
  • @zyx, you just need to know what to look for in regards to question #3. Although the question could be interpreted differently, I suspect a satisfactory answer for the OP would involve detailing what the "[f-test](http://stats.stackexchange.com/search?q=f+test)" is for a multiple regression equation. These two answers would likely be of interest ([1](http://stats.stackexchange.com/questions/8237/logic-behind-the-anova-f-test-in-simple-linear-regression),[2](http://stats.stackexchange.com/questions/3549/f-and-t-statistics-in-a-regression)). – Andy W Aug 29 '11 at 13:10
  • @Cardinal Nope, the mods don't get to do anything during migration that sufficiently high-rep users (such as yourself) cannot already do. It is possible some comments were deleted (either by their original owners or a stats.SE mod) *after* migration. Comments that become irrelevant due to changed circumstances (such as an edit to a question) are considered "noise" by the powers that be; one role of mods is to remove such distractions. – whuber Aug 29 '11 at 13:57
  • @whuber: Thanks. One of the comments deleted was one of my own in which I included a link to a very related question and a link to the tag `[p-value]`. There was at least one other comment (of zyx's, I believe) deleted, but I don't recall the content, at the moment. – cardinal Aug 29 '11 at 14:21
  • @Cardinal Yes, I see two comments that appear to have been left behind during the migration (and should have been migrated, based on the evidence of their time stamps). I can't explain that. In case there's some unwanted behavior going on (I hesitate to call it a 'bug'), let's keep an eye out for similar anomalies in the future. – whuber Aug 29 '11 at 14:40
  • The first comment with link to [p-values] tag was removed by moderator or otherwise by the time I saw the migrated thread. I deleted my comment (still up at the original) about stat.SE since the context was gone and, although accurate in my opinion, the comment could cause disputes if posted here. Both are still visible at the math.SE original posting. I don't remember if there were other comments there that got lost in the shuffle. – zyx Aug 29 '11 at 14:50
  • @Andy W: none of the F-test links pertain to item 3 of the question, which was whether one can determine the model p-value from coefficient p-values (or, interpreted more broadly, whether there is some other relation between the two types of p-value). – zyx Aug 29 '11 at 15:14
  • @zyx, it depends on how you interpret the question. If you interpret it as the joint significance of the model, then the [#2](http://stats.stackexchange.com/q/3549/1036) question I referenced above is answering that (and is related to the comment cardinal made on your answer). This typically isn't considered a hypothesis test of `a=b=0`, though (which is how you framed it, not the author of the question). – Andy W Aug 29 '11 at 15:27
  • @Andy, it seems to me that link #2 does *not* address the question in any apparent way. Having high p-values for the regressors and a low p-value for the overall model does not indicate whether the former "can ... be combined into a p-value for the whole model". Maybe under some strong assumptions on what "combine" can mean, such as using a formula that extends continuously to the limit where some regressor p-values are zero, or something more than that, plus the ability to produce unboundedly extreme examples of the type seen in link #2. But all this is well beyond link #2's contents. – zyx Aug 29 '11 at 15:43
  • @zyx, I suppose someone needs to connect some dots, but those dots aren't very complicated. It should be clear that p-values related to hypothesis tests of the *individual parameter estimates* do not say anything about the F-test for the reduction in sums of squares for the overall model. Nothing in the answers would suggest they can be combined in such a way, which amounts to the same thing as what you are saying in #3 of your response. – Andy W Aug 29 '11 at 15:51
  • @Andy W, why should it be clear *a priori* that the hypothesis tests for the individual parameters do not say anything about significance of the overall model? (It is not assumed in OP's question or my comments, by the way, that the model significance can be quantified only by F-tests, and even in that case there are examples where F is equivalent to a t-test, so why not contemplate the possibility of a more complicated F being computable or estimable from a suite of t-tests?). – zyx Aug 29 '11 at 16:02

2 Answers

  1. The p-value for $a$ is the p-value in a test of the hypothesis "$\alpha = 0$" (usually a 2-sided $t$-test). The p-value for $b$ is the p-value in a test of the hypothesis "$\beta = 0$" (also usually a 2-sided $t$-test), and likewise for any other coefficients in the regression. The probability models for these tests are determined by the one assumed in the linear regression model. For least-squares linear regression, the pair $(a,b)$ follows a bivariate normal distribution centered on the true parameter values $(\alpha, \beta)$, and the hypothesis test for each coefficient is equivalent to $t$-testing whether $\alpha = 0$ (resp. $\beta = 0$) based on samples from a suitable normal distribution of one variable, i.e., the distribution of $a$ or $b$ alone. The details of which normal distributions appear are somewhat complicated and involve "degrees of freedom" and the "hat matrix" $H$, so called because it is the linear transformation taking the observed responses to the fitted values: $\hat{y} = Hy$.

  2. Yes. Usually it is done (and defined) by maximum likelihood estimation. For OLS linear regression and a small number of other models there are exact formulas for estimating the parameters from the data; for more general regressions the solutions are iterative and numerical in nature.

  3. Not directly. A p-value for the whole model is calculated separately: it tests the hypothesis that all the coefficients (of the variables presumed to actually vary, so not including the "constant term," if there is one) are zero. But this whole-model p-value cannot usually be calculated from knowledge of the p-values of the individual coefficients. (A numerical sketch of all three points follows this list.)
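
A minimal numerical sketch of the three points, in Python on simulated data (the data, seed, and sample size are assumptions for illustration, not anything from the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + 1.0 + rng.normal(size=n)   # true slope alpha=2, intercept beta=1

# Point 2: exact OLS formulas. Design matrix with a constant column.
X = np.column_stack([x, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimates (a, b)
resid = y - X @ coef
df = n - X.shape[1]                            # residual degrees of freedom
sigma2 = resid @ resid / df                    # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance matrix of (a, b)

# Point 1: a two-sided t-test of "the true coefficient is 0", one per coefficient.
t = coef / np.sqrt(np.diag(cov))
p_coef = 2 * stats.t.sf(np.abs(t), df)

# Point 3: the whole-model F-test (all slopes simultaneously 0) is computed
# from sums of squares, not from the coefficient p-values.
ss_res = resid @ resid
ss_tot = np.sum((y - y.mean()) ** 2)
k = 1                                          # number of slope coefficients
F = ((ss_tot - ss_res) / k) / (ss_res / df)
p_model = stats.f.sf(F, k, df)
print(p_coef, p_model)
```

With a single slope, $F = t^2$ and the model p-value coincides with the slope's t-test p-value; with several regressors, the whole-model p-value is no longer a function of the individual coefficient p-values, which is the point of item 3.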

zyx
  • In your point (1.) there seems to be a bit of confusion between a *parameter* and an *estimator*. The $p$-value is associated with the estimator rather than the parameter, and the estimators are bivariate normal, not the parameters (which, at least in classical statistics, are considered fixed). Also, your comments in point (3.) can lead to confusion, since it is entirely possible (and quite common) for some of the individual $p$-values of regression estimates to be both larger and smaller than the joint $p$-value from the corresponding $F$-test. – cardinal Aug 29 '11 at 12:22
  • @NRH: Sorry, can you clarify your previous comment? I don't quite follow it (yet). :) – cardinal Aug 29 '11 at 12:42
  • @cardinal: it seems more accurate to say that a p-value is associated with a hypothesis test. The parameters appear in the null hypothesis of the test, and the pair (observed value of estimator, alternative hypothesis) then determines a p-value. The null hypotheses should be described using parameters, such as α=0, rather than estimators, a=0, as was [carelessly] done in the original answer, now edited (thanks for pointing out the error). However, the supposedly confused or missing distinction "the estimators are bivariate normal, not the parameters" was stated explicitly in the answer. – zyx Aug 29 '11 at 13:26
  • Sorry, I just couldn't resist. @zyx made a comment to the original post on math.SE that answers on stat.SE were often imprecise. I find that many answers are quite [accurate](http://en.wikipedia.org/wiki/Accuracy_and_precision), though sometimes mathematically imprecise. That is in the nature of things. Statistical questions and answers cannot always be reduced to precise mathematical statements, in particular not the difficult ones. Yet the answer provided here is neither particularly accurate nor precise, in my opinion. – NRH Aug 29 '11 at 13:28
  • re (3) and the relative size of different p-values, I've deleted the original comments but would be interested to know if any valid inequalities do hold (thanks again for the correction). Although a narrower null model such as "alpha = beta = 0" does have higher p-values than a less specific one such as "alpha=0" in tests with the *same* estimator and alternative hypothesis, they are not the same in the case of linear regression. – zyx Aug 29 '11 at 13:33
  • @zyx, the edit improved the answer. – NRH Aug 29 '11 at 13:40
  • @NRH I would like to invite you to take advantage of the SE structure and improve any answer that you don't find sufficiently "precise." I'm sure such constructive efforts will be well received and appreciated. – whuber Aug 29 '11 at 13:51
  • @whuber, thanks for the invitation. I do, and will, when I feel I can contribute and have the time. It was zyx's own comment on math.SE that triggered me. I am perfectly happy with the level of precision, and would recommend to post a question like the above on stat.SE to make sure that it gets the best attention. – NRH Aug 29 '11 at 14:19
  • Zyx: Your edit improves this answer. Thanks. Your last comment still strikes me as a little confused/confusing, though. – cardinal Aug 29 '11 at 14:27
  • I think it would be nice if whoever downvoted supplied an explanatory comment. – cardinal Aug 29 '11 at 14:28
  • Zyx: If I am not mistaken, the terminology regarding the "hat matrix" $H$ follows from the fact that $\hat{y} = H y$. It is called the "hat matrix" because it is the linear transformation that takes the responses as inputs and "puts the hat on them" to get the fitted values $\hat{y}$ as the output. :) – cardinal Aug 29 '11 at 14:30
  • @cardinal: which part of which last comment involves possible confusion? If re: (3) -- the hypothesis tests are simply different for alpha=beta=0 versus alpha=0, so there is no a priori ordering of their p-values. However, a test of, say, "-1 < alpha < +1" would always have a higher p-value (for any given value of the estimator) than a test of "alpha=0", since presumably the same estimator would be used and interpreted the same way in both tests, but the set of models in the first test is broader than in the second. – zyx Aug 29 '11 at 14:40

With respect to your first question: this depends on your software of choice. There are really two types of p-values that are used frequently in these scenarios, both typically based on likelihood ratio tests (there are others, but these are generally equivalent or at least differ little in their results).

It is important to realize that all of these p-values are conditional on (part of) the rest of the parameters. That means: assuming (some of) the other parameter estimates are correct, you test whether or not the coefficient for a variable is zero. Typically, the null hypothesis for these tests is that the coefficient is zero, so if you have a small p-value, it means (conditionally on the values of the other coefficients) that the coefficient itself is unlikely to be zero.

Type I (sequential) tests test whether each coefficient is zero, conditional on the values of the coefficients that come before it in the model (left to right). Type III (marginal) tests test whether each coefficient is zero, conditional on the values of all the other coefficients.
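
A small sketch of the distinction via nested model comparisons, in Python on assumed simulated data (correlated predictors make the two kinds of test disagree; in the Gaussian linear model, the F-test below plays the role of the likelihood ratio test mentioned above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)        # deliberately correlated with x1
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
one = np.ones(n)

def rss(cols):
    """Residual sum of squares of an OLS fit on the given columns."""
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

rss_full = rss([one, x1, x2])
df_full = n - 3                            # intercept + two slopes
ms_full = rss_full / df_full               # full-model residual mean square

def f_p(rss_restricted, rss_bigger, df_diff):
    # F-test for the reduction in RSS between two nested models, with the
    # full model's residual mean square as denominator (one common convention).
    F = ((rss_restricted - rss_bigger) / df_diff) / ms_full
    return stats.f.sf(F, df_diff, df_full)

# Type I (sequential): x1 is tested in the model containing only what
# precedes it, i.e., the intercept.
p_x1_typeI = f_p(rss([one]), rss([one, x1]), 1)
# Type III (marginal): x1 is tested conditional on all other terms.
p_x1_typeIII = f_p(rss([one, x2]), rss_full, 1)
print(p_x1_typeI, p_x1_typeIII)            # differ because x1 and x2 are correlated
```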

Different tools present different p-values as the default, although typically you have ways of obtaining both. If you don't have a reason outside of statistics to include the parameters in some particular order, you will generally be interested in the Type III test results.

Finally (relating more to your last question): with a likelihood ratio test, you can always create a test for any set of coefficients conditional on the rest. This is the way to go if you want to test multiple coefficients being zero at the same time (otherwise you run into some nasty multiple-testing issues).
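
A hedged sketch of that last paragraph, reusing `rss`, `one`, `x1`, `x2`, and `n` from the sketch above: for a Gaussian linear model the likelihood ratio statistic reduces to a simple function of the two residual sums of squares, with an asymptotic chi-square reference distribution assumed.

```python
import numpy as np
from scipy import stats

def lr_test(rss_restricted, rss_full, n, df_diff):
    # For Gaussian errors with the variance profiled out, the likelihood
    # ratio statistic 2*(loglik_full - loglik_restricted) reduces to
    # n * log(RSS_restricted / RSS_full).
    lr = n * np.log(rss_restricted / rss_full)
    return stats.chi2.sf(lr, df_diff)      # asymptotic chi-square p-value

# Joint test that the coefficients of x1 and x2 are both zero:
p_joint = lr_test(rss([one]), rss([one, x1, x2]), n, df_diff=2)
print(p_joint)
```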

Nick Sabbe
  • Could you please elaborate on the conditionality you mentioned? In the univariate regression with $p$ predictors and an intercept, testing a hypothesis on a linear combination of parameters $\psi = c'\beta$ uses test statistic $t = \frac{\hat{\psi} - \psi_0}{\hat{\sigma} \sqrt{c' (X' X)^{-1} c}}$... – caracal Aug 29 '11 at 14:09
  • Here $\hat{\psi} = c'\hat{\beta}$, with $\hat{\beta}$ being the vector of parameter estimates, and $c$ a vector of coefficients. $X$ is the design matrix, and $\hat{\sigma}$ is the residual standard error, with $\hat{\sigma}^2 = ||e||^2 / (n - (p+1))$, where $e$ is the vector of residuals from the supplied model. For the test of a single parameter $j$ being 0, $c$ is the $j$-th unit vector, and $\psi_0 = 0$. I don't see where model comparisons play a role for $t$. (A numerical sketch of this statistic appears after this comment thread.) – caracal Aug 29 '11 at 14:09
  • The essence of the matter is captured for example [here](http://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/). Remember that anova is just a special case of regression. Basically, it comes down to this: if you do a test for zeroness of (the coefficient of) variable A in a model with or without variable B, you may get different results. Hence, the result is conditional on your model, the data (even for the values of variable B) and thus on the coefficients not in your test but in your model. Finding that idea in the maths may be somewhat harder :-) – Nick Sabbe Aug 29 '11 at 14:58
  • True, but the anova hypotheses test whether all $p-1$ effect parameters corresponding to the $p$ groups of a factor are simultaneously 0. This hypothesis is different from the one about $c'\beta$ (here a single parameter $\beta_j$), and also uses a different test statistic: $F = \frac{(SS_{er} - SS_{eu}) / (df_{er} - df_{eu})}{SS_{eu} / df_{eu}}$ where $SS_{er}$ and $df_{er}$ are the residual sum of squares $||e_r||^2$ and their df for the restricted model, likewise $u$ for the unrestricted model. Obviously, this indeed depends on the choice for the restricted and unrestricted models. – caracal Aug 29 '11 at 15:18
  • The continuous case should be completely equivalent to a dichotomous 0-1 encoded variable. – Nick Sabbe Aug 29 '11 at 15:30
  • True, but: a) the regression $t$-statistics each test the hypothesis that **one single** parameter $\beta_j=0$ - a special case of testing the value of $\psi=c'\beta = \psi_0$. b) The regression/anova $F$-test, on the other hand, tests whether **several** parameters (a subset of the elements in vector $\beta$) are **simultaneously** 0. AFAIK, b) cannot be expressed like a), and leads to a different test statistic. In anova, the hypothesis corresponding to a) is a planned comparison, e.g., whether the parameter corresponding to the difference $\mu_j - \mu$ is 0 (assuming effect coding). – caracal Aug 29 '11 at 16:02
  • I would say the p-values the OP asks about are more like b) (different p-values for different parameter coefficients are mentioned) – Nick Sabbe Aug 29 '11 at 16:30
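
A numerical sketch of the $t$ statistic from caracal's comment above, $t = \frac{\hat{\psi} - \psi_0}{\hat{\sigma} \sqrt{c' (X' X)^{-1} c}}$, in Python on assumed simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
beta_true = np.array([1.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(size=n)

# OLS fit and residual standard error, as defined in the comment.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat
df = n - (p + 1)
sigma_hat = np.sqrt(e @ e / df)

# Test the single parameter beta_2 = 0: c is the corresponding unit vector.
c = np.array([0.0, 0.0, 1.0])
XtX_inv = np.linalg.inv(X.T @ X)
t = (c @ beta_hat - 0.0) / (sigma_hat * np.sqrt(c @ XtX_inv @ c))
p_value = 2 * stats.t.sf(abs(t), df)
print(t, p_value)
```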