In multiple regression, why isn't the p-value sensitive to small n?

Question

$$ y = (2,4,6,8,10) $$ $$ x_1 = (1,2,3,4,5) $$

Linear model:

$$ y = \beta_0 + \beta_1x_1 $$

p-value of $x_1$: <2e-16
$R^2$: 1.00
p-value of model: <2e-16 (1 variable, 3 df)

Why doesn't the p-value tell us to reject this model and this variable until we increase $n$?

What's MLR? In a forum where for some "ML" means maximum likelihood (of course, they say) and for others it means machine learning (of course, _they_ say) explaining your abbreviations does no harm and can defuse puzzlement. — Nick Cox, Mar 11 '15 at 12:48
A P-value isn't a certification of whether your analysis is sensible (appropriate, well judged) or a quantification of how far it is sensible (etc.). It's just flagging here that a fit that good is unlikely to be a chance fluctuation with this sample size. Wouldn't you be troubled if that were not the case, as it is a perfect fit? — Nick Cox, Mar 11 '15 at 12:51
As the question has been edited, earlier comments may appear puzzling. The short answer is that the $P$-value is (highly) sensitive to small $n$; it is just not evident in the example you give. — Nick Cox, Mar 11 '15 at 13:01
@NickCox: Apologies for abbreviation. Given sample size, pooled variance, and valid assumptions about normality, linearity, homoscedasticity, i.i.d., etc., can we say "the likelihood that this relationship is due to random chance--that $x_1$ neither causes $y$ (nor vice versa), nor shares a causal antecedent with $y$, is <2e-16"? (cf. https://stats.stackexchange.com/questions/141253/can-two-variables-be-perfectly-correlated-but-not-share-a-single-causal-chain-an) — jtd, Mar 11 '15 at 13:11
That wouldn't be correct. No independent observer could say whether this is a chance relationship, a legitimate systematic relationship, or even something someone cooked up. Approach it "from the other direction." "IF there were NO relationship in the larger population, random samples of 5 would show this degree of linear connection in fewer than 2 of 10^16 instances." (Although not every software package would quantify it that way. E.g., SPSS reports no p-value at all.) — rolando2, Mar 11 '15 at 13:24
@NickCox - it's not that "a fit that good is unlikely to be a chance fluctuation with this sample size"; it's that "chance fluctuations around a condition of zero fit are unlikely to produce a fit this good with this sample size." — rolando2, Mar 11 '15 at 13:29
I agree with @Rolando2. No program can tell you just by looking at data anything about "causal antecedents" or causes. Nor is there a population of relationships, some of which are caused by "random chance", whatever that means, and some of which aren't. By the way, the precise P-value of the order of 1e-16 is suppositious, if only because nothing can be stronger than perfect fit. Unfortunately there is no wording for this that is simultaneously clear, correct and charming, as it is a kind of backwards logic (indeed to many people in statistical science, quite absurd!). — Nick Cox, Mar 11 '15 at 13:29
@Rolando2 Yes; that is more accurate wording. I am reaching for paraphrases that will make some kind of sense at the level of this question and inadvertently showing that it's dangerous to do so. — Nick Cox, Mar 11 '15 at 13:32
@rolando2 and NickCox: Thanks! I have tried to put your knowledge into an answer. — jtd, Mar 11 '15 at 13:41

score 0 · Accepted Answer · answered Mar 11 '15 at 13:38

Attempting to put the knowledge from @NickCox and @rolando2 into this answer:

The p-value of a multiple regression variable (or model) cannot tell an independent observer anything about causes, but it can say:

IF there were NO relationship in the population between $x_1$ and $y$, properly random samples of $n=5$ would show this degree of fit (or relationship) in fewer than a (suppositious*) 2e-16 fraction of the samples.
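
A rough way to check a statement of this kind is to simulate it. The sketch below is only an illustration under stated assumptions: it keeps $x_1 = (1, \dots, n)$ fixed, draws $y$ as independent standard normal noise (so there is truly no relationship), and counts how often $|r|$ reaches a threshold. Because a perfect fit essentially never occurs by chance, the threshold $|r| \ge 0.9$ is a hypothetical stand-in for "this degree of fit", and the function names, number of draws and seed are likewise made up for the example.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_exceedance(n, r_threshold, draws=20000, seed=0):
    """Fraction of null samples of size n with |r| >= r_threshold.

    Assumption: x is fixed at 1..n and y is pure N(0, 1) noise,
    i.e. there is no relationship in the "population".
    """
    rng = random.Random(seed)
    x = list(range(1, n + 1))
    hits = 0
    for _ in range(draws):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pearson_r(x, y)) >= r_threshold:
            hits += 1
    return hits / draws


# Hypothetical threshold: the same strength of fit, two sample sizes.
frac_n5 = null_exceedance(5, 0.9)
frac_n20 = null_exceedance(20, 0.9)
print(f"n=5:  fraction of null samples with |r| >= 0.9: {frac_n5:.4f}")
print(f"n=20: fraction of null samples with |r| >= 0.9: {frac_n20:.4f}")
```

With these settings the $n=5$ fraction should come out at a few percent, while for $n=20$ it should be at or near zero: the same strength of fit is far less surprising in a tiny sample, which is the sensitivity to small $n$ that the question's perfect-fit example hides.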

*Note that a perfect fit between $x_1$ and $y$ in the question makes the p-value suppositious.
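
To see why the perfect fit has this effect, it may help to write out the usual $t$ statistic for the slope in a one-predictor regression in terms of the sample correlation $r$. This is a standard textbook identity under the normal-errors model, not something stated in the question:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad \text{df} = n-2 $$

With the question's data, $n=5$ and $r=1$, so $\text{df}=3$ and the denominator is zero: the statistic is unbounded, and "<2e-16" is presumably just the smallest value the software will print rather than a computed probability. For contrast, a hypothetical $r=0.9$ with the same $n=5$ gives

$$ t = \frac{0.9\sqrt{3}}{\sqrt{0.19}} \approx 3.58 $$

on 3 df, for a two-sided p-value of roughly 0.04. Sample size enters through both $\sqrt{n-2}$ and the degrees of freedom, which is where the sensitivity to small $n$ shows up once the fit is less than perfect.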

Please feel free to edit!
