
SPSS paired-sample t-test

  • Paired Samples Statistics (Table 1): pretest score mean = 41.6667, posttest score mean = 93.3333, N = 3

  • Paired Samples Correlations (Table 2): Correlation = -.277, Sig. = .821

  • Paired Samples Test (Table 3): t = -7.750, df = 2, Sig. (2-tailed) = .016

PRETEST SCORES

A = 30, B = 50, C = 45

POSTTEST SCORES

A = 95, B = 95, C = 90

The significance level in Table 2 is high, while the one in Table 3 is low. Should I reject the null hypothesis or the alternative hypothesis in this t-test?
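For reference, all three SPSS tables can be reproduced from the raw scores. Here is a minimal sketch using Python's scipy (my addition, not part of the original SPSS output):

```python
# Reproduce the three SPSS paired-samples tables from the raw scores.
import numpy as np
from scipy import stats

pre = np.array([30, 50, 45])   # pretest scores for A, B, C
post = np.array([95, 95, 90])  # posttest scores for A, B, C

# Table 1: paired samples statistics (the means)
print(round(pre.mean(), 4), round(post.mean(), 4))  # 41.6667 93.3333

# Table 2: paired samples correlation and its p-value
r, p_r = stats.pearsonr(pre, post)
print(round(r, 3), round(p_r, 3))                   # -0.277 0.821

# Table 3: paired samples t-test on the differences
t, p_t = stats.ttest_rel(pre, post)
print(round(t, 3), round(p_t, 3))                   # -7.75 0.016
```

Note that Table 2 and Table 3 answer different questions: the correlation tests whether the pre and post scores move together across subjects, while the t-test asks whether the mean changed.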

alicia ho
  • Please see https://stats.stackexchange.com/questions/31/what-is-the-meaning-of-p-values-and-t-values-in-statistical-tests. – whuber Dec 01 '19 at 19:30
  • Hi, thank you for your reply, but I'm still unsure which table I should refer to. Based on Table 3, it seems I should accept the alternative hypothesis, since there is a significant difference; but Table 2 shows that the two variables have a moderate negative correlation. Does that mean I have to accept the null hypothesis? – alicia ho Dec 01 '19 at 19:44
  • @aliciaho, regarding your question about whether you should "accept null [the] hypothesis", it may help you to read my answer here: [Why do statisticians say a non-significant result means “you can't reject the null” as opposed to accepting the null hypothesis?](https://stats.stackexchange.com/a/85914/7290) – gung - Reinstate Monica Dec 02 '19 at 12:39

2 Answers


That depends on which null hypothesis you are testing. At a guess, since this is a t-test, you want to see whether the mean changed; that is Table 3, which is significant (that is, the means changed).

However, I'd be a little leery. There was a huge change not only in the mean scores but also in the standard deviation: it was much lower in the post-test (see Table 1). Was this expected? Also, Table 2 shows that the correlation between the pre- and post-test is quite low -- in fact, it is negative -- which is odd. You would expect a positive correlation between two tests.

Finally, N is only 3.
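One way to check whether that drop in standard deviation is itself surprising is a two-sided F test on the variance ratio. A rough sketch (my addition, not part of the original answer; it assumes the usual normality conditions, and with only 2 degrees of freedom per group the test has little power):

```python
# Two-sided F test for equality of the pretest and posttest variances.
import numpy as np
from scipy import stats

pre = np.array([30, 50, 45])
post = np.array([95, 95, 90])

sd_pre, sd_post = pre.std(ddof=1), post.std(ddof=1)
print(round(sd_pre, 1), round(sd_post, 1))       # 10.4 2.9

F = pre.var(ddof=1) / post.var(ddof=1)           # larger variance on top
p = 2 * stats.f.sf(F, len(pre) - 1, len(post) - 1)
print(round(F, 1), round(p, 3))                  # 13.0 0.143
```

The two-sided p-value of about 14% suggests the change in spread, striking as it looks, is not statistically surprising with samples this small.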

Peter Flom
  • I believe you can deduce much more. First, you can determine from the output whether the correlation differs significantly from zero. Second, you can apply an F test to establish whether this change in SD is compatible with the data and the implicit assumptions of the test. (If I'm reading the numbers correctly--they appear to be around 10.4 and 2.8--then the p-value for this two sided test is about 15%, suggesting that such a change is not at all surprising.) Finally, could you be explicit about what bearing the final remark "N is only 3" has on rejecting or accepting the null? – whuber Dec 01 '19 at 20:27
  • One more thing: why would anyone expect a "positive correlation"? If the subjects are independently tested, as they are in most cases, wouldn't one expect *zero* correlation? This issue is germane because it explains the reason for the first test (in Table 2). – whuber Dec 01 '19 at 20:29
  • @whuber, this is a *paired* t-test. So the 3 subjects are the same in both conditions (before & after?). It is common in such cases for the unit (patient) with the lowest pre-score to have the lowest post-score, etc. We don't know what these data are, but it is common for paired data to be positively correlated. – gung - Reinstate Monica Dec 01 '19 at 20:37
  • @gung Thank you: I did not understand the test was correlating pre and post results; I thought it was performing some kind of test of independence of the differences. I would hope, though, that the software is not performing a *one-sided* correlation test. There are plenty of circumstances where one expects negative correlations (people who were worse off at the beginning made greater gains), indicating the default test ought to be two-sided. – whuber Dec 01 '19 at 21:24

The first table tells you that the pre- and post-test average scores are radically different. The hypothesis you are testing is that they are indeed different. The scores are so different that they hardly need statistical testing for confirmation; if you do need it, the third table provides that confirmation.

The second table has no direct relevance to whether the pre- and post-test average scores are different. It just shows a negative correlation between the two, which is a bit strange. Perhaps the patients who were in the worst shape improved the most.

The one concern I have is that your sample has only 3 individuals. That seems far too small to support any reliable inference about the effectiveness of the treatment. You can only observe that, for these 3 patients, it made a very big difference. You would probably need a sample at least 10 times that size to derive more reliable results.
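To put a number on that uncertainty, one can look at the 95% confidence interval for the mean improvement. A sketch (my addition; with N = 3 the critical t value on 2 degrees of freedom is large, so the interval is very wide):

```python
# 95% confidence interval for the mean paired difference (post - pre).
import numpy as np
from scipy import stats

pre = np.array([30, 50, 45])
post = np.array([95, 95, 90])

diff = post - pre                      # [65, 45, 45]
n = len(diff)
se = diff.std(ddof=1) / np.sqrt(n)     # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n - 1)  # critical t for df = 2, about 4.30

lo = diff.mean() - t_crit * se
hi = diff.mean() + t_crit * se
print(round(lo, 1), round(hi, 1))      # 23.0 80.4
```

The improvement is clearly positive, but with only 3 pairs its size is pinned down only to somewhere between roughly 23 and 80 points.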

Sympa
  • Re Table 2: you need to interpret its p-value. Re the final comments: they might over-reach a little. Suppose, for instance, these subjects were three randomly-selected US states, the intervention was a test of a multiyear government program to improve school graduation rates, and the average rates increased from 41% to 93%. Do you think people in the remaining states would agree to let you study another much larger sample before rolling out the program nationwide? The caution about a small $N$ is fair, but this example shows you can't declare that $N$ must exceed 30 to be reliable. – whuber Dec 01 '19 at 21:36
  • Well, regarding Table 2: the negative correlation is very far from being statistically significant... which is somewhat of a relief, given that this negative correlation may be a bit counterintuitive. I stand by the view that N = 3 is much too small to make any generalized inference. However, in your example a single data point represents the average for an entire State, including millions of underlying data points. That's a pretty different situation from the standard paired t-test within a standard hypothesis-testing framework (clinical trials, etc.). – Sympa Dec 03 '19 at 01:22
  • "Pretty different" in what sense? There's nothing in this question or in your answer that would distinguish those two situations. That's why I am suggesting that you not be so definite about your recommendation to use a sample size of 30 or larger. – whuber Dec 03 '19 at 14:46
  • I think in most hypothesis-testing frameworks, using N = 3 to fully test a hypothesis is far too small a sample. Suggesting 10 x 3, i.e. a sample of ~30, seems acceptable (that's not an inordinately large N). Even in your example at the State level, setting cost and political considerations aside, using a much larger sample would be preferable from a statistical standpoint. – Sympa Dec 03 '19 at 21:56
  • "Taking cost and political considerations aside" shows you're not really trying to solve the statistical problem, but only invoking general rules of thumb and feelings. The statistical standpoint emphatically *is* the standpoint from which costs, constraints, and objectives are seriously considered in framing and solving the problem; anything else is just empty mathematics. – whuber Dec 03 '19 at 22:23
  • To the best of my knowledge, Alicia Ho is not talking about your specific hypothetical State-level framework. Also, recommending N > 3 in hypothesis testing is not just a frivolous rule of thumb or a feeling. It goes to the basics of hypothesis testing: ensuring you have an adequate sample to make reasonably rigorous inferences. – Sympa Dec 04 '19 at 01:09