two-sample two-tailed t-test in R using sample statistics rather than raw data

Question

The second most popular answer to this question does not produce p-values that decrease as the statistic being estimated gets further from the sample mean in the positive direction; they decrease only in the negative direction.

Here is a trial problem with data frame dat, found here:

#compute t-stat
dat$mean_diff_tstat <- (dat$mean_diff - 0) /
  (sqrt((dat$sd_post^2 / dat$n_post) + (dat$sd_pre^2 / dat$n_pre)))

#get degrees of freedom
dat$mean_diff_dof <- dat$n_post + dat$n_pre - 2

#compute p-values
dat$mean_diff_pval <- pt(dat$mean_diff_tstat, dat$mean_diff_dof)

#assign significance indicators
library(dplyr)

dat$mean_diff_sig <- if_else(dat$mean_diff_pval <= 0.001, "***",
                     if_else(dat$mean_diff_pval <= 0.01,  "**",
                     if_else(dat$mean_diff_pval <= 0.05,  "*",
                     if_else(dat$mean_diff_pval <= 0.1,   ".", " "))))

If you order the dataset by p-value, you should see p-values that generally decrease as the absolute value of the corresponding differences in mean get further from zero. But p-values only decrease as the difference in mean gets more negative; positive mean differences have HIGHER p-values than their close-to-zero counterparts.

dat <- dat[order(dat$mean_diff_pval),]

I'm sure I'm missing something - please disabuse me.

EDIT: dat is effectively a bunch of hypothesis tests, one row per test. I'm aware I need to apply something like a Bonferroni correction to the data, but want to avoid making the question opaque with additional detail.

score 2 · Accepted Answer · answered Jul 28 '20 at 18:09

dat$mean_diff_pval <- pt(dat$mean_diff_tstat, dat$mean_diff_dof)

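This is wrong: `pt(q, df)` returns the lower-tail probability $P(t \le q)$, which is a one-sided quantity and keeps growing as the t-statistic becomes more positive. The two-tailed p-value is $P(|t|>|t^*|)$ and you can calculate it by: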

# the trailing "+" keeps R from ending the expression at the line break
pt(abs(dat$mean_diff_tstat), dat$mean_diff_dof, lower.tail=FALSE) +  # P(t > |tstat|)
  pt(-abs(dat$mean_diff_tstat), dat$mean_diff_dof)                   # P(t < -|tstat|)

or

dat$mean_diff_pval <- 2*pt(-abs(dat$mean_diff_tstat), dat$mean_diff_dof)

(see here) as in the most popular answer to that question.
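To see the difference concretely, here is a minimal sketch comparing the lower-tail probability with the two-tailed p-value. The t-statistics and the degrees of freedom below are made-up values chosen only for illustration; they are not taken from the asker's `dat`. Because the t distribution is symmetric about zero, the two tails are equal, which is why the sum of both tails and `2*pt(-abs(t), df)` should agree.

```r
# Hypothetical t-statistics and degrees of freedom (illustration only)
tstat <- c(-3, -1, 0, 1, 3)
dof   <- 20

# Lower-tail probability P(T <= t): keeps increasing as t increases,
# so large positive t-statistics get values close to 1
lower_tail <- pt(tstat, dof)

# Two-tailed p-value P(|T| >= |t|): depends only on the magnitude of t
two_tailed <- 2 * pt(-abs(tstat), dof)

# The same quantity written as the sum of the upper and lower tails
both_tails <- pt(abs(tstat), dof, lower.tail = FALSE) + pt(-abs(tstat), dof)

print(data.frame(tstat, lower_tail, two_tailed, both_tails))
```

In the printed table, `lower_tail` rises with `tstat`, which matches the behaviour described in the question, while `two_tailed` is the same for `-3` and `3` and shrinks as the magnitude of the t-statistic grows.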

Sergio


I'm familiar with the formula for the p-value, but wasn't sure of how the `pt()` function was operating. The first response to the question we have both linked to did not address that; the second was not a two-tailed solution. Your solution provides one and answers my question. – JmQ Jul 28 '20 at 18:37
