Suppose I have data like this:

Val  Bin
-1   Y
5    N
-2   N
4    Y

and so forth, where Bin is a binary value. I want to construct a 95% confidence interval for the average difference between the values that have a Y associated with them and the values that have an N associated with them. Note that I have more values associated with Y than with N (which is why I'm not clear about using a normal t-stat).

I'm unclear as to whether I should use a pooled t-stat or a normal t-stat.

What should I be using?

praks5432
  • Can you clarify what you mean by "pooled t-stat" vs "normal t-stat"? If your only question is whether a t-test is OK to use if the $n$s are unequal, then this Q is a duplicate of [How should one interpret the comparison of means from different sample sizes?](http://stats.stackexchange.com/questions/31326/) – gung - Reinstate Monica Jan 21 '14 at 01:55
  • What's a 'normal t-stat', exactly? It's by no means obvious – Glen_b Jan 21 '14 at 03:11

1 Answer

Note that my response is phrased in terms of hypothesis testing, but it applies equally to the construction of confidence intervals.

The type of test that is appropriate depends on the assumptions you make about the distribution the data are drawn from. If you assume the Val data are drawn from a normal distribution with unknown mean and variance, then some kind of $t$-test is appropriate.

If you assume that the distributions of Val given Bin have the same variance (that is, their means are not necessarily equal, but their variances are the same regardless of whether Bin = Y or Bin = N), then you use the pooled $t$-test, so called because it uses the pooled estimate of the variance.
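As a rough sketch of the pooled calculation, the following computes the difference in means, its standard error from the pooled variance, and the degrees of freedom. The function name `pooled_summary` and the sample values are made up for illustration; the critical value is left to the caller (from a $t$ table, or for example `scipy.stats.t.ppf(0.975, df)` if SciPy is available).

```python
from statistics import mean, variance

def pooled_summary(y_vals, n_vals):
    """Return (diff, se, df) for mean(y_vals) - mean(n_vals),
    assuming both groups share the same variance."""
    n_y, n_n = len(y_vals), len(n_vals)
    df = n_y + n_n - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((n_y - 1) * variance(y_vals) + (n_n - 1) * variance(n_vals)) / df
    se = (sp2 * (1 / n_y + 1 / n_n)) ** 0.5
    diff = mean(y_vals) - mean(n_vals)
    return diff, se, df

# Hypothetical data, not taken from the question.
diff, se, df = pooled_summary([1.0, 2.0, 3.0], [4.0, 6.0])
print(diff, se, df)
```

The 95% interval is then `diff ± t_crit * se`, where `t_crit` is the 0.975 quantile of the $t$ distribution with `df` degrees of freedom.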

If, however, you do not assume the variances are the same for the two groups, then the pooled $t$-test is inappropriate: with unequal group sizes it can misestimate the standard error of the difference, and when the smaller group has the larger variance it underestimates it, so any confidence interval thus constructed is likely to be too narrow for the desired coverage probability. An appropriate alternative is the Welch $t$-test, which uses Satterthwaite's approximation to the degrees of freedom, because without the equal-variance assumption the test statistic does not follow an exact $t$ distribution.
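For comparison, here is a minimal sketch of the unequal-variance (Welch) calculation with the Satterthwaite degrees of freedom. As before, `welch_summary` and the sample values are illustrative names and numbers, not part of the original question, and the critical value must be looked up separately for the (generally non-integer) `df`.

```python
from statistics import mean, variance

def welch_summary(y_vals, n_vals):
    """Return (diff, se, df) for mean(y_vals) - mean(n_vals),
    without assuming equal variances in the two groups."""
    n_y, n_n = len(y_vals), len(n_vals)
    # Per-group variance of the sample mean: s^2 / n.
    v_y = variance(y_vals) / n_y
    v_n = variance(n_vals) / n_n
    se = (v_y + v_n) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v_y + v_n) ** 2 / (v_y ** 2 / (n_y - 1) + v_n ** 2 / (n_n - 1))
    diff = mean(y_vals) - mean(n_vals)
    return diff, se, df

# Hypothetical data, not taken from the question.
diff, se, df = welch_summary([1.0, 2.0, 3.0], [4.0, 6.0])
print(diff, se, df)
```

The interval has the same form, `diff ± t_crit * se`, but `t_crit` is taken from the $t$ distribution with the approximate `df` returned here rather than `n_y + n_n - 2`.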

heropup