Different confidence interval results in R, why?

Question

Using the following sample:

sample <- c(41.5, 56.7, 54.2, 98.9, 56.7, 43.9, 35.8, 28.8)

I get different lower and upper confidence limits when I calculate them "manually" than the ones returned by the standard t.test() function.

s <- sd(sample) # standard deviation 
se <- s/sqrt(NROW(sample)) # standard error 

# Using t.test()
lower <- (t.test(sample))$conf.int[1] # yields 34.11755
upper <- (t.test(sample))$conf.int[2] # yields 70.00745

# Calculating manually
lower <- mean(sample)-(1.96*se) # yields 37.18821
upper <- mean(sample)+(1.96*se) # yields 66.93679

Can somebody explain what is going on here?

Update: Thanks for the information everybody! This was really enlightening.

Jonathan Christensen · Accepted Answer · 2012-12-06T21:25:47.920

19

You are using 1.96, which is the Normal quantile, rather than the quantile from the t distribution with appropriate degrees of freedom (length(sample)-1). Your manually-calculated confidence interval is too narrow.
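
As a minimal sketch of the point above, the two critical values can be compared directly on the data from the question. The vector is renamed `x` here (an assumption for illustration) so that it does not mask base R's `sample()` function; the other variable names are also illustrative.

```r
# Data from the question, renamed to `x` to avoid masking base R's sample()
x <- c(41.5, 56.7, 54.2, 98.9, 56.7, 43.9, 35.8, 28.8)

n  <- length(x)
se <- sd(x) / sqrt(n)               # standard error of the mean

# 97.5% quantiles: standard Normal vs. t with n - 1 = 7 degrees of freedom
z_crit <- qnorm(0.975)              # roughly 1.96
t_crit <- qt(0.975, df = n - 1)     # roughly 2.36, so the interval is wider

ci_z <- mean(x) + c(-1, 1) * z_crit * se   # the narrower "manual" interval
ci_t <- mean(x) + c(-1, 1) * t_crit * se   # t-based interval

# Compare the t-based interval with the one reported by t.test()
all.equal(ci_t, as.numeric(t.test(x)$conf.int))
```

With only eight observations, the t quantile is noticeably larger than 1.96, which accounts for the gap between 37.18821 to 66.93679 and 34.11755 to 70.00745 reported in the question.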

edited Dec 06 '12 at 21:25

answered Dec 06 '12 at 21:12


Interestingly, `NROW` (as in the OP) would be needed here as it will treat a vector as a one-column matrix. `length` would be another option. (`NROW` was actually new to me, I had to look it up.) – Aaron left Stack Overflow Dec 06 '12 at 21:17
Oh, interesting. Thanks for the correction. I'll change it to `length()`. – Jonathan Christensen Dec 06 '12 at 21:18

+1. In other words, change the manual calculations to `mean(sample) + qt(c(.025, .975), length(sample) - 1) * se`. – whuber Dec 06 '12 at 21:18
@JonathanChristensen Just to clarify, is this due to the t-test calculation being made assuming [the standard deviation is unknown](http://en.wikipedia.org/wiki/Student%27s_t-distribution), which "uses" one degree of freedom? – ryanjdillon Dec 06 '12 at 22:01

And be careful with `length(x)` (or `NROW(x)`) since it will return the size of the vector, including missing values if any. In case of doubt, `sum(!is.na(x))` or `sum(complete.cases(x))` are to be preferred. (BTW, `sample` is the name of a specific function in R.) – chl Dec 06 '12 at 22:12
@chl Thanks. That is really useful. I only used `sample` for this example, and I try to follow typical naming conventions. – ryanjdillon Dec 06 '12 at 22:31

@shootingstars You're welcome. Re: your preceding comment; if you know the population variance (or SD) you don't need to estimate it from your sample and you can use a z-test (or, equivalently, refer to standard N(0;1) quantiles) -- but this is rarely true (that we know the true SD). – chl Dec 06 '12 at 22:41
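
The caution about missing values in the comments above could be handled along these lines. This is only a sketch: `t_ci` is a hypothetical helper, not something from the thread or from base R, and the trailing `NA` is added to the question's data purely for illustration.

```r
# Hypothetical helper: t-based confidence interval for the mean, ignoring NAs
t_ci <- function(x, conf.level = 0.95) {
  x     <- x[!is.na(x)]            # drop missing values before counting
  n     <- length(x)               # same as sum(!is.na(x)) on the original vector
  se    <- sd(x) / sqrt(n)         # standard error based on the observed values
  alpha <- 1 - conf.level
  mean(x) + qt(c(alpha / 2, 1 - alpha / 2), df = n - 1) * se
}

# Question's data with one missing value appended (illustrative assumption)
x_na <- c(41.5, 56.7, 54.2, 98.9, 56.7, 43.9, 35.8, 28.8, NA)

t_ci(x_na)
```

Dropping the missing value before computing `n` keeps both the standard error and the degrees of freedom based on the eight observed values, so the result should agree with the `t.test()` interval from the question.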
