
I am having a lot of difficulty figuring out why I cannot reproduce a 95% CI by hand for one set of data, even though the same process works for a different set of data.

Drawing on the OLS results, I am doing the following calculation:

beta ± 1.96 * standard error
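
For reference, the 1.96 multiplier is the 97.5th percentile of the standard normal distribution. Here is a minimal sketch of the hand calculation using only the standard library; the helper name `normal_ci` is made up for illustration, and the inputs are the rounded coefficient and standard error quoted for the first dataset in this question.

```python
from statistics import NormalDist


def normal_ci(beta, se, level=0.95):
    """Return (lower, upper) for beta +/- z * se using a normal critical value."""
    # For level=0.95 this is the 97.5th percentile, roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta - z * se, beta + z * se


# Rounded coefficient and standard error from the first summary in the question.
lower, upper = normal_ci(1.86, 0.043)
print(round(lower, 2), round(upper, 2))
```

This only reproduces the arithmetic of the formula; it does not use the residual degrees of freedom of the fitted model.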

Here are two sets of data in Python, in case you want to run it yourself.

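import numpy as np
import statsmodels.api as sm

# Simulated data: 50 observations, y depends on x only
rng = np.random.RandomState(42)
x = 10 * rng.rand(50)
x2 = 10 * rng.rand(50)
y = 2 * x - 1 + rng.randn(50)

# Design matrix with an explicit intercept column
X_1 = np.column_stack((np.repeat(1, len(x)), x, x2))
y_1 = np.vstack(y)

model = sm.OLS(y_1, X_1)
res = model.fit()
print(res.summary())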

model result for x1: coef: 1.86, std err: 0.043, lower ci: 1.78, upper ci: 1.954

Calculate 95% CI by hand to replicate the result above:

lower: 1.86 - 1.96*0.043 = 1.78 ; correct
upper: 1.86 + 1.96*0.043 = 1.94 ; correct

Now, do the same thing with different data.

y = np.array([10, 7, 2, 8, 10, 6, 2, 10, 8, 5])
x1 = np.array([2, 1, 2, 0, 1, 3, 11, 0, 5, 0])
x2 = np.array([5, 1, 0, 2, 0, 0, 0, 0, 12, 2])

y = np.vstack(y)
X = np.column_stack((np.repeat(1, len(x1)), x1, x2))

model = sm.OLS(y, X)
res = model.fit()
print(res.summary())

model result for x1: coef: -0.52, std err: 0.265, lower ci: -1.15, upper ci: 0.10

Calculate 95% CI by hand to replicate the result above:

lower: -0.52 - 1.96*0.265 = -1.04 ; not correct
upper: -0.52 + 1.96*0.265 = 0.00 ; not correct
John Stud
  • Python is computing CIs based on the (correct) t-distribution whereas you are computing CIs using the (incorrect) normal distribution. The t-distribution becomes closer to normal as the number of observations increases, which is why your CI is more wrong for the second dataset than for the first. – Gordon Smyth Jan 16 '19 at 01:34
  • Interesting and thank you; I thought it might have had to do with the sample size! – John Stud Jan 16 '19 at 01:57
  • @JohnStud welcome to your first (I think) introduction to the Central Limit Theorem (by accident) ;-) ! – StatsStudent Jan 16 '19 at 02:05
  • That's not a CLT issue – Glen_b Jan 16 '19 at 04:01
  • first one in linked duplicates seems to be essentially exact; the thing it in turn is a duplicate of answers the question/explains the issue (so counts as a duplicate) – Glen_b Jan 16 '19 at 04:09
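
The point about the t-distribution in the comments can be checked with a short sketch. It assumes SciPy is installed (statsmodels depends on it) and that the residual degrees of freedom are the number of observations minus the number of columns in the design matrix (3 in both fits: intercept, x1, x2). The helper name `t_ci` is made up for illustration, and the inputs are the rounded values quoted in the question, so the last digit may differ slightly from the summary output.

```python
from scipy import stats


def t_ci(beta, se, n_obs, n_params, level=0.95):
    """Return (lower, upper, t_crit) using a Student t critical value."""
    # Residual degrees of freedom: observations minus estimated coefficients.
    df_resid = n_obs - n_params
    t_crit = stats.t.ppf(0.5 + level / 2, df_resid)
    return beta - t_crit * se, beta + t_crit * se, t_crit


# Rounded coefficient and standard error for x1 from each summary in the question.
lo_big, hi_big, t_big = t_ci(1.86, 0.043, n_obs=50, n_params=3)
lo_small, hi_small, t_small = t_ci(-0.52, 0.265, n_obs=10, n_params=3)

print(f"n=50: t_crit={t_big:.3f}, CI=({lo_big:.2f}, {hi_big:.2f})")
print(f"n=10: t_crit={t_small:.3f}, CI=({lo_small:.2f}, {hi_small:.2f})")
```

With 47 residual degrees of freedom the critical value is about 2.01, close to 1.96, so the normal shortcut nearly works for the first dataset. With only 7 it is about 2.36, which is what widens the second interval to roughly the reported one. On a fitted statsmodels result, `res.conf_int()` should return the same t-based intervals directly, and `res.df_resid` gives the degrees of freedom.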
