
Suppose I have a toy example for linear regression:

set.seed(0)

n_data=50
true_coeff_b1=0.37
noise_ratio=0.13
conf_int_level=0.77

x=runif(n_data)
y=true_coeff_b1*x+noise_ratio*rnorm(n_data)
fit=lm(y~x-1)
summary(fit)
confint(fit)

What's the relationship between the Std. Error and the confidence interval (as shown in the red boxes in the figure)?

[Figure: summary(fit) coefficient table and confint(fit) output, with the Std. Error and the 95% confidence interval highlighted in red boxes]

I remember from some book that they are different things: one is a point estimate (with its standard error), the other is a random interval estimate (the population parameter is fixed and unknown). But is there any relationship between them?

Haitao Du
  • I do not use R, but checking in Python it looks like `confint` is a 95% confidence interval based on the `estimate` and `std. error`, assuming the standardized PDF is a $t$ distribution with `DF` degrees of freedom? – GeoMatt22 Feb 15 '17 at 22:39
  • @GeoMatt22 I am a little confused by "std of point estimation" vs. "confidence interval"; that is why I am asking this. – Haitao Du Feb 16 '17 at 00:17
  • OK. My earlier comment was just about the numerical relationship. I believe that both of your quoted terms would be in reference to the inferred sampling distribution of the statistic (as estimated from the data, i.e. that would be why a $t$ PDF vs. a standard-normal one). – GeoMatt22 Feb 16 '17 at 00:21
  • A better explanation is given by @gung [here](http://stats.stackexchange.com/questions/18208/how-to-interpret-coefficient-standard-errors-in-linear-regression/18213#18213). – GeoMatt22 Feb 16 '17 at 00:39
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/53722/discussion-between-geomatt22-and-hxd1011). – GeoMatt22 Feb 16 '17 at 01:15

2 Answers


My attempt at an answer (thanks GeoMatt22!)

Subtracting the point estimate from each confidence interval endpoint and dividing by the standard error of the point estimate gives the 2.5% and 97.5% quantiles of a $t$ distribution with the corresponding degrees of freedom.
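
Written out (in my notation: $\hat\beta_1$ is the coefficient estimate, $\widehat{\text{SE}}(\hat\beta_1)$ its standard error, and $df = n - 1 = 49$ for this no-intercept fit):

$$\text{95% CI} = \hat\beta_1 \pm t_{0.975,\,df} \times \widehat{\text{SE}}(\hat\beta_1)$$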

Here is the verification:

> point_est=fit$coefficients
> point_est_se=coef(summary(fit))[, "Std. Error"]
> c95_ci=as.vector(confint(fit))
> t_bnd=(c95_ci-point_est)/point_est_se
> pt(t_bnd,df=49)
[1] 0.025 0.975

Or, equivalently:

> point_est+qt(0.025,df=49)*point_est_se
        x 
0.2859806 
> point_est+qt(0.975,df=49)*point_est_se
        x 
0.3944128 
> c95_ci
[1] 0.2859806 0.3944128
Haitao Du
  • The implied underlying logic is something like: "The [$z$ score](https://en.wikipedia.org/wiki/Standard_score) of coefficients obtained from repeated experiments, when standardized using the point estimate + standard error of the *current* experiment, is expected to follow a $t$ PDF with $df$ degrees of freedom". – GeoMatt22 Feb 16 '17 at 02:05
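
A small simulation sketch of that logic, reusing the question's true coefficient, noise level, and sample size: for each simulated experiment, $(\hat\beta_1 - \beta_1)/\widehat{\text{SE}}(\hat\beta_1)$ should follow a $t$ distribution with $49$ degrees of freedom, so its empirical quantiles should land close to the corresponding qt() values.

# reuse the question's true coefficient (0.37), noise level (0.13), and n = 50;
# the number of replications (5000) is arbitrary
set.seed(1)
t_stats <- replicate(5000, {
  x <- runif(50)
  y <- 0.37 * x + 0.13 * rnorm(50)
  s <- summary(lm(y ~ x - 1))
  (coef(s)[1, "Estimate"] - 0.37) / coef(s)[1, "Std. Error"]
})
# empirical 2.5% / 97.5% quantiles vs. t quantiles with df = 49
quantile(t_stats, c(0.025, 0.975))
qt(c(0.025, 0.975), df = 49)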

For a normal/Gaussian distribution, the confidence interval can be derived as follows:

$$\text{95% CI lower limit} = \text{mean} - 1.96 \times \text{SE}$$

$$\text{95% CI upper limit} = \text{mean} + 1.96 \times \text{SE}$$

In your case, your mean estimate (or better, your estimated mean for the coefficient) is $0.34020$ and your standard error (SE) is $0.02698$.

Plug those into the equations above, and you get (roughly) the same values as your confidence interval.

I assume (but may be wrong) that the small differences in some of the digits after the decimal are due to the output rounding the model results to a certain number of digits, while confint(fit) probably uses the actual values (with full internal precision).
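
For a quick numeric check, here is a sketch using the rounded values from the printed summary; as the comments below point out, the remaining gap comes from confint() using a $t$ quantile with $df = 49$ rather than $1.96$:

est <- 0.34020   # coefficient estimate as printed in summary(fit)
se  <- 0.02698   # its standard error as printed
# normal approximation, as in the equations above
est + c(-1, 1) * qnorm(0.975) * se       # roughly 0.2873 0.3931
# t-based interval, which is what confint() computes (df = 49)
est + c(-1, 1) * qt(0.975, df = 49) * se # roughly 0.2860 0.3944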

Tilen
  • This is close, but you may notice that the `(CI - estimate)/std_err` values give a systematically wider "z" interval, more like +/-2.01 (far larger than expected from rounding, given the # digits reported in the OP's output). This suggests some "finite sample uncertainty" correction, and indeed is consistent with a "t" interval using the reported $df = 49$ degrees of freedom. BTW if you delete this answer the negative rep. from the downvotes will disappear. (I did not downvote, but the above may be why some did.) – GeoMatt22 Feb 16 '17 at 01:53
  • I do not mind being downvoted, but it is better to have some explanation of why it is wrong. – Haitao Du Feb 16 '17 at 02:12
  • @hxd1011 I agree, and commenting is definitely [encouraged](http://meta.stats.stackexchange.com/questions/333/should-we-encourage-downvoters-to-leave-a-comment-for-their-downvote). – GeoMatt22 Feb 16 '17 at 02:23
  • I also don't see why this was downvoted. It could easily have been demonstrated in the comments that a confidence interval based on the t-distribution is calculated as the standard error of the estimate multiplied by the quantile function for the t-distribution with the respective degrees of freedom. The reason you got close is that in @hxd1011's example the degrees of freedom were large (in `R`, compare `qt(0.025, 49)` vs. `qnorm(0.025)`; see also [here](http://stats.stackexchange.com/questions/110359/why-does-the-t-distribution-become-more-normal-as-sample-size-increases)). – Stefan Feb 16 '17 at 04:15
  • hxd1011, I think @GeoMatt22 was referring to my answer being downvoted. Indeed, I was also puzzled why it was downvoted; I thought I might have misinterpreted the question. – Tilen Feb 16 '17 at 09:24
  • I also would not mind understanding why the answer was downvoted, so any hints appreciated. – Tilen Feb 16 '17 at 09:32