
Consider a regression model through the origin with $\text{Var}(e_i \mid x_i) = x_i^2\sigma^2$. The corresponding regression model is $Y_i = \beta X_i + e_i$. How do I derive the least squares estimator? I know I need to take the derivative of $\sum w_i(y_i-\hat{y_i})^2$. Is the weight $\frac{1}{X_i^2}$?

Darian
  • See here: http://stats.stackexchange.com/questions/54794/regression-through-the-origin for, among other things, reasons why you probably do not want a regression through the origin. Also, there is probably no reason why you should use weights of the form you have given. – kjetil b halvorsen Nov 12 '14 at 10:47
  • @kjetilbhalvorsen The weights are being taken inversely proportional to variance; [there is a reason](http://en.wikipedia.org/wiki/Least_squares#Weighted_least_squares) to do that. – Glen_b Nov 12 '14 at 13:51
  • Glen_b: Yes, I know that, but that was not what the questioner asked for! There is no a priori reason the variance should depend on $x_i$, and if it did, it could well depend in some other way! – kjetil b halvorsen Nov 12 '14 at 15:20

1 Answer


This is a linear regression model with heteroskedastic (and, I presume, non-autocorrelated) error terms, where the functional form of the heteroskedasticity is known. In such a case things are pretty easy, because the structure of the covariance matrix of the error term is known, and so we can implement Generalized Least Squares (not "Feasible" GLS).

What should the weights be? The purpose of the weights is to transform all the variables involved in the equation so that the transformed error term has constant variance. Denote this weight $w_i$ (to be determined). Then we are looking at

$$w_iy_i = \beta w_ix_i+w_ie_i \Rightarrow \tilde y_i= \beta \tilde x_i+\tilde e_i$$

We want

$$\text{Var}(\tilde e_i \mid \tilde x_i) = \sigma^2 \Rightarrow E[\tilde e_i^2 \mid \tilde x_i]=\sigma^2$$

$$\Rightarrow E[(w_ie_i)^2 \mid w_ix_i]=\sigma^2 \Rightarrow w_i^2E[e_i^2\mid w_ix_i] = \sigma^2 \Rightarrow w_i^2\cdot (x_i^2\sigma^2) = \sigma^2$$

The only way for this to hold is if we set $$w_i^2 = \frac 1{x_i^2} \Rightarrow w_i = \frac 1{|x_i|}$$
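
To see this weighting in action, here is a minimal simulation sketch (Python with NumPy; the sample size and parameter values are purely illustrative, not from the question) checking that dividing everything by $|x_i|$ produces an error term with constant variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma = 10_000, 2.0, 1.5       # illustrative values

# regressor bounded away from zero, as the derivation assumes
x = rng.uniform(0.5, 3.0, size=n) * rng.choice([-1, 1], size=n)
e = rng.normal(0.0, sigma * np.abs(x))  # Var(e_i | x_i) = x_i^2 * sigma^2
y = beta * x + e

# transformed ("tilde") variables, i.e. weights w_i = 1 / |x_i|
y_t, x_t, e_t = y / np.abs(x), x / np.abs(x), e / np.abs(x)

print(np.var(e_t))                      # roughly sigma^2 = 2.25, no longer depends on x
```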

As shown in the post linked in a comment, for the initial equation $y_i = \beta x_i+e_i$ we have

$$\hat{\beta}_{OLS}=\frac{\sum_{i=1}^N x_iy_i}{\sum_{i=1}^N x_i^2}$$

Then for our transformed model we have

$$\hat{\beta}_{GLS}=\frac{\sum_{i=1}^N \tilde x_i\tilde y_i}{\sum_{i=1}^N \tilde x_i^2} = \frac{\sum_{i=1}^N \frac{x_i}{|x_i|}\frac{y_i}{|x_i|}}{\sum_{i=1}^N \frac {x_i^2}{|x_i|^2}} = \frac 1N\sum_{i=1}^N \left(\frac{y_i}{x_i}\right)$$

Note that implicit in all the above is that the regressor does not take zero values (otherwise one could apply a correction, but we will then be faced with a possibly very large variance for the observation involved).
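
Continuing the simulation sketch above, the following just confirms numerically that the weighted least squares closed form with objective weights $1/x_i^2$ (the $w_i^2$ here) collapses to the sample mean of the ratios $y_i/x_i$:

```python
# WLS closed form: sum(v_i * x_i * y_i) / sum(v_i * x_i^2), with v_i = w_i^2 = 1 / x_i^2
v = 1.0 / x**2
beta_wls = np.sum(v * x * y) / np.sum(v * x**2)

# the same estimator written as the mean of the ratios y_i / x_i
beta_gls = np.mean(y / x)

print(beta_wls, beta_gls)   # identical up to floating point, both close to beta = 2.0
```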

Using $y_i = \beta x_i+e_i$ we can arrive at $$\hat{\beta}_{GLS} = \beta + \frac 1N \sum_{i=1}^N \left(\frac{e_i}{x_i}\right)$$

which gives

$$\text{Var}(\hat \beta_{GLS} \mid \mathbf x) = \frac 1{N^2}\sum_{i=1}^N \left(\frac{\text{Var}(e_i \mid \mathbf x)}{x_i^2}\right) = \sigma^2/N$$

This should be anticipated, since

$$\tilde x_i = \frac {x_i}{|x_i|} \Rightarrow \tilde x_i^2 = 1$$

and, as a general result for a simple regression without a constant,

$$\text{Var}(\hat \beta_{GLS} \mid \mathbf x) = \frac{\sigma^2}{\sum_{i=1}^N \tilde x_i^2}$$
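
For a numerical check of the $\sigma^2/N$ result, here is a small Monte Carlo sketch (again Python/NumPy, illustrative values): the sampling variance of $\hat \beta_{GLS}$ across repeated samples should be close to $\sigma^2/N$.

```python
def beta_gls_once(rng, n=200, beta=2.0, sigma=1.5):
    """One draw of the GLS estimator under Var(e_i | x_i) = x_i^2 * sigma^2."""
    x = rng.uniform(0.5, 3.0, size=n) * rng.choice([-1, 1], size=n)
    y = beta * x + rng.normal(0.0, sigma * np.abs(x))
    return np.mean(y / x)

rng = np.random.default_rng(1)
draws = np.array([beta_gls_once(rng) for _ in range(20_000)])
print(draws.var(), 1.5**2 / 200)   # both approximately sigma^2 / N = 0.01125
```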

Moreover, given this estimator, we know that the expression

$$\frac 1{N-1}\sum_{i=1}^N \hat {\tilde e_i}^2,\;\; \hat {\tilde e_i} = \tilde y_i - \hat{\beta}_{GLS}\tilde x_i$$ is a meaningful estimator of the unknown constant $\sigma^2$.
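
And a sketch of that estimator of $\sigma^2$, computed from the residuals of the transformed model using the simulated data from the first snippet:

```python
# residuals of the transformed model and the estimator of sigma^2
beta_hat = np.mean(y / x)
resid_t = y / np.abs(x) - beta_hat * (x / np.abs(x))
sigma2_hat = np.sum(resid_t**2) / (len(x) - 1)

print(sigma2_hat)   # close to sigma^2 = 2.25
```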

Alecos Papadopoulos
  • In your formula for $\hat{\beta}_\text{GLS}$, an $x_i$ will cancel from numerator and denominator, yielding $\sum_{i=1}^N\frac{ y_i}{x_i}$... which is given in a different answer at the same link... – Glen_b Nov 12 '14 at 13:49
  • @Glen_b: Oh boy, naturally – the sign will be preserved in any case. Fixed it, thanks. – Alecos Papadopoulos Nov 12 '14 at 13:54
  • Unfortunately, I managed to edit a $\frac{1}{N}$ out of my comment. And then as soon as I noticed, couldn't get my internet connection to work until it was too late to edit it back in again. – Glen_b Nov 12 '14 at 13:56
  • @Glen_b Don't worry though, I didn't touch the $1/N$ factor in my answer! – Alecos Papadopoulos Nov 12 '14 at 14:01
  • Anyway, the original poster should be told he probably does not want a regression without an intercept term! That is almost never appropriate. – kjetil b halvorsen Nov 12 '14 at 15:21
  • @kjetilbhalvorsen That is true in general but as the linked post indicates there are cases where the constant term _has_ to be excluded on theory grounds, even though this jeopardizes the _statistical_ properties of the estimator. – Alecos Papadopoulos Nov 12 '14 at 15:38
  • A regression through the origin is appropriate if you know that it is consistent with the physical process that gave rise to the data and that you are only interested in estimating the slope of the line and its uncertainty. – Thomas Nov 12 '14 at 16:00
  • @Thomas Quite true, subject to the caveat that if misspecification of the regressor matrix cannot be excluded, then the omission of the constant term creates a visible probability of garbage estimation results. In social sciences, misspecification is the rule rather than the exception, hence the general rule of thumb "don't omit the constant term!". In physical processes, I imagine we can be more certain about what to include in the regressor matrix. – Alecos Papadopoulos Nov 12 '14 at 16:24