
I am looking for methods that can be used to estimate the "OLS" measurement error model below.

$$y_{i}=Y_{i}+e_{y,i}$$ $$x_{i}=X_{i}+e_{x,i}$$ $$Y_{i}=\alpha + \beta X_{i}$$

Here the errors $e_{y,i}$ and $e_{x,i}$ are independent normal with unknown variances $\sigma_{y}^{2}$ and $\sigma_{x}^{2}$. "Standard" OLS won't work in this case.

Wikipedia has some unappealing solutions - the two given force you to assume that either the "variance ratio" $\delta=\frac{\sigma_{y}^{2}}{\sigma_{x}^{2}}$ or the "reliability ratio" $\lambda=\frac{\sigma_{X}^{2}}{\sigma_{x}^{2}+\sigma_{X}^{2}}$ is known, where $\sigma_{X}^2$ is the variance of the true regressor $X_i$. I am not satisfied by this, because how can someone who doesn't know the variances know their ratio?
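(For concreteness, here is a rough sketch, in R, of what those two estimators amount to. The function name `evm_known_ratio` and its arguments are my own invention; note that either `delta` or `lambda` has to be handed in by the user, which is exactly my objection.)

# Sketch of the two Wikipedia-style estimators; both need a known ratio.
evm_known_ratio <- function(x, y, delta = NULL, lambda = NULL) {
  sxx <- var(x); syy <- var(y); sxy <- cov(x, y)
  if (!is.null(delta)) {
    # Deming-type slope, assuming the error-variance ratio delta = var(e_y)/var(e_x) is known
    b <- (syy - delta * sxx + sqrt((syy - delta * sxx)^2 + 4 * delta * sxy^2)) / (2 * sxy)
  } else {
    # OLS slope corrected for attenuation, assuming the reliability ratio lambda is known
    b <- sxy / (lambda * sxx)
  }
  c(alpha = mean(y) - b * mean(x), beta = b)
}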

Anyway, are there any other solutions besides these two that don't require me to "know" anything about the parameters?

Solutions for just the intercept and slope are fine.

probabilityislogic
  • the Wikipedia article itself provides you the answer to this question. If you assume normality of the "true" regressor, then you need further conditions on the distributions of the errors. If the true regressor is not Gaussian, then you have some hope. See [Reiersol (1950)](http://www.jstor.org/pss/1907835). – cardinal Feb 26 '11 at 17:07
  • also, what do you mean by "Solutions for just the intercept and slope are fine". Those are your only two parameters! Or were you hoping to try to back out the "true" regressor as well? – cardinal Feb 26 '11 at 17:10
  • @cardinal - I meant that I didn't particularly care about the two scale parameters, and as you say, the "true" regressor $X_{i}$. – probabilityislogic Feb 26 '11 at 23:40
  • I see. That makes sense. – cardinal Feb 26 '11 at 23:45

1 Answer


There is a range of possibilities described by J.W. Gillard in *An Historical Overview of Linear Regression with Errors in Both Variables*.

If you are not interested in the details or the reasons for choosing one method over another, just go with the simplest: draw the line through the centroid $(\bar{x},\bar{y})$ with slope $\hat{\beta}=s_y/s_x$, i.e. the ratio of the observed standard deviations (giving the slope the same sign as the covariance of $x$ and $y$); as you can probably work out, this gives an intercept on the $y$-axis of $\hat{\alpha}=\bar{y}-\hat{\beta}\bar{x}$.

The merits of this particular approach are

  1. it gives the same line comparing $x$ against $y$ as $y$ against $x$,
  2. it is scale-invariant so you do not need to worry about units,
  3. it lies between the two ordinary linear regression lines,
  4. it crosses them where they cross each other, at the centroid of the observations, and
  5. it is very easy to calculate.

The slope is the geometric mean of the slopes of the two ordinary linear regressions: plotted on the same axes (with $y$ vertical), the $y$-on-$x$ line has slope $\hat{\rho}s_y/s_x$ and the $x$-on-$y$ line has slope $s_y/(\hat{\rho}s_x)$, and the geometric mean of these is $s_y/s_x$. It is also what you would get if you standardised the $x$ and $y$ observations, drew a line at 45° (or 135° if there is negative correlation) and then de-standardised the line. It could also be seen as equivalent to making an implicit assumption that the variances of the two sets of errors are proportional to the variances of the two sets of observations; as far as I can tell, you claim not to know whether that is true or in which direction it fails.

Here is some R code to illustrate: the red line in the chart is OLS regression of $Y$ on $X$, the blue line is OLS regression of $X$ on $Y$, and the green line is this simple method. Note that the slope should be about 5.

X0 <- 1600:3600
Y0 <- 5*X0 + 700
X1 <- X0 + 400*rnorm(2001)
Y1 <- Y0 + 2000*rnorm(2001)
slopeOLSXY  <- lm(Y1 ~ X1)$coefficients[2]     #OLS slope of Y on X
slopeOLSYX  <- 1/lm(X1 ~ Y1)$coefficients[2]   #Inverse of OLS slope of X on Y
slopesimple <- sd(Y1)/sd(X1) *sign(cov(X1,Y1)) #Simple slope
c(slopeOLSXY, slopeOLSYX, slopesimple)         #Show the three slopes
plot(Y1~X1)
abline(mean(Y1) - slopeOLSXY  * mean(X1), slopeOLSXY,  col="red")
abline(mean(Y1) - slopeOLSYX  * mean(X1), slopeOLSYX,  col="blue")
abline(mean(Y1) - slopesimple * mean(X1), slopesimple, col="green")
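
As a quick check of the geometric-mean identity mentioned above, you can run the following after the snippet, while `slopeOLSXY`, `slopeOLSYX` and `slopesimple` are still in the workspace (the correlation here is positive, so no sign adjustment is needed):

sqrt(slopeOLSXY * slopeOLSYX)  # geometric mean of the two OLS slopes
slopesimple                    # the simple slope sd(Y1)/sd(X1); should match exactly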
Henry
  • @Henry, your definition of $\hat{\beta}$ doesn't make any sense to me. Are some "hats" missing? – cardinal Feb 26 '11 at 23:34
  • It is meant to be the observed standard deviation of $\{y_i\}$ divided by the observed standard deviation of $\{x_i\}$. I'll change $\sigma$ to $s$ – Henry Feb 26 '11 at 23:58
  • @Henry, can you clarify some of your comments? Something strikes me as being off based on your current description. Let $\hat{\beta}_{xy}$ be the slope assuming $y$ is the response and $x$ is the predictor. Let $\hat{\beta}_{yx}$ be the slope assuming $x$ is the response and $y$ the predictor. Then $\hat{\beta}_{xy} = \hat{\rho}s_y / s_x$ and $\hat{\beta}_{yx} = \hat{\rho} s_x / s_y$, where $\hat{\rho}$ is the sample *correlation* between $x$ and $y$. Hence the geometric mean of these two slope estimates is just $\hat{\rho}$. – cardinal Feb 27 '11 at 00:15
  • @cardinal: No - when I see $x = by+c$ I mean the slope is $1/b$ since it can be rewritten as $y=x/b-c/b$. When you try to draw the two OLS lines on the same graph together with the observed points (e.g. with $y$ on the vertical axis and $x$ on the horizontal axis) you have to invert one of the slopes. So I meant that you take the geometric mean of $\hat{\rho}s_y/s_x$ and $s_y/\hat{\rho}s_x$, which is simply $s_y/s_x$. Or, if you are unconventional enough to plot $y$ and $x$ the other way round for both lines and the observed points, then you get the inverse of that as the slope. – Henry Feb 27 '11 at 00:39
  • @Henry - that's quite an interesting answer. I don't necessarily doubt its validity, but one thing which does surprise me is that the correlation/covariance between $Y$ and $X$ is completely absent from the answer. Surely this *should* be relevant to the answer? – probabilityislogic Feb 27 '11 at 01:04
  • @probabilityislogic: Not *completely* as I do use the sign of the correlation/covariance ;) But the magnitude may just be noise. In standard OLS, the noise is all assumed to be on the "dependent variable" so the predicted change in the dependent variable due to the change in the independent variable is reduced by the magnitude of the correlation. But if you don't know where the noise is, or indeed which is dependent and which independent, then how can you decide whether to increase or decrease? – Henry Feb 27 '11 at 01:22
  • @Henry, thanks, that clarifies things a bit. Still, the methodology seems a little odd. I may think about this some more and ask another question or two. – cardinal Feb 27 '11 at 03:45
  • @cardinal: To stress my original point, this is not the best way to approach the problem if you have any idea what might actually be going on. It does dreadful things to residuals if the correlation is low. But it is simple if you think you have absolutely no information. Plato is supposed to have said something like *ignorance is the root of all evil*. – Henry Feb 27 '11 at 04:40
  • @Henry, it seems to me that in the setting of this problem, the solution given is equivalent to assuming that $e_{y,i} = \beta e_{x,i}$ for each $i$. In an OLS setting, this method completely ignores the uncertainty in the data. It *pretends* that $y_i$ and $x_i$ are perfectly correlated and so assumes the data lie ***perfectly*** on a line, despite the fact that the data themselves contradict that. In either case, this strikes me as a very ***strong*** (and false) assumption. – cardinal Feb 27 '11 at 05:20
  • @cardinal: No - the simple solution cannot make any requirement that the $e_{y,i}$ and $e_{x,i}$ are correlated. It is discussed in section 2.4 of Gillard's historical overview linked in my original answer, and has the names "Geometric mean regression", "Reduced major axis regression", "Standardized principal component regression" or "Ordinary least products regression" in the literature, having been invented several times. I think the strongest criticism is that it can appear to provide a result where vagueness would be better, particularly when the observed correlation is weak. – Henry Feb 27 '11 at 12:11
  • @Henry, perhaps I'm having a slow last couple of days and you can find the flaw in my argument. If we fix $\hat{\beta} = s_y / s_x$, then we implicitly assume that the sample correlation between $x$ and $y$ is one. Now, this can **only** happen if $x$ and $y$ are (perfectly) linearly related. If we consider the model of this question, a simple substitution yields $y_i = \alpha + \beta x_i - \beta e_{x,i} + e_{y,i}$. The only way to get a perfect linear relationship between $x$ and $y$ is to take $e_{y,i} = \beta e_{x,i}$ for all $i$. Perhaps I'm being dense. Do you see a flaw? – cardinal Feb 27 '11 at 16:20
  • @cardinal: You are thinking in OLS terms and seem to have $\hat{\beta}_{xy} = \hat{\rho}s_y / s_x$ fixed in your mind; in OLS this minimises the sum of squares of vertical residuals, which is not appropriate here as the $x_i$ have errors. I will put some R code in my answer to illustrate the impact. – Henry Feb 27 '11 at 17:14
  • @Henry, my argument is that they are equivalent. Independently of how one arrives at it, by making the choice $\hat{\beta} = s_y / s_x$, this is ***equivalent*** to assuming that your two observable variables $x$ and $y$ are perfectly correlated in an OLS framework. I'm, of course, basing this on the (somewhat limited) information in your post. I'll try to have a glance at the link and await your $R$ code. – cardinal Feb 27 '11 at 17:32
  • @Henry, I took a quick look at your link. Notice that their (2.1) is a weak form of the relation that I noted above. That is, for unbiasedness (though I think they really meant consistency), $\sigma_y^2 = \beta^2 \sigma_x^2$ is required, which is equivalent to a second-moment version of my (stronger) requirement. – cardinal Feb 27 '11 at 18:10
  • @cardinal: Gillard does mean *unbiased*. This is why I said in my original answer "It could also be seen as equivalent to making an implicit assumption that the variances of the two sets of errors are proportional to the variances of the two sets of observations" – Henry Feb 27 '11 at 19:27
  • @Henry, I'm not so sure about that. First, no model is specified for the latent variable in the surrounding text. Is it fixed (i.e., *functional* form) or random (i.e., *structural* form)? If the former, some hope remains (though I actually doubt this---but haven't done the calculation). If it's the latter, there is **no** way that I can see that $\tilde{\beta}_{\mathrm{GM}}$ would be *unbiased* for all possible distributions of the latent variable. Furthermore, the surrounding text and following math expression strongly hint that they intended to say "consistent" instead of "unbiased". – cardinal Feb 27 '11 at 20:51
  • @Henry, your example is interesting. The constants are chosen "perfectly" to satisfy the only case in which the estimate could be consistent. – cardinal Feb 27 '11 at 20:58
  • @cardinal: If you mean $2000/400=5$, then it was indeed deliberate. Apart from that ratio, the constants were arbitrary. At the start of this overlong thread, all I was offering was a simple method with some basic properties. – Henry Feb 27 '11 at 21:50
  • @Henry - is there a measure of error for the slope parameter? from my intuition it would seem to have something to do with "how far" apart the two lines are in the geometric average. For instance, how should I test the hypothesis that the slope is zero? – probabilityislogic Mar 06 '11 at 13:10
  • @probabilityislogic: I would have thought that looking at the [correlation coefficient](http://en.wikipedia.org/wiki/Pearson_correlation_coefficient#Inference), taking account of the sample size, ought to tell you this. It is a measure of how far apart the $y$-on-$x$ and $x$-on-$y$ slopes are. – Henry Mar 06 '11 at 15:57
  • @Henry - so I take it that there is no standard variance formula for this slope, only my intuition to guide me? Presumably I could use the delta method and get $var(\hat{\beta})\approx\frac{var(s_y^2)}{E(s_x)^2}$ – probabilityislogic Mar 13 '11 at 03:21
  • @probabilityislogic: You are correct when saying there is no standard variance formula for $\hat{\beta}$. I am not sure what your formula for $var(\hat{\beta})$ is, but it may be better at this stage to ask a new question. – Henry Mar 13 '11 at 08:46
  • @Henry - I have accepted your answer. One point I would make in addition to the ones you have already made is that your simple slope is the same as the first Principal Component, based on the correlation matrix of the data. This will certainly help with calculating standard errors – probabilityislogic Jul 31 '11 at 00:00
  • @Henry, how does this extend to $x$ vector, $y$ scalar -- can I just centre $\bar{x}$, $\bar{y}$ to $0, 0$, then take the hyperplane with slope $s_y/s_{x_i}$ in each coordinate? (Shall I ask a new question?) – denis Aug 18 '14 at 09:50