
Questions:

  • Even if there is no "widely accepted" technique, is there a useful-and-above-average technique for estimating goodness of fit in orthogonal regressions?
  • What are the pros/cons of this technique?

Background and Motivation: I recently discovered orthogonal regression (= total least squares regression, i.e. Deming regression with the ratio of error variances set to 1). Basically, I have $x$ and $y$, which are disease symptoms corresponding to two steps of a disease.

$$x = x^* + \mathrm{error}$$ $$y = y^* + \mathrm{error}$$

Here $(x,y)$ are the observed variables (symptoms measured visually, including error in disease assessment; the same error for both) and $(x^*,y^*)$ are the latent variables (the "true" symptoms). Note that $x^*$ and $y^*$ could be measured directly, e.g. by taking pictures (HD pictures followed by image analysis, with close to no error in disease assessment); this was not done here because it is very time-consuming.
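For concreteness, with equal error variances the orthogonal fit is just the first principal component of the centered data cloud. A minimal sketch in Python on synthetic stand-in data (the true slope $0.8$, intercept $5$, and noise level are arbitrary assumptions, not values from my data):

```python
import numpy as np

# Synthetic stand-in for the observed symptom scores (assumed values).
rng = np.random.default_rng(0)
x_star = rng.uniform(0, 100, 200)      # latent "true" symptom x*
y_star = 0.8 * x_star + 5              # assumed latent relationship
x = x_star + rng.normal(0, 5, 200)     # same error s.d. on both axes
y = y_star + rng.normal(0, 5, 200)

# Orthogonal (total least squares) fit: the first principal component
# of the centered cloud minimizes the sum of squared perpendicular
# distances, i.e. Deming regression with variance ratio 1.
centered = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(centered, full_matrices=False)
slope = vt[0, 1] / vt[0, 0]            # direction of the first PC
intercept = y.mean() - slope * x.mean()
print(slope, intercept)
```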

See here for the graphic (and the first discussion on which regression to use). I fitted an orthogonal regression to obtain the relationship between $x$ and $y$. I would like to measure the goodness of fit of my orthogonal regression:

  • put differently, I would like to know how much $x$ could help in predicting $y$ (= how much the visual disease assessment of one symptom helps in predicting the visual assessment of the other symptom).
  • if that is not possible, knowing how much $x^*$ could help in predicting $y^*$ (= how much one symptom measured without error helps in predicting the other symptom measured without error; this would also help in understanding the behaviour of the disease).
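There is no consensus statistic here, but one symmetric analogue of $R^2$ that is sometimes used with PCA-style fits is the fraction of total variance lying along the fitted line, i.e. one minus the ratio of perpendicular variance to total variance, computed from the two eigenvalues of the sample covariance matrix. This is only a hedged illustration, not an accepted goodness-of-fit measure for Deming regression; the data below are again synthetic:

```python
import numpy as np

# Synthetic paired symptom scores (assumed, for illustration only).
rng = np.random.default_rng(1)
x_star = rng.uniform(0, 100, 200)
x = x_star + rng.normal(0, 5, 200)
y = 0.8 * x_star + 5 + rng.normal(0, 5, 200)

# Eigenvalues of the 2x2 covariance matrix: variance perpendicular to
# the orthogonal-regression line (smallest) and along it (largest).
lam = np.linalg.eigvalsh(np.cov(x, y))   # ascending order
fit_fraction = 1 - lam[0] / lam.sum()    # 1 = exact line, 0.5 = round cloud
print(fit_fraction)
```

Note that this quantity is bounded below by 0.5 (a perfectly round cloud), so it is not directly comparable to an OLS $R^2$.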

I asked for hints on R functions/packages on SO, and obtained this answer from @Gaurav:

There are many proposed methods to calculate goodness of fit and tolerance intervals for Deming regression, but none of them is widely accepted. The conventional methods we use for OLS regression may not make sense. This is an area of active research. I don't think there are many R packages that will help you compute that, since not many mathematicians agree on any particular method.

  • If you are predicting $Y$ from $X$, then it is not clear why you would want to use orthogonal regression. How does your situation differ from a standard regression of $Y$ against $X$? – whuber Aug 19 '15 at 16:49
  • OK, I wasn't clear. X and Y are measured independently (two steps of a disease). But it _seems_ that, in fact, they are not independent, meaning that X may explain Y. My objective is to have the relationship between X and Y (regression), and to know how much X is explaining Y. – NOTM Aug 19 '15 at 16:52
  • @whuber: isn't your question answered by the OP's remark that both X and Y were measured with error? – amoeba Aug 19 '15 at 16:57
  • @whuber, @amoeba: I edited my post to make it clearer. – NOTM Aug 19 '15 at 17:02
  • 1
    @amoeba No, it's not. When $Y$ is to be predicted from a measurement $X$, then what is of interest is precisely that: given the *observed* value of $X$, what can be said about $Y$? That's what regression methods do. A Deming regression fit is useless for that purpose. (It tries to uncover information about an underlying linear *association* between the *unobserved* values.) – whuber Aug 19 '15 at 18:27
  • Does it? I thought Deming regression was designed to uncover information about an underlying linear association between two _observed values_, each with a _known error_ (or a precisely-estimated error). – NOTM Aug 19 '15 at 19:48
  • @whuber, I am not at all an expert here, but I thought that the whole "errors-in-variables" business is precisely about that: wikipedia says [in the first sentence](https://en.wikipedia.org/wiki/Errors-in-variables_models) that those are "regression models that account for measurement errors in the independent variables"; Deming regression is given as one of the examples of errors-in-variables. See also the last comment by NOTM. – amoeba Aug 19 '15 at 20:45
  • 1
    @amoeba Take a look at the model specification. The parameter estimates relate the response value to the *unobserved* values of the regressors. You don't have those unobserved values available when you're making predictions. The whole point of Deming regression is to uncover some *theoretical relationship* among the unobserved values, but it's not to make predictions in the sense of asserting what $y$ is likely to be given an observed $x$. – whuber Aug 19 '15 at 21:18
  • @whuber, amoeba: thanks for your answers. It might be a silly question to ask, but if I have 1) $x = x^* + \mathrm{error}$ and $y = y^* + \mathrm{error}$, and 2) a theoretical relationship between $x^*$ and $y^*$, can't I estimate how much $x^*$ helps in predicting $y^*$? I.e. the same question as the original, but on the latent variables rather than the observed variables. – NOTM Aug 20 '15 at 07:20
  • In [this post](http://stats.stackexchange.com/questions/96069/goodness-of-fit-value-from-orthogonal-distance-regression), the method of moments is proposed as a solution (although the OP finally goes for Pearson coefficients, which were deemed inappropriate in my post [on SO](http://stackoverflow.com/questions/32046532/methcomp-deming-orthogonal-regression-goodness-of-fit-confidence-inter)). What do you think of the method of moments? – NOTM Aug 20 '15 at 10:21
  • Yes, you can ask how much $x^{*}$ helps in predicting $y^{*}$--but that has no practical applications, since you cannot actually observe $x^{*}$. I wonder whether you might be using the word "predicting" in the sense of "fitting." That means you are interested in estimating a linear relation of the form $y^{*}=\alpha+\beta x^{*}$ in order to make inferences about how $x^{*}$ and $y^{*}$ might be associated. Note that this is a *symmetrical* relation; it could just as well be written in the form $\theta_y y^{*}+\theta_x x^{*}+\theta_0=0$, with the objective being to estimate the $\theta_i$. – whuber Aug 20 '15 at 14:10
  • 1
    @whuber: thanks for your answer. Actually, $x*$ could be measured, with a technique that does not make mistake (e.g., instead a visual assessment of disease symptoms that gives $x$, HD picture + image analysis: possible but very long; the error would not be null, but arguably negligible if done the right way). So yes, they are practical applications (apart from a basic understanding of what drives the disease, which helps in understanding others results properly). – NOTM Aug 20 '15 at 14:31
  • 1
    That is a revealing comment; it really clarifies your situation. Please include that information in the question itself. – whuber Aug 20 '15 at 14:33
  • @whuber: done, I hope this helps. What about the Method of Moments (previous comment)? – NOTM Aug 20 '15 at 14:48
