
I am trying to derive (one-sided) tolerance intervals related to the Deming regression model: $$ x_i=x^*_i + \epsilon_i$$ $$ y_i = (\alpha+\beta x^*_i) + \epsilon'_i$$ where the $x^*_i$'s are nonrandom fixed numbers, $\epsilon_i \sim {\cal N}(0,\sigma_x^2)$, $\epsilon'_i \sim {\cal N}(0,\sigma_y^2)$, and all variables $\epsilon_i, \epsilon'_i$ are mutually independent. The ratio $\sigma_x^2/\sigma_y^2$ is assumed to be known. Put $\tau^2=\sigma_x^2+\sigma_y^2$.

Given a new "true $x$" value $x^*_\text{new}$, I'm looking for an upper tolerance bound for the difference $$y_\text{new} - x_\text{new} \sim {\cal N}(\alpha+(\beta-1)x^*_\text{new}, \tau^2).$$ This upper tolerance bound is nothing but an upper $100(1-\alpha)\%$-confidence bound for the $100p\%$-quantile of $y_\text{new} - x_\text{new}$. As for classical Gaussian models, I'm looking for a tolerance bound of the form $$B(x,y)=(\hat{y^*}_\text{new} - x^*_\text{new}) + K s$$ with $\hat{y^*}_\text{new} = \hat\alpha + \hat\beta x^*_\text{new}$ and $s^2= \hat\tau^2$. Denoting by $F_\text{new}$ the cumulative distribution function of the difference $y_\text{new} - x_\text{new}$, the tolerance factor $K$ must be derived so that $$\Pr\left( F_\text{new}(B(x,y)) > p \right) = 1-\alpha$$ for some given $p\in(0,1)$ (the tolerance level) and confidence level $1-\alpha$ (of course, the $\alpha$ here is not the intercept parameter $\alpha$ of the model!).

Estimates of $\alpha$, $\beta$ and $\tau^2$, as well as their asymptotic variance matrix $V$ and an estimate $\hat V$ of it, are available in some textbooks and/or research papers (see for instance Fuller's book or Iles & Gillard's report 1 and report 2). In particular, $\hat\tau^2$ is asymptotically independent of $(\hat\alpha,\hat\beta)$.

Now I'm trying to derive the tolerance bound by mimicking the derivation of the tolerance bound in a classical regression model. It is based on the following lemma.

Lemma. Let $W \sim {\cal N}(0, \sigma^2)$ be independent of $Q \sim \frac{\chi^2_{d}}{d}$. Then the number $K>0$ satisfying $$\Pr\left( \Phi(W+K\sqrt{Q}) > p \right) =1-\alpha$$ is given by a known formula $K=K(\sigma,d,p,\alpha)$ (see Krishnamoorthy & Mathew's book).
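
For what it's worth, this known formula is a noncentral $t$ quantile: $\Pr\left(\Phi(W+K\sqrt{Q})>p\right)=\Pr\left((z_p-W)/\sqrt{Q}<K\right)$, and $(z_p-W)/\sqrt{Q}$ is $\sigma$ times a noncentral $t$ variable with $d$ degrees of freedom and noncentrality parameter $z_p/\sigma$. A short R sketch with a Monte Carlo check of the defining property (the function name `K.factor` is mine):

```r
# Tolerance factor of the lemma: Pr(Phi(W + K*sqrt(Q)) > p) = 1 - alpha
# is equivalent to Pr((z_p - W)/sqrt(Q) < K) = 1 - alpha, and
# (z_p - W)/sqrt(Q) ~ sigma * (noncentral t, df = d, ncp = z_p/sigma)
K.factor <- function(sigma, d, p, alph){
  sigma * qt(1 - alph, df = d, ncp = qnorm(p)/sigma)
}

# Monte Carlo check of the defining property
set.seed(1)
sigma <- 0.7; d <- 20; p <- 0.9; alph <- 0.05
K <- K.factor(sigma, d, p, alph)
W <- rnorm(1e5, 0, sigma)
Q <- rchisq(1e5, df = d)/d
mean(pnorm(W + K*sqrt(Q)) > p)  # should be close to 1 - alph = 0.95
```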

Now put $X_\text{new}=(1,x^*_\text{new})'$. One has the three asymptotically independent random variables $$ \frac{(y_\text{new} - x_\text{new} ) - (\alpha+(\beta-1)x^*_\text{new})}{\sqrt{\tau^2}} \sim {\cal N}(0,1),$$ $$ \frac{(\hat{y^*}_\text{new} - x^*_\text{new} ) - (\alpha+(\beta-1)x^*_\text{new})}{\sqrt{\tau^2}} \sim {\cal N}\left(0,\frac{X_\text{new}'VX_\text{new}}{\tau^2}\right),$$ $$\frac{s^2}{\tau^2} \sim \frac{\chi^2_{n-2}}{n-2}$$

The definition of $K$ becomes $$\Pr\left( \Phi(W+K\sqrt{Q}) > p \right) =1-\alpha$$ with $W \sim {\cal N}\left(0,\frac{X_\text{new}'VX_\text{new}}{\tau^2}\right)$ independent of $Q \sim \frac{\chi^2_{n-2}}{n-2}$.

Applying the lemma we get $K$ depending on the unknown values of $\tau$ and $V$. Substituting with the estimates $\hat\tau$ and $\hat V$ gives an approximate tolerance factor $\hat K$.
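
Concretely, writing $\hat d^2 = X_\text{new}'\hat V X_\text{new}/\hat\tau^2$, the plug-in factor is $\hat K = \hat d\, t_{n-2;\,1-\alpha}\!\left(\text{ncp}=z_p/\hat d\right)$, a noncentral $t$ quantile. A minimal sketch, where `Vhat` and `tauhat2` stand for the estimates $\hat V$ and $\hat\tau^2$ (the names are mine):

```r
# Plug-in tolerance factor: substitute the estimates into the lemma's formula
K.hat <- function(Vhat, tauhat2, xnew, n, p, alph){
  Xnew <- c(1, xnew)
  d <- sqrt(sum(Xnew * (Vhat %*% Xnew)) / tauhat2)  # plug-in sd of W
  d * qt(1 - alph, df = n - 2, ncp = qnorm(p)/d)
}
```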

Well, I have done it and I have implemented everything in R. But assessing the performance of my tolerance bound with simulation shows that even with large sample sizes it does not work, except for the case $\beta=1$. I'm pretty sure of my formulas for the estimates and the variance matrices since I have also implemented confidence intervals and prediction intervals and they work well.

EDIT: In fact it works!! There was a silly error in the R code I used for the simulations. Should I delete this post? I think not, because the method could interest others.

Stéphane Laurent
    Hi Stephane: if you are able to post an answer to your own question with an illustrative example and/or code, I think that would make an excellent addition to the site! :-) – cardinal Nov 26 '12 at 16:08
    @cardinal Ok. I will do it whenever I will be less busy. – Stéphane Laurent Dec 03 '12 at 19:43
  • Stephane: Sounds good. I know how that goes! :-) – cardinal Dec 03 '12 at 19:56
    Done, @cardinal. – Stéphane Laurent Jul 01 '13 at 21:18
  • I have implemented this code and have a question. If I wanted to plot a tolerance interval only on ynew to make tolerance limits on a regression model, could I simply change TolUp – rconway91 Oct 14 '16 at 02:53
  • Also, could you provide updated reference links? – rconway91 Oct 14 '16 at 03:29
  • @rconway91 No it is not correct to do that. I will reply later because I'm a bit sick right now (send me a reminder if I don't reply within a few days). For the broken links, you could try to type "Gillard Iles regression" in Google. – Stéphane Laurent Oct 14 '16 at 10:45
  • @Stéphane Laurent, thank you. If this method is not appropriate for determining prediction or tolerance bounds on ynew, do you know of any methods that are? I am having trouble finding anything consistent on the internet – rconway91 Oct 14 '16 at 13:57
  • @rconway91 It is possible to get the tolerance interval you desire by following a similar method. I have an R code returning this interval, and I will post it. – Stéphane Laurent Oct 14 '16 at 14:37
  • @Stéphane Laurent, thank you very much. Do you know of a publication I can review and cite in reference to the code? – rconway91 Oct 14 '16 at 15:20
  • @rconway91 Finally I don't have this R code. After thinking about your question, I believe you have to replace the two occurrences of `sigma.ee + sigma.uu` with `sigma.ee` (and to make the change to `TolUp` you proposed). But I would do some simulations to check that it is correct. – Stéphane Laurent Oct 16 '16 at 15:49
  • @rconway91 As far as I know, there's no publication about the tolerance bounds. – Stéphane Laurent Oct 16 '16 at 15:50
  • @Stéphane Laurent thank you for your input, it has been tremendously helpful. I too have been unable to find anything on tolerance or prediction intervals for Deming regression, which has been very frustrating. Can you think of any other method to apply a prediction interval to ynew? – rconway91 Oct 17 '16 at 13:17
  • @rconway91 My code for prediction intervals [is here](https://gist.github.com/stla/5fcd959576413798d4cc09e7493e53e9). There are three prediction intervals: prediction of $y-x$ given $x^*$, prediction of $y$ given $x^*$, and prediction of $y$ given $x$. I do not remember how I did the last one. I started to do some simulations to check this is correct, and this looks correct at first glance but I should take more time to look more carefully. – Stéphane Laurent Oct 19 '16 at 09:30
  • @Stéphane Laurent thank you, this is just what I needed. For the references, I have not had much luck with the searches suggested above, do you know of anything else that may be helpful? – rconway91 Oct 19 '16 at 21:25
  • @rconway91 Maybe you can search in the papers written by Bernard Francq, using the keyword "errors in variables regression". I know Bernard, I could ask him his thesis when I have the opportunity. – Stéphane Laurent Oct 20 '16 at 02:57
  • @Stéphane Laurent, thanks, I found some great help [here](https://cse.ine.pt/revstat/pdf/rs100104.pdf) and [here](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.332.7647&rep=rep1&type=pdf). I am still a little confused on what is going on the deming.estim function. What exactly is the output sigma? – rconway91 Oct 21 '16 at 01:56
  • @rconway91 This is an estimate of $\sqrt{\sigma_x^2}$ with the notations of my post. – Stéphane Laurent Oct 21 '16 at 04:05
  • @Stéphane Laurent, thanks. In my application, the standard error of measurement of methods **X** and **Y** are known. When I plug these known standard errors in for $\sqrt{\sigma_x^2}$ and $\sqrt{\sigma_y^2}$, my bounds are far too tight when compared to those calculated using $\sqrt{\sigma_x^2}$ and $\sqrt{\sigma_y^2}$ or **OLSv**. Are the parameters $\sqrt{\sigma_x^2}$ and $\sqrt{\sigma_y^2}$ also accounting for model error (similar to the Mean Square Error or residuals in OLSv)? – rconway91 Oct 21 '16 at 14:16
  • Do you have the derivation of the formula for your prediction interval? I'm wondering where the V[2,2] in the following line comes from: sd.ynew.Fuller2 – John Rogers Mar 26 '17 at 09:49
  • @JohnRogers This is similar to http://stats.stackexchange.com/questions/6163/what-is-the-prediction-error-while-using-deming-regression-weighted-total-least – Stéphane Laurent Mar 26 '17 at 12:17

1 Answer


Below is the code. Not very pretty, but (as said in the update of my question) it works well.

# returns estimates 
deming.estim <- function(x,y,lambda=1){  # lambda=sigmay²/sigmax² 
  n <- length(x)
  my <- mean(y)
  mx <- mean(x)
  SSDy <- crossprod(y-my)[,]
  SSDx <- crossprod(x-mx)[,]
  SPDxy <- crossprod(x-mx,y-my)[,] 
  A <- sqrt((SSDy - lambda*SSDx)^2 + 4*lambda*SPDxy^2)
  B <- SSDy - lambda*SSDx
  beta <- (B + A) / (2*SPDxy)
  alpha <- my - mx*beta
  sigma.uu <- ( (SSDy + lambda*SSDx) - A ) /(2*lambda) / (n-1)
  s.vv <- crossprod(y-my-beta*(x-mx))/(n-2) 
  # variance formulas from Gillard & Iles 
  sbeta2. <- (SSDx*SSDy-SPDxy^2)/n/(SPDxy^2/beta^2)
  sbeta. <- sqrt(sbeta2.)
  salpha2. <- s.vv/n + mx^2*sbeta2.  
  salpha. <- sqrt(salpha2.)
  V <- rbind( c(salpha2., -mx*sbeta2.), c(-mx*sbeta2., sbeta2.) )  
  return(list(alpha=alpha,beta=beta, 
              V=V, 
              sigma=sqrt(sigma.uu*(n-1)/(n-2)))
  ) 
}

# returns one-sided upper tolerance bound
deming.tolerance <- function(x,y,lambda=1,xnew,p=98/100,alph=5/100){
  fit <- deming.estim(x,y,lambda)
  V <- fit$V
  sigma.uu <- fit$sigma^2
  sigma.ee <- lambda*sigma.uu
  Vbeta.over.sigma <- V/(sigma.ee + sigma.uu)
  Xnew <- as.matrix(c(1,xnew))
  d <- sqrt( (t(Xnew)%*%Vbeta.over.sigma%*%Xnew) )
  k1 <- d * qt(1-alph, length(x)-2, ncp=qnorm(p)/d)
  S <- sqrt(sigma.ee + sigma.uu)  
  ynew <- fit$alpha+fit$beta*xnew
  TolUp <- ynew - xnew + k1*S
  TolUp
}

Some simulations to check the frequentist coverage:

#################
## simulations ##
#################
n <- 60
mu <- runif(n, 1, 100)
xnew <- 50
sigma.x <- 1
lambda0 <- 2
sigma.y <- sqrt(lambda0)*sigma.x
alpha <- 2
beta <- 3
p <- 0.98
# true value of the quantile:
qq <- qnorm(p, alpha+(beta-1)*xnew, sqrt(sigma.x^2+sigma.y^2))

nsims <- 5000
test <- rep(NA,nsims)
lambda <- lambda0
for(i in 1:nsims){
  x <- rnorm(n, mu, sigma.x)
  y <- rnorm(n, alpha+beta*mu, sigma.y)
  test[i] <- ( qq < deming.tolerance(x,y,lambda=lambda,xnew=xnew,p=p) )
}

> mean(test)
[1] 0.9544
Stéphane Laurent