I'm calibrating a piece of lab instrumentation. I create solutions of known concentration ($x$) and measure my instrument response ($y$). On unknown samples, I measure the response and use the regression line to predict the actual concentration (reverse regression).

I know the equation for $PI(x)$, the prediction interval given some value of $x$:

$$PI(x) = t \cdot Syx \cdot \sqrt{ \frac{1}{q} + \frac{1}{n} + \frac{(x - \bar{x})^2}{Sxx} }$$

And the prediction interval bands are plotted as:

$$Y(x) = mx + b \pm PI(x)$$

where:

  • $Syx$ is the square root of (sum of squared residuals divided by the degrees of freedom)

  • $Sxx$ is $\sum (x_i - \bar{x})^2$

  • $q$ is the number of replicate runs

  • $n$ is the number of points in the calibration

  • $m$ and $b$ are the regression parameters

  • $t$ is the inverse $t$ value at whatever significance level you are interested in
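The quantities in this list can be computed directly from the calibration points. The following is a minimal sketch, not a reference implementation: the names `fit_line` and `pi_half_width` are hypothetical, the degrees of freedom are taken as $n - 2$ (a straight line with two fitted parameters), and the critical $t$ value is assumed to be looked up separately and passed in by the caller.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b.

    Returns the regression parameters together with the summary
    quantities used in the prediction-interval formula.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Assumption: n - 2 degrees of freedom (slope and intercept are fitted).
    s_yx = math.sqrt(ss_res / (n - 2))
    return {"m": m, "b": b, "n": n, "x_bar": x_bar, "y_bar": y_bar,
            "s_xx": s_xx, "s_yx": s_yx}


def pi_half_width(x, fit, t, q=1):
    """Half-width PI(x) of the prediction band at a known x.

    t is the critical t value chosen by the caller; q is the number of
    replicate runs averaged for the future measurement.
    """
    spread = 1 / q + 1 / fit["n"] + (x - fit["x_bar"]) ** 2 / fit["s_xx"]
    return t * fit["s_yx"] * math.sqrt(spread)
```

The band at a given `x` is then `fit["m"] * x + fit["b"]` plus or minus `pi_half_width(x, fit, t, q)`.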

If I want the uncertainty of $x$ given a measurement of $y$, would I use the inverse of $PI(x)$? As I understand it, the prediction interval at a given $x$ is the range in which you would expect $\bar{y}$ to fall for future analyses of one or more samples of known $x$. Inverting the prediction interval means finding the roots of a large, complicated quadratic equation: one root is the $x$ value that has the given $y$ as its upper $PI$ bound, and the other is the $x$ value that has the same $y$ as its lower bound. The two intervals (left versus right) will be slightly different.
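For reference, that quadratic can be written out explicitly. This is only a sketch of the algebra: the shorthand $y_0$ (the measured response), $u = x - \bar{x}$, $d = y_0 - \bar{y}$, $c = \frac{1}{q} + \frac{1}{n}$ and $A$ is introduced here for illustration. Setting either band equal to $y_0$ and using $b = \bar{y} - m\bar{x}$ gives

$$d - m u = \pm\, t \cdot Syx \cdot \sqrt{c + \frac{u^2}{Sxx}}$$

Squaring both sides and collecting powers of $u$:

$$A u^2 - 2 m d\, u + d^2 - t^2 (Syx)^2 c = 0, \qquad A = m^2 - \frac{t^2 (Syx)^2}{Sxx}$$

$$u = \frac{m d \pm t \cdot Syx \cdot \sqrt{\frac{d^2}{Sxx} + A c}}{A}$$

The two roots give the limits $x = \bar{x} + u$. They are not symmetric about $\bar{x} + d/m$ unless $d = 0$, and they are real and finite only when $A > 0$.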

A colleague was asking about a passage in a textbook of his (Quantitative Chemical Analysis by Daniel Harris), where it stated that this uncertainty estimation is instead:

$$\Delta x = t \frac{Syx}{m} \sqrt{\frac{1}{q} + \frac{1}{n} + \frac{(y - \bar{y})^2}{m^2 \, Sxx}}$$

It appears that this is the same as the first equation, with the corresponding $y$ values substituted for $x$ and the result divided by the slope. Which of these is correct? Using actual data, the results are similar: the textbook value is about the average of the two half-widths calculated from the inverse. However, for $x$ values near the ends of the calibration line, or for poorly fitted data, the two approaches give vastly different results.
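The two candidate intervals can be put side by side in code. The sketch below is illustrative only: the function names are hypothetical, the critical $t$ value is assumed to be supplied by the caller, and the inverted interval uses the closed-form roots of the quadratic obtained by setting $mx + b \pm PI(x)$ equal to the measured response `y0`.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b plus the summary statistics."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Assumption: n - 2 degrees of freedom for a straight-line fit.
    return {"m": m, "b": b, "n": n, "x_bar": x_bar, "y_bar": y_bar,
            "s_xx": s_xx, "s_yx": math.sqrt(ss_res / (n - 2))}


def textbook_interval(y0, fit, t, q=1):
    """Symmetric interval: x0 +/- t*(Syx/|m|)*sqrt(1/q + 1/n + d^2/(m^2*Sxx))."""
    m = fit["m"]
    d = y0 - fit["y_bar"]
    x0 = (y0 - fit["b"]) / m
    half = (t * fit["s_yx"] / abs(m)
            * math.sqrt(1 / q + 1 / fit["n"] + d ** 2 / (m ** 2 * fit["s_xx"])))
    return x0 - half, x0 + half


def inverted_pi_interval(y0, fit, t, q=1):
    """x values at which the upper or lower prediction band equals y0."""
    m, s, s_xx = fit["m"], fit["s_yx"], fit["s_xx"]
    c = 1 / q + 1 / fit["n"]
    d = y0 - fit["y_bar"]
    a = m ** 2 - (t * s) ** 2 / s_xx  # leading coefficient of the quadratic
    if a <= 0:
        # The bands never cross y0 on both sides: no finite interval exists.
        raise ValueError("slope too uncertain at this t for a finite interval")
    root = t * s * math.sqrt(d ** 2 / s_xx + a * c)
    u_1, u_2 = (m * d - root) / a, (m * d + root) / a
    return fit["x_bar"] + min(u_1, u_2), fit["x_bar"] + max(u_1, u_2)
```

Calling both functions with the same `y0`, `fit`, `t` and `q` returns a `(lower, upper)` pair from each, so the left and right half-widths of the inverted interval can be compared with the single textbook half-width.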

Kevin Nowaczyk
  • You appear to be looking for [fiducial limits in inverse regression](https://stats.stackexchange.com/search?q=%22inverse+regression%22+fiducial). Does that thread answer your questions? – whuber Mar 03 '21 at 16:17
  • @whuber Thanks for that link. The equation provided by @Edm in that question is exactly the same equation my colleague referenced, while what you demonstrate may be closer to what I was doing. However, you are referencing the confidence interval instead of the prediction interval. Should I be using the CI instead of the PI for this? The algebra that is used to calculate your fiducial interval from the confidence interval can easily be extended to the prediction interval, which I have done. Between these two solutions, which is "correct"? – Kevin Nowaczyk Mar 03 '21 at 16:39
  • @whuber, I assume you're going to say that your answer in that thread is the right way. Any idea why this textbook would propose the opposite solution? – Kevin Nowaczyk Mar 03 '21 at 16:43
  • I doubt it proposes the opposite solution, but without being able to consult it I don't want to speculate. – whuber Mar 03 '21 at 16:46
  • @whuber, the textbook proposes the same equation as Edm does in the question you sent. The way I visualize this equation is: you measure some $y$ value, reverse-regress the $x$ value, and compute the prediction interval at this value of $x$. You then apply the regression formula to these UCL / LCL values to reverse-regress the limits into their corresponding $x$ values. In your example the left and right side intervals are different. In Edm's equation, the limits are equal, and equal to the vertical interval, scaled by the slope of the best fit line. – Kevin Nowaczyk Mar 03 '21 at 16:52
  • An alternative description of what I imagine the @Edm equation (and the textbook) does is, it transforms the X data into Y data and Y into X using the regression equation, then uses the PI in this newly transformed space. Is this a valid procedure in your opinion? – Kevin Nowaczyk Mar 03 '21 at 18:37
  • Usually it is not valid, because it's a different model that conflicts with the first: they can't both be valid. Regression views the $x$ coordinates as known precisely while only the $y$ coordinates vary from the model. This is crucial for assessing uncertainty and constructing prediction intervals. That is why one goes through the contortions of "inverse regression" instead of just switching the variables. – whuber Mar 03 '21 at 19:10
  • This was what I thought, but the fact that it was the accepted answer in the question that you passed on, as well as being published in one of the Quantitative Analysis textbooks most widely used by undergraduate chemistry students, gave me pause. It might be helpful to amend the answer that you posted in that question with a couple of sentences on the flaws in the accepted answer. – Kevin Nowaczyk Mar 03 '21 at 19:16
