
I recently had a client come to me to do a bootstrap analysis because an FDA reviewer said that their errors-in-variables regression was invalid: the analysis pooled data from three sites, and two of those sites had used some of the same samples.

BACKGROUND

The client had a new assay method they wanted to show was "equivalent" to an existing approved method. Their approach was to compare the results of both methods applied to the same samples. Three sites were used to do the testing, and errors-in-variables (Deming) regression was applied to the data at each site. The idea is that if the regression showed the slope parameter to be close to 1 and the intercept near 0, this would show that the two assay techniques gave nearly the same results and hence that the new method should be approved. Site 1 had 45 samples, giving 45 paired observations; site 2 had 40 samples and site 3, 43 samples. They did three separate Deming regressions, assuming a ratio of 1 for the two methods' measurement error variances, so the algorithm minimized the sum of squared perpendicular distances. Separate regressions were done at each site, and a pooled regression was also done using all the data from all three sites.
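To make the fitted quantity concrete: with an error-variance ratio of 1, the Deming slope has a closed form in the sample moments. Below is a minimal sketch in Python, with synthetic data standing in for the assay pairs (the client's actual software and numbers are not reproduced here):

```python
import numpy as np

def deming(x, y, lam=1.0):
    """Deming regression estimates; lam is the ratio of the two methods'
    measurement-error variances. lam = 1 gives orthogonal regression,
    i.e. it minimizes the sum of squared perpendicular distances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    sxx = ((x - mx) ** 2).mean()
    syy = ((y - my) ** 2).mean()
    sxy = ((x - mx) * (y - my)).mean()
    slope = ((syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2
              + 4 * lam * sxy ** 2)) / (2 * sxy))
    return slope, my - slope * mx

# Synthetic check: both methods measure the same underlying truth with
# equal noise, so the slope should come out near 1 and the intercept near 0.
rng = np.random.default_rng(1)
truth = rng.uniform(10, 100, 45)
x = truth + rng.normal(0, 2, 45)   # approved method
y = truth + rng.normal(0, 2, 45)   # new method
slope, intercept = deming(x, y)
```

If equivalence holds, confidence intervals for `slope` and `intercept` (fitted separately per site, as the client did) should cover 1 and 0.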

In their submission the client pointed out that some of the samples used at sites 1 and 2 were the same. The FDA reviewer said that the Deming regression was invalid because the common samples cause "interference" that violates the assumptions of the model, and requested that a bootstrap adjustment be applied to the Deming results to account for this interference.

At that point, since the client did not know how to do the bootstrap, I was brought in. The term "interference" was strange and I was not sure exactly what the reviewer was getting at. I assumed the real point was that because the pooled data contained common samples there would be correlation between the residuals for those samples, and hence the model error terms would not all be independent.

THE CLIENT'S ANALYSIS

The three separate regressions were very similar. Each had a slope close to 1 and an intercept near 0, and in each case the 95% confidence intervals contained 1 and 0 for the slope and intercept respectively. The main difference was a slightly higher residual variance at site 3. They also compared these results to OLS and found them very similar; in only one case did the OLS confidence interval for the slope fail to contain 1, and there the upper bound of the interval was something like 0.99.

With the results being so similar at all three sites, pooling the site data seemed reasonable. The client did a pooled Deming regression, which also led to similar results. Given these results I wrote a report for the client disputing the claim that the regressions were invalid. My argument was that because there are similar measurement errors in both variables, the client was right to use Deming regression as a way to show agreement or disagreement. The individual site regressions had no problem of correlated errors because no samples were repeated within a given site, and pooling the data serves only to tighten the confidence intervals. The pooling that used the common samples twice might produce positive correlation between the residuals for those common samples, which would make the confidence intervals for the regression parameters too narrow (the estimated residual variance biased on the low side).

This difficulty could be remedied by simply pooling the data with, say, the site 1 copies of the common samples left out. Also, the three individual site models do not have the problem and are valid, which seems to me to provide strong evidence of agreement even without the pooling. Furthermore, the measurements on the common samples were taken independently at sites 1 and 2, so I think that even the pooled analysis using all the data is valid: the measurement errors for a sample at site 1 are not correlated with the measurement errors for the corresponding sample at site 2. This really just amounts to repeating a point in the design space, which should not be a problem. It does not create correlation / "interference".

In my report I wrote that a bootstrap analysis was unnecessary because there is no correlation to adjust for. The three site models were valid (no possible "interference" within sites), and a pooled analysis could be done with the site 1 copies of the common samples removed; such a pooled analysis could not have an interference problem, so there would be no bias for a bootstrap to adjust.

CONCLUSION

The client agreed with my analysis but was afraid to take it to the FDA. They want me to do the bootstrap adjustment anyway.

MY QUESTIONS

A) Do you agree with (1) my analysis of the client's results and (2) my argument that the bootstrap is unnecessary?

B) Given that I have to bootstrap the Deming regression, are there any procedures in SAS or R available for running the Deming regression on the bootstrap samples?

EDIT: Given the suggestion of Bill Huber, I plan to look at bounds on the errors-in-variables regression by regressing both y on x and x on y. We already know that for one of these OLS regressions the answer is essentially the same as errors-in-variables when the two error variances are assumed to be equal. If this is true for the other regression as well, then I think that will show that the Deming regression gives an appropriate solution. Do you agree?
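For reference, the bound being invoked is that (with positively correlated data) the orthogonal Deming slope always lies between the OLS slope of y on x and the reciprocal of the OLS slope of x on y. A quick numerical illustration on synthetic data (my own sketch, not the client's numbers):

```python
import numpy as np

rng = np.random.default_rng(5)
truth = rng.uniform(10, 100, 43)
x = truth + rng.normal(0, 2, 43)
y = truth + rng.normal(0, 2, 43)

b_yx = np.polyfit(x, y, 1)[0]            # OLS slope of y on x (attenuated)
b_xy_inv = 1 / np.polyfit(y, x, 1)[0]    # reciprocal of the OLS slope of x on y (inflated)

# Orthogonal (lam = 1) Deming slope computed from the sample moments.
sxx, syy = x.var(), y.var()
sxy = np.cov(x, y, bias=True)[0, 1]
b_dem = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
```

If the confidence intervals for both bounding slopes cover 1, the Deming slope is squeezed toward the same conclusion.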

In order to meet the client's request I need to do the requested bootstrap analysis, which was vaguely defined. Ethically I think it would be wrong to provide just the bootstrap, because it doesn't really solve the client's real problem, which is to justify their assay measuring procedure. So I will give them both analyses and request at least that they tell the FDA that in addition to doing the bootstrap I did inverse regression and bounded the Deming regressions, which I think is more appropriate. I also think that analysis will show that their method is equivalent to the reference and that the Deming regression is therefore adequate as well.

I plan to use the R program that @whuber suggested in his answer to enable me to bootstrap the Deming regression. I am not very familiar with R but I think I can do it. I have R installed along with R Studio. Will that make it easy enough for a novice like me?
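The bootstrap itself is just resampling the (x, y) pairs with replacement and refitting the Deming regression on each resample; the percentile spread of the refitted slopes then gives the bootstrap interval. A sketch of that loop (in Python for brevity, with synthetic data in place of the client's; the mcr package in R wraps the same kind of fit):

```python
import numpy as np

def deming_slope(x, y):
    # Orthogonal (lam = 1) Deming slope from the sample moments.
    sxx, syy = x.var(), y.var()
    sxy = np.cov(x, y, bias=True)[0, 1]
    return (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Synthetic stand-in for one site's paired assay results.
rng = np.random.default_rng(7)
truth = rng.uniform(10, 100, 40)
x = truth + rng.normal(0, 2, 40)
y = truth + rng.normal(0, 2, 40)

# Pairs bootstrap: resample whole (x, y) pairs so the pairing is preserved.
n, B = len(x), 2000
boot = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, n)
    boot[i] = deming_slope(x[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])   # percentile CI for the slope
```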

Also I have SAS and am more comfortable programming in SAS. So if anyone knows a way to do this in SAS I would appreciate knowing about it.

gung - Reinstate Monica
Michael R. Chernick
  • I don't know the answer to this question, but, on a purely political basis, wouldn't it be better to do what the FDA wants and show (at least, presumably) that the results are similar? (Good question, BTW, +1) – Peter Flom Sep 26 '12 at 11:21
  • Yes @PeterFlom, I agree about doing the analysis for the FDA and showing that it doesn't matter. But I think that diplomatically pointing out the results of the regressions and their implications and doing the pooling without the overlapping samples strengthens the argument. I am going to do the bootstrap but I could use help finding available software to do the Deming regression myself without independently coding it up. – Michael R. Chernick Sep 26 '12 at 11:40
  • Michael, the possibility of "samples" common to "sites" calls into question some natural interpretations of what these (abstract) terms might mean. For instance, I initially thought of "sites" as different geographic locations and "samples" as *separate* entities associated with those locations, each subjected to independent measurements. In this model it is impossible for samples to be common to different sites. Could you please clarify what *you* mean by these terms? – whuber Sep 26 '12 at 13:16
  • @whuber the sites are different locations. The samples are citrated plasma from individuals. The lab testing is done at the different sites at different times. The comparisons are for two assay measuring devices that are intended to do the same function. At sites 1 and 2 some of the samples were reused but the devices operated independently at site 1 and site 2. So that is why I say the measurement errors are really independent even though the same samples (or portions of the same samples) are used. – Michael R. Chernick Sep 26 '12 at 13:58
  • a) Agreed that leaving out the duplicated sample from the pooled analysis removes the concerns about lack of independence. b) Very few SAS users are going to find it "easy" to use R for bootstrap analyses involving uncommon regression methods. Bootstrap analyses really do require the functional programming mode of thinking, and that is not a mode that SAS encourages. – DWin Sep 29 '12 at 15:49
  • @DWin Thanks for your comments. SAS is using the bootstrap more often in its procedures: PROC MULTTEST obviously uses it, as do the survey sampling procedures. J. D. Opdyke has written efficient SAS macros to do bootstrapping in SAS. One idea I had was to use his macro on a procedure that does Deming regression. So far my difficulty is finding a regression procedure in SAS that does Deming regression. – Michael R. Chernick Sep 29 '12 at 16:29
  • There are several implementations posted for it over the years on R-help. Would citations be helpful in building a SAS version or not? – DWin Sep 29 '12 at 17:23
  • @DWin I would be happy to take whatever you can give to me. – Michael R. Chernick Sep 29 '12 at 17:26
  • Will post as comments on your follow-up SO question. – DWin Sep 29 '12 at 18:45

1 Answer


This is a mutual calibration problem: that is, of quantitatively comparing two independent measurement devices.

There appear to be two principal issues. The first (which is only implicit in the question) is in framing the problem: how should one determine whether a new method is "equivalent" to an approved one? The second concerns how to analyze data in which some samples may have been measured more than once.

Framing the question

The best (and perhaps obvious) solution to the stated problem is to evaluate the new method using samples with accurately known values obtained from comparable media (such as human plasma). (This is usually done by spiking actual samples with standard materials of known concentration.) Because this has not been done, let's assume it is either not possible or would not be acceptable to the regulators (for whatever reason). Thus, we are reduced to comparing two measurement methods, one of which is being used as a reference because it is believed to be accurate and reproducible (but without perfect precision).

In effect, the client will be requesting that the FDA allow the new method as a proxy or surrogate for the approved method. As such, their burden is to demonstrate that results from the new method will predict, with sufficient accuracy, what the approved method would have determined had it been applied. The subtle aspect of this is that we are not attempting to predict the true values themselves--we don't even know them. Thus, errors-in-variables regression might not be the most appropriate way to analyze these data.

The usual solution in such cases is "inverse regression" (as described, for instance, in Draper & Smith, Applied Regression Analysis (Second Edition), section 1.7). Briefly, this technique regresses the new method's results $Y$ against the approved method's results $X$, erects a suitable prediction interval, and then functionally inverts that interval to obtain ranges of $X$ for any given values of $Y$. If, for the intended range of $Y$ values, these ranges of $X$ are "sufficiently small," then $Y$ is an effective proxy for $X$. (In my experience this approach tends to be conservatively stringent: these intervals can be surprisingly large unless both measurements are highly accurate, precise, and linearly related.)
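A sketch of that inversion on synthetic data may help (my own illustration of the procedure, not taken from Draper & Smith): fit Y on X by ordinary least squares, form the 95% prediction band, and read off the set of X values whose band covers a given new-method reading.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in: approved-method values x, new-method values y.
rng = np.random.default_rng(3)
x = rng.uniform(10, 100, 45)
y = x + rng.normal(0, 2, 45)

# Step 1: ordinary regression of the new method (Y) on the approved one (X).
n = len(x)
b, a = np.polyfit(x, y, 1)
s = np.sqrt(((y - (a + b * x)) ** 2).sum() / (n - 2))   # residual std error
t = stats.t.ppf(0.975, n - 2)

# Step 2: 95% prediction band for a single future Y over a grid of x values.
grid = np.linspace(0, 120, 2401)
se = s * np.sqrt(1 + 1 / n + (grid - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum())
lower, upper = a + b * grid - t * se, a + b * grid + t * se

# Step 3: invert the band -- for an observed new-method reading y0, the
# plausible approved-method values are the x whose band covers y0.
y0 = 50.0
covered = grid[(lower <= y0) & (y0 <= upper)]
x_lo, x_hi = covered.min(), covered.max()
```

If the interval (x_lo, x_hi) is acceptably narrow across the intended measurement range, Y is an effective proxy for X.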

Addressing duplicate samples

The relevant concepts here are of sample support and components of variance. "Sample support" refers to the physical portion of a subject (a human being here) that is actually measured. After some portion of the subject is taken, it usually needs to be divided into subsamples suitable for the measurement process. We might be concerned about the possibility of variation between subsamples. In a liquid sample which is well-mixed, there is essentially no variation in the underlying quantity (such as a concentration of a chemical) throughout the sample, but in samples of solids or semisolids (which might include blood), such variation can be substantial. Considering that laboratories often need only microliters of a solution to perform a measurement, we have to be concerned about variation almost on a microscopic scale. This could be important.

The possibility of such variation within a physical sample indicates that the variation in measurement results should be partitioned into separate "components of variance." One component is the variance from within-sample variation, and others are contributions to variance from each independent step of the subsequent measurement process. (These steps may include the physical act of subsampling, further chemical and physical processing of the sample--such as adding stabilizers or centrifugation--, injection of the sample into the measuring instrument, variations within the instrument, variations between instruments, and other variations due to changes in who operates the instrument, possible ambient contamination in the laboratories, and more. I hope this makes it clear that in order to do a really good job of answering this question, the statistician needs a thorough understanding of the entire sampling and analytical process. All I can do is provide some general guidance.)

These considerations apply to the question at hand because one "sample" that is measured at two different "sites" really is two physical samples obtained from the same person and then split among laboratories. The measurement by the approved method will use one piece of a split sample and the simultaneous measurement by the new method will use another piece of the split sample. By considering the components of variance these splits imply, we can settle the main issue of the question. It should now be clear that differences between these paired measurements should be attributed to two things: first, actual differences between the measurement procedures--this is what we are trying to assess--and second, differences due to any variation within the sample as well as variation caused by the physical processes of extracting the two subsamples to be measured. If physical reasoning about the sample homogeneity and the subsampling process can establish that the second form of variance is negligible, then indeed there is no "interference" as claimed by the reviewer. Otherwise, these components of variance may need explicitly to be modeled and estimated in the inverse regression analysis.
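To see how the split-sample components enter, consider a simple additive model: each method measures its own physical subsample, so the paired difference between methods carries two subsampling components on top of the two measurement components. A small simulation (all variance values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5000
sigma_within = 1.0   # hypothetical subsample-to-subsample variation
sigma_meas = 2.0     # hypothetical per-method measurement error

truth = rng.uniform(10, 100, n)
# Each method gets its own physical subsample of the split sample.
sub_ref = truth + rng.normal(0, sigma_within, n)
sub_new = truth + rng.normal(0, sigma_within, n)
y_ref = sub_ref + rng.normal(0, sigma_meas, n)   # approved method
y_new = sub_new + rng.normal(0, sigma_meas, n)   # new method

d = y_new - y_ref
# Var(d) = 2*sigma_within**2 + 2*sigma_meas**2 = 2*1 + 2*4 = 10, so the
# subsampling component inflates the apparent disagreement between methods.
```

When `sigma_within` is negligible relative to the measurement components, the paired differences reflect only the methods themselves, which is the condition under which there is no "interference" to worry about.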

whuber
  • Thank you for a very nice analysis suggesting the best way to address this problem. However, in my particular situation the client has chosen the Deming regression approach and is not looking for a different method. The FDA objection to the Deming regression appears to be due only to the interference, and their suggestion for getting around the problem is some type of bootstrap correction. I was only brought in because they do not know how to do the bootstrap. They have no statisticians involved and did not present a statistical analysis of the results as I gave in my report. – Michael R. Chernick Sep 26 '12 at 15:24
  • I do appreciate the constraints (and should have been explicit about that). In general, though, a good framework for resolving questions like this is to take an appropriate model as your point of departure. If you try to reason your way to a solution using an inappropriate approach and invalid model (to satisfy a client), you will only compound the errors and be unable to reach any clearly defensible solution. What you might consider now is how the Deming regression varies from inverse regression, as well as how Deming regression could be adapted to accommodate multiple variance components. – whuber Sep 26 '12 at 15:28
  • You may be motivated to demonstrate that the Deming regression, *as already applied,* is sufficiently close to what a more usual or appropriate method would produce: such a demonstration might be the best possible resolution in your situation. – whuber Sep 26 '12 at 15:29
  • Instead what they did was simply to describe the problem and how the data were collected and display the output of the Deming regression. Had a statistician been involved there may have been fewer statistical issues raised about the Deming regression. All that I can do for the client is provide a case for the analysis that was done (which included an explanation of why most of the regression could be analyzed without worry about interference from repeated sampling from a common source) and to provide the requested bootstrap adjustment for the residual variance in the pooled model. – Michael R. Chernick Sep 26 '12 at 15:30
  • I cannot at this point tell them to do inverse regression. If a measurement method is approved, I think it can be viewed as the reference, and the burden on the company is to show that the new method does essentially the same job as the reference. For this I think the Deming regression can be suitable and at least may be acceptable to the FDA. It probably would have been if the issue of repeated samples had not come up. That issue would not have arisen had they left out one of the repeated samples when they did the pooling. – Michael R. Chernick Sep 26 '12 at 15:34
  • The bootstrap that was requested is I think a waste of time. While we may not be able to separate components of variance with the current approach I do think that there is no bias in the estimate of residual variance that needs adjusting. Nevertheless the company wants me to do the bootstrap and they do not want to use my report that explains why bootstrapping is not necessary. I will still encourage them to at least do the pooled regression with "duplicates" left out. – Michael R. Chernick Sep 26 '12 at 15:39
  • The company expressed to me that there is just a small list of things that they need to do to satisfy the FDA and it is only the bootstrap that they need from me. Given that I am going to do the bootstrap, my question to the community is how do I implement the Deming regression? I do not have their software. I do have their data and SAS available to me. I probably also could do this in R if Deming regression is available in R. I have been trying to see if proc reg or proc glm provide options for Deming regression but I could not find such options. Maybe a SAS expert here can tell me. – Michael R. Chernick Sep 26 '12 at 15:44
  • If it can't be done in SAS without writing a separate macro, I would like to know if there is anything in R that can help me do this task. – Michael R. Chernick Sep 26 '12 at 15:45
  • http://cran.r-project.org/web/packages/mcr/mcr.pdf – whuber Sep 26 '12 at 15:46
  • Your answer and your comments have been very helpful. Also I will check out your link to the CRAN libraries to see if the software there will do what I want. – Michael R. Chernick Sep 26 '12 at 16:01
  • I am unable to use your link to the mcr.pdf file. – Michael R. Chernick Sep 26 '12 at 16:41
  • The link is valid--I just re-checked it. Try pasting it directly into a browser's address textbox. – whuber Sep 26 '12 at 17:33
  • Thanks, it seems to be just a matter of my computer taking too long doing the search. I tried the link again and it worked. This is a 67-page document on mcr, a package that does errors-in-variables regression in R. – Michael R. Chernick Sep 26 '12 at 17:39
  • I recall Berkson writing about "the two regressions". One is what you get by regressing y on x and the other x on y. In errors-in-variables regression the slope of the fitted line lies somewhere between the slopes of these two regressions. I also thought that inverse regression is the same as swapping the roles of y and x in the regression and that it is the same as the method called "calibration." Is that not true? Also please see my edited question, and if you can address my additional questions I would appreciate that. – Michael R. Chernick Sep 28 '12 at 10:07
  • Inverse regression is not the same as swapping $x$ and $y$: notice that in my description of it, $y$ is still regressed on $x$. The SO site is great for getting help with the mechanics of `R` and this site now provides thousands of examples of `R` in the context of statistical questions: consider using both as resources to speed your learning curve. – whuber Sep 28 '12 at 14:55
  • In the five months that I have been using SE I have mostly answered questions and enjoyed doing so. This question is the first time I really needed help, and I am very pleased with the responses thus far. It gives me a different viewpoint from which to see the value of the site. – Michael R. Chernick Sep 28 '12 at 15:14
  • Regarding inverse regression I was more interested in whether the terminology is sometimes used differently. Even though what I describe is different from what you call inverse regression I have heard the term calibration used for what I mentioned and I think some people refer to it as inverse regression. Also is my argument about bounding the slopes for the Deming regression correct and why do you feel that decomposing error variances with respect to the measurement error is important when comparing a new method to a standard? – Michael R. Chernick Sep 28 '12 at 15:23
  • let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/5958/discussion-between-whuber-and-michael-chernick) – whuber Sep 28 '12 at 16:24