How can using standardized residuals as an outcome be a valid approach? And how do results differ from doing one regression only?

Question

The residual approach takes the standardized residuals SR from a regression of Y on X1 and then uses them as the outcome in a second regression on X2. A literature review is available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8448552/ ; a short description is given here: (pdf), in the subsection "How Can We Identify Children Who Are Resilient to the Harmful Effects of SES Deprivation?". The method is also discussed here: https://www.tandfonline.com/doi/epub/10.1080/21642850.2019.1593845?needAccess=true on pages 5-7 (94-96 of the journal), in the subsections "Residuals" and "Strengths & limitations".

1. How can this approach be valid? Not only does it assume that the first-step regression model is the correct one, but it also ignores that model's estimation error by treating SR as an observed variable.
2. What is the practical difference from performing a multiple regression of Y on X1 and X2? Wouldn't a positive (negative) coefficient in the second-step regression (SR on X2) always go together with a positive (negative) coefficient of X2 in the multiple regression? And, if the issue is standardization, wouldn't it be equivalent to standardize the coefficient?
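To make the comparison in point 2 concrete, here is a minimal NumPy sketch on simulated data. Everything in it is an assumption for illustration: the data-generating values, the helper `ols`, and in particular the reading of "standardized" as division of all residuals by one constant (their sample standard deviation), which the cited papers do not confirm. It compares (a) one multiple regression, (b) the residual approach with raw X2, (c) the sequential version in which X2 is also residualized on X1, and (d) the residual approach after standardizing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data; all numeric values are illustrative assumptions.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)  # X2 is correlated with X1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)


def ols(design, target):
    """Least-squares coefficients of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


ones = np.ones(n)
Z = np.column_stack([ones, x1])  # first-step design: intercept and X1

# (a) One multiple regression of Y on X1 and X2; keep the X2 slope.
b_multi = ols(np.column_stack([ones, x1, x2]), y)[2]

# (b) Residual approach: residuals of Y on X1, regressed on raw X2.
res_y = y - Z @ ols(Z, y)
b_two_step = ols(np.column_stack([ones, x2]), res_y)[1]

# (c) Sequential version: X2 is residualized on X1 as well.
res_x2 = x2 - Z @ ols(Z, x2)
b_fwl = ols(res_x2[:, None], res_y)[0]

# Share of the variance of X2 that X1 does not explain (1 - R^2).
shrink = res_x2.var() / x2.var()

# (d) Dividing the residuals by one constant only rescales the slope.
sd = res_y.std(ddof=1)
b_two_step_std = ols(np.column_stack([ones, x2]), res_y / sd)[1]

print(f"(a) multiple regression slope on X2: {b_multi:.4f}")
print(f"(b) residual approach, raw X2:       {b_two_step:.4f}")
print(f"(c) both Y and X2 residualized:      {b_fwl:.4f}")
print(f"(a) times (1 - R^2 of X2 on X1):     {b_multi * shrink:.4f}")
print(f"(d) residual approach, standardized: {b_two_step_std:.4f}")
```

In this sketch (c) reproduces (a), while (b) equals (a) multiplied by the share of X2's variance not explained by X1. Under these assumptions the sign therefore agrees with the multiple regression, but the magnitude shrinks whenever X1 and X2 are correlated. Standardizing by a constant, as in (d), changes only the scale. The sketch does not address the first-step estimation error raised in point 1.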

Your language is vague. In particular, by "regression of Y on X1 as an outcome," what are the roles of the two variables? If Y is the response, then your remarks about "ignores measurement error" in (1) are obscure and hard to understand. Would you have a reference describing this "residual approach" in more detail? — whuber, Dec 18 '21 at 23:33
I mean that the residuals from the regression of Y on X1 are not an observed variable but an estimate, and are thus prone to error. The specific strand of literature where I've found this is the one on resilience, where residuals from a regression of a psychological outcome on specific stressors become the outcome of a new regression. I haven't found a theoretical paper on this, but here: https://srcd.onlinelibrary.wiley.com/doi/epdf/10.1111/j.1467-8624.2004.00699.x in the subsection "How Can We Identify Children Who Are Resilient to the Harmful Effects of SES Deprivation?", a short description is given — Federico Tedeschi, Dec 19 '21 at 06:45
I don't follow your reasoning, because *by assumption* the original responses y are "prone to error." Are you perhaps asking about the sequential formulation of multiple regression as discussed at https://stats.stackexchange.com/questions/46185, https://stats.stackexchange.com/questions/17336, https://stats.stackexchange.com/questions/352130, and elsewhere? — whuber, Dec 19 '21 at 15:30
Yes, I remember reading that the multiple regression result for the coefficient of a specific regressor X is the same as the one you get by regressing Y on the other regressors and then regressing the residuals on X. That's why I wondered what the advantage was in adopting a two-step approach (I don't see the advantage of standardizing the residuals compared with standardizing the beta coefficient afterwards). The error I'm talking about is the one in the estimate of the beta coefficient in the first step: the residuals are not observed, but estimated. Moreover, they are constrained to have zero mean. — Federico Tedeschi, Dec 19 '21 at 15:46
I believe the links I provided respond to all the issues except one: there are many ways to standardize the residuals. If the standardization depends on $x,$ then the procedure you describe is unusual and difficult to justify mathematically. If the standardization scales all the residuals by a constant value, then the procedure will produce the same estimates as any other multiple regression least squares algorithm. Please, then, tell us what form of standardization you have in mind. — whuber, Dec 19 '21 at 16:38
I've come across many papers, but it seems to me they don't specify what they mean by "standardized". Here I found a literature review that doesn't even talk about standardization of residuals: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8448552/ Actually, I took for granted that the standardization was just a rescaling to obtain zero mean and unit variance. In any case, if the goal is not to measure resilience per se but its association with a predictor, I still wonder why this method has to be used when a multiple regression would give the same result — Federico Tedeschi, Dec 19 '21 at 21:52
I believe the answers are to be found in our threads about what it means to "control for" covariates. Nevertheless, your question looks sufficiently different in its aim to be genuinely new--but please edit your post to emphasize the points in your latest comment. — whuber, Dec 20 '21 at 03:55
Thank you. I've edited the question. I don't know if it is in line with what you asked. — Federico Tedeschi, Dec 21 '21 at 08:37
Unfortunately, the key reference (your second) is behind a paywall. Few will be interested in obtaining it just to understand what you are trying to ask. It is incumbent on you to describe this method within your post. — whuber, Dec 21 '21 at 14:30
I didn't know; I changed the link. The paper is available also through JSTOR: https://www.jstor.org/stable/pdf/3696586.pdf?refreqid=excelsior%3A8c02ecbad7c1b7c1f21ec3af998879fa and academia.edu: https://www.academia.edu/21284850/Genetic_and_Environmental_Processes_in_Young_Childrens_Resilience_and_Vulnerability_to_Socioeconomic_Deprivation — Federico Tedeschi, Dec 21 '21 at 16:57
Thank you! Unfortunately, the paper turns out to be useless because it is too vague. Besides not explaining what form of "standardizing" was used, the authors then immediately *alter* those residuals: "Residual scores were recoded." That could mean literally *anything.* Cynical translation: "we made stuff up and we're not going to tell you how." — whuber, Dec 21 '21 at 18:28
I will try another web search during the winter break, to see if a detailed explanation of the procedure is available somewhere. As for transforming the residuals: I saw a paper taking the inverse of the standardized residuals. In any case, what I personally find frustrating is that I was expecting to find a justification of the residual approach as something that overcomes issues with multiple regression, and so far I haven't. — Federico Tedeschi, Dec 21 '21 at 20:57
I have added: "Here: https://www.tandfonline.com/doi/epub/10.1080/21642850.2019.1593845?needAccess=true the method is discussed at pages 5-7 (94-96 of the journal), in Subsections "Residuals" and the following "Strengths & limitations"" — Federico Tedeschi, Jan 11 '22 at 12:55

0 Answers