X and Y are continuous variables.

I have found a relationship between X and Y.

Now I want to know if that relationship is characteristic of all of X. Is it possible to somehow examine that relationship at different points along X?

In actuality, X was entered on the final step of a hierarchical regression. If it is possible to check this while accounting for all of the other variables in the model, that would be even better.

Edit: I suppose that plotting the residuals is one way to go about this. But, is there a good way to test whether the residuals are randomly distributed?

Dave

2 Answers

  1. If you want to know whether the relationship between X and Y depends on the value of X, you are describing a nonlinear relationship, and a polynomial term is one way to model it. You could think of a quadratic equation (Y ~ X + X^2; in an R formula the squared term has to be written as `I(X^2)`) as X interacting with itself. So you might try that if you want to see whether the same linear trend holds across the whole range of X. If you want to take an exploratory look at this, you could try a loess (local regression) curve.

  2. As for your edit, you could look at the residuals descriptively with a simple Q-Q plot, which compares them against a normal distribution. If you want a formal test of normality, you could try the Shapiro-Wilk and/or Kolmogorov-Smirnov tests.
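
To make the quadratic idea in point 1 concrete, here is a minimal sketch in plain Python (standard library only) that compares a straight-line fit with a fit that adds an X^2 term, using the usual F statistic for nested models. The data are synthetic and the helper names (`fit_polynomial`, `quadratic_f_statistic`) are made up for illustration. In a hierarchical regression the same comparison would be run with the other predictors kept in both models, which this sketch does not do.

```python
import random

def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations."""
    k = degree + 1
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)]
         + [sum(xi ** i * yi for xi, yi in zip(x, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def rss(x, y, coefs):
    """Residual sum of squares for a fitted polynomial."""
    return sum((yi - sum(c * xi ** p for p, c in enumerate(coefs))) ** 2
               for xi, yi in zip(x, y))

def quadratic_f_statistic(x, y):
    """F statistic (1 and n - 3 df) for adding X^2 to a straight-line fit."""
    rss_linear = rss(x, y, fit_polynomial(x, y, 1))
    rss_quadratic = rss(x, y, fit_polynomial(x, y, 2))
    return (rss_linear - rss_quadratic) / (rss_quadratic / (len(x) - 3))

random.seed(1839)  # arbitrary seed so the synthetic example is reproducible
x = [random.uniform(0, 10) for _ in range(200)]
y_linear = [2 * xi + random.gauss(0, 1) for xi in x]                  # same trend everywhere
y_curved = [2 * xi + 0.5 * xi ** 2 + random.gauss(0, 1) for xi in x]  # trend changes with X

f_linear = quadratic_f_statistic(x, y_linear)
f_curved = quadratic_f_statistic(x, y_curved)
print(f"F for the X^2 term, linear data: {f_linear:.2f}")
print(f"F for the X^2 term, curved data: {f_curved:.2f}")
```

A large F relative to the F(1, n - 3) reference distribution suggests the slope is not constant along X; a small one is consistent with a single linear trend, though it does not prove it.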

Mark White
  • Thanks for the reply! I will try the loess curve; that's a great idea. As for testing normality, correct me if I'm wrong, but wouldn't I actually want to show that the residuals are non-normally distributed (in particular, that they are uniformly distributed) along the range of X? – Dave May 12 '17 at 17:17
  • If you are expecting Y to be predicted differently by X at different values of X, then yes: You are expecting non-normal residuals. You can also look for heteroskedasticity. Here's some R code that demonstrates this: `set.seed(1839)` `x – Mark White May 12 '17 at 22:29
  • I'm actually hoping to show that it is predicted equally at different values of X. Is that also possible? – Dave May 15 '17 at 20:35
  • That question would be answered by the `plot(lm(y~x)$fitted, lm(y~x)$res)` code. That plot checks for homoscedasticity: a roughly even band of residuals across the fitted values suggests Y is predicted about equally well along X. See also: https://stats.stackexchange.com/questions/76226/interpreting-the-residuals-vs-fitted-values-plot-for-verifying-the-assumptions, https://onlinecourses.science.psu.edu/stat501/node/36, http://reocities.com/Heartland/4205/SPSS/HeteroscedasticityTestingAndCorrectingInSPSS1.pdf – Mark White May 15 '17 at 21:39
  • Perfect, I think this is exactly what I'm after, thanks! – Dave May 15 '17 at 21:50
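
As a numeric companion to the residuals-versus-fitted plot discussed in these comments, here is a rough sketch in plain Python of a Goldfeld-Quandt style check. It sorts the data by X, drops the middle third, fits a separate line to the lower and upper thirds, and takes the ratio of the two residual variances. The technique is swapped in for illustration and is not something the commenters proposed; the data are synthetic and the function names are hypothetical.

```python
import random

def residual_variance(x, y):
    """Residual variance (RSS / (n - 2)) of a straight-line least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (n - 2)

def goldfeld_quandt_ratio(x, y):
    """Ratio of residual variances: upper third of X over lower third of X."""
    pairs = sorted(zip(x, y))   # order observations by X
    third = len(pairs) // 3     # the middle third is dropped
    low_x, low_y = zip(*pairs[:third])
    high_x, high_y = zip(*pairs[-third:])
    return residual_variance(high_x, high_y) / residual_variance(low_x, low_y)

random.seed(1839)  # arbitrary seed so the synthetic example is reproducible
x = [random.uniform(0, 10) for _ in range(300)]
# Even scatter: Y is predicted equally well along X.
y_even = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
# Fanning scatter: the noise grows with X.
y_fanning = [1 + 2 * xi + random.gauss(0, 0.2 + 0.3 * xi) for xi in x]

ratio_even = goldfeld_quandt_ratio(x, y_even)
ratio_fanning = goldfeld_quandt_ratio(x, y_fanning)
print(f"variance ratio, even scatter:    {ratio_even:.2f}")
print(f"variance ratio, fanning scatter: {ratio_fanning:.2f}")
```

A ratio near 1 is consistent with equal predictive precision at low and high X. Under homoscedastic normal errors the ratio follows an F distribution with (n_high - 2, n_low - 2) degrees of freedom, so it can be compared with an F critical value for a formal test.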
Plotting the 95% confidence intervals would be my approach. This can show regions of the fitted relationship where confidence in the modeling results is strong and where it is weak. You can see visually how these intervals give some insight into the relative strength or weakness of fitted relationships here:

http://commonproblems.readthedocs.io/en/latest/
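
For a straight-line fit, the width of that interval at any point can also be computed directly. The sketch below, in plain Python, is only an illustration under stated assumptions: synthetic data with a negatively skewed X, a simple one-predictor model, and a normal quantile used in place of the t quantile (reasonable only for large samples). The function name `confidence_half_width` is made up for this example.

```python
import math
import random
from statistics import NormalDist

def confidence_half_width(x, y, x0, level=0.95):
    """Approximate half-width of the confidence interval for the fitted mean at x0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(rss / (n - 2))
    # Large-sample assumption: normal quantile in place of the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

random.seed(1839)  # arbitrary seed so the synthetic example is reproducible
# Negatively skewed X: most values sit just below 10, with a tail toward low values.
x = [10 - random.expovariate(1.0) for _ in range(300)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

mean_x = sum(x) / len(x)
width_at_mean = confidence_half_width(x, y, mean_x)
width_in_tail = confidence_half_width(x, y, min(x))
print(f"half-width at the mean of X:  {width_at_mean:.3f}")
print(f"half-width at the smallest X: {width_in_tail:.3f}")
```

In this model the band is narrowest at the mean of X and widens with distance from it, so the sparse tail of a skewed X gets the widest interval.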

James Phillips
  • Great idea, I will try this, thanks! One thing though: X is non-normally distributed (negatively skewed). I'm assuming this will lead to a smaller confidence interval in the range of X that contains the most observations? – Dave May 12 '17 at 17:17
  • Also, is there any kind of statistical test that I can perform on the resulting CIs? – Dave May 12 '17 at 17:26
  • There should generally be greater confidence (narrower intervals) in the data range with the most observations, as that region of the model is usually best characterized by the data. I do not know of a specific statistical test. – James Phillips May 13 '17 at 00:35