I know that when we have an experiment that involves a normal distribution, regression to the mean kind of just falls out as a necessary result. But even though this is touted as a law of statistics, does it really hold for every distribution?

jrex

1 Answer

Regression to the mean doesn't really have anything to do with the normal distribution; it has to do with imperfectly correlated variables.
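
One way to see why normality is not needed, sketched here under the assumption that both variables have finite variances: the best linear predictor of $Y$ from $X$ (in the least-squares sense) is

$$\hat{Y} = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X), \qquad\text{so}\qquad \frac{\hat{Y} - \mu_Y}{\sigma_Y} = \rho\,\frac{x - \mu_X}{\sigma_X}.$$

Since $|\rho| < 1$ for imperfectly correlated variables, the predicted value is, in standard-deviation units, closer to its mean than $x$ is to its own. The symbols $\mu$, $\sigma$, and $\rho$ (means, standard deviations, and correlation) are notation introduced here for illustration, not taken from the question.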

The classic example is that the very tallest parents will have children who, while above average, are not as tall as the parents. Height is normally distributed (or close enough) but that's not the real issue.

The very highest income parents will have kids who are above average in income, but not the very top. Income is not normally distributed.

The parents with the most friends will have kids who have more friends than average, but not as many as their parents. Number of friends is a count variable and far from normal.

Whenever you have two variables that are imperfectly correlated, the cases that are most extreme on one will tend, on average, to be less extreme on the other.
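
A small simulation can illustrate this with clearly non-normal data. It is only a sketch under made-up assumptions: `simulate` and `top_group_means` are hypothetical helper names, and the "parent" and "child" values are each built as a shared exponential component plus independent exponential noise, which makes them right-skewed and imperfectly correlated. It is not a model of real heights, incomes, or friend counts.

```python
import random
import statistics


def simulate(n=100_000, seed=0):
    """Draw n (parent, child) pairs that are skewed and imperfectly correlated.

    Assumption for illustration: each value is a shared exponential
    component plus its own independent exponential noise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        shared = rng.expovariate(1.0)           # component common to both
        parent = shared + rng.expovariate(1.0)  # parent-specific noise
        child = shared + rng.expovariate(1.0)   # child-specific noise
        pairs.append((parent, child))
    return pairs


def top_group_means(pairs, top_fraction=0.01):
    """Return (mean parent in top group, mean child in top group, overall child mean).

    The top group is the top_fraction of pairs ranked by the parent value.
    """
    ranked = sorted(pairs, key=lambda pc: pc[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top = ranked[:k]
    top_parent = statistics.fmean(p for p, _ in top)
    top_child = statistics.fmean(c for _, c in top)
    overall_child = statistics.fmean(c for _, c in pairs)
    return top_parent, top_child, overall_child


if __name__ == "__main__":
    top_parent, top_child, overall_child = top_group_means(simulate())
    print(f"mean parent value, top 1% of parents: {top_parent:.2f}")
    print(f"mean child value for those parents:   {top_child:.2f}")
    print(f"mean child value overall:             {overall_child:.2f}")
```

In runs of this sketch, the children of the top 1% of parents should average well above the overall child mean but noticeably below their parents' average, even though neither variable is normal.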

Peter Flom
  • (+1) In other words, the mean is not always the best measure of location. – Carl Mar 26 '18 at 06:39
  • @Carl That remark is incorrect. In the standard regression context it is provable, from the assumptions and the givens (such as the loss function) that the conditional mean indeed is *the* unique best measure of location. – whuber Mar 26 '18 at 14:07
  • @whuber ...for assumptions resulting in loss of generality. Counterexamples: (1) Cauchy-distributed residuals. (2) If we wish to determine the path of a ridge line from random $x, y, z$ measurements, we would regress for the mode, not the mean. – Carl Mar 26 '18 at 19:09
  • @Carl Neither of those is considered part of the standard regression context. – whuber Mar 26 '18 at 19:18
  • @whuber Q: "...does it really hold for every distribution?" A: No, example Cauchy. – Carl Mar 26 '18 at 19:21
  • @whuber We do R.V. transformation to improve regression results. Outliers, clumps of data at extreme values, etc. are problematic. https://stats.stackexchange.com/a/209000/99274 – Carl Mar 26 '18 at 19:55
  • @Carl You have wandered far from the subject. I see no point in further comments here. – whuber Mar 26 '18 at 20:11
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/75122/discussion-between-carl-and-whuber). – Carl Mar 27 '18 at 00:01
  • How about heteroscedastic time series? – jrex Mar 27 '18 at 20:15