In his book *Thinking, Fast and Slow*, Daniel Kahneman poses the following hypothetical question:
(P. 186) Julie is currently a senior in a state university. She read fluently when she was four years old. What is her grade point average (GPA)?
He uses it to illustrate how we often fail to account for regression to the mean when making predictions. In the subsequent discussion, he advises:
(P. 190) Recall that the correlation between two measures—in the present case reading age and GPA —is equal to the proportion of shared factors among their determinants. What is your best guess about that proportion? My most optimistic guess is about 30%. Assuming this estimate, we have all we need to produce an unbiased prediction. Here are the directions for how to get there in four simple steps:
- Start with an estimate of average GPA.
- Determine the GPA that matches your impression of the evidence.
- Estimate the correlation between reading precocity and GPA.
- If the correlation is .30, move 30% of the distance from the average to the matching GPA.
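Kahneman's four steps can be sketched as a one-line computation. This is a minimal illustration, not code from the book: the average GPA of 3.0 and the "matching" GPA of 3.8 are hypothetical values chosen only to make the arithmetic concrete, and `kahneman_prediction` is a hypothetical helper name.

```python
def kahneman_prediction(average_gpa: float, matching_gpa: float, correlation: float) -> float:
    """Move `correlation` of the distance from the average GPA toward the matching GPA."""
    return average_gpa + correlation * (matching_gpa - average_gpa)


# Hypothetical inputs: illustrative assumptions, not figures from the book.
average_gpa = 3.0   # step 1: estimate of the average GPA
matching_gpa = 3.8  # step 2: GPA that matches the impression of reading fluently at four
r = 0.30            # step 3: Kahneman's estimate of the correlation

prediction = kahneman_prediction(average_gpa, matching_gpa, r)  # step 4
print(f"{prediction:.2f}")  # prints 3.24
```

With these made-up numbers the prediction lands at 3.24, only 30% of the way from the average (3.0) toward the impression-based guess (3.8).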
My interpretation of his advice is as follows:
- Use "She read fluently when she was four years old" to establish a standard score for Julie's reading precocity.
- Determine the GPA that has the same standard score. (If the correlation between GPA and reading precocity were perfect, this would be the rational GPA to predict.)
- Estimate what percentage of the variation in GPA can be explained by variation in reading precocity. (I assume that by "correlation" he means the coefficient of determination here?)
- Because only 30% of the standard score of Julie's reading precocity can be explained by factors that also explain the standard score of her GPA, we are justified only in predicting that the standard score of her GPA will be 30% of what it would be under perfect correlation.
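The standard-score reading of the steps can be sketched the same way. This is a minimal sketch under stated assumptions: the GPA distribution (mean 3.0, standard deviation 0.4) and Julie's reading-precocity standard score of 2.0 are hypothetical, and the code plugs in 0.30 directly as the multiplier, exactly as Kahneman's step 4 does, without settling whether that number should be a correlation or a coefficient of determination. Because converting to and from standard scores is a linear transformation, scaling the standard score by 0.30 gives the same GPA as moving 30% of the raw distance from the mean.

```python
def to_z(x: float, mean: float, sd: float) -> float:
    """Convert a raw value to a standard score."""
    return (x - mean) / sd


def from_z(z: float, mean: float, sd: float) -> float:
    """Convert a standard score back to a raw value."""
    return mean + z * sd


# Hypothetical values: assumptions for illustration only.
gpa_mean, gpa_sd = 3.0, 0.4
z_reading = 2.0  # assumed standard score for "read fluently at four"
r = 0.30         # Kahneman's multiplier

# Step 2: the matching GPA has the same standard score as the reading evidence.
matching_gpa = from_z(z_reading, gpa_mean, gpa_sd)

# Step 4 in standard-score form: shrink the standard score by the factor r.
predicted_z = r * z_reading
predicted_gpa = from_z(predicted_z, gpa_mean, gpa_sd)

# Same result as moving r of the raw distance from the mean toward the matching GPA.
raw_version = gpa_mean + r * (matching_gpa - gpa_mean)
assert abs(predicted_gpa - raw_version) < 1e-12

print(f"matching GPA {matching_gpa:.2f}, predicted GPA {predicted_gpa:.2f}")
# prints: matching GPA 3.80, predicted GPA 3.24
```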
Is my interpretation of Kahneman's procedure correct? If so, is there a more formal mathematical justification for it, especially for step 4? More generally, how does the correlation between two variables relate to changes in (or differences between) their standard scores?