
I have a question (reproduced below) from an exam. It seems to be presumed that the greater the (product moment) correlation coefficient, the "more appropriate" and the "better" the model. Is such a presumption valid?

In an experiment the following information was gathered about air pressure $P$, measured in inches of mercury, at different heights above sea-level $h$, measured in feet.

h: 2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000.
P: 27.8, 24.9, 20.6,  16.9,  13.8,  11.1,  8.89,  7.04,  5.52,  4.28.

(i) Find the product moment correlation coefficient between (a) $h$ and $P$, (b) $\ln h$ and $P$, (c) $\sqrt{h}$ and $P$.

(Answers: (a) -0.9807, (b) -0.9748, (c) -0.9986.)

(ii) Using the most appropriate case from part (i), find the equation which best models air pressure at different heights.

(The answer is that we're supposed to use (c).)
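For anyone who wants to check the numbers, here is a minimal sketch, assuming NumPy is available, that computes the three coefficients from part (i) and fits the least-squares line for case (c). The helper name `pearson_r` and the variable names are my own, not part of the exam.

```python
import numpy as np

# Data from the exam question: height in feet, pressure in inches of mercury.
h = np.array([2000, 5000, 10000, 15000, 20000,
              25000, 30000, 35000, 40000, 45000], dtype=float)
P = np.array([27.8, 24.9, 20.6, 16.9, 13.8,
              11.1, 8.89, 7.04, 5.52, 4.28])


def pearson_r(x, y):
    """Product moment correlation coefficient between x and y."""
    return float(np.corrcoef(x, y)[0, 1])


# Part (i): correlation of P with h, ln h and sqrt(h).
r_h = pearson_r(h, P)
r_ln = pearson_r(np.log(h), P)
r_sqrt = pearson_r(np.sqrt(h), P)

# Part (ii), case (c): least-squares line P = intercept + slope * sqrt(h).
slope, intercept = np.polyfit(np.sqrt(h), P, 1)

print(f"(a) r(h, P)       = {r_h:.4f}")
print(f"(b) r(ln h, P)    = {r_ln:.4f}")
print(f"(c) r(sqrt(h), P) = {r_sqrt:.4f}")
print(f"P = {intercept:.4f} + ({slope:.4f}) * sqrt(h)")
```

The printed coefficients should agree with the exam's stated answers to four decimal places. Whether the largest of them picks out the "best" model is exactly what I am asking.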

kjetil b halvorsen
  • On this information the most appropriate model physically is missing from the choices given to you! – Nick Cox Jun 05 '16 at 07:31
  • @NickCox: Sorry, but I have no idea what that might be. Care to explain? –  Jun 05 '16 at 07:39
  • Pressure declining exponentially with height. https://en.wikipedia.org/wiki/Barometric_formula – Nick Cox Jun 05 '16 at 08:18
  • This exam appears to be teaching that maximizing the size of the correlation coefficient is an appropriate way to choose a nonlinear transformation of the variables. If you have any choice in the matter, run away from this exam and all other materials from the same source as fast as you can. Find another way to learn the subject. – whuber Jun 05 '16 at 15:46
  • @whuber: If you could expand on your comment and explain why exactly I should run away from this exam, I'd appreciate it. Thanks! –  Jun 06 '16 at 03:03
  • My full response to that is at http://www.quantdec.com/misc/MAT8406/Meeting07/Diagnostic_Plots.pdf. Please see any reliable book on exploratory data analysis, such as Tukey's original *EDA*. For intuition concerning why relying on the correlation coefficient is such a bad idea, consider (1) [how limited its square, $R^2$, is for assessing linearity](http://stats.stackexchange.com/a/13317/919) and (2) the fact that two variables, $h$ and $P$, are involved but the correlation coefficient treats them symmetrically: how, then, are you to determine which one to transform? – whuber Jun 06 '16 at 13:33
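The two points raised in the comments can be explored with a short sketch, again assuming NumPy. It checks that the correlation coefficient is symmetric in $h$ and $P$, fits the exponential model $P = P_0 e^{bh}$ by regressing $\ln P$ on $h$ (a simple log-linear fit, not the full barometric formula), and prints the residuals of that fit next to those of the $\sqrt{h}$ fit so their patterns can be inspected. The names `b`, `P0`, `resid_exp` and `resid_sqrt` are illustrative choices of mine.

```python
import numpy as np

# Data from the exam question: height in feet, pressure in inches of mercury.
h = np.array([2000, 5000, 10000, 15000, 20000,
              25000, 30000, 35000, 40000, 45000], dtype=float)
P = np.array([27.8, 24.9, 20.6, 16.9, 13.8,
              11.1, 8.89, 7.04, 5.52, 4.28])

# The correlation coefficient does not distinguish response from predictor.
r_hP = float(np.corrcoef(h, P)[0, 1])
r_Ph = float(np.corrcoef(P, h)[0, 1])

# Transform P instead of h: exponential decline means ln P is linear in h.
r_exp = float(np.corrcoef(h, np.log(P))[0, 1])
b, ln_P0 = np.polyfit(h, np.log(P), 1)
P0 = float(np.exp(ln_P0))

# Residuals on the original pressure scale for the exponential fit.
resid_exp = P - P0 * np.exp(b * h)

# Residuals for the exam's preferred model, P linear in sqrt(h).
slope, intercept = np.polyfit(np.sqrt(h), P, 1)
resid_sqrt = P - (intercept + slope * np.sqrt(h))

print(f"r(h, P) = {r_hP:.4f}, r(P, h) = {r_Ph:.4f}")
print(f"r(h, ln P) = {r_exp:.4f}")
print(f"Exponential fit: P = {P0:.3f} * exp({b:.3e} * h)")
print("Residuals, exponential model:", np.round(resid_exp, 3))
print("Residuals, sqrt(h) model:    ", np.round(resid_sqrt, 3))
```

Looking at how the residuals vary with height, rather than at a single summary number, is in the spirit of the diagnostic plots whuber links to.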
