
I'm trying to reproduce a result from a book (see bottom), and it doesn't work. I would like to do some further reading about this method, but the author doesn't name it; he only gives a formula.

I've already tried identifying the method using Wikipedia with no success.

This is the linear model: $Z = cX + dY$

He provides an equation for $c$:

$$ c = \frac{{\rm corr}(X,Z) - {\rm corr}(Y,Z){\rm corr}(X,Y)}{1-{\rm corr}(X,Y)^2} $$

The weight $d$ is calculated analogously. He then writes that $c$ and $d$ can be used to calculate the squared error. Through trial and error, I figured out that the correlation coefficient ${\rm corr}()$ is very likely Spearman's $\rho$ (at least, that's the method he has used so far to calculate correlation coefficients). Additionally, he mentions that the means of $X$ and $Y$ are assumed to vanish, i.e., to be zero.
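Since the question hinges on which correlation coefficient gets plugged into the formula, here is a minimal Python sketch for trying both. It assumes that the formula for $d$ is the one for $c$ with $X$ and $Y$ swapped (my reading of "calculated analogously"), and the helper names (`pearson`, `ranks`, `spearman`, `weights`) are hypothetical, not taken from the book. Spearman's $\rho$ is computed in the usual way, as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def ranks(a):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    r = [0.0] * len(a)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with a[order[i]]
        while j + 1 < len(order) and a[order[j + 1]] == a[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def weights(x, y, z, corr=spearman):
    """c and d for Z = cX + dY from the question's formula.

    Assumption: d uses the same formula with X and Y swapped.
    """
    r_xz, r_yz, r_xy = corr(x, z), corr(y, z), corr(x, y)
    denom = 1 - r_xy ** 2
    c = (r_xz - r_yz * r_xy) / denom
    d = (r_yz - r_xz * r_xy) / denom
    return c, d
```

For example, `weights(x, y, z)` and `weights(x, y, z, corr=pearson)` give the two candidate pairs $(c, d)$ to compare against the book's numbers.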

I'm relatively new to linear regression, so at first I thought it was least squares, but this equation doesn't look like it to me.

Does anyone recognize this method and can give me a name, so I can read more about it?

This all comes from a popular-science book on football (soccer) statistics, written in German. The formula is in Appendix A7.3 on p. 297, and the example I'm trying to reproduce is on p. 140.

Gerome Bochmann
  • Please make the reference more precise than "a book". – Nick Cox Mar 16 '14 at 17:55
  • I didn't include it because it's a German book, but I'll put it in anyway. – Gerome Bochmann Mar 16 '14 at 17:58
  • In English-language discussions, becoming zero is sometimes described as vanishing. That, however, is the least of the puzzles here. – Nick Cox Mar 16 '14 at 18:03
  • Although I don't read German, I second @NickCox's suggestion. The reference (including page number) is needed at a minimum. In addition, an excerpt might be nice. On a different note, are you sure that the denominator isn't square-rooted (i.e., $\sqrt{1-{\rm corr}(X,Y)^2}$)? – gung - Reinstate Monica Mar 16 '14 at 18:03
  • @gung The denominator is not square-rooted. I will nevertheless try solving it with a square-rooted denominator. – Gerome Bochmann Mar 16 '14 at 18:16

2 Answers


The equation given for $c$ looks suspiciously like the equation for a semi-partial correlation¹:

$$ r_{Z(X|Y)} = \frac{{\rm corr}(X,Z) - {\rm corr}(Y,Z){\rm corr}(X,Y)}{\sqrt{1-{\rm corr}(X,Y)^2}}\ , $$

except that your denominator does not include the square root. That might be a typo². Accordingly, I wonder whether the author is describing the following structural equation model (SEM), with $Z$ caused by $X$ and $Y$, which are themselves correlated:

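[Figure: path diagram of the model, with $X$ and $Y$ correlated and each having a path to $Z$, labelled $c$ and $d$ respectively.]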

This is a rather low-powered use of SEM: you are analyzing a correlation matrix according to a specified underlying pattern and finding the path coefficients (i.e., $c$ and $d$) that optimally reproduce the observed pattern of correlations under the specified path model. Because you are working with the correlation matrix, the variables are effectively standardized, so they all have mean zero. The paths turn out to be the semi-partial correlations because you have specified that $X$ and $Y$ are correlated, but $Z$ is simply a function of $X$ and $Y$, their inter-correlation notwithstanding.
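To compare the semi-partial correlation with the question's formula numerically, here is a minimal Python sketch (hypothetical helper names, Pearson correlations throughout, made-up toy data). It computes $r_{Z(X|Y)}$ directly, by correlating $Z$ with the residuals of $X$ after a least-squares regression of $X$ on $Y$, and checks it against the closed form above. The question's $c$ equals this semi-partial correlation divided by a further $\sqrt{1-{\rm corr}(X,Y)^2}$, so the two coincide only when ${\rm corr}(X,Y) = 0$.

```python
import math


def mean(a):
    return sum(a) / len(a)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))


def residualize(x, y):
    """Residuals of x after a least-squares regression (with intercept) on y."""
    mx, my = mean(x), mean(y)
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((v - my) ** 2 for v in y))
    return [(u - mx) - b * (v - my) for u, v in zip(x, y)]


def semipartial(z, x, y):
    """r_{Z(X|Y)}: correlation of Z with the part of X not linearly explained by Y."""
    return pearson(z, residualize(x, y))


def semipartial_formula(z, x, y):
    """Closed form with the square root in the denominator."""
    r_xz, r_yz, r_xy = pearson(x, z), pearson(y, z), pearson(x, y)
    return (r_xz - r_yz * r_xy) / math.sqrt(1 - r_xy ** 2)


x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
z = [3, 1, 4, 1, 5, 9]
print(semipartial(z, x, y), semipartial_formula(z, x, y))  # equal up to rounding
```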

1. To learn more about semi-partial correlations, see this website or my answer here: What's the order of correlation?
2. If it's not a typo, I have no idea what this might be.

gung - Reinstate Monica
  • Now that I read this, I see that the appendix before the one on regression describes how to calculate a partial correlation coefficient. I will try this out; it looks like a hot lead. – Gerome Bochmann Mar 16 '14 at 19:57
  • The equation I list is the *semi-partial* correlation. If it were a *partial* correlation, there would be a $\sqrt{1-{\rm corr}(Y,Z)^2}$ included in the denominator as well. That would be an even bigger typo. To understand these, try reading the linked resources. – gung - Reinstate Monica Mar 16 '14 at 20:04
  • +1 A single upvote seems inadequate; I think this answer is what SE was made for. – Glen_b Mar 16 '14 at 22:37

The formula cited is the formula for the beta coefficient, i.e., the standardized regression coefficient. The book is therefore giving the formula for the standardized versions of $c$ and $d$ in the stated regression equation.
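One way to see this is a short derivation, assuming $X$, $Y$, and $Z$ are standardized to mean 0 and variance 1, and writing the least-squares fit as $Z = cX + dY + \varepsilon$, where least squares makes the residual $\varepsilon$ uncorrelated with $X$ and $Y$. Multiplying through by $X$ and by $Y$ and averaging gives the normal equations

$$ \begin{aligned} {\rm corr}(X,Z) &= c + d\,{\rm corr}(X,Y), \\ {\rm corr}(Y,Z) &= c\,{\rm corr}(X,Y) + d. \end{aligned} $$

Subtracting ${\rm corr}(X,Y)$ times the second equation from the first eliminates $d$ and leaves ${\rm corr}(X,Z) - {\rm corr}(X,Y){\rm corr}(Y,Z) = c\,(1-{\rm corr}(X,Y)^2)$, so

$$ c = \frac{{\rm corr}(X,Z) - {\rm corr}(Y,Z){\rm corr}(X,Y)}{1-{\rm corr}(X,Y)^2}\ , $$

and swapping the roles of $X$ and $Y$ gives $d$. No square root appears in the denominator.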

Any statistics program should be able to provide standardized regression coefficients. However, unstandardized coefficients are often reported by default, and you need to select an option to get the standardized values. Perhaps this is why you could not reproduce the results.
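To illustrate the difference between the two kinds of output, here is a minimal Python sketch (helper names hypothetical, Pearson correlation assumed, toy data made up). It fits $Z$ on $X$ and $Y$ by ordinary least squares, rescales the slopes to beta weights via $\beta = b \cdot {\rm sd}(\text{predictor}) / {\rm sd}(Z)$, and compares them with the formula from the question.

```python
import math


def center(a):
    """Subtract the mean, so the intercept drops out of the regression."""
    m = sum(a) / len(a)
    return [v - m for v in a]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def ols_slopes(x, y, z):
    """Unstandardized least-squares slopes (b_x, b_y) for z = a + b_x*x + b_y*y."""
    x, y, z = center(x), center(y), center(z)
    sxx, syy, sxy = dot(x, x), dot(y, y), dot(x, y)
    sxz, syz = dot(x, z), dot(y, z)
    det = sxx * syy - sxy ** 2  # zero only if x and y are perfectly collinear
    b_x = (sxz * syy - syz * sxy) / det
    b_y = (syz * sxx - sxz * sxy) / det
    return b_x, b_y


def root_ss(a):
    """Square root of the centred sum of squares (proportional to the SD)."""
    c = center(a)
    return math.sqrt(dot(c, c))


def beta_weights(x, y, z):
    """Standardized slopes: beta = b * sd(predictor) / sd(response)."""
    b_x, b_y = ols_slopes(x, y, z)
    return b_x * root_ss(x) / root_ss(z), b_y * root_ss(y) / root_ss(z)


def book_weights(x, y, z):
    """c and d from the question's correlation formula (Pearson version)."""
    def r(a, b):
        return dot(center(a), center(b)) / (root_ss(a) * root_ss(b))
    r_xz, r_yz, r_xy = r(x, z), r(y, z), r(x, y)
    denom = 1 - r_xy ** 2
    return (r_xz - r_yz * r_xy) / denom, (r_yz - r_xz * r_xy) / denom


x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
z = [3, 1, 4, 1, 5, 9]
print(ols_slopes(x, y, z))    # unstandardized: depends on the units of x, y, z
print(beta_weights(x, y, z))  # standardized
print(book_weights(x, y, z))  # matches beta_weights up to rounding
```

If the book really uses Spearman's $\rho$, the same comparison applies with $X$, $Y$, and $Z$ replaced by their ranks, since Spearman's $\rho$ is the Pearson correlation of the ranks.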