5

In regression, the beta value represents the increase in $y$ when $x$ changes by one unit. The standardized beta gives the same information, but in terms of standard deviations. But why can't $y$ increase by 2 SDs when $x$ increases by 1 SD, if the influence is strong enough?

Also, I assumed the correlation coefficient was just a measure of how much two variables vary together, independent of the slope. But in regression, the slope depends on the correlation coefficient. Are these two different pieces of information?

  • Welcome to the site! – kjetil b halvorsen May 15 '14 at 15:36
  • This result is the [Cauchy-Schwarz Inequality](http://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality#Statement_of_the_inequality) in disguise: the standardized $x$'s and standardized $y$'s can be viewed as unit vectors (in a space whose dimension equals the data count, but that doesn't matter because according to Euclid two vectors--which implicitly include their common origin--determine a plane or just a line). The standardized $\beta$ is the dot product of these vectors, which is the cosine of the angle between them. Cosines of angles always lie between $-1$ and $1$. – whuber May 15 '14 at 16:11
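
To see whuber's point numerically, here is a small R sketch (the data are simulated purely for illustration and are not part of the original thread): after standardizing, the correlation is the scaled dot product of the data vectors, which is the cosine of the angle between the centered $x$ and $y$ vectors, and so must lie in $[-1,\ 1]$.

set.seed(1)                              # hypothetical data, purely for illustration
x  = rnorm(20, mean=50, sd=7)
y  = 5 + 2*x + rnorm(20)
zx = (x - mean(x))/sd(x)                 # standardized x
zy = (y - mean(y))/sd(y)                 # standardized y
sum(zx*zy)/(length(x) - 1)               # the scaled dot product ...
sum(zx*zy)/sqrt(sum(zx^2)*sum(zy^2))     # ... the cosine of the angle ...
cor(x, y)                                # ... and the correlation all agree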

2 Answers

4

I define "standardized beta" as the slope of a regression line when all variables (all $X$'s and $Y$) have been standardized first. If you have a simple linear regression model (i.e., only one $X$ variable), the standardized beta is the same as Pearson's product-moment correlation, $r$1. As a result, the standardized beta is bound by the interval $[-1,\ 1]$, just like $r$ is. The reason is given by @whuber in his comment above. However, it might help to try to work through this more slowly.

Let's start by considering the formulas for the estimated slope of a regression line, $\hat\beta_1$ and for $r$:

$$ \hat\beta_1=\frac{\text{Cov}(x,y)}{\text{Var}(x)} \qquad\qquad r=\frac{\text{Cov}(x,y)}{\text{SD}(x)\text{SD}(y)} $$

Now, if both $x$ and $y$ have been standardized first (so that their means are $0$ and their SDs are $1$), then the denominator of $\hat\beta_1$, i.e. the variance of $x$, will be $1^2=1$, and the denominator of $r$, i.e. the SD of $x$ times the SD of $y$, will be $1\times 1=1$. They will be the same. And the numerators are identical no matter what. Thus, the standardized beta is the same as $r$, and has all the same properties (e.g., the same possible range).
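
To spell that out, writing $x_s$ and $y_s$ for the standardized variables:

$$ \hat\beta_1^{\text{std}}=\frac{\text{Cov}(x_s,y_s)}{\text{Var}(x_s)}=\frac{\text{Cov}(x_s,y_s)}{1}=\frac{\text{Cov}(x_s,y_s)}{\text{SD}(x_s)\,\text{SD}(y_s)}=r $$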

On an intuitive level, we can still ask why the standardized beta / $r$ can't go above $1$. It seems like it ought to be possible. Let's try to make an example. I'll use R. I don't know if you use R, but you can download it for free and run this example; I'll try to make it as self-explanatory as possible.

set.seed(4077)  # this makes the example exactly reproducible
  # here are the true parameters we'll use:  
N  = 30  # we will work with 30 data points
b0 = 5   # the true intercept will be 5
b1 = 0   # at first, the true slope is 0, i.e., no relationship
  # let's make our X data & some residuals:
resids = rnorm(N, mean=0, sd=1)
x      = rnorm(N, mean=50, sd=7)
  # now we can generate Y from X, our residuals & our parameters:
y      = b0 + b1*x + resids
  # let's get the means & SDs of X & Y, & their covariance:
mean(x)                   # 51.72901
sd(x)                     # 7.7859
mean(y)                   # 4.82287
sd(y)                     # 0.8541943
cov(x,y)                  # 1.71654
  # with these we can predict the estimated slope & correlation:
cov(x,y)/(sd(x)^2)        # 0.02831629
cov(x,y)/(sd(x)*sd(y))    # 0.2581003
  # let's check the estimated slope & correlation:
coef(lm(y~x))[2]          # 0.02831629
cor(x,y)                  # 0.2581003

  # what happens to the slope if we standardize both x & y first?
x.s = (x - mean(x))/sd(x)
y.s = (y - mean(y))/sd(y)
  # the slope now equals the correlation above:
coef(lm(y.s~x.s))[2]      # 0.2581003  

In that case the true value of b1 was $0$; now let's make it $1$:

b1 = 1
y1 = b0 + b1*x + resids
  # let's see what happened:
mean(y1)                  # 56.55189
sd(y1)                    # 8.048787
cov(x,y1)                 # 62.33678
  # calculating b1 & r:
cov(x,y1)/(sd(x)^2)       # 1.028316
cov(x,y1)/(sd(x)*sd(y1))  # 0.9947298
  # checking:
coef(lm(y1~x))[2]         # 1.028316
cor(x,y1)                 # 0.9947298
  # let's try the standardized version:
y1.s = (y1 - mean(y1))/sd(y1)
  # here is the standardized beta (from now on, I'll dispense with
  #  also calculating r & then double checking with pre-set functions):
cov(x.s,y1.s)/(sd(x.s)^2) # 0.9947298

That looks about right, so let's make b1=2:

b1 = 2
y2 = b0 + b1*x + resids
  # here's the estimated slope:
cov(x,y2)/(sd(x)^2)       # 2.028316
  # and now we can see the standardized beta:
y2.s = (y2 - mean(y2))/sd(y2)
cov(x.s,y2.s)/(sd(x.s)^2) # 0.9986374

What happened? The slope came out right, but the standardized beta didn't become larger than $1$. Let's try a bigger number, b1=7:

b1 = 7
y7 = b0 + b1*x + resids
  # here's the estimated slope:
cov(x,y7)/(sd(x)^2)       # 7.028316
  # and now we can see the standardized beta:
y7.s = (y7 - mean(y7))/sd(y7)
cov(x.s,y7.s)/(sd(x.s)^2) # 0.9998863

We're still not getting standardized betas $>1$. The reason is that, although the unstandardized slope and the covariance are getting larger and larger, the standard deviation of $y$ is getting larger too.

cov(x,y2)                 # 122.957
cov(x,y7)                 # 426.0582
sd(y2)                    # 15.81382
sd(y7)                    # 54.72799
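  # dividing the covariance by sd(x)*sd(y) scales it back down; these
  #  ratios reproduce the standardized betas shown above:
cov(x,y2)/(sd(x)*sd(y2))  # 0.9986374
cov(x,y7)/(sd(x)*sd(y7))  # 0.9998863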

Once that fact is incorporated, the standardized beta is constrained to fall within the interval $[-1,\ 1]$.
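
In symbols, this is the Cauchy-Schwarz argument from @whuber's comment above: for any $x$ and $y$,

$$ |\text{Cov}(x,y)| \le \text{SD}(x)\,\text{SD}(y) \qquad\Longrightarrow\qquad |r|=\left|\frac{\text{Cov}(x,y)}{\text{SD}(x)\,\text{SD}(y)}\right| \le 1, $$

so no matter how steep the unstandardized slope becomes, the standardized beta (which equals $r$) stays inside $[-1,\ 1]$.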

1. For more on the relationship between correlation and regression, it may help you to read my answer here: What is the difference between linear regression on y with x and x with y?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
2

Simple linear regression provides a straightforward way of addressing the idea behind your question. Recall that

$\hat{\beta_1}=rS_y/S_x$ and $\hat{\beta_0} = \bar{Y}-\hat{\beta_1}\bar{x}$, which leads to the linear regression equation

$\hat{Y} = \hat{\beta_0} + \hat{\beta_1}x$ which can be rewritten as

$\hat{Y} = (\bar{Y}-r\dfrac{S_y}{S_x}\bar{x}) + r\dfrac{S_y}{S_x}x$.

After a little algebra, the previous equation can be written as

$\dfrac{\hat{Y} -\bar{Y}}{S_y} = r(\dfrac{x -\bar{x}}{S_x})$

Now, notice that if $x$ is one SD above its mean, then $Y$ is predicted to be $r$ SDs above or below its mean, depending on whether the correlation $r$ is positive or negative, and $r$ is of course bounded between $-1$ and $1$. This is known as the regression effect, or regression to the mean, since $Y$ is predicted to be closer to its mean than $x$ is to its mean.

Analogously, let $\hat{Y_1}$ and $\hat{Y_2}$ correspond to the predicted values for $x_1$ and $x_2$ respectively. It can then be shown after a little algebra that

$\dfrac{\hat{Y_2} -\hat{Y_1}}{S_y} = r\left(\dfrac{x_2 -x_1}{S_x}\right)$, which implies that a 1 SD change in $x$ is associated with an $r$ SD predicted change in $Y$, and since $|r|\le 1$, that predicted change can never exceed 1 SD.
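
A short R check of these two identities may help; the data below are simulated purely for illustration and are not taken from the answer itself:

set.seed(1)
x   = rnorm(50, mean=10, sd=2)
y   = 3 + 0.8*x + rnorm(50)
fit = lm(y ~ x)
r   = cor(x, y)
  # predict at the mean of x and at one SD above the mean of x:
p = predict(fit, newdata=data.frame(x=c(mean(x), mean(x) + sd(x))))
(p[2] - mean(y))/sd(y)  # the prediction sits r SDs above the mean of y ...
r                       # ... i.e., this matches the correlation
(p[2] - p[1])/sd(y)     # a 1 SD step in x gives an r SD predicted change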

jsk
  • 2,810
  • 1
  • 12
  • 25