5

In regression, the beta value represents the increase in $y$ when $x$ changes by one unit. The standardized beta gives the same information, but in terms of standard deviations. But why can't $y$ increase by 2 SDs when $x$ increases by 1 SD, if the influence is strong enough?

Also, I assumed the correlation coefficient was just a measure of how much two variables vary together, independent of the slope. But in regression, the slope depends on the correlation coefficient. Are these two different pieces of information?

  • Welcome to the site! – kjetil b halvorsen May 15 '14 at 15:36
  • This result is the [Cauchy-Schwarz Inequality](http://en.wikipedia.org/wiki/Cauchy%E2%80%93Schwarz_inequality#Statement_of_the_inequality) in disguise: the standardized $x$'s and standardized $y$'s can be viewed as unit vectors (in a space whose dimension equals the data count, but that doesn't matter because according to Euclid two vectors--which implicitly include their common origin--determine a plane or just a line). The standardized $\beta$ is the dot product of these vectors, which is the cosine of the angle between them. Cosines of angles always lie between $-1$ and $1$. – whuber May 15 '14 at 16:11
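
To see whuber's point numerically, here is a small R sketch (the data are simulated purely for illustration and are not part of the original thread): after standardizing, the correlation is the scaled dot product of the data vectors, which is the cosine of the angle between the centered $x$ and $y$ vectors, and so must lie in $[-1,\ 1]$.

set.seed(1)                              # hypothetical data, purely for illustration
x  = rnorm(20, mean=50, sd=7)
y  = 5 + 2*x + rnorm(20)
zx = (x - mean(x))/sd(x)                 # standardized x
zy = (y - mean(y))/sd(y)                 # standardized y
sum(zx*zy)/(length(x) - 1)               # the scaled dot product ...
sum(zx*zy)/sqrt(sum(zx^2)*sum(zy^2))     # ... the cosine of the angle ...
cor(x, y)                                # ... and the correlation all agree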

2 Answers

4

I define "standardized beta" as the slope of a regression line when all variables (all $X$'s and $Y$) have been standardized first. If you have a simple linear regression model (i.e., only one $X$ variable), the standardized beta is the same as Pearson's product-moment correlation, $r$1. As a result, the standardized beta is bound by the interval $[-1,\ 1]$, just like $r$ is. The reason is given by @whuber in his comment above. However, it might help to try to work through this more slowly.

Let's start by considering the formulas for the estimated slope of a regression line, $\hat\beta_1$ and for $r$:

$$ \hat\beta_1=\frac{\text{Cov}(x,y)}{\text{Var}(x)} \qquad\qquad r=\frac{\text{Cov}(x,y)}{\text{SD}(x)\text{SD}(y)} $$

Now, if both $x$ and $y$ have been standardized first (so that their means are $0$ and their SDs are $1$), then the denominator of $\hat\beta_1$, i.e. the variance of $x$, will be $1^2=1$, and the denominator of $r$, i.e. the SD of $x$ times the SD of $y$, will be $1\times 1=1$. They will be the same. And the numerators are identical no matter what. Thus, the standardized beta is the same as $r$, and has all the same properties (e.g., the same possible range).
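
To spell that out, writing $x_s$ and $y_s$ for the standardized variables:

$$ \hat\beta_1^{\text{std}}=\frac{\text{Cov}(x_s,y_s)}{\text{Var}(x_s)}=\frac{\text{Cov}(x_s,y_s)}{1}=\frac{\text{Cov}(x_s,y_s)}{\text{SD}(x_s)\,\text{SD}(y_s)}=r $$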

On an intuitive level, we can still ask why the standardized beta / $r$ can't go above $1$. It seems like it ought to be possible. Let's try to make an example. I'll use R. I don't know if you use R, but you can download it for free and run this example; I'll try to make it as self-explanatory as possible.

set.seed(4077)  # this makes the example exactly reproducible
  # here are the true parameters we'll use:  
N  = 30  # we will work with 30 data points
b0 = 5   # the true intercept will be 5
b1 = 0   # at first, the true slope is 0, i.e., no relationship
  # let's make our X data & some residuals:
resids = rnorm(N, mean=0, sd=1)
x      = rnorm(N, mean=50, sd=7)
  # now we can generate Y from X, our residuals & our parameters:
y      = b0 + b1*x + resids
  # let's get the means & SDs of X & Y, & their covariance:
mean(x)                   # 51.72901
sd(x)                     # 7.7859
mean(y)                   # 4.82287
sd(y)                     # 0.8541943
cov(x,y)                  # 1.71654
  # with these we can predict the estimated slope & correlation:
cov(x,y)/(sd(x)^2)        # 0.02831629
cov(x,y)/(sd(x)*sd(y))    # 0.2581003
  # let's check the estimated slope & correlation:
coef(lm(y~x))[2]          # 0.02831629
cor(x,y)                  # 0.2581003

  # what happens to the slope if we standardize both x & y first?
x.s = (x - mean(x))/sd(x)
y.s = (y - mean(y))/sd(y)
  # the slope now equals the correlation above:
coef(lm(y.s~x.s))[2]      # 0.2581003  

In that case the true value of b1 was $0$; now let's make it $1$:

b1 = 1
y1 = b0 + b1*x + resids
  # let's see what happened:
mean(y1)                  # 56.55189
sd(y1)                    # 8.048787
cov(x,y1)                 # 62.33678
  # calculating b1 & r:
cov(x,y1)/(sd(x)^2)       # 1.028316
cov(x,y1)/(sd(x)*sd(y1))  # 0.9947298
  # checking:
coef(lm(y1~x))[2]         # 1.028316
cor(x,y1)                 # 0.9947298
  # let's try the standardized version:
y1.s = (y1 - mean(y1))/sd(y1)
  # here is the standardized beta (from now on, I'll dispense with
  #  also calculating r & then double checking with pre-set functions):
cov(x.s,y1.s)/(sd(x.s)^2) # 0.9947298

That looks about right, so let's make b1=2:

b1 = 2
y2 = b0 + b1*x + resids
  # here's the estimated slope:
cov(x,y2)/(sd(x)^2)       # 2.028316
  # and now we can see the standardized beta:
y2.s = (y2 - mean(y2))/sd(y2)
cov(x.s,y2.s)/(sd(x.s)^2) # 0.9986374

What happened? The slope came out right, but the standardized beta didn't become larger than $1$. Let's try a bigger number, b1=7:

b1 = 7
y7 = b0 + b1*x + resids
  # here's the estimated slope:
cov(x,y7)/(sd(x)^2)       # 7.028316
  # and now we can see the standardized beta:
y7.s = (y7 - mean(y7))/sd(y7)
cov(x.s,y7.s)/(sd(x.s)^2) # 0.9998863

We're still not getting standardized betas $>1$. The reason is that, although the unstandardized slope and the covariance are getting larger and larger, the standard deviation of $y$ is getting larger too.

cov(x,y2)                 # 122.957
cov(x,y7)                 # 426.0582
sd(y2)                    # 15.81382
sd(y7)                    # 54.72799
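  # dividing the covariance by sd(x)*sd(y) scales it back down; these
  #  ratios reproduce the standardized betas shown above:
cov(x,y2)/(sd(x)*sd(y2))  # 0.9986374
cov(x,y7)/(sd(x)*sd(y7))  # 0.9998863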

Once that fact is incorporated, the standardized beta is constrained to fall within the interval $[-1,\ 1]$.
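
In symbols, this is the Cauchy-Schwarz argument from @whuber's comment above: for any $x$ and $y$,

$$ |\text{Cov}(x,y)| \le \text{SD}(x)\,\text{SD}(y) \qquad\Longrightarrow\qquad |r|=\left|\frac{\text{Cov}(x,y)}{\text{SD}(x)\,\text{SD}(y)}\right| \le 1, $$

so no matter how steep the unstandardized slope becomes, the standardized beta (which equals $r$) stays inside $[-1,\ 1]$.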

1. For more on the relationship between correlation and regression, it may help you to read my answer here: What is the difference between linear regression on y with x and x with y?

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
2

Simple linear regression provides a straightforward way of addressing the idea behind your question. Recall that

$\hat{\beta_1}=rS_y/S_x$ and $\hat{\beta_0} = \bar{Y}-\hat{\beta_1}\bar{x}$, which leads to the linear regression equation

$\hat{Y} = \hat{\beta_0} + \hat{\beta_1}x$ which can be rewritten as

$\hat{Y} = (\bar{Y}-r\dfrac{S_y}{S_x}\bar{x}) + r\dfrac{S_y}{S_x}x$.

After a little algebra, the previous equation can be written as

$\dfrac{\hat{Y} -\bar{Y}}{S_y} = r(\dfrac{x -\bar{x}}{S_x})$

Now, notice that if $x$ is one SD above its mean, then $Y$ is predicted to be $r$ SDs above or below its mean, depending on whether the correlation $r$ is positive or negative, and $r$ is of course bounded between $-1$ and $1$. This is known as the regression effect, or regression to the mean, since $Y$ is predicted to be closer to its mean than $x$ is to its mean.

Analogously, let $\hat{Y_1}$ and $\hat{Y_2}$ correspond to the predicted values for $x_1$ and $x_2$ respectively. It can then be shown after a little algebra that

$\dfrac{\hat{Y_2} -\hat{Y_1}}{S_y} = r\left(\dfrac{x_2 -x_1}{S_x}\right)$, which implies that a 1 SD change in $x$ is associated with an $r$ SD predicted change in $Y$, and since $|r|\le 1$, that predicted change can never exceed 1 SD.
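
A short R check of these two identities may help; the data below are simulated purely for illustration and are not taken from the answer itself:

set.seed(1)
x   = rnorm(50, mean=10, sd=2)
y   = 3 + 0.8*x + rnorm(50)
fit = lm(y ~ x)
r   = cor(x, y)
  # predict at the mean of x and at one SD above the mean of x:
p = predict(fit, newdata=data.frame(x=c(mean(x), mean(x) + sd(x))))
(p[2] - mean(y))/sd(y)  # the prediction sits r SDs above the mean of y ...
r                       # ... i.e., this matches the correlation
(p[2] - p[1])/sd(y)     # a 1 SD step in x gives an r SD predicted change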

jsk
  • 2,810
  • 1
  • 12
  • 25