What's the added value of the SD line over the regression line when examining the association between 2 variables?

Question

I'm trying to adopt a set of practices for exploring a new data set, in particular for examining the association between two variables.

Example steps (not necessarily in order):

- plot a y-by-x scatter plot of the raw data to see the relationship visually
- compute summary statistics for each variable (mean and SD)
- compute the correlation coefficient r
- draw the OLS regression line and compute its slope and intercept
- etc.

I've come across the "SD line" in Freedman's Statistics book, which is defined as:

"the line that goes through the point of averages and climbs at the rate of one vertical SD for each horizontal SD" Freedman, D., Pisani, R., & Purves, R. (2007). Statistics (4th edn).
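Freedman's definition fixes both an intercept and a slope. Below is a minimal R sketch, assuming that for negatively associated variables the line falls (rather than climbs) one vertical SD per horizontal SD, which is the convention the `mtcars` example below follows by negating the slope. `sd_line()` is a hypothetical helper, not a function from the book or from any package.

```r
# Hypothetical helper (not from Freedman et al. or any package):
# returns the intercept and slope of the SD line for paired vectors x and y.
sd_line <- function(x, y) {
  # One SD of y per SD of x; the sign follows the correlation
  slope <- sign(cor(x, y)) * sd(y) / sd(x)
  # The line passes through the point of averages (mean(x), mean(y))
  intercept <- mean(y) - slope * mean(x)
  c(intercept = intercept, slope = slope)
}

data(mtcars)
sdl <- sd_line(mtcars$wt, mtcars$mpg)
sdl
```

For `mtcars`, `sdl` should hold a negative slope whose magnitude is `sd(mtcars$mpg) / sd(mtcars$wt)`; it can be drawn on an existing scatter plot with `abline(a = sdl[["intercept"]], b = sdl[["slope"]])`.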

Since this book ("Statistics") is a canonical textbook, I take its choice to discuss the SD line as an indication of the line's importance. However, a simple Google search for the term "SD line" doesn't yield many independent results; most of them come directly from Freedman's book. This tells me it's not a central concept in bivariate analysis in general.

Comparing the SD line with the OLS regression line, the regression line seems more informative for predicting y from x. I'm therefore wondering whether plotting the SD line has any benefit or added value beyond what I already get from plotting the regression line.

Example using the `mtcars` dataset, focusing on the association between weight and mpg

```r
data(mtcars)

## calculate means
mean_wt <- mean(mtcars$wt)
mean_mpg <- mean(mtcars$mpg)

## calculate standard deviations
sd_wt <- sd(mtcars$wt)
sd_mpg <- sd(mtcars$mpg)

## scatter plot
plot(x = mtcars$wt, y = mtcars$mpg)

## add the "point of averages"
points(mean_wt, mean_mpg, col = "red", cex = 1.5, pch = 16)

## calculate the slope of the SD line; it takes the sign of the correlation
## (negative here, since heavier cars have lower mpg)
slope <- sign(cor(mtcars$wt, mtcars$mpg)) * sd_mpg / sd_wt

## plot the SD line through the point of averages
curve(expr = x * slope + (mean_mpg - slope * mean_wt), add = TRUE,
      col = "blue", lwd = 2, lty = 2)

## plot the regression line
model <- lm(mpg ~ wt, data = mtcars)
abline(model, col = "orange", lwd = 2)

## legend
legend("topright",
       legend = c("Regression line", "SD line"),
       col = c("orange", "blue"),
       lty = c(1, 2),
       lwd = c(2, 2))
```

Thus, my question: how can the SD line increase one's understanding of the relationship between two variables, in a way that adds to or complements what the regression line already tells us?

See also https://stats.stackexchange.com/questions/201243/why-do-we-need-the-sd-line , although that question has no answers and commenters said they did not understand it. — Sextus Empiricus, Jan 26 '20 at 17:37
Yes, I saw that post too, but unfortunately it wasn't informative. — Emman, Jan 26 '20 at 17:44
See some discussion [here](https://books.google.com.bo/books?id=HCuNDwAAQBAJ&pg=PA130&lpg=PA130&dq=%22SD+line%22+of+freedman+pisani&source=bl&ots=JYdXeDKBF-&sig=ACfU3U23SXpP-r7zm0ZKuj2-9bOq5--Izw&hl=no&sa=X&ved=2ahUKEwj8-eXNwKHnAhViHLkGHTFTBIYQ6AEwDXoECAgQAQ#v=onepage&q=%22SD%20line%22%20of%20freedman%20pisani&f=false). — kjetil b halvorsen, Jan 26 '20 at 18:01

Sextus Empiricus · Accepted Answer · 2020-01-26T21:23:11.080

The SD line is a didactic and visual aid that helps one see where the slope of the ordinary regression line comes from:

$$\text{slope regression} = r_{xy} \, \frac{\sigma_y}{\sigma_x} = |r_{xy}| \, \text{slope SD line}, \qquad \text{slope SD line} = \operatorname{sign}(r_{xy}) \, \frac{\sigma_y}{\sigma_x}$$

The SD line shows how x and y vary, and it can be steeper or flatter depending on the ratio $\frac{\sigma_y}{\sigma_x}$.
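The identity can be checked numerically. A minimal R sketch on the question's `mtcars` example, assuming the signed SD-line slope defined above (the variable names are illustrative only):

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

r <- cor(x, y)
sd_slope <- sign(r) * sd(y) / sd(x)    # slope of the SD line
ols_slope <- coef(lm(y ~ x))[["x"]]    # slope of the OLS regression line

# OLS slope equals r * sd(y)/sd(x), i.e. |r| times the SD-line slope
all.equal(ols_slope, r * sd(y) / sd(x))
all.equal(ols_slope, abs(r) * sd_slope)
```

Both comparisons should return `TRUE`, because `r * sd(y) / sd(x)` simplifies to `cov(x, y) / var(x)`, which is exactly the OLS slope.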

The regression line is never steeper than the SD line (you might relate this to regression to the mean): since $|r_{xy}| \le 1$, its slope is smaller in absolute value unless $|r_{xy}| = 1$. How much smaller depends on the correlation. The SD line helps you see the regression line in this way.

The higher $R^2$, the more of the variance in the data the model explains, and the closer the regression line is to the SD line.
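In simple regression with an intercept, $R^2 = r_{xy}^2$, so the ratio of the regression slope to the SD-line slope is exactly $\sqrt{R^2}$. A small R sketch on `mtcars`, assuming the signed SD-line slope from the formula above:

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)
r <- cor(mtcars$wt, mtcars$mpg)
sd_slope <- sign(r) * sd(mtcars$mpg) / sd(mtcars$wt)

# Ratio of regression slope to SD-line slope (between 0 and 1)
slope_ratio <- coef(fit)[["wt"]] / sd_slope

# Equals sqrt(R^2) = |r| for a simple linear regression
all.equal(slope_ratio, sqrt(summary(fit)$r.squared))
```

For this data set the ratio is about 0.87, so the regression line is only slightly flatter than the SD line.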

The figure generated by the code below illustrates how the SD line helps. For data with $\sigma_x = \sigma_y = 1$ but different correlations, both the SD line and the regression line are drawn. Note that the regression line is closer to the SD line for larger correlations (but still always has a smaller slope).

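```r
# random data
set.seed(1)
x <- rnorm(100, 0, 1)
y <- rnorm(100, 0, 1)

# normalizing
x <- (x - mean(x)) / sd(x)
y <- (y - mean(y)) / sd(y)

# making x and y exactly uncorrelated (remove the projection of x on y)
x <- x - cor(x, y) * y
cor(x, y)   # numerically zero
x <- x / sd(x)

# plotting cases with sd_x = sd_y = 1 and different correlations, in a 2 x 2 grid
par(mfrow = c(2, 2))
for (rho in c(0.1, 0.3, 0.5, 0.7)) {
  # mixing weight chosen so that cor(x, z) = rho and sd(z) = 1
  b <- sqrt(1 / (1 - rho^2) - 1)
  z <- (y + b * x) / sqrt(1 + b^2)
  plot(x, z,
       xlim = c(-5, 5), ylim = c(-5, 5),
       pch = 21, col = 1, bg = 1, cex = 0.7)
  title(bquote(rho == .(rho)), line = 1)
  lines(c(-10, 10), c(-10, 10), lty = 2)      # SD line: slope 1
  lines(c(-10, 10), c(-10, 10) * rho)         # regression line: slope rho
  if (rho == 0.1) {
    legend(-5, 5, c("sd line", "regression line"), lty = c(2, 1), cex = 0.9)
  }
}
```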

Similar descriptions:

- http://www.jerrydallal.com/LHSP/regeff.htm
- https://books.google.ch/books?id=fW_9BV5Wpf8C&pg=PA18 (Statistical Models: Theory and Practice by David A. Freedman)
- Is the average of betas from Y ~ X and X ~ Y valid?
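The last link (regressing Y on X versus X on Y) ties in with the SD line. When the X-on-Y regression line is drawn in the same (x, y) plane, its slope is $\sigma_y / (r_{xy}\sigma_x)$, steeper than the SD line, while the Y-on-X slope $r_{xy}\sigma_y/\sigma_x$ is flatter. The SD line sits between the two, and its slope is their signed geometric mean (not their arithmetic average). A sketch on `mtcars`, assuming the signed SD-line slope used above:

```r
data(mtcars)
x <- mtcars$wt
y <- mtcars$mpg

b_yx <- coef(lm(y ~ x))[["x"]]       # slope of the y-on-x regression
b_xy <- coef(lm(x ~ y))[["y"]]       # slope of the x-on-y regression
slope_xy_in_yx_plane <- 1 / b_xy     # same x-on-y line, drawn with y vertical

# The SD-line slope is the signed geometric mean of the two slopes
sd_slope <- sign(b_yx) * sqrt(b_yx * slope_xy_in_yx_plane)
all.equal(sd_slope, sign(cor(x, y)) * sd(y) / sd(x))
```

The product `b_yx * slope_xy_in_yx_plane` equals $\sigma_y^2/\sigma_x^2$ because the two factors of $r_{xy}$ cancel, which is why the correlation drops out of the SD line.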

To put your answer in other words, the strength of the correlation can be assessed in two ways: (1) **numerically**, by how far the correlation coefficient *r* is from 1 or -1; or (2) **visually**, by the angle between the regression line and the SD line. And that is essentially the usefulness of the SD line. Did I sum up your point correctly? — Emman, Jan 27 '20 at 06:25
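The visual angle on a raw-scale plot depends on the axis scaling and aspect ratio, so a like-for-like comparison needs standardized axes. In standardized units the SD line has slope $\pm 1$ and the regression line has slope $r$, so the angle between them is $45^\circ - \arctan|r|$ (in degrees), a monotone function of $|r|$. A minimal R sketch on `mtcars`, assuming standardized variables:

```r
data(mtcars)
# Standardize both variables so one SD equals one unit on each axis
zx <- as.vector(scale(mtcars$wt))
zy <- as.vector(scale(mtcars$mpg))

r <- cor(zx, zy)
sd_angle  <- atan(sign(r))                     # SD line: slope +/-1 (+/-45 degrees)
reg_angle <- atan(coef(lm(zy ~ zx))[["zx"]])   # regression line: slope r

# Angle between the two lines in degrees: 0 when |r| = 1, 45 when r = 0
angle_deg <- abs(sd_angle - reg_angle) * 180 / pi
angle_deg
```

Because the angle is just a transformation of $|r|$, the visual and numerical assessments carry the same information; the SD line makes it visible on the plot.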
