
Is there a method to determine whether two lines are (more or less) parallel? I have two lines generated from linear regressions and I would like to know whether they are parallel. In other words, I would like to get the difference between the slopes of those two lines.

Is there an R function to calculate this?

EDIT: ... and how can I get the slope (in degrees) of a linear regression line?

Dail

3 Answers


I wonder if I am missing something obvious, but couldn't you do this statistically using ANCOVA? An important issue is that the slopes in the two regressions are estimated with error. They are estimates of the slopes in the populations at large. If the concern is whether the two regression lines are parallel or not in the population then it doesn't make sense to compare $a_1$ with $a_2$ directly for exact equivalence; they are both subject to error/uncertainty that needs to be taken into account.

If we think about this from a statistical point of view, and we can combine the data on $x$ and $y$ for both data sets in some meaningful way (i.e. the $x$ and $y$ values in both sets are drawn from populations with similar ranges for the two variables; it is just the relationship between them that differs in the two populations), then we can fit the following two models:

$$\hat{y} = b_0 + b_1x + b_2g$$

and

$$\hat{y} = b_0 + b_1x + b_2g + b_3xg$$

where $b_i$ are the model coefficients and $g$ is a grouping variable/factor indicating which data set each observation belongs to.
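In practice, if your two lines come from two separate data sets, you would stack those data sets and create $g$ yourself. A minimal sketch, where df1 and df2 are hypothetical stand-ins for the data frames behind your two regressions:

## combine both data sets, labelling each observation with its group
## (df1 and df2 are hypothetical stand-ins for your own data frames)
d <- rbind(data.frame(x = df1$x, y = df1$y, g = "A"),
           data.frame(x = df2$x, y = df2$y, g = "B"))
d$g <- factor(d$g)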

We can use an ANOVA table or F-ratio to test if the second, more complex model fits the data better than the simpler model. The simpler model states that the slopes of the two lines are the same ($b_1$) but the lines are offset from one another by an amount $b_2$.

The more complex model includes an interaction between the slope of the line and the grouping variable. If the coefficient for this interaction term is significantly different from zero, or the ANOVA/F-ratio indicates that the more complex model fits the data better, then we must reject the null hypothesis that the two lines are parallel.

Here is an example in R using dummy data. First, data with equal slopes:

set.seed(2)
## randomly assign 50 observations to each of groups "A" and "B"
samp <- factor(sample(rep(c("A","B"), each = 50)))
## common slope of 0.5, intercepts of 2 ("A") and 5 ("B"), plus noise
d1 <- data.frame(y = c(2,5)[as.numeric(samp)] + (0.5 * (1:100)) + rnorm(100),
                 x = 1:100,
                 g = samp)
m1 <- lm(y ~ x * g, data = d1)       # interaction model: separate slopes
m1.null <- lm(y ~ x + g, data = d1)  # null model: common slope
anova(m1.null, m1)

Which gives

> anova(m1.null, m1)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 122.29                           
2     96 122.13  1   0.15918 0.1251 0.7243

Indicating that we fail to reject the null hypothesis of equal slopes in this sample of data. Of course, we'd want to assure ourselves that we had sufficient power to detect a difference if there really was one, so that we were not led to erroneously fail to reject the null because our sample size was too small for the expected effect.
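As a rough guide to power, one can simulate the design for an assumed slope difference and count how often the F-test detects it. A minimal sketch; the slope difference of 0.1 and the noise level here are illustrative assumptions, not taken from the question:

## rough power check: how often does the F-test detect a slope
## difference of 0.1 (assumed for illustration) with this design?
pow <- mean(replicate(1000, {
    g <- gl(2, 50, labels = c("A", "B"))
    x <- rep(1:50, 2)
    y <- 2 + 0.5 * x + 0.1 * x * (g == "B") + rnorm(100)
    anova(lm(y ~ x + g), lm(y ~ x * g))$`Pr(>F)`[2] < 0.05
}))
pow  # proportion of simulated data sets where the interaction is detected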

Now, data with different slopes:

set.seed(42)
x <- seq(1, 100, by = 2)             # 50 x values, shared by both groups
## slope 0.5 for group "A", slope 1.5 for group "B"
d2 <- data.frame(y = c(2 + (0.5 * x) + rnorm(50),
                       5 + (1.5 * x) + rnorm(50)),
                 x = x,
                 g = rep(c("A","B"), each = 50))
m2 <- lm(y ~ x * g, data = d2)       # interaction model: separate slopes
m2.null <- lm(y ~ x + g, data = d2)  # null model: common slope
anova(m2.null, m2)

Which gives:

> anova(m2.null, m2)
Analysis of Variance Table

Model 1: y ~ x + g
Model 2: y ~ x * g
  Res.Df     RSS Df Sum of Sq     F    Pr(>F)    
1     97 21132.0                                 
2     96   103.8  1     21028 19439 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Here we have substantial evidence against the null hypothesis and thus we can reject it in favour of the alternative (in other words, we reject the hypothesis that the slopes of the two lines are equal).

The interaction terms in the two models I fitted ($b_3xg$) give the estimated difference in slopes for the two groups. For the first model, the estimated difference in slopes is small (~0.003):

> coef(m1)
(Intercept)           x          gB        x:gB 
2.100068977 0.500596394 2.659509181 0.002846393

and a $t$-test on this would fail to reject the null hypothesis that this difference in slopes is 0:

> summary(m1)

Call:
lm(formula = y ~ x * g, data = d1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.32886 -0.81224 -0.01569  0.93010  2.29984 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.100069   0.334669   6.275 1.01e-08 ***
x           0.500596   0.005256  95.249  < 2e-16 ***
gB          2.659509   0.461191   5.767 9.82e-08 ***
x:gB        0.002846   0.008047   0.354    0.724    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.128 on 96 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.9939 
F-statistic:  5347 on 3 and 96 DF,  p-value: < 2.2e-16 
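An interval estimate makes the same point: a 95% confidence interval on the interaction coefficient covers the plausible values for the difference in slopes, and here it comfortably spans zero (the interval in the comment is approximate, computed from the estimate and standard error shown above):

## 95% CI for the difference in slopes between groups "A" and "B"
confint(m1, "x:gB")  # approximately (-0.013, 0.019), spanning 0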

If we turn to the model fitted to the second data set, where we made the slopes for the two groups differ, we see that the estimated difference in slopes of the two lines is ~1 unit.

> coef(m2)
(Intercept)           x          gB        x:gB 
  2.3627432   0.4920317   2.8931074   1.0048653 

The slope for group "A" is ~0.49 (x in the above output), whilst to get the slope for group "B" we need to add the difference in slopes (given by the interaction term, remember) to the slope of group "A": ~0.49 + ~1 = ~1.49. This is pretty close to the stated slope for group "B" of 1.5. A $t$-test on this difference of slopes also indicates that the estimate of the difference is bounded away from 0:

> summary(m2)

Call:
lm(formula = y ~ x * g, data = d2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1962 -0.5389  0.0373  0.6952  2.1072 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.362743   0.294220   8.031 2.45e-12 ***
x           0.492032   0.005096  96.547  < 2e-16 ***
gB          2.893107   0.416090   6.953 4.33e-10 ***
x:gB        1.004865   0.007207 139.424  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.04 on 96 degrees of freedom
Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994 
F-statistic: 5.362e+04 on 3 and 96 DF,  p-value: < 2.2e-16
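The same arithmetic can be done directly on the coefficient vector, which avoids reading values off the printed summary:

b <- coef(m2)
slope.A <- unname(b["x"])              # slope for group "A", ~0.49
slope.B <- unname(b["x"] + b["x:gB"])  # slope for group "B", ~0.49 + ~1.00 = ~1.49
c(A = slope.A, B = slope.B)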
Gavin Simpson
  • thank you so much for this very good explanation. My goal is to understand whether the slopes are more or less the same, so I think I will use ANOVA to test it. – Dail Nov 15 '11 at 10:40
  • if I have two distinct vectors and I would like to compare their slopes but I don't have the y (lm(x~y)), how can I use ANOVA? I tried anova(lm(x~1), lm(y~1)) but I get a warning – Dail Nov 15 '11 at 10:50
  • What do you mean by vectors here? In the R sense or the mathematical sense? This is very different from the question you posed, so please start a new question - *do* *not* edit this one - it is impossible to conduct follow-ups of so broad a nature in comments. – Gavin Simpson Nov 15 '11 at 11:01
  • no wait, I have to compare two models with ANOVA... OK, but if I create a model with the formula x~1 and another model with y~1, I get the warning. I'm talking about the R sense. How can I do it? – Dail Nov 15 '11 at 11:07
  • @Dail if you fitted two regressions to get two slopes/lines, you have x and y data for both data sets. As I said in my Answer, if the xs and ys are comparable in the two data sets, then you can just combine all the data *and* add a grouping variable. My example shows how to do this using dummy data, but you already have x and y data; it is the data you used to fit the separate regressions. – Gavin Simpson Nov 15 '11 at 11:41
  • +1, very good answer. Better than mine, since I answered what OP asked, and this is probably the answer to the question OP wanted to ask. – mpiktas Nov 15 '11 at 11:42

The first question is actually from geometry. If you have two lines of the form:

$$y=a_1x+b_1$$ $$y=a_2x+b_2$$

then they are parallel if $a_1=a_2$. So if the slopes are equal, the lines are parallel.

For the second question, use the fact that $\tan \alpha=a_1$, where $\alpha$ is the angle the line makes with $x$-axis, and $a_1$ is the slope of the line. So

$$\alpha=\arctan a_1$$

and to convert to degrees, recall that $2\pi$ radians $=360$ degrees. So the answer in degrees is

$$\alpha=\arctan(a_1)\cdot \frac{360}{2\pi}.$$

The R function for $\arctan$ is called atan.

Sample R code:

> x <- rnorm(100)
> y <- x + 1 + rnorm(100)/2
> mod <- lm(y ~ x)
> mod$coef
(Intercept)           x 
  0.9416175   0.9850303 
> mod$coef[2]
        x 
0.9850303 
> atan(mod$coef[2]) * 360/2/pi
       x 
44.56792 

The last value is the slope expressed in degrees.

Update. For negative slope values the conversion to degrees follows a different rule. Note that the angle with the x-axis takes values from 0 to 180, since we assume the angle is measured above the x-axis. For negative values of $a_1$, $\arctan(a_1)$ is negative, so the formula becomes

$$\alpha=180+\arctan(a_1)\cdot \frac{360}{2\pi}.$$

For example, a slope of $-1$ gives $\arctan(-1)=-45°$ and hence $\alpha=135°$.
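A small helper covering both cases (a sketch; slope_to_degrees is just an illustrative name):

slope_to_degrees <- function(a) {
    deg <- atan(a) * 360 / (2 * pi)   # in (-90, 90)
    ifelse(deg < 0, 180 + deg, deg)   # map negative angles into (90, 180)
}
slope_to_degrees(c(1, -1))  # 45 and 135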

Note. While it was fun for me to recall high-school trigonometry, the really useful answer is the one given by Gavin Simpson. Since the slopes of regression lines are random variables, the statistical hypothesis-testing framework should be used to compare them.

mpiktas
  • thank you! How do I get the slope from the regression? Do I have to get the coefficient and the intercept? – Dail Nov 15 '11 at 08:00
  • maybe the linear regression returns the degrees directly with some function? – Dail Nov 15 '11 at 08:32
  • aren't degrees = +45 and degrees = -315 the same line? Aren't we talking about the same line? – Dail Nov 15 '11 at 10:22

... following up on @mpiktas' answer, here's how you would extract the slope from an lm object and apply the above formula.

# prepare some data, see ?lm
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)

lm.D9 <- lm(weight ~ group)
# extract the slope (this is also what abline(lm.D9) would use to draw the line)
coefficients(lm.D9)["groupTrt"]
groupTrt 
  -0.371 
# apply the atan(a1) * (360 / (2*pi)) formula provided by mpiktas
atan(coefficients(lm.D9)["groupTrt"]) * (360/(2 * pi))
 groupTrt 
-20.35485 
# the slope is negative, so add 180 to map the angle into the (90, 180) range
180 + atan(coefficients(lm.D9)["groupTrt"]) * (360/(2 * pi))  # about 159.65 degrees
Roman Luštrik