Using Stata, I generated two random variables and regressed each on the other.

clear
set obs 1000
gen rand1 = uniform()
gen rand2 = uniform()
reg rand1 rand2, nocons
reg rand2 rand1, nocons

I found some odd patterns.

  1. How can both regressions have a coefficient smaller than 1? Intuitively, I can't see how.

  2. Why are the coefficients from both regressions always smaller than 1?

user42459
  • See https://stats.stackexchange.com/questions/22718. Notice, too, that the formula for the slope estimate is a ratio of random variables. The expectation of the numerator is $1/4$ and that of the denominator is $1/3,$ implying that with large samples (and 1000 is sufficiently large) the expectation of the slope estimate will be close to $(1/4)/(1/3) = 3/4.$ – whuber May 31 '20 at 14:17
  • Re (2): it's not always the case that both coefficient estimates are less than $1.$ This becomes clear when you use a smaller value of `obs`. Indeed, with `obs` set to `1`, the expected value of either regression coefficient is *infinite!* – whuber Jun 01 '20 at 14:38

1 Answer

Note that when the intercept is omitted, the estimate of the coefficient $\beta$ is given by

$\hat{\beta} = \frac{\sum^n_{i=1} x_iy_i}{\sum^n_{i=1}x_i^2}$

In other words, you divide the sum of $x$ times $y$ by the sum of squares of $x$ (you can equivalently use means instead of sums, since the factor $1/n$ cancels). Because the $x$ and $y$ values are paired in random order in the linear model, the numerator typically comes out smaller than the sum of squares in the denominator, and as a result your coefficient will be smaller than 1.
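A short worked calculation may make the "smaller numerator" claim concrete. It is only a sketch: it assumes, as in the question, that $x$ and $y$ are independent Uniform(0,1) draws and that $n$ is large enough for sample means to sit close to their expectations.

$\frac{1}{n}\sum^n_{i=1} x_iy_i \approx E[xy] = E[x]\,E[y] = \frac{1}{2}\cdot\frac{1}{2} = \frac{1}{4}$

$\frac{1}{n}\sum^n_{i=1} x_i^2 \approx E[x^2] = \int_0^1 x^2\,dx = \frac{1}{3}$

$\hat{\beta} \approx \frac{1/4}{1/3} = \frac{3}{4}$

By symmetry the same approximation applies when the roles of $x$ and $y$ are swapped. Writing $\hat{\beta}_{y|x}$ and $\hat{\beta}_{x|y}$ for the two no-intercept slopes (labels introduced here for illustration), their product satisfies

$\hat{\beta}_{y|x}\,\hat{\beta}_{x|y} = \frac{\left(\sum^n_{i=1} x_iy_i\right)^2}{\sum^n_{i=1}x_i^2\,\sum^n_{i=1}y_i^2} \le 1$

by the Cauchy-Schwarz inequality, so the two slopes can never both exceed 1 in absolute value, although one of them can in a particular sample.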

I must admit that this is probably not an entirely satisfactory answer, as it is based mainly on observed patterns; I don't know whether there is a better mathematical explanation for it. You can check the claim for yourself, though: compute the sum of $x_iy_i$ with the values paired as they were generated, and compare it with the same sum after sorting both variables in ascending order (dividing each by the sum of squares of $x$).
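The comparison suggested above could be sketched in Stata roughly as follows. The seed, the use of `runiform()` in place of the question's `uniform()`, the helper names `xy`, `xx`, `sxy`, `sxx`, and the short Mata block used for the sorted pairing are all illustrative assumptions, not part of the original post. The block is meant to be run as a do-file.

```stata
clear
set seed 12345              // arbitrary seed, only for reproducibility
set obs 1000
gen rand1 = runiform()
gen rand2 = runiform()

* Slope of rand1 on rand2 without an intercept, computed by hand:
* sum(x*y) / sum(x^2), with x = rand2 and y = rand1
gen xy = rand1 * rand2
gen xx = rand2^2
quietly summarize xy
scalar sxy = r(sum)
quietly summarize xx
scalar sxx = r(sum)
display "manual slope, random pairing: " sxy / sxx

* The regression coefficient should agree with the manual ratio
quietly reg rand1 rand2, nocons
display "regress slope:                " _b[rand2]

* Same ratio after sorting each variable separately, so that
* small values are paired with small and large with large
mata:
x = sort(st_data(., "rand2"), 1)
y = sort(st_data(., "rand1"), 1)
sum(x :* y) / sum(x :^ 2)
end
```

With the random pairing, the ratio should land near the $3/4$ mentioned in the comments on the question. With the sorted pairing, the two sorted columns are nearly identical, so the ratio should move close to 1.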

horseoftheyear
  • Hi: Note that the coefficient of the regression based on $y \sim x$ is not one over the coefficient of the regression based on $x \sim y$, if that's why you're puzzled by why they are both less than 1.0. – mlofton May 30 '20 at 22:23
  • @mlofton I know that. But if the line through (0,0) lies below the 45-degree line, I am wondering under what condition the line from the reverse regression could also lie below the 45-degree line. – user42459 May 31 '20 at 00:20
  • The mathematical explanation is the Cauchy-Schwarz inequality / Chebyshev inequality. – Pig May 31 '20 at 17:13
  • @user42459: Pig is probably correct, but the easiest thing to do would be to just calculate $\hat{\beta}$ in both cases. You have the data and the formula. – mlofton May 31 '20 at 21:29