
Question: Am I correct in assuming that the below statement only makes sense if we're conditioning on realised data?

I'm finding it hard to understand the following statement.

For a simple normal linear model we have:

Let $Y = f(X) = \beta_0 + \beta_1 X + \varepsilon$ and $\varepsilon \sim N(0,\sigma^2)$ $\Rightarrow$ $Y \sim N(\beta_0 + \beta_1 X \; , \; \sigma^2)$

The issue I have is that our data itself has a distribution.

Unless what is meant above is that we have realisations of our data $(X = x)$ and are conditioning on them, i.e.:

$Y \mid X = x \sim N(\beta_0 + \beta_1 x \; , \; \sigma^2)$

Otherwise I feel like I am missing something important.
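
A quick simulation makes the conditional reading concrete. This is a minimal sketch in Python with illustrative values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 0.5$ (my own assumptions, not from the model above): holding $X = x$ fixed, the simulated $Y$ values match $N(\beta_0 + \beta_1 x, \sigma^2)$.

```python
import numpy as np

# Illustrative parameters (assumed for this sketch)
rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5

# Condition on a realisation X = x: Y | X = x should be N(beta0 + beta1*x, sigma^2)
x = 3.0
y_given_x = beta0 + beta1 * x + rng.normal(0.0, sigma, size=100_000)
print(y_given_x.mean())  # ~ 7.0  (= beta0 + beta1 * x)
print(y_given_x.std())   # ~ 0.5  (= sigma)
```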

Another way to characterise my confusion is:

$P_Y = P_X + P_{\varepsilon} \neq P_{\varepsilon}$

where $P_J$ denotes the probability distribution of the random variable $J$.

physicalcog
  • Btw, as a comment: with probability distributions this would be something more like the convolution $P_X * P_\varepsilon$, assuming independence, rather than a sum. You would sum probabilities with mutually exclusive events, or with mixture distributions, etc. – Tim Dec 02 '17 at 14:08
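
To illustrate the comment, here is a small numeric check (with variances I am assuming for the sketch): for independent normals, the convolution is again normal, with the variances adding.

```python
import numpy as np

# Sketch: for independent X and eps, the law of X + eps is the convolution
# P_X * P_eps. For normals this is again normal, with variances adding.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=200_000)    # P_X = N(0, 1)
eps = rng.normal(0.0, 0.5, size=200_000)  # P_eps = N(0, 0.25)
S = X + eps
print(S.mean(), S.var())  # ~ 0.0 and ~ 1.25 (= 1 + 0.25)
```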

1 Answer


You are correct that if $X$ were a random variable, the variance of $Y$ would be $\beta_1^2 \sigma^2_X + \sigma^2_\varepsilon$, but the standard formulation of linear regression assumes that $X$ is fixed. Of course, this doesn't mean that you cannot use linear regression when $X$ is a random variable; in practice that is exactly what we do, and it is another possible formulation of the model. Treating $X$ as fixed simply simplifies many things, so the model is often described this way.
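
A short simulation sketch of this point, with illustrative parameters that I am assuming here (not part of the answer): with $X$ held fixed, the sample variance of $Y$ is about $\sigma^2_\varepsilon$, while with $X$ random it is about $\beta_1^2 \sigma^2_X + \sigma^2_\varepsilon$.

```python
import numpy as np

# Assumed illustrative parameters
rng = np.random.default_rng(2)
beta0, beta1 = 1.0, 2.0
sigma_X, sigma_eps = 1.5, 0.5
n = 200_000

# X fixed (a design constant): only the noise contributes to Var(Y)
Y_fixed = beta0 + beta1 * 3.0 + rng.normal(0.0, sigma_eps, n)
print(Y_fixed.var())   # ~ 0.25  (= sigma_eps^2)

# X random and independent of eps: Var(Y) = beta1^2 * Var(X) + sigma_eps^2
X = rng.normal(0.0, sigma_X, n)
Y_random = beta0 + beta1 * X + rng.normal(0.0, sigma_eps, n)
print(Y_random.var())  # ~ 9.25  (= 4 * 2.25 + 0.25)
```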

Tim