
This is a follow-up question to 1 and 2. So we have the normal linear model \begin{align*} \textbf{Y} = \textbf{X}\beta + \epsilon \end{align*}

where $\epsilon\sim\mathcal{N}(\textbf{0},\sigma^{2}\textbf{I})$, $\mu_{i} = \beta_{0} + \beta_{1}x_{i1} + \ldots + \beta_{p}x_{ip}$ and $\mu = \textbf{X}\beta$. As far as I have understood, we take $n$ observations \begin{align*} Y_{1} & = \beta_{0} + \beta_{1}x_{11} + \ldots + \beta_{p}x_{1p} + \epsilon_{1}\\ Y_{2} & = \beta_{0} + \beta_{1}x_{21} + \ldots + \beta_{p}x_{2p} + \epsilon_{2}\\ &\vdots\\ Y_{n} & = \beta_{0} + \beta_{1}x_{n1} + \ldots + \beta_{p}x_{np} + \epsilon_{n}\\ \end{align*}

and apply, for instance, the least squares method to obtain $\hat{\beta} = (\textbf{X}^{T}\textbf{X})^{-1}\textbf{X}^{T}\textbf{Y}$.
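As a quick sanity check, here is a minimal numerical sketch of this formula on synthetic data (the sample size, coefficients and noise level below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n = 100 observations, p = 2 predictors plus an intercept.
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # column of 1s for beta_0
beta_true = np.array([1.0, 2.0, -0.5])
sigma = 0.3
Y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Least squares: beta_hat = (X^T X)^{-1} X^T Y, computed by solving the
# normal equations rather than forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # should be close to beta_true
```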

The problem that concerns me is the interpretation of this process. Let us suppose, for example, that $Y$ represents income, $x_{1}$ indicates gender and $x_{2}$ stands for age. We draw someone from the target population and obtain the first observation

\begin{align*} Y_{1} = \beta_{0} + \beta_{1}x_{11} + \beta_{2}x_{12} + \epsilon_{1} \end{align*}

After that, we draw another person (with replacement) from the target population and obtain the second observation \begin{align*} Y_{2} = \beta_{0} + \beta_{1}x_{21} + \beta_{2}x_{22} + \epsilon_{2} \end{align*}

We repeat this process until $n$ observations have been made. Once we have $\hat{\beta}$ at hand, we can estimate $\mu$ via $\hat{\mu} = \textbf{X}\hat{\beta}$. Moreover, we can also estimate the variance $\sigma^{2}$ through the estimator \begin{align*} S^{2} = \frac{(\textbf{Y} - \textbf{X}\hat{\beta})^{T}(\textbf{Y} - \textbf{X}\hat{\beta})}{n-p-1} \end{align*}

where it is assumed that $\textbf{X}$ has full column rank, that is, $\operatorname{Rank}(\textbf{X}) = p+1$.
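To make the estimation step concrete, here is the continuation of the sketch above (same hypothetical synthetic data), computing the fitted values $\hat{\mu}$ and the variance estimator $S^{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same synthetic setup as in the previous sketch.
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5])
sigma = 0.3
Y = X @ beta_true + rng.normal(scale=sigma, size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Fitted values and the unbiased variance estimator.
mu_hat = X @ beta_hat                      # hat{mu} = X hat{beta}
residuals = Y - mu_hat
S2 = residuals @ residuals / (n - p - 1)   # S^2 divides by n - p - 1 = 97
print(S2, sigma**2)                        # S2 should be close to sigma^2 = 0.09
```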

My first question is: am I describing the observation process correctly?

My second question is: how should we interpret the distribution $\textbf{Y} = \mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I})$?

The last question may be confusing me because, in the context of inference, we normally assume the sample consists of independent, identically distributed random variables, whereas in the multiple linear regression problem we only assume independence. In other words, the distribution of $\textbf{Y}$ corresponds to the distribution of the sample $(Y_{1},Y_{2},\ldots,Y_{n})$ and the means $\mu_{i}$ need not be the same. Is this correct?

Any help is appreciated. Thanks in advance!

  • The first two-thirds of your post explicitly answers the second question. The first question is almost tautological, but maybe what one could add is that this "observation process" is *one* of many ways where this multiple regression formulation is applicable. For the last question, you have implicitly assumed more than you state at the point where you first refer to "the variance." *What* is this the variance of? Thinking about this ought to help you understand the answer. – whuber May 17 '19 at 15:18
  • The variance $\sigma^{2}$ refers to the distribution of $\textbf{Y}$, right? As for the assumption concerning the distribution of $\textbf{Y}$, each marginal is independent of the others, but they are not identically distributed. Am I reasoning correctly? –  May 17 '19 at 16:12
  • The variance is an *assumed, common* variance among all the $\epsilon_i.$ That's the only way you can justify the estimator $S^2.$ – whuber May 17 '19 at 16:16
  • Thanks for the response, but I am still concerned about my second question. What we are modeling is the sample distribution, where each $Y_{i}$ is independent of the others, but they are not equally distributed. At least that's the conclusion I've reached. –  May 17 '19 at 16:22
  • That's correct: the $Y_i$ are not equally distributed. Your account of that is very clear. Their expectations vary, as given by $X\beta.$ – whuber May 17 '19 at 16:59
  • Thank you very much! Just one more question: can we recover the distribution of $Y$ from the distribution of $\textbf{Y}$? –  May 17 '19 at 19:12
  • I cannot make sense of that question, because I understand your notation to mean $\mathbf{Y}$ is composed of the $Y_i$ as components. What, then, could "$Y$" possibly mean? – whuber May 17 '19 at 19:14
  • In the inference context, given a population whose distribution is $p(x|\theta)$, we make use of the likelihood function $$L(\textbf{x}|\theta) = \prod_{i=1}^{n}p(x_{i}|\theta)$$ in order to estimate the parameter $\theta$, whence we get $\hat{\theta}(\textbf{x})$. Based on this, we can describe the population distribution using $\hat{\theta}$. I do not know if the analogy is valid, but here we have the population distribution $Y\sim\mathcal{N}(\mu,\sigma^{2})$, whereas what we determine is the distribution of the sample $\textbf{Y}$. Am I reasoning right? –  May 17 '19 at 19:26
  • You seem to be using "population distribution" in at least two different senses. Moreover, "$Y\sim\mathcal{N}(\mu,\sigma^{2})$" does not describe the setting of your question. – whuber May 17 '19 at 19:40
  • I am just wondering; maybe I am digressing. Anyway, I thought we could express the distribution of the income, as in the example I proposed, based on the tools of multiple linear regression. –  May 17 '19 at 19:43

1 Answer


Let us be explicit about what it is that you observe and what it is that you do not observe.

You observe $Y_i, x_{i1}, x_{i2}$ for $i=1,\ldots,n.$

You do not observe $\beta_0,$ $\beta_1,$ $\beta_2,$ or $\varepsilon_i.$

What you are calling $\widehat\mu_i$ is usually denoted $\widehat Y_i$ and is called the $i$th fitted value. It is the estimated expected value of a $Y$ value corresponding to the observed values of $x_{i1},x_{i2}.$ And as you say, the values of $\widehat Y_i$ for $i=1,\ldots,n$ in general differ from each other.

I don't understand what you mean by interpreting the distribution of $\textbf{Y} = \mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I}),$ but normally one writes $$ \textbf{Y} \sim \mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I}) $$ rather than $$ \textbf{Y} = \mathcal{N}(\textbf{X}\beta,\sigma^{2}\textbf{I}). $$ One could say that for any measurable set $A\subseteq\mathbb R^n$ one has \begin{align} \Pr(\textbf{Y}\in A) = \frac 1 {\sqrt{(2\pi)^n\det(\sigma^2 \textbf{I})}} \int\limits_A \exp\left( \frac{-1}2 (\textbf{y} - \textbf{X}\beta)^\top \left( \sigma^2 \textbf{I}\right)^{-1} ( \textbf{y} - \textbf{X}\beta) \right) \, d\textbf{y}. \end{align}
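Since the covariance matrix is $\sigma^{2}\textbf{I},$ one has $\det(\sigma^{2}\textbf{I}) = \sigma^{2n}$ and the quadratic form reduces to $\sum_{i=1}^{n}(y_{i}-\mu_{i})^{2}/\sigma^{2},$ so the integrand factors into $n$ univariate normal densities: the $Y_{i}$ are independent with $Y_{i}\sim\mathcal{N}(\mu_{i},\sigma^{2}).$ Here is a small numerical check of that factorization (the dimensions and values below are arbitrary, for illustration only):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(1)

# Toy check: with covariance sigma^2 * I, the joint normal density of Y
# equals the product of n univariate N(mu_i, sigma^2) densities.
n = 4
mu = rng.normal(size=n)      # plays the role of X beta
sigma = 0.5
y = rng.normal(size=n)       # an arbitrary point at which to evaluate the density

joint = multivariate_normal(mean=mu, cov=sigma**2 * np.eye(n)).pdf(y)
product = np.prod(norm.pdf(y, loc=mu, scale=sigma))
print(joint, product)        # the two values agree
```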

Michael Hardy
  • That's exactly my doubt: how do we interpret that integral? –  May 20 '19 at 22:18
  • @user1337 : I'm still not sure how to understand your last question. Maybe this will shed some light. One writes $\textbf{Y} \sim \operatorname N(\textbf{X}\beta, \sigma^2 I_n),$ but one could just as well write $\textbf{Y} = \textbf{X}\beta + \varepsilon$ and $\varepsilon \sim\operatorname N(0,\sigma^2 I_n).$ To say that $\varepsilon \sim\operatorname N(0,\sigma^2 I_n)$ means that the scalar components $\varepsilon_1,\ldots,\varepsilon_n$ are jointly normally distributed and each has expectation $0$ and variance $\sigma^2$ and the covariance between any two of them is $0. \qquad$ – Michael Hardy May 20 '19 at 23:25
  • First of all, thanks for the response. My question is: how should we interpret each marginal distribution? Does it give the distribution corresponding to each draw? –  May 20 '19 at 23:49