
What is the normal equation for multivariate linear regression?

In the case of univariate linear regression, using ordinary least squares, one can obtain $\theta^* = \text{argmin}_{\theta} \sum_{i=1}^m (\theta^T X_i - Y_i)^2$ with the closed form $\theta^* = (X^TX)^{-1}X^TY$ (proof: see (1) below).
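For concreteness, here is a minimal NumPy sketch on synthetic data (all names and numbers below are illustrative, not from the original post) checking that this closed form agrees with a standard least-squares solver:

```python
import numpy as np

# Minimal sketch (synthetic data): the closed form
# theta* = (X^T X)^{-1} X^T Y versus a standard least-squares solver.
rng = np.random.default_rng(0)
m, p = 100, 3                                     # m samples, p features
X = rng.standard_normal((m, p))
theta_true = np.array([1.5, -2.0, 0.5])
Y = X @ theta_true + 0.1 * rng.standard_normal(m)

theta_closed = np.linalg.solve(X.T @ X, X.T @ Y)  # normal equation
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(theta_closed, theta_lstsq))     # True
```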

I wonder how to get a closed form for $\theta$ when $Y_i$ is a vector, i.e., in the case of multivariate linear regression. (I know one could use gradient descent instead.)

The minimization criterion I'd like to use for the multivariate linear regression is the following: $$\theta^* = \text{argmin}_{\theta} \sum_{i=1}^m{|| \theta X_i - Y_i ||^2}$$


(1) Proof of the normal equation:

Using matrix notation, the sum of squared residuals is given by

$$S(b) = (y-Xb)'(y-Xb)$$

Since $S(b)$ is a quadratic expression bounded below by $0$, its global minimum is attained at a stationary point, which we find by differentiating with respect to $b$:

$$0 = \frac{dS}{db'}(\hat\beta) = \frac{d}{db'}\bigg(y'y - b'X'y - y'Xb + b'X'Xb\bigg)\bigg|_{b=\hat\beta} = -2X'y + 2X'X\hat\beta$$

By assumption, matrix $X$ has full column rank, and therefore $X'X$ is invertible and the least squares estimator for $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'y$$

(Longer version of the proof)
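The differentiation step above can also be sanity-checked numerically. The sketch below (synthetic data, illustrative only) compares the analytic gradient $-2X'y + 2X'Xb$ against central finite differences of $S(b)$:

```python
import numpy as np

# Sketch (synthetic data): check dS/db = -2 X'y + 2 X'X b against
# central finite differences of S(b) = (y - Xb)'(y - Xb).
rng = np.random.default_rng(1)
m, p = 50, 4
X = rng.standard_normal((m, p))
y = rng.standard_normal(m)
b = rng.standard_normal(p)

def S(b):
    r = y - X @ b
    return r @ r

grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ b

eps = 1e-6
grad_numeric = np.array([(S(b + eps * e) - S(b - eps * e)) / (2 * eps)
                         for e in np.eye(p)])

print(np.allclose(grad_analytic, grad_numeric))  # True
```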

Franck Dernoncourt
  • In the multivariate case, you have to determine the minimization criterion first, since $Y - XB$ is no longer a scalar. The typical trace or determinant criteria give the same normal equation. – Zhanxiong Oct 28 '15 at 02:49
  • @Zhanxiong Thanks, I have added the minimization criterion. – Franck Dernoncourt Oct 28 '15 at 02:53

1 Answer


Let me clarify your model by specifying the dimensions of the response, predictor, and parameter. $$Y_i = B^T X_i + E_i, \quad i = 1, 2, \ldots, m, \tag{1}$$ where $Y_i$ is a $q \times 1$ column vector, $X_i$ is a $p \times 1$ column vector, and $B$ is a $p \times q$ matrix (in contrast to univariate linear regression, where $\beta$ is a $p$-vector); $E_i$ is a $q \times 1$ error vector. In matrix form, $(1)$ is equivalent to $$Y = XB + E,$$ where \begin{align*} & Y = \begin{bmatrix} Y_1^T \\ \vdots \\ Y_m^T \end{bmatrix} \in \mathbb{R}^{m \times q}, \\ & X = \begin{bmatrix} X_1^T \\ \vdots \\ X_m^T \end{bmatrix} \in \mathbb{R}^{m \times p}, \\ & E = \begin{bmatrix} E_1^T \\ \vdots \\ E_m^T \end{bmatrix} \in \mathbb{R}^{m \times q}. \end{align*}
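A small shape check of this stacked form (the dimensions match the notation above; the numbers are arbitrary):

```python
import numpy as np

# Shape check for the stacked form Y = XB + E
# (m samples, p predictors, q responses).
m, p, q = 100, 3, 2
rng = np.random.default_rng(2)
X = rng.standard_normal((m, p))        # rows are X_i^T
B = rng.standard_normal((p, q))        # parameter matrix
E = 0.1 * rng.standard_normal((m, q))  # rows are E_i^T
Y = X @ B + E                          # rows are Y_i^T
print(Y.shape)                         # (100, 2)
```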

Based on your description (taking $\|\cdot\|$ to be the $L^2$ norm), what you want to minimize is $$\sum_{i = 1}^m \|Y_i - B^T X_i\|^2 = \sum_{i = 1}^m (Y_i - B^TX_i)^T(Y_i - B^TX_i) = \text{tr}\big((Y - XB)^T(Y - XB)\big).$$ Expanding the trace gives $$\text{tr}(Y^TY) - 2\,\text{tr}(B^TX^TY) + \text{tr}(B^TX^TXB).$$ Differentiate this with respect to $B$, using the identities $\partial\,\text{tr}(B^TC)/\partial B = C$ and $\partial\,\text{tr}(B^TZB)/\partial B = (Z + Z^T)B = 2ZB$ for symmetric $Z$, and set the result to $0$: $$-2X^TY + 2X^TXB = 0,$$ which is the normal equation $$X^TXB = X^TY.$$ When $X$ has full column rank, this yields $\hat B = (X^TX)^{-1}X^TY$, exactly as in the univariate case.
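A minimal sketch (synthetic data, illustrative only) that solves this normal equation with NumPy; it also shows that the multivariate fit decouples into $q$ separate univariate fits, one per column of $Y$, since $X^TX B = X^TY$ holds column by column:

```python
import numpy as np

# Sketch (synthetic data): solve the normal equation X'X B = X'Y and
# check that it decouples into q separate univariate fits.
rng = np.random.default_rng(3)
m, p, q = 200, 4, 3
X = rng.standard_normal((m, p))
B_true = rng.standard_normal((p, q))
Y = X @ B_true + 0.05 * rng.standard_normal((m, q))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # p x q estimate
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, j])
                          for j in range(q)])

print(np.allclose(B_hat, B_cols))                # True
```

(`np.linalg.lstsq(X, Y, rcond=None)` accepts a two-dimensional $Y$ and returns the same $\hat B$ directly.)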

Zhanxiong