
I am stuck trying to understand the basic calculation of ordinary least squares. From wikipedia

$$y = \beta X^T + \varepsilon$$

where $X$ is the independent variable, $y$ is the dependent variable, and $X^T$ denotes the transpose of $X$.

Why are we taking the transpose? In a simple linear function $f(x,y): y = ax + b$, we do not use the transpose of $x$; we just use the variable $x$ as is. So I am not sure why, in linear regression, we need to take the transpose of $X$.

Tim
Victor
  • Hint: What are the dimensions of the matrices and vectors involved? They will determine which multiplications even make sense. – whuber Jan 02 '15 at 16:30
  • A depiction of $X$ appears in the question at http://stats.stackexchange.com/questions/117406: perhaps that clears things up? – whuber Jan 02 '15 at 16:43
  • Also useful will be to investigate what it means for matrices to be _conformable for multiplication_. – Graeme Walsh Jan 02 '15 at 16:49
  • I don't see the expression you're using anywhere in the article you link to. Where did you see it? What are the dimensions of $x$ and $\beta$ in your equation? – Glen_b Jan 03 '15 at 00:47

1 Answer


Let me begin by saying it seems as though you're not being very careful when you write mathematical expressions. Nevertheless, I think I can understand what you're asking, so I will try to answer.

To begin, we ought to straighten out the notation being used. Doing this will enable us to communicate more clearly and identify any misunderstandings that may exist.

When you write $f(x,y): y = ax + b$, I take it that you really mean $$ y = ax + b $$ where $a$ and $b$ are real constants (although, strictly speaking, they need not be real). Furthermore, $y$ is a function of only one variable, $x$, so we would write $f(x)$ rather than $f(x,y)$. Pay particular attention to the fact that the commutative property of multiplication holds when multiplying $a$ and $x$: the order in which $a$ and $x$ are multiplied does not matter.

Now enter the other equation; namely, what you've written as $y = \beta X^{T} + \epsilon$. This is matrix notation, and it has a different meaning from the (more everyday?) scalar algebraic notation of the previous paragraph. The linear regression model is, in fact, more commonly written in the form $$ y = X \beta + \epsilon $$ where $y$ is a $(T \times 1)$ vector, $X$ is a $(T \times K)$ matrix, $\beta$ is a $(K \times 1)$ vector, and $\epsilon$ is a $(T \times 1)$ vector. The notation in parentheses gives the dimensions of each vector or matrix: the first number is the number of rows and the second the number of columns. So, for example, $X$ is a matrix with $T$ rows and $K$ columns.
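
To make these dimensions concrete, here is a minimal NumPy sketch (my own illustration, not from your question; the values $T = 5$ and $K = 2$ are arbitrary):

```python
import numpy as np

T, K = 5, 2                    # T observations, K regressors (illustrative values)
X = np.random.randn(T, K)      # design matrix, shape (T, K)
beta = np.random.randn(K, 1)   # coefficient vector, shape (K, 1)
eps = np.random.randn(T, 1)    # error vector, shape (T, 1)

y = X @ beta + eps             # (T, K) @ (K, 1) -> (T, 1)
print(y.shape)                 # (5, 1)
```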

Note the distinction! Here $X$ is a matrix and $\beta$ is a vector, unlike $a$ and $x$, which are just scalar values. Crucially, the commutative property of multiplication is not generally satisfied in matrix algebra: in general, $X\beta \neq \beta X$. Indeed, one of the two products may not even be defined, since matrices must be conformable for multiplication before they can be multiplied.
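
The failure of commutativity is easy to check numerically with any two square matrices; a quick sketch (again, purely illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

print(A @ B)  # [[2. 1.]
              #  [4. 3.]]
print(B @ A)  # [[3. 4.]
              #  [1. 2.]]  (so A @ B != B @ A)
```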

For two matrices to be conformable for multiplication, the number of columns of the left matrix must equal the number of rows of the right matrix. In the linear regression model, $X \beta$ is possible because $X$, the left matrix, has $K$ columns and $\beta$, the right matrix, has $K$ rows. On the other hand, $\beta X$ is not possible because $\beta$, the left matrix, has $1$ column while $X$, the right matrix, has $T$ rows, so the product is undefined unless $T = 1$.
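
The same point in code: with the shapes above, `X @ beta` is well defined, while `beta @ X` is not (a sketch assuming the same illustrative $T = 5$ and $K = 2$):

```python
import numpy as np

T, K = 5, 2
X = np.random.randn(T, K)      # left matrix in X @ beta: K columns
beta = np.random.randn(K, 1)   # right matrix in X @ beta: K rows

print((X @ beta).shape)        # (5, 1): inner dimensions match (K and K)

try:
    beta @ X                   # inner dimensions 1 and T do not match (T != 1)
except ValueError as err:
    print("beta @ X is not conformable:", err)
```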

At this stage, please revisit what you meant when you wrote $y = \beta X^{T} + \epsilon$.

Lastly, the Wikipedia page that you link us to shows that the linear regression model can also be written in a form involving a transpose: $$ y_{i} = x_{i}^{T} \beta + \epsilon_{i} $$ where both $x_{i}$ and $\beta$ are column vectors of dimension $(p \times 1)$.
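
If it helps, you can also verify numerically that this row-wise form agrees with the matrix form above: the $i$-th entry of $X\beta$ is exactly $x_{i}^{T}\beta$ (an illustrative sketch):

```python
import numpy as np

T, p = 5, 2
X = np.random.randn(T, p)      # each row X[i] plays the role of x_i^T
beta = np.random.randn(p)

i = 3
print(np.allclose(X[i] @ beta, (X @ beta)[i]))  # True: x_i^T beta is the i-th entry of X beta
```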

If anything I've written has made sense, you should be able to decipher why it is $x_{i}^{T} \beta$ and not $x_{i} \beta$.

Graeme Walsh