Why is the transpose related to the dual space?

Question

A real matrix $M$ with $m$ rows and $n$ columns is a map $M : \mathbb{R}^n \to \mathbb{R}^m$; its transpose is then a map $M^T: \mathbb{R}^m \to \mathbb{R}^n$.

However, it seems like the transpose is related to the dual space, i.e., here $M^T: (\mathbb{R}^{m})^* \to (\mathbb{R}^{n})^*.$ Intuitively, why is it natural to associate the transpose with the dual space? It doesn't seem to fit our usual intuition for matrices.

@BenGrossmann instead of mapping from the codomain to the domain, it maps from the dual of the codomain to the dual of the domain — a6623, Nov 11 '22 at 19:02
The transpose of a matrix is best associated with the concept of the adjoint, from functional analysis. Given a linear map $T : X \to Y$, where $X$ and $Y$ are normed linear spaces (or even topological vector spaces), the adjoint $T^*$ is a linear map from $Y^*$ to $X^*$, defined by $T^*(f) = f \circ T$, where $\circ$ is regular function composition. Once you filter that down to the matrix level, interpreting the elements of $\Bbb{R}^n$ as column vectors and the elements of the dual as row vectors, with matrix multiplication being their pairing, this just turns into the transpose of the matrix. — Theo Bendit, Nov 11 '22 at 19:06
Here's an [old answer of mine](https://math.stackexchange.com/questions/3306408/row-and-column-space-connection-in-adjoint-dot-product/3306463#3306463) that isn't exactly an answer to this question, but it might be of some use. — Theo Bendit, Nov 11 '22 at 19:13
@TheoBendit could you elaborate a little more on how $T^*$, or in this case, $M^*$ would end up being the same as the transpose of the matrix? Right now, I have that $M^*(f) = fM$ where $fM$ denotes matrix multiplication. Which means, given a row vector $f$ of dimension $m$, then I multiply it by the matrix $M$. Working with a small example (e.g., $m = 2$ and $n = 3$ and the elements $1, 2, 3, 4, 5, 6$), I'm struggling to turn this into a transpose. — a6623, Nov 11 '22 at 19:28
@a6623 Yes, I agree with you so far. I was about to explain further, but my explanation was circular. Yes, the adjoint, from a functional analysis perspective, even with these conventions, is just right-multiplication by $M$. The transpose comes out when we turn this back into a map from $\Bbb{R}^m$ to $\Bbb{R}^n$ (rather than their duals). You need some way of taking row vectors and turning them into column vectors. The conventional way is with the transpose, but there's nothing in the explanation so far that mandates this method. So, I think the answer is, "it's convention". — Theo Bendit, Nov 11 '22 at 19:38
This convention appears in Ben's (excellent) answer in the form of taking "standard" bases for $\Bbb{R}^m$ and $\Bbb{R}^n$ and their duals. But, the only solid reason to use these bases is aesthetic, which is once again the reason we prefer the transpose. — Theo Bendit, Nov 11 '22 at 19:40
@TheoBendit I appreciate the compliment! I wouldn't say that aesthetics are the *only* solid reason (though that is the primary reason I lean on for the purposes of my answer). Really, the standard dual basis is the [dual basis](https://en.wikipedia.org/wiki/Dual_basis) to the standard basis. The aesthetic niceness of the standard basis is a direct consequence of the definition of multiplication and the definition of a matrix relative to a pair of bases. — Ben Grossmann, Nov 11 '22 at 19:49
@BenGrossmann Yeah, I think I will stick by my admittedly bold statement. Even the idea of a dual basis is aesthetic; what would be wrong with a basis whose dot products came out to be $2\delta_{ij}$ instead? Or indeed any fixed invertible matrix? Don't get me wrong, the aesthetics are very persuasive. Everything else certainly *feels* artificial, in the same way that expressing ordered $n$-tuples as coordinates in terms of non-standard bases feels artificial, just due to the way we notate said $n$-tuples. But, we could step around this if we wanted. — Theo Bendit, Nov 11 '22 at 19:54
@TheoBendit You make a strong argument, I just hate the conclusion :). I'd argue that the incentive is more about "convenience" than "aesthetics", but that doesn't make me feel any better about it — Ben Grossmann, Nov 11 '22 at 19:58
[Another related post](https://math.stackexchange.com/a/1138700/81360) — Ben Grossmann, Nov 11 '22 at 20:04

Ben Grossmann · Accepted Answer · 2022-11-11T19:59:21.283

Here's one perspective.

Elements of $\Bbb R^n$ are typically considered to be "column vectors". That is, elements of $\Bbb R^n$ are represented by or are thought of as $n \times 1$ matrices. That said, we can treat the elements of any vector space as column vectors if we represent vectors with their coordinate vectors relative to some basis: given a basis $\mathcal B = \{v_1,\dots,v_n\}$ of a vector space $V$, $(a_1,\dots,a_n)^T$ is the coordinate vector corresponding to the vector $v = a_1 v_1 + \cdots + a_n v_n$. For a linear map $\phi:V \to W$ and bases $\mathcal B_1, \mathcal B_2$ of $V,W$ respectively, the matrix representation of $\phi$ relative to the bases $\mathcal B_1$ and $\mathcal B_2$ is the matrix $[\phi]$ such that if $x$ is the coordinate vector of $v$, then $[\phi]x$ is the coordinate vector of $\phi(v)$. Notationally, we can write $$ [\phi]^{\mathcal B_1}_{\mathcal B_2} [v]_{\mathcal B_1} = [\phi(v)]_{\mathcal B_2}. $$ When an $m \times n$ matrix $M$ is treated as a linear map, $M$ is the matrix relative to the standard bases $\mathcal B_1, \mathcal B_2$ of $\Bbb R^n,\Bbb R^m$ (i.e. $\mathcal B_1 = \{(1,0,\dots,0)^T,\dots,(0,\dots,0,1)^T\}$).

On the other hand, elements of $(\Bbb R^n)^*$ can naturally be thought of as row-vectors. Indeed, the matrix of a linear map from $\Bbb R^n$ to $\Bbb R$ is a $1 \times n$ matrix, which is to say a "row vector". As you noted, any map $M:\Bbb R^n \to \Bbb R^m$ induces a "dual map" (which I will call $M^*$) from $(\Bbb R^m)^*$ to $(\Bbb R^n)^*$. This map is defined by $$ M^*(r) = rM, \qquad r \in \Bbb R^{1 \times m}. $$ Now, what happens when we try to treat our row vectors as column vectors? That is, what is the matrix representation of $M^*$?
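To make the dual map concrete, here is a minimal sketch in plain Python. It assumes the $2 \times 3$ matrix with entries $1,\dots,6$ suggested in the comments (the arrangement of the entries is my choice), and the helper names `row_times_matrix`, `matrix_times_col`, and `pair` are made up for illustration. It checks the defining property $(M^*(r))(x) = r(Mx)$ for one sample $r$ and $x$.

```python
# Illustrative sketch: plain lists stand in for row and column vectors.
M = [[1, 2, 3],
     [4, 5, 6]]  # 2 x 3 matrix, so M maps R^3 to R^2

def row_times_matrix(r, M):
    """Dual map M^*(r) = rM: a length-m row vector becomes a length-n row vector."""
    return [sum(r[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def matrix_times_col(M, x):
    """Ordinary action of M on a column vector x (stored as a flat list)."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def pair(r, x):
    """Pairing of a row vector with a column vector, i.e. the 1 x 1 product r x."""
    return sum(a * b for a, b in zip(r, x))

r = [1, -1]      # a functional on R^2, written as a row vector
x = [2, 0, 1]    # a vector in R^3

lhs = pair(row_times_matrix(r, M), x)   # (M^* r)(x)
rhs = pair(r, matrix_times_col(M, x))   # r(M x)
print(row_times_matrix(r, M), lhs, rhs)
```

Both pairings agree because matrix multiplication is associative: $(rM)x = r(Mx)$.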

Note that $(\Bbb R^m)^*$ and $(\Bbb R^n)^*$ are $m$ and $n$ dimensional spaces respectively, so this matrix needs to be an $n \times m$ matrix, which as you note is a map from $\Bbb R^m$ to $\Bbb R^n$. To find a matrix representation, we first need a basis. The "natural" choice of basis for the vector space of row vectors is the "standard dual basis", where (reusing the names $\mathcal B_1, \mathcal B_2$, now for bases of the dual spaces) we take $$ \mathcal B_1 = \{(1,0,\dots,0),\dots,(0,\dots,0,1)\} \subset \Bbb R^{1 \times m} $$ and $\mathcal B_2 \subset \Bbb R^{1 \times n}$ similarly. The nice thing about this choice of basis is that for any row-vector $r \in \Bbb R^{1 \times m}$, the coordinate vector of $r$ is simply the column-vector $[r]_{\mathcal B_1} = r^T$.

As it turns out, the matrix $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ relative to this basis is the transpose, $M^T$! That's why we think of the transpose as a representation of the dual map.

Here is a proof confirming that this is the case. Let $\{e_1^{(m)},\dots,e_m^{(m)}\}$ denote the canonical basis of $\Bbb R^m$ (and $\{e_1^{(n)},\dots,e_n^{(n)}\}$ that of $\Bbb R^n$), so that $\{e_1^{(m)T},\dots,e_m^{(m)T}\}$ is the canonical dual basis. To get the $j$th column of a matrix representation of a map, we can look at what that map does to the $j$th element of the basis. In this case, we have $$ M^*(e_j^{(m)T}) = e_j^{(m)T}M = \pmatrix{m_{j1} & \cdots & m_{jn}} = m_{j1} e_1^{(n)T} + \cdots + m_{jn} e_n^{(n)T}. $$ So, the matrix representation $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ should take the coordinate vector corresponding to $e_j^{(m)T}$ (which is simply $e_j^{(m)}$) to the vector $y = \pmatrix{m_{j1} & \cdots & m_{jn}}^T$. This amounts to saying that the $j$th column of $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ is equal to $y$, which is the $j$th row of $M$ written as a column.
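As a concrete instance of this computation, take $m = 2$, $n = 3$ and the entries $1,\dots,6$ mentioned in the comments (arranging them row by row is an assumption made here for illustration), so that $M = \pmatrix{1 & 2 & 3 \\ 4 & 5 & 6}$. Then $$ M^*(e_1^{(2)T}) = \pmatrix{1 & 0}M = \pmatrix{1 & 2 & 3}, \qquad M^*(e_2^{(2)T}) = \pmatrix{0 & 1}M = \pmatrix{4 & 5 & 6}. $$ Writing these outputs as coordinate columns relative to the standard dual basis of $\Bbb R^{1 \times 3}$ gives the two columns of the representation: $$ [M^*]^{\mathcal B_1}_{\mathcal B_2} = \pmatrix{1 & 4 \\ 2 & 5 \\ 3 & 6}. $$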

In other words, $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ is the matrix whose columns are the rows of $M$, which is to say that $[M^*]^{\mathcal B_1}_{\mathcal B_2} = M^T$.
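The same conclusion can be checked numerically. The following is a rough sketch in plain Python, again assuming the illustrative $2 \times 3$ matrix with entries $1,\dots,6$; the names `dual_map`, `std_row`, and `rep` are hypothetical. It builds the representation column by column from the images of the standard dual basis, then checks the coordinate identity $[M^*]\,r^T = (rM)^T$ on a sample row vector.

```python
# Illustrative sketch: verify that the matrix of M^* in the standard dual bases is M^T.
M = [[1, 2, 3],
     [4, 5, 6]]
m, n = len(M), len(M[0])

def dual_map(r):
    """M^*(r) = rM for a row vector r of length m."""
    return [sum(r[i] * M[i][j] for i in range(m)) for j in range(n)]

def std_row(j, size):
    """j-th standard dual basis vector (0-indexed) as a row vector."""
    return [1 if k == j else 0 for k in range(size)]

# Column j of the representation holds the coordinates of M^*(e_j^T).
# In the standard dual basis, the coordinates of a row vector are just its entries.
columns = [dual_map(std_row(j, m)) for j in range(m)]
rep = [[columns[j][i] for j in range(m)] for i in range(n)]  # n x m matrix

transpose = [list(col) for col in zip(*M)]

# Coordinate identity on a sample row vector: rep applied to r^T equals (rM)^T.
r = [3, -2]
rep_times_r = [sum(rep[i][j] * r[j] for j in range(m)) for i in range(n)]
print(rep == transpose, rep_times_r == dual_map(r))
```

Nothing here depends on the particular entries of `M`; the check relies only on using the standard dual bases, which is exactly the convention discussed in the comments.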
