Here's one perspective.
Elements of $\Bbb R^n$ are typically considered to be "column vectors". That is, elements of $\Bbb R^n$ are represented by, or thought of as, $n \times 1$ matrices. That said, we can treat the elements of any finite-dimensional vector space as column vectors if we represent vectors by their coordinate vectors relative to some basis: given a basis $\mathcal B = \{v_1,\dots,v_n\}$ of a vector space $V$, $(a_1,\dots,a_n)^T$ is the coordinate vector corresponding to the vector $v = a_1 v_1 + \cdots + a_n v_n$. For a linear map $\phi:V \to W$ and bases $\mathcal B_1, \mathcal B_2$ of $V,W$ respectively, the matrix representation of $\phi$ relative to the bases $\mathcal B_1$ and $\mathcal B_2$ is the matrix $[\phi]$ such that if $x$ is the coordinate vector of $v$, then $[\phi]x$ is the coordinate vector of $\phi(v)$. Notationally, we can write
$$
[\phi]^{\mathcal B_1}_{\mathcal B_2} [v]_{\mathcal B_1} = [\phi(v)]_{\mathcal B_2}.
$$
When an $m \times n$ matrix $M$ is treated as a linear map, $M$ is the matrix relative to the standard bases $\mathcal B_1, \mathcal B_2$ of $\Bbb R^n,\Bbb R^m$ (i.e. $\mathcal B_1 = \{(1,0,\dots,0)^T,\dots,(0,\dots,0,1)^T\}$).
On the other hand, elements of $(\Bbb R^n)^*$ can naturally be thought of as row-vectors. Indeed, the matrix of a linear map from $\Bbb R^n$ to $\Bbb R$ is a $1 \times n$ matrix, which is to say a "row vector". As you noted, any map $M:\Bbb R^n \to \Bbb R^m$ induces a "dual map" (which I will call $M^*$) from $(\Bbb R^m)^*$ to $(\Bbb R^n)^*$. This map is defined by
$$
M^*(r) = rM, \qquad r \in \Bbb R^{1 \times m}.
$$
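To make the definition concrete, here is a minimal sketch in plain Python. The names `dual_map` and `apply` are hypothetical helpers for illustration, and the matrix and vectors are arbitrary choices of mine. It computes $rM$ and checks the property that characterizes the dual map, namely $(M^*r)(x) = r(Mx)$.

```python
# Hypothetical helpers for illustration; matrices are lists of rows.

def apply(M, x):
    """Apply the m x n matrix M to a column vector x in R^n, returning Mx in R^m."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def dual_map(M, r):
    """Apply the dual map M* to a row vector r in R^{1 x m}, returning rM in R^{1 x n}."""
    m, n = len(M), len(M[0])
    return [sum(r[i] * M[i][j] for i in range(m)) for j in range(n)]

def pair(r, x):
    """Evaluate the functional (row vector) r on the column vector x."""
    return sum(ri * xi for ri, xi in zip(r, x))

M = [[1, 2, 3],
     [4, 5, 6]]    # 2 x 3, so M maps R^3 -> R^2
r = [1, -1]        # a row vector, i.e. a functional on R^2
x = [1, 0, 2]      # a column vector in R^3

rM = dual_map(M, r)      # [-3, -3, -3], a functional on R^3
lhs = pair(rM, x)        # (M* r)(x)
rhs = pair(r, apply(M, x))  # r(Mx)
print(rM, lhs, rhs)      # [-3, -3, -3] -9 -9
```

Note that $M^*$ goes "backwards": it eats a functional on $\Bbb R^2$ and returns a functional on $\Bbb R^3$.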
Now, what happens when we try to treat our row vectors as column vectors? That is, what is the matrix representation of $M^*$?
Note that $(\Bbb R^m)^*$ and $(\Bbb R^n)^*$ are $m$- and $n$-dimensional spaces respectively, so this matrix needs to be an $n \times m$ matrix, which as you note is a map from $\Bbb R^m$ to $\Bbb R^n$. To find a matrix representation, we first need a basis for each space. The "natural" choice of basis for the vector space of row vectors is the "standard dual basis", where we take
$$
\mathcal B_1 = \{(1,0,\dots,0),\dots,(0,\dots,0,1)\} \subset \Bbb R^{1 \times m}
$$
and $\mathcal B_2 \subset \Bbb R^{1 \times n}$ similarly. The nice thing about this choice of basis is that for any row-vector $r \in \Bbb R^{1 \times m}$, the coordinate vector of $r$ is simply the column-vector $[r]_{\mathcal B_1} = r^T$.
As it turns out, the matrix $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ relative to this basis is the transpose, $M^T$! That's why we think of the transpose as a representation of the dual map.
Here is a proof confirming that this is the case. Let $\{e_1^{(m)},\dots,e_m^{(m)}\}$ denote the canonical basis of $\Bbb R^m$, so that $\{e_1^{(m)T},\dots,e_m^{(m)T}\}$ is the canonical dual basis, and define $e_1^{(n)},\dots,e_n^{(n)}$ analogously for $\Bbb R^n$. To get the $j$th column of a matrix representation of a map, we can look at what that map does to the $j$th element of the basis. In this case, we have
$$
M^*(e_j^{(m)T}) = e_j^{(m)T}M = \pmatrix{m_{j1} & \cdots & m_{jn}} = m_{j1} e_1^{(n)T} + \cdots + m_{jn} e_n^{(n)T}.
$$
So, the matrix representation $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ should take the coordinate vector corresponding to $e_j^{(m)T}$ (which is simply $e_j^{(m)}$) to the vector $y = \pmatrix{m_{j1} & \cdots & m_{jn}}^T$. This amounts to saying that the $j$th column of $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ is equal to $y$, which is the $j$th row of $M$ written as a column.
In other words, $[M^*]^{\mathcal B_1}_{\mathcal B_2}$ is the matrix whose columns are the rows of $M$, which is to say that $[M^*]^{\mathcal B_1}_{\mathcal B_2} = M^T$.
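If you'd like to see the argument run on numbers, here is a rough sketch in plain Python that follows the proof step by step: it feeds each standard dual basis vector $e_j^{(m)T}$ through $M^*$, uses the result as the $j$th column, and compares with the transpose. The function names (`dual_map`, `matrix_of_dual`, `transpose`) and the sample matrix are my own illustrative choices.

```python
# Hypothetical helpers for illustration; matrices are lists of rows.

def dual_map(M, r):
    """Apply M* to a row vector r in R^{1 x m}, returning rM in R^{1 x n}."""
    m, n = len(M), len(M[0])
    return [sum(r[i] * M[i][j] for i in range(m)) for j in range(n)]

def matrix_of_dual(M):
    """Build the n x m matrix of M* relative to the standard dual bases."""
    m, n = len(M), len(M[0])
    columns = []
    for j in range(m):
        # j-th standard dual basis vector e_j^T, as a row vector of length m
        e_j = [1 if i == j else 0 for i in range(m)]
        # Its image e_j^T M is the j-th row of M; its coordinate vector
        # (the same entries, read as a column) is the j-th column we want.
        columns.append(dual_map(M, e_j))
    # Assemble the n x m matrix whose j-th column is columns[j].
    return [[columns[j][i] for j in range(m)] for i in range(n)]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

M = [[1, 2, 3],
     [4, 5, 6]]              # 2 x 3

D = matrix_of_dual(M)
print(D)                     # [[1, 4], [2, 5], [3, 6]]
print(D == transpose(M))     # True
```

Of course a single example proves nothing; the point is only that the code never mentions the transpose when building `D`, and yet $M^T$ comes out.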