Let us denote $X^\top X$ by $A$. By construction, it is a $n\times n$ square symmetric positive semi-definite matrix, i.e. it has an eigenvalue decomposition $A=V\Lambda V^\top$, where $V$ is the matrix of eigenvectors (each column is an eigenvector) and $\Lambda$ is a diagonal matrix of non-negative eigenvalues $\lambda_i$ sorted in the descending order.
You want to maximize $$\operatorname{Tr}(D^\top A D),$$ where $D$ has $l$ orthonormal columns. Let us write it as $$\operatorname{Tr}(D^\top V\Lambda V^\top D)=\operatorname{Tr}(\tilde D^\top\Lambda \tilde D)=\operatorname{Tr}\big(\tilde D^\top \operatorname{diag}\{\lambda_i\}\, \tilde D\big)=\sum_{i=1}^n\lambda_i\sum_{j=1}^l\tilde D_{ij}^2.$$
This algebraic manipulation corresponds to rotating the coordinate frame such that $A$ becomes diagonal. The matrix $D$ gets transformed as $\tilde D=V^\top D$ which also has $l$ orthonormal columns. And the whole trace is reduced to a linear combination of eigenvalues $\lambda_i$.
What can we say about the coefficients $a_i=\sum_{j=1}^l\tilde D_{ij}^2$ in this linear combination? They are row sums of squares in $\tilde D$, and hence (i) they are all $\le 1$ and (ii) they sum to $l$. If so, then it is rather obvious that to maximize the sum, one should take these coefficients to be $(1,\ldots, 1, 0, \ldots, 0)$, simply selecting the top $l$ eigenvalues. Indeed, if e.g. $a_1<1$ then the sum will increase if we set $a_1=1$ and reduce the size of the last non-zero $a_i$ term accordingly.
This means that the maximum will be achieved if $\tilde D$ is the first $l$ columns of the identity matrix. And accordingly if $D$ is the first $l$ columns of $V$, i.e. the first $l$ eigenvectors. QED.
(Of course this is a not a unique solution. $D$ can be rotated/reflected with any $l\times l$ orthogonal matrix without changing the value of the trace.)
This is very close to my answer in Why does PCA maximize total variance of the projection? This reasoning follows @whuber's comment in that thread:
[I]s it not intuitively obvious that given a collection of wallets of various amounts of cash (modeling the non-negative eigenvalues), and a fixed number $k$ that you can pick, that selecting the $k$ richest wallets will maximize your total cash? The proof that this intuition is correct is almost trivial: if you haven't taken the $k$ largest, then you can improve your sum by exchanging the smallest one you took for a larger amount.