Given a covariance matrix $\mathbf{\Sigma}$, the first principal component $u_1$ is the unit vector that maximizes the variance $u_1' \mathbf{\Sigma} u_1$. Is there an analogous quantity that the first $k$ principal components optimize when taken together? In other words, what do we maximize or minimize when we extract these principal components greedily?
One thought is that the first $k$ principal components span the $k$-dimensional subspace that maximizes the sum of squared norms of the projections of the data vectors onto it. For $k = 1$ this reduces to the variance maximization above: the squared norm of the projection of a centered data point $x$ onto $u_1$ is $(u_1' x)^2$, and its average over the data is exactly $u_1' \mathbf{\Sigma} u_1$. However, I'm not sure whether this intuition, or something else, holds in general.
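To make the intuition concrete, here is a minimal numerical sketch (my own, not from any reference): among orthonormal bases $U \in \mathbb{R}^{d \times k}$, it compares the captured variance $\mathrm{tr}(U' \mathbf{\Sigma} U)$ for the top-$k$ eigenvector basis against many random subspaces. The helper names `projected_variance` and `random_orthonormal` are just illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive semi-definite "covariance" matrix.
d, k = 6, 2
A = rng.standard_normal((d, d))
Sigma = A @ A.T

# Top-k eigenvectors of Sigma (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(Sigma)
U_pca = eigvecs[:, -k:]  # d x k orthonormal basis of the PC subspace

def projected_variance(U):
    """Variance captured by the subspace spanned by the columns of U,
    i.e. trace(U' Sigma U); for centered data this equals the average
    squared norm of the projections onto that subspace."""
    return np.trace(U.T @ Sigma @ U)

def random_orthonormal(d, k):
    """Random d x k matrix with orthonormal columns (QR of a Gaussian)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return Q

best_random = max(projected_variance(random_orthonormal(d, k))
                  for _ in range(10_000))

print("PC subspace:   ", projected_variance(U_pca))  # = sum of top-k eigenvalues
print("sum top-k eig: ", eigvals[-k:].sum())
print("best random:   ", best_random)                # never exceeds the PC value
```

In every run the random subspaces fall short of the top-$k$ eigenvector subspace, which is consistent with the conjecture above, though of course a numerical check is not a proof.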