I'll try to expand on the answer by @tankeco. For simplicity, consider unweighted least squares regression, i.e., with weights $W=I$. I also believe the conclusion below holds for general least squares regression, not just polynomial least squares.
Suppose we have an $N\times d$ design matrix $X_d$ and an $N$-vector response $y$, and we fit $\hat{\beta}_d$ by least squares. The variance of the predicted value $\hat{f}_d(x_0)$ at an arbitrary $x_0\in\mathbb{R}^d$ is $\text{var}(\hat{f}_d(x_0))=x_0^T(X_d^TX_d)^{-1}x_0$ (taking $\sigma^2=1$ WLOG). Now add one more predictor $x_{d+1}\in\mathbb{R}^N$, giving an augmented $N\times(d+1)$ design matrix $X=[X_d,x_{d+1}]$. Fitting $\hat{\beta}$ by least squares again, the variance of the predicted value $\hat{f}(x_0')$ at $x_0'=(x_0^T,w)\in\mathbb{R}^{d+1}$ is $\text{var}(\hat{f}(x_0'))=x_0'^T(X^TX)^{-1}x_0'$. The question is how to show that $\text{var}(\hat{f}(x_0'))\geq\text{var}(\hat{f}_d(x_0))$, i.e., that the prediction variance increases as predictors are added. Once this is established, since $\text{var}(\hat{f}(x_0))=\|l(x_0)\|^2$, it follows that $\|l(x_0)\|^2$ increases with the dimension $d$.
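Before the proof, here is a quick numerical sanity check of the claim, a minimal numpy sketch on simulated data (the names `X_d`, `x_extra`, `x0`, and all dimensions are my own hypothetical choices, not from the question):

```python
import numpy as np

# Hypothetical simulated example
rng = np.random.default_rng(0)
N, d = 50, 3
X_d = np.column_stack([np.ones(N), rng.normal(size=(N, d - 1))])  # N x d design
x_extra = rng.normal(size=N)                                       # added (d+1)-th predictor
X = np.column_stack([X_d, x_extra])                                # augmented N x (d+1) design

x0 = rng.normal(size=d)        # arbitrary point in R^d
w = rng.normal()               # its value on the new predictor
x0_aug = np.append(x0, w)      # (x0, w) in R^(d+1)

# Prediction variances with sigma^2 = 1: x0^T (X^T X)^{-1} x0
var_small = x0 @ np.linalg.solve(X_d.T @ X_d, x0)
var_big   = x0_aug @ np.linalg.solve(X.T @ X, x0_aug)
print(var_small, var_big, var_big >= var_small)   # the inequality we want to prove
```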
First, we look at the two optimization problems
$$\min_{\beta}(y-X_d\beta)^T(y-X_d\beta)\text{ and }\min_{\beta}(y-X\beta)^T(y-X\beta).$$
Setting $\beta_{d+1}=0$ in the latter recovers the former problem, so the latter minimizes over a larger set of candidate fits, and hence its minimized value can be no larger. (This is $L_{d+1}\leq L_d$ in @tankeco's answer.) In equations,
$$(y-X_d\hat{\beta}_d)^T(y-X_d\hat{\beta}_d)\geq(y-X\hat{\beta})^T(y-X\hat{\beta}).$$
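As a quick numerical check of this residual-sum-of-squares inequality (same hypothetical simulated setup as in the sketch above):

```python
import numpy as np

# Hypothetical data as before; any full-rank X_d, X and any y will do.
rng = np.random.default_rng(0)
N, d = 50, 3
X_d = np.column_stack([np.ones(N), rng.normal(size=(N, d - 1))])
X = np.column_stack([X_d, rng.normal(size=N)])
y = rng.normal(size=N)

beta_d, *_ = np.linalg.lstsq(X_d, y, rcond=None)   # fit the d-column model
beta, *_   = np.linalg.lstsq(X, y, rcond=None)     # fit the (d+1)-column model

rss_d = np.sum((y - X_d @ beta_d) ** 2)
rss   = np.sum((y - X @ beta) ** 2)
print(rss_d >= rss)   # nested problems: minimizing over more coefficients can't do worse
```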
Now using $X\hat{\beta}=Hy$ with $H=X(X^TX)^{-1}X^T$, and similarly $X_d\hat{\beta}_d=H_dy$ with $H_d=X_d(X_d^TX_d)^{-1}X_d^T$, we see that
$$y^T(I-H_d)^T(I-H_d)y\geq y^T(I-H)^T(I-H)y.$$
As orthogonal projection matrices, $H$ and $H_d$ are symmetric and idempotent, and so are $I-H$ and $I-H_d$; hence $(I-H)^T(I-H)=I-H$ and likewise for $H_d$, so
$$y^T(I-H_d)y\geq y^T(I-H)y.$$
Equivalently,
$$y^TH_dy\leq y^THy.$$
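A small sketch of this step, checking that the hat matrices are idempotent projections and that the quadratic-form inequality holds on simulated data (again, all names are hypothetical choices of mine):

```python
import numpy as np

# Same hypothetical setup as in the earlier sketches.
rng = np.random.default_rng(0)
N, d = 50, 3
X_d = np.column_stack([np.ones(N), rng.normal(size=(N, d - 1))])
X = np.column_stack([X_d, rng.normal(size=N)])
y = rng.normal(size=N)

H_d = X_d @ np.linalg.solve(X_d.T @ X_d, X_d.T)   # hat matrix of the smaller model
H   = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix of the augmented model

I = np.eye(N)
print(np.allclose(H @ H, H), np.allclose((I - H) @ (I - H), I - H))  # idempotent projections
print(y @ H_d @ y <= y @ H @ y)                                      # y^T H_d y <= y^T H y
```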
If you're familiar with the geometry of least squares projections, then another way to view this is through the identity
$$\|y\|^2=\|\hat{y}\|^2+\|y-\hat{y}\|^2.$$
With more predictors, $\|y-\hat{y}\|^2$ (the residual sum of squares) can only decrease, so $\|\hat{y}\|^2$ can only increase. Writing $H_d$ and $H$ out in terms of $X_d$ and $X$, the inequality becomes
$$y^TX_d(X_d^TX_d)^{-1}X_d^Ty\leq y^TX(X^TX)^{-1}X^Ty.$$
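A numerical illustration of this geometric picture, checking the orthogonal decomposition for both fits and the resulting growth of $\|\hat{y}\|^2$ (same hypothetical simulated setup as above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X_d = np.column_stack([np.ones(N), rng.normal(size=(N, d - 1))])
X = np.column_stack([X_d, rng.normal(size=N)])
y = rng.normal(size=N)

yhat_d = X_d @ np.linalg.lstsq(X_d, y, rcond=None)[0]   # fitted values, d predictors
yhat   = X   @ np.linalg.lstsq(X,   y, rcond=None)[0]   # fitted values, d+1 predictors

# ||y||^2 = ||yhat||^2 + ||y - yhat||^2 holds for both fits
print(np.allclose(y @ y, yhat_d @ yhat_d + (y - yhat_d) @ (y - yhat_d)))
print(np.allclose(y @ y, yhat @ yhat + (y - yhat) @ (y - yhat)))
# adding a predictor shrinks the residual norm, so the fitted norm grows
print(yhat_d @ yhat_d <= yhat @ yhat)
```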
Recall that we wish to show, for an arbitrary $x_0'=(x_0^T,w)\in\mathbb{R}^{d+1}$, that $\text{var}(\hat{f}(x_0'))\geq\text{var}(\hat{f}_d(x_0))$, or equivalently,
$$x_0'^T(X^TX)^{-1}x_0'\geq x_0^T(X_d^TX_d)^{-1}x_0.$$
It's important to note that the inequality above involving $y$ holds for any $y\in\mathbb{R}^N$, and that $X$ is assumed to have full column rank (with $d+1\leq N$). This means that for any $x_0'\in\mathbb{R}^{d+1}$ there exists a $y\in\mathbb{R}^N$ such that $x_0'=X^Ty$ (for example, $y=X(X^TX)^{-1}x_0'$). Since $X_d$ consists of the first $d$ columns of $X$, this $y$ also satisfies $X_d^Ty=x_0$. Substituting this $y$ into the inequality gives exactly $x_0'^T(X^TX)^{-1}x_0'\geq x_0^T(X_d^TX_d)^{-1}x_0$, and the proof is complete.
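To make the last step concrete, here is a small numpy sketch (same hypothetical simulated setup; the explicit choice $y=X(X^TX)^{-1}x_0'$ is just one convenient $y$ with $X^Ty=x_0'$) that checks the substitution recovers the two prediction variances:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
X_d = np.column_stack([np.ones(N), rng.normal(size=(N, d - 1))])
X = np.column_stack([X_d, rng.normal(size=N)])

x0_aug = rng.normal(size=d + 1)   # arbitrary (x0, w) in R^(d+1)
x0 = x0_aug[:d]

# One explicit choice of y with X^T y = x0_aug (possible since X has full column rank):
y = X @ np.linalg.solve(X.T @ X, x0_aug)
print(np.allclose(X.T @ y, x0_aug), np.allclose(X_d.T @ y, x0))   # and hence X_d^T y = x0

# Substituting this y into y^T H_d y <= y^T H y recovers the variance inequality:
lhs = y @ X_d @ np.linalg.solve(X_d.T @ X_d, X_d.T @ y)
rhs = y @ X @ np.linalg.solve(X.T @ X, X.T @ y)
var_small = x0 @ np.linalg.solve(X_d.T @ X_d, x0)
var_big   = x0_aug @ np.linalg.solve(X.T @ X, x0_aug)
print(np.allclose(lhs, var_small), np.allclose(rhs, var_big), var_big >= var_small)
```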
I believe the weighted case follows similarly from the arguments above.