Regularization and vector projections are connected through the idea of constrained optimization and the Karush-Kuhn (no relation)-Tucker conditions.
What are the KKT conditions?
Briefly, these state that if $x$ is a solution to the problem "minimize $f(x)$ subject to $g(x) \le 0$", then $x$ also satisfies $\nabla f(x) = -\lambda \nabla g(x)$ for some scalar $\lambda \ge 0$. But this is equivalent to saying $\nabla f(x) + \lambda \nabla g(x) = 0$, which means that $x$ is a stationary point of (and, when everything is convex, a minimizer of) the unconstrained problem "minimize $f(x) + \lambda g(x)$".
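Written out in full for a single inequality constraint (this is just the textbook statement, not anything specific to any particular paper), the conditions at a solution $x$ are
$$\nabla f(x) + \lambda \nabla g(x) = 0 \quad \text{(stationarity)}, \qquad g(x) \le 0, \qquad \lambda \ge 0, \qquad \lambda\, g(x) = 0 \quad \text{(complementary slackness)}.$$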
The intuition is that either:
- $g(x) < 0$. In this case, $x$ is an "interior solution", so the gradient of $f$ must be zero at that point. (If it weren't zero, we could move a little bit in the direction $-\nabla f(x)$ from $x$, while maintaining $g(x) < 0$, and get a lower value of $f(x)$.) Then we set $\lambda = 0$ and we're done.
- Or, $g(x) = 0$. In this case, $x$ is on the boundary of the feasible region. Locally, this boundary looks like a hyperplane orthogonal to the gradient $\nabla g(x)$, because the way you maintain the constraint $g(x) = 0$ is to not move along the gradient at all. But that means $\nabla f$ must be parallel to $\nabla g$: if it had any component orthogonal to $\nabla g$, we could move $x$ a little bit against that component, stay (to first order) on the hyperplane $g(x) = 0$, and decrease $f(x)$. In fact, for a minimum, $\nabla f$ has to point in the direction opposite $\nabla g$, which is why $\lambda \ge 0$.
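To make the boundary case concrete, here is a tiny worked example of my own: minimize $f(x) = (x - 2)^2$ subject to $g(x) = x - 1 \le 0$. The solution is $x^* = 1$, where $\nabla f(x^*) = 2(x^* - 2) = -2$ and $\nabla g(x^*) = 1$, so $\nabla f(x^*) + \lambda \nabla g(x^*) = 0$ with $\lambda = 2 \ge 0$; and indeed $x^* = 1$ also minimizes the unconstrained function $(x - 2)^2 + 2(x - 1)$.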
How the KKT conditions explain the relationship between constrained minimization and regularization
If $g(x) = ||x|| - c$ for some norm and some constant $c$, then the constraint $g(x) \le 0$ means that $x$ lies in a ball of radius $c$ under that norm. And in the unconstrained formulation, adding $\lambda g(x)$ to the function you want to minimize is what ends up applying the regularization penalty: you're really adding $\lambda ||x|| - \lambda c$ (and the constant $\lambda c$ doesn't matter for optimization).
People often take advantage of this "duality" between unconstrained and constrained optimization. For an example that I could find quickly by Googling, see "On the LASSO and its dual".
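Here is a quick numerical sanity check of that correspondence (my own sketch; I use ridge regression rather than the LASSO because the penalized form has a closed-form solution, but the same logic applies):

```python
# Numerical check: the penalized solution also solves the constrained problem
# whose radius c is set to the norm of the penalized solution.
# (Illustrative sketch only, using ridge regression for its closed-form solution.)
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=50)
lam = 3.0

# Penalized form: minimize ||y - X b||^2 + lam * ||b||^2  (closed-form solution)
beta_pen = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# Constrained form: minimize ||y - X b||^2  subject to  ||b||^2 <= c
c = beta_pen @ beta_pen
loss = lambda b: np.sum((y - X @ b) ** 2)
res = minimize(loss, x0=np.zeros(5),
               constraints=[{"type": "ineq", "fun": lambda b: c - b @ b}])

print(np.max(np.abs(res.x - beta_pen)))  # should be (numerically) zero
```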
Why are projections important here?
OK, so given all that, why would someone write a paper on fast projections?
Basically, one way you can do general constrained optimization--"minimize $f(x)$ subject to $x \in X$"--is to do the following:
- Take any iterative algorithm for unconstrained minimization of $f(x)$
- Start with a guess $x_0$
- Take one step of the algorithm: $x_0^\prime \leftarrow \mathrm{step}(x_0)$
- Then project back onto the set $X$: $x_1 \leftarrow P_X(x_0^\prime)$.
- And repeat until convergence.
For instance, this is how projected gradient descent is derived from ordinary gradient descent. Of course, making your projection function $P_X$ fast is vitally important here.
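Here is a minimal sketch of that recipe as projected gradient descent (the function names, the fixed step size, and the toy example are mine, chosen purely for illustration):

```python
# Minimal projected gradient descent, assuming you already have a gradient
# function `grad_f` and a projection `project` onto the feasible set X.
import numpy as np

def projected_gradient_descent(grad_f, project, x0, step_size=0.01, n_iters=1000):
    x = project(np.asarray(x0, dtype=float))
    for _ in range(n_iters):
        x_unconstrained = x - step_size * grad_f(x)  # one ordinary gradient step
        x = project(x_unconstrained)                 # pull the iterate back into X
    return x

# Toy example: minimize ||x - a||^2 over the nonnegative orthant
# (whose projection is just clipping at zero).
a = np.array([1.0, -2.0, 3.0])
x_star = projected_gradient_descent(
    grad_f=lambda x: 2 * (x - a),
    project=lambda x: np.maximum(x, 0.0),
    x0=np.zeros(3),
)
print(x_star)  # approximately [1, 0, 3]
```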
Putting it all together
So, suppose that you want to solve the LASSO: $$\arg\min_\beta ||\mathbf{y} - \mathbf{X}\beta||_2^2 + \lambda ||\beta||_1$$
That's the unconstrained version. By the KKT conditions, adding the regularization term is equivalent to constraining the solution to lie in $||\beta||_1 \le c$ for some constant $c$. But that's just the $\ell_1$-ball with radius $c$!
So you could imagine solving this with projected (sub)gradient descent.* If you did, your $P_X$ function would be a projection onto the $\ell_1$-ball of radius $c$, and you would want to make that projection fast.
*I don't think people actually do this, because there are more efficient ways. But those might use projections also. EDIT: as @Dougal points out, a more sophisticated variant of projected subgradient descent was good enough to write a paper about in 2008.
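To give a sense of what $P_X$ looks like in that case, here is a sketch of the standard sort-based Euclidean projection onto the $\ell_1$-ball of radius $c$. It takes $O(n \log n)$ time, which is exactly the sort of baseline the fast-projection papers improve on. The function name and the example at the end are my own:

```python
# Sort-based Euclidean projection onto the l1-ball {x : ||x||_1 <= c}.
import numpy as np

def project_l1_ball(v, c=1.0):
    """Return the point of the l1-ball of radius c closest to v."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= c:
        return v.copy()                       # already inside the ball
    u = np.sort(np.abs(v))[::-1]              # magnitudes, sorted descending
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - c) / k > 0)[0][-1]
    theta = (css[rho] - c) / (rho + 1)        # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

v = np.array([0.5, -2.0, 1.0])
w = project_l1_ball(v, c=1.0)
print(w, np.abs(w).sum())   # projected point has l1-norm equal to 1
```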