So, let's simplify things and say that your $p$ predictor variables are orthonormal. This means that your $m \times p$ sample matrix $X$ has the property that $X^{T}\cdot X = I$. Now, let's use this assumption to expand the LASSO objective function:
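For concreteness, here is a minimal numpy sketch of one way to get such a design (the QR-based construction, the dimensions, and the seed are just illustrative assumptions on my part):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 50, 5  # m samples, p predictors, with m >= p

# The Q factor of a random matrix has orthonormal columns.
X, _ = np.linalg.qr(rng.normal(size=(m, p)))

# With orthonormal columns, X^T X is (numerically) the p x p identity.
print(np.allclose(X.T @ X, np.eye(p)))  # True
```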
$||y - X\beta||^{2}_{2} + 2\lambda ||\beta||_{1} \\
= y^{T}y + \beta^{T}X^{T}X\beta - 2y^{T}X\beta + 2\lambda||\beta||_{1} \\
= y^{T}y + \beta^{T}\beta - 2y^{T}X\beta + 2\lambda||\beta||_{1}$
Where I have modified the objective by replacing $\lambda$ with $2\lambda$ for reasons of algebraic shenanigans that will become clear soon. Now the $l_{1}$ norm is not differentiable, so you can't take the gradient. It is convex, however, so we may use the subgradient. The relevant point here is that the subgradient coincides with the regular gradient everywhere except at the point where the $l_{1}$ norm is not differentiable; at that point, it is any vector which produces a 'tangent' plane lying below the function. In the one-dimensional case of the absolute value function $|x|$, this means that at $x = 0$ the subderivative is any slope in the interval $[-1, 1]$. So let's focus on a single one of the $p$ coordinates of $\beta$. The derivative with respect to $\beta_{j}$ for some $1 \leq j \leq p$ is just the $j^{th}$ component of the subgradient. This is
$2\beta_{j} - 2y^{T}x_{j} +2\lambda\cdot\partial|\beta_{j}|$
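As a quick numerical sanity check of this algebra (a sketch only, reusing the QR construction above with arbitrary choices of $y$, $\beta$, and $\lambda$), the expanded objective matches the original, and a finite-difference estimate matches the smooth part of this derivative:

```python
import numpy as np

rng = np.random.default_rng(1)
m, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(m, p)))  # orthonormal columns
y = rng.normal(size=m)
beta = rng.normal(size=p)
lam = 0.3

# The original objective and its expanded form agree when X^T X = I.
original = np.sum((y - X @ beta) ** 2) + 2 * lam * np.sum(np.abs(beta))
expanded = y @ y + beta @ beta - 2 * y @ X @ beta + 2 * lam * np.sum(np.abs(beta))
print(np.isclose(original, expanded))  # True

# Finite-difference check of the smooth part's derivative: 2*beta_j - 2*y^T x_j.
j, eps = 2, 1e-6
smooth = lambda b: np.sum((y - X @ b) ** 2)
e_j = np.eye(p)[j]
fd = (smooth(beta + eps * e_j) - smooth(beta - eps * e_j)) / (2 * eps)
print(np.isclose(fd, 2 * beta[j] - 2 * y @ X[:, j]))  # True
```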
Now the subdifferential $\partial|\beta_{j}|$ is $1$ when $\beta_{j} > 0$, $-1$ when $\beta_{j} < 0$, and any value in the interval $[-1, 1]$ when $\beta_{j} = 0$. The last case is what interests us. Since we are trying to find a value that minimizes the objective function, in all three cases we want the subdifferential to contain $0$ (note that the subdifferential is a set). So now we have two constraints:
$0 \in 2\beta_{j} - 2y^{T}x_{j} + 2\lambda\cdot\partial|\beta_{j}| \quad \text{(minimization)} \\
\partial|\beta_{j}| \in [-1, 1] \quad (\beta_{j} = 0)$
Since we have set $\beta_{j} = 0$, the first term vanishes; substituting the interval $[-2\lambda, 2\lambda]$ for $2\lambda\cdot\partial|\beta_{j}|$ in the top condition then gives
$0 \in -2y^{T}x_{j} + [-2\lambda, 2\lambda]$
Where I'm using the interval notation to denote all possible values of the subdifferential term. Since $0$ has to lie between the endpoints of the interval, we can deduce two inequalities:
$-2y^{T}x_{j} - 2\lambda \leq 0 \\
-2y^{T}x_{j} + 2\lambda \geq 0$
Which combined tell us that whenever $\lambda \geq |y^{T}x_{j}|$, $0$ is in the subdifferential and we have satisfied the minimization criterion. Now you can do this for each coordinate $j$ and take the maximum in order to find the $\lambda$ you seek. In fact, the $p$ values $|y^{T}x_{j}|$ form a sequence of thresholds telling you exactly when the $j^{th}$ coefficient is set to $0$. The question then is: what is the significance of $|y^{T}x_{j}|$?
This is just the absolute value of the dot product of the output vector $y$ with the $j^{th}$ predictor variable $x_{j}$, which (for centered data) is proportional to their covariance. So when your regularization penalty $\lambda$ exceeds this quantity for every predictor, the penalty is so great that the regularization drops all terms from the model.
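The same subgradient conditions, still under the orthonormal assumption, also yield the familiar soft-thresholding form of the solution, $\hat{\beta}_{j} = \mathrm{sign}(y^{T}x_{j})\max(|y^{T}x_{j}| - \lambda, 0)$, which makes the sequence of thresholds easy to see numerically. The snippet below is a sketch under that assumption, not a general LASSO solver, and the helper name `lasso_orthonormal` is just my own label:

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(m, p)))  # orthonormal columns
y = rng.normal(size=m)

# Per-coordinate thresholds |y^T x_j|: coefficient j is dropped once lambda reaches this value.
thresholds = np.abs(X.T @ y)
print(np.sort(thresholds))

def lasso_orthonormal(X, y, lam):
    """Closed-form LASSO solution (soft-thresholding) when X has orthonormal columns."""
    z = X.T @ y
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam_max = thresholds.max()
print(lasso_orthonormal(X, y, lam_max))        # every coefficient is exactly zero
print(lasso_orthonormal(X, y, 0.9 * lam_max))  # the strongest predictor survives
```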
Now, several simplifying assumptions were made to keep the explanation manageable: most sets of predictors are not orthonormal, and the scaling of the predictors also plays a role here. Still, this should give you a general sense of how the regularization interacts with the predictors.
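If you want to see the all-zero behaviour with an off-the-shelf solver rather than the closed form above, here is a small check using scikit-learn (illustrative only, and still using the orthonormal design for consistency with the derivation). Note that sklearn's `Lasso` minimizes $\frac{1}{2m}||y - X\beta||^{2}_{2} + \alpha||\beta||_{1}$, so the threshold above translates to $\alpha_{\max} = \max_{j}|y^{T}x_{j}|/m$:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
m, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(m, p)))  # orthonormal columns, as above
y = rng.normal(size=m)

# sklearn scales the squared-error term by 1/(2m), so divide the threshold by m.
alpha_max = np.max(np.abs(X.T @ y)) / m

coef_at = Lasso(alpha=alpha_max, fit_intercept=False).fit(X, y).coef_
coef_below = Lasso(alpha=0.5 * alpha_max, fit_intercept=False).fit(X, y).coef_

print(np.allclose(coef_at, 0.0))   # True: the penalty drops every coefficient
print(np.any(coef_below != 0.0))   # True: a smaller penalty keeps some coefficients
```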
The above description was adapted from course notes in high-dimensional statistics at Rutgers. You can find a similar set of course notes from Yale here: http://statsmaths.github.io/stat612/. Lectures 17 and 19 are most relevant to LASSO methods.