It's true that LASSO encourages sparsity in the sense that $\beta$s which are very close to zero are set to exactly zero. But that means the features which survive are the ones with large values of $\beta$, and retaining features simply because their coefficients are large does not necessarily lead to better prediction in the sense of RMSE.
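To see what the penalty actually does to the coefficients: in the special case of an orthonormal design, the LASSO estimate is just a soft-thresholded version of the least-squares estimate, so any coefficient whose unpenalized magnitude falls below the tuning parameter $\lambda$ is zeroed out, while larger ones survive (shrunken by $\lambda$):

$$\hat\beta_j^{\text{lasso}} = \operatorname{sign}\left(\hat\beta_j^{\text{OLS}}\right)\left(\left|\hat\beta_j^{\text{OLS}}\right| - \lambda\right)_+$$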
A $\beta$ for a newly introduced feature may be very large simply because that feature has low prevalence or low variability. Such a feature enhances prediction in a small group of observations that is very different from the rest, at the cost of losing predictive accuracy among the masses, who were discriminated better by features with smaller $\beta$s.
As an example, here is a trivariate relationship between a continuous feature $x$, a binary feature $w$, and an outcome $Y$. The $x$ effect explains much more of the variability in these data than the $w$ effect, despite the $x$ coefficient being smaller than the $w$ coefficient, even after standardization. LASSO would favor $w$ over $x$ in a model because of that magnitude, but magnitude alone does not make for good prediction: $w$ is selected merely because it is good at discriminating a small group of participants. This is the type of feature LASSO tends to select.
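Here is a minimal simulation sketch of this kind of setup. The parameters are illustrative assumptions rather than the data above: a rare binary $w$ (5% prevalence) with a large coefficient, a common continuous $x$ with a modest one, and scikit-learn's `Lasso` fit on standardized features.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 10_000

def simulate(rng, n):
    # Assumed data-generating process: a common continuous feature x with a
    # modest effect, and a rare binary feature w (5% prevalence) with a large one.
    x = rng.standard_normal(n)
    w = rng.binomial(1, 0.05, n)
    y = 0.5 * x + 10.0 * w + rng.standard_normal(n)
    return np.column_stack([x, w]), y

X_train, y_train = simulate(rng, n)
X_test, y_test = simulate(rng, n)

# Standardize with training statistics so the L1 penalty compares the
# two coefficients on a common scale.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd

# Heavy penalty: only the feature with the larger standardized
# coefficient (w) survives the soft-thresholding; x is set to zero.
strong = Lasso(alpha=1.0).fit(X_train, y_train)

# Light penalty: both effects stay in the model.
weak = Lasso(alpha=0.01).fit(X_train, y_train)

for name, fit in [("strong", strong), ("weak", weak)]:
    rmse = mean_squared_error(y_test, fit.predict(X_test)) ** 0.5
    print(name, "coefs [x, w]:", np.round(fit.coef_, 2), "test RMSE:", round(rmse, 3))
# Dropping x hurts prediction for the ~95% of observations with w = 0,
# so the sparser model ends up with the worse RMSE.
```

The heavily penalized fit keeps only $w$, yet its out-of-sample RMSE is noticeably worse than the lightly penalized fit that keeps both effects.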

This underscores the importance of using cross-validation to select the tuning parameter in a LASSO model. In an example such as this one, cross-validation would choose a penalty small enough to keep both effects, and you would find much better predictive accuracy as a result.
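Continuing the simulation sketch above, letting scikit-learn's `LassoCV` choose the penalty over its default $\alpha$ grid (the five-fold split is an arbitrary choice) lands on a value small enough that both coefficients remain nonzero:

```python
from sklearn.linear_model import LassoCV

# Cross-validation selects a small alpha here, because the model that
# retains both x and w predicts best out of fold.
cv_fit = LassoCV(cv=5).fit(X_train, y_train)
print("CV-selected alpha:", cv_fit.alpha_)
print("coefs [x, w]:", cv_fit.coef_)
```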