In mathematics, a norm is a function that measures the "length" or "size" of a vector. Among the popular norms are the $\ell_1$, $\ell_2$, and general $\ell_p$ norms, defined as
$$\begin{align}
\|\boldsymbol{x}\|_1 &= \sum_i | x_i | \\
\| \boldsymbol{x}\|_2 &= \sqrt{ \sum_i |x_i|^2 } \\
\| \boldsymbol{x}\|_p &= \left( \sum_i | x_i |^p \right)^{1/p}
\end{align}$$
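As a quick numerical illustration (the vector below is just a made-up example), the three definitions can be computed directly, and NumPy's `np.linalg.norm` gives the same results:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

# l1 norm: sum of absolute values
l1 = np.sum(np.abs(x))

# l2 norm: Euclidean length
l2 = np.sqrt(np.sum(np.abs(x) ** 2))

# general lp norm, here with p = 3
p = 3
lp = np.sum(np.abs(x) ** p) ** (1 / p)

# np.linalg.norm implements the same formulas
assert np.isclose(l1, np.linalg.norm(x, 1))  # 7.0
assert np.isclose(l2, np.linalg.norm(x, 2))  # 5.0
assert np.isclose(lp, np.linalg.norm(x, 3))
```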
In machine learning, we often want to predict target values $y$ using a function $f$ of the features $\mathbf{x}$, parametrized by a vector of parameters $\boldsymbol{\theta}$. To fit the model, we minimize a loss function $\mathcal{L}$. We sometimes also want to penalize the parameters by forcing them to take small values; the rationale for such regularization is described, for example, here, here, or here. One way of achieving this is to add a regularization term, e.g. the $\ell_2$ norm of the weight vector (often squared, as below), to the loss and minimize the whole thing:
$$
\underset{\boldsymbol{\theta}}{\operatorname{arg\,min}} \; \mathcal{L}\big(y, \,f(\mathbf{x}; \boldsymbol{\theta}) \big) + \lambda\, \|\boldsymbol{\theta}\|_2^2
$$
where $\lambda\ge0$ is a hyperparameter. So basically, we use norms here to measure the "size" of the model weights. By adding the size of the weights to the loss function, we force the minimization algorithm to seek a solution that, along with minimizing the loss, also keeps the weights small. The $\lambda$ hyperparameter controls how strong this effect is.
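For squared-error loss this is ridge regression, which has a closed-form solution. The sketch below (on made-up synthetic data) shows how increasing $\lambda$ shrinks the norm of the fitted weights:

```python
import numpy as np

# Hypothetical synthetic data: 50 samples, 5 features (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
theta_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
y = X @ theta_true + 0.1 * rng.normal(size=50)

def ridge(X, y, lam):
    """Closed-form minimizer of ||y - X theta||_2^2 + lam * ||theta||_2^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The l2 norm of the solution decreases as lambda grows
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 100.0)]
print(norms)
```

With $\lambda = 0$ this reduces to ordinary least squares; as $\lambda$ grows, the solution is pulled toward the zero vector.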
Indeed, using an $\ell_2$ penalty can be seen as equivalent to placing Gaussian priors on the parameters, while using an $\ell_1$ penalty is equivalent to placing Laplace priors (though in practice you would need much stronger priors; see e.g. the paper *Shrinkage priors for Bayesian penalized regression* by van Erp et al).
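To sketch why (this is the standard MAP argument, not specific to any one model): the maximum a posteriori estimate maximizes the log-posterior,
$$
\hat{\boldsymbol{\theta}}_{\text{MAP}} = \underset{\boldsymbol{\theta}}{\operatorname{arg\,max}} \; \log p(y \mid \mathbf{x}, \boldsymbol{\theta}) + \log p(\boldsymbol{\theta})
$$
If each $\theta_j \sim \mathcal{N}(0, \tau^2)$ independently, then $\log p(\boldsymbol{\theta}) = -\frac{1}{2\tau^2}\|\boldsymbol{\theta}\|_2^2 + \text{const}$, so maximizing the posterior amounts to minimizing $\mathcal{L} + \lambda \|\boldsymbol{\theta}\|_2^2$ with $\lambda \propto 1/\tau^2$. With a Laplace prior, $\log p(\boldsymbol{\theta}) = -\frac{1}{b}\|\boldsymbol{\theta}\|_1 + \text{const}$, which yields the $\ell_1$ penalty instead.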
For more details, check e.g. the Why L1 norm for sparse models, Why does the Lasso provide Variable Selection?, or When should I use lasso vs ridge? threads.