My question is at the end of the post. I tried to give as much information as I can to clarify my understanding and to point out as precisely as possible where I am stuck.
Independent variables or features may be fixed or random
I have recently read that when performing a regression, the independent variables or features may be fixed or random [Independent variable = Random variable?], as, for instance, in linear regression, where "the values $x_{ij}$ may be viewed as either observed values of random variables $X_j$ or as fixed values chosen prior to observing the dependent variable" [https://en.wikipedia.org/wiki/Linear_regression].
Regularization and prior distribution
Besides, I know that the regularization term in a machine learning objective function corresponds to prior knowledge about the parameters. For instance, L2 regularization assumes that the parameters follow a centered Normal distribution and L1 regularization assumes that they follow a Laplace distribution. For me, this is clear when the independent variables are fixed, but I have trouble understanding it when they are random variables.
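To make the L2 case concrete (a sketch, assuming the Gaussian linear model of the next section, with noise variance $\sigma^{2}$ and prior variance $\sigma^{2}_{\beta}$): $$-\log f_{Y|\beta}(y|\beta)-\log f_{\beta}(\beta)=\frac{1}{2\sigma^{2}}\|y-X\beta\|^{2}+\frac{1}{2\sigma^{2}_{\beta}}\|\beta\|^{2}+\text{const},$$ so minimizing the negative log posterior is ridge (L2-regularized) regression with $\lambda=\sigma^{2}/\sigma^{2}_{\beta}$.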
1. Fixed independent variables case
When the independent variables are fixed, we model the dependent variable with a chosen distribution. For instance, for the linear regression, we use the following model: $$Y\sim\mathcal{N}(X\beta,\sigma^{2}I)$$
Frequentist approach
In the frequentist approach, $\hat{\beta}$ can then be obtained by maximizing the likelihood: $$\hat{\beta}=\underset{\beta}{\mathrm{argmax}}\,f_Y(y;\beta)$$
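As a minimal numerical sketch of this step (with simulated data and variable names of my own choosing), the maximizer of the Gaussian likelihood is the ordinary least squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                      # fixed design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = X @ beta_true + sigma * rng.normal(size=n)   # Y ~ N(X beta, sigma^2 I)

# For a Gaussian likelihood, maximizing f_Y(y; beta) in beta is equivalent
# to minimizing ||y - X beta||^2, whose closed-form minimizer is OLS:
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true
```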
Bayesian approach
When considering a Normal prior distribution on $\beta$, for instance, the previous model changes into: $$Y|\beta\sim\mathcal{N}(X\beta,\sigma^{2}I)$$ $$\beta\sim\mathcal{N}(0,\sigma^{2}_{\beta}I_{\beta})$$ In this case, $\hat{\beta}$ can then be obtained as the maximum a posteriori (MAP) estimate: $$\hat{\beta}=\underset{\beta}{\mathrm{argmax}}(f_{\beta|Y}(\beta|y))$$ Thanks to Bayes' theorem, $f_{\beta|Y}(\beta|y)=\frac{f_{Y|\beta}(y|\beta)f_{\beta}(\beta)}{f_{Y}(y)}$, and since the denominator does not depend on $\beta$, we can then obtain $\hat{\beta}$ by maximizing the following quantity, which corresponds to a machine learning objective function including a regularization term: $$\begin{align} \hat{\beta}&=\underset{\beta}{\mathrm{argmax}}(f_{Y|\beta}(y|\beta)f_{\beta}(\beta))\\ &=\underset{\beta}{\mathrm{argmin}}(-\log(f_{Y|\beta}(y|\beta)) -\log(f_{\beta}(\beta))) \end{align}$$
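As a companion sketch for the MAP estimate (same simulated setup as above, with $\sigma_\beta$ chosen arbitrarily), minimizing this negative log posterior has a closed-form ridge solution with $\lambda=\sigma^{2}/\sigma^{2}_{\beta}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
sigma, sigma_beta = 0.3, 1.0
y = X @ beta_true + sigma * rng.normal(size=n)

# -log f(y|beta) - log f(beta) = ||y - X beta||^2 / (2 sigma^2)
#                              + ||beta||^2 / (2 sigma_beta^2) + const,
# so the MAP estimate is the ridge solution with lam = sigma^2 / sigma_beta^2:
lam = sigma**2 / sigma_beta**2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_map)   # shrunk toward 0 relative to the OLS estimate
```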
2. Random independent variables case
When the independent variables are random variables, the previous model changes to: $$Y|X,\beta\sim\mathcal{N}(X\beta,\sigma^{2}I)$$ $$\beta\sim\mathcal{N}(0,\sigma^{2}_{\beta}I_{\beta})$$ As previously, $\hat{\beta}$ can then be obtained as the maximum a posteriori estimate: $$\hat{\beta}=\underset{\beta}{\mathrm{argmax}}(f_{\beta|Y,X}(\beta|y,x))$$ I think the desired result should be the following, so as to get the same objective function as previously, including the regularization term (unless I am wrong): $$\begin{align} \hat{\beta}&=\underset{\beta}{\mathrm{argmax}}(f_{\beta|Y,X}(\beta|y,x))\\ &=\underset{\beta}{\mathrm{argmax}}(f_{Y|\beta,X}(y|\beta,x)f_{\beta}(\beta))\\ &=\underset{\beta}{\mathrm{argmin}}(-\log(f_{Y|\beta,X}(y|\beta,x))-\log(f_{\beta}(\beta))) \end{align}$$ If so, I cannot find out why. In particular, I don't know why the first and second lines are equal. Thanks to Bayes' theorem, I know that: $$\begin{align} f_{\beta|Y,X}(\beta|y,x)&=\frac{f_{\beta,Y,X}(\beta,y,x)}{f_{Y,X}(y,x)}\\&=\frac{f_{Y|\beta,X}(y|\beta,x)f_{\beta,X}(\beta,x)}{f_{Y,X}(y,x)}\\&=\frac{f_{Y|\beta,X}(y|\beta,x)f_{X|\beta}(x|\beta)f_{\beta}(\beta)}{f_{Y,X}(y,x)} \end{align}$$ But I am stuck there. I could find the desired result if $X$ and $\beta$ were independent, but I don't know whether this is true or an assumption of the model.
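To spell out the simplification I am hoping for: if the model assumed $X$ and $\beta$ to be independent, then $f_{X|\beta}(x|\beta)=f_{X}(x)$, and since neither $f_{X}(x)$ nor $f_{Y,X}(y,x)$ depends on $\beta$: $$\underset{\beta}{\mathrm{argmax}}\,f_{\beta|Y,X}(\beta|y,x)=\underset{\beta}{\mathrm{argmax}}\,f_{Y|\beta,X}(y|\beta,x)f_{\beta}(\beta),$$ which would give the regularized objective above. But I don't know whether this independence assumption is justified.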
Could someone help me understand the maximum a posteriori/regularization when the independent variables / features are random variables? Can the previous formula be simplified? If so, why?