
When reading up on ridge regression, I saw it stated that it has a "Gaussian prior." I realized that I don't know what the word "prior" means in this context, or what it is applied to.

I should note that my question isn't limited to ridge regression. More generally, what does it mean to have a "prior"?

gunes
student010101
  • It sounds like you may not be familiar with Bayesian estimation, but rather with classical/frequentist statistics. Priors are a key component of Bayesian modelling. Reading an introductory textbook on Bayesian statistics may be helpful. Ridge regression isn't restricted to one approach or the other. – Earlien Jul 11 '20 at 23:38
  • @Earlien, yeah, I think that's accurate. The only "Bayesian"-related knowledge I have is Bayes' rule, but I couldn't extrapolate that (cursory) knowledge to the current context. – student010101 Jul 12 '20 at 00:32

1 Answer


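A prior is a belief you hold about some quantity, typically a set of parameters, before looking at the data. Once the data are taken into account, that belief is updated via Bayes' rule (the posterior is proportional to the likelihood of the data times the prior), and the updated belief is called the posterior.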
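To make the prior-to-posterior update concrete, here is a minimal Python sketch of the simplest case with a closed form: a Gaussian prior on an unknown mean, combined with Gaussian observations whose noise variance is assumed known. The function name `normal_posterior` and the numbers are illustrative assumptions, not something from the question.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of an unknown mean mu under a Gaussian prior and Gaussian data.

    Prior:      mu ~ N(prior_mean, prior_var)   (belief before seeing data)
    Likelihood: x_i ~ N(mu, noise_var), i.i.d. (noise_var assumed known)
    Returns (posterior_mean, posterior_var).
    """
    n = len(data)
    # Precisions (inverse variances) add: prior information + data information.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted average of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Prior belief: mu is around 0 with variance 1.
# Data: four observations with (assumed known) noise variance 1.
mean, var = normal_posterior(prior_mean=0.0, prior_var=1.0,
                             data=[2.0, 2.5, 1.5, 2.0], noise_var=1.0)
print(mean, var)
```

Here the posterior mean (1.6) lands between the prior mean (0) and the sample mean (2), and the posterior variance (0.2) is smaller than the prior variance: the data have updated and sharpened the prior belief. With more observations, the data term dominates and the prior matters less.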

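In ridge regression, a Gaussian prior on the regression coefficients means that, before seeing any data, the coefficients are assumed to follow a Gaussian (normal) distribution. Of course, one also needs to specify that distribution's mean and covariance structure; the prior that corresponds to ridge regression has zero mean and a covariance proportional to the identity, so large coefficients are considered less likely a priori.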
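One way to see the connection concretely: assume Gaussian noise with known variance σ² and a prior N(0, τ²I) on the coefficients. Then the posterior mean of the coefficients coincides with the ridge estimate when the ridge penalty is λ = σ²/τ². The NumPy sketch below checks this numerically; the synthetic data, the values of `sigma2` and `tau2`, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: n samples, p features.
n, p = 50, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -2.0, 0.5])
sigma2 = 0.25                      # noise variance (assumed known)
y = X @ true_beta + rng.normal(scale=np.sqrt(sigma2), size=n)

tau2 = 1.0                         # prior variance: beta ~ N(0, tau2 * I)
lam = sigma2 / tau2                # ridge penalty implied by this prior

# 1) Ridge regression: minimize ||y - X b||^2 + lam * ||b||^2 (closed form).
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# 2) Bayesian linear regression with Gaussian prior N(0, tau2 I) and Gaussian
#    noise N(0, sigma2): the posterior is Gaussian with covariance
#    S = (X^T X / sigma2 + I / tau2)^{-1} and mean S X^T y / sigma2.
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
beta_post_mean = S @ X.T @ y / sigma2

print(beta_ridge)
print(beta_post_mean)
print(np.allclose(beta_ridge, beta_post_mean))
```

Because this posterior is Gaussian, its mean is also its mode, so the ridge estimate is the maximum a posteriori (MAP) estimate under this prior. A tighter prior around zero (smaller `tau2`) means a larger `lam`, which shrinks the coefficients more.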

gunes
  • Ah, I see. Isn't the Gaussian distribution on the coefficients also a byproduct of the typical assumption that the errors are normally distributed, or is that not true? In addition, why do ridge and lasso (Laplace, I think) have different priors? I don't understand how a difference in the regularization term results in a different prior. – student010101 Jul 11 '20 at 18:11
  • The Gaussian prior on the coefficients is separate from the Gaussian distribution assumed for the error terms. Those are two different assumptions with very different effects. As to why Ridge is equivalent to a Gaussian prior and LASSO is equivalent to a Laplace prior, see, e.g., [this question](https://stats.stackexchange.com/questions/177210/why-is-laplace-prior-producing-sparse-solutions). – jhin Jul 11 '20 at 18:44
  • As mentioned in @jhin's comment, the prior on the parameters and the distribution of the error term are different assumptions. – gunes Jul 11 '20 at 19:53
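Why the penalty determines the prior can be seen by taking the negative log of the posterior. A sketch, assuming a linear model $y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and independent zero-mean priors on each coefficient, either Gaussian with variance $\tau^2$ or Laplace with scale $b$ (the symbols $\sigma^2$, $\tau^2$, $b$ are introduced here for illustration). The error assumption only produces the squared-error term; the prior alone produces the penalty.

```latex
% Negative log posterior = negative log likelihood + negative log prior + const
-\log p(\beta \mid y) = \frac{1}{2\sigma^2}\lVert y - X\beta \rVert_2^2 - \log p(\beta) + \text{const}

% Gaussian prior, \beta_j \sim \mathcal{N}(0, \tau^2):
-\log p(\beta) = \frac{1}{2\tau^2}\lVert \beta \rVert_2^2 + \text{const}
\;\Rightarrow\;
\hat\beta_{\text{MAP}} = \arg\min_\beta \lVert y - X\beta \rVert_2^2 + \frac{\sigma^2}{\tau^2}\lVert \beta \rVert_2^2
\quad \text{(ridge, } \lambda = \sigma^2/\tau^2\text{)}

% Laplace prior, p(\beta_j) = \frac{1}{2b} e^{-|\beta_j|/b}:
-\log p(\beta) = \frac{1}{b}\lVert \beta \rVert_1 + \text{const}
\;\Rightarrow\;
\hat\beta_{\text{MAP}} = \arg\min_\beta \lVert y - X\beta \rVert_2^2 + \frac{2\sigma^2}{b}\lVert \beta \rVert_1
\quad \text{(lasso, } \lambda = 2\sigma^2/b\text{)}
```

In both cases the objective was multiplied by $2\sigma^2$, which does not change the minimizer. The squared $\ell_2$ penalty comes from the Gaussian's $\beta_j^2$ in the exponent, and the $\ell_1$ penalty from the Laplace's $|\beta_j|$.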