
While reading a highly cited paper I came across terminology that was completely new to me. Neither Google nor friends who are studying mathematics could help. Can you help, please?

It appears in the discussion of $l_p$-norm penalty terms in regression.

[...] A value of $p = 2$ leads to the ridge estimate, while $p = 0$ corresponds to traditional model selection. It is well known that the estimates have a parsimonious property (with some components being exactly zero) for $p \leq 1 $ only, while the optimization problem in (3) is only convex for $p \geq 1$ [...]

What does "parsimonious" mean in this context?

amoeba
Jenya
    It appears that the intended meaning is given in the parentheses: "parsimonious property" means that some of the coefficient estimates can be exactly zero. – mark999 Nov 30 '16 at 08:14

2 Answers


They are referring to the fact that the $l_1$ penalty tends to shrink some coefficients exactly to 0, so, as stated in this answer:

A parsimonious model is a model that accomplishes a desired level of explanation or prediction with as few predictor variables as possible.
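This sparsity can be seen directly in the closed-form solutions for an orthonormal design, where the $l_1$ (lasso) estimate is soft-thresholding and the $l_2$ (ridge) estimate is proportional shrinkage. The sketch below (the coefficient values are hypothetical, chosen only for illustration) shows that lasso sets small coefficients exactly to zero while ridge only shrinks them:

```python
import numpy as np

def lasso_univariate(b_ols, lam):
    # Soft-thresholding: the closed-form l1 solution under an
    # orthonormal design; coefficients with |b| <= lam become exactly 0.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def ridge_univariate(b_ols, lam):
    # Proportional shrinkage: the closed-form l2 solution under an
    # orthonormal design; coefficients shrink but never reach exactly 0.
    return b_ols / (1.0 + lam)

b_ols = np.array([3.0, 0.4, -0.2])   # hypothetical OLS estimates
lam = 0.5

print(lasso_univariate(b_ols, lam))  # [ 2.5  0.  -0. ]  -- two exact zeros
print(ridge_univariate(b_ols, lam))  # all three entries remain nonzero
```

The exact zeros are what the cited paper calls the "parsimonious property": the $l_1$ fit performs variable selection as a by-product of estimation.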

Simon Thordal

I found the source of the citation here, and we can check the etymology of the word parsimony as follows:

economy, thrift, frugality, sparingness in the use or expenditure of means

Simply put, parsimony in your scenario means lower complexity (a smaller number of independent parameters); the penalty acts as a regularization method to prevent overfitting and improve generalization.

It can also be called Occam's razor, which in statistics and probability means that adding a variable to the model should incur a penalty. Why does the number of variables matter? Because the more variables we add to a probabilistic graphical model, the more parameters the model has, the more "complex" it is, and the better we can fit it to the data. We could then always "improve" the model just by adding variables, which is nonsense. So we should take the number of independent parameters into account.

There are many criteria that impose such a penalty, such as the p-value, adjusted $R^2$, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the DIC, the Bayes factor, and Mallows's $C_p$.
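As a small illustration of why such a penalty is needed, the sketch below (using a simulated dataset; the Gaussian AIC formula $n\log(\mathrm{RSS}/n) + 2k$ is stated up to an additive constant) adds a pure-noise column to a regression. The residual sum of squares can only go down, which is exactly the "always improve by adding variables" trap, while AIC charges $2$ per extra parameter:

```python
import numpy as np

def aic(rss, n, k):
    # Gaussian-likelihood AIC up to an additive constant:
    # n * log(RSS / n) + 2 * k, where k counts fitted parameters.
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
junk = rng.normal(size=n)               # pure noise, unrelated to y
y = 2.0 * x + rng.normal(size=n)

# Model 1: intercept + x.  Model 2: intercept + x + the noise column.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, junk])

rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
rss2 = np.sum((y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]) ** 2)

# The larger (nested) model never fits worse in-sample:
print(rss2 <= rss1)                     # True
print(aic(rss1, n, k=2), aic(rss2, n, k=3))
```

Typically the AIC of the larger model comes out worse despite its smaller RSS, because the tiny in-sample improvement from a noise variable does not cover the $2k$ penalty.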

Lerner Zhang