
While reading a highly cited paper I came across terminology that was completely new to me. Neither Google nor friends who are studying mathematics could help. Can you help, please?

It appears in the discussion of $l_p$-norm penalty terms in regression.

[...] A value of $p = 2$ leads to the ridge estimate, while $p = 0$ corresponds to traditional model selection. It is well known that the estimates have a parsimonious property (with some components being exactly zero) for $p \leq 1 $ only, while the optimization problem in (3) is only convex for $p \geq 1$ [...]

What does "parsimonious" mean in this context?

amoeba
Jenya
    It appears that the intended meaning is given in the parentheses: "parsimonious property" means that some of the coefficient estimates can be exactly zero. – mark999 Nov 30 '16 at 08:14

2 Answers


They are referring to the fact that the $l_1$ penalty tends to shrink some coefficients exactly to 0, so, as stated in this answer:

A parsimonious model is a model that accomplishes a desired level of explanation or prediction with as few predictor variables as possible.
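This sparsity can be seen directly in the closed-form solutions for an orthonormal design, where the $l_1$ (lasso) estimate is soft-thresholding and the $l_2$ (ridge) estimate is proportional shrinkage. The sketch below (the coefficient values are hypothetical, chosen only for illustration) shows that lasso sets small coefficients exactly to zero while ridge only shrinks them:

```python
import numpy as np

def lasso_univariate(b_ols, lam):
    # Soft-thresholding: the closed-form l1 solution under an
    # orthonormal design; coefficients with |b| <= lam become exactly 0.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def ridge_univariate(b_ols, lam):
    # Proportional shrinkage: the closed-form l2 solution under an
    # orthonormal design; coefficients shrink but never reach exactly 0.
    return b_ols / (1.0 + lam)

b_ols = np.array([3.0, 0.4, -0.2])   # hypothetical OLS estimates
lam = 0.5

print(lasso_univariate(b_ols, lam))  # [ 2.5  0.  -0. ]  -- two exact zeros
print(ridge_univariate(b_ols, lam))  # all three entries remain nonzero
```

The exact zeros are what the cited paper calls the "parsimonious property": the $l_1$ fit performs variable selection as a by-product of estimation.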

Simon Thordal

I found the source of the citation here, and we can check the etymology of the word parsimony as follows:

economy, thrift, frugality, sparingness in the use or expenditure of means

Simply put, parsimony in your scenario means lower complexity (a smaller number of independent parameters); the penalty acts as a regularization method to prevent overfitting and improve generalization.

It can also be called Occam's razor, which in statistics and probability means that adding a variable to the model should incur a penalty. Why does the number of variables matter? Because the more variables we add to a probabilistic graphical model, the more parameters the model has, the more "complex" it is, and the better we can fit it to the data. We could then always "improve" the model just by adding variables, which is nonsense. So we should take the number of independent parameters into account.

There are many criteria that impose such a penalty, such as the p-value, adjusted $R^2$, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), the DIC, the Bayes factor, and Mallows's $C_p$.
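As a small illustration of why such a penalty is needed, the sketch below (using a simulated dataset; the Gaussian AIC formula $n\log(\mathrm{RSS}/n) + 2k$ is stated up to an additive constant) adds a pure-noise column to a regression. The residual sum of squares can only go down, which is exactly the "always improve by adding variables" trap, while AIC charges $2$ per extra parameter:

```python
import numpy as np

def aic(rss, n, k):
    # Gaussian-likelihood AIC up to an additive constant:
    # n * log(RSS / n) + 2 * k, where k counts fitted parameters.
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
junk = rng.normal(size=n)               # pure noise, unrelated to y
y = 2.0 * x + rng.normal(size=n)

# Model 1: intercept + x.  Model 2: intercept + x + the noise column.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, junk])

rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)
rss2 = np.sum((y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]) ** 2)

# The larger (nested) model never fits worse in-sample:
print(rss2 <= rss1)                     # True
print(aic(rss1, n, k=2), aic(rss2, n, k=3))
```

Typically the AIC of the larger model comes out worse despite its smaller RSS, because the tiny in-sample improvement from a noise variable does not cover the $2k$ penalty.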

Lerner Zhang