I am trying to make sense of what a saturated model is. AFAIK it's when you have as many features as observations.
Can we say a saturated model is a special case of an extremely overfitted model?
@Tomka's right. A saturated model fits as many parameters as possible for a given set of predictors, but whether it's over-fitted or not depends on the number of observations for each unique pattern of predictors. Suppose you have a linear model with 100 observations of $y$ on $x=0$ and 100 on $x=1$. Then the model $\operatorname{E}Y = \beta_0 +\beta_1 x$ is saturated but surely not over-fitted. But if you have one observation of $y$ for each of $x=(0,1,2,3,4)^\mathrm{T}$ the model $\operatorname{E}Y = \beta_0 +\beta_1 x +\beta_2 x^2 +\beta_3 x^3 +\beta_4 x^4$ is saturated & a perfect fit—doubtless over-fitted†.
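To make the second case concrete, here's a small sketch (my own illustration, not from any particular textbook) showing that a degree-4 polynomial fitted to one observation at each of $x=0,\dots,4$ interpolates the data exactly — five parameters for five observations, so the saturated model leaves zero residual:

```python
import numpy as np

# One noisy observation of y at each of x = (0, 1, 2, 3, 4)
x = np.arange(5)
rng = np.random.default_rng(0)
y = 1.0 + 0.5 * x + rng.normal(size=5)

# Degree-4 polynomial: 5 coefficients for 5 observations — saturated
coefs = np.polyfit(x, y, deg=4)
fitted = np.polyval(coefs, x)

# The fit is "perfect": fitted values reproduce y up to numerical error
print(np.allclose(fitted, y))  # True
```

By contrast, in the first scenario (100 observations at each of $x=0$ and $x=1$) the saturated model just estimates the two group means, so its two parameters are each supported by 100 observations and nothing is interpolated.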
When people talk about saturated models having as many parameters as observations, as in the linked web page & CV post, they're assuming a context of one observation for each predictor pattern. (Or perhaps sometimes using 'observation' differently—are 100 individuals in a 2×2 contingency table 100 observations of individuals, or 4 observations of cell frequencies?)
† Don't take "surely" & "doubtless" literally, by the way. It's possible for the first model that $\beta_1$ is so small compared to $\operatorname{Var}Y$ you'd predict better without trying to estimate it, & vice versa for the second.