I'm trying to wrap my head around why a good $C_p$ score should be close to the number of regressors, rather than as small as possible (close to zero).
I am using $C_p$ as the scoring function in a stepwise regression algorithm. Currently, my function minimizes $C_p$ to decide which term to add to or remove from the model. However, everything I have heard says a good $C_p$ score is around $P$, the number of parameters in the model.
Looking at the equation for Mallows' $C_p$:
$C_p = \frac{SSE_p}{S^2}-N+2P$
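To make the formula concrete, here is a minimal Python sketch. The function name `mallows_cp` and all the numbers are hypothetical, and it assumes $S^2$ is estimated from the full model as $SSE_{full}/(N - P_{full})$, which is the usual choice.

```python
def mallows_cp(sse_p: float, s2: float, n: int, p: int) -> float:
    """Mallows' Cp = SSE_p / S^2 - N + 2P."""
    return sse_p / s2 - n + 2 * p


# Hypothetical numbers: N = 50 observations, full model with P = 5 parameters.
n = 50
p_full = 5
sse_full = 90.0

# Assumption: S^2 is estimated from the full model as SSE_full / (N - P_full).
s2 = sse_full / (n - p_full)  # 90 / 45 = 2.0

# Cp of the full model itself, and of a hypothetical 3-parameter submodel.
cp_full = mallows_cp(sse_full, s2, n, p_full)  # 45 - 50 + 10 = 5.0
cp_sub = mallows_cp(100.0, s2, n, 3)           # 50 - 50 + 6 = 6.0
```

With that choice of $S^2$, $SSE_{full}/S^2 = N - P_{full}$, so the full model's $C_p$ equals $P_{full}$ by construction.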
It looks like the formula already adds $2P$, so if the target is $C_p \approx P$, a good scoring function would be
$C_p - P = \frac{SSE_p}{S^2}-N+2P-P$, where the extra $-P$ looks extraneous.
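A minimal sketch contrasting the current rule (minimize $C_p$) with a closeness-to-$P$ rule, here measured as $|C_p - P|$ (an assumption; the post does not specify a distance). The candidate models and their SSE values are hypothetical, with $S^2 = 2.0$ taken from a 5-parameter full model.

```python
def mallows_cp(sse_p: float, s2: float, n: int, p: int) -> float:
    # Mallows' Cp = SSE_p / S^2 - N + 2P
    return sse_p / s2 - n + 2 * p


n = 50
s2 = 2.0  # hypothetical S^2 from the full (P = 5) model: 90 / (50 - 5)

# Hypothetical candidate models: number of parameters P -> SSE_p
candidates = {2: 300.0, 3: 100.0, 4: 93.0, 5: 90.0}

# Cp for every candidate: {2: 104.0, 3: 6.0, 4: 4.5, 5: 5.0}
scores = {p: mallows_cp(sse, s2, n, p) for p, sse in candidates.items()}

# Current rule: pick the model with the smallest Cp.
best_min_cp = min(scores, key=lambda p: scores[p])

# Alternative rule (assumed form): pick the model whose Cp is closest to P.
best_cp_near_p = min(scores, key=lambda p: abs(scores[p] - p))
```

Here the two rules disagree: minimizing $C_p$ picks the 4-parameter model, while the closeness rule picks the 5-parameter full model, whose $C_p$ is exactly $P$ because $S^2$ was estimated from it.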
Could somebody give an intuitive explanation of why a good score has $C_p \approx P$?