I'm reading through the ESL book and I'm at the section on ridge regression where the effective degrees of freedom are defined as $$ df(\lambda) = \operatorname{tr}\left(X(X'X + \lambda I)^{-1}X'\right) = \sum_{j=1}^p \frac{d_j^2}{d_j^2 + \lambda}, $$ where the $d_j$ are the singular values of $X$.
I have no idea where this comes from. What are the idea and intuition behind it?
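
For reference, here is a minimal numerical sketch of the two expressions in the formula, so it is at least clear that they agree. It assumes the $d_j$ are the singular values of $X$; the function names `ridge_df_trace` and `ridge_df_svd` and the random test matrix are purely illustrative and not from the book.

```python
import numpy as np

def ridge_df_trace(X, lam):
    """Effective df as the trace of the ridge hat matrix X (X'X + lam I)^{-1} X'."""
    p = X.shape[1]
    # Solve (X'X + lam I) B = X' rather than forming the inverse explicitly.
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(H)

def ridge_df_svd(X, lam):
    """Effective df as sum_j d_j^2 / (d_j^2 + lam), with d_j the singular values of X."""
    d = np.linalg.svd(X, compute_uv=False)
    return np.sum(d**2 / (d**2 + lam))

# Illustrative data: 50 observations, 5 predictors (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))

for lam in (0.0, 1.0, 10.0, 1000.0):
    print(lam, ridge_df_trace(X, lam), ridge_df_svd(X, lam))
```

With a full-column-rank $X$, both functions should give a value close to $p$ at $\lambda = 0$ and shrink toward 0 as $\lambda$ grows, which is the behavior I would like to understand intuitively.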