My goal is to calculate various information criteria (e.g., the AIC) for generalised linear models. To do this, we need the effective degrees of freedom of the trained model. In an unregularised model this is typically taken to be the number of parameters, but it is not clear to me how to handle the case where the model is regularised.
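Concretely, I want to plug the effective degrees of freedom $\mathrm{df}$ into the standard formula

$$\mathrm{AIC} = 2\,\mathrm{df} - 2\log\hat{L},$$

where $\hat{L}$ is the maximised likelihood; in the unregularised case $\mathrm{df}$ is just the number of parameters.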
For a Gaussian noise model this seems to be well studied, notably by Hui Zou. For a lasso model, it appears that we take the number of non-zero coefficients, which is shown to be an unbiased estimator of the degrees of freedom. For ridge regression, the degrees of freedom can be estimated by the trace of the smoother matrix $S = X(X^TX + \lambda I)^{-1}X^T$, i.e. $\mathrm{df} = \operatorname{tr}(S)$. Zou shows that a combination of these two approaches can be used for the elastic net.
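For concreteness, here is a minimal numpy sketch of the three Gaussian-case estimators as I understand them (the function names and the `tol` threshold for deciding which coefficients count as non-zero are my own choices):

```python
import numpy as np

def df_ridge(X, lam):
    """Ridge df: trace of the smoother S = X (X'X + lam*I)^{-1} X'."""
    p = X.shape[1]
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.trace(S)

def df_lasso(beta_hat, tol=1e-10):
    """Lasso df: the number of non-zero coefficients
    (an unbiased estimator of the degrees of freedom)."""
    return int(np.sum(np.abs(beta_hat) > tol))

def df_enet(X, beta_hat, lam2, tol=1e-10):
    """Elastic-net df, as I understand Zou's result: the ridge-type
    trace computed on the active set only, with the L2 penalty lam2."""
    active = np.abs(beta_hat) > tol
    XA = X[:, active]
    k = XA.shape[1]
    if k == 0:
        return 0.0
    S = XA @ np.linalg.solve(XA.T @ XA + lam2 * np.eye(k), XA.T)
    return np.trace(S)
```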
My question is: can these results be generalised (and if so, how) to an arbitrary GLM, i.e. not only to models that minimise the squared loss? I would assume (and Park provides additional evidence for this) that the same approach as above can be used for L1-regularised GLMs. It is not clear to me how to generalise the results for ridge or elastic net regression.
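To make the question concrete, the only generalisation I can think of for the ridge case is to replace the Gaussian trace with its IRLS-weighted analogue at convergence, along the lines of the sketch below. I have no justification for this beyond analogy, which is exactly what I am asking about:

```python
def df_ridge_glm(X, w, lam):
    """Speculative: trace of X (X'WX + lam*I)^{-1} X'W, where W = diag(w)
    holds the IRLS weights from the final iteration of fitting the
    penalised GLM. Whether this carries the same justification as in
    the Gaussian case is what I would like to know."""
    p = X.shape[1]
    XtW = X.T * w                      # X' W, with w the 1-D weight vector
    S = X @ np.linalg.solve(XtW @ X + lam * np.eye(p), XtW)
    return np.trace(S)
```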
I also want to note that this question does not duplicate either this Stack Overflow post or this one, both of which use only the number of parameters in the fitted model to calculate the degrees of freedom (and thus, I think, entirely ignore the effect of the L2 regularisation).