I am trying to understand the state of the art for Gamma GLM regression with an elastic net penalty, because I need to recreate it in a SAS environment (pity me). To my knowledge, the only package, in any language, that allows for this is the H2O package. As far as I can tell, glmnet only covers the Gaussian (standard elastic net regression) and binomial (logistic elastic net) distributions.
From a high level, I understand glmnet to work by cyclic coordinate descent: it optimizes the penalized log-likelihood one coefficient at a time, cycling through the coefficients until convergence. H2O, on the other hand, I think relies on gradient descent.
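For concreteness, here is a minimal sketch of the coordinate-descent inner loop as I understand it: for a GLM, glmnet repeatedly forms a weighted least-squares (IRLS) approximation of the log-likelihood and runs this cycle on it. Everything below (function names included) is my own NumPy illustration of the idea, not glmnet's actual code:

```python
import numpy as np

def soft_threshold(z, gamma):
    # Closed-form solution of the one-dimensional lasso problem.
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def enet_coordinate_descent(X, y, lam, alpha, n_cycles=200):
    """Cyclic coordinate descent for the elastic net least-squares problem
        min_b (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    For a GLM, this would be applied to the IRLS quadratic approximation
    of the log-likelihood at each outer step."""
    n, p = X.shape
    b = np.zeros(p)
    v = (X * X).sum(axis=0) / n      # per-coordinate curvature X_j'X_j / n
    r = y - X @ b                    # residual, maintained incrementally
    for _ in range(n_cycles):
        for j in range(p):
            # Partial-residual statistic for coordinate j.
            z = X[:, j] @ r / n + v[j] * b[j]
            b_new = soft_threshold(z, lam * alpha) / (v[j] + lam * (1 - alpha))
            r += X[:, j] * (b[j] - b_new)   # O(n) residual update
            b[j] = b_new
    return b
```

With `lam = 0` this reduces to ordinary least squares; the soft-threshold step is what produces exact zeros in the coefficient vector.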
Is anyone aware of any literature on the elastic net that applies to other standard GLM families, like the Gamma, or even the Poisson?
Have issues like local optima been explored in this general context? My understanding, from this question, is that the penalized log-likelihood is concave, so any local maximum should also be the global maximum. However, a (perhaps naive) SAS implementation using PROC NLMIXED, where one specifies the log-likelihood, $$(\text{Scale}-1)\log(y) - \frac{\text{Scale}}{\mu}\,y + \text{Scale}\log(\text{Scale}) - \text{Scale}\log(\mu) - \log\Gamma(\text{Scale}),$$ and solves it with off-the-shelf optimizers, seems to end up in local optima when the number of variables is large.
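The concavity claim can be sanity-checked numerically: with a log link ($\mu = e^{x^\top\beta}$) and the shape parameter held fixed, the penalized negative log-likelihood is convex in $\beta$, so its value at a midpoint can never exceed the average of the endpoint values. A small self-contained check (my own illustration; `shape` plays the role of Scale in the likelihood above):

```python
import numpy as np
from math import lgamma

def penalized_negloglik(beta, X, y, shape, lam, alpha):
    """Negative of the Gamma log-likelihood from the question, with a log link
    mu = exp(X beta), plus the elastic net penalty. 'shape' (the Scale
    parameter) is held fixed, which is what makes the problem convex."""
    mu = np.exp(X @ beta)
    ll = np.sum((shape - 1) * np.log(y) - shape * y / mu
                + shape * np.log(shape) - shape * np.log(mu)) - len(y) * lgamma(shape)
    penalty = lam * (alpha * np.sum(np.abs(beta))
                     + (1 - alpha) / 2 * np.sum(beta ** 2))
    return -ll + penalty

# Midpoint-convexity check at random pairs: f((b1+b2)/2) <= (f(b1)+f(b2))/2.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 5))
y = rng.gamma(shape=2.0, scale=1.0, size=30)
f = lambda b: penalized_negloglik(b, X, y, shape=2.0, lam=0.3, alpha=0.5)
for _ in range(100):
    b1, b2 = rng.standard_normal(5), rng.standard_normal(5)
    assert f((b1 + b2) / 2) <= (f(b1) + f(b2)) / 2 + 1e-6
```

If NLMIXED is also estimating the shape parameter jointly, or the model uses the canonical inverse link, convexity in the joint parameter is no longer guaranteed, which may explain the apparent local optima.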
If not covered implicitly above: what is the most reliable, efficient, and precise way of solving a Gamma GLM with a ridge or elastic net penalty? Given that H2O uses gradient descent (I think), are there any good references on this if I wanted to implement it myself?
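In case it helps frame the references: the textbook way to do "gradient descent with an L1 term" is proximal gradient descent (ISTA): take a gradient step on the smooth part (negative log-likelihood plus the ridge term), then apply the soft-threshold operator for the lasso term. A minimal fixed-step sketch for the Gamma/log-link case, purely my own illustration (H2O's actual implementation will differ):

```python
import numpy as np

def fit_gamma_enet_proxgrad(X, y, shape, lam, alpha, step=0.2, n_iter=2000):
    """Proximal gradient descent for
        min_b (1/n)*negloglik(b) + lam*((1-alpha)/2*||b||_2^2 + alpha*||b||_1)
    with Gamma likelihood, log link mu = exp(X b), and shape held fixed.
    With the log link, d(-loglik)/d(eta_i) = shape * (1 - y_i / mu_i)."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(X @ b)
        # Gradient of the smooth part: likelihood term plus ridge term.
        grad = X.T @ (shape * (1.0 - y / mu)) / n + lam * (1 - alpha) * b
        b = b - step * grad
        # Proximal (soft-threshold) step for the L1 part of the penalty.
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam * alpha, 0.0)
    return b
```

With `lam = 0` this is plain gradient descent toward the MLE; in practice one would add a line search or Lipschitz-based step size rather than a fixed `step`.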