
The score() function computes D^2, the percentage of deviance explained, but I'd like to get the log-likelihood to calculate BIC. What's the formula to go from deviance to log-likelihood?

Score function reference:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.GammaRegressor.html#sklearn.linear_model.GammaRegressor.score
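For concreteness, here's a minimal sketch of the call in question (synthetic data, names illustrative):

```python
import numpy as np
from sklearn.linear_model import GammaRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Gamma-distributed target with a log-link mean structure
y = rng.gamma(shape=2.0, scale=np.exp(0.5 + 0.3 * X[:, 0]) / 2.0)

model = GammaRegressor().fit(X, y)
d2 = model.score(X, y)  # D^2, the fraction of deviance explained
```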

Alex F
  • There is no method to get the deviance. Sklearn focuses more on prediction than inferential stats, so it lacks a lot of the functionality you would find in, say, R's `glm`. Have you looked at statsmodels? (See the statsmodels sketch after this comment thread.) – Demetri Pananos Nov 10 '20 at 23:43
  • @DemetriPananos The link OP offered pretty much contradicts what you said – Firebug Nov 11 '20 at 00:28
  • @DemetriPananos Statsmodels is awesome, and one of their devs hangs out on this Stack (I think he has email alerts for the statsmodels tag, but whatever). I remember trying to do inference on a multinomial logistic regression in Python and having a difficult time pulling out the deviance until I started using Statsmodels instead of SKLearn. It might be possible in SKL, but SM made it way easier. – Dave Nov 11 '20 at 00:36
  • @Firebug Oh, that is a new development. Hadn't caught up with that update. – Demetri Pananos Nov 11 '20 at 03:43
  • @Dave yes, I know one of the devs hangs out here. I think I've angered them once before. – Demetri Pananos Nov 11 '20 at 03:43
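As the comments suggest, statsmodels exposes these quantities directly. A minimal sketch with synthetic data (attribute names per the statsmodels GLM results API; `bic_llf` requires a reasonably recent statsmodels, older versions only expose the deviance-based `bic`):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.gamma(shape=2.0, scale=np.exp(0.5 + 0.3 * X[:, 0]) / 2.0)

res = sm.GLM(
    y, sm.add_constant(X), family=sm.families.Gamma(link=sm.families.links.Log())
).fit()

print(res.llf)            # maximized log-likelihood of the fitted model
print(res.deviance)       # residual deviance
print(res.null_deviance)  # null (total) deviance
print(res.bic_llf)        # BIC computed from the log-likelihood
```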

1 Answer


If you have the deviance, refer to this answer, which I'll quote below:

$$\begin{aligned} \text{Null Deviance:} \quad & D_{TOT} = 2(\hat{\ell}_{S} - \hat{\ell}_0), \\[6pt] \text{Explained Deviance:} \quad & D_{REG} = 2(\hat{\ell}_{p} - \hat{\ell}_0), \\[6pt] \text{Residual Deviance:} \quad & D_{RES} = 2(\hat{\ell}_{S} - \hat{\ell}_{p}). \end{aligned}$$

In these expressions the value $\hat{\ell}_S$ is the maximised log-likelihood under a saturated model (one parameter per data point), $\hat{\ell}_0$ is the maximised log-likelihood under a null model (intercept only), and $\hat{\ell}_{p}$ is the maximised log-likelihood under the model (intercept term and $p$ coefficients).
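To make these quantities concrete with scikit-learn, here's a sketch using `sklearn.metrics.mean_gamma_deviance` and synthetic data. Caveat: these are unit deviances, not scaled by the dispersion $\phi$, whereas the identities above hold for the scaled deviance; the $D^2$ ratio is unaffected because $\phi$ cancels.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import mean_gamma_deviance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.gamma(shape=2.0, scale=np.exp(0.5 + 0.3 * X[:, 0]) / 2.0)

model = GammaRegressor().fit(X, y)
n = len(y)

# For a Gamma GLM, the intercept-only (null) fit predicts the sample mean.
d_res = n * mean_gamma_deviance(y, model.predict(X))      # D_RES
d_tot = n * mean_gamma_deviance(y, np.full(n, y.mean()))  # D_TOT
d_reg = d_tot - d_res                                     # D_REG = D_TOT - D_RES

print(model.score(X, y), 1 - d_res / d_tot)  # both are D^2, should agree
```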

So, starting from the explained deviance $D_{REG}$, and noting that the $D^2$ returned by `score()` is $1 - D_{RES}/D_{TOT} = D_{REG}/D_{TOT}$ (since $D_{TOT} = D_{REG} + D_{RES}$):

$$D_{REG} = 2(\hat{\ell}_{p} - \hat{\ell}_0)$$

Solving for $\hat{\ell}_{p}$:

$$\hat{\ell}_{p}=\frac{D_{REG}}{2}+\hat{\ell}_0$$

You'll have to estimate $\hat{\ell}_0$ if you want to compute the exact value. If you simply want to compare models (e.g., via BIC differences), then that term is constant across them and can be safely ignored.
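Putting it together for model comparison: since $\mathrm{BIC} = k\ln n - 2\hat{\ell}_p = k\ln n - D_{REG} - 2\hat{\ell}_0$, and $\hat{\ell}_0$ is shared by all models fit to the same data, you can rank models by $k\ln n - D_{REG}$. A sketch under that assumption (the helper `pseudo_bic` is hypothetical, and, as above, it ignores the dispersion scaling of the deviance):

```python
import numpy as np
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import mean_gamma_deviance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.gamma(shape=2.0, scale=np.exp(0.5 + 0.3 * X[:, 0]) / 2.0)

def pseudo_bic(model, X, y):
    """k*ln(n) - D_REG; equals BIC up to the constant -2*l0 shared by all models."""
    n = len(y)
    d_res = n * mean_gamma_deviance(y, model.predict(X))
    d_tot = n * mean_gamma_deviance(y, np.full(n, y.mean()))
    k = X.shape[1] + 1  # coefficients plus intercept
    return k * np.log(n) - (d_tot - d_res)

m_small = GammaRegressor().fit(X[:, :1], y)
m_full = GammaRegressor().fit(X, y)
print(pseudo_bic(m_small, X[:, :1], y))  # lower is better
print(pseudo_bic(m_full, X, y))
```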


Ben (https://stats.stackexchange.com/users/173082/ben), Is R-squared truly an invalid metric for non-linear models?, URL (version: 2018-07-31): https://stats.stackexchange.com/q/359997

Firebug