Use MSE in cv.glmnet for Poisson models?

Question

I want to compare different methods (such as Poisson regression with the lasso, a convolutional NN, etc.) in terms of prediction error. As error measures I chose the MSE, the MdAPE (median absolute percentage error) and the relMAE (relative mean absolute error). I wonder whether it makes sense to compute the MSE for the Poisson model with lasso regularization, because cv.glmnet() uses the deviance by default to choose the regularization parameter $\lambda$ for Poisson models. So I obtain the $\lambda$ with minimum deviance, but I then use the MSE to compare the models. Should I rather use the MSE in cv.glmnet() to determine $\lambda$?

Or can somebody tell me what's the intuition behind using the deviance as an error measure for Poisson models?
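
For reference, the two criteria can be written side by side. This is a sketch in notation that is not from the post: $y_i$ are the observed counts, $\hat\mu_i$ the fitted Poisson means and $n$ the number of held-out observations. The deviance is stated only up to how the software averages it over observations.

$$
D = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat\mu_i} - \left(y_i - \hat\mu_i\right) \right],
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat\mu_i\right)^2 ,
$$

with the convention $y_i \log y_i = 0$ when $y_i = 0$. The deviance is twice the gap in Poisson log-likelihood between a saturated model (where $\hat\mu_i = y_i$) and the fitted model. Both criteria are minimized in expectation by the conditional mean, but they weight errors differently: the deviance penalizes a given absolute error more heavily when $\hat\mu_i$ is small.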

Why do you use different point prediction accuracy measures? They will typically be minimized in expectation by quite different functionals of the future distribution. For Poisson regression, I would be particularly careful about anything APE- or AE-related. This may be helpful: [Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)?](https://stats.stackexchange.com/q/45875/1352) — Stephan Kolassa, Apr 11 '19 at 15:50
I thought it would be better if I could say, "for all three measures the NN works better than the Poisson model", to show that the result is not due to one specific measure. That's what I saw in papers that compared different models. — msloryg, Apr 11 '19 at 16:15
Thanks for the link. I read your paper, and what I took from it is that the absolute-error measures are biased because they reward integer forecasts (they are minimized by the median). But when using a count model, I actually want integer forecasts. Referring to the example of the outcome being $\mathrm{Pois}(0.8)$ distributed, what is the point of having a forecast of $0.8$? Yes, it minimizes the MSE, but I don't see what the practical interpretation of a count of $0.8$ is. I would assume $0.8$ is closer to one than to zero, hence the count will more likely be one, and I have the same estimate as with the MAD. — msloryg, Apr 11 '19 at 16:54
"what is the point of having a forecast of 0.8?" That is an *excellent* question. Let's turn it around: what's the point of having a forecast of 1.0 (which minimizes the expected MAE for a Pois(0.8) distribution)? What's the point of having a forecast of 0.0 (which is the mode of a Pois(0.8) distribution, i.e., the most likely value)? To answer this question, you need to figure out *why you want to forecast* in the first place. — Stephan Kolassa, Apr 12 '19 at 09:22
For instance, my day job is forecasting demands for supermarkets. If the forecast will be used to determine whether to run a promotion or not, what we are interested in is the expected value, which would be 0.8 in our case. This may not sound useful, but it is once you aggregate over multiple products in multiple stores. Conversely, if the forecast is used for replenishment, we need a high quantile to have sufficient safety stock. In this case, a point forecast of 2 or 3 would be appropriate, which correspond to the 95% and 99% quantiles of the Pois(0.8) distribution. — Stephan Kolassa, Apr 12 '19 at 09:24
Bottom line: which functional of the (often implicit) predictive density is optimal depends on what you will use the forecast for. So your point forecast accuracy measure should reflect your loss function. I have a forthcoming paper (an invited commentary on the M4 forecasting competition) in the *International Journal of Forecasting* on this. In the meantime, you may want to take a look at [Gneiting (2011, *IJF*)](https://doi.org/10.1016/j.ijforecast.2009.12.015), which is technical, though. — Stephan Kolassa, Apr 12 '19 at 09:26
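
The Pois(0.8) figures quoted in the comments above can be checked with a few lines of code. The following is a minimal sketch in plain Python (standard library only). The helper names `pois_pmf`, `pois_quantile` and `expected_loss` are made up for this illustration, and the infinite sums are truncated at 60 terms, which is an assumption that is harmless for a rate this small. In R, the same quantities should be available through `dpois()` and `qpois()`.

```python
import math

def pois_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def pois_quantile(p, lam):
    """Smallest integer k with P(Y <= k) >= p."""
    k = 0
    cum = pois_pmf(0, lam)
    while cum < p:
        k += 1
        cum += pois_pmf(k, lam)
    return k

def expected_loss(forecast, lam, loss, kmax=60):
    """E[loss(Y, forecast)], with the sum truncated at kmax (assumption)."""
    return sum(pois_pmf(k, lam) * loss(k, forecast) for k in range(kmax + 1))

lam = 0.8
mean = lam                                   # expectation of Pois(0.8)
median = pois_quantile(0.5, lam)             # 50% quantile
mode = max(range(20), key=lambda k: pois_pmf(k, lam))  # most likely value
q95 = pois_quantile(0.95, lam)
q99 = pois_quantile(0.99, lam)

# Expected squared and absolute error for three candidate point forecasts.
candidates = [0.0, 0.8, 1.0]
exp_se = {f: expected_loss(f, lam, lambda y, f: (y - f) ** 2) for f in candidates}
exp_ae = {f: expected_loss(f, lam, lambda y, f: abs(y - f)) for f in candidates}

best_for_mse = min(candidates, key=exp_se.get)
best_for_mae = min(candidates, key=exp_ae.get)

print("mean", mean, "median", median, "mode", mode, "q95", q95, "q99", q99)
print("best forecast under squared error:", best_for_mse)
print("best forecast under absolute error:", best_for_mae)
```

Among the three candidates, the expected squared error is smallest for the forecast 0.8 and the expected absolute error is smallest for the forecast 1, which is the sense in which different accuracy measures reward different functionals of the same predictive distribution.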
