13

Trying to learn some Python and Sklearn, but for my work I need to run regressions that use error distributions from the Poisson, Gamma, and especially Tweedie families.

I don't see anything in the documentation about them, but they are in several parts of the R distribution, so I was wondering if anyone has seen implementations anywhere for Python. It would be extra cool if you can point me towards SGD implementations of the Tweedie distribution!

Archie
joe
  • The most robust GLM implementations in Python are in [statsmodels](http://statsmodels.sourceforge.net), though I'm not sure if there are SGD implementations. – Trey May 31 '14 at 14:10
  • Thanks Trey. It looks like there's no support for Tweedie, but they do have some discussion of Poisson and Gamma distributions. – joe May 31 '14 at 21:33

2 Answers

13

There is movement to implement generalized linear models with Poisson, gamma, and Tweedie error distributions in scikit-learn.

Statsmodels has implementations of generalized linear models with Poisson, Tweedie, and gamma error distributions.

While I'm updating this answer, Spark ML also (experimentally) supports Poisson, Tweedie, and gamma distributions.
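On the SGD part of the question: none of the libraries above is named here as fitting Tweedie models by SGD, so the following is only a hand-rolled sketch, not any library's method. It assumes a log link and a fixed variance power `p`, for which the gradient of half the Tweedie unit deviance with respect to the linear predictor is `(mu - y) * mu**(1 - p)`. The function name `tweedie_sgd`, the learning rate, and the synthetic gamma-distributed targets are all hypothetical choices for illustration.

```python
import math
import random


def tweedie_sgd(rows, targets, power=1.5, lr=0.01, epochs=50, seed=0):
    """Fit a log-link Tweedie GLM by plain SGD (hypothetical helper); returns weights."""
    shuffler = random.Random(seed)
    weights = [0.0] * len(rows[0])
    order = list(range(len(rows)))
    for _ in range(epochs):
        shuffler.shuffle(order)
        for i in order:
            x, y = rows[i], targets[i]
            eta = sum(w * xj for w, xj in zip(weights, x))
            mu = math.exp(eta)
            # Gradient of half the unit deviance with respect to eta (log link).
            grad_eta = (mu - y) * mu ** (1.0 - power)
            for j, xj in enumerate(x):
                weights[j] -= lr * grad_eta * xj
    return weights


# Synthetic data (an assumption): positive targets with mean exp(0.5 + 0.8 * x).
data_rng = random.Random(1)
true_weights = [0.5, 0.8]
rows, targets = [], []
for _ in range(1000):
    x = data_rng.uniform(-1.0, 1.0)
    mean = math.exp(true_weights[0] + true_weights[1] * x)
    rows.append([1.0, x])                                    # intercept, feature
    targets.append(data_rng.gammavariate(2.0, mean / 2.0))   # gamma with that mean

weights = tweedie_sgd(rows, targets)
print(weights)
```

A constant step size keeps the sketch short; in practice a decaying learning rate or averaging of the iterates would likely reduce the noise in the final weights.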

Neal
  • I'm working on it: https://github.com/madrury/py-glm – Matthew Drury Sep 01 '17 at 13:47
  • @MatthewDrury Awesome! – Neal Sep 01 '17 at 13:54
  • @MatthewDrury nice! I just started using GLMs, and [statsmodels](https://www.statsmodels.org) has some limitations. I'm not sure I understand the math fully, but could your [inner-solve](https://github.com/madrury/py-glm/blob/master/glm/glm.py#L205) be replaced with an arbitrary least-squares-type solver? I was thinking this would add flexibility (e.g. pass in [sklearn.ElasticNet](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html) to get scalability/regularization/etc. "for free"). – GeoMatt22 Jul 18 '18 at 02:13
3

H2O has Generalized Linear Models.

They use H2O Frames though, so you can't use Pandas/Numpy directly.

Jakub Bartczuk