
I note with interest an article which has been suggested to me over here: http://blog.stata.com/tag/poisson-regression/

I have the problems outlined in this blog - namely zeros (in the independent variables). The article suggests the use of Poisson even though there is no count data in the dependent variable.

What should I be on the lookout for in terms of checking that Poisson is suitable for my data and that the results I am obtaining are meaningful?

In previous posts I have already indicated the nature of my data. The dependent variable is investment in millions of pounds, whilst the independent variables include age of firm and a number of indicators. I also have other regressions to run which include a number of counts among the independent variables, which could also be zero.

In my case, investments tend to be skewed towards lower amounts, and so is the age of firm. There is an issue with using OLS: the normality assumption on the residuals does not hold, given the concentration of smaller investments and younger firms. There is also evidence of heteroskedasticity (ok, I can regress with robust standard errors to account for this). I have tried using `mboxcox` and `ladder` to try and identify possible transformations, but all efforts till now have proved useless.

If I use `iqr`, I note that I have < 2% mild outliers, and no severe outliers.

Trust you'd be able to help further on this...

Cesare Camestre
  • Take a look at the second paper mentioned here for a simulation with lots of zeros case: stats.stackexchange.com/a/38588/7071. – dimitriy Jul 03 '13 at 23:52
  • Have you thought about binning your outcome and predictions and making a confusion matrix? The standard goodness-of-fit measures like `estat gof` and Long & Freese's `countfit` from `spost` suite will not really work in the non-count case. – dimitriy Jul 03 '13 at 23:55
  • 2
    I can't see that zeros in the independent variables have any bearing on the merits or use of Poisson regression. – Nick Cox Jul 04 '13 at 00:32
  • How, dimitriy? I'm having sleepless nights because of this. – Cesare Camestre Jul 04 '13 at 02:46
  • Agreed, Nick, yet I need a way forward. The counts are in the independent variables. – Cesare Camestre Jul 04 '13 at 02:49
  • I'll post my normality plots for the OLS residuals tomorrow. They don't look too bad; it's the normality tests which are worrying. – Cesare Camestre Jul 04 '13 at 03:01

1 Answer


That seems bizarre. The Poisson GLM can certainly converge and consistently estimate relative rates even when the outcome is non-integral (for instance, with aggregated data and frequency weights, as in binomial regression). And while using robust standard error estimates ensures that the 95% confidence intervals are asymptotically correct, you can tremendously inflate type I error by using a working Poisson probability model for the outcome when there is no mean-variance relationship. Doing this in place of a simple log transformation is insane.
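The first claim, that the Poisson score equations consistently estimate the coefficients of an exponential mean even for a non-integer outcome, can be checked with a quick simulation. A minimal pure-Python sketch (my own illustration, not code from this thread; the gamma data-generating process and the hand-rolled damped Newton solver are assumptions for the demo):

```python
import math
import random

random.seed(42)

# Continuous (non-count) positive outcome whose conditional mean is
# exp(b0 + b1*x); the outcome itself is gamma-distributed, not Poisson.
n = 5000
b0_true, b1_true = 0.5, 0.8
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [random.gammavariate(2.0, math.exp(b0_true + b1_true * xi) / 2.0)
     for xi in x]

def loglik(b0, b1):
    """Poisson log-likelihood (up to a constant); defined for any y >= 0."""
    return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi)
               for xi, yi in zip(x, y))

def poisson_qmle(iters=25):
    """Damped Newton-Raphson on the Poisson score equations.
    Nothing here requires y to be an integer."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)   # fitted mean
            r = yi - mu
            g0 += r                       # score for b0
            g1 += r * xi                  # score for b1
            h00 += mu                     # Fisher information
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = ( h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        step, base = 1.0, loglik(b0, b1)
        while loglik(b0 + step * d0, b1 + step * d1) < base and step > 1e-8:
            step *= 0.5                   # damp overshooting steps
        b0 += step * d0
        b1 += step * d1
    return b0, b1

b0_hat, b1_hat = poisson_qmle()
print(b0_hat, b1_hat)  # close to the true 0.5 and 0.8
```

Note the outcome here has variance proportional to the mean squared, not the mean, yet the coefficients are still recovered: only the mean model needs to be right for the point estimates.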

If it were me, I would achieve a log transformation of the mean (without log transforming my outcome) using the following syntax:

glm y educ exp exp2, family(gaussian) link(log)

This uses the GLM infrastructure to fit a log-link model for the mean of the outcome, but no idiotic mean-variance relationship is specified. Furthermore,

glm y educ exp exp2, family(gaussian) link(log) vce(robust)

Using the robust variance estimator will ensure that standard error estimates are consistent in the presence of heteroscedasticity.

AdamO
  • 5
    I think idiotic and insane are a bit strong here. The suggestion is that the Poisson model should be estimated using the Huber/White/Sandwich estimator of the variance-covariance matrix, which does not assume $E(y_i \vert x) = Var(y_i \vert x)$. – dimitriy Jul 04 '13 at 00:25
  • 1
    It does not assume that, but such is the working probability model for the mean variance relationship. Indeed the point estimates from a glm with robust standard errors will be the same as those without, (hence estimable parameters are relative rates for outcomes, whether or not that makes sense) so by assuming such a working probability model, you assign completely arbitrary weights to your observations which *will* affect coverage of 95% confidence intervals. – AdamO Jul 04 '13 at 00:59
  • There is a long literature that shows that all that is required is that your model for the mean is correct. Still a good place to start is: McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (Monographs on statistics and applied probability 37). Chapman Hall, London. – Maarten Buis Jul 04 '13 at 07:48
  • 1
    The standard errors are correct, but the weights still influence the estimated. The regression coefficients are the exact same as non-robust poisson model which is not what we wanted. I encourage you to quickly simulate data, run the Poisson and log transformed models, and convince yourself you will get different parameter estimates using these methods. – AdamO Jul 05 '13 at 21:17
  • As I understand it, the theory says that with a large enough sample both parameter estimates will converge. But this doesn't address the real-world setting of finite samples. Is this your take, @AdamO? – Lepidopterist Mar 30 '17 at 16:00