Identify GLM-Poisson, GLM-binomial, LM-normal, or GLM-normal only by looking at the data

Question

Is it possible to recognize whether a dataset fits GLM-Poisson, GLM-binomial, LM-normal, or GLM-normal only by looking at the dataset? For example, consider the following datasets:

NMES 1988 (Demand for Medical Care)
Olympic TV
Orange County
PhDPublications
PSID1976 (Labour Force Participation Data)
Ship Accidents
Swiss Labor
Affairs
US Traffic Fatalities

Can we say they fit a GLM-Poisson since they have discrete predictors?

You are aware that we are talking here about *conditional* distributions, so at best you would need to be able to look at the data in multiple dimensions at the same time? — Tim, Nov 27 '17 at 15:34

score 4 · Accepted Answer · answered Nov 27 '17 at 16:01

There are several issues here.

Note that what you call "LM-normal" and "GLM-normal" are the same thing. You may be confusing statistical concepts with the R code used to fit them.
The distributions / types of the predictors are irrelevant. We are generally interested in the response distribution.
Critically, however, we are interested in the conditional response distribution, not the marginal distribution (see: What if residuals are normally distributed, but y is not?).

The last point means it is often difficult to identify the correct distribution just by looking at the variable in isolation (e.g., in a histogram).
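To make the conditional vs. marginal distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are made up (y = 10 * x + standard normal noise, with a binary x), and the helper names `simulate`, `residuals_by_group`, and `excess_kurtosis` are hypothetical. Excess kurtosis is used only as a rough numeric stand-in for eyeballing a histogram; it is about 0 for normal data.

```python
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis; roughly 0 for normally distributed data."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    fourth = statistics.fmean([(v - mean) ** 4 for v in values])
    return fourth / var ** 2 - 3.0

def simulate(n=2000, seed=1):
    """Made-up data: y = 10 * x + Normal(0, 1) noise, with x alternating 0/1."""
    rng = random.Random(seed)
    x = [i % 2 for i in range(n)]
    y = [10.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

def residuals_by_group(x, y):
    """Residuals from the group means, which are the fitted values
    of a least-squares regression of y on a binary x."""
    means = {
        g: statistics.fmean([yi for xi, yi in zip(x, y) if xi == g])
        for g in set(x)
    }
    return [yi - means[xi] for xi, yi in zip(x, y)]

x, y = simulate()
resid = residuals_by_group(x, y)

# The marginal y is bimodal (two well-separated clusters), so it looks
# far from normal; the residuals, i.e. y conditional on x, do not.
print("marginal excess kurtosis:", round(excess_kurtosis(y), 2))
print("residual excess kurtosis:", round(excess_kurtosis(resid), 2))
```

In this setup a histogram of y alone would show two humps and suggest a non-normal response, even though a normal linear model is exactly right once x is taken into account.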

Typically, we identify a distribution / appropriate model by first thinking about the nature of our response data. Are they continuous? (If so, they could be normal, but they cannot be binomial or Poisson.) If they are counts, are they out of a known total (like the number of heads from a number of coin flips)? If so, they would be binomial. If there is no upper limit, they could be Poisson (but actual Poisson data are rare).
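The questions above can be written down as a small decision rule. This is only a sketch of the reasoning, not a model-selection procedure: `suggest_family` is a hypothetical helper, and it assumes that non-negative whole numbers are counts and that anything else is continuous, which you would normally know from how the data were collected rather than infer from the values.

```python
def suggest_family(response, totals=None):
    """Suggest candidate response distributions from the nature of the response.

    response: list of observed response values.
    totals:   optional list of known totals (one per observation), for
              counts that are "out of" a known number of trials.
    Assumption: non-negative whole numbers are treated as counts;
    everything else is treated as continuous.
    """
    is_count = all(float(v).is_integer() and v >= 0 for v in response)
    if not is_count:
        return "continuous: normal is possible; binomial and Poisson are not"
    if totals is not None:
        if any(v > t for v, t in zip(response, totals)):
            raise ValueError("a count exceeds its stated total")
        return "counts out of a known total: binomial"
    return "counts with no upper limit: Poisson is possible"

print(suggest_family([2.3, 4.1, 5.8]))                  # continuous response
print(suggest_family([3, 7, 5], totals=[10, 10, 10]))   # e.g. heads out of 10 flips
print(suggest_family([0, 2, 1, 4]))                     # unbounded counts
```

Note that the rule looks only at the response, never at the predictors, and that it returns candidates: whether the conditional distribution actually fits still has to be checked after fitting the model.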

Thank you. By looking at the dataset, I meant what you explained in the last paragraph. It answered my question. — Leila, Nov 27 '17 at 16:07
