I would like to employ count data as covariates while fitting a logistic regression model. My question is:
- Do I violate any assumption of logistic regression (and, more generally, of generalized linear models) by using count variables, i.e. non-negative integers, as independent variables?
I found many references in the literature on how to use count data as an outcome, but not as covariates; see for example the very clear paper: "N E Breslow (1996) Generalized Linear Models: Checking Assumptions and Strengthening Conclusions, Congresso Nazionale Societa Italiana di Biometria, Cortona June 1995", available at http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf.
Loosely speaking, it seems that the GLM assumptions may be expressed as follows:
- iid residuals;
- the link function must correctly represent the relationship between the dependent and independent variables;
- absence of outliers.
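As a minimal sketch of how the second point might be checked for a count covariate, the code below compares a model where the raw count enters linearly on the logit scale with one that uses `log1p()` of the count. Everything here is an assumption made for illustration: the names `x`, `y`, `fit_linear` and `fit_log` are hypothetical, and the data are simulated so that the log-odds are linear in `log1p(x)`.

```r
set.seed(42)
n <- 100

# simulated, right-skewed count covariate (assumption, not real data)
x <- rnbinom(n, size = 1, mu = 50)

# assumed truth: log-odds are linear in log1p(x), not in x itself
p <- plogis(-3 + 1 * log1p(x))
y <- rbinom(n, size = 1, prob = p)

# raw count vs. log-transformed count as the single covariate
fit_linear <- glm(y ~ x, family = binomial)
fit_log <- glm(y ~ log1p(x), family = binomial)

# both models have two parameters, so AIC compares their fit directly
aic_table <- AIC(fit_linear, fit_log)
print(aic_table)
```

A lower AIC for one of the two fits would only suggest which functional form describes the simulated relationship better; with fewer than 100 observations the comparison can easily be noisy.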
Does anybody know whether there is any other assumption or technical problem that would suggest using a different type of model when dealing with count covariates?
Finally, please note that my data contain relatively few samples (<100) and that the ranges of the count variables can differ by 3-4 orders of magnitude (i.e. some variables have values in the range 0-10, while other variables may have values within 0-10000).
A simple R example follows:
    ###########################################################
    # generating simulated data
    var1 <- sample(0:10, 100, replace = TRUE)
    var2 <- sample(0:1000, 100, replace = TRUE)
    var3 <- sample(0:100000, 100, replace = TRUE)
    outcome <- sample(0:1, 100, replace = TRUE)
    dataset <- data.frame(outcome, var1, var2, var3)
    # fitting the model
    model <- glm(outcome ~ ., family = binomial, data = dataset)
    # inspecting the model
    print(model)
    ###########################################################
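
Regarding the very different ranges of the covariates, here is a small sketch of what a linear rescaling does and does not change. The names `small`, `large`, `large_k`, `fit_raw` and `fit_scaled` are hypothetical, and the data are pure noise, as in the example above. Dividing a covariate by 1000 multiplies its coefficient by 1000 but should leave the fitted probabilities unchanged up to numerical tolerance.

```r
set.seed(1)
n <- 100

small <- sample(0:10, n, replace = TRUE)      # count in the range 0-10
large <- sample(0:10000, n, replace = TRUE)   # count in the range 0-10000
y <- sample(0:1, n, replace = TRUE)           # pure-noise binary outcome

large_k <- large / 1000                       # same covariate, in thousands

fit_raw <- glm(y ~ small + large, family = binomial)
fit_scaled <- glm(y ~ small + large_k, family = binomial)

# the slope changes by the scaling factor ...
print(unname(coef(fit_raw)["large"]) * 1000)
print(unname(coef(fit_scaled)["large_k"]))

# ... while the fitted probabilities agree up to numerical tolerance
print(all.equal(fitted(fit_raw), fitted(fit_scaled), tolerance = 1e-6))
```

This only illustrates that the units of a covariate affect how its coefficient is read, not the fit itself; it says nothing about whether a linear effect on the logit scale is appropriate for a variable spanning several orders of magnitude.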