I have read similar posts, but the answers given there do not resolve my problem. I want to fit a very simple linear regression to see whether bite incidence is related to district, zone (vaccination or control) and year. In the summary output one of the districts, RORYA, is given NA coefficients, and I get the message "Coefficients: (1 not defined because of singularities)". From what I have read, this is caused by collinearity among the explanatory factors. One suggested solution is to add -1 to the formula, which removes the intercept, but this does not solve my problem: RORYA district still has NAs in the summary output.
Another solution I have tried is changing the order of the explanatory variables in the formula. This does change things: RORYA district then gets coefficients, but the zone variable becomes NA instead. Neither outcome is good, as I would like a coefficient for every explanatory variable.
Does anyone know why this is happening, and whether there is a way for all the variables to have coefficients?
Thanks in advance.
A reproducible example:
df <- structure(list(DISTRICT = structure(c(1L, 6L, 5L, 3L, 2L, 4L,
1L, 6L, 5L, 3L, 2L, 4L, 1L, 6L, 5L, 3L, 2L, 4L, 1L, 6L, 5L, 3L,
2L, 4L), .Label = c("BUNDA", "MASWA", "MUSOMA", "RORYA", "SERENGETI",
"TARIME"), class = "factor"), zone = structure(c(2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("c", "v"), class = "factor"),
year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("2010",
"2011", "2012", "2013"), class = "factor"), bites = c(7.461327937,
NA, NA, NA, 35.16164185, 26.39109338, 57.89990479, 1.47191729,
3.608371422, 51.36718605, NA, 16.21167165, 46.85713945, 15.89670673,
5.212092054, 259.8137381, 30.80276062, 20.73585909, 10.44585911,
9.420270656, 7.617673001, 307.4586643, 27.31565565, 30.16124958
), deaths = c(0, NA, NA, NA, 0, 1.508062479, 0.298453117,
0, 0, 0, NA, 2.262093719, 0.298453117, 0.294383458, 0, 2.233355915,
0.581184163, 1.131046859, 0.298453117, 0.588766916, 1.202790474,
2.977807887, 0, 1.885078099)), .Names = c("DISTRICT", "zone",
"year", "bites", "deaths"), row.names = c(NA, -24L), class = "data.frame")
Code:
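# Inspect the data
summary(df)
names(df)
# Refer to columns via df$ rather than attach(df), which can mask other objects
is.numeric(df$year)
# Make sure year is treated as a factor, not a number
df$year <- as.factor(as.character(df$year))
is.factor(df$year)
# Fit without an intercept (-1); one coefficient is still reported as NA
model1 <- lm(bites ~ zone + DISTRICT - 1 + year, data = df)
summary(model1)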
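One way to see where the singularity might come from is to look at how zone and DISTRICT line up in the design. The following is only a minimal sketch: it rebuilds the three factor columns from the example above (the name `design` and the helper vectors are introduced here for illustration, and the response values and NAs are ignored), cross-tabulates district against zone, and compares the number of columns in the model matrix with its rank.

```r
# Rebuild only the explanatory columns, in the same order as the example data.
# `design`, `districts` and `zones` are illustrative names, not part of the original code.
districts <- c("BUNDA", "TARIME", "SERENGETI", "MUSOMA", "MASWA", "RORYA")
zones     <- c("v", "v", "v", "c", "c", "c")
design <- data.frame(
  DISTRICT = factor(rep(districts, times = 4)),
  zone     = factor(rep(zones, times = 4)),
  year     = factor(rep(2010:2013, each = 6))
)

# Cross-tabulate: each district appears in exactly one zone,
# so zone is fully determined once the district is known.
nesting <- table(design$DISTRICT, design$zone)
print(nesting)

# Model matrix for the same formula used in model1
X <- model.matrix(~ zone + DISTRICT - 1 + year, data = design)
n_coef <- ncol(X)      # coefficients the formula asks for
x_rank <- qr(X)$rank   # linearly independent columns actually available
cat("columns:", n_coef, " rank:", x_rank, "\n")
```

If the rank is smaller than the number of columns, at least one column is a linear combination of the others, and `lm` reports the corresponding coefficient as NA. Here every district sits in a single zone, so the zone indicator can be written as a sum of district indicators; which term ends up NA then depends on the order of the terms in the formula. Running `alias(model1)` on the fitted model should show the same dependency.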
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_1.0.0
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 digest_0.6.4 gtable_0.1.2 MASS_7.3-34 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2
[9] reshape2_1.4 scales_0.2.4 stringr_0.6.2 tools_3.1.0