I'm analyzing a data set with 6 dependent variables (foreclosure rates across 500+ zip codes for each of 6 consecutive years) and multiple predictor variables. I have already run a separate regression for each year using a quasi-Poisson distribution to see how the predictors affect each DV on its own, like so:

    model1 <- glm(Rate2008 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())
    model2 <- glm(Rate2009 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())
    model3 <- glm(Rate2010 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())
    model4 <- glm(Rate2011 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())
    model5 <- glm(Rate2012 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())
    model6 <- glm(Rate2013 ~ P1 + P2 + P3, data=mydata, family=quasipoisson())

The problem, of course, is that these DVs are not independent, so I need a way to include all 6 in a single model that takes this dependence into account. Everything I've found so far suggests something like this (from a previous Cross Validated question, "Multivariate multiple regression in R"):

    Y <- cbind(mydata$Rate2008, mydata$Rate2009, mydata$Rate2010,
               mydata$Rate2011, mydata$Rate2012, mydata$Rate2013)

    model7 <- lm(Y ~ P1 + P2 + P3, data=mydata)
    summary(manova(model7))

My questions are:

1) Because these are rates (continuous, but never below zero), shouldn't I be using glm (with the Poisson or quasi-Poisson distribution) instead of lm? Or does this not matter given the multivariate/dependent nature of the DVs?

2) I'm not sure how to interpret the results from model7: the output doesn't distinguish between the six variables in the 'Y' matrix. Any ideas why?

MYR