I'm looking at the NMMAPS dataset, in which pollution levels were measured in several cities over multiple days. I want to build a model to see which characteristics are associated with high pollution levels. Instead of fitting a single model to all the data points, I thought of splitting the data by city, fitting the same model to each city's subset, and then combining the confidence intervals (CIs) of each coefficient across cities into a new CI by taking their intersection.

Would this be a valid approach?

  • Short answer: this is invalid. You should rather use a single hierarchical regression model. – Tim Apr 29 '18 at 16:17
  • @Tim On a similar note, would it be valid if you wanted to find out which variables are significant? –  Apr 29 '18 at 16:27
  • I agree with @Tim that a hierarchical regression model is appropriate here. You can think of this type of model as essentially a collection of city-specific models, set up so that the effect of each pollutant is allowed to differ across cities. The hierarchical regression model would produce estimates of the pollutant effects for a "typical" city and also indicate how much the effects in the other cities vary around these typical effects. – Isabella Ghement Apr 29 '18 at 16:52
  • Check if this answers your question: https://stats.stackexchange.com/questions/205359/calculating-group-mean-and-confidence-interval-from-single-subject-means-and-con/205368#205368 – Tim Apr 30 '18 at 09:25
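
To illustrate the point made in the comments, here is a minimal sketch contrasting the intersection of per-city CIs with a two-stage pooled estimate. It is not a full hierarchical regression: it swaps in DerSimonian-Laird random-effects pooling of per-city coefficients as a simpler stand-in, which also yields a "typical" effect plus a between-city variance. The coefficient values and standard errors are made up for illustration and do not come from NMMAPS; the function names `intersect_cis` and `pool_random_effects` are hypothetical.

```python
import math

def intersect_cis(cis):
    """Return the intersection of a list of (low, high) intervals, or None if empty."""
    lo = max(low for low, _ in cis)
    hi = min(high for _, high in cis)
    return (lo, hi) if lo <= hi else None

def pool_random_effects(estimates, ses, z=1.96):
    """DerSimonian-Laird random-effects pooling of per-city coefficients.

    Returns the pooled estimate, its approximate 95% CI, and the estimated
    between-city variance tau2.
    """
    k = len(estimates)
    w = [1.0 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sw
    # Cochran's Q measures how much the city estimates disagree
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-city variance
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]        # weights including tau2
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - z * se, pooled + z * se), tau2

# Hypothetical per-city estimates of one coefficient and their standard errors
estimates = [0.10, 0.50, 0.30, 0.25]
ses = [0.05, 0.05, 0.10, 0.08]
cis = [(b - 1.96 * s, b + 1.96 * s) for b, s in zip(estimates, ses)]

overlap = intersect_cis(cis)
pooled, pooled_ci, tau2 = pool_random_effects(estimates, ses)

print("intersection of city CIs:", overlap)
print("pooled estimate:", pooled, "CI:", pooled_ci, "tau2:", tau2)
```

With these made-up numbers the first two city intervals do not overlap, so the intersection is empty even though every city points to a positive effect. The intersection can only shrink as cities are added, and it has no stated coverage probability, whereas the pooled interval reflects both within-city uncertainty and between-city variation.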
