I'm just looking for ideas for this. Say we want to predict the presence of a certain "thing" in each postcode of a country - let's take the UK, so there could be around 1.7 million postcodes - and we want to build a model that outputs either a 1 or a 0 for each postcode.
One constraint could be that the "thing" has uptake in 400k postcodes. You can imagine typical demographic data as predictor variables: population, income data, age range...
However, I feel that independence between the outputs is too strong an assumption - or at least, I'd like to test this before applying models that assume it.
I'm not really sure where to start, so perhaps someone could give me some guidance. It wouldn't quite be plain logistic regression, since the output variable would have some spatial correlation. There's also possibly enough training data with this large a set of postcodes - the model could perhaps be built at postcode sector level first, but we'd still like some level of spatial correlation included...
Any ideas or food for thought are welcome; I'm kind of stumbling around in the dark so far. Am I right, for instance, in saying that multivariate regression would not work here, because the binary output variables are not independent?
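
To make the "test independence first" idea concrete, here is a minimal sketch of one check I've come across, Moran's I with a permutation test. Everything in it is an assumption for illustration: the toy 20 x 20 grid stands in for postcode centroids, the k-nearest-neighbour weights are just one possible definition of "neighbour", and the function names (`knn_weights`, `morans_i`, `permutation_p_value`) are made up. The dense weight matrix would not scale to 1.7 million postcodes, so real data would need sparse weights or a dedicated spatial statistics library.

```python
import numpy as np

def knn_weights(coords, k):
    """Binary k-nearest-neighbour weight matrix (dense, so toy-sized data only)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]   # indices of the k closest points
    w = np.zeros(dist.shape)
    rows = np.arange(len(coords))[:, None]
    w[rows, nearest] = 1.0
    return w

def morans_i(y, w):
    """Moran's I: (n / sum(w)) * (z' W z) / (z' z), with z the centred outcome."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(y, w, n_perm=199, seed=0):
    """One-sided p-value: how often shuffled outcomes give an I at least as large."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = morans_i(y, w)
    exceed = sum(morans_i(rng.permutation(y), w) >= observed for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

# Toy data (assumption): grid points standing in for postcode centroids.
side = 20
xs, ys = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = knn_weights(coords, k=4)

# Spatially clustered 0/1 outcome (left half of the grid has the "thing") ...
clustered = (coords[:, 0] < side // 2).astype(int)
# ... versus the same outcomes scattered at random over the grid.
shuffled = np.random.default_rng(1).permutation(clustered)

i_clustered = morans_i(clustered, w)
i_shuffled = morans_i(shuffled, w)
p_clustered = permutation_p_value(clustered, w)

print("Moran's I, clustered outcome:", i_clustered)
print("Moran's I, shuffled outcome: ", i_shuffled)
print("Permutation p-value, clustered:", p_clustered)
```

A value of I well above zero with a small p-value would suggest the 1s and 0s cluster in space. As far as I understand, to check the independence assumption of a fitted model rather than of the raw outcome, the same statistic would be applied to the residuals of, say, a logistic regression instead of to `y` itself.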