I have data for thousands of basins (rainfall, modelled streamflow, and other indicators). These basins can presumably be grouped, since catchment similarity is a recognised concept: similar basins need not be geographically close, and similarity depends on a wide range of factors. To develop a post-processor (e.g. http://www.adv-geosci.net/29/51/2011/), I can either:
- Develop a single multivariate model for the entire dataset (does this waste information?), or
- Cluster the data and develop individual regression models for each cluster.
What is the state of the art for this kind of cluster-then-regress approach? Any ideas or alternative approaches are welcome.
Please note that the basins are not yet grouped, so I would have to cluster/classify them myself first.
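
To make the two options concrete, here is a minimal sketch of the comparison I have in mind. Everything in it is an assumption for illustration only: the data are synthetic, the descriptor and predictor columns are made up, the small `kmeans` helper is a stand-in for whatever clustering method turns out to be appropriate, and ordinary least squares stands in for the actual post-processor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 200 basins in two hidden regimes.
n_basins, n_steps, n_train = 200, 50, 40
regime = rng.integers(0, 2, n_basins)

# Basin descriptors used only for clustering (made-up attributes).
descriptors = rng.normal(loc=regime[:, None] * 4.0, scale=1.0, size=(n_basins, 2))

# Predictors per basin and time step: rain and raw model streamflow.
X = rng.gamma(2.0, 2.0, size=(n_basins, n_steps, 2))
# Each regime has its own linear relation to the "observed" flow.
true_coef = np.where(regime[:, None] == 0, [0.2, 0.9], [0.6, 0.4])
y = (X * true_coef[:, None, :]).sum(axis=2) + rng.normal(0, 0.3, (n_basins, n_steps))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    local_rng = np.random.default_rng(seed)
    centers = points[local_rng.choice(len(points), k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def fit_ols(X2d, y1d):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X2d)), X2d])
    coef, *_ = np.linalg.lstsq(A, y1d, rcond=None)
    return coef


def predict(coef, X2d):
    return coef[0] + X2d @ coef[1:]


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hold out the last time steps of every basin for evaluation.
X_tr, X_te = X[:, :n_train], X[:, n_train:]
y_tr, y_te = y[:, :n_train], y[:, n_train:]

# Option 1: one model for the entire dataset.
global_coef = fit_ols(X_tr.reshape(-1, 2), y_tr.ravel())
rmse_global = rmse(predict(global_coef, X_te.reshape(-1, 2)), y_te.ravel())

# Option 2: cluster basins on their descriptors, then one model per cluster.
labels = kmeans(descriptors, k=2)
sq_err = []
for j in np.unique(labels):
    mask = labels == j
    coef = fit_ols(X_tr[mask].reshape(-1, 2), y_tr[mask].ravel())
    pred = predict(coef, X_te[mask].reshape(-1, 2))
    sq_err.append((pred - y_te[mask].ravel()) ** 2)
rmse_clustered = float(np.sqrt(np.mean(np.concatenate(sq_err))))

print(f"held-out RMSE, single model:       {rmse_global:.3f}")
print(f"held-out RMSE, per-cluster models: {rmse_clustered:.3f}")
```

The synthetic regimes are built to differ, so this toy setup favours the clustered option by construction; with real basins the open question is whether the clusters actually correspond to different rainfall-streamflow relations.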