
I'm working with land surface models. These models take meteorological forcing data (downward radiation, wind, rain, humidity, etc.), run it through a set of biogeochemical and physical equations, and produce output data ($CO_2$ uptake/release, net radiation, evaporation, runoff, etc.). The models have some known problems, and I'm trying to use clustering methods to split the input variable space into subdomains, so that I can examine the responses in the output in a non-linear manner.

My question is whether it is valid and/or sensible to cluster on structural variables (i.e. time- and space-related variables). The input variables, like downward radiation, are highly dependent on these variables (e.g. the amount of sunshine is largely determined by latitude, time of year, and time of day). So I guess it doesn't make much sense to cluster on both the structural and the input variables together. But clustering on the structural variables may offer things that clustering on the input variables doesn't, such as the ability to categorise model output by time of day. Is there any reason why using structural variables should be avoided? Or is it entirely dependent on the model in question?
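
To make the two options concrete, here is a minimal sketch that clusters the same time steps twice, once on structural variables and once on input variables, and cross-tabulates the resulting labels. Everything in it is an assumption for illustration: the data are synthetic, the toy radiation formula is not a real model, the variable names (`hour`, `doy`, `sw_down`, `wind`) are hypothetical, and a small hand-written k-means stands in for whatever clustering method is actually used. Time variables are encoded as sine/cosine pairs so that, for example, 23:00 and 01:00 count as close.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 4

# Structural variables (synthetic): time of day and day of year.
hour = rng.uniform(0, 24, n)
doy = rng.uniform(0, 365, n)

# Toy input variables: radiation depends on the structural variables,
# wind does not. This is NOT a real radiation model.
clear_sky = 800 * np.sin(np.pi * (hour - 6) / 12) \
    * (0.7 + 0.3 * np.cos(2 * np.pi * (doy - 172) / 365))
sw_down = np.clip(clear_sky + rng.normal(0, 10, n), 0, None)
wind = rng.gamma(2.0, 2.0, n)


def standardise(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    local_rng = np.random.default_rng(seed)
    centres = X[local_rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels


# Cyclic encoding of the structural variables.
X_struct = standardise(np.column_stack([
    np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
    np.sin(2 * np.pi * doy / 365), np.cos(2 * np.pi * doy / 365),
]))
X_input = standardise(np.column_stack([sw_down, wind]))

labels_struct = kmeans(X_struct, k)
labels_input = kmeans(X_input, k)

# Contingency table: rows are structural clusters, columns are input clusters.
table = np.zeros((k, k), dtype=int)
np.add.at(table, (labels_struct, labels_input), 1)
print(table)
```

If the table is close to a permuted diagonal, the two clusterings carry largely the same information; if the counts are spread across rows and columns, each partition captures something the other does not.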

kjetil b halvorsen
naught101
  • 1
    Honestly, I did not quite understand what are "structural variables" - you didn't give definition or characterization. Also, think - do you really need clustering. Maybe you need just binning (categorization by hand)? – ttnphns May 13 '14 at 07:01
  • @ttnphns: I don't have a better word for structural variables (time and space variables). The difference between them and the other independent variables (the input variables) is that the model code doesn't care where in time and space it is running the calculations; it only cares about the input variables. – naught101 May 13 '14 at 09:09
  • @ttnphns: re: binning, the data is multi-dimensional. Is there a common method for binning multi-dimensional data? You're right that the data isn't *actually* clustered, but I would have thought clustering is a neat way to bin the data in a fairly objective way. – naught101 May 21 '14 at 23:00
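
On the multi-dimensional binning raised in the comments, one simple possibility (a sketch, not an established recommendation for this problem) is to bin each variable separately at its own quantiles and then combine the per-variable bin indices into a single cell label. The data and the three variables below are synthetic and hypothetical, and the choice of four quantile bins per dimension is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
bins_per_dim = 4

# Synthetic stand-ins for three input variables (hypothetical values).
X = np.column_stack([
    rng.uniform(0, 1000, n),    # downward shortwave radiation
    rng.uniform(250, 310, n),   # air temperature
    rng.gamma(2.0, 2.0, n),     # wind speed
])


def quantile_bin(x, n_bins):
    """Assign each value to one of n_bins bins with roughly equal counts."""
    # Interior quantile edges only, so np.digitize returns 0 .. n_bins - 1.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)


# One column of bin indices per variable.
idx = np.column_stack(
    [quantile_bin(X[:, j], bins_per_dim) for j in range(X.shape[1])]
)

# Collapse the per-variable indices into a single cell id per row.
dims = (bins_per_dim,) * X.shape[1]
bin_id = np.ravel_multi_index(tuple(idx.T), dims)

# Number of samples in each of the bins_per_dim ** 3 cells.
counts = np.bincount(bin_id, minlength=bins_per_dim ** X.shape[1])
print(counts.reshape(dims))
```

The number of cells grows as `bins_per_dim` raised to the number of variables, so with many dimensions most cells end up empty or sparsely populated, which is one practical reason clustering can be attractive as a data-driven alternative.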

0 Answers