My predictor variables are geospatial raster data (~30 layers: various height models, modeled features, processed satellite imagery, ...), and my ground truth / target variable is a year per pixel (1867-2017). The data is best illustrated by a picture:
- (1) The ground truth is available for small patches and sometimes single pixels (a total of ~23'000 pixels).
- (2) The image shows one of many predictor data layers. They all overlap and are (mostly) downsampled to match the ground-truth resolution (25x25 m).
- (3) There are "holes" in the raster data where no data is available and no prediction is required.
Right now I'm using a random forest regression (in R). Each individual pixel is a data point with 30 predictors and (if available) one ground-truth label (a year), which I use to train the random forest. Prediction is then done for the full area covered by the 30 raster layers.
This, however, ignores the spatial context of the data, as each data point is treated individually. Since there is a lot of information in the spatial context, I'm looking for a method that makes use of it, e.g. using a 16x16 window around each pixel to predict it (roughly as sketched below).
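To make the window idea concrete, this is roughly what I have in mind for building training samples (a minimal numpy sketch; the names `stack` and `labels` and the NaN encoding of the holes are my own assumptions, not how my data is currently stored):

```python
import numpy as np

# `stack`:  the 30 predictor layers as a (H, W, 30) array, co-registered at 25 m resolution
# `labels`: a (H, W) array holding the ground-truth year where available, np.nan elsewhere
def extract_patches(stack, labels, size=16):
    half = size // 2
    rows, cols = np.where(~np.isnan(labels))  # pixels with a ground-truth year
    X, y = [], []
    for r, c in zip(rows, cols):
        # skip pixels whose window would fall off the raster edge
        if r - half < 0 or c - half < 0 or r + half > stack.shape[0] or c + half > stack.shape[1]:
            continue
        patch = stack[r - half:r + half, c - half:c + half, :]  # (16, 16, 30) window
        # skip windows that contain "holes" (encoded as NaN here)
        if np.isnan(patch).any():
            continue
        X.append(patch)
        y.append(labels[r, c])
    return np.stack(X), np.array(y)
```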
Can you recommend a method to leverage the spatial context here?
(Optionally) more specific: After doing some research, I found that CNNs might be a promising approach. A plethora of CNN architectures and frameworks are available, and I'm confident I can modify an architecture once I have a starting point. However, I'm lacking that starting point: regression doesn't seem to be a trivial/widespread use case for CNNs (compared to segmentation/classification), and the nature of my data is unusual (not individual pictures, but a continuous spatial raster). I could work around the holes and resolution issues, but I'd need an architecture that lets me get started.
Are there any readily available CNN architectures that would make a good starting point for this problem? (Given my previous experience with Tensorflow, I'd probably start there.)
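For illustration, this is the kind of patch-to-pixel regression model I'm imagining as a baseline (a minimal tf.keras sketch, not an established architecture; the patch size, channel count, and layer sizes are placeholders I've assumed):

```python
import tensorflow as tf

# Input: a 16x16 window of the 30 predictor layers; output: a single year (regression).
def build_model(patch_size=16, n_channels=30):
    inputs = tf.keras.Input(shape=(patch_size, patch_size, n_channels))
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)  # linear output for the regression target (year)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

If something along these lines is a reasonable baseline, pointers to better-suited, readily available architectures (or reasons this framing is flawed) would be very welcome.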