My predictor variables are geospatial raster data (~30 layers: various height models, modeled features, processed satellite imagery, ...), and my ground truth / target variable is a year per pixel (1867-2017). The data is best illustrated by a picture:
- (1) The ground truth is available for small patches and sometimes single pixels (a total of ~23'000 pixels).
- (2) The image shows one of many predictor data layers. They all overlap and are (mostly) downsampled to match the ground-truth resolution (25x25 m).
- (3) There are "holes" in the raster data where no data is available and no prediction is required.
Right now I'm using a random forest regression (in R). Each individual pixel is a data point with 30 predictors and (if available) one ground-truth label (a year), which I use to train the random forest. Prediction is then done for the full area covered by the 30 raster layers.
This, however, ignores the spatial context of the data, as each data point is treated individually. Since there is a lot of information in the spatial context, I'm looking for a method that makes use of it, e.g. using a 16x16 window around each pixel to predict it (roughly as sketched below).
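To make the window idea concrete, this is roughly what I have in mind for building training samples (a minimal numpy sketch; the names `stack` and `labels` and the NaN encoding of the holes are my own assumptions, not how my data is currently stored):

```python
import numpy as np

# `stack`:  the 30 predictor layers as a (H, W, 30) array, co-registered at 25 m resolution
# `labels`: a (H, W) array holding the ground-truth year where available, np.nan elsewhere
def extract_patches(stack, labels, size=16):
    half = size // 2
    rows, cols = np.where(~np.isnan(labels))  # pixels with a ground-truth year
    X, y = [], []
    for r, c in zip(rows, cols):
        # skip pixels whose window would fall off the raster edge
        if r - half < 0 or c - half < 0 or r + half > stack.shape[0] or c + half > stack.shape[1]:
            continue
        patch = stack[r - half:r + half, c - half:c + half, :]  # (16, 16, 30) window
        # skip windows that contain "holes" (encoded as NaN here)
        if np.isnan(patch).any():
            continue
        X.append(patch)
        y.append(labels[r, c])
    return np.stack(X), np.array(y)
```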
Can you recommend a method to leverage the spatial context here?
(Optionally) more specific: After doing some research, I found that CNNs might be a promising approach. A plethora of CNN architectures and frameworks are available, and I'm confident I can modify an architecture once I have a starting point. However, I'm lacking that starting point: regression doesn't seem to be a trivial/widespread use case for CNNs (compared to segmentation/classification), and the nature of my data is unusual (not individual pictures, but a continuous spatial raster). I could work around the holes and resolution issues, but I'd need an architecture that lets me get started.
Are there any readily available CNN architectures that would make a good starting point for this problem? (Given my previous experience with Tensorflow, I'd probably start there.)
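For illustration, this is the kind of patch-to-pixel regression model I'm imagining as a baseline (a minimal tf.keras sketch, not an established architecture; the patch size, channel count, and layer sizes are placeholders I've assumed):

```python
import tensorflow as tf

# Input: a 16x16 window of the 30 predictor layers; output: a single year (regression).
def build_model(patch_size=16, n_channels=30):
    inputs = tf.keras.Input(shape=(patch_size, patch_size, n_channels))
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
    x = tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)  # linear output for the regression target (year)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

If something along these lines is a reasonable baseline, pointers to better-suited, readily available architectures (or reasons this framing is flawed) would be very welcome.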