
I am working with observed weather data and modeled weather data at the same location. The observed data are recorded every 6 hours, while the modeled data have daily resolution (averaged over the whole day). Given the poor temporal resolution of the modeled data, I want to add data to it that matches the distribution of the observed data while keeping its original data points. For this example, I will only include the temperature columns from each dataset for 20 days of a year.

Obs = c(265.9328, 268.9379, 273.2499, 271.6766, 270.9370, 270.8728, 270.8097, 270.2863, 269.7002, 270.7541, 272.2853, 272.5288, 272.5497, 272.3303, 272.7226, 273.0089, 273.3442, 274.1492, 274.4493, 272.8262,
        265.9328, 268.9379, 273.2499, 271.6766, 270.9370, 270.8728, 270.8097, 270.2863, 269.7002, 270.7541, 272.2853, 272.5288, 272.5497, 272.3303, 272.7226, 273.0089, 273.3442, 274.1492, 274.4493, 272.8262,
        265.9328, 268.9379, 273.2499, 271.6766, 270.9370, 270.8728, 270.8097, 270.2863, 269.7002, 270.7541, 272.2853, 272.5288, 272.5497, 272.3303, 272.7226, 273.0089, 273.3442, 274.1492, 274.4493, 272.8262,
        265.9328, 268.9379, 273.2499, 271.6766, 270.9370, 270.8728, 270.8097, 270.2863, 269.7002, 270.7541, 272.2853, 272.5288, 272.5497, 272.3303, 272.7226, 273.0089, 273.3442, 274.1492, 274.4493, 272.8262)
Mod = c(260.8257, 260.7667, 265.2768, 267.0014, 267.7482, 269.0105, 266.1317, 264.7206, 271.3192, 271.5151, 269.7125, 270.3311, 272.2444, 271.4842, 269.0684, 268.9821, 270.6512, 268.3054, 269.4005, 268.9082)

If I want to force Mod to have the same resolution as Obs, I guess I would have to add NAs to Mod. How do I populate those NAs based on the distribution of Obs while using the information from Mod (either by keeping the values or having the daily Mod values influence the new dataset in some way)?

I imagine a function that looks something like this:

temporal.downscale <- function(observed, modeled, distribution_type = "Normal")

where a new dataset is created with the resolution and (by default) normal distribution associated with the observed data, but with the data values of the modeled data. I'm relatively new to stats and programming, so the guts of this function are where I'm having trouble.
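
To make the idea concrete, here is a rough sketch of one possible implementation. It is only a guess at the guts: it repeats each daily value over the sub-daily steps and adds Normal noise whose size is taken from the observations' within-day spread, re-centred so each day's mean still equals the original modeled value. The assumption that the observations divide evenly into the modeled days is mine.

temporal.downscale <- function(observed, modeled, distribution_type = "Normal") {
  k <- length(observed) / length(modeled)   # sub-daily steps per day (4 here)
  stopifnot(k == round(k))
  day <- rep(seq_along(modeled), each = k)  # day index for each observation
  # within-day anomalies of the observations (value minus its daily mean)
  anomalies <- observed - rep(tapply(observed, day, mean), each = k)
  if (distribution_type == "Normal") {
    noise <- rnorm(length(observed), mean = 0, sd = sd(anomalies))
  } else {
    stop("only 'Normal' is implemented in this sketch")
  }
  # re-centre the noise within each day so the daily means of the output
  # still equal the original modeled values
  noise <- noise - rep(tapply(noise, day, mean), each = k)
  rep(modeled, each = k) + noise
}

downscaled <- temporal.downscale(Obs, Mod)   # 80 six-hourly values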

j_simskii
  • Are the model data *snapshots* with a separation of 1 day, or *averages* over a day? (or something else?) – GeoMatt22 Sep 26 '16 at 19:55
  • @GeoMatt22 they are averages over a day – j_simskii Sep 26 '16 at 19:59
  • OK. I suspected it might be. Then "keep the original data points" is not so simple, because the model "points" are actually *intervals*. Your case is more like [downscaling](https://en.wikipedia.org/wiki/Downscaling) than interpolation. The downscaling problem has no unique answer. You are asking for a new "model" series that has a given *moving average*, which is [ill posed](https://en.wikipedia.org/wiki/Well-posed_problem), as the smoothing destroys information. For model-data comparison, upscaling the data may be better. (There is a reason climate is more predictable than weather!) – GeoMatt22 Sep 26 '16 at 20:17
  • @GeoMatt22 right, that makes sense. Oddly enough, the modeled dataset is coming from a climate model hence why it's such poor resolution, but my aim is to increase this resolution. Perhaps a solution could lie in upscaling the observed data and downscaling the modeled data? And I would have to use a moving average in order to create the third dataset? – j_simskii Sep 26 '16 at 20:52
  • What is the goal? What will the downscaled data be used for? As an example, say both inputs are considered accurate (so comparison is *not* the goal), and the goal is to get downscaled predictions that incorporate both. Then one framework you could use would be [tag:gaussian-process] regression, with a varying mean based on the model, and conditioned locally to the observations. This would produce a distribution of possible values at each time (a rough sketch of this idea appears after these comments). – GeoMatt22 Sep 26 '16 at 21:10
  • @GeoMatt22 the goal is to run an ecosystem model with this downscaled dataset and compare the uncertainty of the ecosystem model with downscaled data vs. modeled data vs. observed data. Gaussian-process regression sounds like it could solve my problem, I'll need to look into it further to try and understand it. Thanks! – j_simskii Sep 26 '16 at 21:22
  • @GeoMatt22 Any ref about using Gaussian-process for this type of problems? That seems exactly what meteorologists should use for downscaling climate observations. – horaceT Sep 27 '16 at 00:02
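
For reference, here is a rough sketch of the Gaussian-process idea from the comments: a squared-exponential kernel, the linearly interpolated model as the prior mean, and conditioning on a subset of the observations. The kernel choice and the hyperparameters ell, sigma_f, and sigma_n are illustrative guesses, not anything specified in the thread.

gp_downscale <- function(t_obs, y_obs, t_new, mean_fun,
                         ell = 1, sigma_f = 2, sigma_n = 0.5) {
  sqexp <- function(a, b) sigma_f^2 * exp(-outer(a, b, "-")^2 / (2 * ell^2))
  K  <- sqexp(t_obs, t_obs) + diag(sigma_n^2, length(t_obs))
  Ks <- sqexp(t_new, t_obs)
  # posterior mean = prior mean + kernel-weighted observation residuals
  drop(mean_fun(t_new) + Ks %*% solve(K, y_obs - mean_fun(t_obs)))
}

t_6h     <- seq(0.25, 20, by = 0.25)         # 80 six-hourly time stamps (days)
mod_mean <- approxfun(1:20, Mod, rule = 2)   # model as the prior mean function
idx      <- seq(4, 80, by = 8)               # pretend only some Obs are known
y_hat    <- gp_downscale(t_6h[idx], Obs[idx], t_6h, mod_mean)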

1 Answer


Yes, by all means interpolate, smoothly if you can, as a first step to getting a curve shape.
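
For instance, a smooth spline interpolation of the daily Mod values onto a 6-hourly grid could serve as that first step in R; the grid spacing here is my assumption, chosen to match the observations:

t_daily    <- 1:20                                  # day index of the Mod values
t_6h       <- seq(0.25, 20, by = 0.25)              # 6-hourly grid matching Obs
mod_smooth <- spline(t_daily, Mod, xout = t_6h)$y   # smooth curve shape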

If the 24-hour temperatures are end-of-day averages, and if temperature changes linearly in time, then the reported values actually occurred 12 hours before the end of the day. Similarly, if the six-hour temperatures are given at the end of each period and change linearly with time, they actually represent conditions three hours earlier. For differently shaped temperature curves, this deconvolution of temperatures will have temporal offsets to earlier times of approximately half the measurement interval, with the exact proportion depending on the functional shape. Moreover, in the non-linear case, the shape of the curve will also be distorted by the smoothing, i.e., the temporal averaging.

It is theoretically possible to correct for these effects, but with 24-hour averaged data one cannot reconstruct the more or less sinusoidal day-versus-night temperature variations. With six-hour readings, one can reconstruct diurnal variations, if not exactly. To see what the averaging does, examine the integral of a candidate function from t - tm to t, where tm is the averaging period. The correction needed is then found by fitting the result of that integration to the data, and looking up what that implies about the function before integration. There are surely more general ways of doing this deconvolution, but I cannot think of any offhand.
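
As a concrete illustration of that integrate-then-fit procedure, here is a hedged sketch that assumes a purely sinusoidal diurnal cycle f(t) = a + b*sin(2*pi*t/24 + phi) (t in hours), which real data will only approximate. The average of f over [t - tm, t] has a closed form, so one can fit the averaged form to the 6-hourly data and then read off the instantaneous curve:

avg_sine <- function(t, a, b, phi, tm = 6) {
  # mean of a + b*sin(2*pi*t/24 + phi) over [t - tm, t]
  a + b * (cos(2*pi*(t - tm)/24 + phi) - cos(2*pi*t/24 + phi)) * 24/(2*pi*tm)
}
t6  <- seq(6, 480, by = 6)          # end-of-interval times for 20 days (hours)
sse <- function(p) sum((Obs - avg_sine(t6, p[1], p[2], p[3]))^2)
p   <- optim(c(mean(Obs), 2, 0), sse)$par             # fit the averaged form
f   <- function(t) p[1] + p[2]*sin(2*pi*t/24 + p[3])  # deconvolved curve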

So, we need to know whether the six-hour temperatures are snapshots at those times or averages. We also need to know how overcast it is going to be to deconvolve the correct diurnal temperature variation, whether a cold or hot front is moving in to upset our calculations, whether there is precipitation, and what the wind speed, the air temperature versus the temperature of the ground (or water) beneath us, and the relative humidity are. In other words, even to get a good local-time temperature from a 24-hour prediction, we need a lot of data processing and lots of models. So, a lot depends on how accurately you need to process your data, and which factors you want to account for while doing it.

Advice: start simple, and add factors as it becomes obvious they are needed.

Carl
  • They are two datasets from the same location, but one is modeled. I want to tweak the modeled dataset by improving its resolution and having it match the distribution of the observed so that the modeled data is more accurate. Would you recommend interpolation for this? – j_simskii Sep 26 '16 at 19:35
  • This does not provide an answer to the question. To critique or request clarification from an author, leave a comment below their post. - [From Review](/review/low-quality-posts/117998) – gung - Reinstate Monica Sep 26 '16 at 19:36
  • @gung No problem, if you tell me how to include a graph in a comment. The reason I used an answer box is because I needed to show the data to ask what it means. I will now convert it into an answer, having gotten a response, and leave you with the question of how to do this in the future, please? – Carl Sep 26 '16 at 19:40
  • You can't. The SE system isn't perfect & this seems to expose a limitation. You might add the plot to the question, if it's relevant there, I don't know. You could also raise the issue on meta.CV. But this is still out of bounds. – gung - Reinstate Monica Sep 26 '16 at 19:45
  • I suppose you might delete the last paragraph (& maybe the figure). They continue to make it seem like this is a request for more information instead of an answer. – gung - Reinstate Monica Sep 26 '16 at 19:52
  • You can always include a link to a graph in a comment. (If you want to use the SE image upload system you can even start writing an "answer", upload a graph as part of that, take that link and put it in a comment, but never post the answer itself!) – Silverfish Sep 26 '16 at 19:55
  • @Carl unfortunately the modeled data is only available at daily resolution, so interpolation will be necessary. My question now becomes: how do I force the modeled data to resemble the distribution of the observed data? Is there a function that will grab the nearest observed value in between two modeled values? So for example, if I have observed as (270, 265, 268) and modeled as (271, NA, 269), it can go and grab the 270 from the observed (a small sketch of this appears after these comments). – j_simskii Sep 26 '16 at 19:59
  • Yes, well, I answered that although it didn't look like it, perhaps. The best way to do that is to get your hands on a forecasting model that predicts temperatures every 6-h to begin with. Alternatively, find out how to do that on this site by asking which forecasting model is the best one to use. – Carl Sep 26 '16 at 20:03
  • @Carl the model predictions are daily *averages*, while I gather the desired output is 6-hour *snapshots*. That makes this an ill-posed deconvolution problem, typically requiring some regularization to solve reasonably. However the regularization will typically need to impose *smoothing* (small derivative, rather than small *values*). If only there were a [method](http://stats.stackexchange.com/a/234281/127790) to impose this! – GeoMatt22 Sep 26 '16 at 20:23
  • @GeoMatt22 Problem is that any imposed solution will be heuristic as well. Yes, we can, for example, assume that temperature changes have a limiting slope, but, if that is due to a cold or hot front moving in, the time to change temperatures may have more to do with the reluctance of our thermometer to show a change than with the weather itself. That is why I suggested that if we are measuring temperatures every six hours, that we should be running the forecast software to update every six hours. That approach will always be superior to an interpolation. – Carl Sep 27 '16 at 00:27
  • @Carl my comment was not particularly serious, just an attempt at an inside joke :) The OP did not specify enough information to really judge solutions. – GeoMatt22 Sep 27 '16 at 00:43
  • @GeoMatt22 Ha! So it was, I did not get it as I was scratching my head to much to notice. ;) – Carl Sep 27 '16 at 01:10
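
For completeness, a tiny sketch of the NA-filling rule described in j_simskii's comment above. Placing each daily Mod value in the last 6-hour slot of its day is my assumption about the alignment:

mod_6h <- rep(NA_real_, length(Obs))          # 6-hourly series, all NA
mod_6h[seq(4, length(Obs), by = 4)] <- Mod    # keep the daily Mod values
mod_6h[is.na(mod_6h)] <- Obs[is.na(mod_6h)]   # fill the gaps from Obs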