6

How can I assess the accuracy of the output of a deterministic mathematical model?

For example, a climate model can predict the mean annual temperature (MAT) for a specific location. I can use the model to predict thirty years of MAT in New York City, $T_\text{model}$. Now let's say I have the observed MAT in New York City for the same 30 years, $T_\text{obs}$.

Is there a statistical test for the hypothesis $T_\text{obs}=T_\text{model}$? Can I assess the accuracy of the mathematical model?

Abe
  • 3,561
  • 7
  • 27
  • 45
Steven
  • 133
  • 2
  • 5
  • Specifically, is there a way to test this using R statistical software? – Steven Jun 19 '12 at 20:47
  • Also, if it's not too much trouble, might someone be able to recommend a p-value to use for the test? – Steven Jun 19 '12 at 20:55
  • Do you have an estimate of uncertainty in the model output, or is it a point estimate? If not, I think that the model will, by definition, be wrong. For example, if the model estimates MAT$=20$ and the actual MAT was $20 + x$, where x is a continuous variable, then the model is wrong because $20\neq20+x$ – Abe Jun 19 '12 at 21:24
  • I hope you don't mind my edits for clarity - Also, I have asked a related follow up question here: http://stats.stackexchange.com/q/30771/2750 – Abe Jun 19 '12 at 22:09
  • Would someone be able to answer for me why a Chi Square Goodness of Fit test would not be appropriate for assessing how well the model fits the actual data? – Steven Jun 19 '12 at 22:49
  • perhaps you could ask about using $\chi^2$ GOF as a separate question – Abe Jun 20 '12 at 14:31

4 Answers

4

A decent first step might be to compute the correlation between your model's predictions ("data A") and the observed temperatures ("data B"). Correlations range from -1 to +1: 0 indicates no (linear) relationship between the predicted and observed values, while higher values suggest that your model better agrees (up to a scale factor) with the observed data. Correlations can easily be computed in R with the `cor` function. The `cor.test` function does some testing of association between two variables.

You should also just plot the data and take a look at it. Your model might not perform equally well under all conditions (e.g., maybe it breaks down around freezing temperatures).
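
A minimal sketch in R, assuming `T_model` and `T_obs` are numeric vectors holding the 30 modelled and observed mean annual temperatures:

```r
# T_model, T_obs: numeric vectors of the 30 modelled and observed MATs (assumed to exist)
cor(T_model, T_obs)       # Pearson correlation between predictions and observations
cor.test(T_model, T_obs)  # tests H0: rho = 0 and gives a confidence interval for rho

# Plot observed against predicted; points near the dashed 1:1 line indicate good agreement
plot(T_model, T_obs, xlab = "Modelled MAT", ylab = "Observed MAT")
abline(0, 1, lty = 2)
```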

There are more sophisticated things you could try, but I think these are a reasonable first step.

Matt Krause
  • 19,089
  • 3
  • 60
  • 101
  • In this situation, the null hypothesis of interest is not $\rho=0$, as is the default in `cor.test`, but $\rho=1$, which would obviously produce a degenerate distribution. – StasK Jun 19 '12 at 22:13
  • I don't see how calculating the correlation coefficient assesses how well the model fits the data. For instance, when I plot data A and data B together, it could end up that the plot perfectly produces the line $y = 0.5x$. This would have a correlation coefficient of 1, but it does not mean that the model fits the data well; for example, when $x = 2000$, $y = 1000$. – Steven Jun 19 '12 at 22:14
  • @StasK, I'm afraid I don't follow. Wouldn't the null hypothesis be "this model cannot predict", i.e., $\rho=0$, and the alternative hypothesis "this model captures some info about temperature", i.e., $\rho>0$? – Matt Krause Jun 19 '12 at 22:21
  • @Steven That's a valid point. I would argue that a model which perfectly predicts, say, half or twice tomorrow's temperature from other data is still a fantastic model. Since correlation works up to a scale factor, you could just rescale the model's prediction at the end, if you need an actual temperature. – Matt Krause Jun 19 '12 at 22:27
3

Your question sounds confusing. When you say accuracy of the model, are you just referring to how well it predicts, or do you mean how well it simulates the behavior of the weather in New York City? I don't think you can assess the latter.

As to the former, I would compute the mean square prediction error. By that I mean: use the model to predict the mean annual temperature for each of the 30 years (presumably based on the available inputs that the model needs to produce the estimates) and take the average squared difference between those predictions and the actual recorded mean annual temperatures. This gives you an estimate, but not the accuracy of the estimate.

So suppose you have a standard for the accuracy and you want to test the hypothesis that the accuracy is better than a certain level. I can give a somewhat vague description of how to do this; it is admittedly vague because I do not know what inputs go into the model to make the prediction. The idea would be to make small perturbations to the inputs and see how these perturbations affect the accuracy of the prediction. This would give you a distribution of mean square errors from which you could estimate a p-value for your hypothesis. All of this assumes that you have a sensible way to perturb the inputs that characterizes their sampling variability. The resulting estimates would then represent the variability of the individual predictions and, from that, the variability in the estimated mean square error of prediction.
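
As a rough sketch of the mean-square-prediction-error step in R, assuming `T_model` and `T_obs` are numeric vectors of the 30 predicted and observed mean annual temperatures (the perturbation-based distribution would additionally require access to the model's inputs, which is not attempted here):

```r
# T_model, T_obs: the 30 predicted and observed mean annual temperatures (assumed available)
mspe <- mean((T_model - T_obs)^2)  # mean square prediction error
rmse <- sqrt(mspe)                 # same units as temperature, easier to interpret
bias <- mean(T_model - T_obs)      # systematic over- or under-prediction
```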

Michael R. Chernick
  • 39,640
  • 28
  • 74
  • 143
  • Are you saying that a p-value requires an estimate of both model and observation uncertainty? – Abe Jun 19 '12 at 21:30
  • I vaguely recall a statistical test for this kind of situation- the Chi Square Goodness of Fit test. Do you all think this would be an appropriate test for my situation? I am just looking to assess how well the model fits the observed data. This is what I mean by testing whether or not the model is "accurate" – Steven Jun 19 '12 at 21:44
  • @Abe First of all, a p-value has to refer to a hypothesis test. It represents the probability, if the null hypothesis is true, of obtaining an estimate as large as the observed value or larger. So you need a test. I suggested testing that the prediction accuracy (not the model accuracy or the uncertainty in the observed temperatures) is good in the sense that the mean square error is less than some standard A. Your model and your data give you one estimate of this accuracy. – Michael R. Chernick Jun 19 '12 at 21:50
  • To perform a test you need to get a distribution for the test statistic under the hypothesis that the actual mean square error is A. Doing repeated perturbations of the inputs gives a set of estimates of the temperatures and, from that, a set of mean square errors. You use that distribution to obtain your p-value. – Michael R. Chernick Jun 19 '12 at 21:52
  • Unfortunately I do not have access to the mathematical model, just the output (specifically the 30 year temperatures in New York City). So I cannot manipulate inputs. Why will the Chi Square test not work? – Steven Jun 19 '12 at 22:05
  • 1
    @Steven If you can't vary the inputs to get various estimates of the temperature, you have no way to see the distribution of the prediction accuracy. I do not know what you have in mind for the chi square test. You have 30 years of average temperatures and thirty model estimates. If you treat these as independent estimates of an unknown target temperature, I could see using mean square error as a measure of accuracy, but I don't see a chi square statistic entering in here. That would mean having some form of a contingency table where you would compare expected counts to observed counts. – Michael R. Chernick Jun 19 '12 at 22:48
  • Is the approach of [Kennedy and O'Hagan 2001, Bayesian calibration of computer models. J. R. Stat. Soc. B](http://www.stat.lsa.umich.edu/pdfs/KA2001.pdf) in line with your suggestion? – Abe Jun 20 '12 at 15:56
  • @Abe I think they are dealing with a different situation. They are talking about calibration, which involves inferring unknown parameters based on observed results. I was thinking that this problem deals just with the accuracy of the prediction, without regard to the model parameters. – Michael R. Chernick Jun 20 '12 at 16:07
2

I would suggest two approaches to assessing whether or not a deterministic mathematical model is performing well - neither of which actually involves a statistical test, and especially not trying to reduce model performance to a p-value.

  1. How well does your model predict parameters? If your model estimates parameters from data, how well does this estimate agree with observed parameters from data other than what you fit the model to?
  2. Does it generate the correct answer when confronted with parameters that result in a known change? For example, if your model is given all the parameters that occur before a heat wave, does it correctly produce said heat wave? (See the sketch after this list.)
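
As a hedged illustration of the second point in R (here `warm_years` is a hypothetical logical vector marking the years of a documented warm spell, and `T_model` and `T_obs` are the modelled and observed series):

```r
# Hypothetical check: does the model reproduce a known change?
# warm_years: logical vector of length 30 marking years of a documented warm spell (assumed)
observed_change <- mean(T_obs[warm_years]) - mean(T_obs[!warm_years])
modelled_change <- mean(T_model[warm_years]) - mean(T_model[!warm_years])
c(observed = observed_change, modelled = modelled_change)  # should be similar for a good model
```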

As someone else has suggested, you could also compare the error between your predicted output and the actual output of the system, though this just gives you a number that you're trying to minimize, not actually a statistical estimate. Designing mathematical models to be tested statistically is very hard to do backwards - the elements you need generally need to be discussed in the model design step, just like with studies.

Fomite
  • 21,264
  • 10
  • 78
  • 137
0

I recently devised a validation framework for deterministic solar irradiance forecasts. It is based on the insight that the outcome and the prediction of a perfect forecast must be mathematically exchangeable. It is generally applicable to forecasts of continuous stochastic variables. See https://doi.org/10.1016/j.renene.2021.08.032