
I have a record containing the maximum and the minimum monthly temperatures at a particular station. The record shows information for each month from January 1908 to March 2012. However, some of the temperature values have been blanked out.

Sample data:

    yyyy  month      tmax       tmin
    1908  January    5.0        -1.4
    1908  February   7.3        1.9
    1908  March      6.2        0.3
    1908  April      Missing_1  2.1
    1908  May        Missing_2  7.7
    1908  June       17.7       8.7
    1908  July       Missing_3  11.0
    1908  August     17.5       9.7
    1908  September  16.3       8.4
    1908  October    14.6       8.0
    1908  November   9.6        3.4
    1908  December   5.8        Missing_4
    1909  January    5.0        0.1
    1909  February   5.5        -0.3
    1909  March      5.6        -0.3
    1909  April      12.2       3.3
    1909  May        14.7       4.8
    1909  June       15.0       7.5
    1909  July       17.3       10.8
    1909  August     18.8       10.7

I want to estimate the missing values. Which model suits this kind of problem best? I am trying to use linear regression here. Is that the right approach?

Nick Cox

2 Answers


The question seems confused. You ask about modelling, but your focus appears to be just replacing missing values. Replacing missing values does not absolutely require a model for the entire dataset.

To estimate particular missing values, you could consider using some appropriate interpolation method. The simplest is linear interpolation. Other methods that might work as well or better are cubic interpolation, cubic spline interpolation and piecewise cubic Hermite interpolation. Once you try one, it may be sensible to try others to see how far they agree.
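As a minimal illustration of the simplest option, linear interpolation, here is a sketch in plain Python applied to the sample rows from the question (January 1908 to August 1909 only). The function name `interpolate_linear` is hypothetical. It assumes the series is in time order with exactly one value per month and `None` standing for each blanked-out value. Gaps at the very start or end of a series are left unfilled. For the cubic spline and piecewise cubic Hermite variants, SciPy provides `scipy.interpolate.CubicSpline` and `scipy.interpolate.PchipInterpolator`, which make it easy to compare methods in the way suggested above.

```python
# Linear interpolation of gaps in an evenly spaced monthly series.
# Assumption: values are in time order, one per month, None = missing.

def interpolate_linear(values):
    """Return a copy of `values` with interior None entries filled by
    straight-line interpolation between the nearest observed neighbours.
    Leading or trailing None entries are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):  # only runs when there is a gap
            frac = k / gap
            filled[left + k] = values[left] + frac * (values[right] - values[left])
    return filled

# Sample tmax and tmin, January 1908 to August 1909, from the question.
tmax = [5.0, 7.3, 6.2, None, None, 17.7, None, 17.5, 16.3, 14.6, 9.6, 5.8,
        5.0, 5.5, 5.6, 12.2, 14.7, 15.0, 17.3, 18.8]
tmin = [-1.4, 1.9, 0.3, 2.1, 7.7, 8.7, 11.0, 9.7, 8.4, 8.0, 3.4, None,
        0.1, -0.3, -0.3, 3.3, 4.8, 7.5, 10.8, 10.7]

tmax_filled = interpolate_linear(tmax)
tmin_filled = interpolate_linear(tmin)
print([round(tmax_filled[i], 2) for i in (3, 4, 6)])  # Missing_1..3 → [10.03, 13.87, 17.6]
print(round(tmin_filled[11], 2))                      # Missing_4 → 1.75
```

April and May 1908 are both estimated from a single straight line running from March to June, so any curvature in the seasonal cycle over those months is ignored; that is where linear and cubic methods are most likely to disagree.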

To model the entire series, using sines and cosines in linear regression is often a good method, but over a century or so, you need to think about trends too, including the possibility that the trend is itself complicated. There are any number of versatile time series models with a regression flavour that could be used, but you might need to do much more reading and/or take appropriate courses before they are within your reach.
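The sines-and-cosines idea can be sketched as an ordinary least-squares fit of tmax on an intercept, a straight-line trend in month number, and one annual sine/cosine pair (period 12 months), with the fitted values then used for the missing months. This is only a sketch under loudly stated assumptions: a linear trend and a single harmonic are almost certainly too simple for a century of data, as noted above, and the function names (`design_row`, `solve`, `fit_trend_harmonic`, `fill_from_model`) are hypothetical. The normal equations are solved by hand only to keep the example dependency-free; a library least-squares routine such as `numpy.linalg.lstsq` is numerically preferable in practice.

```python
import math

def design_row(t):
    """Regressors for month index t (0 = first month): intercept, linear
    trend and one annual cosine/sine pair (assumed period: 12 months)."""
    angle = 2 * math.pi * t / 12
    return [1.0, float(t), math.cos(angle), math.sin(angle)]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting (fine for a handful of coefficients)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_trend_harmonic(values):
    """OLS on the observed months only (None = missing), via the normal
    equations X'X beta = X'y. Returns [intercept, trend, cos, sin]."""
    obs = [(design_row(t), y) for t, y in enumerate(values) if y is not None]
    p = len(obs[0][0])
    xtx = [[sum(x[i] * x[j] for x, _ in obs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in obs) for i in range(p)]
    return solve(xtx, xty)

def fill_from_model(values):
    """Replace each None with the model's fitted value for that month."""
    beta = fit_trend_harmonic(values)
    return [y if y is not None else sum(b * x for b, x in zip(beta, design_row(t)))
            for t, y in enumerate(values)]

# Sample tmax, January 1908 to August 1909; None marks Missing_1..Missing_3.
tmax = [5.0, 7.3, 6.2, None, None, 17.7, None, 17.5, 16.3, 14.6, 9.6, 5.8,
        5.0, 5.5, 5.6, 12.2, 14.7, 15.0, 17.3, 18.8]
tmax_model = fill_from_model(tmax)
print([round(tmax_model[i], 1) for i in (3, 4, 6)])
```

With only 20 months of sample data the trend coefficient is poorly determined; on the full 1908 to 2012 record, the same design matrix could be extended with extra harmonics or a more flexible trend.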

Nick Cox

It is impossible to say which approach is best. Is this your whole data set? If so, you have too few observations for sensible regression. If you have more, sure, try it out: use month and year dummies plus the one-month lag and lead as regressors, with separate models for tmin and tmax. That should work out fine. If this is your whole data set, use mean values.
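The answer does not say which mean values to use; one possible reading, assumed here, is the mean of the same calendar month over the other years in which it was observed. The sketch below (with hypothetical names `SAMPLE` and `fill_with_monthly_means`) applies that to the sample rows. Note how it behaves when a month has no other observations: December appears only once in the sample, so Missing_4 cannot be filled this way and stays `None`.

```python
from collections import defaultdict

# (year, month, tmax, tmin) from the question's sample; None marks Missing_1..4.
SAMPLE = [
    (1908, 1, 5.0, -1.4), (1908, 2, 7.3, 1.9), (1908, 3, 6.2, 0.3),
    (1908, 4, None, 2.1), (1908, 5, None, 7.7), (1908, 6, 17.7, 8.7),
    (1908, 7, None, 11.0), (1908, 8, 17.5, 9.7), (1908, 9, 16.3, 8.4),
    (1908, 10, 14.6, 8.0), (1908, 11, 9.6, 3.4), (1908, 12, 5.8, None),
    (1909, 1, 5.0, 0.1), (1909, 2, 5.5, -0.3), (1909, 3, 5.6, -0.3),
    (1909, 4, 12.2, 3.3), (1909, 5, 14.7, 4.8), (1909, 6, 15.0, 7.5),
    (1909, 7, 17.3, 10.8), (1909, 8, 18.8, 10.7),
]

def fill_with_monthly_means(rows, col):
    """Replace None in column `col` with the mean of that calendar month
    over the years where it was observed; leave None if there are none."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[col] is not None:
            totals[row[1]] += row[col]
            counts[row[1]] += 1
    filled = []
    for row in rows:
        value = row[col]
        if value is None and counts[row[1]] > 0:
            value = totals[row[1]] / counts[row[1]]
        filled.append(value)
    return filled

tmax_filled = fill_with_monthly_means(SAMPLE, 2)
tmin_filled = fill_with_monthly_means(SAMPLE, 3)
print([tmax_filled[i] for i in (3, 4, 6)])  # → [12.2, 14.7, 17.3]
print(tmin_filled[11])                       # → None (no other December in the sample)
```

This ignores year-to-year differences entirely, which is the price of needing no model; on the full record each calendar month would have around a hundred observations to average over.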

MaHo