Do you see trends in my residual plots? These residual plots show the standardized residuals against fitted values, origin period, calendar period, and development period. A pattern in any direction can be the result of a trend. There seems to be a trend in my bottom left plot. Do you agree?

This is a research project. I am asking for your input to confirm my assumptions.

Jenny
  • How much data do you have? Are these time-series? – gung - Reinstate Monica Jan 23 '15 at 13:50
  • I have 100 data points. Yes, they are time series. I appreciate your prompt reply. – Jenny Jan 23 '15 at 13:53
  • Have you already modeled the error structure (eg, AR & MA components), & is it stationary? – gung - Reinstate Monica Jan 23 '15 at 13:59
  • In addition to @gung's comment, please show a plot of your residuals versus the observed series. – IrishStat Jan 23 '15 at 14:10
  • @IrishStat there's a reason why people tend to avoid plotting residuals vs observed in regression-type models, though. It might be worth pointing out to the OP the caution needed in interpreting such plots. – Glen_b Jan 23 '15 at 15:39
  • If the variability of the residuals is related to the level of the Y values, this may suggest the need for a power transform. Please see http://stats.stackexchange.com/questions/18844/when-and-why-to-take-the-log-of-a-distribution-of-numbers/18852#18852 – IrishStat Jan 23 '15 at 16:22
  • If that was the case you'd expect to see some indication in residuals vs fitted, such as changing spread as you go across the plot. – Glen_b Jan 23 '15 at 16:24
  • @gung the data are more like age-period-cohort data than straight time-series. – Glen_b Jan 23 '15 at 16:27
  • @gung (IrishStat & Glen_b) Yes, my data input is in an upper triangle shape. Thank you so much for all your help. This is my first post here and I am so moved by your kindness. Once I find a good model, I will graph the observed values, fitted values, and the estimations all together. – Jenny Jan 24 '15 at 01:47
  • @Glen_b, what does OP stand for? – Jenny Jan 24 '15 at 02:20
  • Sorry -- "original poster" i.e. the person whose post is being discussed (which is you). I was suggesting to Irishstat that when asking for that plot he make it clear that the residuals will be correlated with the original responses, so you have to interpret the plot with care. By contrast, the information (on variance being related to mean) that he was seeking should (if the fit for the mean is reasonable) be discernable in the residuals-vs-fitted plot you already show. [In fact this model is *already* heteroskedastic, what you would see in the plot is any unaccounted-for heteroskedasticity.] – Glen_b Jan 24 '15 at 02:25
  • Thank you for your explanation. Yes, it looks like heteroskedasticity to me. I plan to take a statistical approach later. For now, I am trying to explore the ChainLadder package and the different implementations in R. – Jenny Jan 24 '15 at 02:42
  • I see no clear indication of heteroskedasticity there (though I wouldn't have used this model). There's maybe a weak suggestion of it in the calendar period plot, but you have to keep in mind that the number of points is increasing from left to right, so the range would be expected to increase. – Glen_b Jan 24 '15 at 05:57

2 Answers

I agree, at first glance it looks like a change in calendar year trend at 1992. (Indeed, you can't see much trend in the other two directions, because the chain ladder has parameters that broadly pick up the trend changes in those directions, so that's the one plot you tend to look for changing trends in).

Consider just the left half of the calendar period plot (up to 1992):

[Image: left half of the calendar period residual plot, up to 1992]

Ignore the red line if you can, and just consider where you think the data seem to be heading. Keep in mind that you want the overall "gist", so try not to let one point overly influence your impression. I find it helps to cover points and see if my impression changes in their absence; if it does, that point is overly influencing my impression.

If the trend was similar after that, where would you think it should be for 1993-96?

Now look at the rest (noting the multiple almost-coincident points at the bottom right):

[Image: right half of the calendar period residual plot, after 1992]

and (again, ignoring the red line) try to "backcast" the first half of the diagram.

If you're like me, you'd probably tend to draw those trends in somewhat different places, so it would make us concerned because the model simply isn't able to capture changes in that direction (though it's rather profligate with parameters in the other directions).
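The "fit each half by eye and compare" exercise above can be roughed out numerically. The following is only a sketch in plain Python (not R, and not using ChainLadder): the residual values are made up for illustration and are not the data from the question, the break at 1992 is taken from the discussion above, and `fit_line` and `split_trend_gap` are hypothetical helper names. It fits a separate least-squares line to each half and reports how far apart the two lines are at the first period after the break.

```python
# Illustrative only: the residuals below are synthetic, NOT the data from the question.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(line, x):
    """Evaluate a fitted (intercept, slope) line at x."""
    intercept, slope = line
    return intercept + slope * x


def split_trend_gap(periods, residuals, break_period):
    """Fit separate lines up to and after break_period.

    Returns both lines and the gap between their predictions at the
    first period after the break (late line minus early line).
    """
    early = [(p, r) for p, r in zip(periods, residuals) if p <= break_period]
    late = [(p, r) for p, r in zip(periods, residuals) if p > break_period]
    early_line = fit_line(*zip(*early))
    late_line = fit_line(*zip(*late))
    first_late = min(p for p, _ in late)
    gap = predict(late_line, first_late) - predict(early_line, first_late)
    return early_line, late_line, gap


# Made-up pattern: upward drift to 1992, then a level drop and a flatter drift.
periods = list(range(1987, 1997))
residuals = [-0.8, -0.5, -0.1, 0.2, 0.6, 0.9, -0.7, -0.6, -0.4, -0.3]

early_line, late_line, gap = split_trend_gap(periods, residuals, 1992)
print("early slope:", round(early_line[1], 3))
print("late slope:", round(late_line[1], 3))
print("gap at first period after break:", round(gap, 3))
```

A large gap or a clear difference in slopes is the numerical counterpart of drawing the two trends "in somewhat different places". With so few points per half, it deserves the same caution as the visual impression.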

I think there are a variety of plausible ways to see the plot, but it does seem as if there's some kind of change there in the middle. Of course, there's not a lot of data, so we have to be fairly cautious about over-interpreting our impression there. We have a tendency to see more pattern than is really there.

[If you could leave out old payments with the chain ladder, it would be tempting to see if it made a difference to only look at the more recent years, but because it works with predicting cumulated data from previous cumulatives, you can't; the old payments are unavoidably there. You might look at patterns in one-year-ahead prediction errors, but they wouldn't be all that different from what you see here.]

It looks perhaps more like a level drop immediately after 1992, but the loess line tends to smooth over those kinds of effects. (I should clarify -- the indication is not very strong ... but the consequences may be quite substantial.)

In short: there's at least some indication of a change in trend in the plot there, but it's not especially strong, and it's not completely clear that it's quite like the red line suggests.

[The Chain Ladder model has no mechanism for dealing with such a trend change, though they're common -- due to changes in economic inflation, in social trends or judicial trends, in effects peculiar to the specific line of business (such as superimposed inflation), or sometimes to effects at the company level.]
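To make the point about missing calendar-period parameters concrete, here is a minimal sketch of the volume-weighted chain ladder calculation in plain Python. It is not the ChainLadder package's implementation; the triangle is a hypothetical toy example and `development_factors` and `fitted_next` are names invented for this illustration. The only things estimated are one factor per development column, applied to each origin row's own previous cumulative. Nothing is indexed by the diagonal (origin plus development), which is where a calendar-period effect would live.

```python
# Hypothetical cumulative triangle: rows are origin periods, columns are
# development periods. None marks cells that have not been observed yet.

def development_factors(triangle):
    """Volume-weighted age-to-age factors, one per development step."""
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        numerator = 0.0
        denominator = 0.0
        for row in triangle:
            # Use only rows where both adjacent cumulatives are observed.
            if row[j] is not None and row[j + 1] is not None:
                numerator += row[j + 1]
                denominator += row[j]
        factors.append(numerator / denominator)
    return factors


def fitted_next(triangle, factors):
    """One-step-ahead fitted cumulatives: previous cumulative times factor."""
    fitted = []
    for row in triangle:
        fitted_row = [None]  # the first column has no predecessor to predict from
        for j in range(len(row) - 1):
            if row[j] is not None and row[j + 1] is not None:
                fitted_row.append(row[j] * factors[j])
            else:
                fitted_row.append(None)
        fitted.append(fitted_row)
    return fitted


triangle = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 168.0, 186.0, None],
    [120.0, 175.0, None, None],
    [130.0, None, None, None],
]

factors = development_factors(triangle)
fitted = fitted_next(triangle, factors)
print("factors:", [round(f, 4) for f in factors])
```

Because each fitted value depends only on the row's previous cumulative and the column's factor, any effect shared by all cells on one diagonal ends up in the residuals rather than in the fit.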

However, the fit to the most recent years of data isn't bad, so it might not be too problematic to use that forecast for the next year (but you'd have to worry about further trend changes - it's not unusual to see a short term trend change followed by a return to approximately the slope of the original trend, for example).

--

The data looks sort of familiar for some reason. Is that one of the built in data sets in the package, or from something else?

Glen_b
  • That would be where I've seen it -- I think I had a research student who used it for an example a few years ago. When you say 'project' ... what are we talking about? I don't think it would be suitable for me to review something for coursework. – Glen_b Jan 24 '15 at 01:46
  • If you have specific questions, of course they can be posted (though if it's something for coursework you should normally indicate that in the question - in that case broad guidance of the sort you asked for here is reasonable). – Glen_b Jan 24 '15 at 01:53

I think it is hard to judge a trend just from the picture. I suggest you run some white noise tests on your residuals to check whether they really depart from randomness. For example, you could try our R package hwwntest, which includes several types of white noise tests.
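As a rough idea of what a white noise test does, here is a sketch of the classical Ljung-Box portmanteau statistic in plain Python. This is a swapped-in textbook test, not one of the wavelet-based tests in hwwntest, and the residual series is made up for illustration. It also assumes the residuals have been put into a single time order, which is itself a judgment call for triangle-shaped data. The constant 18.307 is the 95% quantile of a chi-square distribution with 10 degrees of freedom.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return ck / c0


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# 95% quantile of chi-square with 10 degrees of freedom.
CHI2_95_DF10 = 18.307

# Synthetic residuals (NOT the question's data): a level drop halfway through.
residuals = [1.0] * 50 + [-1.0] * 50

q = ljung_box_q(residuals, 10)
print("Q =", round(q, 2))
print("reject white noise at 5%:", q > CHI2_95_DF10)
```

A rejection only says that the series is not white noise. It does not say whether the cause is a trend, a level shift, autocorrelation, or something else, so it complements the plots rather than replacing them. When the series consists of residuals from a fitted model, the reference degrees of freedom may also need adjusting.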

gung - Reinstate Monica
Delyan Savchev
  • After briefly reviewing hwwntest, am I right that you are assuming that the series being analyzed is free of Pulses, Level/Step shifts, Seasonal Pulses, Local Time Trends AND has constant variance? If so, then perhaps you could add a disclaimer so that the user would be forewarned. – IrishStat Jan 23 '15 at 22:09
  • Yes, the white noise hypothesis is such - an iid process with constant and finite variance. We are talking about residuals (which also may well be normalised/studentised) in this question. Thus, if the model is correct, the residuals should be just random noise, otherwise the model is missing something, right? – Delyan Savchev Jan 24 '15 at 23:21
  • Correct. If the residuals are not white noise, the next step is to find out what exactly is causing the test to reject white noise, and that is where intelligent software can be useful to sort out whether there are deterministic components and/or ARIMA structure and/or non-constancy in the parameters and/or non-constancy of the variance present in the current residuals. – IrishStat Jan 25 '15 at 00:01
  • I agree. In the case of the question asked here, I understand that the researcher is looking for a certain departure from white noise in the residuals, be it a trend or something else. If you look closely at hwwntest or the paper therein, you can see that for one test - genwwn - it provides a theoretical statistical power function, which can be evaluated with respect to a specific alternative hypothesis from an ARMA class model. Of course, all of that depends on the research problem and approach. – Delyan Savchev Jan 25 '15 at 12:19
  • Does it also suggest the presence of specific deterministic structure, such as in the data under review: 2 level shifts and 5 weekly dummies reflecting significant activity for 5 of the 7 days? – IrishStat Jan 25 '15 at 12:29
  • Well, it is an R white noise testing package and not a full time series diagnostic software suite :) What you say (trends, shifts etc) is supposed to be eliminated when you are at the residuals analysis stage... – Delyan Savchev Jan 25 '15 at 12:36
  • OK I was just curious as to what the program suggested one should do. It then is up to the user to use alternative schemes to detect the specific Gaussian violations and develop appropriate remedial action. – IrishStat Jan 25 '15 at 13:03
  • It also has good performance against heavy-tailed noise such as Cauchy or Student's t with 2 or 3 degrees of freedom, if that classifies as a Gaussian violation. – Delyan Savchev Jan 25 '15 at 13:10