Forecasting unemployment rate

Question

I have a data set of 100 geographic regions for which the unemployment rate has been observed over the last 9 years. I want to simulate/forecast next year's unemployment rate for all 100 regions from this data. What would be a suitable approach? I was thinking of using an autoregressive mixed-effects model, but I fear that I do not have enough data...

Regards

What are the geographic units and do you expect them to have any geographic dependency (i.e. spatial autocorrelation?) Although we would always like to have more data you have a reasonable amount of observations to project estimates. — Andy W, Oct 18 '10 at 15:28
The 100 regions belong to the same country, so I expect that they are correlated in time and in space (in fact I began with a correlation clustering exercise that shows that they are correlated)... — teucer, Oct 18 '10 at 15:35
What is the frequency (i.e. is it weekly,monthly, quarterly, annual data) ? — user603, Oct 18 '10 at 15:44

user603 · Accepted Answer · 2010-10-19T15:03:26.300

The Arellano-Bond estimator has been designed for precisely this type of problem. You will find a short non-technical paper with examples here. In a nutshell, it combines the information embedded in the large number of cross-sections to make up for the small number of points in each series. This estimator is widely used and implemented: it is available in the default gretl package, but also in Stata via the XTABOND2 package and in R via the plm package (you should easily find a large number of papers using it).
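To make the "small T, large N" idea concrete, here is a minimal sketch in plain Python. It is not the Arellano-Bond estimator itself: it uses the simpler Anderson-Hsiao instrumental-variable estimator, which relies on the same trick (first-difference away the region effect, then instrument the lagged difference with an earlier level) but with a single instrument. The model $y_{it} = \rho y_{i,t-1} + \alpha_i + \varepsilon_{it}$, the simulated data, and all function names are assumptions made for illustration; for real work the packaged implementations (for example `pgmm` in plm) are the place to look.

```python
import random

def simulate_panel(n_regions=100, n_years=9, rho=0.5, seed=0):
    """Simulate y[i][t] = rho * y[i][t-1] + alpha_i + noise (hypothetical data)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_regions):
        alpha = rng.gauss(3.0, 1.0)        # region-specific fixed effect
        y = [alpha / (1.0 - rho)]          # start at the region's long-run level
        for _ in range(n_years - 1):
            y.append(rho * y[-1] + alpha + rng.gauss(0.0, 0.3))
        panel.append(y)
    return panel

def anderson_hsiao_rho(panel):
    """IV estimate of rho from first differences, using y[t-2] as the instrument."""
    num = 0.0
    den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            d_now = y[t] - y[t - 1]        # differencing removes alpha_i
            d_lag = y[t - 1] - y[t - 2]
            num += y[t - 2] * d_now
            den += y[t - 2] * d_lag
    return num / den

def forecast_next(panel, rho):
    """One-step forecast from the differenced model: y_T + rho * (y_T - y_{T-1})."""
    return [y[-1] + rho * (y[-1] - y[-2]) for y in panel]

if __name__ == "__main__":
    data = simulate_panel()
    rho_hat = anderson_hsiao_rho(data)
    print("estimated rho:", round(rho_hat, 3))
    print("first three forecasts:", [round(f, 2) for f in forecast_next(data, rho_hat)[:3]])
```

With only one instrument and 9 years per region, this estimate can be quite noisy. Arellano-Bond uses all available lagged levels as instruments for that reason, so treat the sketch as an illustration of the mechanics rather than something to forecast with.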

EDIT:

Given that spatial correlation may indeed be informative (see Andy's post), I would advise adding a variable:

$s_{it} = u_{it} - \bar{u}_{-it}$

where $u_{it}$ is (possibly the $\log()$ of) the unemployment rate of region $i$ at time $t$ and $\bar{u}_{-it}$ is its average value among the $k$ geographical neighbors of region $i$ (excluding region $i$ itself). I would advise trying different values of $k$ until small changes in $k$ no longer affect the estimation results or conclusions. Then, for efficient and consistent estimation of $\beta_s$ (the coefficient associated with the variable $s$), I would use OLS for the main effect and allow for a random component in the error terms to account for inter-regional heterogeneity in $\beta_s$, thereby leveraging the fact that the R package plm allows GMM (i.e. Arellano-Bond) to be combined with random-effect coefficients.
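As a rough illustration of how $s_{it}$ could be built, the sketch below picks the $k$ nearest neighbors of each region from centroid coordinates and subtracts the neighbor average from each region's own rate. Using Euclidean distance between centroids is an assumption; any pairwise distance (geographic, economic, cluster-based) could be swapped in. The function names and the toy numbers are hypothetical.

```python
import math

def k_nearest_neighbors(coords, k):
    """For each region, indices of its k nearest other regions (Euclidean distance)."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        dists = [(math.hypot(xi - xj, yi - yj), j)
                 for j, (xj, yj) in enumerate(coords) if j != i]
        dists.sort()                       # ties are broken by region index
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def spatial_difference(u, neighbors):
    """s[i][t] = u[i][t] - mean of u[j][t] over the neighbors j of region i."""
    n_years = len(u[0])
    s = []
    for i, nbrs in enumerate(neighbors):
        row = []
        for t in range(n_years):
            neighbor_mean = sum(u[j][t] for j in nbrs) / len(nbrs)
            row.append(u[i][t] - neighbor_mean)
        s.append(row)
    return s

if __name__ == "__main__":
    # Four hypothetical regions on a line, two years of unemployment rates each.
    coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
    u = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0], [4.0, 4.0]]
    for k in (1, 2, 3):                    # check sensitivity to the choice of k
        print(k, spatial_difference(u, k_nearest_neighbors(coords, k)))
```

Looping over several values of $k$, as in the last lines, is one simple way to see whether the variable (and later the fitted model) is stable with respect to the neighborhood size.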

Concerning Andy W's remark: you could read these two documents for a non-technical summary. The full paper version is here. Note the reliance on both a large number of cross-sections and a large time dimension.

PS: Thanks @Srikant. I think i get it now :)

edited Oct 19 '10 at 15:03

answered Oct 18 '10 at 19:19

user603

Given that the unemployment rate must be strictly positive, I guess, for simulation purposes, I could model the log returns for example. Is this reasonable? – teucer Oct 18 '10 at 21:49
@kwak, could you provide a link to the Hamilton work you talked about in a comment to my answer. Also do you know of any work that used the A/B estimator and included spatial effects in the models? – Andy W Oct 19 '10 at 04:40
@Andy W: see edited answer above. – user603 Oct 19 '10 at 07:02
@Teucer: certainly. There are some facilities in the plm package to do these transformations (i.e. log, difference, dlog,...) in a single command-line: look for the **dynformula()** function. – user603 Oct 19 '10 at 07:04
@kwak Actually what you propose is very close to what I have in mind: I was using the package lme4 to model the log returns. However, if I understood correctly, the difference lies in the estimator: lme4 uses REML whereas with plm you can use GMM. Is that correct? If so, why is GMM superior to REML in this case? Btw do you think that one can use the package heavy, which fits models with heavy-tailed distributions (e.g. Student t)? – teucer Oct 19 '10 at 07:08
@Teucer. The difference between the two approaches is discussed in the vignette to the plm package. I would think that GMM estimation would be more efficient in your case, because one can assume that the innovations to the unemployment rate have a well-behaved Gaussian distribution (i.e. as an aggregation of a large number of small shocks). This assumption also explains why I don't understand the need to use the **heavy** package. – user603 Oct 19 '10 at 07:20
@kwak On using heavy: I believe that the estimates are more robust with heavy tailed innovations (my measurements are imperfect), but I might be wrong. Another reason was that I did a correlation clustering and looked at the distribution of log returns per year for each cluster, a student t distribution fits not too bad. But maybe it is irrelevant here. I have some questions: now assume I have more points, let's say about 25 points per region, would you still use GMM or REML for estimation? Where is the threshold? For a small sample is the asymptotic efficiency relevant? – teucer Oct 19 '10 at 09:12
@Teucer:> *Another reason was that I did a correlation clustering and looked at the distribution of log returns per year for each cluster, a student t distribution fits not too bad.* This does not in itself justify using a $t$ distribution: mixtures of Gaussian distributions with varying variances, for instance, will also produce a fat-tailed distribution. – user603 Oct 19 '10 at 09:32
@Teucer:> *For a small sample is the asymptotic efficiency relevant?* I think there is a slight misunderstanding here. If your assumptions on the distribution of the residuals are correct, then asymptotic efficiency is a measure of the mean accuracy of your estimates (the 'asymptotic' here refers to the relative average precision over a large number of estimation instances). – user603 Oct 19 '10 at 09:35
@Teucer:> *now assume I have more points, let's say about 25 points per region, would you still use GMM or REML for estimation? Where is the threshold?* At some point, the balance indeed tilts. For larger sample sizes the gains in efficiency do not outweigh the costs in (statistical **AND** computational) complexity of the GMM approach over the REML one. The exact point also depends on the extent to which your data conforms to the working assumptions underlying each model (REML does not, imho, have fewer requirements, just different ones).... – user603 Oct 19 '10 at 09:39
@Teucer:> It's certainly a good sign if the two approaches do not lead to wildly different results (in the hypothetical that this would not be the case, one should explain why) – user603 Oct 19 '10 at 09:40
@kwak, the spatial effect you suggest is what is referred to as local Geary's C, Global formula, http://en.wikipedia.org/wiki/Geary's_C, (or here is a link for the local version http://www.passagesoftware.net/webhelp/Introduction.htm#Local_Geary_s_c.htm ) you could also consider local Moran's I, http://en.wikipedia.org/wiki/Indicators_of_spatial_association – Andy W Oct 19 '10 at 11:41
Also note that how you define k is limited only by your imagination. There is currently no consensus on what is proper or improper, although many people suggest you try to optimize it like you suggested. – Andy W Oct 19 '10 at 13:05
@kwak @Andy thx for all the explanations. Just another question: now let's assume that I can further aggregate my regions into some broader regions (synthetic, using correlation clustering, or economic). Would it make sense to compute the average $\bar{u}_{-it}$ on these regions? – teucer Oct 19 '10 at 13:13
@teucer, you're asking if it would make sense to use that spatial neighborhood average as a predictor, right? That is close to what local Moran's I does; it makes no difference if you choose neighbors based on theoretical reasons or empirical ones. I can probably guess why kwak initially suggested using the spatial differences as opposed to the average, but I will let kwak clarify. – Andy W Oct 19 '10 at 13:22
@kwak, I also think the random component to the error term is a very good idea. There are many logical situations in which you wouldn't expect Beta(s) to be the same in different regions. I remember an example that New York City probably influences its neighbors, but the neighbors of NYC are less likely to influence it. – Andy W Oct 19 '10 at 14:40
@Teucer:> As Andy W said, any distance measure from which you can obtain a matrix of pairwise distances (geographic, economic, as well as those coming from a clustering algorithm and their variations) is certainly worth trying. @Andy W: I don't advise Teucer to directly use the local average as a predictor because of the risk of correlation between the component of the residuals accounting for heterogeneity in the $\beta_s$'s and the local average. Is this what you had in mind? – user603 Oct 19 '10 at 14:57
@Andy W:> that (NYC) is a very good intuitive example. Worth using as an illustration. – user603 Oct 19 '10 at 14:58
@kwak, that was actually not the reason I imagined (I was thinking more along the lines of stationary estimates of Beta(s)). Local Moran's I does scale the average (i.e. it Z-scores the average of the neighbors based on the global mean and variance), but I think your concern is still legitimate. Is that a big deal though if you're only interested in prediction? – Andy W Oct 19 '10 at 15:16
@Andy W:> *(ie it Z scores the average of the neighbors based on global mean and variance)* As you said, I don't think this alleviates my concern. The main issue for the use of RE is that we need $E(x_{it})\approx E(x_{jt})$ for any $i \neq j$. This most likely holds true if $x_{it}$ is $u_{it}-\bar{u}_{-it}$ (or as Teucer suggested, $\log(u_{it})-\log(\bar{u}_{-it})$) but it most certainly does not hold when $x_{it}$ is $\bar{u}_{-it}$ because it could be that $\bar{u}_{-jt}\neq\bar{u}_{-it}$. At any rate, this hypothesis has to be tested (Hausman test). – user603 Oct 19 '10 at 15:57
@Andy W:> *Is that a big deal though if you're only interested in prediction?* Can you post this as a separate question on the main board? It is a very important point, one which should interest many future readers. – user603 Oct 19 '10 at 16:11
@Teucer:> I'm sorry i do not understand your last message. But you can certainly edit your question. – user603 Oct 19 '10 at 17:56
@kwak I think it will help me a lot if you can edit your answer with the model formulas and the R code: I have read the vignette of plm several times, but I do not understand the model specification! Thanks in advance for the effort... – teucer Oct 19 '10 at 18:20
I would have used the following specification with lme4: fm – teucer Oct 19 '10 at 18:29
@Teucer:> i will do this, but not tonight ; – user603 Oct 19 '10 at 18:43
@kwak, When I get the time I will post the question on the role of regression model assumptions when one is solely interested in prediction. Feel free to post it yourself in the meantime though if you would like. – Andy W Oct 22 '10 at 14:44

Andy W · Answer 2 · 2010-10-19T15:23:18.643

Given the nature of your data I would suggest you investigate the use of exponential smoothing as well as fitting ARIMA-type models, especially due to the temporal constraints within your data. Although I wouldn't doubt spatial dependencies exist, I would be a bit skeptical about their usefulness in forecasting (in what I would imagine are fairly large areas), especially since any spatial dependency will likely already be captured (at least to a certain extent) in previous observations in the series.

Where the spatial dependencies may be helpful is if you have small area estimation problems, and you can use the spatial dependency in your data to help smooth out your estimations in those noisy geographic regions. This may not be a problem though since you have aggregated data for a full year.

You shouldn't take my word for it though, and should investigate the economics literature on the subject and assess various forecasting methods yourself. It's quite possible other variables are useful predictors of future unemployment in similar panel settings.

Edit:

First I'd like to clarify that I did not mean that the OP should simply prefer some type of exponential smoothing over other techniques. I think the OP should assess the performance of various forecasting methods using a hold-out sample of 1 or 2 time periods. I do not know the literature for forecasting unemployment, but I have not seen any method so obviously superior that others should be dismissed outright in any context.
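A hold-out comparison of this kind could look roughly like the sketch below. It scores three deliberately simple candidates on the last one or two years: a naive last-value forecast, simple exponential smoothing per region, and an AR(1) fitted by OLS on all regions pooled together. The choice of candidates, the smoothing weight of 0.5, and the function names are assumptions for illustration. The pooled AR(1) ignores region effects, so it is a stand-in for a proper panel estimator rather than a substitute.

```python
def naive_forecast(history):
    """Forecast next value as the last observed value."""
    return history[-1]

def ses_forecast(history, alpha=0.5):
    """Simple exponential smoothing; the final level is the one-step forecast."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level

def fit_pooled_ar1(panel_train):
    """OLS of y[t] on y[t-1] with an intercept, pooling all regions."""
    xs, ys = [], []
    for y in panel_train:
        xs.extend(y[:-1])
        ys.extend(y[1:])
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def holdout_mae(panel, n_holdout=2):
    """Rolling one-step-ahead mean absolute error over the last n_holdout years."""
    n_years = len(panel[0])
    errors = {"naive": [], "ses": [], "pooled_ar1": []}
    for cut in range(n_years - n_holdout, n_years):
        train = [y[:cut] for y in panel]   # only data before the forecast year
        intercept, slope = fit_pooled_ar1(train)
        for y, hist in zip(panel, train):
            actual = y[cut]
            errors["naive"].append(abs(actual - naive_forecast(hist)))
            errors["ses"].append(abs(actual - ses_forecast(hist)))
            errors["pooled_ar1"].append(abs(actual - (intercept + slope * hist[-1])))
    return {name: sum(e) / len(e) for name, e in errors.items()}
```

Calling `holdout_mae(panel)` on a list of per-region series (one list of yearly rates per region) returns one error per method. With 9 years, holding out 2 leaves at most 8 for fitting, so differences between methods should be read with caution.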

Kwak mentions a key point I did not consider initially (and Stephan's comment makes the same point very succinctly as well). The panel nature of the data allows one to estimate an auto-regressive component in the model much more easily than in a single time series. So I would follow his suggestion and consider the A/B estimator a good bet to provide the best forecast accuracy.

I'm still sticking with my initial suggestion though that I am skeptical of the usefulness of the spatial dependence, and one should assess a model's predictive accuracy with and without the spatial component. In terms of prediction it is not simply whether some sort of spatial auto-correlation exists, it is whether that spatial auto-correlation is useful in predicting future values independent of past observations in the series.

To simplify my reasoning, let's use the following notation:

$R_{t}$ corresponds to a geographic region $R$ at time $t$

$R_{t-1}$ corresponds to a geographic region $R$ at the previous time period

$W_{t-1}$ corresponds to however one wants to define the spatial relationship for the neighbors of $R_{t}$ at the previous time period

In this case $R$ is some attribute and $W$ is that same attribute in the neighbors of $R$ (i.e. an endogenous spatial lag.)

In pretty much all cases of lattice areal data, we have a relationship between $R$ and $W$. Two general explanations for this relationship are

1) The General Social Process Theory

This is when there are processes that affect $R$ and $W$ simultaneously that result in similar values with some sort of spatial organization. The support of the data does not distinguish between the forces that shape attributes in a broader scope than the areal units encompass. (I imagine there is a better name for this, so if someone could help me out.)

2) The Spatial Externalities Theory

This is when some attribute of $W$ directly affects an attribute of $R$. Srikant's example of job diffusion is an example of this.

In the context of forecasting, the general social process model may not be all that helpful. In this case, $R_{t-1}$ and $W_{t-1}$ are reflective of the same external shocks, and so $W_{t-1}$ is less likely to have exogenous power to predict $R_{t}$ independent of $R_{t-1}$.

In the spatial externalities case, IMO, I would expect $W_{t-1}$ to have greater potential to forecast $R_{t}$ independent of $R_{t-1}$ in the short run, because $R_{t-1}$ and $W_{t-1}$ can reflect different external shocks to the system. This is my opinion though, and you typically can't distinguish between the general social process model and the spatial externalities model through empirical means in a cross-sectional design (they are probably both occurring to a certain extent in many contexts). Hence I would attempt to validate its usefulness before simply incorporating it into the forecast. Better knowledge of the literature and social processes would definitely be helpful here to guide your model building. In criminology the externalities model makes sense only in a very limited set of circumstances (but I imagine it is more likely in economic data). Models of spatial hedonic housing prices often show very strong spatial effects, and in that context I would expect the spatial component to have a strong ability to forecast housing prices. (I like Luc Anselin's explanation of these two different processes better than mine in this paper, PDF here)
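One way to check whether $W_{t-1}$ adds anything beyond $R_{t-1}$ is to fit the same simple regression with and without it and compare errors on a held-out final year. The sketch below does this with a pooled least-squares fit in plain Python. The pooled linear specification, the input `w` (a precomputed spatial lag per region and year, however you choose to define it), and the function names are assumptions for illustration, not a recommended final model.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, targets):
    """Least-squares coefficients via the normal equations (rows include the 1)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)

def design(panel, w, years, use_spatial):
    """Features [1, R_{t-1}] (plus W_{t-1} if requested) and targets R_t."""
    rows, targets = [], []
    for i, y in enumerate(panel):
        for t in years:
            row = [1.0, y[t - 1]]
            if use_spatial:
                row.append(w[i][t - 1])
            rows.append(row)
            targets.append(y[t])
    return rows, targets

def last_year_mae(panel, w, use_spatial):
    """Fit on every year except the last, then report MAE on the last year."""
    last = len(panel[0]) - 1
    train_rows, train_y = design(panel, w, range(1, last), use_spatial)
    test_rows, test_y = design(panel, w, [last], use_spatial)
    beta = ols(train_rows, train_y)
    preds = [sum(b * x for b, x in zip(beta, row)) for row in test_rows]
    return sum(abs(p - a) for p, a in zip(preds, test_y)) / len(test_y)
```

Comparing `last_year_mae(panel, w, True)` with `last_year_mae(panel, w, False)` gives a first, rough indication of whether the spatial lag carries forecasting information that the region's own lag does not. A single hold-out year is a thin basis for that judgment, so repeating the comparison with different definitions of $W$ is advisable.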

Often how we define $W$ is a further problem in this setting. Most conceptions of $W$ are very simplistic and probably aren't entirely reflective of real geographic processes. Here kwak's suggestion of adding a random component to the $W$ effect for each $R$ makes a lot of sense. For example, we would expect New York City to influence its neighbors, but we wouldn't expect NYC's neighbors to have all that much influence on NYC. This still doesn't solve how to decide what a neighbor is or how best to represent the effects of neighbors. What kwak suggests is essentially a local version of Geary's C (spatial differences); local Moran's I (spatial averages) is a common approach as well.

I'm still a little surprised at the negative responses to my suggestion to use simpler smoothing methods (even if they are meant for univariate time series). Am I naive to think that exponential smoothing or some other type of moving-window technique could perform comparably enough to more complicated procedures to be worth assessing? I would be more worried if the series were such that we would expect seasonal components, but that is not the case here.

@Andy Perhaps, spatial dependencies exist because employers tend to co-locate in certain geographical areas which has spill-over effects to neighboring areas? For example, during the IT bust of 2000 I am sure several regions around the Silicon valley would have had high unemployment rates but perhaps not in Detroit (which is dominated by the auto industry). — , Oct 18 '10 at 18:58
@Srikant you are right. I was thinking about it some more and spatial externalities are much more likely to occur in economic data than in crime data I am used to working with. Although I would still be skeptical with a lag of a year, it deserves more attention than my answer would suggest. — Andy W, Oct 18 '10 at 19:02
The effects that Srikant has highlighted cannot be ignored to my mind, these need to be taken into account... — teucer, Oct 18 '10 at 19:10
@Srikant:> what you see is correct, and J. Hamilton has done some work on economic contagion between US states. But these sorts of models need more than 9 observations per cross-section to be estimated. For this type of situation, I recommend the A/B estimator (see my answer) since it has been designed *precisely* for these types of situations (small **T**, large **N**). — user603, Oct 18 '10 at 19:21
I wouldn't use a univariate time series method with 9 observations. — Rob Hyndman, Oct 18 '10 at 21:12
Yes, simple methods usually work better than complex ones (sometimes to a surprising extent). However, in this case univariate methods would throw away 99% of the data - whenever we forecast for one geography, we disregard all the other geographies. And I would definitely expect some kind of panel data model to be better than a univariate one. Not so much because employers co-locate, but because regulatory, tax, central bank and other external factors will be common and largely similar drivers of unemployment for *all* geographies. — Stephan Kolassa, Oct 19 '10 at 07:49
