
In a nutshell, here's what I have:

  • Annual population estimates for the state
  • Periodic (every 5 years) age, population, and basic census data per zone

Here's what I want to do:

  • Create a simple model that generates the data for the missing years between census periods for each zone, such that the zone totals add up to the yearly state population estimate.

All in all, I'm looking for an uncomplicated statistical model that can generate values with acceptable (it doesn't have to be very high) precision.

dassouki

1 Answer


About the simplest thing you can do is interpolate normalized counts over time, and (almost) the simplest form of interpolation is linear.

Specifically, suppose $y_i$ is the state population at time $i$ and $x_i$ is some other count (by age, tract, or whatever). Define $\xi_i = x_i/y_i$. Suppose $i$ is a year for which you do not have the periodic data. Let $i_{-}$ and $i_{+}$ be the nearest years preceding and following $i$, respectively, for which the periodic count $x$ is available. The linearly interpolated estimate of $\xi_i$ is

$$\hat{\xi}_i = \frac{\xi_{i_{-}} (i_{+} - i) + \xi_{i_{+}} (i - i_{-})} {i_{+} - i_{-}} \text{.}$$

The estimate of $x_i$ is

$$\hat{x}_i = \hat{\xi}_i y_i.$$
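As a worked illustration, here is the calculation with made-up numbers (none of these figures come from the question). Assume census years $i_{-} = 2000$ and $i_{+} = 2005$, state populations $y_{2000} = 1000$, $y_{2002} = 1040$, and $y_{2005} = 1100$, and zone counts $x_{2000} = 200$ and $x_{2005} = 275$, so that $\xi_{2000} = 0.20$ and $\xi_{2005} = 0.25$. For the missing year $i = 2002$,

$$\hat{\xi}_{2002} = \frac{0.20\,(2005 - 2002) + 0.25\,(2002 - 2000)}{2005 - 2000} = \frac{0.60 + 0.50}{5} = 0.22$$

$$\hat{x}_{2002} = 0.22 \times 1040 = 228.8.$$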

The sums will come out correctly because this estimator is linear with weights summing to unity. For example, suppose you are tracking two variables $x$ and $z$ which count complementary parts of the population (such as males and females), so that $x_i+z_i = y_i$ whenever you have all three counts. Defining $\xi_i = x_i/y_i$ as before and, similarly, $\zeta_i = z_i/y_i$, the two fractions sum to unity: $\xi_i + \zeta_i = y_i/y_i = 1$ for all $i$. Therefore the interpolated fractions also sum to unity:

$$\hat{\xi}_i + \hat{\zeta}_i = \frac{\xi_{i_{-}} (i_{+} - i) + \xi_{i_{+}} (i - i_{-})} {i_{+} - i_{-}} + \frac{\zeta_{i_{-}} (i_{+} - i) + \zeta_{i_{+}} (i - i_{-})} {i_{+} - i_{-}}$$

$$= \frac{(\xi_{i_{-}} + \zeta_{i_{-}}) (i_{+} - i) + (\xi_{i_{+}} + \zeta_{i_{+}}) (i - i_{-})} {i_{+} - i_{-}}$$

$$= \frac{(i_{+} - i) + (i - i_{-})} {i_{+} - i_{-}}$$

$$= 1.$$

Whence $\hat{x}_i + \hat{z}_i = y_i(\hat{\xi}_i + \hat{\zeta}_i) = y_i$ as desired. This generalizes to population partitions of any size, such as age distributions.
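The argument above can be checked numerically with a minimal Python sketch. It assumes the zones partition the state, so the state population in each census year is taken to be the sum of the zone counts; the function name `interpolate_zone_counts`, the zone labels, and all figures are hypothetical and chosen only for illustration.

```python
def interpolate_zone_counts(year, year_lo, year_hi, counts_lo, counts_hi, state_total):
    """Estimate zone counts for `year` by linearly interpolating each zone's
    share of the state population between two census years.

    counts_lo, counts_hi: dicts mapping zone -> count in year_lo and year_hi.
    state_total: the annual state population estimate for `year`.
    Assumption: the zones partition the state, so the census-year state
    population equals the sum of the zone counts.
    """
    if not year_lo < year < year_hi:
        raise ValueError("year must lie strictly between the two census years")
    if counts_lo.keys() != counts_hi.keys():
        raise ValueError("both census years must cover the same zones")

    total_lo = sum(counts_lo.values())
    total_hi = sum(counts_hi.values())

    # Interpolation weights; they sum to one by construction.
    w_hi = (year - year_lo) / (year_hi - year_lo)
    w_lo = 1.0 - w_hi

    estimates = {}
    for zone in counts_lo:
        share_lo = counts_lo[zone] / total_lo   # xi at the earlier census
        share_hi = counts_hi[zone] / total_hi   # xi at the later census
        share = w_lo * share_lo + w_hi * share_hi
        estimates[zone] = share * state_total   # x-hat = xi-hat * y
    return estimates


# Hypothetical data: three zones, censuses in 2000 and 2005,
# and a state estimate of 1040 for the missing year 2002.
census_2000 = {"A": 200, "B": 300, "C": 500}
census_2005 = {"A": 275, "B": 330, "C": 495}
estimates_2002 = interpolate_zone_counts(2002, 2000, 2005,
                                         census_2000, census_2005, 1040)

for zone, value in estimates_2002.items():
    print(zone, round(value, 1))
print("sum:", round(sum(estimates_2002.values()), 1))
```

Up to floating-point rounding, the zone estimates add up to the supplied state total. The estimates are generally not integers, so if whole-person counts are needed, some rounding scheme that preserves the total would have to be added on top.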

whuber
  • Thanks @whuber. With your GIS experience, what would you recommend doing? – dassouki May 29 '11 at 15:57
  • @das It depends on what you need the data for and how accurate you need them to be. If you have related questions that will primarily be of interest to GIS people, why not post them on GIS.SE? You can reference this thread to provide context, make progress, and avoid repeating yourself. – whuber May 29 '11 at 17:25
  • I'm not sure GIS is the place, as really I'm trying to disaggregate census data. My zones are already identified, although I might ask there what the proper way is to divide census zones so they're relatively the same size – dassouki May 30 '11 at 01:49
  • @das What is the motivation for making them the same size? And do you mean same *area* or same *population*? – whuber May 30 '11 at 17:11
  • @whuber The same area. The end goal is to disaggregate the data as much as possible, sort of like going from objects to atoms – dassouki May 30 '11 at 17:24
  • @dass It's unclear then why area is of interest. The "atoms" in population data are individual people, households, and housing units. They are rarely distributed equally by area. – whuber May 30 '11 at 17:26
  • @whuber, true, but I don't have any of that data, and I'm trying to figure out a way to disaggregate it. I have census data in which the zones hold roughly 2,500 to 4,000 people each, and I would like to disaggregate it down to 25 to 40 people per zone – dassouki May 30 '11 at 17:29
  • @dass One approach is to model the tract-only data in terms of the data also available at the block level. This sounds like [a question I asked here last year](http://stats.stackexchange.com/questions/4445/model-for-population-density-estimation). – whuber May 30 '11 at 17:33
  • @whuber, that seems interesting, especially the use of raster objects. Did you get a paper out of it? Any recommendations or pitfalls to avoid? – dassouki May 30 '11 at 17:41