A database of (population, area, shape) can be used to map population density by assigning a constant value of population/area to each shape (which is a polygon such as a Census block, tract, county, state, whatever). Populations are usually not uniformly distributed within their polygons, however. Dasymetric mapping is the process of refining these density estimates by means of auxiliary data. It is an important problem in the social sciences, as this recent review indicates.
Suppose, then, that we have available an auxiliary map of land cover (or any other discrete factor). In the simplest case we can use obviously uninhabitable areas like waterbodies to delineate where the population isn't and, accordingly, assign all the population to the remaining areas. More generally, each Census unit $j$ is carved into $k$ portions having surface areas $x_{ji}$, $i = 1, 2, \ldots, k$. Our dataset is thereby augmented to a list of tuples
$$(y_{j}, x_{j1}, x_{j2}, \ldots, x_{jk})$$
where $y_{j}$ is the population in unit $j$ (assumed measured without error) and--although this is not strictly the case--we may assume every $x_{ji}$ is also exactly measured. In these terms, the objective is to partition each $y_{j}$ into a sum
$$ y_j = z_{j1} + z_{j2} + \cdots + z_{jk} $$
where each $z_{ji} \ge 0$ and $z_{ji}$ estimates the population within unit $j$ residing in land cover class $i$. The estimates need to be unbiased. This partition refines the population density map by assigning the density $z_{ji}/x_{ji}$ to the intersection of the $j^{\text{th}}$ Census polygon and the $i^{\text{th}}$ land cover class.
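For concreteness, here is a minimal sketch (Python, used purely for illustration) of the baseline implied by the setup above: the simple binary case masks out classes assumed uninhabited and spreads each $y_j$ uniformly, by area, over what remains, so exactness and non-negativity hold by construction. The function name and the `uninhabitable` argument are mine, not part of any established package.

```python
import numpy as np

def binary_dasymetric(y, X, uninhabitable):
    """
    y : (n,) populations y_j
    X : (n, k) areas x_ji of unit j intersected with land cover class i
    uninhabitable : class indices assumed to hold no population
    Returns Z, an (n, k) array with Z[j].sum() == y[j] and Z >= 0.
    """
    weights = X.astype(float)
    weights[:, list(uninhabitable)] = 0.0            # no one lives in these classes
    totals = weights.sum(axis=1, keepdims=True)
    # Units whose area is entirely "uninhabitable" fall back to plain areal
    # weighting so the partition of y_j remains exact for them too.
    fallback = totals[:, 0] == 0
    weights[fallback] = X[fallback]
    totals[fallback] = X[fallback].sum(axis=1, keepdims=True)
    return y[:, None] * weights / totals             # z_ji = y_j * w_ji / sum_i w_ji

# Example: 2 units, 3 classes (say water, forest, urban), water assumed uninhabited.
y = np.array([40.0, 10.0])
X = np.array([[2.0, 3.0, 5.0],
              [1.0, 0.0, 1.0]])
Z = binary_dasymetric(y, X, uninhabitable=[0])
density = np.divide(Z, X, out=np.zeros_like(Z), where=X > 0)   # z_ji / x_ji
```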
This problem differs from standard regression settings in salient ways:
- The partitioning of each $y_{j}$ must be exact.
- The components of every partition must be non-negative.
- There is (by assumption) no error in any of the data: all population counts $y_{j}$ and all areas $x_{ji}$ are correct.
There are many approaches to a solution, such as the "intelligent dasymetric mapping" method, but all those I have read about have ad hoc elements and an obvious potential for bias. I am seeking answers that suggest creative, computationally tractable statistical methods. The immediate application concerns a collection of c. $10^{5}$ - $10^{6}$ Census units averaging 40 people apiece (although a sizable fraction have 0 people) and about a dozen land cover classes.
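For reference, the following sketch shows the general flavor of such proportional-allocation schemes: estimate a density for each land cover class from units dominated by that class, then split each $y_j$ in proportion to (estimated density) times (area). The 70% dominance threshold and the default density are illustrative choices of mine, not a faithful rendering of any published method; they are exactly the kind of ad hoc elements I would like to avoid.

```python
import numpy as np

def proportional_allocation(y, X, dominance=0.7):
    """Split each y_j across classes in proportion to (estimated class density) * x_ji."""
    k = X.shape[1]
    shares = X / X.sum(axis=1, keepdims=True)        # fraction of unit j covered by class i
    dens = np.ones(k)                                # per-class density estimates (persons/area)
    for i in range(k):
        sample = shares[:, i] >= dominance           # units dominated by class i
        if sample.any() and X[sample, i].sum() > 0:
            dens[i] = y[sample].sum() / X[sample, i].sum()
        # otherwise the class keeps the arbitrary default density of 1 (an ad hoc fallback)
    w = dens[None, :] * X                            # unnormalised allocation weights
    return y[:, None] * w / w.sum(axis=1, keepdims=True)   # exact, non-negative partition
```

The row-wise normalisation guarantees the exactness and non-negativity constraints above, but nothing in the construction controls the bias of the resulting $z_{ji}$, which is the heart of my question.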