How can I rediscretize my data?

Question

Related to my previous question, I have a dataset of 2D points, each with an associated label (the label can take 6 different values). As suggested in the answers to my other question, this can be modeled as a marked point process (or as 6 different point processes), which allows standard tools to be applied to this dataset.

I would like to take the approach that I suggested in my first question and apply PCA to this dataset, to see whether the different types of points are correlated (i.e. do some types always occur together?). Here's how I want to do it:

1. Split my 2D space into a grid.
2. For each cell of that grid, count the number of points of each type. For one cell, this gives me a point in $\mathbb{R}^6$: $x_i = (N_1(A_i), N_2(A_i), N_3(A_i), N_4(A_i), N_5(A_i), N_6(A_i))$, where $N_k(A_i)$ is the number of points of the $k^{\text{th}}$ point process (corresponding to points of type $k$) in the cell $A_i$.
3. Combine all the $x_i$ into a matrix $X \in \mathbb{R}^{6 \times M}$, where $M$ is the number of cells, and apply PCA to this matrix.
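A minimal sketch of these three steps, to make the construction of $X$ concrete. The data here are synthetic (uniform points on the unit square with labels 0 to 5), and the function names `quadrat_counts` and `pca`, the 10 x 10 grid, and the choice to run PCA on the raw, unstandardized counts are all assumptions made for illustration, not part of the original setup.

```python
import numpy as np

def quadrat_counts(points, labels, n_types, bins, extent):
    """Count points of each type in every cell of a regular grid.

    Returns an array of shape (n_types, M) with M = bins * bins,
    i.e. one column x_i per cell A_i.
    """
    (xmin, xmax), (ymin, ymax) = extent
    rows = []
    for k in range(n_types):
        sel = points[labels == k]
        counts, _, _ = np.histogram2d(
            sel[:, 0], sel[:, 1],
            bins=bins, range=[[xmin, xmax], [ymin, ymax]],
        )
        rows.append(counts.ravel())
    return np.vstack(rows)

def pca(X):
    """PCA of a (features x samples) matrix via the covariance eigen-decomposition."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center each type's counts
    cov = Xc @ Xc.T / (X.shape[1] - 1)           # 6 x 6 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending by variance
    return eigvals[order], eigvecs[:, order]

# Synthetic stand-in for the real dataset (assumption).
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(2000, 2))
labels = rng.integers(0, 6, size=2000)

X = quadrat_counts(points, labels, n_types=6, bins=10,
                   extent=((0.0, 1.0), (0.0, 1.0)))
eigvals, eigvecs = pca(X)
```

Each column of `eigvecs` is a principal direction in the 6-dimensional space of type counts, and `eigvals` holds the corresponding variances.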

My question is the following: how do I build the grid? In other words, how do I rediscretize this dataset?

The difficulty is that the intensities of the processes are not equal: some types appear more often than others. If I just use a regular grid (all cells have the same area), the resulting points will have one or two components that dominate the others.

I was thinking of building my grid such that each cell has at most $N$ points, thus bounding the norm of the data points, but I don't think this will solve my "balance" problem.
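One way such a grid could be built is a quadtree-style recursive split, sketched below under stated assumptions: the helper names `adaptive_cells` and `cell_counts`, the unit-square bounds, the synthetic data, and the limit of 20 points per cell are all hypothetical. It only bounds the total count per cell; it does nothing to rebalance the types against each other.

```python
import numpy as np

def in_cell(points, cell):
    """Boolean mask of points inside the half-open cell [xmin, xmax) x [ymin, ymax)."""
    xmin, xmax, ymin, ymax = cell
    return ((points[:, 0] >= xmin) & (points[:, 0] < xmax) &
            (points[:, 1] >= ymin) & (points[:, 1] < ymax))

def adaptive_cells(points, cell, max_points, min_size=1e-6):
    """Split a cell into four quadrants until each leaf holds at most max_points.

    min_size stops the recursion if many points coincide (assumption).
    Returns a list of leaf cells (xmin, xmax, ymin, ymax).
    """
    xmin, xmax, ymin, ymax = cell
    inside = points[in_cell(points, cell)]
    if len(inside) <= max_points or (xmax - xmin) <= min_size:
        return [cell]
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    leaves = []
    for sub in [(xmin, xm, ymin, ym), (xm, xmax, ymin, ym),
                (xmin, xm, ym, ymax), (xm, xmax, ym, ymax)]:
        leaves.extend(adaptive_cells(inside, sub, max_points, min_size))
    return leaves

def cell_counts(points, labels, cells, n_types):
    """Build the (n_types x M) count matrix for an arbitrary list of cells."""
    X = np.zeros((n_types, len(cells)))
    for j, cell in enumerate(cells):
        X[:, j] = np.bincount(labels[in_cell(points, cell)], minlength=n_types)
    return X

# Synthetic stand-in for the real dataset (assumption).
rng = np.random.default_rng(1)
points = rng.uniform(0.0, 1.0, size=(500, 2))
labels = rng.integers(0, 6, size=500)

cells = adaptive_cells(points, (0.0, 1.0, 0.0, 1.0), max_points=20)
X = cell_counts(points, labels, cells, n_types=6)
```

Note that the resulting cells have unequal areas, so the raw counts in different columns of `X` are no longer directly comparable without accounting for cell area.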

Any suggestions or pointers to the literature are appreciated.

You are proposing a *quadrat count analysis.* There is much about this in the ecology literature. For an introduction see http://oai.cwi.nl/oai/asset/10611/10611A.pdf . (Stick to peer-reviewed papers, preferably those that don't shy away from explicitly invoking probability models, and avoid the GIS and geographical literature, which tends toward the overly simple and uncritical.) — whuber, Mar 21 '11 at 14:14
