Spatial clustering in R with XY data

Question

I have a matrix of data (215 rows, 618 columns). The data is xy positional data from a square surface. Most of the values are 0, and very few are 1. When I plot the data, I see that the 1's form 2 small clusters. I'd like to use a clustering technique to automatically colour the clusters and to find out how many 1's (cells) make up each cluster. Can I use k-means or DBSCAN for this? The examples I've seen answered seem to be on xy numeric data (if that makes sense), not on xy positional data with only 1's and 0's.

Any help would be appreciated. Paul.

score 1 · Accepted Answer · answered Jul 19 '16 at 08:27

The simple way to do this is to treat each position with a value of 1 as an observation at that point, then apply a clustering method such as k-means to those observations.

e.g.

A $4\times4$ grid,

$\begin{array}{c|cccc} x\backslash y & 1 & 2 & 3 & 4 \\ \hline 1 & 1 & 1 & 0 & 0\\ 2 & 0 & 1 & 0 & 0\\ 3 & 0 & 0 & 0 & 1\\ 4 & 0 & 0 & 1 & 1\\ \end{array}$

could be treated as a set of observations by their coordinates,

$\begin{array}{cc} x & y \\ \hline 1 & 1 \\ 1 & 2 \\ 2 & 2 \\ 3 & 4 \\ 4 & 3 \\ 4 & 4 \\ \end{array}$.
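
The conversion above can be sketched in R as follows. This is a minimal illustration rather than a definitive recipe: it assumes the number of clusters (2) is already known from looking at the plot, and the names `m`, `idx`, `coords`, `km` and `cluster_sizes` are made up for this example.

```r
# The 4x4 example grid: rows are x, columns are y
m <- matrix(c(1, 1, 0, 0,
              0, 1, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 1),
            nrow = 4, byrow = TRUE)

# Row/column indices of every cell that holds a 1
idx <- which(m == 1, arr.ind = TRUE)

# One observation (x, y) per occupied cell
coords <- data.frame(x = as.numeric(idx[, "row"]),
                     y = as.numeric(idx[, "col"]))

# k-means with k = 2 (assumed known); nstart repeats the random
# initialisation several times and keeps the best solution
set.seed(1)
km <- kmeans(coords, centers = 2, nstart = 10)

# Number of 1-cells in each cluster
cluster_sizes <- table(km$cluster)
print(cluster_sizes)

# Colour the points by their cluster label
plot(coords, col = km$cluster, pch = 19)
```

If the number of clusters is not known in advance, the `dbscan()` function from the dbscan package (with its `eps` and `minPts` arguments) could be applied to the same `coords` instead; it labels noise points as cluster 0.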

score 1 · Answer 2 · edited Apr 13 '17 at 12:44

You should transform your data from the current, image-like representation (where each value sits at a certain x/y position of a matrix) to a data.frame that has an x, a y, and a value/target column:

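# some dummy data on a 20 x 20 grid (expand.grid already returns a data.frame)
myData <- expand.grid(x=1:20, y=1:20)
# randu is a built-in dataset with 400 rows, one per grid cell
myData$target <- ifelse(randu[,1] < 0.8, 0, 1)
# this is what your data could look like (first rows only)
head(myData)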
#   x y target
# 1 1 1      0
# 2 2 1      0
# 3 3 1      1
# 4 4 1      0
# 5 5 1      0
# 6 6 1      0
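
For an existing matrix rather than dummy data, the same long format can be built from the row and column indices. The sketch below is only an illustration: `m` is a randomly generated stand-in with the dimensions from the question, not the asker's actual data, and `ones` is a hypothetical name for the subset of occupied cells.

```r
# Hypothetical stand-in for the 215 x 618 matrix of 0's and 1's
set.seed(42)
m <- matrix(rbinom(215 * 618, size = 1, prob = 0.001),
            nrow = 215, ncol = 618)

# row(m) and col(m) hold each cell's row and column index;
# as.vector() flattens all three in the same (column-major) order
myData <- data.frame(
  x      = as.vector(row(m)),
  y      = as.vector(col(m)),
  target = as.vector(m)
)

# Keep only the occupied cells if the zeros are not needed
ones <- myData[myData$target == 1, c("x", "y")]
```

The `ones` data.frame then holds one row per cell with a value of 1, which is the form a clustering function expects.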

From here you could apply further approaches, or visualize your data directly. The two sample plots below might be a start for further investigation; I would recommend looking at e.g. this answer for more ways:

# classic levelplot
library(lattice)
levelplot(x = target ~ x*y, myData, col.regions=c(0,1))

# scatterplot with alpha
library(scales)
plot(x = myData$x, y = myData$y, pch=19, col= alpha(myData$target+1, 0.5), cex=5)

One more thing: you seem to have a target variable in your data (the 0 or 1 values). Note that clustering is usually unsupervised, and hence applied to data without a target variable. It could be that techniques such as Nearest Centroid Classification would serve your purpose better.
