
How do I add a neat polygon around a group of points on a scatterplot? I am using ggplot2 but am disappointed with the results of geom_polygon.

The dataset is available as a tab-delimited text file. The graph below shows two measures of attitudes towards health and unemployment in a number of countries:

[Figure: scatterplot with geom_density2d contours]

I would like to switch from geom_density2d to the less fancy but empirically more correct geom_polygon. The result on unsorted data is unhelpful:

[Figure: geom_polygon drawn through the unsorted data]

How do I draw 'neat' polygons that behave as contour paths around the min-max y-x values? I tried sorting the data to no avail.

Code:

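library(ggplot2)
# x = efficiency, y = mandate, so the mapping matches the axis labels
fig2 <- ggplot(d, aes(x = eff, y = man, colour = issue, fill = issue)) +
  geom_point() + geom_density2d(alpha = .5) + labs(x = "Efficiency", y = "Mandate")
print(fig2)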

The `d` object is read from that data file.
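geom_polygon() joins the rows in the order they appear, so a polygon drawn through every observation cuts across the point cloud however the rows are sorted. An outline needs only the boundary points, in perimeter order, which is what base R's chull() returns. A minimal base-R sketch on a made-up toy point set:

```r
# Toy data (illustrative only): four corners of a unit square plus two interior points
pts <- data.frame(x = c(0, 1, 1, 0, 0.5, 0.2),
                  y = c(0, 0, 1, 1, 0.5, 0.7))

# Sorting keeps every point, interior ones included,
# so a polygon through these rows still zigzags through the cloud
sorted <- pts[order(pts$x, pts$y), ]

# chull() returns only the indices of the points on the convex hull,
# ordered around the perimeter
hull_idx <- chull(pts$x, pts$y)
outline <- pts[hull_idx, ]

# geom_polygon(data = outline, ...) or polygon(outline) now traces the outline only
print(nrow(sorted))
print(hull_idx)
```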

Solution:

Thanks to Wayne, Andy W and others for their pointers! The data, code and graphs have been posted to GitHub. The result looks like this:

[Figure: final result with convex hulls per issue]

Fr.
    The term you might be looking for is the *convex hull* of the points (or potentially the alpha hull). You should be able to find an R function to calculate these and then be able to add them as layers to the plot. – Andy W Feb 14 '12 at 17:14
  • Thanks for pointing out the correct terminology! I have failed to use `?chull` with `ggplot2` so far. I am not sure that I am coding it right, and hope that someone has done it already. – Fr. Feb 14 '12 at 17:55
  • Could you add your R code to question? – Yuriy Petrovskiy Feb 14 '12 at 20:07
  • One thing to note: what you're displaying are the maxima, which may be "outliers". I believe the R package `alphahull` works similarly to finding the convex hull, but allows you to adjust it inwards/outwards to try to do something like confidence intervals. – Wayne Feb 22 '12 at 14:49
  • @Wayne, an alpha hull is not a confidence interval (in any way imaginable). See [this gis.se question](http://gis.stackexchange.com/q/1200/751) for a brief description and some references on what an alpha hull is. Perhaps you're thinking of bivariate confidence ellipses, or maybe even bagplots (bivariate boxplots for identifying outliers). – Andy W Feb 23 '12 at 16:27
  • Looking at the final graph, using an alpha hull is unlikely to change the general interpretation, although all of the polygons will be slightly smaller and unemployment will be more obviously separated from health and pensions. I'm less familiar with bivariate confidence ellipses in such a situation, but I suspect they would look similar to, and have a similar interpretation as, the original contour plot posted (although constrained to be more regular ellipses in all applications I have seen). – Andy W Feb 23 '12 at 16:29
  • @Andy W: Ah, okay. I've used the package once and it was my impression that it could shrink the convex-like hull in a way that might avoid outliers. As you point out, bivariate confidence ellipses would probably be what Fr. would actually want. The overall issue is that a convex hull will happily include outliers no matter how extreme, and could thus be misleading if they are not dealt with in some way (perhaps in a pre-processing step, perhaps in the graphing step). – Wayne Feb 23 '12 at 16:41
  • @Wayne, the alpha hull would include any outlier as well. It shrinks the overall area of the polygon, but actually makes the outer ring of the polygon more detailed and includes *more* points as vertices in the polygon ring. All of the convex hull's vertices would also be included in the alpha hull. – Andy W Feb 23 '12 at 16:54
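The comments above contrast hulls, which are defined by the most extreme points, with bivariate ellipses, which are driven by the mean and covariance. As a rough sketch of the latter (not code posted in this thread), the function below traces a normal-theory data ellipse: the points whose squared Mahalanobis distance from the mean equals the 95% chi-square quantile with 2 degrees of freedom. Strictly, this describes where about 95% of observations fall under bivariate normality, rather than a confidence region for the mean. The name `data_ellipse` and its defaults are illustrative choices.

```r
# Hypothetical helper: points on a normal-theory ellipse around a 2-D point cloud
data_ellipse <- function(x, y, level = 0.95, segments = 50) {
  xy <- cbind(x, y)
  center <- colMeans(xy)
  shape <- cov(xy)
  radius <- sqrt(qchisq(level, df = 2))
  angles <- seq(0, 2 * pi, length.out = segments + 1)  # first and last point coincide
  unit_circle <- cbind(cos(angles), sin(angles))
  # chol() gives an upper-triangular R with t(R) %*% R == shape,
  # so multiplying the unit circle by R stretches it to the covariance shape
  ell <- radius * unit_circle %*% chol(shape)
  ell <- sweep(ell, 2, center, "+")  # shift to the mean
  colnames(ell) <- c("x", "y")
  as.data.frame(ell)
}

# Usage on simulated data
set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
ell <- data_ellipse(x, y)
head(ell)
```

With the question's data this could be computed per issue and drawn with geom_path(). ggplot2's stat_ellipse() draws a comparable ellipse directly, though by default it assumes a multivariate t-distribution rather than the normal-theory radius used here.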

3 Answers


With some googling I came across Gota Morota's website, which already has an example of doing this. Below is that example extended to your data.

[Figure: scatterplot with a filled convex hull for each issue]

library(ggplot2)
library(plyr)  # provides ddply()

# folder holding the data file; adjust to your own path
work <- "E:\\Forum_Post_Stuff\\convex_hull_ggplot2"
setwd(work)

# note: the data contain missing values
mydata <- read.table(file = "emD71JT5.txt", header = TRUE, fill = TRUE)
nomissing <- na.omit(mydata)  # chull() does not accept missing values

# convex hull of each issue's point set
find_hull <- function(df) df[chull(df$eff, df$man), ]
hulls <- ddply(nomissing, "issue", find_hull)

plot <- ggplot(data = nomissing, aes(x = eff, y = man, colour = issue, fill = issue)) +
  geom_point() +
  geom_polygon(data = hulls, alpha = 0.5) +
  labs(x = "Efficiency", y = "Mandate")
plot
Andy W
  • Thanks, I'll revise the code accordingly. Unfortunately, your image file does not seem to load here, but the code is there. – Fr. Feb 15 '12 at 10:52
  • @Fr. , What exactly is the problem? – Andy W Feb 15 '12 at 12:46
  • @AndyW Unfortunately, the code does not support missing values, and I did not find a way to tweak it to do so. – Fr. Feb 15 '12 at 18:02
  • @Fr., How exactly do you want missing data values to be handled besides eliminating those observations? Any reasonable imputation technique would result in the points being *inside* the convex hulls of the non-missing observations. – Andy W Feb 15 '12 at 18:07
  • @AndyW I mean that the `NA` values kill the `chull` function. I would expect it to just ignore them, but it fails to do so, and I did not find a way to use `na.omit()` to make it work. I'm sure it's possible, I just don't have the hackery skills to go beyond the previous solution. – Fr. Feb 15 '12 at 20:07
  • @Fr. It still isn't clear how the code I provided doesn't generalize to your use case (i.e. solve the problem with missing data). Feel free to ask for clarification if there is anything you do not understand. I dealt with the missing data by taking the original data frame (what I named `mydata`) and making a new data frame (named `nomissing`) by deleting cases listwise with missing information. I then made a custom function to return the convex hull for each of the factors within the "issue" variable. Then it is as simple as plotting the scatterplot and adding a layer of the convex hulls. – Andy W Feb 16 '12 at 17:36
  • @AndyW I just succeeded at shifting to your code! I'll have the vanity to post that on GitHub, as I'm learning to use that too. – Fr. Feb 22 '12 at 13:26
  • Edit done to original post. Thanks again for taking the time to help me. The mistake was mine: I was passing options to `geom_point` instead of `ggplot`, which was causing the polygons to be empty. – Fr. Feb 22 '12 at 13:42
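Andy W's approach drops incomplete cases with na.omit(), which removes a row if any column is missing. If the file has other columns with gaps, one possible refinement (an assumption, not part of the thread's code) is to drop only the rows missing one of the two plotted coordinates. A toy illustration, where the data frame and its `other` column are invented:

```r
# Toy data: 'other' has a missing value that does not affect the plot
d <- data.frame(issue = c("health", "health", "health", "unemployment"),
                eff   = c(1, 2, NA, 3),
                man   = c(2, 1, 3, 4),
                other = c(NA, 5, 6, 7))

# chull(d$eff, d$man) would stop with an error because of the NA in eff

# na.omit() drops rows with a missing value in ANY column
all_complete <- na.omit(d)

# complete.cases() on the coordinate columns drops only rows chull() cannot use
coords_complete <- d[complete.cases(d[, c("eff", "man")]), ]

print(nrow(all_complete))
print(nrow(coords_complete))
```

`coords_complete` can then be passed to the per-issue hull computation in place of `nomissing`.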

If I understand your problem, you're looking for the convex hull of health and of unemployment. There are probably several packages to do this in R, one of which is package geometry. I'd imagine that the points are sorted in order around the perimeter, but you'd have to check that.
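The ordering can be checked directly. R's help page for chull() describes the result as hull vertex indices in clockwise order. The sketch below (the helper name `turns_consistent` is made up) tests this on random data by confirming that every turn around the polygon bends the same way, which holds for a convex polygon listed in perimeter order.

```r
# Hypothetical helper: TRUE if consecutive edges of the closed polygon
# (px, py) all turn in the same direction (sign of the 2-D cross product)
turns_consistent <- function(px, py) {
  n <- length(px)
  nxt  <- c(2:n, 1)     # index of the next vertex, wrapping around
  nxt2 <- c(3:n, 1, 2)  # index of the vertex after that
  cross <- (px[nxt] - px) * (py[nxt2] - py[nxt]) -
           (py[nxt] - py) * (px[nxt2] - px[nxt])
  all(cross > 0) || all(cross < 0)
}

set.seed(42)
X <- matrix(rnorm(2000), ncol = 2)
h <- chull(X)
turns_consistent(X[h, 1], X[h, 2])
```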

EDIT: Here's an example, which doesn't use ggplot, but I hope it's useful. The example in the chull documentation seems to be wrong, which might be throwing you off:

X <- matrix(rnorm(2000), ncol = 2)
X.chull <- chull(X)                # indices of the hull vertices
X.chull <- c(X.chull, X.chull[1])  # repeat the first vertex so lines() closes the ring
plot(X)
lines(X[X.chull, ])

EDIT 2: OK, here is something using ggplot2. We turn X into a data.frame with variables x and y. Then:

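library(ggplot2)
# name the columns x and y so that aes(x = x, y = y) can find them
X <- data.frame(x = X[, 1], y = X[, 2])
hull <- chull(X$x, X$y)
hull <- c(hull, hull[1])  # repeat the first vertex to close the ring
ggplot(X, aes(x = x, y = y)) + geom_polygon(data = X[hull, ], fill = "red") + geom_point()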

Note that the geom_point is using the data (X) and aes from the ggplot, while I'm overriding it in the geom_polygon.

To get the full plot, you'd need to put the x and y coordinates of the hulls for both issues into one data frame, using a third column, issue, to tell them apart.
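Building that combined data frame takes only base R. This is a sketch under assumptions: the helper name `stack_hulls` is hypothetical and the toy data are simulated stand-ins for the question's `eff`, `man` and `issue` columns.

```r
# Hypothetical helper: compute the hull of each group separately and stack
# the results, keeping the group column so geom_polygon() can draw one
# polygon per group
stack_hulls <- function(df, x, y, group) {
  pieces <- lapply(split(df, df[[group]]), function(g) {
    g[chull(g[[x]], g[[y]]), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Simulated stand-in for the question's data
set.seed(1)
d <- data.frame(issue = rep(c("health", "unemployment"), each = 20),
                eff = rnorm(40), man = rnorm(40))
hulls <- stack_hulls(d, "eff", "man", "issue")
table(hulls$issue)
```

The result plays the same role as `hulls` in Andy W's answer (`geom_polygon(data = hulls, alpha = 0.5)` with `fill = issue` in the main mapping), without needing plyr.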

Ken Williams
Wayne

As of this afternoon, I've wrapped the chull function inside an R package as a geom_convexhull function.

Once the package is loaded, it can be used like any other geom. In your case it should be something like:

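library(ggplot2)
library(ggConvexHull)  # from GitHub, e.g. devtools::install_github("cmartin/ggConvexHull")

ggplot(d, aes(eff, man, colour=issue, fill=issue)) + 
  geom_convexhull(alpha=.5) + 
  geom_point() + 
  labs(x = "Efficiency", y = "Mandate")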

The package is available on GitHub: https://github.com/cmartin/ggConvexHull
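For readers curious how such a geom can be built, ggplot2's extension system lets a Stat transform each group's data before it is drawn. The sketch below follows the convex-hull example in ggplot2's "Extending ggplot2" vignette; it is not the ggConvexHull source, and the names `StatChull` and `stat_chull` are illustrative.

```r
library(ggplot2)

# A stat that keeps only the convex-hull vertices of each group,
# in the order returned by chull(); drawn with geom = "polygon" by default
StatChull <- ggproto("StatChull", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  }
)

stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage with the built-in mtcars data: one hull per number of cylinders
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl), fill = factor(cyl))) +
  stat_chull(alpha = 0.3) +
  geom_point()
```

Because the hull is computed per group inside the stat, the grouping comes from the colour/fill mapping and no separate hull data frame is needed.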

  • Thanks a lot for this! I was getting frustrated from undesired output when trying to apply `chull` across a grouping factor until I found this. – jogall Feb 10 '18 at 16:03