12

I just came across this great analysis which is both interesting and beautiful visually:

http://www.nytimes.com/interactive/2012/11/02/us/politics/paths-to-the-white-house.html

I am curious how such a "path tree" can be constructed using R. What data and algorithm does one need to construct such a path tree?

Thanks.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
Tal Galili
  • 19,935
  • 32
  • 133
  • 195
  • Roughly: check all the $2^9$ combinations of the winner in each state and put the results in a 9-dim binary hypertable, reorder into a tree based on the information gain, prune the redundant branches. –  Nov 04 '12 at 14:01
  • easy Eh @mbq?! ;-) – Gavin Simpson Nov 04 '12 at 14:19
  • 1
    I think they actually did it slightly differently: Rank the states by EV, then see what happens if each candidate wins, going down the tree. So, you don't need to generate $2^9$ and then prune. – Peter Flom Nov 04 '12 at 17:42
  • http://source.mozillaopennews.org/en-US/articles/nyts-512-paths-white-house/ (via [Scott Murray](https://twitter.com/alignedleft/status/265504534741266432)). – chl Nov 05 '12 at 17:58

1 Answers1

10

It is natural to use a recursive solution.

The data must consist of a list of the states in play, their electoral votes, and the presumed starting advantage to the left ("blue") candidate. (A value of $47$ comes close to reproducing the NY Times graphic.) At each step, the two possibilities (left wins or loses) are examined; the advantage is updated; if at that point the outcome (win, loss, or tie) can be determined--based on the remaining votes--then the calculation halts; otherwise, it is repeated recursively for the remaining states in the list. Thus:

paths.compute <- function(start, options, states) {
  if (start > sum(options)) x <- list(Id="O", width=1)
  else if (start < -sum(options)) x <- list(Id="R", width=1)
  else if (length(options) == 0 && start == 0) x <- list(Id="*", width=1)
  else {
    l <- paths.compute(start+options[1], options[-1], states[-1])
    r <- paths.compute(start-options[1], options[-1], states[-1])
    x <- list(Id=states[1], L=l, R=r, width=l$width+r$width, node=TRUE)
  }
  class(x) <- "path"
  return(x)
}

states <- c("FL", "OH", "NC", "VA", "WI", "CO", "IA", "NV", "NH")
votes <- c(29, 18, 15, 13, 10, 9, 5, 6, 4)
p <- paths.compute(47, votes, states)

This effectively prunes the tree at each node, requiring much less computation than exploring all $2^9=512$ possible outcomes. The rest is just graphical detail, so I will discuss only those parts of the algorithm that are essential for an effective visualization.

Image

The full program follows. It is written in a moderately flexible manner to enable the user to adjust many of the parameters. The crucial part of the graphing algorithm is the tree layout. To do this, plot.path uses a width field to allocate proportionally the remaining horizontal space to the two descendants of each node. This field is calculated initially by paths.compute as the total number of leaves (descendants) beneath each node. (If some such calculation is not made, and the binary tree is simply split in half at each node, then by the ninth state there is only $1/512$ of the total width available for each leaf, which is far too narrow. Anybody who has started to draw a binary tree on paper has soon experienced this problem!)

The vertical positions of the nodes are arranged in a geometric series (with common ratio a) so that the spacing gets closer in the deeper parts of the tree. The thicknesses of the branches and sizes of the leaf symbols are scaled by depth, too. (This will cause problems with the circular symbols at the leaves, because their aspect ratios will change as a varies. I haven't bothered to fix that up.)

paths.compute <- function(start, options, states) {
  if (start > sum(options)) x <- list(Id="O", width=1)
  else if (start < -sum(options)) x <- list(Id="R", width=1)
  else if (length(options) == 0 && start == 0) x <- list(Id="*", width=1)
  else {
    l <- paths.compute(start+options[1], options[-1], states[-1])
    r <- paths.compute(start-options[1], options[-1], states[-1])
    x <- list(Id=states[1], L=l, R=r, width=l$width+r$width, node=TRUE)
  }
  class(x) <- "path"
  return(x)
}

plot.path <- function(p, depth=0, x0=1/2, y0=1, u=0, v=1, a=.9, delta=0,
               x.offset=0.01, thickness=12, size.leaf=4, decay=0.15, ...) {
  #
  # Graphical symbols
  #
  cyan <- rgb(.25, .5, .8, .5); cyan.full <- rgb(.625, .75, .9, 1)
  magenta <- rgb(1, .7, .775, .5); magenta.full <- rgb(1, .7, .775, 1)
  gray <- rgb(.95, .9, .4, 1)
  #
  # Graphical elements: circles and connectors.
  #
  circle <- function(center, radius, n.points=60) {
    z <- (1:n.points) * 2 * pi / n.points
    t(rbind(cos(z), sin(z)) * radius + center)
  }
  connect <- function(x1, x2, veer=0.45, n=15, ...){
    x <- seq(x1[1], x1[2], length.out=5)
    y <- seq(x2[1], x2[2], length.out=5)
    y[2] = veer * y[3] + (1-veer) * y[2]
    y[4] = veer * y[3] + (1-veer) * y[4]
    s = spline(x, y, n)
    lines(s$x, s$y, ...)
  }
  #
  # Plot recursively:
  #
  scale <- exp(-decay * depth)
  if (is.null(p$node)) {
    if (p$Id=="O") {dx <- -y0; color <- cyan.full} 
    else if (p$Id=="R") {dx <- y0; color <- magenta.full}
    else {dx = 0; color <- gray}
    polygon(circle(c(x0 + dx*x.offset, y0), size.leaf*scale/100), col=color, border=NA)
    text(x0 + dx*x.offset, y0, p$Id, cex=size.leaf*scale)
  } else {  
    mid <- ((delta+p$L$width) * v + (delta+p$R$width) * u) / (p$L$width + p$R$width + 2*delta)
    connect(c(x0, (x0+u)/2), c(y0, y0 * a), lwd=thickness*scale, col=cyan, ...)
    connect(c(x0, (x0+v)/2), c(y0, y0 * a), lwd=thickness*scale, col=magenta,  ...)
    plot(p$L, depth=depth+1, x0=(x0+u)/2, y0=y0*a, u, mid, a, delta, x.offset, thickness, size.leaf, decay, ...)
    plot(p$R, depth=depth+1, x0=(x0+v)/2, y0=y0*a, mid, v, a, delta, x.offset, thickness, size.leaf, decay, ...)
  }
}

plot.grid <- function(p, y0=1, a=.9, col.text="Gray", col.line="White", ...) {
  #
  # Plot horizontal lines and identifiers.
  #
  if (!is.null(p$node)) {
    abline(h=y0, col=col.line, ...)
    text(0.025, y0*1.0125, p$Id, cex=y0, col=col.text, ...)
    plot.grid(p$L, y0=y0*a, a, col.text, col.line, ...)
    plot.grid(p$R, y0=y0*a, a, col.text, col.line, ...)
  }
}

states <- c("FL", "OH", "NC", "VA", "WI", "CO", "IA", "NV", "NH")
votes <- c(29, 18, 15, 13, 10, 9, 5, 6, 4)
p <- paths.compute(47, votes, states)

a <- 0.925
eps <- 1/26
y0 <- a^10; y1 <- 1.05

mai <- par("mai")
par(bg="White", mai=c(eps, eps, eps, eps))
plot(c(0,1), c(a^10, 1.05), type="n", xaxt="n", yaxt="n", xlab="", ylab="")
rect(-eps, y0 - eps * (y1 - y0), 1+eps, y1 + eps * (y1-y0), col="#f0f0f0", border=NA)
plot.grid(p, y0=1, a=a, col="White", col.text="#888888")
plot(p, a=a, delta=40, thickness=12, size.leaf=4, decay=0.2)
par(mai=mai)
whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • 2
    That's a pretty nice solution. And the graphics are impressive. There is also a `partitions` package that might have provided a structure for enumerating the possibilities. – DWin Feb 23 '13 at 19:48
  • Wow, Whuber, there aren't enough V's to mark your answer with! – Tal Galili Feb 23 '13 at 21:54