Statistical language in "The Avengers"

Question

In the Marvel movie "The Avengers," there's a scene in which Bruce Banner, looking for a piece of alien technology called the "tesseract," says that he is going to "rough out a tracking algorithm, just a basic cluster recognition." Is this a coherent thing to say? I'm a novice in statistics, but I have heard of cluster analysis and pattern recognition. Does "cluster recognition" refer to the same thing, and what relation does it have to tracking algorithms? (Or did Bruce Banner lie?)

Transcript from the movie

Phil Coulson

We're sweeping every wirelessly accessible camera on the planet. Cell phones, laptops... If it's connected to a satellite, it's eyes and ears for us.

Natasha (Black Widow)

That's still not gonna find them in time.

Dr. Banner (The Hulk)

You have to narrow your field. How many spectrometers do you have access to?

Nick Fury

How many are there?

Dr. Banner (The Hulk)

Call every lab you know. Tell them to put the spectrometers on the roof and calibrate them for gamma rays. I'll rough out a tracking algorithm, basic cluster recognition. At least we could rule out a few places. Do you have somewhere for me to work?

More info about the tesseract (E.g. it emits gamma radiation)

In light of the existence of an upvoted answer, I think this question is not too unclear to be answered. I'm voting to leave open. — gung - Reinstate Monica, Feb 16 '17 at 13:36
I don't agree with the answers, but that is mostly because information is missing about how the tracking was done or how the algorithm helped (I will have to rewatch the movie to remind myself what it was about), so we cannot answer with certainty. I can imagine that cluster analysis can be used to find (unknown) patterns and that those patterns may help to track or find something. I wouldn't be surprised if such a technique had been used in the TV show 'numb3rs', although it is a bit more convoluted and the clusters are not directly the answer (I imagine it as some pre-processing step). — Sextus Empiricus, Jan 28 '20 at 08:22
After reading the script (which is much much more boring than the movie) as well as information about the tesseract, it seems to be all about the tesseract emitting gamma radiation. But I do not yet understand how cluster analysis is gonna help with this. Banner mentions that it is supposed to decrease the search area. — Sextus Empiricus, Jan 28 '20 at 08:56

score 6 · Accepted Answer · answered Feb 16 '17 at 08:54

To "recognise" something, the pattern must already be known, so you would be using a supervised algorithm, whereas clustering is an unsupervised class of machine learning methods. Clustering algorithms group observations by similarity rather than recognize known patterns. So I'd say it sounds like another example where there is less science and more fiction in the movies, and where unrelated geeky terms are used in random combination to sound scientific...
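To illustrate the unsupervised side of that distinction, here is a minimal sketch in base R. The simulated points, the use of kmeans and the choice of two centers are assumptions made for illustration only. The algorithm receives coordinates and no labels, and still separates the two groups by similarity.

```r
## Minimal sketch with made-up data: k-means never sees any labels
set.seed(42)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),  # 50 points near (0, 0)
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)   # 50 points near (5, 5)
)

## group the 100 unlabeled points into 2 clusters by distance alone
fit <- kmeans(pts, centers = 2, nstart = 10)

## cluster sizes and estimated centers
print(table(fit$cluster))
print(fit$centers)
```

A supervised method would instead need examples that are already labeled (say, "source" versus "background") before it could recognize new cases.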

score 1 · Answer 2 · edited Jun 11 '20 at 14:32

Clusters of detected gamma photons are being studied in order to find potential sources of gamma rays

The cube or tesseract is emitting gamma radiation. In order to find sources of gamma radiation (and thus a potential location of the tesseract) one can use algorithms to detect clusters in the detected locations of gamma radiation.

Note that the estimated locations/directions of the observed photons are measured with error, so statistics comes into play. When many detected photons lie close to each other, this may indicate that they come from a common source radiating gamma photons.

Finding clusters of gamma rays is a way to find out whether a detected gamma ray is background or belongs, along with other detected gamma rays, to some potential common source.

Astronomers have been using the minimal spanning tree algorithm to find clusters of (potentially) associated detected gamma rays (see for instance: Campana 2008).
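The idea behind the minimal spanning tree approach can be sketched in base R without extra packages. This is only an illustration under stated assumptions: the simulated source and background, the use of Prim's algorithm, and the cut at the mean edge length are choices made here, not the selection strategy of the cited papers.

```r
## Hypothetical data: one tight source plus uniform background
set.seed(1)
src <- cbind(rnorm(15, 5, 0.1), rnorm(15, 5, 0.1))
bg  <- cbind(runif(30, 0, 10), runif(30, 0, 10))
xy  <- rbind(src, bg)
n   <- nrow(xy)
D   <- as.matrix(dist(xy))

## Prim's algorithm: grow the tree from point 1, always adding the closest point
in_tree <- c(TRUE, rep(FALSE, n - 1))
best_d  <- D[1, ]        # shortest distance from each point to the tree
best_p  <- rep(1L, n)    # tree point that realises that distance
edges   <- matrix(NA_real_, n - 1, 3,
                  dimnames = list(NULL, c("from", "to", "len")))
for (k in seq_len(n - 1)) {
  cand <- which(!in_tree)
  j <- cand[which.min(best_d[cand])]
  edges[k, ] <- c(best_p[j], j, best_d[j])
  in_tree[j] <- TRUE
  closer <- !in_tree & D[j, ] < best_d
  best_d[closer] <- D[j, closer]
  best_p[closer] <- j
}

## keep only edges shorter than the mean edge length
keep <- edges[edges[, "len"] < mean(edges[, "len"]), , drop = FALSE]

## label connected components by merging labels along each kept edge
label <- seq_len(n)
for (e in seq_len(nrow(keep))) {
  a <- keep[e, "from"]
  b <- keep[e, "to"]
  label[label == label[b]] <- label[a]
}

## components with at least 10 points are candidate sources
sizes <- table(label)
print(sizes[sizes >= 10])
```

Points joined only by short edges end up in the same component, so a dense group of photons stands out from the sparse background.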

An example image of how this works

An example image of how this works can be generated with the R statistical software (see below):

The image is similar to those found in the following works (I could not find one with a clearly free license):

Campana, R., et al. "Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies." Astrophysics and Space Science 347.1 (2013): 169-182. arXiv: https://arxiv.org/abs/1305.2025
Campana, R., et al. "A Minimal Spanning Tree algorithm for source detection in γ-ray images." Monthly Notices of the Royal Astronomical Society 383.3 (2008): 1166-1174.

library(emstreeR)
library(igraph)

## 2D artificial data
set.seed(1)
n <- 20
n2 <- 400 - n * 3
## c1 to c3 are artificial clusters
## c4 is background noise
c1 <- data.frame(y = rnorm(n, 45, sd = 1),
                 x = rnorm(n, 130, sd = 1))
c2 <- data.frame(y = rnorm(n, 50, sd = 1),
                 x = rnorm(n, 125, sd = 1))
c3 <- data.frame(y = rnorm(n, 55, sd = 1),
                 x = rnorm(n, 135, sd = 1))
c4 <- data.frame(y = runif(n2, 40, 60),
                 x = runif(n2, 120, 150))
d <- rbind(c1, c2, c3, c4)

## MST:
out <- ComputeMST(d)

## use the mean edge length as the boundary between clusters
## (defined here because both the tree colors and the graph below need it)
boundary <- mean(out$distance)
edgeselect <- out$distance < boundary

## first plot: the points only
## (x runs over 120-150, so it is the longitude axis)
plot(-100, -100, xlim = c(120, 150), ylim = c(40, 60),
     xlab = "longitude", ylab = "latitude")
points(out$x, out$y,
       pch = 21, col = 1, bg = 1, cex = 0.4)
title("approximate spatial distribution \n of detected signals", cex.main = 1)

## second plot: the same points, plus the tree and the clusters
plot(-100, -100, xlim = c(120, 150), ylim = c(40, 60),
     xlab = "longitude", ylab = "latitude")
points(out$x, out$y,
       pch = 21, col = 1, bg = 1, cex = 0.4)
title("red  lines: small edges \n green dots: connected with n >= 10", cex.main = 1)

## draw the tree: short edges in red, long edges in grey
colors <- rgb(0.75 + edgeselect * 0.25,
              0.75 - edgeselect * 0.75,
              0.75 - edgeselect * 0.75)
for (i in seq_len(nrow(out))) {
  lines(c(out$x[out$from[i]], out$x[out$to[i]]),
        c(out$y[out$from[i]], out$y[out$to[i]]),
        col = colors[i])
}

## draw clusters separately: connected components of the
## short-edge graph that contain at least 10 points
edgevector <- as.numeric(rbind(out$from[edgeselect], out$to[edgeselect]))
graph <- make_graph(edgevector, directed = FALSE)

groepen <- groups(components(graph))  # "groepen" = groups: point indices per component
sizes <- which(components(graph)$csize >= 10)

for (s in sizes) {
  coordinates <- unlist(groepen[s])
  points(out$x[coordinates],
         out$y[coordinates], col = 3)
}

score 0 · Answer 3 · answered Jan 28 '20 at 07:39

This is so funny. I was re-watching the Avengers and heard Banner say that. I have been learning ML for the past year, so I was wondering if anyone else caught that.

Cluster analysis groups data points so that distances within a cluster are small and distances between clusters (for example, between their centroids) are large. As Tim suggested, this is unsupervised learning, so there is no target or known variable.

To answer your question, I think it is indeed a coherent thing to say. I'm assuming a clustering algorithm can be built to analyze certain variables, including the gamma radiation these labs pick up over time. I'm also assuming that the readings may fluctuate even if the cube is not near a lab. With readings from multiple labs, the algorithm can cluster together the labs that had elevated readings at certain times, in order to track the tesseract and perhaps follow its movement, or at least to narrow down the search for where it has been or where it is. This, I'm assuming, would be part of the algorithm. Afterward there might be a search of the cell phones, laptops, etc. that Fury had mentioned in that area, or of other anomalies in the area (strange deaths due to Loki, missing or stolen items, reports of the materials Loki was after, locations of reactors, etc.).
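A rough sketch of that idea in base R follows. Everything in it is hypothetical: the lab coordinates, the Poisson readings, the threshold rule (median plus four times the MAD), the single snapshot in time, and the 3-degree cut used for single-linkage clustering are assumptions chosen only to show the mechanics.

```r
## Hypothetical labs: 190 seeing only background, 10 near a source at (-100, 40)
set.seed(7)
bg_labs  <- data.frame(lon = runif(190, -120, -75),
                       lat = runif(190, 28, 48),
                       reading = rpois(190, 20))
hot_labs <- data.frame(lon = rnorm(10, -100, 0.5),
                       lat = rnorm(10, 40, 0.5),
                       reading = rpois(10, 100))
labs   <- rbind(bg_labs, hot_labs)
is_hot <- rep(c(FALSE, TRUE), c(190, 10))  # simulation truth, used only for checking

## flag labs whose reading is far above the typical background level
threshold <- median(labs$reading) + 4 * mad(labs$reading)
idx <- which(labs$reading > threshold)
elevated <- labs[idx, ]

## cluster the flagged labs by location (single linkage, cut at 3 degrees)
hc  <- hclust(dist(elevated[, c("lon", "lat")]), method = "single")
grp <- cutree(hc, h = 3)

## the largest cluster of elevated labs gives a reduced search region
big    <- as.integer(names(which.max(table(grp))))
region <- elevated[grp == big, ]
print(range(region$lon))
print(range(region$lat))
```

Isolated labs with a chance high reading end up in small clusters of their own, while several nearby labs with elevated readings form one large cluster whose bounding box narrows the search.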
