5

Question: Best way to derive an ellipse formula when given bunch of points (say 60 points which when connected draws an ellipse)?

Background: Using R Momocs library function conf_ell which returns confidence ellipse coordinates. Would be nice to abstract those points into a formula.

You can see my ellipse points in the image below;

enter image description here

Any suggestions much appreciated, Thanks

Neil Varnas
  • 161
  • 5
  • @Tim thanks Tim. Well not sure how to be clearer than this to be honest. Let's try this way: you have 60 dots on a coordinate plot. When you connect them you get an ellipse. Is it possible to derive an ellipse formula from those points ? If yes, what would be the approach. – Neil Varnas Mar 21 '17 at 21:05
  • 1
    You might be interested in [this paper](https://www.emis.de/journals/BBMS/Bulletin/sup962/gander.pdf). – J. M. is not a statistician Mar 22 '17 at 02:54

1 Answers1

13

A straightforward way, especially when you expect the points to fall exactly on an ellipse (yet which works even when they don't), is to observe that an ellipse is the set of zeros of a second order polynomial

$$0 = P(x,y) = -1 + \beta_x\,x + \beta_y\,y + \beta_{xy}\,xy + \beta_{x^2}\,x^2 + \beta_{y^2}\,y^2$$

You can therefore estimate the coefficients from five or more points $(x_i,\,y_i)$ using least squares (without a constant term). The response variable is a vector of ones while the explanatory variables are $(x_i,\,y_i, \,x_iy_i,\,x_i^2,\,y_i^2)$.

Here are 60 points drawn with considerable error. Figure

The fit is shown as a black ellipse. The center and axes of the true underlying ellipse are plotted for reference.

When the points are known to fall on an ellipse, you may use any five of the points to estimate its parameters. (When you work out the normal equations you will obtain an explicit formula for the ellipse in terms of those ten coordinates.) It's best to choose the points situated widely around the ellipse rather than clustered in one place.


This is the R code used to do the work. The three lines in the middle after "Estimate the parameters" illustrate the use of least squares to find the coefficients.

center <- c(1,2)
axis.main <- c(3,1)
axis.lengths <- c(4/3, 1/2)
sigma <- 1/20               # Error SD in each coordinate
n <- 60                     # Number of points to generate
set.seed(17)
#
# Compute the axes.
#
axis.main <- axis.main / sqrt(crossprod(axis.main))
axis.aux <- c(-axis.main[2], axis.main[1]) * axis.lengths[2]
axis.main <- axis.main * axis.lengths[1]
axes <- cbind(axis.main, axis.aux)
#
# Generate points along the ellipse.
#
s <- seq(0, 2*pi, length.out=n+1)[-1]
s.c <- cos(s)
s.s <- sin(s)
x <- axis.main[1] * s.c + axis.aux[1] * s.s + center[1] + rnorm(n, sd=sigma)
y <- axis.main[2] * s.c + axis.aux[2] * s.s + center[2] + rnorm(n, sd=sigma)
#
# Estimate the parameters.
#
X <- as.data.frame(cbind(One=1, x, y, xy=x*y, x2=x^2, y2=y^2))
fit <- lm(One ~ . - 1, X)
beta.hat <- coef(fit)
#
# Plot the estimate, the point, and the original axes.
#
evaluate <- function(x, y, beta) {
  if (missing(y)) {
    y <- x[, 2]; x <- x[, 1]
  }
  as.vector(cbind(x, y, x*y, x^2, y^2) %*% beta - 1)
}
e.x <- diff(range(x)) / 40
e.y <- diff(range(y)) / 40
n.x <- 100
n.y <- 60
u <- seq(min(x)-e.x, max(x)+e.x, length.out=n.x)
v <- seq(min(y)-e.y, max(y)+e.y, length.out=n.y)
z <- matrix(evaluate(as.matrix(expand.grid(u, v)), beta=beta.hat), n.x)
contour(u, v, z, levels=0, lwd=2, xlab="x", ylab="y", asp=1)
arrows(center[1], center[2], axis.main[1]+center[1], axis.main[2]+center[2],
       length=0.15, angle=15)
arrows(center[1], center[2], axis.aux[1]+center[1], axis.aux[2]+center[2],
       length=0.15, angle=15)
points(center[1], center[2])
points(x,y, pch=19, col="Red")
XavierStuvw
  • 115
  • 5
whuber
  • 281,159
  • 54
  • 637
  • 1,101
  • Since I've installed R only couple of weeks ago I'll need to read this several times to actually understand it. I think in your R code you have chosen a center and you build your vectors around it, wheres I dont have that. One thing that looks very promising is your last sentence (about the 5 points). Ive slightly updated the question with more detail, although I think the situation remains the same. Thank a lot, Ill go through your answer and see how can I implement it, will come back to you, thanks – Neil Varnas Mar 22 '17 at 07:08
  • 1
    In order to illustrate a technique, one needs data. Since you neither supplied data nor described them, I had to assume they would be in *some* kind of form. When points can be plotted in `R`, as shown in your edit, then at some earlier stage they were available in the same logical format I have used here. The part of my code that computes the coefficients consists of just three lines: one to put the data into a form usable by `lm`; the call to `lm`; and extracting the coefficients. That's all it takes (and there are even simpler ways but they are less clear). – whuber Mar 22 '17 at 13:26
  • although I appreciate your answer there's no need to be blunt. in my question I said: points on the ellipse. So that kind of speaks for itself, can pick 5 or 10 or 60 points and you'll have the data. I shall try to plug in the output of the function which I'm using to generate ellipse data into your code. Thanks again – Neil Varnas Mar 22 '17 at 13:42
  • Neil, the severe length limitation imposed by this site on comments necessitates bluntness. I am sorry about that and its tendency to create the wrong impressions. I hope you have found the information helpful. – whuber Mar 22 '17 at 18:29
  • You could probably estimate the centre of the ellipses from the averages of the two data sets, just intuitively. I guess that the coordinates of the centre must be embedded in the $\beta$ coefficients since $$ (x -x_0)^2 + (y-y_0)^2 + (x-x_0)(y-y_0) = 1$$ is also an [ellipse](https://en.wikipedia.org/wiki/Ellipse), I presume. More knowledgeable folks may well want to correct this hint. Also, see https://stats.stackexchange.com/a/288145/133721 for an alternative formulation based on the coordinates of the foci. – XavierStuvw Jun 30 '17 at 10:49
  • @Xavier Although estimating the center as the centroid of the points is intuitively reasonable for the data shown, it's not the least squares estimate. To see how bad the centroid could be as an estimate, imagine supplementing the points in the figure with a million points clustered near one location on the perimeter. That would put the centroid of the points right near that location, while the least squares estimate would barely change. – whuber Jun 30 '17 at 13:35