12

In a real-valued multivariate case, is there a way to uniformly sample the points from the surface where the Mahalanobis distance from the mean of the is a constant?

EDIT: This just boils down to sampling points uniformly from the surface of a hyper-ellipsoid that satisfies the equation,

$$(x-\mu)^T \Sigma^{-1}(x-\mu) = d^2.$$

To be more precise, by "uniformly", I mean sample such that each area element $dA$ of the hyper-surface contains the same probability mass.

amoeba
  • 93,463
  • 28
  • 275
  • 317
  • 1
    Correct me if I'm wrong: are you asking "given a random variable $ X $, how can I uniformly sample from the points that are a given Mahalanobis distance $ c $ away from $ \mathbb{E}[X] $?" – Kevin Li Sep 16 '18 at 22:17
  • @Kevin Li Yes, that is right. – sachin vernekar Sep 17 '18 at 00:46
  • 1
    I think we will need a suitable definition of "uniformly." The reason is this: in two dimensions, this set of points lies along some ellipse. Is one supposed to sample from that ellipse in such a way that equal lengths have equal chances, or that equal *angles* have equal chances, or so that equal lengths *when the variables are standardized* have equal chances, or in some other way? If you could explain what this sampling aims to achieve, that might give us enough information to know what you are trying to ask. – whuber Sep 17 '18 at 14:07
  • @whuber I have made the question more clear now. I mean you fix the standardized distance (Mahalanobis distance) from the mean and sample from the surface of the hyper-ellipsoid. But note in my case, the hyper-ellipsoid is rotated. – sachin vernekar Sep 18 '18 at 07:24
  • That's still unclear, though: the problem is that you haven't specified what density to use for sampling from that surface. "Uniformly" could mean various different things, as I tried to explain in the example I gave. – whuber Sep 18 '18 at 13:58
  • @whuber I need to sample uniformly from the surface given by the equation in my question. This is similar to the case where you want to sample uniformly from the surface of a sphere. This means all the points in the surface have equal chances. Let me know if this is clear. – sachin vernekar Sep 18 '18 at 20:39
  • I'm afraid it's not clear, because--as I have attempted to describe--"uniformly" depends on how you represent the surface. Different forms of representation give rise to different probability distributions. – whuber Sep 18 '18 at 20:58
  • @whuber I want to do exactly the same as the one mentioned in http://corysimon.github.io/articles/uniformdistn-on-sphere/ but for a hyper-ellipsoid. I am afraid I can't make it even more precise with my limited knowledge in statistics. – sachin vernekar Sep 18 '18 at 21:28
  • 1
    These surfaces are rarely spheres. If you do want to sample from a sphere, see the generalization at https://stats.stackexchange.com/questions/310346. Note that sampling uniformly from a *standardized* sphere, as described at https://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance/62147#62147, and then transforming the sample back to the original units does *not* give a uniform sample of the ellipsoid. That's why your question still is unclear. – whuber Sep 18 '18 at 21:40
  • 1
    I understand that uniformly sampling from the surface of the sphere and then mapping it to ellipsoid won't give uniform samples on the ellipsoid. So I need a method that does sample uniformly from the surface of an ellipsoid. – sachin vernekar Sep 18 '18 at 22:51
  • 1
    Do you want to have the sample uniform on the surface of an ellipsoid, in the sense that each area element dA of the hyper-surface contains the same probability mass? – Sextus Empiricus Sep 20 '18 at 11:29
  • 1
    Why, how and where are you going to apply this uniform sample? Such information may help to come with a best/sufficient strategy. *For instance, when the different ellipsoid axes are not to much different then you can use rejection sampling by (1) sampling on a sphere, (2) squeezing it into an ellipsoid, (3) compute the rate by which the surface area was squeezed (4) reject samples according to the inverse of that rate.* – Sextus Empiricus Sep 20 '18 at 11:54
  • @Martinjin Weterings Yes, sample uniformly such that each area element dA of the hyper-surface contains the same probability mass. I am using it in the context of anomaly detection using a classifier. I understand that there is an approximate way to do this as you mentioned, but I just wanted to know if sampling uniformly was possible at all otherwise. – sachin vernekar Sep 20 '18 at 20:23
  • @Alex R. The link gives a method to sample from inside of a hyperellipsoid and not from the surface. – sachin vernekar Sep 21 '18 at 05:58
  • 1
    @sachinvernekar I was not explaining an approximate method. The rejection sampling method that I describe will give a sample according to the *exact* uniform distribution. It is just that the method is inefficient. Also possibly, depending on how you are applying the sample, you may not need a uniform sample and there could be other ways to solve your problem (I mean, not the sampling on an ellipsoid problem, but the underlying problem). – Sextus Empiricus Sep 21 '18 at 08:48
  • @Martin Weterings Got it! Thanks for the clarification. – sachin vernekar Sep 21 '18 at 09:54

1 Answers1

5

When the different ellipsoid axes are not too much different then it is feasible to use rejection sampling (with large differences you reject a lot making it less feasible)

  • (1) sample on a hyper-sphere
  • (2) squeezing it into a hyper-ellipsoid
  • (3) compute the rate by which the surface area was squeezed
  • (4) reject samples according to that rate.

2D example

example

set.seed(1)
#some matrix to transform n-sphere (in this case 2x2)
m <- matrix(c(1, 0.55, 0.55, 0.55), 2)

# sample multinomial with identity covariance matrix
x <- cbind(rnorm(3000, 0, 1), rnorm(3000, 0, 1))
l1 <- sqrt(x[,1]^2 + x[,2]^2)

# perpendicular vector
per <- cbind(x[,2], -x[,1])

# transform x
x <- x %*% m
# transform perpendicular vector (to see how the area transforms)
per2 <- per %*% m

# get onto unit-"sphere"/ellipsoid
x <- x/l1

# this is how the area contracted
contract <- sqrt(per2[,1]^2 + per2[,2]^2) / sqrt(per[,1]^2 + per[,2]^2)

# then this is how we should choose to reject samples 
p <- contract/max(contract)

# rejecting
choose <- which( rbinom(n=length(p), size=1, p=p) == 1)

#plotting
plot(x[1:length(choose), 1], x[1:length(choose), 2],
     xlim=c(-1.2, 1.2), ylim=c(-1.2, 1.2),
     xlab = expression(x[1]), ylab = expression(x[2]),
     bg=rgb(0, 0, 0, 0.01), cex=0.6, pch=21, col=rgb(0, 0, 0, 0.01))
title("squeezed uniform circle \n ")

#plotting
plot(x[choose,1], x[choose,2],
     xlim=c(-1.2, 1.2), ylim=c(-1.2, 1.2),
     xlab = expression(x[1]), ylab = expression(x[2]),
     bg=rgb(0, 0, 0, 0.01), cex=0.6, pch=21, col=rgb(0, 0, 0, 0.01))
title("squeezed uniform circle \n  with rejection sampling")
Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161