0

I have data that I can assume will be multivariate normal with a known mean vector mu and known covariance matrix sigma, and I'm looking to identify points that fall outside the 95% ellipse. I think the proper way to do this is to apply a transform to the data to a standard multivariate normal, compute the euclidean distances to zero, and compare against the value derived from running mvtnorm::qmvnorm(.95,tail='both',mean=mu,sigma=sigma). However, if this is correct (and feel free to correct me if there's a simpler solution to labelling the points outside the 95% ellipse), I'm struggling with the transform part. I know I'd subtract the mean vector mu from the observed data matrix, and I need to do something involving the resulting matrix and sigma, but I'm lost on what. Some code below showing some data and ending where I'm stuck:

mu = c(100,50)
sigma = matrix(c(100,210,210,900),nrow=2)
obs_data = MASS::mvrnorm(1e2,mu=mu,Sigma=sigma)
transformed = t(t(obs_data) - mu)
#what now?
Mike Lawrence
  • 12,691
  • 8
  • 40
  • 65
  • 1
    See https://stats.stackexchange.com/questions/62092. – whuber Aug 19 '20 at 18:44
  • 1
    Ah! So the Mahalanobis distance is the euclidean distance-from-zero after the transform, so I can just use the `stats::mahalanobis()` function. Neat! Quick follow-up @whuber: do compare these distances to the 95% ellipse (or hypersphere) would I be getting the comparison value via `mvtnorm::qmvnorm(.95,tail='both',mean=mu,sigma=sigma)` or `mvtnorm::qmvnorm(.95,tail='both',sigma=diag(2))`? (i.e. do I need to tell `qmvrnorm` the original mean & covariance?) I'm getting very different results between these two. – Mike Lawrence Aug 19 '20 at 20:03
  • Oh, looking at the help page for `stats::mahalanobis()` and it shows that the distances should follow the same quantile function as a chi-square distribution with df=ncol(x), so I don't even need `qmvnorm`. When I use `qchisq(.95,df=ncol(dat))`, I do indeed observe that 95% of the distances exceed this value. So I wonder what `qmvnorm()` is doing or is supposed to be used for? – Mike Lawrence Aug 19 '20 at 20:57
  • 1
    `qmvnorm` gives you the half-width of a mean-centered square (or, generally, hypercube) in the original coordinates. In the transformed coordinates that region would correspond to a generalized rhombus. – whuber Aug 19 '20 at 21:02

1 Answers1

0

With the help of the content here, I was able to work this out and thought I'd post the resulting code and verification:

#define a function to transform a standard multivariate normal (zero mean, 
# identity covariance) to a non-standard multivariate normal with specified
# means and covariance.
std_to_mvn = function(Z,mu,sigma){
    tchol_sigma = t(chol(sigma))
    X_0centered = t(tchol_sigma %*% t(Z))
    X = t(t(X_0centered)+mu) #add means
    return(X)
}

#define a function to transform a multivariate normal with mean mu and covariance
# sigma to a standard multivariate normal with zero mean and identity covariance
mvn_to_std = function(X,mu,sigma){
    tchol_sigma = t(chol(sigma))
    X_0centered = t(t(X)-mu)
    Z = t(solve(tchol_sigma,t(X_0centered)))
}

#Check that we can induce the non-standard multivariate normal

#first,sample from a "standard" multivariate normal
z = MASS::mvrnorm(
    n = 1e5
    , mu = rep(0,times=3)
    , Sigma = diag(3)
)

#next, specify the parameters (arbitrary; taken from https://jamesmccaffrey.wordpress.com/2017/11/03/example-of-calculating-a-covariance-matrix/)
mu = c(68, 600, 40)
sigma = matrix(
    c(
        11.5 , 50, 34.75
        , 50, 1250, 205
        , 34.75, 205, 110
    )
    , nrow = 3
    , ncol = 3
)

#now, transform and check that we get what we expected
x = std_to_mvn(z,mu,sigma)
max(abs(colMeans(x)-mu)) # <1
max(abs(cov(x)-sigma)) # <1


#Check that we can transform a non-standard multivariate normal to standard
x = MASS::mvrnorm(
    n = 1e5
    , mu = mu
    , Sigma = sigma
)

#now, transform and check that we get what we expected
z = mvn_to_std(x,mu,sigma)
max(abs(colMeans(z))) # <1
max(abs(cov(z)-diag(3))) # <1

Mike Lawrence
  • 12,691
  • 8
  • 40
  • 65