How to compare a new measurement to an existing multivariate distribution?

Question

I have a dataset that describes the position and rotation of an object at different points in time using four dimensions. I want to use this sample of observations to get a sense of what positions and rotations are possible/likely for this object.

Ultimately, I want to be able to take a new measurement of the object and estimate how "likely" the new four-dimensional measurement is (e.g., is this measurement similar to those in the dataset or very different/rare?). What would be a good way to characterize the multivariate distribution of scores and compare a new measurement to this distribution?

I was thinking that maybe I could use multivariate kernel density estimation on the dataset. To estimate the "likelihood" of the new measurement, I would then evaluate the estimated density at that new measurement.
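Here is a rough sketch of this idea. It assumes the observations are stored as an (n, 4) NumPy array with one row per time point. The snippet fits a Gaussian KDE with a full bandwidth matrix chosen by Scott's rule (the same default `scipy.stats.gaussian_kde` uses) and evaluates it at new points. A raw density value is hard to interpret on its own, so it also reports a hypothetical `tail_probability`: the fraction of observed points whose estimated density is no higher than that of the new point. All four coordinates are treated as ordinary real-valued variables here, so the periodicity of angles is ignored.

```python
import numpy as np

def fit_kde(data):
    """Gaussian KDE with a full bandwidth matrix chosen by Scott's rule.

    data: array of shape (n, d), one observation per row.
    Returns a function that evaluates the estimated density at new points.
    """
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Scott's rule: bandwidth matrix = sample covariance * n^(-2/(d+4))
    factor = n ** (-1.0 / (d + 4))
    bw = np.cov(data, rowvar=False) * factor**2
    bw_inv = np.linalg.inv(bw)
    norm = 1.0 / (n * np.sqrt((2 * np.pi) ** d * np.linalg.det(bw)))

    def density(points):
        points = np.atleast_2d(np.asarray(points, dtype=float))
        diff = points[:, None, :] - data[None, :, :]          # shape (m, n, d)
        q = np.einsum("mnd,de,mne->mn", diff, bw_inv, diff)   # squared Mahalanobis distances
        return norm * np.exp(-0.5 * q).sum(axis=1)            # one density value per point

    return density

def tail_probability(density, data, new_point):
    """Fraction of observed points whose estimated density is <= the new point's density."""
    reference = density(data)
    return float(np.mean(reference <= density(new_point)[0]))
```

One caveat about this design: each observed point's density includes its own kernel, so the in-sample reference densities are somewhat inflated. Evaluating the reference densities leave-one-out, or on a held-out subset, gives a less biased comparison.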

Would this be a reasonable approach? What assumptions would it make? Can you think of a better or alternative approach? Thanks.

Great question, I'm really interested in the answers. What are your four dimensions? If it's 2D, wouldn't you have 2 positions and 1 angle? And if 3D, 3 positions and 3 angles? Also, I'm not sure how KDE would work with a periodic variable, but there's some discussion here: https://stats.stackexchange.com/questions/5011/bias-for-kernel-density-estimator-periodic-case (naught101, Oct 29 '18 at 00:50)
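On the periodic question, one way to make a KDE respect periodicity is to use a product kernel: a von Mises kernel for each angle and a Gaussian kernel for each position coordinate. This is offered here as an assumption, not as the method from the linked thread. The von Mises kernel depends on an angle only through cos(θ - θᵢ), so θ and θ + 2π receive identical density. The names `mixed_kde`, `h_linear`, and `kappa` are hypothetical, and sharing one bandwidth across all position columns is a simplification.

```python
import numpy as np

def mixed_kde(data, angle_mask, h_linear, kappa):
    """Product-kernel KDE: Gaussian kernels on position columns, von Mises kernels on angle columns.

    data: (n, d) array; angle_mask: length-d booleans, True for angle columns (radians).
    h_linear: one Gaussian bandwidth shared by all position columns (a simplifying assumption).
    kappa: von Mises concentration; larger values give a narrower angular kernel.
    """
    data = np.asarray(data, dtype=float)
    angle_mask = np.asarray(angle_mask, dtype=bool)
    lin = data[:, ~angle_mask]   # position columns
    ang = data[:, angle_mask]    # angle columns
    gauss_norm = (np.sqrt(2 * np.pi) * h_linear) ** lin.shape[1]
    vm_norm = (2 * np.pi * np.i0(kappa)) ** ang.shape[1]   # von Mises normaliser uses Bessel I0

    def density(x):
        x = np.asarray(x, dtype=float)
        z = (x[~angle_mask] - lin) / h_linear
        k_lin = np.exp(-0.5 * np.sum(z**2, axis=1)) / gauss_norm
        # cos() makes the angular kernel periodic with period 2*pi
        k_ang = np.exp(kappa * np.cos(x[angle_mask] - ang)).prod(axis=1) / vm_norm
        return float(np.mean(k_lin * k_ang))   # average kernel over all observations

    return density
```

Keep `kappa` moderate (well below several hundred), because `exp(kappa)` and `np.i0(kappa)` overflow for very large values. Both bandwidth parameters still have to be chosen, for example by cross-validation.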

score 2 · Answer 1 · answered Oct 31 '18 at 19:05

It sounds like you need to use a Gaussian process (GP) model. Here is a short but complete note. Here is a really in-depth book on using GPs.

If I understand your question, the probability you're looking for is described on page 3 of Reference 1. More directly, you want $p(x_{A}|x_{B})$, where $x_{A}$ is your new point and $x_{B}$ is the old data.
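For reference, this conditional follows from the standard formula for jointly Gaussian vectors. The formula below assumes a zero-mean GP prior. It writes $K_{AA}$, $K_{AB}$, $K_{BB}$ for the kernel (covariance) blocks between the new and old points, with $K_{BA} = K_{AB}^{\top}$. This notation is assumed here and is not taken from the linked note.

$$
p(x_{A} \mid x_{B}) = \mathcal{N}\left(x_{A} \,;\, K_{AB} K_{BB}^{-1} x_{B},\; K_{AA} - K_{AB} K_{BB}^{-1} K_{BA}\right)
$$

Evaluating this Gaussian density at the new point gives the value in question.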

score 0 · Answer 2 · answered Oct 31 '18 at 00:43

One possible approach, among many others, is to fit a Gaussian mixture model. Depending on how your data are distributed, this may or may not be a good choice, but it will give you a predictive distribution for future observations. You can also increase the complexity of the model by adding extra components to the mixture. To estimate p-values for new data points, though, you would need to run a simple Monte Carlo simulation, as there is no analytic expression.
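The sketch below shows that Monte Carlo step. It assumes the mixture has already been fitted, for example with scikit-learn's `GaussianMixture(covariance_type="full")`, whose `weights_`, `means_`, and `covariances_` attributes could be passed in directly. Here, as an assumption, the "p-value" of a new point means the probability that a draw from the fitted model has density no higher than the new point's. The helper names are hypothetical.

```python
import numpy as np

def gmm_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture evaluated at points x of shape (m, d)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    per_component = []
    for w, mu, cov in zip(weights, means, covs):
        diff = x - np.asarray(mu, dtype=float)
        cov = np.asarray(cov, dtype=float)
        maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)  # (x-mu)^T cov^-1 (x-mu)
        _, logdet = np.linalg.slogdet(cov)
        per_component.append(np.log(w) - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
    # Stable log-sum-exp over the components
    return np.logaddexp.reduce(np.array(per_component), axis=0)

def gmm_sample(n, weights, means, covs, rng):
    """Draw n points: split n across components by weight, then sample each Gaussian."""
    counts = rng.multinomial(n, weights)
    draws = [rng.multivariate_normal(np.asarray(means[k], dtype=float),
                                     np.asarray(covs[k], dtype=float), size=c)
             for k, c in enumerate(counts) if c > 0]
    return np.vstack(draws)

def mc_p_value(x_new, weights, means, covs, n_sim=10_000, seed=0):
    """Monte Carlo estimate of P[f(Y) <= f(x_new)] for Y drawn from the mixture density f."""
    rng = np.random.default_rng(seed)
    sims = gmm_sample(n_sim, weights, means, covs, rng)
    log_f_new = gmm_logpdf(x_new, weights, means, covs)[0]
    return float(np.mean(gmm_logpdf(sims, weights, means, covs) <= log_f_new))
```

A small value means that the new measurement lies in a low-density region of the fitted model. The Monte Carlo error shrinks roughly as 1/sqrt(`n_sim`).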

If your data include angles, though, you may want to transform them into some other space where the data are not bounded.
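For angles specifically, one common choice is to replace each angle θ by the pair (cos θ, sin θ). This is an assumption here and not necessarily what this answer had in mind. Strictly speaking, these coordinates are still bounded, but they remove the artificial jump at ±π. As a result, nearly identical orientations such as θ = π - 0.001 and θ = -π + 0.001 end up close together. The helper `embed_angles` and the column layout are hypothetical.

```python
import numpy as np

def embed_angles(X, angle_cols):
    """Replace each angle column (radians) by its cosine and sine.

    X: (n, d) array; angle_cols: indices of the periodic columns.
    Returns an (n, d + len(angle_cols)) array: the non-angle columns first,
    then the cosines of the angles, then their sines.
    """
    X = np.asarray(X, dtype=float)
    angle_cols = list(angle_cols)
    linear = np.delete(X, angle_cols, axis=1)
    angles = X[:, angle_cols]
    return np.hstack([linear, np.cos(angles), np.sin(angles)])
```

The embedded array can then be passed to a KDE or a Gaussian mixture. Note that it has one extra column per angle.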
