
I have a dataset that describes the position and rotation of an object at different points in time using four dimensions. I want to use this sample of observations to get a sense of what positions and rotations are possible/likely for this object.

Ultimately, I want to be able to take a new measurement of the object and estimate how "likely" the new four-dimensional measurement is (e.g., is this measurement similar to those in the dataset or very different/rare?). What would be a good way to characterize the multivariate distribution of scores and compare a new measurement to this distribution?

I was thinking that maybe I could use multivariate kernel density estimation (KDE) on the dataset. To estimate the "likelihood" of the new measurement, I would then evaluate the estimated density at that new measurement.
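A minimal sketch of this idea, assuming SciPy's `gaussian_kde` (default Scott's-rule bandwidth) and an array with one row per observation and one column per dimension. A raw density value in four dimensions is hard to interpret on its own, so the sketch also reports the fraction of the original observations whose estimated density is no higher than the new point's. That calibration step, the function name `kde_tail_probability`, and the synthetic data are illustrative assumptions, not part of the question. Note that every column, including any angle, is treated as an ordinary unbounded variable here.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_tail_probability(data, new_point):
    """Hypothetical helper: fit a Gaussian KDE to `data` (n_obs x n_dims) and return
    (density at new_point, fraction of observations with density <= that value)."""
    data = np.asarray(data, dtype=float)
    # gaussian_kde expects variables in rows and observations in columns
    kde = gaussian_kde(data.T)
    new_density = kde(np.asarray(new_point, dtype=float).reshape(-1, 1))[0]
    # Density at each observed point; each point also contributes to its own
    # estimate, so these values are slightly inflated and the fraction is approximate
    observed_density = kde(data.T)
    frac = float(np.mean(observed_density <= new_density))
    return new_density, frac


# Usage on synthetic 4-D data (a stand-in for the real measurements)
rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 4))
d_typical, p_typical = kde_tail_probability(sample, [0, 0, 0, 0])
d_rare, p_rare = kde_tail_probability(sample, [4, 4, 4, 4])
```

A small fraction (such as `p_rare` here) flags a measurement that sits in a lower-density region than almost all of the observed data.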

Would this be a reasonable approach? What assumptions would it make? Can you think of a better or alternative approach? Thanks.

Jeffrey Girard
Great question; I'm really interested in the answers. What are your four dimensions? If it's 2D, wouldn't you have 2 position coordinates and 1 angle? And if 3D, 3 position coordinates and 3 angles? I'm also not sure how KDE would work with a periodic variable, but there's some discussion here: https://stats.stackexchange.com/questions/5011/bias-for-kernel-density-estimator-periodic-case – naught101 Oct 29 '18 at 00:50

2 Answers


It sounds like you need a Gaussian process (GP) model. Here is a short but complete note, and here is a really in-depth book on using GPs.

If I understand your question correctly, the probability you're looking for is described on page 3 of Reference 1. More directly, you want $p(x_A \mid x_B)$, where $x_A$ is your new point and $x_B$ is the old data.
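As a rough illustration of where $p(x_A \mid x_B)$ comes from: a GP treats the values as jointly Gaussian, and conditioning a joint Gaussian gives the standard result below. The means $\mu_A, \mu_B$ and covariance blocks $\Sigma_{AA}, \Sigma_{AB}, \Sigma_{BA}, \Sigma_{BB}$ are notation introduced here (in a GP the covariance blocks would be built from a kernel function), and they may not match the notation of the referenced note.

$$
\begin{pmatrix} x_A \\ x_B \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix}, \begin{pmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{pmatrix} \right)
\quad\Longrightarrow\quad
x_A \mid x_B \sim \mathcal{N}\left( \mu_A + \Sigma_{AB}\Sigma_{BB}^{-1}(x_B - \mu_B),\; \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA} \right)
$$

Evaluating this conditional normal density at the new point gives the quantity $p(x_A \mid x_B)$.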


One possible approach among many is to fit a Gaussian mixture model. Depending on how your data are distributed, this may or may not be a good choice, but it will give you a predictive distribution for future observations. You can also increase the flexibility of the model by adding extra components to the mixture. To estimate p-values for new data points, though, you would need to run a simple Monte Carlo simulation, as there is no analytic expression for them.
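A minimal sketch of the mixture-plus-Monte-Carlo idea, assuming scikit-learn's `GaussianMixture`. Here the "p-value" of a new point is taken to be the probability that a draw from the fitted mixture has log-density no higher than the new point's; that definition, the helper name `gmm_tail_probability`, the number of components, and the two-cluster synthetic data are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_tail_probability(data, new_point, n_components=2, n_mc=100_000, seed=0):
    """Hypothetical helper: fit a Gaussian mixture to `data`, then estimate by
    Monte Carlo the probability that a draw from the fitted model has log-density
    <= the log-density at `new_point` (small values flag rare points)."""
    data = np.asarray(data, dtype=float)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(data)
    # score_samples returns the log-density of each row under the fitted mixture
    new_logp = gmm.score_samples(np.asarray(new_point, dtype=float).reshape(1, -1))[0]
    # sample returns (draws, component labels); only the draws are needed
    draws, _ = gmm.sample(n_mc)
    draw_logp = gmm.score_samples(draws)
    return float(np.mean(draw_logp <= new_logp))


# Usage on synthetic 4-D data with two clusters centred at -2 and +2
rng = np.random.default_rng(0)
sample = np.vstack([rng.normal(-2.0, 1.0, size=(500, 4)),
                    rng.normal(2.0, 1.0, size=(500, 4))])
p_center = gmm_tail_probability(sample, [2, 2, 2, 2])  # middle of a cluster
p_gap = gmm_tail_probability(sample, [0, 0, 0, 0])     # between the clusters
```

The point between the two clusters gets a small tail probability even though it lies at the overall mean of the data, which is the kind of structure a mixture can capture and a single Gaussian cannot.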

If your data include angles, though, you may want to transform them into some other space where the data are not bounded.
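A minimal sketch of two common transforms, with hypothetical function names. For an angle known to lie strictly inside an interval `(low, high)`, a rescaled logit maps it onto the whole real line, which matches the suggestion above. For a full-turn periodic rotation, a different option (swapped in here, and still bounded) is to encode each angle as `(cos, sin)` so that values near 0 and near 2π end up close together, which addresses the periodicity concern raised in the comment on the question.

```python
import numpy as np


def to_unbounded(theta, low, high):
    """Logit-map an angle strictly inside (low, high) onto the whole real line."""
    u = (np.asarray(theta, dtype=float) - low) / (high - low)  # rescale to (0, 1)
    return np.log(u) - np.log1p(-u)  # logit(u) = log(u / (1 - u))


def from_unbounded(z, low, high):
    """Inverse of to_unbounded: map a real number back into (low, high)."""
    u = 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))  # logistic function
    return low + (high - low) * u


def to_circle(theta):
    """Encode periodic angles (radians) as (cos, sin) pairs, one row per angle."""
    theta = np.asarray(theta, dtype=float)
    return np.column_stack([np.cos(theta), np.sin(theta)])


angles = np.array([0.1, 1.0, 3.0])
z = to_unbounded(angles, 0.0, np.pi)           # unbounded values for density fitting
recovered = from_unbounded(z, 0.0, np.pi)      # back to the original angles
wrapped = to_circle([0.0, 2 * np.pi - 1e-6])   # nearly identical rows
```

Angles exactly at `low` or `high` map to ±infinity under the logit, so the interval must strictly contain every observation.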

sega_sai