
Say I have $n$ values which approximate some distribution $D$. If I have a value $x$ and I would like to know how likely it is that $x$ came from $D$, I could simply determine $x$'s percentile. By calculating what percentage of the $n$ values is larger/smaller than $x$, I know how extreme $x$ is.
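As a minimal sketch of this one-dimensional case (the names `sample`, `x`, `percentile` and `extremeness` are made up for illustration, and the standard normal sample is only a stand-in for the $n$ values):

```mathematica
(* Hypothetical 1D sample standing in for the n values *)
sample = RandomVariate[NormalDistribution[0, 1], 10^3];
x = 1.5;
(* Fraction of sample values at or below x: the empirical percentile of x *)
percentile = N[Count[sample, v_ /; v <= x]/Length[sample]]
(* Two-sided measure: small values mean x lies far out in either tail *)
extremeness = 2 Min[percentile, 1 - percentile]
```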

  1. How does this generalise to multiple dimensions?

Say I have $n$ 2D values. You may assume they are drawn from a bivariate normal distribution, as shown in the figure below. The mean is the red dot.

  1. How extreme / likely is the green dot?

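[Figure: scatter plot of the data with the $0.5$ and $0.8$ quantile contours, the mean (red dot) and the point in question (green dot)]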

Mathematica code to reproduce figure:

Needs["MultivariateStatistics`"]
myData=RandomVariate[MultinormalDistribution[{2,1},{{0.3,-0.15},{-0.15,0.3}}],10^3];
Show[ListPlot[myData,PlotRange->{{0,4},{0,2}}],Graphics[PolytopeQuantile[myData,0.5]],Graphics[{PolytopeQuantile[myData,0.8],{PointSize[Large],Red,Point[{2,1}]},{PointSize[Large],Green,Point[{1.3,1.7}]}}]]

The figure includes two ellipsoids at quantile levels $0.5$ and $0.8$, which generalise the concept of quantiles.

  1. Without using trial-and-error, can I determine the ellipsoid which runs through the green dot?

In my case, you may assume the data is normal, unimodal and that the covariance matrix can be determined from the $n$ values. So the following question is a bonus:

  1. Can this be generalised to multimodal distributions? And how does that relate to High Density Regions?

And finally:

  1. What is the correct nomenclature? Which statistical terms relate to this problem?
LBogaardt

1 Answer


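One measure of how extreme a value is relative to a distribution is the Mahalanobis distance, which works in any number of dimensions.

It also relates to the multidimensional quantiles: for normally distributed data, the contours of constant Mahalanobis distance are exactly the ellipsoids of constant density, so the distance of a point identifies the ellipsoid that runs through it.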
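For normal data the link can be written down explicitly. The following is a sketch that assumes the true mean $\mu$ and covariance $\Sigma$ are known; with values estimated from the $n$ points it holds only approximately. The symbols $d_M$, $q$ and $p$ are introduced here for illustration. If $x \sim N(\mu, \Sigma)$ in $d$ dimensions, the squared Mahalanobis distance follows a chi-squared distribution with $d$ degrees of freedom, so the ellipsoid through a point $p$ encloses probability $q$:

$$d_M(p)^2 = (p-\mu)^{T}\,\Sigma^{-1}\,(p-\mu), \qquad q = F_{\chi^2_d}\!\left(d_M(p)^2\right), \qquad \text{and for } d = 2:\quad q = 1 - e^{-d_M(p)^2/2}.$$

With the parameters used to generate the data, $\mu = (2, 1)$ and $\Sigma = \begin{pmatrix} 0.3 & -0.15 \\ -0.15 & 0.3 \end{pmatrix}$, the green dot $p = (1.3, 1.7)$ gives $d_M(p)^2 = 0.98 / 0.45 \approx 2.18$ and therefore $q \approx 0.66$.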

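[Figure: scatter plot of the data with the quantile contour computed for the green dot, together with the sample mean (red dot)]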

Mathematica code:

Needs["MultivariateStatistics`"]
myData=RandomVariate[MultinormalDistribution[{2,1},{{0.3,-0.15},{-0.15,0.3}}],10^4];
myMean=Mean[myData]
myCovariance=Covariance[myData]
myPoint={1.3,1.7}
mahalanobisDistance[covariance_,vector_,mean_]:=Sqrt[(vector-mean).Inverse[covariance].(vector-mean)]
myDistance=mahalanobisDistance[myCovariance,myPoint,myMean]
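(* For d-dimensional normal data the squared Mahalanobis distance is chi-squared with d degrees of freedom; here d=2 *)
myQuantile=CDF[ChiSquareDistribution[2],myDistance^2]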
Show[ListPlot[myData,PlotRange->{{0,4},{0,2}}],Graphics[{PolytopeQuantile[myData,myQuantile],{PointSize[Large],Red,Point[myMean]},{PointSize[Large],Green,Point[myPoint]}}]]
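As a sanity check that does not rely on any distributional formula, one can count the fraction of data points that lie no farther from the mean than the green dot, which is the direct analogue of the one-dimensional percentile. This is only a sketch: it reuses `myData`, `myMean`, `myCovariance`, `myDistance` and `mahalanobisDistance` from the code above, while `allDistances` and `empiricalQuantile` are names made up here.

```mathematica
(* Mahalanobis distance of every data point, reusing the definitions above *)
allDistances = mahalanobisDistance[myCovariance, #, myMean] & /@ myData;
(* Fraction of data points no farther out than myPoint *)
empiricalQuantile = N[Count[allDistances, d_ /; d <= myDistance]/Length[allDistances]]
```

For data that really is bivariate normal, this fraction should land close to the value obtained from the chi-squared distribution with 2 degrees of freedom.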
LBogaardt