Say I have $n$ values which approximate some distribution $D$. If I have a value $x$ and I would like to know how likely it is that $x$ came from $D$, I could simply determine $x$'s percentile. By calculating what percentage of the $n$ values is larger/smaller than $x$, I know how extreme $x$ is.
- How does this generalise to multiple dimensions?
Say I have $n$ 2D values. You may assume it is taken from a binormal distibution, as shown in the figure below. The mean is the red dot.
- How extreme / likely is the green dot?
Mathematica code to reproduce figure:
Needs["MultivariateStatistics`"]
myData=RandomVariate[MultinormalDistribution[{2,1},{{0.3,-0.15},{-0.15,0.3}}],10^3];
Show[ListPlot[myData,PlotRange->{{0,4},{0,2}}],Graphics[PolytopeQuantile[myData,0.5]],Graphics[{PolytopeQuantile[myData,0.8],{PointSize[Large],Red,Point[{2,1}]},{PointSize[Large],Green,Point[{1.3,1.7}]}}]]
The figure includes two ellipsoids at values $0.5$ and $0.8$, which generalise the concept of Quantiles.
- Without using trial-and-error, can I determine the ellipsoid which runs through the green dot?
In my case, you may assume the data is normal, unimodal and that the covariance matrix can be determined from the $n$ values. So the following question is a bonus:
- Can this be generalised to multimodal distributions? And how does that relate to High Density Regions?
And finally:
- What is the correct nomenclature? Which statistical terms relate to this problem?