
Suppose we are given $K$ triplets $t_k:=(a_k^{(1)},a_k^{(2)},a_k^{(3)}) \in \mathbb{R}^3$, $k=1,\dots,K$, and a triplet $t=(a^{(1)},a^{(2)},a^{(3)}) \in \mathbb{R}^3$ to test against them.

How can one test whether $t$ differs significantly from the $K$ other triplets, in the sense that $t$ has a low probability of occurring under the distribution of the $t_k$?

If these weren't triplets, e.g. if I just wanted to compare $a^{(1)}$ against $a_k^{(1)}$, $k=1,\dots,K$, I would calculate the mean $\mu$ of all the $a_k^{(1)}$ together with $a^{(1)}$ and then count $\alpha=\#\{k : |\mu - a_k^{(1)}| < |\mu - a^{(1)}|\}$. Then I could say that $a^{(1)}$ is farther from the mean than a fraction $\frac{\alpha}{K}$ of the other cases and, if that fraction is large, reject the hypothesis that $a^{(1)}$ is just randomly distributed around the mean.
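The one-dimensional count described above could be sketched roughly as follows. The function name `fraction_closer_to_mean` and the sample values are made up for illustration; the mean is pooled over the reference values and the candidate, as in the description.

```python
def fraction_closer_to_mean(reference, candidate):
    """Fraction alpha/K of reference values that lie strictly closer
    to the pooled mean than the candidate does."""
    pooled = list(reference) + [candidate]
    mu = sum(pooled) / len(pooled)          # mean of all a_k and the candidate
    d_candidate = abs(mu - candidate)
    alpha = sum(1 for a_k in reference if abs(mu - a_k) < d_candidate)
    return alpha / len(reference)


# Hypothetical data, purely for illustration
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fraction_closer_to_mean(reference, 10.0))
print(fraction_closer_to_mean(reference, 3.0))
```

A value near 1 means almost every reference value is closer to the mean than the candidate; a value near 0 means the candidate sits in the middle of the data.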

mdewey
faew
  • What are the triplets? That is, what is $a^{(1)}$: an integer, a real number, a letter ... ? Also, what exactly do you mean by "different"? (This will depend on what the triplets are.) – Peter Flom Jan 11 '14 at 16:50
  • Sorry, I forgot to mention that. The triplets are in $\mathbb{R}^3$ (I added that to the question). I think "different" in the sense of Euclidean distance best reflects what I intuitively want. – faew Jan 11 '14 at 16:54
  • This appears to be related to multivariate outlier detection, e.g. http://stats.stackexchange.com/questions/213/what-is-the-best-way-to-identify-outliers-in-multivariate-data – Michael M Jan 11 '14 at 17:03
  • What if I just do it the same way as I sketched for one dimension? The mean simply becomes a mean vector, and the distance becomes a distance between points in Euclidean space. – faew Jan 11 '14 at 17:12

1 Answer


I can think of a number of ways in which a triplet could be unusual. You mention Euclidean distance, which is certainly one of them, but first you have to decide: distance from what? One choice is the vector of means of the first, second and third members of the triplets. For each triplet, calculate how far each member is from its mean, square these differences, sum them and take the square root. You then have a Euclidean distance for each triplet, and you can order them from large to small and find quantiles.
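A minimal sketch of this distance-from-the-mean ranking is below. It assumes the mean vector is taken over the $K$ reference triplets only (the question pools $t$ into the mean instead); the helper names and the data are hypothetical.

```python
import math


def mean_vector(triplets):
    """Component-wise mean of a list of 3-tuples."""
    n = len(triplets)
    return tuple(sum(t_k[i] for t_k in triplets) / n for i in range(3))


def euclidean(u, v):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_rank(triplets, t):
    """Return (distance of t from the mean vector,
    fraction of reference triplets strictly closer to the mean than t)."""
    center = mean_vector(triplets)           # assumption: mean of the t_k only
    d_t = euclidean(t, center)
    closer = sum(1 for t_k in triplets if euclidean(t_k, center) < d_t)
    return d_t, closer / len(triplets)


# Hypothetical reference triplets, purely for illustration
reference = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
print(distance_rank(reference, (1, 1, 5)))
```

The returned fraction is an empirical quantile of the candidate's distance among the reference distances. It is a descriptive ranking rather than a formal significance test, and with small $K$ it is coarse.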

Another metric would be the median absolute distance. Yet another would be the maximum distance of any member of the triplet from the mean for that member.
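These two alternatives might look like the sketch below. "Median absolute distance" is read here as the median of the three absolute per-member deviations from the member means; that reading, the function names and the data are assumptions made for illustration.

```python
import statistics


def member_deviations(triplets, t):
    """Absolute deviation of each member of t from the mean of that member
    across the reference triplets."""
    n = len(triplets)
    center = [sum(t_k[i] for t_k in triplets) / n for i in range(3)]
    return [abs(t[i] - center[i]) for i in range(3)]


def median_abs_distance(triplets, t):
    """Median of the three per-member absolute deviations (assumed reading)."""
    return statistics.median(member_deviations(triplets, t))


def max_abs_distance(triplets, t):
    """Largest per-member absolute deviation."""
    return max(member_deviations(triplets, t))


# Hypothetical reference triplets, purely for illustration
reference = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
print(median_abs_distance(reference, (1, 4, 5)))
print(max_abs_distance(reference, (1, 4, 5)))
```

Either score can be ranked against the same score computed for each reference triplet, in the same way as the Euclidean distance.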

Unless the triplets are fairly odd, I think all these measures will be highly correlated.

Peter Flom