15

Context

I have two sets of data that I want to compare. Each data element in both sets is a vector containing 22 angles (all between $-\pi$ and $\pi$). The angles relate to a given human pose configuration, so a pose is defined by 22 joint angles.

What I am ultimately trying to do is determine the "closeness" of the two sets of data. So for each pose (22D vector) in one set, I want to find its nearest neighbour in the other set, and create a distance plot for each of the closest pairs.

Questions

  • Can I simply use Euclidean distance?
    • To be meaningful, I assume that the distance metric would need to be defined as $\theta = |\theta_1 - \theta_2| \bmod \pi$, where $|\cdot|$ is the absolute value and $\bmod$ is the modulo operation. Then, using the resulting 22 values $t_1, \ldots, t_{22}$, I can perform the standard Euclidean distance calculation, $\sqrt{t_1^2 + t_2^2 + \ldots + t_{22}^2}$.
    • Is this correct?
  • Would another distance metric be more useful, such as chi-square, Bhattacharyya, or some other metric? If so, could you please provide some insight as to why?
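A minimal sketch of the proposed distance, but with the wraparound handled as $\min(|\theta_1 - \theta_2|, 2\pi - |\theta_1 - \theta_2|)$ rather than $\bmod \pi$ (see the comments below for why the mod form is not quite right). The poses here are random stand-ins:

```python
import numpy as np

def angular_diff(a, b):
    """Smallest absolute difference between two angles in [-pi, pi]."""
    d = np.abs(a - b)
    return np.minimum(d, 2 * np.pi - d)

def pose_distance(p, q):
    """Euclidean distance over the 22 per-joint angular differences."""
    return np.sqrt(np.sum(angular_diff(p, q) ** 2))

# Example: two hypothetical 22-joint poses.
rng = np.random.default_rng(0)
p = rng.uniform(-np.pi, np.pi, 22)
q = rng.uniform(-np.pi, np.pi, 22)
print(pose_distance(p, q))
```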
Sycorax
Josh
  • What is your motive behind "determining the closeness" of these data sets? What you are trying to do may be OK, but inference can be improved if you are interested in other specifics like the `mean configuration` or equivalent. – suncoolsu Feb 04 '11 at 21:58
  • As a side note: I don't think you mean $|\theta_1 - \theta_2| \bmod \pi$. Rather something like $\min\{|\theta_1 - \theta_2|, 2 \pi - |\theta_1 - \theta_2|\}$. – Erik P. Feb 04 '11 at 23:16
  • Rather than working with angles, I suggest converting to (x, y)-coordinates on the unit circle first. You can then calculate normally (distances and the like), and averaging isn't a problem as it is with angles. – caracal Feb 04 '11 at 23:19
  • @suncoolsu The motive is to determine if my training set has enough similar examples to infer the poses of my test set. I am using a learning-based approach to pose estimation, and I am beginning to suspect that the data I am using is too different across data sets. I want to ensure that for each pose in a test set, there is at least a "reasonably similar" pose in the training set. Visually (reanimating the poses) it appears that my suspicion is well-founded, but I want to analyse it quantitatively. – Josh Feb 05 '11 at 01:36
  • @caracal Can you please explain what problem exists when averaging angles? Is it specific to a particular representation, or a general problem regardless of representation? Thanks. – Josh Feb 05 '11 at 02:06
  • @Josh Erik P.'s suggestion is a good one. Alternatively, consider each angle $\theta$ to be a point $\left(\cos(\theta), \sin(\theta)\right)$ on the unit circle and compute the Euclidean distances between them using the usual (Pythagorean) formula. The difference between these distances and the angular distances shouldn't matter. (I believe this may be what caracal suggested, too.) – whuber Feb 05 '11 at 02:34
  • @Erik, I just corrected what @Josh had and I think @whuber is right. – suncoolsu Feb 05 '11 at 03:38
  • @Josh The average of, e.g., $\pi/4$ and $7\pi/4$ is $\pi$. In many circumstances, this doesn't make sense, and it should be $0$ instead. In your specific situation, this might not be an issue since maybe human joints don't have a range of motion past $\pi$. Also, in your case, maybe you want the aforementioned average to be $\pi$ since the joint motion is uni-directional. @whuber's suggestion is exactly what I meant. – caracal Feb 05 '11 at 10:45
  • Your problem will probably become much easier to solve if you can specify the consequences of "getting it wrong". So if you say the data sets are the same or similar, but they in fact aren't, what will happen to you? Will it depend on "how wrong" your decision was? What will happen if you declare the data/poses different, but they are in fact the same or similar? What is lost? Answering these questions will help determine *what matters* for the comparison *you* want to make. This ensures that you are answering the right question. – probabilityislogic Feb 05 '11 at 13:39
  • @whuber Using the unit circle seems like the best approach. I can then convert each distance $d$ between two angles back into an angle, $\theta = \cos^{-1}\left(\frac{2 - d^2}{2}\right)$, given the properties of the unit circle and the law of cosines, right? – Josh Feb 05 '11 at 17:40
  • @caracal Thanks for the example and also the suggestion of the unit circle representation; I understand what you mean now. – Josh Feb 05 '11 at 17:42
  • @probabilityislogic The consequence is simply a pose that is not representative of the ground-truth pose. What would occur depends on the estimation approach used. If kNN were used, then the nearest neighbour would be too distant. What is considered "too distant"? It depends on the application, but as a general rule, it should at least be indicative of the action being performed (e.g. in a walking sequence, you would always want to have the person in an upright position). However, in ridge regression, the learnt mapping is wrong and likely to result in output that is rubbish. – Josh Feb 05 '11 at 17:56
  • @probabilityislogic Adding to my previous comment, the learnt mapping isn't necessarily wrong (of course you have to be careful of overfitting, and other considerations... perhaps you can't even use a linear regressor given the problem statement), but applying a learnt mapping for doing push-ups to a walking sequence is going to give very wrong output poses in every case. Whereas, a jogging sequence mapping, whilst still not perfect, would yield much better results, even though it is still not the best mapping to use. – Josh Feb 05 '11 at 18:02
  • @Josh, could you put up a couple of pictures / stick diagrams of various poses ? They'd be a nice example of the relation between reality and strings of numbers. – denis Jul 04 '11 at 08:57
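The unit-circle embedding suggested in the comments above, together with the chord-to-angle conversion from the law of cosines, can be sketched as follows (the example angles are arbitrary):

```python
import numpy as np

def embed(pose):
    """Map each angle to a point on the unit circle -> a 2n-D vector."""
    return np.concatenate([np.cos(pose), np.sin(pose)])

def chord_to_angle(d):
    """Recover the angular separation from a chord length d on the unit circle."""
    return np.arccos((2 - d ** 2) / 2)

# Two nearly identical directions that sit on opposite sides of the +/- pi seam.
theta1, theta2 = 3.0, -3.0
d = np.linalg.norm(embed(np.array([theta1])) - embed(np.array([theta2])))
print(chord_to_angle(d))  # the small true separation, 2*pi - 6, not 6.0
```

Note that a naive $|\theta_1 - \theta_2|$ would report 6.0 here; the embedding handles the wraparound automatically.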

5 Answers

7

You can calculate the covariance matrix for each set and then calculate the Hausdorff distance between the two sets using the Mahalanobis distance.

The Mahalanobis distance is a useful way of determining similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant.
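A rough sketch of this suggestion, assuming (as one of several possible choices) a single pooled covariance for both sets; the data here is synthetic:

```python
import numpy as np

def mahalanobis(x, y, VI):
    """Mahalanobis distance given the inverse covariance matrix VI."""
    d = x - y
    return np.sqrt(d @ VI @ d)

def hausdorff(A, B, VI):
    """Symmetric Hausdorff distance between point sets under the Mahalanobis metric."""
    def directed(X, Y):
        return max(min(mahalanobis(x, y, VI) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 22))  # stand-in for the first set of poses
B = rng.normal(size=(60, 22))  # stand-in for the second set
VI = np.linalg.inv(np.cov(np.vstack([A, B]).T))  # pooled inverse covariance
print(hausdorff(A, B, VI))
```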

skyde
3

What are you trying to do with the nearest neighbor information?

I would answer that question, and then compare the different distance measures in light of that.

For example, say you are trying to classify poses based on the joint configuration, and would like joint vectors from the same pose to be close together. A straightforward way to evaluate the suitability of different distance metrics is to use each of them in a KNN classifier, and compare the out-of-sample accuracies of each of the resulting models.
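This comparison could be sketched with scikit-learn as below; the pose data and class labels are synthetic stand-ins, and the angular metric is one candidate among many:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 200 poses of 22 angles with made-up class labels.
rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(200, 22))
y = rng.integers(0, 3, size=200)

def angular_metric(p, q):
    """Euclidean distance over wrapped angular differences."""
    d = np.abs(p - q)
    return np.sqrt(np.sum(np.minimum(d, 2 * np.pi - d) ** 2))

# Compare candidate metrics by cross-validated kNN accuracy.
for name, metric in [("euclidean", "euclidean"), ("angular", angular_metric)]:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    acc = cross_val_score(knn, X, y, cv=5).mean()
    print(name, round(acc, 3))
```

With real labelled poses, the metric with the best out-of-sample accuracy would be the one to prefer.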

benhamner
2

This sounds similar to a certain application of Information Retrieval (IR). A few years ago I attended a talk about gait recognition that sounds similar to what you are doing. In IR, "documents" (in your case: a person's angle data) are compared to some query (which in your case could be "is there a person with angle data (.., ..)?"). The documents are then listed in order, from the one that matches most closely down to the one that matches least.

That means one central component of IR is putting a document into some kind of vector space (in your case: angle space) and comparing it to one specific query or example document, i.e. measuring their distance. If you have a sound definition of the distance between two individual vectors, all you have to do is come up with a measure for the distance between two data sets. (Traditionally, the distance in the IR vector space model is calculated either by the cosine measure or by Euclidean distance, but I don't remember how they did it in that case.)

IR also has a mechanism called "relevance feedback" that, conceptually, works with the distance between two sets of documents. That mechanism normally uses a measure of distance that sums up all the individual distances between all pairs of documents (or, in your case, person vectors). Maybe that is of use to you.
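The cosine measure mentioned above can be sketched as follows; the query and "document" vectors here are made up for illustration:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = identical direction)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

query = np.array([0.1, 0.5, -0.2])
docs = np.array([[0.2, 1.0, -0.4],   # same direction as the query
                 [1.0, 0.0,  0.0]])  # mostly unrelated

# Rank documents by decreasing similarity to the query, IR-style.
ranked = sorted(range(len(docs)), key=lambda i: -cosine_similarity(query, docs[i]))
print(ranked)  # [0, 1]: the aligned document ranks first
```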

The following page has some papers that seem relevant to your issue: http://www.mpi-inf.mpg.de/~mmueller/index_publications.html Especially this one http://www.mpi-inf.mpg.de/~mmueller/publications/2006_DemuthRoederMuellerEberhardt_MocapRetrievalSystem_ECIR.pdf seems interesting. The talk by Müller that I attended mentioned similarity measures from Kovar and Gleicher called "point cloud" (see http://portal.acm.org/citation.cfm?id=1186562.1015760&coll=DL&dl=ACM) and one called "quaternions". Hope it helps.

xmjx
2

This problem is called distance metric learning. Every such distance metric can be represented as $\sqrt{(x-y)^T A (x-y)}$, where $A$ is positive semi-definite. Methods in this sub-area learn the optimal $A$ for your data. If the optimal $A$ happens to be the identity matrix, it is fine to use Euclidean distance; if it is the inverse covariance matrix, the Mahalanobis distance would be optimal; and so on. Hence, a distance metric learning method can be used to learn the optimal $A$, and thereby the right distance metric.
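The quadratic-form distance above, showing the two special cases mentioned (identity gives Euclidean, inverse covariance gives Mahalanobis); the vectors and covariance here are illustrative:

```python
import numpy as np

def metric_dist(x, y, A):
    """d_A(x, y) = sqrt((x - y)^T A (x - y)) for positive semi-definite A."""
    d = x - y
    return np.sqrt(d @ A @ d)

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

# A = identity recovers the plain Euclidean distance (here, 5.0).
print(metric_dist(x, y, np.eye(2)))
# A = inverse covariance gives the Mahalanobis distance.
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
print(metric_dist(x, y, np.linalg.inv(cov)))
```

A metric-learning method would instead fit $A$ to data, e.g. so that poses known to be similar end up close under $d_A$.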

hearse
0

One problem with using the angles as a proxy for shape is that small perturbations in the angles can lead to large perturbations in the shape. Further, different angle configurations could result in the same (or similar) shape.
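A toy illustration of this point, using a hypothetical planar kinematic chain with relative joint angles: a small change at the root joint moves the endpoint a long way.

```python
import numpy as np

def endpoint(angles, link_len=1.0):
    """End position of a planar chain; each angle is relative to the previous link."""
    pos, heading = np.zeros(2), 0.0
    for a in angles:
        heading += a
        pos += link_len * np.array([np.cos(heading), np.sin(heading)])
    return pos

base = np.zeros(10)   # a straight 10-link chain, endpoint at (10, 0)
bent = base.copy()
bent[0] = 0.3         # small perturbation at the root joint...
print(np.linalg.norm(endpoint(base) - endpoint(bent)))  # ...large endpoint shift
```

The same angular perturbation applied at the last joint would move the endpoint only slightly, which is why a uniform distance over angles can be a poor proxy for shape.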