
I read the paper and I understand that anchoring on one image and selecting the corresponding semi-hard positives and negatives is an efficient way of generating samples.

However, I don't understand why the distinction between the anchor and the positive still exists in the loss function. Given a triplet that has already been chosen, the anchor and the positive both correspond to the same person. Why not also include the distance between the positive and the negative in the loss? Is there a reason behind this, or is it just an alternative?

Formally, the triplet loss is defined as:

$$\mathcal{J} = \sum^{m}_{i=1} \left[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha \right]_+ $$

Why not use the following instead? $$\mathcal{J} = \sum^{m}_{i=1} \left[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 - \lVert f(P^{(i)}) - f(N^{(i)}) \rVert_2^2 + \alpha \right]_+ $$

Conceptually, the original loss function "pushes" the anchor towards the positive and away from the negative. Isn't pushing both the positive and the anchor away from the negative a good thing?
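
For concreteness, the two candidate losses can be written out for a single triplet. This is only a minimal sketch in plain Python, assuming the embeddings are already computed and stored as lists of floats; the function names and the toy vectors are made up for illustration and do not come from the paper:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(fa, fp, fn, alpha):
    """Standard hinge: [d(A,P) - d(A,N) + alpha]_+ for one triplet."""
    return max(sq_dist(fa, fp) - sq_dist(fa, fn) + alpha, 0.0)


def proposed_loss(fa, fp, fn, alpha):
    """Variant from the question: additionally subtracts d(P,N)."""
    return max(
        sq_dist(fa, fp) - sq_dist(fa, fn) - sq_dist(fp, fn) + alpha, 0.0
    )


# Toy embeddings (hypothetical): the anchor is equally far from P and N.
fa, fp, fn = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]

print(triplet_loss(fa, fp, fn, alpha=0.2))   # 0.2
print(proposed_loss(fa, fp, fn, alpha=0.2))  # 0.0
```

On this toy triplet the standard loss is still positive because the anchor is no closer to the positive than to the negative, while the variant is already zero thanks to the extra positive-negative term.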

Sycorax
Uduse

2 Answers


The key idea of triplet loss is the assumption that the distance between A and P should be no greater than the distance between A and N.

Formally:

\begin{align} d(A,P) &\leq d(A,N) \\ d(A,P) - d(A,N) &\leq 0 \end{align}

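That's why the $d(A,N)$ term appears in the loss function.

Unfortunately, this constraint has a trivial solution: if the network maps every input to the same point, all distances $d$ are 0 and the inequality holds. We can rule this out by requiring a margin $a > 0$ ($\alpha$ in your notation): \begin{align} d(A,P) - d(A,N) &\leq -a \\ d(A,P) - d(A,N) + a &\leq 0 \end{align}

To turn this constraint into a loss, we penalise only its violation, which gives the hinge form used in your first formula: $$ \left[ d(A,P) - d(A,N) + a \right]_+ $$

P.S. In your case, $d$ is the squared $\ell_2$ distance between embeddings, $d(A,P) = \lVert f(A) - f(P) \rVert_2^2$.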
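
The role of the margin can be checked numerically. Below is a minimal sketch, assuming squared Euclidean distance and a hypothetical degenerate embedding that maps every input to the same point; the helper names `sq_dist` and `hinge_triplet` are illustrative only:

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hinge_triplet(fa, fp, fn, margin):
    """max(d(A,P) - d(A,N) + margin, 0) for a single triplet."""
    return max(sq_dist(fa, fp) - sq_dist(fa, fn) + margin, 0.0)


# Degenerate case: anchor, positive and negative all share one embedding.
collapsed = [0.0, 0.0]

print(hinge_triplet(collapsed, collapsed, collapsed, margin=0.0))  # 0.0
print(hinge_triplet(collapsed, collapsed, collapsed, margin=0.2))  # 0.2
```

With a margin of 0 the collapsed embedding already achieves zero loss; with any positive margin it is penalised by exactly that margin, so collapsing is no longer a minimiser.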

sebp

Your function can be rewritten as: $$\mathcal{J} = \sum^{m}_{i=1} \left[ \lVert f(A^{(i)}) - f(P^{(i)}) \rVert_2^2 - \left( \lVert f(A^{(i)}) - f(N^{(i)}) \rVert_2^2 + \lVert f(P^{(i)}) - f(N^{(i)}) \rVert_2^2 \right) + \alpha \right]_+ $$

This means the distance between the anchor (A) and the positive (P) only has to be smaller than the sum of the distance between the anchor and the negative (N) and the distance between P and N. A large P-N distance alone can therefore satisfy the loss, so it may leave A and N very similar. For example, suppose N is 2 units to the left of A, which is 3 units to the left of P:

N A   P
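
Plugging numbers into this example makes the problem explicit. As an assumption for illustration, place the points on a line at $N = -2$, $A = 0$, $P = 3$ and let $d$ be the squared distance, as in the formulas from the question:

\begin{align} d(A,P) &= 9, \quad d(A,N) = 4, \quad d(P,N) = 25 \\ \text{original:} \quad \left[ 9 - 4 + \alpha \right]_+ &= 5 + \alpha \\ \text{proposed:} \quad \left[ 9 - (4 + 25) + \alpha \right]_+ &= \left[ \alpha - 20 \right]_+ \end{align}

For any margin $0 \leq \alpha \leq 20$ the proposed loss is already zero even though A sits closer to N than to P, whereas the original loss stays positive and keeps pulling A towards P and away from N.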
Lerner Zhang