Questions tagged [triplet-loss]

Triplet losses are defined in terms of the contrast between three inputs. Each of the inputs has an associated class label, and the goal is to map all inputs of the same class to the same point, while all inputs from other classes are mapped to different points some distance away. It's called a triplet because the loss is computed using an anchor, a sample belonging to the same class as the anchor, and a sample belonging to a different class.

The goal of triplet loss is to find an embedding such that $$ \left\lVert f(x^a_i) - f(x^p_i) \right\rVert_2^2+\alpha < \left\lVert f(x_i^a)-f(x_i^n)\right\rVert_2^2 \quad \forall \left(f(x_i^a),f(x_i^p),f(x_i^n)\right)\in\mathcal{T} \tag{*} $$ where $\mathcal{T}$ is the set of all possible triplets. A triplet is composed of an anchor point, a positive point (same class as the anchor), and a negative point (distinct class from the anchor).

Clearly, iterating over all possible triplets becomes enormously expensive when the data set is even moderately sized, since the number of triplets grows cubically with the number of samples. Therefore, it's common to carefully choose which triplets to use when computing the loss. This means that instead of $\mathcal{T}$, training proceeds on some well-chosen $\mathcal{S} \subset \mathcal{T}$.
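One common selection strategy, popularized by the FaceNet paper, is *semi-hard* mining within a batch: for each anchor-positive pair, pick negatives that violate the margin without being closer to the anchor than the positive is. A rough NumPy sketch of this idea, with illustrative function and parameter names (not any particular library's API):

```python
import numpy as np

def semi_hard_triplets(embeddings, labels, alpha=0.2):
    """Illustrative in-batch semi-hard mining (assumed setup).

    For each (anchor, positive) pair, keep any negative satisfying
    d(a, p) < d(a, n) < d(a, p) + alpha, i.e. it lies inside the
    margin but is still farther from the anchor than the positive.
    """
    num = len(labels)
    # pairwise squared Euclidean distances between all embeddings
    d = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=2)
    triplets = []
    for a in range(num):
        for p in range(num):
            if p == a or labels[p] != labels[a]:
                continue
            for n in range(num):
                if labels[n] == labels[a]:
                    continue
                if d[a, p] < d[a, n] < d[a, p] + alpha:
                    triplets.append((a, p, n))
    return triplets
```

The triple loop is for clarity only; a practical implementation would vectorize the masking over the distance matrix.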

The loss is zero when the inequality $(*)$ holds, and becomes larger the more that this inequality is violated, giving us the loss function

$$L = \sum_{i\in \mathcal{S}} \max \left \{ 0, \lVert f(x^a_i) - f(x^p_i) \rVert_2^2 - \lVert f(x^a_i) - f(x^n_i) \rVert_2^2 +\alpha \right\} $$
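The loss above can be sketched in a few lines of NumPy, assuming the three embedding batches are stacked as `(batch, dim)` arrays; the margin `alpha=0.2` is an illustrative default, not a prescribed value:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge-style triplet loss over a batch of embeddings f(x)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # ||f(x^a) - f(x^p)||^2
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # ||f(x^a) - f(x^n)||^2
    # max{0, ...} zeroes out triplets that already satisfy inequality (*)
    return np.sum(np.maximum(0.0, pos_dist - neg_dist + alpha))
```

Note that a triplet contributes nothing to the loss once the negative is at least `alpha` farther (in squared distance) from the anchor than the positive.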

16 questions
6
votes
1 answer

How does FaceNet (Google's face recognition) handle a new image?

I am currently researching in the face recognition field, and I cannot understand how the FaceNet algorithm handles a new image. They use a Euclidean space for image representation, which means that the elements in this space represent the images…
5
votes
1 answer

In training a triplet network, I first have a solid drop in loss, but eventually the loss slowly but consistently increases. What could cause this?

I haven't even finished 1 epoch, so I don't think it could be any sort of overfitting. I am training on a very large amount of data (27 GB of text), so it'll still be a while before I even reach one epoch. The loss now has been increasing for twice as…
5
votes
0 answers

Triplet Deep Learning Embedding Loss Functions

Triplet embeddings consist of mapping a group of images to an embedding space, such that images deemed more similar to each other end up closer together. The "triplet" comes from training, where we have $(A, P, N)$: A is our anchor image, P is a…
Alex R.
3
votes
1 answer

Facenet: Using Ensembles of Face Embedding Sets

FaceNet is a deep learning model for facial recognition. It is trained to extract features, that is, to represent an image by a fixed-length vector called an embedding. After training, for each given image, we take the output of the second last…
2
votes
1 answer

Learning useful semantic representations of data

Training a neural network on its final task (e.g. classification) right from the beginning is not always the best way to go. I'd like to make a short list of recognized methods for motivating an NN to learn useful representations of data. This is in…
2
votes
1 answer

Maximizing AUC based on point cloud distance

Let $V$ be an $n$-dimensional space with sets of positive class vectors $P$ and negative class vectors $N$. The task is to find a vector $x$ such that AUC is maximized, based on a ranking generated by computing distances between $x$ and $P, N$. So in a…
Alex R.
1
vote
0 answers

Why does a ResNet model not train with triplet loss, while VGG16 is able to train?

I am trying to do transfer learning with the ResNet50V2 model using a triplet loss function. I have kept Include_top = False, input shape = (160,160,3) with ImageNet weights. The last 3 layers of my model are shown in the below image with 6 million…
1
vote
0 answers

Why are we interested in gradient with respect to input?

I am learning about sampling methods for Deep Embedding Learning. I was reading an article named: "Sampling Matters in Deep Embedding Learning" (https://arxiv.org/abs/1706.07567). In the following paragraph the authors explain why sampling (too)…
1
vote
2 answers

Why does the triplet loss function distinguish between anchor and positive?

I read the paper and I understand that anchoring one image and selecting corresponding semi-hard positives and negatives is an efficient way of generating samples. However, I don't understand why the distinction between the anchor and the positive…
Uduse
1
vote
2 answers

Does triplet loss help for document similarity search?

I built a CNN network over documents with triplet loss, and compare documents with cosine similarity. It really does find similar docs and catches interesting dependencies. But a simple tf-idf model does it better in terms of my test set. I use…
1
vote
1 answer

Overcome underfitting on train data using CNN architecture

I use a 2-layer CNN network for an NLP task with triplet loss with margin 0.2. The task is to learn document embeddings to find similar docs. My architecture is similar to this: https://arxiv.org/abs/1406.3830 I use the truncated_normal init function from…
0
votes
0 answers

Can a siamese model trained with Euclidean distance as the distance metric use cosine similarity during inference?

If I have 3 embeddings (Anchor, Positive, Negative) from a Siamese model trained with Euclidean distance as the distance metric for triplet loss, can cosine similarity be used during inference? I have noticed if I calculate Euclidean distance…
0
votes
0 answers

Triplet loss for text embedding and text similarity?

I am working on a triplet loss based model for text embedding. Short description: I have a database for an online shop, and I need to find the suitable product when users enter text in the search bar. I want a model that works better than string matching and…
0
votes
0 answers

speaker recognition: training on enrollment data

I'm working on a speaker recognition challenge. I have already trained my model on the VoxCeleb2 dataset in a triplet setup. Now, for the challenge, I have two sets: enrollment (1 audio/subject) [IDs given] and test (random number of audios without…
0
votes
0 answers

Are there any algorithms/models for generating embeddings of sequential data (other than RNNs)?

I know that RNNs can be used for such a task. For instance, FaceNet used an RNN with triplet loss. But maybe there are some less sophisticated alternatives to try first?