How are the confidence scores in YOLO computed?

Question

I have read "Yolo Loss function explanation", but none of the answers there discuss how the box confidence scores are computed. The YOLO paper uses the following loss function:
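
For reference, a transcription of the sum-squared-error loss from the original YOLO (v1) paper, where $S^2$ is the number of grid cells, $B$ the number of boxes per cell, and $\mathbb{1}_{ij}^{\text{obj}}$ indicates that box $j$ in cell $i$ is responsible for an object. Reading the hatted quantities as the network's predictions follows the convention used in this question and is an assumption on my part:

$$
\begin{aligned}
&\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat x_i)^2 + (y_i - \hat y_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat w_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat h_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat C_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat C_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \hat p_i(c) \right)^2
\end{aligned}
$$

The third and fourth terms are the confidence terms this question is about.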

I'm confused about how the confidence scores $\hat C_i$ and $C_i$ are found. Here $i$ indexes the grid cell. For $C_i$, the paper seems to imply that $C_i = IOU_{pred}^{truth}$, which is confusing because the predicted box changes every epoch. Doesn't that imply that the target is not constant?

Is $\hat C_i$ directly computed by the network? In the paper, the predicted confidence score is defined as $P(Object)*IOU_{pred}^{truth}$, which should ideally equal $C_i = IOU_{pred}^{truth}$ if an object is in the cell. But I'm not sure whether this is just an interpretation of the scores (for Equation 1 in the paper) or whether we're supposed to estimate $P(Object)$ somehow first in order to find $\hat C_i$.

The linked thread has an answer that says "So what is the real value from the label for the confidence score for each bbox $\hat{C}_{ij}$ ? It is the intersection over union of the predicted bounding box with the one from the label." So $\hat{C}_{i}$ depends on the bounding box prediction obtained from the network. (I don't know why the linked thread mentions $\hat{C}_{ij}$ instead of $\hat{C}_i$.) What is unclear? — Sycorax, Jan 03 '22 at 23:47
@Sycorax If $C_i=IOU^{truth}_{pred}$, doesn't this imply that $C_i$ is not fixed (since the predicted box changes during training) and that we can't actually calculate $C_i$ prior to training (since there's no prediction yet)? The answer also doesn't state whether $\hat C_i$ is directly computed or estimated using $P(Object)*IOU_{pred}^{truth}$. — Yandle, Jan 04 '22 at 01:50
As stated in the paper: "*If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth*". So you are right that $C_i$ is not known before training. What is known before training is whether there is an object in each cell. In other words, we know a priori whether Pr(Object)=1 or Pr(Object)=0 for each of the cells $\Rightarrow$ if an object lies in a cell, then $C_i=$IOU, otherwise $C_i=0$. — Javier TG, Jan 22 '22 at 17:38
@JavierTG My primary confusion is that if $C_i = IOU$, then this implies that ground truth $C_i$ is not fixed, and the model could be chasing a moving target? — Yandle, Jan 23 '22 at 00:17
Could you please elaborate on what you mean by "chasing a moving target"? — Javier TG, Jan 23 '22 at 00:37
@JavierTG If $C_i = IOU^{truth}_{pred}$, and the coordinates of the predicted box change during training, I would assume that $IOU^{truth}_{pred}$, and hence $C_i$ (the ground truth), is also changing? — Yandle, Jan 23 '22 at 23:09
Yes, that's it: the ground-truth value $C_i$ is computed during training (as the IOU of the predicted box with the ground-truth box). This makes sense because, as the author mentions in the paper, if the network predicts a bounding box in an image region where there is no object, then we want the predicted confidence to be low (0). We achieve this effect by forcing the network to predict the IOU of each box. — Javier TG, Jan 24 '22 at 00:47
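
To make the point of the comments above concrete, here is a minimal sketch of how the confidence target could be recomputed at each training step. It is not taken from the paper's code: the function names `iou`, `confidence_target` and `confidence_loss_term`, and the corner-format boxes `(x_min, y_min, x_max, y_max)`, are assumptions made for illustration. Only the rule ("IOU if the cell contains an object, otherwise zero") and the value $\lambda_{noobj} = 0.5$ come from the paper. The sketch treats the target as a plain number within each step, which is one common way to handle the moving target, but that is also an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(has_object, pred_box, truth_box):
    """Target C_i for one cell: IOU(pred, truth) if the cell holds an object, else 0.

    has_object is known from the labels before training; the IOU is not,
    because it depends on the current predicted box.
    """
    if not has_object:
        return 0.0
    return iou(pred_box, truth_box)


def confidence_loss_term(pred_conf, target, has_object, lambda_noobj=0.5):
    """Squared-error confidence term; no-object cells are down-weighted by lambda_noobj."""
    weight = 1.0 if has_object else lambda_noobj
    return weight * (target - pred_conf) ** 2


# Hypothetical numbers: the network outputs a confidence of 0.6 directly,
# and the target is rebuilt from the box it predicted in the same step.
truth = (1.0, 1.0, 3.0, 3.0)
pred = (0.0, 0.0, 2.0, 2.0)
target = confidence_target(True, pred, truth)
loss = confidence_loss_term(0.6, target, True)
```

Under this reading, $\hat C_i$ is simply one of the network's outputs, and $P(Object) \cdot IOU_{pred}^{truth}$ describes the value it is trained to match rather than a quantity estimated separately.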
