In the case of YOLO, how does the network assign a box to a grid cell based on the midpoint of the object?

Question

My question is: in YOLO, how does the network decide which grid cell contains an object's midpoint? I'm not sure I understand it. How can we know the midpoint of an object before we actually know where the object is?

Is it done by learning from the training data? I can't seem to fully understand the training and learning process of the YOLO network. Any help would be appreciated.

Here, YOLO refers to the paper "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.

score 1 · Answer 1 · answered Aug 03 '20 at 22:04

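During training, the ground-truth bounding box is known, so one can easily compute its geometric centre and assign the object to the grid cell that contains this point. The loss function (eq. 3 in the original paper, also discussed in this question), for example, considers only the bounding box predicted by the grid cell identified this way and ignores the bounding boxes predicted by neighbouring grid cells, even if the object spills into them. At test time no ground truth is available: every cell outputs its boxes and confidence scores, and because of this training the network learns to give high confidence mainly in the cell that contains an object's centre.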
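A minimal Python sketch of this assignment, assuming boxes are given as pixel corners (x_min, y_min, x_max, y_max) and using the paper's S = 7 grid on a 448 x 448 input. The function names are hypothetical. To keep it short, each cell predicts a single box here (the paper's B predictors per cell, with the responsible one chosen by highest IoU, are omitted), and only the coordinate terms of the loss are shown, with lambda_coord = 5 as in the paper. The confidence and class terms are left out.

```python
import math


def assign_to_cell(box, img_w, img_h, S=7):
    """Map a ground-truth box (x_min, y_min, x_max, y_max) in pixels to the
    (row, col) of the S x S grid cell containing its centre, plus the target
    (x_offset, y_offset, w, h). Hypothetical helper for illustration."""
    x_min, y_min, x_max, y_max = box
    # Geometric centre, normalised to [0, 1] by the image size
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    # Cell index; min() keeps a centre lying exactly on the right/bottom edge inside the grid
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    # Centre expressed as an offset within its own cell
    x_off = cx * S - col
    y_off = cy * S - row
    # Width and height normalised by the image size
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return row, col, (x_off, y_off, w, h)


def build_targets(boxes, img_w, img_h, S=7):
    """Return an S x S 'responsible cell' mask and per-cell box targets."""
    mask = [[0] * S for _ in range(S)]
    target = [[(0.0, 0.0, 0.0, 0.0)] * S for _ in range(S)]
    for box in boxes:
        row, col, t = assign_to_cell(box, img_w, img_h, S)
        # In this sketch, a later object whose centre falls in the same cell overwrites the earlier one
        mask[row][col] = 1
        target[row][col] = t
    return mask, target


def coord_loss(pred, target, mask, lambda_coord=5.0):
    """Coordinate part of a YOLO-style loss, simplified to one box per cell."""
    S = len(mask)
    loss = 0.0
    for row in range(S):
        for col in range(S):
            if not mask[row][col]:
                continue  # cells without an object centre add no coordinate loss
            px, py, pw, ph = pred[row][col]
            tx, ty, tw, th = target[row][col]
            loss += (px - tx) ** 2 + (py - ty) ** 2
            # Square roots of w and h, as in the paper, so errors on small boxes weigh more
            loss += (math.sqrt(pw) - math.sqrt(tw)) ** 2 + (math.sqrt(ph) - math.sqrt(th)) ** 2
    return lambda_coord * loss


# Example: one object on a 448 x 448 image; its centre (200, 300) falls in row 4, column 3.
mask, target = build_targets([(100, 200, 300, 400)], 448, 448)
print(target[4][3])
```

The box in the example spans several cells, yet if every cell except (4, 3) predicts something arbitrary, the coordinate loss is still zero as long as cell (4, 3) is correct. This mirrors the point above: neighbouring cells are not penalised for that object's box even when the object overlaps them.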

