My question is how, in YOLO, the network decides which grid cell contains the midpoint of an object. I'm not completely sure I understand it. How can we know the midpoint of an object before actually knowing where the object is?
Is it learned from the training data? I can't seem to fully understand the training / learning process of the YOLO network. Any help would be appreciated.
Here, YOLO refers to the paper "You Only Look Once: Unified, Real-Time Object Detection" by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.