Improving Language-Based Object Detection by Explicit Generation of Negative Examples
Recent progress in open-vocabulary, language-based object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training from image captions with grounded bounding boxes (ground truth or pseudo-labeled) enables the models