Publication Date: 4/14/2021
Authors: Buyu Liu (NEC Laboratories America, Inc.); Bingbing Zhuang (NEC Laboratories America, Inc.); Manmohan Chandraker (NEC Laboratories America, Inc. and UC San Diego)
Abstract: We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input and produces occlusion-reasoned layouts in perspective space as well as in a parametric bird’s-eye view (BEV) space. In contrast to prior works that require dense supervision such as semantic labels in perspective view, our method requires only human annotations for parametric attributes, which are cheaper and less ambiguous to obtain. To solve this challenging task, our design comprises modules that incorporate inductive biases to learn occlusion reasoning, geometric transformation and semantic abstraction, where each module may be supervised by appropriately transforming the parametric annotations. We demonstrate how our design choices and proposed deep supervision help achieve meaningful representations and accurate predictions. We validate our approach on two public datasets, KITTI and NuScenes, achieving state-of-the-art results with considerably less human supervision.
Publication Link: https://arxiv.org/pdf/2104.06730.pdf