Deep Supervision With Intermediate Concepts
PAMI 2019 | We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through this deep supervision. This allows training only on synthetic CAD renderings, where concept values can be extracted, while still generalizing to real images. We obtain state-of-the-art performance on 2D and 3D keypoint localization, instance segmentation, and image classification, outperforming alternative forms of supervision such as multi-task training.
Collaborators: Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Gregory D. Hager, Manmohan Chandraker
Project Site
Deep Supervision with Intermediate Concepts
1 Johns Hopkins University · 2 Microsoft · 3 NEC Labs America · 4 University of California, San Diego
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
(Top) A concept hierarchy with three concepts {y1, y2, y3} on a 2D input space. Dashed arrows indicate that each concept is a finer decomposition of the previous concept in the hierarchy. Each color represents one class defined by the concept. (Bottom) Deep supervision with the three concepts {y1, y2, y3}.
Abstract
Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human and machine vision suggest that there are often intermediate representations that are intrinsic to an inference task, and which provide essential structure to improve generalization. In this work, we explore an approach for injecting prior domain structure into neural network training by supervising hidden layers of a CNN with intermediate concepts that are normally not observed in practice. We formulate a probabilistic framework that formalizes these notions and predicts improved generalization via this deep supervision method. One advantage of this approach is that we are able to train only on synthetic CAD renderings of cluttered scenes, where concept values can be extracted, but apply the results to real images. Our implementation achieves state-of-the-art performance on 2D/3D keypoint localization and image classification on real-image benchmarks including KITTI, PASCAL VOC, PASCAL3D+, IKEA, and CIFAR100. We provide additional evidence that our approach outperforms alternative forms of supervision, such as multi-task networks.
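To make the idea concrete, below is a minimal PyTorch sketch of deep supervision with intermediate concepts. The layer sizes, concept dimensions (coarse pose, keypoint visibility, 2D keypoints), and loss weights are illustrative assumptions, not the authors' released implementation; the point is only the structure of attaching one supervised head at each intermediate depth.

```python
import torch
import torch.nn as nn

# Hypothetical deep supervision sketch: each hidden block is supervised with
# an increasingly complex concept (y1, y2, y3), echoing the figure above.
class DeeplySupervisedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(8))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(4))
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1))
        # One head per concept, attached at successive depths (sizes assumed).
        self.head1 = nn.Linear(32 * 8 * 8, 3)    # y1: e.g. coarse pose bins
        self.head2 = nn.Linear(64 * 4 * 4, 14)   # y2: e.g. keypoint visibility
        self.head3 = nn.Linear(128, 28)          # y3: e.g. 14 2D keypoints

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        h3 = self.block3(h2)
        return (self.head1(h1.flatten(1)),
                self.head2(h2.flatten(1)),
                self.head3(h3.flatten(1)))

def deep_supervision_loss(preds, targets, weights=(0.3, 0.3, 1.0)):
    # A single objective sums the per-concept losses; in practice each concept
    # would use a task-appropriate loss (e.g. cross-entropy for discrete ones).
    losses = [nn.functional.mse_loss(p, t) for p, t in zip(preds, targets)]
    return sum(w * l for w, l in zip(weights, losses))

# Usage: preds = DeeplySupervisedNet()(torch.randn(2, 3, 64, 64))
```

Because all concept losses feed one objective, the gradient of every concept shapes the layers beneath its head, which is how the prior structure reaches the hidden representations.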
Papers
Image Classification Results on CIFAR100
Classification error of different methods on CIFAR100. The first four rows list previous methods, of which pre-activation ResNet-1001 is the current state of the art; the remaining four rows report our method (DISCO) and its variants.
Keypoint Localization Results on KITTI-3D
PCK (α = 0.1) accuracies (%) of different methods for 2D and 3D keypoint localization on the KITTI-3D dataset; the last column reports the angular error in degrees. WN-gt-yaw uses the ground-truth pose of the test car. Bold numbers indicate the best results on ground-truth object bounding boxes. The last row reports the accuracy of our method (DISCO) on detection results from RCNN.
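For reference, here is a short sketch of how a PCK (α = 0.1) number like those in the table can be computed, assuming the common convention that a keypoint counts as correct when its error is within α times the object bounding-box size (the exact normalization in the paper may differ). Sweeping α over a range produces PCK curves like those reported on IKEA below.

```python
import numpy as np

def pck(pred, gt, box_size, alpha=0.1):
    """PCK accuracy in percent.
    pred, gt: (N, K, 2) arrays of predicted / ground-truth keypoints
    for N objects with K keypoints each; box_size: (N,) object sizes."""
    dist = np.linalg.norm(pred - gt, axis=-1)     # (N, K) per-keypoint errors
    correct = dist <= alpha * box_size[:, None]   # threshold scales with object
    return 100.0 * correct.mean()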
Keypoint Localization Results on PASCAL VOC
PCK (α = 0.1) accuracies (%) of different methods for 2D keypoint localization on the car category of PASCAL VOC. Bold numbers indicate the best results.
Object Segmentation Results on PASCAL3D+
Object segmentation accuracies (%) of different methods on PASCAL3D+. Best results are shown in bold.
Qualitative Results on KITTI-3D and PASCAL VOC
Visualization of 2D/3D keypoint prediction, visibility inference, and instance segmentation on KITTI-3D (left) and PASCAL VOC (right); the last row shows failure cases. Circles and lines represent keypoints and their connections: red and green mark the left and right sides of a car, and orange lines connect the two sides. Dashed lines connect keypoints when one of them is inferred to be occluded. Light blue masks show segmentation results.
Keypoint Localization Results on IKEA
3D PCK curves of our method (DISCO) and 3D-INN on the sofa (a), chair (b), and bed (c) classes of the IKEA dataset. In each plot, the x-axis is the PCK threshold α and the y-axis is the accuracy.
Qualitative Results on IKEA
Qualitative comparison between 3D-INN and our method (DISCO) for 3D structure prediction on IKEA dataset.
Acknowledgements
Part of this work was done during Chi Li’s internship at NEC Labs America. We acknowledge support from NSF grants IIS-127228 and IIS-1637949. We also thank René Vidal, Alan L. Yuille, Austin Reiter, and Chong You for helpful discussions. This website template is inspired by this website.