Deep Supervision With Intermediate Concepts

We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through our deep supervision. This allows training only from synthetic CAD renderings, where concept values can be extracted, while still generalizing to real images. We obtain state-of-the-art performance on 2D and 3D keypoint localization, instance segmentation and image classification, outperforming alternative forms of supervision such as multi-task training.

Collaborators: Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Gregory D. Hager, Manmohan Chandraker

Deep Supervision with Intermediate Concepts Paper

Chi Li¹ M. Zeeshan Zia² Quoc-Huy Tran³ Xiang Yu³ Gregory D. Hager¹ Manmohan Chandraker³,⁴

¹Johns Hopkins University ²Microsoft ³NEC Labs America ⁴University of California, San Diego

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018

(Top) A concept hierarchy with three concepts {y₁, y₂, y₃} on a 2D input space. Dashed arrows indicate the finer decomposition within the previous concept in the hierarchy. Each color represents one individual class defined by the concept. (Bottom) Deep supervision with three concepts {y₁, y₂, y₃}.
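One way to read the "finer decomposition" in the caption is that each class of a later concept falls inside a single class of the concept before it. The sketch below checks that property on toy labels. The function name `refines` and the label values are hypothetical and only illustrate the idea; they are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence


def refines(fine: Sequence[Hashable], coarse: Sequence[Hashable]) -> bool:
    """Return True if every class of `fine` lies inside a single class of `coarse`."""
    if len(fine) != len(coarse):
        raise ValueError("label sequences must cover the same samples")
    parent: Dict[Hashable, Hashable] = {}
    for f, c in zip(fine, coarse):
        # The first sample of a fine class fixes its coarse parent; a later
        # disagreement means the fine class straddles two coarse classes.
        if parent.setdefault(f, c) != c:
            return False
    return True


# Hypothetical labels of eight samples under three concepts (assumed values).
y1 = [0, 0, 0, 0, 1, 1, 1, 1]
y2 = [0, 0, 1, 1, 2, 2, 3, 3]
y3 = [0, 1, 2, 3, 4, 5, 6, 7]

print(refines(y2, y1), refines(y3, y2), refines(y1, y2))
```

In this toy setup y₂ subdivides y₁ and y₃ subdivides y₂, while the reverse direction fails because a coarse class spans several finer ones.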

[PDF] [Bibtex]

Abstract

Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human and machine vision suggest that there are often intermediate representations that are intrinsic to an inference task, and which provide essential structure to improve generalization. In this work, we explore an approach for injecting prior domain structure into neural network training by supervising hidden layers of a CNN with intermediate concepts that normally are not observed in practice. We formulate a probabilistic framework that formalizes these notions and predicts improved generalization via this deep supervision method. One advantage of this approach is that we are able to train only from synthetic CAD renderings of cluttered scenes, where concept values can be extracted, but apply the results to real images. Our implementation achieves state-of-the-art performance on 2D/3D keypoint localization and image classification on real image benchmarks including KITTI, PASCAL VOC, PASCAL3D+, IKEA, and CIFAR100. We provide additional evidence that our approach outperforms alternative forms of supervision, such as multi-task networks.
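The idea of supervising hidden layers at different depths can be sketched as a weighted sum of per-concept losses collected during one forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-error loss, the per-depth weights, the toy "layers" and the name `deep_supervision_loss` are all hypothetical, and a real CNN would attach a small prediction head to each supervised layer rather than compare activations directly.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def squared_error(pred: Vector, target: Vector) -> float:
    """Sum of squared differences (an assumed stand-in for the real losses)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def deep_supervision_loss(
    layers: Sequence[Layer],
    x: Vector,
    concept_targets: Dict[int, Vector],
    weights: Dict[int, float],
) -> float:
    """Run the layers in order and add a weighted loss at every supervised depth.

    `concept_targets` maps a 1-based depth to the target of the concept
    supervised there; depths without an entry receive no direct supervision.
    """
    total = 0.0
    hidden = x
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        if depth in concept_targets:
            total += weights.get(depth, 1.0) * squared_error(
                hidden, concept_targets[depth]
            )
    return total


# Toy three-layer "network" with one concept supervised at each depth.
layers = [
    lambda v: [2.0 * a for a in v],  # depth 1
    lambda v: [a + 1.0 for a in v],  # depth 2
    lambda v: [sum(v)],              # depth 3 (final task)
]
targets = {1: [2.0, 4.0], 2: [3.0, 4.0], 3: [10.0]}
weights = {1: 1.0, 2: 0.5, 3: 1.0}

loss = deep_supervision_loss(layers, [1.0, 2.0], targets, weights)
print(loss)  # 0 + 0.5 * 1 + 1.0 * 4 = 4.5
```

A multi-task baseline would instead attach every concept target to the final layer; in this sketch that corresponds to putting all targets at the last depth.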

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing Paper

Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
[PDF] [Supp] [Bibtex]

Abstract

Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human and machine vision suggest that there are often intermediate representations that are intrinsic to an inference task, and which provide essential structure to improve generalization. In this work, we explore an approach for injecting prior domain structure into neural network training by supervising hidden layers of a CNN with intermediate concepts that normally are not observed in practice. We formulate a probabilistic framework that formalizes these notions and predicts improved generalization via this deep supervision method. One advantage of this approach is that we are able to train only from synthetic CAD renderings of cluttered scenes, where concept values can be extracted, but apply the results to real images. Our implementation achieves state-of-the-art performance on 2D/3D keypoint localization and image classification on real image benchmarks including KITTI, PASCAL VOC, PASCAL3D+, IKEA, and CIFAR100. We provide additional evidence that our approach outperforms alternative forms of supervision, such as multi-task networks.

Image Classification Results on CIFAR100

Classification error of different methods on CIFAR100. The first four are previous methods and pre-act ResNet-1001 is the current state-of-the-art. The remaining four are results of our method (DISCO) and its variants.

Keypoint Localization Results on KITTI-3D

PCK [alpha=0.1] accuracies (%) of different methods for 2D and 3D keypoint localization on the KITTI-3D dataset. The last column reports angular error in degrees. WN-gt-yaw uses the groundtruth pose of the test car. Bold numbers indicate the best results on groundtruth object bounding boxes. The last row presents the accuracies of our method (DISCO) on detection results from RCNN.
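For readers unfamiliar with the metric, PCK (percentage of correct keypoints) is commonly computed by counting a predicted keypoint as correct when it lies within alpha times the larger side of the object bounding box from its groundtruth location. The sketch below follows that common 2D convention; this page does not state the exact normalization used in the experiments, so the threshold rule, the function name `pck_2d` and the sample coordinates are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck_2d(
    pred: Sequence[Point],
    gt: Sequence[Point],
    bbox_width: float,
    bbox_height: float,
    alpha: float = 0.1,
) -> float:
    """Percentage of predicted keypoints within alpha * max(w, h) of groundtruth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    threshold = alpha * max(bbox_width, bbox_height)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return 100.0 * correct / len(gt)


# Hypothetical keypoints on a 100 x 50 pixel box, so the threshold is 10 pixels.
gt_points = [(0.0, 0.0), (50.0, 0.0), (0.0, 25.0), (50.0, 25.0)]
pred_points = [(3.0, 4.0), (50.0, 9.0), (12.0, 25.0), (50.0, 45.0)]

# Distances are 5, 9, 12 and 20 pixels, so two of the four keypoints count.
print(pck_2d(pred_points, gt_points, bbox_width=100.0, bbox_height=50.0))  # 50.0
```

Sweeping `alpha` over a range of values and plotting the result gives a PCK curve.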

Keypoint Localization Results on PASCAL VOC

PCK [alpha=0.1] accuracies (%) of different methods for 2D keypoint localization on the car category of PASCAL VOC. Bold numbers indicate the best results.

Object Segmentation Results on PASCAL3D+

Object segmentation accuracies (%) of different methods on PASCAL3D+. Best results are shown in bold.

Qualitative Results on KITTI-3D and PASCAL VOC

Visualization of 2D/3D prediction, visibility inference and instance segmentation on KITTI-3D (left) and PASCAL VOC (right). The last row shows failure cases. Circles and lines represent keypoints and their connections. Red and green indicate the left and right sides of a car; orange lines connect the two sides. Dashed lines connect keypoints if one of them is inferred to be occluded. Light blue masks show segmentation results.

Keypoint Localization Results on IKEA

3D PCK curves of our method (DISCO) and 3D-INN on the sofa (a), chair (b) and bed (c) classes of the IKEA dataset. In each plot, the x-axis is the PCK threshold alpha and the y-axis is the accuracy.

Qualitative Results on IKEA

Qualitative comparison between 3D-INN and our method (DISCO) for 3D structure prediction on the IKEA dataset.

Acknowledgements

Part of this work was done during Chi Li’s internship at NEC Labs America. We acknowledge the support by NSF under grants IIS-127228 and IIS-1637949. We also thank Rene Vidal, Alan L. Yuille, Austin Reiter and Chong You for helpful discussions.
