We conduct research in computer vision and machine learning, with a focus on sustaining excellence in three main directions: (1) scene understanding; (2) recognition and representation; and (3) adaptation, fairness and privacy. Key applications of our research include visual surveillance and autonomous driving. We tackle fundamental problems in computer vision, such as object detection, semantic segmentation, face recognition, 3D reconstruction and behavior prediction. We develop and leverage breakthroughs in deep learning, particularly in weak supervision, metric learning and domain adaptation.

ECCV 2020 | Object Detection with a Unified Label Space from Multiple Datasets
Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, Ying Wu

Given multiple datasets with different label spaces, the goal of this work is to train a single object detector that predicts over the union of all the label spaces. The practical benefit of such a detector is significant: application-relevant categories can be picked and merged from arbitrary existing datasets. However, naïve merging of datasets is not possible in this case, because the object annotations are inconsistent across datasets. To address this challenge, we design a framework that works with such partial annotations, and we exploit a pseudo-labeling approach that we adapt to our specific case.
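The annotation inconsistency described above can be illustrated with a minimal sketch. It only shows how a union label space leaves some categories unannotated in each dataset; it is not the paper's framework or its pseudo-labeling procedure, and the dataset names, categories and function names are hypothetical.

```python
def build_unified_label_space(label_spaces):
    """Union the per-dataset category lists and record which unified
    categories each dataset actually annotates."""
    unified = sorted(set().union(*label_spaces.values()))
    annotated = {
        name: [category in labels for category in unified]
        for name, labels in label_spaces.items()
    }
    return unified, annotated


def background_targets(dataset, annotated):
    """Targets for a region that matches no ground-truth box in `dataset`.

    0    -> the category is annotated in this dataset, so its absence is a
            confirmed negative.
    None -> the category is not annotated here, so the region may still
            contain it; such entries are candidates for pseudo labels.
    """
    return [0 if is_annotated else None for is_annotated in annotated[dataset]]


# Hypothetical toy label spaces (assumption, not from the paper).
label_spaces = {"A": ["person", "car"], "B": ["car", "dog"]}
unified, annotated = build_unified_label_space(label_spaces)
targets_a = background_targets("A", annotated)
```

In this toy setup, an unmatched region in dataset A is a confirmed negative for the categories A annotates, but its status for the remaining unified category is unknown, which is why treating it as plain background would be incorrect.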

PDF | Supplementary | Project Site | Dataset
WACV 2020 | Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones
Jinwoo Choi, Gaurav Sharma, Manmohan Chandraker, Jia-Bin Huang

We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both existing fully annotated action recognition datasets and unannotated (or only sparsely annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-DRONE, containing 5,250 videos to evaluate the task. We tackle two problem settings, in which the source domain (e.g., the Kinetics dataset) and the target domain (drone videos) have (1) the same and (2) different action label sets.

PDF | Project Site | Dataset
CVPR 2019 | A Parametric Top-View Representation of Complex Road Scenes
Ziyan Wang, Buyu Liu, Samuel Schulter, Manmohan Chandraker

We address the problem of inferring the layout of complex road scenes given a single camera as input. To achieve that, we first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making. Moreover, the design of our top-view scene model allows for efficient sampling and thus generation of large-scale simulated data, which we leverage to train a deep neural network to infer our scene model's parameters. Finally, we design a Conditional Random Field (CRF) that enforces coherent predictions for a single frame and encourages temporal smoothness among video frames.

PDF | Project Site | Dataset
ICCV 2019 | Domain Adaptation for Structured Output via Discriminative Patch Representations
Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

We tackle domain-adaptive semantic segmentation by learning discriminative feature representations of patches in the source domain. To do so, we construct a clustered space that captures multiple modes of the patch-wise output distribution. With such guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches. We show that our framework is complementary to existing domain adaptation techniques.

PDF | Supplementary | Project Site | Dataset
CVPR 2017 | Deep Supervision with Shape Concepts for Occlusion-Aware 3D Object Parsing
Chi Li, Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker

We propose a deep CNN architecture to localize semantic object parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. We exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer a causal sequence of intermediate concepts. We render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. The utility of our deep supervision is demonstrated by state-of-the-art performance on real image benchmarks for 2D and 3D keypoint localization and instance segmentation.

PDF | Dataset
NeurIPS 2016 | Universal Correspondence Network
Christopher B. Choy, JunYoung Gwak, Silvio Savarese, Manmohan Chandraker

We present deep metric learning to obtain a feature space that preserves geometric or semantic similarity. Our visual correspondences range from rigid motions to intra-class shape or appearance variations. Our fully convolutional architecture, along with a novel correspondence contrastive loss, allows faster training through effective reuse of computations, accurate gradient computation, and linear-time testing instead of the quadratic time of typical patch similarity methods. We propose a convolutional spatial transformer to mimic patch normalization in traditional features like SIFT.
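As a rough illustration of a contrastive loss over point correspondences, the sketch below uses the generic contrastive form: matching feature pairs are pulled together, and non-matching pairs are pushed apart until they are at least a margin away. It is a minimal sketch under that assumption, not the paper's implementation; the function name, the `margin` default and the toy feature vectors are hypothetical.

```python
import math


def correspondence_contrastive_loss(feats_a, feats_b, is_match, margin=1.0):
    """Generic contrastive loss over N feature pairs.

    feats_a[i], feats_b[i]: feature vectors sampled at a candidate
    correspondence in the two images.
    is_match[i]: True for a true correspondence, False for a negative pair.
    """
    total = 0.0
    for fa, fb, match in zip(feats_a, feats_b, is_match):
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
        if match:
            # Positive pair: penalize any distance between the features.
            total += dist ** 2
        else:
            # Negative pair: penalize only if closer than the margin.
            total += max(0.0, margin - dist) ** 2
    return total / (2 * len(is_match))


# Hypothetical toy features: one positive pair and one negative pair.
feats_a = [[0.0, 0.0], [0.0, 0.0]]
feats_b = [[3.0, 4.0], [0.6, 0.8]]
loss = correspondence_contrastive_loss(feats_a, feats_b, [True, False], margin=2.0)
```

In a fully convolutional setting, the features for all sampled correspondences would come from a single forward pass per image, which is where the reuse of computation mentioned above comes from.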

PDF | Supplementary | Project Site | Code