We conduct research in computer vision and machine learning, with a focus on sustaining excellence in three main directions: (1) scene understanding; (2) visual recognition and representation learning; and (3) adaptation, fairness and privacy. Key applications of our research include visual surveillance and autonomous driving. We tackle fundamental problems in computer vision, such as object detection, semantic segmentation, face recognition, 3D reconstruction and behavior prediction. We develop and leverage breakthroughs in deep learning, particularly in weak supervision, metric learning and domain adaptation.

Adaptation, Fairness and Privacy

We develop AI techniques and policies that benefit society by disrupting traditional utility-cost trade-offs. Our solutions achieve better accuracy and fairness of service across geographic, social or economic boundaries, while ensuring lower costs and higher guarantees of privacy.

ECCV 2020 | Shuffle and Attend: Video Domain Adaptation
Jinwoo Choi, Gaurav Sharma, Samuel Schulter, Jia-Bin Huang

We address the problem of domain adaptation in videos for the task of human action recognition. Inspired by image-based domain adaptation, we propose to (a) learn to align important (discriminative) clips to achieve improved representation for the target domain and (b) employ a self-supervised task which encourages the model to focus on actions rather than scene context information in order to learn representations which are more robust to domain shifts.
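The self-supervised ingredient above can be posed as predicting whether a video's clips have been temporally shuffled. A minimal NumPy sketch, where the function name and the binary-label formulation are our own illustration rather than the paper's exact task:

```python
import numpy as np

def make_order_task(clips, rng, p_shuffle=0.5):
    """Given clip features of shape (T, D) with T > 1, either keep or
    shuffle the temporal order; the binary label is the pretext target
    an auxiliary head would be trained to predict."""
    clips = np.asarray(clips)
    if rng.random() < p_shuffle:
        perm = rng.permutation(len(clips))
        # re-sample until the permutation actually changes the order
        while np.array_equal(perm, np.arange(len(clips))):
            perm = rng.permutation(len(clips))
        return clips[perm], 1  # 1 = shuffled
    return clips, 0            # 0 = original order
```

Training on this task pushes the model to rely on temporal action cues rather than static scene context, which is the robustness property the abstract describes.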

ECCV 2020 | Domain Adaptive Semantic Segmentation Using Weak Labels
Sujoy Paul, Yi-Hsuan Tsai, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker

We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained based on a model prediction for unsupervised domain adaptation (UDA), or from a human annotator in a new weakly-supervised domain adaptation (WDA) paradigm for semantic segmentation. Using weak labels is both practical and useful, since (i) collecting image-level target annotations is comparably cheap in WDA and incurs no cost in UDA, and (ii) it opens the opportunity for category-wise domain alignment.

PDF | Supplementary
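In the UDA setting described above, image-level weak labels can be pseudo-derived from the model's own predictions. A hedged sketch: the mean pooling and threshold here are illustrative simplifications, not necessarily the pooling used in the paper.

```python
import numpy as np

def weak_labels_from_prediction(prob_maps, thresh=0.2):
    """prob_maps: (C, H, W) per-pixel class probabilities for one
    target image. A category is marked present if its spatially
    pooled probability exceeds the threshold; the resulting binary
    vector indicates which categories to align across domains."""
    pooled = prob_maps.reshape(prob_maps.shape[0], -1).mean(axis=1)
    return (pooled > thresh).astype(int)
```

In the WDA setting, the same binary vector would instead come directly from a human annotator ticking the categories visible in the image.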
ECCV 2020 | Learning to Optimize Domain Specific Normalization for Domain Generalization
Seonguk Seo, Yumin Suh, Dongwan Kim, Geeho Kim, Jongwoo Han, Bohyung Han

We propose a simple but effective multi-source domain generalization technique based on deep neural networks by incorporating optimized normalization layers that are specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain. For each domain, activations are normalized by a weighted average of multiple normalization statistics, and the statistics of each normalization type are tracked separately where necessary.
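A rough sketch of one such layer, mixing batch-norm and instance-norm statistics with a learned weight and applying a per-domain affine transform. Mixing the raw statistics with a single scalar is a simplification of the optimized combination the paper learns:

```python
import numpy as np

def domain_norm(x, w_bn, gamma, beta, eps=1e-5):
    """x: (N, C, H, W) activations from one domain.
    Batch-norm statistics pool over (N, H, W); instance-norm
    statistics pool over (H, W) per sample. They are blended by the
    learned weight w_bn, then this domain's gamma/beta are applied."""
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)
    mu_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    mu = w_bn * mu_bn + (1 - w_bn) * mu_in
    var = w_bn * var_bn + (1 - w_bn) * var_in
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

With w_bn = 1 this reduces to batch normalization and with w_bn = 0 to instance normalization; each domain keeps its own w_bn, gamma and beta.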

CVPR 2020 | Private-kNN: Practical Differential Privacy for Computer Vision
Yuqing Zhu, Xiang Yu, Manmohan Chandraker, Yu-Xiang Wang

The Private Aggregation of Teacher Ensembles (PATE) approach requires the training sets for the teachers to be disjoint. As such, achieving desirable privacy bounds requires an often impractical amount of labeled data. We propose a data-efficient scheme that avoids splitting the training dataset altogether. Our approach allows the use of privacy amplification by subsampling and iterative refinement of the kNN feature embedding. Compared to PATE, we achieve comparable or better utility while reducing the privacy cost by more than 90%, thereby providing the “most practical method to-date” in computer vision.
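The core labeling step can be sketched as noisy kNN voting over a random subsample of the private set; subsampling is what enables privacy amplification. The function below is an illustrative simplification (names, the Poisson subsampling rate q, and the plain Gaussian vote noise are our assumptions; the paper's mechanism and accounting are more involved):

```python
import numpy as np

def private_knn_label(query, feats, labels, k, sigma, rng, q=0.5):
    """Label one public query by noisy kNN voting over a random
    subsample of the private set. sigma is the Gaussian noise scale
    added to the per-class vote counts before taking the argmax."""
    keep = rng.random(len(feats)) < q          # Poisson subsampling
    f, y = feats[keep], labels[keep]
    k = min(k, len(f))
    nn = np.argsort(((f - query) ** 2).sum(axis=1))[:k]
    votes = np.bincount(y[nn], minlength=labels.max() + 1).astype(float)
    votes += rng.normal(0.0, sigma, size=votes.shape)
    return int(votes.argmax())
```

Because each private point only participates in a q-fraction of queries in expectation, the per-query privacy cost shrinks relative to voting over the full set.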

AAAI 2020 | Adversarial Learning of Privacy-Preserving and Task-Oriented Representations
Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang

Our aim is to learn privacy-preserving and task-oriented representations that defend against model inversion attacks. To achieve this, we propose an adversarial reconstruction-based framework for learning latent representations that cannot be decoded to recover the original input images. By simulating the expected behavior of the adversary, our framework is realized by minimizing the negative pixel-reconstruction loss or the negative feature-reconstruction (i.e., perceptual distance) loss.
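The two competing objectives can be sketched as follows, in a minimal pixel-level version with variable names of our own choosing: the simulated adversary minimizes the reconstruction loss, while the encoder minimizes its task loss minus that same reconstruction loss.

```python
import numpy as np

def privacy_objectives(x, x_rec, task_logits, y, lam=1.0):
    """x: original inputs; x_rec: the adversary decoder's
    reconstruction from the latent code; task_logits/y: the utility
    task. The decoder minimizes rec_loss; the encoder minimizes
    task_loss - lam * rec_loss, so it is rewarded when the adversary
    fails to reconstruct the input."""
    rec_loss = np.mean((x - x_rec) ** 2)       # pixel reconstruction
    p = np.exp(task_logits - task_logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    task_loss = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))
    return rec_loss, task_loss - lam * rec_loss
```

Swapping the pixel MSE for a distance between deep features of x and x_rec gives the feature-reconstruction (perceptual) variant.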

WACV 2020 | Active Adversarial Domain Adaptation
Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Subhransu Maji, Manmohan Chandraker

We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter utilizes it to weigh samples to account for distribution shifts. Specifically, our importance weight promotes samples with large classification uncertainty and diversity from labeled examples, and thus serves as a sample-selection scheme for active learning.
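One way to read the selection criterion: score each unlabeled target example by a discriminator-derived importance weight times the classifier's predictive entropy, then query labels for the top scores. A sketch under that reading, assuming the discriminator outputs P(source | x):

```python
import numpy as np

def aada_scores(d_prob_source, cls_probs):
    """d_prob_source: discriminator's P(source|x) for target samples;
    cls_probs: task classifier's class posteriors, shape (N, K).
    The ratio (1 - p) / p is large for very target-like samples, and
    entropy measures how uncertain the classifier is on them."""
    w = (1.0 - d_prob_source) / np.clip(d_prob_source, 1e-6, None)
    ent = -(cls_probs * np.log(cls_probs + 1e-12)).sum(axis=1)
    return w * ent

def select_for_labeling(d_prob_source, cls_probs, budget):
    """Return the indices of the top-scoring samples to annotate."""
    s = aada_scores(d_prob_source, cls_probs)
    return np.argsort(-s)[:budget]
```

Samples that are both far from the labeled source distribution and uncertain under the current classifier get the highest scores, matching the duality described above.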

WACV 2020 | Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones
Jinwoo Choi, Gaurav Sharma, Manmohan Chandraker, Jia-Bin Huang

We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing fully annotated action recognition datasets and unannotated (or only a few annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-DRONE, containing 5,250 videos to evaluate the task.

PDF | Project Site | Dataset
CVPR 2019 | Gotta Adapt ’Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild
Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker

We provide a solution that allows knowledge transfer from fully annotated source images to unlabeled target ones, often captured under different conditions. We adapt at multiple semantic levels, from feature to pixel, with complementary insights for each type. Using the proposed method, we achieve better recognition accuracy on car images in unlabeled surveillance domains by adapting knowledge from car images on the web.

ICLR 2019 | Unsupervised Domain Adaptation for Distance Metric Learning
Kihyuk Sohn, Wenling Shang, Xiang Yu, Manmohan Chandraker

We propose the Feature Transfer Network, a novel deep neural network for image-based face verification and identification that can adapt to biases such as ethnicity, gender or age in a target set. Unlike existing methods, our network can even handle novel identities present in the target domain. Our framework excels at both within-domain and cross-domain utility tasks, thus retaining discriminative power throughout adaptation.

ICCV 2019 | Domain Adaptation for Structured Output via Discriminative Patch Representations
Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

We tackle domain adaptive semantic segmentation by learning discriminative feature representations of patches in the source domain, discovering multiple modes of the patch-wise output distribution through the construction of a clustered space. With this guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches. We show that our framework is complementary to existing domain adaptation techniques.

PDF | Supplementary | Project Site | Dataset
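The clustered space of patch-wise output modes can be illustrated with plain k-means over per-patch label histograms. This is a simplified stand-in of our own (the paper's construction may differ in details such as the distance and initialization):

```python
import numpy as np

def patch_modes(patch_histograms, k, iters=20, seed=0):
    """patch_histograms: (N, C) label histograms, one per patch.
    Runs basic k-means and returns (mode assignment per patch,
    mode centers); the assignments define the clustered space that
    guides adversarial alignment of target patches."""
    rng = np.random.default_rng(seed)
    centers = patch_histograms[rng.choice(len(patch_histograms), k, replace=False)]
    for _ in range(iters):
        d = ((patch_histograms[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            mask = assign == c
            if mask.any():
                centers[c] = patch_histograms[mask].mean(0)
    return assign, centers
```

Each mode then acts as a discrete "anchor" in feature space toward which target patches with similar predicted content are pushed.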
CVPR 2018 | Learning to Adapt Structured Output Space for Semantic Segmentation
Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker

We develop a semantic segmentation method for adapting from source ground-truth labels to an unseen target domain. To this end, we treat semantic segmentation as structured prediction with spatial similarities between the source and target domains, and adopt multi-level adversarial learning in the output space. We show that our method performs adaptation under various settings, including synthetic-to-real and cross-city scenarios.

PDF | Supplementary
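Schematically, the segmenter's overall objective combines supervised cross-entropy on source images with one adversarial term per level, where each term makes target output maps fool a level-specific discriminator. A sketch, with the logistic discriminator form and lambda weights as illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def segmenter_loss(seg_ce_source, d_logits_by_level, lambdas):
    """seg_ce_source: cross-entropy on labeled source images.
    d_logits_by_level: per level, discriminator logits on *target*
    softmax output maps (higher = judged source-like). The adversarial
    term is minimized when target outputs look like source outputs."""
    adv = [-np.mean(np.log(sigmoid(d) + 1e-12)) for d in d_logits_by_level]
    return seg_ce_source + sum(l * a for l, a in zip(lambdas, adv))
```

Operating on the softmax output maps, rather than intermediate features, is what makes the alignment exploit the structured, spatially similar layout of segmentation predictions across domains.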
ICCV 2017 | Reconstruction-Based Disentanglement for Pose-invariant Face Recognition
Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris N. Metaxas, Manmohan Chandraker

Generic data-driven deep face features may confound images of the same identity under large poses with other identities. We propose a feature-reconstruction metric learning approach to disentangle identity and pose information in the latent feature space. The disentangled feature space encourages identity features of the same subject to cluster together despite pose variation. Experiments on both controlled and in-the-wild face datasets show that our method consistently outperforms the state of the art, especially on images with large head pose variations.
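The clustering behavior described above can be illustrated with a generic pairwise contrastive loss on the identity features; this is a standard metric-learning loss, not the paper's exact feature-reconstruction formulation:

```python
import numpy as np

def identity_clustering_loss(feats, ids, margin=1.0):
    """feats: (N, D) identity features; ids: identity label per row.
    Pulls same-identity features together and pushes different
    identities at least `margin` apart, regardless of pose."""
    loss, pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            d = np.linalg.norm(feats[i] - feats[j])
            if ids[i] == ids[j]:
                loss += d ** 2
            else:
                loss += max(0.0, margin - d) ** 2
            pairs += 1
    return loss / pairs
```

When pose is successfully disentangled out of the features, this loss can be driven near zero even across large pose variations of the same subject.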