We conduct research in computer vision and machine learning, with a focus on sustaining excellence in three main directions: (1) scene understanding; (2) recognition and representation; and (3) adaptation, fairness and privacy. Key applications of our research include visual surveillance and autonomous driving. We tackle fundamental problems in computer vision, such as object detection, semantic segmentation, face recognition, 3D reconstruction and behavior prediction. We develop and leverage breakthroughs in deep learning, particularly with a flavor of weak supervision, metric learning and domain adaptation.

CVPR 2021 Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty
Bingbing Zhuang, Manmohan Chandraker

How can solutions from classical geometric methods and deep learning methods be fused in a principled way? We present a framework that learns relative camera pose estimation along with its probabilistic fusion with the estimate from geometric methods. The fusion is achieved by learning the network uncertainty under explicit guidance from the geometric uncertainty, thereby learning to take the geometric solution into account in relation to the network prediction. The learning is driven by a self-attention graph neural network.
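As a toy illustration of the kind of probabilistic fusion involved (not the paper's learned formulation), the sketch below fuses two independent Gaussian estimates of a 1-D pose quantity by inverse-variance weighting; the function name and scalar setting are our own assumptions.

```python
def fuse_gaussian(mu_geo, var_geo, mu_net, var_net):
    """Inverse-variance (maximum-likelihood) fusion of two independent
    Gaussian estimates of the same quantity, e.g. a geometric pose
    estimate and a network pose estimate."""
    w_geo, w_net = 1.0 / var_geo, 1.0 / var_net
    var = 1.0 / (w_geo + w_net)                   # fused variance shrinks
    mu = var * (w_geo * mu_geo + w_net * mu_net)  # precision-weighted mean
    return mu, var
```

With equal variances the fused mean is the simple average; a more certain estimate pulls the result toward itself.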

CVPR 2021 Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
Sriram Narayanan, Ramin Moslemi, Francesco Pittaluga, Buyu Liu

Our work addresses two key challenges in trajectory prediction: learning multimodal outputs, and producing better predictions by imposing constraints from driving knowledge. Recent methods have achieved strong performance using Multi-Choice Learning objectives like winner-takes-all (WTA), but they depend heavily on their initialization to provide diverse outputs. Our first contribution is a novel Divide-And-Conquer (DAC) approach that serves as a better initialization for the WTA objective, resulting in diverse outputs without any spurious modes. Further, we introduce a novel trajectory prediction framework called ALAN that uses existing lane centerlines as anchors to provide trajectories constrained to the input lanes.
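A minimal sketch of the winner-takes-all objective the work builds on, assuming NumPy arrays of K trajectory hypotheses; only the hypothesis closest to the ground truth incurs loss (the names and array shapes here are illustrative, not the paper's code).

```python
import numpy as np

def wta_loss(pred_trajs, gt_traj):
    """Winner-takes-all loss: only the hypothesis closest to the
    ground truth (by average displacement error) receives gradient.

    pred_trajs: (K, T, 2) array of K trajectory hypotheses
    gt_traj:    (T, 2) ground-truth trajectory
    Returns (loss of the best mode, index of the best mode).
    """
    # per-mode average displacement error over the T time steps
    ade = np.linalg.norm(pred_trajs - gt_traj[None], axis=-1).mean(axis=-1)
    best = int(ade.argmin())
    return float(ade[best]), best
```

Because only the winning mode is penalized, the remaining modes are free to cover other futures, which is also why the objective is sensitive to how the hypotheses are initialized.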

BMVC 2020 Adaptation Across Extreme Variations using Unlabeled Bridges
Shuyang Dai, Kihyuk Sohn, Yi-Hsuan Tsai, Lawrence Carin, Manmohan Chandraker

We tackle an unsupervised domain adaptation problem where the domain discrepancy between labeled source and unlabeled target domains is large, due to many factors of inter- and intra-domain variation. We propose to decompose the domain discrepancy into multiple smaller discrepancies by introducing unlabeled bridging domains that connect the source and target domains, making each discrepancy easier to minimize. We realize our approach through an extension of the domain adversarial neural network with multiple discriminators, each of which accounts for reducing the discrepancy between an unlabeled (bridge or target) domain and the mix of all preceding domains, including the source.

PDF | Supplementary
ECCV 2020 Shuffle and Attend: Video Domain Adaptation
Jinwoo Choi, Gaurav Sharma, Samuel Schulter, Jia-Bin Huang

We address the problem of domain adaptation in videos for the task of human action recognition. Inspired by image-based domain adaptation, we propose to (a) learn to align important (discriminative) clips to achieve improved representation for the target domain and (b) employ a self-supervised task which encourages the model to focus on actions rather than scene context information in order to learn representations which are more robust to domain shifts.

ECCV 2020 Object Detection with a Unified Label Space from Multiple Datasets
Xiangyun Zhao, Samuel Schulter, Gaurav Sharma, Yi-Hsuan Tsai, Manmohan Chandraker, Ying Wu

Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant—application-relevant categories can be picked and merged from arbitrary existing datasets. However, naïve merging of datasets is not possible in this case, due to inconsistent object annotations. To address this challenge, we design a framework which works with such partial annotations, and we exploit a pseudo labeling approach that we adapt for our specific case.

PDF | Supplementary | Project Site | Dataset
ECCV 2020 Domain Adaptive Semantic Segmentation Using Weak Labels
Sujoy Paul, Yi-Hsuan Tsai, Samuel Schulter, Amit K. Roy-Chowdhury, Manmohan Chandraker

We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained based on a model prediction for unsupervised domain adaptation (UDA), or from a human annotator in a new weakly-supervised domain adaptation (WDA) paradigm for semantic segmentation. Using weak labels is both practical and useful, since (i) collecting image-level target annotations is comparably cheap in WDA and incurs no cost in UDA, and (ii) it opens the opportunity for category-wise domain alignment.

PDF | Supplementary | Project Site | Video
ECCV 2020 Learning to Optimize Domain Specific Normalization for Domain Generalization
Seonguk Seo, Yumin Suh, Dongwan Kim, Geeho Kim, Jongwoo Han, Bohyung Han

We propose a simple but effective multi-source domain generalization technique based on deep neural networks, incorporating optimized normalization layers specific to individual domains. Our approach employs multiple normalization methods while learning separate affine parameters per domain. For each domain, the activations are normalized by a weighted average of multiple normalization statistics, which are tracked separately for each normalization type where necessary.
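A rough sketch of mixing normalization statistics, assuming a NumPy (N, C, H, W) tensor and fixed batch-norm/instance-norm mixture weights; the paper learns these weights and per-domain affine parameters, which is omitted here.

```python
import numpy as np

def mixed_norm(x, w_bn, w_in, eps=1e-5):
    """Normalize activations with a weighted average of batch-norm and
    instance-norm statistics (a sketch of mixing normalization types).

    x: (N, C, H, W) activations; w_bn + w_in should sum to 1.
    """
    bn_mean = x.mean(axis=(0, 2, 3), keepdims=True)   # (1, C, 1, 1), batch stats
    bn_var = x.var(axis=(0, 2, 3), keepdims=True)
    in_mean = x.mean(axis=(2, 3), keepdims=True)      # (N, C, 1, 1), instance stats
    in_var = x.var(axis=(2, 3), keepdims=True)
    mean = w_bn * bn_mean + w_in * in_mean
    var = w_bn * bn_var + w_in * in_var
    return (x - mean) / np.sqrt(var + eps)
```

Setting w_in = 1 recovers pure instance normalization; intermediate weights trade off batch-level and instance-level invariances.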

ECCV 2020 Improving Face Recognition by Clustering Unlabeled Faces in the Wild
Aruni RoyChowdhury, Xiang Yu, Kihyuk Sohn, Erik Learned-Miller, Manmohan Chandraker

We propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty.
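The modulated cosine loss can be sketched roughly as below, with per-sample weights standing in for the clustering-uncertainty estimate; all names and shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def modulated_cosine_loss(feats, centers, labels, weights):
    """Cosine loss in which each sample's contribution is scaled by a
    per-sample weight (e.g. an estimate of clustering certainty).

    feats:   (N, D) embeddings
    centers: (C, D) pseudo-label class centers
    labels:  (N,) pseudo-label indices from clustering
    weights: (N,) modulation in [0, 1]; likely-noisy samples get low weight
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = (f * c[labels]).sum(axis=1)          # cosine to assigned center
    return float((weights * (1.0 - cos)).mean())
```

Down-weighting uncertain cluster assignments keeps mislabeled faces from dominating the gradient.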

ECCV 2020 Image Stitching and Rectification for Hand-Held Cameras
Bingbing Zhuang, Quoc-Huy Tran

We derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input — two consecutive frames from a video stream, wherein the interframe motion is restricted from being arbitrarily large. 

PDF | Supplementary | Video
ECCV 2020 Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
Yuliang Zou, Pan Ji, Quoc-Huy Tran, Jia-Bin Huang, Manmohan Chandraker

Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module. 

PDF | Supplementary | Video
ECCV 2020 Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction
Lokender Tiwari, Pan Ji, Quoc-Huy Tran, Bingbing Zhuang, Saket Anand, Manmohan Chandraker

Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment. In this paper, we demonstrate that coupling the two, by leveraging the strengths of each, mitigates the other's shortcomings.

PDF | Supplementary | Video
ECCV 2020 SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction
Sriram N N, Buyu Liu, Francesco Pittaluga, Manmohan Chandraker

We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant-time inference regardless of the number of agents. Existing trajectory prediction methods are fundamentally limited by a lack of diversity in training data, which is difficult to acquire with sufficient coverage of possible modes.

PDF | Supplementary | Video
CVPR 2020 Towards Universal Representation Learning for Deep Face Recognition
Yichun Shi, Xiang Yu, Kihyuk Sohn, Manmohan Chandraker, Anil K. Jain

Traditional recognition models require target-domain data to adapt from high-quality training data to unconstrained/low-quality face recognition, and a model ensemble is further needed for a universal representation, which significantly increases model complexity. In contrast, our universal face representation learning (URFace) works only on the original training data, without any target-domain information, and can deal with unconstrained and unseen testing scenarios.

CVPR 2020 | Private-kNN: Practical Differential Privacy for Computer Vision
Yuqing Zhu, Xiang Yu, Manmohan Chandraker, Yu-Xiang Wang

The Private Aggregation of Teacher Ensembles (PATE) approach requires the training sets for the teachers to be disjoint. As such, achieving desirable privacy bounds requires an often impractical amount of labeled data. We propose a data-efficient scheme, which altogether avoids splitting the training dataset. Our approach allows the use of privacy amplification by subsampling and iterative refinement of the kNN feature embedding. Compared to PATE, we achieve comparable or better utility while reducing the privacy cost by more than 90%, thereby providing the “most practical method to-date” in computer vision.
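As a hedged sketch of noisy kNN label release (omitting the paper's subsampling of the private set, which provides the privacy amplification, and its exact noise calibration; all names are illustrative):

```python
import numpy as np

def private_knn_label(query, feats, labels, n_classes, k=5, sigma=1.0, rng=None):
    """Label a query by noisy k-NN voting in feature space: take the k
    nearest private examples, add Gaussian noise to the vote histogram
    (the differential-privacy mechanism), and release only the argmax.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = np.linalg.norm(feats - query, axis=1)
    nn = np.argsort(d)[:k]                                  # k nearest neighbors
    votes = np.bincount(labels[nn], minlength=n_classes).astype(float)
    votes += rng.normal(0.0, sigma, size=votes.shape)       # noisy aggregation
    return int(votes.argmax())
```

Only the noisy argmax leaves the private side, so each released pseudo-label carries a bounded privacy cost.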

CVPR 2020 | Peek-a-boo: Occlusion Reasoning in Indoor Scenes with Plane Representations
Ziyu Jiang, Buyu Liu, Samuel Schulter, Zhangyang Wang, Manmohan Chandraker

We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane. We infer these planes from a single input image with a novel neural network architecture. It consists of a two-branch category-specific module that aims to predict layout and objects of the scene separately so that different types of planes can be handled better. We also introduce a novel loss function based on plane warping that can leverage multiple views at training time for improved occlusion-aware reasoning.  

CVPR 2020 | Understanding Road Layout from Videos as a Whole
Buyu Liu, Bingbing Zhuang, Samuel Schulter, Pan Ji, Manmohan Chandraker

We address the problem of inferring the layout of complex road scenes from video sequences. To this end, we formulate it as a top-view road attributes prediction problem and our goal is to predict these attributes for each frame both accurately and consistently. In contrast to prior work, we exploit the following three novel aspects: leveraging camera motions in videos, including context cues and incorporating long-term video information. Specifically, we introduce a model that aims to enforce prediction consistency in videos. 

AAAI 2020 | Adversarial Learning of Privacy-Preserving and Task-Oriented Representations
Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang

Our aim is to learn privacy-preserving and task-oriented representations that defend against model inversion attacks. To achieve this aim, we propose an adversarial reconstruction-based framework for learning latent representations that cannot be decoded to recover the original input images. By simulating the expected behavior of an adversary, our framework is realized by minimizing the negative pixel reconstruction loss or the negative feature reconstruction (i.e., perceptual distance) loss.

WACV 2020 | Active Adversarial Domain Adaptation
Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Subhransu Maji, Manmohan Chandraker

We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter utilizes it to weigh samples to account for distribution shifts. Specifically, our importance weight promotes samples with large uncertainty in classification and diversity from labeled examples, thus serving as a sample selection scheme for active learning.
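The selection criterion can be sketched as the product of a discriminator-derived importance weight and predictive entropy; the sketch below assumes NumPy arrays and a discriminator that outputs the probability of a sample being source-like (the function and variable names are ours).

```python
import numpy as np

def aada_score(d_src_prob, class_probs, eps=1e-12):
    """Sample-selection score for active learning: the importance weight
    (1 - D) / D from a source-vs-target domain discriminator, multiplied
    by the predictive entropy of the classifier.

    d_src_prob:  (N,) discriminator probability that a sample is source-like
    class_probs: (N, C) classifier softmax outputs
    """
    w = (1.0 - d_src_prob) / (d_src_prob + eps)                  # high for target-like samples
    h = -(class_probs * np.log(class_probs + eps)).sum(axis=1)   # classification uncertainty
    return w * h
```

Samples that look unlike the labeled source data and that the classifier is unsure about score highest, and would be sent to the annotator first.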

WACV 2020 | DAVID: Dual-Attentional Video Deblurring
Junru Wu, Xiang Yu, Ding Liu, Manmohan Chandraker, Zhangyang Wang

Blind video deblurring is a challenging task because the blur due to camera shake, object movement and defocusing is heterogeneous in both temporal and spatial dimensions. Traditional methods train on datasets synthesized with a single level of blur, and thus do not generalize well across levels of blurriness. To address this challenge, we propose a dual attention mechanism to dynamically aggregate temporal cues for deblurring with an end-to-end trainable network structure. Extensive ablative studies and qualitative visualizations further demonstrate the advantage of our method in handling real video blur.

WACV 2020 | Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones
Jinwoo Choi, Gaurav Sharma, Manmohan Chandraker, and Jia-Bin Huang

We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing fully annotated action recognition datasets and unannotated (or only a few annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-DRONE, containing 5,250 videos to evaluate the task. We tackle both problem settings with 1) same and 2) different action label sets for the source (e.g., Kinetics dataset) and target domains (drone videos).

PDF | Project Site | Dataset
PAMI 2019 | Deep Supervision with Intermediate Concepts
Chi Li, M. Zeeshan Zia, Quoc-Huy Tran, Xiang Yu, Gregory D. Hager, Manmohan Chandraker

We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through our deep supervision. This allows training only from synthetic CAD renderings where concept values can be extracted, while achieving generalization to real images. We obtain state-of-the-art performances on 2D and 3D keypoint localization, instance segmentation and image classification, outperforming alternative forms of supervision such as multi-task training. 

PDF | Project Site
CVPR 2019 | A Parametric Top-View Representation of Complex Road Scenes
Ziyan Wang, Buyu Liu, Samuel Schulter, Manmohan Chandraker

We address the problem of inferring the layout of complex road scenes given a single camera as input. To achieve that, we first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making. Moreover, the design of our top-view scene model allows for efficient sampling and thus generation of large-scale simulated data, which we leverage to train a deep neural network to infer our scene model's parameters. Finally, we design a Conditional Random Field (CRF) that enforces coherent predictions for a single frame and encourages temporal smoothness among video frames.

PDF | Project Site | Dataset
CVPR 2019 | Feature Transfer Learning for Face Recognition with Under-Represented Data
Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

Training with under-represented data leads to biased classifiers in conventionally-trained deep networks. We propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples. A Gaussian prior on the variance is assumed across all subjects, and the variance from regular subjects is transferred to the under-represented ones. This encourages the under-represented distribution to be closer to the regular distribution. Further, an alternating training regimen is proposed to simultaneously achieve less biased classifiers and a more discriminative feature representation.
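A minimal sketch of center-based variance transfer under the shared-Gaussian assumption: the intra-class offsets of a well-sampled subject are transplanted onto an under-represented subject's center (function and variable names are illustrative, not the paper's code).

```python
import numpy as np

def transfer_variance(regular_feats, regular_center, ur_center):
    """Generate augmented features for an under-represented (UR) subject
    by moving a regular subject's intra-class variation onto the UR
    center, assuming a shared Gaussian variance across subjects.

    regular_feats:  (N, D) features of a regular subject
    regular_center: (D,) that subject's feature center
    ur_center:      (D,) the under-represented subject's center
    """
    offsets = regular_feats - regular_center   # intra-class variation
    return ur_center + offsets                 # same spread, new identity
```

The augmented samples keep the regular subject's spread but are centered on the under-represented identity, enriching its feature distribution.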

CVPR 2019 | Structure-And-Motion-Aware Rolling Shutter Correction
Bingbing Zhuang, Quoc-Huy Tran, Pan Ji, Loong Fah Cheong, Manmohan Chandraker

In this paper, we first make a theoretical contribution by proving that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a Convolutional Neural Network-based method that learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction. We propose a geometrically meaningful way to synthesize large-scale training data and identify a geometric ambiguity that arises in training.

PDF | Supplementary | Project Site
CVPR 2019 | Gotta Adapt ’Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild
Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker

We provide a solution that allows knowledge transfer from fully annotated source images to unlabeled target ones, often captured in a different condition. We adapt at multiple semantic levels from feature to pixel, with complementary insights for each type. Using the proposed method, we achieve better recognition accuracy on car images in an unlabeled surveillance domain by adapting knowledge from car images on the web.

ICLR 2019 |  Unsupervised Domain Adaptation for Distance Metric Learning
Kihyuk Sohn, Wenling Shang, Xiang Yu, Manmohan Chandraker

We propose Feature Transfer Network, a novel deep neural network for image-based face verification and identification that can adapt to biases like ethnicity, gender or age in a target set. Unlike existing methods, our network can even handle novel identities existing in the target domain. Our framework excels at both within-domain and cross-domain utility tasks, thus retaining discriminatory power in the adaptation. 

ICLR 2019 | Learning to Simulate
Nataniel Ruiz, Samuel Schulter, Manmohan Chandraker

Simulation can be a useful tool when obtaining and annotating training data is costly. However, optimally tuning simulator parameters is itself a laborious task. We implement a meta-learning algorithm in which a reinforcement learning agent, as the meta-learner, automatically adjusts the parameters of a non-differentiable simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data.

ICML 2019 | Neural Collaborative Subspace Clustering
Tong Zhang, Pan Ji, Mehrtash Harandi, Wenbing Huang, Hongdong Li

We introduce Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces. In contrast to previous attempts, our model runs without the aid of spectral clustering, making it one of the few that can gracefully scale to large datasets. At its heart, our neural model benefits from a classifier which determines whether a pair of points lies on the same subspace or not. Essential to our model is the construction of two affinity matrices, one from the classifier and one based on a notion of subspace self-expressiveness, to supervise training in a collaborative scheme.

ICCV 2019 | GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition
Hao Zhou, Xiang Yu, David Jacobs

Traditional intrinsic image decomposition focuses on decomposing images into reflectance and shading, leaving surface normals and lighting entangled in shading. In this work, we propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting component, and jointly predict reflectance and surface normals. The global SH models the holistic lighting, while the local SHs account for the spatial variation of lighting. Also, a novel non-negative lighting constraint is proposed to encourage the estimated SH to be physically meaningful.

ICCV 2019 | Domain Adaptation for Structured Output via Discriminative Patch Representations
Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker

We tackle domain adaptive semantic segmentation by learning discriminative feature representations of patches in the source domain, discovering multiple modes of the patch-wise output distribution through the construction of a clustered space. With such guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches. We show that our framework is complementary to existing domain adaptation techniques.

PDF | Supplementary | Project Site | Dataset
IROS 2019 | Degeneracy in Self-Calibration Revisited and a Deep Learning Solution for Uncalibrated SLAM
Bingbing Zhuang, Quoc-Huy Tran, Gim Hee Lee, Loong Fah Cheong, Manmohan Chandraker

We first revisit the geometric approach to radial distortion self-calibration, and provide a proof that explicitly shows the ambiguity between radial distortion and scene depth under forward camera motion. In view of such geometric degeneracy and the prevalence of forward motion in practice, we further propose a learning approach that trains a convolutional neural network on a large amount of synthetic data to estimate the camera parameters, and show its application to SLAM without knowing camera parameters a priori.

PDF | Supplementary | Project Site
IROS 2019 | Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles
Siddharth Srivastava, Frederic Jurie, Gaurav Sharma

We address the problem of 3D object detection from 2D monocular images in autonomous driving scenarios. We lift the 2D images to 3D representations using learned neural networks and leverage existing networks working directly on 3D data to perform 3D object detection and localization. We show that, with a carefully designed training mechanism and automatically selected, minimally noisy data, such a method is not only feasible but achieves better results than many methods working on actual 3D inputs acquired from physical sensors.

ECCV 2018 | Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences
Mohammed E. Fathy, Quoc-Huy Tran, M. Zeeshan Zia, Paul Vernaza, Manmohan Chandraker

While a metric loss applied to the deepest layer of a CNN is expected to yield ideal features, the growing receptive field and striding effects cause shallower features to be better at high precision matching. We leverage this insight along with hierarchical supervision to learn more effective descriptors for geometric matching. We evaluate for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across multiple datasets. 

PDF | Project Site
ECCV 2018 | Learning to Look around Objects for Top-View Representations of Outdoor Scenes
Samuel Schulter, Menghua Zhai, Nathan Jacobs, Manmohan Chandraker

We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our model does not require costly or subjective human annotations for occluded areas or the top-view, but rather uses readily available annotations for standard semantic segmentation. 

ECCV 2018 | Zero-Shot Object Detection
Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, Ajay Divakaran

We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and propose two background-aware approaches for learning robust detectors. Finally, we propose novel splits of two standard detection datasets – MSCOCO and VisualGenome, and present extensive empirical results.

ECCV 2018 | R2P2: A Reparameterized Pushforward Policy for Diverse, Precise Generative Path Forecasting
Nicholas Rhinehart, Kris M. Kitani, Paul Vernaza

We propose a method to forecast a vehicle’s ego-motion as a distribution over spatiotemporal paths, conditioned on features embedded in an overhead map. The method learns a policy inducing a distribution over simulated trajectories that is both “diverse” (produces most paths likely under the data) and “precise” (mostly produces paths likely under the data). We achieve this balance through minimization of a symmetrized cross-entropy between the distribution and demonstration data. 

PDF | Supplementary
CVPR 2018 | Fast and Accurate Online Video Object Segmentation via Tracking Parts
Jingchun Cheng, Yi-Hsuan Tsai, Wei-Chih Hung, Shengjin Wang, Ming-Hsuan Yang

We propose a fast and accurate video object segmentation algorithm that can immediately start the segmentation process once receiving the images. We first utilize a part-based tracking method to deal with challenging factors such as large deformation, occlusion, and cluttered background. Second, we construct an efficient region-of-interest segmentation network to generate part masks, with a similarity-based scoring function to refine these object parts and generate final segmentation outputs. 

CVPR 2018 | Learning to Adapt Structured Output Space for Semantic Segmentation
Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker

We develop a semantic segmentation method for adapting source ground truth labels to the unseen target domain. To achieve it, we consider semantic segmentation as structured prediction with spatial similarities between the source and target domains, and then adopt multi-level adversarial learning in the output space. We show that our method can perform adaptation under various settings, including synthetic-to-real and cross-city scenarios. 

PDF | Supplementary
ACCV 2018 | Unseen Object Segmentation in Videos via Transferable Representations
Yi-Wen Chen, Yi-Hsuan Tsai, Chu-Ya Yang, Yen-Yu Lin, Ming-Hsuan Yang

We exploit existing annotations in source images and transfer such visual information to segment videos with unseen object categories. Without using any annotations in the target video, we propose a method to jointly mine useful segments and learn feature representations that better adapt to the target frames. The entire process is decomposed into two tasks: 1) solving a submodular function for selecting object-like segments, and 2) learning a CNN model with a transferable module for adapting seen categories in the source domain to the unseen target video.

ACCV 2018 | Scalable Deep k-Subspace Clustering
Tong Zhang, Pan Ji, Mehrtash Harandi, Richard Hartley, Ian Reid

We introduce a method that simultaneously learns an embedding space along with subspaces within it to minimize a notion of reconstruction error, thus addressing the problem of subspace clustering in an end-to-end learning paradigm. To achieve our goal, we propose a scheme to update subspaces within a deep neural network. This in turn frees us from the need for an affinity matrix to perform clustering. Unlike previous attempts, our method can easily scale up to large datasets, making it unique in the context of unsupervised learning with deep architectures.
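The subspace-assignment step at the heart of any k-subspace scheme can be sketched as below, assuming orthonormal subspace bases in NumPy; the paper performs this jointly with learning the embedding and refitting the subspaces, which is omitted here.

```python
import numpy as np

def assign_to_subspaces(X, bases):
    """Assign each point to the subspace with the smallest projection
    residual. bases is a list of (D, r) matrices with orthonormal
    columns; a k-subspace iteration would alternate this assignment
    step with refitting each subspace to its assigned points.
    """
    residuals = []
    for U in bases:
        proj = X @ U @ U.T                       # orthogonal projection onto span(U)
        residuals.append(np.linalg.norm(X - proj, axis=1))
    return np.argmin(np.stack(residuals, axis=1), axis=1)
```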

ICCV 2017 | SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, Ming-Hsuan Yang

We propose an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos. The proposed SegFlow has two branches where useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework. The unified framework can be trained iteratively offline to learn a generic notion, or fine-tuned online for specific objects. 

ICCV 2017 | Towards Large-Pose Face Frontalization in the Wild
Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker

Despite recent advances in deep face recognition, severe accuracy drops are observed under large pose variations. Learning pose-invariant features is feasible but needs expensively labeled data. In this work, we focus on frontalizing faces in the wild under various head poses. We propose a novel deep 3D Morphable Model (3DMM) conditioned Face Frontalization Generative Adversarial Network (GAN), termed as FF-GAN, to generate neutral head pose face images, showing photo-realistic visual effects. 

ICCV 2017 | Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker

Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition. To address this, we propose an image to video feature-level domain adaptation method to learn discriminative video frame representations. It is achieved by distilling knowledge from the network to a video adaptation network, performing feature restoration through synthetic data augmentation and learning a domain-invariant feature through a domain adversarial discriminator. Experiments on YouTube Faces and IJB-A demonstrate our method achieves state-of-the-art accuracy on video face recognition. 

ICCV 2017 | Reconstruction-Based Disentanglement for Pose-invariant Face Recognition
Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris N. Metaxas, Manmohan Chandraker

Generic data-driven deep face features might confound images of the same identity under large poses with other identities. We propose a feature reconstruction metric learning to disentangle identity and pose information in the latent feature space. The disentangled feature space encourages identity features of the same subject to be clustered together despite the pose variation. Experiments on both controlled and in-the-wild face datasets show that our method consistently outperforms the state-of-the-art, especially on images with large head pose variations.

ICCV 2017 | Scene Parsing with Global Context Embedding
Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu, Ming-Hsuan Yang

We present a scene parsing method that utilizes global context information based on both the parametric and non-parametric models. Compared to previous methods that only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts. We show that the proposed method can eliminate false positives that are not compatible with the global context representations. 

