MEDIA ANALYTICS
PUBLICATIONS
UniSeg: Learning Semantic Segmentation from Multiple Datasets with Label Shifts
With the increasing range of applications for semantic segmentation, numerous datasets have been proposed in the past few years. Yet labeling remains expensive; it is therefore desirable to train models jointly across aggregated datasets to increase data volume and diversity. However, label spaces differ across datasets and may even conflict with one another.
Object Detection With a Unified Label Space From Multiple Datasets
ECCV 2020 | Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant—application-relevant categories can be picked and merged from arbitrary existing datasets.
Towards Universal Representation Learning for Deep Face Recognition
CVPR 2020 | Traditional recognition models require target-domain data to adapt from high-quality training data to unconstrained, low-quality face recognition, and a model ensemble is further needed to obtain a universal representation, which significantly increases model complexity. In contrast, this work aims to learn a single universal face representation without such target-domain adaptation or ensembles.
Unsupervised & Semi-Supervised Domain Adaptation for Action Recognition From Drones
WACV 2020 | We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing, fully-annotated action-recognition datasets and unannotated (or only a few annotated) videos from drones.
Adversarial Learning of Privacy-Preserving & Task-Oriented Representations
AAAI 2020 | Our aim is to learn privacy-preserving and task-oriented representations that defend against model inversion attacks. To achieve this, we propose an adversarial reconstruction-based framework for learning latent representations that cannot be decoded to recover the original input images.
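In spirit, training alternates between a simulated attacker that tries to reconstruct the input from the latent code and an encoder that keeps task accuracy high while degrading that reconstruction. Below is a minimal PyTorch-style sketch of this alternating game; the module names, optimizers, and the single weight lam are illustrative assumptions rather than the paper's exact objective.

    import torch.nn.functional as F

    def adversary_step(encoder, decoder, x, dec_opt):
        # The simulated attacker: reconstruct the input from a frozen encoder's code.
        recon = decoder(encoder(x).detach())
        loss = F.mse_loss(recon, x)
        dec_opt.zero_grad(); loss.backward(); dec_opt.step()

    def encoder_step(encoder, decoder, classifier, x, y, enc_opt, lam=1.0):
        # The defender: keep the task loss low while making reconstruction hard.
        # enc_opt is assumed to hold only the encoder and classifier parameters.
        z = encoder(x)
        task_loss = F.cross_entropy(classifier(z), y)
        recon_loss = F.mse_loss(decoder(z), x)   # encoder wants this to be large
        loss = task_loss - lam * recon_loss
        enc_opt.zero_grad(); loss.backward(); enc_opt.step()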
Domain Adaptation for Structured Output via Discriminative Patch Representations
PAMI 2019 | We tackle domain adaptive semantic segmentation via learning discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output distribution through the construction of a clustered space. With such guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches.
A Parametric Top-View Representation of Complex Road Scenes
CVPR 2019 | We address the problem of inferring the layout of complex road scenes given a single camera as input. We first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making.
Feature Transfer Learning for Face Recognition With Under-Represented Data
CVPR 2019 | Training with under-represented data leads to biased classifiers in conventionally trained deep networks. We propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
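The core idea can be pictured as re-applying the intra-class variation of a data-rich class around the center of an under-represented one. A rough NumPy sketch under that reading is below; the paper realizes the transfer with learned encoder/decoder and filtering stages rather than this literal arithmetic.

    import numpy as np

    def transfer_features(rich_feats, rich_center, ur_center):
        # rich_feats: (n, d) features of a class with many samples; centers: (d,).
        variation = rich_feats - rich_center   # intra-class variation of the rich class
        return ur_center + variation           # synthetic features for the under-represented class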
Gotta Adapt ’Em All: Joint Pixel & Feature-Level Domain Adaptation for Recognition in the Wild
CVPR 2019 | We provide a solution that allows knowledge transfer from fully annotated source images to unlabeled target images, which are often captured under different conditions. We adapt at multiple semantic levels, from feature to pixel, with complementary insights for each type. Using the proposed method, we achieve better recognition accuracy on car images from an unlabeled surveillance domain by adapting knowledge from car images on the web.
Neural Collaborative Subspace Clustering
ICML 2019 | We introduce Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces. In contrast to previous models, ours runs without the aid of spectral clustering, which enables our algorithm to scale gracefully to large datasets.
Zero-Shot Object Detection
ECCV 2018 | We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior works on zero-shot classification. We present a principled approach by first adapting visual-semantic embeddings for ZSD.
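At its core, embedding-based zero-shot detection scores each region against class word vectors instead of a fixed classifier head, so an unseen class only needs an embedding at test time. A minimal sketch of that scoring step, assuming a learned linear projection proj and pre-computed class embeddings:

    import torch.nn.functional as F

    def zsd_scores(region_feats, class_embeddings, proj):
        # region_feats: (R, d_visual); class_embeddings: (C, d_text); proj: visual -> text space.
        v = F.normalize(proj(region_feats), dim=1)
        t = F.normalize(class_embeddings, dim=1)
        return v @ t.t()   # cosine similarity of each region with each (seen or unseen) class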
Learning Efficient Object-Detection Models With Knowledge Distillation
NeurIPS 2017 | Deep object detectors require prohibitive runtimes to process an image for real-time applications. Model compression can learn compact models with fewer parameters, but accuracy is significantly degraded. In this work, we propose a new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation and hint learning.
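Concretely, distillation adds two terms on top of the standard detection losses: a softened distillation loss on classification logits and a "hint" loss that matches intermediate features through a small adapter layer. The sketch below shows only these two extra terms; the temperature, adapter, and loss weights are assumptions, and the paper's full objective also covers bounding-box regression and class imbalance.

    import torch.nn.functional as F

    def detection_kd_terms(student_logits, teacher_logits,
                           student_feat, teacher_feat, adapter, T=2.0):
        # Soft-label distillation on classification logits (temperature-scaled).
        soft_t = F.softmax(teacher_logits / T, dim=1)
        log_s = F.log_softmax(student_logits / T, dim=1)
        distill = F.kl_div(log_s, soft_t, reduction="batchmean") * (T * T)
        # Hint learning: match an intermediate student feature (through an
        # assumed 1x1-conv/linear adapter) to the teacher's feature map.
        hint = F.mse_loss(adapter(student_feat), teacher_feat)
        return distill, hint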
Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
ICCV 2017 | Despite rapid advances in face recognition, there remains a clear gap between the performance of still-image-based face recognition and video-based face recognition. To address this, we propose an image-to-video feature-level domain adaptation method to learn discriminative video-frame representations.
Towards Large-Pose Face Frontalization in the Wild
ICCV 2017 | Despite recent advances in deep face recognition, severe accuracy drops are observed under large pose variations. Learning pose-invariant features is feasible but needs expensively labeled data. In this work, we focus on frontalizing faces in the wild under various head poses.
Scene Parsing With Global Context Embedding
ICCV 2017 | We present a scene-parsing method that utilizes global context information based on both parametric and non-parametric models. Compared to previous methods which only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts.
Learning Random-Walk Label Propagation for Weakly-Supervised Semantic Segmentation
CVPR 2017 | Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data. Given cheaply obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings using random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss.
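The propagation step itself is the classic absorbing-random-walk computation: seed pixels are absorbing states, and every unlabeled node receives the probability of first hitting a seed of each class. A small NumPy sketch of that computation alone, without the learned affinities or the differentiable training loop described in the paper:

    import numpy as np

    def propagate_labels(W, seed_idx, seed_labels, num_classes):
        # W: (n, n) nonnegative affinity matrix; seed_idx: indices of labeled nodes;
        # seed_labels: their class ids. Returns (n, num_classes) soft label guesses.
        n = W.shape[0]
        P = W / W.sum(axis=1, keepdims=True)              # row-stochastic transition matrix
        unlabeled = np.setdiff1d(np.arange(n), seed_idx)
        Y = np.zeros((len(seed_idx), num_classes))
        Y[np.arange(len(seed_idx)), seed_labels] = 1.0    # one-hot seed labels
        # Hitting probabilities: H_U = (I - P_UU)^(-1) P_UL Y_L
        P_uu = P[np.ix_(unlabeled, unlabeled)]
        P_ul = P[np.ix_(unlabeled, seed_idx)]
        H_u = np.linalg.solve(np.eye(len(unlabeled)) - P_uu, P_ul @ Y)
        out = np.zeros((n, num_classes))
        out[seed_idx] = Y
        out[unlabeled] = H_u
        return out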
Improved Deep Metric Learning With Multi-Class N-Pair Loss Objective
NeurIPS 2016 | We address the unsatisfactory convergence of deep metric learning by proposing the multi-class N-pair loss. Unlike many objective functions that ignore the relationships among samples, the N-pair loss exploits the full interaction of examples from different classes within a batch.
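For an N-class batch with one (anchor, positive) pair per class, the multi-class N-pair loss reduces to a softmax cross-entropy over the anchor-positive similarity matrix. A minimal NumPy sketch of that computation, leaving out the feature regularization used in practice:

    import numpy as np

    def n_pair_loss(anchors, positives):
        # anchors, positives: (N, d) embeddings, one (anchor, positive) pair per class.
        logits = anchors @ positives.T                        # (N, N) similarity matrix
        # loss_i = log(1 + sum_{j != i} exp(f_i.f_j+ - f_i.f_i+)), i.e. softmax
        # cross-entropy with target class i on row i.
        logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))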
A 4D Light-Field Dataset & CNN Architectures for Material Recognition
ECCV 2016 | We introduce a new light-field dataset of materials and take advantage of the recent success of deep learning to perform material recognition on the 4D light field. Our dataset contains 12 material categories, each with 100 images taken with a Lytro Illum, from which we extract about 30,000 patches in total.
SVBRDF-Invariant Shape & Reflectance Estimation From Light-Field Cameras
CVPR 2016 | We derive a spatially-varying (SV) BRDF-invariant theory for recovering 3D shape and reflectance from light-field cameras. Our key theoretical insight is a novel analysis of diffuse plus single-lobe SVBRDFs under a light-field setup. We show that although direct shape recovery is not possible, an equation relating depths and normals can still be derived.
WarpNet: Weakly Supervised Matching for Single-View Reconstruction
CVPR 2016 | Our WarpNet matches images of objects in fine-grained datasets without using part annotations. It aligns an object in one image with a different object in another by exploiting a fine-grained dataset to create artificial data for training a Siamese network with an unsupervised discriminative learning approach.
Embedding Label Structures for Fine-Grained Feature Representation
CVPR 2016 | We model the multi-level relevance among fine-grained classes for fine-grained categorization. We jointly optimize classification and similarity constraints in a proposed multi-task learning framework, and we embed label structures such as hierarchy or shared attributes into the framework by generalizing the triplet loss.
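One way to read the generalized triplet constraint is: a sample should be closer to an example of its own fine-grained class than to an example that only shares the coarse class, which in turn should be closer than an unrelated negative. The PyTorch sketch below encodes that ordering with two margins; the specific margins and distance are illustrative assumptions, not the paper's exact formulation.

    import torch.nn.functional as F

    def hierarchical_triplet_loss(a, p_fine, p_coarse, n, m1=0.1, m2=0.2):
        # a, p_fine, p_coarse, n: (B, d) embeddings of the anchor, a same-fine-class
        # sample, a same-coarse-class sample, and an unrelated sample; m1, m2: margins.
        d = lambda x, y: (x - y).pow(2).sum(dim=1)            # squared Euclidean distance
        l_fine = F.relu(d(a, p_fine) - d(a, p_coarse) + m1)   # fine closer than coarse
        l_coarse = F.relu(d(a, p_coarse) - d(a, n) + m2)      # coarse closer than negative
        return (l_fine + l_coarse).mean()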
Fine-Grained Image Classification by Exploring Bipartite-Graph Labels
CVPR 2016 | We exploit the rich relationships among fine-grained classes for fine-grained image classification. We model the relations using the proposed bipartite-graph labels (BGL) and incorporate them into CNN training. Our system is computationally efficient in inference thanks to the bipartite structure.
Exploit All the Layers: Fast & Accurate CNN Object Detector With Scale-Dependent Pooling & Cascaded Rejection Classifiers
CVPR 2016 | We propose two strategies for fast and accurate CNN-based object detection: scale-dependent pooling (SDP), which classifies each object proposal using convolutional features from the layer best matched to the proposal's scale, and cascaded rejection classifiers (CRC), which reuse convolutional features to discard easy negative proposals early and thereby speed up detection.
Understanding & Improving Convolutional Neural Networks via Concatenated Rectified Linear Units
ICML 2016 | We observe that filters in the lower layers of CNNs tend to form pairs with opposite phases. Motivated by this, we propose Concatenated ReLU (CReLU), an activation scheme that preserves both positive and negative phase information while reducing redundancy, improving recognition accuracy with fewer trainable parameters.
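The activation itself is simple: both phases of the pre-activation are kept by concatenating the ReLU of x and of -x along the channel dimension, doubling the number of output channels. A minimal PyTorch sketch:

    import torch
    import torch.nn as nn

    class CReLU(nn.Module):
        # Concatenated ReLU: preserves positive and negative phase information.
        # Note that the output has twice as many channels as the input.
        def forward(self, x):
            return torch.cat([torch.relu(x), torch.relu(-x)], dim=1)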
Improving Face Recognition by Clustering Unlabeled Faces in the Wild
ECCV 2020 | We propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and it greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty.
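In effect, each pseudo-labeled sample contributes to a cosine-softmax loss in proportion to how certain its cluster assignment is. A simplified PyTorch sketch of such a weighted loss is below; the scale factor, prototype parameterization, and the certainty weights w are assumptions standing in for the paper's exact modulation.

    import torch.nn.functional as F

    def modulated_cosine_loss(feats, prototypes, pseudo_labels, w, s=30.0):
        # feats: (B, d) embeddings; prototypes: (C, d) class weights;
        # pseudo_labels: (B,) cluster assignments; w: (B,) certainty weights in [0, 1].
        logits = s * F.normalize(feats, dim=1) @ F.normalize(prototypes, dim=1).t()
        per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
        return (w * per_sample).mean()   # uncertain pseudo-labels contribute less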
Atomic Scenes for Scalable Traffic Scene Recognition in Monocular Videos
WACV 2016 | We propose a novel framework for monocular traffic scene recognition that relies on a decomposition into high-order and atomic scenes. High-order scenes carry semantic meaning useful for AWS applications, while atomic scenes are easy to learn and represent elemental behaviors based on the 3D localization of individual traffic participants.