MEDIA ANALYTICS
PUBLICATIONS
UniSeg: Learning Semantic Segmentation from Multiple Datasets with Label Shifts
With increasing applications of semantic segmentation, numerous datasets have been proposed in the past few years. Yet labeling remains expensive; it is therefore desirable to train models jointly across aggregated datasets to increase data volume and diversity. However, label spaces differ across datasets and may even conflict with one another.
Domain Adaptive Semantic Segmentation Using Weak Labels
We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained based on a model prediction for unsupervised domain adaptation (UDA), or from a human annotator in a new weakly supervised domain adaptation (WDA) paradigm for semantic segmentation.
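As a minimal sketch of how image-level weak labels could be derived from a model's pixel-wise predictions (the function name and the coverage threshold below are illustrative choices, not the paper's):

```python
import numpy as np

def weak_labels_from_prediction(seg_pred, num_classes, min_fraction=0.01):
    """Derive image-level weak labels from a predicted segmentation map.

    A class is marked present if it covers at least `min_fraction` of the
    image pixels (a hypothetical heuristic to suppress noisy predictions).
    """
    total = seg_pred.size
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        present[c] = (seg_pred == c).sum() / total >= min_fraction
    return present

# Toy 4x4 prediction over 3 classes: class 2 covers a single pixel (6.25%).
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 2]])
presence = weak_labels_from_prediction(pred, num_classes=3, min_fraction=0.10)
print(presence)
```

With the 10% threshold, classes 0 and 1 are marked present while the single-pixel class 2 is filtered out.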
Image Stitching and Rectification for Hand-Held Cameras
We derive a new differential homography that can account for the scanline-varying camera poses in rolling shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input: two consecutive frames from a video stream wherein the interframe motion is restricted from being arbitrarily large.
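For background, a minimal numpy sketch of applying an ordinary global 3x3 homography to 2D points; the paper's differential homography additionally varies per scanline to model rolling-shutter pose changes, which this sketch does not capture:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an Nx2 array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # normalize back

# A pure-translation homography shifting points by (2, 3).
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
warped = warp_points(H, np.array([[0.0, 0.0], [1.0, 1.0]]))
print(warped)
```

A scanline-varying model would make H a function of the row index, reflecting that each row of a rolling-shutter image is captured at a slightly different camera pose.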
Object Detection With a Unified Label Space From Multiple Datasets
Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant—application-relevant categories can be picked and merged from arbitrary existing datasets.
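A toy sketch of taking the union of per-dataset label spaces and keeping a local-to-unified index mapping per dataset. All names are illustrative; resolving conflicting or overlapping category definitions (e.g. "person" vs. "pedestrian") is the hard part the work addresses and is omitted here:

```python
def unify_label_spaces(label_spaces):
    """Merge per-dataset label lists into one union label space.

    Returns the unified ordered list and, per dataset, a mapping from
    local class index to unified index. Assumes identical strings mean
    identical categories; synonym resolution would need an alias table.
    """
    unified = []
    seen = {}
    mappings = []
    for space in label_spaces:
        mapping = {}
        for local_idx, name in enumerate(space):
            if name not in seen:
                seen[name] = len(unified)
                unified.append(name)
            mapping[local_idx] = seen[name]
        mappings.append(mapping)
    return unified, mappings

unified, maps = unify_label_spaces([["car", "person"], ["person", "bicycle"]])
print(unified)  # union preserves first-seen order
```

Each dataset's annotations can then be relabeled through its mapping so a single detector head predicts over the unified space.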
Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations
We address the challenge of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane.
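A minimal sketch of such a plane representation, assuming the standard plane equation n·x = d; the class layout, mask format, and method names below are illustrative, not the paper's data structures:

```python
import numpy as np

class Plane:
    """A plane n·x = d with a visible mask and a full (amodal) mask."""

    def __init__(self, normal, offset, visible_mask, full_mask):
        self.normal = np.asarray(normal, dtype=float)
        self.normal /= np.linalg.norm(self.normal)  # keep unit normal
        self.offset = float(offset)
        self.visible_mask = visible_mask
        self.full_mask = full_mask

    def occluded_mask(self):
        """Occluded region: inside the full extent but not visible."""
        return self.full_mask & ~self.visible_mask

    def distance(self, point):
        """Signed point-to-plane distance."""
        return float(self.normal @ np.asarray(point) - self.offset)

full = np.array([[1, 1], [1, 1]], dtype=bool)
vis = np.array([[1, 0], [1, 1]], dtype=bool)
p = Plane(normal=[0, 0, 1], offset=2.0, visible_mask=vis, full_mask=full)
print(p.occluded_mask().sum(), p.distance([0, 0, 5]))
```

The difference of the two masks directly exposes the occluded portion, which is what makes occlusion reasoning explicit in this representation.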
Unsupervised & Semi-Supervised Domain Adaptation for Action Recognition From Drones
We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing, fully-annotated action-recognition datasets and unannotated (or only a few annotated) videos from drones.
Degeneracy in Self-Calibration Revisited & a Deep Learning Solution for Uncalibrated SLAM
We first revisit the geometric approach to radial distortion self-calibration and provide a proof that explicitly shows the ambiguity between radial distortion and scene depth under forward camera motion. In view of such geometric degeneracy and the prevalence of forward motion in practice, we further propose a learning approach that trains a convolutional neural network on a large amount of synthetic data to estimate the camera parameters, and show its application to SLAM without knowing the camera parameters a priori.
Domain Adaptation for Structured Output via Discriminative Patch Representations
We tackle domain adaptive semantic segmentation via learning discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output distribution through the construction of a clustered space. With such guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches.
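A toy sketch of one ingredient, grouping patch-wise label histograms into modes with a minimal k-means; the adversarial alignment step is omitted, and all names, data, and the clustering choice are illustrative rather than the paper's exact procedure:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means; returns cluster ids for rows of X (NxD)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

# Toy patch label histograms over 3 classes:
# two "road-like" patches and two "sky-like" patches.
hists = np.array([[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.0],
                  [0.0, 0.1, 0.9],
                  [0.1, 0.0, 0.9]])
modes = kmeans(hists, k=2)
print(modes)  # patches 0,1 share one mode; patches 2,3 the other
```

The resulting mode assignments form the clustered space; the method then uses adversarial learning to pull target-patch features toward these source modes.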
Deep Supervision With Intermediate Concepts
We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through our deep supervision. This allows training only from synthetic CAD renderings where concept values can be extracted, while achieving generalization to real images.
A Parametric Top-View Representation of Complex Road Scenes
We address the problem of inferring the layout of complex road scenes given a single camera as input. We first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making.
Structure-and-Motion-Aware Rolling Shutter Correction
We make a theoretical contribution by proving that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a convolutional neural network-based method which learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction.
Hierarchical Metric Learning & Matching for 2D & 3D Geometric Correspondences
While a metric loss applied to the deepest layer of a CNN is expected to yield ideal features, the growing receptive field and striding effects cause shallower features to be better at high-precision matching. We leverage this insight, along with hierarchical supervision, to learn more effective descriptors for geometric matching. We evaluate for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across multiple datasets.
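A minimal numpy sketch of hierarchical supervision as a weighted sum of per-layer metric losses; the specific contrastive form, toy descriptors, and weights below are illustrative assumptions, not the paper's exact losses:

```python
import numpy as np

def pairwise_metric_loss(fa, fb, match, margin=1.0):
    """Contrastive-style loss for one layer's descriptor pairs (NxD each)."""
    d = np.linalg.norm(fa - fb, axis=1)
    pos = d ** 2                             # pull matching pairs together
    neg = np.maximum(0.0, margin - d) ** 2   # push non-matches past the margin
    return np.where(match, pos, neg).mean()

def hierarchical_loss(features_a, features_b, match, weights):
    """Sum the metric loss over several CNN depths (hierarchical supervision)."""
    return sum(w * pairwise_metric_loss(fa, fb, match)
               for (fa, fb), w in zip(zip(features_a, features_b), weights))

# Two layers' descriptors for 2 candidate pairs (one match, one non-match).
fa = [np.array([[0.0, 0.0], [0.5, 0.0]]), np.array([[0.5, 0.5], [2.0, 0.0]])]
fb = [np.array([[0.0, 0.0], [0.0, 0.0]]), np.array([[0.5, 0.5], [0.0, 0.0]])]
match = np.array([True, False])
total = hierarchical_loss(fa, fb, match, weights=[0.5, 1.0])
print(total)
```

Supervising shallow and deep layers jointly reflects the observation above: deep features carry semantics while shallow, high-resolution features localize precisely.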
Deep Supervision With Shape Concepts for Occlusion-Aware 3D Object Parsing
We propose a deep CNN architecture to localize object semantic parts in 2D images and 3D space while inferring their visibility states given a single RGB image. We exploit domain knowledge to regularize the network by deeply supervising its hidden layers. In doing so, we infer a causal sequence of intermediate concepts.
Universal Correspondence Network
We present deep metric learning to obtain a feature space that preserves geometric or semantic similarity. Our visual correspondences span across rigid motions to intra-class shape or appearance variations. Our fully convolutional architecture, along with a novel correspondence contrastive loss, allows faster training by effective reuse of computations, accurate gradient computation and linear time testing instead of quadratic time for typical patch similarity methods.
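A minimal sketch of a correspondence contrastive loss over dense feature maps, sampled at pixel coordinates so that every pair reuses the same fully convolutional features; the function name, margin, and toy data are illustrative, not the paper's implementation:

```python
import numpy as np

def correspondence_contrastive_loss(feat_a, feat_b, coords_a, coords_b,
                                    positive, margin=1.0):
    """Contrastive loss on dense (H, W, D) feature maps.

    Descriptors are gathered at (x, y) pixel coordinates from maps computed
    in one forward pass, so sampled correspondences share computation.
    """
    fa = feat_a[coords_a[:, 1], coords_a[:, 0]]   # (N, D) descriptors
    fb = feat_b[coords_b[:, 1], coords_b[:, 0]]
    d = np.linalg.norm(fa - fb, axis=1)
    per_pair = np.where(positive, d ** 2, np.maximum(0.0, margin - d) ** 2)
    return per_pair.mean()

# Toy 2x2x1 feature maps and two sampled pairs (one positive, one negative).
fa = np.arange(4.0).reshape(2, 2, 1)
fb = fa.copy()
coords = np.array([[0, 0], [1, 1]])
loss = correspondence_contrastive_loss(fa, fb, coords, coords,
                                       positive=np.array([True, False]))
print(loss)
```

Because identical features give zero distance, the positive pair contributes nothing while the negative pair is penalized for sitting inside the margin.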