MEDIA ANALYTICS
PUBLICATIONS
UniSeg: Learning Semantic Segmentation from Multiple Datasets with Label Shifts
With increasing applications of semantic segmentation, numerous datasets have been proposed in the past few years. Yet labeling remains expensive; it is therefore desirable to train models jointly across aggregated datasets to increase data volume and diversity. However, label spaces differ across datasets and may even conflict with one another.
Domain Adaptive Semantic Segmentation Using Weak Labels
We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained based on a model prediction for unsupervised domain adaptation (UDA), or from a human annotator in a new weakly supervised domain adaptation (WDA) paradigm for semantic segmentation.
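As a minimal sketch of how image-level weak labels could be derived from a model's pixel-wise predictions (the function name and the coverage threshold below are illustrative choices, not the paper's):

```python
import numpy as np

def weak_labels_from_prediction(seg_pred, num_classes, min_fraction=0.01):
    """Derive image-level weak labels from a predicted segmentation map.

    A class is marked present if it covers at least `min_fraction` of the
    image pixels (a hypothetical heuristic to suppress noisy predictions).
    """
    total = seg_pred.size
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        present[c] = (seg_pred == c).sum() / total >= min_fraction
    return present

# Toy 4x4 prediction over 3 classes: class 2 covers a single pixel (6.25%).
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 0, 1, 2]])
presence = weak_labels_from_prediction(pred, num_classes=3, min_fraction=0.10)
print(presence)
```

With the 10% threshold, classes 0 and 1 are marked present while the single-pixel class 2 is filtered out.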
Image Stitching and Rectification for Hand-Held Cameras
We derive a new differential homography that can account for the scanline-varying camera poses in rolling shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input: two consecutive frames from a video stream wherein the interframe motion is restricted from being arbitrarily large.
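For background, a minimal numpy sketch of applying an ordinary global 3x3 homography to 2D points; the paper's differential homography additionally varies per scanline to model rolling-shutter pose changes, which this sketch does not capture:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an Nx2 array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # normalize back

# A pure-translation homography shifting points by (2, 3).
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
warped = warp_points(H, np.array([[0.0, 0.0], [1.0, 1.0]]))
print(warped)
```

A scanline-varying model would make H a function of the row index, reflecting that each row of a rolling-shutter image is captured at a slightly different camera pose.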
Object Detection With a Unified Label Space From Multiple Datasets
Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces. The practical benefits of such an object detector are obvious and significant—application-relevant categories can be picked and merged from arbitrary existing datasets.
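A toy sketch of taking the union of per-dataset label spaces and keeping a local-to-unified index mapping per dataset. All names are illustrative; resolving conflicting or overlapping category definitions (e.g. "person" vs. "pedestrian") is the hard part the work addresses and is omitted here:

```python
def unify_label_spaces(label_spaces):
    """Merge per-dataset label lists into one union label space.

    Returns the unified ordered list and, per dataset, a mapping from
    local class index to unified index. Assumes identical strings mean
    identical categories; synonym resolution would need an alias table.
    """
    unified = []
    seen = {}
    mappings = []
    for space in label_spaces:
        mapping = {}
        for local_idx, name in enumerate(space):
            if name not in seen:
                seen[name] = len(unified)
                unified.append(name)
            mapping[local_idx] = seen[name]
        mappings.append(mapping)
    return unified, mappings

unified, maps = unify_label_spaces([["car", "person"], ["person", "bicycle"]])
print(unified)  # union preserves first-seen order
```

Each dataset's annotations can then be relabeled through its mapping so a single detector head predicts over the unified space.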
Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations
We address the challenge of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane.
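A minimal sketch of such a plane representation, assuming the standard plane equation n·x = d; the class layout, mask format, and method names below are illustrative, not the paper's data structures:

```python
import numpy as np

class Plane:
    """A plane n·x = d with a visible mask and a full (amodal) mask."""

    def __init__(self, normal, offset, visible_mask, full_mask):
        self.normal = np.asarray(normal, dtype=float)
        self.normal /= np.linalg.norm(self.normal)  # keep unit normal
        self.offset = float(offset)
        self.visible_mask = visible_mask
        self.full_mask = full_mask

    def occluded_mask(self):
        """Occluded region: inside the full extent but not visible."""
        return self.full_mask & ~self.visible_mask

    def distance(self, point):
        """Signed point-to-plane distance."""
        return float(self.normal @ np.asarray(point) - self.offset)

full = np.array([[1, 1], [1, 1]], dtype=bool)
vis = np.array([[1, 0], [1, 1]], dtype=bool)
p = Plane(normal=[0, 0, 1], offset=2.0, visible_mask=vis, full_mask=full)
print(p.occluded_mask().sum(), p.distance([0, 0, 5]))
```

The difference of the two masks directly exposes the occluded portion, which is what makes occlusion reasoning explicit in this representation.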
Unsupervised & Semi-Supervised Domain Adaptation for Action Recognition From Drones
We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both the existing, fully-annotated action-recognition datasets and unannotated (or only a few annotated) videos from drones.
Degeneracy in Self-Calibration Revisited & a Deep Learning Solution for Uncalibrated SLAM
We first revisit the geometric approach to radial distortion self-calibration and provide a proof that explicitly shows the ambiguity between radial distortion and scene depth under forward camera motion. In view of such geometric degeneracy and the prevalence of forward motion in practice, we further propose a learning approach that trains a convolutional neural network on a large amount of synthetic data to estimate the camera parameters, and show its application to SLAM without knowing the camera parameters a priori.
Domain Adaptation for Structured Output via Discriminative Patch Representations
We tackle domain adaptive semantic segmentation via learning discriminative feature representations of patches in the source domain by discovering multiple modes of patch-wise output distribution through the construction of a clustered space. With such guidance, we use an adversarial learning scheme to push the feature representations of target patches in the clustered space closer to the distributions of source patches.
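A toy sketch of one ingredient, grouping patch-wise label histograms into modes with a minimal k-means; the adversarial alignment step is omitted, and all names, data, and the clustering choice are illustrative rather than the paper's exact procedure:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means; returns cluster ids for rows of X (NxD)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

# Toy patch label histograms over 3 classes:
# two "road-like" patches and two "sky-like" patches.
hists = np.array([[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.0],
                  [0.0, 0.1, 0.9],
                  [0.1, 0.0, 0.9]])
modes = kmeans(hists, k=2)
print(modes)  # patches 0,1 share one mode; patches 2,3 the other
```

The resulting mode assignments form the clustered space; the method then uses adversarial learning to pull target-patch features toward these source modes.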
Deep Supervision With Intermediate Concepts
We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through our deep supervision. This allows training only from synthetic CAD renderings where concept values can be extracted, while achieving generalization to real images.
A Parametric Top-View Representation of Complex Road Scenes
We address the problem of inferring the layout of complex road scenes given a single camera as input. We first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making.
Structure-and-Motion-Aware Rolling Shutter Correction
We make a theoretical contribution by proving that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a convolutional neural network-based method which learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction.
Hierarchical Metric Learning & Matching for 2D & 3D Geometric Correspondences
While a metric loss applied to the deepest layer of a CNN is expected to yield ideal features, the growing receptive field and striding effects cause shallower features to be better at high-precision matching. We leverage this insight, along with hierarchical supervision, to learn more effective descriptors for geometric matching. We evaluate for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across multiple datasets.
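A minimal numpy sketch of hierarchical supervision as a weighted sum of per-layer metric losses; the specific contrastive form, toy descriptors, and weights below are illustrative assumptions, not the paper's exact losses:

```python
import numpy as np

def pairwise_metric_loss(fa, fb, match, margin=1.0):
    """Contrastive-style loss for one layer's descriptor pairs (NxD each)."""
    d = np.linalg.norm(fa - fb, axis=1)
    pos = d ** 2                             # pull matching pairs together
    neg = np.maximum(0.0, margin - d) ** 2   # push non-matches past the margin
    return np.where(match, pos, neg).mean()

def hierarchical_loss(features_a, features_b, match, weights):
    """Sum the metric loss over several CNN depths (hierarchical supervision)."""
    return sum(w * pairwise_metric_loss(fa, fb, match)
               for (fa, fb), w in zip(zip(features_a, features_b), weights))

# Two layers' descriptors for 2 candidate pairs (one match, one non-match).
fa = [np.array([[0.0, 0.0], [0.5, 0.0]]), np.array([[0.5, 0.5], [2.0, 0.0]])]
fb = [np.array([[0.0, 0.0], [0.0, 0.0]]), np.array([[0.5, 0.5], [0.0, 0.0]])]
match = np.array([True, False])
total = hierarchical_loss(fa, fb, match, weights=[0.5, 1.0])
print(total)
```

Supervising shallow and deep layers jointly reflects the observation above: deep features carry semantics while shallow, high-resolution features localize precisely.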
Deep Supervision With Shape Concepts for Occlusion-Aware 3D Object Parsing
We propose a deep CNN architecture to localize object semantic parts in 2D images and 3D space while inferring their visibility states given a single RGB image. We exploit domain knowledge to regularize the network by deeply supervising its hidden layers. In doing so, we infer a causal sequence of intermediate concepts.
Universal Correspondence Network
We present deep metric learning to obtain a feature space that preserves geometric or semantic similarity. Our visual correspondences span across rigid motions to intra-class shape or appearance variations. Our fully convolutional architecture, along with a novel correspondence contrastive loss, allows faster training by effective reuse of computations, accurate gradient computation and linear time testing instead of quadratic time for typical patch similarity methods.
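A minimal sketch of a correspondence contrastive loss over dense feature maps, sampled at pixel coordinates so that every pair reuses the same fully convolutional features; the function name, margin, and toy data are illustrative, not the paper's implementation:

```python
import numpy as np

def correspondence_contrastive_loss(feat_a, feat_b, coords_a, coords_b,
                                    positive, margin=1.0):
    """Contrastive loss on dense (H, W, D) feature maps.

    Descriptors are gathered at (x, y) pixel coordinates from maps computed
    in one forward pass, so sampled correspondences share computation.
    """
    fa = feat_a[coords_a[:, 1], coords_a[:, 0]]   # (N, D) descriptors
    fb = feat_b[coords_b[:, 1], coords_b[:, 0]]
    d = np.linalg.norm(fa - fb, axis=1)
    per_pair = np.where(positive, d ** 2, np.maximum(0.0, margin - d) ** 2)
    return per_pair.mean()

# Toy 2x2x1 feature maps and two sampled pairs (one positive, one negative).
fa = np.arange(4.0).reshape(2, 2, 1)
fb = fa.copy()
coords = np.array([[0, 0], [1, 1]])
loss = correspondence_contrastive_loss(fa, fb, coords, coords,
                                       positive=np.array([True, False]))
print(loss)
```

Because identical features give zero distance, the positive pair contributes nothing while the negative pair is penalized for sitting inside the margin.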