MEDIA ANALYTICS
PUBLICATIONS
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
CVPR 2021 | Our work addresses two key challenges in trajectory prediction: (i) learning multimodal outputs and (ii) improving predictions by imposing constraints using driving knowledge. Recent methods have achieved strong performance using multi-choice learning objectives such as winner-takes-all (WTA), but they remain highly dependent on their initialization to produce diverse outputs. Our first contribution proposes a novel divide-and-conquer (DAC) approach.
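To make the winner-takes-all objective mentioned above concrete, here is a minimal NumPy sketch (the hypothesis count, the squared-error metric, and the toy data are assumptions, not the paper's implementation): only the hypothesis closest to the ground truth is selected, so in multi-choice learning only that winner receives gradient, which is why the result depends so strongly on initialization.

```python
import numpy as np

def wta_loss(hypotheses, ground_truth):
    """Winner-takes-all loss over K trajectory hypotheses.

    hypotheses:   (K, T, 2) array of K predicted trajectories of length T
    ground_truth: (T, 2) array with the observed future trajectory
    Returns the loss of the best hypothesis and its index; in multi-choice
    learning only this "winner" would be updated, so poorly initialized
    hypotheses may never receive any gradient.
    """
    errors = np.mean(np.sum((hypotheses - ground_truth) ** 2, axis=-1), axis=-1)  # (K,)
    winner = int(np.argmin(errors))
    return errors[winner], winner

# Toy example: 3 hypotheses for a 4-step future in 2D.
gt = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)
hyps = np.stack([gt + 0.1, gt + 1.0, gt[:, ::-1]])
loss, k = wta_loss(hyps, gt)
print(f"winner hypothesis {k} with loss {loss:.3f}")
```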
Adaptation Across Extreme Variations using Unlabeled Bridges
BMVC 2020 | We tackle an unsupervised domain adaptation problem in which the domain discrepancy between the labeled source and unlabeled target domains is large, owing to many factors of inter- and intra-domain variation. We propose decomposing the domain discrepancy into multiple smaller discrepancies by introducing unlabeled bridging domains that connect the source and target domains; each smaller discrepancy is then easier to minimize.
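A rough structural sketch of the bridging idea, under clearly stated assumptions: `adapt_unsupervised` is a hypothetical stand-in for any off-the-shelf unsupervised adaptation step, and the bridges are assumed to be ordered from source-like to target-like. The point is only that each hop has to close a small discrepancy rather than the full source-to-target gap at once.

```python
def progressive_adaptation(model, source_labeled, bridges, target_unlabeled, adapt_unsupervised):
    """Adapt `model` from the source to the target via intermediate bridging domains.

    source_labeled:   labeled source dataset
    bridges:          list of unlabeled bridging datasets, ordered from
                      source-like to target-like
    target_unlabeled: unlabeled target dataset
    adapt_unsupervised(model, current, next_domain): hypothetical helper that
                      performs one unsupervised adaptation step and returns
                      the adapted model
    """
    current = source_labeled
    for next_domain in bridges + [target_unlabeled]:
        # Each hop only closes a small discrepancy, which is easier to
        # minimize than the full source-to-target gap in one step.
        model = adapt_unsupervised(model, current, next_domain)
        current = next_domain
    return model
```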
SMART: Simultaneous Multi-Agent Recurrent Trajectory Prediction
ECCV 2020 | We propose advances that address two key challenges in future trajectory prediction: (i) multimodality in both training data and predictions and (ii) constant-time inference regardless of the number of agents. Existing trajectory prediction methods are fundamentally limited by the lack of diversity in training data, which is difficult to acquire with sufficient coverage of possible modes.
Image Stitching and Rectification for Hand-Held Cameras
ECCV 2020 | We derive a new differential homography that can account for the scanline-varying camera poses in rolling shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in this paper on a special yet common input: two consecutive frames from a video stream wherein the interframe motion is restricted from being arbitrarily large.
Understanding Road Layout From Videos as a Whole
CVPR 2020 | We address the problem of inferring the layout of complex road scenes from video sequences. To this end, we formulate it as a top-view road attributes prediction problem, and our goal is to predict these attributes for each frame both accurately and consistently.
Peek-a-Boo: Occlusion Reasoning in Indoor Scenes With Plane Representations
CVPR 2020 | We address the challenge of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts of the plane.
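A minimal sketch of such a plane-based representation (the field names, the distance helper, and the toy masks below are illustrative assumptions rather than the paper's code): each plane carries a normal, an offset, a visible-region mask, and a full (visible plus occluded) mask, from which the occluded part follows directly.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ScenePlane:
    normal: np.ndarray        # unit normal n, shape (3,); points X on the plane satisfy n.X + d = 0
    offset: float             # plane offset d
    visible_mask: np.ndarray  # (H, W) boolean mask of the visible region
    full_mask: np.ndarray     # (H, W) boolean mask of the visible + occluded region

    def occluded_mask(self) -> np.ndarray:
        """Pixels that belong to the plane but are hidden by other geometry."""
        return self.full_mask & ~self.visible_mask

    def point_distance(self, points: np.ndarray) -> np.ndarray:
        """Signed distance of 3D points (N, 3) to the plane."""
        return points @ self.normal + self.offset

# Toy usage on a 2x2 image.
plane = ScenePlane(
    normal=np.array([0.0, 0.0, 1.0]),
    offset=-2.0,
    visible_mask=np.array([[True, False], [False, False]]),
    full_mask=np.array([[True, True], [False, False]]),
)
print(plane.occluded_mask())
print(plane.point_distance(np.array([[0.0, 0.0, 2.0]])))  # point lies on the plane -> 0
```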
Degeneracy in Self-Calibration Revisited & a Deep Learning Solution for Uncalibrated SLAM
IROS 2019 | We first revisit the geometric approach to radial distortion self-calibration and provide a proof that explicitly shows the ambiguity between radial distortion and scene depth under forward camera motion. In view of such geometric degeneracy and the prevalence of forward motion in practice, we further propose a learning approach that trains a convolutional neural network on a large amount of synthetic data to estimate the camera parameters, and show its application to SLAM without prior knowledge of the camera parameters.
Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles
IROS 2019 | We address the problem of 3D object detection from 2D monocular images in autonomous driving scenarios. We lift the 2D images to 3D representations using learned neural networks and leverage existing networks working directly on 3D data to perform 3D object detection and localization.
GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition
ICCV 2019 | Traditional intrinsic image decomposition focuses on decomposing images into reflectance and shading, leaving surface normals and lighting entangled in shading. In this work, we propose a global-local spherical harmonics (GLoSH) lighting model to improve the lighting component and jointly predict reflectance and surface normals.
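As a concrete illustration of shading with spherical-harmonics lighting, here is a standard second-order SH evaluation in NumPy (the coefficient ordering and the toy lighting values are assumptions, and this is the generic SH shading formula rather than the GLoSH model itself): shading at a pixel is the dot product between nine SH lighting coefficients and the SH basis evaluated at the surface normal; a global-local model would combine scene-wide coefficients with per-region corrections before this step.

```python
import numpy as np

def sh_basis(normal):
    """Real spherical-harmonics basis up to order 2, evaluated at a unit normal."""
    x, y, z = normal
    return np.array([
        0.282095,                     # Y_00
        0.488603 * y,                 # Y_1-1
        0.488603 * z,                 # Y_10
        0.488603 * x,                 # Y_11
        1.092548 * x * y,             # Y_2-2
        1.092548 * y * z,             # Y_2-1
        0.315392 * (3 * z**2 - 1),    # Y_20
        1.092548 * x * z,             # Y_21
        0.546274 * (x**2 - y**2),     # Y_22
    ])

def sh_shading(coeffs, normal):
    """Shading for one pixel: dot product of 9 SH lighting coefficients and the basis."""
    return float(coeffs @ sh_basis(normal))

# Toy usage: ambient plus a vertical directional component, normal facing up.
coeffs = np.zeros(9)
coeffs[0] = 1.0   # ambient term
coeffs[2] = 0.5   # vertical directional component
print(sh_shading(coeffs, np.array([0.0, 0.0, 1.0])))
```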
Deep Supervision With Intermediate Concepts
PAMI 2019 | We propose an approach for injecting prior domain structure into CNN training by supervising hidden layers with intermediate concepts. We formulate a probabilistic framework that predicts improved generalization through our deep supervision. This allows training only from synthetic CAD renderings where concept values can be extracted, while achieving generalization to real images.
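A hedged PyTorch sketch of deep supervision with intermediate concepts (the toy network, the choice of supervised layers, the concept heads, and the loss weights are all placeholders for illustration): hidden layers get auxiliary heads whose losses are added to the final task loss, so supervision can be attached wherever concept labels, e.g. from synthetic CAD renderings, are available.

```python
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    """Toy CNN whose hidden layers are supervised with intermediate concepts."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.head_concept1 = nn.Conv2d(16, 1, 1)   # supervises an early concept
        self.head_concept2 = nn.Conv2d(32, 1, 1)   # supervises a later concept
        self.head_task = nn.Conv2d(32, 1, 1)       # final task prediction

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.head_concept1(h1), self.head_concept2(h2), self.head_task(h2)

def deep_supervision_loss(outputs, targets, weights=(0.5, 0.5, 1.0)):
    """Weighted sum of the intermediate-concept losses and the final task loss."""
    return sum(w * nn.functional.mse_loss(o, t)
               for w, o, t in zip(weights, outputs, targets))

# Toy usage with synthetic tensors standing in for rendered concept supervision.
x = torch.randn(2, 3, 32, 32)
targets = [torch.randn(2, 1, 32, 32) for _ in range(3)]
model = DeeplySupervisedNet()
loss = deep_supervision_loss(model(x), targets)
loss.backward()
```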
A Parametric Top-View Representation of Complex Road Scenes
CVPR 2019 | We address the problem of inferring the layout of complex road scenes given a single camera as input. We first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making.
Structure-and-Motion-Aware Rolling Shutter Correction
CVPR 2019 | In this paper, we first make a theoretical contribution by proving that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a convolutional neural network-based method which learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction.
Learning to Simulate
ICLR 2019 | Simulation can be a useful tool when obtaining and annotating training data is costly. However, optimally tuning simulator parameters can itself be a laborious task. We implement a meta-learning algorithm in which a reinforcement learning agent, acting as the meta-learner, automatically adjusts the parameters of a non-differentiable simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data.
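A toy sketch of such a meta-learning loop (the one-dimensional simulator, the stand-in "main model" trainer, and the REINFORCE-style update are assumptions chosen to keep the example self-contained, not the paper's setup): the meta-learner maintains a distribution over simulator parameters, samples candidate settings, trains a model on the synthesized data, and uses validation accuracy as the reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=200):
    """Toy non-differentiable simulator: inputs are centred on the parameter theta."""
    x = rng.normal(loc=theta, scale=1.0, size=n)
    y = (x > 2.0).astype(float)            # the 'world' labels positives above 2
    return x, y

def train_and_validate(tx, ty, vx, vy):
    """Stand-in for training the main model: threshold at the midpoint of class means."""
    if ty.min() == ty.max():               # degenerate data: only one class was simulated
        return 0.5                          # chance-level reward
    threshold = 0.5 * (tx[ty == 1].mean() + tx[ty == 0].mean())
    return float(((vx > threshold) == (vy == 1)).mean())

# Fixed real validation set: the data we actually care about.
vx = rng.normal(loc=2.0, scale=1.0, size=500)
vy = (vx > 2.0).astype(float)

# Meta-learner: a Gaussian search distribution over the simulator parameter theta,
# updated with a REINFORCE-style gradient using validation accuracy as the reward.
mu, sigma, lr = 0.0, 1.0, 1.0
for _ in range(100):
    thetas = mu + sigma * rng.standard_normal(16)      # sample candidate simulator settings
    rewards = np.array([train_and_validate(*simulate(t), vx, vy) for t in thetas])
    advantage = rewards - rewards.mean()                # simple baseline to reduce variance
    mu += lr * np.mean(advantage * (thetas - mu)) / sigma**2

print(f"learned simulator parameter ~ {mu:.2f} (toy optimum is near 2.0)")
```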
Unseen Object Segmentation in Videos via Transferable Representations
ACCV 2018 | We exploit existing annotations in source images and transfer such visual information to segment videos with unseen object categories. Without using any annotations in the target video, we propose a method to jointly mine useful segments and learn feature representations that better adapt to the target frames.
R2P2: A Reparameterized Pushforward Policy for Diverse, Precise Generative Path Forecasting
ECCV 2018 | We propose a method to forecast a vehicle’s ego-motion as a distribution over spatiotemporal paths, conditioned on features embedded in an overhead map. The method learns a policy and induces a distribution over simulated trajectories that is both “diverse” (produces most paths likely under the data) and “precise” (mostly produces paths likely under the data).
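A rough sketch of a reparameterized pushforward policy (the affine per-step map, the constant-velocity toy policy, and the noise scales are assumptions for illustration): base noise z_t is pushed through a simple, invertible per-step transformation conditioned on the history, so sampling is cheap and, because the map is invertible, the density of a whole path remains tractable for likelihood-based training.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(policy, x0, horizon=12):
    """Push standard-normal noise through a per-step policy to sample one path.

    policy(history) must return (mu, sigma): the mean step and a per-axis scale,
    both of shape (2,). Each step is an affine, invertible map of z_t, so the
    density of the whole path can be evaluated by change of variables.
    """
    path, x = [np.asarray(x0, dtype=float)], np.asarray(x0, dtype=float)
    for _ in range(horizon):
        mu, sigma = policy(path)
        z = rng.standard_normal(2)           # base noise z_t ~ N(0, I)
        x = x + mu + sigma * z                # reparameterized step
        path.append(x.copy())
    return np.stack(path)

# Toy policy: keep moving in the direction of the last step, with fixed noise.
def constant_velocity_policy(path):
    if len(path) < 2:
        return np.array([1.0, 0.0]), np.array([0.2, 0.2])
    last_step = path[-1] - path[-2]
    return last_step, np.array([0.2, 0.2])

samples = np.stack([rollout(constant_velocity_policy, np.zeros(2)) for _ in range(5)])
print(samples.shape)   # (5, 13, 2): five diverse sampled futures
```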
Hierarchical Metric Learning & Matching for 2D & 3D Geometric Correspondences
ECCV 2018 | While a metric loss applied to the deepest layer of a CNN is expected to yield ideal features, the growing receptive field and striding effects cause shallower features to be better at high-precision matching. We leverage this insight, along with hierarchical supervision, to learn more effective descriptors for geometric matching. We evaluate for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across multiple datasets.
Learning to Look Around Objects for Top-View Representations of Outdoor Scenes
ECCV 2018 | We propose a convolutional neural network that learns to predict occluded portions of a scene layout by looking around foreground objects like cars or pedestrians. But instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top view.
Learning to Adapt Structured Output Space for Semantic Segmentation
CVPR 2018 | We develop a method for adapting semantic segmentation models trained with source ground-truth labels to an unseen target domain. To this end, we treat semantic segmentation as structured prediction with spatial similarities between the source and target domains, and adopt multi-level adversarial learning in the output space.
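A compact PyTorch sketch of adversarial adaptation in the output space (the tiny one-layer segmenter, the small discriminator, and the loss weight are placeholders; a multi-level variant would repeat the adversarial term at several feature levels): a discriminator is trained to tell source softmax maps from target ones, while the segmentation network is trained to fool it on target images in addition to its supervised source loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-ins; a real setup would use a full segmentation backbone and a deeper discriminator.
num_classes = 5
seg = nn.Conv2d(3, num_classes, 3, padding=1)             # segmentation network (placeholder)
disc = nn.Sequential(nn.Conv2d(num_classes, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 1, 3, padding=1))      # output-space discriminator

opt_seg = torch.optim.SGD(seg.parameters(), lr=0.01)
opt_disc = torch.optim.SGD(disc.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()

src_img = torch.randn(2, 3, 32, 32)
src_lbl = torch.randint(0, num_classes, (2, 32, 32))
tgt_img = torch.randn(2, 3, 32, 32)                        # unlabeled target images

# --- update the segmentation network ---
src_out = seg(src_img)
tgt_out = seg(tgt_img)
loss_seg = F.cross_entropy(src_out, src_lbl)               # supervised loss on source
d_tgt = disc(F.softmax(tgt_out, dim=1))
loss_adv = bce(d_tgt, torch.ones_like(d_tgt))              # make target outputs look source-like
opt_seg.zero_grad(); (loss_seg + 0.001 * loss_adv).backward(); opt_seg.step()

# --- update the discriminator on detached outputs ---
d_src = disc(F.softmax(seg(src_img), dim=1).detach())
d_tgt = disc(F.softmax(seg(tgt_img), dim=1).detach())
loss_d = bce(d_src, torch.ones_like(d_src)) + bce(d_tgt, torch.zeros_like(d_tgt))
opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()
```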
Fast & Accurate Online Video Object Segmentation via Tracking Parts
CVPR 2018 | We propose a fast and accurate video object segmentation algorithm that can immediately start the segmentation process after receiving images. We first utilize a part-based tracking method to deal with challenging factors such as large deformation, occlusion and cluttered background. We next construct an efficient region-of-interest segmentation network to generate part masks, with a similarity-based scoring function to refine these object parts and generate final segmentation outputs.
SegFlow: Joint Learning for Video Object Segmentation & Optical Flow
ICCV 2017 | We propose an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos. The proposed SegFlow has two branches where useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework. The unified framework can be trained iteratively offline to learn a generic notion, or it can be fine-tuned online for specific objects.
Scene Parsing With Global Context Embedding
ICCV 2017 | We present a scene-parsing method that utilizes global context information based on both parametric and non-parametric models. Compared to previous methods which only exploit the local relationship between objects, we train a context network based on scene similarities to generate feature representations for global contexts.
Deep Network Flow for Multi-Object Tracking
CVPR 2017 | We demonstrate that it is possible to learn features for network-flow-based data association via backpropagation by expressing the optimum of a smoothed network flow problem as a differentiable function of the pairwise association costs. We apply this approach to multi-object tracking with a network-flow formulation.
Deep Supervision With Shape Concepts for Occlusion-Aware 3D Object Parsing
CVPR 2017 | We propose a deep CNN architecture to localize object semantic parts in 2D images and 3D space while inferring their visibility states given a single RGB image. We exploit domain knowledge to regularize the network by deeply supervising its hidden layers. In doing so, we sequentially infer a causal sequence of intermediate concepts.
Learning Random-Walk Label Propagation for Weakly-Supervised Semantic Segmentation
CVPR 2017 | Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data. Given cheaply obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings using random-walk hitting probabilities, which leads to a differentiable parameterization with uncertainty estimates that are incorporated into our loss.
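The propagation step can be illustrated with absorbing random-walk hitting probabilities on a small graph (a generic NumPy sketch under assumed notation; the paper additionally makes the edge weights learnable and differentiable, which is not shown here): each unlabeled node receives, for every class, the probability that a random walk starting there first hits a seed of that class.

```python
import numpy as np

def hitting_probabilities(W, seed_labels):
    """Propagate sparse seed labels via absorbing random-walk hitting probabilities.

    W:           (N, N) symmetric non-negative affinity matrix between nodes
    seed_labels: length-N int array; -1 marks unlabeled nodes, else a class id
    Returns an (N, C) array of soft labels (C = number of classes among seeds).
    """
    seed_labels = np.asarray(seed_labels)
    classes = sorted(set(seed_labels.tolist()) - {-1})
    P = W / W.sum(axis=1, keepdims=True)                   # row-stochastic transitions
    u = np.flatnonzero(seed_labels == -1)                  # transient (unlabeled) nodes
    s = np.flatnonzero(seed_labels != -1)                  # absorbing (seed) nodes
    # Probability that a walk from each unlabeled node is absorbed at each seed:
    # B = (I - P_uu)^{-1} P_us
    B = np.linalg.solve(np.eye(len(u)) - P[np.ix_(u, u)], P[np.ix_(u, s)])
    probs = np.zeros((len(seed_labels), len(classes)))
    for col, seed in enumerate(s):
        probs[u, classes.index(seed_labels[seed])] += B[:, col]
    probs[s, [classes.index(c) for c in seed_labels[s]]] = 1.0   # seeds keep their labels
    return probs

# Toy chain of 5 nodes: node 0 labeled class 0, node 4 labeled class 1.
W = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
print(hitting_probabilities(W, [0, -1, -1, -1, 1]).round(2))
```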
DESIRE: Distant Future Prediction in Dynamic Scenes With Interacting Agents
CVPR 2017 | We introduce a deep stochastic IOC RNN encoder-decoder framework, DESIRE, for the task of future prediction of multiple interacting agents in dynamic scenes. It produces accurate future predictions by tackling multi-modality of futures while accounting for a rich set of both static and dynamic scene contexts.
Universal Correspondence Network
NeurIPS 2016 | We present deep metric learning to obtain a feature space that preserves geometric or semantic similarity. Our visual correspondences span across rigid motions to intra-class shape or appearance variations. Our fully convolutional architecture, along with a novel correspondence contrastive loss, allows faster training by effective reuse of computations, accurate gradient computation and linear time testing instead of quadratic time for typical patch similarity methods.
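A small NumPy sketch of a correspondence contrastive loss on dense feature maps (the squared-L2 positive term, the hinged negative term with a margin, and the toy sampling are stated assumptions rather than the exact published formulation): features at corresponding locations are pulled together while features at non-corresponding locations are pushed apart beyond the margin.

```python
import numpy as np

def correspondence_contrastive_loss(feat_a, feat_b, pixels_a, pixels_b, is_match, margin=1.0):
    """Contrastive loss on dense feature maps sampled at pixel locations.

    feat_a, feat_b: (C, H, W) dense feature maps of the two images
    pixels_a/b:     (N, 2) integer (row, col) coordinates of sampled locations
    is_match:       (N,) boolean; True for corresponding pairs, False for negatives
    """
    fa = feat_a[:, pixels_a[:, 0], pixels_a[:, 1]].T     # (N, C)
    fb = feat_b[:, pixels_b[:, 0], pixels_b[:, 1]].T     # (N, C)
    d = np.linalg.norm(fa - fb, axis=1)                  # (N,) feature distances
    pos = d**2                                           # pull matches together
    neg = np.maximum(0.0, margin - d)**2                 # push non-matches beyond the margin
    return float(np.mean(np.where(is_match, pos, neg)))

# Toy usage with random features and two sampled pairs (one match, one non-match).
rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(8, 16, 16)), rng.normal(size=(8, 16, 16))
loss = correspondence_contrastive_loss(
    fa, fb,
    pixels_a=np.array([[4, 5], [10, 3]]),
    pixels_b=np.array([[4, 6], [2, 12]]),
    is_match=np.array([True, False]),
)
print(round(loss, 3))
```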
Deep Deformation Network for Object Landmark Localization
ECCV 2016 | We propose a cascaded framework for localizing landmarks in non-rigid objects. The first stage initializes the shape as constrained to lie within a low-rank manifold, and the second stage estimates local deformations parameterized as thin-plate spline transformations. Since our framework does not incorporate either handcrafted features or part connectivity, it is easy to train and test and generally applicable to various object types.
WarpNet: Weakly Supervised Matching for Single-View Reconstruction
CVPR 2016 | Our WarpNet matches images of objects in fine-grained datasets without using part annotations. It aligns an object in one image with a different object in another by exploiting a fine-grained dataset to create artificial data for training a Siamese network with an unsupervised discriminative learning approach.
A Continuous Occlusion Model for Road Scene Understanding
CVPR 2016 | We present a physically interpretable 3D model for handling occlusions with applications to road scene understanding. Given object detections and SfM point tracks, our unified model probabilistically assigns point tracks to objects and reasons about object detection scores and bounding boxes.
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
ECCV 2020 | Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module.
Pseudo-RGB-D for Self-Improving Monocular SLAM & Depth Prediction
ECCV 2020 | Classical monocular simultaneous localization and mapping (SLAM) and the emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches to building a 3D map of a surrounding environment. In this paper, we demonstrate that coupling these two approaches by leveraging the strengths of each mitigates the other’s shortcomings.