3D Perception

We have pioneered the development of learned bird's-eye-view representations for road scenes, which form a basis for image-based 3D perception in applications such as autonomous driving. Our techniques for 3D localization of objects achieve high accuracy for object position, orientation and part locations with just a monocular camera, using novel geometric and learned priors. We have led the development of the first monocular SLAM systems for large-scale outdoor driving scenes, as well as structure-from-motion methods that overcome the challenges posed by rolling shutter cameras in high-speed applications. Our LiDAR-based instantaneous motion estimation can detect subtle motions, such as a car about to start merging into a driving lane, which allows early prediction of intent and leads to improved safety outcomes.

Featured Publications

Instantaneous Perception of Moving Objects in 3D

June 17, 2024/CVPR 2024

The perception of 3D motion of surrounding traffic participants is crucial for driving safety. While existing works primarily focus on general large motions, we contend that the instantaneous detection and quantification of subtle motions is equally important as they indicate the nuances in driving behavior

NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization

June 18, 2023/CVPR 2023

Monocular 3D object localization in driving scenes is a crucial task, but challenging due to its ill-posed nature. Estimating 3D coordinates for each pixel on the object surface holds great potential as it provides dense 2D-3D geometric constraints for the underlying PnP problem. However, high-quality

Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts

June 19, 2022/CVPR 2022

We propose an end-to-end network that takes a single perspective RGB image of a complex road scene as input, to produce occlusion-reasoned layouts in perspective space as well as a parametric bird’s-eye-view (BEV) space. In contrast to prior works that require dense supervision such as semantic labels

Fusing the Old with the New: Learning Relative Pose with Geometry-Guided Uncertainty

June 19, 2021/CVPR 2021, Virtual

Learning methods for relative camera pose estimation have been developed largely in isolation from classical geometric approaches. The question of how to integrate predictions from deep neural networks (DNNs) and solutions from geometric solvers, such as the 5-point algorithm [37], has as yet remained

Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction

August 28, 2020/ECCV 2020 - The 16th European Conference on Computer Vision, Glasgow, UK

Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment. In this paper, we demonstrate that the coupling

Image Stitching and Rectification for Hand-Held Cameras

August 23, 2020/ECCV 2020 - The 16th European Conference on Computer Vision, Glasgow, UK

In this paper, we derive a new differential homography that can account for the scanline-varying camera poses in Rolling Shutter (RS) cameras, and demonstrate its application to carry out RS-aware image stitching and rectification at one stroke. Despite the high complexity of RS geometry, we focus in

Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling

August 23, 2020/ECCV 2020 - The 16th European Conference on Computer Vision, Glasgow, UK

Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction

Peek-a-boo: Occlusion Reasoning in Indoor Scenes with Plane Representations

June 16, 2020/CVPR 2020

We address the challenging task of occlusion-aware indoor 3D scene understanding. We represent scenes by a set of planes, where each one is defined by its normal, offset and two masks outlining (i) the extent of the visible part and (ii) the full region that consists of both visible and occluded parts

Understanding Road Layout from Videos as a Whole

June 16, 2020/CVPR 2020

In this paper, we address the problem of inferring the layout of complex road scenes from video sequences. To this end, we formulate it as a top-view road attributes prediction problem and our goal is to predict these attributes for each frame both accurately and consistently. In contrast to prior work,

Degeneracy in Self-Calibration Revisited and a Deep Learning Solution for Uncalibrated SLAM

November 3, 2019/IROS 2019, The Venetian Macao, Macau, China

Self-calibration of camera intrinsics and radial distortion has a long history of research in the computer vision community. However, it remains rare to see real applications of such techniques to modern Simultaneous Localization And Mapping (SLAM) systems, especially in driving scenarios. In this paper,

Deep Supervision with Intermediate Concepts

August 1, 2019/IEEE Transactions on Pattern Analysis and Machine Intelligence

Recent data-driven approaches to scene interpretation predominantly pose inference as an end-to-end black-box mapping, commonly performed by a Convolutional Neural Network (CNN). However, decades of work on perceptual organization in both human

A Parametric Top-View Representation of Complex Road Scenes

June 16, 2019/IEEE Computer Vision and Pattern Recognition (CVPR 2019)

In this paper, we address the problem of inferring the layout of complex road scenes given a single camera as input. To achieve that, we first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable

Learning Structure-And-Motion-Aware Rolling Shutter Correction

June 16, 2019/IEEE Computer Vision and Pattern Recognition (CVPR 2019)

An exact method of correcting the rolling shutter (RS) effect requires recovering the underlying geometry, i.e. the scene structures and the camera motions between scanlines or between views. However, the multiple-view geometry for RS cameras is much more complicated than its global shutter (GS) counterpart,

Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

September 8, 2018/European Conference on Computer Vision - ECCV 2018, Munich, Germany

Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do

Learning to Look around Objects for Top-View Representations of Outdoor Scenes

September 8, 2018/European Conference on Computer Vision - ECCV 2018, Munich, Germany

Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem not only requires an accurate understanding of both the 3D geometry and the semantics of the