Blog | NEC Labs America

A Dataset for High-Level 3D Scene Understanding of Complex Road Scenes in the Top-View

June 17, 2019/in Publications/by NEC Labs America

We introduce a novel dataset for high-level 3D scene understanding of complex road scenes. Our annotations extend the existing datasets KITTI [5] and nuScenes [1] with semantically and geometrically meaningful attributes like the number of lanes or the existence of, and distance to, intersections, sidewalks and crosswalks. Our attributes are rich enough to build a meaningful representation of the scene in the top-view and provide a tangible interface to the real world for several practical applications.

Learning Structure-And-Motion-Aware Rolling Shutter Correction

June 16, 2019/in Publications/by NEC Labs America

An exact method of correcting the rolling shutter (RS) effect requires recovering the underlying geometry, i.e. the scene structures and the camera motions between scanlines or between views. However, the multiple-view geometry for RS cameras is much more complicated than its global shutter (GS) counterpart, with various degeneracies. In this paper, we first make a theoretical contribution by showing that RS two-view geometry is degenerate in the case of pure translational camera motion. In view of the complex RS geometry, we then propose a Convolutional Neural Network (CNN)-based method which learns the underlying geometry (camera motion and scene structure) from just a single RS image and performs RS image correction. We call our method structure-and-motion-aware RS correction because it reasons about the concealed motions between the scanlines as well as the scene structure. Our method learns from a large-scale dataset synthesized in a geometrically meaningful way, where the RS effect is generated in a manner consistent with the camera motion and scene structure. In extensive experiments, our method achieves superior performance compared to other state-of-the-art methods for single image RS correction and subsequent Structure from Motion (SfM) applications.

Gotta Adapt Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild

June 16, 2019/in Publications/by NEC Labs America

Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining the two levels, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning, we propose a classification-aware domain adversarial neural network that brings target examples into more classifiable regions of the source domain. Next, we posit that computer vision insights are more amenable to injection at the pixel level. In particular, we use 3D geometry and image synthesis based on a generalized appearance flow to preserve identity across pose transformations, while using an attribute-conditioned CycleGAN to translate a single source image into multiple target images that differ in lower-level properties such as lighting. Besides standard unsupervised domain adaptation (UDA) benchmarks, we validate on a novel and apt problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level adaptation and implicit, unspecified factors through feature-level adaptation.

Feature Transfer Learning for Face Recognition with Under-Represented Data

June 16, 2019/in Publications/by NEC Labs America

Despite the large volume of face recognition datasets, a significant portion of subjects have too few samples and are thus under-represented. Ignoring these subjects leaves insufficient training data, while training with under-represented data leads to biased classifiers in conventionally trained deep networks. In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects using the regular subjects that have sufficiently diverse samples. A Gaussian prior on the variance is assumed across all subjects, and the variance of the regular subjects is transferred to the under-represented ones. This encourages the under-represented distribution to be closer to the regular distribution. Further, an alternating training regimen is proposed to simultaneously achieve less biased classifiers and a more discriminative feature representation. We conduct an ablative study that mimics under-represented datasets by varying the portion of under-represented classes in the MS-Celeb-1M dataset. Advantageous results on LFW, IJB-A and MS-Celeb-1M demonstrate the effectiveness of our feature transfer and training strategy, compared to both general baselines and state-of-the-art methods. Moreover, our feature transfer produces smooth visual interpolation, disentangling identity from other factors: it preserves the identity of a class while augmenting its feature space with non-identity variations such as pose and lighting.
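As a rough illustration of the center-based idea in the abstract above, the sketch below moves the intra-class offsets of a regular class onto the center of an under-represented class. It is a simplification under stated assumptions, not the paper's implementation: the function names `center` and `transfer_features`, the 2-D toy features, and the direct offset copy (with no learned feature space and no Gaussian prior) are all hypothetical.

```python
from statistics import fmean


def center(features):
    """Mean feature vector (class center) of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*features)]


def transfer_features(regular, under_represented):
    """Re-center the regular class's intra-class offsets on the rare class.

    Hypothetical simplification: every offset (feature - regular center) is
    added to the under-represented center, so the rare class inherits the
    spread of the regular class while keeping its own center.
    """
    reg_center = center(regular)
    ur_center = center(under_represented)
    augmented = []
    for feature in regular:
        offset = [f - c for f, c in zip(feature, reg_center)]
        augmented.append([c + o for c, o in zip(ur_center, offset)])
    return augmented


# Toy 2-D features: a regular class with four samples, a rare class with one.
regular_class = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [2.0, 2.0]]
rare_class = [[10.0, 10.0]]
augmented = transfer_features(regular_class, rare_class)
```

Because the offsets of the regular class sum to zero, the augmented samples stay centered on the rare class; only their spread changes.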

A Parametric Top-View Representation of Complex Road Scenes

June 16, 2019/in Publications/by NEC Labs America

In this paper, we address the problem of inferring the layout of complex road scenes given a single camera as input. To achieve that, we first propose a novel parameterized model of road layouts in a top-view representation, which is not only intuitive for human visualization but also provides an interpretable interface for higher-level decision making. Moreover, the design of our top-view scene model allows for efficient sampling and thus generation of large-scale simulated data, which we leverage to train a deep neural network to infer our scene model’s parameters. Specifically, our proposed training procedure uses supervised domain-adaptation techniques to incorporate both simulated and manually annotated data. Finally, we design a Conditional Random Field (CRF) that enforces coherent predictions for a single frame and encourages temporal smoothness among video frames. Experiments on two public data sets show that: (1) our parametric top-view model is representative enough to describe complex road scenes; (2) the proposed method outperforms baselines trained on manually annotated or simulated data only, thus getting the best of both; (3) our CRF produces temporally smooth yet semantically meaningful results.

Learning from Rules Performs as Implicit Regularization

June 9, 2019/in Publications/by NEC Labs America

In this paper, we study the generalization performance of deep neural networks in learning problems where the given task is governed by a set of rules. We consider two settings: supervised learning and rule-based learning. In supervised learning, the network is trained with pairs of inputs and the corresponding solutions that satisfy the problem constraints. In rule-based learning, the constraints are encoded into a neural network module that is applied to the output of the solver network. In this approach, instead of training with any actual solutions of the problem, the model is trained to explicitly satisfy the constraints. We perform experiments on two problems: solving a system of nonlinear equations and solving Sudoku puzzles. Our experimental results show that, compared to the supervised approach, rule-based learning results in higher training error but significantly lower validation error, especially when training data is scarce, thus performing as an implicit regularization.
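The difference between the two settings can be sketched with a toy problem. The example below is only an illustration under stated assumptions, not the paper's setup: the rule is the single equation x^2 = a, the "solver" is a hypothetical two-parameter linear model rather than a deep network, and all function names are invented for this sketch. The supervised loss needs a known solution, whereas the rule loss only measures how strongly the solver's output violates the constraint.

```python
def solver(w, a):
    """Toy solver: predicts x for input a with a linear model (assumption)."""
    return w[0] + w[1] * a


def constraint_violation(x, a):
    """Constraint module: residual of the rule x^2 = a (zero when satisfied)."""
    return x * x - a


def supervised_loss(w, a, x_true):
    """Supervised setting: compare the prediction with a known solution."""
    return (solver(w, a) - x_true) ** 2


def rule_loss(w, a):
    """Rule-based setting: penalize constraint violation, no solution needed."""
    return constraint_violation(solver(w, a), a) ** 2


def mean_rule_loss(w, inputs):
    return sum(rule_loss(w, a) for a in inputs) / len(inputs)


def train_with_rules(inputs, steps=5000, lr=0.0005):
    """Full-batch gradient descent on the rule loss only."""
    w = [1.0, 0.0]
    for _ in range(steps):
        g0 = g1 = 0.0
        for a in inputs:
            x = solver(w, a)
            # d/dx of (x^2 - a)^2 is 2 * (x^2 - a) * 2x
            dloss_dx = 2.0 * constraint_violation(x, a) * 2.0 * x
            g0 += dloss_dx        # dx/dw0 = 1
            g1 += dloss_dx * a    # dx/dw1 = a
        w[0] -= lr * g0 / len(inputs)
        w[1] -= lr * g1 / len(inputs)
    return w


inputs = [1.0, 2.0, 3.0, 4.0]
initial_loss = mean_rule_loss([1.0, 0.0], inputs)
trained_w = train_with_rules(inputs)
final_loss = mean_rule_loss(trained_w, inputs)
```

Note that `train_with_rules` never sees a square root; it is driven purely by the constraint residual.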

Neural Collaborative Subspace Clustering

June 9, 2019/in Publications/by NEC Labs America

We introduce Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces. In contrast to previous attempts, our model runs without the aid of spectral clustering, which allows our algorithm to scale gracefully to large datasets. At its heart, our neural model benefits from a classifier which determines whether a pair of points lies on the same subspace or not. Essential to our model is the construction of two affinity matrices, one from the classifier and the other from a notion of subspace self-expressiveness, to supervise training in a collaborative scheme. We thoroughly assess and contrast the performance of our model against various state-of-the-art clustering algorithms, including deep subspace-based ones.

Robust Beam Tracking and Data Communication in Millimeter Wave Mobile Networks

June 3, 2019/in Publications/by NEC Labs America

Millimeter-wave (mmWave) bands have shown the potential to enable high data rates for next generation mobile networks. In order to cope with high path loss and severe shadowing at mmWave frequencies, it is essential to employ massive antenna arrays and generate narrow transmission patterns (beams). When narrow beams are used, mobile user tracking is indispensable for reliable communication. In this paper, a joint beam tracking and data communication strategy is proposed in which the base station (BS) increases the beamwidth during data transmission to compensate for location uncertainty caused by user mobility. To avoid the low beamforming gains caused by widening the beam pattern, a probing scheme is proposed in which the BS transmits a number of probing packets to refine the estimate of the angle of arrival based on the user feedback, which enables reliable data transmission through narrow beams again. In the proposed scheme, time is divided into similar frames, each consisting of a probing phase followed by a data communication phase. A steady-state analysis is provided, based on which the durations of the data transmission and probing phases are optimized. Furthermore, the results are generalized to consider practical constraints such as minimum feasible beamwidth. Simulation results reveal that the proposed method outperforms well-known approaches such as optimized beam sweeping.

Tripping Through Time: Efficient Temporal Localization of Activities in Videos

May 16, 2019/in Publications/by NEC Labs America

Localizing moments in untrimmed videos using language queries is a new task that requires the ability to accurately ground language into video. Existing approaches process the video, often more than once, to localize the activities and are inefficient. In this paper, we present TripNet, an end-to-end system which uses a gated attention architecture to model fine-grained textual and visual representations in order to align text and video content. Furthermore, TripNet uses reinforcement learning to efficiently localize relevant activity clips in long videos by learning how to skip around the video, saving feature extraction and processing time. In our evaluation on the Charades-STA and ActivityNet Captions datasets, we find that TripNet achieves high accuracy and only processes 32-41% of the entire video.

Learning To Simulate

May 6, 2019/in Publications/by NEC Labs America

Simulation is a useful tool in situations where training data for machine learning models is costly to annotate or even hard to acquire. In this work, we propose a reinforcement learning-based method for automatically adjusting the parameters of any (non-differentiable) simulator, thereby controlling the distribution of synthesized data in order to maximize the accuracy of a model trained on that data. In contrast to prior art that hand-crafts these simulation parameters or adjusts only parts of the available parameters, our approach fully controls the simulator with the actual underlying goal of maximizing accuracy, rather than mimicking the real data distribution or randomly generating a large volume of data. We find that our approach (i) quickly converges to the optimal simulation parameters in controlled experiments and (ii) can indeed discover good sets of parameters for an image rendering simulator in actual computer vision applications.
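The loop described in the abstract above can be sketched with a plain policy-gradient (REINFORCE) update. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-parameter Gaussian "simulator", the threshold classifier, the Gaussian policy over the parameter `psi`, the moving-average baseline, and all function names and hyperparameters. The point is only that the simulator is treated as a black box and the reward is the validation accuracy of a model trained on the simulated data.

```python
import random


def simulate(psi, n, rng):
    """Black-box simulator: class 0 ~ N(0, 1), class 1 ~ N(psi, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n)]
    data += [(rng.gauss(psi, 1.0), 1) for _ in range(n)]
    return data


def train_threshold(data):
    """Toy model: threshold halfway between the two class means."""
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0


def accuracy(threshold, data):
    correct = sum(1 for x, y in data if (x > threshold) == (y == 1))
    return correct / len(data)


def reward(psi, validation, rng, n=200):
    """Accuracy on validation data of a model trained on simulated data."""
    return accuracy(train_threshold(simulate(psi, n, rng)), validation)


def learn_to_simulate(validation, rng, mu=6.0, sigma=0.5, lr=0.5,
                      iterations=200):
    """REINFORCE on the mean of a Gaussian policy over psi."""
    baseline = None
    for _ in range(iterations):
        psi = rng.gauss(mu, sigma)           # sample simulator parameter
        r = reward(psi, validation, rng)     # train model, measure accuracy
        if baseline is None:
            baseline = r
        advantage = r - baseline
        # grad of log N(psi; mu, sigma) w.r.t. mu is (psi - mu) / sigma^2
        mu += lr * advantage * (psi - mu) / sigma ** 2
        baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline
    return mu


rng = random.Random(0)
# Stand-in for real validation data: the same generator with psi = 3.
real_validation = simulate(3.0, 500, rng)
learned_mu = learn_to_simulate(real_validation, rng)
```

In this toy setting the reward is highest when `psi` is near the value used to generate the validation data, so the policy mean is pushed in that direction; how close it gets depends on the seed and the hypothetical hyperparameters.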
