Google is a global technology company that has redefined how people access and use information. Its innovations span search, AI, cloud computing, and digital infrastructure, shaping the future of connectivity and automation. NEC Labs America collaborates with Google on federated learning frameworks, large-scale optimization, and general-purpose transformers. Our work addresses scalability and efficiency in deep learning. Please read about our latest news and collaborative publications with Google.

Posts

Improving Language-Based Object Detection by Explicit Generation of Negative Examples

The recent progress in open-vocabulary, language-based object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training from image captions with grounded bounding boxes (ground truth or pseudo-labeled) enables the models to reason over an open vocabulary and understand object descriptions in free-form text. In this work, we investigate the role of negative captions for training such language-based object detectors. While the fixed label space in standard object detection datasets clearly defines the set of negative classes, the free-form text used for language-based detection makes the space of potential negatives virtually infinite in size. We propose to leverage external knowledge bases and large language models to automatically generate contradictions for each caption in the training dataset. Furthermore, we leverage image generation tools to create negative images that correspond to the contradicting captions. Such automatically generated data constitute hard negative examples for language-based detection and improve the model when used for training. Our experiments demonstrate the benefits of the automatically generated training data on two complex benchmarks.

Field and lab experimental demonstration of nonlinear impairment compensation using neural networks

Fiber nonlinearity is one of the major limitations to the achievable capacity in long-distance fiber optic transmission systems. Nonlinear impairments are determined by the signal pattern and the transmission system parameters. Deterministic algorithms that approximate the nonlinear Schrödinger equation through digital back propagation, or a single-step approach based on perturbation methods, have been demonstrated; however, their implementation demands excessive signal processing resources and accurate knowledge of the transmission system. A completely different approach uses machine learning algorithms to learn the nonlinear impairment from the received data itself. In this work, a single-step, system-agnostic nonlinearity compensation algorithm based on a neural network is proposed to pre-distort symbols at the transmitter side, demonstrating ~0.6 dB Q improvement after 2800 km of standard single-mode fiber transmission using a 32 Gbaud signal. Without prior knowledge of the transmission system, the neural network tensor weights are constructed from training data, thanks to the intra-channel cross-phase modulation and intra-channel four-wave mixing triplets used as input features.
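To make the "triplets used as input features" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the triplet form commonly used in perturbation-based nonlinearity compensation, s[n+m] * s[n+k] * conj(s[n+m+k]), computed around a symbol of interest at index n. The function name `triplet_features`, the `window` parameter, and the zero-padding at the sequence edges are hypothetical choices made for illustration.

```python
def triplet_features(symbols, n, window):
    """Triplet features around symbol index n (illustrative sketch).

    Assumed form: s[n+m] * s[n+k] * conj(s[n+m+k]) for |m|, |k| <= window.
    Triplets that would index outside the sequence are set to zero
    (a simplification chosen for this example).
    """
    feats = []
    for m in range(-window, window + 1):
        for k in range(-window, window + 1):
            i, j, l = n + m, n + k, n + m + k
            if min(i, j, l) < 0 or max(i, j, l) >= len(symbols):
                feats.append(0j)
            else:
                feats.append(symbols[i] * symbols[j] * symbols[l].conjugate())
    return feats


def to_real_inputs(feats):
    """Split complex triplets into real and imaginary parts for a real-valued network."""
    return [f.real for f in feats] + [f.imag for f in feats]


if __name__ == "__main__":
    seq = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 + 1j, 1 - 1j]
    x = to_real_inputs(triplet_features(seq, n=3, window=1))
    print(len(x))  # 9 complex triplets become 18 real inputs
```

A network trained on such inputs would then regress the nonlinear distortion of symbol n, which can be subtracted at the transmitter as pre-distortion. The network itself is omitted here because the abstract does not describe its architecture.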

Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

Interest point descriptors have fueled progress on almost every problem in computer vision. Recent advances in deep neural networks have enabled task-specific learned descriptors that outperform hand-crafted descriptors on many problems. We demonstrate that commonly used metric learning approaches do not optimally leverage the feature hierarchies learned in a Convolutional Neural Network (CNN), especially when applied to the task of geometric feature matching. While a metric loss applied to the deepest layer of a CNN is often expected to yield ideal features irrespective of the task, in fact the growing receptive field and striding effects cause shallower features to be better at high-precision matching tasks. We leverage this insight, together with explicit supervision at multiple levels of the feature hierarchy for better regularization, to learn more effective descriptors in the context of geometric matching tasks. Further, we propose to use activation maps at different layers of a CNN as an effective and principled replacement for the multi-resolution image pyramids often used for matching tasks. We propose concrete CNN architectures employing these ideas and evaluate them on multiple datasets for 2D and 3D geometric matching as well as optical flow, demonstrating state-of-the-art results and generalization across datasets.
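The idea of supervising several levels of the feature hierarchy, rather than only the deepest layer, can be sketched as a weighted sum of per-level metric losses. This is an illustration under stated assumptions, not the paper's formulation: the abstract does not name the loss, so a standard contrastive loss is used as a stand-in, and the names `contrastive_loss`, `hierarchical_loss`, `margin`, and `weights` are hypothetical.

```python
import math


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Standard contrastive loss on one descriptor pair (assumed loss form).

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until their distance exceeds the margin.
    """
    d = math.dist(desc_a, desc_b)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


def hierarchical_loss(levels, is_match, weights=None, margin=1.0):
    """Sum the metric loss over descriptor pairs taken from several CNN levels.

    levels: list of (desc_a, desc_b) pairs, one per feature level,
            ordered from shallow to deep.
    """
    if weights is None:
        weights = [1.0] * len(levels)
    return sum(
        w * contrastive_loss(a, b, is_match, margin)
        for w, (a, b) in zip(weights, levels)
    )


if __name__ == "__main__":
    # Toy descriptors for one candidate correspondence at two levels.
    shallow = ([0.0, 0.0], [3.0, 4.0])
    deep = ([1.0, 0.0], [1.0, 0.0])
    print(hierarchical_loss([shallow, deep], is_match=True))
    print(hierarchical_loss([shallow, deep], is_match=False))
```

In a real training setup the descriptors would be sampled from activation maps at the chosen layers and the loss minimized with automatic differentiation; plain Python lists are used here only to keep the example self-contained.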