Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning Image captioning systems are expected to have the ability to combine individual concepts when describing scenes with concept combinations that are not observed during training. In spite of significant progress in image captioning with the help of the autoregressive generation framework, current approaches fail to generalize well to novel concept combinations. We propose a new framework that revolves around probing several similar image caption training instances (retrieval), performing analogical reasoning over relevant entities in retrieved prototypes (analogy), and enhancing the generation process with reasoning outcomes (composition). Our method augments the generation model by referring to the neighboring instances in the training set to produce novel concept combinations in generated captions. We perform experiments on the widely used image captioning benchmarks. The proposed models achieve substantial improvement over the compared baselines on both composition-related evaluation metrics and conventional image captioning metrics.
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) architecture designed to address the vanishing gradient problem in traditional RNNs for sequential data. LSTMs introduce a memory cell and a set of gating mechanisms, including input gates, forget gates, and output gates. These gates control the flow of information into, out of, and within the memory cell. This enables LSTMs to capture and remember long-term dependencies in sequential data, making them well-suited for tasks like natural language processing, speech recognition, and time-series prediction.
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module. We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO. Inspired by prior geometric systems, we allow the networks to see beyond a small temporal window during training, through a novel a loss that incorporates temporally distant ( g $O(100)$) frames. Given GPU memory constraints, we propose a stage-wise training mechanism, where the first stage operates in a local time window and the second stage refines the poses with a “global” loss given the first stage features. We demonstrate competitive results on several standard VO datasets, including KITTI and TUM RGB-D.
Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series The problem of learning and forecasting underlying trends in time series data arises in a variety of applications, such as traffic management, energy optimization, etc. In literature, a trend in time series is characterized by the slope and duration, and its prediction is then to forecast the two values of the subsequent trend given historical data of the time series. For this problem, existing approaches mainly deal with the case in univariate time series. However, in many real-world applications, there are multiple variables at play, and handling all of them at the same time is crucial for an accurate prediction. A natural way is to employ multi-task learning (MTL) techniques in which the trend learning of each time series is treated as a task. The key point of MTL is to learn task relatedness to achieve better parameter sharing, which however is challenging in trend prediction task. First, effectively modeling the complex temporal patterns in different tasks is hard as the temporal and spatial dimensions are entangled. Second, the relatedness among tasks may change over time. In this paper, we propose a neural network, DeepTrends, for multivariate time series trend prediction. The core module of DeepTrends is a tensorized LSTM with adaptive shared memory (TLASM). TLASM employs the tensorized LSTM to model the temporal patterns of long-term trend sequences in an MTL setting. With an adaptive shared memory, TLASM is able to learn the relatedness among tasks adaptively, based upon which it can dynamically vary degrees of parameter sharing among tasks. To further consider short-term patterns, DeepTrends utilizes a multi-task 1dCNN to learn the local time series features, and employs a task-specific sub-network to learn a mixture of long-term and short-term patterns for trend prediction. Extensive experiments on real datasets demonstrate the effectiveness of the proposed model.
4 Independence Way, Suite 200
Princeton, NJ 08540
San Jose Office
2033 Gateway Place, Suite 200
San Jose, CA 95110
NEC Laboratories America, Inc. (NEC Labs) is the US-based center for NEC Corporation’s global network of corporate research laboratories. Our diverse research groups collaborate with industry, academia and governments to provide disruptive solutions to complex problems. A leader in the integration of IT and network technologies with more than 100 years of expertise, NEC provides a combination of products and solutions that cross-utilize the company’s experience and global resources to meet the complex and ever-changing needs of its customers.
Read Our Blog Posts
- Apply for a Summer 2024 Internship
- Unearthing Nature’s Orchestra – How Fiber Optic Cables Can Hear Cicada Secrets
- NEC Labs America Team Heading to NeurIPS23 in New Orleans
- Sarper Ozharar Receives Award from Koç University
- Meet the NEC Labs America Intern Helping to Make Autonomous Vehicles Safer and More Secure
- AI/Fiber-Optic Combo Poised To Improve Telecommunications
- Industrial Labs to Drive Disruptive Innovation for the Fourth Industrial Revolution
- A New Hope: AI Research is Conquering Today’s Computer Vision Plateau
- NEC Labs America’s Time Series Data Research Drives Space Systems Innovation
- Next-Generation Computing Finally Sees Light
- AI/Fiber-Optic Combo Poised To Improve Telecommunications
- Using AI To Safely Put The First Woman On The Moon
- Our AI Research Contributing to NASA’s Artemis Space Program
- NEC provides AI-based traffic monitoring system with fiber-optic sensing technology for NEXCO CENTRAL
- Beyond Communication: Telecom Fiber Networks for Rain Detection and Classification