cross modal interaction Archives

Cross-modal interaction refers to the integration of information or features from different modalities within a computational or cognitive system. Such systems handle multiple types of data, such as text, images, audio, or other sensor data. The concept is particularly relevant in fields where data comes in various forms and a holistic understanding requires combining information from multiple sources.

Posts

Contextual Grounding of Natural Language Entities in Images

December 13, 2019/in Publications/by NEC Labs America

In this paper, we introduce a contextual grounding approach that captures the context of corresponding text entities and image regions to improve grounding accuracy. Specifically, the proposed architecture takes as input pre-trained text token embeddings and image object features from an off-the-shelf object detector. Additional encodings that capture positional and spatial information can be added to enhance feature quality. Separate text and image branches allow architectural refinements specific to each modality. The text branch is pre-trained on a large-scale masked language modeling task, while the image branch is trained from scratch. Within its own branch, the model learns contextual representations of the text tokens and of the image objects through layers of high-order interaction. The final grounding head then ranks the correspondence between the textual and visual representations through cross-modal interaction. In the evaluation, we show that our model achieves state-of-the-art grounding accuracy of 71.36% on the Flickr30K Entities dataset. No additional pre-training is necessary to deliver competitive results, in contrast to related work that often requires task-agnostic and task-specific pre-training on cross-modal datasets. The implementation is publicly available at https://gitlab.com/necla-ml/grounding.
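To make the idea of a grounding head that ranks text-to-region correspondence more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). It assumes that each text entity and each image region has already been encoded as a vector of the same dimension, and it uses a scaled dot product followed by a softmax as the scoring function, which is an assumption made for illustration only. The function names `softmax` and `rank_regions` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_regions(text_vecs, region_vecs):
    """Score every image region against every text entity.

    Assumption: scaled dot-product similarity stands in for the
    cross-modal interaction; the paper's actual head may differ.
    Returns one (best_region_index, probabilities) pair per entity.
    """
    dim = len(region_vecs[0])
    results = []
    for t in text_vecs:
        scores = [
            sum(a * b for a, b in zip(t, r)) / math.sqrt(dim)
            for r in region_vecs
        ]
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs))
    return results


# Toy, hand-made vectors (hypothetical; real inputs would come from
# the text and image branches).
text_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_vecs = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.0, 0.2, 0.9]]

grounding = rank_regions(text_vecs, region_vecs)
best_regions = [best for best, _ in grounding]
print(best_regions)  # [1, 0]
```

In this toy setup the first entity is matched to region 1 and the second to region 0, because those regions have the largest dot product with the respective entity vectors. A learned model would replace the hand-made vectors with contextual representations and would likely use a trained scoring layer rather than a fixed dot product.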

Contextual Grounding of Natural Language Phrases in Images

November 5, 2019/in Publications/by NEC Labs America
