retrieval augmented generation (rag) Archives

Retrieval-Augmented Generation (RAG) is a natural language processing (NLP) framework that combines both retrieval-based and generation-based models to improve the quality and relevance of generated text. In RAG, a generative model is augmented with a retriever component, which retrieves relevant information from a large corpus of text or knowledge base before generating a response. This approach allows the model to leverage the benefits of both retrieval and generation techniques, resulting in more coherent, informative, and contextually appropriate outputs.
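The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not tied to any particular library: the word-overlap scorer stands in for a real retriever, the function names (`retrieve`, `build_prompt`) are hypothetical, and the generation step is left out, since in practice the prompt would be passed to an LLM.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def score(query: str, passage: str) -> int:
    """Count the words shared by query and passage (a toy relevance score)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[w], p[w]) for w in q)


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question for the generator."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "The forklift entered the warehouse at noon.",
    "Quarterly revenue grew in the third quarter.",
    "A delivery truck left the loading dock.",
]
query = "When did the forklift enter the warehouse?"
prompt = build_prompt(query, retrieve(query, corpus, k=1))
```

A production retriever would typically use embeddings or a search index instead of word overlap; the structure (retrieve, then condition generation on what was retrieved) stays the same.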

Posts

ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System

June 17, 2024/in Publications/by NEC Labs America

Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to provide query-relevant information from enterprise documents to large language models (LLMs). Such enterprise context enables the LLMs to generate more informed and accurate responses. When enterprise data is primarily video, AI models like vision language models (VLMs) are necessary to convert the information in videos into text. While essential, this conversion is a bottleneck, especially for a large corpus of videos, and it delays the timely use of enterprise videos to generate useful responses. We propose ViTA, a novel method that leverages two characteristics of VLMs to expedite the conversion process. First, as VLMs output more text tokens, they incur higher latency. Second, large (heavyweight) VLMs can extract intricate details from images and videos, but they incur much higher latency per output token than smaller (lightweight) VLMs, which may miss details. To expedite conversion, ViTA first employs a lightweight VLM to quickly capture the gist or overview of an image or a video clip, and then directs a heavyweight VLM (through prompt engineering) to extract additional details using only a few (a preset number of) output tokens. Our experimental results show that ViTA reduces the conversion time by as much as 43% without compromising the accuracy of responses, compared to a baseline system that uses only a heavyweight VLM.
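The two-stage idea in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `VLM` callable signature, the prompts, the token budgets, and the stub models are all hypothetical, and a real system would call actual lightweight and heavyweight VLMs.

```python
from typing import Callable

# Hypothetical model interface: (prompt, clip_id, max_new_tokens) -> text.
VLM = Callable[[str, str, int], str]


def two_stage_describe(
    clip: str,
    light_vlm: VLM,
    heavy_vlm: VLM,
    gist_tokens: int = 32,
    detail_tokens: int = 16,
) -> str:
    """Get a cheap gist first, then ask the expensive model only for extras."""
    gist = light_vlm("Summarize this clip briefly.", clip, gist_tokens)
    # The gist is placed in the prompt so the heavyweight model does not
    # spend its small output budget repeating what is already known.
    detail_prompt = (
        f"The clip shows: {gist}\n"
        "Add only details that are missing from this summary."
    )
    details = heavy_vlm(detail_prompt, clip, detail_tokens)
    return f"{gist} {details}".strip()


# Stub models for demonstration only; they truncate canned text to the budget.
def stub_light(prompt: str, clip: str, max_new_tokens: int) -> str:
    text = "A forklift moves pallets in a warehouse."
    return " ".join(text.split()[:max_new_tokens])


def stub_heavy(prompt: str, clip: str, max_new_tokens: int) -> str:
    text = "The driver wears no helmet."
    return " ".join(text.split()[:max_new_tokens])


description = two_stage_describe("clip-001", stub_light, stub_heavy, detail_tokens=3)
```

Capping `detail_tokens` is what limits the heavyweight model's latency in this sketch, since the abstract notes that latency grows with the number of output tokens.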

iRAG: An Incremental Retrieval Augmented Generation System for Videos

April 24, 2024/in Publications/by NEC Labs America

Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Using RAG for combined understanding of multimodal data such as text, images, and videos is appealing, but two critical limitations exist: a one-time, upfront capture of all content in large multimodal data as text descriptions entails high processing times, and the text descriptions typically do not contain all the information in the rich multimodal data. Since user queries are not known a priori, developing a system for multimodal-to-text conversion and interactive querying of multimodal data is challenging. To address these limitations, we propose iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of a large corpus of multimodal data. Unlike traditional RAG, iRAG quickly indexes large repositories of multimodal data, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the multimodal data to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long multimodal-to-text conversion times, overcomes information loss by doing on-demand, query-specific extraction of details in multimodal data, and ensures high-quality responses to interactive user queries that are often not known a priori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of large, real-world multimodal data. Experimental results on real-world long videos demonstrate 23x to 25x faster video-to-text ingestion, while ensuring that the quality of responses to interactive user queries is comparable to that of a traditional RAG system in which all video data is converted to text upfront before any querying.
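The incremental workflow described above can be sketched as a coarse index plus on-demand detail extraction. This is a simplified illustration, not the iRAG implementation: the `IncrementalIndex` class, the word-overlap ranking, and the `extract_details` callback are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncrementalIndex:
    # Cheap descriptions built upfront: segment id -> short text.
    coarse: dict[str, str]
    # Expensive extractor (hypothetical), run only on segments a query selects.
    extract_details: Callable[[str], str]
    # Cache of detailed descriptions extracted so far.
    details: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _overlap(query: str, text: str) -> int:
        """Toy relevance score: number of distinct shared words."""
        return len(set(query.lower().split()) & set(text.lower().split()))

    def context_for(self, query: str, k: int = 1) -> list[str]:
        """Rank segments on the coarse index, then enrich only the top k."""
        ranked = sorted(
            self.coarse,
            key=lambda seg: self._overlap(query, self.coarse[seg]),
            reverse=True,
        )
        context = []
        for seg in ranked[:k]:
            if seg not in self.details:  # extract once, reuse afterwards
                self.details[seg] = self.extract_details(seg)
            context.append(f"{self.coarse[seg]} {self.details[seg]}")
        return context


extracted: list[str] = []  # records which segments were processed in detail


def fake_extractor(segment_id: str) -> str:
    extracted.append(segment_id)
    return f"(details of {segment_id})"


index = IncrementalIndex(
    coarse={"seg0": "cars on a highway", "seg1": "people in a lobby"},
    extract_details=fake_extractor,
)
context = index.context_for("how many people are in the lobby")
```

Only the segment selected by the query is processed in detail, and the cached result is reused if a later query selects the same segment.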
