Domain-specific question answering produces accurate responses to user queries by drawing on specialized data from a particular enterprise or industry domain. LeanContext generates compact, query-aware summaries of the relevant enterprise data to reduce costs while maintaining accuracy. Unlike generic summarizers, which produce summaries not tailored to specific queries, LeanContext ensures that only the most pertinent data is used to answer each question. By focusing on the context of the query, the method elicits accurate responses from large language models (LLMs) with minimal data input, improving cost efficiency while preserving domain-specific accuracy.


Optimizing LLM API usage costs with novel query-aware reduction of relevant enterprise data

Costs of LLM API usage rise rapidly when proprietary enterprise data is used as context for user queries to generate more accurate responses from LLMs. To reduce costs, we propose LeanContext, which generates query-aware, compact, and AI-model-friendly summaries of the relevant enterprise data context. This is unlike traditional summarizers, which produce query-unaware, human-friendly summaries that are also less compact. We first use retrieval-augmented generation (RAG) to build a query-aware enterprise data context containing the key, query-relevant enterprise data. Then, we use reinforcement learning to further reduce the context while ensuring that a prompt consisting of the user query and the reduced context elicits an LLM response that is just as accurate as the response to a prompt using the original enterprise data context. Our reduced context is not only query-dependent but also variable-sized. Our experimental results demonstrate that LeanContext (a) reduces LLM API usage costs by 37% to 68% compared to RAG, while maintaining the accuracy of the LLM response, and (b) improves the accuracy of responses by 26% to 38% when state-of-the-art summarizers are used to reduce the RAG context.
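The two-stage pipeline above can be sketched in a few lines of Python. This is an illustrative toy, not the actual LeanContext implementation: it stands in a bag-of-words cosine similarity for a real embedding model, and it uses a fixed reduction ratio where LeanContext learns a query-dependent, variable-size ratio with reinforcement learning. The function names (`rag_retrieve`, `lean_reduce`) are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words vector; a real system would use a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag_retrieve(query, chunks, k=3):
    # Stage 1: standard RAG retrieval of the k chunks most similar to the query.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def lean_reduce(query, context_sentences, ratio=0.5):
    # Stage 2: query-aware reduction -- keep only the fraction `ratio` of
    # sentences most similar to the query, preserving their original order.
    # (LeanContext chooses this ratio per query via reinforcement learning;
    # here it is a fixed parameter for illustration.)
    q = embed(query)
    by_score = sorted(range(len(context_sentences)),
                      key=lambda i: cosine(q, embed(context_sentences[i])),
                      reverse=True)
    keep = max(1, int(len(context_sentences) * ratio))
    kept = sorted(by_score[:keep])  # restore document order
    return [context_sentences[i] for i in kept]
```

The reduced sentence list, concatenated with the user query, forms the shorter prompt sent to the LLM; fewer tokens in the prompt translate directly into lower API cost.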