How Rule-Driven Routing Makes Retrieval-Augmented Generation Smarter

A doctor needs a fact quickly. A financial analyst needs a number. A scientist needs a record from last Tuesday’s experiment, not a summary written six months ago. All three are increasingly turning to large language models for help, and all three are running into the same wall: a system that cannot reliably tell whether to search a document corpus or query a structured database.

That distinction sounds like a technical detail, but in high-stakes domains it is the difference between a correct answer and a dangerous one. In a hospital, a wrong answer about drug dosages is not an inconvenience. In a trading system, an imprecise response to a question about covenant compliance is a liability. The gap between a useful AI system and an unreliable one often comes down to a single question: where did that answer come from, and was that the right place to look?

A new paper, “Learning to Route: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation,” directly addresses this gap. Written by researchers from our Data Science and System Security department and Arizona State University, it was presented at the 2026 ACM Web Conference (WWW 2026). The research introduces a rule-driven agent framework for hybrid-source retrieval-augmented generation (RAG) that learns to route each incoming query to the most appropriate knowledge source, whether that is a document corpus, a structured database, or neither.

The authors are Haoyue Bai, NEC Labs America intern and PhD candidate at Arizona State University; Haoyu Wang, NEC Laboratories America; Shengyu Chen, NEC Laboratories America; Zhengzhang Chen, NEC Laboratories America; Lu-An Tang, NEC Laboratories America; Wei Cheng, NEC Laboratories America; Yanjie Fu, Arizona State University; and Haifeng Chen, DSSS Department Head, NEC Laboratories America.

Two Sources, One Problem

Retrieval-augmented generation connects large language models to external knowledge at inference time, improving their ability to answer questions that require current or domain-specific information. The dominant approach pulls from unstructured document corpora, such as research papers or web pages. But relational databases, which underpin finance, healthcare, and scientific research worldwide, have been largely left out of that picture.

The research team ran motivating experiments on the TATQA financial dataset using GPT-4.1-mini to understand what is lost when a system relies on only one source. Their analysis showed that many questions can be answered correctly only with database augmentation, while others require document retrieval. Neither source dominates across all query types, and relying on a single source leaves significant coverage gaps. The finding is straightforward but consequential: databases and documents offer complementary strengths, and the question is how to exploit both without incurring the cost of indiscriminate use.

Why Throwing Both at the Problem Backfires

The obvious fix is to combine both sources for every query. The team’s experiments showed why that does not work. Feeding both databases and documents to a model simultaneously introduces redundant and often conflicting evidence, distracts the model from the correct answer, and causes token counts to spike. On TATQA, queries that were answerable from a single source frequently failed under naive hybrid augmentation. Token usage ballooned with no consistent accuracy gain. In one comparison, the hybrid strategy consumed over 400 tokens per query, while their routing approach kept usage around 300 tokens per query with higher accuracy.

“Query types show consistent regularities in their alignment with retrieval paths, suggesting that routing decisions can be effectively guided by systematic rules that capture these patterns.”—Haoyu Wang, NEC Laboratories America.

That regularity is the foundation of the entire approach. Fact-centric and numerical questions consistently align with database retrieval. Open-ended and descriptive questions consistently align with document retrieval. The pattern holds across multiple datasets. Existing learned routers, whether classifier-based or LLM-based, struggle to capture these heterogeneous patterns stably: they require large, labeled datasets, behave as black boxes, and tend to produce uncontrollable routing decisions. The team’s answer was to build something transparent and adaptable instead.

A Three-Part Agent Architecture

The framework has three cooperating components.

  1. The routing agent scores each incoming query against a set of explicit, human-readable rules and selects the highest-scoring path. The rules encode the observed regularities directly. For example, a question requesting numbers or percentages scores higher for database augmentation; a question containing “how” or “why” scores higher for document augmentation. Because the rules are explicit, every routing decision is interpretable and auditable, a property that matters in regulated industries where black-box decisions create compliance risk.
  2. The rule-making expert agent operates at a higher level, refining the rule set based on accumulated question-answering feedback. After a batch of queries is processed, the system generates a diagnostic report covering which rules fired, how often, and with what accuracy. The expert agent reads that report as a textual gradient and rewrites the rules to address the weaknesses it exposes. On TATQA, the F1 score climbed steadily from 0.080 without any updates to over 0.096 at a batch size of 50. On FinQA, accuracy peaked at a batch size of 25. The system improves as it sees more queries, and the refinement happens offline, so it does not add latency at inference time.
  3. The path-level meta-cache addresses a practical concern: even a fast routing agent adds overhead. The cache stores the embedding representation of each query alongside its routing scores. When a new query arrives with an embedding similar enough to a cached one, the system reuses the prior routing decision rather than invoking the full agent. This meaningfully lowers latency without sacrificing the reliability of the underlying factual answers. Because the cache stores routing decisions rather than answers, it is safe to use even when the underlying database is frequently updated.
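
To make the three components above more concrete, here is a minimal Python sketch, written under several assumptions rather than taken from the paper. The two rules, their weights, the baseline score for the "none" path, the 0.9 similarity threshold, and the hashed bag-of-words `embed` function are all hypothetical stand-ins. A real system would use the learned rule set and a proper sentence encoder, and the expert agent that rewrites rules from the report is not shown.

```python
import math
import re
import zlib
from collections import defaultdict

# Hypothetical rules (name, pattern, path voted for, weight); not the paper's rule set.
RULES = [
    ("asks_for_number",
     re.compile(r"\b(how much|how many|percent|percentage|total)\b"), "database", 2.0),
    ("asks_how_or_why", re.compile(r"\b(how|why)\b"), "document", 1.0),
]

def route(query):
    """Score every path against the rules; return (best_path, scores, fired_rules)."""
    # Assumption: "none" gets a small baseline so it wins when no rule fires.
    scores = {"database": 0.0, "document": 0.0, "none": 0.5}
    fired = []
    for name, pattern, path, weight in RULES:
        if pattern.search(query.lower()):
            scores[path] += weight
            fired.append(name)
    return max(scores, key=scores.get), scores, fired

def diagnostic_report(feedback):
    """Summarise (fired_rules, was_correct) pairs as per-rule fire counts and accuracy."""
    stats = defaultdict(lambda: [0, 0])  # rule -> [times fired, times correct]
    for fired, was_correct in feedback:
        for name in fired:
            stats[name][0] += 1
            stats[name][1] += int(was_correct)
    return {name: {"fired": n, "accuracy": c / n} for name, (n, c) in stats.items()}

def embed(text, dim=64):
    """Toy hashed bag-of-words vector; a stand-in for a real sentence encoder."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec

def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

class PathCache:
    """Holds (query embedding, routing scores) pairs; it never stores answers."""
    def __init__(self, threshold=0.9):  # assumed similarity threshold
        self.threshold = threshold
        self.entries = []

    def lookup(self, vec):
        """Return the scores of the most similar cached query above the threshold."""
        best_scores, best_sim = None, self.threshold
        for cached_vec, scores in self.entries:
            sim = cosine(vec, cached_vec)
            if sim >= best_sim:
                best_scores, best_sim = scores, sim
        return best_scores

def cached_route(query, cache):
    """Return (path, cache_hit); fall back to the rule-based router on a miss."""
    vec = embed(query)
    scores = cache.lookup(vec)
    if scores is not None:
        return max(scores, key=scores.get), True
    path, scores, _ = route(query)
    cache.entries.append((vec, scores))
    return path, False
```

With these toy rules, a question such as "What percentage of revenue came from services?" is routed to the database path, and "Why did margins decline?" is routed to documents. Repeating a query through `cached_route` with the same `PathCache` reuses the stored scores instead of re-running the rules. Because only routing scores are cached, the answer itself is still retrieved fresh from the chosen source each time.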

What the Results Show

The team evaluated across three QA benchmarks spanning financial data (TATQA and FinQA) and general knowledge (WikiQA), testing with four LLM backbones: LLaMA-3, Qwen2.5, GPT-4o, and GPT-4.1. The framework consistently outperformed both static strategies and learned routing baselines across all combinations. On WikiQA with Qwen2.5, it achieved an accuracy of 0.302, well above the best competing method at 0.260. On TATQA with LLaMA-3, accuracy improved from 0.188 to 0.212.

The gains over learned routing baselines are particularly notable. Neural routing models can capture complex patterns but require large training sets, tend to overfit, and lack interpretability. The rule-driven approach achieves higher accuracy with lower computational overhead by exploiting the structural regularities inherent in the alignment of query types with data sources.

Path utilization analysis reinforced the point. Across all three datasets, paths that were selectively chosen by the routing mechanism outperformed the same paths when applied uniformly to all queries. Routing did not just improve overall accuracy; it made each augmentation strategy more effective by sending it only the queries it was suited for.
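
The comparison behind that analysis can be sketched as a small aggregation over logged outcomes. The Python snippet below is an illustration only: the log format, the `path_utilization` helper, and the four records are invented toy values, not the paper's data or evaluation code. For each path, it computes accuracy when the path is applied to every query ("uniform") and accuracy on only the queries the router sent to it ("selective").

```python
from collections import defaultdict

# Synthetic toy log, invented for illustration only (not results from the paper).
# "correct" records whether each path would have answered the query correctly;
# "routed_to" is the path the router actually picked for that query.
LOG = [
    {"routed_to": "database", "correct": {"database": True, "document": False}},
    {"routed_to": "database", "correct": {"database": True, "document": False}},
    {"routed_to": "document", "correct": {"database": False, "document": True}},
    {"routed_to": "document", "correct": {"database": False, "document": False}},
]

def path_utilization(log):
    """Compare each path's accuracy over all queries vs. only the queries routed to it."""
    uniform = defaultdict(list)    # outcomes if the path handled every query
    selective = defaultdict(list)  # outcomes on the queries the router chose it for
    for record in log:
        for path, ok in record["correct"].items():
            uniform[path].append(ok)
        chosen = record["routed_to"]
        selective[chosen].append(record["correct"][chosen])

    def mean(values):
        return sum(values) / len(values)

    return {
        path: {
            "uniform": mean(uniform[path]),
            # None marks a path the router never selected in this log.
            "selective": mean(selective[path]) if path in selective else None,
        }
        for path in uniform
    }
```

On this toy log, `path_utilization(LOG)` reports the database path at 0.5 accuracy when applied uniformly and 1.0 on its routed subset, and the document path at 0.25 versus 0.5. The numbers only demonstrate the bookkeeping and say nothing about the magnitudes reported in the paper.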

“Experiments on three QA datasets demonstrate that our framework consistently outperforms static strategies and learned routing baselines, achieving higher accuracy while maintaining moderate computational cost.” (from the paper)

What This Means for Enterprise AI

The implications go well beyond benchmark scores. Healthcare systems that route questions about treatment protocols to documents while routing questions about specific lab values to structured databases could meaningfully reduce the risk of incorrect information reaching clinicians. Financial tools that distinguish between questions that need context and those that need a precise number become genuinely more reliable rather than merely more capable.

More broadly, this research points toward a maturing model of how RAG should work in production. Rather than treating retrieval as a single unified step, future systems will increasingly need to reason about which kind of retrieval is warranted before any retrieval occurs. The routing framework developed by this team combines the transparency of explicit rules, the adaptability of agent-based refinement, and the efficiency of intelligent caching into a practical system that can be deployed and trusted.
