How Rule-Driven Routing Makes Retrieval-Augmented Generation Smarter

May 13, 2026 | by NEC Labs America | in News

A doctor needs a fact quickly. A financial analyst needs a number. A scientist needs a record from last Tuesday’s experiment, not a summary written six months ago. All three are increasingly turning to large language models for help, and all three are running into the same wall: a system that cannot reliably tell whether to search a document corpus or query a structured database.

That distinction sounds like a technical detail, but in high-stakes domains it is the difference between a correct answer and a dangerous one. In a hospital, a wrong answer about drug dosages is not an inconvenience. In a trading system, an imprecise response to a question about covenant compliance is a liability. The gap between a useful AI system and an unreliable one often comes down to a single question: where did that answer come from, and was that the right place to look?

A new paper, “Learning to Route: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation,” by researchers from our Data Science and System Security department and Arizona State University, directly addresses this gap. It was presented at the 2026 ACM Web Conference (WWW 2026). The paper introduces a rule-driven agent framework for hybrid-source retrieval-augmented generation (RAG) that learns to route each incoming query to the most appropriate knowledge source, whether that is a document corpus, a structured database, or neither.

The authors are Haoyue Bai, NEC Labs America intern and Ph.D. candidate at Arizona State University; Haoyu Wang, NEC Laboratories America; Shengyu Chen, NEC Laboratories America; Zhengzhang Chen, NEC Laboratories America; Lu-An Tang, NEC Laboratories America; Wei Cheng, NEC Laboratories America; Yanjie Fu, Arizona State University; and DSSS Department Head Haifeng Chen, NEC Laboratories America.

Two Sources, One Problem

Retrieval-augmented generation connects large language models to external knowledge at inference time, improving their ability to answer questions that require current or domain-specific information. The dominant approach pulls from unstructured document corpora, such as research papers or web pages. But relational databases, which underpin finance, healthcare, and scientific research worldwide, have been largely left out of that picture.

The research team ran motivating experiments on the TATQA financial dataset using GPT-4.1-mini to understand what is actually lost. Their analysis showed that many questions can be answered correctly only with database augmentation, while others require document retrieval. Neither source dominates across all query types, and relying on a single source leaves significant coverage gaps. The finding is straightforward but consequential: databases and documents offer complementary strengths, and the question is how to exploit both without incurring the cost of indiscriminate use.

Why Throwing Both at The Problem Backfires

The obvious fix is to combine both sources for every query. The team’s experiments showed why that does not work. Feeding both databases and documents to a model simultaneously introduces redundant and often conflicting evidence, distracts the model from the correct answer, and causes token counts to spike. On TATQA, queries that were answerable from a single source frequently failed under naive hybrid augmentation. Token usage ballooned with no consistent accuracy gain. In one comparison, the hybrid strategy consumed over 400 tokens per query, while their routing approach kept usage around 300 tokens per query with higher accuracy.

“Query types show consistent regularities in their alignment with retrieval paths, suggesting that routing decisions can be effectively guided by systematic rules that capture these patterns.”—Haoyu Wang, NEC Laboratories America.

That regularity is the foundation of the entire approach. Fact-centric and numerical questions consistently align with database retrieval. Open-ended and descriptive questions consistently align with document retrieval. The pattern holds across multiple datasets. Existing learned routers, whether classifier-based or LLM-based, struggle to capture these heterogeneous patterns stably: they require large, labeled datasets, behave as black boxes, and tend to produce uncontrollable routing decisions. The team’s answer was to build something transparent and adaptable instead.

A Three-Part Agent Architecture

The framework has three cooperating components.

The routing agent scores each incoming query against a set of explicit, human-readable rules and selects the highest-scoring path. The rules encode the observed regularities directly. For example, a question requesting numbers or percentages scores higher for database augmentation; a question containing “how” or “why” scores higher for document augmentation. Because the rules are explicit, every routing decision is interpretable and auditable, a property that matters in regulated industries where black-box decisions create compliance risk.
The rule-making expert agent operates at a higher level, refining the rule set based on accumulated question-answering feedback. After a batch of queries is processed, the system generates a diagnostic report covering which rules fired, how often, and with what accuracy. The expert agent reads that report as a textual gradient and rewrites the rules to address the weaknesses it reveals. On TATQA, the F1 score climbed steadily from 0.080 without any updates to over 0.096 at a batch size of 50. On FinQA, accuracy peaked at a batch size of 25. The system improves as it sees more queries, and the refinement happens offline, so it does not add latency at inference time.
The path-level meta-cache addresses a practical concern: even a fast routing agent adds overhead. The cache stores the embedding representation of each query alongside its routing scores. When a new query arrives with an embedding similar enough to a cached one, the system reuses the prior routing decision rather than invoking the full agent. This meaningfully lowers latency without sacrificing the reliability of the factual answers: the cache stores routing decisions rather than answers, so it remains safe to use even when the underlying database is frequently updated.
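
To make the three components concrete, here is a minimal Python sketch of how rule scoring, a per-rule diagnostic report, and a path-level cache might fit together. It is not the authors' implementation. The two rules, their regular expressions and weights, the 0.5 prior for the "none" path, the 0.9 similarity threshold, and the toy hashed embedding are all assumptions made for illustration. The expert agent itself, an LLM that rewrites the rules, is not shown; `rule_report` only illustrates the kind of summary it might read.

```python
import math
import re
import zlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    fires: Callable[[str], bool]  # predicate over the lowercased query
    path: str                     # path whose score increases when the rule fires
    weight: float

# Hypothetical rules modeled on the two examples in the text.
NUMERIC = re.compile(r"\b(how many|how much|percent(age)?|total|ratio)\b|%|\d")
EXPLAIN = re.compile(r"\b(how|why)\b")
RULES = [
    Rule("asks_for_number", lambda q: bool(NUMERIC.search(q)), "database", 2.0),
    Rule("asks_how_or_why", lambda q: bool(EXPLAIN.search(q)), "document", 1.0),
]
NO_RETRIEVAL_PRIOR = 0.5  # assumed baseline so "none" wins when no rule fires

def score_paths(query: str) -> tuple[dict[str, float], list[str]]:
    """Score every path against the rule set; also return which rules fired."""
    scores = {"database": 0.0, "document": 0.0, "none": NO_RETRIEVAL_PRIOR}
    fired = []
    for rule in RULES:
        if rule.fires(query.lower()):
            scores[rule.path] += rule.weight
            fired.append(rule.name)
    return scores, fired

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words vector; stands in for a learned sentence encoder."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9%]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

@dataclass
class MetaCache:
    """Holds (query embedding, routing scores) pairs; it never stores answers."""
    threshold: float = 0.9
    entries: list = field(default_factory=list)

    def lookup(self, emb: list[float]) -> Optional[dict]:
        best = max(self.entries, key=lambda e: cosine(emb, e[0]), default=None)
        if best is not None and cosine(emb, best[0]) >= self.threshold:
            return best[1]
        return None

def route(query: str, cache: MetaCache) -> tuple[str, bool]:
    """Return (chosen path, cache_hit), scoring the rules only on a cache miss."""
    emb = embed(query)
    scores = cache.lookup(emb)
    hit = scores is not None
    if not hit:
        scores, _ = score_paths(query)
        cache.entries.append((emb, scores))
    return max(scores, key=scores.get), hit

def rule_report(log) -> dict:
    """Summarize (fired rule names, answer_correct) pairs per rule."""
    stats = defaultdict(lambda: [0, 0])  # rule name -> [times fired, times correct]
    for fired, correct in log:
        for name in fired:
            stats[name][0] += 1
            stats[name][1] += int(correct)
    return {n: {"fired": f, "accuracy": c / f} for n, (f, c) in stats.items()}

cache = MetaCache()
for q in ["What percentage of revenue came from services in 2019?",
          "What percentage of revenue came from services in 2019?",
          "Why did management change its hedging strategy?"]:
    print(q, "->", route(q, cache))
```

In this sketch the second, repeated query is served from the cache, so the rules are not scored again. Because only routing scores are cached, the retrieval step still runs against the live source on every query.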

What The Results Show

The team evaluated across three QA benchmarks spanning financial data (TATQA and FinQA) and general knowledge (WikiQA), testing with four LLM backbones: LLaMA-3, Qwen2.5, GPT-4o, and GPT-4.1. The framework consistently outperformed both static strategies and learned routing baselines across all combinations. On WikiQA with Qwen2.5, it achieved an accuracy of 0.302, well above the best competing method at 0.260. On TATQA with LLaMA-3, accuracy improved from 0.188 to 0.212.

The gains over learned routing baselines are particularly notable. Neural routing models can capture complex patterns but require large training sets, tend to overfit, and lack interpretability. The rule-driven approach achieves higher accuracy with lower computational overhead by exploiting the structural regularities inherent in the alignment of query types with data sources.

Path utilization analysis reinforced the point. Across all three datasets, paths that were selectively chosen by the routing mechanism outperformed the same paths when applied uniformly to all queries. Routing did not just improve overall accuracy; it made each augmentation strategy more effective by sending it only the queries it was suited for.
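
The comparison behind a path utilization analysis can be sketched in a few lines of Python. This assumes an offline evaluation in which every path has been tried on every query, so each record holds the path the router chose plus a correctness flag per path. The record layout, the function name `path_utilization`, and the four toy records are hypothetical and are not results from the paper.

```python
# Toy evaluation records (made up for illustration, not data from the paper).
records = [
    {"routed": "database", "correct": {"database": True,  "document": False}},
    {"routed": "database", "correct": {"database": True,  "document": True}},
    {"routed": "document", "correct": {"database": False, "document": True}},
    {"routed": "document", "correct": {"database": False, "document": False}},
]

def path_utilization(records):
    """Compare each path's accuracy on all queries vs. only the queries routed to it."""
    paths = sorted({p for r in records for p in r["correct"]})
    report = {}
    for path in paths:
        uniform = [r["correct"][path] for r in records]
        selected = [r["correct"][path] for r in records if r["routed"] == path]
        report[path] = {
            "uniform_acc": sum(uniform) / len(uniform),
            "routed_acc": sum(selected) / len(selected) if selected else None,
            "share": len(selected) / len(records),  # fraction of queries sent here
        }
    return report

for path, row in path_utilization(records).items():
    print(path, row)
```

A path whose `routed_acc` exceeds its `uniform_acc` is being sent queries it handles better than average, which is the effect the analysis describes.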

In the authors’ words: “Experiments on three QA datasets demonstrate that our framework consistently outperforms static strategies and learned routing baselines, achieving higher accuracy while maintaining moderate computational cost.”

What This Means for Enterprise AI

The implications go well beyond benchmark scores. Healthcare systems that route questions about treatment protocols to documents while routing questions about specific lab values to structured databases could meaningfully reduce the risk of incorrect information reaching clinicians. Financial tools that distinguish between questions that need context and those that need a precise number become genuinely more reliable rather than merely more capable.

More broadly, this research points toward a maturing model of how RAG should work in production. Rather than treating retrieval as a single unified step, future systems will increasingly need to reason about which kind of retrieval is warranted before any retrieval occurs. The routing framework developed by this team combines the transparency of explicit rules, the adaptability of agent-based refinement, and the efficiency of intelligent caching into a practical system that can be deployed and trusted.

About The Authors

Haoyu Wang is a Researcher in the Data Science & System Security Department at NEC Laboratories America. He received his undergraduate degree in Computer Science from the University of Science and Technology of China, his M.S. in Engineering Physics/Applied Physics from Columbia University, and his Ph.D. in Computer Science from the University of Virginia.

Shengyu Chen is a Researcher in the Data Science and System Security Department at NEC Laboratories America, based in Princeton, NJ. He earned his Master’s degree in Computer Science from Indiana University Bloomington and completed his Ph.D. in Computer Science at the University of Pittsburgh. At NEC, Dr. Chen develops advanced AI methods for analyzing complex spatial-temporal and time-series data.

Zhengzhang Chen is a Senior Researcher in the Data Science and System Security Department at NEC Laboratories America in Princeton, NJ. He received his PhD in Computer Science from North Carolina State University. Dr. Chen’s research focuses on machine learning for dynamic and complex systems, with expertise spanning anomaly detection, causal discovery, multimodal data analysis, and trustworthy AI.

Lu-An Tang is a Senior Researcher in the Data Science & System Security Department at NEC Laboratories America. He received his BS in Engineering and his MS in Engineering from Peking University and his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign, where his work focused on anomaly detection, cyber security, IoT, AIOps and LLM applications with RAG.

Wei Cheng is a Senior Researcher at NEC Labs America. He received his Ph.D. from the Department of Computer Science, UNC at Chapel Hill, in 2015, advised by Prof. Wei Wang. His research interests include data science, machine learning, and bioinformatics. He has filed over sixty patents and published more than ninety research papers in top-tier conferences.

Haifeng Chen is the Department Head of the Data Science and System Security Department at NEC Laboratories America. He received his PhD in Computer Engineering from Rutgers University. His research focuses on data mining, system security, and industrial AI. He leads NEC’s work on secure systems, anomaly detection, and AI-driven automation solutions.

Publication to Blog Post Series

Our Publication-to-Blog Post Series highlights the real-world impact of our latest research, translating complex innovations into practical applications. From AI and machine learning to optical networking and intelligent systems, we showcase how our work goes beyond theory to address real-world challenges. Explore how cutting-edge research at NEC Laboratories America is driving measurable outcomes across industries.