On Synthesizing Data for Context Attribution in Question Answering
Publication Date: April 7, 2025
Event: https://arxiv.org
Reference: https://arxiv.org/abs/2504.05317
Authors: Gorjan Radevski, NEC Laboratories Europe, KU Leuven; Kiril Gashteovski, NEC Laboratories Europe, Ss. Cyril and Methodius University; Christopher Malon, NEC Laboratories America, Inc.; Shahbaz Syed, NEC Laboratories Europe; Sebastien Nicolas, NEC Laboratories Europe; Chia-Chien Hung, NEC Laboratories Europe; Timo Sztyler, NEC Laboratories Europe; Verena Heußer, NEC Laboratories Europe; Wiem Ben Rim, University College London; Masafumi Enomoto, NEC Corporation; Kunihiro Takeoka, NEC Corporation; Masafumi Oyamada, NEC Corporation; Goran Glavaš, University of Würzburg; Carolin Lawrence, NEC Laboratories Europe
Abstract: Question Answering (QA) accounts for a significant portion of LLM usage “in the wild”. However, LLMs sometimes produce false or misleading responses, also known as “hallucinations”. Therefore, grounding the generated answers in contextually provided information — i.e., providing evidence for the generated text — is paramount for LLMs’ trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task, namely we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SynQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs’ natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small LMs (fine-tuned on synthetic data from SynQA) in context attribution for QA.
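To make the SynQA idea from the abstract concrete, below is a minimal sketch of one possible synthesis loop: evidence sentences are sampled first, and an LLM is then prompted to produce a QA pair supported by exactly those sentences, so the attribution labels are known by construction. This is an illustrative assumption, not the authors' implementation; the `call_llm` hook, the prompt wording, and the JSON response format are all hypothetical placeholders.

```python
# Illustrative SynQA-style synthesis loop (sketch, not the paper's code).
# Assumptions: `call_llm` is any text-in/text-out LLM call supplied by the user,
# and the prompt/response format below is one plausible way to elicit QA pairs.

import json
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AttributionExample:
    """One synthetic training instance: a QA pair plus its supporting sentences."""
    context: List[str]           # all sentences of the passage
    question: str
    answer: str
    evidence_indices: List[int]  # indices of the sentences the answer is grounded in


PROMPT_TEMPLATE = (
    "You are given a passage split into numbered sentences.\n"
    "Write ONE question whose answer is fully supported by sentences {targets} "
    "and by no other sentences, then give the answer.\n"
    'Respond as JSON: {{"question": "...", "answer": "..."}}\n\n'
    "Passage:\n{passage}"
)


def synthesize_examples(
    sentences: List[str],
    call_llm: Callable[[str], str],  # placeholder for whichever LLM is used
    num_examples: int = 3,
    max_evidence: int = 2,
    seed: int = 0,
) -> List[AttributionExample]:
    """Sample evidence sentences, then ask the LLM for a QA pair grounded in them.

    Because evidence sentences are chosen *before* generation, every synthetic
    example carries a clear attribution path by construction.
    """
    rng = random.Random(seed)
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences))
    examples: List[AttributionExample] = []

    for _ in range(num_examples):
        # Pick 1..max_evidence target sentences to serve as the gold evidence.
        k = rng.randint(1, min(max_evidence, len(sentences)))
        targets = sorted(rng.sample(range(len(sentences)), k))
        prompt = PROMPT_TEMPLATE.format(targets=targets, passage=numbered)

        try:
            payload = json.loads(call_llm(prompt))
            examples.append(
                AttributionExample(
                    context=sentences,
                    question=payload["question"],
                    answer=payload["answer"],
                    evidence_indices=targets,
                )
            )
        except (json.JSONDecodeError, KeyError):
            # Malformed generations are simply discarded.
            continue

    return examples
```

A small LM could then be fine-tuned on such `AttributionExample` records to predict `evidence_indices` given the context, question, and answer; the filtering steps and model choices used in the paper are not reproduced in this sketch.

Publication Link: https://arxiv.org/abs/2504.05317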