KU Leuven (Katholieke Universiteit Leuven) is a leading research university in Leuven, Belgium, founded in 1425, making it one of the oldest universities in Europe. It is consistently ranked among the top universities worldwide and is known for its strong emphasis on research, innovation, and interdisciplinary collaboration across fields such as science, engineering, medicine, law, and the humanities. As a comprehensive and international institution, KU Leuven attracts students and scholars from around the globe, offering a wide range of English-taught programs and maintaining partnerships with universities, industry, and research centers worldwide.

Posts

On Synthesizing Data for Context Attribution in Question Answering

Question Answering (QA) accounts for a significant portion of LLM usage “in the wild”. However, LLMs sometimes produce false or misleading responses, also known as hallucinations. Therefore, grounding the generated answers in contextually provided information (i.e., providing evidence for the generated text) is paramount for LLMs’ trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task; namely, we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SYNQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs’ natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SYNQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small, efficient LMs (fine-tuned on synthetic data from SYNQA) in context attribution for QA.
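The generative idea behind SYNQA is to choose the evidence sentences first and only then ask an LLM for a QA pair they support, so the gold attribution is known by construction. This idea can be sketched in Python. The sketch is a minimal illustration under stated assumptions, not the authors' implementation. The random sentence selection, the prompt wording, the JSON reply format, and the names `synthesize_example`, `build_prompt`, `AttributionExample`, and `fake_llm` are all hypothetical. `generate` stands in for whatever LLM call you would actually use.

```python
import json
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AttributionExample:
    """One synthetic training instance for context attribution (hypothetical format)."""
    context: List[str]   # all context sentences
    question: str
    answer: str
    labels: List[int]    # 1 if the sentence was chosen as supporting evidence, else 0


def build_prompt(evidence: List[str]) -> str:
    # Hypothetical prompt wording; the paper's actual prompt may differ.
    joined = "\n".join(f"- {s}" for s in evidence)
    return (
        "Write one question that can be answered using ONLY the sentences below, "
        "together with its answer. Reply as JSON with keys 'question' and 'answer'.\n"
        + joined
    )


def synthesize_example(
    context: List[str],
    generate: Callable[[str], str],
    num_evidence: int = 2,
    rng: Optional[random.Random] = None,
) -> AttributionExample:
    rng = rng or random.Random()
    # Step 1 (assumed: uniform random choice): fix the supporting sentences up front.
    chosen = sorted(rng.sample(range(len(context)), k=num_evidence))
    # Step 2: ask the LLM for a QA pair grounded only in the chosen sentences.
    reply = json.loads(generate(build_prompt([context[i] for i in chosen])))
    # Step 3: the attribution labels follow directly from the selection in step 1.
    labels = [1 if i in chosen else 0 for i in range(len(context))]
    return AttributionExample(context, reply["question"], reply["answer"], labels)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: always returns the same QA pair.
    return json.dumps({"question": "When was KU Leuven founded?", "answer": "1425"})


if __name__ == "__main__":
    ctx = [
        "KU Leuven is a research university in Leuven, Belgium.",
        "It was founded in 1425.",
        "It offers a wide range of English-taught programs.",
    ]
    example = synthesize_example(ctx, fake_llm, num_evidence=1, rng=random.Random(0))
    print(example.question, example.answer, example.labels)
```

Because the evidence is fixed before generation, each sentence's label comes for free. In this sketch, those labels could then serve as targets for fine-tuning a small LM to predict which context sentences support a given question and answer. A real pipeline would also need to check that the generated pair is actually answerable from the chosen sentences alone.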