SS Cyril And Methodius University is a Macedonian institution active in engineering and computing research. NEC Labs America considers universities like this important partners for global collaboration in sensing and AI. These relationships support NEC’s mission of advancing applied AI and communication technologies through international academic cooperation.

Posts

On Synthesizing Data for Context Attribution in Question Answering

Question Answering (QA) accounts for a significant portion of LLM usage “in the wild”. However, LLMs sometimes produce false or misleading responses, also known as hallucinations. Therefore, grounding the generated answers in contextually provided information (i.e., providing evidence for the generated text) is paramount for LLMs’ trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task. Namely, we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SYNQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs’ natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SYNQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small, efficient LMs (fine-tuned on synthetic data from SYNQA) in context attribution for QA.
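
The generation step described in the abstract (select context sentences first, then have an LLM write a QA pair supported by them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `AttributionExample`, `build_prompt`, `synthesize_example`, and `stub_generate`, the prompt wording, and the random selection of sentences are all assumptions made for illustration, and the stub stands in for a real LLM call.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AttributionExample:
    """One synthetic training record (hypothetical schema)."""
    question: str
    answer: str
    context: List[str]   # all context sentences
    support: List[int]   # indices of the sentences the QA pair was generated from


def build_prompt(selected: List[str]) -> str:
    """Assumed prompt format: an instruction followed by the selected sentences."""
    bullets = "\n".join(f"- {s}" for s in selected)
    return (
        "Write one question and its answer that are fully supported "
        "by these sentences.\n" + bullets
    )


def synthesize_example(
    context: List[str],
    generate: Callable[[str], Tuple[str, str]],
    k: int = 2,
    rng: Optional[random.Random] = None,
) -> AttributionExample:
    """Pick k sentences, ask the generator for a QA pair grounded in them,
    and keep the picked indices as attribution labels."""
    rng = rng or random.Random()
    # Random selection is an assumption; the source only says "selected".
    support = sorted(rng.sample(range(len(context)), k=min(k, len(context))))
    selected = [context[i] for i in support]
    question, answer = generate(build_prompt(selected))
    return AttributionExample(question, answer, list(context), support)


def stub_generate(prompt: str) -> Tuple[str, str]:
    """Placeholder for an LLM call: echoes the last selected sentence as the answer."""
    last_sentence = prompt.splitlines()[-1][2:]  # drop the "- " bullet prefix
    return ("What does the passage state?", last_sentence)


if __name__ == "__main__":
    sentences = [
        "The bridge opened in 1932.",
        "It spans the harbour.",
        "It carries rail and road traffic.",
    ]
    example = synthesize_example(sentences, stub_generate, k=2, rng=random.Random(0))
    print(example.support, example.answer)
```

Because the supporting sentences are chosen before generation, the attribution labels come for free. In this sketch, a small LM would then be fine-tuned to predict `support` from the question, answer, and context.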