EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation
Publication Date: 12/20/2025
Event: IJCNLP-AACL 2025 (International Joint Conference on Natural Language Processing and the Asia-Pacific Chapter of the Association for Computational Linguistics)
Reference: pp. 1-13, 2025
Authors: Kai Zhang, NEC Laboratories America, Inc., Lehigh University; Christopher Malon, NEC Laboratories America, Inc.; Lichao Sun, Lehigh University; Martin Renqiang Min, NEC Laboratories America, Inc.
Abstract: Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models, have shown improved performance, their supervised fine-tuning (SFT) objective is not explicitly aligned with clinical efficacy. In this work, we introduce EditGRPO, a mixed-policy reinforcement learning algorithm designed specifically to optimize generation with clinically motivated rewards. EditGRPO integrates on-policy exploration with off-policy guidance by injecting sentence-level detailed corrections during training rollouts. This mixed-policy approach addresses the exploration dilemma and sampling efficiency issues typically encountered in RL. Applied to a Qwen2.5-VL-3B model, EditGRPO outperforms both SFT and vanilla GRPO baselines, achieving an average improvement of 3.4% in clinical metrics across four major datasets. Notably, EditGRPO also demonstrates superior out-of-domain generalization, with an average performance gain of 5.9% on unseen datasets.
Publication Link: https://arxiv.org/abs/2509.22812
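
The abstract describes the mixed-policy idea only at a high level: on-policy rollouts receive sentence-level corrections before group-relative advantages are computed. The toy Python sketch below illustrates that flow under stated assumptions; it is not the paper's implementation. The reward, the edit rule, the function names (toy_reward, edit_rollout, group_relative_advantages), and the sample reports are all hypothetical stand-ins, whereas the actual method uses clinically motivated rewards computed against the reference report and a multimodal LLM policy.

    import statistics

    def toy_reward(candidate: str, reference: str) -> float:
        # Hypothetical stand-in for a clinically motivated reward:
        # fraction of reference sentences that appear verbatim in the candidate.
        ref_sents = [s.strip() for s in reference.split(".") if s.strip()]
        hits = sum(1 for s in ref_sents if s in candidate)
        return hits / max(len(ref_sents), 1)

    def edit_rollout(candidate: str, reference: str) -> str:
        # Post-rollout edit: inject one missing reference sentence into the
        # candidate, mimicking an off-policy sentence-level correction.
        ref_sents = [s.strip() for s in reference.split(".") if s.strip()]
        missing = [s for s in ref_sents if s not in candidate]
        if not missing:
            return candidate
        return candidate.rstrip(". ") + ". " + missing[0] + "."

    def group_relative_advantages(rewards):
        # GRPO-style advantages: standardize rewards within the rollout group.
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0
        return [(r - mean) / std for r in rewards]

    if __name__ == "__main__":
        reference = "Heart size is normal. No pleural effusion. Right lower lobe opacity."
        rollouts = [  # imagined on-policy samples from the policy model
            "Heart size is normal. Lungs are clear.",
            "No pleural effusion. Right lower lobe opacity is noted.",
        ]
        edited = [edit_rollout(r, reference) for r in rollouts]    # mixed-policy edit step
        rewards = [toy_reward(r, reference) for r in edited]       # clinical-style reward
        advantages = group_relative_advantages(rewards)            # group-relative advantage
        for rep, adv in zip(edited, advantages):
            print(f"advantage={adv:+.2f}  report={rep!r}")

In the real algorithm the edited rollouts and their advantages would feed a policy-gradient update of the multimodal LLM; the sketch stops at advantage computation to keep the example self-contained.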


