Attribute-Centric Compositional Text-to-Image Generation

Publication Date: 3/13/2025

Event: International Journal of Computer Vision

Reference: pp. 1-16, 2024

Authors: Yuren Cong, Leibniz University Hannover; Martin Renqiang Min, NEC Laboratories America, Inc.; Li Erran Li, Amazon; Bodo Rosenhahn, Leibniz University Hannover; Michael Ying Yang, University of Bath

Abstract: Despite the recent impressive breakthroughs in text-to-image generation, generative models have difficulty in capturing the data distribution of underrepresented attribute compositions while over-memorizing overrepresented attribute compositions, which raises public concerns about their robustness and fairness. To tackle this challenge, we propose ACTIG, an attribute-centric compositional text-to-image generation framework. We present an attribute-centric feature augmentation and a novel image-free training scheme, which greatly improve the model's ability to generate images with underrepresented attributes. We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions. We validate our framework on the CelebA-HQ and CUB datasets. Extensive experiments show that the compositional generalization of ACTIG is outstanding, and our framework outperforms previous works in terms of image quality and text-image consistency.
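The abstract mentions an attribute-centric contrastive loss without giving its form. As an illustrative sketch only (not the paper's actual formulation), an InfoNCE-style contrastive loss that pulls each text embedding toward its matched image embedding and pushes it away from the rest of the batch could look like the following; all names and the toy data here are hypothetical:

```python
import numpy as np

def l2_normalize(x):
    """Normalize each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss: for each anchor i, positives[i] is the
    matching sample and the other rows of `positives` are negatives.
    `anchors` and `positives` are (batch, dim) L2-normalized embeddings."""
    logits = anchors @ positives.T / temperature        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True) # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs sit on the diagonal; maximize their log-probability.
    return -np.mean(np.diag(log_softmax))

# Toy example: text embeddings for attribute compositions and noisy matched image embeddings.
rng = np.random.default_rng(0)
text_emb = l2_normalize(rng.normal(size=(4, 8)))
img_emb = l2_normalize(text_emb + 0.05 * rng.normal(size=(4, 8)))
loss = contrastive_loss(text_emb, img_emb)
print(float(loss))
```

By weighting or sampling the batch toward underrepresented attribute compositions, a loss of this shape can be used to discourage a model from over-memorizing the overrepresented ones, which is the role the abstract assigns to the proposed loss.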

Publication Link: https://doi.org/10.1007/s11263-025-02371-0