Quantitative Bounds for Length Generalization in Transformers

Publication Date: July 19, 2025

Event: 3rd Workshop on High-dimensional Learning Dynamics (HiLD), San Diego, CA

Reference: pp. 1-13, 2025

Authors: Eshaan Nichani, Princeton University; Zachary Izzo, NEC Laboratories America, Inc.; Jason D. Lee, Princeton University

Abstract: We provide quantitative bounds on the length of sequences that must be observed during training for a transformer to length generalize, i.e., to continue to perform well on sequences unseen during training. Our results improve on Huang et al. [8], who show that there is a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds on this length.

Publication Link: https://openreview.net/pdf?id=DgGPgVLrRX