Quantitative Bounds for Length Generalization in Transformers
Publication Date: 7/19/2025
Event: 3rd Workshop on High-dimensional Learning Dynamics (HiLD), San Diego, CA
Reference: pp. 1-13, 2025
Authors: Eshaan Nichani, Princeton University; Zachary Izzo, NEC Laboratories America, Inc.; Jason D. Lee, Princeton University
Abstract: We provide quantitative bounds on the length of sequences that must be observed during training for a transformer to length generalize, i.e., to continue to perform well on sequences longer than those seen during training. Our results improve on Huang et al. [8], who show that there is a finite training length beyond which length generalization is guaranteed, but who do not provide quantitative bounds on this length.
Publication Link: https://openreview.net/pdf?id=DgGPgVLrRX