Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

Publication Date: 4/16/2024

Event: https://arxiv.org

Reference: https://arxiv.org/pdf/2404.14657

Authors: Abhishek Aich, NEC Laboratories America, Inc.; Yumin Suh, NEC Laboratories America, Inc.; Samuel Schulter, NEC Laboratories America, Inc.; Manmohan Chandraker, NEC Laboratories America, Inc.

Abstract: A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former uses >50% of its compute only on the transformer encoder. This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer. With this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into Mask2Former-style segmentation architectures to significantly reduce the computational cost. The underlying principle of PRO-SCALE is to progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computations by a large margin with minimal sacrifice in performance (~52% GFLOPs reduction with no drop in performance on the COCO dataset). We validate our framework on multiple public benchmarks.

Publication Link: https://arxiv.org/pdf/2404.14657
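To make the core idea concrete, below is a minimal PyTorch sketch of progressive token-length scaling as described in the abstract. This is an illustrative reading, not the paper's implementation: the class name ProgressiveScaleEncoder, the per-scale join schedule, and the layer counts are assumptions. Early encoder layers attend only over tokens from the coarsest backbone scale, and tokens from finer scales are appended at later layers, so the per-layer token count grows progressively instead of staying at full length throughout.

```python
# Minimal sketch (assumed names and schedule, not the paper's code):
# tokens from the coarsest scale are processed from layer 0, and finer-scale
# tokens are concatenated into the sequence at scheduled later layers.
import torch
import torch.nn as nn


class ProgressiveScaleEncoder(nn.Module):
    def __init__(self, dim=256, num_layers=6, num_heads=8, num_scales=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        # Layer index at which each scale's tokens join the sequence
        # (scale 0 = coarsest, present from layer 0). Evenly spaced schedule
        # chosen for illustration only.
        self.join_layer = [i * num_layers // num_scales for i in range(num_scales)]

    def forward(self, multi_scale_tokens):
        # multi_scale_tokens: list of (B, N_s, dim) tensors, coarsest -> finest.
        tokens = multi_scale_tokens[0]
        next_scale = 1
        for i, layer in enumerate(self.layers):
            # Append the next finer scale once its scheduled layer is reached.
            while next_scale < len(multi_scale_tokens) and i >= self.join_layer[next_scale]:
                tokens = torch.cat([tokens, multi_scale_tokens[next_scale]], dim=1)
                next_scale += 1
            tokens = layer(tokens)
        return tokens


# Usage: three backbone scales with 100, 400, and 1600 tokens of width 256.
feats = [torch.randn(2, n, 256) for n in (100, 400, 1600)]
encoder = ProgressiveScaleEncoder()
out = encoder(feats)
print(out.shape)  # torch.Size([2, 2100, 256])
```

Because self-attention cost grows quadratically with sequence length, keeping the finest (longest) scales out of the early layers is where most of the savings in this sketch would come from, which mirrors the compute reduction the abstract attributes to PRO-SCALE.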