THAT: Token-wise High-frequency Augmentation Transformer for Hyperspectral Pansharpening

Publication Date: 10/5/2025

Event: IEEE SMC 2025

Reference: pp. 1-8, 2025

Authors: Hongkun Jin, JPMorgan Chase; Hongcheng Jiang, University of Missouri-Kansas City; Zejun Zhang, University of Southern California; Yuan Zhang, Robinson Research Institute, University of Adelaide; Jia Fu, KTH Royal Institute of Technology; Tingfeng Li, NEC Laboratories America, Inc.; Kai Luo, University of Virginia

Abstract: Transformer-based methods have demonstrated strong potential in hyperspectral pansharpening by modeling long-range dependencies. However, their effectiveness is often limited by redundant token representations and a lack of multiscale feature modeling. Hyperspectral images exhibit intrinsic spectral priors (e.g., abundance sparsity) and spatial priors (e.g., non-local similarity), which are critical for accurate reconstruction. From a spectral-spatial perspective, Vision Transformers (ViTs) face two major limitations: they struggle to preserve high-frequency components such as material edges and texture transitions, and they suffer from attention dispersion across redundant tokens. These issues stem from the global self-attention mechanism, which tends to dilute high-frequency signals and overlook localized details. To address these challenges, we propose the Token-wise High-frequency Augmentation Transformer (THAT), a novel framework designed to enhance hyperspectral pansharpening through improved high-frequency feature representation and token selection. Specifically, THAT introduces (1) Pivotal Token Selective Attention (PTSA) to prioritize informative tokens and suppress redundancy, and (2) a Multi-level Variance-aware Feed-forward Network (MVFN) to enhance high-frequency detail learning. Experiments on standard benchmarks show that THAT achieves state-of-the-art performance with improved reconstruction quality and efficiency.

Publication Link:
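
Note: The paper itself is the authoritative source for PTSA and MVFN. As a rough illustration of the token-selection idea named in the abstract, the minimal PyTorch sketch below restricts self-attention to a top-k subset of "pivotal" key/value tokens. The variance-based scoring, the `keep_ratio` parameter, and the module name are assumptions made for this example and are not taken from the paper.

```python
import torch
import torch.nn as nn


class PivotalTokenSelectiveAttention(nn.Module):
    """Illustrative sketch (not the authors' implementation): attention over a
    top-k subset of key/value tokens, intended to suppress redundant tokens."""

    def __init__(self, dim, num_heads=4, keep_ratio=0.5):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.keep_ratio = keep_ratio  # assumed hyperparameter for this sketch
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)       # each: (B, heads, N, head_dim)

        # Score tokens by the variance of their key features, a simple stand-in
        # for an "informativeness" / high-frequency criterion.
        scores = k.var(dim=-1)                      # (B, heads, N)
        keep = max(1, int(N * self.keep_ratio))
        idx = scores.topk(keep, dim=-1).indices     # (B, heads, keep)

        # Gather only the selected (pivotal) keys and values.
        idx_exp = idx.unsqueeze(-1).expand(-1, -1, -1, self.head_dim)
        k_sel = torch.gather(k, 2, idx_exp)         # (B, heads, keep, head_dim)
        v_sel = torch.gather(v, 2, idx_exp)

        # Every query attends only to the pivotal tokens.
        attn = (q @ k_sel.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v_sel).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 196, 64)                     # toy token sequence
    layer = PivotalTokenSelectiveAttention(dim=64, num_heads=4, keep_ratio=0.5)
    print(layer(x).shape)                           # torch.Size([2, 196, 64])
```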