MM-TTA: Multi-Modal Test-Time Adaptation
for 3D Semantic Segmentation

Inkyu Shin1 Yi-Hsuan Tsai2 Bingbing Zhuang3 Samuel Schulter3
Buyu Liu3 Sparsh Garg3 In So Kweon1 Kuk-Jin Yoon1
1 KAIST 2 Phiar 3 NEC Laboratories America
CVPR 2022
[ Paper - CVF (coming soon) ] [ Paper - arXiv ] [ BibTeX ] [ Code & data (coming soon) ]

  @inproceedings{mmtta_cvpr_2022,
    author = {Inkyu Shin and Yi-Hsuan Tsai and Bingbing Zhuang and Samuel Schulter
              and Buyu Liu and Sparsh Garg and In So Kweon and Kuk-Jin Yoon},
    title = {{MM-TTA}: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation},
    booktitle = {CVPR},
    year = 2022
  }
      
We propose a Multi-Modal Test-Time Adaptation (MM-TTA) framework that enables a model to be quickly adapted to multi-modal test data without access to the source-domain training data. We introduce two modules: 1) Intra-PG, which produces reliable pseudo labels within each modality by updating the batch norm statistics of two models at different paces, i.e., a fast direct update and a slow momentum-based update, and 2) Inter-PR, which adaptively selects pseudo labels from the two modalities. The two modules collaborate seamlessly to produce the final cross-modal pseudo labels that drive test-time adaptation.
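To make the two modules concrete, below is a minimal PyTorch sketch of one adaptation step. This is not the released code: the function names (intra_pg_update, mm_tta_pseudo_labels), the momentum value, the overlap-based consistency measure, and the selection threshold are illustrative assumptions; the paper defines its own consistency scheme.

import torch
import torch.nn.functional as F

@torch.no_grad()
def intra_pg_update(model_fast, model_slow, momentum=0.99):
    # Intra-PG: the fast model's BN statistics follow each test batch
    # directly (run it in train mode on the batch); the slow model tracks
    # them with a momentum update for stability. Momentum value is an
    # illustrative assumption.
    bn_types = (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)
    for m_f, m_s in zip(model_fast.modules(), model_slow.modules()):
        if isinstance(m_f, bn_types):
            m_s.running_mean.mul_(momentum).add_(m_f.running_mean, alpha=1 - momentum)
            m_s.running_var.mul_(momentum).add_(m_f.running_var, alpha=1 - momentum)

def consistency(prob_fast, prob_slow):
    # Per-point agreement between fast and slow predictions; the summed
    # element-wise minimum of the two distributions is one simple proxy
    # (an assumption, not the paper's exact measure).
    return torch.minimum(prob_fast, prob_slow).sum(dim=1)

@torch.no_grad()
def mm_tta_pseudo_labels(x_2d, x_3d, models_2d, models_3d, thresh=0.5):
    # models_2d / models_3d are (fast, slow) model pairs; each model
    # returns per-point logits of shape (N, C).
    fast2d, slow2d = models_2d
    fast3d, slow3d = models_3d
    p2f, p2s = F.softmax(fast2d(x_2d), dim=1), F.softmax(slow2d(x_2d), dim=1)
    p3f, p3s = F.softmax(fast3d(x_3d), dim=1), F.softmax(slow3d(x_3d), dim=1)
    # Intra-PG: aggregate fast and slow predictions within each modality.
    p2d, p3d = 0.5 * (p2f + p2s), 0.5 * (p3f + p3s)
    # Inter-PR: per point, keep the modality whose fast/slow pair agrees more.
    c2d, c3d = consistency(p2f, p2s), consistency(p3f, p3s)
    prob = torch.where((c3d > c2d).unsqueeze(1), p3d, p2d)
    pseudo = prob.argmax(dim=1)
    # Discard points where even the better modality is unreliable.
    pseudo[torch.maximum(c2d, c3d) < thresh] = -1  # -1 = ignore index
    return pseudo

Consistent with the description above, the resulting pseudo labels would then supervise a standard cross-entropy self-training loss on both branches (with ignore_index=-1), and intra_pg_update would be called after every batch so the slow models drift gradually toward the target statistics.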

Abstract

Test-time adaptation approaches have recently emerged as a practical solution for handling domain shift without access to the source domain data. In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. We find that directly applying existing methods usually results in performance instability at test time, because the multi-modal input is not considered jointly. To design a framework that takes full advantage of multi-modality, where each modality provides regularized self-supervisory signals to the other, we propose two complementary modules within and across the modalities. First, Intra-modal Pseudo-label Generation (Intra-PG) obtains reliable pseudo labels within each modality by aggregating information from two models that are both pre-trained on source data but updated with target data at different paces. Second, Inter-modal Pseudo-label Refinement (Inter-PR) adaptively selects the more reliable pseudo labels across modalities based on a proposed consistency scheme. Experiments demonstrate that our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios for 3D semantic segmentation.

Short Summary of MM-TTA Process


Main Results

3D semantic segmentation results (mIoU) for different methods (single-modality TTA, Intra-PG (w/ naive fusion), and Intra-PG + Inter-PR) on the A2D2 → SemanticKITTI benchmark. The different bars for each method show the 2D, 3D, and ensemble predictions.
Quantitative comparisons of our MM-TTA approach with UDA methods and TTA baselines for multi-modal 3D semantic segmentation in three settings (A2D2 → SemanticKITTI, Synthia → SemanticKITTI, and nuScenes Day → Night).
Example results of our MM-TTA during test-time adaptation, showing gradual improvement. While TENT shows little improvement during adaptation, our method effectively suppresses noise and achieves results visually similar to the ground truth, especially within the dotted white boxes.