Self-Training is a semi-supervised learning technique where a model is trained on a small amount of labeled data and then iteratively expands its training set by adding confidently predicted, yet unlabeled, examples. The model is retrained on the augmented dataset, and the process is repeated.

Self-training is particularly useful when labeled data is scarce. It leverages the model’s own predictions to generate more training data, gradually improving performance. However, it requires careful handling to avoid overfitting to potentially incorrect labels.

Posts

Taming Self-Training for Open-Vocabulary Object Detection

Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD.

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage recent advances in vision-language and large language models to design an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. This process operates iteratively, allowing for continuous self-improvement of the model. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method’s superior performance at a reduced cost.

Improving Pseudo Labels for Open-Vocabulary Object Detection

Recent studies show promising performance in open-vocabulary object detection (OVD) using pseudo labels (PLs) from pretrained vision and language models (VLMs). However, PLs generated by VLMs are extremely noisy due to the gap between the pretraining objective of VLMs and OVD, which blocks further advances on PLs. In this paper, we aim to reduce the noise in PLs and propose a method called online Self-training And a Split-and-fusion head for OVD (SAS-Det). First, the self-training finetunes VLMs to generate high quality PLs while prevents forgetting the knowledge learned in the pretraining. Second, a split-and-fusion (SAF) head is designed to remove the noise in localization of PLs, which is usually ignored in existing methods. It also fuses complementary knowledge learned from both precise ground truth and noisy pseudo labels to boost the performance. Extensive experiments demonstrate SAS-Det is both efficient and effective. Our pseudo labeling is 3 times faster than prior methods. SAS-Det outperforms prior state-of-the-art models of the same scale by a clear margin and achieves 37.4 AP50 and 27.3 APr on novel categories of the COCO and LVIS benchmarks, respectively.

Unsupervised Anomaly Detection with Self-Training and Knowledge Distillation

Anomaly Detection (AD) aims to find defective patterns or abnormal samples among data, and has been a hot research topic due to various real-world applications. While various AD methods have been proposed, most of them assume the availability of a clean (anomaly-free) training set, which, however, may be hard to guarantee in many real-world industry applications. This motivates us to investigate Unsupervised Anomaly Detection (UAD) in which the training set includes both normal and abnormal samples. In this paper, we address the UAD problem by proposing a Self-Training and Knowledge Distillation (STKD) model. STKD combats anomalies in the training set by iteratively alternating between excluding samples of high anomaly probabilities and training the model with the purified training set. Despite that the model is trained with a cleaner training set, the inevitably existing anomalies may still cause negative impact. STKD alleviates this by regularizing the model to respond similarly to a teacher model which has not been trained with noisy data. Experiments show that STKD consistently produces more robust performance with different levels of anomalies.

Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation

Recent multilingual pre-trained language models have achieved remarkable zero-shot performance, where the model is only finetuned on one source language and directly evaluated on target languages. In this work, we propose a self-learning framework that further utilizes unlabeled data of target languages, combined with uncertainty estimation in the process to select high-quality silver labels. Three different uncertainties are adapted and analyzed specifically for the cross lingual transfer: Language Heteroscedastic/Homoscedastic Uncertainty (LEU/LOU), Evidential Uncertainty (EVI). We evaluate our framework with uncertainties on two cross-lingual tasks including Named Entity Recognition (NER) and Natural Language Inference (NLI) covering 40 languages in total, which outperforms the baselines significantly by 10 F1 for NER on average and 2.5 accuracy for NLI.