Towards Realizing the Value of Labeled Target Samples: A Two-Stage Approach for Semi-Supervised Domain Adaptation
Semi-Supervised Domain Adaptation (SSDA) is a recently emerging research topic that extends the widely investigated Unsupervised Domain Adaptation (UDA) setting by additionally labeling a few target samples; that is, the model is trained with labeled source samples, unlabeled target samples, and a few labeled target samples. Compared with UDA, the key to SSDA lies in how to make the most effective use of the few labeled target samples. Existing SSDA approaches simply merge the few precious labeled target samples into the vast pool of labeled source samples, or further align them, which dilutes the value of the labeled target samples and still yields a biased model. To remedy this, we propose to decouple SSDA into a UDA problem and a semi-supervised learning problem: we first learn a UDA model using labeled source and unlabeled target samples, and then adapt the learned UDA model in a semi-supervised way using labeled and unlabeled target samples. By using the labeled source samples and the target samples separately, the bias problem can be largely mitigated. We further propose a consistency-learning-based mean teacher model to effectively adapt the learned UDA model using labeled and unlabeled target samples. Experiments show that our approach outperforms existing methods.
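The second stage described above can be sketched as a standard mean-teacher objective: a supervised cross-entropy term on the few labeled target samples plus a consistency term that pulls the student's predictions on unlabeled target samples toward those of an exponential-moving-average (EMA) teacher. This is a minimal sketch under stated assumptions, not the authors' implementation: the MSE-on-softmax consistency, the default decay of 0.99, and the names `stage2_loss` and `ema_update` are hypothetical choices, and models are reduced to flat parameter lists and logit vectors so the example runs with the Python standard library alone.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher: Sequence[float], student: Sequence[float], decay: float = 0.99) -> List[float]:
    """Mean-teacher rule: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised loss on one labeled target sample."""
    return -math.log(softmax(logits)[label])


def consistency_loss(student_logits: Sequence[float], teacher_logits: Sequence[float]) -> float:
    """Mean squared error between student and teacher class probabilities."""
    p_s, p_t = softmax(student_logits), softmax(teacher_logits)
    return sum((a - b) ** 2 for a, b in zip(p_s, p_t)) / len(p_s)


def stage2_loss(labeled, unlabeled, weight: float = 1.0) -> float:
    """Hypothetical stage-2 objective.

    labeled:   list of (student_logits, label) for labeled target samples
    unlabeled: list of (student_logits, teacher_logits) for unlabeled target
               samples; the two views are assumed to use different augmentations
    """
    sup = sum(cross_entropy(z, y) for z, y in labeled) / max(len(labeled), 1)
    cons = sum(consistency_loss(s, t) for s, t in unlabeled) / max(len(unlabeled), 1)
    return sup + weight * cons
```

In a mean-teacher setup the teacher receives no gradient updates; after each student optimization step its parameters would be moved toward the student's with `ema_update`, and in this sketch both would start from the stage-1 UDA model.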
Domain Adaptation is a subfield of machine learning and transfer learning that focuses on adapting a model trained on one domain (the source domain) to perform well on a different, but related, domain (the target domain). In other words, domain adaptation aims to make a machine learning model more robust and effective when applied to data that comes from a distribution or context that is different from what it was originally trained on.
Adversarial Alignment for Source Free Object Detection
Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without access to source data. While most existing SFOD methods use a source-pretrained model to generate pseudo labels that guide training, these pseudo labels are usually very noisy because of the large domain discrepancy. To obtain better pseudo supervision, we divide the target domain into source-similar and source-dissimilar parts and align them in the feature space through adversarial learning. Specifically, we design a detection variance-based criterion to divide the target domain. This criterion is motivated by the finding that larger detection variances indicate higher recall and greater similarity to the source domain. We then incorporate an adversarial module into a mean teacher framework to make the feature spaces of these two subsets indistinguishable. Extensive experiments on multiple cross-domain object detection datasets demonstrate that our proposed method consistently outperforms the compared SFOD methods. Our implementation is available at https://github.com/ChuQiaosong
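One way to picture the detection variance-based split is to run the source-pretrained detector several times on each target image under random perturbations, measure how much the confidence of each detection varies, and rank images by that variance. The sketch below assumes detections are already matched across passes and uses the population variance of confidence scores; the perturbation scheme, the scalar statistic, the 50% split ratio, and the helper names are all hypothetical. Following the finding stated above, images with larger variance are treated as source-similar.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def detection_variance(scores_per_pass: List[List[float]]) -> float:
    """Mean variance of detection confidences across K perturbed passes.

    scores_per_pass[k][j] is the confidence of detection j in pass k; detections
    are assumed to be matched across passes (hypothetical input layout).
    """
    variances = [pvariance(list(col)) for col in zip(*scores_per_pass)]
    return mean(variances) if variances else 0.0


def split_target(images: Dict[str, List[List[float]]], ratio: float = 0.5) -> Tuple[List[str], List[str]]:
    """Rank images by detection variance; the top `ratio` fraction is treated as source-similar."""
    ranked = sorted(images, key=lambda name: detection_variance(images[name]), reverse=True)
    cut = int(round(len(ranked) * ratio))
    return ranked[:cut], ranked[cut:]
```

The two returned subsets would then feed the adversarial module, whose discriminator tries to tell them apart while the detector's features are trained to make them indistinguishable.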
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation
Test-time adaptation approaches have recently emerged as a practical solution for handling domain shift without access to the source domain data. In this paper, we propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. We find that directly applying existing methods usually results in performance instability at test time because the multi-modal input is not considered jointly. To design a framework that takes full advantage of multi-modality, where each modality provides regularized self-supervisory signals to the other modalities, we propose two complementary modules within and across the modalities. First, Intra-modal Pseudo-label Generation (Intra-PG) is introduced to obtain reliable pseudo labels within each modality by aggregating information from two models that are both pre-trained on source data but updated with target data at different paces. Second, Inter-modal Pseudo-label Refinement (Inter-PR) adaptively selects more reliable pseudo labels from different modalities based on a proposed consistency scheme. Experiments demonstrate that our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios for 3D semantic segmentation. Visit our project website at https://www.nec-labs.com/mas/MM-TTA
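The two modules can be illustrated per point with toy class-probability vectors. In this hypothetical sketch, `intra_pg` averages the predictions of a fast-updated and a slow-updated model of one modality and scores their agreement (one minus the total-variation distance), and `inter_pr` keeps the pseudo label from whichever modality is more self-consistent, discarding it when neither reaches a threshold. The averaging rule, the agreement measure, the 0.8 threshold, and the assumed camera/point-cloud modality pair are illustrative choices, not the consistency scheme defined in the paper.

```python
from typing import Optional, Sequence, Tuple

Probs = Sequence[float]


def intra_pg(p_fast: Probs, p_slow: Probs) -> Tuple[int, float]:
    """Fuse a fast- and a slow-updated model of one modality (hypothetical rule).

    Returns the arg-max of the averaged probabilities and an agreement score
    equal to 1 minus the total-variation distance between the two predictions.
    """
    fused = [(a + b) / 2 for a, b in zip(p_fast, p_slow)]
    label = max(range(len(fused)), key=fused.__getitem__)
    agreement = 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_fast, p_slow))
    return label, agreement


def inter_pr(rgb: Tuple[Probs, Probs], pts: Tuple[Probs, Probs], threshold: float = 0.8) -> Optional[int]:
    """Keep the pseudo label of the more self-consistent modality, or None if neither is reliable.

    rgb: (fast, slow) predictions of the image branch (assumed modality)
    pts: (fast, slow) predictions of the point-cloud branch (assumed modality)
    """
    label_2d, c_2d = intra_pg(*rgb)
    label_3d, c_3d = intra_pg(*pts)
    label, c = (label_2d, c_2d) if c_2d >= c_3d else (label_3d, c_3d)
    return label if c >= threshold else None
```

Points whose pseudo label is `None` would simply be excluded from the self-training loss in this sketch.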
Learning Cross-Modal Contrastive Features for Video Domain Adaptation
Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
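The cross-modal part of the objective can be sketched as an InfoNCE loss in which the RGB and optical-flow features of the same clip form a positive pair and the flow features of other clips in the batch act as negatives. This is a minimal sketch under assumptions: the cosine similarity, the temperature of 0.1, and the function names are illustrative, and the cross-domain term and the paper's sampling strategies are not shown.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_modal_infonce(rgb: List[Sequence[float]], flow: List[Sequence[float]], tau: float = 0.1) -> float:
    """InfoNCE with rgb[i] and flow[i] as the positive pair and flow[j], j != i, as negatives."""
    n = len(rgb)
    total = 0.0
    for i in range(n):
        logits = [cosine(rgb[i], flow[j]) / tau for j in range(n)]
        m = max(logits)
        # log-sum-exp over all candidates, computed stably
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / n
```

The loss is small when each clip's RGB feature is closest to its own flow feature and large when it is closer to another clip's, which is the cross-modal connection the framework is designed to encourage.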
Domain Adaptive Semantic Segmentation using Weak Labels
We propose a novel framework for domain adaptation in semantic segmentation with image-level weak labels in the target domain. The weak labels may be obtained from a model's predictions for unsupervised domain adaptation (UDA), or from a human oracle in a new weakly-supervised domain adaptation (WDA) paradigm for semantic segmentation. Using weak labels is both practical and useful, since (i) collecting image-level target annotations is comparatively cheap in WDA and incurs no cost in UDA, and (ii) it opens the opportunity for category-wise domain alignment. Our framework uses weak labels to enable the interplay between feature alignment and pseudo-labeling, improving both in the process of domain adaptation. Specifically, we develop a weak-label classification module that forces the network to attend to certain categories, and then use these training signals to guide the proposed category-wise alignment method. In experiments, we show considerable improvements over the existing state-of-the-art methods in UDA and present a new benchmark in the WDA setting.
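A weak-label classification module of this kind can be sketched by pooling per-pixel class logits into one score per class and training it with a multi-label binary cross-entropy against the image-level labels. In the UDA case, the weak labels could be derived from the model's own segmentation by marking a class present when it covers enough predicted pixels. Global max pooling, the 5% coverage threshold, and all function names below are assumptions for illustration rather than the paper's exact design.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def image_level_scores(pixel_logits: List[Sequence[float]]) -> List[float]:
    """Global max pooling: pixel_logits[p][c] -> one logit per class c."""
    return [max(col) for col in zip(*pixel_logits)]


def weak_label_bce(pixel_logits: List[Sequence[float]], weak_labels: Sequence[int], eps: float = 1e-12) -> float:
    """Multi-label BCE between pooled class scores and image-level weak labels (0/1 per class)."""
    loss = 0.0
    for z, y in zip(image_level_scores(pixel_logits), weak_labels):
        p = sigmoid(z)
        loss -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return loss / len(weak_labels)


def pseudo_weak_labels(pixel_preds: List[int], num_classes: int, min_fraction: float = 0.05) -> List[int]:
    """UDA case: mark a class present if it covers at least `min_fraction` of predicted pixels."""
    n = len(pixel_preds)
    return [1 if pixel_preds.count(c) / n >= min_fraction else 0 for c in range(num_classes)]
```

The resulting per-class presence vector is also what would make category-wise alignment possible: only classes marked present in a target image would take part in aligning that category.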
Shuffle and Attend: Video Domain Adaptation
We address the problem of domain adaptation in videos for the task of human action recognition. Inspired by image-based domain adaptation, we can perform video adaptation by aligning the features of frames or clips of source and target videos. However, aligning all clips equally is sub-optimal, as not all clips are informative for the task. As the first novelty, we propose an attention mechanism that focuses on more discriminative clips and directly optimizes for video-level (cf. clip-level) alignment. As the backgrounds are often very different between source and target, a model corrupted by source backgrounds adapts poorly to target domain videos. To alleviate this, as a second novelty, we propose to use clip order prediction as an auxiliary task. The clip order prediction loss, when combined with the domain adversarial loss, encourages learning of representations that focus on the humans and objects involved in the actions, rather than on the uninformative and widely differing (between source and target) backgrounds. We empirically show that both components contribute positively towards adaptation performance. We report state-of-the-art performance on two out of three challenging public benchmarks: two based on the UCF and HMDB datasets, and one from Kinetics to NEC-Drone. We also support the intuitions and the results with qualitative analysis.
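Both components can be sketched in a few lines: attention pooling turns per-clip features into a single video-level feature using softmax-normalized clip scores, and the clip order prediction task shuffles a video's clips with a random permutation whose index becomes the classification target. The sketch assumes the clip scores come from some learned scoring network (not shown) and enumerates all permutations, which is only practical for a handful of clips; the function names are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple


def attention_pool(clip_feats: List[Sequence[float]], clip_scores: Sequence[float]) -> List[float]:
    """Video-level feature as a softmax-weighted sum of clip features."""
    m = max(clip_scores)
    weights = [math.exp(s - m) for s in clip_scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(clip_feats[0])
    return [sum(weights[i] * clip_feats[i][d] for i in range(len(clip_feats))) for d in range(dim)]


def make_order_task(clips: Sequence, rng: random.Random) -> Tuple[list, int]:
    """Shuffle clips with a random permutation; the class label is that permutation's index."""
    perms = list(itertools.permutations(range(len(clips))))
    label = rng.randrange(len(perms))
    shuffled = [clips[i] for i in perms[label]]
    return shuffled, label
```

A classifier trained to recover `label` from the shuffled clips must reason about temporal structure of the action itself, which is the intuition for why this auxiliary task pushes features away from static backgrounds.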
Active Adversarial Domain Adaptation
We propose an active learning approach for transferring representations across domains. Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains. The former uses a domain discriminative model to align domains, while the latter utilizes the model to weigh samples to account for distribution shifts. Specifically, our importance weight promotes unlabeled samples with large uncertainty in classification and diversity compared to labeled examples, thus serving as a sample selection scheme for active learning. We show that these two views can be unified in one framework for domain adaptation and transfer learning when the source domain has many labeled examples while the target domain does not. AADA provides significant improvements over fine-tuning based approaches and other sampling methods when the two domains are closely related. Results on challenging domain adaptation tasks such as object detection demonstrate that the advantage over baseline approaches is retained even after hundreds of examples have been actively annotated.
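The selection rule described above can be sketched as a score that multiplies a diversity term derived from the domain discriminator by the entropy of the classifier's prediction. Here `d_source` is the discriminator's probability that a sample belongs to the labeled (source) pool, so the ratio (1 - D)/D grows for samples unlike the labeled data. This specific form and the helper names are an assumed instantiation for illustration, not a quotation of the AADA implementation.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a class-probability vector: the uncertainty term."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def aada_score(d_source: float, class_probs: Sequence[float], eps: float = 1e-8) -> float:
    """Assumed selection score: (1 - D) / D times predictive entropy.

    d_source is the discriminator's probability that the sample comes from the
    labeled (source) pool; the ratio is large for samples unlike that pool.
    """
    diversity = (1.0 - d_source) / max(d_source, eps)
    return diversity * entropy(class_probs)


def select_for_labeling(d_scores: Sequence[float], probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Indices of the `budget` unlabeled target samples with the highest scores."""
    order = sorted(range(len(d_scores)), key=lambda i: aada_score(d_scores[i], probs[i]), reverse=True)
    return order[:budget]
```

Because the two factors multiply, a sample is ranked highly only if it is both unfamiliar to the discriminator and uncertain for the classifier; a confidently classified sample scores zero regardless of how target-like it is.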
Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones
We address the problem of human action classification in drone videos. Due to the high cost of capturing and labeling large-scale drone videos with diverse actions, we present unsupervised and semi-supervised domain adaptation approaches that leverage both existing fully annotated action recognition datasets and unannotated (or only a few annotated) videos from drones. To study the emerging problem of drone-based action recognition, we create a new dataset, NEC-DRONE, containing 5,250 videos to evaluate the task. We tackle both problem settings with 1) the same and 2) different action label sets for the source (e.g., the Kinetics dataset) and target domains (drone videos). We present a combination of video and instance-based adaptation methods, paired with either a classifier or an embedding-based framework to transfer the knowledge from source to target. Our results show that the proposed adaptation approach substantially improves the performance on these challenging and practical tasks. We further demonstrate the applicability of our method for learning cross-view action recognition on the Charades-Ego dataset. We provide qualitative analysis to understand the behaviors of our approaches.
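For the setting where source and target use different action label sets, an embedding-based framework can classify drone videos without a source classifier head, for example by comparing a video's embedding with class prototypes averaged from the few labeled target videos. The nearest-prototype rule, the Euclidean distance, the example class names, and the function names below are illustrative assumptions; the embeddings themselves would come from a network trained on the source data, which is not shown.

```python
import math
from typing import Dict, List, Sequence


def prototypes(embeddings: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Class prototype = mean embedding of the labeled target videos of that class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(query: Sequence[float], protos: Dict[str, List[float]]) -> str:
    """Assign the class whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))
```

Since new target classes only require new prototypes, this kind of rule extends to label sets that never appeared in the source dataset.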
4 Independence Way, Suite 200
Princeton, NJ 08540
San Jose Office
2033 Gateway Place, Suite 200
San Jose, CA 95110
NEC Laboratories America, Inc. (NEC Labs) is the US-based center for NEC Corporation’s global network of corporate research laboratories. Our diverse research groups collaborate with industry, academia and governments to provide disruptive solutions to complex problems. A leader in the integration of IT and network technologies with more than 100 years of expertise, NEC provides a combination of products and solutions that cross-utilize the company’s experience and global resources to meet the complex and ever-changing needs of its customers.
Read Our Blog Posts
- Meet the NEC Labs America Intern Helping to Make Autonomous Vehicles Safer and More Secure
- AI/Fiber-Optic Combo Poised To Improve Telecommunications
- Industrial Labs to Drive Disruptive Innovation for the Fourth Industrial Revolution
- A New Hope: AI Research is Conquering Today’s Computer Vision Plateau
- NEC Labs America’s Time Series Data Research Drives Space Systems Innovation
- Next-Generation Computing Finally Sees Light
- Using AI To Safely Put The First Woman On The Moon
- Our AI Research Contributing to NASA’s Artemis Space Program
- NEC provides AI-based traffic monitoring system with fiber-optic sensing technology for NEXCO CENTRAL