Improving Cross-Domain Detection with Self-Supervised Learning Cross-Domain Detection (XDD) aims to train a domain-adaptive object detector using unlabeled images from a target domain and labeled images from a source domain. Existing approaches achieve this either by aligning the feature maps or the region proposals from the two domains, or by transferring the style of source images to that of target images. In this paper, rather than proposing another method following the existing lines, we introduce a new framework complementary to existing methods. Our framework unifies some popular Self-Supervised Learning (SSL) techniques (e.g., rotation angle prediction, strong/weak data augmentation, mean teacher modeling) and adapts them to the XDD task. Our basic idea is to leverage the unsupervised nature of these SSL techniques and apply them simultaneously across domains (source and target) and models (student and teacher). These SSL techniques can thus serve as shared bridges that facilitate knowledge transfer between domains. More importantly, as these techniques are independently applied in each domain, they are complementary to existing domain alignment techniques that relies on interactions between domains (e.g., adversarial alignment). We perform extensive analyses on these SSL techniques and show that they significantly improve the performance of existing methods. In addition, we reach comparable or even better performance than the state-of-the-art methods when integrating our framework with an old well-established method.
Yun Fu works at Northeastern University.
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! Finetuning a large vision language model (VLM) on a target dataset after large scale pretraining is a dominant paradigm in visual question answering (VQA). Datasets for specialized tasks such as knowledge-based VQA or VQA in non natural-image domains are orders of magnitude smaller than those for general-purpose VQA. While collecting additional labels for specialized tasks or domains can be challenging, unlabeled images are often available. We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets. SelTDA uses the VLM and target dataset to build a teacher model that can generate question-answer pseudolabels directly conditioned on an image alone, allowing us to pseudolabel unlabeled images. SelTDA then finetunes the initial VLM on the original dataset augmented with freshly pseudolabeled images. We describe a series of experiments showing that our self-taught data augmentation increases robustness to adversarially searched questions, counterfactual examples, and rephrasings, it improves domain generalization, and results in greater retention of numerical reasoning skills. The proposed strategy requires no additional annotations or architectural modifications, and is compatible with any modern encoder-decoder multimodal transformer. Code available at https://github.com/codezakh/SelTDA
Inductive and Unsupervised Representation Learning on Graph Structured Objects Inductive and unsupervised graph learning is a critical technique for predictive or information retrieval tasks where label information is difficult to obtain. It is also challenging to make graph learning inductive and unsupervised at the same time, as learning processes guided by reconstruction error based loss functions inevitably demand graph similarity evaluation that is usually computationally intractable. In this paper, we propose a general framework SEED (Sampling, Encoding, and Embedding Distributions) for inductive and unsupervised representation learning on graph structured objects. Instead of directly dealing with the computational challenges raised by graph similarity evaluation, given an input graph, the SEED framework samples a number of subgraphs whose reconstruction errors could be efficiently evaluated, encodes the subgraph samples into a collection of subgraph vectors, and employs the embedding of the subgraph vector distribution as the output vector representation for the input graph. By theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism. Using public benchmark datasets, our empirical study suggests the proposed SEED framework is able to achieve up to 10% improvement, compared with competitive baseline methods.
On Novel Object Recognition: A Unified Framework for Discriminability and Adaptability The rich and accessible labeled data fueled the revolutionary successes of deep learning in object recognition. However, recognizing objects of novel classes with limited supervision information provided, i.e., Novel Object Recognition (NOR), remains a challenging task. We identify in this paper two key factors for the success of NOR that previous approaches fail to simultaneously guarantee. The first is producing discriminative feature representations for images of novel classes, and the second is generating a flexible classifier readily adapted to novel classes provided with limited supervision signals. To secure both key factors, we propose a framework which decouples a deep classification model into a feature extraction module and a classification module. We learn the former to ensure feature discriminability with a standard multi-class classification task by fully utilizing the competing information among all classes within a training set, and learn the latter to secure adaptability by training a meta-learner network which generates classifier weights whenever provided with minimal supervision information of target classes. Extensive experiments on common benchmark datasets in the settings of both zero-shot and few-shot learning demonstrate our method achieves state-of-the-art performance.
Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective Zero-shot learning (ZSL) aims to recognize instances of unseen classes solely based on the semantic descriptions of the classes. Existing algorithms usually formulate it as a semantic-visual correspondence problem, by learning mappings from one feature space to the other. Despite being reasonable, previous approaches essentially discard the highly precious discriminative power of visual features in an implicit way, and thus produce undesirable results. We instead reformulate ZSL as a conditioned visual classification problem, i.e., classifying visual features based on the classifiers learned from the semantic descriptions. With this reformulation, we develop algorithms targeting various ZSL settings: For the conventional setting, we propose to train a deep neural network that directly generates visual feature classifiers from the semantic attributes with an episode-based training scheme; For the generalized setting, we concatenate the learned highly discriminative classifiers for seen classes and the generated classifiers for unseen classes to classify visual features of all classes; For the transductive setting, we exploit unlabeled data to effectively calibrate the classifier generator using a novel learning-without-forgetting self-training mechanism and guide the process by a robust generalized cross-entropy loss. Extensive experiments show that our proposed algorithms significantly outperform state-of-the-art methods by large margins on most benchmark datasets in all the ZSL settings.
4 Independence Way, Suite 200
Princeton, NJ 08540
San Jose Office
2033 Gateway Place, Suite 200
San Jose, CA 95110
NEC Laboratories America, Inc. (NEC Labs) is the US-based center for NEC Corporation’s global network of corporate research laboratories. Our diverse research groups collaborate with industry, academia and governments to provide disruptive solutions to complex problems. A leader in the integration of IT and network technologies with more than 100 years of expertise, NEC provides a combination of products and solutions that cross-utilize the company’s experience and global resources to meet the complex and ever-changing needs of its customers.
Read Our Blog Posts
- Apply for a Summer 2024 Internship
- Unearthing Nature’s Orchestra – How Fiber Optic Cables Can Hear Cicada Secrets
- NEC Labs America Team Heading to NeurIPS23 in New Orleans
- Sarper Ozharar Receives Award from Koç University
- Meet the NEC Labs America Intern Helping to Make Autonomous Vehicles Safer and More Secure
- AI/Fiber-Optic Combo Poised To Improve Telecommunications
- Industrial Labs to Drive Disruptive Innovation for the Fourth Industrial Revolution
- A New Hope: AI Research is Conquering Today’s Computer Vision Plateau
- NEC Labs America’s Time Series Data Research Drives Space Systems Innovation
- Next-Generation Computing Finally Sees Light
- AI/Fiber-Optic Combo Poised To Improve Telecommunications
- Using AI To Safely Put The First Woman On The Moon
- Our AI Research Contributing to NASA’s Artemis Space Program
- NEC provides AI-based traffic monitoring system with fiber-optic sensing technology for NEXCO CENTRAL
- Beyond Communication: Telecom Fiber Networks for Rain Detection and Classification