Posts

Controllable Dynamic Multi Task Architectures (arXiv)

Read Controllable Dynamic Multi Task Architectures (arXiv). Multi task learning commonly encounters competition for resources among tasks, specifically when model capacity is limited. This challenge motivates models which allow control over the relative importance of tasks and total compute cost during inference time. In this work, we propose such a controllable multi task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. In contrast to the existing dynamic multi task approaches that adjust only the weights within a fixed architecture, our approach affords the flexibility to dynamically control the total computational cost and match the user preferred task importance better. We propose a disentangled training of two hypernetworks, by exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree structured models with adapted weights. Experiments on three multi task benchmarks, namely PASCAL Context, NYU v2, and CIFAR 100, show the efficacy of our approach. Project page is available at https://www.nec labs.com/˜mas/DYMU.

On Generalizing Beyond Domains in Cross Domain Continual Learning (arXiv)

Read On Generalizing Beyond Domains in Cross Domain Continual Learning (arXiv). Humans have the ability to accumulate knowledge of new tasks in varying conditions, but deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task. Many recent methods focus on preventing catastrophic forgetting under the assumption of train and test data following similar distributions. In this work, we consider a more realistic scenario of continual learning under domain shifts where the model must generalize its inference to an unseen domain. To this end, we encourage learning semantically meaningful features by equipping the classifier with class similarity metrics as learning parameters which are obtained through Mahalanobis similarity computations. Learning of the backbone representation along with these extra parameters is done seamlessly in an end to end manner. In addition, we propose an approach based on the exponential moving average of the parameters for better knowledge distillation. We demonstrate that, to a great extent, existing continual learning algorithms fail to handle the forgetting issue under multiple distributions, while our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.

Learning Cross-Modal Contrastive Features for Video Domain Adaptation

Learning Cross-Modal Contrastive Features for Video Domain Adaptation Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.

Learning Cross modal Contrastive Features for Video Domain Adaptation (arXiv)

Read Learning Cross modal Contrastive Features for Video Domain Adaptation (arXiv). Learning transferable and domain adaptive feature representations from videos is important for video relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross modal inputs under the cross domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross modal and cross domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC Kitchens, and demonstrate the effectiveness of our components against state of the art algorithms.

Cross-Domain Similarity Learning for Face Recognition in Unseen Domains

Cross-Domain Similarity Learning for Face Recognition in Unseen Domains Face recognition models trained under the assumption of identical training and test distributions often suffer from poor generalization when faced with unknown variations, such as a novel ethnicity or unpredictable individual make-ups during test time. In this paper, we introduce a novel cross-domain metric learning loss, which we dub Cross-Domain Triplet (CDT) loss, to improve face recognition in unseen domains. The CDT loss encourages learning semantically meaningful features by enforcing compact feature clusters of identities from one domain, where the compactness is measured by underlying similarity metrics that belong to another training domain with different statistics. Intuitively, it discriminatively correlates explicit metrics derived from one domain, with triplet samples from another domain in a unified loss function to be minimized within a network, which leads to better alignment of the training domains. The network parameters are further enforced to learn generalized features under domain shift, in a model-agnostic learning pipeline. Unlike the recent work of Meta Face Recognition [18], our method does not require careful hard-pair sample mining and filtering strategy during training. Extensive experiments on various face recognition benchmarks show the superiority of our method in handling variations, compared to baseline and the state-of-the-art methods.

Cross Domain Similarity Learning for Face Recognition in Unseen Domains (arXiv)

Read Cross Domain Similarity Learning for Face Recognition in Unseen Domains (arXiv). Face recognition models trained under the assumption of identical training and test distributions often suffer from poor generalization when faced with unknown variations, such as a novel ethnicity or unpredictable individual make ups during test time. In this paper, we introduce a novel cross domain metric learning loss, which we dub Cross Domain Triplet (CDT) loss, to improve face recognition in unseen domains. The CDT loss encourages learning semantically meaningful features by enforcing compact feature clusters of identities from one domain, where the compactness is measured by underlying similarity metrics that belong to another training domain with different statistics. Intuitively, it discriminatively correlates explicit metrics derived from one domain, with triplet samples from another domain in a unified loss function to be minimized within a network, which leads to better alignment of the training domains. The network parameters are further enforced to learn generalized features under domain shift, in a model agnostic learning pipeline. Unlike the recent work of Meta Face Recognition, our method does not require careful hard pair sample mining and filtering strategy during training. Extensive experiments on various face recognition benchmarks show the superiority of our method in handling variations, compared to baseline and the state of the art methods.

Uncertainty Aware Physically Guided Proxy Tasks for Unseen Domain Face Anti-Spoofing

Uncertainty Aware Physically Guided Proxy Tasks for Unseen Domain Face Anti-Spoofing. Face anti-spoofing (FAS) seeks to discriminate genuine faces from fake ones arising from any type of spoofing attack. Due to the wide variety of attacks, it is implausible to obtain training data that spans all attack types. We propose to leverage physical cues to attain better generalization on unseen domains. As a specific demonstration, we use physically guided proxy cues such as depth, reflection, and material to complement our main anti-spoofing (a.k.a liveness detection) task, with the intuition that genuine faces across domains have consistent face like geometry, minimal reflection, and skin material. We introduce a novel uncertainty-aware attention scheme that independently learns to weigh the relative contributions of the main and proxy tasks, preventing the over confident issue with traditional attention modules. Further, we propose attribute-assisted hard negative mining to disentangle liveness irrelevant features with liveness features during learning. We evaluate extensively on public benchmarks with intra-dataset and inter-dataset protocols. Our method achieves superior performance especially in unseen domain generalization for FAS.

Voting Based Approaches For Differentially Private Federated Learning

Voting Based Approaches For Differentially Private Federated Learning Differentially Private Federated Learning (DPFL) is an emerging field with many applications. Gradient averaging based DPFL methods require costly communication rounds and hardly work with large capacity models, due to the explicit dimension dependence in its added noise. In this work, inspired by knowledge transfer non federated privacy learning from Papernot et al.(2017, 2018), we design two new DPFL schemes, by voting among the data labels returned from each local model, instead of averaging the gradients, which avoids the dimension dependence and significantly reduces the communication cost. Theoretically, by applying secure multi party computation, we could exponentially amplify the (data dependent) privacy guarantees when the margin of the voting scores are large. Extensive experiments show that our approaches significantly improve the privacy utility trade off over the state of the arts in DPFL.

Improving Face Recognition by Clustering Unlabeled Faces in the Wild

Improving Face Recognition by Clustering Unlabeled Faces in the Wild While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation. Prior work has mostly been in controlled settings, where the labeled and unlabeled data sets have no overlapping identities by construction. This is not realistic in large-scale face recognition, where one must contend with such overlaps, the frequency of which increases with the volume of data. Ignoring identity overlap leads to significant labeling noise, as data from the same identity is split into multiple clusters. To address this, we propose a novel identity separation method based on extreme value theory. It is formulated as an out-of-distribution detection algorithm, and greatly reduces the problems caused by overlapping-identity label noise. Considering cluster assignments as pseudo-labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty. Extensive experiments on both controlled and real settings demonstrate our method’s consistent improvements over supervised baselines, e.g., 11.6% improvement on IJB-A verification.

Improving Face Recognition by Clustering Unlabeled Faces in the Wild (arXiv)

Read Improving Face Recognition by Clustering Unlabeled Faces in the Wild (arXiv). While deep face recognition has benefited significantly from large scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation. Prior work has mostly been in controlled settings, where the labeled and unlabeled data sets have no overlapping identities by construction. This is not realistic in large scale face recognition, where one must contend with such overlaps, the frequency of which increases with the volume of data. Ignoring identity overlap leads to significant labeling noise, as data from the same identity is split into multiple clusters. To address this, we propose a novel identity separation method based on extreme value theory. It is formulated as an out of distribution detection algorithm, and greatly reduces the problems caused by overlapping identity label noise. Considering cluster assignments as pseudo labels, we must also overcome the labeling noise from clustering errors. We propose a modulation of the cosine loss, where the modulation weights correspond to an estimate of clustering uncertainty. Extensive experiments on both controlled and real settings demonstrate our method’s consistent improvements over supervised baselines, e.g., 11.6% improvement on IJB A verification.