Entries by NEC Labs America

Learning to Learn across Diverse Data Biases in Deep Face Recognition

Convolutional Neural Networks have achieved remarkable success in face recognition, in part due to the abundant availability of data. However, the data used for training CNNs is often imbalanced. Prior works largely focus either on the long-tailed distribution of data volume per identity or on a single bias variation. In this paper, we show that many bias variations, such as ethnicity, head pose, occlusion, and blur, can jointly affect accuracy significantly. We propose a sample-level weighting approach, termed Multi-variation Cosine Margin (MvCoM), that simultaneously considers the multiple variation factors and orthogonally enhances face recognition losses to incorporate the importance of training samples. Further, we leverage a learning-to-learn approach, guided by a held-out meta-learning set, and use additive modeling to predict the MvCoM. Extensive experiments on challenging face recognition benchmarks demonstrate the advantages of our method in jointly handling imbalances due to multiple variations.
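
To make the idea concrete, here is a minimal PyTorch sketch of a cosine loss with a per-sample, multi-factor margin. The per-factor features, the small margin heads, and their additive combination are illustrative assumptions, not the paper's implementation; in the paper the margin predictor is learned via meta-learning on the held-out set.

```python
import torch
import torch.nn.functional as F

def mvcom_loss(cos_theta, labels, variation_feats, margin_heads, scale=64.0):
    """Sketch of a multi-variation cosine-margin loss (names are illustrative).

    cos_theta:       (B, C) cosine similarities between embeddings and class weights
    labels:          (B,)   ground-truth identity indices
    variation_feats: list of (B, d_k) per-factor features (e.g., pose, blur, occlusion)
    margin_heads:    list of small nn.Modules, one per factor, each mapping d_k -> 1
    """
    # Additive model: the per-sample margin is a sum of per-factor contributions.
    m = sum(head(f).squeeze(-1) for head, f in zip(margin_heads, variation_feats))
    m = F.softplus(m)  # keep margins non-negative

    # Subtract the sample-specific margin from the target-class cosine only.
    target_cos = cos_theta.gather(1, labels.unsqueeze(1)).squeeze(1)
    logits = cos_theta.clone()
    logits.scatter_(1, labels.unsqueeze(1), (target_cos - m).unsqueeze(1))
    return F.cross_entropy(scale * logits, labels)
```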

Controllable Dynamic Multi-Task Architectures

Multi-task learning commonly encounters competition for resources among tasks, especially when model capacity is limited. This challenge motivates models that allow control over the relative importance of tasks and the total compute cost at inference time. In this work, we propose such a controllable multi-task network that dynamically adjusts its architecture and weights to match the desired task preference as well as the resource constraints. In contrast to existing dynamic multi-task approaches that adjust only the weights within a fixed architecture, our approach affords the flexibility to dynamically control the total computational cost and better match the user-preferred task importance. We propose a disentangled training of two hypernetworks, exploiting task affinity and a novel branching regularized loss, to take input preferences and accordingly predict tree-structured models with adapted weights. Experiments on three multi-task benchmarks, namely PASCAL-Context, NYU-v2, and CIFAR-100, show the efficacy of our approach. The project page is available at https://www.nec-labs.com/~mas/DYMU.
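
As a rough illustration of the controllable-architecture idea, the following sketch maps a user preference vector (task weights plus a compute budget) to branching decisions for a tree-structured backbone. All names, sizes, and the single-MLP hypernetwork are assumptions; the paper trains two disentangled hypernetworks that also predict adapted weights.

```python
import torch
import torch.nn as nn

class BranchingHyperNet(nn.Module):
    """Illustrative sketch: map a preference vector to branching logits for
    each candidate split point of a tree-structured multi-task backbone."""

    def __init__(self, num_tasks, num_layers, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_tasks + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_layers * num_tasks),
        )
        self.num_layers, self.num_tasks = num_layers, num_tasks

    def forward(self, task_prefs, budget):
        # task_prefs: (num_tasks,) relative task importance; budget: scalar in [0, 1]
        x = torch.cat([task_prefs, budget.view(1)])
        logits = self.mlp(x).view(self.num_layers, self.num_tasks)
        # Per layer, decide which branch each task follows; tasks that share a
        # branch share compute, so a lower budget should push tasks together.
        return torch.softmax(logits, dim=-1)

prefs = torch.tensor([0.7, 0.2, 0.1])  # favor task 0
branch_probs = BranchingHyperNet(num_tasks=3, num_layers=4)(prefs, torch.tensor(0.5))
```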

Distributed Fiber Optic Sensors Placement for Infrastructure-as-a-Sensor

Distributed fiber optic sensing (DFOS) techniques have advanced rapidly in recent years, and various types of DFOS sensors have emerged that can monitor physical parameters such as temperature, strain, and vibration. With these DFOS sensors deployed, telecom networks can offer additional services beyond communications, such as monitoring road traffic conditions, utility pole health, and city noise and accidents, thus evolving toward a new paradigm of Infrastructure-as-a-Sensor (IaaSr) or Network-as-a-Sensor (NaaSr). When telecom network carriers upgrade their infrastructures with DFOS sensors to provide such IaaSr/NaaSr services, two critical challenges arise: (1) where to place the DFOS sensors, and (2) how to provision the DFOS sensing fiber routes to cover the whole network infrastructure with the minimum number of DFOS sensors. We name this the DFOS placement problem. In this paper, we prove that the DFOS placement problem is NP-hard, and we analyze an upper bound on the number of DFOS sensors used. To facilitate the optimal solution, we formulate the DFOS placement problem as an Integer Linear Programming (ILP) model that minimizes the number of DFOS sensors used. Furthermore, we propose a cost-efficient heuristic, called Explore-and-Pick (EnP), which achieves close-to-optimal performance in a fast manner. We analyze the approximation ratio and the computational complexity of the proposed EnP algorithm. In addition, we conduct comprehensive simulations to evaluate the performance of the proposed solutions. Simulation results show that the EnP algorithm outperforms the baseline algorithm by 16% on average and 26% at best, and achieves performance close to the optimal result obtained by the ILP.
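
The coverage core of such an ILP can be sketched as a set-cover-style model, for instance with PuLP. The candidate placements and their coverage sets below are toy assumptions; the paper's full formulation also provisions the sensing fiber routes.

```python
# Minimal set-cover-style ILP sketch with PuLP. Inputs are assumed:
# `links` to be sensed, and `covers[s]` = links a candidate placement s can reach.
import pulp

links = ["e1", "e2", "e3"]
covers = {"s1": {"e1", "e2"}, "s2": {"e2", "e3"}, "s3": {"e3"}}

prob = pulp.LpProblem("dfos_placement", pulp.LpMinimize)
x = {s: pulp.LpVariable(f"x_{s}", cat="Binary") for s in covers}

prob += pulp.lpSum(x.values())  # objective: minimize the sensor count
for e in links:                 # every link must be sensed by some chosen sensor
    prob += pulp.lpSum(x[s] for s in covers if e in covers[s]) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = [s for s in covers if x[s].value() == 1]
```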

SEED: Sound Event Early Detection via Evidential Uncertainty

Sound Event Early Detection (SEED) is an essential task in recognizing acoustic environments and soundscapes. However, most existing methods focus on offline sound event detection, suffer from over-confidence at the early stage of an event, and usually yield unreliable results. To solve this problem, we propose a novel Polyphonic Evidential Neural Network (PENet) that models the evidential uncertainty of the class probability with a Beta distribution. Specifically, we use a Beta distribution to model the distribution of class probabilities, and the evidential uncertainty enriches the uncertainty representation with evidence information, which plays a central role in reliable prediction. To further improve detection performance, we design a backtrack inference method that utilizes both the forward and backward audio features of an ongoing event. Experiments on the DESED database show that the proposed method simultaneously improves time delay by 13.0% and detection F1 score by 3.8% compared to state-of-the-art methods.
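
A minimal sketch of the evidential output, assuming the network emits two non-negative evidence values per event class that parameterize a per-class Beta distribution. The uncertainty formula 2/(alpha+beta) follows standard subjective-logic conventions, and the threshold values are illustrative rather than the paper's.

```python
import torch
import torch.nn.functional as F

def beta_evidence_head(logits):
    """Each class gets its own Beta(alpha, beta) over event-presence probability."""
    evidence = F.softplus(logits)        # non-negative evidence, shape (B, C, 2)
    alpha = evidence[..., 0] + 1
    beta = evidence[..., 1] + 1
    prob = alpha / (alpha + beta)        # expected presence probability
    uncertainty = 2.0 / (alpha + beta)   # low total evidence => high uncertainty
    return prob, uncertainty

logits = torch.randn(4, 10, 2)           # batch of 4 frames, 10 event classes
prob, unc = beta_evidence_head(logits)
# Only fire an early detection when evidence suffices, not on raw probability alone.
confident = (unc < 0.2) & (prob > 0.5)
```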

Fast Few-shot Debugging for NLU Test Suites

We study few-shot debugging of transformer-based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem. Given a few debugging examples of a certain phenomenon and a held-out test set of the same phenomenon, we aim to maximize accuracy on the phenomenon at minimal cost to accuracy on the original test set. We examine several methods that are faster than full-epoch retraining, and we introduce a new fast method that samples a few in-danger examples from the original training set. Compared to fast methods using parameter-distance constraints or Kullback-Leibler divergence, we achieve superior original accuracy for comparable debugging accuracy.
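
One plausible reading of the in-danger sampling step, sketched below: pick original training examples on which the current model's confidence in the true label is lowest, and mix them with the debugging examples during a brief fine-tune. The selection criterion here is an assumption for illustration, not necessarily the paper's exact rule.

```python
import torch

def select_in_danger(model, train_batch, labels, k=32):
    """Pick original training examples the model is least sure about, i.e.,
    those most at risk of being broken by the debugging update."""
    with torch.no_grad():
        probs = torch.softmax(model(train_batch), dim=-1)
        margin = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # true-class prob
    return torch.topk(-margin, k).indices  # k lowest-confidence examples

# Then fine-tune briefly on the debug examples mixed with the in-danger sample,
# instead of retraining a full epoch or adding KL / parameter-distance penalties.
```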

Codebook Design for Hybrid Beamforming in 5G Systems

Massive MIMO and hybrid beamforming are among the key physical-layer technologies for next-generation wireless systems. In the last stage of hybrid beamforming, the goal is to generate sharp beams with maximal and preferably uniform gain. We highlight the shortcomings of uniform linear arrays (ULAs) in generating such perfect beams, i.e., beams with maximal uniform gain and sharp edges, and propose a solution based on a novel antenna configuration, namely, the twin-ULA (TULA). Building on TULA, we propose two antenna configurations: Delta and Star. We pose the problem of finding the beamforming coefficients as a continuous optimization problem, for which we derive an analytical closed-form solution via a quantization/aggregation method. Thanks to this closed-form solution, the beamforming coefficients can be obtained with low complexity. Through numerical analysis, we illustrate the effectiveness of the proposed antenna structure and beamforming algorithm in reaching close-to-perfect beams.
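
For context, the baseline shortcoming can be reproduced numerically with the standard ULA array factor. The sketch below (plain NumPy, with an illustrative element count and spacing) shows the pattern a uniformly weighted ULA produces; it does not reproduce the TULA designs themselves.

```python
import numpy as np

def array_factor(weights, d, angles_deg, wavelength=1.0):
    """Beam pattern of a linear array: weights w_n on elements spaced d apart
    (d in wavelengths when wavelength=1)."""
    n = np.arange(len(weights))
    theta = np.deg2rad(angles_deg)
    # Steering matrix: phase of element n toward each candidate angle.
    steering = np.exp(1j * 2 * np.pi * d / wavelength * np.outer(np.sin(theta), n))
    return np.abs(steering @ weights)

angles = np.linspace(-90, 90, 721)
uniform = np.ones(16) / 16  # plain 16-element ULA, uniform weighting
pattern = array_factor(uniform, d=0.5, angles_deg=angles)
# The result is a sinc-like main lobe with ripple and sidelobes, not the
# flat-topped, sharp-edged beam the TULA configurations target.
```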

Superclass-Conditional Gaussian Mixture Model for Coarse-To-Fine Few-Shot Learning

Learning fine-grained embeddings is essential for extending the generalizability of models pre-trained on “coarse” labels (e.g., animals). It is crucial in fields where fine-grained labeling (e.g., breeds of animals) is expensive but fine-grained prediction is desirable, such as medicine. This dilemma necessitates adapting a “coarsely” pre-trained model to new tasks with a few “finer-grained” training labels. However, coarsely supervised pre-training tends to suppress intra-class variation, which is vital for cross-granularity adaptation. In this paper, we develop a training framework built on a novel superclass-conditional Gaussian mixture model (SCGM). SCGM imitates the generative process of samples from hierarchies of classes through latent-variable modeling of the fine-grained subclasses. The framework is agnostic to the encoder and adds only a few distribution-related parameters, and is thus efficient and flexible across domains. The model parameters are learned end-to-end by maximum-likelihood estimation via a principled Expectation-Maximization algorithm. Extensive experiments on benchmark datasets and a real-life medical dataset indicate the effectiveness of our method.
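
A minimal sketch of the E-step such a model implies, assuming isotropic Gaussian components grouped under superclasses. Variable names and the shared-variance simplification are assumptions for illustration.

```python
import torch

def e_step(z, means, log_pi, superclass, comp2super):
    """Posterior over latent fine-grained components, restricted to the
    components under each sample's known superclass.

    z:           (B, D) embeddings from the encoder
    means:       (K, D) component means (shared isotropic variance assumed)
    log_pi:      (K,)   log mixing weights
    superclass:  (B,)   coarse labels
    comp2super:  (K,)   superclass each component belongs to
    """
    log_lik = -0.5 * torch.cdist(z, means).pow(2)  # Gaussian log-lik up to a constant
    logits = log_lik + log_pi
    # Condition on the superclass: mask out components from other superclasses.
    mask = comp2super.unsqueeze(0) == superclass.unsqueeze(1)  # (B, K)
    logits = logits.masked_fill(~mask, float("-inf"))
    return torch.softmax(logits, dim=-1)  # responsibilities

# M-step: re-estimate means and mixing weights from the responsibilities, and
# backpropagate the resulting likelihood into the encoder, end-to-end.
```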

Provable Adaptation Across Multiway Domains via Representation Learning

This paper studies zero-shot domain adaptation where each domain is indexed on a multi-dimensional array and we only have data from a small subset of domains. Our goal is to produce predictors that perform well on unseen domains. We propose a model consisting of a domain-invariant latent representation layer and a domain-specific linear prediction layer with a low-rank tensor structure. Theoretically, we present explicit sample-complexity bounds that characterize the prediction error on unseen domains in terms of the number of domains with training data and the amount of data per domain. To our knowledge, this is the first finite-sample guarantee for zero-shot domain adaptation. In addition, we provide experiments on two-way MNIST and four-way fiber sensing datasets to demonstrate the effectiveness of our proposed model.
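
To illustrate the low-rank tensor structure, here is a sketch of a two-way domain-specific linear head built from rank-R CP factors, so a domain index combination (i, j) never seen in training still yields a predictor. The CP parameterization and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiwayLinear(nn.Module):
    """Two-way domain-specific head: the predictor for domain (i, j) is
    assembled from per-axis factors A[i], B[j] and a shared basis C."""

    def __init__(self, n_i, n_j, dim, rank=4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(n_i, rank) / rank)
        self.B = nn.Parameter(torch.randn(n_j, rank) / rank)
        self.C = nn.Parameter(torch.randn(rank, dim) / dim**0.5)

    def forward(self, h, i, j):
        # h: (B, dim) domain-invariant representation from the shared encoder
        coeff = self.A[i] * self.B[j]  # (B, rank) per-sample CP coefficients
        w = coeff @ self.C             # (B, dim) domain-specific weight vector
        return (w * h).sum(-1)         # one scalar prediction per sample

head = MultiwayLinear(n_i=10, n_j=10, dim=64)
h = torch.randn(8, 64)
y = head(h, i=torch.randint(10, (8,)), j=torch.randint(10, (8,)))
```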