Yoshiyuki Yajima works at NEC Corporation.

Posts

Sound Event Classification meets Data Assimilation with Distributed Fiber-Optic Sensing

Distributed Fiber-Optic Sensing (DFOS) is a promising technique for large-scale acoustic monitoring. However, wide variation in installation environments and sensor characteristics causes spatial heterogeneity, which makes it difficult to collect representative training data and degrades the generalization ability of learning-based models, such as fine-tuning methods, when training data are limited. To address this, we formulate Sound Event Classification (SEC) as data assimilation in an embedding space. Instead of training models, we infer sound event classes by combining pretrained audio embeddings with simulated DFOS signals. These simulated signals are generated by applying various frequency responses and noise patterns to microphone data, which allows for diverse prior modeling of DFOS conditions. Our method achieves robust out-of-domain (OOD) classification without requiring any model training, improving accuracy by 6.42, 14.11, and 3.47 percentage points over a conventional zero-shot method and two types of fine-tuned models, respectively. By employing the simulator within the data-assimilation framework, the proposed method also enables precise estimation of physical parameters from observed DFOS signals.
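The simulation step described above can be sketched in a few lines: apply a frequency response to microphone audio in the spectral domain, then add noise at a chosen signal-to-noise ratio. The low-pass response shape, cutoff, and SNR below are illustrative assumptions for a minimal sketch, not the exact simulator from the paper.

```python
import numpy as np

def simulate_dfos_signal(mic_audio, sample_rate, cutoff_hz=500.0, snr_db=30.0, rng=None):
    """Turn microphone audio into a DFOS-like signal by applying a
    hypothetical frequency response (a soft low-pass roll-off here)
    and additive Gaussian noise at the requested SNR."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(mic_audio)
    freqs = np.fft.rfftfreq(len(mic_audio), d=1.0 / sample_rate)
    # Soft low-pass: smoothly attenuate content above cutoff_hz.
    response = 1.0 / (1.0 + (freqs / cutoff_hz) ** 2)
    filtered = np.fft.irfft(spectrum * response, n=len(mic_audio))
    # Scale Gaussian noise so that filtered-signal power / noise power = SNR.
    signal_power = np.mean(filtered ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=len(mic_audio))
    return filtered + noise
```

Sampling many such responses and noise patterns yields a diverse prior over DFOS conditions, against which pretrained audio embeddings can be compared.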

Trainingless Adaptation of Pretrained Models for Environmental Sound Classification

Deep neural network (DNN)-based models for environmental sound classification are not robust to domains outside their training data, that is, out-of-distribution or unseen data. To apply pretrained models to unseen domains, adaptation methods such as fine-tuning and transfer learning are typically used, which require rich computing resources, e.g., graphics processing units (GPUs). However, it is becoming harder for those with limited computing resources to keep up with research trends because state-of-the-art models are increasingly resource-intensive. In this paper, we propose a trainingless adaptation method for pretrained environmental sound classification models. We first propose an operation that recovers time-frequency-ish (TF-ish) structures in the intermediate layers of DNN models. We then propose a trainingless frequency-filtering method for domain adaptation, which, unlike widely used approaches, requires no gradient-based optimization. Experiments conducted on the ESC-50 dataset show that the proposed adaptation method improves classification accuracy by 20.40 percentage points compared with the conventional method.
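The core idea of frequency filtering without gradients can be illustrated with a minimal sketch: once a time-frequency-ish representation is available, a per-frequency gain can be computed directly from unlabeled source- and target-domain statistics and applied as a simple multiplication. The profile-matching filter design below is an illustrative assumption, not the paper's exact method.

```python
import numpy as np

def trainingless_frequency_filter(spec, source_profile, target_profile, eps=1e-8):
    """Adapt a (freq, time) magnitude representation from source to target
    domain by per-bin gain matching: no gradient-based optimization, just
    a closed-form ratio of average spectral profiles."""
    gain = target_profile / (source_profile + eps)
    return spec * gain[:, None]

def spectral_profile(spec):
    """Mean magnitude per frequency bin, estimated from unlabeled data."""
    return spec.mean(axis=1)
```

Because the filter is a closed-form ratio, adaptation costs one element-wise multiply per input and runs comfortably on a CPU, which is the point of a trainingless method.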