Text-guided Device-realistic Sound Generation for Fiber-based Sound Event Classification
Recent advancements in unique acoustic sensing devices and large-scale audio recognition models have unlocked new possibilities for environmental sound monitoring and detection. However, applying pretrained models to non-conventional acoustic sensors results in performance degradation due to domain shifts, caused by differences in frequency response and noise characteristics from the original training data. In this study, we introduce a text-guided framework for generating new datasets to retrain models specifically for these non-conventional sensors efficiently. Our approach integrates text-conditional audio generative models with two additional steps: (1) selecting audio samples based on text input to match the desired sounds, and (2) applying domain transfer techniques using recorded impulse responses and background noise to simulate the characteristics of the sensors. We demonstrate this process by generating emulated signals for fiber-optic Distributed Acoustic Sensors (DAS), creating datasets similar to the recorded ESC-50 dataset. The generated signals are then used to train a classifier, which outperforms few-shot learning approaches in environmental sound classification.