At the Speed of Sound: Efficient Audio Scene Classification
Publication Date: 6/11/2020
Event: The Annual ACM International Conference on Multimedia Retrieval (ICMR 2020)
Reference: pp. 301-305, 2020
Authors: Bo Dong (NEC Laboratories America, Inc.; University of Texas at Dallas), Cristian Lumezanu (NEC Laboratories America, Inc.), Yuncong Chen (NEC Laboratories America, Inc.), Dongjin Song (NEC Laboratories America, Inc.), Takehiko Mizoguchi (NEC Laboratories America, Inc.), Haifeng Chen (NEC Laboratories America, Inc.), Latifur Khan (University of Texas at Dallas)
Abstract: Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, or autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. Using experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second. This preserves 93% of the detection accuracy obtained after hearing the entire scene.
Publication Link: https://dl.acm.org/doi/10.1145/3372278.3390730
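
Illustrative sketch: The abstract describes computing embeddings for short audio segments with a recurrent network plus attention, then classifying a scene by retrieval. The Python below is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the plain tanh recurrence, the single-vector attention scoring, the cosine-similarity nearest-neighbor vote, and the untrained weights a caller would supply. The paper's custom audio loss and its training procedure are not reproduced here.

```python
import numpy as np

def rnn_hidden_states(frames, W_in, W_rec):
    """Run a plain tanh RNN over frames of shape (T, d_in); return states (T, d_h)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h)
        states.append(h)
    return np.stack(states)

def attention_pool(states, v):
    """Softmax attention over time steps; return (pooled embedding, weights)."""
    scores = states @ v
    scores = scores - scores.max()          # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ states, weights

def embed_segment(frames, params):
    """Map one short audio segment (T, d_in) to a unit-length embedding (d_h,)."""
    states = rnn_hidden_states(frames, params["W_in"], params["W_rec"])
    emb, _ = attention_pool(states, params["v"])
    return emb / (np.linalg.norm(emb) + 1e-12)

def classify_by_retrieval(query_emb, db_embs, db_labels, k=3):
    """Majority vote over the k database embeddings most similar to the query."""
    sims = db_embs @ query_emb              # cosine similarity for unit vectors
    top = np.argsort(-sims)[:k]
    labels, counts = np.unique(np.asarray(db_labels)[top], return_counts=True)
    return labels[np.argmax(counts)]
```

In this sketch, `params` is a hypothetical dictionary holding `W_in` of shape (d_h, d_in), `W_rec` of shape (d_h, d_h), and `v` of shape (d_h,). A caller would embed labeled reference segments once with `embed_segment`, stack them into `db_embs`, and then pass the embedding of a new sub-second segment to `classify_by_retrieval`.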