Explainable Acoustic Scene Classification: Making Decisions Audible

H. Nazim Bicer, Philipp Götz, Cagdas Tuna, and Emanuël A. P. Habets

Submitted to IWAENC-2022

Abstract

This study presents a sonification method that provides "audible explanations" to improve the transparency of the decision-making processes of convolutional neural networks designed for acoustic scene classification (ASC). First, a deep neural network (DNN) based on the ResNet architecture is proposed. Second, Grad-CAM and guided backpropagation images are computed for a given input signal. These images are then used to produce frequency-selective filters that retain the signal components in the input that contribute to the decision of the trained DNN. The test results demonstrate that the proposed model outperforms two baseline models. The reconstructed audio waveform is interpretable by the human ear, serving as a valuable tool to examine and possibly improve ASC models.
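As a rough illustration of the pipeline summarized in the abstract, the following NumPy sketch computes a Grad-CAM map from feature maps and their gradients, rescales it to a [0, 1] mask, and applies that mask to a complex spectrogram. It is not the authors' implementation: the function names, the min-max normalization, and the assumption that the saliency map has already been resized to the spectrogram's shape are all illustrative choices.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map from feature maps and class-score gradients.

    Both inputs are assumed to have shape (channels, freq, time).
    """
    # One weight per channel: the spatial average of its gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over channels -> (freq, time).
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only regions that support the chosen class.
    return np.maximum(cam, 0.0)


def saliency_to_mask(saliency, eps=1e-12):
    """Min-max scale a saliency map to [0, 1] (an assumed normalization)."""
    shifted = saliency - saliency.min()
    return shifted / (shifted.max() + eps)


def apply_mask(spectrogram, mask):
    """Attenuate time-frequency bins of a complex spectrogram.

    The mask is assumed to already match the spectrogram's shape.
    """
    if spectrogram.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    return spectrogram * mask
```

In a full sonification chain, the masked spectrogram would then be passed through an inverse short-time Fourier transform to obtain a waveform that can be played back. That step is omitted here.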

Sonification Examples

Below are five listening examples from different acoustic scenes. For each scene, the input signal and the two sonification outputs, based on Grad-CAM and guided backpropagation, are presented.

Acoustic scene: Beach

42_original_beach


Acoustic scene: Forest path

56_original_forestpath


Acoustic scene: Car

250_original_car

Acoustic scene: Library

260_original_library


Acoustic scene: Bus

311_original_bus