A Data-Driven Approach to Audio Decorrelation

Carlotta Anemüller, Oliver Thiergart, and Emanuël A. P. Habets

Published in IEEE Signal Processing Letters

Abstract

The degree of correlation between two audio signals entering the ears is known to have a significant impact on the spatial perception of a sound image. Audio signal decorrelation is therefore a widely used tool in various applications within the field of spatial audio processing. This paper explores for the first time the use of a data-driven approach for audio decorrelation. We propose a convolutional neural network architecture that is trained with the help of a state-of-the-art reference decorrelator. The proposed approach is evaluated on music and applause signals, both through objective evaluations and through a listening test. It can serve as a proof of concept for addressing, in future work, common limitations of existing decorrelation techniques. These limitations include temporal smearing, coloration artifacts, and the production of only a limited number of mutually uncorrelated output signals.
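To make the notion of "degree of correlation" concrete, the following minimal sketch computes a zero-lag normalized correlation coefficient between two signals. It is an illustration only and not the evaluation procedure of the paper: the function name `correlation_coefficient`, the white-noise test signal, and the toy delay-based "decorrelator" are all assumptions made for this example. The delay stands in for a real decorrelator and is unrelated to both the reference method and the proposed network.

```python
import math
import random


def correlation_coefficient(x, y):
    """Zero-lag normalized correlation coefficient of two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # Return 0.0 for an all-zero signal to avoid division by zero.
    return num / den if den > 0.0 else 0.0


# Hypothetical test signal: white noise stands in for real audio.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Toy "decorrelator" (assumption, illustration only): a pure 50-sample delay.
delay = 50
y = [0.0] * delay + x[:-delay]

rho_same = correlation_coefficient(x, x)     # identical signals, close to 1
rho_delayed = correlation_coefficient(x, y)  # noise vs. delayed copy, close to 0

print(f"identical signals: {rho_same:.3f}")
print(f"delayed copy:      {rho_delayed:.3f}")
```

A value near 1 indicates fully correlated signals, while a value near 0 indicates largely uncorrelated signals. A plain delay only lowers this measure for noise-like input; for tonal or transient material it would not behave like a practical decorrelator.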

Audio Examples

The following examples are a subset of the items included in the listening test. All source audio files are taken from the evaluation subset of the FSD50K dataset [1].

  • Original: unprocessed single-channel input signal (not used in the listening test)
  • Reference method: output of the state-of-the-art reference decorrelator [2], with the help of which the proposed method was trained
  • Proposed method: output of the proposed method

Guitar¹

Piano²

Dense applause³

Sparse applause⁴

References

[1] E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra, “FSD50K: an open dataset of human-labeled sound events,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 829–852, 2022.

[2] ISO 23090-4, “MPEG-I Immersive Audio,” WD0, 2022.


¹ Item 121007 of the FSD50K evaluation subset, uploader: Thirsk, license: https://creativecommons.org/licenses/by/3.0/.

² Item 319585 of the FSD50K evaluation subset, uploader: visual, license: https://creativecommons.org/licenses/by/3.0/.

³ Item 1923 of the FSD50K evaluation subset, uploader: RHumphries, license: https://creativecommons.org/licenses/by/3.0/.

⁴ Item 395414 of the FSD50K evaluation subset, uploader: debsound, license: http://creativecommons.org/licenses/by-nc/3.0/.