Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators

Sebastian Braun, Adam Kuklasinski, Ofer Schwartz, Oliver Thiergart, Emanuel A. P. Habets, Sharon Gannot, Simon Doclo and Jesper Jensen

IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018.

Abstract

Reduction of late reverberation can be achieved using spatio-spectral filters such as the multichannel Wiener filter (MWF). To compute this filter, an estimate of the late reverberation power spectral density (PSD) is required. In recent years, a multitude of late reverberation PSD estimators have been proposed. In this contribution, these estimators are categorized into several classes, their relations and differences are discussed, and a comprehensive experimental comparison is provided. To compare their performance, simulations in controlled as well as practical scenarios are conducted. It is shown that a common weakness of spatial coherence-based estimators is their performance in high direct-to-diffuse ratio (DDR) conditions. To mitigate this problem, a correction method is proposed and evaluated. It is shown that the proposed correction method can decrease the speech distortion without significantly affecting the reverberation reduction.

Audio Examples

Acoustic setup:

  • Uniform circular array of 6 omnidirectional microphones with radius 4.5 cm
  • Measured room impulse responses in a large conference room with T60 = 800 ms
  • Additive pink noise at an SNR of 15 dB
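
For illustration, the array geometry above can be written down in a few lines, together with the spatial coherence matrix that coherence-based estimators typically assume for late reverberation. This is a minimal sketch, not the code used for the audio examples: the function names, the speed of sound of 343 m/s, and the choice of a spherically isotropic (sinc) diffuse field model are assumptions made here.

```python
import numpy as np

def circular_array_positions(num_mics=6, radius=0.045):
    """2-D coordinates (in metres) of a uniform circular array."""
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

def diffuse_coherence(positions, freq_hz, speed_of_sound=343.0):
    """Coherence matrix of a spherically isotropic diffuse field (assumed model).

    Entry (i, j) is sin(2*pi*f*d_ij/c) / (2*pi*f*d_ij/c) for microphone
    distance d_ij. np.sinc(x) computes sin(pi*x)/(pi*x), hence the argument
    2*f*d/c below.
    """
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return np.sinc(2.0 * freq_hz * dists / speed_of_sound)

positions = circular_array_positions()          # 6 microphones, radius 4.5 cm
gamma_1khz = diffuse_coherence(positions, 1000.0)
```

With six microphones on a circle, adjacent microphones are exactly one radius (4.5 cm) apart, so the coherence between neighbours stays high at low frequencies and decays as frequency increases.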

Description:

  • A multichannel Wiener filter (MWF) is used to extract the direct sound while suppressing late reverberation and noise.
  • The source position and noise PSD matrix are known.
  • The PSD of the late reverberation is estimated using various estimators.
  • Switch between the various processed and unprocessed audio files to compare how the MWF sounds using the different PSD estimators.
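
The processing described above can be sketched for a single time-frequency bin as follows. The sketch assumes the common signal model in which the microphone PSD matrix is the sum of a rank-one direct sound term, a late reverberation term (a scalar PSD times a coherence matrix), and a noise PSD matrix, and it uses the standard decomposition of the MWF into an MVDR beamformer followed by a single-channel Wiener postfilter. The function and argument names are hypothetical, and the exact filter used to generate the audio examples may differ in detail.

```python
import numpy as np

def mwf_weights(steering, phi_s, phi_r, gamma, noise_psd):
    """MWF weights for one time-frequency bin (illustrative sketch).

    steering  : (M,) relative direct-path transfer vector (known source position)
    phi_s     : direct sound PSD at the reference microphone
    phi_r     : late reverberation PSD (supplied by one of the estimators)
    gamma     : (M, M) spatial coherence matrix of the late reverberation
    noise_psd : (M, M) noise PSD matrix (assumed known)
    The filter output is np.vdot(w, y) for a microphone vector y.
    """
    # Interference = late reverberation + noise
    interference = phi_r * gamma + noise_psd
    inv_a = np.linalg.solve(interference, steering)
    denom = np.real(np.vdot(steering, inv_a))    # a^H Phi_u^{-1} a
    mvdr = inv_a / denom                         # distortionless towards the source
    phi_residual = 1.0 / denom                   # interference PSD at the MVDR output
    wiener_gain = phi_s / (phi_s + phi_residual) # single-channel postfilter
    return wiener_gain * mvdr

# Toy example: 6 microphones, identical direct-path response, uncorrelated interference
a = np.ones(6, dtype=complex)
w = mwf_weights(a, phi_s=1.0, phi_r=1.0, gamma=np.eye(6), noise_psd=np.eye(6))
```

In this formulation the late reverberation PSD enters both the beamformer and the postfilter gain, so an overestimate makes the postfilter more aggressive and an underestimate leaves more residual reverberation. This is why the choice of PSD estimator is audible in the examples.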

Note: Please use Google Chrome if you experience playback problems.

Comparison between all diffuse PSD estimators (source distance 3.5 m):

Comparison between selected estimators without and with bias compensation (source distance 2.5 m):