AudioLabs - Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation

Martin Strauss, Matteo Torcoli and Bernd Edler

presented at SLT, 2022

Examples

Listening test items. The test data are taken from the dataset published by Valentini et al. [1]. The methods used for comparison are MetricGAN+ [2] and CDiffuSE [3].

ID	Gender	Noise	Test input SNR [dB]	Website
p232_003	male	Bus	7.5	Link
p232_219	male	Cafe	7.5	Link
p232_005	male	Bus	2.5	Link
p232_032	male	Cafe	2.5	Link
p257_186	female	Bus	7.5	Link
p257_049	female	Cafe	7.5	Link
p257_367	female	Bus	2.5	Link
p257_251	female	Cafe	2.5	Link

References

[1] C. Valentini-Botinhao, X. Wang, S. Takaki, and J. Yamagishi, "Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks," in Proceedings Interspeech Conference, 2016, pp. 352–356.

[2] S.-W. Fu, C. Yu, T.-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, and Y. Tsao, "MetricGAN+: An improved version of MetricGAN for speech enhancement," in Proceedings Interspeech Conference, 2021, pp. 201–205.

[3] Y.-J. Lu, Z.-Q. Wang, S. Watanabe, A. Richard, C. Yu, and Y. Tsao, "Conditional diffusion probabilistic model for speech enhancement," in Proceedings ICASSP, 2022, pp. 7402–7406.

International Audio Laboratories Erlangen
