Direction and Reverberation Preserving Noise Reduction of Ambisonics Signals

Adrian Herzog and Emanuël A. P. Habets

Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Ambisonics encodes the directional information of a sound field w.r.t. a reference position in an efficient and scalable way. However, the sound field might contain undesired noise. Reducing the noise while preserving the directional distribution of all sound-field components is a challenging task. Recently, the present authors proposed a direction-preserving noise reduction method for higher-order Ambisonics (HOA) signals which, in contrast to, e.g., binaural beamforming methods, yields an HOA signal at the output. In this work, we investigate the direction-preserving noise reduction method further and compare it against a beamforming-based method and the matrix multichannel Wiener filter. Different methods to estimate the power spectral densities required by the noise reduction methods are discussed. Moreover, a method to preserve the reverberation of the desired signal is proposed. In the evaluation, the discussed methods are compared for different speech sources, in anechoic and reverberant conditions, and for different noise types.


Binauralized HOA signals with one plane-wave source (speech) and noise at an SNR of 6 dB.
Third-order Ambisonics signals were used for the processing. The lower bound of the direction-preserving noise reduction was set to -20 dB. For the matched mixing, the noisy signal was partially mixed into the output such that the overall noise reduction matches that of the direction-preserving method.
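As a rough illustration of the -20 dB lower bound (not the paper's exact formulation), one can think of it as a floor on the per-bin suppression gain, so that no time-frequency bin is attenuated by more than 20 dB. The gain values below are hypothetical:

```python
import numpy as np

floor_db = -20.0
g_min = 10 ** (floor_db / 20)  # minimum linear gain, here 0.1

# Hypothetical per-bin noise-reduction gains (illustrative values only):
gains = np.array([0.02, 0.5, 0.08, 1.0])

# Apply the lower bound: suppression is clipped at 20 dB.
bounded = np.maximum(gains, g_min)
```

Under this reading, "matched mixing" would choose how much of the noisy signal to mix back in so that the resulting overall noise reduction equals that of the bounded, direction-preserving method.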

Files were generated using the SPARTA AmbiBIN VST plugin [1] with default HRIRs.
Playback with headphones is recommended.

Female speech and diffuse white noise.
Male speech and babble noise.
Female speech and directional noise in a simulated room with a 600 ms reverberation time.

Note: The matrix PMWF requires a 16x16 matrix inversion per time-frequency bin and is thus computationally much more expensive than the other noise reduction methods.
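The per-bin cost can be illustrated with a minimal multichannel Wiener filter sketch: for third-order Ambisonics there are (3+1)² = 16 channels, so each time-frequency bin requires solving a 16x16 linear system. The covariance matrices below are random placeholders, not estimates produced by the actual methods:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 16  # (N+1)^2 HOA channels for Ambisonics order N = 3

# Toy Hermitian covariance (PSD) matrices for one time-frequency bin
# (randomly generated here purely for illustration):
A = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Phi_s = A @ A.conj().T  # desired-signal covariance
B = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
Phi_v = B @ B.conj().T  # noise covariance
Phi_y = Phi_s + Phi_v   # noisy-signal covariance

# Multichannel Wiener filter matrix: W = Phi_y^{-1} Phi_s.
# This 16x16 solve per bin is what makes the matrix filter expensive.
W = np.linalg.solve(Phi_y, Phi_s)

# Apply the filter to a (random) noisy HOA coefficient vector.
y = rng.standard_normal(L) + 1j * rng.standard_normal(L)
s_hat = W.conj().T @ y  # filtered HOA output for this bin
```

In contrast, gain-based methods apply a scalar (or low-dimensional) gain per bin, avoiding the full matrix solve.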

Male speech and babble noise in a simulated room with a 600 ms reverberation time.

Note: The same spatial coherence matrix is assumed for the late reverberation and the babble noise. Hence, the reverberation model is required to distinguish between noise and reverberation.

Male speech and ambient noise in a lecture room with a 1.25 s reverberation time, using impulse responses and noise from the ACE corpus [2].


[1] L. McCormack and A. Politis, "SPARTA and COMPASS: Real-time implementations of linear and parametric spatial audio reproduction and processing methods," in Proc. AES Conf. on Immersive and Interactive Audio, York, UK, March 2019.
[2] J. Eaton, N. D. Gaubitch, A. H. Moore, and P. Naylor, "The ACE challenge - corpus description and performance evaluation," in Proc. IEEE Workshop on Appl. of Signal Process. to Audio and Acoust. (WASPAA), New Paltz, NY, USA, Oct. 2015.