Weilong Huang, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, and Emanuel A. P. Habets
Abstract
Spatial filtering with desired directivity patterns using compact microphone arrays is essential in many audio applications. Directivity patterns achievable using traditional beamformers depend on the number of microphones and the array aperture. Generally, their effectiveness degrades for compact arrays. To overcome these limitations, we propose a neural directional filtering (NDF) approach that leverages deep neural networks to enable sound capture with a predefined directivity pattern. The NDF computes a single-channel complex mask from the microphone array signals, which is then applied to a reference microphone to produce an output that approximates a virtual directional microphone with the desired directivity pattern. We introduce training strategies and propose data-dependent metrics to evaluate the directivity pattern and directivity factor. We show that the proposed method: i) achieves a frequency-invariant directivity pattern by leveraging low-frequency bands to avoid spatial aliasing, ii) learns diverse and higher-order patterns, iii) can be steered in different directions, and iv) generalizes to unseen conditions. Lastly, experimental comparisons demonstrate superior performance over conventional beamforming and parametric approaches.
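The masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `apply_complex_mask`, the STFT shape, and the constant placeholder mask are all assumptions. In the NDF the mask would be estimated by the network from the multichannel array signals.

```python
import numpy as np

def apply_complex_mask(ref_stft, mask):
    """Apply a single-channel complex mask to the reference-microphone STFT.

    Both arrays are assumed to have shape (frequency bins, time frames).
    """
    ref_stft = np.asarray(ref_stft, dtype=complex)
    mask = np.asarray(mask, dtype=complex)
    if ref_stft.shape != mask.shape:
        raise ValueError("mask and reference STFT must have the same shape")
    # Element-wise complex multiplication changes both magnitude and phase.
    return mask * ref_stft

# Toy data: a random reference STFT and a hypothetical constant mask
# (gain 0.5, phase shift pi/4) standing in for the network output.
rng = np.random.default_rng(0)
ref_stft = rng.standard_normal((257, 10)) + 1j * rng.standard_normal((257, 10))
mask = 0.5 * np.exp(1j * np.pi / 4) * np.ones((257, 10))
out = apply_complex_mask(ref_stft, mask)
```

Because the mask is complex, it can adjust the phase of the reference signal as well as its magnitude, which a real-valued mask cannot do.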
Application for Interference Suppression
Figure 1: Simulated two-source scenario with a static target and a moving interferer. The source at 0 degrees is static, while the moving source completes a full circle around the array in the clockwise direction.
In this experiment, the environment was simulated using an RIR generator; the scenario is shown in Figure 1. The source-array distance was 1.5 m. The simulated room measured 5 m x 4 m x 3.5 m and had an RT60 of 0.15 s. An NDF model trained on speech signals with a static 1st-order cardioid pattern pointing to 0 degrees was used for demonstration. The moving source completed one full rotation around the array in approximately 18 seconds at a constant speed. Below are the audio results for this simulation.
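To give a feel for the attenuation the target pattern imposes on the interferer, the sketch below evaluates an ideal 1st-order cardioid, g(theta) = 0.5 + 0.5 cos(theta - theta_s), along the interferer trajectory. It describes the target pattern only, not the measured NDF output. The starting angle of the interferer and the sign convention for "clockwise" (decreasing angle) are assumptions, and the function names are hypothetical.

```python
import numpy as np

ROTATION_PERIOD_S = 18.0  # approximate duration of one full rotation (from the text)

def interferer_angle_deg(t, start_deg=0.0, period_s=ROTATION_PERIOD_S):
    """Angle of the moving source at time t (seconds).

    Assumptions: the source starts at start_deg and 'clockwise' means the
    angle decreases over time. Constant speed is 360 / 18 = 20 degrees/s.
    """
    speed_deg_per_s = 360.0 / period_s
    return (start_deg - speed_deg_per_s * np.asarray(t, dtype=float)) % 360.0

def cardioid_gain(theta_deg, steer_deg=0.0):
    """Ideal 1st-order cardioid gain for a source at theta_deg."""
    return 0.5 + 0.5 * np.cos(np.deg2rad(np.asarray(theta_deg, dtype=float) - steer_deg))

# Ideal gain applied to the interferer over one rotation, sampled every 0.5 s.
times = np.arange(0.0, ROTATION_PERIOD_S, 0.5)
gains = cardioid_gain(interferer_angle_deg(times))
```

Under these assumptions the interferer passes through the cardioid null (180 degrees) about 9 seconds into the rotation, which is where the strongest suppression would be expected.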
Application for Stereo Audio Recording
Figure 2: Real recording scenario for stereo audio recording. One active speaker moves from 0 degrees to 180 degrees at a fixed distance of 1.4 m. For the steerable NDF, theta_L and theta_R denote its two steering directions. The directivity pattern learned by the NDF is a 1st-order cardioid pattern.
We enacted the acoustic scene in a real room measuring 4.6 m x 4.5 m x 2.6 m with RT60 = 0.23 s.
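As a rough illustration of how two steered cardioids yield a stereo image, the sketch below computes the ideal left and right gains and the resulting inter-channel level difference for a given source angle. The steering angles used here (45 and 135 degrees) are hypothetical placeholders, since the values of theta_L and theta_R are not stated above, and the ideal cardioid formula stands in for the learned pattern.

```python
import numpy as np

def cardioid_gain(theta_deg, steer_deg):
    """Ideal 1st-order cardioid gain: 0.5 + 0.5 * cos(theta - steer)."""
    return 0.5 + 0.5 * np.cos(np.deg2rad(np.asarray(theta_deg, dtype=float) - steer_deg))

def stereo_gains(source_deg, theta_l_deg, theta_r_deg):
    """Gains of the left and right virtual microphones for one source angle."""
    return cardioid_gain(source_deg, theta_l_deg), cardioid_gain(source_deg, theta_r_deg)

def level_difference_db(g_left, g_right, eps=1e-12):
    """Inter-channel level difference in dB (positive means left is louder)."""
    return 20.0 * np.log10((np.abs(g_left) + eps) / (np.abs(g_right) + eps))

# Hypothetical steering directions; not taken from the recording setup.
THETA_L_DEG, THETA_R_DEG = 45.0, 135.0

# Source moving from 0 to 180 degrees, as in the recording scenario.
source_angles = np.linspace(0.0, 180.0, 19)
g_l, g_r = stereo_gains(source_angles, THETA_L_DEG, THETA_R_DEG)
ild_db = level_difference_db(g_l, g_r)
```

With these placeholder angles, the level difference is zero when the speaker is midway between the two steering directions and changes sign as the speaker crosses that point.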
Application for Tracking Moving Sources
Initially, the two sources are at the same position. One of the sources then starts moving along the circle, and the steerable NDF tracks it. This scenario is simulated in an anechoic environment with a source-array distance of 1.5 m.