Neural Directional Filtering

Weilong Huang, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, and Emanuel A. P. Habets

Abstract

Spatial filtering with desired directivity patterns using compact microphone arrays is essential in many audio applications. Directivity patterns achievable using traditional beamformers depend on the number of microphones and the array aperture. Generally, their effectiveness degrades for compact arrays. To overcome these limitations, we propose a neural directional filtering (NDF) approach that leverages deep neural networks to enable sound capture with a predefined directivity pattern. The NDF computes a single-channel complex mask from the microphone array signals, which is then applied to a reference microphone to produce an output that approximates a virtual directional microphone with the desired directivity pattern. We introduce training strategies and propose data-dependent metrics to evaluate the directivity pattern and directivity factor. We show that the proposed method: i) achieves a frequency-invariant directivity pattern by leveraging low-frequency bands to avoid spatial aliasing, ii) learns diverse and higher-order patterns, iii) can be steered in different directions, and iv) generalizes to unseen conditions. Lastly, experimental comparisons demonstrate superior performance over conventional beamforming and parametric approaches.
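The masking step described in the abstract can be pictured with a minimal sketch. It assumes the mask and the reference-microphone signal are both given as time-frequency (STFT-like) coefficients of the same shape. The abstract does not state this representation, so treat it as an assumption. The function name `apply_mask` and the toy values are hypothetical, and the network that would estimate the mask is not shown.

```python
from typing import List

# Assumed layout: spectrogram[frame][frequency_bin], complex-valued.
Spectrogram = List[List[complex]]


def apply_mask(mask: Spectrogram, ref_stft: Spectrogram) -> Spectrogram:
    """Multiply a single-channel complex mask with the reference-microphone
    coefficients, element by element (hypothetical helper)."""
    if len(mask) != len(ref_stft):
        raise ValueError("mask and reference must have the same number of frames")
    out: Spectrogram = []
    for mask_frame, ref_frame in zip(mask, ref_stft):
        if len(mask_frame) != len(ref_frame):
            raise ValueError("mask and reference must have the same number of bins")
        out.append([m * x for m, x in zip(mask_frame, ref_frame)])
    return out


# Toy input with 2 frames and 2 bins; the values are made up for illustration.
ref = [[1 + 1j, 2 + 0j], [0 + 1j, -1 + 0j]]
mask = [[0.5 + 0j, 0 + 1j], [1 + 0j, 0 + 0j]]
out = apply_mask(mask, ref)
print(out)
```

Because the mask is complex, it can change both the magnitude and the phase of each reference coefficient.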

Application for Interference Suppression


Figure 1: Simulated two-source scenario with a static target and a moving interferer. The source at 0 degrees is static, while the moving source completes a full circle around the array in the clockwise direction.

In this experiment, the environment was simulated using an RIR generator; the scenario is shown in Figure 1. The source-array distance was 1.5 m. The simulated room measured 5 m x 4 m x 3.5 m and had an RT60 of 0.15 s. An NDF model trained on speech signals with a static 1st-order cardioid pattern pointing to 0 degrees was used for the demonstration. The moving source completed one full rotation around the array in approximately 18 seconds at a constant speed. The audio results for this simulation are given below.

The interference is a speech signal that was not included in the training of the NDF.

The interference is a music signal, which was unseen during the training of the NDF.
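For orientation, the following sketch computes the attenuation that an ideal 1st-order cardioid pointing to 0 degrees would apply to the moving interferer as it circles the array once in 18 seconds. It describes the target pattern only, not the measured behavior of the NDF. The function names are hypothetical, the starting angle of the interferer is not given on this page and is passed in as a parameter, and angles are assumed to increase counterclockwise, so that clockwise motion decreases the angle.

```python
import math


def cardioid_gain(angle_deg: float, look_deg: float = 0.0) -> float:
    """Ideal 1st-order cardioid gain: 0.5 + 0.5 * cos(angle - look direction)."""
    return 0.5 + 0.5 * math.cos(math.radians(angle_deg - look_deg))


def interferer_angle_deg(t: float, start_deg: float, period_s: float = 18.0) -> float:
    """Angle of a source moving clockwise at constant speed.

    Assumes counterclockwise-positive angles; start_deg is a free parameter
    because the starting position is not specified in the text.
    """
    return (start_deg - 360.0 * t / period_s) % 360.0


def gain_db(gain: float, floor_db: float = -60.0) -> float:
    """Convert a linear gain to dB, clamped to a floor to avoid log(0)."""
    if gain <= 0.0:
        return floor_db
    return max(20.0 * math.log10(gain), floor_db)


# Example trajectory with a hypothetical start at 0 degrees.
for t in (0.0, 4.5, 9.0, 13.5, 18.0):
    angle = interferer_angle_deg(t, start_deg=0.0)
    print(f"t = {t:4.1f} s  angle = {angle:5.1f} deg  "
          f"ideal gain = {gain_db(cardioid_gain(angle)):6.1f} dB")
```

Under these assumptions, the ideal gain is largest when the interferer passes the look direction and smallest when it is directly opposite, which is the behavior to listen for in the audio examples.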

Application for Stereo Audio Recording


Figure 2: Real recording scenario for stereo audio recording. One active speaker moves from 0 degrees to 180 degrees at a fixed distance of 1.4 m. theta_L and theta_R denote the two steering directions of the steerable NDF. The directivity pattern learned by the NDF is a 1st-order cardioid.

We recorded the acoustic scene in a real room measuring 4.6 m x 4.5 m x 2.6 m with RT60 = 0.23 s.

theta_L = 45 degrees and theta_R = 315 degrees

To better experience the binaural effect, please use headphones when listening to the audio results below.

theta_L = 60 degrees and theta_R = 300 degrees

To better experience the binaural effect, please use headphones when listening to the audio results below.
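To give a feel for how the two steering directions shape the stereo image, the sketch below evaluates an ideal 1st-order cardioid pair steered to theta_L and theta_R and reports the level difference between the two channels for a given speaker direction. This is the idealized target pattern, not a measurement of the steerable NDF. The function names and the small `eps` guard are hypothetical.

```python
import math


def cardioid_gain(source_deg: float, steer_deg: float) -> float:
    """Ideal 1st-order cardioid gain for a source seen from a steering direction."""
    return 0.5 + 0.5 * math.cos(math.radians(source_deg - steer_deg))


def stereo_gains(source_deg: float, theta_l_deg: float, theta_r_deg: float):
    """Left and right channel gains of an ideal cardioid pair."""
    return (cardioid_gain(source_deg, theta_l_deg),
            cardioid_gain(source_deg, theta_r_deg))


def level_difference_db(g_left: float, g_right: float, eps: float = 1e-6) -> float:
    """Inter-channel level difference in dB (positive means left is louder)."""
    return 20.0 * math.log10(max(g_left, eps) / max(g_right, eps))


# The two configurations listed above, evaluated along the speaker's path.
for theta_l, theta_r in ((45.0, 315.0), (60.0, 300.0)):
    for source in (0.0, 45.0, 90.0, 135.0, 180.0):
        g_l, g_r = stereo_gains(source, theta_l, theta_r)
        print(f"theta_L = {theta_l:5.1f}  theta_R = {theta_r:5.1f}  "
              f"source = {source:5.1f} deg  "
              f"level difference = {level_difference_db(g_l, g_r):6.2f} dB")
```

In this idealized model, a source at 0 degrees is equally loud in both channels for either configuration, and the level difference grows as the speaker moves toward theta_L.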

Application for Tracking Moving Sources

Initially, two sources are at the same position. One of them then starts moving along a circle, and the steerable NDF tracks the moving source. This scenario is simulated in an anechoic environment with a source-array distance of 1.5 m.