Weilong Huang, Srikanth Raj Chetupalli, and Emanuel A. P. Habets
Abstract
Spatial filtering with a desired directivity pattern is advantageous for many audio applications. In this work, we propose neural directional filtering with user-defined directivity patterns (UNDF), which enables spatial filtering based on directivity patterns that users can define during inference. To achieve this, we propose a DNN architecture that integrates feature-wise linear modulation (FiLM), allowing user-defined patterns to serve as conditioning inputs. Through analysis, we demonstrate that the FiLM-based architecture enables the UNDF to generalize during inference to unseen user-defined patterns with higher directivities, scaling variations, and different steering directions. Furthermore, we progressively refine training strategies to enhance pattern approximation and enable UNDF to approximate irregular shapes. Lastly, experimental comparisons show that UNDF outperforms conventional methods.
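As a rough illustration of how FiLM conditioning works in general, the sketch below maps a sampled directivity pattern to a per-channel scale (gamma) and shift (beta) and applies them to a feature map. This is not the UNDF architecture: the function name `film`, the single linear projections, the 5-degree pattern sampling, and all array sizes are assumptions made only for this example.

```python
import numpy as np

def film(features, pattern, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM: scale and shift each feature channel.

    features: (channels, frames) feature map from some network layer.
    pattern:  (n_angles,) conditioning vector, here a sampled directivity pattern.
    The scale (gamma) and shift (beta) come from linear projections of the
    pattern; a real model would typically use learned layers instead.
    """
    gamma = pattern @ w_gamma + b_gamma  # (channels,)
    beta = pattern @ w_beta + b_beta     # (channels,)
    return gamma[:, None] * features + beta[:, None]

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
n_angles, n_channels, n_frames = 72, 8, 5

# Example conditioning input: a cardioid steered to 60 degrees, sampled every 5 degrees.
angles = np.arange(0, 360, 5)
pattern = np.abs(0.5 + 0.5 * np.cos(np.deg2rad(angles - 60)))

features = rng.standard_normal((n_channels, n_frames))
w_gamma = 0.1 * rng.standard_normal((n_angles, n_channels))
w_beta = 0.1 * rng.standard_normal((n_angles, n_channels))
b_gamma = np.ones(n_channels)   # start near identity scaling
b_beta = np.zeros(n_channels)   # start near zero shift

modulated = film(features, pattern, w_gamma, b_gamma, w_beta, b_beta)
print(modulated.shape)
```

Because the pattern enters only through gamma and beta, the same network weights can be reused for different patterns by changing the conditioning vector at inference time.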
Setup
We simulated sound sources in a reverberant room, with every source placed 1.5 m from the array center. The simulated room measured 6 m x 4 m x 3.5 m and had an RT60 of 0.15 s. The maximum suppression of the input patterns was set to -20 dB for the UNDF. The UNDF model was trained using speech signals only. Each of the experiments below lasted about 20 s.
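One possible reading of the -20 dB maximum suppression is that the pattern gain is floored so that no direction is attenuated by more than 20 dB. The sketch below builds such a pattern under that reading. The first-order pattern shape, the amplitude-domain dB convention (20 log10), the 5-degree angular grid, and the function name `first_order_pattern` are all assumptions for illustration, not details taken from the setup above.

```python
import numpy as np

def first_order_pattern(angles_deg, steer_deg, alpha=0.5, floor_db=-20.0):
    """Sample a first-order directivity pattern and floor its gain.

    alpha=0.5 gives a cardioid. floor_db limits the maximum suppression,
    interpreted here as an amplitude gain (20*log10 convention, assumed).
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float) - steer_deg)
    gain = np.abs(alpha + (1.0 - alpha) * np.cos(theta))
    floor = 10.0 ** (floor_db / 20.0)  # -20 dB corresponds to a linear gain of 0.1
    return np.maximum(gain, floor)

# Hypothetical 5-degree grid; steer toward a source at 60 degrees.
angles = np.arange(0, 360, 5)
pattern = first_order_pattern(angles, steer_deg=60.0)

# Gain in dB toward the steering direction and toward the cardioid null (240 degrees).
gain_db = 20.0 * np.log10(pattern)
print(gain_db[angles == 60], gain_db[angles == 240])
```

Under these assumptions the steering direction keeps unit gain, while the direction opposite to it is held at the floor instead of being fully nulled.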
Experiment 1
Figure 1: Pattern inputs for three distinct periods. The red arrow indicates a music source at 230 degrees. The black arrow indicates a speech source at 60 degrees.
Experiment 2
Figure 1: Pattern inputs for three distinct periods. The red arrow indicates the first speech source located at 190 degrees. The black arrow indicates the second speech source located at 60 degrees. The green arrow indicates the third speech source located at 280 degrees.
Experiment 3
Figure 1: Pattern inputs for four distinct periods. The red arrow indicates the first speech source located at 190 degrees. The black arrow indicates the second speech source located at 240 degrees.