This is the accompanying website for the ICASSP submission Fine-Tuning BigVGAN-V2 for Robust Musical Tuning Preservation by Hans-Ulrich Berendes, Ben Maman, Meinard Müller.
Recent work has shown that neural vocoders can be prone to musical tuning bias, shifting the pitch reference of input signals and reducing audio quality in cases of non-standard tuning. In this work, we investigate methods to mitigate this issue by fine-tuning, using the popular BigVGAN-V2 vocoder as a case study. Our goal is to make the vocoder reliable across a wide range of tunings and to examine whether reducing tuning bias also leads to more consistent audio quality. To this end, we explore two training approaches: one based on recordings with natural tuning variation, and another using data created through pitch-shift augmentation to replicate tuning diversity. The results demonstrate that targeted adaptation of BigVGAN-V2 mitigates tuning bias and improves audio quality for signals with non-standard tuning. In particular, our fine-tuned model with 80 mel bands matches the tuning preservation of the higher-resolution 128-band version, providing an efficient and robust solution for music synthesis across diverse tuning conditions.
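The pitch-shift augmentation mentioned above can be illustrated with a small sketch: to simulate a recording tuned to a non-standard reference frequency, the audio is shifted by the corresponding fractional number of semitones. The target tunings and the helper name below are illustrative assumptions, not details taken from the paper.

```python
import math

def semitone_shift(target_hz, ref_hz=440.0):
    """Fractional semitone shift that maps a ref_hz-tuned signal to target_hz.

    Hypothetical helper: the resulting value could be passed as n_steps to a
    pitch shifter, e.g. librosa.effects.pitch_shift(y, sr=sr, n_steps=...).
    """
    return 12.0 * math.log2(target_hz / ref_hz)

# Illustrative set of non-standard target tunings (in Hz) and their shifts.
shifts = {t: semitone_shift(t) for t in (430.0, 435.0, 440.0, 445.0)}
```

Shifting by such fractional semitone amounts yields training signals whose pitch reference deviates from 440 Hz in a controlled way.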
In the following, we provide sound examples reconstructed with the investigated vocoders. BV2-80 and BV2-128 are the pretrained BigVGAN-V2 models with 80 and 128 mel bands, respectively. BV2-80-Norm, BV2-80-Unif, and BV2-80-Unif-PS are our fine-tuned variants of the BV2-80 model. For more details, please refer to the paper. The reference signal is synthesized with the physical piano model Pianoteq, which allows us to directly specify the tuning of the reference signal.
For a more direct comparison of single models across tunings, please visit the demo page of our previous work.
Here, we present the items used in our listening test. In contrast to the examples above, these items are real recordings with a given tuning that we do not modify. The items are ordered by absolute tuning deviation from 440 Hz.
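The ordering criterion can be made concrete with a short sketch: tuning deviation from the 440 Hz reference is commonly measured in cents (1200 times the base-2 logarithm of the frequency ratio), and items are sorted by its absolute value. The item names and tuning frequencies below are illustrative assumptions, not the actual listening-test items.

```python
import math

# Hypothetical listening-test items with their estimated tuning references (Hz).
items = {"item_a": 440.0, "item_b": 452.0, "item_c": 435.0, "item_d": 438.5}

def deviation_cents(f_hz, ref_hz=440.0):
    """Tuning deviation from the reference pitch, in cents."""
    return 1200.0 * math.log2(f_hz / ref_hz)

# Order items by absolute deviation from 440 Hz, smallest first.
ordered = sorted(items, key=lambda name: abs(deviation_cents(items[name])))
```

Under these example tunings, items closest to 440 Hz appear first and the strongly detuned ones last.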