This is the accompanying website for the ICASSP submission Fine-Tuning BigVGAN-V2 for Robust Musical Tuning Preservation by Hans-Ulrich Berendes, Ben Maman, Meinard Müller.
Recent work has shown that neural vocoders can be prone to musical tuning bias, shifting the pitch reference of input signals and reducing audio quality in cases of non-standard tuning. In this work, we investigate methods to mitigate this issue by fine-tuning, using the popular BigVGAN-V2 vocoder as a case study. Our goal is to make the vocoder reliable across a wide range of tunings and to examine whether reducing tuning bias also leads to more consistent audio quality. To this end, we explore two training approaches: one based on recordings with natural tuning variation, and another using data created through pitch-shift augmentation to replicate tuning diversity. The results demonstrate that targeted adaptation of BigVGAN-V2 mitigates tuning bias and improves audio quality for signals with non-standard tuning. In particular, our fine-tuned model with 80 mel bands matches the tuning preservation of the higher-resolution 128-band version, providing an efficient and robust solution for music synthesis across diverse tuning conditions.
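The pitch-shift augmentation mentioned above can be illustrated with a small sketch: to simulate a recording tuned to a non-standard reference frequency, the audio is shifted by the corresponding fractional number of semitones. The target tunings and the helper name below are illustrative assumptions, not details taken from the paper.

```python
import math

def semitone_shift(target_hz, ref_hz=440.0):
    """Fractional semitone shift that maps a ref_hz-tuned signal to target_hz.

    Hypothetical helper: the resulting value could be passed as n_steps to a
    pitch shifter, e.g. librosa.effects.pitch_shift(y, sr=sr, n_steps=...).
    """
    return 12.0 * math.log2(target_hz / ref_hz)

# Illustrative set of non-standard target tunings (in Hz) and their shifts.
shifts = {t: semitone_shift(t) for t in (430.0, 435.0, 440.0, 445.0)}
```

Shifting by such fractional semitone amounts yields training signals whose pitch reference deviates from 440 Hz in a controlled way.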
In the following, we provide sound examples reconstructed with the investigated vocoders. BV2-80 and BV2-128 are the pretrained BigVGAN-V2 models with 80 and 128 mel bands, respectively. BV2-80-Norm, BV2-80-Unif, and BV2-80-Unif-PS are our fine-tuned variants of the BV2-80 model. For more details, please refer to the paper. The reference signal is synthesized with the physical piano model Pianoteq, which allows us to directly specify the tuning of the reference signal.
For a more direct comparison of single models across tunings, please visit the demo page of our previous work.
Here, we present the items used in our listening test. In contrast to the examples above, these items are real recordings with a given tuning that we do not modify. The items are ordered by absolute tuning deviation from 440 Hz.
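The ordering criterion can be made concrete with a short sketch: tuning deviation from the 440 Hz reference is commonly measured in cents (1200 times the base-2 logarithm of the frequency ratio), and items are sorted by its absolute value. The item names and tuning frequencies below are illustrative assumptions, not the actual listening-test items.

```python
import math

# Hypothetical listening-test items with their estimated tuning references (Hz).
items = {"item_a": 440.0, "item_b": 452.0, "item_c": 435.0, "item_d": 438.5}

def deviation_cents(f_hz, ref_hz=440.0):
    """Tuning deviation from the reference pitch, in cents."""
    return 1200.0 * math.log2(f_hz / ref_hz)

# Order items by absolute deviation from 440 Hz, smallest first.
ordered = sorted(items, key=lambda name: abs(deviation_cents(items[name])))
```

Under these example tunings, items closest to 440 Hz appear first and the strongly detuned ones last.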