Voice Privacy Challenge 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories

Ünal Ege Gaznepoğlu, Anna Leschanowsky, and Nils Peters

Presented among the selected papers from the 2nd VoicePrivacy Challenge at the 2022 ISCA SPSC Symposium on Security and Privacy in Speech Communication, September 23-24, 2022. The 2022 ISCA SPSC Symposium is a satellite event of Interspeech 2022.

Abstract

We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variant. Among the known deficiencies of x-vector-based anonymization systems is the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories are used for voice synthesis without any modification. Especially in cross-gender conversion, this causes unnatural-sounding voices, increased word error rates (WERs), and leakage of personal information.

Our submission overcomes this problem by synthesizing an F0 trajectory that better harmonizes with the anonymized x-vector. We utilize a low-complexity deep neural network to estimate an appropriate F0 value per frame, using the linguistic content from the bottleneck (BN) features and the anonymized x-vector.
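The frame-wise estimation described above can be illustrated with a minimal sketch. It is not the submitted system: the feature dimensions (256 for BN features, 512 for the x-vector), the hidden size, the two-output head (log-F0 plus a voicing probability), and the function names `init_mlp` and `predict_f0` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np


def init_mlp(rng, in_dim, hidden_dim):
    """Random (untrained) weights for a small two-layer regressor.

    Hypothetical helper: a real system would load trained parameters.
    """
    return {
        "W1": rng.standard_normal((in_dim, hidden_dim)) * np.sqrt(2.0 / in_dim),
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, 2)) * np.sqrt(1.0 / hidden_dim),
        "b2": np.zeros(2),
    }


def predict_f0(params, bn, xvector, voicing_threshold=0.5):
    """Estimate one F0 value per frame from BN features and an x-vector.

    bn:      array of shape (num_frames, bn_dim), linguistic content
    xvector: array of shape (xvec_dim,), anonymized speaker embedding
    Returns F0 in Hz with shape (num_frames,); unvoiced frames are 0.
    """
    num_frames = bn.shape[0]
    # The utterance-level x-vector is repeated for every frame.
    xvec_tiled = np.broadcast_to(xvector, (num_frames, xvector.shape[0]))
    inputs = np.concatenate([bn, xvec_tiled], axis=1)

    hidden = np.maximum(inputs @ params["W1"] + params["b1"], 0.0)  # ReLU
    out = hidden @ params["W2"] + params["b2"]

    # Assumed output head: column 0 is log-F0, column 1 is a voicing logit.
    f0_hz = np.exp(out[:, 0])
    voiced_prob = 1.0 / (1.0 + np.exp(-out[:, 1]))
    return np.where(voiced_prob > voicing_threshold, f0_hz, 0.0)


# Usage with random stand-in features (assumed dimensions).
rng = np.random.default_rng(0)
params = init_mlp(rng, in_dim=256 + 512, hidden_dim=64)
bn_features = rng.standard_normal((100, 256))
anon_xvector = rng.standard_normal(512)
f0_trajectory = predict_f0(params, bn_features, anon_xvector)
```

Because the network sees only the BN features and the anonymized x-vector, no F0 extracted from the source speaker enters the synthesis path in this sketch.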

Our approach results in a significantly improved anonymization system and increased naturalness of the synthesized voice. Consequently, our results suggest that F0 extraction is not required for voice anonymization.

system

Audio Anonymization Examples

Same-Gender Anonymization

Cross-Gender Anonymization

Paper

@techreport{GaznepogluVPC22,
author = {Ünal Ege Gaznepoğlu and Anna Leschanowsky and Nils Peters},
institution = {VoicePrivacy Challenge 2022},
title = {VoicePrivacy 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories},
language = {en},
keywords = {voice privacy, neural networks, F0},
year = {2022}}