VoV-DB: A Little Database of Voice-over-Voice Material

Voice-over-Voice (VoV) Audio in TV and Streaming

Voice-over-Voice (VoV) is a common mixing practice observed in news reports and documentaries, where a foreground voice is mixed on top of a background voice, e.g., to translate an interview. This is achieved by ducking the background voice so that the foreground voice is more intelligible, while still allowing the listener to perceive the presence and tone of the background voice. Currently, there is little published research on ducking practices for VoV and on technical details such as the Loudness Difference (LD) between foreground and background speech.
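As a rough illustration of the ducking described above, the sketch below scales a background signal so that the foreground sits a chosen number of dB above it, then sums the two. It rests on several assumptions that are not part of the VoV-DB or the paper: plain RMS level stands in for a proper loudness measurement (such as ITU-R BS.1770), the gain is static rather than time-varying, and the function names and the 15 dB target are hypothetical values chosen for the example only.

```python
import math


def rms_db(samples):
    """Return the RMS level of a non-silent signal in dB.

    Assumption: RMS is used here only as a crude stand-in for loudness.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def duck_background(foreground, background, target_ld_db):
    """Attenuate (or boost) the background to reach a target level difference.

    target_ld_db is the desired foreground-minus-background level in dB.
    Returns the mix, the scaled background, and the applied gain in dB.
    """
    current_ld_db = rms_db(foreground) - rms_db(background)
    # Applying g dB to the background changes the difference to current - g,
    # so the gain needed to land on the target is current - target.
    gain_db = current_ld_db - target_ld_db
    gain = 10.0 ** (gain_db / 20.0)
    ducked = [gain * b for b in background]
    mix = [f + d for f, d in zip(foreground, ducked)]
    return mix, ducked, gain_db


# Hypothetical test signals: two sine tones standing in for the two voices.
fg = [0.5 * math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
bg = [0.4 * math.sin(2 * math.pi * 7 * n / 1000) for n in range(1000)]
mix, ducked, gain_db = duck_background(fg, bg, target_ld_db=15.0)
```

A production mix would typically apply the gain only while the foreground voice is active and would smooth the transitions; the static gain here is only meant to make the notion of a level difference concrete.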

To investigate ducking practices for this type of content, we created the VoV-DB and make it available here. The material and the experiments carried out with the VoV-DB are detailed in our paper:

  1. David Geary, Matteo Torcoli, Jouni Paulus, Christian Simon, Davide Straninger, Alessandro Travaglini, and Ben Shirley
    Loudness Differences for Voice-Over-Voice Audio in TV and Streaming
    Journal of the Audio Engineering Society, 68(11): 810–818, 2020. DOI: 10.17743/jaes.2020.0022
    @ARTICLE{Geary2020,
    author={David Geary and Matteo Torcoli and Jouni Paulus and Christian Simon and Davide Straninger and Alessandro Travaglini and Ben Shirley},
    journal={Journal of the Audio Engineering Society},
    title={{Loudness Differences for Voice-Over-Voice Audio in TV and Streaming}},
    year={2020},
    volume={68},
    number={11},
    pages={810--818},
    doi={10.17743/jaes.2020.0022},
    month={Nov.},
    }

Please cite this paper if you find the VoV-DB useful.

VoV-DB

The VoV-DB is distributed under a Creative Commons Attribution-NonCommercial 3.0 Unported License and can be downloaded here:

This package contains the LICENSE.txt file and 3 .mov files corresponding to 3 different recording locations.

Each .mov file contains an audio-video recording of a native English speaker and an additional audio stream with the German voice-over that translates the English content. The audio streams are stereo PCM at 48 kHz and 24 bits.
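To work with the two voices separately, each audio stream can be written to its own WAV file. The following is a minimal sketch that assumes ffmpeg is installed and on the PATH. The file names are hypothetical, and which stream index holds the English speaker and which holds the German voice-over is an assumption that should be checked first, for example with ffprobe.

```python
import subprocess


def build_extract_command(mov_path, stream_index, wav_path):
    """Build an ffmpeg command that writes one audio stream to a WAV file.

    stream_index counts audio streams only, starting at 0.
    pcm_s24le keeps the 24-bit PCM resolution stated for the database.
    """
    return [
        "ffmpeg",
        "-i", mov_path,
        "-map", f"0:a:{stream_index}",
        "-c:a", "pcm_s24le",
        wav_path,
    ]


def extract_stream(mov_path, stream_index, wav_path):
    """Run the extraction and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(
        build_extract_command(mov_path, stream_index, wav_path),
        check=True,
    )


# Hypothetical usage; file names and stream order are assumptions:
# extract_stream("location1.mov", 0, "location1_stream0.wav")
# extract_stream("location1.mov", 1, "location1_stream1.wav")
```

Building the argument list in a separate function keeps the command easy to inspect or log before anything is executed.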

Transcripts of the speech are also provided in the subfolder transcripts.

Contacts

We appreciate any feedback and contributions. Correspondence should be addressed to matteo.torcoli@iis.fraunhofer.de.