Sensory dissonance (SD) quantifies the interference between partials in a mixture of simultaneously sounding tones and correlates with the perceived dissonance or unpleasantness of this mixture. While it is mainly studied in music perception, often using synthetic signals or symbolic inputs, in this paper, we focus on a practical application and investigate SD as a tool for analyzing the interactions between voices in multi-track music recordings. Using visualization and statistical analysis on an existing dataset of four-part chorales recorded with various wind instruments, we examine how timbre, tuning, and score influence SD. To do this, we introduce the notion of relative SD, which quantifies how individual voices in a multi-track recording contribute to the overall SD of their polyphonic mixture. In addition to discussing practical aspects of measuring SD between and within real music signals, our case study shows potential benefits and limitations of using SD as an analysis tool in music production, for example, to inform or automate tasks like take selection or equalization.
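To make the two notions concrete, the following is a minimal sketch of how pairwise SD and a voice's relative contribution could be computed. It uses a Sethares-style roughness curve with commonly cited parameter values; the function names, parameter values, and the particular leave-one-out definition of relative SD are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pair_dissonance(f1, f2, a1, a2):
    """Sethares-style roughness of one pair of partials (parameter values illustrative)."""
    fmin = min(f1, f2)
    s = 0.24 / (0.021 * fmin + 19.0)  # scales with critical bandwidth at fmin
    d = abs(f2 - f1)
    return a1 * a2 * (np.exp(-3.5 * s * d) - np.exp(-5.75 * s * d))

def total_sd(partials):
    """Sum pairwise dissonance over all partials, given as [(freq, amp), ...]."""
    sd = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            sd += pair_dissonance(partials[i][0], partials[j][0],
                                  partials[i][1], partials[j][1])
    return sd

def relative_sd(voices):
    """Hypothetical relative SD: each voice's share is the mixture SD minus
    the SD of the mixture with that voice removed (leave-one-out)."""
    mix = [p for v in voices for p in v]
    full = total_sd(mix)
    return [full - total_sd([p for k, v in enumerate(voices) if k != i for p in v])
            for i in range(len(voices))]
```

Under this leave-one-out definition, a voice whose partials fall close to (but not on) those of the other voices receives a large share of the mixture's SD.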
Larynx microphones (LMs) provide a practical way to obtain crosstalk-free recordings of the human voice by picking up vibrations directly from the throat. This can be useful in a multitude of music information retrieval scenarios related to singing, e.g., the analysis of individual voices recorded in environments with lots of interfering noise. However, LMs have a limited frequency range and barely capture the effects of the vocal tract, which makes the recorded signal unsuitable for downstream tasks that require high-quality recordings. We publish a dataset with over 3.5 hours of popular music we recorded with four amateur singers accompanied by a guitar, where both LM and clean close-up microphone signals are available. This dataset is then used to train a data-driven baseline approach for singing voice reconstruction from LM signals using differentiable signal processing, inspired by a source-filter model that emulates the missing vocal tract effects.
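As a rough illustration of the source-filter idea, the sketch below treats the LM signal as the "source" and applies an FIR filter standing in for the missing vocal-tract response, with a log-magnitude spectral loss as the reconstruction objective. This is a simplified, hand-written analogue: in the described approach the filter parameters would be learned with differentiable signal processing, and all names and the loss choice here are assumptions.

```python
import numpy as np

def apply_vocal_tract_filter(lm_signal, fir_taps):
    """Filter the larynx-microphone signal with an FIR approximation of the
    missing vocal-tract response (in practice, taps would be learned)."""
    return np.convolve(lm_signal, fir_taps, mode="same")

def spectral_log_mse(x, y, n_fft=1024):
    """Log-magnitude spectral distance between the filtered LM signal and the
    clean close-up reference (a common reconstruction loss; choice is illustrative)."""
    X = np.abs(np.fft.rfft(x, n=n_fft)) + 1e-8
    Y = np.abs(np.fft.rfft(y, n=n_fft)) + 1e-8
    return np.mean((np.log(X) - np.log(Y)) ** 2)
```

Training would then amount to minimizing `spectral_log_mse(apply_vocal_tract_filter(lm, taps), clean)` over the filter parameters.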
We introduce and compare two methods to adaptively modify the partials of simultaneously sounding synthesized tones to minimize roughness. By changing their amplitude and/or frequency over time, it is possible to dynamically control the timbre of a polyphonic sound in real time. This introduces an additional parameter for sound synthesis that may allow for changing the roughness of a sound without modifying other perceptual attributes of the individual tones, like their fundamental frequency (F0) or loudness. We draw inspiration from choir singers, who may not only dynamically adapt their pitch, but also control their vocal formants (i.e., the prevalence of certain partials) as an additional means to facilitate intonation and voice blending between musicians.
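A toy version of the amplitude-based idea can be sketched as follows: measure each partial's contribution to total roughness (again with a Sethares-style curve, parameter values illustrative) and attenuate the worst offender slightly at each adaptation step. This greedy scheme is a simplified stand-in for the described methods; the function names and update rule are assumptions.

```python
import numpy as np

def roughness(freqs, amps):
    """Sethares-style total roughness of a set of partials (parameters illustrative)."""
    r = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            fmin = min(freqs[i], freqs[j])
            s = 0.24 / (0.021 * fmin + 19.0)
            d = abs(freqs[i] - freqs[j])
            r += amps[i] * amps[j] * (np.exp(-3.5 * s * d) - np.exp(-5.75 * s * d))
    return r

def attenuate_roughest_partial(freqs, amps, step=0.1):
    """One adaptive step: slightly reduce the amplitude of the partial that
    contributes most to roughness (leave-one-out contribution)."""
    base = roughness(freqs, amps)
    contrib = [base - roughness(np.delete(freqs, i), np.delete(amps, i))
               for i in range(len(freqs))]
    worst = int(np.argmax(contrib))
    new_amps = np.array(amps, dtype=float)
    new_amps[worst] *= (1.0 - step)
    return new_amps
```

Applied repeatedly per audio frame, such a rule lowers roughness over time while leaving the partials' frequencies (and hence the tones' F0s) untouched.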
Intonation is the process of choosing an appropriate pitch for a given note in a musical performance. Particularly in polyphonic singing, where all musicians can continuously adapt their pitch, this leads to complex interactions. We formulate intonation adaptation as a cost minimization problem and introduce a differentiable cost measure by adapting and combining existing principles for measuring intonation. In particular, our measure consists of two terms, representing a tonal aspect (the proximity to a tonal grid) and a harmonic aspect (the perceptual dissonance between salient frequencies). Our measure can be used to flexibly account for different artistic intents while allowing for robust and joint processing of multiple voices in real time, which we demonstrate for the task of intonation adaptation of amateur choral music using recordings from a publicly available multitrack dataset.
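The two-term structure can be sketched as a weighted sum of a tonal cost (squared deviation in cents from the nearest 12-tone equal-tempered grid step) and a harmonic cost (pairwise Sethares-style dissonance over a few partials per voice). The specific grid, partial model, parameter values, and weights below are illustrative assumptions, not the paper's exact measure.

```python
import numpy as np

def tonal_cost(f0s, ref=440.0):
    """Mean squared deviation (in cents) of each F0 from the nearest
    12-TET grid step; differentiable almost everywhere."""
    cents = 1200.0 * np.log2(np.asarray(f0s) / ref)
    dev = cents - 100.0 * np.round(cents / 100.0)
    return np.mean(dev ** 2)

def harmonic_cost(f0s, n_partials=6):
    """Pairwise Sethares-style dissonance between the partials of all voices,
    with 1/k amplitude rolloff (parameter values illustrative)."""
    f0s = np.asarray(f0s, dtype=float)
    freqs = np.concatenate([f0s * k for k in range(1, n_partials + 1)])
    amps = np.concatenate([np.full(len(f0s), 1.0 / k) for k in range(1, n_partials + 1)])
    total = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            fmin = min(freqs[i], freqs[j])
            s = 0.24 / (0.021 * fmin + 19.0)
            d = abs(freqs[i] - freqs[j])
            total += amps[i] * amps[j] * (np.exp(-3.5 * s * d) - np.exp(-5.75 * s * d))
    return total

def intonation_cost(f0s, w_tonal=1.0, w_harmonic=1.0):
    """Weighted combination of the two terms; the weights can trade off
    different artistic intents (grid adherence vs. beat-free blend)."""
    return w_tonal * tonal_cost(f0s) + w_harmonic * harmonic_cost(f0s)
```

Since both terms are differentiable almost everywhere in the voices' F0s, the combined cost can be minimized jointly over all voices, e.g., by gradient descent on per-voice pitch shifts.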