AudioLabs - A Differentiable Cost Measure for Intonation Processing in Polyphonic Music

This is the accompanying website to the article [1].

[1] Simon Schwär, Sebastian Rosenzweig, and Meinard Müller
A Differentiable Cost Measure for Intonation Processing in Polyphonic Music
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 626–633, 2021.

@inproceedings{SchwaerRM_IntonationCost_ISMIR,
author    = {Simon Schw{\"a}r and Sebastian Rosenzweig and Meinard M{\"u}ller},
title     = {A Differentiable Cost Measure for Intonation Processing in Polyphonic Music},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
pages     = {626--633},
address   = {online},
year      = {2021},
url-demo  = {https://www.audiolabs-erlangen.de/resources/MIR/2021-ISMIR-IntonationCostMeasure},
url-pdf   = {https://archives.ismir.net/ismir2021/paper/000078.pdf}
}

Proof of differentiability
Code (external link)

Abstract

Intonation is the process of choosing an appropriate pitch for a given note in a musical performance. Particularly in polyphonic singing, where all musicians can continuously adapt their pitch, this leads to complex interactions. To achieve an overall balanced sound, the musicians dynamically adjust their intonation considering musical, perceptual, and acoustical aspects. When adapting the intonation in a recorded performance, a sound engineer may have to individually fine-tune the pitches of all voices to account for these aspects in a similar way. In this paper, we formulate intonation adaptation as a cost minimization problem. As our main contribution, we introduce a differentiable cost measure by adapting and combining existing principles for measuring intonation. In particular, our measure consists of two terms, representing a tonal aspect (the proximity to a tonal grid) and a harmonic aspect (the perceptual dissonance between salient frequencies). We show that, combining these two aspects, our measure can be used to flexibly account for different artistic intents while allowing for robust and joint processing of multiple voices in real-time. In an experiment, we demonstrate the potential of our approach for the task of intonation adaptation of amateur choral music using recordings from a publicly available multitrack dataset.
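
To make the two-term structure described in the abstract more concrete, the following is a minimal sketch and not the formulation from the article [1]. It assumes a 12-tone equal-tempered grid referenced to 440 Hz for the tonal term, and it swaps in Sethares' parametrization of the Plomp-Levelt roughness curve for the harmonic term. All function names, the constants, the 1/k partial amplitudes, and the weights are assumptions chosen for illustration. In particular, this roughness curve has a kink at unison, so the sketch does not reproduce the differentiability property that the article establishes.

```python
import numpy as np

def tonal_cost(f_hz, ref_hz=440.0):
    """Smooth periodic distance to an assumed 12-TET grid.
    0 on a grid pitch, 1 halfway between two grid pitches."""
    semitones = 12.0 * np.log2(np.asarray(f_hz, dtype=float) / ref_hz)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * semitones))

def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Roughness of two partials (Sethares-style curve; constants are
    commonly quoted values and should be treated as assumptions)."""
    f_lo = np.minimum(f1, f2)
    d = np.abs(f1 - f2)
    s = 0.24 / (0.0207 * f_lo + 18.96)  # scales the curve with register
    return a1 * a2 * (np.exp(-3.5 * s * d) - np.exp(-5.75 * s * d))

def harmonic_cost(f0s, n_partials=4):
    """Sum of pairwise roughness between partials of different voices."""
    f0s = np.asarray(f0s, dtype=float)
    k = np.arange(1, n_partials + 1)
    partials = np.outer(f0s, k)   # shape (voices, partials)
    amps = 1.0 / k                # assumed 1/k amplitude roll-off
    total = 0.0
    for i in range(len(f0s)):
        for j in range(i + 1, len(f0s)):
            total += np.sum(pair_dissonance(
                partials[i][:, None], partials[j][None, :],
                amps[:, None], amps[None, :]))
    return total

def total_cost(f0s, w_tonal=0.5, w_harm=0.5):
    """Weighted sum of both terms; the weights are hypothetical knobs."""
    return w_tonal * np.sum(tonal_cost(f0s)) + w_harm * harmonic_cost(f0s)

if __name__ == "__main__":
    in_tune = [220.0, 277.18, 329.63]   # roughly an A major triad in 12-TET
    detuned = [220.0, 283.0, 329.63]    # middle voice sharp
    print("in tune:", total_cost(in_tune))
    print("detuned:", total_cost(detuned))
```

Shifting the weight toward `w_tonal` pulls pitches toward the grid, while shifting it toward `w_harm` favors intervals with low roughness between partials. This loosely mirrors the trade-off between the tonal and the harmonic aspect named in the abstract.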

Audio Examples

Figures: tonal cost for the synthetic example; harmonic cost for the synthetic example.

Synthetic Example

Vocal Quartet Recording (Bars 45-48)

Vocal Quartet Recording (Full Recording)

Acknowledgments

This project is supported by the German Research Foundation (DFG MU 2686/12-1, MU 2686/13-1). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.