AudioLabs - Computational Analysis of Georgian Vocal Music and Beyond (GVM+)

Computational Analysis of Georgian Vocal Music and Beyond (GVM+)

The aim of the GVM+ project is to develop computational tools for analyzing vocal music by combining traditional model-based and recent data-driven approaches. The project is funded by the German Research Foundation. On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.

Principal investigator: Prof. Dr. Meinard Müller
Programme period: 2023–2026
Grant No. 401198673 (MU 2686/13-2)
Project website of first phase (GVM)

Project Description

Computational Analysis of Georgian Vocal Music and Beyond

In the project's first phase (initial proposal), our main objective was to advance ethnomusicological research focusing on traditional Georgian vocal music by employing computational methods from audio signal processing and music information retrieval (MIR). By developing novel computational tools applied to a concrete music scenario, we explored the potential of computer-assisted methods for reproducible and corpus-driven research within the humanities. Furthermore, by systematically processing and annotating unique collections of field recordings, we contributed to the preservation and dissemination of the rich Georgian musical heritage. In the second phase of the project (renewal proposal), we broaden our perspective and set ourselves new goals. First, we will systematically expand and improve our computational tools for analyzing vocal music by combining traditional model-based and recent data-driven approaches. In particular, we want to achieve substantial progress in notoriously difficult MIR tasks such as estimating multiple fundamental frequencies and analyzing harmonic and melodic intonation aspects in polyphonic singing. To explore the scalability and applicability of our methods, we go beyond traditional Georgian vocal music and consider other corpora of recorded singing, including Western choral music, children's songs, and traditional music from different musical cultures. Another fundamental goal for the project's second phase is to explore the potential of novel contact microphones that overcome some limitations of the previously used headset and larynx microphones. We plan to use sensors to minimize external acoustic noise while offering high sensitivity to body vibrations in a frequency range between a few Hertz and 2200 Hertz. Comprising the fundamental frequency of the vibrations caused by the larynx (as well as several overtones), this extensive frequency range enables the analysis of speech and singing as well as of body vibrations as low as the heartbeat. Such novel technology will lay the basis for generating high-quality training data as required for recent MIR techniques based on deep learning and open new paths for investigating how singers synchronize some of their body functions (e.g., heartbeat variability, respiration) during singing.

Projektbeschreibung

Computergestützte Analyse georgischer Vokalmusik und darüber hinaus

In der ersten Projektphase (Erstantrag) bestand unser Hauptziel darin, die musikethnologische Forschung mit Schwerpunkt auf traditionelle georgische Vokalmusik durch Einsatz rechnergestützter Methoden der Audiosignalverarbeitung und des Music Information Retrieval (MIR) voranzutreiben. Hierbei wurde durch Betrachtung konkreter Anwendungsszenarien das Potential dieser Methoden für reproduzierbare, korpusbasierte Studien innerhalb der Geisteswissenschaften ausgelotet. Weiterhin leisteten wir durch die systematische Aufarbeitung einer einzigartigen multimodalen Sammlung von Feldaufnahmen einen Beitrag zur Bewahrung und Verbreitung des reichen georgischen Musikerbes. In der zweiten Projektphase (Fortsetzungsantrag) erweitern wir unseren Blickwinkel erheblich mit neuen Zielsetzungen. Erstens entwickeln wir unsere computergestützten Analysewerkzeuge weiter, indem wir traditionelle modellbasierte und aktuelle datengetriebene Verfahren kombinieren. Hierbei wollen wir wesentliche Fortschritte bei notorisch schwierigen MIR-Fragestellungen wie der simultanen Schätzung mehrerer Grundfrequenzen und der harmonischen und melodischen Intonationsanalyse für polyphone Gesangsaufnahmen erzielen. Um die Skalierbarkeit und Anwendbarkeit unserer Methoden zu untersuchen, gehen wir über die traditionelle georgische Vokalmusik hinaus und betrachten weitere Korpora von Gesangsaufnahmen, darunter westliche Chormusik, Kinderlieder und traditionelle Musik verschiedener Musikkulturen. Als ein weiteres zentrales Ziel wollen wir neuartige Kontaktmikrofone entwickeln, um hierdurch einige Einschränkungen der zuvor verwendeten Headset- und Kehlkopfmikrofone zu überwinden. Hierbei sollen hochsensitive Körpervibrationssensoren zum Einsatz kommen, durch die eine Erfassung hochfrequenter Kehlkopfschwingungen zur Analyse von Sprache und Gesang (bis zu 2200 Hertz) und eine synchrone Messung tieffrequenter Körperschwingungen wie dem Herzschlag ermöglicht werden (während Außengeräusche keine Störungen verursachen). Diese neuartige Aufnahmetechnologie legt die Grundlage für die Erzeugung qualitativ hochwertiger Trainingsdaten, wie sie für aktuelle MIR-Techniken auf Grundlage von Deep Learning erforderlich sind. Weiterhin wollen wir das Potential dieser Technologien im Kontext komplexer Fragestellungen der musikalischen Interaktion, wie zum Beispiel der Synchronisation von Herzschlagvariabilität und Atmung beim gemeinsamen Singen ausloten.

Projected-Related Resources and Demonstrators

The following list provides an overview of the most important publicly accessible sources created in the GVM+ project:

Invited Talk Digitale Musikverarbeitung: Vom Scheitern Lernen, Workshop: Digitalität & Kulturelle Resilienz: Afghanische Musik in der Diaspora, UNESCO Chair in Digital Culture and Arts in Education, Friedrich-Alexander-Universität Erlangen-Nürnberg, January 9, 2025
A Variational Y-Autoencoder for Disentangling Gesture and Material of Interaction Sounds (AES 2022)

Projected-Related Publications

The following publications reflect the main scientific contributions of the work carried out in the GVM+ project.

Simon Schwär, Michael Krause, Michael Fast, Sebastian Rosenzweig, Frank Scherbaum, and Meinard Müller
A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction
Transaction of the International Society for Music Information Retrieval (TISMIR), 7(1): 30–43, 2024. PDF Demo DOI

@article{SchwaerKRSM_LarynxEnhancement_TISMIR,
author = {Simon Schw{\"a}r and Michael Krause and Michael Fast and Sebastian Rosenzweig and Frank Scherbaum and Meinard M{\"u}ller},
title = {A Dataset of Larynx Microphone Recordings for Singing Voice Reconstruction},
journal = {Transaction of the International Society for Music Information Retrieval ({TISMIR})},
volume = {7},
number = {1},
pages = {30--43},
year = {2024},
publisher = {Ubiquity Press},
doi = {10.5334/tismir.166},
url       = {https://transactions.ismir.net/articles/10.5334/tismir.166},
url-pdf   = {2024_SchwaerKFRSM_LarynxMicSVR_TISMIR_ePrint.pdf},
url-demo = {https://audiolabs-erlangen.de/resources/MIR/LM-SVR/}
}

Hans-Ulrich Berendes, Simon Schwär, and Meinard Müller
Lyrics Transcription in Western Classical Music with Whisper: A Case Study on Schubert's Winterreise
In Workshop on NLP for Music and Audio (NLP4MUSA), 2024. PDF DOI

@inproceedings{BerendesSM24_Whisper_NLP,
author    = {Hans-Ulrich Berendes and Simon Schw{\"a}r and Meinard M{\"u}ller},
title     = {Lyrics Transcription in {W}estern Classical Music with {W}hisper: {A} Case Study on {S}chubert's {W}interreise},
booktitle = {Workshop on {NLP} for Music and Audio ({NLP4MUSA})},
address   = {Oakland, CA, United States},
year      = {2024},
pages     = {},
doi       = {},
url-pdf   = {2024_BerendesSM_LyricsTranscription_NLP4MUSA.pdf}
}

Simon Schwär and Meinard Müller
Multi-Scale Spectral Loss Revisited
IEEE Signal Processing Letters, 30: 1712–1716, 2023. PDF DOI

@article{SchwaerM23_MultiScaleSpecLoss_IEEE-SPL,
author = {Simon Schw{\"a}r and Meinard M{\"u}ller},
title = {Multi-Scale Spectral Loss Revisited},
journal = {{IEEE} Signal Processing Letters},
year={2023},
volume={30},
pages={1712--1716},
doi = {10.1109/LSP.2023.3333205},
url-pdf = {2023_SchwaerM_MultiScaleSpecLoss_IEEE-SPL.pdf}
}

Frank Scherbaum and Meinard Müller
From Intonation Adjustments to Synchronisation of Heart Reat Variability: Singer Interaction in Traditional Georgian Vocal Music
Musicologist, 7(2): 155–177, 2023. PDF DOI

@article{ScherbaumM23_IntHeartRate_Musicologist,
author = {Frank Scherbaum and Meinard M{\"u}ller},
title = {From Intonation Adjustments to Synchronisation of Heart Reat Variability: {S}inger Interaction in Traditional {G}eorgian Vocal Music},
journal = {Musicologist},
year={2023},
volume={7},
number={2},
pages={155--177},
doi = {10.33906/musicologist.1144787},
url-pdf = {2023_ScherbaumM_IntonationHeartRate_Musicologist.pdf}
}

Peter Meier, Simon Schwär, Gerhard Krump, and Meinard Müller
Evaluating Real-Time Pitch Estimation Algorithms for Creative Music Game Interaction
In: INFORMATIK 2023 — Designing Futures: Zukünfte gestalten, Gesellschaft für Informatik e.V.: 873–882, 2023. PDF DOI

@incollection{MeierSKM23_PitchEstimationGames_GI,
author    = {Peter Meier and Simon Schw{\"a}r and Gerhard Krump and Meinard M{\"u}ller},
title     = {Evaluating Real-Time Pitch Estimation Algorithms for Creative Music Game Interaction},
booktitle = {INFORMATIK 2023 -- Designing Futures: Zuk{\"u}nfte gestalten},
publisher = {Gesellschaft f{\"u}r Informatik e.V.},
address   = {Bonn, Germany},
year      = {2023},
doi       = {10.18420/inf2023_97},
pages     = {873--882},
url-pdf   = {2023_MeierSKM_PitchEstimationGames_GI_Preprint.pdf}
}

Peter Meier, Simon Schwär, Gerhard Krump, and Meinard Müller
Real-Time Pitch Estimation for Creative Music Game Interaction
In Proceedings of the Deutsche Jahrestagung für Akustik (DAGA): 1346–1349, 2023. PDF

@inproceedings{MeierSKM23_MusicGame_DAGA,
author    = {Peter Meier and Simon Schw{\"a}r and Gerhard Krump and Meinard M{\"u}ller},
title     = {Real-Time Pitch Estimation for Creative Music Game Interaction},
booktitle = {Proceedings of the {D}eutsche {J}ahrestagung f{\"u}r {A}kustik ({DAGA})},
address   = {Hamburg, Germany},
year      = {2023},
pages     = {1346--1349},
url-pdf   = {2023_MeierSKM_PitchGame_DAGA_ePrint.pdf}
}

Simon Schwär, Meinard Müller, and Sebastian J. Schlecht
Modifying Partials for Minimum-Roughness Sound Synthesis
In Proceedings of the International Conference on Timbre: 130–134, 2023. PDF

@inproceedings{SchwaerMS23_PartialSoundSynthesis_TIMBRE,
author    = {Simon Schw{\"a}r and Meinard M{\"u}ller and Sebastian J. Schlecht},
title     = {Modifying Partials for Minimum-Roughness Sound Synthesis},
booktitle = {Proceedings of the International Conference on Timbre},
address   = {Thessaloniki, Greece},
year      = {2023},
pages     = {130--134},
url-pdf   = {2023_SchwaerMS_AdaptiveTimbre_TIMBRE.pdf}
}

Simon Schwär, Meinard Müller, and Sebastian Schlecht
A Variational Y-Autoencoder for Disentangling Gesture and Material of Interaction Sounds
In Proceedings of the AES Conference on Audio for Virtual and Augmented Reality, 2022. PDF Details

@inproceedings{SchwaerMS22_TouchSound_AES,
author    = {Simon Schw{\"a}r and Meinard M{\"u}ller and Sebastian Schlecht},
title     = {A Variational {Y}{-}Autoencoder for Disentangling Gesture and Material of Interaction Sounds},
booktitle = {Proceedings of the {AES} Conference on Audio for Virtual and Augmented Reality},
address = {Redmond, USA},
year      = {2022},
url-details = {https://www.audiolabs-erlangen.de/resources/2022-AVAR-InteractionSounds},
url-pdf   = {http://www.aes.org/e-lib/browse.cfm?elib=21853}
}

International Audio Laboratories Erlangen

Computational Analysis of Georgian Vocal Music and Beyond (GVM+)

Project Description

Projektbeschreibung

Projected-Related Resources and Demonstrators

Projected-Related Publications

Projected-Related Ph.D. Theses

Links