Informed Sound Activity Detection in Music Recordings (ISAD)

[Logos: DFG (funding agency), FAU, Fraunhofer IDMT; ISAD teaser image]

In the ISAD project, we explored fundamental techniques and computational tools for detecting sound sources or characteristic sound events that are present in a given music recording. The project was funded by the German Research Foundation (DFG). On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.

Project Description

Informed Sound Activity Detection in Music Recordings

In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. One prominent task is singing voice detection: automatically locating all sections of a given music recording in which a main singer is active. Although this task seems simple for human listeners, it remains difficult for computational methods because of the complex superpositions of sound sources that are typical for music, where the singing voice interacts with accompanying instruments. Extending this scenario, automatic instrument recognition aims to identify all performing instruments in a given music recording and to derive a segmentation into sections of homogeneous instrumentation. Related problems include finding all monophonic sections, identifying solo parts or sections with a predominant melody, and locating sections with a specific timbre.

Motivated by these segmentation problems, we adopted a comprehensive perspective in this project. Our goal was to explore fundamental techniques and computational tools for detecting sound sources or characteristic sound events present in a given music recording. To cope with a wide range of musical properties and complex superpositions of different sound sources, we focused on informed approaches that exploit various types of additional knowledge. Such knowledge may be given in the form of musical parameters (e.g., the number of instruments, score information), sound examples (e.g., instrument samples, representative sections), or user input (e.g., annotations, interactive feedback).

By combining audio segmentation, detection, and classification techniques, we developed novel approaches that can efficiently adapt to the requirements of specific application scenarios. To test and evaluate our activity detection algorithms, we considered various challenging music scenarios, including Western classical music, jazz, and opera recordings.
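
To make the task framing concrete, the following is a minimal, hypothetical sketch of frame-wise singing voice detection combined with score-derived side information. It is not the project's actual method: the vocal-band energy heuristic merely stands in for the learned classifiers described in the publications below, librosa and scipy are assumed to be available, and all file names, thresholds, and band limits are illustrative.

    # Minimal sketch (Python, assuming librosa and scipy are installed). The
    # band-energy "classifier" is a toy stand-in, not the project's method.
    import numpy as np
    import librosa
    import scipy.signal

    def detect_voice_activity(path, sr=22050, n_fft=2048, hop=512, threshold=0.5):
        """Return one boolean per frame: True where singing voice is plausibly active."""
        y, sr = librosa.load(path, sr=sr)
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
        freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
        # Toy heuristic: relative energy in a band where vocal partials are
        # prominent (roughly 200-4000 Hz); a learned classifier would go here.
        band = (freqs >= 200) & (freqs <= 4000)
        frame_score = S[band].sum(axis=0) / (S.sum(axis=0) + 1e-9)
        # Median filtering turns noisy frame-level decisions into coherent segments.
        return scipy.signal.medfilt(frame_score, kernel_size=21) > threshold

    def score_informed_mask(note_events, n_frames, sr=22050, hop=512):
        """Frame mask from (hypothetical) aligned score data: True wherever the
        vocal part has a note event, given as (start_sec, end_sec) pairs."""
        mask = np.zeros(n_frames, dtype=bool)
        for start, end in note_events:
            mask[int(start * sr / hop):min(int(end * sr / hop), n_frames)] = True
        return mask

    # Usage: intersect audio-based detection with score knowledge.
    # activity = detect_voice_activity("recording.wav")
    # informed = activity & score_informed_mask([(12.0, 45.5)], len(activity))

Intersecting the two masks is the simplest way score information can constrain a purely audio-based detector; the informed approaches developed in the project combine such additional knowledge with learned models in far more refined ways.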

Project-Related Resources and Demonstrators

The following list provides an overview of the most important publicly accessible resources created in the ISAD project:

Project-Related Publications

The following publications reflect the main scientific contributions of the work carried out in the ISAD project.

  1. Jakob Abeßer and Meinard Müller
    Fundamental Frequency Contour Classification: A Comparison between Hand-Crafted and CNN-Based Features
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 486–490, 2019. DOI
    @inproceedings{AbesserM19_ContourFeature_ICASSP,
    author    = {Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title     = {Fundamental Frequency Contour Classification: {A} Comparison between Hand-Crafted and {CNN}-Based Features},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages     = {486--490},
    address   = {Brighton, {UK}},
    year      = {2019},
    doi       = {10.1109/ICASSP.2019.8682252}
    }
  2. Michael Krause, Meinard Müller, and Christof Weiß
    Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization
    Electronics, 10(10): 1–14, 2021. PDF DOI
    @article{KrauseMW21_OperaSingingActivity_Electronics,
    author    = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}},
    title     = {Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization},
    journal   = {Electronics},
    volume    = {10},
    number    = {10},
    pages     = {1214:1--14},
    year      = {2021},
    doi       = {10.3390/electronics10101214},
    url-pdf   = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf}
    }
  3. Stylianos Ioannis Mimilakis, Konstantinos Drossos, Estefanía Cano, and Gerald Schuller
    Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 266–278, 2019. DOI
    @article{MimilakisDCS19_DenoisingAutoencoders_TASLP,
    author    = {Stylianos Ioannis Mimilakis and Konstantinos Drossos and Estefan{\'{\i}}a Cano and Gerald Schuller},
    title     = {Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation},
    journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing},
    volume    = {28},
    pages     = {266--278},
    year      = {2019},
    doi       = {10.1109/TASLP.2019.2952013}
    }
  4. Stylianos I. Mimilakis, Christof Weiß, Vlora Arifi-Müller, Jakob Abeßer, and Meinard Müller
    Cross-Version Singing Voice Detection in Opera Recordings: Challenges for Supervised Learning
    In Machine Learning and Knowledge Discovery in Databases – Proceedings of the International Workshops of ECML PKDD 2019, Part II: 429–436, 2019. DOI
    @inproceedings{MimilakisWAAM19_SingingVDetWagner_MML,
    author    = {Stylianos I. Mimilakis and Christof Wei{\ss} and Vlora Arifi-M{\"u}ller and Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title     = {Cross-Version Singing Voice Detection in Opera Recordings: {C}hallenges for Supervised Learning},
    booktitle = {Machine Learning and Knowledge Discovery in Databases -- Proceedings of the International Workshops of {ECML} {PKDD} 2019, Part {II}},
    series    = {Communications in Computer and Information Science},
    volume    = {1168},
    pages     = {429--436},
    address   = {W{\"u}rzburg, Germany},
    year      = {2019},
    doi       = {10.1007/978-3-030-43887-6_35}
    }
  5. Michael Taenzer, Jakob Abeßer, Stylianos I. Mimilakis, Christof Weiß, Hanna Lukashevich, and Meinard Müller
    Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 612–619, 2019. PDF DOI
    @inproceedings{TaenzerAMWLM19_InstrumentCNN_ISMIR,
    author    = {Michael Taenzer and Jakob Abe{\ss}er and Stylianos I. Mimilakis and Christof Wei{\ss} and Hanna Lukashevich and Meinard M{\"u}ller},
    title     = {Investigating {CNN}-based Instrument Family Recognition for {W}estern Classical Music Recordings},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Delft, The Netherlands},
    pages     = {612--619},
    year      = {2019},
    doi       = {10.5281/zenodo.3527884},
    url-pdf   = {2019_TaenzerAMWML_InstrumentCNN_ISMIR_PrintedVersion.pdf}
    }
  6. Michael Taenzer, Stylianos I. Mimilakis, and Jakob Abeßer
    Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks
    Electronics, 10(7), 2021. DOI
    @article{TaenzerMA21_LocalPolyphonyEstimation_Electronics,
    author    = {Michael Taenzer and Stylianos I. Mimilakis and Jakob Abe{\ss}er},
    title     = {Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks},
    journal   = {Electronics},
    volume    = {10},
    number    = {7},
    year      = {2021},
    doi       = {10.3390/electronics10070851}
    }
  7. Christof Weiß, Stefan Balke, Jakob Abeßer, and Meinard Müller
    Computational Corpus Analysis: A Case Study on Jazz Solos
    In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR): 416–423, 2018. PDF DOI
    @inproceedings{WeissBAM18_JazzComplexity_ISMIR,
    author    = {Christof Wei{\ss} and Stefan Balke and Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title     = {Computational Corpus Analysis: {A} Case Study on Jazz Solos},
    booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference ({ISMIR})},
    pages     = {416--423},
    address   = {Paris, France},
    year      = {2018},
    doi       = {10.5281/zenodo.1492439},
    url-pdf   = {2018_WeissBAM_JazzComplexity_ISMIR_PrintedVersion.pdf}
    }
  8. Frank Zalkow, Stefan Balke, Vlora Arifi-Müller, and Meinard Müller
    MTD: A Multimodal Dataset of Musical Themes for MIR Research
    Transactions of the International Society for Music Information Retrieval (TISMIR), 3(1): 180–192, 2020. PDF Details Demo DOI
    @article{ZalkowBAM20_MTD_TISMIR,
    title       = {{MTD}: {A} Multimodal Dataset of Musical Themes for {MIR} Research},
    author      = {Frank Zalkow and Stefan Balke and Vlora Arifi-M{\"{u}}ller and Meinard M{\"{u}}ller},
    journal     = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
    volume      = {3},
    number      = {1},
    year        = {2020},
    pages       = {180--192},
    doi         = {10.5334/tismir.68},
    url-details = {https://transactions.ismir.net/articles/10.5334/tismir.68/},
    url-pdf   = {2020_ZalkowBAM_MTD_TISMIR_ePrint.pdf},
    url-demo  = {https://www.audiolabs-erlangen.de/resources/MIR/MTD}
    }
  9. Frank Zalkow, Stefan Balke, and Meinard Müller
    Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 331–335, 2019. DOI
    @inproceedings{ZalkowBM19_SalienceRetrieval_ICASSP,
    author      = {Frank Zalkow and Stefan Balke and Meinard M{\"u}ller},
    title       = {Evaluating Salience Representations for Cross-Modal Retrieval of {W}estern Classical Music Recordings},
    booktitle   = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    address     = {Brighton, {UK}},
    year        = {2019},
    pages       = {331--335},
    doi         = {10.1109/ICASSP.2019.8683609}
    }
  10. Frank Zalkow and Meinard Müller
    Using Weakly Aligned Score–Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 184–191, 2020. PDF DOI
    @inproceedings{ZalkowM20_CTC_ISMIR,
    author    = {Frank Zalkow and Meinard M{\"u}ller},
    title     = {Using Weakly Aligned Score--Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Montr{\'{e}}al, Canada},
    pages     = {184--191},
    year      = {2020},
    doi       = {10.5281/zenodo.4245400},
    url-pdf   = {2020_ZalkowM_WeaklyAlignedTrain_ISMIR.pdf}
    }
  11. Frank Zalkow and Meinard Müller
    CTC-Based Learning of Chroma Features for Score-Audio Music Retrieval
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29: 2957–2971, 2021. PDF Details DOI
    @article{ZalkowMueller21_ChromaCTC_TASLP,
    author      = {Frank Zalkow and Meinard M{\"u}ller},
    title       = {{CTC}-Based Learning of Chroma Features for Score-Audio Music Retrieval},
    journal     = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
    volume      = {29},
    pages       = {2957--2971},
    year        = {2021},
    doi         = {10.1109/TASLP.2021.3110137},
    url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2021_TASLP-ctc-chroma},
    url-pdf     = {https://ieeexplore.ieee.org/document/9531521}
    }

Project-Related Ph.D. Theses

  1. Frank Zalkow
    Learning Audio Representations for Cross-Version Retrieval of Western Classical Music
    PhD Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2021. PDF Details
    @phdthesis{Zalkow21_CrossVersionRetrieval_PhD,
    author      = {Frank Zalkow},
    year        = {2021},
    title       = {Learning Audio Representations for Cross-Version Retrieval of Western Classical Music},
    school      = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
    url-details = {https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/16777},
    url-pdf     = {2021_Zalkow_AudioRepRetrieval_ThesisPhD.pdf}
    }
  2. Stefan Balke
    Multimedia Processing Techniques for Retrieving, Extracting, and Accessing Musical Content
    PhD Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2018. PDF Details
    @phdthesis{Balke18_MultimediaProcMusic_PhD,
    author      = {Stefan Balke},
    year        = {2018},
    title       = {Multimedia Processing Techniques for Retrieving, Extracting, and Accessing Musical Content},
    school      = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
    url-details = {https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/9635},
    url-pdf     = {2018_Balke_MultimediaProcessing_ThesisPhD.pdf}
    }