In the ISAD2 project, we develop model-based and data-driven techniques for learning and detecting characteristic sound events in acoustic data including music recordings and environmental sounds. The project is funded by the German Research Foundation. On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.
Informed Sound Activity Detection in Music and Audio Signals
In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the project's first phase (2017-2020), we explored basic techniques for detecting characteristic sound events present in a given music recording. Our focus was on informed approaches that exploit musical knowledge in the form of score information, instrument samples, or musically salient sections. We considered concrete tasks such as locating audio sections with a specific timbre or instrument, identifying monophonic themes in complex polyphonic music recordings, and classifying music genres or playing styles based on melodic contours. We tested our approaches within complex music scenarios, including instrumental Western classical music, jazz, and opera recordings. In the project's second phase, we significantly extend these goals. First, we go beyond the music scenario by considering environmental sounds as a second challenging audio domain. Second, as a central methodology, we explore and combine the benefits of model-based and data-driven techniques to learn task-specific sound event representations. Furthermore, we investigate hierarchical approaches to jointly capture and analyze sound events that manifest on different temporal scales and belong to hierarchically ordered categories. An overarching goal of the project's second phase is to develop explainable deep learning models that provide a better understanding of the structural and acoustic properties of sound events.
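To make the idea of informed (here: score-informed) sound event detection more tangible, the following minimal Python sketch locates a candidate occurrence of a short, score-given theme by correlating a binary chroma template with a recording's chromagram. The file name, the pitch-class sequence, and the simple correlation score are illustrative assumptions and do not reproduce the methods developed in the project.

import numpy as np
import librosa

# Load a recording and compute a chromagram (12 pitch classes x frames).
# "recording.wav" is a hypothetical placeholder file name.
y, sr = librosa.load("recording.wav")
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)

# Encode score knowledge of a short theme as a binary chroma template,
# here a made-up pitch-class sequence C-E-G-C (one pitch class per frame).
theme_pitch_classes = [0, 4, 7, 0]
template = np.zeros((12, len(theme_pitch_classes)))
template[theme_pitch_classes, np.arange(len(theme_pitch_classes))] = 1.0

# Slide the template over the chromagram and score each position by
# normalized correlation; high scores indicate candidate theme occurrences.
num_positions = chroma.shape[1] - template.shape[1] + 1
scores = np.zeros(num_positions)
for t in range(num_positions):
    window = chroma[:, t:t + template.shape[1]]
    scores[t] = np.sum(window * template) / (
        np.linalg.norm(window) * np.linalg.norm(template) + 1e-8)

best = int(np.argmax(scores))
print(f"Best match at {librosa.frames_to_time(best, sr=sr):.2f} s "
      f"(score {scores[best]:.2f})")

In practice, the template would be derived from the score of an annotated theme and the matching curve post-processed, but the sketch conveys the basic informed-matching principle.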
Informierte Klangquellenerkennung in Musik- und Audiosignalen (German project title: Informed Sound Source Detection in Music and Audio Signals)
In the field of music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the first project phase (2017-2020), we investigated basic techniques for detecting characteristic sound events present in a given music recording. Our focus was on approaches that exploit musical knowledge in the form of score information, instrument samples, or musically representative passages. Central tasks included locating audio sections with a specific timbre or instrumentation, detecting monophonic themes in polyphonic music recordings, and classifying musical styles or playing techniques based on melodic contour features. The resulting detection methods were experimentally tested and evaluated within complex music scenarios, including classical music, jazz, and opera recordings. In the second project phase, we considerably extend our goals. First, in addition to the music scenario, we consider the detection of environmental and ambient sounds as a second challenging audio domain. Second, as our central methodology, we combine aspects of model-based and data-driven techniques to learn task-specific representations of sound events. Furthermore, we pursue integrative and hierarchical strategies to capture and analyze sound events on different temporal scales and with respect to hierarchically ordered categories. Our overarching goal in the second project phase is to develop explainable and interpretable deep learning models that enable a better understanding of the structural and acoustic properties of sound sources.
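To make the notion of hierarchically ordered sound event categories more concrete, the following minimal PyTorch sketch combines a shared convolutional encoder with one multi-label prediction head per hierarchy level (for example, instrument families versus individual instruments) and trains both heads jointly. The layer sizes, label counts, and equal loss weighting are assumptions chosen for illustration and are not the models described in the project publications.

import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    """Shared encoder with one multi-label head per hierarchy level."""

    def __init__(self, num_coarse=4, num_fine=16):
        super().__init__()
        # Shared convolutional encoder operating on log-mel spectrogram patches.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.coarse_head = nn.Linear(32, num_coarse)  # e.g., instrument families
        self.fine_head = nn.Linear(32, num_fine)      # e.g., individual instruments

    def forward(self, x):  # x: (batch, 1, mel_bands, frames)
        z = self.encoder(x)
        return self.coarse_head(z), self.fine_head(z)

model = HierarchicalTagger()
criterion = nn.BCEWithLogitsLoss()  # multi-label activity targets

# Dummy batch of 8 log-mel patches with binary targets on both hierarchy levels.
x = torch.randn(8, 1, 64, 100)
y_coarse = torch.randint(0, 2, (8, 4)).float()
y_fine = torch.randint(0, 2, (8, 16)).float()

coarse_logits, fine_logits = model(x)
# Joint training objective; the equal weighting of both levels is an assumption.
loss = criterion(coarse_logits, y_coarse) + criterion(fine_logits, y_fine)
loss.backward()
print(f"combined loss: {loss.item():.3f}")

Coarse-level predictions can also be used to constrain or calibrate the fine-level outputs, which is one way such a hierarchy can be exploited during inference.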
Organization (Stefan Balke, Jakob Abeßer, Meinard Müller): Special Session "Sound Analysis for Music and Audio Signals", Jahrestagung für Akustik (DAGA), Hannover, Germany, March 20/21, 2024
Lecture and Seminar (4 SWS) by Jakob Abeßer: Computational Analysis of Sound and Music. TU Ilmenau, Summer Semester 2024
Lecture Slides & Jupyter Notebooks
Organization (Jakob Abeßer, Sebastian Stober, Meinard Müller): Special Session "Sound Analysis for Music and Audio Signals", Jahrestagung für Akustik (DAGA), Hamburg, Germany, March 8, 2023
PDF (table of contents)
Research Seminar (2 SWS) by Jakob Abeßer and Martin Pfleiderer: KI-gestützte Audioanalyse von Musik und Soundscapes (AI-Based Audio Analysis of Music and Soundscapes). HfM Weimar, Winter Semester 2022/2023
Talk (Jakob Abeßer): Erkennung akustischer Quellen in komplexen Szenarien (Detection of Acoustic Sources in Complex Scenarios). Jenaer Akustiktag, Ernst Abbe Hochschule, Jena, April 27, 2022
Talk (Jakob Abeßer): Technische Aspekte in der KI-Musikanalyse (Technical Aspects of AI-Based Music Analysis). Annual Meeting of the DMV (Deutscher Musikverleger-Verband e.V.), Erfurt, October 16, 2023
The following list provides an overview of the most important publicly accessible resources created in the ISAD2 project:
Semi-Supervised Piano Transcription Using Pseudo-Labeling Techniques [ISMIR 2024]
Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings [IEEE/ACM-TASLP 2023]
Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment [ISMIR 2023]
A Cross-Version Approach to Audio Representation Learning for Orchestral Music [ISMIR 2023]
How Robust are Audio Embeddings for Polyphonic Sound Event Tagging? [IEEE/ACM-TASLP 2023]
Jazz Structure Dataset (JSD) [TISMIR 2022]
Urban Sound Monitoring (USM) dataset [AES Convention 2022]
Demo video
Code for BassUNet: Jazz bass transcription using a U-Net architecture [Electronics 2021]
The following publications reflect the main scientific contributions of the work carried out in the ISAD2 project.
@article{AbesserLS25_SoundRecurrence_JASMP,
author = {Jakob Abe{\ss}er and Zhiwei Liang and Bernhard Seeber},
title = {Sound recurrence analysis for acoustic scene classification},
journal = {EURASIP Journal on Audio, Speech, and Music Processing},
volume = {2025},
number = {1},
year = {2025},
doi = {10.1186/s13636-024-00390-2}
}
@article{AbesserSM25_PitchContour_arXiv,
author = {Jakob Abe{\ss}er and Simon Schw{\"a}r and Meinard M{\"u}ller},
title = {Pitch Contour Exploration across Audio Domains: {A} Vision-Based Transfer Learning Approach},
journal = {arXiv preprint arXiv:2503.19161},
year = {2025},
doi = {10.48550/arXiv.2503.19161}
}
@inproceedings{BerendesMM25_TuningMatters_ISMIR,
author = {Hans-Ulrich Berendes and Ben Maman and Meinard M{\"u}ller},
title = {Tuning Matters: {A}nalyzing Musical Tuning Bias in Neural Vocoders},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
pages = {166--173},
address = {Daejeon, Korea},
year = {2025},
doi = {10.5281/zenodo.17706359},
url-pdf = {2025_BerendesMM_TuningMatters_ISMIR_ePrint.pdf}
}
@InProceedings{BidarouniA24_DomainShift_EUSIPCO,
author = {Amir Latifi Bidarouni and Jakob Abe{\ss}er},
title = {Towards Domain Shift in Location-Mismatch Scenarios for Bird Activity Detection},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
year = {2024},
address = {Lyon, France},
pages = {1267--1271},
doi = {10.23919/EUSIPCO63174.2024.10715313}
}
@inproceedings{StrahlM24_PianoTranscriptionSemiSup_ISMIR,
author = {Sebastian Strahl and Meinard M{\"u}ller},
title = {Semi-Supervised Piano Transcription Using Pseudo-Labeling Techniques},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {San Francisco, CA, United States},
year = {2024},
pages = {173--181},
doi = {10.5281/zenodo.14877303},
url-pdf = {2024_StrahlM_PianoTranscriptionSemiSup_ISMIR_ePrint.pdf}
}
@article{Krause23_HierarchicalClassificationInstrument_IEEE-TASLP,
author = {Michael Krause and Meinard M{\"u}ller},
title = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings},
journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
year = {2023},
volume = {31},
pages = {2567--2578},
doi = {10.1109/TASLP.2023.3291506},
url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass/},
url-pdf = {https://ieeexplore.ieee.org/abstract/document/10171391}
}
@article{AbesserGM23_PolyphonicSound_TASLP,
author = {Jakob Abe{\ss}er and Sascha Grollmisch and Meinard M{\"u}ller},
title = {How Robust are Audio Embeddings for Polyphonic Sound Event Tagging?},
journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
volume = {31},
pages = {2658--2667},
year = {2023},
doi = {10.1109/TASLP.2023.3293032},
url-pdf = {https://ieeexplore.ieee.org/document/10178070},
url-demo = {https://zenodo.org/record/7912746}
}
@article{AbesserUZG23_SoundPolyphony_JAES,
author = {Jakob Abe{\ss}er and Asad Ullah and Sebastian Ziegler and Sascha Grollmisch},
title = {Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes},
journal = {Journal of the Audio Engineering Society (AES)},
year = {2023},
volume = {71},
number = {12},
pages = {860--872},
url = {https://www.aes.org/e-lib/browse.cfm?elib=22348}
}
@InProceedings{BidarouniA23_DomainAdaptation_I2S,
author = {Amir Latifi Bidarouni and Jakob Abe{\ss}er},
title = {Unsupervised Feature-Space Domain Adaptation applied for Audio Classification},
booktitle = {Proceedings of the {IEEE} International Symposium on the Internet of Sounds ({IS2})},
address = {Pisa, Italy},
year = {2023},
pages = {1--7},
doi = {10.1109/IEEECONF59510.2023.10335455},
}
@InProceedings{GrollmischCLA23_UncertaintySemiSupervised_EUSIPCO,
author = {Sascha Grollmisch and Estefan{\'i}a Cano and Hanna Lukashevich and Jakob Abe{\ss}er},
title = {Uncertainty in Semi-supervised Audio Classification -- A Novel Extension for {FixMatch}},
booktitle = {Proceedings of the European Signal Processing Conference (EUSIPCO)},
year = {2023},
pages = {161--165},
address = {Helsinki, Finland},
doi = {10.23919/EUSIPCO58844.2023.10289789}
}
@inproceedings{KrauseWM23_CrossVersionRep_ISMIR,
author = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller},
title = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Milano, Italy},
year = {2023},
pages = {832--839},
doi = {10.5281/ZENODO.10265419},
url-details = {https://doi.org/10.5281/zenodo.10265419},
url-pdf = {2023_KrauseWM_CrossVersionRep_ISMIR_ePrint.pdf}
}
@inproceedings{KrauseSM23_WeakPitchCrossVersion_ISMIR,
author = {Michael Krause and Sebastian Strahl and Meinard M{\"u}ller},
title = {Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Milano, Italy},
year = {2023},
pages = {},
}
@InProceedings{LukashevichGA23_TemperatureScaling_ECMLPKDD,
author = {Hanna Lukashevich and Sascha Grollmisch and Jakob Abe{\ss}er},
title = {Temperature Scaling for Reliable Uncertainty Estimation: {A}pplication to Automatic Music Genre Classification},
booktitle = {Proceedings of the Uncertainty Meets Explainability Workshop, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
year = {2023},
address = {Torino, Italy}
}
@InProceedings{LukashevichGASB23_PosteriorClass_AM,
author = {Hanna Lukashevich and Sascha Grollmisch and Jakob Abe{\ss}er and Sebastian Stober and Joachim B{\"o}s},
title = {How Reliable are Posterior Class Probabilities in Automatic Music Classification?},
booktitle = {Proceedings of the Audio Mostly Conference},
year = {2023},
address = {Edinburgh, Scotland},
pages = {45--50},
doi = {10.1145/3616195.36162}
}
@InProceedings{Abesser22_UrbanSounds_AES,
author = {Jakob Abe{\ss}er},
title = {Classifying Sounds in Polyphonic Urban Sound Scenes},
booktitle = {Proceedings of the AES Convention},
address = {The Hague, Netherlands},
year = {2022}
}
@article{BalkeRWAM22_JSD_TISMIR,
author = {Stefan Balke and Julian Reck and Christof Wei{\ss} and Jakob Abe{\ss}er and Meinard M{\"u}ller},
title = {{JSD}: {A} Dataset for Structure Analysis in Jazz Music},
journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
volume = {5},
number = {1},
pages = {156--172},
year = {2022},
publisher = {Ubiquity Press},
doi = {10.5334/tismir.131},
url = {https://doi.org/10.5334/tismir.131},
url-pdf = {2022_BalkeRWAM_JSD_TISMIR_ePrint.pdf},
url-demo = {https://github.com/stefan-balke/jsd}
}
@inproceedings{GrollmischVA22_Fixmatch_ISMIR,
author = {Sascha Grollmisch and Estefan{\'i}a Cano and Jakob Abe{\ss}er},
title = {Audio Augmentations for Semi-Supervised Learning with {FixMatch}},
booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Bengaluru, India},
year = {2022}
}
@inproceedings{KrauseM22_HierarchyClass_ICASSP,
author = {Michael Krause and Meinard M{\"u}ller},
title = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
pages = {406--410},
address = {Singapore},
year = {2022},
doi = {10.1109/ICASSP43922.2022.9747690}
}
@InProceedings{NadarTA_2022_ChordLatentSpace_ICMC,
author = {Christon Ragavan Nadar and Michael Taenzer and Jakob Abe{\ss}er},
title = {Towards Interpreting and Improving the Latent Space for Musical Chord Recognition},
booktitle = {Proceedings of the International Computer Music Conference (ICMC)},
address = {Limerick, Ireland},
year = {2022}
}
@article{AbesserM21_JazzBassTranscription_Electronics,
author = {Jakob Abe{\ss}er and Meinard M{\"u}ller},
title = {Jazz Bass Transcription Using a {U}-Net Architecture},
journal = {Electronics},
volume = {10},
number = {6},
pages = {670:1--11},
year = {2021},
doi = {10.3390/electronics10060670},
url-pdf = {2021_AbesserM_JazzBassTranscription_Electronics.pdf}
}
@article{KrauseMW21_OperaSingingActivity_Electronics,
author = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}},
title = {Singing Voice Detection in Opera Recordings: {A} Case Study on Robustness and Generalization},
journal = {Electronics},
volume = {10},
number = {10},
pages = {1214:1--14},
year = {2021},
doi = {10.3390/electronics10101214},
url-pdf = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf}
}
@misc{Abesser26_CompAnalSoundMusic_Habil,
author = {Jakob Abe{\ss}er},
title = {Computational Analysis of Sounds and Music},
howpublished = {Habilitation Thesis},
note = {Technische Universit{\"a}t Ilmenau, submitted for review},
year = {2026}
}
@phdthesis{Grollmisch25_SemisupervisedTransferLearning_PhD,
author = {Sascha Grollmisch},
year = {2025},
title = {Semi-Supervised and Transfer Learning for Few-Shot Audio Classification},
school = {Technische Universit{\"a}t Ilmenau},
url-pdf = {https://www.db-thueringen.de/servlets/MCRFileNodeServlet/dbt_derivate_00070024/ilm1-202500041.pdf}
}
@phdthesis{Krause23_ActivityDetectionMusic_PhD,
author = {Michael Krause},
year = {2023},
title = {Activity Detection for Sound Events in Orchestral Music Recordings},
school = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
url-pdf = {https://open.fau.de/items/e75949af-4aad-4a40-a374-b850dcb5676a}
}