AudioLabs - Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings

Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings

This is the accompanying website for the following paper:

Michael Krause and Meinard Müller
Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 31: 2567–2578, 2023.

@article{KrauseM23_HierarchicalInstruments_TASLP,
author    = {Michael Krause and Meinard M{\"u}ller},
title     = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings},
journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing ({TASLP})},
year      = {2023},
volume    = {31},
pages     = {2567--2578},
doi       = {10.1109/TASLP.2023.3291506},
url-pdf     = {https://ieeexplore.ieee.org/document/10171391},
url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass}
}

Abstract

Instrument activity detection is a fundamental task in music information retrieval, serving as a basis for many applications, such as music recommendation, music tagging, or remixing. Most published works on this task deal with popular music and music for smaller ensembles. In this paper, we cover orchestral and opera music recordings as a rarely considered scenario for automated instrument activity detection. Orchestral music is particularly challenging since it consists of intricate polyphonic and polytimbral sound mixtures where multiple instruments are playing simultaneously. Orchestral instruments can naturally be arranged in hierarchical taxonomies, according to instrument families. As the main contribution of this paper, we show that a hierarchical classification approach can be used to detect instrument activity in our scenario, even if only few fine-grained, instrument-level annotations are available. We further consider additional loss terms for improving hierarchical consistency of predictions. For our experiments, we collect a dataset containing 14 hours of orchestral music recordings with aligned instrument activity annotations. Finally, we perform an analysis into the behavior of our proposed approach with regard to potential confounding errors.

Dataset

The aligned instrument activity annotations for the recordings used in this paper are made publicly available as a dataset for further research.

Download

Conventions

The annotations are provided as .npy files (i.e., NumPy arrays) that can be loaded with Python. The arrays have the shape

(N, 18)

with the first dimension corresponding to frames of the recording and the second dimension corresponding to different classes. We use a frame rate of approximately 43.0664 Hz (obtained with a hop size of 512 samples at a sample rate of 22050 Hz). The individual entries in the second dimension correspond to the following classes:

Index	Class identifier
00	INST
01	WW
02	BR
03	TMP
04	VOC
05	ST
06	Fl
07	Ob
08	Cl
09	Bn
10	Hn
11	Tpt
12	Fe
13	Ma
14	Vn
15	Va
16	Vc
17	Db

(see the paper for explanations of the identifiers)
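
As a minimal sketch of these conventions, the following Python snippet maps the 18 columns to their class identifiers and converts frame indices to seconds. It uses a synthetic array of the documented shape; in practice, the array would come from calling np.load on one of the dataset files. The helper names frame_to_seconds and active_classes are illustrative and not part of the dataset. The snippet also assumes that frame n starts at n * 512 / 22050 seconds and that nonzero entries denote activity; both assumptions should be checked against the paper.

```python
import numpy as np

# Class identifiers in column order, as listed in the table above.
CLASS_IDS = ["INST", "WW", "BR", "TMP", "VOC", "ST",
             "Fl", "Ob", "Cl", "Bn", "Hn", "Tpt",
             "Fe", "Ma", "Vn", "Va", "Vc", "Db"]

SAMPLE_RATE = 22050                   # Hz
HOP_SIZE = 512                        # samples between consecutive frames
FRAME_RATE = SAMPLE_RATE / HOP_SIZE   # 43.06640625 Hz


def frame_to_seconds(frame_index):
    """Convert a frame index to a time in seconds (assumed frame start)."""
    return frame_index * HOP_SIZE / SAMPLE_RATE


def active_classes(annotations, frame_index, threshold=0.5):
    """Return identifiers of all classes above `threshold` in one frame."""
    row = annotations[frame_index]
    return [CLASS_IDS[i] for i in np.flatnonzero(row > threshold)]


# Synthetic stand-in for: annotations = np.load("<some dataset file>.npy")
annotations = np.zeros((100, len(CLASS_IDS)))
annotations[10, [0, 5, 14]] = 1       # INST, ST, and Vn active in frame 10

print(active_classes(annotations, 10))
print(round(frame_to_seconds(10), 4), "seconds")
```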

The .npy files follow the naming convention

(Subset)_(Composer)_(Work)_V(Version).npy

where

(Subset) is either Ours, FreischuetzDigital, PhenicxAnechoic or BeethovenAnechoic
(Composer) is the composer of the work (e.g. Beethoven)
(Work) is a brief description of the work being played (e.g. DonGiovanniAria, Symphony8Mvmt2, or WalkuereAct1)
(Version) is the number of the version for which this file contains annotations

See also the paper for a description of the recordings in this dataset.

So, for example, the file

Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy

contains annotations for the third version of the first movement of Tschaikowsky's Violin Concerto.
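
The naming convention can be parsed with a regular expression. The sketch below is one possible way to do so; the names AnnotationName and parse_annotation_name are hypothetical, and the pattern assumes that none of the four fields contains an underscore.

```python
import re
from typing import NamedTuple


class AnnotationName(NamedTuple):
    subset: str
    composer: str
    work: str
    version: int


# (Subset)_(Composer)_(Work)_V(Version).npy
# Assumption: the individual fields contain no underscores.
_PATTERN = re.compile(
    r"^(?P<subset>[^_]+)_(?P<composer>[^_]+)_(?P<work>[^_]+)"
    r"_V(?P<version>\d+)\.npy$"
)


def parse_annotation_name(filename):
    """Split an annotation file name into its four components."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unexpected file name: {filename!r}")
    return AnnotationName(
        subset=match["subset"],
        composer=match["composer"],
        work=match["work"],
        version=int(match["version"]),
    )


info = parse_annotation_name("Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy")
print(info.subset, info.composer, info.work, info.version)
```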

Versions

FreischuetzDigital, PhenicxAnechoic and BeethovenAnechoic each contain only one version of the pieces played. For FreischuetzDigital, we use the stereo mixtures as provided in the dataset. For PhenicxAnechoic and BeethovenAnechoic, we obtain stereo mixes by simply summing the tracks for different instruments in the dataset, applying a simple reverb filter, and normalizing.
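
The mixing procedure for the anechoic datasets (sum, reverb, normalize) can be illustrated as follows. This is only a sketch: the actual reverb filter and normalization are not specified on this page, so the exponentially decaying noise impulse response, the peak normalization, and the function name mix_tracks are all assumptions made for illustration. The input tracks are assumed to be mono signals of equal length.

```python
import numpy as np


def mix_tracks(tracks, sample_rate=22050, reverb_seconds=0.5, seed=0):
    """Sum mono tracks, add a simple synthetic reverb, and peak-normalize.

    Returns an array of shape (num_samples, 2). The reverb is a stand-in
    (decaying noise), not the filter used for the dataset.
    """
    dry = np.sum(np.stack(tracks), axis=0)          # sum all instrument tracks

    rng = np.random.default_rng(seed)
    n_ir = max(1, int(reverb_seconds * sample_rate))
    decay = np.exp(-6.0 * np.arange(n_ir) / n_ir)   # exponential decay envelope

    channels = []
    for _ in range(2):                              # independent tail per channel
        impulse_response = 0.1 * rng.standard_normal(n_ir) * decay
        impulse_response[0] = 1.0                   # keep the direct sound
        wet = np.convolve(dry, impulse_response)[: len(dry)]
        channels.append(wet)

    stereo = np.stack(channels, axis=1)
    peak = np.max(np.abs(stereo))
    return stereo / peak if peak > 0 else stereo    # peak normalization


# Usage with two short synthetic "instrument" tracks (1 second at 1000 Hz).
t = np.arange(1000) / 1000
example_tracks = [np.sin(2 * np.pi * 5 * t), 0.5 * np.sin(2 * np.pi * 8 * t)]
mix = mix_tracks(example_tracks, sample_rate=1000)
print(mix.shape)
```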

For the remaining works, we use the following commercial audio recordings:

Subset	Composer	Work	Version	In test set?	Performer / Label
Ours	Wagner	WalkuereAct1	1	x	Karajan / DG 1998
Ours	Wagner	WalkuereAct1	2		Neuhold / MEMBRAN 1995
Ours	Wagner	WalkuereAct1	3		Levine / DG 2012
Ours	Wagner	WalkuereAct1	4		Böhm / DECCA 2008
Ours	Wagner	WalkuereAct1	5		Keilberth/Furtwängler / ZYX 2012
Ours	Wagner	WalkuereAct1	6		Boulez / PHILIPS 2006
Ours	Beethoven	Symphony3Mvmt1	1	x	Abbado / DG
Ours	Beethoven	Symphony3Mvmt2	1		Blomstedt / BC
Ours	Beethoven	Symphony3Mvmt2	2		Drahos / NAXOS
Ours	Beethoven	Symphony3Mvmt2	3		Jarvi / SONY
Ours	Beethoven	Symphony3Mvmt2	4		Scherchen / Heliodor
Ours	Beethoven	Symphony3Mvmt2	5		Fricsay / DG
Ours	Beethoven	Symphony3Mvmt3	1		Blomstedt / BC
Ours	Beethoven	Symphony3Mvmt3	2		Drahos / NAXOS
Ours	Beethoven	Symphony3Mvmt3	3		Jarvi / SONY
Ours	Beethoven	Symphony3Mvmt3	4		Scherchen / Heliodor
Ours	Beethoven	Symphony3Mvmt3	5		Fricsay / DG
Ours	Beethoven	Symphony3Mvmt4	1		Blomstedt / BC
Ours	Beethoven	Symphony3Mvmt4	2		Drahos / NAXOS
Ours	Beethoven	Symphony3Mvmt4	3		Jarvi / SONY
Ours	Beethoven	Symphony3Mvmt4	4		Scherchen / Heliodor
Ours	Beethoven	Symphony3Mvmt4	5		Fricsay / DG
Ours	Dvorak	Symphony9Mvmt1	1		Kubelik / MERCURY
Ours	Dvorak	Symphony9Mvmt1	2		Szell / SONY
Ours	Dvorak	Symphony9Mvmt1	3		Karajan / MEMBRAN
Ours	Dvorak	Symphony9Mvmt1	4		Toscanini / DG
Ours	Dvorak	Symphony9Mvmt1	5		Fricsay / DG
Ours	Dvorak	Symphony9Mvmt2	1		Kubelik / MERCURY
Ours	Dvorak	Symphony9Mvmt2	2		Szell / SONY
Ours	Dvorak	Symphony9Mvmt2	3		Leaper / SONY
Ours	Dvorak	Symphony9Mvmt2	4		Toscanini / DG
Ours	Dvorak	Symphony9Mvmt2	5		Fricsay / DG
Ours	Dvorak	Symphony9Mvmt4	1	x	Suitner / BC
Ours	Tschaikowsky	ViolinConcertoMvmt1	1		Ashkenazy / DECCA
Ours	Tschaikowsky	ViolinConcertoMvmt1	2		Francescatti / SONY
Ours	Tschaikowsky	ViolinConcertoMvmt1	3		Menuhin / MEMBRAN
Ours	Tschaikowsky	ViolinConcertoMvmt1	4		Nishizaki / NAXOS
Ours	Tschaikowsky	ViolinConcertoMvmt1	5		Szering / RCA-CCV
Ours	Tschaikowsky	ViolinConcertoMvmt2	1		Ashkenazy / DECCA
Ours	Tschaikowsky	ViolinConcertoMvmt2	2		Francescatti / SONY
Ours	Tschaikowsky	ViolinConcertoMvmt2	3		Menuhin / MEMBRAN
Ours	Tschaikowsky	ViolinConcertoMvmt2	4		Heifetz / SONY
Ours	Tschaikowsky	ViolinConcertoMvmt2	5		Kogan / BC
Ours	Tschaikowsky	ViolinConcertoMvmt3	1	x	Mullova / PHILIPS
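
For convenience, the test-set marks in the table above can be turned into a small lookup. The sketch below assumes that the annotation files for these versions are named exactly according to the convention described earlier; the function name split_ours is hypothetical. It only separates test versions from the remaining versions of the Ours subset and makes no statement about how the remaining versions are divided further, for which the paper should be consulted.

```python
# Versions marked with "x" in the "In test set?" column of the table,
# written as file names under the assumed naming convention.
TEST_FILES = {
    "Ours_Wagner_WalkuereAct1_V1.npy",
    "Ours_Beethoven_Symphony3Mvmt1_V1.npy",
    "Ours_Dvorak_Symphony9Mvmt4_V1.npy",
    "Ours_Tschaikowsky_ViolinConcertoMvmt3_V1.npy",
}


def split_ours(filenames):
    """Split file names of the Ours subset into (test, remaining)."""
    ours = [name for name in filenames if name.startswith("Ours_")]
    test = sorted(name for name in ours if name in TEST_FILES)
    remaining = sorted(name for name in ours if name not in TEST_FILES)
    return test, remaining


example = [
    "Ours_Wagner_WalkuereAct1_V1.npy",
    "Ours_Wagner_WalkuereAct1_V2.npy",
    "Ours_Dvorak_Symphony9Mvmt4_V1.npy",
]
test_files, remaining_files = split_ours(example)
print(test_files)
print(remaining_files)
```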

Acknowledgements

We thank Christof Weiß for helpful discussions. This work was supported by the German Research Foundation (DFG MU 2686/7-2, MU 2686/11-2). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institut für Integrierte Schaltungen IIS.

References

Michael Krause and Meinard Müller
Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 406–410, 2022.

@inproceedings{KrauseM22_HierarchyClass_ICASSP,
author    = {Michael Krause and Meinard M{\"u}ller},
title     = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
pages     = {406--410},
address   = {Singapore},
year      = {2022},
doi       = {10.1109/ICASSP43922.2022.9747690}
}

Böhm, Christoph, Ackermann, David, and Weinzierl, Stefan
A multi-channel anechoic orchestra recording of Beethoven’s Symphony no. 8 op. 93
Journal of the Audio Engineering Society, 68(12): 977–984, 2021.

@article{BoehmAW21_BeethovenAnechoic_AES,
author={Böhm, Christoph and Ackermann, David and Weinzierl, Stefan},
journal = {Journal of the Audio Engineering Society},
title={A multi-channel anechoic orchestra recording of {B}eethoven’s {S}ymphony no. 8 op. 93},
year={2021},
volume={68},
number={12},
pages={977-984},
doi={10.17743/jaes.2020.0056}
}

Thomas Prätzlich, Meinard Müller, Benjamin W. Bohl, and Joachim Veit
Freischütz Digital: Demos of audio-related contributions
In Demos and Late Breaking News of the International Society for Music Information Retrieval Conference (ISMIR), 2015.

@inproceedings{PraetzlichMBV15_FreiDi_ISMIR-LBD,
author    = {Thomas Pr{\"a}tzlich and Meinard M{\"u}ller and Benjamin W. Bohl and Joachim Veit},
title     = {{F}reisch{\"u}tz {D}igital: {D}emos of audio-related contributions},
booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
address   = {M{\'a}laga, Spain},
year      = {2015},
url-pdf   = {2015_PraetzlichMBV_FreiDi_ISMIR-LBD.pdf},
url-details = {http://freischuetz-digital.de/}
}

Marius Miron, Julio J. Carabias-Orti, Juan J. Bosch, Emilia Gómez, and Jordi Janer
Score-Informed Source Separation for Multichannel Orchestral Recordings
Journal of Electrical and Computer Engineering, 2016: 8363507:1–8363507:19, 2016.

@article{MironCBGJ16_OrchestraSourceSeparation_JECE,
author    = {Marius Miron and Julio J. Carabias{-}Orti and Juan J. Bosch and Emilia G{\'{o}}mez and Jordi Janer},
title     = {Score-Informed Source Separation for Multichannel Orchestral Recordings},
journal   = {Journal of Electrical and Computer Engineering},
volume    = {2016},
pages     = {8363507:1--8363507:19},
year      = {2016},
}

Siddharth Gururani and Alexander Lerch
Semi-Supervised Audio Classification with Partially Labeled Data
In IEEE International Symposium on Multimedia (ISM): 111–114, 2021.

@inproceedings{GururaniL21_PartialLabels_ISM,
author    = {Siddharth Gururani and Alexander Lerch},
title     = {Semi-Supervised Audio Classification with Partially Labeled Data},
booktitle = {{IEEE} International Symposium on Multimedia (ISM)},
address   = {Naples, Italy},
pages     = {111--114},
year      = {2021},
doi       = {10.1109/ISM52913.2021.00027}
}