Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings

This is the accompanying website for the following paper:

  1. Michael Krause and Meinard Müller
    Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings
    IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 31: 2567–2578, 2023.
    @article{KrauseM23_HierarchicalInstruments_TASLP,
    author    = {Michael Krause and Meinard M{\"u}ller},
    title     = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings},
    journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing ({TASLP})},
    year      = {2023},
    volume    = {31},
    pages     = {2567--2578},
    doi       = {10.1109/TASLP.2023.3291506},
    url-pdf     = {https://ieeexplore.ieee.org/document/10171391},
    url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass}
    }

Abstract

Instrument activity detection is a fundamental task in music information retrieval, serving as a basis for many applications, such as music recommendation, music tagging, or remixing. Most published works on this task deal with popular music and music for smaller ensembles. In this paper, we cover orchestral and opera music recordings as a rarely considered scenario for automated instrument activity detection. Orchestral music is particularly challenging since it consists of intricate polyphonic and polytimbral sound mixtures where multiple instruments are playing simultaneously. Orchestral instruments can naturally be arranged in hierarchical taxonomies, according to instrument families. As the main contribution of this paper, we show that a hierarchical classification approach can be used to detect instrument activity in our scenario, even if only a few fine-grained, instrument-level annotations are available. We further consider additional loss terms for improving the hierarchical consistency of predictions. For our experiments, we collect a dataset containing 14 hours of orchestral music recordings with aligned instrument activity annotations. Finally, we analyze the behavior of our proposed approach with regard to potential confounding errors.

Dataset

The aligned instrument activity annotations for the recordings used in this paper are made publicly available as a dataset for further research.

Conventions

The annotations are provided as .npy files (i.e., NumPy arrays) that can be loaded in Python. The arrays have the shape

(N, 18)

with the first dimension corresponding to frames of the recording and the second dimension corresponding to different classes. We use a frame rate of 43.0664 Hz (obtained with a hop size of 512 samples at a sampling rate of 22050 Hz). The individual entries in the second dimension correspond to the following classes:

Index Class identifier
00 INST
01 WW
02 BR
03 TMP
04 VOC
05 ST
06 Fl
07 Ob
08 Cl
09 Bn
10 Hn
11 Tpt
12 Fe
13 Ma
14 Vn
15 Va
16 Vc
17 Db

(see the paper for explanations of the identifiers)
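
As a minimal sketch of how these conventions fit together, the following Python snippet maps class identifiers to column indices and converts frame indices to seconds. The helper name active_segments and the 0.5 threshold are illustrative assumptions (the value range of the arrays is not stated on this page), and a small synthetic array stands in for a real annotation file.

```python
import numpy as np

# Class identifiers in column order, copied from the table above.
CLASSES = ["INST", "WW", "BR", "TMP", "VOC", "ST", "Fl", "Ob", "Cl", "Bn",
           "Hn", "Tpt", "Fe", "Ma", "Vn", "Va", "Vc", "Db"]

SAMPLE_RATE = 22050
HOP_SIZE = 512
FRAME_RATE = SAMPLE_RATE / HOP_SIZE  # about 43.0664 frames per second


def active_segments(annotations, class_id, threshold=0.5):
    """Return (start_sec, end_sec) pairs where the given class is active.

    Assumption: entries above `threshold` mean "active" (hypothetical choice).
    """
    column = annotations[:, CLASSES.index(class_id)] > threshold
    # Pad with False so that every active run has a rising and a falling edge.
    padded = np.concatenate(([False], column, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(s / FRAME_RATE, e / FRAME_RATE) for s, e in zip(starts, ends)]


# With a downloaded file, one would load the array with np.load, e.g.
#   annotations = np.load("Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy")
# Here a synthetic (N, 18) array is used so that the sketch runs on its own.
annotations = np.zeros((100, len(CLASSES)))
annotations[10:40, CLASSES.index("Vn")] = 1
segments = active_segments(annotations, "Vn")
print(segments)
```

The end time of each segment is exclusive, i.e. it marks the first frame in which the class is no longer active.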

The .npy files follow the naming convention

(Subset)_(Composer)_(Work)_V(Version).npy

where

  • (Subset) is either Ours, FreischuetzDigital, PhenicxAnechoic or BeethovenAnechoic
  • (Composer) is the composer of the work (e.g. Beethoven)
  • (Work) is a brief description of the work being played (e.g. DonGiovanniAria, Symphony8Mvmt2, or WalkuereAct1)
  • (Version) is the number of the version for which this file contains annotations

See also the paper for a description of the recordings in this dataset.

For example, the file

Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy

contains annotations for the third version of the first movement of Tchaikovsky's Violin Concerto (spelled Tschaikowsky in the file names).
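
The naming convention can be parsed mechanically. The sketch below uses a regular expression and assumes that none of the fields contains an underscore, which holds for the example names on this page but is not guaranteed by it. The function name parse_annotation_filename is hypothetical.

```python
import re

# Pattern for (Subset)_(Composer)_(Work)_V(Version).npy
# Assumption: the individual fields contain no underscores.
FILENAME_PATTERN = re.compile(
    r"^(?P<subset>[^_]+)_(?P<composer>[^_]+)_(?P<work>[^_]+)_V(?P<version>\d+)\.npy$"
)


def parse_annotation_filename(filename):
    """Split an annotation file name into its four components."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {filename}")
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    return fields


info = parse_annotation_filename("Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.npy")
print(info)
```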

Versions

FreischuetzDigital, PhenicxAnechoic and BeethovenAnechoic each contain only one version of the pieces played. For FreischuetzDigital, we use the stereo mixtures as provided in the dataset. For PhenicxAnechoic and BeethovenAnechoic, we obtain stereo mixes by simply summing the tracks for different instruments in the dataset, applying a simple reverb filter, and normalizing.
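
The mixing procedure for the anechoic datasets (sum, reverb, normalize) can be sketched as follows. This is an illustration under stated assumptions, not the processing used for the paper: the reverb filter is not specified on this page, so a decaying-noise impulse response is swapped in, peak normalization is assumed, and the function names and the (channels, samples) track layout are hypothetical.

```python
import numpy as np


def toy_impulse_response(sample_rate, duration=0.3, decay=8.0, seed=0):
    """Exponentially decaying noise as a stand-in for a simple reverb filter."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * sample_rate)) / sample_rate
    ir = 0.1 * rng.standard_normal(t.size) * np.exp(-decay * t)
    ir[0] = 1.0  # direct sound
    return ir


def mix_tracks(tracks, impulse_response):
    """Sum instrument tracks, apply reverb per channel, and peak-normalize.

    tracks: list of arrays with identical shape (channels, samples).
    """
    mix = np.sum(np.stack(tracks), axis=0)
    num_samples = mix.shape[1]
    # Convolve each channel with the impulse response and trim the tail.
    wet = np.stack([np.convolve(ch, impulse_response)[:num_samples] for ch in mix])
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet


# Toy input: two "instruments" as half-second stereo sine tones.
sample_rate = 22050
num_samples = sample_rate // 2
t = np.arange(num_samples) / sample_rate
track_a = np.tile(np.sin(2 * np.pi * 220 * t), (2, 1))
track_b = np.tile(np.sin(2 * np.pi * 330 * t), (2, 1))
mix = mix_tracks([track_a, track_b], toy_impulse_response(sample_rate))
print(mix.shape, np.max(np.abs(mix)))
```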

For the remaining works, we use the following commercial audio recordings:

Subset Composer Work Version In test set? Performer / Label
Ours Wagner WalkuereAct1 1 x Karajan / DG 1998
Ours Wagner WalkuereAct1 2 Neuhold / MEMBRAN 1995
Ours Wagner WalkuereAct1 3 Levine / DG 2012
Ours Wagner WalkuereAct1 4 Böhm / DECCA 2008
Ours Wagner WalkuereAct1 5 Keilberth/Furtwängler / ZYX 2012
Ours Wagner WalkuereAct1 6 Boulez / PHILIPS 2006
Ours Beethoven Symphony3Mvmt1 1 x Abbado / DG
Ours Beethoven Symphony3Mvmt2 1 Blomstedt / BC
Ours Beethoven Symphony3Mvmt2 2 Drahos / NAXOS
Ours Beethoven Symphony3Mvmt2 3 Jarvi / SONY
Ours Beethoven Symphony3Mvmt2 4 Scherchen / Heliodor
Ours Beethoven Symphony3Mvmt2 5 Fricsay / DG
Ours Beethoven Symphony3Mvmt3 1 Blomstedt / BC
Ours Beethoven Symphony3Mvmt3 2 Drahos / NAXOS
Ours Beethoven Symphony3Mvmt3 3 Jarvi / SONY
Ours Beethoven Symphony3Mvmt3 4 Scherchen / Heliodor
Ours Beethoven Symphony3Mvmt3 5 Fricsay / DG
Ours Beethoven Symphony3Mvmt4 1 Blomstedt / BC
Ours Beethoven Symphony3Mvmt4 2 Drahos / NAXOS
Ours Beethoven Symphony3Mvmt4 3 Jarvi / SONY
Ours Beethoven Symphony3Mvmt4 4 Scherchen / Heliodor
Ours Beethoven Symphony3Mvmt4 5 Fricsay / DG
Ours Dvorak Symphony9Mvmt1 1 Kubelik / MERCURY
Ours Dvorak Symphony9Mvmt1 2 Szell / SONY
Ours Dvorak Symphony9Mvmt1 3 Karajan / MEMBRAN
Ours Dvorak Symphony9Mvmt1 4 Toscanini / DG
Ours Dvorak Symphony9Mvmt1 5 Fricsay / DG
Ours Dvorak Symphony9Mvmt2 1 Kubelik / MERCURY
Ours Dvorak Symphony9Mvmt2 2 Szell / SONY
Ours Dvorak Symphony9Mvmt2 3 Leaper / SONY
Ours Dvorak Symphony9Mvmt2 4 Toscanini / DG
Ours Dvorak Symphony9Mvmt2 5 Fricsay / DG
Ours Dvorak Symphony9Mvmt4 1 x Suitner / BC
Ours Tschaikowsky ViolinConcertoMvmt1 1 Ashkenazy / DECCA
Ours Tschaikowsky ViolinConcertoMvmt1 2 Francescatti / SONY
Ours Tschaikowsky ViolinConcertoMvmt1 3 Menuhin / MEMBRAN
Ours Tschaikowsky ViolinConcertoMvmt1 4 Nishizaki / NAXOS
Ours Tschaikowsky ViolinConcertoMvmt1 5 Szeryng / RCA-CCV
Ours Tschaikowsky ViolinConcertoMvmt2 1 Ashkenazy / DECCA
Ours Tschaikowsky ViolinConcertoMvmt2 2 Francescatti / SONY
Ours Tschaikowsky ViolinConcertoMvmt2 3 Menuhin / MEMBRAN
Ours Tschaikowsky ViolinConcertoMvmt2 4 Heifetz / SONY
Ours Tschaikowsky ViolinConcertoMvmt2 5 Kogan / BC
Ours Tschaikowsky ViolinConcertoMvmt3 1 x Mullova / PHILIPS
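
The test set markers in the table above can be turned into a small lookup for splitting annotation files. The sketch below covers only the rows of the Ours subset listed in the table. How the other subsets are assigned is not stated in the table, so the function returns False for them by assumption. The names TEST_RECORDINGS and is_test_file are hypothetical.

```python
# (Composer, Work, Version) of the rows marked with "x" in the table above.
TEST_RECORDINGS = {
    ("Wagner", "WalkuereAct1", 1),
    ("Beethoven", "Symphony3Mvmt1", 1),
    ("Dvorak", "Symphony9Mvmt4", 1),
    ("Tschaikowsky", "ViolinConcertoMvmt3", 1),
}


def is_test_file(filename):
    """Return True if the file belongs to a row marked as test set.

    Assumption: only files of the "Ours" subset are covered by the table.
    """
    stem = filename.rsplit(".", 1)[0]
    subset, composer, work, version = stem.split("_")
    return subset == "Ours" and (composer, work, int(version[1:])) in TEST_RECORDINGS


files = [
    "Ours_Wagner_WalkuereAct1_V1.npy",
    "Ours_Wagner_WalkuereAct1_V2.npy",
    "Ours_Dvorak_Symphony9Mvmt4_V1.npy",
]
test_files = [f for f in files if is_test_file(f)]
other_files = [f for f in files if not is_test_file(f)]
print(test_files)
```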

Acknowledgements

We thank Christof Weiß for helpful discussions. This work was supported by the German Research Foundation (DFG MU 2686/7-2, MU 2686/11-2). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institut für Integrierte Schaltungen IIS.

References

  1. Michael Krause and Meinard Müller
    Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 406–410, 2022.
    @inproceedings{KrauseM22_HierarchyClass_ICASSP,
    author    = {Michael Krause and Meinard M{\"u}ller},
    title     = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages     = {406--410},
    address   = {Singapore},
    year      = {2022},
    doi       = {10.1109/ICASSP43922.2022.9747690}
    }
  2. Christoph Böhm, David Ackermann, and Stefan Weinzierl
    A multi-channel anechoic orchestra recording of Beethoven’s Symphony no. 8 op. 93
    Journal of the Audio Engineering Society, 68(12): 977–984, 2021.
    @article{BoehmAW21_BeethovenAnechoic_AES,
    author={Böhm, Christoph and Ackermann, David and Weinzierl, Stefan},
    journal = {Journal of the Audio Engineering Society},
    title={A multi-channel anechoic orchestra recording of {B}eethoven’s {S}ymphony no. 8 op. 93},
    year={2021},
    volume={68},
    number={12},
    pages={977-984},
    doi={10.17743/jaes.2020.0056}
    }
  3. Thomas Prätzlich, Meinard Müller, Benjamin W. Bohl, and Joachim Veit
    Freischütz Digital: Demos of audio-related contributions
    In Demos and Late Breaking News of the International Society for Music Information Retrieval Conference (ISMIR), 2015.
    @inproceedings{PraetzlichMBV15_FreiDi_ISMIR-LBD,
    author    = {Thomas Pr{\"a}tzlich and Meinard M{\"u}ller and Benjamin W. Bohl and Joachim Veit},
    title     = {{F}reisch{\"u}tz {D}igital: {D}emos of audio-related contributions},
    booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {M{\'a}laga, Spain},
    year      = {2015},
    url-pdf   = {2015_PraetzlichMBV_FreiDi_ISMIR-LBD.pdf},
    url-details = {http://freischuetz-digital.de/}
    }
  4. Marius Miron, Julio J. Carabias-Orti, Juan J. Bosch, Emilia Gómez, and Jordi Janer
    Score-Informed Source Separation for Multichannel Orchestral Recordings
    Journal of Electrical and Computer Engineering, 2016: 8363507:1–8363507:19, 2016.
    @article{MironCBGJ16_OrchestraSourceSeparation_JECE,
    author    = {Marius Miron and Julio J. Carabias{-}Orti and Juan J. Bosch and Emilia G{\'{o}}mez and Jordi Janer},
    title     = {Score-Informed Source Separation for Multichannel Orchestral Recordings},
    journal   = {Journal of Electrical and Computer Engineering},
    volume    = {2016},
    pages     = {8363507:1--8363507:19},
    year      = {2016},
    }
  5. Siddharth Gururani and Alexander Lerch
    Semi-Supervised Audio Classification with Partially Labeled Data
    In IEEE International Symposium on Multimedia (ISM): 111–114, 2021.
    @inproceedings{GururaniL21_PartialLabels_ISM,
    author    = {Siddharth Gururani and Alexander Lerch},
    title     = {Semi-Supervised Audio Classification with Partially Labeled Data},
    booktitle = {{IEEE} International Symposium on Multimedia (ISM)},
    address   = {Naples, Italy},
    pages     = {111--114},
    year      = {2021},
    doi       = {10.1109/ISM52913.2021.00027}
    }