A Cross-Version Approach to Audio Representation Learning for Orchestral Music

This is the accompanying website for the following paper:

  1. Michael Krause, Christof Weiß, and Meinard Müller
    A Cross-Version Approach to Audio Representation Learning for Orchestral Music
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.
    @inproceedings{KrauseWM23_CrossVersionRepresentationLearning_ISMIR,
    author    = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller},
    title     = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Milano, Italy},
    year      = {2023}
    }

Abstract

Deep learning systems have become popular for tackling a variety of music information retrieval tasks. However, these systems often require large amounts of labeled data for supervised training, which can be very costly to obtain. To alleviate this problem, recent papers on learning music audio representations employ alternative training strategies that utilize unannotated data. In this paper, we introduce a novel cross-version approach to audio representation learning that can be used with music datasets containing several versions (performances) of a musical work. Our method exploits the correspondences that exist between two versions of the same musical section. We evaluate our proposed cross-version approach qualitatively and quantitatively on complex orchestral music recordings and show that it can better capture aspects of instrumentation compared to techniques that do not use cross-version information.

Code

On GitHub

Trained Models

Extract into the subfolder outputs/models in the code repository.

Dataset Download

The aligned annotations for the recordings used in this paper are made publicly available as a dataset for further research.

Extract into the subfolder data/ in the code repository.

Dataset Structure

This dataset contains aligned instrument activity annotations (in 02_Annotations/ann_audio_instruments_npz and 02_Annotations/ann_audio_instruments_csv). The corresponding .wav-files of music audio need to be obtained individually and placed in 01_RawData/audio_wav.

Furthermore, we also provide warping paths (in 02_Annotations/ann_audio_sync) that map the different versions to a common musical time axis.

Finally, we also provide note annotations in 02_Annotations/ann_audio_note_npz.

Naming Convention

All files follow the naming convention

(Subset)_(Composer)_(Work)_V(Version)

where

  • (Subset) is either Ours or WagnerRing.
  • (Composer) is the composer of the work (e.g. Beethoven)
  • (Work) is a brief description of the work being played (e.g. Symphony3Mvmt2, or WalkuereAct1)
  • (Version) is the number of the version for which this file contains annotations

So, for example, the file

01_RawData/audio_wav/Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.wav

contains the audio for the third version of the first movement of Tschaikowsky's Violin Concerto.
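The naming convention can be split programmatically. The following is a minimal sketch, not part of the released code: the helper name parse_name and the regular expression are hypothetical, and the pattern assumes that none of the four fields contains an underscore itself.

```python
import re
from pathlib import Path

# Pattern for (Subset)_(Composer)_(Work)_V(Version).
# Assumption: no field contains an underscore.
NAME_PATTERN = re.compile(
    r"^(?P<subset>[^_]+)_(?P<composer>[^_]+)_(?P<work>[^_]+)_V(?P<version>\d+)$"
)


def parse_name(path):
    """Split a dataset file name into its four naming-convention fields."""
    stem = Path(path).stem  # drop directories and the file extension
    match = NAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    return fields


info = parse_name("01_RawData/audio_wav/Ours_Tschaikowsky_ViolinConcertoMvmt1_V3.wav")
print(info)
```

Because the same stem is used across the annotation folders, the parsed fields can be used to group all versions of one work.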

Aligned Instrument Annotations

The instrument annotations in 02_Annotations/ann_audio_instruments_npz are provided as .npz-files (i.e., NumPy arrays) to be loaded with Python. The arrays have the shape

(N, 18)

with the first dimension corresponding to frames of the recording and the second dimension corresponding to different classes. We use a frame rate of 43.0664 Hz (obtained with a hop size of 512 at a sample rate of 22050 Hz). The individual entries in the second dimension correspond to the classes:

Index Class identifier
00 INST
01 WW
02 BR
03 TMP
04 VOC
05 ST
06 Fl
07 Ob
08 Cl
09 Bn
10 Hn
11 Tpt
12 Fe
13 Ma
14 Vn
15 Va
16 Vc
17 Db

(see "Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings" for explanations of the identifiers)
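As a rough sketch of how these arrays might be used in Python: the frame rate follows from the hop size and sample rate stated above, and the class list mirrors the table. The helper names are hypothetical. The key under which the array is stored inside each .npz archive is not documented here, so the loader simply takes the first stored array, and treating a frame index as the frame's start time is likewise an assumption.

```python
import numpy as np

SAMPLE_RATE = 22050
HOP_SIZE = 512
FRAME_RATE = SAMPLE_RATE / HOP_SIZE  # 43.06640625 Hz

# Class identifiers in column order, as listed in the table above.
CLASS_NAMES = ["INST", "WW", "BR", "TMP", "VOC", "ST", "Fl", "Ob", "Cl", "Bn",
               "Hn", "Tpt", "Fe", "Ma", "Vn", "Va", "Vc", "Db"]


def load_first_array(npz_path):
    """Return the first array stored in an .npz archive (key name not assumed)."""
    with np.load(npz_path) as archive:
        return archive[archive.files[0]]


def frame_times(num_frames):
    """Time in seconds of each frame (assumption: index 0 corresponds to 0 s)."""
    return np.arange(num_frames) / FRAME_RATE


def active_classes(annotation, frame):
    """Identifiers of all classes with a nonzero entry in the given frame."""
    return [CLASS_NAMES[i] for i in np.flatnonzero(annotation[frame])]


# Synthetic stand-in for a real annotation array of shape (N, 18).
demo = np.zeros((5, 18), dtype=bool)
demo[2, [0, 5, 14]] = True
print(active_classes(demo, 2), frame_times(len(demo))[2])
```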

Exactly the same annotations are also provided as .csv-files in 02_Annotations/ann_audio_instruments_csv. Here, each row corresponds to an activity region for one of the classes, with starts and ends given in seconds. In 03_ExtraMaterial/npz_to_csv_anno.py, we provide a Python script for obtaining the .csv-files from the .npz-files.
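The relation between the frame-wise arrays and the region-based rows can be illustrated as follows. This is a simplified sketch and not the provided npz_to_csv_anno.py script: the function name activity_regions is hypothetical, and the convention that a region ends at the start of the first inactive frame is an assumption.

```python
import numpy as np

FRAME_RATE = 22050 / 512  # frames per second, as stated above


def activity_regions(activity, frame_rate=FRAME_RATE):
    """Return (start, end) pairs in seconds for runs of active frames.

    `activity` is one column of an annotation array, i.e. a 1-D sequence
    with nonzero entries where the class is active.
    """
    active = np.asarray(activity).astype(bool)
    # Pad with False on both sides so runs touching the borders are closed.
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)   # first frame of each run
    ends = np.flatnonzero(changes == -1)    # first frame after each run
    return [(s / frame_rate, e / frame_rate) for s, e in zip(starts, ends)]


# Toy example with a frame rate of 1 Hz for readability.
print(activity_regions([0, 1, 1, 0, 0, 1], frame_rate=1.0))
```

Applying the function to each of the 18 columns in turn would yield one list of regions per class.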

Alignments Between Versions

The .csv-files in 02_Annotations/ann_audio_sync map the different versions per piece to a common time axis. Each file contains two columns, separated by a comma, where

  • the first column corresponds to time in seconds in the audio recording and
  • the second column corresponds to a common musical time axis per piece.

For the recordings in the WagnerRing subset, the common musical time axis is given in measures. For the remaining recordings, the common musical time axis does not have a unit and simply progresses from 0 (beginning of the piece) to 1 (end of the piece).
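One way to use these files is to transfer a time position from one version to another via the common musical time axis. The sketch below uses linear interpolation with numpy.interp on synthetic two-column arrays. The function names are hypothetical, and the code assumes that both columns are monotonically increasing. A real file could presumably be read with numpy.loadtxt and delimiter=",", provided it has no header row.

```python
import numpy as np


def time_to_musical(path, seconds):
    """Map time in seconds (column 0) to the common musical axis (column 1)."""
    return np.interp(seconds, path[:, 0], path[:, 1])


def musical_to_time(path, position):
    """Map a position on the common musical axis back to time in seconds."""
    return np.interp(position, path[:, 1], path[:, 0])


def transfer_time(path_a, path_b, seconds_in_a):
    """Find the time in version B that corresponds to a time in version A."""
    return musical_to_time(path_b, time_to_musical(path_a, seconds_in_a))


# Synthetic warping paths on a unitless axis from 0 to 1:
# version A lasts 100 s, version B lasts 120 s with a slower first half.
path_a = np.array([[0.0, 0.0], [50.0, 0.5], [100.0, 1.0]])
path_b = np.array([[0.0, 0.0], [80.0, 0.5], [120.0, 1.0]])
print(transfer_time(path_a, path_b, 25.0))
```

The same functions apply to the WagnerRing subset, where the second column is given in measures instead of a unitless position.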

Note Annotations

The note annotations in 02_Annotations/ann_audio_note_npz are provided as .npz-files (i.e., NumPy arrays) to be loaded with Python. The arrays have the shape

(N, 128)

with the first dimension corresponding to frames of the recording (see above for frame rate etc.) and the second dimension corresponding to different MIDI pitches.
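A brief sketch of reading out the pitches of a single frame, shown on a synthetic array. The helper names are hypothetical, and the code assumes that column index p corresponds to MIDI pitch number p. The frequency conversion uses the standard equal-tempered mapping with A4 (MIDI pitch 69) at 440 Hz.

```python
import numpy as np


def active_pitches(notes, frame):
    """MIDI pitch numbers with a nonzero entry in the given frame."""
    return np.flatnonzero(notes[frame]).tolist()


def midi_to_hz(pitch):
    """Center frequency of a MIDI pitch in twelve-tone equal temperament."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


# Synthetic stand-in for a real note annotation array of shape (N, 128):
# a C major triad (C4, E4, G4) sounding in frame 1.
demo_notes = np.zeros((4, 128), dtype=bool)
demo_notes[1, [60, 64, 67]] = True

pitches = active_pitches(demo_notes, 1)
print(pitches, [round(midi_to_hz(p), 2) for p in pitches])
```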

Information on Specific Subsets

This dataset consists of a combination of audio from different previously released datasets, as well as audio we collected ourselves. In all cases, annotations are provided by us. More details on the individual subsets of the dataset follow.

WagnerRing

The audio files are from the WagnerRing dataset described in

[WRD] Christof Weiß, Vlora Arifi-Müller, Michael Krause, Frank Zalkow, Stephanie Klauk, Rainer Kleinertz, and Meinard Müller. Wagner Ring Dataset: A complex opera scenario for music processing and computational musicology. TISMIR, 2023

Concretely, we use the following versions of the first act of Die Walküre (also referred to as B1 in the dataset):

Subset | Composer | Work | Version ID (in our dataset) | In our test set? | Performer / Label | Identifier in [WRD]
WagnerRing | Wagner | WalkuereAct1 | 1 | x | Karajan / DG 1998 | Karajan1966
WagnerRing | Wagner | WalkuereAct1 | 2 | | Neuhold / MEMBRAN 1995 | Neuhold1993
WagnerRing | Wagner | WalkuereAct1 | 3 | | Levine / DG 2012 | Levine1987
WagnerRing | Wagner | WalkuereAct1 | 4 | | Böhm / DECCA 2008 | Bohm1967
WagnerRing | Wagner | WalkuereAct1 | 5 | | Keilberth/Furtwängler / ZYX 2012 | KeilberthFurtw1952
WagnerRing | Wagner | WalkuereAct1 | 6 | | Boulez / PHILIPS 2006 | Boulez1980
WagnerRing | Wagner | WalkuereAct1 | 7 | x | Barenboim / Warner Classics 2009 | Barenboim1991
WagnerRing | Wagner | WalkuereAct1 | 8 | x | Haitink / EMI Classics 2008 | Haitink1988

Ours

Information on the individual audio files is provided in the following table. Some entries link to the website https://cc0.oer-musik.de/, which hosts recordings that are in the public domain in Germany.

Subset | Composer | Work | Version ID | In our test set? | Performer / Label | cc0 Link
Ours | Beethoven | Symphony3Mvmt1 | 1 | x | Abbado / DG |
Ours | Beethoven | Symphony3Mvmt2 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt2 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt2 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt2 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/
Ours | Beethoven | Symphony3Mvmt2 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/
Ours | Beethoven | Symphony3Mvmt3 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt3 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt3 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt3 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/
Ours | Beethoven | Symphony3Mvmt3 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/
Ours | Beethoven | Symphony3Mvmt4 | 1 | | Blomstedt / BC |
Ours | Beethoven | Symphony3Mvmt4 | 2 | | Drahos / NAXOS |
Ours | Beethoven | Symphony3Mvmt4 | 3 | | Jarvi / SONY |
Ours | Beethoven | Symphony3Mvmt4 | 4 | | Scherchen / Heliodor | https://cc0.oer-musik.de/428002/
Ours | Beethoven | Symphony3Mvmt4 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/002894793106-55/
Ours | Dvorak | Symphony9Mvmt1 | 1 | | Kubelik / MERCURY |
Ours | Dvorak | Symphony9Mvmt1 | 2 | | Szell / SONY |
Ours | Dvorak | Symphony9Mvmt1 | 3 | | Karajan / MEMBRAN | https://cc0.oer-musik.de/600001041-9/
Ours | Dvorak | Symphony9Mvmt1 | 4 | | Toscanini / DG | https://cc0.oer-musik.de/at114/
Ours | Dvorak | Symphony9Mvmt1 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/lpm18142/
Ours | Dvorak | Symphony9Mvmt2 | 1 | | Kubelik / MERCURY |
Ours | Dvorak | Symphony9Mvmt2 | 2 | | Szell / SONY |
Ours | Dvorak | Symphony9Mvmt2 | 3 | | Leaper / SONY |
Ours | Dvorak | Symphony9Mvmt2 | 4 | | Toscanini / DG | https://cc0.oer-musik.de/at114/
Ours | Dvorak | Symphony9Mvmt2 | 5 | | Fricsay / DG | https://cc0.oer-musik.de/lpm18142/
Ours | Dvorak | Symphony9Mvmt4 | 1 | x | Suitner / BC |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 1 | | Ashkenazy / DECCA |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 2 | | Francescatti / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 3 | | Menuhin / MEMBRAN |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 4 | | Nishizaki / NAXOS |
Ours | Tschaikowsky | ViolinConcertoMvmt1 | 5 | | Szering / RCA-CCV | https://cc0.oer-musik.de/ccv5015-tschaikowski/
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 1 | | Ashkenazy / DECCA |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 2 | | Francescatti / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 3 | | Menuhin / MEMBRAN |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 4 | | Heifetz / SONY |
Ours | Tschaikowsky | ViolinConcertoMvmt2 | 5 | | Kogan / BC |
Ours | Tschaikowsky | ViolinConcertoMvmt3 | 1 | x | Mullova / PHILIPS |

Acknowledgements

This work was supported by the German Research Foundation (DFG MU 2686/7-2, MU 2686/11-2). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institut für Integrierte Schaltungen IIS. The authors gratefully acknowledge the compute resources and support provided by the Erlangen Regional Computing Center (RRZE).
