Demos and Code

MTD: A Multimodal Dataset of Musical Themes for MIR Research


Link: Website

Musical themes are essential elements in Western classical music. In this paper, we present the Musical Theme Dataset (MTD), a multimodal dataset inspired by “A Dictionary of Musical Themes” by Barlow and Morgenstern from 1948. For a subset of 2067 themes of the printed book, we created several digital representations of the musical themes. Beyond graphical sheet music, we provide symbolic music encodings, audio snippets of music recordings, alignments between the symbolic and audio representations, as well as detailed metadata on the composer, work, recording, and musical characteristics of the themes. In addition to the data, we also make several parsers and web-based interfaces available to access and explore the different modalities and their relations through visualizations and sonifications. These interfaces also include computational tools, bridging the gap between the original dictionary and music information retrieval (MIR) research. The dataset is of relevance for various subfields and tasks in MIR, such as cross-modal music retrieval, music alignment, optical music recognition, music transcription, and computational musicology.
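Alignments between the symbolic and audio representations can be used, for example, to locate the notes of a theme within its audio snippet. The following minimal sketch assumes, hypothetically, that an alignment is given as monotonically increasing anchor pairs of (symbolic time, audio time). The actual MTD file format may differ, and the function name `symbolic_to_audio_time` is illustrative only.

```python
import numpy as np


def symbolic_to_audio_time(onsets_sym, alignment):
    """Map symbolic onset times to audio times (seconds) via an alignment.

    Hypothetical format: `alignment` is an array of shape (L, 2) whose rows are
    (symbolic_time, audio_time) anchor pairs with increasing symbolic times.
    Times between anchors are linearly interpolated; times outside the anchor
    range are clamped to the first or last audio time (np.interp behavior).
    """
    alignment = np.asarray(alignment, dtype=float)
    return np.interp(onsets_sym, alignment[:, 0], alignment[:, 1])


# Example: three anchor points, theme onsets given in beats
anchors = [[0, 1.0], [4, 3.0], [8, 7.0]]
print(symbolic_to_audio_time([0, 2, 4, 6, 8], anchors))
```

Linear interpolation between anchors is the simplest choice; a denser alignment (one anchor per note) makes the interpolation step nearly irrelevant.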

Using Weakly Aligned Score–Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval


Link: Website

Many music information retrieval tasks involve the comparison of a symbolic score representation with an audio recording. A typical strategy is to compare score–audio pairs based on a common mid-level representation, such as chroma features. Several recent studies have demonstrated the effectiveness of deep learning models that learn task-specific mid-level representations from temporally aligned training pairs. However, in practice, there is often a lack of strongly aligned training data, in particular for real-world scenarios. In our study, we use weakly aligned score–audio pairs for training, where only the beginning and end of a score excerpt are annotated in an audio recording, without aligned correspondences in between. To exploit such weakly aligned data, we employ the Connectionist Temporal Classification (CTC) loss to train a deep learning model for computing an enhanced chroma representation. We then apply this model to a cross-modal retrieval task, where we aim to find relevant audio recordings of Western classical music, given a short monophonic musical theme in symbolic notation as a query. We present systematic experiments that show the effectiveness of the CTC-based model for this theme-based retrieval task.
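To illustrate why the CTC loss suits weakly aligned pairs, here is a minimal NumPy sketch of the standard CTC forward algorithm. It assumes, for illustration only, that the target is the theme's sequence of pitch-class indices 0 to 11, that index 12 is the CTC blank, and that the network emits frame-wise log-probabilities over these 13 symbols; this is not necessarily the exact label setup of the paper. The loss sums over all frame-level alignments that collapse to the target, so no correspondences between the excerpt's start and end are required.

```python
from typing import Sequence

import numpy as np

BLANK = 12  # assumption: 0..11 are pitch classes C..B, 12 is the CTC blank


def ctc_loss(log_probs: np.ndarray, target: Sequence[int], blank: int = BLANK) -> float:
    """Negative log-likelihood of `target` given frame-wise log-probabilities.

    log_probs: array of shape (T, C) with log-softmax outputs per frame.
    target: label sequence without blanks (e.g. pitch classes of a theme).
    """
    T = log_probs.shape[0]
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]  # start with a blank ...
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]  # ... or with the first label

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]  # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, alpha[t - 1, s - 1])  # advance by one
            # Skipping a blank is only allowed between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]

    # Valid paths end on the last label or on the final blank
    total = alpha[T - 1, S - 1]
    if S > 1:
        total = np.logaddexp(total, alpha[T - 1, S - 2])
    return float(-total)
```

For example, with uniform outputs over 13 symbols and T = 2 frames, the target [0] has three valid paths (0 0, 0 blank, blank 0), so the loss is log(169/3). In practice, a framework implementation such as PyTorch's `torch.nn.CTCLoss` would be used for training.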

Tools for Semi-Automatic Bounding Box Annotation of Musical Measures in Sheet Music


Link: Website

In score following, one main goal is to highlight measure positions in sheet music synchronously with audio playback. Such applications require alignments between sheet music and audio representations. Such alignments can often be computed automatically when the sheet music is given in a symbolically encoded music format. However, sheet music is often available only in the form of digitized scans. In this case, the automated computation of accurate alignments still poses many challenges. In this contribution, we present various semi-automatic tools for solving the subtask of determining bounding boxes (given in pixels) of measure positions in digital scans of sheet music, a task that is extremely tedious when done manually.
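Once measure bounding boxes and a sheet-music-to-audio alignment are available, highlighting the current measure during playback reduces to a lookup. The sketch below assumes a hypothetical `MeasureBox` record in pixel coordinates and a sorted list of measure boundary times in seconds with one more entry than there are measures; neither structure is taken from the tools described here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MeasureBox:
    """Hypothetical bounding box of one measure on a scanned page (pixels)."""
    page: int
    x: int  # left edge
    y: int  # top edge
    width: int
    height: int


def box_at_time(t: float, boundaries: List[float],
                boxes: List[MeasureBox]) -> Optional[MeasureBox]:
    """Return the box of the measure sounding at audio time t, or None.

    boundaries[i] is the start time of measure i and boundaries[i + 1] its end,
    so len(boundaries) must equal len(boxes) + 1.
    """
    if len(boundaries) != len(boxes) + 1:
        raise ValueError("need exactly one more boundary than boxes")
    i = bisect_right(boundaries, t) - 1  # index of the last boundary <= t
    if 0 <= i < len(boxes):
        return boxes[i]
    return None  # before the first or after the last measure
```

Binary search keeps each lookup logarithmic in the number of measures, which matters little for a single piece but keeps per-frame updates cheap in a player loop.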

Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings


Link: Website

In this paper, we consider a cross-modal retrieval scenario of Western classical music. Given a short monophonic musical theme in symbolic notation as a query, the objective is to find relevant audio recordings in a database. A major challenge of this retrieval task is the possible difference in the degree of polyphony between the monophonic query and the music recordings. Previous studies on popular music addressed this issue by performing the cross-modal comparison based on predominant melodies extracted from the recordings. For Western classical music, however, this approach is problematic since the underlying assumption of a single predominant melody is often violated. Instead of extracting the melody explicitly, another strategy is to perform the cross-modal comparison directly on the basis of melody-enhanced salience representations. As the main contribution of this paper, we evaluate several conceptually different salience representations for our cross-modal retrieval scenario. Our extensive experimental results, which have been made available on a website, comprise more than 2000 musical themes and 100 hours of audio recordings.
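One common way to compare a short query with a longer recording is subsequence dynamic time warping (SDTW) on frame-wise feature sequences such as chroma vectors or salience representations. The following NumPy sketch is an illustrative assumption, not the evaluation pipeline of the paper: it uses a cosine distance between feature columns, step sizes (1, 0), (0, 1), (1, 1), and ranks database documents by their minimal matching cost. The names `cosine_cost`, `subsequence_dtw_cost`, and `rank_documents` are hypothetical.

```python
import numpy as np


def cosine_cost(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Cost matrix of shape (N, M) between query X (K, N) and document Y (K, M).

    Columns are frames; the cost is 1 minus the cosine similarity.
    A small epsilon avoids division by zero for silent (all-zero) frames.
    """
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-9)
    return 1.0 - Xn.T @ Yn


def subsequence_dtw_cost(C: np.ndarray) -> float:
    """Minimal accumulated cost of aligning the whole query (rows of C)
    with any contiguous section of the document (columns of C)."""
    N, M = C.shape
    D = np.zeros((N, M))
    D[0, :] = C[0, :]  # the match may start at any document frame
    for n in range(1, N):
        D[n, 0] = D[n - 1, 0] + C[n, 0]
        for m in range(1, M):
            D[n, m] = C[n, m] + min(D[n - 1, m], D[n, m - 1], D[n - 1, m - 1])
    return float(D[N - 1].min())  # the match may end at any document frame


def rank_documents(query: np.ndarray, database: dict) -> list:
    """Return database keys sorted by increasing SDTW matching cost."""
    costs = {key: subsequence_dtw_cost(cosine_cost(query, feats))
             for key, feats in database.items()}
    return sorted(costs, key=costs.get)
```

With a monophonic query encoded as one-hot pitch-class columns, a document that contains the same pitch-class sequence yields a cost close to zero, whereas polyphonic recordings spread energy over several bins, which is exactly where the choice of salience representation matters.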

A Web-Based Interface for Score Following and Track Switching in Choral Music


Link: Website

Music can be represented in many different ways. In particular, audio and sheet music renditions are of high importance in Western classical music. For choral music, a sheet music representation typically consists of several parts (for the individual singing voice sections) and possibly an accompaniment. Within a choir rehearsal scenario, there are various tasks that can be supported by techniques developed in music information retrieval (MIR). For example, it may be helpful for a singer to have both the audio and the sheet music presented synchronously, a well-known task referred to as score following. Furthermore, listening to individual parts of choral music can be very instructive for practicing. The listening experience can be enhanced by switching between the audio tracks of a suitable multi-track recording. In this contribution, we introduce a web-based interface that integrates score-following and track-switching functionalities and is built upon existing web technology.
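Track switching relies on all tracks of a multi-track recording sharing one timeline, so the playback position carries over when the listener switches. The sketch below is an assumption for illustration, not the interface's implementation: it renders a switch from one track to another at a given sample position with a short linear crossfade to avoid audible clicks. The function name and the default fade length (441 samples, about 10 ms at 44.1 kHz) are hypothetical.

```python
import numpy as np


def switch_tracks(tracks: np.ndarray, start_track: int, switch_sample: int,
                  target_track: int, fade_len: int = 441) -> np.ndarray:
    """Render playback that starts on one track and switches to another.

    tracks: array of shape (num_tracks, num_samples); all tracks are assumed
    to be time-synchronized. The switch keeps the playback position and
    crossfades linearly over fade_len samples starting at switch_sample.
    """
    num_samples = tracks.shape[1]
    out = tracks[start_track].astype(float)  # astype returns a copy
    if switch_sample >= num_samples:
        return out  # switch requested after the end: nothing changes
    end = min(switch_sample + fade_len, num_samples)
    # Gain of the target track rises from 0 towards 1 during the fade
    ramp = np.linspace(0.0, 1.0, end - switch_sample, endpoint=False)
    out[switch_sample:end] = ((1.0 - ramp) * tracks[start_track, switch_sample:end]
                              + ramp * tracks[target_track, switch_sample:end])
    out[end:] = tracks[target_track, end:]  # after the fade: target track only
    return out
```

In a browser, the same idea would typically be realized by ramping per-track gain values in real time rather than rendering a new buffer, but the position-preserving crossfade is the same.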