Mid-Level Audio Features Based on Cascaded Harmonic-Residual-Percussive Separation

Accompanying website for the paper:

  1. Patricio López-Serrano, Christian Dittmar, and Meinard Müller
    Mid-Level Audio Features Based on Cascaded Harmonic-Residual-Percussive Separation
    In Proceedings of the AES Conference on Semantic Audio, Erlangen, Germany, 2017.
    Details: http://www.aes.org/e-lib/browse.cfm?elib=18755



Harmonic-percussive separation is a technique that splits music recordings into harmonic and percussive components. It can be used as a preprocessing step to facilitate further tasks like key detection (harmonic component) or drum transcription (percussive component). We propose a cascaded harmonic-residual-percussive (HRP) procedure yielding a mid-level feature to analyze musical phenomena like percussive event density, timbral changes, and homogeneous structural segments. We first outline the steps to compute cascaded HRP features (CHRP) and then illustrate their capabilities by means of three examples: to visualize percussive and noise-like properties of snare-drum playing techniques, to examine changes between harmonic and percussive timbres in electronic music, and to identify homogeneous, purely percussive passages in funk and soul recordings (also known as breaks).
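The cascading idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes the common median-filtering formulation of HRP with a separation factor (here called `beta`), it operates directly on a magnitude spectrogram instead of resynthesizing audio between stages, and the function names, filter lengths, number of stages, and `beta` values are all hypothetical choices made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(S, length, axis):
    """Median-filter matrix S along one axis (odd window length, edge padding)."""
    pad = length // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    windows = sliding_window_view(padded, length, axis=axis)
    return np.median(windows, axis=-1)


def hrp_stage(S, beta, len_time=9, len_freq=9, eps=1e-10):
    """Split a magnitude spectrogram S (frequency x time) into H, R, and P parts.

    Assumed formulation: binary masks from time- and frequency-direction
    median filters, with separation factor beta >= 1.
    """
    Y_h = median_filter_1d(S, len_time, axis=1)  # smooth along time: keeps horizontal structures
    Y_p = median_filter_1d(S, len_freq, axis=0)  # smooth along frequency: keeps vertical structures
    M_h = Y_h / (Y_p + eps) > beta
    M_p = Y_p / (Y_h + eps) >= beta
    M_r = ~(M_h | M_p)                           # everything that is neither clearly H nor clearly P
    return S * M_h, S * M_r, S * M_p


def cascaded_hrp(S, betas=(4.0, 2.0, 1.2)):
    """Re-apply HRP to the residual of each stage (illustrative beta values)."""
    components = {}
    residual, prefix = S, ""
    for beta in betas:
        H, R, P = hrp_stage(residual, beta)
        components[prefix + "H"] = H
        components[prefix + "P"] = P
        residual, prefix = R, prefix + "R"
    components[prefix] = residual                # final residual, e.g. "RRR" for three stages
    return components


# Toy spectrogram: one sustained partial (row) and one click (column).
S = np.zeros((40, 60))
S[10, :] = 1.0
S[:, 20] = 1.0
components = cascaded_hrp(S)
print(sorted(components))
```

In this toy input the horizontal line lands in H, the vertical line in P, and the single bin where they cross stays in the final residual, since neither filter direction dominates there. The masks partition each stage, so the components always sum back to the input spectrogram.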

HRP Ramp

This figure shows the CHRP feature matrix for a signal consisting of five non-overlapping sound samples: castanets, snare roll, applause, staccato strings, and legato violin. The castanets have the highest percussive energy and are well confined to the P component. The snare roll is predominantly composed of RP and RRR: snare drums have a percussive attack and a decay that is both noisy and tonal (according to the drum's tuning frequency), and when the drum is struck in rapid succession, the noisy decay tails overpower the percussive onsets. Applause is centered around RRR and well confined to the residual region. The staccato strings are predominantly harmonic, with an additional RH component that corresponds to the noisy attacks characteristic of this playing technique. The legato violin is confined to the H component, since its stable, harmonic signal properties dominate all other components.
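One plausible way to obtain such a feature matrix from separated components is sketched below. The page does not spell out the computation, so the details here are assumptions: each row holds the frame-wise energy of one component, each column is normalized to sum to one, and the row order (including the RRH and RRP rows, which are inferred from a three-stage cascade) is hypothetical. The function name `chrp_features` is likewise made up for this example.

```python
import numpy as np

# Assumed row order, from most harmonic to most percussive.
ORDER = ("H", "RH", "RRH", "RRR", "RRP", "RP", "P")


def chrp_features(components, order=ORDER, eps=1e-12):
    """Stack per-frame component energies into a (len(order) x frames) matrix.

    components maps a component name to a magnitude spectrogram
    (frequency x time). Columns are normalized to sum to (almost) one.
    """
    energies = np.stack([np.sum(components[name] ** 2, axis=0) for name in order])
    return energies / (energies.sum(axis=0, keepdims=True) + eps)


# Toy input with two frames: frame 0 is purely percussive, frame 1 purely harmonic.
toy = {name: np.zeros((4, 2)) for name in ORDER}
toy["P"][:, 0] = 1.0
toy["H"][:, 1] = 2.0
features = chrp_features(toy)
print(features.round(2))
```

With this normalization, a column of the matrix reads as the relative share of energy per component in that frame, which matches statements such as "well confined to the P component".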


Event Density: Snare Paradiddles

Our first example shows energy migrating from percussive to residual in a snare-playing technique known as paradiddles. Figure (a) shows the waveform of paradiddles played on a snare drum, first with increasing speed (0-40 sec) and then with decreasing speed (40-75 sec). Figure (b) contains the corresponding CHRP feature matrix; notice how very little P energy remains once a certain onset frequency or playing speed has been reached (around 25 sec). This happens because the noise-like decay tails of successive strokes come close enough together to overpower the individual percussive onsets, centering the energy around the residual components in the feature matrix.


Homogeneous Musical Structure: Drum Break

Hip hop producers often look for purely percussive passages (also known as drum breaks or breaks) in funk and soul recordings in order to sample them and create new musical material. The top part of the figure (red regions) shows a structural annotation of Funky Drummer by James Brown, focusing on the location of the drum breaks. The bottom part contains the corresponding CHRP feature matrix, where the drum breaks, a timbrally homogeneous musical structure, are visible at the beginning and end of the piece: the annotated parts correspond to low H energy and high P energy.
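The observation "low H energy and high P energy" can be turned into a naive frame-level rule, sketched below. The page only shows this correspondence visually and does not propose a detector, so this is purely illustrative: the function names and both thresholds are hypothetical, and the inputs are assumed to be the H and P rows of a column-normalized feature matrix.

```python
import numpy as np


def find_break_frames(h_energy, p_energy, h_max=0.1, p_min=0.5):
    """Mark frames whose H share is low and whose P share is high (assumed thresholds)."""
    h = np.asarray(h_energy, dtype=float)
    p = np.asarray(p_energy, dtype=float)
    return (h < h_max) & (p > p_min)


def frames_to_segments(mask):
    """Group consecutive True frames into (start, end) pairs, end exclusive."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))


# Toy H and P rows: percussive at the start and end, harmonic in the middle.
h_row = [0.05, 0.05, 0.60, 0.60, 0.02]
p_row = [0.80, 0.70, 0.10, 0.20, 0.90]
mask = find_break_frames(h_row, p_row)
print(frames_to_segments(mask))  # [(0, 2), (4, 5)]
```

On real recordings a rule like this would likely need smoothing over time and a minimum segment length, since single percussive frames also occur outside of breaks.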


Structure and Timbral Changes in Electronic Music

We show the CHRP feature matrix for Nicole Moudaber's track Liberum Spirita (Original Mix). The audio offered here contains four bars before and after each of the three annotated points. Point (1) [4:53] marks the removal of the bass drum and the beginning of a section dominated by cymbal sounds and an increasing level of noise-like components (notice the increase in all R components and the decrease in the H component). Point (2) [6:08] marks the reintroduction of the bass drum. The region between (2) and (3) contains an articulative sound in the form of a noise burst. Point (3) marks the removal of a long-tailed ride cymbal sound, which can be seen as a drop in RRR energy. The decrease in RH energy at (3) is primarily due to the removal of a synthesizer sound consisting of both tonal and noise-like components.


Legal notice

The multimedia material linked on this page is provided for educational purposes only. If any legal problems occur, please contact us. Any content that allegedly infringes the copyright of a third party will be removed upon request by the copyright holder.