This is the accompanying website for the paper "Extending Harmonic-Percussive Separation of Audio Signals" by Jonathan Driedger, Meinard Müller, and Sascha Disch.

Abstract

In recent years, methods to decompose an audio signal into a harmonic and a percussive component have received a lot of interest and are frequently applied as a processing step in a variety of scenarios. One problem is that the computed components are often not of purely harmonic or percussive nature but also contain sounds that are neither clearly harmonic nor percussive. Furthermore, depending on the parameter settings, one can often observe a leakage of harmonic sounds into the percussive component and vice versa. In this paper, we present two extensions to a state-of-the-art harmonic-percussive separation procedure to target these problems. First, we introduce a separation factor parameter into the decomposition process that allows for tightening separation results and for enforcing the components to be clearly harmonic or percussive. As a second contribution, inspired by the classical sines+transients+noise (STN) audio model, this novel concept is exploited to add a third residual component to the decomposition, which captures the sounds that lie in between the clearly harmonic and percussive sounds of the audio signal.
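To make the role of the separation factor more concrete, the following is a minimal sketch of how binary masks for a harmonic, a percussive, and a residual component could be derived from a magnitude spectrogram. It assumes a median-filtering approach (filtering along time to enhance harmonic structures and along frequency to enhance percussive ones) and one plausible thresholding rule with a factor `beta`. The function names, the truncated-window edge handling, and the exact inequalities are assumptions made for illustration and are not taken from this page.

```python
from statistics import median


def median_filter_1d(values, length):
    """Median-filter a list; windows are truncated at the edges (an assumption)."""
    half = length // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def hpr_masks(mag, len_time, len_freq, beta):
    """Return binary masks (harmonic, percussive, residual).

    mag[k][m] is the magnitude at frequency bin k and time frame m.
    len_time and len_freq are median filter lengths in frames and bins.
    beta is the separation factor (expected to be >= 1).
    """
    n_bins, n_frames = len(mag), len(mag[0])
    eps = 1e-12  # avoids division by zero

    # Filtering along time (per row) keeps horizontal, i.e. harmonic, structures.
    y_h = [median_filter_1d(row, len_time) for row in mag]

    # Filtering along frequency (per column) keeps vertical, i.e. percussive, structures.
    cols = [median_filter_1d([mag[k][m] for k in range(n_bins)], len_freq)
            for m in range(n_frames)]
    y_p = [[cols[m][k] for m in range(n_frames)] for k in range(n_bins)]

    # A bin is "clearly harmonic" only if y_h exceeds y_p by the factor beta,
    # and "clearly percussive" in the opposite case; everything else is residual.
    harm = [[1 if y_h[k][m] / (y_p[k][m] + eps) > beta else 0
             for m in range(n_frames)] for k in range(n_bins)]
    perc = [[1 if y_p[k][m] / (y_h[k][m] + eps) >= beta else 0
             for m in range(n_frames)] for k in range(n_bins)]
    resid = [[1 - harm[k][m] - perc[k][m]
              for m in range(n_frames)] for k in range(n_bins)]
    return harm, perc, resid


# Toy spectrogram: one horizontal line (bin 2) crossing one vertical line (frame 2).
toy = [[1 if (k == 2 or m == 2) else 0 for m in range(5)] for k in range(5)]
harm, perc, resid = hpr_masks(toy, len_time=3, len_freq=3, beta=2.0)
```

In this toy case the horizontal line is assigned to the harmonic mask, the vertical line to the percussive mask, and the crossing point, where neither filtered value dominates by the factor `beta`, falls into the residual mask. Raising `beta` makes both tests stricter, so more bins move into the residual.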




Teaser: Decomposition Results

Example decompositions computed with our proposed iterative procedure.

Item Name Original Decomposition
CastanetsViolinApplause
Stepdad
Heavy
Bongo
Glockenspiel
Winterreise
= harmonic component   = residual component   = percussive component

The used parameters are:
Nh = 4096, Np = 256, βh = 2, βp = 2, filter length horizontal = 200 ms, filter length vertical = 500 Hz.
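Because the filter lengths are given in physical units (milliseconds and Hertz), they have to be converted into a number of frames and frequency bins before a spectrogram can be filtered. The sketch below shows one way to do this. The sampling rate and hop size are not stated on this page, so the values used in the example call (22050 Hz and a hop of a quarter frame) are assumptions, as is the choice to round to the nearest odd length so that the median window is centered.

```python
def make_odd(n):
    """Return n if it is odd, otherwise n + 1 (and at least 1)."""
    n = max(1, n)
    return n if n % 2 == 1 else n + 1


def median_filter_lengths(sample_rate, frame_size, hop_size,
                          len_time_s, len_freq_hz):
    """Convert filter lengths from seconds / Hertz to frames / bins.

    One frame advances hop_size / sample_rate seconds,
    one bin spans sample_rate / frame_size Hertz.
    """
    frames = round(len_time_s * sample_rate / hop_size)
    bins = round(len_freq_hz * frame_size / sample_rate)
    return make_odd(frames), make_odd(bins)


# Assumed setup: 22050 Hz sampling rate, N = 1024, hop size N / 4.
frames, bins = median_filter_lengths(
    sample_rate=22050, frame_size=1024, hop_size=256,
    len_time_s=0.2, len_freq_hz=500.0)
print(frames, bins)  # prints "17 23"
```

Specifying the lengths in milliseconds and Hertz keeps the effective filter extent comparable when the frame size changes, since the same call with a different `frame_size` yields a different number of bins for the same 500 Hz.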




Section 3: Harmonic-Percussive-Residual Separation

Section 3.2: Median Filter Length

Fixed parameters: β = 1, N = 1024

CastanetsViolinApplause

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component

Stepdad

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component

Heavy

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component

Bongo

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component

Glockenspiel

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component

Winterreise

freq \ time 50 ms 100 ms 200 ms 500 ms 1000 ms
1000 Hz
500 Hz
200 Hz
100 Hz
50 Hz
= harmonic component   = percussive component



Section 3.2: Frame Size vs. Separation Factor

Fixed parameters: filter length horizontal = 200 ms, filter length vertical = 500 Hz

CastanetsViolinApplause

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component

Stepdad

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component

Heavy

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component

Bongo

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component

Glockenspiel

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component

Winterreise

N \ β 1.0 1.5 2.0 3.0 5.0 10.0
4096
2048
1024
512
256
128
= harmonic component   = residual component   = percussive component



Section 3.3: Iterative Procedure

Fixed parameters: Nh = 4096, Np = 256, filter length horizontal = 200 ms, filter length vertical = 500 Hz
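The parameters above suggest a two-pass scheme with a large frame size (Nh) and a small frame size (Np). The following sketch shows one plausible reading of such a procedure: a first pass with the large frame size fixes the harmonic component, and a second pass with the small frame size is applied to what remains in order to fix the percussive component. This control flow is an assumption and is not described on this page. The `separate` callable is hypothetical and stands for any single-pass harmonic-percussive-residual separation.

```python
def iterative_hpr(signal, separate, n_h, n_p, beta_h, beta_p):
    """Two-pass separation sketch.

    separate(signal, frame_size, beta) is a hypothetical callable that
    returns (harmonic, percussive, residual), each a list as long as signal.
    """
    # Pass 1: a large frame size gives good frequency resolution,
    # which favors the harmonic component.
    h1, p1, r1 = separate(signal, n_h, beta_h)

    # Pass 2: re-separate everything that was not clearly harmonic,
    # now with a small frame size for good time resolution.
    rest = [a + b for a, b in zip(p1, r1)]
    h2, p2, r2 = separate(rest, n_p, beta_p)

    harmonic = h1
    percussive = p2
    # Whatever the second pass did not call percussive goes to the residual.
    residual = [a + b for a, b in zip(h2, r2)]
    return harmonic, percussive, residual


def dummy_separate(signal, frame_size, beta):
    """Stand-in that splits every sample into fixed fractions (illustration only)."""
    return ([0.5 * s for s in signal],
            [0.25 * s for s in signal],
            [0.25 * s for s in signal])


h, p, r = iterative_hpr([8.0, -4.0], dummy_separate,
                        n_h=4096, n_p=256, beta_h=2.0, beta_p=2.0)
```

With any `separate` whose three outputs add up to its input, the three components returned by `iterative_hpr` also add up to the original signal.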

CastanetsViolinApplause

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component

Stepdad

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component

Heavy

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component

Bongo

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component

Glockenspiel

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component

Winterreise

βp \ βh 1.5 2.0 3.0 5.0 10.0
10.0
5.0
3.0
2.0
1.5
= harmonic component   = residual component   = percussive component



Section 4: Evaluation

ViolinCastanetsApplause

Measure  Component       BL      HP    HP-I     HPR   HPR-I  HPR-IO
SDR      Violin       -3.10   -5.85    0.08    8.23    7.65    8.85
SDR      Castanets    -2.93    3.58    2.86    8.29    9.14    9.28
SDR      Applause     -3.04       -   -7.03    4.25    4.93    5.00
SIR      Violin       -3.10   -5.09    1.08   17.69   14.58   21.65
SIR      Castanets    -2.93    6.06   10.45   22.34   20.66   24.41
SIR      Applause     -3.04       -   14.69    8.41   12.80    9.04
SAR      Violin      274.25    8.33    9.44    8.82    8.78    9.11
SAR      Castanets   274.25    8.14    4.07    8.49    9.50    9.44
SAR      Applause    274.25       -   -6.85    6.95    5.93    7.69

Table 1. Objective evaluation measures. All values are given in dB. Click on the values to listen to the respective components.
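As a rough illustration of what a dB-valued separation measure expresses, the sketch below computes a simplified signal-to-distortion ratio as the energy of a reference signal relative to the energy of the estimation error. This page does not state how SDR, SIR, and SAR were computed, so this is an assumed, simplified definition and is not expected to reproduce the values in Table 1. In particular, it does not split the error into interference and artifact terms, which SIR and SAR would require.

```python
import math


def simple_sdr_db(reference, estimate):
    """Simplified SDR in dB: 10 * log10(|s|^2 / |s - s_hat|^2)."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal is silent")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# An estimate that is the reference scaled by 0.9 has an error of 10 % in
# amplitude, i.e. an energy ratio of 100, which corresponds to about 20 dB.
value = simple_sdr_db([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0])
```

Higher values indicate that the estimate is closer to the reference; negative values mean that the error carries more energy than the reference itself.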





For comments and feedback, please contact Jonathan Driedger (jonathan (at) audiolabs-erlangen.de).

page last modified Friday, 11 April 2014 - 09:00