FMP AudioLabs
C6

Energy-Based Novelty


Following Section 6.1.1 of [Müller, FMP, Springer 2015], we introduce in this notebook an energy-based approach for computing a novelty curve.

Local Energy Function

Often a note onset goes along with a sudden increase of the signal's energy. Based on this assumption, a straightforward way to detect note onsets is to transform the signal into a local energy function that indicates the local energy of the signal for each time instance and then to look for sudden changes in this function. Let $x:\mathbb{Z}\to\mathbb{R}$ be a DT-signal. Furthermore, let $w:[-M:M]\to\mathbb{R}$ for some $M\in\mathbb{N}$ be a bell-shaped window function centered at time zero (e.g., a Hann window). The local energy of $x$ with regard to $w$ is defined to be the function $E_w^x:\mathbb{Z}\to\mathbb{R}$ given by

\begin{equation} E_w^x(n) := \sum_{m=-M}^{M} |x(n+m)w(m)|^2 = \sum_{m\in\mathbb{Z}}|x(m)w(m-n)|^2 \end{equation}

In the following figures, we continue our Queen example "Another one bites the dust" (see the introductory FMP notebook on onset detection).

In [1]:
import numpy as np
import os, sys, librosa
import pandas as pd
from scipy import signal
from matplotlib import pyplot as plt
import IPython.display as ipd

sys.path.append('..')
import libfmp.b
import libfmp.c2
import libfmp.c6

%matplotlib inline

fn_ann = os.path.join('..', 'data', 'C6', 'FMP_C6_F01_Queen.csv')
ann, label_keys = libfmp.c6.read_annotation_pos(fn_ann)

fn_wav = os.path.join('..', 'data', 'C6', 'FMP_C6_F01_Queen.wav')
Fs = 22050
x, Fs = librosa.load(fn_wav, Fs) 
x_duration = len(x)/Fs

N = 2048
w = signal.hann(N)

#Calculate local energy
x_square = x**2
energy_local = np.convolve(x_square, w**2, 'same')

libfmp.b.plot_signal(x, Fs, title='Waveform')
libfmp.b.plot_signal(x_square, Fs, title='Waveform (squared)')
fig, ax, line = libfmp.b.plot_signal(energy_local, Fs, color='k', 
                    title='Local energy function (Hann window)')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration);
plt.tight_layout()

Discrete Derivative and Half-Wave Rectification

To measure energy changes, we take a derivative of the local energy function. In the discrete case, the easiest way to realize such a derivative is to take the difference between two subsequent energy values. Furthermore, since we are interested in energy increases (and not decreases), we keep only the positive differences while setting the negative differences to zero. The latter step is known as half-wave rectification and is notated as:

\begin{equation} |r|_{\geq 0} := \frac{r+|r|}{2} = \left\{\begin{array}{ll} r, &\,\, \mbox{if $r\geq 0$,}\\ 0, &\,\, \mbox{if $r< 0$,} \end{array}\right. \end{equation}

for $r\in\mathbb{R}$. Altogether, we obtain an energy-based novelty function $\Delta_\mathrm{Energy}:\mathrm{Z}\to\mathbb{R}$ given by

\begin{equation} \Delta_\mathrm{Energy}(n):= |E_w^x(n+1)-E_w^x(n)|_{\geq 0} \end{equation}
In [2]:
#Differentiation and half-wave rectification
energy_local_diff = np.diff(energy_local)
energy_local_diff = np.concatenate((energy_local_diff, np.array([0])))
novelty_energy = np.copy(energy_local_diff)
novelty_energy[energy_local_diff < 0] = 0

libfmp.b.plot_signal(energy_local_diff, Fs, color='k', 
                    title='Discrete derivative (Hann window)')
fig, ax, line = libfmp.b.plot_signal(novelty_energy, Fs, color='k',
                    title='Energy-based novelty function (Hann window)')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration);
plt.tight_layout()

Note that this novelty function nicely indicates that the percussive beats contain a lot of energy. The low-energy hihat strokes in between, however, are not visible. Furthermore, note that the smoothing effect introduced by the bell-shaped Hann windowed is essential in this procedure when applying a simple framewise difference function. For example, using a rectangular window instead, the difference function reacts to small local fluctuations leading to a noisy energy function. This is demonstrated by the following figure.

In [3]:
# Use rectangular window
w = signal.boxcar(N)
x_square = x**2
energy_local = np.convolve(x_square, w**2, 'same')
energy_local_diff = np.diff(energy_local)
energy_local_diff = np.concatenate((energy_local_diff, np.array([0])))
novelty_energy = np.copy(energy_local_diff)
novelty_energy[energy_local_diff < 0] = 0

fig, ax, line = libfmp.b.plot_signal(energy_local, Fs, color='k',
                    title='Local energy function (rectangular window)')
libfmp.b.plot_signal(energy_local_diff, Fs, color='k', 
                    title='Discrete derivative (rectangular window)')
fig, ax, line = libfmp.b.plot_signal(novelty_energy, Fs, color='k', title='Energy-based novelty function (rectangular window)')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration);
plt.tight_layout()

Logarithmic Compression

To account for the fact that human perception of sound intensity is logarithmic in nature, one often applies a logarithm to the energy values, for example, by switching to the logarithmic decibel or by applying logarithmic compression. In the later case, we use the function $\Gamma_\gamma:\mathbb{R}_{>0} \to \mathbb{R}_{>0}$ defined by

\begin{equation} \Gamma_\gamma(v):=\log(1+ \gamma \cdot v), \end{equation}

where the positive constant $\gamma\in\mathbb{R}_{>0}$ regulates the degree of compression. The resulting novelty function is given by:

\begin{equation} \Delta_\mathrm{Energy}^\mathrm{Log}(n):= |\Gamma_\gamma(E_w^x(n+1))-\Gamma_\gamma(E_w^x(n))|_{\geq 0}. \end{equation}

Besides the window length $N$, we introduce in the following implementation also a hope-size parameter $H$, which allows for decreasing the feature sampling rate of the computed novelty. Furthermore, dividing by its maximum value, we normalize the novelty function. In the next figure, continuing our running example, one can observe that some of the weak hihat onsets become visible (see $t=1.3~\mathrm{sec}$ and $t=2.3~\mathrm{sec}$) when using a logarithmic novelty function. As a downside of logarithmic compression, some noise-like sound components may be amplified, possibly leading to spurious peaks. As shown in this example, logarithmic compression also shifted some peak positions forward, now preceding the reference annotations (shown in red). This may indicate that the reference annotations, which were generated manually by a human listener, correspond to slightly delay "perceived" onset positions.

In [4]:
def compute_novelty_energy(x, Fs=1, N=2048, H=128, gamma=10.0, norm=True):
    """Compute energy-based novelty function

    Notebook: C6/C6S1_NoveltyEnergy.ipynb

    Args:
        x (np.ndarray): Signal
        Fs (scalar): Sampling rate (Default value = 1)
        N (int): Window size (Default value = 2048)
        H (int): Hop size (Default value = 128)
        gamma (float): Parameter for logarithmic compression (Default value = 10.0)
        norm (bool): Apply max norm (if norm==True) (Default value = True)

    Returns:
        novelty_energy (np.ndarray): Energy-based novelty function
        Fs_feature (scalar): Feature rate
    """
    # x_power = x**2
    w = signal.hann(N)
    Fs_feature = Fs / H
    energy_local = np.convolve(x**2, w**2, 'same')
    energy_local = energy_local[::H]
    if gamma is not None:
        energy_local = np.log(1 + gamma * energy_local)
    energy_local_diff = np.diff(energy_local)
    energy_local_diff = np.concatenate((energy_local_diff, np.array([0])))
    novelty_energy = np.copy(energy_local_diff)
    novelty_energy[energy_local_diff < 0] = 0
    if norm:
        max_value = max(novelty_energy)
        if max_value > 0:
            novelty_energy = novelty_energy / max_value
    return novelty_energy, Fs_feature

N = 2048
H = 128
nov_1, Fs_nov = compute_novelty_energy(x, Fs=Fs, N=N, H=H, gamma=None)
nov_2, Fs_nov = compute_novelty_energy(x, Fs=Fs, N=N, H=H, gamma=1000)

fig, ax, line = libfmp.b.plot_signal(nov_1, Fs=Fs_nov, color='k', 
                    title='Novelty function (original)')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration);
fig, ax, line = libfmp.b.plot_signal(nov_2, Fs=Fs_nov, color='k',
                    title='Novelty function with logarithmic compression')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration);

Example: Note Onsets for Different Instruments

Another general problem in onset detection is energy fluctuation in nonsteady sounds as a result of vibrato or tremolo. Especially for purely energy-based procedures, amplitude modulations often lead to spurious peaks in the resulting novelty function. This is demonstrated by the following examples, which show the energy-based novelty function for the note C4 played by different instruments. While the novelty function shows a single clear peak in the case of a piano sound, there are many additional peaks in the case of a violin or flute sound. Furthermore, the relatively slow energy increase at the beginning of the violin sound my lead to a smeared and temporally inaccurate onset peak.

In [5]:
fn_ann = os.path.join('..', 'data', 'C6', 'FMP_C6_F04_NoteC4_PTVF.csv')
ann, label_keys = libfmp.c6.read_annotation_pos(fn_ann, label='onset', header=0)

fn_wav = os.path.join('..', 'data', 'C6','FMP_C6_F04_NoteC4_PTVF.wav')
x, Fs = librosa.load(fn_wav)
x_duration = len(x)/Fs
N = 2048
H = 256
nov, Fs_nov = compute_novelty_energy(x, Fs=Fs, N=N, H=H, gamma=None)

plt.figure(figsize=(9,4))
ax = plt.subplot(2,1,1)
fig, ax, line = libfmp.b.plot_signal(x, Fs, ax = ax, title='Waveform')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                    nontime_axis=True, time_min=0, time_max=x_duration)

ax = plt.subplot(2,1,2)
fig, ax, line = libfmp.b.plot_signal(nov, Fs=Fs_nov, ax = ax, color='k', 
                     title='Novelty function')
libfmp.b.plot_annotation_line(ann, ax=ax, label_keys=label_keys,
                nontime_axis=True, time_min=0, time_max=x_duration)
plt.ylim([0, 0.5]);
plt.tight_layout()

Further Notes

Acknowledgment: This notebook was created by Meinard Müller and Angel Villar-Corrales.
C0 C1 C2 C3 C4 C5 C6 C7 C8