Following Section 4.3.2 of [Müller, FMP, Springer 2015], we introduce in this notebook the concept of scape plots and use them to visualize the fitness of segments. These plots were originally introduced into the music processing area by Sapp and later applied to structure analysis by Müller and Jiang.
In the context of audio thumbnailing, we computed a fitness measure that assigns to each possible segment a fitness value expressing a segment-specific property. We now introduce a representation by which a segment-dependent property can be visualized in a compact and hierarchical way. Recall that a segment $\alpha=[s:t]\subseteq [1:N]$ is uniquely determined by its starting point $s$ and its end point $t$. Since any two numbers $s,t\in[1:N]$ with $s\leq t$ define a segment, there are $(N+1)N/2$ different segments. Instead of considering start and end points, each segment can also be uniquely described by its center
$$ c(\alpha):=(s+t)/2 $$and its length $|\alpha|$. Using the center to parameterize the horizontal axis and the length to parameterize the vertical axis, each segment can be represented by a point in a triangular representation. The shape is triangular because a segment of length $\ell$ can only have a center between $(\ell+1)/2$ and $N-(\ell-1)/2$, a range that shrinks as $\ell$ grows. This way, the segments are ordered from bottom to top in a hierarchical way according to their length. In particular, the top of this triangle corresponds to the unique segment of maximal length $N$, and the bottom points of the triangle correspond to the $N$ segments of length one (where the start point coincides with the end point). Furthermore, all segments $\alpha'\subseteq\alpha$ contained in a given segment $\alpha$ correspond to points in the triangular representation that lie in a subtriangle below the point given by $\alpha$.
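The following plain-Python sketch illustrates this parameterization. It enumerates all segments of $[1:N]$ (1-based, as in the text), maps each segment to its (center, length) point, and checks two things: that there are $(N+1)N/2$ segments, and that the mapping can be inverted. The function names are our own and serve illustration only.

```python
def enumerate_segments(N):
    """All segments [s:t] with 1 <= s <= t <= N, as (s, t) pairs (1-based as in the text)."""
    return [(s, t) for s in range(1, N + 1) for t in range(s, N + 1)]


def center_and_length(s, t):
    """Scape plot coordinates of [s:t]: center c = (s + t) / 2 and length t - s + 1."""
    return (s + t) / 2, t - s + 1


def segment_from_center_and_length(c, length):
    """Inverse mapping: s = c - (length - 1) / 2 and t = c + (length - 1) / 2."""
    half = (length - 1) / 2
    return int(c - half), int(c + half)


N = 9
segments = enumerate_segments(N)
print(len(segments), (N + 1) * N // 2)  # both equal 45 for N = 9

# Distinct segments map to distinct (center, length) points
points = {center_and_length(s, t) for s, t in segments}
print(len(points) == len(segments))  # True
```

Because the inverse mapping recovers $s$ and $t$ exactly, the point $(c(\alpha),|\alpha|)$ carries the same information as the pair $(s,t)$.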
Given a triangular representation of all segments within $[1:N]$, the example below visually indicates the following sets of segments (see Exercise 4.12 of [Müller, FMP, Springer 2015]):
(a) All segments having a minimal length above a given threshold $\theta\geq 0$
(b) All segments that contain a given segment $\alpha$
(c) All segments that are disjoint from a given segment $\alpha$
(d) All segments that contain the center $c(\alpha)$ of a given segment $\alpha$
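The four sets can be written as simple conditions on the start and end points. The sketch below expresses them in Python for 1-based segments $(s,t)$. For (a), we assume that "minimal length above $\theta$" means $|\alpha'|\geq\theta$; for a strict reading, replace `>=` with `>`. The function name `segment_sets` is hypothetical.

```python
def segment_sets(N, alpha, theta):
    """Return the sets (a)-(d) as lists of 1-based segments (s, t) within [1:N]."""
    s0, t0 = alpha
    c0 = (s0 + t0) / 2  # center of alpha; may be a half-integer
    segs = [(s, t) for s in range(1, N + 1) for t in range(s, N + 1)]
    return {
        'a': [(s, t) for s, t in segs if t - s + 1 >= theta],  # assumed reading: |alpha'| >= theta
        'b': [(s, t) for s, t in segs if s <= s0 and t >= t0],  # contains alpha
        'c': [(s, t) for s, t in segs if t < s0 or s > t0],     # disjoint from alpha
        'd': [(s, t) for s, t in segs if s <= c0 <= t],         # contains c(alpha)
    }


sets = segment_sets(N=9, alpha=(3, 5), theta=7)
for name, members in sets.items():
    print(name, len(members))
```

For $N=9$, $\alpha=[3:5]$, and $\theta=7$, the sets contain 6, 15, 13, and 24 segments, respectively. Setting the corresponding entries of a scape plot to one would visualize each set as a region of the triangle.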
The triangular representation can be used as a grid for visualizing a specific numeric property $\varphi(\alpha)\in\mathbb{R}$ that can be computed for all segments $\alpha$. For example, this property can be the fitness value as used for audio thumbnailing (see Section 4.3 of [Müller, FMP, Springer 2015]). Such a visual representation is also referred to as a scape plot representation of the property. More precisely, we define a scape plot $\Delta$ by setting
\begin{equation} \label{eq:AudioStru:Thumb:SPfitness} \Delta(c(\alpha),|\alpha|):=\varphi(\alpha) \end{equation}for each segment $\alpha$. As a toy example, we consider the function $\varphi$ defined by $\varphi(\alpha):= (t-s+1)/N$ for $\alpha=[s:t]$, which encodes the segment lengths relative to the total length $N$. In the following code cell, we provide a visualization function for plotting a scape plot representation of this function.
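In our implementation, we use an $N$-square matrix `SP` as the data structure to store the segment-dependent property $\varphi(\alpha)\in\mathbb{R}$. The first dimension of `SP` encodes the length and the second one encodes the start of a segment. Since indexing in Python starts with index `0`, one needs to be careful when interpreting the length dimension. In particular, the entry `SP[length_minus_one, start]` contains the information for the segment `[start:start+length_minus_one]` having length `length_minus_one + 1`, for `length_minus_one = 0, ..., N-1`. Furthermore, note that only the upper-left part of `SP` (including the anti-diagonal) is used, since a segment must end within the sequence, i.e., `start + length_minus_one <= N-1`. The function `visualize_scape_plot` converts this start-based layout into the center-based scape plot representation.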
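As a cross-check of this layout, the sketch below builds the toy property $\varphi(\alpha)=(t-s+1)/N$ with NumPy broadcasting rather than the nested loop used in the toy example. It then compares the two results. The name `toy_scape_plot` is our own. The sketch also confirms that exactly $N(N+1)/2$ entries of `SP` are used.

```python
import numpy as np


def toy_scape_plot(N):
    """Toy property phi([s:t]) = (t - s + 1) / N stored as SP[length_minus_one, start]."""
    length_minus_one = np.arange(N)[:, None]   # rows
    start = np.arange(N)[None, :]              # columns (0-based)
    valid = start + length_minus_one <= N - 1  # segment must end inside the sequence
    return np.where(valid, (length_minus_one + 1) / N, 0.0)


N = 9
SP_vec = toy_scape_plot(N)

# Same values as the nested loop used in the toy example
SP_loop = np.zeros((N, N))
for k in range(N):
    for s in range(N - k):
        SP_loop[k, s] = (k + 1) / N
print(np.allclose(SP_vec, SP_loop))   # True
print(np.count_nonzero(SP_vec))       # 45 = N(N+1)/2 used entries
```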
import numpy as np
import os, sys, librosa, math
from scipy import signal
from matplotlib import pyplot as plt
import matplotlib
import matplotlib.gridspec as gridspec
import IPython.display as ipd
import pandas as pd
from numba import jit
from matplotlib.colors import ListedColormap
sys.path.append('..')
import libfmp.b
import libfmp.c4
from libfmp.b import FloatingBox
%matplotlib inline
def visualize_scape_plot(SP, Fs=1, ax=None, figsize=(4, 3), title='',
                         xlabel='Center (seconds)', ylabel='Length (seconds)'):
    """Visualize scape plot
    Notebook: C4/C4S3_ScapePlot.ipynb
    Args:
        SP: Scape plot data (encoded as start-duration matrix)
        Fs: Sampling rate
        ax, figsize, title, xlabel, ylabel: Standard parameters for plotting
    Returns:
        fig, ax, im
    """
    fig = None
    if ax is None:
        fig = plt.figure(figsize=figsize)
        ax = plt.gca()
    N = SP.shape[0]
    # Re-index the start-duration matrix so that the columns encode segment centers
    SP_vis = np.zeros((N, N))
    for length_minus_one in range(N):
        for start in range(N-length_minus_one):
            center = start + length_minus_one//2
            SP_vis[length_minus_one, center] = SP[length_minus_one, start]
    extent = np.array([-0.5, (N-1)+0.5, -0.5, (N-1)+0.5])/Fs
    im = ax.imshow(SP_vis, cmap='hot_r', aspect='auto', origin='lower', extent=extent)
    # Draw the boundary of the triangle
    x = np.asarray(range(N))
    x_half_lower = x/2
    x_half_upper = x/2 + N/2 - 1/2
    ax.plot(x_half_lower/Fs, x/Fs, '-', linewidth=3, color='black')
    ax.plot(x_half_upper/Fs, np.flip(x, axis=0)/Fs, '-', linewidth=3, color='black')
    ax.plot(x/Fs, np.zeros(N)/Fs, '-', linewidth=3, color='black')
    ax.set_xlim([0, (N-1) / Fs])
    ax.set_ylim([0, (N-1) / Fs])
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    plt.tight_layout()
    plt.colorbar(im, ax=ax)
    return fig, ax, im
N = 9
SP = np.zeros((N, N))
for k in range(N):
    for s in range(N-k):
        length = k + 1
        SP[k, s] = length/N
plt.figure(figsize=(7, 3))
ax = plt.subplot(121)
plt.imshow(SP, cmap='hot_r', aspect='auto')
ax.set_title('Data structure (N = %d)' % N)
ax.set_xlabel('Segment start (samples)')
ax.set_ylabel('Length minus one (samples)')
plt.colorbar()
ax = plt.subplot(122)
fig, ax, im = visualize_scape_plot(SP, Fs=1, ax=ax, title='Scape plot visualization',
                                   xlabel='Segment center (samples)', ylabel='Length minus one (samples)')
We now use the scape plot representation to visualize the fitness measure for all segments. As a first example, we continue with our Brahms example. Recall that this piece has the musical structure $A_1A_2B_1B_2CA_3B_3B_4D$. Using the same settings as in the FMP notebook on audio thumbnailing, we compute a (normalized) self-similarity matrix (SSM), which serves as input to our fitness computation.
fn_wav = os.path.join('..', 'data', 'C4', 'FMP_C4_Audio_Brahms_HungarianDances-05_Ormandy.wav')
tempo_rel_set = libfmp.c4.compute_tempo_rel_set(0.66, 1.5, 5)
penalty = -2
x, x_duration, X, Fs_feature, S, I = libfmp.c4.compute_sm_from_filename(fn_wav, L=41, H=10,
    L_smooth=8, tempo_rel_set=tempo_rel_set, penalty=penalty, thresh=0.15)
S = libfmp.c4.normalization_properties_ssm(S)
fn_ann_color = 'FMP_C4_Audio_Brahms_HungarianDances-05_Ormandy.csv'
fn_ann = os.path.join('..', 'data', 'C4', fn_ann_color)
ann_frames, color_ann = libfmp.c4.read_structure_annotation(fn_ann, fn_ann_color, Fs=Fs_feature)
cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)
fig, ax, im = libfmp.c4.plot_ssm_ann(S, ann_frames, Fs=1, color_ann=color_ann, cmap=cmap_penalty,
                                     xlabel='Time (frames)', ylabel='Time (frames)')
In the next code cell, we compute the fitness measure $\varphi(\alpha)\in\mathbb{R}$ (as well as the score $\sigma(\alpha)$, normalized score $\bar{\sigma}(\alpha)$, coverage $\gamma(\alpha)$, and normalized coverage $\bar{\gamma}(\alpha)$) for all segments $\alpha$.
# @jit(nopython=True)
def compute_fitness_scape_plot(S):
    """Compute scape plot for fitness and other measures
    Notebook: C4/C4S3_ScapePlot.ipynb
    Args:
        S: Self-similarity matrix
    Returns:
        SP_all: List containing five different scape plots for five measures
            (fitness, score, normalized score, coverage, normalized coverage)
    """
    N = S.shape[0]
    SP_fitness = np.zeros((N, N))
    SP_score = np.zeros((N, N))
    SP_score_n = np.zeros((N, N))
    SP_coverage = np.zeros((N, N))
    SP_coverage_n = np.zeros((N, N))
    # Loop over all segments [start:start+length_minus_one]
    for length_minus_one in range(N):
        for start in range(N-length_minus_one):
            S_seg = S[:, start:start+length_minus_one+1]
            D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)
            path_family = libfmp.c4.compute_optimal_path_family(D)
            fitness, score, score_n, coverage, coverage_n, path_family_length = libfmp.c4.compute_fitness(
                path_family, score, N)
            SP_fitness[length_minus_one, start] = fitness
            SP_score[length_minus_one, start] = score
            SP_score_n[length_minus_one, start] = score_n
            SP_coverage[length_minus_one, start] = coverage
            SP_coverage_n[length_minus_one, start] = coverage_n
    SP_all = [SP_fitness, SP_score, SP_score_n, SP_coverage, SP_coverage_n]
    return SP_all
SP_all = compute_fitness_scape_plot(S)
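Computing all five scape plots is expensive, because `compute_fitness_scape_plot` runs a separate dynamic program for each of the $N(N+1)/2$ segments. We assume that the accumulated score matrix for a segment $\alpha$ has roughly $N\cdot|\alpha|$ cells. This is an assumption about `libfmp.c4.compute_accumulated_score_matrix`, whose internals are not reproduced here. Under it, the total work is $\sum_{\ell=1}^{N}(N-\ell+1)\cdot N\cdot\ell = N^2(N+1)(N+2)/6$, i.e., of order $N^4$, which may explain the commented-out numba decorator. The sketch checks this closed form numerically.

```python
def dp_cells_loop(N):
    """Sum of N * |alpha| over all segments alpha of [1:N] (assumed DP cost per segment)."""
    total = 0
    for length in range(1, N + 1):
        num_segments = N - length + 1  # number of possible start positions
        total += num_segments * N * length
    return total


def dp_cells_closed_form(N):
    """Closed form N^2 (N + 1) (N + 2) / 6 of the same sum."""
    return N * N * (N + 1) * (N + 2) // 6


for N in [10, 100, 500]:
    print(N, dp_cells_loop(N), dp_cells_closed_form(N))
```

For a hypothetical sequence of $N=500$ frames, this amounts to about $10^{10}$ cell updates.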
Next, we visualize the fitness values $\varphi(\alpha)$ using a scape plot representation, which we also refer to as a fitness scape plot. In addition, we plot the fitness-maximizing segment, also called the audio thumbnail,
$$ \alpha^\ast := \underset{\alpha}{\mathrm{argmax}} \,\, \varphi(\alpha), $$along with its path family and induced segments. Note that the thumbnail and the induced segments are represented by points (blue and green points, respectively) in the scape plot representation.
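In the start-duration layout, the argmax must range only over the used entries of `SP`. The function `seg_max_SP` below applies `np.argmax` to the whole matrix, so it also considers the unused lower-right entries, which hold zeros. It can therefore return a non-existent segment only if all valid values are negative, which we do not expect for the measures in this notebook. The following sketch shows a masked variant as a cheap safeguard. `seg_max_valid` is our own, hypothetical name.

```python
import numpy as np


def seg_max_valid(SP):
    """Return the 0-based segment [start, end] maximizing SP over used entries only."""
    N = SP.shape[0]
    length_minus_one = np.arange(N)[:, None]
    start = np.arange(N)[None, :]
    # Unused entries (segments ending after N-1) are excluded via -inf
    masked = np.where(start + length_minus_one <= N - 1, SP, -np.inf)
    k, s = np.unravel_index(np.argmax(masked), SP.shape)
    return [int(s), int(s + k)]


# Toy example with negative values in the used region
N = 4
SP = np.zeros((N, N))
for k in range(N):
    for s in range(N - k):
        SP[k, s] = -1.0
SP[1, 2] = -0.5
naive = np.unravel_index(np.argmax(SP), SP.shape)
print('unmasked argmax (length_minus_one, start):', naive)  # points to an unused entry
print(seg_max_valid(SP))  # [2, 3]
```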
def seg_max_SP(SP):
    """Return segment with maximal value in SP
    Notebook: C4/C4S3_ScapePlot.ipynb
    Args:
        SP: Scape plot
    Returns:
        seg: Segment [start_index:end_index]
    """
    N = SP.shape[0]
    arg_max = np.argmax(SP)
    # Row index encodes length minus one, column index encodes start
    ind_max = np.unravel_index(arg_max, [N, N])
    seg = [ind_max[1], ind_max[1]+ind_max[0]]
    return seg
def plot_seg_in_sp(ax, seg, S=None, Fs=1):
    """Plot segment and induced segments as points in SP visualization
    Notebook: C4/C4S3_ScapePlot.ipynb
    Args:
        ax: Axis for image
        seg: Segment [start_index:end_index]
        S: Self-similarity matrix
        Fs: Sampling rate
    """
    if S is not None:
        # Induced segments (green points)
        S_seg = S[:, seg[0]:seg[1]+1]
        D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)
        path_family = libfmp.c4.compute_optimal_path_family(D)
        segment_family, coverage = libfmp.c4.compute_induced_segment_family_coverage(path_family)
        length = segment_family[:, 1] - segment_family[:, 0] + 1
        center = segment_family[:, 0] + length//2
        ax.scatter(center/Fs, length/Fs, s=64, c='white', zorder=9999)
        ax.scatter(center/Fs, length/Fs, s=16, c='lime', zorder=9999)
    # Segment itself (blue point)
    length = seg[1] - seg[0] + 1
    center = seg[0] + length//2
    ax.scatter(center/Fs, length/Fs, s=64, c='white', zorder=9999)
    ax.scatter(center/Fs, length/Fs, s=16, c='blue', zorder=9999)
def plot_sp_ssm(SP, seg, S, ann, color_ann=[], title='', figsize=(5, 4)):
    """Visualization of SP and SSM
    Notebook: C4/C4S3_ScapePlot.ipynb"""
    float_box = libfmp.b.FloatingBox()
    fig, ax, im = visualize_scape_plot(SP, figsize=figsize, title=title,
                                       xlabel='Center (frames)', ylabel='Length (frames)')
    plot_seg_in_sp(ax, seg, S)
    float_box.add_fig(fig)
    penalty = np.min(S)
    cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)
    fig, ax, im = libfmp.c4.plot_ssm_ann_optimal_path_family(
        S, ann, seg, color_ann=color_ann, fontsize=8, cmap=cmap_penalty, figsize=(4, 4),
        ylabel='Time (frames)')
    float_box.add_fig(fig)
    float_box.show()
def check_segment(seg, S):
    """Prints properties of segments with regard to SSM S
    Notebook: C4/C4S3_ScapePlot.ipynb
    Args:
        seg: Segment [start_index:end_index]
        S: Self-similarity matrix
    Returns:
        path_family: Optimal path family
    """
    N = S.shape[0]
    S_seg = S[:, seg[0]:seg[1]+1]
    D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)
    path_family = libfmp.c4.compute_optimal_path_family(D)
    fitness, score, score_n, coverage, coverage_n, path_family_length = libfmp.c4.compute_fitness(
        path_family, score, N)
    segment_family, coverage2 = libfmp.c4.compute_induced_segment_family_coverage(path_family)
    print('Segment (alpha):', seg)
    print('Length of segment:', seg[-1]-seg[0]+1)
    print('Length of feature sequence:', N)
    print('Induced segment path family:\n', segment_family)
    print('Fitness: %0.10f' % fitness)
    print('Score: %0.10f' % score)
    print('Normalized score: %0.10f' % score_n)
    print('Coverage: %d, %d' % (coverage, coverage2))
    print('Normalized coverage: %0.10f' % coverage_n)
    print('Length of all paths of family: %d' % path_family_length)
    return path_family
figsize=(5,4)
SP = SP_all[0]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Scape plot: Fitness', figsize=figsize)
plt.show()
path_family = check_segment(seg, S)
The resulting fitness scape plot reflects the musical structure in a hierarchical way. The thumbnail segment is $\alpha^\ast=[175:197]$, which musically corresponds to the $B_4$-part. Its coordinates in the scape plot are given by the center $c(\alpha^\ast)=186$ and the length $|\alpha^\ast|=23$. The induced segment family consists of the four $B$-part segments. Note that all four $B$-part segments have almost the same fitness and lead to more or less the same segment family. Recall that the introduced fitness measure slightly favors shorter segments. Therefore, since the $B_4$-part is played faster than, e.g., the $B_1$-part in this recording, the fitness measure favors the $B_4$-part segment over the $B_1$-part segment. In other words, our procedure chooses the shortest among the most representative segments as thumbnail.
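The scape plot coordinates of the thumbnail follow directly from the definitions of center and length. Conversely, the point (186, 23) determines the segment again:

$$
\begin{aligned}
c(\alpha^\ast) &= (175+197)/2 = 372/2 = 186,\\
|\alpha^\ast| &= 197-175+1 = 23,\\
s &= 186-(23-1)/2 = 175, \qquad t = 186+(23-1)/2 = 197.
\end{aligned}
$$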
Next, we illustrate that, in the definition of the fitness measure (see also Section 4.3.1.3 of [Müller, FMP, Springer 2015]), the normalization of score and coverage as well as the combination (harmonic mean) of the two measures are of crucial importance. To this end, we look at the scape plots of the various measures (as well as the measure-maximizing segments) individually. We start with the score measure $\sigma$. The score-maximizing segment is $\alpha=[1:N]$, which is the entire recording. This is not surprising: the score includes the trivial self-explanation along the main diagonal, so it tends to grow with the segment length.
SP = SP_all[1]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Scape plot: Score', figsize=figsize)
path_family = check_segment(seg, S)
Subtracting trivial self-explanations and normalizing with regard to the length of the optimal path family yields the normalized score $\bar{\sigma}$. Since this measure expresses the average score of a path family without reflecting how much of the audio material is actually covered, many of the short segments have a relatively high normalized score. Using such a measure would typically result in false-positive segments of short length. This is also demonstrated by the following scape plot and the $\bar{\sigma}$-maximizing path family.
SP = SP_all[2]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Scape plot: Normalized score', figsize=figsize)
path_family = check_segment(seg, S)
The next figure shows the scape plot for the coverage measure $\gamma$. As for the score, the coverage-maximizing segment is $\alpha=[1:N]$, which is the entire recording.
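One way to read the coverage $\gamma(\alpha)$ is as the number of frames covered by the induced segments. The sketch below computes this quantity as the size of the union of inclusive `(start, end)` intervals. We assume the same row format that `plot_seg_in_sp` uses for `segment_family`. This sketch does not reproduce `libfmp.c4.compute_induced_segment_family_coverage`, which may instead sum the segment lengths; both give the same value when the induced segments do not overlap. Under this reading, the coverage can never exceed $N$. It also explains the observation above: the trivial self-explanation of $[1:N]$ already covers all frames. The function name is hypothetical.

```python
def coverage_of_segment_family(segment_family):
    """Number of frames covered by a family of inclusive segments [start, end]."""
    covered = set()
    for start, end in segment_family:
        covered.update(range(start, end + 1))
    return len(covered)


# Hypothetical induced segment families (0-based, inclusive bounds)
disjoint_family = [(2, 5), (10, 13), (20, 23)]
overlapping_family = [(0, 4), (3, 6)]
print(coverage_of_segment_family(disjoint_family))     # 12 (sum of lengths)
print(coverage_of_segment_family(overlapping_family))  # 7 (frames 0..6)

# The full segment [0:N-1] covers all N frames
N = 30
print(coverage_of_segment_family([(0, N - 1)]) == N)    # True
```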
SP = SP_all[3]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Scape plot: Coverage', figsize=figsize)
path_family = check_segment(seg, S)
Subtracting trivial self-explanations and normalizing with regard to the length $N$ of the feature sequence yields the normalized coverage $\bar{\gamma}$. As can be seen from the following scape plot along with the $\bar{\gamma}$-maximizing segment, the coverage measures a property that is conceptually different from the score. As opposed to the normalized score, the normalized coverage typically favors segments whose induced segment family covers large portions of the input sequence.
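The fitness combines the two normalized measures by their harmonic mean. The sketch below takes hypothetical values of $\bar{\sigma}$ and $\bar{\gamma}$ as given and compares the harmonic mean with the arithmetic mean. The actual normalization happens inside `libfmp.c4.compute_fitness` and is not reproduced here. The small `eps` term is our own guard against division by zero.

```python
def harmonic_mean(a, b, eps=1e-16):
    """Harmonic mean 2ab / (a + b); eps (our addition) avoids division by zero."""
    return 2 * a * b / (a + b + eps)


# Hypothetical pairs (normalized score, normalized coverage)
cases = {
    'high score, low coverage': (0.9, 0.1),
    'low score, high coverage': (0.1, 0.9),
    'balanced': (0.6, 0.6),
}
for name, (score_n, coverage_n) in cases.items():
    print('%s: harmonic %.2f, arithmetic %.2f'
          % (name, harmonic_mean(score_n, coverage_n), (score_n + coverage_n) / 2))
```

The harmonic mean rates both unbalanced pairs at 0.18, well below the balanced pair at 0.60, whereas the arithmetic mean would rate them at 0.50. A segment therefore needs both a high normalized score and a high normalized coverage to obtain a high fitness.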
SP = SP_all[4]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Scape plot: Normalized coverage', figsize=figsize)
path_family = check_segment(seg, S)
fn_ann_color = 'FMP_C4_Audio_Beatles_YouCantDoThat.csv'
fn_ann = os.path.join('..', 'data', 'C4', fn_ann_color)
fn_wav = os.path.join('..', 'data', 'C4', 'FMP_C4_Audio_Beatles_YouCantDoThat.wav')
tempo_rel_set = libfmp.c4.compute_tempo_rel_set(0.66, 1.5, 5)
penalty = -2
x, x_duration, X, Fs_feature, S, I = libfmp.c4.compute_sm_from_filename(fn_wav, L=21, H=10,
    L_smooth=8, tempo_rel_set=tempo_rel_set, penalty=penalty, thresh=0.15)
S = libfmp.c4.normalization_properties_ssm(S)
ann_frames, color_ann = libfmp.c4.read_structure_annotation(fn_ann, fn_ann_color, Fs=Fs_feature)
color_ann = {'I': [1, 0, 0, 0.2], 'V': [0, 1, 0, 0.2], 'B': [0, 0, 1, 0.2], '': [1, 1, 1, 0.2]}
cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)
fig, ax, im = libfmp.c4.plot_ssm_ann(S, ann_frames, Fs=1, color_ann=color_ann, cmap=cmap_penalty,
                                     xlabel='Time (frames)', ylabel='Time (frames)')
SP_all = compute_fitness_scape_plot(S)
figsize=(5,4)
SP = SP_all[0]
seg = seg_max_SP(SP)
plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann,
            title='Fitness scape plot', figsize=figsize)
path_family = check_segment(seg, S)
Within the field of music processing, scape plots were originally introduced by Craig Sapp to hierarchically represent harmony in musical scores. In this notebook, we used this concept to visualize the fitness measure for all segments in a compact and hierarchical way. This allowed us not only to gain an overview of the repetitive structure of a music recording, but also to better understand the construction of the fitness measure by looking at score and coverage values separately. In particular, we demonstrated the following:
The fitness scape plot can be further refined by indicating the relations between different segments using suitable color codings. Müller and Jiang use the lightness component of the color to indicate the fitness of the encoded segment and the hue component of the color to reveal the relations between different segments. The result of this visualization for our Brahms example is shown in the following figure.
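The idea of encoding fitness by lightness and segment relations by hue can be sketched with Python's `colorsys` module. The mapping below is only an illustrative assumption. Hue is taken from a hypothetical relation array `SP_hue` with values in $[0,1)$, and lightness is set to $1-0.5\,\varphi$, so zero fitness appears white and maximal fitness appears as the fully saturated hue. How Müller and Jiang actually derive the hues from segment relations is not reproduced here.

```python
import colorsys

import numpy as np


def color_scape_plot(SP_fitness, SP_hue):
    """RGB image in the (length_minus_one, start) layout of SP.

    Assumed mapping (illustrative only): hue = SP_hue in [0, 1), lightness = 1 - 0.5 * fitness,
    so fitness 0 is white and fitness 1 is the fully saturated hue. Unused entries stay white.
    """
    N = SP_fitness.shape[0]
    rgb = np.ones((N, N, 3))
    for length_minus_one in range(N):
        for start in range(N - length_minus_one):
            phi = float(np.clip(SP_fitness[length_minus_one, start], 0.0, 1.0))
            hue = float(SP_hue[length_minus_one, start])
            rgb[length_minus_one, start] = colorsys.hls_to_rgb(hue, 1.0 - 0.5 * phi, 1.0)
    return rgb


# Toy input: fitness grows with length; two hypothetical relation groups by start position
N = 6
SP_fitness = np.zeros((N, N))
SP_hue = np.zeros((N, N))
for k in range(N):
    for s in range(N - k):
        SP_fitness[k, s] = (k + 1) / N
        SP_hue[k, s] = 0.0 if s < N // 2 else 2 / 3  # red vs. blue
rgb = color_scape_plot(SP_fitness, SP_hue)
```

The resulting array is still in the start-based layout. To obtain the triangular view, its columns would have to be re-indexed by segment center, as done in `visualize_scape_plot`, before passing it to `plt.imshow`.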