{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\"FMP\"\n", "\"AudioLabs\"\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\"C4\"\n", "

Scape Plot Representation

\n", "
\n", "\n", "
\n", "\n", "

\n", "Following Section 4.3.2 of [Müller, FMP, Springer 2015], we introduce in this notebook the concept of scape plots and apply them for visualizing the fitness of segments. These plots were originally introduced into the music processing area by Sapp and then applied for structure analysis by Müller and Jiang.\n", "\n", "

\n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Triangular Representation of Segments\n", "\n", "In the context of [audio thumbnailing](../C4/C4S3_AudioThumbnailing.html), we computed a [fitness measure](../C4/C4S3_AudioThumbnailing.html) that assigns to each possible segment a fitness value expressing a segment-specific property. We now introduce a representation by which a segment-dependent property can be visualized in a compact and hierarchical way. Recall that a **segment** $\\alpha=[s:t]\\subseteq [1:N]$ is uniquely determined by its starting point $s$ and its end point $t$. Since any two numbers $s,t\\in[1:N]$ with $s\\leq t$ define a segment, there are $(N+1)N/2$ different segments. Instead of considering start and end points, each segment can also be uniquely described by its center \n", "\n", "$$\n", " c(\\alpha):=(s+t)/2\n", "$$\n", "\n", "and its length $|\\alpha|$. Using the center to parameterize a horizontal axis and the length to parameterize the height, each segment can be represented by a point in a **triangular representation**. This way, the set of segments are ordered from bottom to top in a hierarchical way according to their length. In particular, the top of this triangle corresponds to the unique segment of maximal length $N$ and the bottom points of the triangle correspond to the $N$ segments of length one (where the start point coincides with the end point). Furthermore, all segments $\\alpha'\\subseteq\\alpha$ contained in a given segment $\\alpha$ correspond to points in the triangular representation that lie in a subtriangle below the point given by $\\alpha$\n", "\n", "\"FMP_C8_F19\"\n", "\n", "Given a triangular representation of all segments within $[1:N]$, the following example visually indicates the following sets of segments (see Exercise 4.12 of [Müller, FMP, Springer 2015]):\n", "\n", "(a) All segments having a minimal length above a given threshold $\\theta\\geq 0$
\n", "(b) All segments that contain a given segment $\\alpha$
\n", "(c) All segments that are disjoint to a given segment $\\alpha$
\n", "(d) All segments that contain the center $c(\\alpha)$ of a given segment $\\alpha$\n", "\n", "\n", "\"FMP_C4_E12.png\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scape Plot\n", "\n", "The triangular representation can be used as a grid for visualizing a specific numeric property $\\varphi(\\alpha)\\in\\mathbb{R}$ that can be computed for all segments $\\alpha$. This property, for example, can be the fitness values as used for audio thumbnailing (see Section 4.3 of [Müller, FMP, Springer 2015]). Such a visual representation is also referred to as **scape plot** representation of the property. More precisely, we define a scape plot $\\Delta$ by setting \n", "\n", "\\begin{equation}\n", "\\label{eq:AudioStru:Thumb:SPfitness}\n", " \\Delta(c(\\alpha),|\\alpha|):=\\varphi(\\alpha)\n", "\\end{equation}\n", "\n", "for segment $\\alpha$. As a toy example, we consider the function $\\varphi$ defined by $\\varphi(\\alpha):= (t-s+1)/N$ for $\\alpha=[s:t]$, which encodes the segment lengths relative to the total length $N$. In the following code cell, we provide a visualization function for plotting a scape plot representation of this function. \n", "\n", "
\n", "Note: In our implementation, we use an N-square matrix SP as data structure to store the segment-dependent property $\\varphi(\\alpha)\\in\\mathbb{R}$. We use the first dimension of SP to encode the length and the second one to encode the start position of the segment (the conversion from start position to center is done only in the visualization function). Since indexing in Python starts with index 0, one needs to be careful when interpreting the length dimension. In particular, the entry SP[length_minus_one, start] contains the information for the segment having length length_minus_one + 1 for length_minus_one = 0, ..., N-1. Furthermore, note that only the upper-left part (including the diagonal) of SP is used.\n", "
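The index convention described in the note can be sketched in a few lines of plain Python. The helper names `index_to_segment`, `segment_center_length`, and `valid_indices` are hypothetical and only serve this illustration; they are not part of libfmp. Segments are given with 0-based, inclusive start and end indices.

```python
def index_to_segment(length_minus_one, start):
    # Hypothetical helper: map a matrix index of SP to the segment [s:t]
    # (0-based, inclusive start and end index)
    s = start
    t = start + length_minus_one
    return s, t

def segment_center_length(s, t):
    # Center c = (s+t)/2 and length t-s+1 as defined in the text
    return (s + t) / 2, t - s + 1

def valid_indices(N):
    # All entries of the N-square matrix that are actually used:
    # start + length_minus_one must not exceed N-1
    return [(k, s) for k in range(N) for s in range(N - k)]

N = 9
indices = valid_indices(N)
num_segments = len(indices)        # N*(N+1)/2 entries are used
s, t = index_to_segment(3, 2)      # entry SP[3, 2] describes the segment [2:5]
center, length = segment_center_length(s, t)
print(num_segments, (s, t), center, length)
```

For $N=9$, the sketch counts 45 used entries, which agrees with $(N+1)N/2$. Note that the center of a segment of even length is not an integer; the visualization function in this notebook uses `start + length_minus_one//2`, that is, the center rounded down to an integer column.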
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:51:52.381981Z", "iopub.status.busy": "2024-02-15T08:51:52.381673Z", "iopub.status.idle": "2024-02-15T08:51:55.256927Z", "shell.execute_reply": "2024-02-15T08:51:55.256125Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import os, sys, librosa, math\n", "from scipy import signal\n", "from matplotlib import pyplot as plt\n", "import matplotlib\n", "import matplotlib.gridspec as gridspec\n", "import IPython.display as ipd\n", "import pandas as pd\n", "from numba import jit\n", "from matplotlib.colors import ListedColormap\n", "sys.path.append('..')\n", "import libfmp.b\n", "import libfmp.c4\n", "from libfmp.b import FloatingBox\n", "\n", "%matplotlib inline\n", "\n", "def visualize_scape_plot(SP, Fs=1, ax=None, figsize=(4, 3), title='',\n", " xlabel='Center (seconds)', ylabel='Length (seconds)', interpolation='nearest'):\n", " \"\"\"Visualize scape plot\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " SP: Scape plot data (encodes as start-duration matrix)\n", " Fs: Sampling rate (Default value = 1)\n", " ax: Used axes (Default value = None)\n", " figsize: Figure size (Default value = (4, 3))\n", " title: Title of figure (Default value = '')\n", " xlabel: Label for x-axis (Default value = 'Center (seconds)')\n", " ylabel: Label for y-axis (Default value = 'Length (seconds)')\n", " interpolation: Interpolation value for imshow (Default value = 'nearest')\n", "\n", " Returns:\n", " fig: Handle for figure\n", " ax: Handle for axes\n", " im: Handle for imshow\n", " \"\"\"\n", " fig = None\n", " if ax is None:\n", " fig = plt.figure(figsize=figsize)\n", " ax = plt.gca()\n", " N = SP.shape[0]\n", " SP_vis = np.zeros((N, N))\n", " for length_minus_one in range(N):\n", " for start in range(N-length_minus_one):\n", " center = start + length_minus_one//2\n", " SP_vis[length_minus_one, center] = SP[length_minus_one, start]\n", "\n", " extent = np.array([-0.5, (N-1)+0.5, -0.5, (N-1)+0.5]) / Fs\n", " im = plt.imshow(SP_vis, cmap='hot_r', aspect='auto', origin='lower', extent=extent, interpolation=interpolation)\n", " x = np.asarray(range(N))\n", " 
x_half_lower = x/2\n", " x_half_upper = x/2 + N/2 - 1/2\n", " plt.plot(x_half_lower/Fs, x/Fs, '-', linewidth=3, color='black')\n", " plt.plot(x_half_upper/Fs, np.flip(x, axis=0)/Fs, '-', linewidth=3, color='black')\n", " plt.plot(x/Fs, np.zeros(N)/Fs, '-', linewidth=3, color='black')\n", " plt.xlim([0, (N-1) / Fs])\n", " plt.ylim([0, (N-1) / Fs])\n", " ax.set_title(title)\n", " ax.set_xlabel(xlabel)\n", " ax.set_ylabel(ylabel)\n", " plt.tight_layout()\n", " plt.colorbar(im, ax=ax)\n", " return fig, ax, im\n", "\n", "N = 9\n", "SP = np.zeros((N,N))\n", "for k in range(N):\n", " for s in range(N-k):\n", " length = k + 1\n", " SP[k, s]= length/N \n", "\n", "plt.figure(figsize=(7,3))\n", "ax = plt.subplot(121)\n", "plt.imshow(SP, cmap='hot_r', aspect='auto') \n", "ax.set_title('Data structure (N = %d)'%N)\n", "ax.set_xlabel('Segment start (samples)')\n", "ax.set_ylabel('Length minus one (samples)')\n", "plt.colorbar() \n", "\n", "ax = plt.subplot(122)\n", "fig, ax, im = visualize_scape_plot(SP, Fs=1, ax=ax, title='Scape plot visualization', \n", " xlabel='Segment center (samples)', ylabel='Length minus one (samples)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fitness Scape Plot\n", "\n", "We now use the scape plot representation for visualizing the fitness measure for all segments. As first example, we continue with our [Brahms example](../C4/C4S1_MusicStructureGeneral.html). Recall that this piece has the musical structure $A_1A_2B_1B_2CA_3B_3B_4D$. Using settings as in [FMP notebook on audio thumbnailing](../C4/C4S3_AudioThumbnailing.html), we compute a (normalized) self-similarity matrix (SSM), which serves as input of our fitness computation. 
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:51:55.292905Z", "iopub.status.busy": "2024-02-15T08:51:55.292585Z", "iopub.status.idle": "2024-02-15T08:52:00.585761Z", "shell.execute_reply": "2024-02-15T08:52:00.585138Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fn_wav = os.path.join('..', 'data', 'C4', 'FMP_C4_Audio_Brahms_HungarianDances-05_Ormandy.wav')\n", "tempo_rel_set = libfmp.c4.compute_tempo_rel_set(0.66, 1.5, 5)\n", "penalty = -2\n", "x, x_duration, X, Fs_feature, S, I = libfmp.c4.compute_sm_from_filename(fn_wav, L=41, H=10, \n", " L_smooth=8, tempo_rel_set=tempo_rel_set, penalty=penalty, thresh= 0.15)\n", "S = libfmp.c4.normalization_properties_ssm(S)\n", " \n", "fn_ann_color = 'FMP_C4_Audio_Brahms_HungarianDances-05_Ormandy.csv'\n", "fn_ann = os.path.join('..', 'data', 'C4', fn_ann_color)\n", "ann_frames, color_ann = libfmp.c4.read_structure_annotation(fn_ann, fn_ann_color, Fs=Fs_feature)\n", "\n", "cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)\n", "fig, ax, im = libfmp.c4.plot_ssm_ann(S, ann_frames, Fs=1, color_ann=color_ann, cmap=cmap_penalty, \n", " xlabel='Time (frames)', ylabel='Time (frames)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next code cell, we compute the fitness measure $\\varphi(\\alpha)\\in\\mathbb{R}$ (as well as the score $\\sigma(\\alpha)$, normalized score $\\bar{\\sigma}(\\alpha)$, coverage $\\gamma(\\alpha)$, and normalized coverage $\\bar{\\gamma}(\\alpha)$) for all segments $\\alpha$. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:00.588381Z", "iopub.status.busy": "2024-02-15T08:52:00.588189Z", "iopub.status.idle": "2024-02-15T08:52:03.820113Z", "shell.execute_reply": "2024-02-15T08:52:03.819353Z" } }, "outputs": [], "source": [ "# @jit(nopython=True)\n", "def compute_fitness_scape_plot(S):\n", " \"\"\"Compute scape plot for fitness and other measures\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " S (np.ndarray): Self-similarity matrix\n", "\n", " Returns:\n", " SP_all (np.ndarray): Vector containing five different scape plots for five measures\n", " (fitness, score, normalized score, coverage, normlized coverage)\n", " \"\"\"\n", " N = S.shape[0]\n", " SP_fitness = np.zeros((N, N))\n", " SP_score = np.zeros((N, N))\n", " SP_score_n = np.zeros((N, N))\n", " SP_coverage = np.zeros((N, N))\n", " SP_coverage_n = np.zeros((N, N))\n", "\n", " for length_minus_one in range(N):\n", " for start in range(N-length_minus_one):\n", " S_seg = S[:, start:start+length_minus_one+1]\n", " D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)\n", " path_family = libfmp.c4.compute_optimal_path_family(D)\n", " fitness, score, score_n, coverage, coverage_n, path_family_length = libfmp.c4.compute_fitness(\n", " path_family, score, N)\n", " SP_fitness[length_minus_one, start] = fitness\n", " SP_score[length_minus_one, start] = score\n", " SP_score_n[length_minus_one, start] = score_n\n", " SP_coverage[length_minus_one, start] = coverage\n", " SP_coverage_n[length_minus_one, start] = coverage_n\n", " SP_all = [SP_fitness, SP_score, SP_score_n, SP_coverage, SP_coverage_n]\n", " return SP_all\n", "\n", "SP_all = compute_fitness_scape_plot(S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we visualize the fitness values $\\varphi(\\alpha)$ using a scape plot representation, which we also refer to as **fitness scape plot**. 
Furthermore, we plot the fitness-maximizing segment or [audio thumbnail](../C4/C4S3_AudioThumbnailing.html)\n", "\n", "$$\n", " \\alpha^\\ast := \\underset{\\alpha}{\\mathrm{argmax}} \\,\\, \\varphi(\\alpha)\n", "$$\n", "\n", "along with its path family and induced segments. Note that the thumbnail as well as the induced segments are represented by points (blue and green points, respectively) in the scape plot representation. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:03.823401Z", "iopub.status.busy": "2024-02-15T08:52:03.823184Z", "iopub.status.idle": "2024-02-15T08:52:04.320649Z", "shell.execute_reply": "2024-02-15T08:52:04.319983Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [175, 197]\n", "Length of segment: 23\n", "Length of feature sequence: 205\n", "Induced segment path family:\n", " [[ 41 67]\n", " [ 68 90]\n", " [150 175]\n", " [176 197]]\n", "Fitness: 0.4286698296\n", "Score: 68.0249476604\n", "Normalized score: 0.5175281340\n", "Coverage: 98, 98\n", "Normalized coverage: 0.3658536585\n", "Length of all paths of family: 87\n" ] } ], "source": [ "def seg_max_sp(SP):\n", " \"\"\"Return segment with maximal value in SP\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " SP (np.ndarray): Scape plot\n", "\n", " Returns:\n", " seg (tuple): Segment ``(start_index, end_index)``\n", " \"\"\"\n", " N = SP.shape[0]\n", " # value_max = np.max(SP)\n", " arg_max = np.argmax(SP)\n", " ind_max = np.unravel_index(arg_max, [N, N])\n", " seg = [ind_max[1], ind_max[1]+ind_max[0]]\n", " return seg\n", "\n", "def plot_seg_in_sp(ax, seg, S=None, Fs=1):\n", " \"\"\"Plot segment and induced segements as points in SP visualization\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " ax: Axis for image\n", " seg: Segment ``(start_index, end_index)``\n", " S: Self-similarity matrix (Default value = None)\n", " Fs: Sampling rate (Default value = 1)\n", " \"\"\"\n", " if S is not None:\n", " S_seg = S[:, seg[0]:seg[1]+1]\n", " D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)\n", " path_family = libfmp.c4.compute_optimal_path_family(D)\n", " segment_family, coverage = libfmp.c4.compute_induced_segment_family_coverage(path_family)\n", " length = segment_family[:, 1] - segment_family[:, 0] + 1\n", " center = segment_family[:, 0] + length//2\n", " ax.scatter(center/Fs, length/Fs, s=64, c='white', zorder=9999)\n", " ax.scatter(center/Fs, length/Fs, s=16, c='lime', zorder=9999)\n", " length = seg[1] - seg[0] + 1\n", " center = seg[0] + length//2\n", " 
ax.scatter(center/Fs, length/Fs, s=64, c='white', zorder=9999)\n", " ax.scatter(center/Fs, length/Fs, s=16, c='blue', zorder=9999)\n", "\n", "def plot_sp_ssm(SP, seg, S, ann, color_ann=[], title='', figsize=(5, 4)):\n", " \"\"\"Visulization of SP and SSM\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " SP: Scape plot\n", " seg: Segment ``(start_index, end_index)``\n", " S: Self-similarity matrix\n", " ann: Annotation\n", " color_ann: color scheme used for annotations (Default value = [])\n", " title: Title of figure (Default value = '')\n", " figsize: Figure size (Default value = (5, 4))\n", " \"\"\"\n", " float_box = libfmp.b.FloatingBox()\n", " fig, ax, im = visualize_scape_plot(SP, figsize=figsize, title=title,\n", " xlabel='Center (frames)', ylabel='Length (frames)')\n", " plot_seg_in_sp(ax, seg, S)\n", " float_box.add_fig(fig)\n", "\n", " penalty = np.min(S)\n", " cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)\n", " fig, ax, im = libfmp.c4.plot_ssm_ann_optimal_path_family(\n", " S, ann, seg, color_ann=color_ann, fontsize=8, cmap=cmap_penalty, figsize=(4, 4),\n", " ylabel='Time (frames)')\n", " float_box.add_fig(fig)\n", " float_box.show()\n", " \n", "def check_segment(seg, S):\n", " \"\"\"Prints properties of segments with regard to SSM ``S``\n", "\n", " Notebook: C4/C4S3_ScapePlot.ipynb\n", "\n", " Args:\n", " seg (tuple): Segment ``(start_index, end_index)``\n", " S (np.ndarray): Self-similarity matrix\n", "\n", " Returns:\n", " path_family (list): Optimal path family\n", " \"\"\"\n", " N = S.shape[0]\n", " S_seg = S[:, seg[0]:seg[1]+1]\n", " D, score = libfmp.c4.compute_accumulated_score_matrix(S_seg)\n", " path_family = libfmp.c4.compute_optimal_path_family(D)\n", " fitness, score, score_n, coverage, coverage_n, path_family_length = libfmp.c4.compute_fitness(\n", " path_family, score, N)\n", " segment_family, coverage2 = libfmp.c4.compute_induced_segment_family_coverage(path_family)\n", " print('Segment (alpha):', seg)\n", 
" print('Length of segment:', seg[-1]-seg[0]+1)\n", " print('Length of feature sequence:', N)\n", " print('Induced segment path family:\\n', segment_family)\n", " print('Fitness: %0.10f' % fitness)\n", " print('Score: %0.10f' % score)\n", " print('Normalized score: %0.10f' % score_n)\n", " print('Coverage: %d, %d' % (coverage, coverage2))\n", " print('Normalized coverage: %0.10f' % coverage_n)\n", " print('Length of all paths of family: %d' % path_family_length)\n", " return path_family\n", " \n", "figsize=(5,4)\n", "SP = SP_all[0]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Scape plot: Fitness', figsize=figsize)\n", "plt.show()\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resulting fitness scape plot reflects the musical structure in a hierarchical way. The **thumbnail segment** is $\\alpha^\\ast=[175:197]$, which musically corresponds to the $B_4$-part. The coordinates in the scape plot are specified by the center $c(\\alpha)=186$ and the length $|\\alpha|=23$. The **induced segment** family consists of the four $B$-part. Note that all four $B$-part segments have almost the same fitness and lead to more or less the same segment family. Recall that the introduced fitness measure slightly favors [shorter segments](../C4/C4S3_AudioThumbnailing.html). Therefore, since in this recording the $B_4$-part is played faster than, e.g., the $B_1$-part, the fitness measure favors the $B_4$-part segment over the $B_1$-part segment. In other words, our procedure chooses the shortest most representative segment as thumbnail." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Normalized Score and Coverage\n", "\n", "Next, we illustrate that in the [definition of the fitness measure](../C4/C4S3_AudioThumbnailing.html) (see also Section 4.3.1.3 of [Müller, FMP, Springer 2015]), the normalization of score and coverage as well as the combination (harmonic mean) of the two measures are of crucial importance. To this end, we look at the scape plots of the various measures (as well as the measure-maximizing segments) individually. We start with the score measure $\\sigma$. The score-maximizing segment is $\\alpha=[1:N]$, which is the entire recording. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:04.324544Z", "iopub.status.busy": "2024-02-15T08:52:04.324276Z", "iopub.status.idle": "2024-02-15T08:52:04.797854Z", "shell.execute_reply": "2024-02-15T08:52:04.797185Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [0, 204]\n", "Length of segment: 205\n", "Length of feature sequence: 205\n", "Induced segment path family:\n", " [[ 0 204]]\n", "Fitness: 0.0000000000\n", "Score: 205.0000000000\n", "Normalized score: 0.0000000000\n", "Coverage: 205, 205\n", "Normalized coverage: 0.0000000000\n", "Length of all paths of family: 205\n" ] } ], "source": [ "SP = SP_all[1]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Scape plot: Score', figsize=figsize)\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subtracting trivial self-explanations and normalizing with regard to the length of the optimal path family, yields the **normalized score** $\\bar{\\sigma}$. Since this measure expresses the average score of a path family without expressing how much of the audio material is actually covered, many of the small segments have a relatively high score. Using such a measure would typically result in false-positive segments of small length. This is also demonstrated by the following scape plot and the $\\bar{\\sigma}$-maximizing path family, " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:04.801244Z", "iopub.status.busy": "2024-02-15T08:52:04.800956Z", "iopub.status.idle": "2024-02-15T08:52:05.116212Z", "shell.execute_reply": "2024-02-15T08:52:05.115558Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [183, 188]\n", "Length of segment: 6\n", "Length of feature sequence: 205\n", "Induced segment path family:\n", " [[ 54 61]\n", " [ 76 81]\n", " [163 168]\n", " [183 188]]\n", "Fitness: 0.1680842515\n", "Score: 20.5560876154\n", "Normalized score: 0.6065036506\n", "Coverage: 26, 26\n", "Normalized coverage: 0.0975609756\n", "Length of all paths of family: 24\n" ] } ], "source": [ "SP = SP_all[2]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Scape plot: Normalized score', figsize=figsize)\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next figure shows the scape plot for the coverage measure $\\gamma$. As for the score, the coverage-maximizing segment is $\\alpha=[1:N]$, which is the entire recording. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:05.119222Z", "iopub.status.busy": "2024-02-15T08:52:05.118996Z", "iopub.status.idle": "2024-02-15T08:52:05.473328Z", "shell.execute_reply": "2024-02-15T08:52:05.472547Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [91, 204]\n", "Length of segment: 114\n", "Length of feature sequence: 205\n", "Induced segment path family:\n", " [[ 0 92]\n", " [ 93 204]]\n", "Fitness: 0.1205737073\n", "Score: 128.2312836287\n", "Normalized score: 0.0697611943\n", "Coverage: 205, 205\n", "Normalized coverage: 0.4439024390\n", "Length of all paths of family: 204\n" ] } ], "source": [ "SP = SP_all[3]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Scape plot: Coverage', figsize=figsize)\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subtracting trivial self-explanations and normalizing with regard to the length $N$ of the yields the **normalized coverage** $\\bar{\\gamma}$. As an be seen by the following scape plot along with the $\\bar{\\gamma}$-maximizing segment, the coverage measures a property that is conceptually different to the score. Opposed to the normalized score, the normalized coverage typically favors segments which induced segment family covers large portions of the input sequence." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:05.476467Z", "iopub.status.busy": "2024-02-15T08:52:05.476186Z", "iopub.status.idle": "2024-02-15T08:52:05.809077Z", "shell.execute_reply": "2024-02-15T08:52:05.808115Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [56, 76]\n", "Length of segment: 21\n", "Length of feature sequence: 205\n", "Induced segment path family:\n", " [[ 32 55]\n", " [ 56 76]\n", " [ 77 94]\n", " [140 163]\n", " [164 184]\n", " [185 204]]\n", "Fitness: 0.3370126523\n", "Score: 50.3634055074\n", "Normalized score: 0.2488424196\n", "Coverage: 128, 128\n", "Normalized coverage: 0.5219512195\n", "Length of all paths of family: 118\n" ] } ], "source": [ "SP = SP_all[4]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Scape plot: Normalized coverage', figsize=figsize)\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Beatles Songs" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:05.812129Z", "iopub.status.busy": "2024-02-15T08:52:05.811835Z", "iopub.status.idle": "2024-02-15T08:52:06.354372Z", "shell.execute_reply": "2024-02-15T08:52:06.353817Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fn_ann_color = 'FMP_C4_Audio_Beatles_YouCantDoThat.csv'\n", "fn_ann = os.path.join('..', 'data', 'C4', fn_ann_color)\n", "fn_wav = os.path.join('..', 'data', 'C4', 'FMP_C4_Audio_Beatles_YouCantDoThat.wav')\n", "\n", "tempo_rel_set = libfmp.c4.compute_tempo_rel_set(0.66, 1.5, 5)\n", "penalty = -2\n", "x, x_duration, X, Fs_feature, S, I = libfmp.c4.compute_sm_from_filename(fn_wav, L=21, H=10, \n", " L_smooth=8, tempo_rel_set=tempo_rel_set, penalty=penalty, thresh= 0.15)\n", "S = libfmp.c4.normalization_properties_ssm(S)\n", "\n", " \n", "ann_frames, color_ann = libfmp.c4.read_structure_annotation(fn_ann, fn_ann_color, Fs=Fs_feature)\n", "color_ann = {'I': [1, 0, 0, 0.2], 'V': [0, 1, 0, 0.2], 'B': [0, 0, 1, 0.2], '': [1, 1, 1, 0.2]}\n", "\n", "cmap_penalty = libfmp.c4.colormap_penalty(penalty=penalty)\n", "fig, ax, im = libfmp.c4.plot_ssm_ann(S, ann_frames, Fs=1, color_ann=color_ann, cmap=cmap_penalty, \n", " xlabel='Time (frames)', ylabel='Time (frames)')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-02-15T08:52:06.357366Z", "iopub.status.busy": "2024-02-15T08:52:06.357132Z", "iopub.status.idle": "2024-02-15T08:52:07.516539Z", "shell.execute_reply": "2024-02-15T08:52:07.515867Z" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Segment (alpha): [30, 51]\n", "Length of segment: 22\n", "Length of feature sequence: 158\n", "Induced segment path family:\n", " [[ 8 29]\n", " [ 30 51]\n", " [ 67 88]\n", " [ 89 111]\n", " [127 148]]\n", "Fitness: 0.4542030981\n", "Score: 63.4758459114\n", "Normalized score: 0.3805123478\n", "Coverage: 111, 111\n", "Normalized coverage: 0.5632911392\n", "Length of all paths of family: 109\n" ] } ], "source": [ "SP_all = compute_fitness_scape_plot(S)\n", "figsize=(5,4)\n", "SP = SP_all[0]\n", "seg = seg_max_sp(SP)\n", "plot_sp_ssm(SP=SP, seg=seg, S=S, ann=ann_frames, color_ann=color_ann, \n", " title='Fitness scape plot', figsize=figsize)\n", "path_family = check_segment(seg, S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further Notes\n", "\n", "Within the field of music processing, scape plots were originally introduced by Craig Sapp to hierarchically represent harmony in musical scores. In this notebook, we used this concept for visualizing the fitness measure for all segments in a compact and hierarchical way. This allowed us to not only gain an overview of the repetitive structure of a music recording, but also to better understand the [construction of the fitness measure](../C4/C4S3_AudioThumbnailing.html) by looking at score and coverage values separately. In particular, we demonstrated the following:\n", "\n", "* **Normalization is important.** Without normalization (subtracting self-explanations), longer segments would typically dominate the shorter segments and the entire recording would have maximal score or fitness. 
\n", "* **Combination of score and coverage is important.** By combining score and coverage, the fitness measure balances out the two conflicting principles of having strong repetitions (high score) and of explaining possibly large portions of the recording (high coverage).\n", "\n", "The fitness scape plot can be further refined by indicating the relations between different segments using suitable color codings. [Müller and Jiang](https://www.audiolabs-erlangen.de/fau/professor/mueller/publications/2012_MuellerJiang_StructureVisualization_ISMIR.pdf) use the lightness component of the color to indicate the fitness of the encoded segment and the hue component of the color to reveal the relations between different segments. The result of this visualization for our Brahms example is shown in the following figure.\n", "\n", "\"FMP_C4_Teaser_Ann\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Acknowledgment: This notebook was created by Meinard Müller and Angel Villar-Corrales.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "
\"C0\"\"C1\"\"C2\"\"C3\"\"C4\"\"C5\"\"C6\"\"C7\"\"C8\"
" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" } }, "nbformat": 4, "nbformat_minor": 1 }