International Audio Laboratories Erlangen

"# Short-Time Fourier Transform and Chroma Features\n",
"\n",
"Authors: Meinard Müller, Jonathan Driedger, Thomas Prätzlich, Frank Zalkow\n",
"\n",
"References:\n", "[Mueller2015] Meinard Müller. Fundamentals of Music Processing. Springer Verlag, 2015." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MIR-Course: Harmonic Percussive Source Separation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Abstract" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sounds can broadly be classified into two classes. Harmonic\n", "sound, on the one hand, is what we perceive as\n", "pitched sound and what makes us hear melodies and chords.\n", "Percussive sound, on the other hand, is noise-like and usually\n", "stems from instrument onsets like the hit on a drum or from\n", "consonants in speech. \n", "The goal of harmonic-percussive\n", "source separation (HPSS) is to decompose an input audio signal\n", "into a signal consisting of all harmonic sounds\n", "and a signal consisting of all percussive sounds.\n", "In this lab course, we study an HPSS algorithm and implement it in Python.\n", "Exploiting knowledge about the spectral structure of harmonic and percussive sounds,\n", "this algorithm decomposes the spectrogram of the given input signal\n", "into two spectrograms, one for the harmonic and one for the percussive component.\n", "Afterwards, two waveforms are reconstructed from the spectrograms,\n", "which finally form the desired signals.\n", "Additionally, we describe the application of HPSS for enhancing chroma feature extraction and onset detection.\n", "The techniques used in this lab cover median filtering, spectral masking, and\n", "the inversion of the short-time Fourier transform." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Harmonic-Percussive Source Separation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When listening to our environment, we encounter a wide variety of different sounds.\n", "However, on a very coarse level, many sounds can be\n", "categorized as belonging to one of two classes:\n", "harmonic or percussive sounds.\n", "*Harmonic* sounds are the ones which we perceive to have a certain *pitch*, such that we could, for example, sing along to them.\n", "The sound of a violin is a good example of a harmonic sound.\n", "*Percussive* sounds often stem from two colliding objects, like for example the two shells of castanets.\n", "An important characteristic of percussive sounds is that they do not have a pitch but a very clear\n", "localization in time.\n", "Many real-world sounds are mixtures of harmonic and percussive components.\n", "For example, a note played on a piano has a percussive onset (resulting from the hammer hitting the strings)\n", "preceding the harmonic tone (resulting from the vibrating string)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n",
"**Homework Exercise 1**

\n", "Think about three real world examples of sounds which are clearly harmonic and three examples of\n", " sounds which are clearly percussive.\n", "

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"What are characteristics of harmonic and percussive signals? Sketch the\n",
" waveform of a percussive signal and the waveform of a harmonic signal. What are the main\n",
" differences between those waveforms?\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The goal of harmonic-percussive source separation (HPSS) is to decompose a given input signal into a\n",
"sum of two component signals,\n",
"one consisting of all harmonic sounds and the other consisting of all percussive sounds.\n",
"The core observation in many HPSS algorithms is that in a spectrogram representation of the\n",
"input signal, harmonic sounds tend to form horizontal structures (in time-direction), while\n",
"percussive sounds form vertical structures (in frequency-direction).\n",
"For an example, have a look at the following figure, which shows the power spectrograms of two signals.\n",
"The left plot shows the power spectrogram of a sine tone with a frequency of\n",
"$4000$ Hz and a duration of one second.\n",
"This tone is as harmonic as a sound can be.\n",
"The power spectrogram shows just one horizontal line.\n",
"In contrast, the power spectrogram on the right side shows just one vertical line.\n",
"It is the spectrogram of a signal which is zero everywhere, except for the sample at $0.5$ seconds, where it is one.\n",
"Therefore, when listening to this signal, we just hear a brief 'click' at $0.5$ seconds.\n",
"This signal is the prototype of a percussive sound.\n",
"The same kind of structures can be observed in the lower figure, which shows a\n",
"spectrogram of a violin recording and a spectrogram of a castanets recording.\n",
"\n",
"*Top: spectrogram of an ideal harmonic signal (left) and of an ideal percussive signal (right). Bottom: spectrogram of a recording of a violin (left) and of a castanets recording (right).*\n",

\n",
"**Homework Exercise 2**

\n", "Suppose you apply an HPSS algorithm to white noise. Recall that white noise has a constant power spectral density (it is also said to be flat). What do you expect the harmonic and the percussive\n", " component to sound like?\n", "

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Suppose you apply an HPSS algorithm to a recording of your favorite rock band. What do you expect\n",
" the harmonic and the percussive component to sound like?\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## An HPSS Algorithm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will now describe an actual HPSS algorithm.\n",
"Formally, given a discrete input audio signal $x:{\\mathbb Z}\\to{\\mathbb R}$, the algorithm should compute a harmonic component signal $x_\\mathrm{h}$\n",
"and a percussive component signal $x_\\mathrm{p}$, such that $x = x_\\mathrm{h} + x_\\mathrm{p}$. \n",
"Furthermore, the signals $x_\\mathrm{h}$ and $x_\\mathrm{p}$ contain the harmonic and percussive sounds of $x$, respectively.\n",
"In the following we describe the consecutive steps of an HPSS algorithm.\n",
"We start with the computation of the *STFT* (Subsection [STFT](#STFT)) and proceed\n",
"with enhancing the power spectrogram using *median filtering* (Subsection [Median Filtering](#MedFilt)).\n",
"Afterwards, the filtered spectrograms are used to compute *binary masks* (Subsection [Binary Masking](#Mask)) which are\n",
"used to construct STFTs for the harmonic and the percussive component.\n",
"These STFTs are finally transformed back to the time domain (Subsection [ISTFT](#ISTFT))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### STFT"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first step, we compute the short-time Fourier transform (STFT) ${\\mathcal X}$ of the signal $x$ as:\n",
"\n",
"\\begin{equation}\n",
" {\\mathcal X}(m,k):= \\sum_{n=0}^{N-1} x(n + m H)w(n)\\exp(-2\\pi ikn/N)\n",
"\\end{equation}\n",
"\n",
"with ${m\\in[0:M-1]:=\\{0,\\ldots,M -1\\}}$ and $k\\in[0:N-1]$, where $M$ is the number of frames,\n",
"$N$ is the frame size and length of the discrete Fourier transform,\n",
"${w:[0:N -1]\\to{\\mathbb R}}$ is a window function and $H$ is the hopsize.\n",
"From ${\\mathcal X}$ we can then derive the power spectrogram ${\\mathcal Y}$ of $x$:\n",
"\n",
"\\begin{equation}\n",
" {\\mathcal Y}(m,k) := |{\\mathcal X}(m,k)|^2.\n",
"\\end{equation}"
]
},
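The STFT and power-spectrogram formulas above can be sketched directly with NumPy (a minimal illustration on a synthetic 1 kHz test tone; `librosa.stft`, used later in this lab, follows the same framing logic up to padding conventions):

```python
import numpy as np

def stft(x, N, H, w):
    # M frames of length N, hop size H, windowed by w -- as in the formula above
    M = (len(x) - N) // H + 1
    return np.array([np.fft.fft(x[m * H:m * H + N] * w) for m in range(M)])

Fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(Fs) / Fs)  # 1 kHz test tone, 1 second
N, H = 256, 128
X = stft(x, N, H, np.hanning(N))   # shape (M, N)
Y = np.abs(X) ** 2                 # power spectrogram

assert Y.shape == ((len(x) - N) // H + 1, N)
assert np.argmax(Y[0, :N // 2]) == 32  # 1000 Hz -> bin k = f * N / Fs = 32
```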
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Homework Exercise 3**

\n", "The parameters of the STFT have a crucial influence on the HPSS algorithm. Think about what happens to ${\\mathcal Y}$ in the case\n", " you choose $N$ to be very large or very small. How could this influence the algorithm? (Hint: Think about how $N$ influences the time- and frequency-resolution of the STFT.)\n", "

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* For large $N$, frames cover longer segments of the signal, and the frequency\n",
"resolution increases while the time resolution decreases. For small $N$, frames\n",
"cover shorter segments of the signal, and the frequency resolution decreases\n",
"while the time resolution increases."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Explain in technical terms why harmonic sounds form horizontal and percussive sounds form vertical structures in spectrograms. (Hint: Have a look at the exponential basis functions of the STFT. What does one of these functions describe? How can an impulse be represented with them?)\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* To represent a harmonic sound as a sum of sinusoids (the basis\n",
"functions of the Fourier transform), one needs just a few of them, and they\n",
"do not change over time. This means that the same frequency bands have\n",
"high values in all frames, which leads to horizontal structures in a spectrogram. To represent a short burst, which is more noise-like, one needs many\n",
"sinusoids. Almost all frequency bands will show energy to represent such\n",
"a burst. However, this burst has a very short duration, so this structure will\n",
"only last for one or two frames which results in a vertical structure in the\n",
"spectrogram."
]
},
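This explanation can be checked numerically. The following sketch (NumPy only; the synthetic tone and click are stand-ins for the violin and castanets recordings) shows that a tone occupies the same frequency bin in every frame, while a click occupies all bins in very few frames:

```python
import numpy as np

Fs, N, H = 8000, 256, 128
t = np.arange(Fs) / Fs
sine = np.sin(2 * np.pi * 1000 * t)   # ideal harmonic signal
impulse = np.zeros(Fs)
impulse[Fs // 2] = 1.0                # ideal percussive signal: click at 0.5 s

def power_spectrogram(x, N, H):
    w = np.hanning(N)
    frames = [x[m * H:m * H + N] * w for m in range((len(x) - N) // H + 1)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2  # (frames, bins)

Y_sine = power_spectrogram(sine, N, H)
Y_imp = power_spectrogram(impulse, N, H)

# the tone peaks in the same bin in every frame -> horizontal structure
assert len(set(np.argmax(Y_sine, axis=1))) == 1
# the click has energy in at most two frames -> vertical structure
assert int((Y_imp.sum(axis=1) > 1e-12).sum()) <= 2
```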
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Lab Experiment 1**

\n", "Load the audio file `CastanetsViolin.wav` using `sf.read`.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import soundfile as sf\n",
"from IPython.display import Audio\n",
"\n",
"# your code here...\n",
"\n",
"Audio(x, rate=Fs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Compute the STFT ${\\mathcal X}$ of the input signal $x$ using `librosa.stft` with the parameters `N=1024`, `H=512`, and a hann-window.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import librosa\n",
"\n",
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Compute the power spectrogram ${\\mathcal Y}$.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Visualize ${\\mathcal Y}$. Can you spot harmonic and percussive structures?\n",
" Apply logarithmic compression with $\\gamma=10$ (see STFT Lab for details on this) when visualizing spectrograms.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from matplotlib import pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Do the same for the parameters $N=128$, $H=64$, and a hann-window, as well as $N=8192$, $H=4096$, and a hann-window.\n",
" How do the spectrograms change when you change the parameters? What happens to the harmonic and percussive structures?\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Median Filtering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next step, we want to compute a *harmonically enhanced* spectrogram $\\tilde{{\\mathcal Y}}_\\mathrm{h}$ and\n",
"a percussively enhanced spectrogram $\\tilde{{\\mathcal Y}}_\\mathrm{p}$ by filtering ${\\mathcal Y}$.\n",
"This can be done by using a *median filter*.\n",
"The median of a list of numbers can be found by arranging all numbers from lowest to highest value and picking\n",
"the middle one.\n",
"For example, the median of the list $(7, 3, 4, 6, 5)$ is $5$.\n",
"Formally, let $A = (a_1, a_2, \\dots, a_L)$ be a list of length $L \\in {\\mathbb N}$ consisting of real numbers $a_l \\in {\\mathbb R}, l \\in [1:L]$.\n",
"First, the elements of $A$ are sorted in ascending order. This results in a list\n",
"$\\tilde{A} = (\\tilde{a}_1, \\tilde{a}_2, \\dots, \\tilde{a}_L)$\n",
"with $\\tilde{a}_l \\leq \\tilde{a}_m$ for $l < m$ and $l, m \\in [1:L]$.\n",
"Then, the ${\\mathrm{median}}$ of $A$ is defined as\n",
"\n",
"\\begin{equation}\n",
"{\\mathrm{median}}(A) := \\begin{cases} \\tilde{a}_{(L+1)/2} & \\mbox{for $L$ being odd}\\\\ (\\tilde{a}_{L/2} + \\tilde{a}_{L/2+1})/2 & \\mbox{otherwise}\\end{cases}\n",
"\\end{equation}\n",
"\n",
"Now, given a matrix $B\\in{\\mathbb R}^{M\\times K}$, we define harmonic and percussive median filters\n",
"\n",
"\\begin{eqnarray}\n",
" {\\mathrm{medfilt}_\\mathrm{h}}(B)(m,k) := {\\mathrm{median}}(\\{B(m-\\ell_\\mathrm{h},k),\\ldots,B(m+\\ell_\\mathrm{h},k)\\})\\\\\n",
" {\\mathrm{medfilt}_\\mathrm{p}}(B)(m,k) := {\\mathrm{median}}(\\{B(m,k-\\ell_\\mathrm{p}),\\ldots,B(m,k+\\ell_\\mathrm{p})\\})\n",
"\\end{eqnarray}\n",
"\n",
"for $M,K,\\ell_\\mathrm{h},\\ell_\\mathrm{p}\\in{\\mathbb N}$, where $2\\ell_\\mathrm{h} + 1$ and $2\\ell_\\mathrm{p} + 1$ are the lengths of the median filters, respectively.\n",
"Note that we simply assume $B(m,k)=0$ for $m \\notin [0:M-1]$ or $k \\notin [0:K-1]$.\n",
"The enhanced spectrograms are then computed as\n",
"\n",
"\\begin{eqnarray}\n",
" \\tilde{{\\mathcal Y}}_\\mathrm{h} := {\\mathrm{medfilt}_\\mathrm{h}}({\\mathcal Y})\\\\\n",
" \\tilde{{\\mathcal Y}}_\\mathrm{p} := {\\mathrm{medfilt}_\\mathrm{p}}({\\mathcal Y})\n",
"\\end{eqnarray}"
]
},
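To make the effect of the two filters concrete, here is a small sketch (assumptions: rows are frequency bins and columns are time frames, the layout returned by `librosa.stft`; `scipy.signal.medfilt` zero-pads at the borders, matching the convention $B(m,k)=0$ outside the matrix):

```python
import numpy as np
from scipy.signal import medfilt

def medfilt_h(Y, length):
    return medfilt(Y, [1, length])  # median over the time axis (horizontal)

def medfilt_p(Y, length):
    return medfilt(Y, [length, 1])  # median over the frequency axis (vertical)

Y = np.array([[10., 10., 10., 10.],   # a horizontal (harmonic) structure
              [ 0.,  0., 10.,  0.],   # ...crossed by a
              [ 0.,  0., 10.,  0.]])  # vertical (percussive) structure

Yh = medfilt_h(Y, 3)  # keeps the horizontal line, suppresses the vertical one
Yp = medfilt_p(Y, 3)  # keeps the vertical line, suppresses the horizontal one

assert np.array_equal(Yh[0], [10., 10., 10., 10.]) and Yh[1:].sum() == 0
assert np.array_equal(Yp[:, 2], [10., 10., 10.]) and Yp.sum() == 30
```

Each filter preserves structures that extend along its own direction and wipes out structures orthogonal to it, which is exactly why the filtered spectrograms are "harmonically" and "percussively" enhanced.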
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Homework Exercise 4**

\n", "The arithmetic ${\\mathrm{mean}}$ of a set $A\\subset{\\mathbb R}$ of size $N$ is defined as ${{\\mathrm{mean}}(A):= \\frac{1}{N}\\sum_{n=0}^{N-1}a_n}$.\n", " Compute the ${\\mathrm{median}}$ and the ${\\mathrm{mean}}$ for the set ${A=\\{2, 3, 190, 2, 3\\}}$.\n", " Why do you think the HPSS algorithm employs median filtering and not mean filtering?\n", "

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Apply a horizontal and a vertical median filter of length $3$ to the matrix\n",
"\n",
" \\begin{equation*}\n",
" B =\n",
" \\begin{bmatrix}\n",
" 1 & 1 & 46 & 2 \\\\\n",
" 3 & 1 & 50 & 1 \\\\\n",
" 60 & 68 & 70 & 67 \\\\\n",
" 2 & 1 & 65 & 1\n",
" \\end{bmatrix}\n",
" \\end{equation*}\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy import signal\n",
"import numpy as np\n",
"\n",
"# your code here...\n",
"\n",
"B = np.array([[1, 1, 46, 2], [3, 1, 50, 1], [60, 68, 70, 67], [2, 1, 65, 1]], dtype='float64')\n",
"print('Original B:')\n",
"print(B)\n",
"print('Horizontally filtered B:')\n",
"print(horizontal_median_filter(B, 3))\n",
"print('Vertically filtered B:')\n",
"print(vertical_median_filter(B, 3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Explain in your own words why median filtering allows for enhancing/suppressing harmonic/percussive structures in a spectrogram.\n",
"

"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Lab Experiment 2**

\n", "Apply harmonic and percussive median filters to the power spectrogram ${\mathcal Y}$ which you computed in the previous exercise (`N=1024`, `H=512`, and a hann-window) using `scipy.signal.medfilt`.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Play around with different filter lengths (3, 11, 51, 101). Visualize the filtered spectrograms. What are your observations?\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Binary Masking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having the enhanced spectrograms $\\tilde{{\\mathcal Y}}_\\mathrm{h}$ and $\\tilde{{\\mathcal Y}}_\\mathrm{p}$, we now need to assign all\n",
"time-frequency bins of ${\\mathcal X}$ to either the harmonic or the percussive component.\n",
"This can be done by *binary masking*.\n",
"A binary mask is a matrix ${\\mathcal M}\\in\\{0,1\\}^{M\\times K}$.\n",
"It can be applied to an STFT ${\\mathcal X}$ by computing\n",
"${\\mathcal X} \\odot {\\mathcal M}$, where the operator $\\odot$ denotes point-wise multiplication.\n",
"A mask value of one preserves the value in the STFT and a mask value of zero suppresses it.\n",
"For our HPSS algorithm, the binary masks are defined by comparing the values in the enhanced\n",
"spectrograms $\\tilde{{\\mathcal Y}}_\\mathrm{h}$ and $\\tilde{{\\mathcal Y}}_\\mathrm{p}$.\n",
"\n",
"\\begin{eqnarray}\n",
"{\\mathcal M}_\\mathrm{h}(m,k) :=\n",
"\\begin{cases}\n",
" 1 & \\text{if } \\tilde{{\\mathcal Y}}_\\mathrm{h}(m,k) \\geq \\tilde{{\\mathcal Y}}_\\mathrm{p}(m,k) \\\\\n",
" 0 & \\text{else}\n",
" \\end{cases} \\\\\n",
"{\\mathcal M}_\\mathrm{p}(m,k) :=\n",
"\\begin{cases}\n",
" 1 & \\text{if } \\tilde{{\\mathcal Y}}_\\mathrm{p}(m,k) > \\tilde{{\\mathcal Y}}_\\mathrm{h}(m,k) \\\\\n",
" 0 & \\text{else.}\n",
" \\end{cases}\n",
"\\end{eqnarray}\n",
"\n",
"Applying these masks to the original STFT ${\\mathcal X}$ yields the STFTs for the harmonic and the percussive component of the signal\n",
"${{\\mathcal X}_\\mathrm{h} := ({\\mathcal X} \\odot {\\mathcal M}_\\mathrm{h})}$ and ${{\\mathcal X}_\\mathrm{p} := ({\\mathcal X} \\odot {\\mathcal M}_\\mathrm{p})}$.\n",
"Note that by the definition of ${\\mathcal M}_\\mathrm{h}$ and ${\\mathcal M}_\\mathrm{p}$, it holds that ${\\mathcal M}_\\mathrm{h}(m,k)+{\\mathcal M}_\\mathrm{p}(m,k) = 1$\n",
"for $m \\in[0:M-1]$, $k\\in[0:K-1]$.\n",
"Therefore, every time-frequency bin of ${\\mathcal X}$ is assigned either to ${\\mathcal X}_\\mathrm{h}$ or ${\\mathcal X}_\\mathrm{p}$."
]
},
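A quick numerical check of these definitions (random matrices as stand-ins for real spectrograms; note that the $\geq$ in ${\mathcal M}_\mathrm{h}$ assigns ties to the harmonic component, which is what guarantees the masks sum to one):

```python
import numpy as np

rng = np.random.default_rng(0)
Y_h = rng.random((4, 5))                           # enhanced spectrograms
Y_p = rng.random((4, 5))
X = rng.random((4, 5)) + 1j * rng.random((4, 5))   # stand-in for a complex STFT

M_h = (Y_h >= Y_p).astype(float)   # ties go to the harmonic mask
M_p = (Y_p > Y_h).astype(float)
X_h = X * M_h                      # point-wise multiplication realizes the masking
X_p = X * M_p

assert np.all(M_h + M_p == 1)      # every bin is assigned exactly once
assert np.allclose(X_h + X_p, X)   # the masked STFTs sum to the original
```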
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Homework Exercise 5**

\n", "Assume you have the two enhanced spectrograms\n", "\n", " \\begin{equation*}\n", " \\begin{array}{cc}\n", " \\tilde{{\\mathcal Y}}_\\mathrm{h} =\n", " \\begin{bmatrix}\n", " 1 & 1 & 2 & 2 \\\\\n", " 1 & 3 & 1 & 1 \\\\\n", " 60 & 68 & 68 & 67 \\\\\n", " 1 & 2 & 1 & 1\n", " \\end{bmatrix}, &\n", " \\tilde{{\\mathcal Y}}_\\mathrm{p} =\n", " \\begin{bmatrix}\n", " 1 & 1 & 46 & 1 \\\\\n", " 3 & 1 & 50 & 2 \\\\\n", " 2 & 1 & 65 & 1 \\\\\n", " 2 & 1 & 65 & 1\n", " \\end{bmatrix}\n", " \\end{array}\n", " \\end{equation*}\n", "\n", "Compute the binary masks ${\\mathcal M}_\\mathrm{h}$ and ${\\mathcal M}_\\mathrm{p}$ and apply them to the matrix\n", " \\begin{equation*}\n", " {\\mathcal X} =\n", " \\begin{bmatrix}\n", " 1 & 1 & 46 & 2 \\\\\n", " 3 & 1 & 50 & 1 \\\\\n", " 60 & 68 & 70 & 67 \\\\\n", " 2 & 1 & 65 & 1\n", " \\end{bmatrix}\n", " \\end{equation*}\n", "

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Y_h_toy = np.array([[1 ,1, 2, 2], [1, 3, 1, 1], [60, 68, 68, 67], [1, 2, 1, 1]])\n",
"Y_p_toy = np.array([[1, 1, 46, 1], [3, 1, 50, 2], [2, 1, 65, 1], [2, 1, 65, 1]])\n",
"X_toy = np.array([[1, 1, 46, 2], [3, 1, 50, 1], [60, 68, 70, 67], [2, 1, 65, 1]])\n",
"\n",
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"**Lab Experiment 3**

\n", "Use the median filtered power spectrograms $\\tilde{{\\mathcal Y}}_\\mathrm{h}$ and $\\tilde{{\\mathcal Y}}_\\mathrm{p}$ from the previous exercise (filter length 11) to compute the binary masks ${\\mathcal M}_\\mathrm{h}$ and ${\\mathcal M}_\\mathrm{p}$.\n", "

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Visualize the masks (this time without logarithmic compression).\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Apply the masks to the original STFT ${\\mathcal X}$ to compute ${\\mathcal X}_\\mathrm{h}$ and ${\\mathcal X}_\\mathrm{p}$.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Visualize the power spectrograms ${\\mathcal Y}_\\mathrm{h}$ and ${\\mathcal Y}_\\mathrm{p}$ of ${\\mathcal X}_\\mathrm{h}$ and ${\\mathcal X}_\\mathrm{p}$.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Inversion of the Short-Time Fourier Transform"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the final step, we need to transform our constructed STFTs ${\\mathcal X}_\\mathrm{h}$ and ${\\mathcal X}_\\mathrm{p}$ back to the time-domain.\n",
"To this end, we apply an \"inverse\" STFT to these matrices to compute the component signals $x_\\mathrm{h}$ and $x_\\mathrm{p}$.\n",
"Note that the topic \"inversion of the STFT\" is not as trivial as it might seem at first glance.\n",
"In the case that ${\mathcal X}$ is the original STFT of an audio signal $x$, and further preconditions are satisfied (for example, $N \geq H$, where $N$ is the size of the discrete Fourier transform and $H$ is the hopsize of the STFT),\n",
"it is possible to invert the STFT and to reconstruct $x$ from ${\\mathcal X}$ perfectly.\n",
"However, as soon as the original STFT ${\\mathcal X}$ has been modified to some $\\tilde{{\\mathcal X}}$, for example by masking, there might be no audio signal which has exactly $\\tilde{{\\mathcal X}}$ as its STFT.\n",
"In such a case, one usually aims to find an audio signal whose STFT is \"approximately\" $\\tilde{{\\mathcal X}}$.\n",
"For this Lab Course, you can simply assume that you can invert the STFT using `librosa.istft`.\n",
"\n",
"**Homework Exercise 6**

\n", "Assume ${\mathcal X}$ is the original STFT of some audio signal $x$. Why do we need the precondition $N \geq H$, where $N$ is the size of the discrete Fourier transform and $H$ is the hopsize of the STFT, to reconstruct $x$ from ${\mathcal X}$ perfectly?\n", "

"
]
},
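Although this lab uses `librosa.istft`, the core idea behind inverting an STFT with $H \leq N$ can be sketched with NumPy alone: with a periodic Hann window and $H = N/2$, the shifted windows sum to one, so overlap-adding the inverse DFTs of the frames reconstructs the signal exactly away from the signal borders:

```python
import numpy as np

N, H = 1024, 512
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann window
assert np.allclose(w[:H] + w[H:], 1.0)                # shifted windows sum to 1

x = np.random.default_rng(1).standard_normal(8 * H)
M = (len(x) - N) // H + 1
X = np.array([np.fft.fft(x[m * H:m * H + N] * w) for m in range(M)])  # STFT

x_rec = np.zeros(len(x))
for m in range(M):
    x_rec[m * H:m * H + N] += np.real(np.fft.ifft(X[m]))  # overlap-add

assert np.allclose(x_rec[N:-N], x[N:-N])  # perfect reconstruction in the interior
```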
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"**Lab Experiment 4**

\n", "Apply the inverse STFT function `librosa.istft` to ${\mathcal X}_\mathrm{h}$ and ${\mathcal X}_\mathrm{p}$ from the previous experiment and listen to the results.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display\n",
"\n",
"# your code here...\n",
"\n",
"display(Audio(x_h, rate=Fs))\n",
"display(Audio(x_p, rate=Fs))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"Save the computed harmonic and percussive component by using `sf.write`.\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import FileLink\n",
"\n",
"# your code here...\n",
"\n",
"display(FileLink('x_h.wav'))\n",
"display(FileLink('x_p.wav'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Physical Interpretation of Parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that one can specify the filter lengths of the harmonic and percussive median filters in seconds and Hertz, respectively.\n",
"This makes their physical interpretation easier.\n",
"Given the sampling rate $F_\\mathrm{s}$ of the input signal $x$ as well as the frame length $N$ and the hopsize $H$,\n",
"we can convert filter lengths given in seconds and Hertz to filter lengths given in indices\n",
"\n",
"\\begin{eqnarray}\n",
"L_\\mathrm{h}(t):=\\left\\lceil \\frac{F_\\mathrm{s}}{H} t \\right\\rceil\\label{eqn:L_h}\\\\\n",
"L_\\mathrm{p}(d):=\\left\\lceil \\frac{N}{F_\\mathrm{s}} d \\right\\rceil\\label{eqn:L_p}\n",
"\\end{eqnarray}"
]
},
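These conversion formulas can be implemented directly (the helper names are placeholders; the values below use the parameters `lh_sec=0.2` and `lp_Hz=500` that appear later in this lab):

```python
import math

def L_h(t_sec, Fs, H):
    return math.ceil(Fs / H * t_sec)  # seconds -> median filter length in frames

def L_p(d_hz, Fs, N):
    return math.ceil(N / Fs * d_hz)   # Hertz -> median filter length in bins

assert L_h(0.2, 22050, 512) == 9
assert L_p(500, 22050, 1024) == 24
```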
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Homework Exercise 7**

\n", "Assume $F_\\mathrm{s}=22050$ Hz, $N=1024$, and $H=256$. Compute $L_\\mathrm{h}(0.5\\text{ sec})$ and $L_\\mathrm{p}(600 \\text{ Hz})$.\n", "

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",

\n",
"**Lab Experiment 5**

\n", "Complete the implementation of the HPSS algorithm in the function `HPSS`:\n",
"\n",
"

"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def HPSS(x, N, H, w, Fs, lh_sec, lp_Hz):\n",
" # x: Input signal\n",
" # N: Frame length\n",
" # H: Hopsize\n",
" # w: Window function of length N\n",
" # Fs: Sampling rate of x\n",
" # lh_sec: Harmonic (horizontal) median filter length given in seconds\n",
" # lp_Hz: Percussive (vertical) median filter length given in Hertz\n",
"\n",
" # stft\n",
"\n",
" # power spectrogram\n",
"\n",
" # median filtering\n",
"\n",
" # masking\n",
"\n",
" # istft\n",
"\n",
" return x_h, x_p"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"1. Compute the STFT ${\mathcal X}$ of the input signal $x$ using `librosa.stft`.\n",
"2. Compute the power spectrogram ${\mathcal Y}$ from ${\mathcal X}$.\n",
"3. Convert the median filter lengths from seconds and Hertz to indices. If a filter length is even, subtract 1.\n",
"4. Apply median filters to ${\mathcal Y}$ using `scipy.signal.medfilt` to compute ${\mathcal Y}_\mathrm{h}$ and ${\mathcal Y}_\mathrm{p}$.\n",
"5. Derive the masks ${\mathcal M}_\mathrm{h}$ and ${\mathcal M}_\mathrm{p}$ from ${\mathcal Y}_\mathrm{h}$ and ${\mathcal Y}_\mathrm{p}$.\n",
"6. Compute ${\mathcal X}_\mathrm{h}$ and ${\mathcal X}_\mathrm{p}$.\n",
"7. Apply the inverse STFT (`librosa.istft`) to get $x_\mathrm{h}$ and $x_\mathrm{p}$.\n",
"\n",
"Test your implementation:\n",
"

"
]
},
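For reference, the whole pipeline can be sketched without librosa, reusing the NumPy STFT and overlap-add inversion from above (assumptions: periodic Hann window and $H = N/2$; this illustrates the algorithm's structure and is not a replacement for your librosa-based `HPSS` function):

```python
import numpy as np
from scipy.signal import medfilt

def hpss_sketch(x, N, H, Fs, lh_sec, lp_Hz):
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann
    M = (len(x) - N) // H + 1
    X = np.array([np.fft.fft(x[m * H:m * H + N] * w) for m in range(M)]).T
    Y = np.abs(X) ** 2                                    # (bins, frames)
    l_h = int(np.ceil(Fs / H * lh_sec))                   # seconds -> frames
    l_p = int(np.ceil(N / Fs * lp_Hz))                    # Hertz -> bins
    l_h -= 1 - l_h % 2                                    # force odd lengths
    l_p -= 1 - l_p % 2
    Y_h = medfilt(Y, [1, max(l_h, 1)])                    # filter along time
    Y_p = medfilt(Y, [max(l_p, 1), 1])                    # filter along frequency
    M_h = Y_h >= Y_p                                      # binary mask (ties -> harmonic)
    x_h, x_p = np.zeros(len(x)), np.zeros(len(x))
    for m in range(M):                                    # inverse STFT by overlap-add
        x_h[m * H:m * H + N] += np.real(np.fft.ifft(X[:, m] * M_h[:, m]))
        x_p[m * H:m * H + N] += np.real(np.fft.ifft(X[:, m] * ~M_h[:, m]))
    return x_h, x_p

Fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(2 * Fs) / Fs)      # synthetic test input
x_h, x_p = hpss_sketch(x, 512, 256, Fs, 0.2, 500)
# since the masks sum to one, the components sum to the input (interior samples)
assert np.allclose((x_h + x_p)[512:-512], x[512:-512])
```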
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for file in ('files/Stepdad.wav', 'files/Applause.wav', 'files/DrumSolo.wav'):\n",
" print('# ' + file)\n",
" \n",
" # your code here...\n",
" \n",
" print('Original')\n",
" display(Audio(x, rate=Fs))\n",
" print('Harmonic Component')\n",
" display(Audio(x_h, rate=Fs))\n",
" print('Percussive Component')\n",
" display(Audio(x_p, rate=Fs))\n",
" print('')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applications of HPSS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In many audio processing tasks, the essential information lies in either the harmonic or the percussive component of an audio signal.\n",
"In such cases, HPSS is very well suited as a pre-processing step to enhance the outcome of an algorithm.\n",
"In the following, we introduce two procedures that can be improved by applying HPSS.\n",
"The harmonic component from the HPSS algorithm can be used to enhance chroma features (Subsection [Chroma](#chroma)) and the percussive component helps to improve the results of an onset detection procedure (Subsection [Onset](#onset))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enhancing Chroma Features using HPSS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two pitches sound similar when they are an octave (12 semitones in the equal-tempered scale) apart from each other.\n",
"We say that these pitches share the same chroma which we refer to by the pitch spelling names $\\{\\mathrm{C},\\mathrm{C}^{\\sharp},\\mathrm{D},\\mathrm{D}^{\\sharp},\\mathrm{E},\\mathrm{F},\\mathrm{F}^{\\sharp},\\mathrm{G},\\mathrm{G}^{\\sharp},\\mathrm{A},\\mathrm{A}^{\\sharp},\\mathrm{B}\\}$.\n",
"Chroma features exploit the above observation by adding up all frequency bands in a power spectrogram that belong to the same chroma.\n",
"Technically this can be realized by the following procedure.\n",
"First we assign a pitch index (MIDI pitch number) to each frequency index $k \\in [1:N/2-1]$ of the spectrogram by using the formula:\n",
"\n",
"\\begin{equation}\n",
" p(k) = \\text{round}\\left(12\\log_2\\left( \\frac{ k\\cdot F_\\mathrm{s}}{440 \\cdot N}\\right)\\right)+69.\n",
"\\end{equation}\n",
"\n",
"where $N$ is the number of frequency bins in the spectrogram and $F_\\mathrm{s}$ is the sampling rate of the audio signal.\n",
"Note that $p$ maps frequency indices corresponding to frequencies around the chamber tone A4 (440 Hz) to its MIDI pitch number 69.\n",
"Then we add up all frequency bands in the power spectrogram belonging to the same chroma $c \\in [0:11]$:\n",
"\n",
"\\begin{equation}\n",
" {\\mathcal C}(m,c) := \\sum_ {\\{k |\\,p(k)\\,\\mathrm{mod}\\,12 = c\\,\\}}{{\\mathcal Y}(m,k)}\n",
"\\end{equation}\n",
"\n",
"where $m\\in [0:M-1]$ and $M$ is the number of frames.\n",
"\n",
"Chroma features are correlated with the pitches and the harmonic structure of music.\n",
"Pitches usually form horizontal structures in the spectrogram, whereas transient or percussive sounds form vertical structures. Percussive sounds have a negative impact on the chroma extraction, as they \"activate all frequencies\" in the spectrogram.\n",
"Hence, one way to improve the chroma extraction is to first apply HPSS and to perform the chroma extraction on the power spectrogram of the harmonic component signal ${\\mathcal Y}_\\mathrm{h}(m,k) = |{\\mathcal X_\\mathrm{h}(m,k)}|^2$."
]
},
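{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the pitch-mapping formula, the following sketch computes $p(k)$ for all bins and verifies that the bin closest to the chamber tone A4 (440 Hz) is mapped to MIDI pitch 69 and chroma A. The values `Fs = 22050` and `N = 1024` are illustrative assumptions, not prescribed by the text:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"Fs = 22050  # assumed sampling rate (illustrative)\n",
"N = 1024    # assumed DFT length (illustrative)\n",
"\n",
"k = np.arange(1, N // 2)                      # frequency indices (skip DC)\n",
"freqs = k * Fs / N                            # bin center frequencies in Hz\n",
"p = np.round(12 * np.log2(freqs / 440)) + 69  # MIDI pitch number per bin\n",
"chroma = p.astype(int) % 12                   # chroma class per bin (C = 0)\n",
"\n",
"k_a4 = np.argmin(np.abs(freqs - 440))         # array index of the bin closest to A4\n",
"print(p[k_a4], chroma[k_a4])                  # -> 69.0 9 (chroma A)\n",
"```"
]
},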
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Load the audio files `Stepdad.wav`, `Applause.wav`, and `DrumSolo.wav` from the `Data` folder.\n",
"- Apply HPSS using the parameters `N=1024`, `H=512`, a Hann window, `lh_sec=0.2`, and `lp_Hz=500` to all loaded signals.\n",
"- Listen to the results.\n",
"\n",
"**Lab Experiment 6**\n",
"\n",
"Apply the HPSS algorithm as a pre-processing step in a chroma extraction procedure:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from numpy.linalg import norm\n",
"\n",
"def simple_chroma(x, N, H, Fs):\n",
"    # x: input signal\n",
"    # N: frame length\n",
"    # H: hopsize\n",
"    # Fs: sampling rate\n",
"    \n",
"    X = librosa.stft(x, n_fft=N, hop_length=H, win_length=N, window='hann', pad_mode='constant')\n",
"    Y = np.abs(X) ** 2\n",
"    Y_comp = np.log(1 + 0.5 * Y)  # logarithmic compression\n",
"    C = np.zeros((12, Y.shape[1]), dtype=Y.dtype)\n",
"    k = np.arange(1, Y.shape[0])  # frequency indices, skipping the DC bin\n",
"    pitches = np.round(12 * np.log2((Fs * k / N) / 440)) + 69\n",
"    chroma_mapping = pitches % 12\n",
"    chroma_mapping = np.insert(chroma_mapping, 0, -1)  # never use DC offset\n",
"    for c in range(12):\n",
"        C[c, :] = Y_comp[chroma_mapping == c, :].sum(axis=0)\n",
"    C = normalize_chroma(C, 2, 0.0001)\n",
"    \n",
"    return C\n",
"\n",
"def normalize_chroma(C, norm_p, threshold):\n",
"    f_norm = norm(C, norm_p, axis=0)\n",
"    C_norm = C.copy()\n",
"    mask = f_norm >= threshold  # normalize only frames with sufficient energy\n",
"    C_norm[:, mask] /= f_norm[mask]\n",
"    return C_norm\n",
"\n",
"# your code here..."
]
},
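{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the intended behavior of the thresholded normalization in `normalize_chroma`, the following sketch (toy data, not part of the lab material) normalizes each column of a chroma-like matrix by its $\\ell^2$ norm while leaving near-silent frames untouched:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from numpy.linalg import norm\n",
"\n",
"C = np.array([[3.0, 0.0],\n",
"              [4.0, 0.0]])   # one strong frame, one silent frame\n",
"\n",
"f_norm = norm(C, 2, axis=0)  # column-wise l2 norms: [5., 0.]\n",
"mask = f_norm >= 0.0001      # normalize only frames above the threshold\n",
"C_norm = C.copy()\n",
"C_norm[:, mask] /= f_norm[mask]\n",
"\n",
"print(C_norm[:, 0])          # -> [0.6 0.8]\n",
"print(C_norm[:, 1])          # -> [0. 0.] (silent frame left unchanged)\n",
"```"
]
},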
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## HPSS for Onset Detection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Onset detection is the task of finding the temporal positions of note onsets in a music recording.\n",
"More concretely, the task could be to detect all time positions at which some drum is hit\n",
"in a recording of a rock song.\n",
"One way to approach this problem is to assume that drum hits emit a short burst of high energy\n",
"and the goal is therefore to detect these bursts in the input signal.\n",
"To this end, one first computes the *short-time power* ${\\mathcal P}$ of the input signal $x$ by\n",
"\n",
"\\begin{equation}\n",
" {\mathcal P}(m):= \frac{1}{N}\sum_{n=0}^{N-1} x(n + m H)^2,\n",
"\\end{equation}\n",
"\n",
"where $H$ is the hopsize and $N$ is the length of one frame (similar to the computation of the STFT).\n",
"Since we are looking for time-positions of high energy, the goal is therefore to detect peaks in ${\\mathcal P}$.\n",
"A common technique to enhance peaks in a sequence is to\n",
"subtract the *local average* $\\tilde{{\\mathcal P}}$ from ${\\mathcal P}$ itself. $\\tilde{{\\mathcal P}}$ is defined by\n",
"\n",
"\\begin{equation}\n",
"\\tilde{{\\mathcal P}}(m) := \\sum_{j=-J}^{J}{\\mathcal P}(m+j) \\frac{1}{2J+1}\n",
"\\end{equation}\n",
"\n",
"for a neighborhood size $J\in{\mathbb N}$ and $m\in[0:M-1]$, where $M$ is the number of frames.\n",
"Note that we assume ${\\mathcal P}(m) = 0$ for $m\\notin [0:M-1]$.\n",
"From this, we compute a *novelty curve* ${\\mathcal N}$\n",
"\n",
"\\begin{equation}\n",
"{\\mathcal N}(m) := \\max(0,{\\mathcal P}(m) - \\tilde{{\\mathcal P}}(m))\n",
"\\end{equation}\n",
"\n",
"The peaks in ${\\mathcal N}$ indicate positions of high energy in $x$, and are therefore potential time positions of drum hits.\n",
"\n",
"This procedure works well when the initial assumption is met, namely that onsets or drum hits emit a\n",
"burst of energy that stands out from the remaining energy in the signal.\n",
"However, especially in professionally mixed music recordings, the short-time energy is often adjusted\n",
"to be more or less constant over time (dynamic range compression).\n",
"One possibility to circumvent this problem is to apply HPSS to the input signal prior to the onset detection.\n",
"The onset detection is then executed solely on the percussive component which usually contains\n",
"all drum hits and satisfies the assumption of having energy bursts at the respective time-positions."
]
},
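{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the peak-enhancement step concrete, the following sketch (synthetic toy data, not part of the lab material) computes the local average as a moving average via `np.convolve` and derives the novelty curve from a constant power sequence containing two energy bursts:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Toy short-time power: constant background with two energy bursts\n",
"P = np.full(20, 1.0)\n",
"P[5] = 5.0\n",
"P[12] = 4.0\n",
"\n",
"J = 2\n",
"kernel = np.ones(2 * J + 1) / (2 * J + 1)\n",
"P_avg = np.convolve(P, kernel, mode='same')  # local average; borders are zero-padded,\n",
"                                             # matching the convention P(m) = 0 outside [0:M-1]\n",
"novelty = np.maximum(0, P - P_avg)           # half-wave rectified difference\n",
"\n",
"print(np.argmax(novelty))                    # -> 5 (the strongest burst stands out)\n",
"```"
]
},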
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Load the file `CastanetsViolin.wav`.\n",
"- Compute chroma features on $x$ with the parameters `N=4410` and `H=2205`.\n",
"- Visualize the chroma features.\n",
"- Apply your HPSS algorithm to separate the castanets from the violin.\n",
"- Use the harmonically enhanced signal $x_\mathrm{h}$ to compute chroma features and visualize them.\n",
"- Now compare the visualization of the chroma extracted from the original signal $x$ and the chroma extracted from the harmonic component signal $x_\mathrm{h}$. What do you observe?\n",
"\n",
"**Lab Experiment 7**\n",
"\n",
"Complete the implementation of the onset detection algorithm in the function `onset_detection`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def stp(x, N, H):\n",
"    # x: input signal\n",
"    # N: frame length\n",
"    # H: hopsize\n",
"    x_pad = np.concatenate((x, np.zeros(N)))\n",
"    num_windows = int(np.ceil(1 + (len(x) - N) / H))\n",
"    win_pos = np.arange(num_windows) * H\n",
"    idx = np.array([np.arange(w, w + N) for w in win_pos], dtype='int32')\n",
"    P = (x_pad[idx] ** 2).sum(axis=1) / N\n",
"    return P\n",
"\n",
"def onset_detection(x, N, H, J):\n",
"    # x: input signal\n",
"    # N: frame length\n",
"    # H: hopsize\n",
"    # J: neighborhood\n",
"    \n",
"    # short-time power\n",
"\n",
"    # local average\n",
"\n",
"    # novelty\n",
"    novelty = None  # your code here\n",
"    \n",
"    return novelty"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Compute the short-time power ${\mathcal P}$ of the input signal $x$ using the provided function `stp`.\n",
"- Compute the local average $\tilde{{\mathcal P}}$.\n",
"  (Hint: The equation can be formulated as a convolution, and you can compute convolutions in Python using the function `np.convolve`. Note further that this command has a keyword `mode`. Finally, have a look at the function `np.ones`.)\n",
"- Compute the novelty curve ${\mathcal N}$.\n",
"\n",
"Test your implementation by applying it to the audio file `StillPluto_BitterPill.wav`. As a starting point, use $N=882$, $H = 441$, and $J = 10$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here...\n",
"\n",
"plt.figure(figsize=(16, 10.66))\n",
"plt.plot(t, novelty)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Sonify your results using the function `sonify_noveltyCurve`. This function will generate a stereo audio signal in which you can hear the provided original signal\n",
" in one of the channels. In the other channel, each peak in the provided novelty curve is audible as a click sound. You can therefore check by listening whether the peaks in your\n",
" computed novelty curve are aligned with drum hits in the original signal. To apply the function `sonify_noveltyCurve`, you need to specify the sampling frequency of the\n",
" novelty curve. How can you compute it? (Hint: It depends on $H$ and the sampling frequency $F_\mathrm{s}$ of the input audio signal.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def sonify_noveltyCurve(novelty, x, Fs, sampling_frequency_novelty):\n",
"    # pick local maxima of the novelty curve\n",
"    pos = np.append(novelty, novelty[-1]) > np.insert(novelty, 0, novelty[0])\n",
"    neg = np.logical_not(pos)\n",
"    peaks = np.where(np.logical_and(pos[:-1], neg[1:]))[0]\n",
"    \n",
"    # discard peaks with very small relative strength\n",
"    values = novelty[peaks]\n",
"    values /= np.max(values)\n",
"    peaks = peaks[values >= 0.01]\n",
"    values = values[values >= 0.01]\n",
"    peaks_idx = np.int32(np.round(peaks / sampling_frequency_novelty * Fs))\n",
"    \n",
"    # generate a short, decaying click sound\n",
"    sine_periods = 8\n",
"    sine_freq = 880\n",
"    click = np.sin(np.linspace(0, sine_periods * 2 * np.pi, sine_periods * Fs // sine_freq))\n",
"    ramp = np.linspace(1, 1 / len(click), len(click)) ** 2\n",
"    click = click * ramp\n",
"    click = click * np.abs(np.max(x))\n",
"    \n",
"    # place a click at each peak position\n",
"    out = np.zeros(len(x), dtype=x.dtype)\n",
"    for i, start in enumerate(peaks_idx):\n",
"        stop = min(start + len(click), len(out))  # guard against clicks past the signal end\n",
"        out[start:stop] += (click[:stop - start] * values[i]).astype(x.dtype)\n",
"    \n",
"    return np.vstack((x, out))\n",
"\n",
"# your code here...\n",
"\n",
"Audio(newx, rate=Fs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Listen to the generated results. What is your impression?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now apply your HPSS algorithm to the audio file and rerun the detection algorithm on just the percussive component $x_\mathrm{p}$. Again, sonify the results. What is your impression now?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here...\n",
"\n",
"plt.figure(figsize=(16, 10.66))\n",
"plt.plot(t, novelty)\n",
"\n",
"newx = sonify_noveltyCurve(novelty, x, Fs, Fs/H)\n",
"Audio(newx, rate=Fs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Acknowledgment:** The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS."