FMP AudioLabs


In this notebook, we give a short overview of how to integrate multimedia objects (in particular, audio, image, and video objects) into a Jupyter notebook. Rather than being comprehensive, we only give a selection of possibilities as used in the other FMP notebooks. In particular, we discuss two alternatives: a direct integration of image, video, and audio elements using HTML tags, and an integration using the module IPython.display.

Audio Objects

Audio: HTML <audio> tag

The HTML <audio> tag defines an in-browser audio player that plays back a specified audio file (e.g., MP3, WAV, or OGG); see the HTML documentation of the <audio> element for details. Note that the functionality and the visual appearance of the audio player depend on the browser used. The <audio> tag can be used within a markdown cell and does not require any Python.
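As a minimal sketch, the snippet below assembles such a tag as a string; the printed HTML can be pasted into a markdown cell. The helper name audio_tag is hypothetical, the MIME type audio/mpeg assumes an MP3 file, and the path is the MP3 file used in the code examples of this notebook.

```python
def audio_tag(src, mime_type='audio/mpeg'):
    """Return an HTML <audio> element with playback controls for the given file (hypothetical helper)."""
    return (f'<audio controls>\n'
            f'  <source src="{src}" type="{mime_type}">\n'
            f'  Your browser does not support the audio element.\n'
            f'</audio>')

tag = audio_tag('../data/B/FMP_B_Note-C4_Piano.mp3')
print(tag)
# In a code cell, IPython.display.HTML(tag) would render the same player.
```

For WAV or OGG files, the mime_type argument would be changed to audio/wav or audio/ogg, respectively.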

Audio: Using IPython.display.Audio

An alternative is to use the module IPython.display, which is an application programming interface (API) for displaying various objects in IPython. For audio, the following class is available (IPython version 6.0 or higher):

IPython.display.Audio(data=None, filename=None, url=None, embed=None, rate=None, autoplay=False, normalize=True, *, element_id=None)

Warning: By default, IPython.display.Audio normalizes the audio (dividing by the maximum absolute sample value) before playback. This may be unwanted in applications where the volume of the audio should be kept at its original level. For examples, see the FMP notebook on Audio.
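A minimal sketch of how to keep the original level, assuming an IPython version that supports the normalize keyword shown in the signature above. The quiet test tone is made up for illustration. With normalize=False, the samples must lie in the range [-1, 1]; otherwise IPython raises a ValueError.

```python
import numpy as np
import IPython.display as ipd

Fs = 8000                                   # sampling rate (Hz)
t = np.arange(Fs) / Fs                      # 1 second of sample times
x = 0.1 * np.sin(2 * np.pi * 440 * t)       # quiet 440 Hz tone, peak amplitude 0.1

audio_normalized = ipd.Audio(data=x, rate=Fs)                 # default: rescaled to full range
audio_original = ipd.Audio(data=x, rate=Fs, normalize=False)  # keeps the peak level of 0.1
ipd.display(audio_normalized, audio_original)
```

Played back one after the other, the second player should sound noticeably quieter than the first.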

When used in a code cell, this class creates an in-browser audio player. The following two options are conceptually different:

  • When using the keyword argument filename, the audio file is loaded from the specified path and embedded into the notebook (with default embed=True).
  • When using the keyword argument url, the player is linked to the audio file by the specified URL (with default embed=False).

Note that if you want the audio to be playable later with no internet connection (or with no local audio file available), you need to embed the audio file into the notebook. This can be done using the first option. The following example illustrates the difference between the two options.

In [1]:
import os
import IPython.display as ipd
import librosa
import numpy as np
%matplotlib inline

path_filename = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.mp3')

audio_element_filename = ipd.Audio(filename=path_filename)
print('Size of <audio> tag (with embedded audio file): %s Bytes' 
      % len(audio_element_filename._repr_html_().encode('utf8')), flush=True)

audio_element_url = ipd.Audio(url=path_filename)
print('Size of <audio> tag (with linked audio file): %s Bytes' 
      % len(audio_element_url._repr_html_().encode('utf8')), flush=True)
Size of <audio> tag (with embedded audio file): 22910 Bytes
Size of <audio> tag (with linked audio file): 244 Bytes
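Most of the size difference comes from embedding itself: with embed=True, the file bytes are stored base64-encoded in a data URI inside the <audio> tag, and base64 turns every 3 bytes into 4 ASCII characters (roughly 33% overhead). The sketch below illustrates this overhead with the standard library only; the byte string is an arbitrary stand-in for an audio file, and base64_length is a hypothetical helper.

```python
import base64
import math

def base64_length(num_bytes):
    """Length of the standard (padded) base64 encoding of num_bytes bytes."""
    return 4 * math.ceil(num_bytes / 3)

payload = bytes(range(256)) * 40            # 10240 arbitrary bytes (stand-in for an MP3 file)
prefix = 'data:audio/mpeg;base64,'
data_uri = prefix + base64.b64encode(payload).decode('ascii')

print(len(payload))                         # 10240
print(base64_length(len(payload)))          # 13656
print(len(data_uri) - len(prefix))          # 13656
```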

Audio: WAV and MP3

Embedding audio files may lead to very large Jupyter notebooks (and to large files when exported as HTML). This holds in particular when embedding uncompressed audio encoded as WAV files. For example, a song of five to ten minutes in CD quality (44100 Hz, 16 bit, stereo) may easily lead to a file size of more than 50 MB. Therefore, to reduce the size, one may consider the following:

  • Trim audio files to have short durations.
  • Reduce the sampling rate.
  • Convert to mono.
  • Use the MP3 audio coding format.
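The 50 MB figure and the effect of the first three measures can be checked with a back-of-the-envelope calculation: uncompressed PCM audio needs duration × sampling rate × number of channels × bytes per sample. The sketch below ignores the small WAV header (about 44 bytes) and the additional base64 overhead of roughly one third that embedding adds; pcm_size_bytes is a hypothetical helper.

```python
def pcm_size_bytes(duration_s, sr=44100, num_channels=2, bits_per_sample=16):
    """Size of raw PCM sample data in bytes (WAV header ignored)."""
    return duration_s * sr * num_channels * bits_per_sample // 8

cd_5min = pcm_size_bytes(5 * 60)                               # CD quality, stereo
cd_10min = pcm_size_bytes(10 * 60)
reduced = pcm_size_bytes(5 * 60, sr=22050, num_channels=1)     # halved rate, mono

print(cd_5min / 1e6)    # 52.92 (MB)
print(cd_10min / 1e6)   # 105.84 (MB)
print(reduced / 1e6)    # 13.23 (MB)
```

Halving the sampling rate and converting to mono each halve the size, so together they reduce it by a factor of four; MP3 coding typically reduces it much further.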

The following example shows the difference in file size between a WAV and an MP3 audio file.

In [2]:
path_filename_wav = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.wav')
audio_element_wav = ipd.Audio(filename=path_filename_wav)
print('Size of <audio> tag (with embedded WAV file): %s Bytes' 
      % len(audio_element_wav._repr_html_().encode('utf8')), flush=True)

path_filename_mp3 = os.path.join('..', 'data', 'B', 'FMP_B_Note-C4_Piano.mp3')
audio_element_mp3 = ipd.Audio(filename=path_filename_mp3)
print('Size of <audio> tag (with embedded MP3 file): %s Bytes' 
      % len(audio_element_mp3._repr_html_().encode('utf8')), flush=True)
Size of <audio> tag (with embedded WAV file): 121640 Bytes
Size of <audio> tag (with embedded MP3 file): 22910 Bytes

Audio: Waveform-Based Signals

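One may also use IPython.display.Audio to embed waveform-based audio signals (either mono or stereo). The following code example shows how to read a WAV and an MP3 file using the Python package librosa. Note that in both cases, the audio files are decoded into waveform representations (NumPy arrays), which IPython.display.Audio then re-encodes as uncompressed WAV data before embedding. As a result, the MP3 file loses its size advantage.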

In [3]:
x_wav, Fs_wav = librosa.load(path_filename_wav, sr=None)
audio_wav = ipd.Audio(data=x_wav, rate=Fs_wav)
print('Size of <audio> tag (coming from WAV): %s Bytes'
      % len(audio_wav._repr_html_().encode('utf8')), flush=True)

x_mp3, Fs_mp3 = librosa.load(path_filename_mp3, sr=None)
audio_mp3 = ipd.Audio(data=x_mp3, rate=Fs_mp3)
print('Size of <audio> tag (coming from MP3): %s Bytes' 
      % len(audio_mp3._repr_html_().encode('utf8')), flush=True)
Size of <audio> tag (coming from WAV): 121636 Bytes
Size of <audio> tag (coming from MP3): 126244 Bytes
/Users/zal/miniconda3/envs/FMP/lib/python3.7/site-packages/librosa/core/ UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
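The two sizes are so close because, for waveform input, IPython.display.Audio writes 16-bit PCM WAV data (2 bytes per sample and channel), so the embedded size depends on the number of samples rather than on the original file format. The sketch below reproduces this kind of encoding with Python's standard wave module; it is an illustration, not IPython's internal code, and the helper to_wav_bytes and the test tone are hypothetical.

```python
import io
import math
import wave
import numpy as np

def to_wav_bytes(x, rate):
    """Encode a mono float signal with values in [-1, 1] as 16-bit PCM WAV bytes."""
    pcm = (np.clip(x, -1, 1) * 32767).astype(np.int16)   # 2 bytes per sample
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()

Fs = 22050
x = 0.5 * np.sin(2 * np.pi * 261.63 * np.arange(Fs) / Fs)   # 1 s tone near C4
wav = to_wav_bytes(x, Fs)
print(len(wav))                      # 44144 = 44-byte header + 2 * 22050 samples
print(4 * math.ceil(len(wav) / 3))   # number of characters after base64 encoding
```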

The next example shows how to generate a stereo audio signal and how to embed it into the Jupyter notebook. For explanations of the code example, we refer to the FMP notebook on waveforms.

Warning: Depending on the web browser, only specific sampling rates may be supported for audio playback. In the following example, we use the sampling rate Fs = 4000, which seems to work for most browsers.
In [4]:
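Fs = 4000                                    # sampling rate (Hz)
duration = 4                                 # duration (seconds)
t = np.arange(Fs * duration) / Fs            # sample times, spaced exactly 1/Fs apart
signal_left = np.sin(2 * np.pi * 200 * t)    # 200 Hz sinusoid (left channel)
signal_right = np.sin(2 * np.pi * 600 * t)   # 600 Hz sinusoid (right channel)
signal_stereo = np.array([signal_left, signal_right])  # shape (2, number of samples)
ipd.Audio(signal_stereo, rate=Fs)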

Image Objects

Image: HTML <img> tag

Similar to audio, there are many ways to integrate image objects into a Jupyter notebook. First of all, one can use the <img> tag within a markdown cell without requiring any Python. The following figure shows a self-similarity matrix (SSM) of a recording of Brahms' Hungarian Dance No. 5, see Section 4.2.2 of [Müller, FMP, Springer 2015].


HTML also allows for showing animated GIFs. This simple format encodes a number of images or frames, which are presented in a specific order to create a short animation. Using animated GIFs is a nice way to illustrate processing pipelines. For example, the following animated GIF shows the previous SSM in its original form along with a version after applying smoothing as well as thresholding and scaling.


Image: Using IPython.display.Image

Similar to the audio case, an alternative is to use the module IPython.display to create an image object given the path to a PNG/JPEG/GIF file. For images, the following class is available:

IPython.display.Image(data=None, url=None, filename=None, format=None, embed=None, width=None, height=None, retina=False, unconfined=False, metadata=None)

Again, there are two options, which either embed or link an image object:

  • When using the keyword argument filename, the image file is loaded from the specified path and embedded into the notebook (with default embed=True).
  • When using the keyword argument url, the data is linked by the specified URL (with default embed=False).

Here are some examples:

In [5]:
path_filename = os.path.join('..', 'data', 'B', 'FMP_B_Chapters_C0_nav.png')

print(' <img> tag with embedded image file:', flush=True)
ipd.display(ipd.Image(filename=path_filename, width=100))

print(' <img> tag with linked image file:', flush=True)
ipd.display(ipd.Image(url=path_filename, width=100))
 <img> tag with embedded image file:
 <img> tag with linked image file:

Image: Display Side by Side

Sometimes, one may want to display several images in a row (rather than from top to bottom). One convenient way is to use the Python library pandas, which provides easy-to-use data structures and data analysis tools.

In [6]:
import os
import pandas as pd
import IPython.display as ipd

pd.set_option('display.max_colwidth', None)

f_img1 = os.path.join('..', 'data', 'B', 'FMP_B_Chapters_C0_nav.png')
f_img2 = os.path.join('..', 'data', 'B', 'FMP_B_Chapters_C1_nav.png')
f_img3 = os.path.join('..', 'data', 'B', 'FMP_B_Chapters_C2_nav.png')
f_img4 = os.path.join('..', 'data', 'B', 'FMP_B_Chapters.gif')

img1 = ipd.Image(url=f_img1, width=100)._repr_html_()
img2 = ipd.Image(url=f_img2, width=100)._repr_html_()
img3 = ipd.Image(url=f_img3, width=100)._repr_html_()
img4 = ipd.Image(url=f_img4, width=100)._repr_html_()

# Generation of two-dimensional tabular data structure (with rows and columns)
df = pd.DataFrame({'images': [img1, img2, img3, img4]})

# Rendering of a DataFrame as an HTML table
ipd.display(ipd.HTML(df.T.to_html(escape=False, header=False, index=False)))
0  <img src="../data/B/FMP_B_Chapters_C0_nav.png" width="100"/>
1  <img src="../data/B/FMP_B_Chapters_C1_nav.png" width="100"/>
2  <img src="../data/B/FMP_B_Chapters_C2_nav.png" width="100"/>
3         <img src="../data/B/FMP_B_Chapters.gif" width="100"/>
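If pandas is not available, the same one-row layout can be sketched by assembling the HTML table directly. The helper images_side_by_side is hypothetical; the file paths are the ones used in the pandas example above.

```python
import IPython.display as ipd

def images_side_by_side(filenames, width=100):
    """Return an HTML table with a single row containing one linked <img> per file."""
    cells = ''.join(f'<td><img src="{f}" width="{width}"/></td>' for f in filenames)
    return f'<table><tr>{cells}</tr></table>'

filenames = ['../data/B/FMP_B_Chapters_C0_nav.png',
             '../data/B/FMP_B_Chapters_C1_nav.png',
             '../data/B/FMP_B_Chapters_C2_nav.png',
             '../data/B/FMP_B_Chapters.gif']
html = images_side_by_side(filenames)
ipd.display(ipd.HTML(html))
```

As in the pandas version, the images are only linked, so they are displayed only if the files are available at the given relative paths.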

Image: Generation and Interaction

There are many ways to generate (interactive) images using Python and to integrate them into a Jupyter notebook. The FMP notebook on Python visualization is devoted to this topic.

Video Objects

Video: HTML <video> tag

Finally, we discuss how to integrate videos into a Jupyter notebook. First of all, one can use the <video> tag within a markdown cell without requiring any Python. The following video shows a user interface (Interpretation Switcher) that facilitates music navigation across different performances of Beethoven's Fifth Symphony, see Section of [Müller, FMP, Springer 2015].

Video: Using IPython.display.Video

An alternative is to use the module IPython.display. The following example shows how to integrate a video as a linked object:

In [7]:
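import os
import IPython.display as ipd

path_filename = os.path.join('..', 'data', 'B', 'FMP_B_InterpretationSwitcher_small.mp4')
# Passing url (with the default embed=False) links the video instead of embedding it
ipd.display(ipd.Video(url=path_filename, width=480))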

Video: Using IPython.display.YouTubeVideo

YouTube offers a rich source of videos, which can be easily integrated into a Jupyter notebook. The following class can be used to embed a YouTube video player based on its video identifier.

IPython.display.YouTubeVideo(id, width=400, height=300, **kwargs)

The following YouTube video gives an introduction to chroma features and shows how they can be applied in music navigation and retrieval applications, see also Chapter 3 of [Müller, FMP, Springer 2015]. The video identifier can be found in the URL of the YouTube video.
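A small sketch for extracting the identifier, assuming one of the two common URL forms: https://www.youtube.com/watch?v=ID (identifier in the v query parameter) or https://youtu.be/ID (identifier in the path). Other URL forms, such as embed links, are not handled, and the helper youtube_id is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def youtube_id(url):
    """Extract the video identifier from a YouTube watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    if parsed.hostname == 'youtu.be':
        return parsed.path.lstrip('/')          # short link: the identifier is the path
    return parse_qs(parsed.query)['v'][0]       # watch URL: the identifier is the v parameter

print(youtube_id('https://www.youtube.com/watch?v=PF05xP1NqUM'))   # PF05xP1NqUM
print(youtube_id('https://youtu.be/FmwpkdcAXl0'))                  # FmwpkdcAXl0
```

The result can be passed directly to the class above, e.g., ipd.YouTubeVideo(youtube_id(url), width=600, height=450).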

In [8]:
import IPython.display as ipd
ipd.display(ipd.YouTubeVideo('PF05xP1NqUM', width=600, height=450))

The following YouTube video gives an introduction to tempo and beat tracking, a topic covered in Chapter 6 of [Müller, FMP, Springer 2015].

In [9]:
ipd.display(ipd.YouTubeVideo('FmwpkdcAXl0', width=600, height=450))

Video: Formats and Conversion

There is a multitude of ways to encode videos, and there is no single answer to the question of which video format is best. The suitability of a video format depends on the intended application, the digital platform used, the available bandwidth (e.g., when distributing a video file), and many other software and hardware constraints. Another source of confusion is that the term video format is often used to refer to two different components, called codecs and containers.

A codec is a component that compresses (encodes) and decompresses (decodes) video data to make it more manageable in terms of storage requirements. There are many different codecs. One of the most commonly used formats for the recording, compression, and distribution of video content is the H.264 standard (also known as MPEG-4 Part 10), for which x264 is a widely used encoder.

A container can be thought of as a wrapper that bundles several media streams, typically a video stream encoded with a video codec, an audio stream encoded with an audio codec, and additional data such as subtitles, lyrics, and other metadata. Important containers (often also called video formats) are, for example:

  • AVI: Most popular format in the past; Microsoft replaced AVI by WMV; wide compatibility with both PC and Mac systems.
  • WMV: Designed for streaming applications.
  • MOV: Developed by Apple; designed for long videos (such as movies); compatible with Macintosh and Windows platforms.
  • MP4: Widely supported; used in iTunes and other popular products.
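To see the codec/container distinction on a concrete file, one can ask ffprobe, the analysis tool that ships with the FFmpeg framework, which container and which codecs a file uses. The sketch below assumes that ffprobe is installed and on the PATH; the helper names are hypothetical, and the exact container names reported may differ between FFmpeg versions.

```python
import json
import os
import shutil
import subprocess

def probe_command(path):
    """Build an ffprobe call that reports the container and the codec of each stream as JSON."""
    return ['ffprobe', '-v', 'error',
            '-show_entries', 'format=format_name:stream=codec_type,codec_name',
            '-of', 'json', path]

def describe_media(path):
    """Return (container name, list of (stream type, codec name)) for a media file."""
    result = subprocess.run(probe_command(path), capture_output=True, text=True, check=True)
    info = json.loads(result.stdout)
    container = info['format']['format_name']
    codecs = [(s['codec_type'], s['codec_name']) for s in info.get('streams', [])]
    return container, codecs

path = os.path.join('..', 'data', 'B', 'FMP_B_InterpretationSwitcher_small.mp4')
# Only probe when ffprobe is available and the file exists
if shutil.which('ffprobe') and os.path.exists(path):
    print(describe_media(path))
```

For an MP4 file, one would expect an MP4-family container together with, e.g., an h264 video stream.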

Similarly, there are countless tools for processing video files. One of the leading multimedia frameworks currently available is FFmpeg, which offers cross-platform solutions for recording, converting, and streaming audio and video. We refer to the website of FFmpeg for a detailed description of options. Just to give an example, the following command was used to compress, convert, and rescale the video of the Interpretation Switcher shown above:

ffmpeg -i FMP_InterpretationSwitcher.avi -c:v libx264 -preset slow -crf 22 -vf scale=480:-1 FMP_InterpretationSwitcher_small.mp4
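In this command, -c:v libx264 selects the x264 encoder for H.264 video, -preset slow trades encoding speed for better compression, -crf 22 sets the constant rate factor (lower values give higher quality and larger files; the libx264 default is 23), and -vf scale=480:-1 rescales the video to a width of 480 pixels while preserving the aspect ratio. For scripting such conversions from Python, the sketch below assembles the same command as an argument list and runs it only if ffmpeg is installed and the input file exists; the helper ffmpeg_h264_command is hypothetical.

```python
import os
import shutil
import subprocess

def ffmpeg_h264_command(src, dst, width=480, crf=22, preset='slow'):
    """Build an ffmpeg call that re-encodes src with libx264 and rescales it to the given width."""
    return ['ffmpeg', '-i', src,
            '-c:v', 'libx264',           # H.264 video encoder
            '-preset', preset,           # slower preset: better compression
            '-crf', str(crf),            # constant rate factor (quality level)
            '-vf', f'scale={width}:-1',  # width in pixels; -1 keeps the aspect ratio
            dst]

src = 'FMP_InterpretationSwitcher.avi'
dst = 'FMP_InterpretationSwitcher_small.mp4'
cmd = ffmpeg_h264_command(src, dst)
print(' '.join(cmd))

# Only run the conversion when ffmpeg is available and the input file exists
if shutil.which('ffmpeg') and os.path.exists(src):
    subprocess.run(cmd, check=True)
```

Since libx264 requires even frame dimensions for common pixel formats, scale=480:-2 (which rounds the computed height to an even number) can be a safer choice when the aspect ratio would otherwise produce an odd height.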
Acknowledgment: This notebook was created by Frank Zalkow and Meinard Müller.