FMP AudioLabs

Chapter 7: Content-Based Audio Retrieval

An important topic in music information retrieval is the development of search engines that enable users to explore music collections in a flexible and intuitive way. In Chapter 7 of [Müller, FMP, Springer 2015], we discuss audio retrieval strategies that follow the query-by-example paradigm: given an audio query, the task is to retrieve all documents that are similar or related to the query in some specified sense. Starting with audio identification, a technique used in many commercial applications such as Shazam, we study various retrieval strategies to handle different degrees of similarity. Furthermore, to address efficiency, we discuss fundamental indexing techniques based on inverted lists, a concept originally used in text retrieval.
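To make the idea of inverted lists concrete, the following is a minimal sketch, not the implementation used in the book or its notebooks. It assumes that a document has already been reduced to a set of spectral peaks given as (time, frequency bin) pairs and, as a simplification, uses the frequency bin alone as the hash. The function names `build_inverted_lists` and `count_matches` and the toy peak data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_lists(peaks):
    """Map each hash (assumption: the frequency bin) to the sorted
    time positions at which it occurs in the document."""
    index = defaultdict(list)
    for time, freq in peaks:
        index[freq].append(time)
    for positions in index.values():
        positions.sort()
    return index


def count_matches(index, query_peaks):
    """For every candidate shift, count how many query peaks coincide
    with a document peak when the query is moved to that shift."""
    votes = Counter()
    for time, freq in query_peaks:
        # Only the list for this hash is scanned, not the whole document.
        for doc_time in index.get(freq, []):
            votes[doc_time - time] += 1
    return votes


# Toy data (invented for illustration): (time, frequency bin) pairs.
doc_peaks = [(0, 5), (1, 7), (2, 5), (3, 9), (4, 7), (5, 2)]
query_peaks = [(0, 5), (1, 9), (2, 7)]

index = build_inverted_lists(doc_peaks)
votes = count_matches(index, query_peaks)
best_shift, best_count = votes.most_common(1)[0]
print(best_shift, best_count)
```

In this toy setting, the shift with the most votes is the position at which the query best aligns with the document. Because each query peak only touches the list stored for its own hash, the lookup cost depends on the list lengths rather than on the total size of the collection.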

7.1 Audio Identification
7.2 Audio Matching
7.3 Version Identification
7.4 Further Notes


| Topic | Relation to [Müller, FMP, Springer 2015] | Description | HTML | IPYNB |
|---|---|---|---|---|
| Content-Based Audio Retrieval | [Chapter 7 (Introduction), Section 7.4] | Information retrieval; query; document; item; content; query-by-example; specificity; granularity; audio identification; fingerprinting; audio matching; version identification; types of version; arrangement; cover version; Mona Lisa example; Beethoven example (Fifth Symphony); cover song examples | [html] | [ipynb] |
| Audio Identification | [Section 7.1] | Audio fingerprint; client–server model; specificity; robustness; compactness; scalability; spectral peaks; constellation map; indexing; hash; peak pairs; anchor point; target zone; Beatles example (Act Naturally) | [html] | [ipynb] |
| Feature Design (Chroma, CENS) | [Section 7.2.1] | Chromagram; normalization; smoothing; downsampling; quantization; Beethoven example (Fifth Symphony) | [html] | [ipynb] |
| Diagonal Matching | [Section 7.2.2] | Matching function; cost matrix; dot product; local minimum; retrieval; match; multiple-query strategy; scaled version; tempo variation; synthetic example | [html] | [ipynb] |
| Subsequence DTW | [Section 7.2.3] | Local alignment; dynamic time warping; cost matrix; accumulated cost matrix; matching function (DTW, diagonal); step size condition; implementation (LibFMP, LibROSA); Beethoven example (Fifth Symphony) | [html] | [ipynb] |
| Audio Matching | [Section 7.2] | Chroma features; CENS; matching function; match; cyclic shift; transposition-invariant matching function; Beethoven example (Fifth Symphony); Shostakovich example (Waltz No. 2); Zager and Evans example (In the Year 2525) | [html] | [ipynb] |
| Common Subsequence Matching | [Section 7.3.2] | Sequence alignment; global alignment; local alignment; score matrix; path; step size; induced segment; accumulated score matrix; dynamic programming; backtracking; partial matching; monotonicity condition; toy example | [html] | [ipynb] |
| Version Identification | [Section 7.3, Section 7.3.2] | Cover song; document-level retrieval; tonal properties; chroma features; local alignment; longest common subsequence; similarity score; Beatles example (Ocean Colour Scene) | [html] | [ipynb] |
| Evaluation Measures | [Section 7.3.3, Exercise 7.12] | Document-level retrieval; item; similarity score; rank; top rank; relevance function; precision; recall; PR curve; break-even point; maximal F-measure; average precision; mean average precision (MAP) | [html] | [ipynb] |