Efficient Retrieval of Music Recordings Using Graph-Based Index Structures

This notebook accompanies the following paper:

The paper compares indexing strategies based on $K$-d trees and HNSW graphs in a cross-modal music retrieval application. See more details in the paper. The notebook shows how to load the features for a provided example dataset, generate shingles, reduce the shingle dimensionality using PCA, construct index structures ($K$-d trees and HNSW graphs), and search for the nearest shingles to a given query in the database.

We start by importing some Python packages.

Parameters

We now set various parameters used throughout this notebook. First, we decide whether we use dimensionality reduction (binary variable dimensionality_reduction). Second, we fix the used dimensionality $K$ (K, changed to 240 later if no dimensionality reduction is applied). Third, we set the number $\nu$ of neighbors to search for (nu).

Then, we have three more parameters for the HNSW graph, i.e., number $\nu'$ of neighbor candidates during the search (nu_prime), the minimum number $M$ of edges for each node in the graph (M), and the number $M'$ of edge candidates during the construction of the graph (M_prime). For details about the meaning of these parameters, we refer to our paper.

Load Dataset

We next load the metadata and the CENS features for our dataset. Then, we generate shingles (as vectors of 240 dimensions) from the feature sequences and an index that specifies the original document for each shingle.

To give an overview of the dataset, we create a table with the dataset's musical works, showing the number of versions for each work.

To give a concrete idea about the dataset content, we now select a random shingle from the dataset, show its corresponding metadata, and visualize the shingle.

Dimensionality Reduction

If required, we perform dimensionality reduction to the shingle using PCA.

Query

In this notebook, we use a query from the database to demonstrate the usage of the index structures. You may also use another query by computing the features for an audio file. To do this, we provide the functions common.compute_features and common.generate_shingles.

Now, we search for the $\nu$ nearest shingles in our dataset using an exhaustive search. Here, we compute a matrix of pairwise distances between the query and all database items and select the items corresponding to the $\nu$ smallest distances.

We next show the results of the search using the function show_result_table.

Nearest Neighbor Search with $K$-d tree

Next, we construct a $K$-d tree for our dataset and search for the $\nu$ nearest shingles using this tree. We then show the results as a table.

Nearest Neighbor Search with HNSW Graph

Finally, we construct an HNSW graph for our dataset and search for the $\nu$ nearest shingles using this graph. We then show the results as a table.

Retrieval Runtime

The aim of using index structures in our music application is to increase the efficiency of the retrieval runtime. In the next code cell, we compare the runtimes of the different search strategies. First, we define functions with a consistent interface for the exhaustive search, the $K$-d tree search, and the HNSW-based search. Then we measure the runtimes of these functions using the Python module timeit.

Acknowledgment: This notebook was created by Frank Zalkow.