SM Toolbox

The SM Toolbox has been developed by Meinard Müller, Nanzhu Jiang, Peter Grosche, and Harald G. Grohganz. It contains MATLAB implementations for computing and enhancing similarity matrices in various ways. Furthermore, the toolbox includes a number of additional tools for parsing, navigation, and visualization synchronized with audio playback. It also contains code for a recently proposed audio thumbnailing procedure that demonstrates the applicability and importance of enhancement concepts. The MATLAB implementations provided on this website are published under the terms of the General Public License (GPL). A general overview of the SM Toolbox is given in [1].

If you publish results obtained using these implementations, please cite [1]. For technical details, applications, or data, please cite [2], [3], [4], [5], [6], [7].

MATLAB Code

The MATLAB implementations provided on this website are published under the terms of the General Public License (GPL), version 2 or later. If you publish results obtained using these implementations, please cite the references below.

Download SM Toolbox (Version 1.0. Last update: 2013-07-01): [zip]

Computation and Visualization of SSMs

  • features_to_SM.m Computing similarity matrices based on different enhancement strategies such as tempo invariance and transposition invariance.
  • threshSM.m Application of different thresholding techniques.
  • visualizeSM.m Visualization of a similarity matrix.
  • visualizeTransIndex.m Visualization of a transposition index matrix.
  • makePlotPlayable.m Synchronized playback of an audio file along with a plotted figure.

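Thumbnailing application

  • SSM_to_scapePlotFitness.m Computing the fitness scape plot from a self-similarity matrix.
  • visualizeScapePlot.m Visualization of a fitness scape plot.
  • scapePlotFitness_to_thumbnail.m Deriving the thumbnail from a fitness scape plot.
  • thumbnailSSM_to_pathFamily.m Computing the path family and the induced segments of a thumbnail.
  • visualizePathFamilySSM.m Visualization of a path family and its induced segments within the SSM.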

Demos

The following demo files are provided. These demo files let you try out the code and give you a first overview of the toolbox. The audio files needed to run the demos are also included in the toolbox.

Important Notes

  • The SM Toolbox requires the MATLAB Signal Processing Toolbox.
  • The feature computation requires the Chroma Toolbox. For convenience, the Chroma Toolbox has been included in the folder MATLAB-Chroma-Toolbox_2.0 of the zip file provided above. The feature extraction step may be replaced by feature extraction functions supplied by other toolboxes.
  • The implementations have been tested using MATLAB 2012b or newer.
  • For questions, please contact Meinard Müller, Nanzhu Jiang or Harald G. Grohganz.

Similarity Matrix

The concept of similarity matrices (SMs) has been widely used for a multitude of music analysis and retrieval tasks, including audio structure analysis and version identification. For such tasks, improving the structural properties of the similarity matrix at an early stage of the processing pipeline has turned out to be of crucial importance. The SM Toolbox contains MATLAB implementations for computing and enhancing similarity matrices in various ways.

[Figure panels: Original SSM | Diagonal smoothing | Tempo-invariant smoothing | Forward-backward smoothing | Transposition-invariant SSM | Transposition index matrix | Binary thresholding | Thresholding with penalty]

  • SM (Similarity Matrix): Given an audio recording, we first extract audio features, such as chroma features, from the recording using our Chroma Toolbox. We then specify a similarity measure between pairs of features; in our case, we use the cosine similarity. Finally, we compute the similarity matrix, in which each element encodes the similarity between the corresponding pair of features.

    MATLAB function: features_to_SM.m

  • SM with diagonal smoothing: An important property of similarity matrices is the appearance of paths, which represent high similarity between pairs of segments (the two segments are obtained by projecting a path onto the vertical and the horizontal axis, respectively). A central task is to extract such paths and use them to identify pairs of similar segments. Due to musical and acoustic variations, the path structures are often surrounded by noise, which makes their extraction and identification difficult. To enhance the path structure, one general strategy is to apply a smoothing filter along the direction of the main diagonal. This emphasizes diagonal information in the similarity matrix S and suppresses other structures.

    Controlling parameter: paramSM.smoothLenSM

  • SM with tempo-invariance: This is one of the main enhancements for similarity matrices. Simple diagonal smoothing is often not sufficient for judging whether two segments are similar, since music may contain repeated parts played at a faster or slower tempo; the corresponding paths then deviate from the direction of the main diagonal. To handle such tempo differences, our toolbox smooths the similarity matrix along various directions, each direction corresponding to a different relative tempo, and combines the results.

    Controlling parameters: paramSM.tempoRelMin, paramSM.tempoRelMax, paramSM.tempoNum

  • SM with forward and backward smoothing: By default, the smoothing filter operates in the forward direction only. This causes the paths to fade out, in particular when a large filter length is used. To avoid this fading, one can use the forward-backward option, which additionally applies the filter in the backward direction.

    Controlling parameter: paramSM.forwardBackward

  • SM with transposition-invariance: Certain musical parts are often repeated in transposed form. Such transpositions can be simulated by cyclically shifting chroma vectors. In our toolbox, we construct transposition-invariant similarity matrices by keeping one chroma feature sequence unaltered while cyclically shifting the other chroma feature sequence along the chroma dimension. For each shifted version, a similarity matrix is computed, and the final similarity matrix is obtained by taking the cell-wise maximum over the twelve matrices. In this way, the repetitive structure is revealed even in the presence of key transpositions. Furthermore, storing the maximizing shift index for each cell yields another matrix, referred to as the transposition index matrix, which displays the harmonic relations within the music recording.

    Controlling parameter: paramSM.circShift

  • SM thresholded: In many music analysis applications, similarity matrices are further processed by suppressing all values that fall below a given threshold. On the one hand, such a step often substantially reduces noise and leaves only the most significant structures. On the other hand, weaker but still relevant information may be lost. In practice, the thresholding strategy may have a significant impact on the final results and has to be chosen carefully in the context of the considered application. Our toolbox offers several post-processing techniques, such as thresholding, scaling, binarization, and penalizing.

    MATLAB functions and controlling parameters: threshSM.m, paramThres.threshTechnique, paramThres.threshValue, paramThres.applyBinarize, paramThres.applyScale, paramThres.penalty
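
The basic SM computation can be illustrated with a minimal Python sketch. The toolbox itself is written in MATLAB, and this is not a transcription of features_to_SM.m. The sketch assumes that the features are given as a list of equal-length vectors, such as 12-dimensional chroma vectors. The function names normalize and features_to_ssm are hypothetical. Once every vector is scaled to unit length, the cosine similarity of two frames is simply their inner product.

```python
import math


def normalize(v, eps=1e-9):
    """Scale a feature vector to unit Euclidean norm; near-zero vectors become all-zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= eps:
        return [0.0] * len(v)
    return [x / norm for x in v]


def features_to_ssm(features):
    """Return the N x N self-similarity matrix of a feature sequence.

    S[n][m] is the cosine similarity of frames n and m, computed as the
    inner product of the unit-normalized feature vectors.
    """
    X = [normalize(f) for f in features]
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


# Three toy 3-dimensional "chroma" frames; frames 0 and 2 point in the same direction.
S = features_to_ssm([[1, 0, 0], [0, 1, 0], [2, 0, 0]])
print(S)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
```

Because chroma vectors are nonnegative, the resulting similarity values lie between 0 and 1.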
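
Diagonal smoothing and the forward-backward option can be sketched in the same hypothetical Python style. The filter length L stands in for paramSM.smoothLenSM, which we assume is measured in frames. Near the end of a path, the forward window reaches past the last path cell, so the smoothed values decay there. A backward window shows the mirror-image effect at the start of a path. Combining both with a cell-wise maximum is our own assumption; the toolbox may combine them differently.

```python
def smooth_diagonal(S, L, backward=False):
    """Average S over L cells along the main-diagonal direction.

    forward:  S_L[n][m] = (1/L) * sum_{k=0}^{L-1} S[n+k][m+k]
    backward: S_L[n][m] = (1/L) * sum_{k=0}^{L-1} S[n-k][m-k]
    Cells outside the matrix contribute 0.
    """
    N, M = len(S), len(S[0])
    step = -1 if backward else 1
    out = [[0.0] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            acc = 0.0
            for k in range(L):
                i, j = n + step * k, m + step * k
                if 0 <= i < N and 0 <= j < M:
                    acc += S[i][j]
            out[n][m] = acc / L
    return out


def smooth_forward_backward(S, L):
    """Cell-wise maximum of forward and backward diagonal smoothing (assumed combination)."""
    fw = smooth_diagonal(S, L)
    bw = smooth_diagonal(S, L, backward=True)
    return [[max(a, b) for a, b in zip(rf, rb)] for rf, rb in zip(fw, bw)]


# A 6 x 6 matrix that contains only the main-diagonal path.
S = [[1.0 if n == m else 0.0 for m in range(6)] for n in range(6)]
fw = smooth_diagonal(S, 3)
print([round(fw[n][n], 2) for n in range(6)])  # [1.0, 1.0, 1.0, 1.0, 0.67, 0.33]
fb = smooth_forward_backward(S, 3)
print([round(fb[n][n], 2) for n in range(6)])  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

In the example, the forward filter fades out over the last two path cells, while the forward-backward combination keeps the full path.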
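
Tempo-invariant smoothing can be illustrated by filtering along several directions and keeping the maximal response for each cell. This is a hypothetical sketch, not the toolbox's code, and it rests on these assumptions:

- Rows index the first feature sequence and columns the second.
- A slope r means the second sequence advances r frames per frame of the first.
- The tempo_num ratios are spaced logarithmically between tempo_rel_min and tempo_rel_max.

The argument names only mirror paramSM.tempoRelMin, paramSM.tempoRelMax, and paramSM.tempoNum. An implementation could just as well resample one feature sequence instead of filtering along slanted lines.

```python
import math


def tempo_ratios(rel_min, rel_max, num):
    """`num` relative tempi spaced logarithmically in [rel_min, rel_max] (assumed spacing)."""
    if num == 1:
        return [1.0]
    step = (math.log(rel_max) - math.log(rel_min)) / (num - 1)
    return [math.exp(math.log(rel_min) + i * step) for i in range(num)]


def smooth_direction(S, L, r):
    """Average S over L cells along a line of slope r (r columns per row).

    S_r[n][m] = (1/L) * sum_{k=0}^{L-1} S[n+k][m+round(k*r)]; outside cells count as 0.
    """
    N, M = len(S), len(S[0])
    out = [[0.0] * M for _ in range(N)]
    for n in range(N):
        for m in range(M):
            acc = 0.0
            for k in range(L):
                i, j = n + k, m + int(math.floor(k * r + 0.5))
                if i < N and j < M:
                    acc += S[i][j]
            out[n][m] = acc / L
    return out


def smooth_tempo_invariant(S, L, tempo_rel_min, tempo_rel_max, tempo_num):
    """Cell-wise maximum over directional smoothings for several relative tempi."""
    result = None
    for r in tempo_ratios(tempo_rel_min, tempo_rel_max, tempo_num):
        S_r = smooth_direction(S, L, r)
        if result is None:
            result = S_r
        else:
            result = [[max(a, b) for a, b in zip(x, y)] for x, y in zip(result, S_r)]
    return result


# A path of slope 2: the repetition is played twice as fast.
S = [[1.0 if m == 2 * n else 0.0 for m in range(10)] for n in range(5)]
print(round(smooth_direction(S, 3, 1.0)[0][0], 2))               # 0.33
print(round(smooth_tempo_invariant(S, 3, 0.5, 2.0, 3)[0][0], 2))  # 1.0
```

Plain diagonal smoothing captures only one of the three path cells in its window. The tempo-invariant variant, which includes the slope 2 among its directions, recovers the full score.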
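
The transposition-invariant SM and the transposition index matrix can be sketched as follows. The sketch assumes 12-dimensional chroma vectors and cosine similarity. The helper names and the shift convention (entry i moves to index i + k) are our own choices, and the toolbox's transposition index matrix may use a different convention.

```python
import math


def unit(v):
    """Unit-normalize a vector (all-zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def shift_chroma(v, k):
    """Cyclic shift: the entry at index i moves to index (i + k) mod len(v)."""
    d = len(v)
    return [v[(i - k) % d] for i in range(d)]


def transposition_invariant_sm(X, Y, num_shifts=12):
    """Return (S_TI, I) for chroma sequences X (rows) and Y (columns).

    S_TI[n][m] = max_k <x_n, shift_k(y_m)> over k = 0..num_shifts-1 (unit vectors);
    I[n][m] is a maximizing shift k (the smallest one in case of ties).
    """
    Xn = [unit(x) for x in X]
    S_TI = [[-math.inf] * len(Y) for _ in X]
    I = [[0] * len(Y) for _ in X]
    for k in range(num_shifts):
        Yk = [unit(shift_chroma(y, k)) for y in Y]
        for n, x in enumerate(Xn):
            for m, y in enumerate(Yk):
                s = sum(a * b for a, b in zip(x, y))
                if s > S_TI[n][m]:  # keep the cell-wise maximum and its shift
                    S_TI[n][m], I[n][m] = s, k
    return S_TI, I


C_MAJOR = [1 if i in (0, 4, 7) else 0 for i in range(12)]
D_MAJOR = [1 if i in (2, 6, 9) else 0 for i in range(12)]
S_TI, I = transposition_invariant_sm([C_MAJOR], [D_MAJOR])
print(round(S_TI[0][0], 6), I[0][0])  # 1.0 10
```

Here the D major chord matches the C major chord exactly after a cyclic shift by 10, which is a shift down by two semitones. For a self-similarity matrix, X and Y are the same sequence.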
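
The post-processing options can be illustrated with a simplified thresholding function. Its interface is hypothetical. The arguments only loosely mirror paramThres.threshTechnique, paramThres.threshValue, paramThres.applyBinarize, paramThres.applyScale, and paramThres.penalty. The two techniques shown are our assumptions about what such options typically mean:

- an absolute threshold;
- a relative threshold that keeps a given fraction of the largest cells.

```python
import math


def thresh_sm(S, thresh, technique="relative", scale=False, binarize=False, penalty=0.0):
    """Threshold a similarity matrix (simplified, hypothetical interface).

    technique="absolute": keep cells with value >= thresh.
    technique="relative": keep (roughly, ties included) the top `thresh`
    fraction of all cells, 0 < thresh <= 1.
    Kept cells are optionally binarized to 1 or rescaled linearly from
    [tau, max] to [0, 1]; all other cells are set to `penalty`
    (0 means plain suppression, a negative value penalizes them).
    """
    values = sorted((v for row in S for v in row), reverse=True)
    if technique == "absolute":
        tau = thresh
    elif technique == "relative":
        k = max(1, math.ceil(thresh * len(values)))
        tau = values[k - 1]  # value of the k-th largest cell
    else:
        raise ValueError(f"unknown technique: {technique!r}")
    top = values[0]
    out = []
    for row in S:
        new_row = []
        for v in row:
            if v < tau:
                new_row.append(penalty)
            elif binarize:
                new_row.append(1.0)
            elif scale and top > tau:
                new_row.append((v - tau) / (top - tau))
            else:
                new_row.append(v)
        out.append(new_row)
    return out


S = [[1.0, 0.2], [0.2, 0.9]]
print(thresh_sm(S, 0.5, "absolute"))                                # [[1.0, 0.0], [0.0, 0.9]]
print(thresh_sm(S, 0.5, "relative", binarize=True, penalty=-2.0))  # [[1.0, -2.0], [-2.0, 1.0]]
```

Setting suppressed cells to a negative penalty instead of zero lets later path-based steps actively avoid low-similarity regions rather than merely ignore them.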

Thumbnailing Application

As an illustrating application, our toolbox also contains the MATLAB code for a recently proposed audio thumbnailing procedure. For this task, the goal is to find the most representative and repetitive segment of a given audio recording. Based on a suitable self-similarity matrix, the procedure in [4] computes for each audio segment a fitness value that expresses how well the given segment explains other related segments (also called induced segments) in the audio recording. These relations are expressed by a so-called path family over the given segment. The thumbnail is then defined as the fitness-maximizing segment. Furthermore, a triangular scape plot representation is computed, which shows the fitness of all segments and yields a compact high-level view of the structural properties of the entire audio recording.

[Figure panels: Fitness scape plot | Thumbnail with path family and its induced segments]

  • Fitness scape plot: Starting with a self-similarity matrix, we derive the fitness scape plot, which encodes for every possible segment of the given recording a fitness value expressing its repetitiveness. Various step-size and weighting parameters can be used to adjust the computation. The resulting fitness scape plot can also be visualized.

    MATLAB functions: SSM_to_scapePlotFitness.m, visualizeScapePlot.m

  • Derive thumbnail: Using the fitness scape plot as input, we select the point with maximal fitness; the corresponding segment is taken as the thumbnail.

    MATLAB function: scapePlotFitness_to_thumbnail.m

  • Induced segment family: Given the thumbnail segment and the SSM, we compute the path family of the thumbnail and derive all its repetitions; these segments form the induced segment family. Both the path family and the induced segment family can be visualized.

    MATLAB functions: thumbnailSSM_to_pathFamily.m, visualizePathFamilySSM.m
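
The three thumbnailing steps above can be sketched end to end in plain Python. This is a simplified illustration of the idea described in [4]. It is not the code in SSM_to_scapePlotFitness.m, scapePlotFitness_to_thumbnail.m, or thumbnailSSM_to_pathFamily.m, and it makes these assumptions:

- The SSM has a unit main diagonal and values of at most 1.
- The allowed step sizes are (1,1), (1,2), and (2,1).
- The normalized score is the score minus the trivial self-path, divided by the number of path cells.
- The normalized coverage is the coverage minus the segment length, divided by N.

It ignores the toolbox's step-size and weighting parameters and scores every segment by brute force, so it only suits tiny matrices. All function names are hypothetical.

```python
def path_family(S, s, t):
    """Optimal path family of segment [s, t] (inclusive) in the N x N SSM S.

    Paths run through columns s..t of S from the first to the last column
    with step sizes (1,1), (1,2), (2,1) (rows, columns) and do not overlap
    in rows. Column 0 of D is an "elevator" that lets the family jump from
    the end of one path to the start of the next one.
    Returns (score, paths); each path is a list of (row, column) cells of S.
    """
    N, M = len(S), t - s + 1
    D = [[float("-inf")] * (M + 1) for _ in range(N)]  # accumulated scores
    B = [[None] * (M + 1) for _ in range(N)]            # back-pointers
    D[0][0], D[0][1], B[0][1] = 0.0, S[0][s], (0, 0)
    for n in range(1, N):
        # Elevator: stay idle, or arrive from a path that ended in row n-1.
        B[n][0] = (n - 1, 0) if D[n - 1][0] >= D[n - 1][M] else (n - 1, M)
        D[n][0] = D[B[n][0][0]][B[n][0][1]]
        # A new path always starts in the first column of the segment.
        D[n][1], B[n][1] = D[n][0] + S[n][s], (n, 0)
        for m in range(2, M + 1):
            cands = [(n - 1, m - 1)]
            if m >= 3:
                cands.append((n - 1, m - 2))
            if n >= 2:
                cands.append((n - 2, m - 1))
            best = max(cands, key=lambda c: D[c[0]][c[1]])
            D[n][m], B[n][m] = D[best[0]][best[1]] + S[n][s + m - 1], best
    # Backtrack from the better of "idle" and "a path just ended" in the last row.
    n, m = (N - 1, M) if D[N - 1][M] > D[N - 1][0] else (N - 1, 0)
    score, paths, path = D[n][m], [], []
    while (n, m) != (0, 0):
        if m >= 1:
            path.append((n, s + m - 1))
            if m == 1:  # reached the first cell of a path
                paths.append(path[::-1])
                path = []
        n, m = B[n][m]
    return score, paths[::-1]


def fitness(S, s, t, eps=1e-12):
    """Harmonic mean of normalized score and normalized coverage of [s, t]."""
    N, M = len(S), t - s + 1
    score, paths = path_family(S, s, t)
    if not paths:
        return 0.0
    num_cells = sum(len(p) for p in paths)
    # Induced segments: projections of the paths onto the row (time) axis.
    coverage = sum(p[-1][0] - p[0][0] + 1 for p in paths)
    score_n = (score - M) / (num_cells + eps)  # remove the trivial self-path
    coverage_n = (coverage - M) / (N + eps)    # remove the segment itself
    return 2 * score_n * coverage_n / (score_n + coverage_n + eps)


def scape_plot_fitness(S):
    """Fitness of every segment (s, t): the data a triangular scape plot displays."""
    N = len(S)
    return {(s, t): fitness(S, s, t) for s in range(N) for t in range(s, N)}


def thumbnail(scape):
    """Fitness-maximizing segment (the first one in case of ties)."""
    return max(scape, key=scape.get)


labels = "abcdxyabcd"  # the pattern "abcd" occurs twice
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
print(path_family(S, 0, 3))
# (8.0, [[(0, 0), (1, 1), (2, 2), (3, 3)], [(6, 0), (7, 1), (8, 2), (9, 3)]])
scape = scape_plot_fitness(S)
best = thumbnail(scape)
print(best, round(scape[best], 3))  # (0, 3) 0.444
```

The row projections of the returned paths, frames 0 to 3 and 6 to 9, form the induced segment family of the thumbnail.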

Further Functions

  • Parser of annotation file: As an auxiliary function, the toolbox provides a parser for annotation text files. This parser can handle most of the commonly used structure annotation formats.

    MATLAB function: parseAnnotationFile.m

  • Visualization with playback function: To gain an intuitive understanding of the relation between visualized phenomena and the underlying music, we implemented a function that adds playback functionality to a given plot or image object. With synchronized playback of the audio file, one can easily inspect the figure and listen to specific points of interest.

    MATLAB function: makePlotPlayable.m

References

  1. Meinard Müller, Nanzhu Jiang and Harald G. Grohganz
    SM Toolbox: MATLAB Implementations for Computing and Enhancing Similarity Matrices
    In Proceedings of 53rd Audio Engineering Society (AES), 2014.
    @inproceedings{MuellerJG14_SMToolbox_AES,
    author    = {Meinard M{\"u}ller and Nanzhu Jiang and Harald G. Grohganz},
    title     = {SM Toolbox: MATLAB Implementations for Computing and Enhancing Similarity Matrices},
    booktitle = {Proceedings of 53rd Audio Engineering Society ({AES})},
    address   = {London, UK},
    year      = {2014},
    }
  2. Meinard Müller and Frank Kurth
    Enhancing Similarity Matrices for Music Audio Analysis
    In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP): 437–440, 2006.
    @inproceedings{MuellerK06_EnhancingSimilarityMatrices_ICASSP,
    author    = {Meinard M{\"u}ller and Frank Kurth},
    title     = {Enhancing Similarity Matrices for Music Audio Analysis},
    booktitle = {Proceedings of the International Conference on Acoustics, Speech and Signal Processing ({ICASSP})},
    address   = {Toulouse, France},
    year      = {2006},
    pages     = {437--440},
    }
  3. Meinard Müller and Michael Clausen
    Transposition-Invariant Self-Similarity Matrices
    In Proceedings of the International Conference on Music Information Retrieval (ISMIR): 47–50, 2007.
    @inproceedings{MuellerC07_Transposition_ISMIR,
    author    = {Meinard M{\"u}ller and Michael Clausen},
    title     = {Transposition-Invariant Self-Similarity Matrices},
    booktitle = {Proceedings of the International Conference on Music Information Retrieval ({ISMIR})},
    address   = {Vienna, Austria},
    year      = {2007},
    pages     = {47--50},
    }
  4. Meinard Müller, Nanzhu Jiang and Peter Grosche
    A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing
    IEEE Transactions on Audio, Speech & Language Processing, 21(3): 531–543, 2013.
    @article{MuellerJG13_StructureAnaylsis_IEEE-TASLP,
    author    = {Meinard M{\"u}ller and Nanzhu Jiang and Peter Grosche},
    title     = {A Robust Fitness Measure for Capturing Repetitions in Music Recordings With Applications to Audio Thumbnailing},
    journal   = {IEEE Transactions on Audio, Speech {\&} Language Processing},
    volume    = {21},
    number    = {3},
    year      = {2013},
    pages     = {531--543},
    }
  5. Meinard Müller
    Information Retrieval for Music and Motion
    Springer Verlag, ISBN: 3540740473, 2007.
    @book{Mueller07_InformationRetrieval_SPRINGER,
    author = {Meinard M{\"u}ller},
    title = {Information Retrieval for Music and Motion},
    type = {Monograph},
    year = {2007},
    isbn = {3540740473},
    publisher = {Springer Verlag}
    }
  6. Meinard Müller, Frank Kurth and Michael Clausen
    Audio Matching via Chroma-Based Statistical Features
    In Proceedings of the International Conference on Music Information Retrieval (ISMIR): 288–295, 2005.
    @inproceedings{MuellerKC05_ChromaFeatures_ISMIR,
    author = {Meinard M{\"u}ller and Frank Kurth and Michael Clausen},
    title = {Audio Matching via Chroma-Based Statistical Features},
    booktitle = {Proceedings of the International Conference on Music Information Retrieval ({ISMIR})},
    year = {2005},
    pages = {288--295},
    }
  7. Meinard Müller, Verena Konz, Wolfgang Bogler and Vlora Arifi-Müller
    Saarland Music Data (SMD)
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): Late Breaking session, 2011.
    @inproceedings{MuellerKBA11_SMD_ISMIR,
    author    = {Meinard M{\"u}ller and Verena Konz and Wolfgang Bogler and Vlora Arifi-M{\"u}ller},
    title     = {Saarland Music Data ({SMD})},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR}): Late Breaking session},
    year      = {2011},
    }