NMF Toolbox

Abstract

Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines, we present the NMF toolbox, containing MATLAB implementations of conceptually distinct NMF variants. In particular, this paper gives an overview of two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB code on a documentation website under a GNU-GPL license, together with illustrative examples, we aim to foster research and education in the field of music processing.

General Description

MATLAB and Python

Dataset

MATLAB Code

Demo Files

demoAudioMosaicingContinuityNMF
demoDrumSoundSeparationNMF
demoEDMDecompositionFourComp

data.zip

Code Description

code.zip

Filename, description, and main parameters:
NMFD.m: Nonnegative matrix factor deconvolution with Kullback-Leibler divergence (KLD) and fixable components [2]. Main parameters: V, numComp, numIter, numTemplateFrames, initW, initH, paramConstr, fixH
NMF.m: Nonnegative matrix factorization with KLD as default cost function [3], [4]. Main parameters: V, costFunc, numIter, numComp
NMFdiag.m: Nonnegative matrix factorization with enhanced diagonal continuity constraints [5]. Main parameters: V, W0, H0, distmeas, numOfIter, fixW, continuity.length, continuity.grid, continuity.sparsen, continuity.polyphony
NMFconv.m: Convolutive NMF with beta-divergence [6]. Main parameters: V, numComp, numIter, numTemplateFrames, initW, initH, beta, sparsityWeight, uncorrWeight
convModel.m: Convolutive NMF model implementing Eq. (4) from [7]. It also computes the standard NMF model when the templates span a single time frame. Main parameters: W, H
shiftOperator.m: Shift operator as described in Eq. (5) from [7]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros. Main parameters: A, shiftAmount
initActivations.m: Initialization strategies for NMF activations, including random and uniform. The pitched strategy places gate-like activations at the frames where certain notes are active in the ground truth [8]. The drums strategy uses decaying impulses at these positions [7]. Main parameters: numComp, numFrames, deltaT, pitches, onsets, durations, drums, decay, onsetOffsetTol, tolerance, strategy
initTemplates.m: Initialization strategies for NMF templates, including random and uniform. The pitched strategy uses comb-filter templates [8]. The drums strategy uses pre-extracted averaged spectra of typical drum types. Main parameters: numComp, numBins, numTemplateFrames, pitches, drumTypes, strategy
NEMA.m: Row-wise nonlinear exponential moving average. Used to introduce exponentially decaying slopes according to Eq. (3) from [9]. Main parameters: lambda
midi2freq.m, freq2midi.m, logFreqLogMag.m: Helper functions to convert between MIDI pitches and frequencies in Hz, as well as log-frequency and log-magnitude representations for visualization. Main parameters: midi, freq, A, deltaF, binsPerOctave, upperFreq, lowerFreq
LSEE_MSTFTM_GriffinLim.m, forwardSTFT.m, inverseSTFT.m: Reconstruct the time-domain signal by means of the frame-wise inverse FFT and overlap-add method described as least squares error estimation from the modified STFT magnitude (LSEE-MSTFTM) in [10]. Main parameters: blockSize, hopSize, anaWinFunc, synWinFunc, reconstMirror, appendFrame, analyticSig, numSamples
alphaWienerFilter.m: Alpha-related soft masks for extracting sources from a mixture. Details in [11] and experiments in [12]. Main parameters: alpha, binarize
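
The descriptions of shiftOperator.m and convModel.m can be illustrated with a minimal sketch. It is written in plain Python with nested lists so that it runs without dependencies, and it is not the toolbox code. The names shift_operator, conv_model, and mat_mul, as well as the template layout (a list holding one bins-by-components matrix per template frame), are assumptions made for this illustration. The sketch also assumes that the convolutive model sums, over template frames t, the product of the t-th template matrix with the activation matrix shifted t columns to the right, which is how the model of [2] is commonly stated.

```python
def shift_operator(A, shift_amount):
    """Shift the columns of A right (positive) or left (negative), zero-filling."""
    num_cols = len(A[0]) if A else 0
    shifted = []
    for row in A:
        if shift_amount >= 0:
            s = min(shift_amount, num_cols)
            shifted.append([0.0] * s + list(row[:num_cols - s]))
        else:
            s = min(-shift_amount, num_cols)
            shifted.append(list(row[s:]) + [0.0] * s)
    return shifted


def mat_mul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def conv_model(W_frames, H):
    """Approximate V as the sum over t of W_frames[t] times H shifted right by t.

    W_frames: list of T matrices, each numBins x numComp (assumed layout).
    H:        numComp x numFrames activation matrix.
    """
    num_bins = len(W_frames[0])
    num_frames = len(H[0])
    V = [[0.0] * num_frames for _ in range(num_bins)]
    for t, W_t in enumerate(W_frames):
        contribution = mat_mul(W_t, shift_operator(H, t))
        for k in range(num_bins):
            for m in range(num_frames):
                V[k][m] += contribution[k][m]
    return V


# One component whose template spans two frames: energy in bin 0 first,
# then in bin 1 one frame later.
W_frames = [[[1.0], [0.0]], [[0.0], [2.0]]]
H = [[1.0, 0.0, 3.0]]
print(conv_model(W_frames, H))  # [[1.0, 0.0, 3.0], [0.0, 2.0, 0.0]]
```

With a single template frame, conv_model([W], H) reduces to the ordinary product of W and H, matching the remark that the convolutive model covers standard NMF as a special case.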

Python Code

Literature

This is the accompanying website for [1], where further details on the toolbox, dataset, and the applications are discussed.

  1. Patricio López-Serrano, Christian Dittmar, Yiğitcan Özer, and Meinard Müller
    NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization
    Submitted to Proceedings of the International Conference on Digital Audio Effects (DAFx), 2019.
    @inproceedings{LopezSerranoDOEM19_NMFToolbox_DAFx,
    author    = {Patricio L{\'o}pez-Serrano and Christian Dittmar and Yi{\u{g}}itcan {\"O}zer and Meinard M{\"u}ller},
    booktitle = {Submitted to Proceedings of the International Conference on Digital Audio Effects ({DAFx})},
    title     = {NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization},
    year      = {2019},
    month     = {September},
    address   = {Birmingham, UK},
    pages     = {},
    }
  2. Paris Smaragdis
    Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs
    In Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation (ICA): 494–499, 2004.
    @inproceedings{Smaragdis04_NMD,
    author    = {Paris Smaragdis},
    title     = {Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs},
    booktitle = {Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation {(ICA)}},
    pages     = {494--499},
    address   = {Granada, Spain},
    year      = {2004},
    month     = {September},
    }
  3. Meinard Müller
    Fundamentals of Music Processing
    Springer Verlag, ISBN: 978-3-319-21944-8, 2015.
    @book{Mueller15_FMP_SPRINGER,
    author    = {Meinard M{\"u}ller},
    title     = {Fundamentals of Music Processing},
    type      = {Monograph},
    year      = {2015},
    isbn      = {978-3-319-21944-8},
    publisher = {Springer Verlag}
    }
  4. Daniel D. Lee and H. Sebastian Seung
    Learning the parts of objects by non-negative matrix factorization
    Nature, 401(6755): 788–791, 1999.
    @article{LeeS99_LearningPartsNMF_Nature,
    author={Daniel D. Lee and H. Sebastian Seung},
    title={Learning the parts of objects by non-negative matrix factorization},
    volume={401},
    number={6755},
    journal={Nature},
    year={1999},
    pages={788--791}
    }
  5. Jonathan Driedger, Thomas Prätzlich, and Meinard Müller
    Let It Bee — Towards NMF-Inspired Audio Mosaicing
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 350–356, 2015.
    @inproceedings{DriedgerPM15_AudioMosaicingNMF_ISMIR,
    author    = {Jonathan Driedger and Thomas Pr{\"a}tzlich and Meinard M{\"u}ller},
    title     = {{L}et {I}t {B}ee -- {T}owards {NMF}-Inspired Audio Mosaicing},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {M\'{a}laga, Spain},
    year      = {2015},
    pages     = {350--356},
    }
  6. Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari
    Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation
    John Wiley and Sons, 2009.
    @book{CichockiZP_AlternateAlgorithmsNmf_Book,
    author    = {Andrzej Cichocki and Rafal Zdunek and Anh Huy Phan and {Shun-ichi} Amari},
    title     = {Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation},
    publisher = {John Wiley and Sons},
    year      = {2009}
    }
  7. Christian Dittmar and Meinard Müller
    Reverse Engineering the Amen Break — Score-Informed Separation and Restoration Applied to Drum Recordings
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(9): 1531–1543, 2016.
    @article{DittmarMueller16_DrumSep_IEEE-ACM-TASLP,
    author    = {Christian Dittmar and Meinard M{\"u}ller},
    title     = {Reverse Engineering the {A}men Break -- Score-Informed Separation and Restoration Applied to Drum Recordings},
    journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing},
    volume    = {24},
    number    = {9},
    pages     = {1531--1543},
    year      = {2016},
    doi       = {10.1109/TASLP.2016.2567645},
    }
  8. Jonathan Driedger, Harald Grohganz, Thomas Prätzlich, Sebastian Ewert, and Meinard Müller
    Score-Informed Audio Decomposition and Applications
    In Proceedings of the ACM International Conference on Multimedia (ACM-MM): 541–544, 2013.
    @inproceedings{DriedgerGPEM13_AudioDecomposition_ACM-MM,
    author    = {Jonathan Driedger and Harald Grohganz and Thomas Pr{\"a}tzlich and Sebastian Ewert and Meinard M{\"u}ller},
    title     = {Score-Informed Audio Decomposition and Applications},
    booktitle = {Proceedings of the {ACM} International Conference on Multimedia ({ACM-MM})},
    address   = {Barcelona, Spain},
    year      = {2013},
    pages     = {541--544},
    url-pdf   = {2013_DriedgerGPEM_SourceSeparationInterface_ACM.pdf},
    url-details = {https://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp/}
    }
  9. Christian Dittmar, Patricio López-Serrano, and Meinard Müller
    Unifying Local and Global Methods for Harmonic-Percussive Source Separation
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 176–180, 2018.
    @inproceedings{DittmarLM18_HPSS_KAM_NMF_ICASSP,
    author    = {Christian Dittmar and Patricio L{\'o}pez-Serrano and Meinard M{\"u}ller},
    title     = {Unifying Local and Global Methods for Harmonic-Percussive Source Separation},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    address   = {Calgary, Canada},
    month     = {April},
    year      = {2018},
    pages     = {176--180},
    url-demo={https://www.audiolabs-erlangen.de/resources/MIR/2018-ICASSP-HPSS_KAM_NMF},
    }
  10. Daniel W. Griffin and Jae S. Lim
    Signal estimation from modified short-time Fourier transform
    IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2): 236–243, 1984.
    @article{GriffinL84_SpecgramInversion_TASSP,
    author={Daniel W. Griffin and Jae S. Lim},
    title={Signal estimation from modified short-time {F}ourier transform},
    journal={{IEEE} Transactions on Acoustics, Speech, and Signal Processing},
    year={1984},
    volume={32},
    number={2},
    pages={236--243}
    }
  11. Antoine Liutkus and Roland Badeau
    Generalized Wiener filtering with fractional power spectrograms
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 266–270, 2015.
    @inproceedings{LiutkusB15_WienerFilter_ICASSP,
    author = {Antoine Liutkus and Roland Badeau},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})},
    title = {Generalized {W}iener filtering with fractional power spectrograms},
    year = {2015},
    month = {April},
    pages = {266--270},
    address = {Brisbane, Australia},
    }
  12. Christian Dittmar, Jonathan Driedger, Meinard Müller, and Jouni Paulus
    An Experimental Approach to Generalized Wiener Filtering in Music Source Separation
    In Proceedings of the European Signal Processing Conference (EUSIPCO), 2016.
    @inproceedings{DittmarDMP16_WienerFiltering_EUSIPCO,
    author    = {Christian Dittmar and Jonathan Driedger and Meinard M{\"u}ller and Jouni Paulus},
    title     = {An Experimental Approach to Generalized {W}iener Filtering in Music Source Separation},
    booktitle = {Proceedings of the European Signal Processing Conference ({EUSIPCO})},
    address   = {Budapest, Hungary},
    year      = {2016},
    pages     = {},
    month     = {August},
    url-pdf   = {}
    }