Date: Jun 2019
Programmer: Christian Dittmar, Yiğitcan Özer

This is the demo script which illustrates the main functionalities of the 'NMF toolbox'. For a detailed description we refer to [1,2] (see References below).

The notebook proceeds in the following steps:


1. It loads an example audio file containing drums and melodic instruments
2. It computes the STFT of the audio data.
3. It applies KAM and NMF as described in [2], with score-informed initialization of the components
4. It visualizes the decomposition results.
5. It resynthesizes the separated audio streams and saves them as wav files to the hard drive.

Reference:

[1] Christian Dittmar, Meinard Mueller
Reverse Engineering the Amen Break - Score-informed Separation and
Restoration applied to Drum Recordings
IEEE/ACM Transactions on Audio, Speech, and Language Processing,
24(9): 1531-1543, 2016.


[2] Christian Dittmar, Patricio Lopez-Serrano, Meinard Mueller
Unifying Local and Global Methods for Harmonic-Percussive Source Separation
In Proceedings of the IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), 2018.

If you use the 'NMF toolbox' please refer to:

[3] Patricio López-Serrano, Christian Dittmar, Yiğitcan Özer, and Meinard Müller
NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization
In Proceedings of the International Conference on Digital Audio Effects (DAFx), 2019.

License:

This file is part of 'NMF toolbox'. 'NMF toolbox' is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 'NMF toolbox' is distributed in the hope that it will be useful, but ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with 'NMF toolbox'. If not, see http://www.gnu.org/licenses/.

Initialization

In [1]:
import os
import numpy as np
import scipy.io.wavfile as wav
import IPython.display as ipd

from NMFtoolbox.utils import make_monaural, pcmInt16ToFloat32Numpy
from NMFtoolbox.forwardSTFT import forwardSTFT
from NMFtoolbox.inverseSTFT import inverseSTFT
from NMFtoolbox.logFreqLogMag import logFreqLogMag
from NMFtoolbox.HPSS_KAM import HPSS_KAM_Fitzgerald
from NMFtoolbox.initTemplates import initTemplates
from NMFtoolbox.initActivations import initActivations
from NMFtoolbox.NMFD import NMFD
from NMFtoolbox.convModel import convModel
from NMFtoolbox.alphaWienerFilter import alphaWienerFilter
from NMFtoolbox.visualizeComponentsNMF import visualizeComponentsNMF
from NMFtoolbox.visualizeComponentsKAM import visualizeComponentsKAM
from NMFtoolbox.drumSpecificSoftConstraintsNMF import drumSpecificSoftConstraintsNMF

inpPath = '../data/'
outPath = 'output/'

# create the output directory if it doesn't exist
if not os.path.isdir(outPath):
    os.makedirs(outPath)

filename = 'runningExample_IGotYouMixture.wav'

1. Load the audio signal

In [2]:
# read signal
fs, x = wav.read(os.path.join(inpPath, filename))

# make monaural if necessary
x = make_monaural(x)

# convert wav from int16 to float32
x = pcmInt16ToFloat32Numpy(x)

# read corresponding transcription files
melodyTranscription = np.loadtxt(os.path.join(inpPath, 'runningExample_IGotYouMelody.txt'))
drumsTranscription = np.loadtxt(os.path.join(inpPath, 'runningExample_IGotYouDrums.txt'))

2. Compute STFT

In [3]:
# spectral parameters
paramSTFT = dict()
paramSTFT['blockSize'] = 2048
paramSTFT['hopSize'] = 512
paramSTFT['winFunc'] = np.hanning(paramSTFT['blockSize'])
paramSTFT['reconstMirror'] = True
paramSTFT['appendFrame'] = True
paramSTFT['numSamples'] = len(x)

# STFT computation
X, A, P = forwardSTFT(x, paramSTFT)

# get dimensions and time and freq resolutions
numBins, numFrames = X.shape
deltaT = paramSTFT['hopSize'] / fs
deltaF = fs / paramSTFT['blockSize']

# get logarithmically-spaced frequency axis version for visualization purposes
logFreqLogMagA, logFreqAxis = logFreqLogMag(A, deltaF)
numLogBins = len(logFreqAxis)

3. Apply KAM-based Harmonic Percussive Separation

In [4]:
# set common parameters
numIterKAM = 30
kamA, Kern, KernOrd = HPSS_KAM_Fitzgerald(A, numIterKAM, 13)

# visualize
paramVis = dict()
paramVis['deltaT'] = deltaT
paramVis['deltaF'] = deltaF
paramVis['fontSize'] = 14

fh1 = visualizeComponentsKAM(kamA, paramVis)

# save result
fh1.savefig(os.path.join(outPath, 'demoDrumExtractionKAM_NMF_percThreshold_KAM.png'))
In [5]:
audios = []

# resynthesize KAM results
for k in range(2):
    Y = kamA[k] * np.exp(1j * P);
    y, _ = inverseSTFT(Y, paramSTFT)
    audios.append(y)
    # save result
    out_filepath = os.path.join(outPath,
                                'demoDrumExtractionKAM_NMF_scoreInformed_KAM_component_{}_extracted_from_{}'.format(k, filename))
    
    wav.write(filename=out_filepath, rate=fs, data=y)

Input audio mixture

In [6]:
ipd.Audio(x, rate=fs)
Out[6]:

Percussive component based on KAM

In [7]:
ipd.Audio(audios[0].T, rate=fs)
Out[7]:

Harmonic component based on KAM

In [8]:
ipd.Audio(audios[1].T, rate=fs)
Out[8]:
In [9]:
# concatenate new NMF target
V = np.concatenate([kamA[0], kamA[1]], axis=0)
numDoubleBins = V.shape[0]

# prepare matrix to revert concatenation
AccuMat = np.concatenate([np.eye(numBins), np.eye(numBins)], axis=1)

4. Apply score-informed NMF to KAM-based target

In [10]:
# set common parameters
numIterNMF = 30 #  in the score-informed case, less iterations are necessary
numTemplateFrames = 1 #  this can also be adjusted upwards

# generate score-informed templates for the melodic part
paramTemplates = dict()
paramTemplates['deltaF'] = deltaF
paramTemplates['numBins'] = numBins
paramTemplates['numTemplateFrames'] = numTemplateFrames
paramTemplates['pitches'] = melodyTranscription[:, 1]
pitchedW = initTemplates(paramTemplates, 'pitched')

# generate score-informed activations for the melodic part
paramActivations = dict()
paramActivations['deltaT'] = deltaT;
paramActivations['numFrames'] = numFrames
paramActivations['pitches'] = melodyTranscription[:, 1]
paramActivations['onsets'] = melodyTranscription[:, 0]
paramActivations['durations'] = melodyTranscription[:, 2]
pitchedH = initActivations(paramActivations, 'pitched')

# generate score-informed activations for the drum part
paramActivations['drums'] = drumsTranscription[:, 1]
paramActivations['onsets'] = drumsTranscription[:, 0]
paramActivations['decay'] = 0.75
drumsH = initActivations(paramActivations,'drums')
numCompDrum = drumsH.shape[0]

# generate audio-informed templates for the drum part
paramTemplates['numComp'] = numCompDrum
drumsW = initTemplates(paramTemplates, 'drums')
In [11]:
# join drum and pitched initialization
initH = np.concatenate([drumsH, pitchedH], axis=0)

# now get the total number of components
numComp = initH.shape[0]

# fill remaining parts of templates with random values
# create joint templates
initW = list()
informedWeight = 5

# fill missing template parts with noise
for k in range(numComp):
    if k < numCompDrum:
        initW.append(np.concatenate([informedWeight * drumsW[k]/drumsW[k].max().reshape(-1, 1), 
                                     np.random.rand(numBins, numTemplateFrames)], axis=0))
    else:
        initW.append(np.concatenate([np.random.rand(numBins, numTemplateFrames), 
                                     informedWeight * pitchedW[k-numCompDrum]/pitchedW[k-numCompDrum].max()], axis=0))

    # normalize to unit sum
    initW[k] /= initW[k].sum()
In [12]:
# set NMFD parameters
paramNMFD = dict()
paramNMFD['numComp'] = numComp
paramNMFD['numFrames'] = numFrames
paramNMFD['numIter'] = numIterNMF
paramNMFD['numTemplateFrames'] = numTemplateFrames
paramNMFD['numBins'] = numDoubleBins
paramNMFD['initW'] = initW
paramNMFD['initH'] = initH

# set soft constraint parameters
paramConstr = dict()
paramConstr['funcPointerPreProcess'] = drumSpecificSoftConstraintsNMF
paramConstr['Kern'] = Kern
paramConstr['KernOrd'] = KernOrd
paramConstr['decay'] = 0.75
paramConstr['numBinsDrum'] = numBins

# NMFD core method
nmfdW, nmfdH, _, _, tensorW = NMFD(V, paramNMFD, paramConstr)

# set percussiveness to ground truth information
percWeight = np.concatenate([np.ones((1, len(drumsW))), np.zeros((1, len(pitchedW)))], axis=1)

# compute separate models for percussive and harmonic part
Vp = convModel(tensorW, np.diag(percWeight[0]) @ nmfdH)
Vh = convModel(tensorW, np.diag(1-percWeight[0]) @ nmfdH)
               
# accumulate back to original spectrum, reverting the stacking
# this step is described in the last paragraph of sec. 2.4 in [2]
Ap = AccuMat @ Vp
Ah = AccuMat @ Vh

# alpha-Wiener filtering
nmfdA, _ = alphaWienerFilter(A, [Ap, Ah], 1.0)

In [13]:
# visualize results
# create reduced version of templates for visualization
nmfdW_vis = list()
for nmfdW_curr in nmfdW:
    nmfdW_curr = AccuMat @ nmfdW_curr
    nmfdW_vis.append(nmfdW_curr)

fh2, _ = visualizeComponentsNMF(A, nmfdW_vis, nmfdH, nmfdA, paramVis);

# save result
fh2.savefig(os.path.join(outPath, 'demoDrumExtractionKAM_NMF_scoreInformed_NMF.png'))
In [14]:
audios = []

# resynthesize results of NMF with soft constraints and score information
for k in range(2):
    Y = nmfdA[k] * np.exp(1j * P);
    y, _ = inverseSTFT(Y, paramSTFT)

    # save result
    out_filepath = os.path.join(outPath,
                                'demoDrumExtractionKAM_NMF_scoreInformed_NMF_component_{}_extracted_from_{}'.format(k, filename))
    
    wav.write(filename=out_filepath, rate=fs, data=y)
    audios.append(y)

Input audio mixture

In [15]:
ipd.Audio(x, rate=fs)
Out[15]:

Percussive component based on KAM + NMF + Score-Informed Initialization

In [16]:
ipd.Audio(audios[0].T, rate=fs)
Out[16]:

Harmonic component based on KAM + NMF + Score-Informed Initialization

In [17]:
ipd.Audio(audios[1].T, rate=fs)
Out[17]: