   # Fast Convolution and Cross-Correlation

Lab Course
International Audio Laboratories Erlangen
Prof. Dr. Frank Wefers, Prof. Dr. Nils Peters
Winter Term 2020

Fast convolution and cross-correlation, © September 2018 - October 2020

## Abstract¶

This experiment gives an overview on finite impulse response (FIR) filters and their efficient realization with fast convolution techniques. You will examine, program and test implementations of such filters, in the time and frequency domain. You will learn about the relationship between convolution and correlation and estimate the time delays between two signals.

## 1. Introduction¶

Digital filters are used to modify, enhance, alter audio signals in countless applications. Filters can be categorized into different classes based on their properties and structure. A filter maps an input signal $x(n)$ to an output signal $y(n)$. Those filters with a linear input-to-output relationship between $a\cdot x(n) \rightarrow a\cdot y(n)$ are called linear filters. Adding the constraint of time-invariance $x(n-k) \rightarrow y(n-k)$ leads to linear time-invariant filters. This particular class suits a wide range of applications, including equalization (EQ), 3D audio rendering, spatial sound playback and reverberation effects.

A linear time-invariant filter consists of three building blocks only: additions, multiplications and delays. Depending on the arrangement of these units, the filter has either feed-forward structure or it contains feedback loops. Feedback loops usually result in an infinitely long decaying impulse response, that never reaches exactly zero. Hence this sub class is called infinite impulse response filters (IIR filters). When the filter is free of feedback-loops and feed-forward only, the finite number of internal building blocks will result in a finite-length impulse response. Such a filter is called finite impulse response filter (FIR filter). In this exercise we consider FIR filters only. FIR filters can have different internal structures (block diagrams). However, as their impulse response is finite, one can always arrange them in a unique way, called the direct form. It is shown below. This block diagram marks the starting point of our considerations. ## 2. FIR filters and discrete convolution¶

A direct form FIR filter works as follows: The input signal $x(n)$ is delayed sample by sample. All intermediate signals $x(n), x(n\!-\!1), x(n\!-\!2), \dotsc$ are weighted by corresponding filter coefficients $h_0, h_1, h_2, \dotsc$, and summed up to form the output, given by

\begin{equation} y(n) = \sum\limits_{k=0}^{N-1} x(n-k)\cdot h(k) = x(n) \ast h(n) \end{equation}

This sum formula above is known as discrete convolution. It can be regarded as a mathematical operator $\ast$, that takes two input sequences and produces one output sequence. Looking at the block diagram it becomes clear, that the filter coefficients fully determines the FIR filter ("fingerprint"). The impulse response $h(n)$ is its "fingerprint". Convolution allows us to compute the output $y(n)$ for any input $x(n)$, when $h(n)$ is known. The block diagram and formula describe essentially the same thing. FIR filtering means computing discrete convolutions and vice versa. Hence, a fast convolution allows us to realize an FIR filter efficiently. What this exactly means, you will find out in the next minutes.

Homework Exercise 1
Two sequences of length $M$ and $N$ are convolved.
What is the length of the result?
Homework Exercise 2
Convolve the sequences $x(n)=[1, 1, 1, 1, 1]$ and $h(n)=[1, 1, 1]$.
What is result $y(n)=x(n)\ast h(n)$?

#### 2.1 Implementation and testing¶

Let us start programming. First, install some Python packages needed for this exercise.

In [ ]:
!pip install soundfile librosa==0.6


Then, import them using the following cell.

In [ ]:
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf

from IPython.display import Audio
from glob import glob
from time import process_time


Later in this exercise, the following helper functions will make life easier for you. Take a look at the functions and execute the cell to define them.

In [ ]:
def zeropad(x, N):
"""Pads a signal with zeros on the right to reach the given length"""
M = len(x)
assert M<=N, "Signal is longer than destination length"
xp = np.zeros((N,))
xp[0:M] = x
return xp

def nextPowerOf2(n):
"""Returns the next larger power of two greater or equal n"""
return 1 if n == 0 else 2**(n - 1).bit_length()


Now let us implement an FIR filter and verify that it works correctly. Afterwards we test it with different input signals and impulse responses and measure its computation times.

Lab Experiment 1
Implement the FIR filter in the block diagram shown above in the predefined function below.
Debug and test it with the two sequences you convolved by hand, until you verified that it produces correct results

Hints:
* Helful NumPy functions: roll and dot
* Avoid dynamic memory allocation e.g., by using append within a loop function
In [ ]:
def convolution(x, h):
"""Compute the discrete convolution of two sequences"""
M = len(x)
N = len(h)

y =

return y


For testing purposes you find audio files in the files directory. Let see what we have there...

In [ ]:
display(glob("files/*.wav"))

Lab Experiment 2
Test your filter with some of the provided audio files.
1. Listen to the input signal (speech.wav or music.wav), filter impulse response (lowpass.wav or room.wav) and resulting output signal.
2. Print the length of the signals.
3. Which relationship between the lengths and runtimes do you observe?
Note: convolving the input signal with the room.wav filter response may take some time.
In [ ]:
x, Fs = sf.read('files/speech.wav')
display( Audio(x, rate=Fs) )



#### 2.2 Computational performance¶

How efficient can we filter the input signal $x(n)$ and compute its output $y(n)$? In other words, how many operations, processor instructions or clock cycles does the filtering consume? For an application that is important to know. The computational effort of an algorithm can be assessed in two different ways:

1. We can analyze the algorithm (block diagram) and count the number of relevant operations. This usually includes arithmetic operations (i.e. additions and multiplications) only. The result is expressed as a function of the input lengths. This theoretic complexity allows us to compare different algorithms on paper ("is algorithm A superior to algorithm B?").

2. We can measure execution times of an implementation of the algorithm on an actual machine. This captures all practical effects, such as the memory access, cache utilization, etc. The result are runtimes or equivalent clock cycles for specific input lengths.

Homework Exercise 3
Examine the block diagram above.

1. How many operations does it take to filter an input signal $x(n)$ of length $M$ with the impulse response $h(n)$ of length $N$?
2. Consider the case that both operands are of length $N$: Come up with a simple upper bound on the number of operations ($\mathcal{O}$-notation).
Lab Experiment 3
1. Measure the actual runtime of $N\!\times\! N$ discrete convolutions for different length $N$.
2. Plot the result in a 2D diagram (runtime $T$ [ms] over length $N$)

Hint: Measure times using the function process_time, which is already included.
Note: For more comparable values, you can also measure the NumPy function convolve
In [ ]:
sizes = range(1000, 11000, 1000)
runtimes_convolution = []
cycles = 20

for N in sizes:

runtimes_convolution.append( ... )

plt.plot(sizes, runtimes_convolution, '.-')
plt.xlabel('Size N')
plt.ylabel('Runtime')
plt.show()


## 3. Fast convolution¶

FIR filtering in the time domain becomes computationally complex when operands get longer. Implementing an FIR filter as shown in block diagram, results at a number of operations that is quadratic over the operand lengths. For real-time applications this vast amount of operations might be even too many to be handled. This computational burden is overcome by fast convolution techniques.

A fast convolution is considered an algorithm that computes an $N\!\times\!N$ discrete convolution in asymptotically less than $\mathcal{O}(N^2)$ steps. Several different techniques are known to achieve this computational efficiency. The by far most widely used method is fast convolution with the Fast Fourier Transform (FFT), which we will regard here. This technique is known since the 1960s and became popular as FFT algorithms became available. Still, more than half a decade later, it remains the most often used building block for all kinds of signal processing and filtering techniques. Let us examine how it works.

#### 3.1 Discrete Fourier Transform and Circular Convolution Property¶

The Discrete Fourier Transform (DFT) is a well-known transform, used in many fields of science and technology. It is defined as follows:

\begin{equation} x(n) \overset{\mathcal{DFT}}{\rightarrow} X(k) = \sum\limits_{n=0}^{N-1} x(n)e^{-2\pi i\frac{n k}{N}} \hspace{2em}\text{and}\hspace{2em} X(k) \overset{\mathcal{IDFT}}{\rightarrow} x(n) = \frac{1}{N} \sum\limits_{k=0}^{N-1} X(k) e^{2\pi i\frac{n k}{N}} \end{equation}

A length-$K$ DFT maps $K$ time domain samples $x(n)$ to $K$ spectral coefficients $X(k)$ in the frequency domain.

A inherent concept of the DFT is its periodicity, that occurs in time and frequency domain. A length-$K$ DFT always assumes that input and output are periodic every $K$ elements. Discrete convolution, needed for implementing FIR filters, is aperiodic and somewhat incompatible with the DFT. This is overcome by introducing a periodic convolution operator. The concept is illustrated below Simply assume, that the input sequence $x(n)$ infinitely repeats to both sides (positive and negative $n$) every $K$ samples. This sequence is denoted by $\tilde{x}(n)$. Then, circular convolution is defined the same way as discrete convolution

\begin{equation} y(n) = \sum\limits_{k=0}^{N-1} \tilde{x}(n-k)\cdot h(k) = x(n) \circledast h(n) \end{equation}

The Circular Convolution Property (CCP) states, that circular convolution $\circledast$ in the time domain equals the pairwise product $\times$ of spectral coefficients in the frequency domain \begin{equation} x(n) \circledast h(n) = \mathcal{IDFT}\{ \mathcal{DFT} \{ x(n)\} \times \mathcal{DFT}\{ h(n)\} \} \end{equation}

This marks a huge computational improvement, as a $K\!\times\!K$ circular convolution in the time domain requires $\mathcal{O}(K^2)$ steps to compute. In the frequency domain it becomes just $K$ complex-valued multiplications, which can be computed in $\mathcal{O}(K)$ steps. Digging deeper in the mathematical background one will find, that this reduction results from a matrix diagonalization. When circular convolution is written as a matrix-vector product, the DFT marks an eigenvector base that diagonalizes this convolution matrix. What remains is a simple product of a diagonal matrix with a vector. This is where the impressive reduction originates from.

#### 3.2 Fast Fourier Transforms¶

Computing a length-$K$ DFT/IDFT by the formulae above also requires $\mathcal{O}(K^2)$. Consequently, there would be no benefit from computing convolutions in the frequency domain. This changes, when we use Fast Fourier Transforms (FFT), that compute a length-$K$ DFT/IDFT by just $\mathcal{O}(K\log K)$ operations.

How is this possible? Details on FFT algorithms are way beyond the scope of this experiment. Therefore only a few important remarks are given here. More information can be found in referenced literature. In essence, an FFT algorithm hierarchically decomposes long DFTs into shorter DFTs (divide-and-conquer strategy). This tree-like decomposition removes redundant computations, that occur when using the definition formulae. Short DFTs of lengths 8, 16, 32, etc. are realized with highly-optimized machine code (assembly language) called codelets. As many different decompositions as well as codelets exist, a planner determines the best (most efficient) computation strategy. This strategy and the resulting execution time highly depends on the length $K$ of the transform.

Fast Fourier Transforms can be computed of arbitrary lengths $K$, including prime numbers $K$. However, they execute most efficiently for lengths $K$, where the number $K$ is composite. This means, that the prime factorization of $K$ contains few prime factors with comparably large exponents (ideally pure powers-of-two or small integer multiples of these). Prime-length DFTs require computationally complex reformulations into non-prime-length transforms, making them comparably inefficient. For this reason, the right choice of transform size determines the performance of the algorithm. In most cases, using the next larger of power two will perform faster than using the exact length required for the task.

#### 3.3 Realizing discrete convolutions by circular convolutions¶

For realizing FIR filters we need to implement the discrete convolution, not circular convolution. Is there a way to achieve discrete convolution by circular convolution?

Recall, that a length-$K$ circular convolution maps $K, K$ input to $K$ output values. In other words, the circular convolution requires both inputs to have the same length $K$. Given two sequences $x(n), h(n)$ of length $M, N$ we first have to select a length $K\ge\max(M,N)$. Then we append zeros to the inputs, so that both sequences match the length $K$. This is referred to as zero padding.

Secondly, we need to make sure, that the result of the desired $M\times N$ discrete convolution fits into the selected length $K$. Otherwise, the overhanging results would be mirrored back at the beginning of the period and corrupt the output. This is called time-aliasing. How zero padding solves this problem can be seen in the figure above. Making the period long enough to store all convolution values effectively turns the circular convolution into the the aperiodic discrete convolution, that we desire. For the sake of an efficient FFT, we can also select length $K$ to be longer than the minimum.

Homework Exercise 4
Consider fast convolution with inputs of length $M$ and $N$.
Which is the smallest possible transform length $K$ you can use?
The complete FFT fast convolution algorithm is shown below. It works as follows: - The input sequences are padded to length $K$ and then transformed using length-$K$ FFTs - The $K$ spectral coefficients are multiplied (complex-valued) - The resulting spectrum is transformed back into the time domain. Unwanted excess samples are trimmed off.
Lab Experiment 4
Implement an FFT-based fast convolution in the predefined function below.
As before, debug and test it with the two sequences you convolved by hand.
Observe what happens, when you override the transform length $K$ and use smaller/longer values.

Hints: Use the NumPy functions fft.fft and fft.ifft to compute FFTs and IFFTs. After the inverse transform, extract the real parts only using the function real. This makes the results much more readable.
In [ ]:
def fftconvolution(x, h, K=None):
"""Fast convolution of two sequences using the FFT"""
M = len(x)
N = len(h)

# No predefined transform length? Choose a suitable one!
if K is None: K = nextPowerOf2(M+N-1)

return np.real( ... )


#### 3.4 Computational performance¶

Lab Experiment 5
Now for the fast algorithm, measure the runtime of $N\!\times\! N$-convolutions as done before.
Plot the new results together with the previous results in a 2D diagram (runtime $T$ over length $N$)

How much are the savings by the fast processing?
What happens, when you use the minimum transform length?
In [ ]:
sizes = range(1000, 11000, 1000)
runtimes_fftconvolution = []
cycles = 20

for N in sizes:

runtimes_fftconvolution.append( ... )

plt.plot(sizes, runtimes_convolution, '.-', label='Discrete convolution')
plt.plot(sizes, runtimes_fftconvolution, '.-', label='Fast convolution')
plt.xlabel('Size N')
plt.ylabel('Runtime')
plt.legend()
plt.show()


## 4. Partitioned convolution¶

Performing a Fast Fourier Transform requires all input samples to be available at the time the transform is executed. The same holds for the inverse transform. Thus, the fast convolution technique explained above requires all input data to be available in advance. This makes it inapplicable for a real-time processing.

Real-time audio processing is done block-by-block. Here, the signals are recorded, processed and played back in blocks of a fixed number of samples. The input signal $x(n)$ and output signal $y(n)$ are continuous and of indefinite length. In essence they are partitioned into blocks. Convolution techniques that work in these split operands are called partitioned convolutions.

The input signal $x(n)$ consists of a continuous stream of blocks $x^{(0)}(n), x^{(1)}(n), \dotsc$, each of length $B$. Superscripts indicate the index of the blocks. The block length $B$ (also known as audio buffer size) allows to control the audio latency. Usual block lengths for live effects and Virtual Reality systems are in range of 32, 64 or 128 samples. Less time-critical applications, such as media players or computer games, use lengths of 256 to 2048 samples. As the filtering is now realized on this continuous stream of sample blocks, it is called running convolution. Note, that both, the input and output, are processed in length-$B$ blocks. This means, that the output signal $y(n)$ is also represented by a stream of blocks $y^{(0)}(n), y^{(1)}(n), \dotsc$, each of length $B$. Convolution result exceeding $B$ samples must be returned in later output blocks.

How we can we modify or adapt the fast convolution technique to apply for a real-time filtering as well? This becomes possible by two concepts, known as Overlap-Add and Overlap-Save. Following, we will examine the Overlap-Add technique.

One possibility is to filter each length-$B$ block with the length-$N$ impulse response. This will produce a convolution result longer than $B$. The overlapping samples are added up in a buffer and returned in subsequent output blocks. Hence the name, Overlap-Add (OLA). The algorithm is shown below Lab Experiment 6
Implement the OLA convolution algorithm in the predefined function below.
As before, debug and test it with the two sequences you convolved by hand, until you verified that it produces correct results.
In [ ]:
def OLA_convolution(x, h, B, K=None, verbose=False):
"""Overlap-Add convolution of x and h with block length B"""

M = len(x)
N = len(h)

# Calculate the number of input blocks
num_input_blocks = ...

# Pad x to an integer multiple of B

# Convolve all blocks
for n in range(num_input_blocks):
# Extract the n-th input block
xb = ...
if verbose: print("xb(%d) = " % n, xb)

# Fast convolution
u = ...
if verbose: print("u(%d) = " % n, u)

# Overlap-Add the partial convolution result
y[...] += u[...]

return y


#### 4.2 Overlap-Save¶

The other possibility for realizing a running convolution is known as Overlap-Save (OLS) scheme. Here, the idea is to directly produce a ready-made output block of length-$B$ and avoid overlapping samples (additions). Recall, that any length-$B$ output block contains "decay" from previous input blocks. In the OLS scheme we will setup the input to the convolution in a way, that the output already contains The response of the latest block plus the decay of previous blocks. The algorithm is shown below. The length-$K$ FFT of the input is fed with the $K$ latest input samples of $x(n)$. In other words, when a new block $x^{(i)}(n)$ arrives, we shift the input buffer to the left by $B$ samples and place the new samples rightmost.

This is known as a sliding window of the input. The filter impulse response is processed in the same way as for the OLA scheme.

After what we learned before, it becomes clear that we now perform a $K\times N$ convolution with a transform length of $K$. Hence time-aliasing occurs and excess samples will result in time-aliasing as explained above. However, this is acceptable, as the right-most $B$ samples are alias-free and valid output values. We can use/output them directly, without any additions, as for the Overlap-Add scheme.

Lab Experiment 7
Implement the OLS convolution algorithm in the predefined function below.
As before, debug and test it with the two sequences you convolved by hand, until you verified that it produces correct results

Hints: The NumPy functions roll is helpful to realize the sliding window
In [ ]:
def OLS_convolution(x, h, B, K=None, verbose=False):
"""Overlap-Save convolution of x and h with block length B"""

M = len(x)
N = len(h)

# Calculate the number of input blocks
num_input_blocks = ...

# Pad x to an integer multiple of B

# Input buffer
xw = np.zeros((K,))

# Convolve all blocks
for n in range(num_input_blocks):
# Extract the n-th input block
xb = xp[n*B:n*B+B]
if verbose: print("xb(%d) = " % n, xb)

# Sliding window of the input
if verbose: print("xw(%d) = " % n, xw)

# Fast convolution
u = ...
if verbose: print("u(%d) = " % n, u)

# Save the valid output samples
y[...] = u[...]

return y


#### 4.3 Computational performance¶

Finally, at the end of this exercise, let us measure the performance of partitioned convolutions and compare them to the other techniques. Therefore, you can use a practical example.

Lab Experiment 8
Filter the audio file "files/music.wav" with the lowpass filter impulse response "files/lowpass.wav".
Measure the runtime of all algorithms you have implemented so far.
For the partitioned convolution techniques assume two block lengths:
128 samples for a low-latency and 1024 samples for a moderate latency.

What do you observe?

Important: Again, perform every measurement three times and compute the average value to obtain representative numbers.
In [ ]:
# Your turn ..


## 5. Cross-Correlation¶

Another very popular operation which is closely related to convolution is the correlation operation. The correlation operation can be used e.g., to determine the similarity between two audio signals (cross-correlation) or within a single audio signal (autocorrelation). For instance, given the captured audio signal y, we can use the cross-correlation to find out if the audio signal x is present within y. This is done by computing the correlation coefficient between the captured signal y and signal x. In other applications, the correlation operation is applied to estimate the relative time-of-arrival difference of an audio signal at two or more microphones. This is done by examining the shape of the correlation function as described further below. Put simply, the correlation operation is another important building block for many audio processing applications.

Convolution and cross-correlation are related by: \begin{equation} x_1(n) \star x_2(n) = x_1(n) \circledast x_2(-n) \end{equation}

where $\star$ represents the correlation operation and $\circledast$ represents the convolution operation. Because correlation is so important and so closely related to the convolution operation, this section will teach you how to implement and apply the correlation operator.

#### 5.1 Cross-Correlation in the Time Domain¶

As seen in the equation above, to compute the correlation between two signals, you could simply use the convolution function as implemented in Section 2.1 with time-reversing signal x2:

In [ ]:
x1 = np.array([0, 0, 1, 0])
x2 = np.array([1, 0, 0, 0])
y = convolution(x1, x2[::-1])
display(y)


The result of a cross-correlation operation is a time signal which may look a bit like the plot below.
The location of the peak of this function indicates the time delay estimate between the input signals. Homework Exercise 5
Given the sequences $x1(n)=[0, 0, 1, 0]$ and $x2(n)=[1, 0, 0, 0]$ as shown above.
1. Based on the cross-correlation result, how can you estimate the time delay in samples between these two signals?
2. What is the time delay estimate between the signals x1 and x2, resulting from $x1(n) \star x2(n)$?
3. What is the time delay estimate between the signals x1 and x2, resulting from $x2(n) \star x1(n)$?
4. What can you say about the relationship between these two results?

Now let us implement the correlation operation and verify that it works correctly.

Lab Experiment 9
Implement the correlation operation in the predefined function below.
Debug and test it with the two sequences you correlated using your convolution function, until you verified that it produces correct results.
The function shall return the estimated time delay in samples.
Note that during this lab course, a positive time delay is defined when $x_1$ arrives before $x_2$ and a negative time delay is defined when $x_2$ arrives before $x_1$.

In [ ]:
def delayEstimationTD(x1, x2):
"""Compute the time delay between two signals in the time domain"""
M = len(x1)
N = len(x2)

y =
# return the estimated sample delay between the signals
delay_samples = np.argmax(...)
return delay_samples

x1 = np.array([0, 0, 0, 1, 0])
x2 = np.array([1, 0, 0, 0])
display(delayEstimationTD(x1, x2))
display(delayEstimationTD(x2, x1))
display(delayEstimationTD(x2, x2))


#### 5.2 Cross-correlation in the Frequency Domain¶

Similar to the FFT-based fast convolution, also the correlation operation can be more efficiently computed in the frequency domain. Let's use your previously implemented function fftconvolution:

In [ ]:
x1 = np.array([0, 0, 1, 0])
x2 = np.array([1, 0, 0, 0])
K = nextPowerOf2(len(x1)+len(x2)-1)
y = fftconvolution(x1, x2[::-1], K)
display(y)


Furthermore, the following relationship holds: \begin{equation} x_1(n) \star x_2(n) = \mathcal{IDFT}\{ \mathcal{DFT} \{ x_1(n)\} \times \mathcal{DFT}\{ x_2(n)\}^* \} \end{equation}

with $(\cdot)^*$ denoting the complex conjugate.
The product $\mathcal{DFT} \{ x_1(n)\} \times \mathcal{DFT}\{ x_2(n)\}^*$ is also known as the cross-spectrum. Thus, instead of time-reversing the signal $x_2(n)$ prior the DFT, we can use the complex conjugate of $\mathcal{DFT}\{ x_2(n)\}$.

Now let us implement the correlation operation using the cross-spectrum by modifying your previous solution for fftconvolution.

Lab Experiment 10
Implement the cross-correlation operation via the cross-spectrum in the predefined function below.
Debug and test it with the two sequences you correlated using your fftconvolution function, until you verified that it produces correct results.
The function shall return the estimated time delay in samples.
In [ ]:
def delayEstimationIFFT(x1, x2, K=None):
"""Fast delay estimation of two sequences using the FFT"""
M = len(x1)
N = len(x2)

# No predefined transform length? Choose a suitable one!
if K is None: K = nextPowerOf2(M+N-1)

y = np.real(np.fft.ifft(...))
delay_samples = ...
return delay_samples

x1 = np.array([1, 0, 0, 0, 0, 0, 0])
x2 = np.array([0, 0, 0, 0, 0, 1])
y = delayEstimationIFFT(x1, x2)
y = delayEstimationIFFT(x2, x1)
y = delayEstimationIFFT(x1, x1)
display(y)


To further lower the complexity, it is also possible to estimate the time delay directly in the frequency domain without the need to convert the cross-spectrum into the time domain via IFFT.
Here, the time delay in samples $\hat{D}$ between the signals x1 and x2 is computed from the slope of the unwrapped cross-spectrum phase $G_{x1,x2}(\omega)$. The cross-spectrum has a linear phase response and the slope of the unwrapped phase response is proportional to the time delay between the two input signals.
The image below depicts the correlation function in the time domain (a) and the corresponding unwrapped phase of the cross-spectrum (b). The time delay in (b) is derived by $$\hat{D} = \frac{\phi_{kl}(f_1)}{2 \pi f_1}$$
Another advantage of this algorithm is that the resolution of the delay estimation allows for subsample resolution. So fractional time delays can be estimated.
However, one drawback of this method is the sensitivity to signal noise. To make the time delay estimation more robust against noise, further extensions to this algorithm exist. These extensions are beyond the scope of this lab exercise. Interested students can study the references below and are free to ask for more information :-) . Lab Experiment 11
Implement the time delay estimation via the slope of the cross-spectrum phase in the predefined function below.

In [ ]:
def delayEstimationPhase(x1, x2, K=None):
"""Delay estimation of two sequences using the phase of the cross-spectrum"""

# No predefined transform length? Choose a suitable one!
if K is None: K = nextPowerOf2(len(x1)+len(x2)-1)

phase_unwrapped = np.unwrap(...)
delay_samples = ...
return delay_samples

x1 = np.array([0, 0, 0, 1])
x2 = np.array([1, 0, 0, 0])
display(delayEstimationIFFT(x1, x2))
display(delayEstimationPhase(x1, x2))

# fractional delay estimation
x3 = np.array([0.5, 0.5, 0, 0])
display(delayEstimationPhase(x1, x3))


## 6. Conclusion¶

Thank you for participating in this exercise.
Feedback on this exercise is welcome to improve the exercise.
For further information on the topic the following literature is suggested:

#### References

Frank Wefers
Partitioned convolution algorithms for real-time auralization (PhD thesis)
ISBN: 978-3-8325-3943-6
Logos Verlag, Berlin, 2015
Online available here

Matteo Frigo and Steven G. Johnson
The Design and Implementation of FFTW3
Proceedings of the IEEE 93 (2), 216–231 (2005).
Online available here

Charles Knapp and Glifford Carter
The generalized correlation method for estimation of time delay.
IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4), 320-327 (1976).

Allan Piersol
Time delay estimation using phase data
IEEE Transactions on Acoustics, Speech, and Signal Processing 29 (3), 471-477 (1981).

#### Acknowledgment

The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universitaet Erlangen-Nuernberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS.