Lecture: Selected Topics in Deep Learning for Audio, Speech, and Music Processing (Summer Term 2021)


  • Instructors: Prof. Dr. ir. Emanuël Habets, Prof. Dr. Meinard Müller
  • Credits: 2.5 ECTS
  • Time (Lecture): Summer Term 2021, Monday, 16:00–18:00 (first lecture: 19.04.2021, via ZOOM)
    Link and access information for our ZOOM meetings can be found at StudOn (see below).
  • Exam (graded): Oral examination at the end of term
  • Dates (Lecture): Mo 19.04.2021, Mo 26.04.2021, Mo 03.05.2021, Mo 10.05.2021, Mo 17.05.2021, Mo 31.05.2021, Mo 07.06.2021, Mo 14.06.2021, Mo 21.06.2021, Mo 28.06.2021, Mo 05.07.2021, Mo 12.07.2021
  • Examination Dates (Room 3R4.03): To be announced
Important Notes:
  • Due to the COVID-19 pandemic, the lecture Selected Topics in Deep Learning for Audio, Speech, and Music Processing will be offered as a fully virtual course (via ZOOM).
  • Participation in the ZOOM sessions is only possible for FAU students. The ZOOM access information for this course will be made available via StudOn. Therefore, you must register via StudOn prior to the first lecture.
  • This course will be based on articles from the research literature. Students are strongly advised to prepare for each lecture by reading these articles. The lecture time will be used to introduce the respective problem, to deepen important technical aspects, and to hold a question-and-answer dialogue with the participants.
  • As a technical requirement, all participants must have access to a computer capable of running the ZOOM video conferencing software (as provided by FAU), including audio and video transmission as well as screen sharing.
  • To ensure privacy, the ZOOM sessions will not be recorded. Also, participants are not permitted to record the ZOOM sessions. Furthermore, ZOOM links may not be distributed.


Many recent advances in audio, speech, and music processing have been driven by techniques based on deep learning (DL). For example, DL-based techniques have led to significant improvements in speaker separation, speech synthesis, acoustic scene analysis, audio retrieval, chord recognition, melody estimation, and beat tracking. Considering specific audio, speech, and music processing tasks, we study various DL-based approaches and their capability to extract complex features and make predictions based on hidden structures and relations. Rather than giving a comprehensive overview, we will study selected and generally applicable DL-based techniques. Furthermore, in the context of challenging application scenarios, we will critically review the potential and limitations of recent deep learning techniques. One main objective of the lecture is to discuss how domain knowledge can be integrated into neural network architectures to obtain explainable models that are less vulnerable to data biases and confounding factors.

The course consists of two overview-like lectures, where we introduce current research problems in audio, speech, and music processing. We will then continue with 6 to 8 lectures on selected audio processing topics and DL-based techniques. These lectures are based on articles from the research literature and provide detailed explanations with mathematical depth; we may also invite some of the original authors to serve as guest lecturers. Finally, we round off the course with a concluding lecture covering practical aspects (e.g., hardware, software, version control, reproducibility, datasets) that are relevant when working with DL-based techniques.

Course Requirements

In this course, we require a good knowledge of deep learning techniques, machine learning, and pattern recognition as well as a strong mathematical background. Furthermore, we require a solid background in general digital signal processing and some experience with audio, image, or video processing.

It is recommended to complete the following modules (or to have equivalent knowledge) before starting this module:


Lecture: Topics, Material, Instructions

The course consists of two overview-like lectures, where we introduce current research problems in audio, speech, and music processing. We will then continue with 6 to 8 lectures, which are based on articles from the research literature. The lecture material includes handouts of slides, links to the original articles, and possibly links to demonstrators and further online resources. In the following list, you will find links to the material. If you have any questions regarding the lecture, please contact Prof. Dr. ir. Emanuël Habets and Prof. Dr. Meinard Müller.