ICASSP 2019 Tutorial: Cross-Modal Music Retrieval and Applications

General Information

Presenters

Meinard Müller, International Audio Laboratories Erlangen, Erlangen, Germany
Andreas Arzt, Institute of Computational Perception, Johannes Kepler University, Linz, Austria
Stefan Balke, Institute of Computational Perception, Johannes Kepler University, Linz, Austria

Description

Background and Scientific Context

There has been a rapid growth of digitally available music data including audio recordings, images of scanned sheet music, album covers, and an increasing number of video clips. In this tutorial, we cover general signal processing and machine learning concepts designed to bridge the gap between these different music representations. In particular, we discuss traditional approaches based on musically motivated features, generalized audio fingerprinting, as well as recent data embedding techniques based on deep learning. These technologies form the building blocks for many exciting music navigation and browsing applications including the classical problem of automated score following.
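Chroma features are a common example of the musically motivated features mentioned above (see also the chroma-based audio matching reference in the literature list). The following simplified Python/NumPy sketch is our own illustration, not the Chroma Toolbox or any presenter's implementation. It maps the windowed power spectrum of a single audio frame onto the 12 pitch classes (C = 0 through B = 11) using the equal-tempered mapping p = 69 + 12 log2(f / 440 Hz). The function name `chroma_from_frame` and its frequency limits are hypothetical choices.

```python
import numpy as np


def chroma_from_frame(frame, sr, f_min=27.5, f_max=4186.0):
    """Map the power spectrum of one audio frame onto 12 pitch classes (C=0 to B=11).

    Hypothetical helper for illustration; the frequency limits roughly span the piano range.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))                    # reduce spectral leakage
    power = np.abs(np.fft.rfft(frame * window)) ** 2   # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)    # bin centre frequencies in Hz

    chroma = np.zeros(12)
    for f, p in zip(freqs, power):
        if f_min <= f <= f_max:
            midi = 69 + 12 * np.log2(f / 440.0)        # equal-tempered pitch, A4 = 440 Hz = MIDI 69
            chroma[int(round(midi)) % 12] += p         # fold octaves, e.g. MIDI 60 (C4) -> 0
    norm = np.linalg.norm(chroma)
    return chroma / norm if norm > 0 else chroma       # Euclidean normalization
```

Applying such a function frame by frame yields a chroma sequence that can be compared across recordings; practical systems typically add further steps such as logarithmic compression, temporal smoothing, or tuning estimation.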

Aims of the Tutorial

Music not only connects people but also relates to many different research disciplines. Adopting an interdisciplinary perspective, we aim to show that music is an attractive, rich, and challenging problem domain with much to offer the signal processing community. Using cross-modal music retrieval tasks as examples, we demonstrate that these scenarios are well suited for discussing signal processing and machine learning techniques, comparing traditional feature extraction with deep learning approaches. Furthermore, we present examples of fascinating music retrieval applications of academic, educational, and commercial relevance.

Target Audience and Assumed Knowledge

The main goal of this tutorial is to give an exciting and easy-to-understand introduction to music processing that appeals to a wide audience in academia and industry. By providing many illustrative audio examples and by working with pictures rather than formulas, we aim to convey the main ideas, in particular to non-experts and to researchers who are new to the field of music processing. The tutorial is therefore intended for an interdisciplinary audience working in fields such as signal processing, information retrieval, and machine learning.

Contents Outline

Our primary goal is to give an exciting tutorial that ranges from basic to advanced techniques used in music processing. The tutorial consists of three one-hour sessions, each of which leaves enough room for questions and interaction with the audience. For variety, each of the three sessions covers both theoretical and practical aspects. Recent techniques based on deep learning play a role throughout the tutorial, even though the following overview lists them explicitly only under the third session.

Session 1: General introduction and overview; music representations; sequence-based retrieval and mid-level music representations; shingle-based retrieval; recent approaches and discussion.
Session 2: Introduction to score following; music transcription, including approaches based on machine learning; symbolic fingerprinting; indexing and efficiency issues; applications and demos.
Session 3: General introduction to deep learning approaches in music processing; cross-modal music embedding; data augmentation; recent trends using input attention; final round of questions and discussion.
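The symbolic fingerprinting idea listed for Session 2 can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the implementation from the cited work of Arzt et al.: notes are given in a hypothetical (onset in seconds, MIDI pitch) format, hashes are built from consecutive note triplets only, and the ratio of inter-onset intervals is quantized with an arbitrary step of 0.1. Using pitch intervals makes each hash invariant to transposition, and using the interval ratio makes it invariant to a global tempo change. The function names `fingerprints`, `build_index`, and `identify` are hypothetical.

```python
from collections import defaultdict


def fingerprints(notes, ratio_step=0.1):
    """Hash consecutive note triplets into tempo- and transposition-invariant keys.

    notes: list of (onset_seconds, midi_pitch) tuples sorted by onset (hypothetical format).
    Returns a list of (key, onset_of_first_note) pairs.
    """
    result = []
    for (t1, p1), (t2, p2), (t3, p3) in zip(notes, notes[1:], notes[2:]):
        if t2 <= t1 or t3 <= t2:
            continue  # simplification: skip triplets with simultaneous onsets
        # The ratio of inter-onset intervals does not change under a global tempo change.
        ratio = round((t3 - t2) / (t2 - t1) / ratio_step)
        # Pitch intervals do not change under transposition.
        key = (p2 - p1, p3 - p2, ratio)
        result.append((key, t1))
    return result


def build_index(pieces):
    """Map each key to all (piece_id, onset) occurrences in the score database."""
    index = defaultdict(list)
    for piece_id, notes in pieces.items():
        for key, onset in fingerprints(notes):
            index[key].append((piece_id, onset))
    return index


def identify(query_notes, index):
    """Return the piece with the most matching keys, or None if nothing matches."""
    votes = defaultdict(int)
    for key, _ in fingerprints(query_notes):
        for piece_id, _ in index.get(key, []):
            votes[piece_id] += 1
    return max(votes, key=votes.get) if votes else None
```

Simple vote counting is used here only to keep the sketch short; a fuller system would typically also require matching hashes to agree on a consistent time offset (which also yields the score position) and would need a more robust triplet selection for polyphonic input.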

Material

The tutorial consists of a short introduction and three sessions. The following links provide the PDFs of the slides and the handouts.

Overview (Meinard Müller, Andreas Arzt, Stefan Balke)
Slides (PDF), Handouts (6 slides per page) (PDF)
Part I: Classical Approaches (Meinard Müller)
Slides (PDF), Handouts (6 slides per page) (PDF)
Part II: Fingerprinting Approaches (Andreas Arzt)
Slides (PDF), Handouts (6 slides per page) (PDF)
Part III: Machine Learning Approaches (Stefan Balke)
Slides (PDF), Handouts (6 slides per page) (PDF)

Literature

Main References

Meinard Müller, Andreas Arzt, Stefan Balke, Matthias Dorfer, Gerhard Widmer
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
IEEE Signal Processing Magazine, 36(1), 2019, pp. 52–62.

@article{MuellerABDW19_MusicRetrieval_IEEE-SPM,
author    = {Meinard Müller and Andreas Arzt and Stefan Balke and Matthias Dorfer and Gerhard Widmer},
title     = {Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies},
journal   = {{IEEE} Signal Processing Magazine},
volume    = {36},
number    = {1},
pages     = {52--62},
year      = {2019},
url       = {https://doi.org/10.1109/MSP.2018.2868887},
doi       = {10.1109/MSP.2018.2868887},
url-pdf   = {https://ieeexplore.ieee.org/document/8588416/}
}

Meinard Müller
Fundamentals of Music Processing – Audio, Analysis, Algorithms, Applications
Springer Verlag, ISBN: 978-3-319-21944-8, 2015.

@book{Mueller15_FundamentalsMusicProcessig_SPRINGER,
author    = {Meinard M\"{u}ller},
title     = {Fundamentals of Music Processing -- Audio, Analysis, Algorithms, Applications},
type      = {Monograph},
year      = {2015},
isbn      = {978-3-319-21944-8},
publisher = {Springer Verlag},
url-details={http://www.music-processing.de}
}

Further References

Andreas Arzt, Gerhard Widmer, and Reinhard Sonnleitner
Tempo- and Transposition-invariant Identification of Piece and Score Position
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 549–554, 2014.

@inproceedings{ArztWS14_TempoTranspInvariantIdent_ISMIR,
 address = {Taipei, Taiwan},
 author = {Andreas Arzt and Gerhard Widmer and Reinhard Sonnleitner},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {549--554},
 title = {Tempo- and Transposition-invariant Identification of Piece and Score Position},
 year = {2014}
}

Andreas Arzt, Harald Frostel, Thassilo Gadermaier, Martin Gasser, Maarten Grachten, and Gerhard Widmer
Artificial Intelligence in the Concertgebouw
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI): 2424–2430, 2015.

@inproceedings{arzt:ijcai:2015,
 address = {Buenos Aires, Argentina},
 author = {Andreas Arzt and Harald Frostel and Thassilo Gadermaier and Martin Gasser and Maarten Grachten and Gerhard Widmer},
 booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)},
 pages = {2424--2430},
 title = {Artificial Intelligence in the Concertgebouw},
 year = {2015}
}

Andreas Arzt, Sebastian Böck, Sebastian Flossmann, Harald Frostel, Martin Gasser, Cynthia Liem, and Gerhard Widmer
The Piano Music Companion
In Proceedings of the European Conference on Artificial Intelligence (ECAI): 1221–1222, 2014.

@inproceedings{ArztBFFGLW14_MusicCompanion_ECAI,
 acmid = {3006927},
 address = {Prague, Czech Republic},
 author = {Andreas Arzt and Sebastian Böck and Sebastian Flossmann and Harald Frostel and Martin Gasser and Cynthia C. S. Liem and Gerhard Widmer},
 booktitle = {Proceedings of the European Conference on Artificial Intelligence},
 doi = {10.3233/978-1-61499-419-0-1221},
 pages = {1221--1222},
 title = {The Piano Music Companion},
 year = {2014}
}

Andreas Arzt, Sebastian Böck, and Gerhard Widmer
Fast Identification of Piece and Score Position via Symbolic Fingerprinting
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 433–438, 2012.

@inproceedings{ArztBW12_SymbolicFingerprint_ISMIR,
 address = {Porto, Portugal},
 author = {Andreas Arzt and Sebastian Böck and Gerhard Widmer},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {433--438},
 title = {Fast Identification of Piece and Score Position via Symbolic Fingerprinting},
 year = {2012}
}

Andreas Arzt, Gerhard Widmer, and Simon Dixon
Automatic Page Turning for Musicians via Real-Time Machine Listening
In Proceedings of the European Conference on Artificial Intelligence (ECAI): 241–245, 2008.

@inproceedings{arzt:ecai:2008,
 address = {Patras, Greece},
 author = {Andreas Arzt and Gerhard Widmer and Simon Dixon},
 booktitle = {Proceedings of the European Conference on Artificial Intelligence (ECAI)},
 date-modified = {2016-11-14 13:21:41 +0000},
 pages = {241--245},
 title = {Automatic Page Turning for Musicians via Real-Time Machine Listening},
 year = {2008}
}

Stefan Balke, Vlora Arifi-Müller, Lukas Lamprecht, and Meinard Müller
Retrieving Audio Recordings Using Musical Themes
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 281–285, 2016.

@inproceedings{BalkeALM16_BarlowRetrieval_ICASSP,
 address = {Shanghai, China},
 author = {Stefan Balke and Vlora Arifi-Müller and Lukas Lamprecht and Meinard Müller},
 booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
 month = {March},
 pages = {281--285},
 title = {Retrieving Audio Recordings Using Musical Themes},
 year = {2016}
}

Stefan Balke, Christian Dittmar, Jakob Abeßer, and Meinard Müller
Data-Driven Solo Voice Enhancement for Jazz Music Retrieval
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 196–200, 2017.

@inproceedings{BalkeDAM17_SoloVoiceEnhancement_ICASSP,
 author = {Stefan Balke and Christian Dittmar and Jakob Abeßer and Meinard Müller},
 booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
 address = {New Orleans, USA},
 pages = {196--200},
 title = {Data-Driven Solo Voice Enhancement for Jazz Music Retrieval},
 url-demo = {https://www.audiolabs-erlangen.de/resources/MIR/2017-ICASSP-SoloVoiceEnhancement},
 year = {2017}
}

Stefan Balke, Sanu Achankunju, and Meinard Müller
Matching Musical Themes based on Noisy OCR and OMR Input
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 703–707, 2015.

@inproceedings{BalkePM15_MatchingMusicalThemes_ICASSP,
 address = {Brisbane, Australia},
 author = {Stefan Balke and Sanu Pulimootil Achankunju and Meinard Müller},
 booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
 pages = {703--707},
 title = {Matching Musical Themes based on Noisy OCR and OMR Input},
 year = {2015}
}

Harold Barlow and Sam Morgenstern
A Dictionary of Musical Themes
Crown Publishers, Inc., 1975.

@book{BarlowM75_MusicalThemes_BOOK,
 author = {Harold Barlow and Sam Morgenstern},
 edition = {Revised edition Third Printing},
 publisher = {Crown Publishers, Inc.},
 title = {A Dictionary of Musical Themes},
 year = {1975}
}

Sebastian Böck and Markus Schedl
Polyphonic piano note transcription with recurrent neural networks
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 121–124, 2012.

@inproceedings{BoeckS12_TranscriptionRecurrentNetwork_ICASSP,
 address = {Kyoto, Japan},
 author = {Sebastian Böck and Markus Schedl},
 booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
 month = {March},
 pages = {121--124},
 title = {Polyphonic piano note transcription with recurrent neural networks},
 year = {2012}
}

Juan J. Bosch, Rachel M. Bittner, Justin Salamon, and Emilia Gómez
A Comparison of Melody Extraction Methods Based on Source-Filter Modelling
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 571–577, 2016.

@inproceedings{BoschBSG16_MelodyExtraction_ISMIR,
 address = {New York City, USA},
 author = {Juan J. Bosch and Rachel M. Bittner and Justin Salamon and Emilia Gómez},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {571--577},
 title = {A Comparison of Melody Extraction Methods Based on Source-Filter Modelling},
 year = {2016}
}

Emmanouil Benetos, Simon Dixon, Zhiyao Duan, and Sebastian Ewert
Automatic Music Transcription: An Overview
In IEEE Signal Processing Magazine, 36(1): 20–30, 2019.

@article{DBLP:journals/spm/BenetosDDE19,
 author = {Emmanouil Benetos and Simon Dixon and Zhiyao Duan and Sebastian Ewert},
 bibsource = {dblp computer science bibliography, https://dblp.org},
 biburl = {https://dblp.org/rec/bib/journals/spm/BenetosDDE19},
 doi = {10.1109/MSP.2018.2869928},
 journal = {IEEE Signal Processing Magazine},
 number = {1},
 pages = {20--30},
 timestamp = {Fri, 18 Jan 2019 23:22:47 +0100},
 title = {Automatic Music Transcription: An Overview},
 url = {https://doi.org/10.1109/MSP.2018.2869928},
 volume = {36},
 year = {2019}
}

Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, and Anssi Klapuri
Automatic music transcription: challenges and future directions
In Journal of Intelligent Information Systems, 41(3): 407–434, 2013.

@article{BenetosDGKK13_MusicTranscription_JIIS,
 author = {Emmanouil Benetos and Simon Dixon and Dimitrios Giannoulis and Holger Kirchhoff and Anssi Klapuri},
 doi = {10.1007/s10844-013-0258-3},
 journal = {Journal of Intelligent Information Systems},
 number = {3},
 pages = {407--434},
 title = {Automatic music transcription: challenges and future directions},
 url = {http://dx.doi.org/10.1007/s10844-013-0258-3},
 volume = {41},
 year = {2013}
}

Donald Byrd and Jakob Simonsen
Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images
In Journal of New Music Research, 44(3): 169–195, Routledge, 2015.

@article{Byrd2015_OMR_JNMR,
 author = {Donald Byrd and Jakob G. Simonsen},
 doi = {10.1080/09298215.2015.1045424},
 issn = {0929-8215},
 journal = {Journal of New Music Research},
 keywords = {optical music recognition, empirical evaluation, notation, notation complexity},
 number = {3},
 pages = {169--195},
 publisher = {Routledge},
 title = {Towards a Standard Testbed for Optical Music Recognition: Definitions, Metrics, and Page Images},
 volume = {44},
 year = {2015}
}

Pedro Cano, Eloi Batlle, Ton Kalker, and Jaap Haitsma
A review of audio fingerprinting
In The Journal of VLSI Signal Processing, 41(3): 271–284, Kluwer Academic Publishers, 2005.

@article{CanoBKH05_FingerprintingReview_VLSI,
 acmid = {1107829},
 address = {Hingham, Massachusetts, USA},
 author = {Pedro Cano and Eloi Batlle and Ton Kalker and Jaap Haitsma},
 doi = {10.1007/s11265-005-4151-3},
 issn = {0922-5773},
 journal = {The Journal of VLSI Signal Processing},
 month = {November},
 number = {3},
 numpages = {14},
 pages = {271--284},
 publisher = {Kluwer Academic Publishers},
 title = {A review of audio fingerprinting},
 url = {http://dx.doi.org/10.1007/s11265-005-4151-3},
 volume = {41},
 year = {2005}
}

Michael Casey, Christophe Rhodes, and Malcolm Slaney
Analysis of Minimum Distances in High-Dimensional Musical Spaces
In IEEE Transactions on Audio, Speech, and Language Processing, 16(5): 1015–1028, 2008.

@article{CaseyRS08_MinimumDistances_IEEE-TASLP,
 author = {Michael A. Casey and Christophe Rhodes and Malcolm Slaney},
 doi = {10.1109/TASL.2008.925883},
 journal = {IEEE Transactions on Audio, Speech, and Language Processing},
 number = {5},
 pages = {1015--1028},
 title = {Analysis of Minimum Distances in High-Dimensional Musical Spaces},
 volume = {16},
 year = {2008}
}

Tian Cheng, Matthias Mauch, Emmanouil Benetos, and Simon Dixon
An attack/decay model for piano transcription
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 584–590, 2016.

@inproceedings{ChengMBD16_AttackDecayPianoTranscription_ISMIR,
 address = {New York City, USA},
 author = {Tian Cheng and Matthias Mauch and Emmanouil Benetos and Simon Dixon},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {584--590},
 title = {An attack/decay model for piano transcription},
 year = {2016}
}

Matthias Dorfer, Andreas Arzt, and Gerhard Widmer
Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 115–122, 2017.

@inproceedings{DorferAW17_ScoreIdentification_ISMIR,
 address = {Suzhou, China},
 author = {Matthias Dorfer and Andreas Arzt and Gerhard Widmer},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {115--122},
 title = {Learning Audio-Sheet Music Correspondences for Score Identification and Offline Alignment},
 year = {2017}
}

Matthias Dorfer, Jan Schlüter, Andreu Vall, Filip Korzeniowski, and Gerhard Widmer
End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss
In International Journal of Multimedia Information Retrieval, 7(2): 117–128, 2018.

@article{DorferSVKW18_CCALayer_IJMIR,
 author = {Matthias Dorfer and Jan Schlüter and Andreu Vall and Filip Korzeniowski and Gerhard Widmer},
 day = {01},
 doi = {10.1007/s13735-018-0151-5},
 issn = {2192-662X},
 journal = {International Journal of Multimedia Information Retrieval},
 month = {Jun},
 number = {2},
 pages = {117--128},
 title = {End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss},
 url = {https://doi.org/10.1007/s13735-018-0151-5},
 volume = {7},
 year = {2018}
}

Matthias Dorfer, Jan Hajič, Andreas Arzt, Harald Frostel, and Gerhard Widmer
Learning Audio–Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification
In Transactions of the International Society for Music Information Retrieval, 1(1), Ubiquity Press, 2018.

@article{DorferHAFW18_MSMD_TISMIR,
 author = {Matthias Dorfer and Hajič, jr., Jan and Andreas Arzt and Harald Frostel and Gerhard Widmer},
 journal = {Transactions of the International Society for Music Information Retrieval},
 number = {1},
 publisher = {Ubiquity Press},
 title = {Learning Audio--Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification},
 volume = {1},
 year = {2018}
}

Masataka Goto and Roger Dannenberg
Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and Listen to Music
In IEEE Signal Processing Magazine, 36(1): 74–81, 2019.

@article{DBLP:journals/spm/GotoD19,
 author = {Masataka Goto and Roger B. Dannenberg},
 bibsource = {dblp computer science bibliography, https://dblp.org},
 biburl = {https://dblp.org/rec/bib/journals/spm/GotoD19},
 doi = {10.1109/MSP.2018.2874360},
 journal = {IEEE Signal Processing Magazine},
 number = {1},
 pages = {74--81},
 timestamp = {Fri, 18 Jan 2019 23:22:47 +0100},
 title = {Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and Listen to Music},
 url = {https://doi.org/10.1109/MSP.2018.2874360},
 volume = {36},
 year = {2019}
}

Peter Grosche and Meinard Müller
Toward Characteristic Audio Shingles for Efficient Cross-Version Music Retrieval
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 473–476, 2012.

@inproceedings{GroscheM12_RetrievalShingles_ICASSP,
 address = {Kyoto, Japan},
 author = {Peter Grosche and Meinard Müller},
 booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
 pages = {473--476},
 title = {Toward Characteristic Audio Shingles for Efficient Cross-Version Music Retrieval},
 year = {2012}
}

Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, and Douglas Eck
Onsets and Frames: Dual-Objective Piano Transcription
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 50–57, 2018.

@inproceedings{hawthorne:ismir:2018,
 address = {Paris, France},
 author = {Curtis Hawthorne and Erich Elsen and Jialin Song and Adam Roberts and Ian Simon and Colin Raffel and Jesse Engel and Sageev Oore and Douglas Eck},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {50--57},
 title = {Onsets and Frames: Dual-Objective Piano Transcription},
 year = {2018}
}

Özgür İzmirli and Gyanendra Sharma
Bridging Printed Music and Audio Through Alignment Using a Mid-level Score Representation
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2012.

@inproceedings{IzmirliS12_BridgingPrintedMusicAudio_ISMIR,
 address = {Porto, Portugal},
 author = {Özgür İzmirli and Gyanendra Sharma},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 title = {Bridging Printed Music and Audio Through Alignment Using a Mid-level Score Representation},
 year = {2012}
}

Rainer Kelz, Sebastian Böck, and Gerhard Widmer
Deep Polyphonic ADSR Piano Note Transcription
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 246–250, 2019.

@inproceedings{kelz:icassp:2019,
 address = {Brighton, United Kingdom},
 author = {Rainer Kelz and Sebastian Böck and Gerhard Widmer},
 booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
 pages = {246--250},
 title = {Deep Polyphonic ADSR Piano Note Transcription},
 year = {2019}
}

Frank Kurth, Meinard Müller, Christian Fremerey, Yoonha Chang, and Michael Clausen
Automated Synchronization of Scanned Sheet Music with Audio Recordings
In Proceedings of the International Conference on Music Information Retrieval (ISMIR): 261–266, 2007.

@inproceedings{KurthMFCC07_AutomatedSynchronization_ISMIR,
 address = {Vienna, Austria},
 author = {Frank Kurth and Meinard Müller and Christian Fremerey and Yoonha Chang and Michael Clausen},
 booktitle = {Proceedings of the International Conference on Music Information Retrieval (ISMIR)},
 month = {September},
 pages = {261--266},
 title = {Automated Synchronization of Scanned Sheet Music with Audio Recordings},
 url-pdf = {2007_KurthMuellerFremereyClausen_AutomatedSynchronization_ISMIR.pdf},
 year = {2007}
}

Meinard Müller
Information Retrieval for Music and Motion
Springer Verlag, ISBN: 3540740473, 2007.

@book{Mueller07_InformationRetrieval_SPRINGER,
 author = {Meinard Müller},
 isbn = {3540740473},
 publisher = {Springer Verlag},
 title = {Information Retrieval for Music and Motion},
 type = {Monograph},
 year = {2007}
}

Meinard Müller
Fundamentals of Music Processing
Springer Verlag, ISBN: 978-3-319-21944-8, 2015.

@book{Mueller15_FMP_SPRINGER,
 author = {Meinard Müller},
 isbn = {978-3-319-21944-8},
 publisher = {Springer Verlag},
 title = {Fundamentals of Music Processing},
 type = {Monograph},
 year = {2015}
}

Meinard Müller, Andreas Arzt, Stefan Balke, Matthias Dorfer, and Gerhard Widmer
Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies
In IEEE Signal Processing Magazine, 36(1): 52–62, 2019.

@article{MuellerABDW19_MusicRetrieval_IEEE-SPM,
 author = {Meinard Müller and Andreas Arzt and Stefan Balke and Matthias Dorfer and Gerhard Widmer},
 doi = {10.1109/MSP.2018.2868887},
 journal = {IEEE Signal Processing Magazine},
 number = {1},
 pages = {52--62},
 title = {Cross-Modal Music Retrieval and Applications: An Overview of Key Methodologies},
 url = {https://doi.org/10.1109/MSP.2018.2868887},
 url-pdf = {https://ieeexplore.ieee.org/document/8588416/},
 volume = {36},
 year = {2019}
}

Meinard Müller, Frank Kurth, and Michael Clausen
Audio Matching via Chroma-Based Statistical Features
In Proceedings of the International Conference on Music Information Retrieval (ISMIR): 288–295, 2005.

@inproceedings{MuellerKC05_ChromaFeatures_ISMIR,
 address = {London, UK},
 author = {Meinard Müller and Frank Kurth and Michael Clausen},
 booktitle = {Proceedings of the International Conference on Music Information Retrieval (ISMIR)},
 pages = {288--295},
 title = {Audio Matching via Chroma-Based Statistical Features},
 url-details = {https://www.audiolabs-erlangen.de/resources/MIR/chromatoolbox},
 url-pdf = {2005_MuellerKurthClausen_AudioMatching_ISMIR.pdf},
 year = {2005}
}

Justin Salamon, Joan Serrà, and Emilia Gómez
Tonal representations for music retrieval: from version identification to query-by-humming
In International Journal of Multimedia Information Retrieval, 2(1): 45–58, 2013.

@article{SalamonSG13_Retrieval_IJMRI,
 author = {Justin Salamon and Joan Serrà and Emilia Gómez},
 bibsource = {dblp computer science bibliography, http://dblp.org},
 biburl = {http://dblp.uni-trier.de/rec/bib/journals/ijmir/SalamonSG13},
 doi = {10.1007/s13735-012-0026-0},
 journal = {International Journal of Multimedia Information Retrieval},
 number = {1},
 pages = {45--58},
 timestamp = {Fri, 15 Mar 2013 10:07:17 +0100},
 title = {Tonal representations for music retrieval: from version identification to query-by-humming},
 url = {http://dx.doi.org/10.1007/s13735-012-0026-0},
 volume = {2},
 year = {2013}
}

Jürgen Schmidhuber
Deep learning in neural networks: An overview
In Neural Networks, 61: 85–117, 2015.

@article{Schmidhuber15_DeepLearningOverview_NN,
 author = {Jürgen Schmidhuber},
 doi = {10.1016/j.neunet.2014.09.003},
 journal = {Neural Networks},
 pages = {85--117},
 title = {Deep learning in neural networks: An overview},
 volume = {61},
 year = {2015}
}

Joan Serrà, Emilia Gómez, and Perfecto Herrera
Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation and Beyond
In Advances in Music Information Retrieval (Studies in Computational Intelligence, vol. 274): 307–332, Springer, 2010.

@incollection{SerraGH10_coversong_BOOKCHAP,
 address = {Berlin, Germany},
 author = {Joan Serrà and Emilia Gómez and Perfecto Herrera},
 booktitle = {Advances in Music Information Retrieval},
 chapter = {14},
 editor = {Ras, Z. W. and Wieczorkowska, A. A.},
 pages = {307--332},
 publisher = {Springer},
 series = {Studies in Computational Intelligence},
 title = {Audio Cover Song Identification and Similarity: Background, Approaches, Evaluation and Beyond},
 volume = {274},
 year = {2010}
}

Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon
An End-to-End Neural Network for Polyphonic Piano Music Transcription
In IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(5): 927–939, 2016.

@article{SigtiaBD16_DNNPolyPianoTrans_TASLP,
 author = {Siddharth Sigtia and Emmanouil Benetos and Simon Dixon},
 journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
 number = {5},
 pages = {927--939},
 title = {An End-to-End Neural Network for Polyphonic Piano Music Transcription},
 volume = {24},
 year = {2016}
}

Joren Six and Marc Leman
Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 259–264, 2014.

@inproceedings{SixL14_PanakoAcousFP_ISMIR,
 address = {Taipei, Taiwan},
 author = {Joren Six and Marc Leman},
 booktitle = {Proceedings of the International Society for Music Information Retrieval Conference (ISMIR)},
 pages = {259--264},
 title = {Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification},
 year = {2014}
}

Reinhard Sonnleitner and Gerhard Widmer
Robust Quad-Based Audio Fingerprinting
In IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(3): 409–421, 2016.

@article{SonnleitnerW16_QuadFingerp_TASLP,
 author = {Reinhard Sonnleitner and Gerhard Widmer},
 doi = {10.1109/TASLP.2015.2509248},
 journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
 number = {3},
 pages = {409--421},
 title = {Robust Quad-Based Audio Fingerprinting},
 volume = {24},
 year = {2016}
}

Avery Wang
An Industrial Strength Audio Search Algorithm
In Proceedings of the International Conference on Music Information Retrieval (ISMIR): 7–13, 2003.

@inproceedings{Wang03_Shazam_ISMIR,
 address = {Baltimore, Maryland, USA},
 author = {Avery Wang},
 booktitle = {Proceedings of the International Conference on Music Information Retrieval (ISMIR)},
 pages = {7--13},
 title = {An Industrial Strength Audio Search Algorithm},
 year = {2003}
}

Frank Zalkow, Stefan Balke, and Meinard Müller
Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

@inproceedings{ZalkowBM19_SalienceRep_ICASSP,
 address = {Brighton, UK},
 author = {Frank Zalkow and Stefan Balke and Meinard Müller},
 booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
 month = {May},
 title = {Evaluating Salience Representations for Cross-Modal Retrieval of Western Classical Music Recordings},
 year = {2019}
}

Acknowledgments

The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS. The work by Meinard Müller and Stefan Balke was supported by the German Research Foundation (DFG MU 2686/11-1). Furthermore, the work of Andreas Arzt and Stefan Balke was supported by the European Research Council (ERC) under the European Union's Horizon 2020 Framework Programme (H2020, 2014–2020) / ERC Advanced Grant Agreement No. 670035, project "Con Espressione".

Tutorial: Cross-Modal Music Retrieval and Applications

IEEE International Conference on Acoustics, Speech and Signal Processing

12–17 May 2019 | Brighton, UK