Lecture: Content-Based Audio Retrieval
After working through the material of this lecture, you should be able to answer the following questions:
- What is meant by "content-based retrieval"? What is meant by "query-by-example retrieval"?
- What is the difference between audio identification, audio matching, and version identification? How are these tasks arranged in the specificity–granularity plane? (See Fig. 7.22.)
- What are the general requirements for an audio identification system?
- What is the main idea of the Shazam fingerprinting system? What are the fingerprints used in the system? To what extent are they suited to meet the general requirements?
- What does the term "constellation map" refer to?
- How can the matching of constellation maps be accelerated?
- What is the basic idea of the peak pairing strategy? (See Fig. 7.7.)
- What speedup does the peak pairing strategy achieve compared to the original procedure? (See Eq. 7.15.)
- What is the main idea of audio matching? What is the role of the matching function?
- What is the difference between dynamic time warping (DTW) and subsequence DTW? (See Fig. 7.23.)
- What is the main idea of version identification?
- What is the difference between the version identification procedure (common subsequence matching) and subsequence DTW? (See Fig. 7.23.)