Multi-Sensor Cello Recordings for Instantaneous Frequency Estimation

Estimating the fundamental frequency (F0) of a signal is a well-studied task in audio signal processing with many applications. If the F0 varies over time, the task becomes more complex, and it is also more difficult to provide ground truth data for evaluation. In this project we present a dataset of cello recordings that addresses the lack of reference annotations for musical instruments. Besides audio data, we include sensor recordings capturing the finger position on the fingerboard, which is converted into an instantaneous frequency estimate. This is similar to speech processing, where the electroglottograph (EGG) captures the excitation signal of the vocal tract, which is then used to generate a reference instantaneous F0. Inspired by this approach, we included high-speed video camera recordings to extract the excitation signal originating from the moving string. The derived data can be used to analyze vibrato, a very commonly used playing style.

Domains

Audio

High-quality audio serves as the basis for this test set. The recordings took place in a professional recording studio. We used an AKG C414 condenser microphone placed approximately 30 cm from the cello bridge. The audio was sampled by an RME MADIface A/D converter. The sample rate is 48 kHz at 24 bit (for the raw dataset). All files are available as uncompressed PCM files.

Sensors

To measure the finger position on the fingerboard we used a linear membrane potentiometer. The base layer consists of material with length-dependent resistance. The middle layer is made of highly conductive material. The sensor therefore acts as a resistance that depends linearly on the point where the circuit is shorted. The benefits of this sensor type are described in. We chose sensors of length 100 mm, a trade-off between the sensor's linearity and the convenience of capturing multiple notes from a single finger position. The membrane sensor is thin enough to be attached to the fingerboard of the cello. Additionally, the motion of the finger on the fingerboard was recorded using a 3-axis accelerometer (Texas Instruments LM335) taped to the player's fingers. All sensor data was sampled by an Arduino Due microcontroller at 12 bit resolution.
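
As a rough illustration of how a 12 bit potentiometer reading could be turned into a frequency estimate, the sketch below assumes an ideal string (frequency inversely proportional to the vibrating length), a perfectly linear sensor whose reading grows toward the bridge, and hypothetical values for the scale length, the sensor offset from the nut, and the open-string frequency. The function names and these constants are illustrative assumptions and are not taken from the dataset; this is not the calibration procedure used for MUSERC.

```python
ADC_MAX = 4095            # 12 bit resolution, as stated for the sensor sampling
SENSOR_LENGTH_M = 0.100   # 100 mm sensor, as stated above

# Hypothetical values for illustration only (not taken from the dataset):
SCALE_LENGTH_M = 0.69     # assumed nut-to-bridge string length
SENSOR_OFFSET_M = 0.05    # assumed distance from the nut to the sensor start
OPEN_STRING_HZ = 146.83   # assumed open-string frequency (D3)


def adc_to_position(adc_value: int) -> float:
    """Map a raw reading to a finger position in metres from the nut.

    Assumes a linear sensor whose reading increases toward the bridge.
    """
    if not 0 <= adc_value <= ADC_MAX:
        raise ValueError("ADC value out of 12 bit range")
    return SENSOR_OFFSET_M + (adc_value / ADC_MAX) * SENSOR_LENGTH_M


def position_to_frequency(position_m: float) -> float:
    """Ideal-string model: f = f_open * L / (L - x)."""
    vibrating_length = SCALE_LENGTH_M - position_m
    if vibrating_length <= 0:
        raise ValueError("finger position at or beyond the bridge")
    return OPEN_STRING_HZ * SCALE_LENGTH_M / vibrating_length


def adc_to_frequency(adc_value: int) -> float:
    """Chain both steps: raw reading -> position -> frequency estimate."""
    return position_to_frequency(adc_to_position(adc_value))
```

Under this model, stopping the string at half the scale length doubles the open-string frequency, and small finger displacements (as in vibrato) map to small, smooth frequency changes. A real sensor would additionally need calibration against known notes.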

Video

The cello string is excited by the bow, and its vibration is then transferred to the cello body via the bridge. By using a high-speed camera focused on the string, we are able to extract the movement pattern from the video signal. We used a professional high-speed camera (made by Fraunhofer IIS) and bright light sources specifically designed to minimize the flicker that appears in the image when standard bulbs are used. Camera data is available as uncompressed raw video. To also be able to extract the slowly varying finger motion, we ensured the camera captured the finger movements as well as the moving string. One close-up video was made of each of the two players playing the D♯3 note. We decided not to include high-speed camera videos for the complete test set because of the loud noise emitted by the camera and because the bright set lights need constant fan cooling.
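
One simple way to obtain a movement pattern from such video is to track the string along a single line of pixels that crosses it in every frame. The sketch below is a minimal illustration of that idea using an intensity-weighted centroid; it assumes the string appears brighter than the background, and the function names and the input layout (one list of pixel intensities per frame) are hypothetical. It is not the extraction method used for the dataset.

```python
from typing import List, Sequence


def string_centroid(scanline: Sequence[float]) -> float:
    """Intensity-weighted centroid (in pixels) of one scanline crossing the string.

    Assumes the string is brighter than the background; the darkest pixel
    value is subtracted so the background contributes little weight.
    """
    baseline = min(scanline)
    weights = [value - baseline for value in scanline]
    total = sum(weights)
    if total == 0:
        raise ValueError("flat scanline: no string visible")
    return sum(index * w for index, w in enumerate(weights)) / total


def displacement_signal(scanlines: Sequence[Sequence[float]]) -> List[float]:
    """One centroid per frame, with the mean string position removed."""
    centroids = [string_centroid(line) for line in scanlines]
    mean = sum(centroids) / len(centroids)
    return [c - mean for c in centroids]
```

The resulting series has one sample per video frame, so its periodicity follows the string vibration as long as the camera frame rate is well above the fundamental frequency.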

Pre-Processing

To pre-process the MUSERC RAW datasets, several choices had to be made. To allow researchers to reproduce and understand our choices, the pre-processing methods are publicly available in our GitHub repository.

Authors

Usage

MUSERC is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. If you want to use this dataset in your academic research, please cite our paper:

  1. Stöter, Fabian-Robert, Müller, Michael, and Edler, Bernd
    Multi-Sensor Cello Recordings for Instantaneous Frequency Estimation
    In Proceedings of the 23rd Annual ACM Conference on Multimedia Conference: 995–998, 2015. DOI: 10.1145/2733373.2806384
    @inproceedings{Stoter:2015:MCR:2733373.2806384,
    author = {St\"{o}ter, Fabian-Robert and M\"{u}ller, Michael and Edler, Bernd},
    title = {Multi-Sensor Cello Recordings for Instantaneous Frequency Estimation},
    booktitle = {Proceedings of the 23rd Annual ACM Conference on Multimedia Conference},
    series = {MM '15},
    year = {2015},
    isbn = {978-1-4503-3459-4},
    location = {Brisbane, Australia},
    pages = {995--998},
    numpages = {4},
    url = {http://doi.acm.org/10.1145/2733373.2806384},
    doi = {10.1145/2733373.2806384},
    acmid = {2806384},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {cello, dataset, fundamental frequency estimation, sensors, visual acoustics},
    }

Downloads

The dataset is available for download in several variants. Choose the variant you need from the list below.

Dataset        | Description                              | Domains              | Number of Items            | Download Size | Download
---------------|------------------------------------------|----------------------|----------------------------|---------------|---------
MUSERC SA RAW  | Raw Recordings                           | Sensor, Audio        | 13 (continuous recordings) | 339 MB        |
MUSERC SAV RAW | Raw Recordings                           | Sensor, Audio, Video | 2                          | 3.5 GB        |
MUSERC SA PRE  | Pre-Processed Recordings                 | Sensor, Audio        | 140                        | 50 MB         | t.b.a
MUSERC SAV PRE | Pre-Processed Recordings, Compressed Video | Sensor, Audio, Video | 2                        | 1.2 GB        | t.b.a
MUSERC SA      | Pre-Processed Recordings + derived data  | Sensor, Audio        | 140                        | 50 MB         | t.b.a
MUSERC SAV     | Pre-Processed Recordings + derived data  | Sensor, Audio, Video | 2                          | 1.2 GB        | t.b.a

Acknowledgement

We thank Karlheinz Busch for his professional cello playing on these recordings.