SynthSOD aligned scores

Dataset

Description

This dataset contains the aligned scores for around 85% of the songs in SynthSOD, which we used to train models for score-informed music source separation [1]. We obtained the score information from the original MIDI files used to synthesize SynthSOD and aligned the scores using the system described in [2].

The score information is encoded in text files that list, for every note, its start and end times, its MIDI pitch, and its MIDI instrument. The correspondence between the MIDI instrument numbers and the instrument names in SynthSOD is based on the General MIDI standard (see below). We also provide metadata files in the same format as the ones provided with SynthSOD, excluding the songs whose scores we weren't able to align.
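
As a rough illustration of how such a file might be read, the sketch below parses one note per line. The column order (`start end pitch instrument`), the whitespace delimiter, the use of seconds, and the names `Note` and `parse_score` are all assumptions made for this example, not the documented file layout; inspect an actual file from the dataset and adjust accordingly.

```python
from typing import Iterable, List, NamedTuple


class Note(NamedTuple):
    start: float     # note onset (assumed to be in seconds)
    end: float       # note offset (assumed to be in seconds)
    pitch: int       # MIDI pitch, 0-127
    instrument: int  # MIDI instrument number


def parse_score(lines: Iterable[str]) -> List[Note]:
    """Parse lines of the ASSUMED form 'start end pitch instrument'."""
    notes = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        start, end, pitch, instrument = fields[:4]
        notes.append(Note(float(start), float(end), int(pitch), int(instrument)))
    return notes


# Made-up notes for demonstration, not taken from the dataset.
example = ["0.50 1.25 60 40", "1.25 2.00 64 41"]
notes = parse_score(example)
```

To read a real file, one could pass an open file handle to `parse_score`, since it accepts any iterable of lines.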

If you find this dataset useful for your research, please consider citing the paper where we introduced it [1] and the original paper of SynthSOD [3].

About version 2

After publishing the initial version of the dataset, we found that some of the notes present in the scores weren't present in the synthesized audio from SynthSOD. These were notes whose MIDI pitch fell outside the range accepted by the synthesizer for that instrument, which especially affected percussion instruments. In the second version of the dataset, we have removed these notes from the scores so that the scores match the audio files from SynthSOD. The results reported in the paper were obtained by training and evaluating the models with this new version.
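
The kind of filtering described above can be sketched as follows. This is only an illustration of the idea, not the procedure actually used to build version 2: the pitch ranges in `PITCH_RANGE` are placeholder values invented for the example, and the names `PITCH_RANGE` and `drop_out_of_range` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical playable ranges (lowest, highest MIDI pitch) per MIDI instrument
# number. Placeholder values for illustration only, NOT the ranges of the
# synthesizer used for SynthSOD.
PITCH_RANGE: Dict[int, Tuple[int, int]] = {
    40: (55, 103),
    47: (40, 57),
}

NoteTuple = Tuple[float, float, int, int]  # (start, end, pitch, instrument)


def drop_out_of_range(notes: List[NoteTuple]) -> List[NoteTuple]:
    """Keep only notes whose pitch lies inside the range of their instrument."""
    kept = []
    for start, end, pitch, instrument in notes:
        # Instruments without a listed range keep all their notes.
        low, high = PITCH_RANGE.get(instrument, (0, 127))
        if low <= pitch <= high:
            kept.append((start, end, pitch, instrument))
    return kept


# Made-up notes: the second one is below the placeholder range for instrument 40.
notes = [(0.0, 1.0, 60, 40), (1.0, 2.0, 30, 40), (2.0, 3.0, 45, 47)]
filtered = drop_out_of_range(notes)
```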

SynthSOD instrument name      MIDI instrument number
Violin, Violin_1, Violin_2    40
Viola                         41
Cello                         42
Bass                          43
Oboe                          68
coranglais                    69
Bassoon                       70
Clarinet                      71
Piccolo                       72
Flute                         73
Trumpet                       56
Trombone                      57
Tuba                          58
Horn                          60
Harp                          46
Timpani                       47
untunedpercussion             0

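
For convenience, the table above can be written as a lookup, as in the sketch below. The values are copied from the table; the names `MIDI_TO_SYNTHSOD` and `synthsod_names` are hypothetical and not part of the dataset. For the pitched instruments the numbers appear to be zero-based General MIDI program numbers (for example, 40 for Violin).

```python
from typing import Dict, List

# MIDI instrument number -> SynthSOD instrument name(s), copied from the table above.
MIDI_TO_SYNTHSOD: Dict[int, List[str]] = {
    40: ["Violin", "Violin_1", "Violin_2"],
    41: ["Viola"],
    42: ["Cello"],
    43: ["Bass"],
    68: ["Oboe"],
    69: ["coranglais"],
    70: ["Bassoon"],
    71: ["Clarinet"],
    72: ["Piccolo"],
    73: ["Flute"],
    56: ["Trumpet"],
    57: ["Trombone"],
    58: ["Tuba"],
    60: ["Horn"],
    46: ["Harp"],
    47: ["Timpani"],
    0: ["untunedpercussion"],
}


def synthsod_names(midi_instrument: int) -> List[str]:
    """Return the SynthSOD names for a MIDI instrument number, or [] if unlisted."""
    return MIDI_TO_SYNTHSOD.get(midi_instrument, [])
```

Note that number 40 maps to three SynthSOD names, so the score files alone may not distinguish `Violin_1` from `Violin_2`.
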
Date made available: 2 Jun 2025
Publisher: Zenodo

Funding

Funders                Funder number
European Commission    101095065

    Field of science, Statistics Finland

    • 113 Computer and information sciences
