Datasets

This is a list of datasets that use Freesound content, sorted alphabetically. Do you have a dataset that uses Freesound and does not appear here? Please send us an email at freesound@freesound.org!

  • Clotho Dataset

    Clotho is a novel audio captioning dataset, consisting of 4,981 audio samples, each with five captions (24,905 captions in total). Audio samples are 15 to 30 seconds long, and captions are 8 to 20 words long. All audio samples are collected from Freesound.


    Konstantinos Drossos, Samuel Lipping, Tuomas Virtanen


    Tampere University


  • DBR Dataset

    The DBR dataset is an environmental audio dataset created for the Bachelor’s Seminar in Signal Processing at Tampere University of Technology. The samples in the dataset were collected from the online audio database Freesound. The dataset consists of three classes of 50 samples each: ‘dog’, ‘bird’, and ‘rain’ (hence the name DBR).


    Ville-Veikko Eklund


    Tampere University


  • DCASE2019 Task 4 - Synthetic data

    The synthetic set is composed of 10-second audio clips generated with Scaper. The foreground events are obtained from a subset of the FSD dataset from Freesound. Each event audio clip was verified manually to ensure that the sound quality and the event-to-background ratio were sufficient for it to be used as an isolated event. We also verified that the event was actually dominant in the clip, and checked that the event onset and offset are present in the clip.
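
    Scaper builds each clip by drawing a background, placing foreground events at sampled times and signal-to-noise ratios, and rendering the mixture together with a ground-truth annotation. The snippet below is a minimal, illustrative sketch of such a recipe; the directory paths, the ('choose', []) label selection, and the distribution parameters are placeholders, not the challenge's actual configuration.

    ```python
    import scaper

    # Placeholder directories of isolated foreground events and backgrounds.
    FG_PATH = 'foreground_events'
    BG_PATH = 'backgrounds'

    sc = scaper.Scaper(duration=10.0, fg_path=FG_PATH, bg_path=BG_PATH)
    sc.ref_db = -50  # reference loudness of the background

    # One background track spanning the whole 10-second clip.
    sc.add_background(label=('choose', []),
                      source_file=('choose', []),
                      source_time=('const', 0))

    # One foreground event at a random time and SNR, so the
    # event-to-background ratio varies across generated clips.
    sc.add_event(label=('choose', []),
                 source_file=('choose', []),
                 source_time=('const', 0),
                 event_time=('uniform', 0, 8),
                 event_duration=('uniform', 1, 5),
                 snr=('uniform', 6, 30),
                 pitch_shift=None,
                 time_stretch=None)

    # Writes the mixture plus a JAMS annotation with event onsets/offsets.
    sc.generate('mixture.wav', 'mixture.jams')
    ```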


    Turpault Nicolas, Serizel Romain, Salamon Justin, Shah Ankit Parag


    Université de Lorraine, CNRS, Inria, Language Technologies Institute, Carnegie Mellon University, Adobe Research


  • freefield1010

    This dataset contains 7,690 10-second audio files in a standardised format, extracted from contributions to the Freesound archive that were labelled with the “field-recording” tag. Note that the original tagging (as well as the audio submission) is crowdsourced, so the dataset is not guaranteed to consist purely of “field recordings” as might be defined by practitioners. The intention is to represent the content of an archive collection on such a topic, rather than to represent a controlled definition of such a topic.
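
    Tag-based collections like this can be retrieved through the Freesound API. The sketch below uses the freesound-python client to list sounds carrying the “field-recording” tag; the API key is a placeholder, and this is an illustration rather than the exact procedure used to build freefield1010.

    ```python
    import freesound  # the freesound-python API client

    client = freesound.FreesoundClient()
    client.set_token('YOUR_API_KEY')  # placeholder; get a key at freesound.org/apiv2

    # List sounds carrying the crowdsourced "field-recording" tag.
    results = client.text_search(query='',
                                 filter='tag:field-recording',
                                 fields='id,name,duration,tags')
    for sound in results:
        print(sound.id, sound.name, sound.duration)
    ```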


    Dan Stowell


    Centre for Digital Music (C4DM), Queen Mary University of London


  • Freesound Loops 4k (FSL4)

    This dataset contains ~4,000 user-contributed loops uploaded to Freesound. Loops were selected by searching Freesound for sounds with the query terms “loop” and “bpm”, and then automatically parsing the returned sound filenames, tags, and textual descriptions to identify tempo annotations made by users. For example, a sound containing the tag “120bpm” is considered to have a ground-truth tempo of 120 BPM.
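
    A tag like that can be turned into a tempo ground truth with a simple pattern match. The helper below is a hypothetical sketch of such parsing, not the dataset's actual code; the function name and regular expression are illustrative.

    ```python
    import re

    # Matches annotations such as "120bpm", "120 BPM", or "bpm120".
    BPM_PATTERN = re.compile(r'(?:^|[^0-9])(\d{2,3})\s*bpm|bpm\s*(\d{2,3})',
                             re.IGNORECASE)

    def parse_bpm(text):
        """Return the annotated tempo in BPM, or None if no annotation is found."""
        match = BPM_PATTERN.search(text)
        if match:
            return int(match.group(1) or match.group(2))
        return None

    assert parse_bpm('drum_loop_120bpm.wav') == 120
    assert parse_bpm('BPM 95 funk loop') == 95
    assert parse_bpm('no tempo here') is None
    ```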


    Frederic Font


    Music Technology Group (MTG), Universitat Pompeu Fabra


  • Freesound One-Shot Percussive Sounds

    This dataset contains 10,254 one-shot (single-event) percussive sounds from Freesound together with the corresponding timbral analysis. These were used to train the generative model for Neural Percussive Synthesis Parameterised by High-Level Timbral Features.


    António Ramires, Pritish Chandna, Xavier Favory, Emilia Gómez, Xavier Serra


    Music Technology Group (MTG), Universitat Pompeu Fabra


  • FSDKaggle2018

    FSDKaggle2018 is an audio dataset containing 11,073 audio files annotated with 41 labels of the AudioSet Ontology. FSDKaggle2018 has been used for the DCASE Challenge 2018 Task 2, which was run as a Kaggle competition titled Freesound General-Purpose Audio Tagging Challenge.


    Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Manoj Plakal, Daniel P. W. Ellis, Xavier Serra


    Music Technology Group (MTG), Universitat Pompeu Fabra, Google Research’s Machine Perception Team


  • FSDKaggle2019

    FSDKaggle2019 is an audio dataset containing 29,266 audio files annotated with 80 labels of the AudioSet Ontology. FSDKaggle2019 has been used for the DCASE Challenge 2019 Task 2, which was run as a Kaggle competition titled Freesound Audio Tagging 2019. The dataset allows development and evaluation of machine listening methods in conditions of label noise, minimal supervision, and real-world acoustic mismatch. FSDKaggle2019 consists of two train sets and one test set. One train set and the test set consist of manually-labeled data from Freesound, while the other train set consists of noisily labeled web audio data from Flickr videos taken from the YFCC dataset.


    Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra


    Music Technology Group (MTG), Universitat Pompeu Fabra, Google Research’s Machine Perception Team


  • FSDnoisy18k

    FSDnoisy18k is an audio dataset collected with the aim of fostering the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually-labeled data and a larger quantity of real-world noisy data.


    Eduardo Fonseca, Mercedes Collado, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra


    Music Technology Group (MTG), Universitat Pompeu Fabra, Google Research’s Machine Perception Team


  • Good-sounds dataset

    This dataset contains monophonic recordings of musical instruments playing two kinds of exercises: single notes and scales. The recordings were made in the Universitat Pompeu Fabra / Phonos recording studio by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching, and were uploaded to Freesound. Twelve different instruments were recorded using between one and four microphones (depending on the recording session). For each instrument, the whole set of playable semitones was recorded several times with different tonal characteristics. Each note is recorded in a separate mono .flac audio file at 48 kHz and 32 bits. The tonal characteristics are explained both in the dataset documentation and in the related publication.


    Oriol Romani Picas, Hector Parra Rodriguez, Dara Dabiri, Xavier Serra


    Music Technology Group (MTG), Universitat Pompeu Fabra


  • SimSceneTVB Learning

    This is a dataset of 600 simulated sound scenes of 45 s each representing urban sound environments, generated with the simScene Matlab library. The dataset is divided into a train subset (400 scenes) and a test subset (200 scenes) for the development of learning-based models. Each scene combines three main source types (traffic, human voices, and birds) according to an original scenario, composed semi-randomly conditioned on one of five ambiances: park, quiet street, noisy street, very noisy street, and square. Separate channels with the contribution of each source are available. The base audio files used for simulation are obtained from Freesound and LibriSpeech. The sound scenes are scaled according to a playback sound level in dB, which is drawn randomly but kept plausible for the ambiance.
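
    As a rough illustration of that last step, the snippet below scales a waveform to a randomly drawn RMS level in dB relative to full scale; the function, the level range for a given ambiance, and the sample rate are assumptions made for the example, not the dataset's actual calibration procedure.

    ```python
    import numpy as np

    def scale_to_level(signal, target_db):
        """Scale a waveform so its RMS level equals target_db (dB re full scale)."""
        rms = np.sqrt(np.mean(signal ** 2))
        gain = 10.0 ** ((target_db - 20.0 * np.log10(rms)) / 20.0)
        return signal * gain

    rng = np.random.default_rng(0)
    scene = rng.standard_normal(45 * 44100) * 0.1   # stand-in for a 45 s scene
    target = rng.uniform(-40.0, -25.0)              # hypothetical "quiet street" range
    scaled = scale_to_level(scene, target)
    ```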


    Felix Gontier, Mathieu Lagrange, Pierre Aumond, Catherine Lavandier, Jean-François Petiot


    Ecole Centrale de Nantes, University of Paris Seine, University of Cergy-Pontoise, ENSEA


  • SimSceneTVB Perception

    This is a corpus of 100 sound scenes of 45 s each representing urban sound environments: 6 scenes recorded in Paris; 19 scenes simulated using simScene to replicate recorded scenarios, including the 6 recordings in this corpus; and 75 scenes simulated using simScene with diverse new scenarios containing traffic, human voices, and bird sources. The base audio files used for simulation are obtained from Freesound and LibriSpeech. The sound scenes are scaled according to a playback sound level in dB, which is drawn randomly but kept plausible for the ambiance.


    Felix Gontier, Mathieu Lagrange, Pierre Aumond, Catherine Lavandier, Jean-François Petiot


    Ecole Centrale de Nantes, University of Paris Seine, University of Cergy-Pontoise, ENSEA


  • Sound Events for Surveillance Applications

    The Sound Events for Surveillance Applications (SESA) dataset files were obtained from Freesound. The dataset is divided into train (480 files) and test (105 files) folders. All audio files are mono WAV at 16 kHz and 8 bits, with durations of up to 33 seconds. There are four classes: 0 - Casual (not a threat), 1 - Gunshot, 2 - Explosion, and 3 - Siren (also contains alarms).


    Tito Spadini


  • TUT Rare Sound Events 2017

    TUT Rare Sound Events 2017 is a dataset used in the DCASE Challenge 2017 Task 2, focused on the detection of rare sound events in artificially created mixtures. It consists of isolated sound events for each target class and recordings of everyday acoustic scenes to serve as background. The target sound event categories are: baby crying, glass breaking, and gunshot. The background audio is part of the TUT Acoustic Scenes 2016 dataset, and the isolated sound examples were collected from Freesound. Sounds from Freesound were selected based on the exact label, keeping only examples with a sampling frequency of 44.1 kHz or higher.
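
    Such a sampling-frequency constraint maps directly onto a Freesound API filter. The query below is an illustrative sketch using the freesound-python client; the search term and API key are placeholders, not the procedure used to build the dataset.

    ```python
    import freesound

    client = freesound.FreesoundClient()
    client.set_token('YOUR_API_KEY')  # placeholder API key

    # Keep only candidates sampled at 44.1 kHz or higher.
    results = client.text_search(query='glass break',
                                 filter='samplerate:[44100 TO *]',
                                 fields='id,name,samplerate')
    for sound in results:
        print(sound.id, sound.name, sound.samplerate)
    ```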


    Aleksandr Diment, Annamaria Mesaros, Toni Heittola


    Tampere University of Technology


  • Urban Sound 8K

    This dataset contains 8,732 labeled sound excerpts (<=4 s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to Freesound.


    Justin Salamon, Christopher Jacoby, Juan Pablo Bello


    Music and Audio Research Laboratory (MARL), New York University, Center for Urban Science and Progress (CUSP), New York University


  • Vocal Imitation Set v1.1.3

    The VocalImitationSet is a collection of crowd-sourced vocal imitations of a large set of diverse sounds collected from Freesound, curated based on Google’s AudioSet ontology. We expect that this dataset will help research communities obtain a better understanding of human vocal imitation and build machines that understand imitations as humans do.


    Bongjun Kim, Bryan Pardo


    Dept. of Electrical Engineering and Computer Science, Northwestern University
