Music Classification at SoundCloud
29.10.2013 @ Sofia University
Lecturers: Petko Nikolov, Vassil Lunchev
Automatic prediction of genres, instruments, moods and vocals
The biggest sound platform
● reaches 250 million people
● ranked 189 in the world by Alexa.com
● 12 hours of audio uploaded every minute
Sofia
● Recommender
● Auto-Tagger

Auto-Tagger
● improve the meta-information
Music Tagging: Defining the Problem
● given a track’s audio signal, find the most relevant tags from each set: genres, moods, instruments, vocals
SoundCloud Music Tagger: example output — Rock, Punk, Euphoric, String Instruments, Guitar, Male Vocal
System Overview: New sound
new sound → Feature Extraction → vector of real values → Classification → Taking Decision → TAGS
System Overview: Building blocks
Feature Extraction → vector of real values → Classification → Taking Decision → TAGS
supporting blocks: Taxonomy, Dataset
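The building blocks above can be chained end to end. Here is a minimal sketch; the feature extractor and the toy "classifiers" are invented stand-ins, not SoundCloud's real components.

```python
# End-to-end sketch of the pipeline blocks (all names are illustrative).
def extract_features(audio):
    # Turn the raw signal into a fixed-size vector of real values.
    n = len(audio)
    mean = sum(audio) / n
    var = sum((x - mean) ** 2 for x in audio) / n
    return [mean, var]

def classify(features, classifiers):
    # One confidence score per tag (multi-label).
    return {tag: clf(features) for tag, clf in classifiers.items()}

def take_decision(scores, threshold=0.5):
    # Keep the tags whose confidence clears a threshold.
    return {tag for tag, s in scores.items() if s >= threshold}

classifiers = {                      # toy scorers keyed on signal variance
    "Rock": lambda f: min(1.0, f[1] * 10),
    "Classical": lambda f: max(0.0, 1.0 - f[1] * 10),
}

audio = [0.0, 0.5, -0.5, 0.4, -0.4]
tags = take_decision(classify(extract_features(audio), classifiers))
```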
Classification
● one of the problems of Machine Learning
● given a set of categories C, identify the subset of categories a new observation belongs to
● usually solved by a statistical model that uses past data with known categories to adjust its parameters, i.e. a classifier
Classification tasks by output type
● binary classification: each observation belongs to exactly 1 of 2 given categories
● multi-class classification: each observation belongs to exactly 1 of N categories, N > 2
● multi-label classification: each observation belongs to a subset of the set of categories
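The three task types differ only in the shape of the label attached to each observation. A small illustration (the category names are made up):

```python
# Illustrative categories; the label's shape is what distinguishes the tasks.
categories = ["Rock", "Electronic", "Jazz"]

binary_label = 1                           # binary: one of exactly 2 outcomes
multiclass_label = "Electronic"            # multi-class: exactly 1 of N, N > 2
multilabel_label = {"Rock", "Electronic"}  # multi-label: any subset of the categories
```

Music tagging is the third kind: a track can be Rock and Electronic at once.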
Observations (Examples)
● used as input data for a classification model
● represented by vector x with fixed size D
● each dimension of the vector is called attribute (feature)
● attributes can be of several different types; we focus on ones with continuous values
● supervised learning uses a past set of observations X (N×D) with known labels Y (N) to learn a model able to accurately predict the labels of new observations
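A minimal numpy sketch of the X (N×D) / Y setup; the nearest-centroid "model" here is a deliberately tiny stand-in for any real classifier.

```python
import numpy as np

# N = 4 past observations, D = 2 continuous attributes: X is N x D, Y is N.
X = np.array([[0.2, 1.1],
              [0.1, 0.9],
              [2.0, 3.1],
              [2.2, 2.9]])
Y = np.array([0, 0, 1, 1])   # known labels for the past observations

# Toy "learned model": the per-class centroid; predict the nearest one.
centroids = {c: X[Y == c].mean(axis=0) for c in np.unique(Y)}

def predict(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

pred = predict(np.array([2.1, 3.0]))   # lands near class 1's examples
```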
Supervised learning
[Figure: 2-D scatter of labeled points (x1, x2) separated by the linear decision boundary w0 + w1·x1 + w2·x2 = 0]
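Classifying a point against that boundary amounts to checking the sign of the score. The weights below are invented for illustration; training would choose them from the labeled data.

```python
# The boundary in the figure is the line w0 + w1*x1 + w2*x2 = 0.
# These weights are made up; a learning algorithm would fit them.
w0, w1, w2 = -1.0, 1.0, 1.0

def decide(x1, x2):
    score = w0 + w1 * x1 + w2 * x2   # signed score
    return 1 if score > 0 else 0     # which side of the line the point is on

label_a = decide(2.0, 1.0)   # score =  2.0 -> class 1
label_b = decide(0.2, 0.3)   # score = -0.5 -> class 0
```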
Having data to learn from
● not a trivial problem
● labeled data is usually expensive to collect
● existing research musical labeled datasets are ridiculously small
● SoundCloud dataset
  ○ user-defined tags to infer the true ones
  ○ many heuristics and tuning techniques to remove the noise
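The deck only says "many heuristics and tuning techniques" are used on the user-defined tags, so here is one hypothetical example of such a heuristic; the mapping and rules below are entirely made up.

```python
# Hypothetical cleaning heuristics for noisy user-defined tags.
CANONICAL = {
    "hiphop": "Hip-Hop", "hip-hop": "Hip-Hop",
    "rock": "Rock", "dubstep": "Dubstep",
}

def clean_tag(raw):
    # Lowercase, drop punctuation, then map onto a canonical tag (or None).
    key = "".join(ch for ch in raw.lower() if ch.isalnum() or ch in " -").strip()
    return CANONICAL.get(key)

cleaned = [clean_tag(t) for t in ["HipHop!", "ROCK", "my mixtape vol 3"]]
# variants map onto canonical tags; unrecognized tags drop out as None
```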
Feature extraction: overview
● Discrete Fourier Transform (DFT)
● Magnitude Spectrum
● Feature Transform
● Summarization

Discrete Fourier Transform (DFT)
● transforms a signal from the time to the frequency domain
● represents the signal as a combination of sinusoidal waves with different frequencies
[Figure: amplitude-vs-time waveform → DFT → power-vs-frequency spectrum]
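The DFT step can be sketched with numpy's FFT routines; the sample rate and sinusoid frequencies below are chosen purely for illustration.

```python
import numpy as np

fs = 8000                              # sample rate in Hz (illustrative)
t = np.arange(0, 0.040, 1 / fs)       # one 40 ms frame
# A signal built from two sinusoids -- exactly the kind of combination
# the DFT decomposes the frame back into.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(signal)              # time domain -> frequency domain
magnitude = np.abs(spectrum)                # magnitude spectrum
power = magnitude ** 2                      # power per frequency bin
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

peak_hz = freqs[np.argmax(magnitude)]       # strongest component, near 440 Hz
```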
Feature extraction process
● split the signal into 40 ms frames, moving 20 ms at a time (frames overlap)
● apply the DFT to each frame
● apply each feature transform → a single value per feature transform per frame
● for each feature transform we obtain a sequence of real values
● summarize each 1 second of the sequence with its Mean and Std
● summarize 30 seconds with Mean(Means), Mean(Stds), Std(Means), Std(Stds)
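The framing-and-summarization process above can be sketched as follows; the sample rate, signal, and feature transform are toy stand-ins, and the window of 50 frames corresponds to 1 second at the 20 ms hop.

```python
import numpy as np

def frame_values(signal, fs, transform, frame_s=0.040, hop_s=0.020):
    # One value per 40 ms frame, frames advanced by 20 ms (so they overlap).
    frame, hop = int(frame_s * fs), int(hop_s * fs)
    return np.array([transform(signal[i:i + frame])
                     for i in range(0, len(signal) - frame + 1, hop)])

def summarize(values, per_window):
    # Mean/std per window, then mean/std of those over the whole sequence.
    windows = [values[i:i + per_window]
               for i in range(0, len(values) - per_window + 1, per_window)]
    means = np.array([w.mean() for w in windows])
    stds = np.array([w.std() for w in windows])
    return {"mean_of_means": means.mean(), "mean_of_stds": stds.mean(),
            "std_of_means": means.std(), "std_of_stds": stds.std()}

fs = 1000                                                   # toy sample rate
signal = np.random.default_rng(0).standard_normal(fs * 4)   # 4 s of noise
values = frame_values(signal, fs, lambda f: float(np.abs(f).mean()))
stats = summarize(values, per_window=50)   # 50 frames x 20 ms hop = 1 second
```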
Spectral Variation
● take two consecutive frames
● apply DFT on each of them
● calculate the correlation between the two power spectra
● normalize it
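The four steps above can be sketched directly; this is one plausible reading of them (normalized correlation between consecutive power spectra), not necessarily the exact formula used.

```python
import numpy as np

def spectral_variation(frame_a, frame_b):
    """Normalized correlation between the power spectra of two consecutive
    frames: ~1 when the spectrum barely changes, lower on big changes."""
    pa = np.abs(np.fft.rfft(frame_a)) ** 2    # DFT -> power spectrum, frame 1
    pb = np.abs(np.fft.rfft(frame_b)) ** 2    # DFT -> power spectrum, frame 2
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))

t = np.arange(320) / 8000.0                   # one 40 ms frame at 8 kHz
frame = np.sin(2 * np.pi * 440 * t)
same = spectral_variation(frame, frame)       # identical frames -> 1.0
```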
Building a classifier
● we have a multi-label classification task to solve
How?
● divide it into N binary classification tasks, one for each tag
● train one binary classifier per tag, able to output a confidence score
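The one-binary-classifier-per-tag scheme looks like this in outline; the nearest-centroid scorer and the toy data stand in for whatever model is actually trained, since only the structure matters here.

```python
import numpy as np

# One binary classifier per tag, each returning a confidence in [0, 1].
def train_tag_classifier(X_pos, X_neg):
    c_pos, c_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    def score(x):
        d_pos = np.linalg.norm(x - c_pos)
        d_neg = np.linalg.norm(x - c_neg)
        return d_neg / (d_pos + d_neg + 1e-12)
    return score

X = np.array([[1.0, 1], [1, -1], [-1, 1], [-1, -1],
              [2, 0], [-2, 0], [0, 2], [0, -2]])
labels = {"Rock": X[:, 0] > 0, "Electronic": X[:, 1] > 0}  # toy multi-label data

classifiers = {tag: train_tag_classifier(X[mask], X[~mask])
               for tag, mask in labels.items()}
scores = {tag: clf(np.array([2.0, -2.0])) for tag, clf in classifiers.items()}
```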
Taking Decision
● we now have confidence scores:
Rock: 0.7, Electronic: 0.2, Grunge: 0.8, Classical: 0.3
● how do we decide which tags to mark positive?
Taxonomy-based decisions
● tags are not independent
● in the genres' case they are organized in a tree:
Classical → Opera
Rock → Punk, Alternative
Electronic → Dubstep, House, Techno
Hip-Hop → Mixtape, Trap
Jazz → Swing
[Figure: the same tree annotated with a confidence score at each node]
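One way to use the scored tree is to accept a tag only when its own confidence and its parent's both clear a threshold. The deck shows the scored tree but not the exact rule, so treat this sketch as a guess; the taxonomy mapping comes from the slide.

```python
# Guessed taxonomy-aware decision rule (child -> parent, per the slide's tree).
TAXONOMY = {
    "Opera": "Classical", "Punk": "Rock", "Alternative": "Rock",
    "Dubstep": "Electronic", "House": "Electronic", "Techno": "Electronic",
    "Mixtape": "Hip-Hop", "Trap": "Hip-Hop", "Swing": "Jazz",
}

def decide(scores, threshold=0.5):
    chosen = set()
    for tag, score in scores.items():
        if score < threshold:
            continue
        parent = TAXONOMY.get(tag)     # None for a root genre
        if parent is None or scores.get(parent, 0.0) >= threshold:
            chosen.add(tag)
    return chosen

scores = {"Rock": 0.7, "Punk": 0.6, "Electronic": 0.3, "Dubstep": 0.8}
tags = decide(scores)   # Dubstep is dropped: its parent Electronic scored low
```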
System Architecture (Amazon AWS)
● a new sound lands in S3 and a "classify this" message is published to the message broker (RabbitMQ)
● EC2 workers consume the messages, classify each sound, and send back the result (e.g. House)
● an Inserter stores the results for each sound in MySQL, where the API reads them

Reclassifying the whole catalogue
● send all sounds through the same queue
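The message flow of that architecture can be simulated with in-memory stand-ins: a deque plays the RabbitMQ queue and a dict plays the MySQL table. No real AWS or broker calls appear here; this only mirrors the flow drawn on the slide.

```python
from collections import deque

queue = deque()        # stand-in for the message broker ("classify this" messages)
results_table = {}     # stand-in for the MySQL table of tags per sound

def enqueue_sound(sound_id):
    # A new sound landed in S3 (or the whole catalogue is being re-queued).
    queue.append({"sound_id": sound_id})

def worker_classify(message):
    # EC2 worker: would fetch the audio, extract features, classify.
    return {"sound_id": message["sound_id"], "tags": ["House"]}  # faked result

def inserter_store(result):
    results_table[result["sound_id"]] = result["tags"]  # "MySQL" insert

enqueue_sound("sound-123")
while queue:                       # workers drain the broker's queue
    inserter_store(worker_classify(queue.popleft()))
```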
Thank you