Music Classification at SoundCloud
29.10.2013 @ Sofia University
Lecturers: Petko Nikolov, Vassil Lunchev
Automatic prediction of genres, instruments, moods and vocals
The biggest sound platform
● reaches 250 million people
● ranked 189 in the world by Alexa.com
● 12 hours of audio uploaded every minute
Sofia
● Recommender
● Auto-Tagger

Auto-Tagger
● improve the meta-information
Music Tagging: Defining the Problem
● given a track’s audio signal, find the most relevant tags from each set: genres, moods, instruments, vocals
SoundCloud Music Tagger: example output — Rock, Punk, Euphoric, String Instruments, Guitar, Male Vocal
System Overview: New sound
new sound → Feature Extraction → vector of real values → Classification → Taking Decision → TAGS
System Overview: Building blocks
Feature Extraction → vector of real values → Classification → Taking Decision → TAGS
supporting blocks: Taxonomy, Dataset
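The building blocks above can be chained end to end. Here is a minimal sketch; the feature extractor and the toy "classifiers" are invented stand-ins, not SoundCloud's real components.

```python
# End-to-end sketch of the pipeline blocks (all names are illustrative).
def extract_features(audio):
    # Turn the raw signal into a fixed-size vector of real values.
    n = len(audio)
    mean = sum(audio) / n
    var = sum((x - mean) ** 2 for x in audio) / n
    return [mean, var]

def classify(features, classifiers):
    # One confidence score per tag (multi-label).
    return {tag: clf(features) for tag, clf in classifiers.items()}

def take_decision(scores, threshold=0.5):
    # Keep the tags whose confidence clears a threshold.
    return {tag for tag, s in scores.items() if s >= threshold}

classifiers = {                      # toy scorers keyed on signal variance
    "Rock": lambda f: min(1.0, f[1] * 10),
    "Classical": lambda f: max(0.0, 1.0 - f[1] * 10),
}

audio = [0.0, 0.5, -0.5, 0.4, -0.4]
tags = take_decision(classify(extract_features(audio), classifiers))
```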
Classification
● one of the problems of Machine Learning
● given a set of categories C, identify the subset of categories a new observation belongs to
● usually solved by a statistical model that uses past data with known categories to adjust its parameters, i.e. a classifier
Classification tasks by output type
● binary classification: each observation belongs to exactly 1 of 2 given categories
● multi-class classification: each observation belongs to exactly 1 of N categories, N > 2
● multi-label classification: each observation belongs to a subset of the set of categories
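The three task types differ only in the shape of the label attached to each observation. A small illustration (the category names are made up):

```python
# Illustrative categories; the label's shape is what distinguishes the tasks.
categories = ["Rock", "Electronic", "Jazz"]

binary_label = 1                           # binary: one of exactly 2 outcomes
multiclass_label = "Electronic"            # multi-class: exactly 1 of N, N > 2
multilabel_label = {"Rock", "Electronic"}  # multi-label: any subset of the categories
```

Music tagging is the third kind: a track can be Rock and Electronic at once.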
Observations (Examples)
● used as input data for a classification model
● represented by vector x with fixed size D
● each dimension of the vector is called attribute (feature)
● attributes can be of several different types; we focus on ones with continuous values
● supervised learning uses a past set of observations X (N×D) with known labels Y (N) to learn a model able to accurately predict the labels of new observations
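A minimal numpy sketch of the X (N×D) / Y setup; the nearest-centroid "model" here is a deliberately tiny stand-in for any real classifier.

```python
import numpy as np

# N = 4 past observations, D = 2 continuous attributes: X is N x D, Y is N.
X = np.array([[0.2, 1.1],
              [0.1, 0.9],
              [2.0, 3.1],
              [2.2, 2.9]])
Y = np.array([0, 0, 1, 1])   # known labels for the past observations

# Toy "learned model": the per-class centroid; predict the nearest one.
centroids = {c: X[Y == c].mean(axis=0) for c in np.unique(Y)}

def predict(x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

pred = predict(np.array([2.1, 3.0]))   # lands near class 1's examples
```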
Supervised learning
[Figure: 2-D scatter of labeled points (x1, x2) separated by the linear decision boundary w0 + w1·x1 + w2·x2 = 0]
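Classifying a point against that boundary amounts to checking the sign of the score. The weights below are invented for illustration; training would choose them from the labeled data.

```python
# The boundary in the figure is the line w0 + w1*x1 + w2*x2 = 0.
# These weights are made up; a learning algorithm would fit them.
w0, w1, w2 = -1.0, 1.0, 1.0

def decide(x1, x2):
    score = w0 + w1 * x1 + w2 * x2   # signed score
    return 1 if score > 0 else 0     # which side of the line the point is on

label_a = decide(2.0, 1.0)   # score =  2.0 -> class 1
label_b = decide(0.2, 0.3)   # score = -0.5 -> class 0
```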
Having data to learn from
● not a trivial problem
● labeled data is usually expensive to collect
● existing research musical labeled datasets are ridiculously small
● SoundCloud dataset
  ○ user-defined tags to infer the true ones
  ○ many heuristics and tuning techniques to remove the noise
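The deck only says "many heuristics and tuning techniques" are used on the user-defined tags, so here is one hypothetical example of such a heuristic; the mapping and rules below are entirely made up.

```python
# Hypothetical cleaning heuristics for noisy user-defined tags.
CANONICAL = {
    "hiphop": "Hip-Hop", "hip-hop": "Hip-Hop",
    "rock": "Rock", "dubstep": "Dubstep",
}

def clean_tag(raw):
    # Lowercase, drop punctuation, then map onto a canonical tag (or None).
    key = "".join(ch for ch in raw.lower() if ch.isalnum() or ch in " -").strip()
    return CANONICAL.get(key)

cleaned = [clean_tag(t) for t in ["HipHop!", "ROCK", "my mixtape vol 3"]]
# variants map onto canonical tags; unrecognized tags drop out as None
```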
Feature extraction: overview
● Discrete Fourier Transform (DFT)
● Magnitude Spectrum
● Feature Transform
● Summarization

Discrete Fourier Transform (DFT)
● transforms a signal from the time to the frequency domain
● represents the signal as a combination of sinusoidal waves with different frequencies
[Figure: amplitude-vs-time waveform → DFT → power-vs-frequency spectrum]
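The DFT step can be sketched with numpy's FFT routines; the sample rate and sinusoid frequencies below are chosen purely for illustration.

```python
import numpy as np

fs = 8000                              # sample rate in Hz (illustrative)
t = np.arange(0, 0.040, 1 / fs)       # one 40 ms frame
# A signal built from two sinusoids -- exactly the kind of combination
# the DFT decomposes the frame back into.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.fft.rfft(signal)              # time domain -> frequency domain
magnitude = np.abs(spectrum)                # magnitude spectrum
power = magnitude ** 2                      # power per frequency bin
freqs = np.fft.rfftfreq(len(signal), 1 / fs)

peak_hz = freqs[np.argmax(magnitude)]       # strongest component, near 440 Hz
```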
Feature extraction process
● split the signal into 40 ms frames, moving 20 ms at a time (frames overlap)
● apply the DFT to each frame
● apply each feature transform → a single value per feature transform per frame
● for each feature transform we obtain a sequence of real values
● summarize each 1 second of the sequence with its Mean and Std
● summarize 30 seconds with Mean(Means), Mean(Stds), Std(Means), Std(Stds)
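The framing-and-summarization process above can be sketched as follows; the sample rate, signal, and feature transform are toy stand-ins, and the window of 50 frames corresponds to 1 second at the 20 ms hop.

```python
import numpy as np

def frame_values(signal, fs, transform, frame_s=0.040, hop_s=0.020):
    # One value per 40 ms frame, frames advanced by 20 ms (so they overlap).
    frame, hop = int(frame_s * fs), int(hop_s * fs)
    return np.array([transform(signal[i:i + frame])
                     for i in range(0, len(signal) - frame + 1, hop)])

def summarize(values, per_window):
    # Mean/std per window, then mean/std of those over the whole sequence.
    windows = [values[i:i + per_window]
               for i in range(0, len(values) - per_window + 1, per_window)]
    means = np.array([w.mean() for w in windows])
    stds = np.array([w.std() for w in windows])
    return {"mean_of_means": means.mean(), "mean_of_stds": stds.mean(),
            "std_of_means": means.std(), "std_of_stds": stds.std()}

fs = 1000                                                   # toy sample rate
signal = np.random.default_rng(0).standard_normal(fs * 4)   # 4 s of noise
values = frame_values(signal, fs, lambda f: float(np.abs(f).mean()))
stats = summarize(values, per_window=50)   # 50 frames x 20 ms hop = 1 second
```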
Spectral Variation
● take two consecutive frames
● apply DFT on each of them
● calculate the correlation between the two power spectra
● normalize it
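The four steps above can be sketched directly; this is one plausible reading of them (normalized correlation between consecutive power spectra), not necessarily the exact formula used.

```python
import numpy as np

def spectral_variation(frame_a, frame_b):
    """Normalized correlation between the power spectra of two consecutive
    frames: ~1 when the spectrum barely changes, lower on big changes."""
    pa = np.abs(np.fft.rfft(frame_a)) ** 2    # DFT -> power spectrum, frame 1
    pb = np.abs(np.fft.rfft(frame_b)) ** 2    # DFT -> power spectrum, frame 2
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))

t = np.arange(320) / 8000.0                   # one 40 ms frame at 8 kHz
frame = np.sin(2 * np.pi * 440 * t)
same = spectral_variation(frame, frame)       # identical frames -> 1.0
```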
Building a classifier
● we have a multi-label classification task to solve
How?
● divide it into N binary classification tasks, one for each tag
● train one binary classifier per tag, able to output a confidence score
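The one-binary-classifier-per-tag scheme looks like this in outline; the nearest-centroid scorer and the toy data stand in for whatever model is actually trained, since only the structure matters here.

```python
import numpy as np

# One binary classifier per tag, each returning a confidence in [0, 1].
def train_tag_classifier(X_pos, X_neg):
    c_pos, c_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    def score(x):
        d_pos = np.linalg.norm(x - c_pos)
        d_neg = np.linalg.norm(x - c_neg)
        return d_neg / (d_pos + d_neg + 1e-12)
    return score

X = np.array([[1.0, 1], [1, -1], [-1, 1], [-1, -1],
              [2, 0], [-2, 0], [0, 2], [0, -2]])
labels = {"Rock": X[:, 0] > 0, "Electronic": X[:, 1] > 0}  # toy multi-label data

classifiers = {tag: train_tag_classifier(X[mask], X[~mask])
               for tag, mask in labels.items()}
scores = {tag: clf(np.array([2.0, -2.0])) for tag, clf in classifiers.items()}
```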
Taking Decision
● we now have confidence scores:
Rock: 0.7, Electronic: 0.2, Grunge: 0.8, Classical: 0.3
● how do we decide which tags to mark positive?
Taxonomy-based decisions
● tags are not independent
● in the genres' case they are organized in a tree:
Classical → Opera
Rock → Punk, Alternative
Electronic → Dubstep, House, Techno
Hip-Hop → Mixtape, Trap
Jazz → Swing
[Figure: the same tree annotated with a confidence score at each node]
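One way to use the scored tree is to accept a tag only when its own confidence and its parent's both clear a threshold. The deck shows the scored tree but not the exact rule, so treat this sketch as a guess; the taxonomy mapping comes from the slide.

```python
# Guessed taxonomy-aware decision rule (child -> parent, per the slide's tree).
TAXONOMY = {
    "Opera": "Classical", "Punk": "Rock", "Alternative": "Rock",
    "Dubstep": "Electronic", "House": "Electronic", "Techno": "Electronic",
    "Mixtape": "Hip-Hop", "Trap": "Hip-Hop", "Swing": "Jazz",
}

def decide(scores, threshold=0.5):
    chosen = set()
    for tag, score in scores.items():
        if score < threshold:
            continue
        parent = TAXONOMY.get(tag)     # None for a root genre
        if parent is None or scores.get(parent, 0.0) >= threshold:
            chosen.add(tag)
    return chosen

scores = {"Rock": 0.7, "Punk": 0.6, "Electronic": 0.3, "Dubstep": 0.8}
tags = decide(scores)   # Dubstep is dropped: its parent Electronic scored low
```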
System Architecture (Amazon AWS)
● a new sound lands in S3 and a "classify this" message is published to the message broker (RabbitMQ)
● EC2 workers consume the messages, classify each sound, and send back the result (e.g. House)
● an Inserter stores the results for each sound in MySQL, where the API reads them

Reclassifying the whole catalogue
● send all sounds through the same queue
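The message flow of that architecture can be simulated with in-memory stand-ins: a deque plays the RabbitMQ queue and a dict plays the MySQL table. No real AWS or broker calls appear here; this only mirrors the flow drawn on the slide.

```python
from collections import deque

queue = deque()        # stand-in for the message broker ("classify this" messages)
results_table = {}     # stand-in for the MySQL table of tags per sound

def enqueue_sound(sound_id):
    # A new sound landed in S3 (or the whole catalogue is being re-queued).
    queue.append({"sound_id": sound_id})

def worker_classify(message):
    # EC2 worker: would fetch the audio, extract features, classify.
    return {"sound_id": message["sound_id"], "tags": ["House"]}  # faked result

def inserter_store(result):
    results_table[result["sound_id"]] = result["tags"]  # "MySQL" insert

enqueue_sound("sound-123")
while queue:                       # workers drain the broker's queue
    inserter_store(worker_classify(queue.popleft()))
```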
Thank you