
SongNet: Real-time Music Classification

Chen Chen, Chi Zhang, Yue Zhang
{chenc2, czhang94, yzhang16}@stanford.edu

Overview

Motivation
- Helps online music companies such as Spotify or Apple Music manage their music catalogs.
- Saves the time and effort of manual classification.

Objectives
- Build a deep learning (C-RNN) model that classifies music genres in real time.
- Improve on the baseline models' accuracy with the C-RNN.

Dataset

Free Music Archive
- The dataset contains 8,000 tracks of 30-second clips across 8 balanced genres [2]: Instrumental, Rock, Electronic, International, Pop, Experimental, Hip-Hop, Folk.
- Split: 70% training / 20% validation / 10% test (a stratified-split sketch follows below).
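As a minimal sketch of how such a split might be produced, the snippet below uses scikit-learn's stratified splitting so each of the 8 genres stays balanced across the partitions. The use of scikit-learn, and track IDs and labels arriving as Python lists, are assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch of a stratified 70/20/10 split (assumed workflow,
# not the poster authors' exact code). Requires scikit-learn.
from sklearn.model_selection import train_test_split

def split_dataset(track_ids, genres, seed=0):
    """Split tracks into 70% train / 20% validation / 10% test,
    stratified so each of the 8 genres stays balanced."""
    train_ids, rest_ids, _, rest_y = train_test_split(
        track_ids, genres, test_size=0.30, stratify=genres, random_state=seed)
    # Split the remaining 30% into 20% validation and 10% test (a 2:1 ratio).
    val_ids, test_ids, _, _ = train_test_split(
        rest_ids, rest_y, test_size=1/3, stratify=rest_y, random_state=seed)
    return train_ids, val_ids, test_ids
```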

Features

[Feature map figure]

Mel-spectrogram
- Used the librosa [3] package to compute mel-spectrograms (a minimal sketch follows below).
- Designed CONV layers to extract features (DEMO).
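A minimal sketch of the mel-spectrogram extraction described above, using librosa [3]. The specific parameter values (sample rate, n_mels, hop_length) and the log scaling are illustrative assumptions; the poster does not state the exact settings.

```python
# Sketch of mel-spectrogram feature extraction with librosa [3].
# Parameter values (sr, n_mels, hop_length) are assumptions.
import librosa
import numpy as np

def melspectrogram(path, sr=22050, n_mels=128, hop_length=512):
    """Load a 30-second clip and return a log-scaled mel-spectrogram
    of shape (n_mels, n_frames) for the CONV layers to consume."""
    y, sr = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
    # Log scaling compresses the dynamic range, which tends to help conv nets.
    return librosa.power_to_db(mel, ref=np.max)
```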

References
[1] Kozakowski, P., & Michalak, B. (2016, October 26). Music Genre Recognition.
[2] Benzi, K., Defferrard, M., Vandergheynst, P., & Bresson, X. (2016). FMA: A dataset for music analysis.
[3] LibROSA (n.d.). http://librosa.github.io/librosa/

Results & Discussion

Model                    Accuracy
Random guess             12.5%
K-nearest neighbors      36.38%
Logistic regression      42.25%
Multilayer perceptron    44.88%
Support vector machine   46.38%
C-RNN                    65.32%

Discussion
- The "Experimental" genre is hard to classify correctly; the other genres are classified well.
- The CONV layers extract genre-salient clips (listen to our demos).
- Recurrent models enable real-time classification (a rolling-window inference sketch follows below).
- The C-RNN does not use music metadata, while the baseline models do.
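To make the real-time claim concrete, the sketch below runs the classifier over a rolling 30-second buffer as audio arrives. The `model` is assumed to be the Keras C-RNN sketched under the architecture section, and the window/update sizes are illustrative assumptions, not the poster's implementation.

```python
# Hypothetical rolling-window inference loop for real-time genre
# prediction. `model` is an assumed Keras model over mel-spectrograms;
# window size and chunking are illustrative.
import librosa
import numpy as np

def classify_stream(audio_chunks, model, sr=22050, window_s=30.0):
    """Maintain a rolling 30 s buffer of raw samples and yield a
    genre-probability vector after each incoming chunk."""
    window = int(sr * window_s)
    buf = np.zeros(window, dtype=np.float32)
    for chunk in audio_chunks:
        buf = np.concatenate([buf, chunk])[-window:]   # keep the last 30 s
        mel = librosa.power_to_db(
            librosa.feature.melspectrogram(y=buf, sr=sr), ref=np.max)
        x = mel[np.newaxis, ..., np.newaxis]           # (1, n_mels, frames, 1)
        yield model.predict(x, verbose=0)[0]           # 8 genre probabilities
```

Re-running the conv stack on the full window each step is the simplest approximation; a truly streaming variant would instead carry the recurrent layer's state across incoming frames.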


Future Work
- Understand why the C-RNN performs poorly on the "Experimental" genre and improve its accuracy there.
- Add music metadata to the C-RNN model to further improve accuracy.
- Implement a user interface that lets users upload a music clip and visualize the real-time classification online.

C-RNN Model Architecture
- Input: mel-spectrogram.
- 3 convolutional layers, each with:
  - batch normalization,
  - ReLU activation,
  - dropout regularization.
- Recurrent layers.
- Output: probability of each genre (a Keras sketch follows below).
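A minimal Keras sketch of the stack described above: three conv blocks (Conv2D + BatchNorm + ReLU + Dropout), a recurrent layer over the time axis, and a softmax over the 8 genres. The filter counts, pooling sizes, GRU width, and input frame count are assumptions, not the poster's stated hyperparameters.

```python
# Hedged sketch of the C-RNN: 3 conv blocks, a GRU over time, and a
# softmax output. Hyperparameter values below are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_crnn(n_mels=128, n_frames=1292, n_genres=8):
    inp = layers.Input(shape=(n_mels, n_frames, 1))  # mel-spectrogram "image"
    x = inp
    for filters in (32, 64, 128):                    # 3 convolutional blocks
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Dropout(0.25)(x)
    # Collapse frequency into features, keep time steps for the RNN:
    # (freq, time, channels) -> (time, freq * channels)
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((-1, x.shape[2] * x.shape[3]))(x)
    x = layers.GRU(64)(x)                            # recurrent layer
    out = layers.Dense(n_genres, activation="softmax")(x)  # genre probabilities
    return models.Model(inp, out)
```

Training would pair this with a categorical cross-entropy loss over the 8 genres; none of these settings are claimed to match the authors' exact configuration.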