Model Free Music Similarity Measure
courses.cecs.anu.edu.au/courses/CS_PROJECTS/10S2/Final...
Only for presentational or informational purposes. Copyright © 2010 Wen Shao @ ANU. All rights reserved.
AI Project Presentation
Model Free Music Similarity Measure
Wen Shao (u4717714)
Under the supervision of Prof. Tom Gedeon
Music Similarity Measure Settings
• Content-based retrieval (query by singing and humming)
  – A natural way to search
  – Sometimes the only possible way
  – Applications:
[Figure: a song's metadata (album? artist? lyrics? language? publication date? …) versus singing and humming clips]
• Similarity measure
  – One of the key problems
• MIREX (Music Information Retrieval EXchange)
  – One of the tasks
  – Subjective and objective evaluation
• Model-free approach
  – No music knowledge assumed
Literature review & My work
• WAV-based approach [LA01] [AP02]
  – Feature: Mel-Frequency Cepstral Coefficients…
  – Signature: K-means, Gaussian Mixture Model
  – Distance: signature-based
• MIDI-based approach [LHC99] [RCP04]
  – Transformed into string-matching problems
  – "ZIP" method
• My work
  – Evaluated two state-of-the-art approaches
    • ZIP method (MIDI-based)
    • MFCCs + GMM + likelihood (WAV-based)
  – Proposed a neural-network- and GMM-based method
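The MIDI-based approaches reduce query by humming to string matching. As a minimal sketch of that idea (not the specific algorithm of [LHC99]), a melody can be encoded as a sequence of pitch intervals and compared with the Levenshtein edit distance. `edit_distance` and `to_intervals` are hypothetical helper names, and the interval encoding is an assumption, chosen because it ignores the key the user happens to hum in.

```python
from typing import Hashable, List, Sequence

def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    # Levenshtein distance: minimum number of insertions, deletions and substitutions
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def to_intervals(pitches: Sequence[int]) -> List[int]:
    # Hypothetical encoding: differences between successive MIDI pitches,
    # so the same tune sung in another key yields the same sequence
    return [b - a for a, b in zip(pitches, pitches[1:])]
```

For example, a melody and its transposition a fourth higher are at distance 0, while a single wrong note changes two intervals and gives distance 2.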
ZIP method [RCP04]
• Kolmogorov complexity K(x): the length of the shortest program that outputs x
  – Low for a regular string: abababababababababababababababababababababababababababababababab
  – High for a random-looking string: 4c1j5b2p0cv4w1x8rx2y39umgw5q85s
• Conditional Kolmogorov complexity K(x|y)
  – The difficulty of constructing x from y
  – Small if y is of great help in constructing x; otherwise close to K(x)
  – An indicator of the degree of similarity between x and y
ZIP method (cont.)
• Kolmogorov complexity is not computable, only upper semi-computable
• Use a compressor to approximate K, e.g. (compressed sizes):
K(S1) = 56 B
K(S2) = 45 B
K(S1S2) = 90 B
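A minimal sketch of the compressor approximation, assuming Python's `zlib` as the compressor C (the slides do not name one). K(S2|S1) is estimated as C(S1S2) − C(S1), so with the sizes above K(S2|S1) ≈ 90 − 56 = 34 B, below K(S2) = 45 B, i.e. S1 helps describe S2. The normalized compression distance (NCD) is one common way to turn the three sizes into a distance, roughly 0 for similar inputs and 1 for unrelated ones; whether [RCP04] or this project used exactly this normalization is an assumption.

```python
import zlib

def compressed_size(data: bytes) -> int:
    # Compressed size in bytes: a computable upper-bound stand-in for K(data)
    return len(zlib.compress(data, 9))

def conditional_from_sizes(c_y: int, c_yx: int) -> int:
    # K(x|y) ~ C(yx) - C(y): extra bytes needed for x once y is known
    return c_yx - c_y

def ncd_from_sizes(c_x: int, c_y: int, c_xy: int) -> float:
    # Normalized compression distance computed from the three compressed sizes
    return (c_xy - min(c_x, c_y)) / max(c_x, c_y)

def ncd(x: bytes, y: bytes) -> float:
    return ncd_from_sizes(compressed_size(x), compressed_size(y), compressed_size(x + y))
```

With the slide's numbers, `conditional_from_sizes(56, 90)` gives 34 and `ncd_from_sizes(56, 45, 90)` gives 45/56 ≈ 0.80.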
MFCCs + GMMs [AP02]
MFCCs: coefficients of the mel-frequency cepstrum, which represents the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency
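The MFCC definition corresponds to a short pipeline: frame the signal, take each frame's power spectrum, pool it with triangular mel-scale filters, take logs, and apply a discrete cosine transform. The NumPy sketch below follows that pipeline. The sample rate, frame length, hop, FFT size and filter count are illustrative assumptions, and `n_ceps=8` is inferred from the neural-network input layout later in the talk (24 mean inputs for 3 components), not a setting stated for [AP02].

```python
import numpy as np

def hz_to_mel(f):
    # A common mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def dct_matrix(n_out, n_in):
    # Orthonormal DCT-II basis; row k yields cepstral coefficient k
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    m = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    m[0] /= np.sqrt(2.0)
    return m

def mfcc(samples, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=8):
    # 1. Overlapping frames (25 ms long, 10 ms apart at 16 kHz), Hamming-windowed
    n_frames = 1 + (len(samples) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = samples[idx] * np.hamming(frame_len)
    # 2. Short-term power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, then log (small constant avoids log(0))
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 4. Linear cosine transform of the log spectrum; keep the first n_ceps
    return log_energy @ dct_matrix(n_ceps, n_filters).T
```

One second of audio at 16 kHz gives 98 frames, so `mfcc(...)` returns a 98 × 8 matrix: one MFCC vector per frame, which is what the GMM signature is then fitted to.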
MFCCs + GMMs (cont.)
MFCCs → GMM signature:
• A GMM fitted over the MFCCs
• Adaptive number of components
• Each component carries a mean vector and a covariance matrix
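A minimal NumPy sketch of fitting a full-covariance GMM signature to a song's MFCC frames with expectation-maximisation (EM). The slides only say that the number of components is adaptive; choosing it by the Bayesian information criterion (BIC), the farthest-point initialisation and the small covariance regulariser `reg` are assumptions made for illustration, not the method of [AP02].

```python
import numpy as np

def log_gauss(X, mean, cov):
    # Log-density of a full-covariance Gaussian at each row of X (via Cholesky)
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def component_log_probs(X, weights, means, covs):
    # (n, k) matrix of log w_j + log N(x_i | mu_j, Sigma_j)
    return np.stack([np.log(w) + log_gauss(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)

def fit_gmm(X, k, n_iter=100, reg=1e-6, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation of the means
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = np.min(((X[:, None, :] - X[idx][None]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(dist)))
    means = X[idx].copy()
    covs = np.array([np.cov(X.T) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in log space for stability
        log_p = component_log_probs(X, weights, means, covs)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and full covariance matrices
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

def log_likelihood(X, gmm):
    return float(np.logaddexp.reduce(component_log_probs(X, *gmm), axis=1).sum())

def choose_k(X, max_k=5):
    # Assumed rule for the "adaptive" component count: minimum BIC
    n, d = X.shape
    best = None
    for k in range(1, max_k + 1):
        gmm = fit_gmm(X, k)
        n_params = k * d + k * d * (d + 1) // 2 + (k - 1)
        bic = -2.0 * log_likelihood(X, gmm) + n_params * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, gmm)
    return best[1], best[2]
```

The returned `(weights, means, covs)` triple is the song's signature: one mean vector and one covariance matrix per component, as on the slide.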
MFCCs + GMMs (cont.)
MFCCs → GMM signature → likelihood
Likelihood: how likely it is that the samples of one song were generated by the GMM of the other
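A sketch of the likelihood idea: score the frames of one song under the other song's GMM and compare with how well each song's own GMM explains it. Using the songs' MFCC frames as the samples (rather than samples drawn from each GMM) and the symmetrised distance below are assumptions for illustration; the distance is 0 when the two GMMs are identical and typically grows as they diverge. Each GMM is a hypothetical `(weights, means, covs)` triple.

```python
import numpy as np

def log_gauss(X, mean, cov):
    # Log-density of a full-covariance Gaussian at each row of X (via Cholesky)
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def avg_loglik(X, gmm):
    # Mean per-frame log-likelihood of frames X under a GMM (weights, means, covs)
    weights, means, covs = gmm
    log_p = np.stack([np.log(w) + log_gauss(X, m, c)
                      for w, m, c in zip(weights, means, covs)], axis=1)
    return float(np.mean(np.logaddexp.reduce(log_p, axis=1)))

def likelihood_distance(X_a, gmm_a, X_b, gmm_b):
    # How much worse each song's frames are explained by the other song's model
    return (avg_loglik(X_a, gmm_a) + avg_loglik(X_b, gmm_b)
            - avg_loglik(X_a, gmm_b) - avg_loglik(X_b, gmm_a))
```

Ranking a database by this distance (smallest first) turns the likelihood into a query-by-humming retrieval score.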
Experimental Results
[Figure: Top N hits (y-axis, 0 to 140) for N = 5, 10, 20, 50 and 100, comparing the GMM method with the complexity-based (ZIP) method]
• 4,431 singing/humming clips, on average 12 clips per song
• 100 clips drawn at random
Neural network based method
• Main idea: feed the NN with two GMMs, train it, and use its output (0 to 1) as the degree of similarity.
• Price paid: a fixed number of GMM components (3).
[Figure: Gaussian components $\mathcal{N}(\boldsymbol{\mu}_k, \Sigma_k)$ and $\mathcal{N}(\boldsymbol{\mu}_j, \Sigma_j)$]
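Why the component count must be fixed: the network's input length is determined by the GMM size. Assume d-dimensional MFCCs and full symmetric covariance matrices stored as their d(d+1)/2 distinct entries. This is an inference from the input layout on the next slide, which also implies d = 8 and that mixture weights are not fed to the network. With K components per song, the count is:

$$
n_{\text{in}} = 2 \cdot K \cdot \Big( d + \frac{d(d+1)}{2} \Big) = 2 \cdot 3 \cdot (8 + 36) = 264,
\qquad K\,d = 24 \ \text{mean inputs}, \quad K\,\frac{d(d+1)}{2} = 108 \ \text{covariance inputs per song}
$$

Changing K or d changes the input length, so a network trained for one GMM size cannot accept another.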
Neural network based method
• Architecture: 2 hidden layers, 264 inputs, 1 output; 9, 16 or 27 hidden neurons
  – Inputs 1-24: means of the components of the first song's GMM
  – Inputs 25-132: covariances of the first song's GMM
  – Inputs 133-156: means of the components of the second song's GMM
  – Inputs 157-264: covariances of the second song's GMM
• Cross-entropy error (instead of SSE)
• Activation function: logistic sigmoid
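A NumPy sketch of how a pair of GMMs could be packed into the 264 inputs and passed through a two-hidden-layer sigmoid network scored with cross-entropy. It assumes 8-dimensional MFCCs and covariances flattened from their upper triangles, which is inferred from the input counts rather than stated. The slides do not say how the 9, 16 or 27 hidden neurons are split across the two layers, so both layer sizes are parameters. Training by backpropagation is omitted, and `pack_pair`, `init_net` and `forward` are hypothetical names.

```python
import numpy as np

D = 8   # assumed MFCC dimension (24 mean inputs / 3 components)
K = 3   # fixed number of GMM components per song

def pack_gmm(means, covs):
    # means: (K, D); covs: (K, D, D) symmetric -> 24 means, then 108 covariance entries
    iu = np.triu_indices(D)
    return np.concatenate([means.ravel(), np.concatenate([c[iu] for c in covs])])

def pack_pair(gmm_a, gmm_b):
    # Inputs 1-132 describe the first song, inputs 133-264 the second
    return np.concatenate([pack_gmm(*gmm_a), pack_gmm(*gmm_b)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_net(h1, h2, n_in=2 * K * (D + D * (D + 1) // 2), seed=0):
    # Small random weights for layers n_in -> h1 -> h2 -> 1, zero biases
    rng = np.random.default_rng(seed)
    shapes = [(h1, n_in), (h2, h1), (1, h2)]
    return [(rng.normal(0.0, 1.0 / np.sqrt(s[1]), s), np.zeros(s[0])) for s in shapes]

def forward(net, x):
    # Logistic sigmoid on every layer, so the output lies in (0, 1)
    a = x
    for W, b in net:
        a = sigmoid(W @ a + b)
    return float(a[0])

def cross_entropy(y, t, eps=1e-12):
    # Per-pattern cross-entropy error for a target t in {0, 1}
    y = min(max(y, eps), 1.0 - eps)
    return -(t * np.log(y) + (1 - t) * np.log(1 - y))
```

Unlike the sum of squared errors, cross-entropy pairs naturally with a sigmoid output read as a probability that the two songs match.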
Experimental result
48,440 ‘yes’ patterns and 48,440 ‘no’ patterns for training; 24,220 ‘yes’ patterns and 24,220 ‘no’ patterns for testing; no separate validation set
                           9 hidden neurons   16 hidden neurons   27 hidden neurons
Total cross-entropy error  46667.92           45799.54            44086.60
Mean cross-entropy error   0.96               0.95                0.91
[unreadable label]         0.38               0.39                0.40
[unreadable label]         0.62               0.62                0.60
Difference < 0.5           25593 (52.83%)     23416 (48.34%)      20117 (41.52%)
Difference < 0.4           21858 (45.12%)     18615 (38.43%)      13320 (27.50%)
Difference < 0.3           17749 (36.64%)     13594 (28.06%)      7065 (14.59%)
Difference < 0.2           13016 (26.87%)     7379 (15.23%)       2395 (4.94%)
Difference < 0.1           6910 (14.26%)      1126 (2.32%)        199 (0.41%)
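A sketch of how the table's metrics could be computed from network outputs y and 0/1 targets t. The slides do not define "Difference"; reading it as |y − t| is an assumption. As a consistency check, the mean cross-entropy row equals the total divided by the 48,440 test patterns (46667.92 / 48440 ≈ 0.96), and 25593 / 48440 ≈ 52.83%.

```python
import numpy as np

def evaluate(outputs, targets, thresholds=(0.5, 0.4, 0.3, 0.2, 0.1), eps=1e-12):
    # Clip outputs away from 0 and 1 so the logarithms stay finite
    y = np.clip(np.asarray(outputs, dtype=float), eps, 1.0 - eps)
    t = np.asarray(targets, dtype=float)
    ce = -(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))  # per-pattern cross-entropy
    diff = np.abs(y - t)                                  # assumed meaning of "Difference"
    counts = {th: int(np.sum(diff < th)) for th in thresholds}
    return float(ce.sum()), float(ce.mean()), counts
```

The returned total, mean and per-threshold counts correspond to the rows "Total cross-entropy error", "Mean cross-entropy error" and "Difference < θ".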
Thanks! Questions?