Music Retrieval and Analysis Part I: Music Retrieval Arbee L.P. Chen National Tsing Hua University...
-
Upload
pranav-train -
Category
Documents
-
view
215 -
download
1
Transcript of Music Retrieval and Analysis Part I: Music Retrieval Arbee L.P. Chen National Tsing Hua University...
Music Retrieval and Analysis
Part I: Music Retrieval Arbee L.P. ChenNational Tsing Hua UniversityISMIR’03 Tutorial III
Outline Technologies
Architecture for Music Retrieval Music Representations Music query processing Music indexing Similarity measures
Systems and evaluation Existing systems
Meldex Themefinder SEMEX PROMS OMRAS
System evaluation Future research directions
Architecture for Music Retrieval
Users
MusicPlayer
MusicQuery
Interface
MusicStorage
manager
MusicFeature Extractor
MusicQuery
Processor
MusicDatabase
MusicIndex
Query result
Result music objectsMusic objects Music features
Music queryResult music objects
Music Representations
Music
Thematictheme*rhythmmelodychord
Acoustical
Music_Infokeybeattempo
Media
Music _Wave Music _MIDI Music_AU
IS-A relationship
composition relationship
* multi-valued attribute
infoacousticalthematic
loudnesspitchdurationbrightnessbandwidth
Styles of Music Composition
Monophony Monophonic music has at most one note playing at any given
time; before a new note starts the previous note must have ended
Homophony Homophonic music has at most one set of notes playing at the
same time. For any set of notes that start at the same time, no new note or notes may begin until every note in that set has ended
Polyphony Polyphonic music has no such restrictions. Any note or set of
notes may begin before any previous note or set of notes has ended
Monophony Representations
Absolute measure Absolute pitch
C5 C5 D5 A5 G5 G5 G5 F5 G5 Absolute duration
1 1 1 1 1 0.5 0.5 1 1 Absolute pitch and duration
(C5,1)(C5,1)(D5,1)(A5,1)(G5,1)(G5,0.5)(G5,0.5)(F5,1)(G5,1) Relative measure
Contour (in semitones) 0 +2 +7 -2 0 0 -2 +2
IOI (Inter onset interval) ratio 1 1 1 1 0.5 1 2 1
Contour and IOI ratio (0,1)(+2,1)(+7,1)(-2,1)(0,0.5)(0,1)(-2,2)(+2,1)
Polyphony Representations
All information preservation Keep all information of absolute pitch and duration (start_time, pitch,
duration) (1,C5,1)(2,C5,1)(3,D5,1)(3,A5,1)(4,F5,4)(5,C6,1)(6,G5,0.5)(6.5,G5,0.5)…
Relative note representation Record difference of start times and contour (ignore duration)
(1,0)(1,+2)(0,+7)(1,-4)… Monophonic reduction
Select one note at every time step (main melody selection) (C5,1)(C5,1)(A5,1)(F5,1)(C6,1)...
Homophonic reduction (chord reduction) Select every note at every time step
(C5)(C5)(D5,A5)(F5)(C6)(G5)(G5)…
Music Representation - Theme
Theme A short tune that is repeated or developed in a piece of music
A small part of a musical work Efficient retrieval
A highly semantic representation Effective retrieval
Automatic theme extraction Exact repeating patterns Approximate repeating patterns
Music Representation – Markov Models
Capture global information for a music pieceRepeating patternsSequential patterns
A lossy representation Good for music classification
Markov Model Representation
[Pickens and Crawford, CIKM‘02] Homophonic reduction For each chord, compute its distance with
the 24 lexical chords Capture statistical properties by Markov
modelsThe representation of each song is reduced
into a matrix
Markov Model Representation (Cont.)
Lexical chords
Markov model representation
Chord
Music Query Processing
On-line methods (string matching algorithms) Exact string matching
Brute-force method KMP algorithm Boyer-Moore algorithm Shift-Or algorithm
Partial string matching Shift-Or algorithm
Approximate string matching Dynamic programming
Brute-Force Method
T: A5 B5 A5 C5 A5 B5 A5 B5 P: A5 B5 A5 B5 Time complexity O(mn)
A5 B5 A5 B5
A5 B5 A5 C5 A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
KMP Algorithm
Left-to-right scan Failure function shift rule O(m+n)
A5 B5 A5 B5
A5 B5 A5 C5 A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
Failure function f(i) 0 0 1 2
Skip this step
Boyer-Moore Algorithm
If the pattern P is relatively long and the alphabet is reasonably large, this algorithm is likely to be the most efficient string matching algorithm
Right-to-left scan Bad character shift rule Good suffix shift rule O(m+n)
Bad Character Shift Rule
A5 B5 A5 B5
A5 B5 A5 C5 A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
A5 B5 A5 B5
Right to left scan
bad character
skip these steps
Good Suffix Shift Rule
A5 C5 A5 B5 A5 B5 C5 B5
A5 B5 A5 B5
Good suffix
Skip this step
A5 B5 A5 B5
A5 B5 A5 B5
Shift-Or AlgorithmAn example of the shift-or algorithm for p=aab and s=abcaaab
Ta b c
0 1 10 1 11 0 1
aab
aab
E S(E) T[a]
111
011
001
E S(E) T[b]
011
001
110
E S(E) T[c]
011
111
111
E S(E) T[a]
011
111
001
E S(E) T[a]
001
011
001
E S(E) T[a]
000
001
001
E S(E) T[b]
000
001
110
E
110
Shift-Or Algorithm for Partial Matching [Lemstrom and Perttu, ISMIR’00]
An example of the shift-or algorithm for p=aab and s=(ab)(ca)(aab)
Ta b c
0 1 10 1 11 0 1
aab
aab
E S(E) T[a]^T[b]
111
011
000
E S(E) T[c]^T[a]
011
001
001
E S(E) T[a]^T[a]^T[b]
001
000
000
000
E
Approximate Matching
In practical pattern matching applications, exact matching is not always suitable
In the field of MIR, approximation is measured mainly by the edit distance: the minimal number of local edit operations needed to transform one music object into another
Dynamic programming method serves this purpose
Edit Distance
Unit cost edit distance W(ab)=1, ab (Replacement) W(a )=W( b)=1 (Deletion and Insertion)
Non-unit cost edit distance (Content-sensitive) The costs of replacement, deletion and insertion can
be any values which depend on the cost function E.g., W(mn)=0.2 and W(a )=0.8
Dynamic Programming Method
Given any two strings S1=abac, S2=aaccb
The edit distance evaluated by DP
The edit distance is 3
a b a c
0 1 2 3 4
a 1 0 1 2 3
a 2 1 1 1 2
c 3 2 2 2 1
c 4 3 3 3 2
b 5 4 3 4 3
a b a c c b
a a c c b
(1 deletion, 2 insertions)
Music Indexing
Tree-based index (Suffix tree) List-based index (1D-list) N-gram index Indexing Markov models?
Tree-Based Index
[Chen, et al., ICME‘00] Music objects are coded as strings of music
segments Four segment types to model music contour Pitch and duration are considered
Index structures Augmented suffix tree
Both incipit/partial and exact/approximate matching can be handled
Tree-Based Index (Cont.)
Four segment types
note number
beat
60
62
65
64
67
(B, 3, -3)
(A, 1, +1)
(D, 3, -3)
(B, 1, -2)
(C, 1, +2)
(C, 1, +2)
(C, 1, +1)
type A
type B
type C
type D
Tree-Based Index (Cont.)
1 4
AB C
B C
C $
$
2 5
3
(a)
root
A
C
A
(b)
root
A<1,1>
C<7,8> C<1,3>
A<7,8> A<3,4>
N1
N2
The suffix tree of the string S=“ABCAB”
(a) An example suffix tree(b) A 1-D augmented suffix tree
1
A
$
A
B
$B2
A
B
$3
List-Based Index
[Liu, Hsu and Chen, ICMCS‘99] Music objects are coded as melody strings
“so-mi-mi-fa-re-re-do-re-mi-fa-so-so-so”
Melody strings are organized as linked lists Both incipit/partial and exact/approximate
matching can be handled Exact link, insertion link, dropout link, transposition
link
List-Based Index (Cont.)
1:7 1:11:41:21:5
2:1 1:31:6
2:9 2:71:91:8
2:82:22:5
2:32:6
2:4
2:10
2:11 2:12
1:10 1:11
1:12
1:13
do re mi fa so la si
1:7 1:21:5
2:1 1:31:6
2:9 1:91:8
2:22:5
2:62:10
2:11 2:12
do re mi
start end 1:7 1:21:5
2:1 1:31:6
2:9 1:91:8
2:22:5
2:62:10
2:11 2:12
do re mi
start end
(a) (b) (c)
N-Gram Index
A widely used technique in music databases Target strings are cut into index terms by a
sliding window with length N Index can be implemented by various
methods, e.g., inverted file Queries are also cut into index terms with
length N Searching and joining are then performed
N-Gram Index [Doraisamy and Ruger, ISMIR’02]
S=aabbcaab
2-Gram Position
aa 1,6
ab 2,7
bb 3
bc 4
ca 5
Inverted file
Query=bbca
bb, ca
Cut into 2-grams
Position: 3 Position: 5
The substring is found from position 3 to position 6
Join
Similarity Measures
The effectiveness of MIR depends on the similarity measureEdit distance (Suitable for short queries)
Difference between two probability matricesNote shift distance [Typke, et al., ISMIR‘03]
Probability Matrix Distance
A B C
A 0.5 0.5 0
B 0 0 1
C 0.25 0.25 0.5
A B C
A 0.7 0.3 0
B 0 0 1
C 0.3 0.3 0.3
S1: CCCAABCB
S2: CCAAABCB
ddiqqi Xx xdi
xqixqidqD
,
))(
)(log)(()||(
Kullback-Liebler (KL) divergence:The value is 0 when two matrices are the sameq: Query probability matrixd: Data probability matrixi: rowx: column
D(S1||S2)=0.1092
Probability Matrix Distance (Cont.)
Ineffective for MIR with short queries0-entries in the query model mean unknown
values?0-entries in the corpus model means facts?
Performance comparison with string matching needed
Note Shift Distance
Sum of the two dimensional distance between the notes of the query and the notes of the answer
0 0 0 0 0
Music Retrieval Systems
Music Representations Music Query processing Special features
Meldex
[McNab, et al., D-Lib Magazine‘97] "melodic contour" or "pitch profile“
113531 RUUDD (R:Repeat, U:Up, D:Down)
Approximate string matchingDynamic programming
Query by humming
Themefinder
[Kornstadt, Computing in Musicology‘98] Select themes manually Allow different query types
Pitch IntervalContour
Provide exact matching only
SEMEX
[Lemstrom and Perttu, ISMIR’00] The pattern is monophonic; the musical source
is polyphonic Finding all positions of S (source) that have an
occurrence of p p=bca S=<a, b, c><a, b><b, c><a, b>
Shift-or algorithm No similarity function
PROMS
[Clausen, et al., ISMIR‘00] Representation by pitch and onset time (ignore
duration) Index by inverted file Fault-tolerant music search
Allow missing notes Allow fuzzy notes
Query=(b, (d or c), a, b)
OMRAS
[Dovey, ISMIR‘01] Searching in a “piano roll” model Gaps based dynamic programming Example (gap = 2):
Data T0=<64,72,76>, T1=<60>, T2=<59,67,79>, T3=<55,63>,
T4=<55,67,79>
Query S0=<59,67>, S1=<55,67>
T0 T1 T2 T3 T4
S0 0 0 2 1 0
S1 0 0 0 0 2
Piano roll
System Evaluation
Traditional measures of effectiveness are precision and recall
retrievedarethatreferencesnumber of
relevantarethatreferencesretrievednumber ofprecision
referencesrelevantnumber of
relevantarethatreferencesretrievednumber ofrecall
The Recall-Precision Curve
A Platform for Evaluating MIR Systems Evaluation of various music retrieval approaches
Efficiency response time
Effectiveness recall-precision curve
The Ultima project builds such a platform [Hsu, Chen and Chen, CIKM’01] Same data set and query set for various approaches Compare recall-precision curves
The Ultima Project
Data store Query generation
module Query processing
module Result
summarization module
Report module Mediator
Data Store (MS Access)
SMF
Med
iato
r
Query Processing Module
Table
to the InternetSummarization Module
1D-List APS APM
Report Module
Query Generation Module
Future Research Directions
Music Retrieval based on music structure Music retrieval based on user’s perceptiveness Similarity measure for polyphonic music A novel index structure for polyphony Fair evaluation method
Music Retrieval and Analysis
Part II: Music Analysis
Outline
Music Segmentation and Structure Analysis Local Boundary Detection Repeating Pattern Discovery Phrase Extraction
Music Classification Music Recommendation Systems Future Research Directions
Local Boundary Detection
[Cambouropoulos, ICMC’01] Segment music by local discontinuities between notes Calculate boundary strength values for each interval of a
melodic surface, i.e., pitch, IOI, and rest, according to the strength of local discontinuities
Local Boundary Detection (Cont.) A music object m has a parametric profile Pk, which is represented
as a sequence of n intervals Pk = [x1, x2, …, xn] where k{pitch, IOI, rest}
Pitch interval measured in semitones IOI and rest intervals measured in milliseconds or numerical
duration values IOI (Inter onset interval)
The amount of time between the onset of one note and the onset of the next note
Rest The amount of time between the offset of one note and the
onset of the next note IOI
rest
pitch
Local Boundary Detection (Cont.) The degree of change r between two successive interval
values xi and xi+1 is:
The strength of the boundary si for interval xi is:
Overall local boundary strength based on the three
intervals wp*si(p)+wd*si(d)+wr*si(r)
0, and 0 iff ||
111
11,
iiii
ii
iiii xxxx
xx
xxr
0 iff 0 11, iiii xxr
)( 1,,1 iiiiii rrxs
Local Boundary Detection (Cont.)
Repeating Pattern Discovery
A repeating pattern in music data is defined as a sequence of notes which appears more than once in a music object
The themes or motives are typical kinds of repeating patterns
Exact repeating patterns [Hsu, Liu and Chen, TMM’01] By the string-join operator
Approximate repeating patterns [Lartillot, ISMIR’03] By detecting when their successive notes are sufficiently close
and their borders contrast sufficiently with the outer environment
{ 12(2), 23(3), 34(4), 45(3), 56(2) }
{ 1234(2), 2345(2), 3456(2) }
{ }
123(2) 234(3) 345(3) 456(2)
S = 1234A2345B3456C123456 Nontrivial repeating patterns
Exact Repeating Pattern Discovery
Approximate Repeating Pattern Discovery Stpe1
Find all note pairs (n1, n2) which satisfy the pitch distance constraint
Step 2 Group the similar note pairs
D((n1, n2), (n1’, n2’))
Step 3 Merge the adjacent note pairs for approximate
repeating patterns (n1, n2)(n2,n3) : (n1,n2,n3) (n1’,n2’)(n2’,n3’) : (n1’,n2’,n3’)
Approximate Repeating Pattern Discovery (Cont.)
n1, n2
p1, p2
o1, o2
n1’, n2’ p1’, p2’ o1’, o2’
D((n1, n2), (n1’, n2’)) = ( | p - p’| + 1 ) * (max { o/ o’, o’/ o }) 0.7
p = p2 – p1 p’ = p’2 – p’1
o = o2 – o1 o’ = o’2 – o’1
pi is the pitch value of ni
oi is the onset time of ni
Phrase Extraction
Two features used for phrase extraction Duration and rest
Melodic Shapes [Huron, Computing in Musicology’95] Statistics Information in Western Folksongs
The most common length of a phrase is 8 notes Half of all phrases are between 7 and 9 notes in length Three-quarters of all phrases are between 6 and 10 notes in
length
Phrase Extraction (Cont.)
A: the pitch value of the first note in the target phraseB: the pitch value of the last note in the target phraseC: the average pitch value of the remaining notes in the target
phrase
Contour Type
Number of Phrases
Percentage Arch Shape Definition
Convex 13926 38.6% A<C B<C
Descending 10376 28.8% A>C>B
Ascending 6983 19.4% ACB
Concave 3496 9.7% A>C B>C
Others 1294 3.5%
Phrase Extraction (Cont.)
Identify the positions of all the terminative notes Extract the music pieces according to the terminative
notes Select the candidate music pieces for decomposition
based on the length information If the length 12, the music piece is marked as a
phrase If the length > 12, decompose the music piece into
phrases convex > descending > ascending > concave
Phrase Extraction (Cont.)
64 62 60 57 55 67 67 69 67 69 64 62
64 62 60 57 55 67 67 69 67 69 64 62 64 62 60 57 55 67 64 64 62 60 62 64 62 62 67 67
x
y z
OrderThe Length of the Prefix Fragment
The Pitch of the First Note
The Pitches of the Remaining NotesThe Pitch of the
Last Note
1 6 64 62, 60, 57, 55 67
2 7 64 62, 60, 57, 55, 67 67
3 8 64 62, 60, 57, 55, 67, 67 69
4 9 64 62, 60, 57, 55, 67, 67, 69 67
5 10 64 62, 60, 57, 55, 67, 67, 69, 67 69
6 11 64 62, 60, 57, 55, 67, 67, 69, 67, 69 64
7 12 64 62, 60, 57, 55, 67, 67, 69, 67, 69, 64 62
Convex?No
Descending?Length = 12, A = 64, B = 62, C = 63.7
Music Classification
After music segmentation, different kinds of music units can be extracted from music objects, such as repeating patterns and phrases
Different kinds of music units may have different semantics in musicology
These extracted music units can be used in music classification, retrieval, and analysis
Music Recommendation Systems
[Chen and Chen, CIKM’01] The results of music classification can be used for
music-related services By analyzing the user access histories, we can
discover which music classes the users may be interested in and which users belonging to the same group
By using different kinds of recommendation mechanisms, we can recommend the users the music objects
Architecture
Database
MusicObject
ProfileManager
TrackSelector
Users
FeatureExtractor
representative track
feature point
ClassifierCB Method
RecommendationModule
COL Method
STA Method
UserGroups
Interface
MusicGroups
AccessHistories
FeaturePoints
MusicObjects
A polyphonic music object• one melody track• other accompaniment tracks
Recommendation Mechanisms
Content-based filtering approach Similarity between music objects and user profiles Recommend the music objects that belong to the music groups
the user is recently interested in Collaborative filtering approach
Similarity between user profiles Provide unexpected recommendations to the users in the same
user group Statistical approach
Recommend “hot” music objects
Future Research Directions
Polyphonic Music Segmentation Efficient Approximate Repeating Pattern Discovery Representation and Similarity Measure for Musical
Structures Musical Style/Form Detection
References
[Camb01] Cambouropoulos, E., “The Local Boundary Detection Model (LBDM) and its Application in the Study of Expressive Timing,” Proc. International Computer Music Conference, 2001.
[Chen00] Chen, A.L.P., M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and S.Y.S. Hua, "Query by Music Segments: An Efficient Approach for Song Retrieval," Proc. IEEE International Conference on Multimedia and Expo, 2000.
[Chen01] Chen, H.C. and A.L.P. Chen, "A Music Recommendation System Based on Music Data Grouping and User Interests grouping and user interests," Proc. ACM International Conference on Information and Knowledge Management, 2001
[Clau00] Clausen, M., R. Engelbrecht, D. Meyer, and J. Schmitz, “PROMS: A Web-based Tool for Searching in Polyphonic Music,” Proc. International Symposium on Music Information Retrieval, 2000.
[Dove01] Dovey, M.J., “A Technique for Regular Expression Style Searching in Polyphonic Music,” Proc. International Symposium on Music Information Retrieval, 2001.
References (Cont.)
[Korn98] Kornstadt, A., “Themefinder: A Web-based Melodic Search Tool,” Computing in Musicology, 11:231-236, 1998.
[Dora02] Doraisamy, S. and S.M. Ruger, “A comparative and fault-tolerance study of the use of n-grams with polyphonic music,” Proc. International Symposium on Music Information Retrieval, 2002.
[Hsu01] Hsu, J.L., C.C. Liu and A.L.P. Chen, "Discovering Nontrivial Repeating Patterns in Music Data," IEEE Transactions on Multimedia, Vol. 3, No. 3, 2001.
[Hsu02] Hsu, J.L., A.L.P. Chen and H.C. Chen, "The Effectiveness Study of Various Music Information Retrieval Approaches," Proc. ACM International Conference on Information and Knowledge Management, 2002.
[Huro95] Huron, D., “The Melodic Arch in Western Folksongs,” Computing in Musicology, Volume 10, 1995.
References (Cont.)
[Lart03] Lartillot, O., “Discovering musical patterns through perceptive heuristics,” Proc. International Symposium on Music Information Retrieval, 2003.
[Lems00]Lemstrom, K. and Sami Perttu, “SEMEX-An Efficient Music Retrieval Prototype,” Proc. International Symposium on Music Information Retrieval, 2000.
[Liu99] Liu, C.C., J.L. Hsu, and A.L.P. Chen, "An Approximate String Matching Algorithm for Content-Based Music Data Retrieval," Proc. IEEE International Conference on Multimedia Computing and Systems, 1999.
[Mcna97] McNab, R.J., L.A. Smith, D. Bainbridge, and I.H. Witten, “The New Zealand Digital Library: MELody inDEX,” D-Lib Magazine, 1997.
[Pick02] Pickens, J., and T. Crawford, “Harmonic Models for Polyphonic Music Retrieval,” Proc. ACM Conference on Information and Knowledge Management, 2002.