Music Information Retrieval

Music Information Retrieval (MIR): Approaches for content-based music retrieval. Alberto De Bortoli, December 1, 2009.

Transcript of Music Information Retrieval

Page 1: Music Information Retrieval

Base concepts / The ideas / Shazam / Conclusions

Music Information Retrieval (MIR): Approaches for content-based music retrieval

Alberto De Bortoli

December 1, 2009

Alberto De Bortoli Music Information Retrieval (MIR)

Page 2: Music Information Retrieval

Base concepts: Music language / The formats / The users

What is MIR?

- Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music.

- This includes:
  - Computational methods for classification, clustering, and modelling
  - Formal methods and databases
  - Software for music information retrieval
  - Human-computer interaction and interfaces
  - Music perception, cognition, affect, and emotions
  - Music analysis and knowledge representation
  - Music archives, libraries, and digital collections
  - Intellectual property and rights
  - Sociology and economy of music

Page 3: Music Information Retrieval

Dealing with Music

Music is different from text; we have to deal with its characteristics:

Pitch: related to the perception of the fundamental frequency of a sound; pitch is said to range from low or deep to high or acute sounds.

Intensity: related to the amplitude, and thus to the energy, of the vibration; textual labels for intensity range from soft to loud; intensity is also referred to as loudness.

Timbre: defined as the sound characteristics that allow listeners to perceive as different two sounds with the same pitch and the same intensity.

Page 4: Music Information Retrieval

Dimensions of the Music Language (1)

- Timbre depends on the perception of the quality of sounds.

- Orchestration is due to the composers' and performers' choices in selecting which musical instruments are to be employed.

- Acoustics can be considered as the contribution of room acoustics, background noise, audio post-processing, filtering, and equalization.

- Rhythm is related to the periodic repetition of a temporal pattern, possibly with small variants.

- Melody is made of a sequence of tones with a similar timbre that have a recognizable pitch within a small frequency range.

Page 5: Music Information Retrieval

Dimensions of the Music Language (2)

- Harmony is the organization, along the time axis, of simultaneous sounds with a recognizable pitch.

- Structure is a horizontal dimension whose time scale is different from the previous ones, being related to macro-level features such as repetitions, interleaving of themes and choruses, presence of breaks, changes of time signatures, and so on.

Page 6: Music Information Retrieval

Formats of Musical Documents

Apart from the peculiar characteristics of the forms, each format is able to capture only a reduced number of dimensions. The formats MIR deals with are:

- Audio formats (MP3, WAV, ...)
- Symbolic formats (score, MIDI files, MPEG-7)

Page 7: Music Information Retrieval

The Role of the User (1)

It's all about the information need. There are three main reasons why users may want to access digital music:

1. listening to a particular performance or musical work;

2. building a collection of music (playlist);

3. verifying or identifying works.

Page 8: Music Information Retrieval

The Role of the User (2)

Potential users of MIR systems are divided into three categories:

1. casual users want to enjoy music, listening to and collecting the music they like and discovering new good music;

2. professional users need music suitable for particular uses related to their activities, which may be in media production or advertising;

3. music scholars, music theorists, musicologists, and musicians are interested in studying music.

Page 9: Music Information Retrieval

The Casual User

Perhaps the most important kind of user. Typical queries are as follows:

1. Find me a song that sounds like this

2. Given that I like these songs, find me more songs that I may enjoy

3. I need to organize my personal collection of digital music (stored on my hard drive, portable device, MP3 player, cell phone, etc.)

Page 10: Music Information Retrieval

The Professional User

Users may need to access music collections because of their professional activity.

1. I am looking for a suitable soundtrack for...

2. Retrieve musical works that have a rhythm (melody, harmony, orchestration) similar to this one

Page 11: Music Information Retrieval

Music Theorists, Musicians

Unlike casual and professional users, these users are mostly interested in obtaining information from the musical works that are the subject of their study, rather than in obtaining the musical works themselves. They typically want to:

1. access musical scores in order to develop or refine theories on the music language;

2. study the work of composers and performers, the role of tradition, and cultural interchanges;

3. retrieve scores to be performed and listen to how renowned musicians interpreted particular passages or complete works.

Page 12: Music Information Retrieval

The ideas: The approaches / Melody extraction / Pitch contour / Audio elaboration

Metadata approach

- Simple IR queries.

- Metadata is often absent (ID3 tags) or not reliable (MPEG-7).

- Not pure music retrieval, but rather text retrieval on audio metadata.

Page 13: Music Information Retrieval

Content-based approach

- Dealing with intrinsic characteristics of the sounds.

- The main form is usually known in the MIR community as query-by-humming (QbH); the term was introduced in 1995 (A. Ghias, J. Logan, D. Chamberlin, and B.C. Smith. Query by humming: musical information retrieval in an audio database).

- Interaction through the audio channel is an error-prone process. The query may be dramatically different from the original song that it is intended to represent, and also different from the user's intentions.

Page 14: Music Information Retrieval

A MIR Architecture

Page 15: Music Information Retrieval

Lexical Semantical Units

- Can we treat musical notes as “words”?

- No! Notes are relative; notes are not context-free.

- What about musical pitch intervals? Are they analogous to “words”?

- No! The abstraction level is too low; pitch intervals are more like characters.

- And what about sequences of pitch intervals?

- In a certain sense they are “words”!

- Pauses or rests do not play the same role as blanks in text documents.

- The concept of “stop words” is vague.

- In general, music does not depend on intonation or on single notes!

Page 16: Music Information Retrieval

Extraction of the main melody

- The main melody of a piece is a good representation of it.

- The computation of melodic information from a monophonic score is straightforward.

- A slightly more difficult task is given by a polyphonic score made of several monophonic voices.

- It is assumed that there is only one relevant melodic line.

Page 17: Music Information Retrieval

Melody Segmentation: N-grams

- A simple segmentation approach consists of extracting from a melody all the subsequences of exactly N notes, called N-grams.

- The idea underlying this approach is that the effect of musically irrelevant N-grams will be compensated by the presence of all the relevant ones.

- Each sequence, of any length, that is repeated at least K times can be used as a content descriptor of the melodic information.
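As an illustration of the N-gram idea, here is a minimal sketch. It assumes the melody is given as a list of MIDI note numbers; the function names and the example melody are hypothetical and not taken from the original slides.

```python
from collections import Counter


def note_ngrams(notes, n):
    """Return every contiguous subsequence of exactly n notes."""
    return [tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)]


def repeated_ngrams(notes, n, k):
    """Keep only the N-grams that occur at least k times in the melody."""
    counts = Counter(note_ngrams(notes, n))
    return {gram: c for gram, c in counts.items() if c >= k}


# Hypothetical melody as MIDI note numbers (60 = middle C).
melody = [60, 62, 64, 60, 60, 62, 64, 60, 64, 65, 67]
descriptors = repeated_ngrams(melody, n=3, k=2)
print(descriptors)
```

Working on pitch intervals instead of absolute notes would make the descriptors independent of the key, at the cost of one extra preprocessing step.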

Page 18: Music Information Retrieval

The Pitch Contour (1)

- A simple but effective approach for searching melodies.

- The Parsons Code is a simple notation used to identify a piece of music through melodic motion.
  - u = “up” if the note is higher than the previous note
  - d = “down” if the note is lower than the previous note
  - r = “repeat” if the note is the same as the previous note
  - * = first tone as reference

- Examples:
  - “Love Me Tender”: *uduududdduu
  - First verse in Madonna’s “Like a Virgin”: *rrurddrdrrurdud
  - First verse in “We Are the World”: *rduduururdrddrududuu

- String-matching algorithms such as Boyer-Moore and KMP, or suffix tree structures, can be used to search the contours.
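The contour encoding and a contour search can be sketched as follows. This is only an illustration: the melody is assumed to be a list of MIDI note numbers, the function names are hypothetical, and Python's built-in substring test stands in for a dedicated Boyer-Moore, KMP, or suffix-tree implementation.

```python
def parsons_code(pitches):
    """Encode a pitch sequence as a Parsons code string starting with '*'."""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("u")
        elif cur < prev:
            code.append("d")
        else:
            code.append("r")
    return "".join(code)


def find_contour(query, index):
    """Return the titles whose contour contains the query contour."""
    needle = query.lstrip("*")  # the reference marker carries no motion
    return [title for title, contour in index.items() if needle in contour]


# Contours taken from the examples above.
index = {
    "Love Me Tender": "*uduududdduu",
    "Like a Virgin": "*rrurddrdrrurdud",
    "We Are the World": "*rduduururdrddrududuu",
}
print(parsons_code([60, 62, 62, 59]))
print(find_contour("*rrurdd", index))
```

Because the code only records the direction of motion, a query hummed in the wrong key or with inexact intervals still produces the same string.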

Page 19: Music Information Retrieval

The Pitch Contour (2)

http://themefinder.com

Page 20: Music Information Retrieval

Audio form

- The Fourier transform is one of the most frequently used tools for the analysis of audio.

- A common way to represent an audio excerpt is the spectrogram.

- Spectrograms are not suitable content descriptors, because of their high dimensionality.
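To make the dimensionality point concrete, the following sketch computes a magnitude spectrogram with a naive discrete Fourier transform. It is an illustration under simplifying assumptions (pure Python, rectangular window, no FFT); the function names and the test signal are hypothetical.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of the DFT of one frame, for bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_size, hop):
    """One row of DFT magnitudes per frame; frames start every `hop` samples."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    return [dft_magnitudes(signal[s:s + frame_size]) for s in starts]


# Hypothetical test signal: a sinusoid with exactly 4 cycles per 32 samples.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
spec = spectrogram(signal, frame_size=32, hop=16)
print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

Even this 128-sample toy signal yields a matrix of frames by bins, which is why more compact descriptors are derived from the spectrogram rather than using it directly.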

Page 21: Music Information Retrieval

Self Similarity Matrix

- $v_i$ is the vector of the musical characteristics at time $i$.

- The self-similarity matrix $M = (m_{ij})$ is defined as follows: $m_{ij} = s(v_i, v_j)$.

- $s$ is the chosen similarity metric.

- $M$ is symmetric about its main diagonal (provided $s$ is symmetric).
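A minimal sketch of the self-similarity matrix follows. Cosine similarity is used as the metric $s$ purely as an assumption for illustration; the slides do not prescribe a particular metric, and the feature vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_similarity_matrix(features, s=cosine_similarity):
    """M[i][j] = s(v_i, v_j) for every pair of time frames."""
    return [[s(vi, vj) for vj in features] for vi in features]


# Hypothetical feature vectors for three time frames; frames 0 and 2 are alike.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
M = self_similarity_matrix(features)
for row in M:
    print(row)
```

Repeated sections of a piece show up as high values away from the main diagonal, here at positions (0, 2) and (2, 0).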

Page 22: Music Information Retrieval

Shazam: What is it about? / The process / Some considerations / Other systems

Shazam: magic?

Desiderata:

- Provide a service that connects people to music by recognizing it through their mobile phones.

- The algorithm had to be able to recognize a short audio sample of music (15 sec) that had been broadcast, mixed with heavy ambient noise, and subjected to reverb.

- The algorithm also had to perform the recognition quickly over a large database of music.

Page 23: Music Information Retrieval

The process (1)

- Each audio file is fingerprinted, a process in which reproducible hash tokens are extracted.

- Both database and sample audio files are subjected to the same analysis.

- The fingerprints from the unknown sample are matched against a large set of fingerprints derived from the music database.

- The candidate matches are subsequently evaluated for correctness of match.

- Each fingerprint hash is calculated using audio samples near a corresponding point in time, so that distant events do not affect the hash.

Page 24: Music Information Retrieval

The process (2)

- Fingerprint hashes derived from corresponding matching content are reproducible independent of position within an audio file.

- Robustness means that hashes generated from the original clean database track should be reproducible from a degraded copy of the audio.

- Spectrogram peaks are used, due to their robustness in the presence of noise.

- A time-frequency point is a candidate peak if it has a higher energy content than all its neighbors in a region centered around the point.

- Generate a constellation map of peaks.
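The peak-picking rule can be sketched as below. This is an illustrative reading of the rule, not Shazam's actual implementation: the spectrogram is assumed to be a rectangular list of rows indexed as `spec[time][frequency]`, and the square neighborhood with a `radius` parameter is a hypothetical choice.

```python
def constellation_map(spec, radius=1):
    """Return (time, freq) points whose energy exceeds all neighbors in a square region."""
    peaks = []
    n_t = len(spec)
    n_f = len(spec[0]) if spec else 0
    for t in range(n_t):
        for f in range(n_f):
            energy = spec[t][f]
            neighbors = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            # Strict inequality: flat regions produce no peaks.
            if all(energy > v for v in neighbors):
                peaks.append((t, f))
    return peaks


# Hypothetical 4 x 4 energy grid with two clear local maxima.
spec = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 9],
]
print(constellation_map(spec))
```

Only the coordinates of the peaks are kept; their amplitudes are discarded, which is what makes the resulting map sparse.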

Page 25: Music Information Retrieval

The process (3)

- Fingerprint hashes are formed from the constellation map, in which pairs of time-frequency points are combinatorially associated.

- Anchor points are chosen, each anchor point having a target zone associated with it.

- A set of hash records (anchor frequency, target point frequency, time offset) is generated.

- Create a database index: the above operation is carried out on each track in a database to generate a corresponding list of hashes and their associated offset times.
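The pairing of anchor and target points can be sketched as follows. The shape of the target zone is an assumption made for illustration (the next `fan_out` peaks that are later in time and at most `max_dt` frames away); the function names, parameters, and track data are hypothetical.

```python
from collections import defaultdict


def fingerprint_hashes(peaks, fan_out=10, max_dt=63):
    """Pair each anchor with up to fan_out later peaks.

    peaks: list of (time, freq_bin). Returns [((f1, f2, dt), anchor_time), ...].
    """
    peaks = sorted(peaks)
    records = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:
                continue  # same time frame as the anchor: not in the target zone
            if dt > max_dt:
                break  # peaks are sorted by time, so all later ones are too far
            records.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return records


def build_index(tracks):
    """Map each hash to the (track_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in fingerprint_hashes(peaks):
            index[h].append((track_id, t))
    return index


# Hypothetical constellation points for one track.
peaks = [(0, 10), (1, 20), (2, 30), (100, 40)]
records = fingerprint_hashes(peaks, fan_out=2)
index = build_index({"track-1": peaks})
print(records)
```

Limiting both the fan-out and the time span of the target zone keeps the number of hashes roughly proportional to the number of peaks.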

Page 26: Music Information Retrieval

The process (4)

Page 27: Music Information Retrieval

The process (5)

Page 28: Music Information Retrieval

The process (6)

The corresponding times of matching features between matching files have the relationship

$t_k' = t_k + \mathrm{offset}$

where $t_k'$ is the time coordinate of the feature in the matching (clean) database sound file and $t_k$ is the time coordinate of the corresponding feature in the sample sound file to be identified. For each $(t_k', t_k)$ coordinate in the scatterplot, we calculate

$\delta t_k = t_k' - t_k$

Then we calculate a histogram of these $\delta t_k$ values and scan for a peak. This may be done by sorting the set of $\delta t_k$ values and scanning for a cluster of equal values.
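The histogram step can be sketched as follows. Instead of sorting the differences, this illustration counts them with a `Counter`, which gives the same peak; the record format `(hash, time)` and the example data are hypothetical.

```python
from collections import Counter


def best_offset(sample_records, track_records):
    """Return (offset, count) for the most frequent time difference, or None.

    Each record is (hash, time); a pair matches when the hashes are equal.
    """
    times_by_hash = {}
    for h, t_db in track_records:
        times_by_hash.setdefault(h, []).append(t_db)

    deltas = Counter()
    for h, t_sample in sample_records:
        for t_db in times_by_hash.get(h, []):
            deltas[t_db - t_sample] += 1  # delta = t' - t

    if not deltas:
        return None
    return deltas.most_common(1)[0]


# Hypothetical data: the sample starts 10 time units into the track.
track = [("a", 10), ("b", 12), ("c", 15), ("a", 40)]
sample = [("a", 0), ("b", 2), ("c", 5)]
print(best_offset(sample, track))
```

A true match produces many pairs with the same difference, while accidental hash collisions scatter across unrelated differences, so the height of the peak serves as a match score.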

Page 29: Music Information Retrieval

The process (7)

Page 30: Music Information Retrieval

Error probability

- If each constellation point is taken to be an anchor point, and if the target zone has a fan-out of size $F = 10$, then the number of hashes is approximately equal to $F$ times the number of constellation points extracted from the file.

- Note that combinatorial hashing squares the probability of point survival: a hash built from a pair of points survives only if both points do, i.e. with probability $p^2$.

- The probability of at least one hash surviving for a given anchor point is the joint probability of the anchor point and at least one target point in its target zone surviving.

- $p$ = probability of survival of each point involved (assumed equal for all points)

- $p \cdot [1 - (1 - p)^F]$ = probability of at least one hash surviving per anchor point

- $p \cdot [1 - (1 - p)^F] \approx p$ for large values of $F$, e.g. $F > 10$, and reasonable values of $p$, e.g. $p > 0.1$.
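The approximation can be checked numerically with a short sketch. The function name and the chosen values of $p$ and $F$ are hypothetical; the formula is the one stated above.

```python
def hash_survival_probability(p, fan_out):
    """Probability that the anchor and at least one of its targets survive."""
    return p * (1.0 - (1.0 - p) ** fan_out)


# Compare the per-anchor survival probability with p itself.
for p in (0.1, 0.3, 0.5, 0.8):
    for fan_out in (1, 10, 20):
        value = hash_survival_probability(p, fan_out)
        print(f"p={p:.1f} F={fan_out:2d} -> {value:.4f}")
```

With a fan-out of 1 the result is $p^2$, the squared survival probability of a single pair; as the fan-out grows the value approaches $p$ from below, and it does so faster for larger $p$.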

Page 31: Music Information Retrieval

An example

- 64 frequency bins (6 bits per frequency) and delta time quantized to 6 bits

- Hash (f1, f2, delta) is 18 bits (6 + 6 + 6)

- Fan-out = 10

- Peaks/sec = 3

- 450 hashes per 15-second sample (3 × 15 × 10)

- 450 raw hashes are about 1 kbyte (450 × 18 bits)
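The arithmetic behind these figures can be reproduced with a few lines. The variable names are hypothetical; the numbers are the ones listed above.

```python
freq_bins = 64
delta_bits = 6
fan_out = 10
peaks_per_sec = 3
sample_seconds = 15

# 64 distinct bins need 6 bits each.
bits_per_freq = (freq_bins - 1).bit_length()

# Two frequencies plus the quantized time difference.
hash_bits = 2 * bits_per_freq + delta_bits

# Every peak acts as an anchor paired with fan_out targets.
hashes_per_sample = peaks_per_sec * sample_seconds * fan_out

raw_bytes = hashes_per_sample * hash_bits / 8

print(hash_bits, "bits per hash")
print(hashes_per_sample, "hashes per sample")
print(raw_bytes, "bytes of raw hashes")
```

The raw size counts only the packed hash bits; the anchor times stored alongside each hash would add to it.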

Page 32: Music Information Retrieval

Considerations

- Good for music where frequencies are clearly defined.

- Totally unsuitable for organic sounds and noises (not the supposed

Page 33: Music Information Retrieval

Other systems (1)

- http://www.bored.com/songtapper/

Page 34: Music Information Retrieval

Other systems (2)

- www.midomi.com

- Queries by singing or humming; the state of the art in QbH.

- Possible errors due to a wrong key, poor vocal performance, or missing notes.

Page 35: Music Information Retrieval

Conclusions: MIR Map / MIREX 2005 / Bibliography

MIR Map

Page 36: Music Information Retrieval

MIREX 2005

- A project for the evaluation of MIR tasks.

- The campaign is called the Music Information Retrieval Evaluation eXchange (MIREX).

- There were nine different tasks during the MIREX 2005 campaign, six on the audio form and three on the symbolic form.

Page 37: Music Information Retrieval

Bibliography

Nicola Orio. Music Retrieval: A Tutorial and Review.

Avery Li-Chun Wang. An Industrial-Strength Audio Search Algorithm.

James P. Ogle and Daniel P.W. Ellis. Fingerprinting to identify repeated sound events in long-duration personal audio recordings.

Roberto Basili. Introduzione al Music Information Retrieval.

Michael Fingerhut. Music Information Retrieval, or how to search for (and maybe find) music and do away with incipits.

MIREX project: http://www.music-ir.org.
