Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn...

18
Meinard Müller International Audio Laboratories Erlangen [email protected] 20.03.2017 Berlin MIR Meetup Music Information Retrieval When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University Information Retrieval for Music and Motion 2007-2012 Senior Researcher Max-Planck Institut für Informatik, Saarland 2012: Professor Semantic Audio Processing Universität Erlangen-Nürnberg Stefan Balke Christian Dittmar Patricio López-Serrano Christof Weiß Frank Zalkow Thomas Prätzlich Group Members Stefan Balke Christian Dittmar Patricio López-Serrano Christof Weiß Frank Zalkow Thomas Prätzlich Group Members Book: Fundamentals of Music Processing Meinard Müller Fundamentals of Music Processing Audio, Analysis, Algorithms, Applications 483 p., 249 illus., 30 illus. in color, hardcover ISBN: 978-3-319-21944-8 Springer, 2015 Accompanying website: www.music-processing.de International Audio Laboratories Erlangen

Transcript of Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn...

Page 1: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Meinard MüllerInternational Audio Laboratories [email protected]

20.03.2017Berlin MIR Meetup

Music Information RetrievalWhen Music Meets Computer Science

Meinard Müller

2001 PhD, Bonn University

2002/2003 Postdoc, Keio University, Japan

2007 Habilitation, Bonn University“Information Retrieval for Music and Motion”

2007-2012 Senior ResearcherMax-Planck Institut für Informatik, Saarland

2012: ProfessorSemantic Audio ProcessingUniversität Erlangen-Nürnberg

Stefan Balke

Christian Dittmar

Patricio López-Serrano

Christof Weiß

Frank Zalkow

Thomas Prätzlich

Group Members

Stefan Balke

Christian Dittmar

Patricio López-Serrano

Christof Weiß

Frank Zalkow

Thomas Prätzlich

Group Members

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., 30 illus. in color, hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

International Audio Laboratories Erlangen

Page 2: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

International Audio Laboratories Erlangen

Audio

International Audio Laboratories Erlangen

Audio

Audio Coding

Music ProcessingPsychoacoustics

3D Audio

Music

Music Information Retrieval

Sheet Music (Image) CD / MP3 (Audio) MusicXML (Text)

Music Film (Video)

Dance / Motion (Mocap) MIDI

Music Literature (Text)Singing / Voice (Audio)

Music

Music Information Retrieval

MusicMusicology

LibrarySciences

User Interfaces

Signal Processing

MachineLearning

Information Retrieval

Piano Roll Representation

Page 3: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Player Piano (1900)

Time

Pitch

J.S. Bach, C-Major Fuge

(Well Tempered Piano, BWV 846)

Piano Roll Representation (MIDI)

Query:

Goal: Find all occurrences of the query

Piano Roll Representation (MIDI)

Matches:

Piano Roll Representation (MIDI)

Query:

Goal: Find all occurrences of the query

Music Retrieval

Query

Database

Hit

Bernstein (1962) Beethoven, Symphony No. 5

Beethoven, Symphony No. 5: Bernstein (1962) Karajan (1982) Gould (1992)

Beethoven, Symphony No. 9 Beethoven, Symphony No. 3 Haydn Symphony No. 94

Audio-ID

Version-ID

Category-ID

Music Synchronization: Audio-Audio

Beethoven’s Fifth

Page 4: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Music Synchronization: Audio-Audio

Time (seconds)

Beethoven’s Fifth

Orchester(Karajan)

Piano(Scherbakov)

Music Synchronization: Audio-Audio

Time (seconds)

Beethoven’s Fifth

Orchester(Karajan)

Piano(Scherbakov)

Application: Interpretation Switcher Music Synchronization: Image-AudioIm

age

Aud

io

Music Synchronization: Image-Audio

Imag

eA

udio

How to make the data comparable?

Imag

eA

udio

Page 5: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

How to make the data comparable?

Imag

eA

udio

Image Processing: Optical Music Recognition

How to make the data comparable?

Imag

eA

udio

Audio Processing: Fourier Analysis

Image Processing: Optical Music Recognition

How to make the data comparable?

Imag

eA

udio

Audio Processing: Fourier Analysis

Image Processing: Optical Music Recognition

Application: Score Viewer

Music Processing

Coarse Level Fine Level

What do different versions have in common?

What are the characteristics of a specific version?

Music Processing

Coarse Level Fine Level

What do different versions have in common?

What are the characteristics of a specific version?

What makes up a piece of music? What makes music come alive?

Page 6: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Music Processing

Coarse Level Fine Level

What do different versions have in common?

What are the characteristics of a specific version?

What makes up a piece of music? What makes music come alive?

Identify despite of differences Identify the differences

Music Processing

Coarse Level Fine Level

What do different versions have in common?

What are the characteristics of a specific version?

What makes up a piece of music? What makes music come alive?

Identify despite of differences Identify the differences

Example tasks:Audio MatchingCover Song Identification

Example tasks:Tempo EstimationPerformance Analysis

Schumann: Träumerei

Performance Analysis

Time (seconds)

Performance:

Schumann: Träumerei

Performance Analysis

Time (seconds)

Score (reference):

Performance:

Schumann: Träumerei

Performance Analysis

Time (seconds)

Score (reference):

Performance:

Strategy: Compute score-audio synchronizationand derive tempo curve

Performance Analysis

Musical time (measures)

Mus

ical

tem

po (B

PM

)

Schumann: Träumerei

Score (reference):

Tempo Curve:

Page 7: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Performance Analysis

Musical time (measures)

Tempo Curves:

Schumann: Träumerei

Score (reference):

Mus

ical

tem

po (B

PM

)

Performance Analysis

Musical time (measures)

Tempo Curves:

Schumann: Träumerei

Score (reference):

Mus

ical

tem

po (B

PM

)

Performance Analysis

Musical time (measures)

Tempo Curves:

Schumann: Träumerei

Score (reference):

Mus

ical

tem

po (B

PM

) ?

Performance Analysis

Musical time (measures)

Mus

ical

tem

po (B

PM

)

Tempo Curves:

Schumann: Träumerei

What can be done if no reference is available?

Music Processing

Relative Absolute

Given: Several versions Given: One version

Music Processing

Relative Absolute

Given: Several versions Given: One version

Comparison of extractedparameters

Direct interpretation of extractedparameters

Page 8: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Music Processing

Relative Absolute

Given: Several versions Given: One version

Comparison of extractedparameters

Direct interpretation of extractedparameters

Extraction errors have often no consequence on final result

Extraction errors immediatelybecome evident

Music Processing

Relative Absolute

Given: Several versions Given: One version

Comparison of extractedparameters

Direct interpretation of extractedparameters

Extraction errors have often no consequence on final result

Extraction errors immediatelybecome evident

Example tasks:Music SynchronizationGenre Classification

Example tasks:Music TranscriptionTempo Estimation

Tempo Estimation and Beat Tracking

Basic task: “Tapping the foot when listening to music’’

Tempo Estimation and Beat Tracking

Time (seconds)

Example: Queen – Another One Bites The Dust

Basic task: “Tapping the foot when listening to music’’

Time (seconds)

Tempo Estimation and Beat Tracking

Example: Queen – Another One Bites The Dust

Basic task: “Tapping the foot when listening to music’’Example: Happy Birthday to you

Pulse level: Measure

Tempo Estimation and Beat Tracking

Page 9: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Example: Happy Birthday to you

Pulse level: Tactus (beat)

Tempo Estimation and Beat Tracking

Example: Happy Birthday to you

Pulse level: Tatum (temporal atom)

Tempo Estimation and Beat Tracking

Example: Chopin – Mazurka Op. 68-3

Pulse level: Quarter note

Tempo: ???

Tempo Estimation and Beat Tracking

Example: Chopin – Mazurka Op. 68-3

Pulse level: Quarter note

Tempo: 50-200 BPM

Time (beats)

Tem

po (B

PM

)

50

200

Tempo curve

Tempo Estimation and Beat Tracking

Tempo Estimation and Beat Tracking

Which temporal level?

Local tempo deviations

Sparse information(e.g., only note onsets available)

Vague information(e.g., extracted note onsets corrupt)

Freq

uenc

y (H

z)

Tempo Estimation and Beat Tracking

1. SpectrogramSteps:

Spectrogram

Time (seconds)

Page 10: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Tempo Estimation and Beat Tracking

1. Spectrogram2. Log Compression

Steps:Compressed Spectrogram

Time (seconds)

Freq

uenc

y (H

z)

Tempo Estimation and Beat Tracking

1. Spectrogram2. Log Compression3. Differentiation

Steps:Difference Spectrogram

Time (seconds)

Freq

uenc

y (H

z)

Tempo Estimation and Beat Tracking

1. Spectrogram2. Log Compression3. Differentiation4. Accumulation

Steps:

Novelty Curve

Time (seconds)

Tempo Estimation and Beat Tracking

1. Spectrogram2. Log Compression3. Differentiation4. Accumulation

Steps:

Novelty Curve

Time (seconds)

Local Average

Tempo Estimation and Beat Tracking

1. Spectrogram2. Log Compression3. Differentiation4. Accumulation5. Normalization

Steps:

Novelty Curve

Time (seconds)

Tempo Estimation and Beat Tracking

Tem

po (B

PM

)

Inte

nsity

Page 11: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Tempo Estimation and Beat Tracking

Tem

po (B

PM

)

Inte

nsity

Tempo Estimation and Beat Tracking

Tem

po (B

PM

)

Inte

nsity

Tempo Estimation and Beat Tracking

Tem

po (B

PM

)

Inte

nsity

Tem

po (B

PM

)

Inte

nsity

Time (seconds)

Tempo Estimation and Beat Tracking

Tempo Estimation and Beat Tracking

Novelty Curve

Predominant Local Pulse (PLP)

Time (seconds)

Tempo Estimation and Beat Tracking

Light effects

Music recommendation

DJ

Audio editing

Page 12: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Why is Music Processing Challenging?

Chopin, Mazurka Op. 63 No. 3 Example:

Why is Music Processing Challenging?

Waveform

Chopin, Mazurka Op. 63 No. 3 Example:

Am

plitu

de

Time (seconds)

Why is Music Processing Challenging?

Waveform / Spectrogram

Chopin, Mazurka Op. 63 No. 3 Example:

Freq

uenc

y (H

z)

Time (seconds)

Why is Music Processing Challenging?

Waveform / Spectrogram

Performance– Tempo– Dynamics– Note deviations– Sustain pedal

Chopin, Mazurka Op. 63 No. 3 Example:

Why is Music Processing Challenging?

Waveform / Spectrogram

Performance– Tempo– Dynamics– Note deviations– Sustain pedal

Polyphony

Chopin, Mazurka Op. 63 No. 3 Example:

Main Melody

AccompanimentAdditional melody line

Decomposition of audio stream into different sound sources

Central task in digital signal processing

“Cocktail party effect”

Source Separation

Page 13: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Source Separation

Decomposition of audio stream into different sound sources

Central task in digital signal processing

“Cocktail party effect”

Several input signals

Sources are assumed to be statistically independent

Source Separation (Music)

Time

Time

Main melody, accompaniment, drum track

Instrumental voices

Individual note events

Only mono or stereo

Sources are often highly dependent

Harmonic-Percussive Decomposition

Mixture:

Harmonic-Percussive Decomposition

Harmonic component

Percussive component

Clearly percussive soundsClearly harmonic sounds

Mixture:

Harmonic-Percussive Decomposition

Clearly percussive soundsClearly harmonic sounds

Mixture:

Harmonic component

Residualcomponent

Percussive component

Harmonic-Percussive Decomposition

Mixture:

• Clearly harmonic sounds of singing voice and accompaniment

• Drum hits• Fricatives &

plosives in singing voice

• Noise-like sounds• Vibrato/glissando

sounds

Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/

Harmonic component

Percussive component

Residualcomponent

Literature: [Driedger/Müller/Disch, ISMIR 2014]

Page 14: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Singing Voice Extraction

Singing voice Accompaniment

Original Recording

Singing Voice Extraction

Original recording HPR

Harmonic component Residual componentPercussive component

Harmonic portion singing voice

MR TR SL

F0 annotation

Harmonic portion accompaniment

Fricativessinging voice

Instrument onsetsaccompaniment

Vibrato & formantssinging voice

Diffuse instruments soundsaccompaniment

+ +

Estimatesinging voice

Estimateaccompaniment

Time

Freq

uenc

y

Score-Informed Source SeparationExploit musical score to support separation process

Time

Pitc

hP

itch

Time

Pitc

h

Time

Freq

uenc

y (H

z)

Render

Parametric Model Approach

Estimate

Parameters

Time (seconds) Time (seconds)

Freq

uenc

y (H

z)

Rebuild spectrogram information

NMF (Nonnegative Matrix Factorization)

≈N

K

K

M

≥ 0 ≥ 0 ≥ 0

M

NMF (Nonnegative Matrix Factorization)

Templates Activations

N

M K

K

M

Magnitude Spectrogram

Templates: Pitch + Timbre

Activations: Onset time + Duration

“How does it sound”

“When does it sound”

Page 15: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

NMF-DecompositionN

ote

num

ber

Freq

uenc

y

Note number Time

Initialized template Initialized activations

Random initialization

NMF-Decomposition

Not

e nu

mbe

r

Freq

uenc

yFr

eque

ncy

Note number

Not

e nu

mbe

r

Time

Learnt templates Learnt activations

Initialized template Initialized activations

Random initialization → No semantic meaning

NMF-Decomposition

Not

e nu

mbe

r

Freq

uenc

y

Note number Time

Initialized template Initialized activations

Constrained initialization

NMF-Decomposition

Not

e nu

mbe

r

Freq

uenc

y

Note number Time

Activation constraints for p=55

Initialized template Initialized activations

Template constraint for p=55

Constrained initialization

NMF-Decomposition

Not

e nu

mbe

r

Freq

uenc

yFr

eque

ncy

Not

e nu

mbe

r

Time

Org

Model

Note number

Initialized template Initialized activations

Constrained initialization → NMF as refinement

Learnt templates Learnt activations

Score-Informed Audio Decomposition

500

580

523

Freq

uenc

y (H

ertz

)

0 10.5Time (seconds)

9876

1600

1200

800

400

9876

1600

1200

800

400

500

580

554

Freq

uenc

y (H

ertz

)

0 10.5Time (seconds)

Application: Audio editing

Page 16: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Informed Drum-Sound Decomposition

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparationLiterature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]

Remix:

Loop Decomposition of EDM

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-ISMIR-EMLoopLiterature: [López-Serrano/Dittmar/Müller, ISMIR 2016]

Decomposition Patterns Activations

Audio MosaicingSource signal: BeesTarget signal: Beatles–Let it be

Mosaic signal: Let it Bee

Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBeeLiterature: [Driedger/Müller, ISMIR 2015]

NMF-Inspired Audio Mosaicing

. =

Non-negative matrix factorization (NMF)

Proposed audio mosaicing approach

.

Non-negative matrix Components Activations

Target’s spectrogram Source’s spectrogram Activations Mosaic’s spectrogram

fixed

learnedfixed

learned

fixed

learned

=

Time source

Freq

uenc

y

Tim

e so

urce

Time targetTime target

Freq

uenc

y

NMF-Inspired Audio Mosaicing

Time target

Freq

uenc

y

Time source

Freq

uenc

y

Freq

uenc

y

Tim

e so

urce

Time targetTime target

. =≈

Spectrogram target

Spectrogram source

SpectrogrammosaicActivation matrix

NMF-Inspired Audio Mosaicing

Time target

Freq

uenc

y

Time source

Freq

uenc

y

Freq

uenc

y

Tim

e so

urce

Time targetTime target

. =≈

Spectrogram target

Spectrogram source

SpectrogrammosaicActivation matrix

Core idea: support the development of sparse diagonal activation structures

Activation matrix

This image cannot currently be displayed.This image cannot currently be displayed.

Iterative updates

Preserve temporal context

Page 17: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

NMF-Inspired Audio Mosaicing

Time target

Freq

uenc

y

Time source

Freq

uenc

y

Freq

uenc

y

Tim

e so

urce

Time targetTime target

. =≈

Spectrogram target

Spectrogram source

SpectrogrammosaicActivation matrix

NMF-Inspired Audio Mosaicing

Time target

Freq

uenc

y

Time source

Freq

uenc

y

Freq

uenc

y

Tim

e so

urce

Time targetTime target

. =≈

Spectrogram target

Spectrogram source

SpectrogrammosaicActivation matrix

Audio MosaicingSource signal: WhalesTarget signal: Chic–Good times

Mosaic signal

Audio MosaicingSource signal: Race carTarget signal: Adele–Rolling in the Deep

Mosaic signal

Motivic Similarity Motivic Similarity

B A C H

Page 18: Music Information Retrieval · When Music Meets Computer Science Meinard Müller 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University, Japan 2007 Habilitation, Bonn University

Teaching

Academic training of students

Fundamental research

Summary

Music information retrieval

Audio decomposition techniques

Machine learning

Music applications & musicology

Multimedia scenarios

Web-based interfaces

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

Book: Fundamentals of Music Processing

Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015

Accompanying website: www.music-processing.de

MIR-Related Events in Germany

AES Conference onSemantic Audio22 – 24 June 2017Erlangen

GI Jahrestagung25 – 29 September 2017Chemnitz

Workshop: Musik trifft Informatik26 September 2017

Tutorial: Musikverarbeitung25 September 2017