Music Information Retrieval: When Music Meets Computer Science


Meinard Müller
International Audio Laboratories Erlangen
[email protected]

01.02.2017
IWR-Colloquium, Heidelberg

Music Information Retrieval
When Music Meets Computer Science

Meinard Müller

2001 PhD, Bonn University

2002/2003 Postdoc, Keio University, Japan

2007 Habilitation, Bonn University: “Information Retrieval for Music and Motion”

2007-2012 Senior Researcher, Max-Planck-Institut für Informatik, Saarland

2012: Professor, Semantic Audio Processing, Universität Erlangen-Nürnberg

Group Members

Stefan Balke

Christian Dittmar

Patricio López-Serrano

Christof Weiß

Frank Zalkow

Book: Fundamentals of Music Processing

Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., 30 illus. in color, hardcover
ISBN: 978-3-319-21944-8
Springer, 2015

Accompanying website: www.music-processing.de

International Audio Laboratories Erlangen

Audio

Audio Coding

Music Processing

Psychoacoustics

3D Audio

Music Information Retrieval

Music: Sheet Music (Image), CD / MP3 (Audio), MusicXML (Text), Music Film (Video), Dance / Motion (Mocap), MIDI, Music Literature (Text), Singing / Voice (Audio)

Music Information Retrieval

Music and related fields: Musicology, Library Sciences, User Interfaces, Signal Processing, Machine Learning, Information Retrieval

Piano Roll Representation

Player Piano (1900)

Piano Roll Representation (MIDI)

J.S. Bach, C-Major Fugue (Well-Tempered Clavier, BWV 846)

Query:

Goal: Find all occurrences of the query

[Figure: piano roll with the query and its matches marked; axes: Time, Pitch]
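The retrieval goal stated above can be illustrated on a symbolic piano roll. The following is a minimal sketch, assuming notes are stored as (onset, pitch) pairs on an integer time grid and that a match is an exact, time-shifted copy of the query. The function name `find_occurrences` and the toy data are hypothetical and not taken from the slides.

```python
def find_occurrences(piece, query):
    """Return all time shifts at which every query note appears in the piece.

    piece, query: iterables of (onset, pitch) pairs on an integer time grid.
    """
    piece_set = set(piece)
    query = sorted(query)
    if not query:
        return []
    q_onset, q_pitch = query[0]
    shifts = []
    # Anchor the first query note on every piece note with the same pitch.
    for onset, pitch in sorted(piece_set):
        if pitch != q_pitch:
            continue
        shift = onset - q_onset
        if all((o + shift, p) in piece_set for o, p in query):
            shifts.append(shift)
    return shifts


# Toy piano roll (assumption): the motif 60-62-64 starts at times 0 and 8.
piece = [(0, 60), (1, 62), (2, 64), (4, 67), (8, 60), (9, 62), (10, 64)]
query = [(0, 60), (1, 62), (2, 64)]
matches = find_occurrences(piece, query)
```

Real recordings and performances deviate in tempo and pitch content, so an exact set-based match like this one only covers the cleanest symbolic case.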

Music Retrieval

Query → Database → Hit

Audio-ID: Bernstein (1962), Beethoven, Symphony No. 5

Version-ID: Beethoven, Symphony No. 5: Bernstein (1962), Karajan (1982), Gould (1992)

Category-ID: Beethoven, Symphony No. 9; Beethoven, Symphony No. 3; Haydn, Symphony No. 94

Music Synchronization: Audio-Audio

Beethoven’s Fifth

[Figure: two recordings aligned over Time (seconds): Orchestra (Karajan), Piano (Scherbakov)]

Application: Interpretation Switcher

Music Synchronization: Image-Audio

[Figure: sheet music (Image) aligned with a recording (Audio)]

How to make the data comparable?

Image Processing: Optical Music Recognition

Audio Processing: Fourier Analysis

Application: Score Viewer

Why is Music Processing Challenging?

Example: Chopin, Mazurka Op. 63 No. 3

Waveform [Figure: axes: Time (seconds), Amplitude]

Waveform / Spectrogram [Figure: axes: Time (seconds), Frequency (Hz)]
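The step from waveform to spectrogram can be sketched with a short-time Fourier transform. This is a minimal illustration, assuming a Hann window, a frame length of 1024 samples, a hop of 512 samples, and a synthetic 440 Hz test tone; none of these settings come from the slides, and the helper name `magnitude_spectrogram` is hypothetical.

```python
import numpy as np


def magnitude_spectrogram(x, frame_len=1024, hop=512):
    """Magnitude STFT of a mono signal x, shape (frequency bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    # rfft keeps only the non-negative frequencies of the real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=0))


# Synthetic test tone (an assumption for illustration): 1 second of 440 Hz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
S = magnitude_spectrogram(x)
freqs = np.fft.rfftfreq(1024, d=1 / sr)   # centre frequency of each bin in Hz
peak_hz = freqs[np.argmax(S.mean(axis=1))]
```

The strongest bin lies within one bin width (about 7.8 Hz here) of the true frequency, which shows the frequency resolution limit imposed by the frame length.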

Performance
– Tempo
– Dynamics
– Note deviations
– Sustain pedal

Polyphony
– Main melody
– Accompaniment
– Additional melody line

Source Separation

Decomposition of audio stream into different sound sources

Central task in digital signal processing

“Cocktail party effect”

Several input signals

Sources are assumed to be statistically independent

Source Separation (Music)

Main melody, accompaniment, drum track

Instrumental voices

Individual note events

Only mono or stereo

Sources are often highly dependent

Score-Informed Source Separation

Exploit musical score to support separation process

[Figure: piano-roll panels; axes: Time, Pitch]

Score-Informed Audio Decomposition

Strategies:

Parametric model: Rebuild spectrogram

NMF: Decompose spectrogram

Melody tracking

Parametric Model Approach

Rebuild spectrogram information: Estimate Parameters → Render

[Figure: spectrogram of the original audio recording next to the model spectrogram; axes: Time (seconds), Frequency (Hz)]

Model initialized with score information

Temporal synchronization

Dynamics parameters

Timbre parameters

Model after several iterations

Parametric Model Approach

Idea:
– Each note parameterizes a portion of the spectrogram
– Explicit model for pitch + timing, dynamics, timbre + instrumentation

Advantages:
– Integration of musical knowledge is easily possible
– High degree of robustness

Problems:
– Inaccurate if model assumptions are violated
– Computationally expensive

NMF (Nonnegative Matrix Factorization)

Magnitude Spectrogram (N × M, ≥ 0) ≈ Templates (N × K, ≥ 0) · Activations (K × M, ≥ 0)

Templates: Pitch + Timbre (“How does it sound”)

Activations: Onset time + Duration (“When does it sound”)
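The factorization of a spectrogram into templates and activations can be sketched with multiplicative update rules. The slides do not state the cost function at this point, so this sketch assumes the Kullback-Leibler variant of the standard Lee and Seung updates. The names `V`, `W`, `H`, and `nmf_kl`, as well as the toy data, are assumptions for illustration.

```python
import numpy as np


def nmf_kl(V, W, H, n_iter=200, eps=1e-12):
    """Refine nonnegative W (N x K) and H (K x M) so that W @ H approximates V (N x M)."""
    W, H = W.copy(), H.copy()
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Activation update: H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        # Template update: W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H


rng = np.random.default_rng(0)
N, K, M = 20, 3, 30
V = rng.random((N, K)) @ rng.random((K, M))      # toy "spectrogram" of rank K
W0, H0 = rng.random((N, K)) + 0.1, rng.random((K, M)) + 0.1
W, H = nmf_kl(V, W0, H0)
err_before = np.linalg.norm(V - W0 @ H0)
err_after = np.linalg.norm(V - W @ H)
```

Because every update multiplies the current value by a nonnegative factor, `W` and `H` stay nonnegative without any explicit projection step.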

NMF-Decomposition

Random initialization → No semantic meaning

[Figure: initialized and learnt templates (axes: Note number, Frequency) and initialized and learnt activations (axes: Time, Note number)]

NMF-Decomposition

Template constraints → Semantic decomposition

Template constraint for p=55

[Figure: initialized and learnt templates (axes: Note number, Frequency) and initialized and learnt activations (axes: Time, Note number); original (Org) and Model]
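A template constraint for a single pitch such as p=55 can be sketched as a binary mask that allows energy only near the harmonics of that pitch. This sketch assumes equal temperament with A4 = 440 Hz, eight harmonics, a tolerance of half a semitone, and an arbitrary bin grid; all of these values and both function names are hypothetical and not taken from the slides.

```python
import numpy as np


def pitch_to_hz(p):
    """Centre frequency of MIDI pitch p in equal temperament with A4 = 440 Hz."""
    return 440.0 * 2.0 ** ((p - 69) / 12)


def harmonic_template(p, freqs, n_harmonics=8, tol_semitones=0.5):
    """Binary template: 1 where a bin lies near a harmonic of pitch p, else 0."""
    template = np.zeros(len(freqs))
    f0 = pitch_to_hz(p)
    for h in range(1, n_harmonics + 1):
        lo = h * f0 * 2.0 ** (-tol_semitones / 12)
        hi = h * f0 * 2.0 ** (tol_semitones / 12)
        template[(freqs >= lo) & (freqs <= hi)] = 1.0
    return template


freqs = np.fft.rfftfreq(4096, d=1 / 22050)   # assumed STFT bin frequencies
template_55 = harmonic_template(55, freqs)
f0_55 = pitch_to_hz(55)                       # about 196 Hz (G3)
```

Used as an initialization, such a mask leaves the factorization free to learn the relative strength of each harmonic while keeping the template tied to one pitch.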

NMF-Decomposition

Activation constraints → NMF as refinement

Activation constraints for p=55

[Figure: initialized and learnt templates (axes: Note number, Frequency) and initialized and learnt activations (axes: Time, Note number); original (Org) and Model]

NMF-Decomposition

Additional onset models → NMF as refinement

Onset template for p=55

Activation constraints for onset template

[Figure: initialized and learnt templates (axes: Note number, Frequency) and initialized and learnt activations (axes: Time, Note number); original (Org) and Model]

Score-Informed Source Separation

1. Split activation matrix
2. Model spectrogram for left/right
3. Separation masks for left/right
4. Estimated spectrograms for left/right
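The four steps above can be sketched as follows. This is a minimal illustration, assuming templates `W` and activations `H` from a previous factorization, a boolean label per template that assigns it to the left hand (in practice this would come from the score), and soft ratio masks. The function name and the toy data are hypothetical, and the conversion back to an audio signal is left out.

```python
import numpy as np


def separate_left_right(V, W, H, is_left, eps=1e-12):
    """Split a magnitude spectrogram V into left- and right-hand estimates.

    W (N x K) templates, H (K x M) activations, is_left (K,) boolean labels.
    """
    # 1. Split the activation matrix by zeroing the rows of the other hand.
    H_left = H * is_left[:, None]
    H_right = H * (~is_left)[:, None]
    # 2. Model spectrograms for left and right.
    V_left_model = W @ H_left
    V_right_model = W @ H_right
    # 3. Soft separation masks; they sum to (almost exactly) one per bin.
    total = V_left_model + V_right_model + eps
    mask_left = V_left_model / total
    mask_right = V_right_model / total
    # 4. Estimated spectrograms: masks applied to the original spectrogram.
    return mask_left * V, mask_right * V


rng = np.random.default_rng(1)
W = rng.random((16, 4)) + 0.1
H = rng.random((4, 25)) + 0.1
V = W @ H
is_left = np.array([True, True, False, False])   # hypothetical hand labels
V_left, V_right = separate_left_right(V, W, H, is_left)
```

Applying the masks to the original spectrogram, rather than using the model spectrograms directly, keeps details of the recording that the model does not capture.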

Score-Informed Audio Decomposition

Application: Separating left and right hands for piano

Chopin, Waltz Op. 64, No. 1

Audio examples: Original, Left/right hand, Right hand, Left hand

Score-Informed Audio Decomposition

Parameterize audio signal using score’s note events

Score, Audio → Note-based Audio Events, Residual Audio

Application: Audio editing

[Figure: two spectrogram panels; axes: Time (seconds), Frequency (Hertz); marked frequencies 500, 523, 580 in the first panel and 500, 554, 580 in the second]

NMF-Decomposition

Idea:
– Factorization of spectrogram
– Implicit modeling of signal properties

Advantages:
– Flexible and easy to implement
– Efficient

Problems:
– Decomposition difficult to control
– Often no semantic meaning

Strategy: Multiplicative update rules allow for introducing hard constraints to control the decomposition
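The reason multiplicative updates allow hard constraints is that an entry initialized to zero is only ever multiplied, so it stays zero. The sketch below demonstrates this for the activation matrix, assuming the Kullback-Leibler form of the update and a hypothetical boolean mask `allowed` standing in for score-derived constraints; the toy data and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, M = 12, 3, 10
V = rng.random((N, M)) + 0.1
W = rng.random((N, K)) + 0.1

# Hypothetical score-derived constraint: template k may only be active where allowed is True.
allowed = np.zeros((K, M), dtype=bool)
allowed[0, 0:4] = True
allowed[1, 3:8] = True
allowed[2, 6:10] = True

H = rng.random((K, M)) * allowed      # forbidden entries start at exactly zero
ones = np.ones_like(V)
for _ in range(50):
    # Multiplicative activation update: zero entries are multiplied and stay zero.
    H *= (W.T @ (V / (W @ H + 1e-12))) / (W.T @ ones)
```

The same argument applies to zeros placed in the templates, which is what makes score-informed initialization act as a constraint rather than just a starting point.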

Audio Decomposition

Works reasonably well

Audio Decomposition

Much more difficult

Related problems:
– F0 estimation
– Melody tracking
– Human voice
– Vibrato

Melody Tracking

Justin Salamon and Emilia Gómez: Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE-TASLP 2012

– F0 estimation
– Voice detection
– Pitch contour creation
– Melody selection

F0 Estimation

[Figure sequence over the original audio, each with axes Time (seconds), 19 to 25, and Frequency (Hertz), 100 to 1600: Spectrogram; Log-spectrogram; Salience log-spectrogram; F0 estimation; Score-informed F0]
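A salience representation of the kind named above can be sketched with harmonic summation: the salience of an F0 candidate is a weighted sum of the spectral magnitudes at its harmonics. This is a simplified illustration and not the method of Salamon and Gómez. The number of harmonics, the decay weight, the candidate grid, and the synthetic frame are all assumptions, and `harmonic_salience` is a hypothetical name.

```python
import numpy as np


def harmonic_salience(mag, freqs, f0_candidates, n_harmonics=5, decay=0.8):
    """Salience per F0 candidate for one magnitude spectrum frame.

    mag: magnitudes per bin, freqs: bin centre frequencies in Hz (ascending).
    """
    salience = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            if h * f0 > freqs[-1]:
                break
            # Nearest bin to the h-th harmonic, weighted by decay**(h - 1).
            k = np.argmin(np.abs(freqs - h * f0))
            salience[i] += decay ** (h - 1) * mag[k]
    return salience


# Synthetic frame (assumption): a tone at 200 Hz with harmonics at 400 and 600 Hz.
freqs = np.arange(0, 2001, 10.0)
mag = np.zeros(len(freqs))
for f, a in [(200, 1.0), (400, 0.7), (600, 0.5)]:
    mag[int(f // 10)] = a
# Candidates on a logarithmic grid from 100 to 800 Hz, 12 steps per octave.
f0_candidates = 100.0 * 2.0 ** (np.arange(37) / 12)
salience = harmonic_salience(mag, freqs, f0_candidates)
f0_hat = f0_candidates[np.argmax(salience)]
```

The decaying weights are what keep the candidate one octave below the true pitch from winning, since that candidate only collects energy at its even-numbered harmonics.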

Score-Informed Source Separation

Application: Voice separation and editing

Audio examples: Original audio, Separated voice, Amplified vibrato, Pitch shift

Audio Mosaicing

Source signal: Bees

Target signal: Beatles – Let it be

Mosaic signal: Let it Bee

NMF-Inspired Audio Mosaicing

Non-negative matrix factorization (NMF): Non-negative matrix (fixed) ≈ Components (learned) · Activations (learned)

Proposed audio mosaicing approach: Target’s spectrogram (fixed) ≈ Source’s spectrogram (fixed) · Activations (learned) = Mosaic’s spectrogram

[Driedger et al. ISMIR 2015]
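In this setup the source spectrogram plays the role of the fixed templates, and only the activations are learned. A minimal sketch of that idea follows, assuming the Kullback-Leibler form of the multiplicative activation update and random toy spectrograms. It leaves out the additional update rules of the cited approach, and the function name `learn_activations` is hypothetical.

```python
import numpy as np


def learn_activations(V_target, W_source, n_iter=100, eps=1e-12, seed=0):
    """Learn H >= 0 with W_source @ H approximating V_target; W_source stays fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_source.shape[1], V_target.shape[1])) + 0.1
    ones = np.ones_like(V_target)
    for _ in range(n_iter):
        # Only the activations are updated; the source frames are never changed.
        H *= (W_source.T @ (V_target / (W_source @ H + eps))) / (W_source.T @ ones + eps)
    return H


rng = np.random.default_rng(3)
W_source = rng.random((40, 15)) + 0.01     # toy source spectrogram: 15 frames
V_target = rng.random((40, 20)) + 0.01     # toy target spectrogram: 20 frames
H = learn_activations(V_target, W_source)
V_mosaic = W_source @ H                    # mosaic spectrogram built from source frames
```

Each column of `H` states which source frames, and with what weight, are combined to imitate one target frame, so the activation matrix has one row per source frame and one column per target frame.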

NMF-Inspired Audio Mosaicing

Spectrogram target (Frequency × Time target) ≈ Spectrogram source (Frequency × Time source) · Activation matrix (Time source × Time target) = Spectrogram mosaic (Frequency × Time target)

Core idea: support the development of sparse diagonal activation structures

Iterative updates

Preserve temporal context

[Driedger et al. ISMIR 2015]

NMF with Additional Update Rules

[Figure: four activation matrices; axes: Time target, Time source]

Activation matrix → (Neighborhood constraints) → Repetition-restricted activation matrix → (Column constraints) → Polyphony-restricted activation matrix → (Diagonal smoothing) → Continuity-enhanced activation matrix

– Constraints are enforced by additional update rules
– Additional rules are interleaved with standard NMF update rules
– Soft alternative to NMFD
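Of the three additional rules, diagonal smoothing is the simplest to sketch: each activation is replaced by the sum over its neighbours along the diagonal, which reinforces runs where consecutive source frames are used for consecutive target frames. The kernel half-length `c` and the zero padding at the borders are assumptions, and this may differ in detail from the rule in the cited paper.

```python
import numpy as np


def diagonal_smoothing(H, c=2):
    """Sum each activation with its 2c neighbours along the main-diagonal direction."""
    K, M = H.shape
    padded = np.pad(H, c)                 # zero padding on all four sides
    out = np.zeros_like(H, dtype=float)
    for i in range(-c, c + 1):
        # Shift by i rows and i columns, i.e. move along the diagonal.
        out += padded[c + i:c + i + K, c + i:c + i + M]
    return out


H = np.zeros((6, 6))
H[np.arange(6), np.arange(6)] = 1.0       # a clean diagonal: source frames in order
H[0, 5] = 1.0                             # an isolated activation
smoothed = diagonal_smoothing(H, c=1)
```

In the toy example, entries inside the diagonal run grow relative to the isolated entry, which is the effect that favours preserving the temporal context of the source.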

NMF with Additional Update Rules

Kullback-Leibler Divergence between Target and Mosaic

[Figure: Kullback-Leibler divergence (scale ·10^4, ticks 5 to 12) plotted against iteration number ℓ (1 to 20)]
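The quantity tracked over the iterations can be computed as follows. The slides do not say which variant is plotted, so this sketch assumes the generalized Kullback-Leibler divergence commonly used with NMF, D(V || V') = Σ (V log(V / V') − V + V'). The function name and the small example matrices are hypothetical.

```python
import numpy as np


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || V_hat) for nonnegative arrays."""
    V = np.asarray(V, dtype=float)
    V_hat = np.asarray(V_hat, dtype=float)
    # eps guards log(0) and division by zero; terms with V == 0 contribute V_hat.
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


target = np.array([[1.0, 2.0], [3.0, 4.0]])
mosaic = np.array([[1.5, 2.0], [2.0, 4.5]])
d_same = kl_divergence(target, target)
d_diff = kl_divergence(target, mosaic)
```

The divergence is zero only when both spectrograms agree and is not symmetric in its arguments, so the order of target and mosaic matters.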

Audio Mosaicing

Source signal: Whales

Target signal: Chic – Good times

Mosaic signal

Audio Mosaicing

Source signal: Race car

Target signal: Adele – Rolling in the Deep

Mosaic signal

Motivic Similarity

Beethoven’s Fifth (1st Mov.)

Beethoven’s Fifth (3rd Mov.)

Beethoven’s Appassionata

Motivic Similarity

B A C H

Book: Fundamentals of Music Processing

Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., hardcover
ISBN: 978-3-319-21944-8
Springer, 2015

Accompanying website: www.music-processing.de