2001 PhD, Bonn University Music Information Retrieval · Music Information Retrieval When Music...
Transcript of 2001 PhD, Bonn University Music Information Retrieval · Music Information Retrieval When Music...
Meinard MüllerInternational Audio Laboratories [email protected]
01.02.2017IWR-Colloquium, Heidelberg
Music Information RetrievalWhen Music Meets Computer Science
Meinard Müller
2001 PhD, Bonn University
2002/2003 Postdoc, Keio University, Japan
2007 Habilitation, Bonn University“Information Retrieval for Music and Motion”
2007-2012 Senior ResearcherMax-Planck Institut für Informatik, Saarland
2012: ProfessorSemantic Audio ProcessingUniversität Erlangen-Nürnberg
Stefan Balke
Christian Dittmar
Patricio López-Serrano
Christof Weiß
Frank Zalkow
Group Members Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., 30 illus. in color, hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de
International Audio Laboratories Erlangen International Audio Laboratories Erlangen
Audio
International Audio Laboratories Erlangen
Audio
Audio Coding
Music ProcessingPsychoacoustics
3D Audio
Music
Music Information Retrieval
Sheet Music (Image) CD / MP3 (Audio) MusicXML (Text)
Music Film (Video)
Dance / Motion (Mocap) MIDI
Music Literature (Text)Singing / Voice (Audio)
Music
Music Information Retrieval
MusicMusicology
LibrarySciences
User Interfaces
Signal Processing
MachineLearning
Information Retrieval
Piano Roll Representation Player Piano (1900)
Time
Pitch
J.S. Bach, C-Major Fuge
(Well Tempered Piano, BWV 846)
Piano Roll Representation (MIDI)
Query:
Goal: Find all occurrences of the query
Piano Roll Representation (MIDI)
Matches:
Piano Roll Representation (MIDI)
Query:
Goal: Find all occurrences of the query
Music Retrieval
Query
Database
Hit
Bernstein (1962) Beethoven, Symphony No. 5
Beethoven, Symphony No. 5: Bernstein (1962) Karajan (1982) Gould (1992)
Beethoven, Symphony No. 9 Beethoven, Symphony No. 3 Haydn Symphony No. 94
Audio-ID
Version-ID
Kategorie-ID
Music Synchronization: Audio-Audio
Beethoven’s Fifth
Music Synchronization: Audio-Audio
Time (seconds)
Beethoven’s Fifth
Orchester(Karajan)
Piano(Scherbakov)
Music Synchronization: Audio-Audio
Time (seconds)
Beethoven’s Fifth
Orchester(Karajan)
Piano(Scherbakov)
Application: Interpretation Switcher
Music Synchronization: Image-Audio
Imag
eA
udio
Music Synchronization: Image-AudioIm
age
Aud
io
How to make the data comparable?
Imag
eA
udio
How to make the data comparable?
Imag
eA
udio
Image Processing: Optical Music Recognition
How to make the data comparable?
Imag
eA
udio
Audio Processing: Fourier Analyse
Image Processing: Optical Music Recognition
How to make the data comparable?
Imag
eA
udio
Audio Processing: Fourier Analyse
Image Processing: Optical Music Recognition
Application: Score Viewer Why is Music Processing Challenging?
Chopin, Mazurka Op. 63 No. 3 Example:
Why is Music Processing Challenging?
Waveform
Chopin, Mazurka Op. 63 No. 3 Example:
Am
plitu
de
Time (seconds)
Why is Music Processing Challenging?
Waveform / Spectrogram
Chopin, Mazurka Op. 63 No. 3 Example:
Freq
uenc
y (H
z)
Time (seconds)
Why is Music Processing Challenging?
Waveform / Spectrogram
Performance– Tempo– Dynamics– Note deviations– Sustain pedal
Chopin, Mazurka Op. 63 No. 3 Example:
Why is Music Processing Challenging?
Waveform / Spectrogram
Performance– Tempo– Dynamics– Note deviations– Sustain pedal
Polyphony
Chopin, Mazurka Op. 63 No. 3 Example:
Main Melody
AccompanimentAdditional melody line
Decomposition of audio stream into different sound sources
Central task in digital signal processing
“Cocktail party effect”
Source Separation Source Separation
Decomposition of audio stream into different sound sources
Central task in digital signal processing
“Cocktail party effect”
Several input signals
Sources are assumed to be statistically independent
Source Separation (Music)
Time
Time
Main melody, accompaniment, drum track
Instrumental voices
Individual note events
Only mono or stereo
Sources are often highly dependent
Score-Informed Source SeparationExploit musical score to support seperation process
Time
Pitc
hP
itch
Time
Pitc
h
Time
Score-Informed Audio Decomposition
Parametric model: Rebuild spectrogram
NMF: Decompose spectrogram
Melody tracking
Strategies
Freq
uenc
y (H
z)
Render
Parametric Model Approach
Estimate
≈
Parameters
Time (seconds) Time (seconds)
Freq
uenc
y (H
z)
Rebuild spectrogram information
Original audio recording
Parametric Model Approach
Freq
uenc
y (H
z)
Time (seconds)
Model initialized withscore information
Parametric Model Approach
Freq
uenc
y (H
z)
Time (seconds)
Original audio recording
Freq
uenc
y (H
z)
Time (seconds)
Parametric Model Approach
Freq
uenc
y (H
z)
Time (seconds)
Original audio recording
Freq
uenc
y (H
z)
Time (seconds)
Temporal synchronization Dynamics parameters
Parametric Model Approach
Freq
uenc
y (H
z)
Time (seconds)
Original audio recording
Freq
uenc
y (H
z)
Time (seconds)
Parametric Model ApproachFr
eque
ncy
(Hz)
Time (seconds)
Original audio recording
Freq
uenc
y (H
z)
Time (seconds)
Timbre parameters Model after several iterations
Parametric Model Approach
Freq
uenc
y (H
z)
Time (seconds)
Original audio recording
Freq
uenc
y (H
z)
Time (seconds)
Parametric Model Approach
Integration of musical knowledge is easily possible High degree of robustness
Advantages:
Problems:
Inaccurate if model assumptions are violated Computationally expensive
Each note parameterizes a portion of the spectrogram Explicit model for
pitch + timing dynamics timbre + instrumentation
Idea:
NMF (Nonnegative Matrix Factorization)
≈N
K
K
M
≥ 0 ≥ 0 ≥ 0
M
NMF (Nonnegative Matrix Factorization)
≈
Templates Activiations
N
M K
K
M
Magnitude Spectrogram
Templates: Pitch + Timbre
Activiations: Onset time + Duration
“How does it sound”
“When does it sound”
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Initialized template Initialized activations
Random initialization
NMF-DecompositionN
ote
num
ber
Freq
uenc
yFr
eque
ncy
Note number
Not
e nu
mbe
r
Time
Learnt templates Learnt activations
Initialized template Initialized activations
Random initialization → No semantic meaning
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Initialized template Initialized activations
Template constraints
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Template constraint for p=55
Template constraints
Initialized template Initialized activations
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
yFr
eque
ncy
Note number
Not
e nu
mbe
r
Time
Org
Model
Template constraints → Semantic decomposition
Initialized template Initialized activations
Learnt templates Learnt activations
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Initialized template Initialized activations
Activation constraints
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Activation constraints for p=55
Initialized template Initialized activations
Activation constraints
NMF-DecompositionN
ote
num
ber
Freq
uenc
yFr
eque
ncy
Not
e nu
mbe
r
Time
Org
Model
Note number
Initialized template Initialized activations
Activation constraints → NMF as refinement
Learnt templates Learnt activations
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number Time
Initialized template Initialized activations
Additional onset models → NMF as refinement
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
y
Note number TimeOnset template for p=55 Activation constraints for onset template
Additional onset models → NMF as refinement
Initialized template Initialized activations
NMF-Decomposition
Not
e nu
mbe
r
Freq
uenc
yFr
eque
ncy
Note number
Not
e nu
mbe
r
Time
Org
Model
Learnt templates Learnt activations
Additional onset models → NMF as refinement
Initialized template Initialized activations
Score-Informed Source Separation
1. Split activation matrix
Score-Informed Source Separation
1. Split activation matrix
Score-Informed Source Separation
1. Split activation matrix2. Model spectrogram for left/right
Score-Informed Source Separation
1. Split activation matrix2. Model spectrogram for left/right3. Separation masks for left/right
Score-Informed Source Separation
1. Split activation matrix2. Model spectrogram for left/right3. Separation masks for left/right4. Estimated spectrograms for left/right
Score-Informed Audio Decomposition
Chopin, Waltz Op. 64, No. 1
Original
Application: Separating left and right hands for piano
Score-Informed Audio Decomposition
Chopin, Waltz Op. 64, No. 1
Original
Left/right hand
Right hand
Left hand
Application: Separating left and right hands for piano
Score-Informed Audio DecompositionParameterize audio signal using score’s note events
Score
Audio
Note-based Audio Events
Residual Audio
Score-Informed Audio DecompositionParameterize audio signal using score’s note events
Score
Audio
Note-based Audio Events
Residual Audio
Score-Informed Audio Decomposition
500
580
523
Freq
uenc
y (H
ertz
)
0 10.5Time (seconds)
9876
1600
1200
800
400
9876
1600
1200
800
400
500
580
554
Freq
uenc
y (H
ertz
)
0 10.5Time (seconds)
Application: Audio editing
NMF-Decomposition
Idea: Factorization of spectrogram Implicit modeling of signal properties
Advantages: Flexible and easy to implement Efficient
Problems: Decompostion difficult to control Often no semantic meaning
Strategy: Multiplicative update rules allow for introducinghard constraints to control the decomposition
Audio Decomposition
Works reasonable
Audio Decomposition
Much more difficult
Audio Decomposition
Related problems: F0 estimation Melody tracking Human voice Vibrato
Much more difficult
Melody Tracking
Justin Salamon and Emilia Gómez:Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE-TASLP 2012
F0 estimation Voice detection Pitch contour creation Melody selection
F0 Estimation
100
200
400
800
1600
19 2520 21 22 23 24
Time (seconds)
Freq
uenc
y (H
ertz
)
SpectrogramOriginal audio
F0 Estimation
19 2520 21 22 23 24
Time (seconds)
Freq
uenc
y (H
ertz
)
100
200
400
800
1600
Log-spectrogramOriginal audio
F0 Estimation
19 2520 21 22 23 24
Time (seconds)
Freq
uenc
y (H
ertz
)
100
200
400
800
1600
Log-spectrogramOriginal audio
F0 Estimation
19 2520 21 22 23 24
Time (seconds)
Freq
uenc
y (H
ertz
)
100
200
400
800
1600
Salience log-spectrogramOriginal audio
F0 Estimation
19 2520 21 22 23 24
Time (seconds)
Freq
uenc
y (H
ertz
)
100
200
400
800
1600
Score-informed F0
F0 estimation
Original audio
Score-Informed Source SeparationApplication: Voice separation and editing
Original audio
Separated voice
Score-Informed Source SeparationApplication: Voice separation and editing
Amplified vibrato
Separated voice
Original audio
Score-Informed Source SeparationApplication: Voice separation and editing
Pitch shift
Separated voice
Amplified vibrato
Original audio
Audio Mosaicing
Audio MosaicingSource signal: BeesTarget signal: Beatles–Let it be
Mosaic signal: Let it Bee
NMF-Inspired Audio Mosaicing
≈
. =
Non-negative matrix factorization (NMF)
Proposed audio mosaicing approach
≈
.
Non-negative matrix Components Activations
Target’s spectrogram Source’s spectrogram Activations Mosaic’s spectrogram
fixed
learnedfixed
learned
fixed
learned
[Driedger et al. ISMIR 2015]
=
Time source
Freq
uenc
y
Tim
e so
urce
Time targetTime target
Freq
uenc
y
NMF-Inspired Audio Mosaicing
Time target
Freq
uenc
y
Time source
Freq
uenc
y
Freq
uenc
y
Tim
e so
urce
Time targetTime target
. =≈
Spectrogram target
Spectrogram source
SpectrogrammosaicActivation matrix
NMF-Inspired Audio Mosaicing
Time target
Freq
uenc
y
Time source
Freq
uenc
y
Freq
uenc
y
Tim
e so
urce
Time targetTime target
. =≈
Spectrogram target
Spectrogram source
SpectrogrammosaicActivation matrix
Core idea: support the development of sparse diagonal activation structures
Activation matrix
This image cannot currently be displayed.This image cannot currently be displayed.
Iterative updates
Preserve temporal context
NMF-Inspired Audio Mosaicing
Time target
Freq
uenc
y
Time source
Freq
uenc
y
Freq
uenc
y
Tim
e so
urce
Time targetTime target
. =≈
Spectrogram target
Spectrogram source
SpectrogrammosaicActivation matrix
[Driedger et al. ISMIR 2015]NMF-Inspired Audio Mosaicing
Time target
Freq
uenc
y
Time source
Freq
uenc
y
Freq
uenc
y
Tim
e so
urce
Time targetTime target
. =≈
Spectrogram target
Spectrogram source
SpectrogrammosaicActivation matrix
[Driedger et al. ISMIR 2015]
NMF with Additional Update Rules
Time target
Tim
e so
urce
Time target
Tim
e so
urce
Time target
Tim
e so
urce
Time target
Tim
e so
urce
Activation matrix Repetition-restrictedactivation matrix
Neighborhoodconstraints
Polyphony-restrictedactivation matrix
Columnconstraints
Continuity-enhancedactivation matrix
Diagonal smoothing
Constraints are enforced by additional update rules Additional rules are interleaved with standard NMF update rules Soft alternative to NMFD
NMF with Additional Update Rules
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20195
6
7
8
9
10
11
12∙
Iteration number ℓ
Kul
lbac
k-Le
ible
rdiv
erge
nce
104
|| ℓ || ℓ || ℓ || ℓ|| ℓ
Kullback-Leibler Divergence between Target and Mosaic
Audio MosaicingSource signal: WhalesTarget signal: Chic–Good times
Mosaic signal
Audio MosaicingSource signal: Race carTarget signal: Adele–Rolling in the Deep
Mosaic signal
Motivic Similarity
Beethoven’s Fifth (1st Mov.)
Motivic Similarity
Beethoven’s Fifth (1st Mov.)
Beethoven’s Fifth (3rd Mov.)
Motivic Similarity
Beethoven’s Fifth (1st Mov.)
Beethoven’s Fifth (3rd Mov.)
Beethoven’s Appassionata
Motivic Similarity
Motivic Similarity
B A C H
Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de
Book: Fundamentals of Music Processing
Meinard MüllerFundamentals of Music ProcessingAudio, Analysis, Algorithms, Applications483 p., 249 illus., hardcoverISBN: 978-3-319-21944-8Springer, 2015
Accompanying website: www.music-processing.de