Meinard Müller
International Audio Laboratories
20.03.2017, Berlin MIR Meetup
Music Information Retrieval: When Music Meets Computer Science
Meinard Müller
2001 PhD, Bonn University
2002/2003 Postdoc, Keio University, Japan
2007 Habilitation, Bonn University, “Information Retrieval for Music and Motion”
2007-2012 Senior Researcher, Max-Planck Institut für Informatik, Saarland
2012: Professor, Semantic Audio Processing, Universität Erlangen-Nürnberg
Stefan Balke
Christian Dittmar
Patricio López-Serrano
Christof Weiß
Frank Zalkow
Thomas Prätzlich
Group Members
Book: Fundamentals of Music Processing
Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., 30 illus. in color, hardcover
ISBN: 978-3-319-21944-8
Springer, 2015
Accompanying website: www.music-processing.de
International Audio Laboratories Erlangen
Audio
Audio Coding
Music Processing
Psychoacoustics
3D Audio
Music
Music Information Retrieval
Sheet Music (Image)
CD / MP3 (Audio)
MusicXML (Text)
Music Film (Video)
Dance / Motion (Mocap)
MIDI
Music Literature (Text)
Singing / Voice (Audio)
Music
Music Information Retrieval
Musicology
Library Sciences
User Interfaces
Signal Processing
Machine Learning
Information Retrieval
Piano Roll Representation
Player Piano (1900)
Time
Pitch
J. S. Bach, Fugue in C major
(Well-Tempered Clavier, BWV 846)
Piano Roll Representation (MIDI)
Query:
Goal: Find all occurrences of the query
Piano Roll Representation (MIDI)
Matches:
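A minimal sketch of one way to find all occurrences of a query in a piano-roll representation. It assumes notes are given as (onset, pitch) pairs on a common time grid and that only exact matches up to a shift in time count. The function name `find_occurrences` and the toy data are illustrative assumptions, not taken from the talk.

```python
def find_occurrences(query, database):
    """Return all time shifts t such that every query note (onset + t, pitch)
    is present in the database. Notes are (onset, pitch) pairs."""
    db = set(database)
    q = sorted(query)
    first_onset, first_pitch = q[0]
    shifts = []
    # Candidate shifts: align the first query note with each database note
    # of the same pitch, then verify the remaining query notes.
    for onset, pitch in sorted(db):
        if pitch != first_pitch:
            continue
        t = onset - first_onset
        if all((o + t, p) in db for o, p in q):
            shifts.append(t)
    return shifts


# Toy piano roll: the motif C-D-E (MIDI 60, 62, 64) occurs at onsets 0 and 8.
database = [(0, 60), (1, 62), (2, 64), (4, 67), (8, 60), (9, 62), (10, 64)]
query = [(0, 60), (1, 62), (2, 64)]
matches = find_occurrences(query, database)
```

Matching transposed occurrences would additionally require a shift in the pitch dimension; tolerance to tempo changes or missing notes is outside the scope of this sketch.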
Music Retrieval
Query
Database
Hit
Bernstein (1962) Beethoven, Symphony No. 5
Beethoven, Symphony No. 5: Bernstein (1962) Karajan (1982) Gould (1992)
Beethoven, Symphony No. 9 Beethoven, Symphony No. 3 Haydn Symphony No. 94
Audio-ID
Version-ID
Category-ID
Music Synchronization: Audio-Audio
Beethoven’s Fifth
Time (seconds)
Orchestra (Karajan)
Piano (Scherbakov)
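The slides do not name the alignment algorithm. A common choice for synchronizing two versions of a piece is dynamic time warping (DTW) on feature sequences. The sketch below assumes one-dimensional features and an absolute-difference cost purely for illustration; the function name `dtw_path` and the toy sequences are hypothetical.

```python
def dtw_path(x, y):
    """Align sequences x and y with classic DTW (steps: right, up, diagonal).
    Returns (total cost, warping path as a list of index pairs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j]: minimal accumulated cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover an optimal path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


# The second version plays the same sequence but holds the second value longer.
version_a = [1, 2, 3, 4]
version_b = [1, 2, 2, 2, 3, 4]
total_cost, path = dtw_path(version_a, version_b)
```

Each pair in `path` links a position in one version to the corresponding position in the other, which is the kind of information an interpretation switcher needs to jump between recordings.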
Application: Interpretation Switcher
Music Synchronization: Image-Audio
[Figure: sheet music (image) and recording (audio)]
How to make the data comparable?
Image Processing: Optical Music Recognition
Audio Processing: Fourier Analysis
Application: Score Viewer
Music Processing
Coarse Level
What do different versions have in common?
What makes up a piece of music?
Identify despite differences
Example tasks: Audio Matching, Cover Song Identification
Fine Level
What are the characteristics of a specific version?
What makes music come alive?
Identify the differences
Example tasks: Tempo Estimation, Performance Analysis
Schumann: Träumerei
Performance Analysis
Time (seconds)
Score (reference):
Performance:
Strategy: Compute score-audio synchronization and derive tempo curve
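Assuming the synchronization result is available as pairs of beat positions in the score and their times in the performance, a tempo curve can be derived as the local ratio of musical time to physical time. The function `tempo_curve` and the example values below are hypothetical and only illustrate the idea.

```python
def tempo_curve(beat_positions, beat_times):
    """Local tempo in BPM between consecutive aligned beats.
    beat_positions: musical positions in beats (from the score).
    beat_times: corresponding times in seconds (from the performance)."""
    tempi = []
    for k in range(1, len(beat_positions)):
        beats = beat_positions[k] - beat_positions[k - 1]
        seconds = beat_times[k] - beat_times[k - 1]
        tempi.append(60.0 * beats / seconds)
    return tempi


# Hypothetical alignment: the performer slows down towards the end.
score_beats = [0, 1, 2, 3, 4]
performance_times = [0.0, 0.5, 1.0, 1.75, 2.75]
curve = tempo_curve(score_beats, performance_times)
```

In practice the alignment is noisy, so some smoothing over several beats is usually needed before the curve can be interpreted.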
Performance Analysis
Schumann: Träumerei
Score (reference):
Tempo Curves:
[Figure: tempo curves; x-axis: musical time (measures), y-axis: musical tempo (BPM)]
What can be done if no reference is available?
Music Processing
Relative
Given: Several versions
Comparison of extracted parameters
Extraction errors often have no consequence for the final result
Example tasks: Music Synchronization, Genre Classification
Absolute
Given: One version
Direct interpretation of extracted parameters
Extraction errors immediately become evident
Example tasks: Music Transcription, Tempo Estimation
Tempo Estimation and Beat Tracking
Basic task: “Tapping the foot when listening to music”
Example: Queen – Another One Bites The Dust
[Figure: x-axis: time (seconds)]
Example: Happy Birthday to you
Pulse level: Measure
Pulse level: Tactus (beat)
Pulse level: Tatum (temporal atom)
Tempo Estimation and Beat Tracking
Example: Chopin – Mazurka Op. 68-3
Pulse level: Quarter note
Tempo: 50-200 BPM
[Figure: tempo curve; x-axis: time (beats), y-axis: tempo (BPM), from 50 to 200]
Tempo Estimation and Beat Tracking
Which temporal level?
Local tempo deviations
Sparse information (e.g., only note onsets available)
Vague information (e.g., extracted note onsets corrupt)
Tempo Estimation and Beat Tracking
Steps:
1. Spectrogram
2. Log Compression
3. Differentiation
4. Accumulation
5. Normalization
[Figures: spectrogram, compressed spectrogram, and difference spectrogram (x-axis: time (seconds), y-axis: frequency (Hz)); novelty curve with local average (x-axis: time (seconds))]
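The five steps can be sketched with NumPy as follows. Several details are assumptions that the slides do not specify: the Hann window, the frame and hop sizes, the compression constant `gamma`, keeping only positive differences (half-wave rectification), and normalizing by subtracting a moving average and scaling to a maximum of 1.

```python
import numpy as np


def novelty_curve(x, sr, frame=1024, hop=512, gamma=100.0, avg_len=10):
    """Spectral-flux style novelty curve following the five steps."""
    # 1. Spectrogram: magnitude STFT with a Hann window.
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)], axis=1)
    Y = np.abs(np.fft.rfft(frames, axis=0))
    # 2. Log compression.
    Y = np.log(1.0 + gamma * Y)
    # 3. Differentiation along time, keeping only increases (assumption).
    diff = np.diff(Y, axis=1)
    diff[diff < 0] = 0.0
    # 4. Accumulation over frequency.
    nov = diff.sum(axis=0)
    # 5. Normalization: subtract local average, clip at zero, scale to [0, 1].
    kernel = np.ones(2 * avg_len + 1) / (2 * avg_len + 1)
    local_avg = np.convolve(nov, kernel, mode="same")
    nov = np.maximum(nov - local_avg, 0.0)
    if nov.max() > 0:
        nov = nov / nov.max()
    return nov, sr / hop  # curve and its feature rate in Hz


# Synthetic test signal: a click every 0.5 s (120 BPM) in 4 s of silence.
sr = 8000
x = np.zeros(4 * sr)
x[::sr // 2] = 1.0
nov, feature_rate = novelty_curve(x, sr)
```

The returned feature rate (frames per second) is needed to convert frame indices of the novelty curve back to seconds.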
Tempo Estimation and Beat Tracking
[Figures: intensity over tempo (BPM); intensity over time (seconds) and tempo (BPM)]
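One way to obtain an intensity value for each tempo is to correlate a novelty curve with sinusoids of the corresponding frequencies (a Fourier-based analysis). The sketch below does this globally for a single synthetic novelty curve; a time-dependent version would apply the same idea to windowed sections. It is an illustration under these assumptions and is not claimed to be the method behind the plots.

```python
import numpy as np


def tempo_intensity(nov, feature_rate, tempi_bpm):
    """Magnitude of the correlation between a novelty curve and complex
    sinusoids at the given tempi (BPM)."""
    t = np.arange(len(nov)) / feature_rate               # time axis in seconds
    nov = nov - nov.mean()                               # remove the DC offset
    freqs = np.asarray(tempi_bpm, dtype=float) / 60.0    # BPM -> Hz
    # One row per tempo: exp(-2*pi*i*f*t); matrix-vector product = correlation.
    kernels = np.exp(-2j * np.pi * np.outer(freqs, t))
    return np.abs(kernels @ nov) / len(nov)


# Synthetic novelty curve: impulses every 0.5 s (120 BPM), sampled at 100 Hz.
feature_rate = 100.0
nov = np.zeros(1000)
nov[::50] = 1.0
tempi = np.arange(40, 201)                               # candidate tempi in BPM
intensity = tempo_intensity(nov, feature_rate, tempi)
estimated_tempo = int(tempi[np.argmax(intensity)])
```

Restricting the candidate range (here 40 to 200 BPM) also decides which pulse level is reported, which relates to the question of the temporal level raised earlier in the talk.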
Tempo Estimation and Beat Tracking
[Figure: novelty curve and predominant local pulse (PLP); x-axis: time (seconds)]
Tempo Estimation and Beat Tracking
Light effects
Music recommendation
DJ
Audio editing
Why is Music Processing Challenging?
Example: Chopin, Mazurka Op. 63 No. 3
Waveform / Spectrogram
[Figures: waveform (x-axis: time (seconds), y-axis: amplitude); spectrogram (x-axis: time (seconds), y-axis: frequency (Hz))]
Performance
– Tempo
– Dynamics
– Note deviations
– Sustain pedal
Polyphony
– Main melody
– Accompaniment
– Additional melody line
Source Separation
Decomposition of audio stream into different sound sources
Central task in digital signal processing
“Cocktail party effect”
Several input signals
Sources are assumed to be statistically independent
Source Separation (Music)
Main melody, accompaniment, drum track
Instrumental voices
Individual note events
Only mono or stereo
Sources are often highly dependent
Harmonic-Percussive Decomposition
Mixture:
Harmonic component
• Clearly harmonic sounds of singing voice and accompaniment
Percussive component
• Drum hits
• Fricatives & plosives in singing voice
Residual component
• Noise-like sounds
• Vibrato/glissando sounds
Demo: https://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/
Literature: [Driedger/Müller/Disch, ISMIR 2014]
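A rough sketch of a harmonic-percussive-residual split in the spirit of median-filtering based separation with a separation factor. Whether this matches the exact procedure of the cited paper is an assumption; the filter lengths, the factor `beta`, and the function names are illustrative. The idea: harmonic sounds form horizontal structures in the spectrogram, percussive sounds form vertical structures, and bins that are neither clearly one nor the other go to the residual.

```python
import numpy as np


def median_filter_1d(X, length, axis):
    """Median filter of odd length along one axis (edge-padded)."""
    pad = length // 2
    widths = [(0, 0), (0, 0)]
    widths[axis] = (pad, pad)
    Xp = np.pad(X, widths, mode="edge")
    n = X.shape[axis]
    # Stack shifted copies and take the median across them.
    shifted = [np.take(Xp, np.arange(k, k + n), axis=axis)
               for k in range(length)]
    return np.median(np.stack(shifted, axis=0), axis=0)


def hpr_masks(Y, len_time=11, len_freq=11, beta=2.0):
    """Binary masks for harmonic, percussive, and residual components of a
    magnitude spectrogram Y (frequency x time)."""
    Y_h = median_filter_1d(Y, len_time, axis=1)   # smooth along time
    Y_p = median_filter_1d(Y, len_freq, axis=0)   # smooth along frequency
    eps = 1e-10
    M_h = Y_h / (Y_p + eps) > beta                # clearly harmonic bins
    M_p = Y_p / (Y_h + eps) >= beta               # clearly percussive bins
    M_r = ~(M_h | M_p)                            # everything else
    return M_h, M_p, M_r


# Toy spectrogram: one horizontal line (sustained tone), one vertical line
# (click), and weak noise everywhere.
rng = np.random.default_rng(0)
Y = 0.01 * rng.random((64, 64))
Y[20, :] += 1.0   # harmonic: constant frequency over time
Y[:, 40] += 1.0   # percussive: broadband at one instant
M_h, M_p, M_r = hpr_masks(Y)
```

Multiplying each mask with the complex spectrogram and inverting the transform would give the three component signals; with `beta` greater than 1 the harmonic and percussive masks cannot overlap.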
Singing Voice Extraction
Singing voice Accompaniment
Original Recording
Singing Voice Extraction
Original recording
HPR
Harmonic component, percussive component, residual component
MR TR SL
F0 annotation
Harmonic portion (singing voice)
Harmonic portion (accompaniment)
Fricatives (singing voice)
Instrument onsets (accompaniment)
Vibrato & formants (singing voice)
Diffuse instrument sounds (accompaniment)
Estimate (singing voice)
Estimate (accompaniment)
Score-Informed Source Separation
Exploit musical score to support separation process
[Figure: piano-roll panels (x-axis: time, y-axis: pitch) and spectrogram panels (x-axis: time, y-axis: frequency (Hz))]
Parametric Model Approach
Rebuild spectrogram information
[Figure: spectrogram ≈ rendered model; panel labels: Estimate, Parameters, Render; x-axis: time (seconds), y-axis: frequency (Hz)]
NMF (Nonnegative Matrix Factorization)
Magnitude Spectrogram (M × N, ≥ 0) ≈ Templates (M × K, ≥ 0) · Activations (K × N, ≥ 0)
Templates: Pitch + Timbre (“How does it sound”)
Activations: Onset time + Duration (“When does it sound”)
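A minimal NMF sketch with randomly initialized templates and activations. The slides do not state which update rule is used; the multiplicative updates for the Euclidean distance shown here are a standard choice and an assumption, as are the variable names `V`, `W`, `H` and the toy matrices.

```python
import numpy as np


def nmf(V, K, n_iter=500, seed=0):
    """Factorize a nonnegative matrix V (M x N) as W (M x K) times H (K x N)
    using multiplicative updates for the Euclidean distance."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, K)) + 1e-3    # templates: "how does it sound"
    H = rng.random((K, N)) + 1e-3    # activations: "when does it sound"
    eps = 1e-12                      # avoids division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "spectrogram": two spectral templates active at different times.
W_true = np.array([[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.8]])
H_true = np.array([[1.0, 1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]])
V = W_true @ H_true
W, H = nmf(V, K=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the updates only multiply by nonnegative factors, W and H stay nonnegative throughout; the order and scaling of the learned components are arbitrary.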
NMF-Decomposition
Random initialization → No semantic meaning
[Figure: initialized and learnt templates (x-axis: note number, y-axis: frequency); initialized and learnt activations (x-axis: time, y-axis: note number)]
Constrained initialization → NMF as refinement
Template constraint for p=55
Activation constraints for p=55
[Figure: initialized and learnt templates and activations with the same axes; panel labels: Org, Model]
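Why a constrained initialization survives the learning step can be shown in a few lines: with multiplicative updates (an assumption, since the slides do not name the update rule), every entry that starts at zero stays at zero, so NMF only refines the entries the score allows. The constraint regions below are hypothetical toy values, not real harmonic templates or score-derived note regions.

```python
import numpy as np


def constrained_nmf(V, W_init, H_init, n_iter=100):
    """Refine score-informed initializations with multiplicative updates.
    Entries initialized to zero stay zero, so the constraints are preserved."""
    W, H = W_init.copy(), H_init.copy()
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical example with 2 notes, 6 frequency bins, and 8 time frames.
rng = np.random.default_rng(1)
V = rng.random((6, 8))
# Template constraint: note 0 may only use bins 0-2, note 1 only bins 3-5.
W_init = np.zeros((6, 2))
W_init[0:3, 0] = 1.0
W_init[3:6, 1] = 1.0
# Activation constraint: note 0 may sound in frames 0-3, note 1 in frames 4-7.
H_init = np.zeros((2, 8))
H_init[0, 0:4] = 1.0
H_init[1, 4:8] = 1.0
W, H = constrained_nmf(V, W_init, H_init)
```

With this kind of initialization each template column keeps its link to one note, so the learned components retain a musical meaning.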
Score-Informed Audio Decomposition
[Figure: two spectrograms; x-axis: time (seconds), y-axis: frequency (Hertz), ticks from 400 to 1600; frequency labels 500, 523, 580 in one panel and 500, 554, 580 in the other]
Application: Audio editing
Informed Drum-Sound Decomposition
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-IEEE-TASLP-DrumSeparation
Literature: [Dittmar/Müller, IEEE/ACM-TASLP 2016]
Remix:
Loop Decomposition of EDM
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2016-ISMIR-EMLoop
Literature: [López-Serrano/Dittmar/Müller, ISMIR 2016]
Decomposition Patterns Activations
Audio Mosaicing
Source signal: Bees
Target signal: Beatles–Let it be
Mosaic signal: Let it Bee
Demo: https://www.audiolabs-erlangen.de/resources/MIR/2015-ISMIR-LetItBee
Literature: [Driedger/Müller, ISMIR 2015]
NMF-Inspired Audio Mosaicing
Non-negative matrix factorization (NMF):
Non-negative matrix (fixed) ≈ Components (learned) · Activations (learned)
Proposed audio mosaicing approach:
Target’s spectrogram (fixed) ≈ Source’s spectrogram (fixed) · Activations (learned) = Mosaic’s spectrogram
[Figure: spectrogram target (x-axis: time target, y-axis: frequency), spectrogram source (x-axis: time source, y-axis: frequency), activation matrix (x-axis: time target, y-axis: time source), spectrogram mosaic (x-axis: time target, y-axis: frequency)]
Core idea: support the development of sparse diagonal activation structures
Iterative updates
Preserve temporal context
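The basic setup, with the source spectrogram held fixed as the template matrix and only the activations learned, can be sketched as follows. The multiplicative update rule and all names are assumptions for illustration. The additional steps that promote sparse diagonal activation structures are deliberately not implemented here.

```python
import numpy as np


def learn_activations(V_target, W_source, n_iter=200, seed=0):
    """Keep the source spectrogram fixed as templates and learn only the
    activations H such that W_source @ H approximates V_target."""
    rng = np.random.default_rng(seed)
    H = rng.random((W_source.shape[1], V_target.shape[1])) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        # Multiplicative update for H only; W_source is never modified.
        H *= (W_source.T @ V_target) / (W_source.T @ W_source @ H + eps)
    return H


# Toy data: the target is built from source frames 2, 0, 1 in that order.
W_source = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [0.2, 0.2, 0.2]])
V_target = W_source[:, [2, 0, 1]]
H = learn_activations(V_target, W_source)
V_mosaic = W_source @ H
best_source_frame = H.argmax(axis=0)
```

Each column of `H` says which source frames are combined to imitate one target frame. Without the diagonal-structure step, neighboring target frames may pick unrelated source frames, which is why preserving temporal context matters for the sound of the mosaic.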
Audio Mosaicing
Source signal: Whales
Target signal: Chic–Good times
Mosaic signal
Audio Mosaicing
Source signal: Race car
Target signal: Adele–Rolling in the Deep
Mosaic signal
Motivic Similarity
B A C H
Teaching
Academic training of students
Fundamental research
Summary
Music information retrieval
Audio decomposition techniques
Machine learning
Music applications & musicology
Multimedia scenarios
Web-based interfaces
Book: Fundamentals of Music Processing
Meinard Müller
Fundamentals of Music Processing
Audio, Analysis, Algorithms, Applications
483 p., 249 illus., hardcover
ISBN: 978-3-319-21944-8
Springer, 2015
Accompanying website: www.music-processing.de
MIR-Related Events in Germany
AES Conference on Semantic Audio, 22 – 24 June 2017, Erlangen
GI Jahrestagung (GI annual conference), 25 – 29 September 2017, Chemnitz
Workshop: Musik trifft Informatik (Music Meets Computer Science), 26 September 2017
Tutorial: Musikverarbeitung (Music Processing), 25 September 2017