Bochen Li Advisor: Zhiyao Duan - University of …zduan/teaching/ece477/lectures...More music video...

Audio-visual Analysis of Music Performance Bochen Li Advisor: Zhiyao Duan Audio Information Research Lab University of Rochester


Page 1:

Audio-visual Analysis of Music Performance

Bochen Li

Advisor: Zhiyao Duan

Audio Information Research Lab

University of Rochester

Page 2:

Background

Music is a multi-modal art form

Page 3:

Background

• Visual component is not a marginal phenomenon in music perception, but an important factor in the communication of meanings [Platz’12]

• Visually perceived elements of performances (gesture, motion, facial expressions of the performer) affect the evaluations of judges [Tsay’13]

Music is a multi-modal art form

Page 4:

Background

Music video streaming services are increasingly common nowadays

Page 5:

Background

Multi-modal Music Information Retrieval (MIR)

• Instrument Recognition

• Playing Activity Detection

• Polyphonic Music Analysis

• Fingering Estimation

• Conductor Following

• Cross-modal Music Generation/Retrieval

Page 6:

Background

State of the art

Tasks Percussion Piano Guitar Strings Wind Singing

Fingering N/A ✔ ✔ ✔ N/A

Association ✔

Play/Non-play ✔ ✔ ✔

Onset ✔ ✔

Vibrato N/A N/A ✔

Transcription ✔ ✔ ✔ ✔

Separation ✔

Page 7:

Overview

Research Topics

Case study 1: Multi-modal Source Association

- Body motion [Li’17a], finger motion, vibrato motion [Li’17b]

Case study 2: Performance Expressiveness Analysis

- Vibrato analysis [Li’17c]

- Visual performance generation [Li’18]

Case study 3: Visually Informed Music Transcription

- Guitar transcription [Paleari’08]

- Piano transcription [Akbari’15]

- Violin transcription [Zhang’07]

- Multi-pitch analysis for strings [Dinesh’17]

Case study 4: Visually Informed Audio Source Separation

- Motion-driven [Parekh’17]

- Cross-modal deep representations [Zhao’18]

Page 8:

Case Study 1: Multi-modal Source Association

• Concept

• System 1: Body Motion Analysis [Li’17a]

• System 2: Finger Motion Analysis

• System 3: Vibrato Motion Analysis [Li’17b]

• Integrated System

Page 9:

Case Study 1: Multi-modal Source Association

Concept

Input Output

Video

Audio

Score

Page 10:

Case Study 1: Multi-modal Source Association

Concept

[Figure: diagram linking Video Performance, Score Tracks, and Audio Tracks. Link labels: "Track associated, temporally not aligned"; "Player/track not associated, temporally aligned"; "Player/track not associated, temporally not aligned".]

• Alignment: mapping of event sequence in temporal domain

• Association: bijection between tracks/players in ensemble

Audio tracks are separated using score-informed techniques
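
To make the alignment notion concrete: one common way to compute a temporal mapping between two event sequences is dynamic time warping (DTW). The slides do not say which alignment method is used, so this is only an illustrative sketch; the function name `dtw_align`, the absolute-difference cost, and the toy pitch sequences are all assumptions.

```python
import math


def dtw_align(a, b):
    """Align two numeric sequences with dynamic time warping.

    Returns (total_cost, path); path is a list of index pairs (i, j)
    meaning a[i] is matched to b[j].
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the last cell to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s[0]][s[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path


# Toy example: a three-note score line against a performance that holds
# the middle note for two frames (values are arbitrary pitch labels).
score_pitches = [1, 2, 3]
audio_pitches = [1, 2, 2, 3]
cost, path = dtw_align(score_pitches, audio_pitches)
```

Association is a different problem: it matches whole tracks to players rather than events to time points, so a warping path like this one does not answer it.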

Page 11:

Case Study 1: Multi-modal Source Association

Concept

Traditional MIR task

• Alignment: mapping of event sequence in temporal domain

• Association: bijection between tracks/players in ensemble

[Figure: diagram linking Video Performance, Score Tracks, and Audio Tracks. Link labels: "Track associated, temporally not aligned"; "Player/track not associated, temporally aligned"; "Player/track not associated, temporally not aligned".]

Audio tracks are separated using score-informed techniques

Page 12:

Case Study 1: Multi-modal Source Association

Concept

• Alignment: mapping of event sequence in temporal domain

• Association: bijection between tracks/players in ensemble

Traditional MIR task

[Figure: diagram linking Video Performance, Score Tracks, and Audio Tracks. Link labels: "Track associated, temporally aligned"; "Player/track not associated, temporally aligned" (on the two remaining links).]

Audio tracks are separated using score-informed techniques

Proposed Work

Page 13:

Case Study 1: Multi-modal Source Association

System 1: Body Motion Analysis

Overview

• Designed for string instrumentalists

• Bow strokes correspond to tone onsets

[Figure: system diagram. Inputs: Video, Audio, MIDI. Blocks: Motion Analysis, Audio-score Alignment, Source Separation, Association Optimization. Players A, B, C are associated with Sources 1, 2, 3.]

Page 14:

Case Study 1: Multi-modal Source Association

Visual feature extraction

• Method 1: Optical flow estimation

- Pixel-level motion velocity

- Calculated between adjacent frames
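
As an illustration of estimating motion velocity between two adjacent frames, the sketch below solves the Lucas-Kanade least-squares system for a single patch. The slide names only "optical flow estimation", so Lucas-Kanade is a swapped-in technique chosen for brevity, not necessarily the method used in the cited work. The function name `patch_flow` and the synthetic frames are assumptions.

```python
def patch_flow(prev, curr):
    """Estimate one (u, v) velocity, in pixels per frame, for a whole patch.

    prev and curr are equally sized 2-D lists of gray levels from two
    adjacent frames. Border pixels are skipped because central differences
    need both neighbours.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return 0.0, 0.0  # texture too weak to determine the motion
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = [-sxt, -syt].
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


# Synthetic test pattern: a smooth bowl shifted one pixel to the right.
prev_frame = [[(x - 5) ** 2 + (y - 5) ** 2 for x in range(11)] for y in range(11)]
curr_frame = [[(x - 6) ** 2 + (y - 5) ** 2 for x in range(11)] for y in range(11)]
u, v = patch_flow(prev_frame, curr_frame)
```

A dense, pixel-level flow field would repeat this computation in a small window around every pixel.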

System 1: Body Motion Analysis

Page 15:

Case Study 1: Multi-modal Source Association

Visual feature extraction

• Method 2: Human Pose Estimation

- Semantic representation

- Frame-wise estimation without tracking

- Less computation cost

System 1: Body Motion Analysis

Page 16:

Case Study 1: Multi-modal Source Association

Onset Likelihood Estimation

System 1: Body Motion Analysis

Page 17:

Case Study 1: Multi-modal Source Association

Association Optimization

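$$
M = \begin{pmatrix}
M_{1,1} & M_{2,1} & M_{3,1} & M_{4,1} \\
M_{1,2} & M_{2,2} & M_{3,2} & M_{4,2} \\
M_{1,3} & M_{2,3} & M_{3,3} & M_{4,3} \\
M_{1,4} & M_{2,4} & M_{3,4} & M_{4,4}
\end{pmatrix}
$$

M: pair-wise correspondence score

σ: permutation function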

System 1: Body Motion Analysis
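
The role of the permutation σ can be sketched as a search over all one-to-one assignments of players to tracks, keeping the one with the best total correspondence score. The slide's objective did not survive extraction, so summing the pair-wise scores is an assumption, as are the function name `best_association`, the row/column convention (rows are players, columns are tracks), and the example scores.

```python
from itertools import permutations


def best_association(M):
    """Return (sigma, score) maximizing sum_i M[i][sigma[i]].

    M[i][j] is the correspondence score between player i and track j.
    Brute force over all n! permutations, which is fine for small ensembles.
    """
    n = len(M)
    best_sigma, best_score = None, float("-inf")
    for sigma in permutations(range(n)):
        score = sum(M[i][sigma[i]] for i in range(n))
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma, best_score


# Made-up scores for a trio: player 0 matches track 0, player 1 matches
# track 2, and player 2 matches track 1.
M = [
    [0.9, 0.2, 0.1],
    [0.3, 0.1, 0.8],
    [0.2, 0.7, 0.4],
]
sigma, score = best_association(M)
```

For larger ensembles the same objective is a linear assignment problem and can be solved in polynomial time, for example with the Hungarian algorithm.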

Page 18:

Case Study 1: Multi-modal Source Association

System 2: Finger Motion Analysis

Visual feature extraction

• Method: Optical flow estimation

Page 19:

Case Study 1: Multi-modal Source Association

Onset Likelihood Estimation

System 2: Finger Motion Analysis

Page 20:

Case Study 1: Multi-modal Source Association

System 3: Vibrato Motion Analysis

Visual feature extraction

• Method: Optical flow estimation

Page 21:

Case Study 1: Multi-modal Source Association

Vibrato correspondence

Video: Motion Velocity

Audio: Pitch Trajectory

System 3: Vibrato Motion Analysis
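
One way to turn the vibrato correspondence into a number is to correlate a player's motion signal with a track's pitch trajectory. The slides do not give the actual score, so the sketch below is an assumption throughout: it takes the largest absolute Pearson correlation over a few frame lags, which tolerates the phase offset between a velocity signal and a pitch signal. The names `pearson` and `correspondence_score`, the lag range, and the synthetic 6 Hz signals are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correspondence_score(motion, pitch, max_lag=5):
    """Largest |correlation| between motion and pitch over lags in [-max_lag, max_lag].

    Both inputs are assumed to have the same length and frame rate.
    """
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = motion[lag:], pitch[:len(pitch) - lag]
        else:
            a, b = motion[:lag], pitch[-lag:]
        best = max(best, abs(pearson(a, b)))
    return best


# Synthetic two-second signals at 60 frames per second.
frames = range(120)
pitch = [math.sin(2 * math.pi * 6.0 * i / 60) for i in frames]         # 6 Hz vibrato
motion_same = [math.cos(2 * math.pi * 6.0 * i / 60) for i in frames]   # same rate, phase-shifted
motion_other = [math.cos(2 * math.pi * 4.3 * i / 60) for i in frames]  # different rate
score_same = correspondence_score(motion_same, pitch)
score_other = correspondence_score(motion_other, pitch)
```

Scores like these, computed for every player and track pair, could populate a pair-wise correspondence matrix.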

Page 22:

Case Study 1: Multi-modal Source Association

Integrated System

Overview

• Works for all common instruments in Western chamber music

• Universal framework

Page 23:

Case Study 1: Multi-modal Source Association

Evaluations

• Longer excerpts give a higher chance of retrieving the correct association

• Association accuracy: string > wind/brass

Integrated System

Page 24:

Case Study 2: Performance Expressiveness Analysis

• Vibrato Detection and Analysis [Li’17c]

• Visual Performance Generation [Li’18]

Page 25:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Concept

• Important artistic effect

• Pitch modulation of a note in a periodic fashion

• Characterized by Rate & Extent

Spectrogram

Audio

Vibrato

Non-vibrato

Page 26:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Vibrato Detection

• Note-level vibrato/non-vibrato classification

Vibrato Analysis

• Vibrato rate: speed of pitch variation (1/T Hz)

• Vibrato extent: amount of pitch variation (A cents)

[Figure: two pitch-versus-time plots marking the vibrato period T and the extent A]
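
The two vibrato parameters can be read off a note's pitch contour roughly as follows. This is a minimal sketch under stated assumptions, not the method of the cited work: the rate 1/T is estimated from the spacing of zero crossings of the mean-removed contour, and the extent A is taken as half the peak-to-peak deviation in cents (the slide does not say whether A is half or full peak-to-peak). The function name `vibrato_rate_extent` and the synthetic contour are hypothetical.

```python
import math


def vibrato_rate_extent(pitch_cents, frame_rate):
    """Estimate vibrato rate (Hz) and extent (cents) from one note's pitch contour."""
    mean = sum(pitch_cents) / len(pitch_cents)
    x = [p - mean for p in pitch_cents]

    # Locate zero crossings, linearly interpolated between frames.
    crossings = []
    for i in range(1, len(x)):
        if (x[i - 1] >= 0) != (x[i] >= 0):
            frac = x[i - 1] / (x[i - 1] - x[i])
            crossings.append((i - 1 + frac) / frame_rate)

    if len(crossings) < 2:
        rate = 0.0  # not enough oscillation to measure a period
    else:
        # Consecutive crossings are half a period (T / 2) apart.
        half_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
        rate = 1.0 / (2.0 * half_period)

    extent = (max(x) - min(x)) / 2.0
    return rate, extent


# Synthetic note: 6 Hz vibrato, 30-cent extent, 100 pitch frames per second.
contour = [30.0 * math.sin(2 * math.pi * 6.0 * i / 100 + 0.3) for i in range(100)]
rate, extent = vibrato_rate_extent(contour, frame_rate=100)
```

On real pitch tracks the contour would usually be smoothed first, since noise near the mean creates spurious crossings.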

Page 27:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

[Figure: three panels, each spanning 0 to 1.2 sec. Ground-truth: pitch. Audio-based, polyphonic: pitch and spectrogram. Video-based: hand and hand displacement.]

Page 28:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Motion Feature Extraction

• Hand tracking

• Optical flow estimation

Page 29:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Vibrato Detection

• Each note segment as a sample

• Support Vector Machine (SVM)

[Figure: a note segment over time t yields an 8-D feature, which a classifier labels as vibrato / non-vibrato]

Page 30:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Vibrato Analysis

• Principal Component Analysis

• Vibrato rate: motion rate

Page 31:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Vibrato Analysis

• Principal Component Analysis

• Vibrato rate: motion rate

• Vibrato extent: motion extent rescaled by pitch contour

[Figure: the motion displacement curve X(t) and its motion extent are rescaled using the estimated pitch contour to give the estimated vibrato extent, shown against the ground-truth pitch contour]
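
A rough sketch of the two steps named here: PCA reduces 2-D hand motion to a 1-D displacement curve, and that curve is then rescaled so its extent can be read in cents. How the rescaling is done is not recoverable from the slide, so the least-squares fit to a pitch contour below is an assumption, as are the function names `principal_projection` and `scale_to_pitch` and the synthetic data.

```python
import math


def principal_projection(points):
    """Project 2-D points onto their first principal axis (mean removed)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the largest-variance axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]


def scale_to_pitch(displacement, pitch_cents):
    """Least-squares factor mapping the displacement onto the mean-removed pitch contour."""
    mean_pitch = sum(pitch_cents) / len(pitch_cents)
    num = sum(d * (p - mean_pitch) for d, p in zip(displacement, pitch_cents))
    den = sum(d * d for d in displacement)
    return num / den if den else 0.0


# Synthetic hand motion along the direction (0.6, 0.8), two vibrato cycles.
wave = [math.sin(2 * math.pi * i / 20) for i in range(40)]
hand_xy = [(0.6 * w, 0.8 * w) for w in wave]
pitch_contour = [100.0 + 20.0 * w for w in wave]  # cents, 20-cent extent

x_t = principal_projection(hand_xy)
scale = scale_to_pitch(x_t, pitch_contour)
estimated_extent = abs(scale) * (max(x_t) - min(x_t)) / 2.0
```

The sign of the principal axis is arbitrary, which is why the extent uses the absolute value of the scale factor.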

Page 32:

Case Study 2: Performance Expressiveness Analysis

Vibrato Detection and Analysis

Evaluations

• Outperforms audio-based baseline systems

Page 33:

Case Study 2: Performance Expressiveness Analysis

Visual Performance Generation

Concept

Input: MIDI score

Output: Skeleton movement as pianist

[Figure: input shown as pitch and beat over time; beat labels: downbeat, pick-up, other]

Page 34:

Case Study 2: Performance Expressiveness Analysis

Visual Performance Generation

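[Figure: network diagram. Inputs: MIDI note stream (pitch over time) and metric structure (beat over time), each fed to a CNN, with feature sizes labeled 50d and 10d. An LSTM produces the output body skeleton.]

Method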

• Convolutional Neural Network (CNN)

- Automatic feature extraction from MIDI

• Recurrent Neural Network (RNN)

- Model the temporal coherence

Page 35:

Case Study 2: Performance Expressiveness Analysis

Visual Performance Generation

Evaluations

YouTube Playlist: https://www.youtube.com/playlist?list=PLSf9SKAnNHL1je3Cfsx9xho07sWcvJhGV

Page 36:

Case Study 3: Visually Informed Music Transcription

• Music Transcription for Guitar [Paleari’08]

• Music Transcription for Piano [Akbari’15]

• Music Transcription for Violin [Zhang’07]

• Multi-pitch Analysis for String Ensemble [Dinesh’17]

Page 37:

Case Study 3: Music Transcription

Guitar Music Transcription [Paleari’08]

Method

1. Guitar body localization

2. Fretboard tracking

3. Hand detection

4. Audio-visual information fusion

Evaluations

• 89% correctly detected notes

Page 38:

Case Study 3: Music Transcription

Piano Music Transcription [Akbari’15]

Method

1. Keyboard registration

2. Illumination normalization

3. Pressed key detection

4. Music transcription

Evaluations

• 95% F-measure

Page 39:

Case Study 3: Music Transcription

Violin Music Transcription [Zhang’07]

Method

1. Motion compensation

2. Multiple finger tracking

3. String detection

4. Fingering event detection

5. Note inference

Page 40:

Case Study 3: Music Transcription

Multi-pitch Analysis for String Ensembles [Dinesh’17]

Concept

• Multi-pitch Estimation (MPE)

- Estimate instantaneous pitches and polyphony

• Multi-pitch Streaming (MPS)

- Organize pitches into streams corresponding to individual sources

[Figure: three panels showing the original spectrogram, the MPE result, and the MPS result]

Page 41:

Case Study 3: Music Transcription

Multi-pitch Analysis for String Ensembles

System overview

• Video play/non-play (P/NP) activity detection

• P/NP provides the instantaneous polyphony number

• With P/NP, detected pitches are assigned only to active sources
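
How play/non-play labels can constrain multi-pitch analysis is sketched below. The polyphony count follows directly from the P/NP flags. The assignment rule is only a placeholder assumption: candidates are taken to be ordered by salience, and sources to be ordered from highest to lowest register. All names and data are hypothetical.

```python
def polyphony_from_pnp(pnp):
    """pnp[s][t] is True when source s is playing in frame t.

    Returns the number of active sources in each frame.
    """
    return [sum(1 for flag in frame if flag) for frame in zip(*pnp)]


def assign_to_active(pitches_per_frame, pnp):
    """Give each frame's detected pitches only to sources that are playing.

    pitches_per_frame[t] lists candidate pitches (Hz) for frame t, assumed
    ordered by salience. Inactive sources receive None.
    """
    n_frames = len(pitches_per_frame)
    streams = [[None] * n_frames for _ in pnp]
    for t, candidates in enumerate(pitches_per_frame):
        active = [s for s, flags in enumerate(pnp) if flags[t]]
        kept = candidates[:len(active)]  # polyphony limits how many pitches survive
        # Placeholder rule: highest pitch goes to the first active source.
        for s, pitch in zip(active, sorted(kept, reverse=True)):
            streams[s][t] = pitch
    return streams


# Three sources over three frames.
pnp = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
]
pitches_per_frame = [
    [440.0, 330.0, 131.0, 880.0],  # four candidates, but only three players are active
    [392.0, 147.0],
    [165.0, 98.0],
]
polyphony = polyphony_from_pnp(pnp)
streams = assign_to_active(pitches_per_frame, pnp)
```

In frame 0 the spurious fourth candidate is discarded because only three players are active, which is the kind of error the visual cue is meant to suppress.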

Page 42:

Case Study 3: Music Transcription

Multi-pitch Analysis for String Ensembles

Evaluations

Multi-pitch Estimation

Page 43:

Case Study 3: Music Transcription

Multi-pitch Analysis for String Ensembles

Evaluations

Multi-pitch Streaming

Page 44:

Case Study 4: Visually Informed Source Separation

• Motion-driven [Parekh’17]

• Cross-modal Deep Representation [Zhao’18]

Page 45:

Case Study 4: Source Separation

Motion-driven [Parekh’17]

Overview

• Speed of sound-producing motion relates to the characteristics of the sound event

• Extends Non-negative Matrix Factorization (NMF)

Page 46:

Case Study 4: Source Separation

Motion-driven [Parekh’17]

Overview

• V: audio mixture’s spectrogram

• W: audio dictionary

• M: clustered average motion speeds

• H: activity matrix (shared by video and audio)

• A: regression coefficients for each motion cluster
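
The slide's equation did not survive extraction. One plausible reading of how these symbols fit together, stated purely as an assumption, is that the spectrogram is factorized by NMF while the motion speeds are regressed on the same activations. The transpose and the matrix shapes below are guesses from the symbol descriptions, not taken from the source:

$$
V \approx W H, \qquad M \approx H^{\top} A
$$

Under this reading, sharing H forces the audio activations to also explain the motion, which is how the video can guide the separation.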

Page 47:

Case Study 4: Source Separation

Cross-modal Deep Representation [Zhao’18]

Overview

• Train a two-stream network using a large amount of audio-video data

• Learn cross-modal representation (embedding in aligned space)

• Localize audio source in the video frame

• Separate the sound for each pixel in the video

Page 48:

Case Study 4: Source Separation

Cross-modal Deep Representation [Zhao’18]

Results

• Clustering of sound in space

Page 49:

Case Study 4: Source Separation

Cross-modal Deep Representation [Zhao’18]

Results

• Visualization of corresponding channel activations