Centre for Computational Creativity Semantic Audio Studio Tools and Techniques using MPEG-7 Dr. Michael Casey Centre for Computational Creativity Department of Computing City University, London

Centre for Computational Creativity

Semantic Audio

Studio Tools and Techniques

using MPEG-7

Dr. Michael Casey

Centre for Computational Creativity

Department of Computing

City University, London

Overview

• MPEG-7 Tools
  • Low Level Audio Descriptors
  • Statistical Sound Models (Semantic?)

• Music Unmixing
  • Independent Spectrogram Separation

• Sound Classification
  • Automatic label extraction
  • “Semantic” processing

• Segment Similarity, Structure Extraction, Musaics
  • S-Matrix (Self-Similarity Matrix)
  • C-Matrix (Cross-Similarity Matrix)
  • Segment Replacement
  • Musaics

Semantic Audio Analysis

[Diagram labels: Acoustic Features; Extraction; Semantic Audio Description]

MPEG-7 Audio Descriptors: Header

MPEG-7 Audio Descriptors: Segments

MPEG-7 Audio Descriptors: Descriptor

Some Useful Descriptors for Music Processing

• AudioSpectrumEnvelopeD

• AudioSpectrumBasisD

• AudioSpectrumProjectionD

• SoundModelDS

• SoundModelStatePathD

• SoundModelStateHistogramD

EXAMPLE 1

MUSIC UNMIXING

AudioSpectrumBasisD

[Diagram labels: SVD / ICA Basis Rotation; AudioSpectrumProjectionD; AudioSpectrumBasisD]

AudioSpectrumProjectionD

[Diagram labels: SVD / ICA Basis Rotation; AudioSpectrumProjectionD; AudioSpectrumBasisD]

Outer Product Spectrum Reconstruction: Individual Basis Component

4 Component Reconstruction

10 Component Reconstruction

Music Unmixing

• Linear basis projection using SVD and ICA
  • spectrum subspace separation
  • fast computation of subspace ICA
  • full-rate filterbank masking

• Blocked ICA functions
  • subspace reconstruction Y_j = X V_j V_j^+
  • cluster subspaces to identify “tracks”
  • sum masked filterbank output to create audio
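The subspace reconstruction step on this slide can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the author's implementation: the spectrogram X is a synthetic random matrix (time frames by frequency bins), the basis comes from a plain SVD with no ICA rotation, the reconstruction is read as Y = X V V^+ with + taken to mean the pseudo-inverse, and the function names `svd_basis` and `subspace_layer` are hypothetical.

```python
import numpy as np

def svd_basis(X, k):
    """Return the first k right singular vectors of X as columns (bins x k)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def subspace_layer(X, V):
    """Reconstruct the spectrogram layer Y = X V V^+ for the basis block V."""
    return X @ V @ np.linalg.pinv(V)

rng = np.random.default_rng(0)
X = rng.random((200, 64))            # synthetic stand-in for a spectrogram

V = svd_basis(X, 10)

# One layer per basis vector: each is an outer product of a time
# function (X v) and a spectral basis function (v^+).
layers = [subspace_layer(X, V[:, [j]]) for j in range(10)]

Y4 = subspace_layer(X, V[:, :4])     # 4 component reconstruction
Y10 = subspace_layer(X, V)           # 10 component reconstruction

err4 = np.linalg.norm(X - Y4)
err10 = np.linalg.norm(X - Y10)
```

Because the SVD basis vectors are orthonormal, the ten single-component layers add up to the 10 component reconstruction, and the residual shrinks as components are added. Grouping the layers into "tracks" would need a clustering step that is not shown here.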

[Figure: Subspace Extraction. Panels: 1 Component; 4 Components; 10 Components. Labels: Mixture Spectrogram; Independent Spectrogram Subspace Layers; Spectral Basis; Time Function; Spectrogram Layer]

Music Unmixing Example (Pink Floyd: mono -> 9 subspace tracks)

EXAMPLE 2

AUTOMATIC AUDIO CLASSIFICATION

Sound Model DS and related descriptors

[Diagram: AudioSpectrumEnvelopeD x AudioSpectrumBasisD gives AudioSpectrumProjectionD; a four-state ContinuousHiddenMarkovModelDS (states 1 to 4, transitions T(i,j)) gives the SoundModelStatePathD, e.g. 1 3 3 2 2 3 4 4 4 4 ...]
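A state path such as the one shown on this slide is what Viterbi decoding of a hidden Markov model produces. The following is a minimal sketch, assuming unit-variance Gaussian emissions and toy one-dimensional features; the model parameters, the observation values, and the function name `viterbi_path` are all illustrative assumptions, not values taken from the MPEG-7 tools.

```python
import numpy as np

def viterbi_path(obs, means, log_T, log_pi):
    """Most likely state sequence under a unit-variance Gaussian HMM.

    obs: (frames, dims); means: (states, dims);
    log_T[i, j]: log probability of moving from state i to state j.
    """
    # Log emission likelihood, up to a constant shared by all states.
    diff = obs[:, None, :] - means[None, :, :]
    log_b = -0.5 * np.sum(diff ** 2, axis=2)           # (frames, states)

    n_frames, n_states = log_b.shape
    delta = np.empty((n_frames, n_states))
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_pi + log_b[0]
    for t in range(1, n_frames):
        scores = delta[t - 1][:, None] + log_T         # scores[i, j]
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(n_states)] + log_b[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(n_frames - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path + 1                                    # 1-based state indices

# Toy four-state model: well separated means, "sticky" transitions.
means = np.array([[0.0], [10.0], [20.0], [30.0]])
T = np.full((4, 4), 0.1)
np.fill_diagonal(T, 0.7)
log_T = np.log(T)
log_pi = np.log(np.full(4, 0.25))

obs = np.array([[0.0], [20.0], [20.0], [10.0], [10.0],
                [20.0], [30.0], [30.0], [30.0], [30.0]])
path = viterbi_path(obs, means, log_T, log_pi)
# path → [1, 3, 3, 2, 2, 3, 4, 4, 4, 4]
```

In this toy setup the observations sit exactly on the state means, so the decoded path reproduces the example sequence from the slide.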

Sound Recognition using HMMs

[Diagram labels: SOUND FILE; FEATURE EXTRACT; ISA Basis; STORED HMMs AND BASES; HMM 1, HMM 2, HMM N-1, HMM N; Trained HMMs; Maximum Likelihood; MODEL + STATES; HMM StatePathD; SOUND INDEXES; MP7 DATABASE; Sound Database; SoundClassifier DS; HEADER: SoundModel DS]
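The maximum-likelihood selection among N trained HMMs can be sketched as follows. This is a minimal sketch under stated assumptions: each model is a unit-variance Gaussian HMM with uniform transitions, scored with the forward algorithm in the log domain; the class names are borrowed from the taxonomy labels, while the parameters and the helper names `log_likelihood`, `classify`, and `toy_model` are hypothetical.

```python
import numpy as np

def log_likelihood(obs, means, log_T, log_pi):
    """Forward-algorithm log-likelihood of obs under a unit-variance
    Gaussian HMM, up to a constant that is the same for every model."""
    diff = obs[:, None, :] - means[None, :, :]
    log_b = -0.5 * np.sum(diff ** 2, axis=2)
    alpha = log_pi + log_b[0]
    for t in range(1, len(obs)):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_T, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def classify(obs, models):
    """Return the label of the best scoring model and all scores."""
    scores = {name: log_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores

def toy_model(means):
    """Build (means, log_T, log_pi) with uniform transitions and priors."""
    n = len(means)
    return (np.asarray(means, dtype=float),
            np.log(np.full((n, n), 1.0 / n)),
            np.log(np.full(n, 1.0 / n)))

models = {
    "Violin": toy_model([[0.0, 0.0], [1.0, 1.0]]),
    "Trumpet": toy_model([[5.0, 5.0], [6.0, 4.0]]),
}

rng = np.random.default_rng(1)
obs = np.array([[5.0, 5.0]] * 10 + [[6.0, 4.0]] * 10)
obs = obs + 0.1 * rng.standard_normal(obs.shape)

label, scores = classify(obs, models)
```

Dropping the shared Gaussian normalising constant is safe here only because every toy model has the same feature dimension and variance; models with their own covariances would need the full density.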

MPEG-7: Intelligent Music Browsing

Music Genre Classification:

Class Name       Num of Files   Num Segments
 1) Blues              79             86
 2) hiphop             15            129
 3) Gospel             23             25
 4) Country            27             28
 5) DrumNBass          26            275
 6) Classical           8            156
 7) 2Step              39            311
 8) Merengue           34            304
 9) Reggae             80            398
10) Salsa              39            425
--------------------------------------------
Totals                370           2137

Semantic Audio: General Sound Taxonomy

[Figure: tree rooted at General Taxonomy. Node labels: Strings; Violin; Music; Brass; Trumpet; Cello; Foley; Laughter; Percussion; WoodWinds; Piano; Alto Flute; Shoes; FootStep; Squeak; GunShots; Explosions; Impacts; Animals; Dog Barks; Birds; Applause; Glass Smash; Telephones; Guitar; Speech; Male; Female; English Horn]

DS: General Audio Classification

EXAMPLE 3

STRUCTURE EXTRACTION

Structure Discovery

Acoustic Features

State-Space Models

Hierarchical Structure Discovery

SoundModelStatePathD

State Path

A simplified representation of spectral dynamics

SoundModelStateHistogramD

[Figure: two panels. Axes: state index; seconds; 0.01s Frames]
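A state histogram summarises how often each state occurs in a state path. The sketch below assumes the histogram is stored as relative frequencies and that states are numbered from 1; the function names and the window and hop sizes are illustrative assumptions.

```python
import numpy as np

def state_histogram(path, n_states):
    """Relative frequency of each state (1-based indices) in a state path."""
    counts = np.bincount(np.asarray(path) - 1, minlength=n_states)
    return counts / counts.sum()

def windowed_histograms(path, n_states, win, hop):
    """One histogram per window of `win` frames, advancing by `hop` frames."""
    path = np.asarray(path)
    starts = range(0, len(path) - win + 1, hop)
    return np.array([state_histogram(path[s:s + win], n_states)
                     for s in starts])

path = [1, 3, 3, 2, 2, 3, 4, 4, 4, 4]
hist = state_histogram(path, 4)
# hist → [0.1, 0.2, 0.3, 0.4]

segments = windowed_histograms(path, 4, win=5, hop=5)
# segments → [[0.2, 0.4, 0.4, 0.0],
#             [0.0, 0.0, 0.2, 0.8]]
```

With 0.01 s frames, a window of a few hundred frames would give one histogram per few seconds of audio; the right window length depends on the material.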

High-Level Structure Discovery

S-Matrix
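An S-Matrix (self-similarity matrix) compares every segment of a song with every other segment of the same song. The following is a minimal sketch, assuming each segment is represented by a state histogram and that cosine similarity is the measure; the slides do not state which measure was used, and the histogram values below are made up for illustration.

```python
import numpy as np

def self_similarity(H):
    """Cosine similarity between every pair of rows of H (segments x states)."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    U = H / np.maximum(norms, 1e-12)   # guard against empty histograms
    return U @ U.T

# Hypothetical segment histograms for an A B A B song form.
H = np.array([
    [0.7, 0.2, 0.1, 0.0],   # segment 0
    [0.1, 0.1, 0.2, 0.6],   # segment 1
    [0.7, 0.2, 0.1, 0.0],   # segment 2: same material as segment 0
    [0.1, 0.1, 0.2, 0.6],   # segment 3: same material as segment 1
])
S = self_similarity(H)
```

Repeated sections show up as high off-diagonal values (here S[0, 2] and S[1, 3]), which is the pattern a structure extraction step would look for.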

STRUCTURE EXTRACTION == SEGMENTATION

Structure Discovery

Low level features

High-level Structure

Acoustic Features

State-Space Models

Hierarchical Structure Discovery

High-Level Structure Discovery

Alanis Morissette

[Figure panels: Human Segmentation; Machine Segmentation]

High-Level Structure Discovery

Cranberries

[Figure panels: Human Segmentation; Machine Segmentation]

High-Level Structure Discovery

Nirvana

[Figure panels: Human Segmentation; Machine Segmentation]

EXAMPLE 4

MUSAICS

Musaics (Music Mosaics)

• C-Matrix: Cross-Song Similarity Matrix
  • Outer product of target and source histograms

• Find segments similar to target segment
  • Similarity between all target and database segments
  • SORT columns of similarity matrix

• Replace segments with similar material
  • Segmentation boundaries (beat alignment)
  • Replace with “best fit” using DTW on most similar segments

• EXAMPLES
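The matching step above can be sketched as a matrix product followed by a column sort. This is a minimal sketch under stated assumptions: the "outer product of target and source histograms" is read as the product of the two histogram matrices, so each entry is an inner product of one source and one target histogram; the histogram values and the function names are hypothetical.

```python
import numpy as np

def cross_similarity(source_H, target_H):
    """C[i, j] = inner product of source segment i and target segment j."""
    return source_H @ target_H.T

def rank_sources(C):
    """For each target segment (column), source indices sorted from
    most similar to least similar."""
    return np.argsort(-C, axis=0, kind="stable")

# Hypothetical state histograms: 2 target segments, 3 source segments.
target_H = np.array([[0.8, 0.2, 0.0],
                     [0.0, 0.1, 0.9]])
source_H = np.array([[0.1, 0.1, 0.8],
                     [0.9, 0.1, 0.0],
                     [0.3, 0.4, 0.3]])

C = cross_similarity(source_H, target_H)   # shape (sources, targets)
ranking = rank_sources(C)
best = ranking[0]                          # best source for each target
# best → [1, 0]
```

Keeping the full ranking, rather than only the best match, leaves a shortlist of the most similar segments for a later alignment step to choose from.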

Musaics

[Diagram labels: Target Extract; MPEG-7 Database; StatePath Histograms; Segment; Beats; Match; Replace; Musaic]

Musaics

• New Content by Similarity Replacement
  • C-Matrix: Cross-Song Similarity Map
  • 1 Target, Many Sources

• Constraints
  • Preserve Rhythm by Beat Tracking
  • Preserve Beats by DTW alignment

• Bigger Source Database == Better
  • Greater Number of Accurate Matches
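Dynamic time warping (DTW) scores how well two sequences match when one is allowed to stretch or compress in time. The sketch below is a textbook DTW with Euclidean local cost, used to pick a "best fit" from a shortlist of candidates; the scalar toy sequences and the function names are illustrative assumptions, and the slides do not say which features were aligned.

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic time warping cost between sequences a and b.

    Each sequence is a list of scalars or of equal-length feature vectors.
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def best_fit(target, candidates):
    """Index of the candidate with the lowest DTW cost, plus all costs."""
    costs = [dtw_cost(target, c) for c in candidates]
    return int(np.argmin(costs)), costs

target = [0, 1, 2, 3, 2, 1, 0]
candidates = [
    [3, 3, 3, 3, 3],                # flat
    [0, 0, 1, 2, 3, 3, 2, 1, 0],    # same contour, stretched in time
    [3, 2, 1, 0, 1, 2, 3],          # inverted contour
]
index, costs = best_fit(target, candidates)
# index → 1, with costs[1] equal to 0.0
```

The stretched candidate wins with zero cost because warping absorbs the repeated frames, which is the property that makes DTW useful when replacement segments differ slightly in length from the target.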

Acknowledgements

• International Standards Organisation
  • ISO/IEC JTC 1 SC29 WG11 (MPEG)

• Mitsubishi Electric Research Labs

• Massachusetts Institute of Technology
  • Music Mind Machine Group (formerly Machine Listening Group)
  • Paris Smaragdis, Youngmoo Kim, Brian Whitman
  • Iroro Orife, John Hershey, Alex Westner, Kevin Wilson

• City University
  • Department of Computing
  • Centre for Computational Creativity