Computational Modelling Within Johnson Matthey Technology Centre
Centre for Computational Creativity Semantic Audio Studio Tools and Techniques using MPEG-7 Dr....
-
Upload
edmund-hart -
Category
Documents
-
view
218 -
download
0
Transcript of Centre for Computational Creativity Semantic Audio Studio Tools and Techniques using MPEG-7 Dr....
Centre for Computational Creativity
Semantic Audio
Studio Tools and Techniques
using MPEG-7
Dr. Michael CaseyCentre for Computational Creativity
Department of Computing
City University, London
Centre for Computational Creativity
Overview• MPEG-7 Tools
• Low Level Audio Descriptors• Statistical Sound Models (Semantic ?)
• Music Unmixing• Independent Spectrogram Separation
• Sound Classification• Automatic label extraction• “Semantic” processing
• Segment Similarity, Structure Extraction Musaics• S-Matrix (Self-Similarity Matrix)• C-Matrix (Cross-Similarity Matrix)• Segment Replacement• Musaics
Centre for Computational Creativity
Semantic Audio Analysis
Acoustic Features
Extraction
Semantic Audio Description
Centre for Computational Creativity
Some Useful Descriptors for Music Processing
• AudioSpectrumEnvelopeD
• AudioSpectrumBasisD
• AudioSpectrumProjectionD
• SoundModelDS
• SoundModelStatePathD
• SoundModelStateHistogramD
Centre for Computational Creativity
AudioSpectrumBasisD
SVD / ICABasis Rotation
AudioSpectrumProjectionD
AudioSpectrumBasisD
Centre for Computational Creativity
AudioSpectrumProjectionD
SVD / ICABasis Rotation
AudioSpectrumProjectionD
AudioSpectrumBasisD
Centre for Computational Creativity
Outer Product Spectrum Reconstruction Individual Basis Component
Centre for Computational Creativity
• Linear basis projection using SVD and ICA• spectrum subspace separation • fast computation of subspace ICA• full-rate filterbank masking
• Blocked ICA functions• subspace reconstruction Y = XVV• cluster subspaces to identify “tracks”• sum masked filterbank output to create audio
Music Unmixing
+j j j
Centre for Computational Creativity
1 Component
4 Components
10 Components
SubspaceExtraction
Mixture Spectrogram
Independent Spectrogram Subspace Layers
Spectral Basis
Time Function
Spectrogram Layer
Centre for Computational Creativity
Sound Model DSand related descriptors
1 3 3 2 2 3 4 4 4 4 ...
1
2 3
4
ContinuousHiddenMarkovModelDS
SoundModelStatePathD
AudioSpectrumBasisD
T(i,j)
x
AudioSpectrumEnvelopeD
AudioSpectrumProjectionD
Centre for Computational Creativity
Sound Recognition using HMMs
SOUNDINDEXES
MP7DATABASE
MODEL +STATES
STOREDHMMsAND
BASES
HMM
StatePathD
HMM 2
HMM 1
HMM N-1
HMM N
SOUND FILE FEATUREEXTRACT
ISABasis
SoundClassifier DS
HEADER:SoundModel DS
MaxlimumLikelihood
Trained HMMs
Sound Database
Centre for Computational Creativity
Music Genre Classification:Class Name Num of Files Num Segments 1) Blues 79 862) hiphop 15 1293) Gospel 23 254) Country 27 285) DrumNBass 26 2756) Classical 8 1567) 2Step 39 3118) Merengue 34 3049) Reggae 80 39810) Salsa 39 425-------------------------------------------Totals 370 2137
Centre for Computational Creativity
Semantic Audio:General Sound Taxonomy
Strings
Violin
Music
Brass
Trumpet Cello
Foley
LaughterPercussion
WoodWinds
Piano
Alto FluteShoes
FootStep Squeak
GunShots Explosions
Impacts
Animals
Dog Barks Birds
Applause
GlassSmash
Telephones
General Taxonomy
Guitar
Speech
Male
Female
English Horn
Centre for Computational Creativity
Structure Discovery
Acoustic Features
State-Space Models
Hierarchical Structure Discovery
Centre for Computational Creativity
SoundModelStatePathD
State Path
A simplified representation of spectral dynamics
Centre for Computational Creativity
SoundModelStateHistogramD
seconds
state
index
state
index
0.01s Frames
Centre for Computational Creativity
Structure Discovery
Low level features
High-level Structure
Acoustic Features
State-Space Models
Hierarchical Structure Discovery
Centre for Computational Creativity
Alanis MorrisetteH
um
an
Se
gm
en
tatio
nM
ach
ine
Se
gm
en
tatio
n
High-Level Structure Discovery
Centre for Computational Creativity
CranberriesH
um
an
Se
gm
en
tatio
nM
ach
ine
Se
gm
en
tatio
n
High-Level Structure Discovery
Centre for Computational Creativity
NirvanaH
um
an
Se
gm
en
tatio
nM
ach
ine
Se
gm
en
tatio
n
High-Level Structure Discovery
Centre for Computational Creativity
Musaics (Music Mosaics)
• C-Matrix : Cross-Song Similarity Matrix• Outer product of target and source histograms
• Find segments similar to target segment• Similarity between all target and database segments• SORT columns of similarity matrix
• Replace segments with similar material• Segmentation boundaries (beat alignment)• Replace with “best fit” using DTW on most similar segments
• EXAMPLES
Centre for Computational Creativity
Musaics
Target Extract
MPEG-7Database
StatePathHistograms
Segment
Beats
Match
Replace
Musaic
Centre for Computational Creativity
Musaics
• New Content by Similarity Replacement• C-Matrix: Cross-Song Similarity Map• 1 Target, Many Sources
• Constraints• Preserve Rhythm by Beat Tracking• Preserve Beats by DTW alignment
• Bigger Source Database == Better• Greater Number of Accurate Matches
Centre for Computational Creativity
Acknowledgements
• International Standards Organisation• ISO/IEC JTC 1 SC29 WG11 (MPEG)
• Mitsubishi Electric Research Labs• Massachusetts Institute of Technology
• Music Mind Machine Group (formerly Machine Listening Group)
• Paris Smaragdis, Youngmoo Kim, Brian Whitman• Iroro Orife, John Hershey, Alex Westner, Kevin Wilson
• City University • Department of Computing• Centre for Computational Creativity