Informatics and Mathematical Modelling / Intelligent Signal Processing

Kaare Brandt Petersen

Machine Learning on Sound... how hard can it be?

Audio Information Seminar
Thursday, June 8, 2006
Kaare Brandt Petersen

Agenda

Motivation

The reason it might be hard:
- From data and information
- Features

The good news:
- Computer power and machine learning
- Examples

Conclusions

Motivation

What can we do with audio information?

News archive: Find the grumpy voice in a TV broadcast from a busy street in the Middle East. Search in news archives.

Music: 6 billion friends. Navigating the world landscape of music.

Data

Sound as perceived by humans

and by computers

-0.00076293945313 0.00231933593750 -0.00714111328125 0.00772094726563 0.00076293945313 -0.00772094726563 -0.00900268554688 -0.00527954101563 -0.00076293945313 -0.00231933593750 -0.00714111328125 0.00024414062500 0.01312255859375 0.00650024414063 -0.01052856445313 -0.01089477539063 -0.00305175781250 -0.01052856445313 -0.01089477539063 -0.00305175781250

[ Beeps ]

- "There's the television"

[ Music - violins ]

[ Steps ]
- "It's all right there"
- "All right there!"

- "Look. Listen. Kneel. Pray"
- "Commercials!"

[ Male voice - indoor ]

Dialogue / Sound events

12 Monkeys, movie from 1995

Data

Is the data-to-information translation really necessary?

1) Query by signal processing [ humans learn how computers think ]

2) Query by information [ computers learn how humans think ]

3) Query by example [ various approaches ]

"happy jazz"

ZCR < 198

Archive
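
A "query by signal processing" such as the "ZCR < 198" example can be sketched in a few lines. The slide does not state the unit of the threshold, so the sketch below assumes it is the number of sign changes per analysis frame; the frame length, hop size, and the tiny two-clip archive are hypothetical and only there to make the example runnable.

```python
import numpy as np

def zero_crossings_per_frame(signal, frame_len=1024, hop=512):
    """Count sign changes in each frame of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    counts = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        signs = np.signbit(frame)
        # A crossing is any pair of neighbouring samples with different sign.
        counts.append(int(np.count_nonzero(signs[1:] != signs[:-1])))
    return np.array(counts)

def query_archive(archive, max_mean_zcr):
    """Return the names of clips whose mean crossing count is below the threshold."""
    return [name for name, clip in archive.items()
            if zero_crossings_per_frame(clip).mean() < max_mean_zcr]

# Hypothetical archive: one second each of a 100 Hz tone and white noise.
sr = 8000
t = np.arange(sr) / sr
archive = {
    "low_tone": np.sin(2 * np.pi * 100 * t),
    "noise": np.random.default_rng(0).standard_normal(sr),
}
hits = query_archive(archive, max_mean_zcr=198)
print(hits)
```

The query works, but only for a user who already knows what a zero crossing rate of 198 sounds like, which is the point of "humans learn how computers think".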

Data

Going from 5 million real numbers to "Opera"

Bridging the gap: From data to information

Constructing sound features the right way

Information

Meaning

Context

Features

Many short-time features

- Zero crossing rate
- Spectral flatness
- Spectral bandwidth
- Spectral centroids
- Spectral rolloff
- Spectral flux
- Energy
- ...

- Mel Frequency Cepstral Coefficients (MFCC) [Foote97, Rabiner93]
- Real Cepstral Coefficients (RCC)
- Linear Prediction Coefficients (LPC)
- Wavelets
- Gamma-tone filterbanks
- Sone / Bark
- Chroma features
- ...

[Figure: 12 Monkeys sound clip. Panels: Waveform, MFCC 1, MFCC 2-7, Sp-Flatness, Sp-Bandwidth, Sp-Centroid, Chroma]
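
To make a few of the listed short-time features concrete, here is a minimal NumPy sketch that computes spectral centroid, bandwidth, flatness, rolloff and energy per frame. The frame length, hop size, Hann window and the 85% rolloff fraction are assumptions chosen for illustration, not values taken from the slides, and the definitions used are common textbook ones.

```python
import numpy as np

def short_time_spectral_features(signal, sr, frame_len=1024, hop=512,
                                 rolloff_fraction=0.85):
    """Return one row per frame: centroid, bandwidth, flatness, rolloff, energy."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    eps = 1e-12  # avoids log(0) and division by zero on silent frames
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2 + eps
        total = power.sum()
        # Centroid: power-weighted mean frequency.
        centroid = (freqs * power).sum() / total
        # Bandwidth: power-weighted spread around the centroid.
        bandwidth = np.sqrt(((freqs - centroid) ** 2 * power).sum() / total)
        # Flatness: geometric mean over arithmetic mean of the power spectrum.
        flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
        # Rolloff: frequency below which the given fraction of power lies.
        rolloff = freqs[np.searchsorted(np.cumsum(power), rolloff_fraction * total)]
        energy = np.sum(frame ** 2)
        rows.append((centroid, bandwidth, flatness, rolloff, energy))
    return np.array(rows)

# Synthetic test signals (hypothetical): a 1 kHz tone and white noise.
sr = 8000
t = np.arange(sr) / sr
tone_feats = short_time_spectral_features(np.sin(2 * np.pi * 1000 * t), sr)
noise_feats = short_time_spectral_features(
    np.random.default_rng(0).standard_normal(sr), sr)
print(tone_feats[:, 0].mean(), tone_feats[:, 2].mean(), noise_feats[:, 2].mean())
```

A pure tone gives a centroid near its frequency and a flatness close to zero, while noise gives a much flatter spectrum; real sound clips fall somewhere in between and change from frame to frame.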

Features

Aggregating short-time features

Audio clip = data cloud

Distribution of values
- Basic statistics [Wold96]
- Histograms and vector quantization [Foote97]
- Gaussian Mixture Models [Auc02]
- K-means clustering [Logan01]
- Anchors by Neural Networks [Beren03]

Temporal modelling
- SVD of e.g. spectrogram [Gu04]
- AR-coefficients [Meng05]
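
The two families above can be contrasted in a small sketch: basic statistics summarise the data cloud and ignore frame order, while autoregressive (AR) coefficients describe how a feature evolves over time. The sketch below fits a separate AR model to a single feature track by ordinary least squares; this is an assumption made for brevity and is not necessarily the estimator used in [Meng05]. The function names and the synthetic track are hypothetical.

```python
import numpy as np

def basic_statistics(features):
    """Mean and variance of each feature dimension over the clip (order-free)."""
    features = np.asarray(features, dtype=float)
    return np.concatenate([features.mean(axis=0), features.var(axis=0)])

def ar_coefficients(series, order=3):
    """Least-squares fit of x[t] = a1*x[t-1] + ... + ap*x[t-p] + noise."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Column k of the design matrix holds the series delayed by k+1 frames.
    X = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic feature track (hypothetical): a first-order process with a1 = 0.8.
rng = np.random.default_rng(0)
n = 5000
track = np.zeros(n)
for i in range(1, n):
    track[i] = 0.8 * track[i - 1] + rng.standard_normal()

a = ar_coefficients(track, order=1)
stats = basic_statistics(np.column_stack([track, track ** 2]))
print(a, stats)
```

Shuffling the frames of `track` would leave `stats` unchanged but destroy the AR coefficient, which is what temporal modelling adds over the distribution-based summaries.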

Features

What we are trying to do: From data to information

Data:
-0.00076293945313 0.00231933593750 -0.00714111328125 0.00772094726563 0.00076293945313 -0.00772094726563 -0.00900268554688 -0.00527954101563 -0.00076293945313 -0.00231933593750 -0.00714111328125 0.00024414062500 0.01312255859375 0.00650024414063 -0.01052856445313 -0.01089477539063

Low-level features:
ZCR, Spectral, MFCC, Chroma, Sone/Bark, RCC, LPC, ...

High-level features:
Basic stats, GMM, K-means, Anchors, AR coeff, SVD, HMM, ...

Information:
"Rough", "Deep", "Sparky", "Broad", "Melancholic", "Majestic", "Jazz", "Rock", ...

Features

Music similarity example

"Shape of My Heart", Backstreet Boys, 2000

"That's the Way It Is", Celine Dion, 2000

"Cantaloop", Us3, 1993

"The limitations observed in this paper (...) suggests that the usual route to timbre similarity may not be the optimal one" [Auc04]

The bad news

Sound data is far from the information

Not all features are useful

It is not obvious what the information labels should be

The good news

Computer power

Signal processing
- Strong development in signal processing and machine learning in general
- Large amounts of data
- Increased interest in sound and music processing

Example: Genre estimation

Genre estimation by temporal integration

Peter Ahrendt, Anders Meng [Meng05]

Processing: Sound -> MFCC -> AR
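
As an illustration of the first stage of this chain, here is a compact NumPy sketch of MFCC extraction: windowed power spectrum, triangular mel filterbank, logarithm, then a DCT. It follows a common textbook recipe; the frame length, hop, number of filters, number of coefficients and the mel formula are assumptions, and the exact settings of [Meng05] are not given on the slide. The AR stage that would follow is not shown here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, frame_len, sr):
    """Triangular filters with centres equally spaced on the mel scale."""
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sr, frame_len=512, hop=256, n_filters=20, n_coeffs=7):
    """Return an array of shape (n_frames, n_coeffs)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sr)
    # DCT-II basis (unnormalised) applied to the log mel energies.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        log_mel = np.log(bank @ power + 1e-10)  # small offset avoids log(0)
        frames.append(dct @ log_mel)
    return np.array(frames)

# Synthetic input (hypothetical): one second of white noise at 8 kHz.
sr = 8000
coeffs = mfcc(np.random.default_rng(0).standard_normal(sr), sr)
print(coeffs.shape)
```

Each column of `coeffs` is one MFCC track over time, which is the kind of sequence a temporal-integration model would then summarise.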

Example: Genre estimation

Genre estimation by temporal integration + kernel methods

Jeronimo Arenas-Garcia, Tue Lehn-Schiøler, Kaare Brandt Petersen [ArGa06]

Processing: Sound -> MFCC -> AR -> KOPLS

Btw: A data harvesting tool coming up - ISMIR 2006

Example: Source separation

Spectrogram modelling with sparse NTF2D

Morten Mørup, Mikkel Schmidt [Mørup06]

W = time-frequency patterns
H = time, amplitude, pitch

[Figure: spectrograms over Time [s]. Panels: Original (mixed); Separated sources (Harp), (Flute)]
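
The idea of factorising a spectrogram into patterns W and activations H can be illustrated with plain non-negative matrix factorisation, V ≈ WH. This is a deliberately simpler technique than the sparse NTF2D model of [Mørup06], which additionally models shifts in time and pitch; the sketch uses the standard multiplicative updates for the squared-error cost, and the synthetic two-source "spectrogram", the rank, the iteration count and the seed are all hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Factorise a non-negative matrix V (freq x time) as W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + 1e-3  # spectral patterns
    H = rng.random((rank, n_time)) + 1e-3  # activations over time
    eps = 1e-12
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic mixture (hypothetical): two sources with different frequency
# patterns that overlap in time.
n_freq, n_time = 30, 40
W_true = np.zeros((n_freq, 2))
W_true[2:6, 0] = 1.0      # source 1 occupies low bins
W_true[12:20, 1] = 1.0    # source 2 occupies higher bins
H_true = np.zeros((2, n_time))
H_true[0, :25] = 1.0      # source 1 active early
H_true[1, 15:] = 0.5      # source 2 active late
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
# Each separated source is the outer product of one pattern and its activation.
sources = [np.outer(W[:, k], H[k]) for k in range(2)]
print(rel_err)
```

Each entry of `sources` plays the role of one "separated source" panel: it is a spectrogram-shaped matrix that contains only the part of the mixture explained by a single component.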

Example: CNN

Translating a CNN news broadcast

Kasper Jørgensen, Lasse Mølgaard, Lars Kai Hansen [Jorg06]

Music or speech?
Sound -> MFCC, STE, SpF, ZCR -> mean/var

Speaker change detection
Sound -> MFCC -> VQ

Speech recognition
Sphinx 4 (Carnegie Mellon)

Conclusions

It is hard:
- Sound data is far from the information
- Good features are hard to find

but machine learning is catching up:
- Examples: Genre, Source separation, CNN-translation

References

[Wold96] Wold, E.; Blum, T.; Keislar, D. & Wheaton, J. "Content-based Classification, Search, and Retrieval of Audio", IEEE Multimedia, 1996, 3, 27-36

[Foote97] Foote, J. "Content-based retrieval of music and audio", Multimedia Storage and Archiving Systems II, Proc. of SPIE, 1997, 3229, 138-147

[Logan01] Logan and Salomon, "A music similarity function based on signal analysis", ICME 2001

[Beren03] Berenzweig, Ellis and Lawrence, "Anchor space for classification and similarity measurement of music", ICME 2003

[Rabiner93] Rabiner, L. & Juang, B.H. "Fundamentals of Speech Recognition", Prentice-Hall, 1993

[Gu04] Gu, Lu, Cai and Zhang, "Dominant Feature vector based audio similarity measure", Proceedings of the Pacific Rim Conference on Multimedia, PCM, 2004

[Tza02] Tzanetakis and Cook, "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, 2002, 10, 293-302

[Auc02] Aucouturier and Pachet, "Music Similarity Measures: What's the Use?", ISMIR 2002

[Meng05] Anders Meng, Peter Ahrendt and Jan Larsen, "Improving Music Genre Classification by Short-Time Feature Integration", ICASSP, 2005

[Auc04] Aucouturier and Pachet, "Improving Timbre Similarity: How high is the sky?", JNRSAS, 2004

[Mørup06] "Sparse Non-negative Tensor Factor Double Deconvolution (SNTF2D) for multi channel time-frequency analysis", submitted to JMLR 2006

[ArGa06] "Reduced Kernel Orthonormal Partial Least Squares", submitted for NIPS 2006

[Jorg06] Kasper Jørgensen, Lasse Mølgaard, Lars Kai Hansen, "Unsupervised speaker change detection for broadcast news segmentation", EUSIPCO 2006
