Dr. Werner Hemmert, CPR ST 2003-12-02 Page 1 Neuro-IT Roadmap: Successful in the Physical World...

Dr. Werner Hemmert, CPR ST2003-12-02 Page 1

Neuro-IT Roadmap: Successful in the Physical World

• Robust perception

• Image processing

• Speech recognition

• Multimodal human machine interaction

• System integration

• Scene analysis and representation

Dr. Werner Hemmert, CPR ST2003-12-02 Page 2

Automotive: Overtake-Checker and Door-Opener Assistant

Figure (panels a-f), processing stages: contour extraction, motion estimation along contours, temporally stabilized motion segmentation, lane-based transformation, vehicle detection. Further labels: image, lane, vehicles, temporal feedback.

Dr. Axel Techmer, Infineon Technologies

Dr. Werner Hemmert, CPR ST2003-12-02 Page 3

Security: Face Detection & Recognition

Leading-edge approach to face detection (University of Bochum):

• Detection of face regions (a)

• Pre-selection of frontal faces (b)

• Face recognition (c, d)

Methods: elastic graph matching, Gabor wavelet transform

Ruhr University Bochum

Dr. Werner Hemmert, CPR ST2003-12-02 Page 4

Vision Instruction Processor (VIP)

Infineon Technologies, Corporate Research, Systems Technology

Dr. Werner Hemmert, CPR ST2003-12-02 Page 5

Vision Instruction Processor (VIP)

16 parallel processing elements

Prototype available since May 2001:

• SIMD architecture

• 204 instructions

• 10 million logic transistors

• On-chip memory: 37 KB

• Technology: 0.35 µm

• Clock: 100 MHz

• Power consumption: 100 µW/MOPS

• Die size: 22 mm x 23 mm

• Peak performance: 53 GOPS

In 0.13 µm CMOS technology:

• Clock: 200 MHz

• Peak performance: 106 GOPS

• Die size: 70 mm²

• Power consumption: 700 mW

PCI board with VIP and camera submodules

Software tools for VIP: compiler, debugger, profiler

Software tools on host: MS Visual C++ with VPL++ library

Application demonstrators: car vision, face recognition, MPEG2, graphics

Infineon Technologies, Corporate Research, Systems Technology
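The power figures on this slide are given in two different forms (µW/MOPS for the prototype, mW for the 0.13 µm version). A rough cross-check can be sketched as follows, assuming that the quoted specific power applies at peak performance; the slide does not state this, and the function names are illustrative only.

```python
def power_watts(uw_per_mops, gops):
    """Absolute power in watts from specific power (uW/MOPS) and throughput (GOPS)."""
    mops = gops * 1000.0                 # 1 GOPS = 1000 MOPS
    return uw_per_mops * mops * 1e-6     # microwatts to watts


def specific_power_uw_per_mops(power_w, gops):
    """Specific power in uW/MOPS from absolute power (W) and throughput (GOPS)."""
    return power_w * 1e6 / (gops * 1000.0)


# Assumption: both chips are evaluated at their quoted peak performance.
prototype_w = power_watts(100.0, 53.0)                  # about 5.3 W
shrink_uw = specific_power_uw_per_mops(0.7, 106.0)      # about 6.6 uW/MOPS

print(f"0.35 um prototype at peak: {prototype_w:.1f} W")
print(f"0.13 um version at peak:   {shrink_uw:.1f} uW/MOPS")
```

Under that assumption the move to 0.13 µm would lower the specific power by roughly a factor of 15 while doubling peak throughput.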

Dr. Werner Hemmert, CPR ST2003-12-02 Page 6

Car Vision Components - Hardware

Block diagram labels: other sensors, CPU, vehicle control

Dr. Axel Techmer, Infineon Technologies

Dr. Werner Hemmert, CPR ST2003-12-02 Page 7

Neuro-IT Roadmap: Successful in the Physical World

Robust perception

Image processing

• Speech recognition

• Multimodal human machine interaction

• System integration

• Scene analysis and representation

Dr. Werner Hemmert, CPR ST2003-12-02 Page 8

Classical Sound Processing for Speech Recognition

Processing chain: Microphone → Filter → A/D (8 kHz) → |FFT| (25 ms window, every 10 ms) → Mel transformation → LOG & threshold → smoothed cepstrum & loudness → normalized components → first and second derivatives (d/dt) → features → Hidden Markov Model

Data sizes along the chain: 100 frequencies → 24 channels → 12 components → 36 features

Frequency labels in the figure: 40 Hz, 80 Hz, 160 Hz, ..., 2 kHz, 4 kHz
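The numbers on this slide fit together arithmetically. The following is a minimal sketch of the framing and feature bookkeeping only, not of the recognizer itself: it assumes that the 100 frequencies are the FFT bins of a 200-sample window and that the 36 features are the 12 components plus their first and second frame-to-frame differences. All function names are illustrative.

```python
def frame_layout(sample_rate_hz=8000, window_ms=25, hop_ms=10):
    """Window length, hop, bin count and bin spacing for the quoted settings."""
    window = sample_rate_hz * window_ms // 1000    # samples per window
    hop = sample_rate_hz * hop_ms // 1000          # samples between windows
    return {
        "window": window,
        "hop": hop,
        "bins": window // 2,                       # magnitude bins up to Nyquist
        "bin_spacing_hz": sample_rate_hz / window,
    }


def split_frames(signal, window, hop):
    """Cut a sample list into overlapping windows."""
    return [signal[s:s + window] for s in range(0, len(signal) - window + 1, hop)]


def differences(frames):
    """Frame-to-frame difference; the first frame gets zeros."""
    if not frames:
        return []
    out = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([c - p for p, c in zip(prev, cur)])
    return out


def add_derivatives(static):
    """Append first and second differences to every static feature vector."""
    d1 = differences(static)
    d2 = differences(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


layout = frame_layout()
one_second = [0.0] * 8000                          # placeholder signal
frames = split_frames(one_second, layout["window"], layout["hop"])
static = [[0.0] * 12 for _ in frames]              # stand-in for 12 components
features = add_derivatives(static)
print(layout, len(frames), len(features[0]))
```

With these settings a window holds 200 samples, which gives 100 bins spaced 40 Hz apart and reaching 4 kHz, consistent with the frequency labels in the figure.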

Dr. Werner Hemmert, CPR ST2003-12-02 Page 9

Speech production: time waveform

Dr. Werner Hemmert, CPR ST2003-12-02 Page 10

|FFT| resolves neither frequency nor temporal structure

|FFT| with a 20 ms window:

• frequency resolution: 50 Hz

• temporal resolution: 20 ms
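The two resolutions are tied together: for a window of length T, the frequency resolution is roughly 1/T, so improving one worsens the other. A small sketch of that trade-off follows; the function name and the choice of window lengths are illustrative.

```python
def window_tradeoff(window_ms):
    """Return (frequency resolution in Hz, temporal resolution in ms) for one window."""
    frequency_resolution_hz = 1000.0 / window_ms   # 1 / T with T in seconds
    return frequency_resolution_hz, window_ms


for window_ms in (5, 10, 20, 25):
    df, dt = window_tradeoff(window_ms)
    print(f"window {window_ms:>2} ms: frequency resolution {df:5.1f} Hz, "
          f"temporal resolution {dt} ms")
```

A 20 ms window gives the 50 Hz and 20 ms quoted on the slide; shortening the window to resolve faster temporal structure coarsens the spectrum by the same factor.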

Dr. Werner Hemmert, CPR ST2003-12-02 Page 11

Classical Sound Processing for Speech Recognition

Figure: same processing chain as on page 8, from microphone to Hidden Markov Model.

The time structure of the speech signal (<20 ms) is lost in the magnitude spectrum (|FFT|).

Humans extract both temporal and spectral information for robust speech recognition.

Dr. Werner Hemmert, CPR ST2003-12-02 Page 12

Auditory Sound Processing

sound signal → ear canal → middle ear

Dr. Werner Hemmert, CPR ST2003-12-02 Page 13

Auditory Sound Processing

sound signal → ear canal → middle ear → inner ear hydrodynamics

Scale bar in the figure: 100 µm

Dr. Werner Hemmert, CPR ST2003-12-02 Page 14

Dynamic Compression in the Inner Ear

Figure: inner ear model responses to 1 kHz tones. BM displacement (m, logarithmic axis from 10^-10 to 10^-6) versus cochlear location (0 to 35 mm, basal to apical), for levels from 0 to 120 dB SPL in 20 dB steps. Marked in the plot: rate threshold, speech range, BW.

Dr. Werner Hemmert, CPR ST2003-12-02 Page 15

Auditory Sound Processing

sound signal → ear canal → middle ear → inner ear hydrodynamics → sensory cell → synaptic mechanisms

Dr. Werner Hemmert, CPR ST2003-12-02 Page 16

Coding of Sound into Action Potentials

Figure: firing pattern over time (0 to 100 ms) versus cochlear location (5 to 30 mm); the location axis is also labelled with frequency (low, high). Marked in the plot: F0, F1, F2, F3.

Regular firing pattern (t = 10 ms, f0 = 100 Hz)
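The relation between the 10 ms firing interval and f0 = 100 Hz is simply frequency = 1 / period. As an illustration only, the sketch below recovers f0 from the mean inter-spike interval of a perfectly regular, hypothetical spike train; it is not the inner ear model behind the figure, and the function name is an assumption.

```python
def f0_from_spike_times(spike_times_ms):
    """Estimate the fundamental frequency (Hz) from the mean inter-spike interval."""
    intervals = [b - a for a, b in zip(spike_times_ms, spike_times_ms[1:])]
    if not intervals:
        raise ValueError("need at least two spikes")
    mean_interval_ms = sum(intervals) / len(intervals)
    return 1000.0 / mean_interval_ms


# Hypothetical spike train: one spike every 10 ms over the 100 ms shown in the figure.
spikes_ms = [10.0 * k for k in range(11)]
print(f0_from_spike_times(spikes_ms))
```

This periodicity is visible only at a time scale of about 10 ms, which is shorter than the 20 to 25 ms analysis windows of the classical front end.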

Dr. Werner Hemmert, CPR ST2003-12-02 Page 17

Spectral- and Temporal Sound Processing in the Auditory Pathway

Dr. Werner Hemmert, CPR ST2003-12-02 Page 18

Neuro-IT Roadmap: Successful in the Physical World

Robust perception

Image processing

Speech recognition

• Multimodal human machine interaction

• System integration

• Scene analysis and representation

Dr. Werner Hemmert, CPR ST2003-12-02 Page 19

Audio-Visual Speech Recognition

Dr. Werner Hemmert, CPR ST2003-12-02 Page 20

Audio-Visual Speech Recognition

Tracking of lip motion with sub-pixel precision

Dr. Werner Hemmert, CPR ST2003-12-02 Page 21

Audio-Visual Speech Recognition

Tracking of lip motion with sub-pixel precision

Spoken digit sequence: “two - one - seven - three - five - nine - eight - zero - four - six” → Hidden-Markov speech recognizer

Figure: variation of mouth width, mouth height, and nose-to-chin distance over time (0 to 12 s); scale bar: 10 pixels.

Dr. Werner Hemmert, CPR ST2003-12-02 Page 22

Multi-modal: pointing, gaze, gestures, facial expressions, …

Dr. Axel Steinhage, Infineon Technologies AG

Dr. Werner Hemmert, CPR ST2003-12-02 Page 23

Neuro-IT Roadmap: Successful in the Physical World

Robust perception

Image processing

Speech recognition

Audio-visual speech recognition

Multimodal human machine interaction

• System integration

• Scene analysis and representation

Dr. Werner Hemmert, CPR ST2003-12-02 Page 24

Man-Machine-Interaction based on natural communication channels

Virtual Personal Assistant (VPA)

Natural channels: speech, lip motion, gestures ...

Cheap sensors (webcam, microphone)

Items presented by VPA

Interactive communication between user and VPA

Dr. Axel Steinhage, Infineon Technologies

Dr. Werner Hemmert, CPR ST2003-12-02 Page 25

Man-Machine-Interaction based on natural communication channels

Virtual Personal Assistant (VPA)

Human expert via Advanced Videophone (HHI)

Natural channels: speech, lip motion, gestures ...

Cheap sensors (webcam, microphone)

Items presented by VPA

Interactive communication between user and VPA

Advanced Videophone

Dr. Axel Steinhage, Infineon Technologies

Dr. Werner Hemmert, CPR ST2003-12-02 Page 26

What do we earn from Neuro-IT?

• Sensitive Sensors

World knowledge “Constructed brain”

Robust processing “Tools for Neuroscience” “Successful in the Physical World”

“Conscious Machines”

• Robust perception

• Image processing

• Speech recognition

• Scene analysis and representation

• Intelligent human-machine interaction

• Natural feedback

• Intelligent virtual person

• Self-learning software

• Massively parallel processing hardware: digital and/or analog neuronal networks

“Factor 10”

Dr. Werner Hemmert, CPR ST2003-12-02 Page 27

Neuro-IT Roadmap: Successful in the Physical World

Prof. Dr. Dr. h.c. H.-P. Zenner

Prof. Dr. A.W. Gummer

Werner Hemmert, Infineon Technologies AG, CPR-ST

Prof. Dr. D.M. Freeman

Dr. M. Mermelstein, B. Tsai

U. Dürig, M. Despont, G. Genolet, U. Drechsler, P. Vettiger, G. Binning

MIT Micromechanics Group

Prof. Dr. U. Ramacher

J.-P. de la Cruz-Guiterrez, M. Holmberg

Dr. A. Steinhage, Dr. A. Techmer

Explore the Future - Corporate Research