
Where, Who and What?

@AIT

Intelligent Affective Interaction, ICANN, Sept. 14, Athens, Greece

Aristodemos Pnevmatikakis, John Soldatos and Fotios Talantzis
Athens Information Technology, Autonomic & Grid Computing

Overview

• CHIL
– AIT SmartLab

• Signal Processing for perceptual components
– Video Processing
– Audio Processing

• Services

• Middleware
– Easing application assembly

Computers in the Human Interaction Loop

• EU FP6 Integrated Project (IP 506909)
• Coordinators: Universität Karlsruhe (TH), Fraunhofer Institute IITB
• Duration: 36 months
• Total project costs: over 24M€
• Goal: Create environments in which computers serve humans, who focus on interacting with other humans rather than having to attend to and be preoccupied with the machines themselves
• Key Research Areas:
– Perceptual Technologies
– Software Infrastructure
– Human-Centric Pervasive Services

AIT SmartLab Equipment

• Five fixed cameras (one with a fish-eye lens)
• PTZ camera
• NIST 64-channel microphone array
• 4 inverted T-shaped clusters of 4 SHURE microphones each
• 4 tabletop microphones
• 6 dual-Xeon 3 GHz PCs with 2 GB of RAM
• FireWire cables & repeaters

AIT SmartLab

Perceptual Components

Detection and Identification System

[Block diagram: a Detector (head detector, eye detector, tracker) feeds a Recognizer (face normalizer, face recognizer, frontal verifier, confidence estimator, weighted voting); the classifier and frontality confidences drive the weighted voting that yields the ID]
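To make the data flow concrete, here is a minimal Python sketch of how the stages in the diagram could be chained; all class, method and field names are hypothetical stand-ins, not the actual AIT implementation.

```python
from dataclasses import dataclass

@dataclass
class FrameResult:
    """Per-frame output of the identification pipeline (hypothetical structure)."""
    identity: str
    classifier_confidence: float  # how sure the face recognizer is
    frontality_confidence: float  # how frontal the verified face is

def identify(frames, detector, recognizer):
    """Run the detector (head/eye detection + tracking) and recognizer
    (normalization, recognition, frontal verification) on each frame,
    then combine the per-frame IDs by confidence-weighted voting."""
    results = []
    for frame in frames:
        face = detector.detect_and_track(frame)        # head + eye detector, tracker
        if face is None:
            continue
        normalized = recognizer.normalize(face)        # align using eye positions
        identity, cls_conf = recognizer.classify(normalized)
        front_conf = recognizer.frontal_confidence(normalized)
        results.append(FrameResult(identity, cls_conf, front_conf))

    # Weighted voting: each frame votes for its ID with a weight equal to the
    # product of its classifier and frontality confidences.
    votes = {}
    for r in results:
        w = r.classifier_confidence * r.frontality_confidence
        votes[r.identity] = votes.get(r.identity, 0.0) + w
    return max(votes, key=votes.get) if votes else None
```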

Unconstrained Video Difficulties

Where and Who are the World Cup Finalists?

• ...and the European champions?

Tracking

[Block diagram of the tracker: frames feed an Adaptive Background Module (adaptive background, parameter adaptation, PPM) and edge detection; an Evidence Generation Module performs evidence extraction and target association on the edges, handling track initialization, target splits (split / existing / new) and predicted tracks; a Kalman Module maintains the state with prediction and measurement-update steps; a Track Consistency Module with track memory enforces track consistency and outputs the targets]
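The Kalman Module's predict/update cycle is the standard one; below is a minimal constant-velocity sketch for a single target. The noise covariances are tuning assumptions, and the real tracker couples this with the background, evidence-generation and track-consistency modules shown above.

```python
import numpy as np

dt = 1.0                                   # frame interval
F = np.array([[1, 0, dt, 0],               # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # only position is measured
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # process noise (tuning assumption)
R = 4.0 * np.eye(2)                        # measurement noise (tuning assumption)

x = np.zeros(4)                            # state estimate
P = np.eye(4)                              # state covariance

def predict():
    """Prediction step: propagate state and covariance one frame ahead."""
    global x, P
    x = F @ x
    P = F @ P @ F.T + Q
    return H @ x                           # predicted target position

def update(z):
    """Measurement update with position evidence z (e.g. a foreground centroid)."""
    global x, P
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
```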

Tracking – Smart Spaces

Tracking – 3D from Synchronized Cameras

Tracking – Outdoor Surveillance

• AIT system ranked 2nd in the VACE / NIST surveillance evaluations

Head Detection

[Pipeline diagram, head detector highlighted: eye detector, head detector, tracker, face normalizer, face recognizer, frontal verifier, confidence estimator, weighted voting]

• Detection of the head by processing the outline of the foreground region belonging to the body
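A sketch of one way such outline-based detection could work, assuming a binary foreground mask; the head-size heuristic is an assumption for illustration, not the published algorithm.

```python
import numpy as np

def head_from_foreground(mask):
    """Locate the head from a binary foreground mask by scanning the top
    outline of the body silhouette (illustrative sketch only)."""
    cols = np.where(mask.any(axis=0))[0]           # columns containing foreground
    if cols.size == 0:
        return None
    top = np.array([np.argmax(mask[:, c]) for c in cols])  # first fg row per column
    peak = cols[np.argmin(top)]                    # column where the outline is highest
    head_top = top.min()
    # Assume the head spans roughly a fixed fraction of the body height
    body_height = mask.shape[0] - head_top
    head_size = max(1, body_height // 7)           # "1/7 of body height" is an assumption
    return (peak - head_size // 2, head_top, head_size, head_size)  # x, y, w, h
```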

Eye Detection

[Pipeline diagram, eye detector highlighted: eye detector, head detector, tracker, face normalizer, face recognizer, frontal verifier, confidence estimator, weighted voting]

• Vector quantization of the colors in the head region

• Detection of candidate eye regions
– Based on resemblance to skin, brightness, shape and size

• Selection among the candidates based on face geometry (sketched below)
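A rough sketch of the color-quantization step and the candidate filtering; the skin and darkness thresholds here are ad-hoc assumptions, not the tuned AIT values.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.ndimage import label, find_objects

def eye_candidates(head_rgb, n_colors=8):
    """Vector-quantize the head-region colors and return connected regions
    whose quantized color is dark and non-skin-like (illustrative sketch)."""
    h, w, _ = head_rgb.shape
    pixels = head_rgb.reshape(-1, 3).astype(float)
    centroids, labels = kmeans2(pixels, n_colors, minit='++')

    # Heuristic skin / darkness tests on the cluster centroids (assumptions)
    def is_skin(c):
        r, g, b = c
        return r > 95 and g > 40 and b > 20 and r > g and r > b
    candidate_colors = [i for i, c in enumerate(centroids)
                        if not is_skin(c) and c.mean() < 100]  # dark, non-skin

    mask = np.isin(labels, candidate_colors).reshape(h, w)
    regions, _ = label(mask)
    boxes = []
    for sl in find_objects(regions):
        rh, rw = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if 0.02 * w < rw < 0.3 * w and rh < rw * 1.5:   # eye-like size and shape
            boxes.append(sl)
    return boxes  # candidate pairs are then scored against face geometry
```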

Face Recognition from Video

Effect of Eye Misalignment: LDA

[Plot: PMC (%) vs. number of training images per person (2 to 10) for three eye-alignment conditions: ideal eyes; ideal eyes for training, detected eyes for testing; detected eyes for both training and testing]

Effect of Eye Misalignment

[Plot: PMC (%) vs. RMS eye perturbation (%, relative to eye distance) for the PCA, PCA w/o 3, LDA, EBGM, Laplacianfaces, MACE and 2D-HMM classifiers]
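The perturbation on the x-axis can be reproduced by jittering ground-truth eye coordinates; a small sketch, under my assumption of isotropic Gaussian jitter:

```python
import numpy as np

def perturb_eyes(left, right, rms_fraction, rng=np.random.default_rng()):
    """Simulate eye-detector error: jitter both eye positions with Gaussian
    noise whose RMS magnitude is a given fraction of the eye distance,
    matching the x-axis of the plot above (illustrative sketch)."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    eye_dist = np.linalg.norm(right - left)
    sigma = rms_fraction * eye_dist / np.sqrt(2)   # per-axis std for 2-D RMS
    return left + rng.normal(0, sigma, 2), right + rng.normal(0, sigma, 2)
```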

Classifier Fusion

• Classifier fusion addresses the fact that different classifiers are optimal for different recognition impairments

[Bar charts: PMC (%) of the edginess, no-preprocessing, feature-vector and post-decision fusion approaches, under illumination variations and under pose variations]
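One plausible realization of such post-decision fusion is a confidence-weighted sum of per-classifier scores; the scores and weights below are made-up illustrations, not evaluation data.

```python
def fuse_classifiers(scores, weights):
    """Confidence-weighted sum fusion of per-classifier ID scores.

    scores:  dict classifier -> {identity: normalized score in [0, 1]}
    weights: dict classifier -> weight, e.g. from validation accuracy
    """
    fused = {}
    for clf, id_scores in scores.items():
        for identity, s in id_scores.items():
            fused[identity] = fused.get(identity, 0.0) + weights[clf] * s
    return max(fused, key=fused.get)

# Example: an edginess-based classifier and a raw-pixel classifier disagree;
# fusion sides with the more confident, more reliable one.
scores = {'edginess': {'alice': 0.7, 'bob': 0.3},
          'raw':      {'alice': 0.4, 'bob': 0.6}}
print(fuse_classifiers(scores, {'edginess': 0.6, 'raw': 0.4}))  # -> 'alice'
```

The same weighted combination can be reapplied at several levels, across time, across classifiers and across modalities, which is the chain shown on the next slide.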

Fusion Across Time, Classifiers and Modalities

[Flow diagram: faces of an individual collected over 5 seconds are histogram-equalized and classified per frame by PCA and LDA (N IDs and confidences, PMC of 60% and 58%); fusion across time yields a single ID per classifier (PMC of 31% and 36%); fusion across classifiers yields the visual ID (PMC of 29%); speech of the individual collected over the same 5 seconds yields the audio ID (PMC of 9.7%); fusion across modalities yields the audio-visual ID (PMC of 6.8%)]

Face Recognition @ CLEAR2006 (PMC, %)

                 15 sec training               30 sec training
Testing (sec)    1      5      10     20       1      5      10     20
AIT              50.57  29.68  23.18  20.22    47.31  31.14  26.64  24.72
UKA              46.82  33.58  28.03  23.03    40.13  23.11  20.42  16.29
UPC              79.77  78.59  77.51  76.40    80.42  77.13  74.39  73.03
New AIT          45.35  27.01  17.65  15.73    43.72  17.76  13.49  7.86

Speaker ID @ CLEAR2006 (PMC, %)

                 15 sec training               30 sec training
Testing (sec)    1      5      10     20       1      5      10     20
AIT              26.92  9.73   7.96   4.49     15.17  2.68   1.73   0.56
CMU              23.65  7.79   7.27   3.93     14.36  2.19   1.38   0.00
LIMSI            51.71  10.95  6.57   3.37     38.83  5.84   2.08   0.00
UPC              24.96  10.71  10.73  11.80    15.99  2.92   3.81   2.81
AIT IS2006       25.69  5.60   4.50   2.25     15.01  2.19   2.42   0.00

Audiovisual ID @ CLEAR2006 (PMC, %)

                 15 sec training               30 sec training
Testing (sec)    1      5      10     20       1      5      10     20
AIT              23.65  6.81   6.57   2.81     13.70  2.19   1.73   0.56
UIUC primary     17.61  2.68   1.73   0.56     13.21  2.43   1.38   0.56
UIUC contrast    20.55  5.60   3.81   2.25     15.99  3.41   2.42   1.12
UKA / CMU        43.07  29.20  23.88  20.22    35.73  19.71  16.61  12.36
UPC              23.16  8.03   5.88   3.93     13.38  2.92   2.08   1.12

Audiovisual Tracker

• Information-theoretic speaker localization from the microphone array
– Accurate azimuth, approximate depth, no elevation

• Coarse targeting of the speaker's face using a PTZ camera

• Targeting refined by visual face detection
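As a stand-in for the information-theoretic localizer (whose details are not on the slide), here is a sketch that scores candidate inter-microphone delays with a Gaussian mutual-information proxy of the correlation and maps the best delay to azimuth; the Gaussian signal model and far-field assumption are mine.

```python
import numpy as np

def azimuth_from_mic_pair(x1, x2, fs, mic_distance, c=343.0, max_lag=None):
    """Estimate the source azimuth (degrees) from one microphone pair by
    scoring candidate delays; illustrative sketch, not the AIT algorithm."""
    if max_lag is None:
        max_lag = int(np.ceil(mic_distance / c * fs))  # physically possible lags
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x1[lag:], x2[:len(x2) - lag]
        else:
            a, b = x1[:len(x1) + lag], x2[-lag:]
        rho = np.corrcoef(a, b)[0, 1]
        # Under a Gaussian model, mutual information is monotonic in |rho|
        score = -0.5 * np.log(1 - min(rho * rho, 0.999999))
        if score > best_score:
            best_lag, best_score = lag, score
    tdoa = best_lag / fs
    return np.degrees(np.arcsin(np.clip(tdoa * c / mic_distance, -1, 1)))
```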

Services

Memory Jog

• Context-aware, human-centric assistant for meetings, lectures and presentations

• Proactive and reactive assistance, and information retrieval

• Features and functionalities:
– Sophisticated situation modeling / tracking
– Essentially non-obtrusive operation
– Intelligent meeting-recording functionality
– GUI also runs on a PDA
– Full compliance with the CHIL architecture
– Integration of actuating devices (targeted audio, projectors)

Context as a Network of Situations

Transition   Elements & Components
NIL → S1     Table Watcher (people in table area), SAD
S1 → S2      White-Board Watcher (presenter in speaker area), Face ID, Speaker ID
S2 → S3      Speaker ID (speaker ID ≠ presenter ID), Speaker Tracking
S3 → S2      Face Detection (presenter in speaker area), Face ID, Speaker ID
S2 → S4      White-Board Watcher (no face in speaker area for N seconds), Table Watcher (all participants at the meeting table)
S4 → S5      Table Watcher (nobody in table area)
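The situation network reads naturally as a finite-state machine; a hypothetical Python encoding of the transition table above (the event names are illustrative, not the actual CHIL component outputs):

```python
# Map (current situation, event from a perceptual component) -> next situation.
TRANSITIONS = {
    (None, 'people_in_table_area'):       'S1',  # Table Watcher + SAD
    ('S1', 'presenter_in_speaker_area'):  'S2',  # White-Board Watcher, Face/Speaker ID
    ('S2', 'speaker_is_not_presenter'):   'S3',  # Speaker ID, Speaker Tracking
    ('S3', 'presenter_in_speaker_area'):  'S2',  # Face Detection, Face/Speaker ID
    ('S2', 'speaker_area_empty_N_sec'):   'S4',  # White-Board + Table Watcher
    ('S4', 'table_area_empty'):           'S5',  # Table Watcher
}

class SituationModel:
    """Tracks the current meeting situation from perceptual-component events."""
    def __init__(self):
        self.state = None                  # NIL: meeting not yet started

    def on_event(self, event):
        # Unknown events leave the situation unchanged
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```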

What Happened While I was Away?

Middleware

Virtualized Sensor Access

CHIL Compliant Perceptual Components

• Several sites develop site-, room- and configuration-specific Perceptual Components for CHIL

• Common abstractions are provided for the inputs and outputs of each PC (black box)

• This facilitates component exchange across sites & vendors

• Standardization commenced with Body Trackers
– Continues with Face ID components
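A hypothetical sketch of what such a black-box abstraction might look like for a body tracker; the method names are illustrative, not the actual CHIL/CHILiX interfaces.

```python
from abc import ABC, abstractmethod

class BodyTracker(ABC):
    """Exchangeable body tracker behind a common interface: every site's
    tracker exposes the same control and output contract, regardless of
    room, sensors or vendor (names are assumptions for illustration)."""

    @abstractmethod
    def start(self, sensor_streams):
        """Attach to abstracted sensor outputs and begin tracking."""

    @abstractmethod
    def stop(self):
        """Release the sensors and stop emitting tracks."""

    @abstractmethod
    def current_targets(self):
        """Return the current targets as (target_id, x, y, z) tuples in a
        room coordinate frame shared by all components."""
```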

Architecture for Body Tracker Exchange

[Architecture diagram: services complying with the current API access information retrieval through a common control API (CHILiX); a non-CHIL-compliant body tracker is wrapped by a sensor abstraction providing transparent connection to the sensor output]

Thank you! Questions?