1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and...

30
1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris Multimedia Knowledge Laboratory CERTH-ITI October 24, 2008 Thessaloniki, Greece

Transcript of 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and...

Page 1: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

1Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Exploitation of knowledge in video recordings

Dr. Alexia Briassouli, Dr. Yiannis Kompatsiaris

Multimedia Knowledge LaboratoryCERTH-ITI

October 24, 2008Thessaloniki, Greece

Page 2: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

22

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Evolution of Content• 1-2 exabytes (millions of

terabytes) of new information produced world-wide annually

• 80 billion of digital images are captured each year

• Over 1 billion images related to commercial transactions are available through the Internet

• This number is estimated to increase by ten times in the next two years.

• 4 000 new films are produced each year

• 300 000 world-wide available films

• 33 000 television stations and 43 000 radio stations

• 100 billions of hours of audiovisual content

Personal Content

Sport - News

Movies

Web Mobile

Page 3: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

33

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

DIRECTOR

SCENE

TAKE

TITLE

Multimedia Content

Networks

Storage & Devices

Segmentation KA

AnalysisLabeling

Cross-media analysis

Context

Reasoning

Metadata Generation &

Representation

Content adaptation and distribution -

Multiple Terminal & Networks

Hybrid / Content-based retrieval

recommendations and personalization

Semantic technology in

MarketsWeb 2.0 photo

- video applications

Page 4: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

44

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Need for annotation + medatata

“The value of information depends on how easily it can be

found, retrieved, accessed, filtered or managed in an active, personalized way”

Page 5: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

5Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis

Video analysis that exploits knowledge provides significant advantages:Improved accuracy of semantics from videoHigher level concepts inferred through

exploitation of knowledge combined with video processing:Knowledge about behavior, event detection

More efficient storage, access, retrieval, dissemination of multimodal data because of the (automatically generated) annotations

Page 6: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

6Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis in JUMAS

Page 7: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

77

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Text-based indexing

• Manual annotation • + Straightforward• + High/Semantic level• + Efficient during

content creation• Most commonly used• Necessary in a number

of applications• - Time consuming• - Operator-application

dependent• - Text related problems

(synonyms etc)

• Annotation using captions and related

text• Web, Video, Documents etc

• + Straightforward• + High/Semantic

level• + Multimodal

approach• - Text processing

restrictions and limitations

• - Captions must exist

Page 8: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

88

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Addressing the Semantic GapSemantic Gap

• Semantic Gap for multimedia: To map automatically generated numerical low level-features to higher level human-understandable semantic concepts<?xml version='1.0' encoding='ISO-8859-1' ?>

<Mpeg7 xmlns…> <DescriptionUnit xsi:type = "DescriptorCollectionType">

<Descriptor xsi:type = "DominantColorType"> <SpatialCoherency>31</SpatialCoherency>

<Value> <Percentage>31</Percentage>

<Index>19 23 29 </Index> <ColorVariance>0 0 0 </ColorVariance>

</Value> </Descriptor>

</DescriptionUnit></Mpeg7>

<?xml version='1.0' encoding='ISO-8859-1' ?><Mpeg7 xmlns…>

<DescriptionUnit xsi:type = "DescriptorCollectionType"> <Descriptor xsi:type = "DominantColorType"> <SpatialCoherency>31</SpatialCoherency>

<Value> <Percentage>31</Percentage>

<Index>19 23 29 </Index> <ColorVariance>0 0 0 </ColorVariance>

</Value> </Descriptor>

</DescriptionUnit></Mpeg7>

Dominant Color Descriptor of a sky region

This image contains a sky region and is a holiday image

Page 9: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

99

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Problem definition

• Semantic image analysis: how to translate the automatically extracted visual descriptions into human like conceptual ones

• Low-level features provide cues for strengthen/weaken evidence based on visual similarity

• Prior knowledge is needed to support semantics disambiguation

Page 10: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

1010

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Additional Analysis

Information

Knowledge Infrastructure (Multimedia Ontology)

Manual Annotation

- Models

Semantic Analysis

Single Modality Analysis

Knowledge ExtractionA common view

Feature extractionText, Image

analysisSegmentation,

SVMsEvidence

generation“Vehicle”, “Building”

Classifiers fusionGlobal vs. Local

Modalities fusionContext

“Ambulance”

ReasoningFusion of

annotationsConsistency

checkingHigher-level

concepts/events“Emergency scene”

Multimedia content

annotation toolsTraining

(Statistical) Modeling

Domain Multimedia content

AnnotationsAlgorithms -

FeaturesContext

Page 11: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

11Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Knowledge from Video analysis

Semantics from videoImplicitly derived via machine learning

methods i.e. based on training: SVM, HMM, Neural Networks, Bayesian

Networks

Training uses appropriate data, relevant to the semantics that interest us

Training finds models that connect low level features (e.g. motion trajectories) with high-level annotations

These models are then applied to test data

Page 12: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

1212

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Natural-Person: 0.456798Sailing-Boat: 0.463645

Sand: 0.476777Building: 0.415358

Pavement: 0.454740Road: 0.503242

Body-Of-Water: 0.489957Cliff: 0.472907

Cloud: 0.757926Mountain: 0.512597

Sea: 0.455338Sky: 0.658825

Stone: 0.471733Waterfall: 0.500000

Wave: 0.476669Dried-Plant: 0.494825

Dried-Plant-Snowed: 0.476524Foliage: 0.497562Grass: 0.491781Tree: 0.447355

Trunk: 0.493255Snow: 0.467218

Sunset: 0.503164Car: 0.456347

Ground: 0.454769Lamp-Post: 0.499387

Statue: 0.501076

Classification ResultsaceMedia

Segment’s hypothesis

set

Page 13: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

1313

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Frame Region – Concept AssociationFrame Region – Concept Association• Region feature vectorRegion feature vector formed from local descriptors formed from local descriptors• Individual SVM introduced for every defined local Individual SVM introduced for every defined local

concept, receiving as input the concept, receiving as input the region feature vectorregion feature vector• Training identical to global concept training caseTraining identical to global concept training case• Every region evaluated by all trained SVMs, segment’s Every region evaluated by all trained SVMs, segment’s

local concept hypothesis set created ( )local concept hypothesis set created ( )hijL

Ground: 0.89 Grass: 0.44Mountain: 0.21 Boat: 0.07Smoke: 0.41 Dirty-Water: 0.18Trunk: 0.12 Foam: 0.19Debris: 0.34 Mud: 0.31Water: 0.42 Sky: 0.22Ashes: 0.11 Subtitles: 0.24Flames: 0.13 Vehicle: 0.12Building: 0. 25 Foliage: 0.84Person: 0.32 Road: 0.39

Segment’s hypothesis set

Page 14: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

1414

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Initial Region-Concept Association• Region feature vector formed from local descriptors

• Individual SVM introduced for every defined concept, receiving as input the region feature vector

• Training identical to global training case• Every region evaluated by all trained SVMs,

segment’s concept hypothesis set created ( )

Building: 0.89 Roof: 0.29Grass: 0.21 Tree: 0.07Stone: 0.41 Ground: 0.15Dried-plant: 0.12 Sky: 0.19Person: 0.34 Trunk: 0.31Vegetation: 0.42 Rock: 0.22Boat: 0.11 Sand: 0.44Sea: 0.13 Wave: 0.12

Segment’s hypothesis set

hijC

Page 15: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

15Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Knowledge for Video analysisExplicit Semantics from video

Based on previously known models Explicitly defined models, rules, facts Rules from preliminary scripts and

standards from similar cases

Explicit and implicit knowledge can be combined with results from low-level video processing to extract meaningful high-level knowledge

Page 16: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

16Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

System Overview

Multimedia Content

Video Analysis (face recognition, motion segmentation etc)

Knowledge

Infrastructure (explicit or

implicit)

Semantic Multimedia Description

Page 17: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

17Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video analysis

Motion AnalysisMotion detectionTrackingDetection of when motion occurs

Motion SegmentationObject segmentation based on motion

characteristicsGeneration of ‘active regions’

Page 18: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

18Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Activity Areas from motion analysis

Page 19: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

19Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Sub-activity Areas After statistical processing for temporal

localization of motion and events

People walking towards each other People meet

People leave together

Page 20: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

2020

Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Fight Sequence

Page 21: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

21Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Processing (1) Pre-processing

Separate video from audioSplit video into framesNoise removal via spatiotemporal filtering

Scene/shot detectionShot = frames taken by single cameraDetect transition between framesUses only low-level informationScene = story-telling unitUses higher-level knowledge, semantics

Page 22: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

22Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Processing (2)Spatial segmentation:

Spatial segmentation in images, video framesExtracts object(s) based on color, texture

features

Motion segmentation:Groups pixels with similar motion

Spatiotemporal segmentation:Finds objects over several frames through

combination of motion, appearance featuresMerges spatial and motion segmentation

results

Page 23: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

23Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Knowledge in Video Analysis (1)

Low level features can be combined with knowledge/rules for higher-level results Spatiotemporally segmented objects can

be used for object recognition Face/gesture recognition after training with

faces/gestures of significance Motion in specific parts of a video (e.g.

near court entrance, near prisoner’s seat) has additional significance:Needs prior knowledge of which parts of the video

frames are important and why

Page 24: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

24Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Knowledge in Video Analysis (2)

Knowledge structures can provide additional information about the relations between different low-level features Interactions e.g. two motions in opposite

directions, relation of extracted gestures, may mean something: people meeting, fighting, pointing, gesticulating

Face recognition combined with prior knowledge can show who is present when an event occurs

Page 25: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

25Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Conclusions

Combined use of video processing with knowledge can lead to richer and more accurate high-level descriptions of multimedia data

Can be used in many more applications than currently, because the knowledge introduces flexibility and adaptability to the system: The same algorithms and low-level

features can provide much more information when used in combination with explicit and implicit knowledge

Page 26: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

26Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

26

Thank you!CERTH-ITI / Multimedia Knowledge

Laboratoryhttp://mklab.iti.gr

Page 27: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

27Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis State of the Art

Spatiotemporal segmentation:Find spatiotemporally homogeneous objects

i.e. similar appearance and motionApply spatial segmentation on each frame Match segmented objects in successive

frames using low-level features (e.g. similar color, texture, continuous motion)

Use motion information – project position of object in current/next frames

Page 28: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

28Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis State of the Art

Page 29: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

29Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis State of the Art

Page 30: 1 Image Video & Multimedia Systems Laboratory Multimedia Knowledge Laboratory Informatics and Telematics Institute Exploitation of knowledge in video recordings.

30Image Video & Multimedia Systems LaboratoryMultimedia Knowledge LaboratoryInformatics and Telematics Institute

Video Analysis State of the Art

Spatial segmentation: Spatial segmentation in images, video frames: Region Based: Most methods are based on grouping

similar features like color, texture, location – based on homogeneity of intensity, texture, position

Gradient/edge based: detecting changes in spatial distribution of features e.g. pixel illumination

Some methods combine region/edge information