Scene Understanding perception, multi-sensor fusion, spatio-temporal reasoning
Transcript of Scene Understanding perception, multi-sensor fusion, spatio-temporal reasoning
1
Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning
and activity recognition.
Francois BREMOND
PULSAR project-team,
INRIA Sophia Antipolis, FRANCE
http://www-sop.inria.fr/pulsar/
Key words: Artificial intelligence, knowledge-based systems,
cognitive vision, human behavior representation, scenario recognition
2
• ETISEO: French initiative for algorithm validation and knowledge acquisition: http://www-sop.inria.fr/orion/ETISEO/
• Approach: 3 critical evaluation concepts
• Selection of test video sequences
  • Follow a specified characterization of problems
  • Study one problem at a time, at several levels of difficulty
  • Collect long sequences for significance
• Ground-truth definition
  • Up to the event level
  • Give clear and precise instructions to the annotator
  • E.g., annotate both the visible and occluded parts of objects
• Metric definition
  • Set of metrics for each video processing task
  • Performance indicators: sensitivity and precision
Video Understanding: Performance Evaluation (V. Valentin, R. Ma)
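As a concrete illustration, the two performance indicators can be computed from true-positive, false-negative, and false-positive counts; the counts below are illustrative, not taken from an actual ETISEO run.

```python
# Hedged sketch: per-task performance indicators as used in ETISEO-style
# evaluation. The TP/FN/FP counts are made-up example values.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of ground-truth items the algorithm detected (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp: int, fp: int) -> float:
    """Fraction of detections that match the ground truth."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Example: 40 true positives, 5 misses, 0 false alarms.
print(round(precision(40, 0), 3))    # 1.0
print(round(sensitivity(40, 5), 3))  # 0.889
```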
3
Evaluation: current approach (AT. Nghiem)
• ETISEO limitations:
  • Selection of video sequences according to difficulty levels is subjective
  • Generalization of evaluation results is subjective
  • One video sequence may contain several video processing problems at many difficulty levels
• Approach: treat each video processing problem separately
  • Define a measure to compute the difficulty level of input data (e.g., video sequences)
  • Select video sequences containing only the current problem at various difficulty levels
  • For each algorithm, determine the highest difficulty level at which the algorithm still has acceptable performance
• Approach validation: applied to two problems
  • Detect weakly contrasted objects
  • Detect objects mixed with shadows
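The key step above — finding the highest difficulty level at which an algorithm still performs acceptably — can be sketched as follows. The per-level scores and the acceptability threshold are hypothetical.

```python
# Hedged sketch: given performance scores per difficulty level, return the
# highest level the algorithm still handles acceptably. Scores are made up.

def highest_acceptable_level(scores_by_level, threshold=0.8):
    """scores_by_level: {difficulty_level: performance score in [0, 1]}."""
    best = None
    for level in sorted(scores_by_level):
        if scores_by_level[level] >= threshold:
            best = level
        else:
            break  # levels are ordered by difficulty: stop at the first failure
    return best

print(highest_acceptable_level({1: 0.95, 2: 0.88, 3: 0.71, 4: 0.60}))  # 2
```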
4
Evaluation: conclusion
• A new evaluation approach to generalise evaluation results.
• The approach has been implemented for 2 problems.
• Limitation: it only detects the upper bound of an algorithm's capacity.
• The difference between the upper bound and the real performance may be significant if:
  • The test video sequence contains several video processing problems
  • The same set of parameters is tuned differently to adapt to several concurrent problems
• Ongoing evaluation campaigns:
  • PETS at ECCV 2008
  • TRECVid (NIST) with i-LIDS video
• Benchmarking databases:
  • http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG363
  • http://www.hitech-projects.com/euprojects/cantata/datasets_cantata/dataset.html
5
Video Understanding: Program Supervision
6
Goal: easy creation of reliable supervised video understanding systems
Approach
• Use of a supervised video understanding platform
  • A reusable software tool composed of three separate components: program library – control – knowledge base
• Formalize a priori knowledge of video processing programs
• Make explicit the control of video processing programs
Issues?
• Which video processing programs can be supervised?
• A friendly formalism to represent knowledge about programs
• A general control engine to implement different control strategies
• A learning tool to adapt system parameters to the environment
Supervised Video Understanding: Proposed Approach
7
[Architecture diagram: an Application Domain Expert and a Video Processing Expert feed three knowledge bases (application domain, scene environment, video processing programs); a Control component, together with Learning and Evaluation, drives the Video Processing Program Library to produce a Particular System.]
Proposed Approach
8
• Use of an operator formalism [Clément and Thonnat, 93] to represent knowledge of video processing programs
• Composed of frames and production rules
  • Frames: declarative knowledge
    • Operators: abstract model of a video processing program
      – primitive: a particular program
      – composite: a particular combination of programs
  • Production rules: inferential knowledge
    • Choice and optionality criteria
    • Initialization criteria
    • Assessment criteria
    • Adjustment and repair criteria
Supervised Video Understanding Platform: Operator Formalism
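The adjustment/repair rules can be pictured as condition–action pairs that modify program parameters after an assessment. This is a minimal sketch of the idea only; the rule names, parameters, and thresholds are illustrative, not the actual platform's API.

```python
# Hedged sketch of rule-based control: assessment criteria judge a result,
# and adjustment/repair rules modify program parameters in response.
# All field names and thresholds below are hypothetical.

def adjust_parameters(params, assessment):
    rules = [
        # (condition on the assessment, action on the parameters)
        (lambda a: a["detection_rate"] < 0.5,
         lambda p: p.update(threshold=p["threshold"] - 5)),  # too few detections
        (lambda a: a["false_alarms"] > 10,
         lambda p: p.update(threshold=p["threshold"] + 5)),  # too many false alarms
    ]
    for condition, action in rules:
        if condition(assessment):
            action(params)
    return params

p = adjust_parameters({"threshold": 50},
                      {"detection_rate": 0.3, "false_alarms": 2})
print(p)  # {'threshold': 45}
```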
9
Program Supervision: Knowledge and Reasoning

Primitive operator
• Functionality, characteristics
• Input data, parameters, output data
• Preconditions, postconditions, effects
• Calling syntax
Rule bases
• Parameter initialization rules
• Parameter adjustment rules
• Result evaluation rules
• Repair rules

Composite operator
• Functionality, characteristics
• Input data, parameters, output data
• Preconditions, postconditions, effects
• Decomposition into sub-operators (sequential, parallel, alternative)
• Data flow
Rule bases
• Parameter initialization rules
• Parameter adjustment rules
• Choice rules
• Result evaluation rules
• Repair rules
10
• Objective: a learning tool to automatically tune algorithm parameters from experimental data
• Used for learning segmentation parameters with respect to illumination conditions
• Method
  • Identify a set of parameters of a task
    • 18 segmentation thresholds
    • depending on an environment characteristic: the image intensity histogram
  • Study the variability of the characteristic
    • Histogram clustering -> 5 clusters
  • Determine optimal parameters for each cluster
    • Optimization of the 18 segmentation thresholds
Video Understanding: Learning Parameters (B.Georis)
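The online use of the learned parameters can be sketched as follows: a new frame's intensity histogram is matched to the nearest cluster centroid, and the thresholds optimised offline for that cluster are looked up. The centroids, cluster names, and threshold values below are made up (and use 3 bins and 3 clusters instead of 256 bins and 5 clusters, for brevity).

```python
# Hedged sketch: per-cluster segmentation parameters. Offline, histograms are
# clustered and optimal thresholds stored per cluster; online, a frame is
# assigned to the nearest centroid. All values below are illustrative.

def nearest_cluster(histogram, centroids):
    def sq_dist(h, c):
        return sum((a - b) ** 2 for a, b in zip(h, c))
    return min(centroids, key=lambda cid: sq_dist(histogram, centroids[cid]))

centroids = {  # hypothetical 3-bin histogram centroids
    "dark":   [0.7, 0.2, 0.1],
    "normal": [0.2, 0.6, 0.2],
    "bright": [0.1, 0.2, 0.7],
}
optimal_thresholds = {  # would hold the 18 learned thresholds per cluster
    "dark": [12, 20], "normal": [25, 40], "bright": [35, 60],
}

frame_histogram = [0.15, 0.65, 0.20]
cluster = nearest_cluster(frame_histogram, centroids)
print(cluster, optimal_thresholds[cluster])  # normal [25, 40]
```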
11
Video Understanding: Learning Parameters
Camera View
12
Learning Parameters: Clustering the Image Histograms
[Figure: image histograms (pixel intensity [0–255] vs. number of pixels [%]) stacked along a third axis; each X–Z slice represents one image histogram, and the clusters are labelled with their optimal parameter sets β_i^opt1 … β_i^opt5.]
13
• CARETAKER: an FP6 IST European initiative to provide an efficient tool for the management of large multimedia collections.
• Applications to surveillance and safety issues, urban/environmental planning, resource optimization, and disabled/elderly person monitoring.
• Currently being validated on large underground video recordings (Torino, Roma).
[Pipeline diagram: multiple audio/video sensors -> audio/video acquisition and encoding -> raw data -> generic event recognition -> primitive events and metadata -> knowledge discovery (simple events -> complex events).]
Video Understanding : Knowledge Discovery (E. Corvee, JL. Patino_Vilchis)
14
Event detection examples
15
Data flow: Object/Event Detection -> Information Modelling

Object detection:
• Id
• Type
• 2D info
• 3D info
Event detection:
• Id
• Type (inside_zone, stays_inside_zone)
• Involved mobile object
• Involved contextual object
Resulting tables: mobile object table, event table, contextual object table
16
Table Contents

Mobile objects – people characterised by:
• Trajectory
• Shape
• Significant events in which they are involved
• …
Contextual objects – find interactions between mobile objects and contextual objects:
• Interaction type
• Time
• …
Events – model the normal activities in the metro station:
• Event type
• Involved objects
• Time
• …
17
Knowledge Discovery: trajectory clustering

Objective: cluster trajectories into k groups that match people's activities
• Feature set
  • Entry and exit points of an object
  • Direction, speed, duration, …
• Clustering techniques
  • Agglomerative hierarchical clustering
  • K-means
  • Self-organizing (Kohonen) maps
• Evaluation of each cluster set based on ground truth
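The listed feature set can be extracted from a tracked trajectory as a small vector. This is a minimal sketch: the point format and sampling rate are assumptions, not the project's actual representation.

```python
import math

# Hedged sketch: build a feature vector (entry/exit points, direction, speed,
# duration) for one trajectory. `fps` (positions per second) is assumed.

def trajectory_features(points, fps=1.0):
    """points: [(x, y), ...] sampled at `fps` positions per second."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    duration = (len(points) - 1) / fps
    length = sum(math.dist(points[i], points[i + 1])
                 for i in range(len(points) - 1))
    return {
        "entry": (x0, y0),
        "exit": (x1, y1),
        "direction": math.atan2(y1 - y0, x1 - x0),  # overall heading in radians
        "speed": length / duration if duration else 0.0,
        "duration": duration,
    }

f = trajectory_features([(0, 0), (3, 4), (6, 8)])
print(f["speed"], f["duration"])  # 5.0 2.0
```

Such vectors can then be fed to any of the three clustering techniques above.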
20
Results on Torino subway (45min), 2052 trajectories
21
[Figure: clustering results for SOM, K-means, and agglomerative clustering; some groups show mixed overlap.]
Trajectory: Analysis
22
Trajectory: Semantic characterisation
[Figure: SOM cluster 14, K-means cluster 12, agglomerative cluster 21.]
Consistency of clusters between algorithms. Semantic meaning: walking towards the vending machines.
23
Intraclass & interclass variance
• The SOM algorithm has the lowest intraclass variance and the highest interclass separation.
• Parameter tuning: which clustering technique to choose?
Trajectory: Analysis
[Garbled equations: the intraclass criterion sums, over each cluster j and its members i, the squared distances ||v_ij − v̄_j||² to the cluster mean; the interclass criterion measures the separation between cluster means.]
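The two cluster-quality criteria (intraclass variance, which should be low, and interclass variance, which should be high) can be computed as follows. The feature vectors are illustrative 2-D points standing in for trajectory feature vectors.

```python
# Hedged sketch of the two criteria used to compare clustering techniques:
# intraclass variance (spread of members around their cluster mean) and
# interclass variance (spread of cluster means around the overall mean).

def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def intraclass(clusters):
    return sum(sq_dist(v, mean(c)) for c in clusters for v in c)

def interclass(clusters):
    overall = mean([v for c in clusters for v in c])
    return sum(len(c) * sq_dist(mean(c), overall) for c in clusters)

clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(intraclass(clusters), interclass(clusters))  # 4.0 100.0
```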
25
Mobile Objects
26
Mobile Object Analysis
[Chart: number of persons per 5-minute interval over time.]
Building statistics on Objects
There is an increase in the number of people after 6:45.
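Statistics like "number of persons over 5 min" can be built by binning detection timestamps into 5-minute windows. The timestamps below are made-up sample data, not the Torino recordings.

```python
from collections import Counter

# Hedged sketch: bin detection timestamps (seconds since the start of the
# recording) into 5-minute windows to build per-window person counts.

def persons_per_window(timestamps, window_s=300):
    return Counter(int(t) // window_s for t in timestamps)

# e.g. detections at 1 min, 2 min, 6 min, 7 min, 8 min
counts = persons_per_window([60, 120, 360, 420, 480])
print(dict(counts))  # {0: 2, 1: 3}
```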
27
Contextual Object Analysis
[Charts: percentage of use per 5-minute interval over time, for Vending Machine 1 and Vending Machine 2.]
With an increase in people, there is an increase in the use of the vending machines.
30
Results: Trajectory Clustering

                    Cluster 38                        Cluster 6
Number of objects   385                               15
Object types        {'Unknown'}, freq: 385            {'Person'}, freq: 15
Start time (min)    [0.1533, 48.4633]                 [28.09, 46.79]
Duration (sec)      [0.04, 128.24]                    [2.04, 75.24]
Trajectory types    {'4' '3' '7'}, freq: [381 1 3]    {'13' '12' '19'}, freq: [13 1 1]
Significant event   {'void'}, freq: 385               {'inside_zone_Platform'}, freq: 15
31
• Semantic knowledge extracted by off-line, long-term analysis of on-line interactions between moving objects and contextual objects:
  • 70% of people come from the north entrance
  • Most people spend 10 sec in the hall
  • 64% of people go directly to the gates without stopping at the ticket machine
  • At rush hours, people are 40% quicker to buy a ticket, …
• Issues:
  • At which level(s) should clustering techniques be designed: low level (image features), middle level (trajectories, shapes), or high level (primitive events)?
  • What should be learned: visual concepts, scenario models?
  • Uncertainty (noise/outliers/rare events): what are the activities of interest?
  • Parameter tuning (e.g., distance measure, clustering technique)
  • Performance evaluation (criteria, ground truth)
Knowledge Discovery: achievements
32
Video Understanding: Learning Scenario Models (A. Toshev)
or: Frequent Composite Event Discovery in Video Event Time Series
33
• Why unsupervised model learning in video understanding?
  • Complex models containing many events
  • Large variety of models
  • Different parameters for different models
The learning of models should therefore be automated.
Learning Scenarios: Motivation
Video surveillance in a parking lot
34
• Input: a set of primitive events from the vision module:
  object-inside-zone(Vehicle, Entrance) [5, 16]
• Output: frequent event patterns. A pattern is a set of events:
  object-inside-zone(Vehicle, Road) [0, 35]
  object-inside-zone(Vehicle, Parking_Road) [36, 47]
  object-inside-zone(Vehicle, Parking_Places) [62, 374]
  object-inside-zone(Person, Road) [314, 344]
Learning Scenarios: Problem Definition
• Goals:
  • Automatic data-driven modeling of composite events
  • Reoccurring patterns of primitive events correspond to frequent activities
  • Find classes with large size and similar patterns
[Figure: zones]
35
• Approach:
  • An iterative method from data mining for efficient frequent-pattern discovery in large datasets
  • A PRIORI property: sub-patterns of frequent patterns are also frequent (Agrawal & Srikant, 1995)
  • At the i-th step, consider only i-patterns whose (i−1)-sub-patterns are frequent; the search space is thus pruned
• A PRIORI property for activities represented as classes:
  size(C_{m−1}) ≥ size(C_m)
  where C_m is a class containing patterns of length m and C_{m−1} is a sub-activity of C_m.
Learning Scenarios: A PRIORI Method
36
Learning Scenarios: A PRIORI Method
Merge two i-patterns with (i-1) primitive events in common to form an (i+1)-pattern:
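This merge step can be sketched by modelling patterns as sets of event labels: two i-patterns sharing (i−1) events union into an (i+1)-pattern candidate. The event labels are illustrative shorthand, not the actual primitive-event syntax.

```python
# Hedged sketch of A PRIORI candidate generation: merge pairs of i-patterns
# that share (i-1) primitive events into (i+1)-pattern candidates.
# Patterns are modelled as frozensets of event labels (illustrative names).

def merge_candidates(patterns):
    out = set()
    pats = list(patterns)
    for i in range(len(pats)):
        for j in range(i + 1, len(pats)):
            union = pats[i] | pats[j]
            if len(union) == len(pats[i]) + 1:  # exactly (i-1) events in common
                out.add(union)
    return out

p1 = frozenset({"vehicle_on_road", "vehicle_on_parking_road"})
p2 = frozenset({"vehicle_on_road", "vehicle_in_parking_place"})
merged = merge_candidates([p1, p2])
print(sorted(sorted(m) for m in merged))
# [['vehicle_in_parking_place', 'vehicle_on_parking_road', 'vehicle_on_road']]
```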
37
2 types of similarity measure between event patterns:
• similarities between event attributes
• similarities between pattern structures

A generic similarity measure should:
• rely on generic properties when possible -> easy usage in different domains
• incorporate domain-dependent properties -> relevance to the concrete application
Learning Scenarios: Similarity Measure
38
Attributes: the corresponding events in two patterns should have similar (same) attributes (duration, names, object types,...).
Learning Scenarios: Attribute Similarity
• Comparison between corresponding events (same type, same color).
• For numeric attributes, a Gaussian similarity G(x, y) is used (the formula is garbled in the transcript; it appears to be of the form G(x, y) = e^{−((x−y)/(x+y))²}).
• attr(p_i, p_j) = average of all event attribute similarities.
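The attribute-similarity computation can be sketched as follows. Note that the exact form of the numeric similarity is an assumption: the slide's formula is garbled, and a Gaussian of the normalised difference, exp(−((x−y)/(x+y))²), is one plausible reading (it is 1 for equal values and decays as they diverge).

```python
import math

# Hedged sketch of attribute similarity between two event patterns.
# The Gaussian form below is an assumed reading of the garbled slide formula.

def numeric_similarity(x, y):
    if x == y:
        return 1.0
    return math.exp(-((x - y) / (x + y)) ** 2)

def attribute_similarity(events_a, events_b, attrs=("duration",)):
    """Average similarity over corresponding events' numeric attributes."""
    sims = [numeric_similarity(ea[k], eb[k])
            for ea, eb in zip(events_a, events_b) for k in attrs]
    return sum(sims) / len(sims)

a = [{"duration": 10}, {"duration": 30}]
b = [{"duration": 10}, {"duration": 20}]
print(round(attribute_similarity(a, b), 3))  # 0.98
```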
39
Test data:
• Video surveillance of a parking lot
• 4 hours of recordings from 2 days, in 2 test sets
• Each test set contains approx. 100 primitive events
Learning Scenarios: Evaluation
Results: in both test sets the following event pattern was discovered:
  object-inside-zone(Vehicle, Road)
  object-inside-zone(Vehicle, Parking_Road)
  object-inside-zone(Vehicle, Parking_Places)
  object-inside-zone(Person, Parking_Road)
Maneuver Parking!
43
Conclusion:
• Application of a data mining approach
• Handling of uncertainty without losing computational efficiency
• General framework: only a similarity measure and a primitive event library must be specified

Future work:
• Other similarity measures
• Handling of different aspects of uncertainty
• Qualification of the learned patterns
  • Is frequent equal to interesting?
• Different applications: different event libraries or features
Learning Scenarios: Conclusion & Future Work
44
GERHOME (CSTB, INRIA, CHU Nice): ageing population
http://gerhome.cstb.fr/
Approach:
• Multi-sensor analysis based on sensors embedded in the home environment
• Detect any alarming situation in real time
• Identify a person's profile – his/her usual behaviors – from the global trends of life parameters, and then detect any deviation from this profile
HealthCare Monitoring (N. Zouba)
45
Monitoring of Activities of Daily Living for the Elderly
• Goal: increase independence and quality of life:
  • Enable elderly people to live longer in their preferred environment
  • Reduce costs for public health systems
  • Relieve family members and caregivers
• Approach:
  • Detecting alarming situations (e.g., falls)
  • Detecting changes in behavior (missing activities, disorder, interruptions, repetitions, inactivity)
  • Calculate the degree of frailty of elderly people
Example of normal activity:
  Meal preparation (in kitchen) (11h–12h)
  Eating (in dining room) (12h–12h30)
  Resting, TV watching (in living room) (13h–16h)
  …
46
• GERHOME (Gerontology at Home): homecare laboratory
http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html
• Experimental site at CSTB (Centre Scientifique et Technique du Bâtiment) in Sophia Antipolis
http://gerhome.cstb.fr
• Partners: INRIA, CSTB, CHU-Nice, Philips-NXP, CG06, …
Gerhome laboratory
47
Gerhome laboratory
Position of the sensors in the Gerhome laboratory
• Video cameras installed in the kitchen and in the living room to detect and track the person in the apartment.
• Contact sensors mounted on many devices to determine the person's interactions with them.
• Presence sensors installed in front of the sink and cooking stove to detect the presence of people near them.
48
Sensors installed in the Gerhome laboratory
[Photos: in the kitchen; video camera in the living room; pressure sensor underneath the legs of the armchair; contact sensor in the window; contact sensor in the cupboard door.]
49
• We have modelled a set of activities using an event recognition language developed in our team. This is an example for the “Meal preparation” event.

Composite Event (Prepare_meal_1, “detected by a video camera combined with contact sensors”
  Physical Objects ((p: Person), (Microwave: Equipment), (Fridge: Equipment), (Kitchen: Zone))
  Components ((p_inz: PrimitiveState Inside_zone (p, Kitchen))        “detected by video camera”
              (open_fg: PrimitiveEvent Open_Fridge (Fridge))          “detected by contact sensor”
              (close_fg: PrimitiveEvent Close_Fridge (Fridge))        “detected by contact sensor”
              (open_mw: PrimitiveEvent Open_Microwave (Microwave))    “detected by contact sensor”
              (close_mw: PrimitiveEvent Close_Microwave (Microwave))) “detected by contact sensor”
  Constraints ((open_fg during p_inz)
               (open_mw before_meet open_fg)
               (open_fg Duration >= 10)
               (open_mw Duration >= 5))
  Action (AText (“Person prepares meal”)
          AType (“NOT URGENT”)))
Event modelling
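Checking such a model's temporal constraints against recognised sub-events can be sketched as follows, with sub-events given as (start, end) intervals in seconds. The operator names follow the slide, but their implementations here are assumed Allen-style interval relations, not the actual recognition engine.

```python
# Hedged sketch: evaluate the temporal constraints of the Prepare_meal_1
# model. Intervals are (start, end) in seconds; semantics are assumptions.

def during(a, b):           # a happens entirely within b
    return b[0] <= a[0] and a[1] <= b[1]

def before_meet(a, b):      # a ends no later than b starts
    return a[1] <= b[0]

def duration(a):
    return a[1] - a[0]

def prepare_meal(p_inz, open_fg, open_mw):
    return (during(open_fg, p_inz)         # (open_fg during p_inz)
            and before_meet(open_mw, open_fg)  # (open_mw before_meet open_fg)
            and duration(open_fg) >= 10    # (open_fg Duration >= 10)
            and duration(open_mw) >= 5)    # (open_mw Duration >= 5)

print(prepare_meal(p_inz=(0, 120), open_fg=(40, 60), open_mw=(20, 30)))  # True
```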
50
Multi-sensor monitoring: results and evaluation
• We have validated and visualized the recognized events with a 3D visualization tool.

Activity            # Videos  # Events  TP  FN  FP  Precision  Sensitivity
In the kitchen      10        45        40  5   0   1          0.888
In the living room  10        35        40  0   5   0.888      1
Open microwave      8         15        15  0   0   1          1
Open fridge         8         24        24  0   0   1          1
Open cupboard       8         30        30  0   0   1          1
Preparing meal 1    8         3         3   0   0   1          1

• We have studied and tested a range of activities in the Gerhome laboratory, such as: using the microwave, using the fridge, preparing a meal, …
51
Recognition of the “Prepare meal” event
Visualization of a recognized event in the Gerhome laboratory
• The person is recognized with the posture “standing with one arm up”, “located in the kitchen”, and “using the microwave”.
52
Recognition of the “Resting in living-room” event
• The person is recognized with the posture “sitting in the armchair” and “located in the living room”.
Visualization of a recognized event in the Gerhome laboratory
53
End-users
• There are several end-users in homecare:
  • Doctors (gerontologists):
    • Frailty measurement (depression, …)
    • Alarm detection (falls, gas, dementia, …)
  • Caregivers and nursing homes:
    • Cost reduction: no false alarms and reduced employee involvement
    • Employee protection
  • Persons with special needs, including young children, disabled and elderly people:
    • Feeling safe at home
    • Autonomy: at night, lighting up the way to the bathroom
    • Improving life: smart mirror; summary of the user's day, week, or month in terms of walking distance, TV, water consumption
  • Family members and relatives:
    • Elderly safety and protection
    • Social connectivity
54
Social problems and solutions
• Privacy, confidentiality and ethics (video and other data recording, processing and transmission) -> no video recording or transmission, only textual alarms.
• Acceptability for the elderly -> user empowerment.
• Usability -> easy ergonomic interface (no keyboard, large screen), friendly usage of the system.
• Cost effectiveness -> the right service for the right price, a large variety of solutions.
• Legal issues, no certification -> robustness, benchmarking, on-site evaluation.
• Installation, maintenance, training, interoperability with other home devices -> adaptability, X-Box integration, wireless, standards (OSGi, …).
• Research financing? -> France (no money, lobbies), Europe (delays), US, Asia.
55
Conclusion

A global framework for building video understanding systems:
• Hypotheses:
  • mostly fixed cameras
  • 3D model of the empty scene
  • predefined behavior models
• Results:
  • Real-time video understanding systems for individuals, groups of people, vehicles, crowds, or animals, …
  • Knowledge structured within the different abstraction levels (i.e., processing worlds)
    • Formal description of the empty scene
    • Structures for algorithm parameters
    • Structures for object detection rules, tracking rules, fusion rules, …
    • Operational language for event recognition (more than 60 states and events), video event ontology
  • Tools for knowledge management
    • Metrics and tools for performance evaluation and learning
    • Parsers, formats for data exchange
    • …
56
Object and video event detection
• Finer human shape description: gesture models
• Video analysis robustness: reliability computation

Knowledge acquisition
• Design of learning techniques to complement a priori knowledge:
  • visual concept learning
  • scenario model learning

System reusability
• Use of program supervision techniques: dynamic configuration of programs and parameters
• Scaling issue: managing large networks of heterogeneous sensors (cameras, microphones, optical cells, radars, …)

Conclusion: perspectives