cvpr2011: human activity recognition - part 4: syntactic
-
Upload
zukun -
Category
Technology
-
view
623 -
download
1
Transcript of cvpr2011: human activity recognition - part 4: syntactic
![Page 1: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/1.jpg)
Frontiers of Human Activity Analysis
J. K. Aggarwal Michael S. Ryoo
Kris M. Kitani
![Page 2: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/2.jpg)
Overview
2
Human Activity Recognition
Single-layer Hierarchical
Statistical
Syntactic
Descriptive
Space-time
Sequential
![Page 3: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/3.jpg)
Motivation
3
How do we interpret a sequence of actions?
![Page 4: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/4.jpg)
Hierarchy
4
Hierarchy implies decomposition into sub-parts
![Page 5: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/5.jpg)
Now we’ll cover…
5
Human Activity Recognition
Single-layer Hierarchical
Statistical Syntactic
Descriptive
Space-time
Sequential
![Page 6: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/6.jpg)
6
Syntactic Approaches
![Page 7: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/7.jpg)
Syntactic Models
7
Activities as strings of symbols.
What is the underlying structure?
![Page 8: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/8.jpg)
Early applications to Vision
8
Tsai and Fu 1980. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition.
![Page 9: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/9.jpg)
Hierarchical syntactic approach
Useful for activities with: Deep hierarchical structure Repetitive (cyclic) structure
Not for Systems with a lot of errors and uncertainty Activities with shallow structure
9
![Page 10: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/10.jpg)
Basics
10
Generic Language Natural Languages
Start Symbol (S) Sentences
Set of Terminal Symbols (T) Words
Set of Non-Terminal Symbols (N) Parts of Speech
Set of Production Rules (P) Syntax Rules
Context-Free Grammar
![Page 11: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/11.jpg)
Parsing with a grammar
11 ants like flies swat
S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2)
![Page 12: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/12.jpg)
Parsing with a grammar
12
S
VP
NP
VERB NOUN NOUN
(0.8)
(0.05)
(0.4)
(0.45) (0.4) (0.5)
NP
NOUN (0.2)
NP (0.3)
(0.4)
ants like flies swat
S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2)
![Page 13: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/13.jpg)
Video analysis with CFGs
13
The “Inverse Hollywood problem”: From video to scripts and storyboards via causal analysis. Brand 1997
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001
![Page 14: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/14.jpg)
CFG for human activities
14
enter detach leave enter detach attach touch touch detach attach leave
M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards
via causal analysis. AAAI 1997.
![Page 15: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/15.jpg)
Parse tree
15
enter detach
ADD
IN ACTION (Open PC)
leave
OUT
enter detach
IN
ADD
attach touch touch detach
ACTION (unscrew)
MOVE
MOTION MOTION
attach leave
OUT
REMOVE
SCENE (Open up a PC)
M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
• Deterministic low-level primitive detection • Deterministic parsing
![Page 16: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/16.jpg)
Stochastic CFGs
16
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 17: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/17.jpg)
Gesture analysis with CFGs
17 Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
Primitive recognition with HMMs
![Page 18: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/18.jpg)
18
left-right
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 19: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/19.jpg)
19
up-down
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 20: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/20.jpg)
20
right-left
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 21: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/21.jpg)
21
down-up
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 22: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/22.jpg)
Parse Tree
22
left-right up-down right-left down-up
LR
UD
RL
DU TOP BOT
RH
S
![Page 23: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/23.jpg)
Errors
23 Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
HMM a
HMM b
Likelihood value over time (not discrete symbols)
Errors are inevitable…
but the grammar acts as a top-down constraint
![Page 24: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/24.jpg)
Dealing with uncertainty & errors
Stolcke-Early (probabilistic) parser SKIP rules to deal with insertion errors
24
HMM a
HMM b
HMM c
Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
![Page 25: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/25.jpg)
SCFG for Blackjack
25
Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar.
Moore and Essa 2001
• Deals with more complex activities • Deals with more error types
![Page 26: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/26.jpg)
extracting primitive actions
26
![Page 27: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/27.jpg)
Game grammar
27 Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001
![Page 28: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/28.jpg)
Dealing with errors
Ungrammatical strings cause parser to fail
Account for errors with multiple hypothesis Insertion, deletion, substitution
Issues How many errors should we tolerate? Potentially exponential hypothesis space Ungrammatical strings: vision problem or illegal
activity?
28
![Page 29: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/29.jpg)
Observations CFGs good for structured activities Can incorporate uncertainty in observations Natural contextual prior for recognizing errors
Not clear how to deal with errors Assumes ‘good’ action classifiers Need to define grammar manually
Can we learn the grammar from data?
29
![Page 30: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/30.jpg)
Heuristic Grammatical Induction
30
1. Lexicon learning • Learn HMMs • Cluster HMMs
2. Convert video to string
3. Learn Grammar
Unsupervised Analysis of Human Gestures. Wang et al 2001
![Page 31: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/31.jpg)
COMPRESSIVE
31
a b c d a b c d b c d a b a b
substring deletion of substring
insertion of new rule
On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning 2000.
length occurrence new rule new symbol
![Page 32: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/32.jpg)
example
32
S→a b c d a b c d b c d a b a b
S→a A a A A a b a b A→b c d
(DL=16)
(DL=14)
Repeat until compression becomes 0.
![Page 33: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/33.jpg)
Critical assumption
No uncertainty No errors
insertions deletions substitution
Can we learn grammars despite errors?
33
![Page 34: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/34.jpg)
Learning with noise
34
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
Can we learn the basic structure of a transaction?
![Page 35: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/35.jpg)
extracting primitives
35 Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 36: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/36.jpg)
Underlying structure?
36
D → a x b y c a b x c y a b c x
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 37: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/37.jpg)
Underlying structure?
37
D → a x b y c a b x c y a b c x
D → a b c a b c a b c
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 38: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/38.jpg)
Underlying structure?
38
D → a x b y c a b x c y a b c x
D → a b c a b c a b c
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 39: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/39.jpg)
Underlying structure?
39
D → a x b y c a b x c y a b c x
D → a b c a b c a b c
A → a b c D → A A A Simple grammar Efficient compression
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 40: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/40.jpg)
Information Theory Problem (MDL)
40
Model complexity Data compression
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
G = argminG
{DL(G) +DL(D|G)}
![Page 41: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/41.jpg)
Information Theory Problem (MDL)
41
DL(G) = − log p(G)
= − log p(θS , GS)
= − log p(θS |GS)− log p(GS)
= DL(θS |GS)−DL(GS)
Model complexity Data compression
Grammar parameters Grammar structure
Model complexity
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
G = argminG
{DL(G) +DL(D|G)}
![Page 42: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/42.jpg)
Information Theory Problem (MDL)
42
DL(G) = − log p(G)
= − log p(θS , GS)
= − log p(θS |GS)− log p(GS)
= DL(θS |GS)−DL(GS)
DL(D|G) = − log p(D|G)
Model complexity Data compression
Grammar parameters Grammar structure
Likelihood (inside probabilities)
Model complexity
Data compression
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
G = argminG
{DL(G) +DL(D|G)}
![Page 43: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/43.jpg)
Minimum Description Length
43 Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 44: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/44.jpg)
Minimum Description Length
44 Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 45: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/45.jpg)
45 Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 46: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/46.jpg)
46 Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
![Page 47: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/47.jpg)
Conclusions
Possible to learn basic structure Robust to errors
(insertion, deletion, substitution)
Need a lot of training data Computational complexity
47
![Page 48: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/48.jpg)
Bayesian Approaches
48
Infinite Hierarchical Hidden Markov Models. Heller et al 2009.
The Infinite PCFG using Hierarchical Dirichlet Processes. Liang et al 2007.
![Page 49: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/49.jpg)
Take home message Hierarchical Syntactic Models
Useful for activities with: Deep hierarchical structure Repetitive (cyclic) structure
Not for Systems with a lot of errors and uncertainty Activities with weak structure
49
![Page 50: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/50.jpg)
50
Statistical Approaches
![Page 51: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/51.jpg)
Using a hierarchical statistical approach
51
Use when Low-level action detectors are noisy Structure of activity is sequential Integrating dynamics
Not for Activities with deep hierarchical structure Activities with complex temporal structure
![Page 52: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/52.jpg)
Statistical (State-based) Model
52
Activities as a stochastic path.
What are the underlying dynamics?
![Page 53: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/53.jpg)
Characteristics
Strong Markov assumption Strong dynamics prior Robust to uncertainty
Modifications to account for Hierarchical structure Concurrent structure
53
![Page 54: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/54.jpg)
Hierarchical activities
54
Problem: How do we model
hierarchical activities?
Solution: “stack” actions for
hierarchical activities
combinatory state space!
![Page 55: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/55.jpg)
Hierarchical hidden Markov model
55
Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
![Page 56: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/56.jpg)
Context-free activity grammar
56 Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
![Page 57: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/57.jpg)
Context-free activity grammar
57 Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
![Page 58: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/58.jpg)
Observations
Tree structures useful for hierarchies Tight integration of trajectories with
abstract semantic states
Activities are not always a single sequence (ie. they sometimes happen in parallel )
58
![Page 59: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/59.jpg)
Concurrent activities
59
Problem: How do we model
concurrent activities?
Solution: “stand-up” model for concurrent activities
combinatory state space!
![Page 60: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/60.jpg)
Propagation network
60
Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
![Page 61: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/61.jpg)
61 Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
![Page 62: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/62.jpg)
temporal inference
62 Inference by standing the state transition model on its side
![Page 63: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/63.jpg)
Inferring structure (storylines)
63
Understanding Videos, Constructing Plots – Learning a Visually Grounded Storyline Model from Annotated Videos
Gupta, Srinivasan, Shi and Davis CVPR 2009
Learn AND-OR graphs from weakly labeled data
![Page 64: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/64.jpg)
Scripts from structure
64 Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis CVPR 2009
![Page 65: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/65.jpg)
Take home message Hierarchical statistical model
Use when Low-level action detectors are noisy Structure of activity is sequential Integrating dynamics
Not for Activities with deep hierarchical structure Activities with complex temporal structure
65
![Page 66: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/66.jpg)
Contrasting hierarchical approaches
66
Actions as: Activities as: Model Characteristic
Statistic probabilistic states paths DBN Robust to
uncertainty
Syntactic discrete symbols strings CFG Describes
deep hierarchy
Descriptive logical relationships sets CFG, MLN
Encodes complex
logic
![Page 67: cvpr2011: human activity recognition - part 4: syntactic](https://reader034.fdocuments.us/reader034/viewer/2022052622/55961ddd1a28ab920e8b46b3/html5/thumbnails/67.jpg)
References (not included in ACM survey paper)
W. Tsai and K.S. Fu. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. SMC1980.
M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PRCM 2001.
C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. IEEE 2000.
K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Models. AISTATS 2009.
P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007.
A. Gupta, N. Srinivasan, J.Shi and L.Davis. Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR 2009.
67