Date posted: 22-Dec-2015
Learning and Recognizing Activities in Streams of Video
Dinesh Govindaraju
Motivation
Activity recognition from video for higher functionality
  Who is presenting agenda item
  Attendee interest levels
Motivation
Want it to be automatic and not involve hand generation of models
  Impractical in the case of many activities
  Less versatile as you might be constrained to particular aspects of the problem
Problem Definition
Video Data
Observations are extracted (movement deltas via face tracking)
Hand label training segments
Learn underlying models from training segments
Carry out activity recognition
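As a rough illustration of the observation-extraction step, the sketch below turns tracked face positions into frame-to-frame movement deltas and then into discrete symbols. The slides do not say how the deltas are represented, so the quantization into sign pairs, the `threshold` value, and both function names are assumptions made only for this example.

```python
import numpy as np

def movement_deltas(centers):
    """Frame-to-frame displacement (dx, dy) of the tracked face center."""
    centers = np.asarray(centers, dtype=float)
    return np.diff(centers, axis=0)

def quantize(deltas, threshold=2.0):
    """Map each delta to a coarse symbol (sx, sy) with sx, sy in {-1, 0, 1}.

    Movements smaller than `threshold` pixels count as "no movement".
    This discretization is an assumption, not the method from the slides.
    """
    signs = np.where(np.abs(deltas) < threshold, 0, np.sign(deltas))
    return [tuple(int(v) for v in row) for row in signs]

# Hypothetical face-center positions (x, y) for four consecutive frames.
centers = [(10, 10), (15, 10), (15, 4), (15, 5)]
symbols = quantize(movement_deltas(centers))
```

Four frames give three deltas, so the observation stream is always one element shorter than the frame sequence.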
Approach - Learning
Assume underlying models can be approximated by HMMs
Use Baum-Welch to learn the best model from the training segments
Need to find observation space and number of states
Approach - Learning
To find observation space:
  Run through all training segments and add observations
  For new observation when doing recognition, augment learned observation matrices
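One way to read the two steps above is sketched here: collect every distinct observation from the training segments into an index, and, when recognition meets an unseen observation, add a column to the learned observation (emission) matrix. The slides do not say what probability the new column receives; the small `floor` value followed by renormalization is an assumption, and the function names are hypothetical.

```python
import numpy as np

def build_observation_space(training_segments):
    """Assign an integer index to every distinct observation, in order seen."""
    index = {}
    for segment in training_segments:
        for obs in segment:
            index.setdefault(obs, len(index))
    return index

def augment_emissions(B, index, new_obs, floor=1e-6):
    """Add a column for an unseen observation and renormalize each row.

    B has one row per state and one column per known observation.
    The floor probability is an assumed choice, not taken from the slides.
    """
    if new_obs in index:
        return B
    index[new_obs] = B.shape[1]
    B = np.hstack([B, np.full((B.shape[0], 1), floor)])
    return B / B.sum(axis=1, keepdims=True)

index = build_observation_space([["a", "b"], ["b", "c"]])
B = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.1, 0.8]])
B_aug = augment_emissions(B, index, "d")
```

Giving the new symbol a tiny but nonzero probability keeps window likelihoods finite instead of collapsing to zero on a single unseen observation.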
Approach - Learning
To find number of states, Q (for each activity):
  Set upper bound as length of longest training segment
  Iterate over values and generate most likely model using Baum-Welch
Approach - Learning
To find number of states, Q (for each activity):
  Choose best Q using N-fold cross validation using criterion of discriminative power
  With best Q, run Baum-Welch using a number of sets of randomly initialized parameters to get λa
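The learning procedure for one activity could be sketched as follows. This is a minimal NumPy version of Baum-Welch for discrete observations, not the presenter's implementation. The slides do not define the "discriminative power" criterion, so this sketch substitutes held-out log-likelihood as the cross-validation score. All function names, the iteration counts, and the smoothing constant are assumptions. Observations are assumed to be integer-coded, and every segment is assumed to have at least two frames.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scales c, log P(obs)."""
    T, Q = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, Q)), np.ones((T, Q)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c, np.log(c).sum()

def baum_welch(seqs, Q, M, n_iter=20, seed=0):
    """Fit a Q-state, M-symbol HMM to integer-coded sequences by EM."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(Q))
    A = rng.dirichlet(np.ones(Q), size=Q)
    B = rng.dirichlet(np.ones(M), size=Q)
    for _ in range(n_iter):
        pi_n, A_n, B_n = np.zeros(Q), np.zeros((Q, Q)), np.zeros((Q, M))
        for obs in seqs:
            alpha, beta, c, _ = forward_backward(obs, pi, A, B)
            gamma = alpha * beta                  # state posteriors per frame
            pi_n += gamma[0]
            for t in range(len(obs) - 1):         # expected transition counts
                A_n += (alpha[t][:, None] * A
                        * (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1])
            for t, o in enumerate(obs):           # expected emission counts
                B_n[:, o] += gamma[t]
        pi = pi_n / pi_n.sum()
        A = A_n / A_n.sum(axis=1, keepdims=True)
        B = B_n / B_n.sum(axis=1, keepdims=True)
    return pi, A, B

def choose_num_states(seqs, M, n_folds=4, n_iter=20, seed=0):
    """Pick Q by N-fold cross validation (held-out log-likelihood, an assumption)."""
    max_q = max(len(s) for s in seqs)             # upper bound from the slides
    best_q, best_score = None, -np.inf
    for q in range(1, max_q + 1):
        score = 0.0
        for k in range(n_folds):
            train = [s for i, s in enumerate(seqs) if i % n_folds != k]
            held = [s for i, s in enumerate(seqs) if i % n_folds == k]
            pi, A, B = baum_welch(train, q, M, n_iter, seed)
            B = (B + 1e-6) / (B + 1e-6).sum(axis=1, keepdims=True)  # avoid log(0)
            score += sum(forward_backward(s, pi, A, B)[3] for s in held)
        if score > best_score:
            best_q, best_score = q, score
    return best_q

def best_of_restarts(seqs, Q, M, n_restarts=5, n_iter=20):
    """Rerun EM from several random starts; keep the highest training likelihood."""
    def total_ll(model):
        return sum(forward_backward(s, *model)[3] for s in seqs)
    models = [baum_welch(seqs, Q, M, n_iter, seed) for seed in range(n_restarts)]
    return max(models, key=total_ll)
```

With four training segments per activity, `n_folds=4` amounts to leave-one-out cross validation. The random restarts matter because Baum-Welch only converges to a local optimum that depends on the initial parameters.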
Approach - Recognition
Define a window width, w
From the beginning, sequentially consider windows of observations (where L is length of entire sequence)
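The windowing step can be sketched as a small generator. The slide's exact indexing did not survive, so a stride of one frame, zero-based start positions, and full-width windows only are assumptions here; `sliding_windows` is a hypothetical name.

```python
def sliding_windows(observations, w):
    """Yield (start, window) for every full window of width w, stride 1."""
    L = len(observations)
    for start in range(L - w + 1):
        yield start, observations[start:start + w]

windows = list(sliding_windows([1, 2, 3, 4, 5], 3))
```

Under these assumptions a sequence of length L produces L - w + 1 windows.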
Approach - Recognition
Calculate likelihood of each window segment
L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989
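The likelihood of a window under one activity's HMM can be computed with the forward algorithm covered in the cited tutorial. Below is a minimal sketch with per-step scaling to avoid numerical underflow. It assumes integer-coded observations and the usual notation (`pi` initial distribution, `A` transitions, `B` emissions with one row per state); the function name is hypothetical.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Return log P(obs | model) using the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]           # joint of first symbol and each state
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate one step, then emit symbol o
        scale = alpha.sum()
        log_p += np.log(scale)          # log-likelihood is the sum of log scales
        alpha = alpha / scale
    return log_p

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
ll = log_likelihood([0, 1], pi, A, B)   # exp(ll) is 0.209 for this toy model
```

Working in log space keeps windows comparable across activities even when the raw probabilities are very small.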
Approach - Recognition
Label middle frame in each window with activity with highest likelihood
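Given a table of window likelihoods, the labeling rule can be sketched as below. `window_scores[i, a]` is assumed to hold the log-likelihood of the window starting at frame `i` under activity `a`. Leaving the first and last few frames unlabeled (`None`) is an assumption, since the slides do not say how frames outside any window's middle are handled.

```python
import numpy as np

def label_middle_frames(window_scores, activities, w, L):
    """Label the middle frame of each window with its most likely activity."""
    labels = [None] * L
    best = np.argmax(window_scores, axis=1)    # best activity index per window
    for start, a in enumerate(best):
        labels[start + w // 2] = activities[a]
    return labels

# Three windows of width 3 over a 5-frame sequence, two hypothetical activities.
scores = np.array([[-1.0, -2.0],
                   [-3.0, -1.0],
                   [-2.0, -2.5]])
labels = label_middle_frames(scores, ["a", "b"], w=3, L=5)
```

For an odd width such as 21, `w // 2` is exactly the center offset, so each window labels the frame with ten frames on either side.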
Evaluation and Results
Activities being observed:
Evaluation and Results
Observation stream obtained from 87 second long image sequence
1296 individual frames
Example frames after face detection:
Evaluation and Results
Observation sequence first hand labeled
Segments showing same activity extracted
4 training segments used to learn each activity
Evaluation and Results
Once underlying models were learned, calculate likelihood using sliding window
Value of 21 was used for the window width, w, as this was the average length of training segments
Evaluation and Results
Carry out recognition using the likelihoods by assigning activities to the frames
Compare against hand assigned labels
Accuracy approximately 76%
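The comparison against hand labels amounts to a per-frame agreement rate, sketched below. Skipping frames the algorithm left unlabeled (`None`) is an assumption; the slides do not state how such frames were counted, and `frame_accuracy` is a hypothetical name.

```python
def frame_accuracy(predicted, hand_labels):
    """Fraction of labeled frames where the algorithm agrees with the hand label."""
    if len(predicted) != len(hand_labels):
        raise ValueError("label sequences must have the same length")
    pairs = [(p, h) for p, h in zip(predicted, hand_labels) if p is not None]
    if not pairs:
        raise ValueError("no labeled frames to compare")
    return sum(p == h for p, h in pairs) / len(pairs)

acc = frame_accuracy([None, "a", "b", "a", None], ["a", "a", "a", "a", "b"])
```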
Evaluation and Results
Algorithm assigned:
Different from hand label
Same as hand label
Evaluation and Results
Hand assigned:
Different from algorithm label
Same as algorithm label
Future Work
Learn underlying model generating sequence of activities themselves
Standardize lengths of training segments using Dynamic Time Warping and use that as the window width
The End
Questions