Wearable Computing - Part III: The Activity Recognition Chain (ARC)
Daniel Roggen
2011
Wearable Computing - Part III
The Activity Recognition Chain (ARC)
© Daniel Roggen www.danielroggen.net [email protected]
Focus: activity recognition
• Activity is a key element of context!
• Example applications: fitness coaching, location-based services (iPhone), step counters, Wii gaming, fall detection and alarms, elderly assistance
There is no « Drink Sensor »
• Simple sensors (e.g. RFID) can provide "binary" information
– Presence (e.g. RFID, proximity infrared sensors)
– Movement (e.g. ADXL345 accelerometer 'activity/inactivity' pin)
– Fall (e.g. ADXL345 accelerometer 'freefall' pin)
• But in general an « activity-X sensor » does not exist
– Sensor data must be interpreted
– Multiple sensors must be correlated (data fusion)
– Several factors influence the sensor data
• Drinking while standing: the arm reaches the object, then the mouth
• Drinking while walking: the arm moves, and so does the whole body
• Context is interpreted from the sensor data with
– Signal processing
– Machine learning
– Reasoning
• Can be integrated into a « sensor node » or « smart sensor »
– Sensor chip + data processing in a device
User Activity Structure
[Figure: hierarchical structure of user activity across timescales]
– Years (Year 1, 2, 3): alternating Working / Resting periods
– Weeks (Week 10, 11, 12): Go to work, Read mail, Meeting, Shopping, Go home
– Within a meeting: Enter, Give talk, Listen, Leave
– Activity primitives: Walk, Show, Speak, Stand
How to detect a presentation?
• Place
– Conference room
– In front of audience
– Generally at the lectern
• Sound
– User speaks
– Maybe short interruptions
– Otherwise silence
• Motion
– Mostly standing, with small walking motions
– Hand motion, pointing
– Typical head motion
Greeting
Sensor placement: upper body, right wrist, left upper leg
Activity: person is seated, stands up, greets somebody, sits down again
Data recording
[Figure: acceleration [g] over time [s] for the upper-body and wrist sensors during the greeting sequence]
– Upper body: seated → stand up → standing → sit down → seated
– Wrist: hand on table → arm motion → handshake → arm motion → hand on table
The combination of the individual sensor data is distinctive of the activity!
How to recognize activities?
With sensors on the body, in objects, in the environment, …
1. Activities are represented by typical signal patterns in the sensor data
– Motion sensor: activity = movement (e.g. drink from a glass)
– Microphone: activity = sound (e.g. turn pages)
2. Recognition: "comparison" between template and sensor data
– "Drink" recognized, "Turn page" recognized
Recognition system characteristics

Characteristic: Execution
– Offline: The system records the sensor data first; the recognition is performed afterwards. Typically used for non-interactive applications such as activity monitoring for health-related applications.
– Online: The system acquires sensor data and processes it on the fly to infer activities. Typically used for activity-based computing and interactive applications (HCI).

Characteristic: Recognition
– Continuous: The system "spots" the occurrence of activities or gestures in streaming data. It implements data stream segmentation, classification and null class rejection.
– Isolated / segmented: The system assumes that the sensor data stream is segmented at the start and end of a gesture by an oracle; it only classifies the sensor data into the activity classes. The oracle can be an external system in a working system (e.g. cross-modality segmentation), or the experimenter when assessing classification performance during design phases.
Activity recognition: learning by demonstration
• From recorded sensor data: 1) train activity models, 2) perform recognition
• Recognition compares the incoming sensor data against the trained activity models to infer the context/activity
• Training data is required
Activity models

Characteristic: World model
– Stateless: The recognition system does not model the state of the world. Activities are recognized by spotting specific sensor signals. This is currently the dominant approach when dealing with the recognition of activity primitives (e.g. reach, grasp).
– Stateful: The system uses a model of the world, such as the user's context or an environment map with the location of objects. This enhances activity recognition performance, at the expense of design-time knowledge and a more complex recognition system.
Assumptions
• Constant sensor-signal to activity-class mapping
• Design-time: identify the sensor-signal/activity-class mapping
– Sensor setup
– Activity sets
• Run-time: "low" variability
– Can't displace sensors or modify garments
– Can't change the way activities are done
The activity recognition chain (ARC)
• A standard set of steps followed by most research in activity recognition (e.g. [1,2,3,4])
• Streaming signal processing
• Machine learning
• Reasoning
[1] J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
[2] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l Conference, Apr. 2004, pp. 1–17.
[3] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
[4] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
[Figure: the activity recognition chain (ARC) [1]]

Design-time: training phase
• Sensor data and annotations are used to train low-level activity models (primitives) and high-level activity models; both are optimized iteratively

Runtime: recognition phase
• Subsymbolic processing: sensor sampling (S0…S4) → preprocessing (P0…P4) → segmentation → feature extraction (F0…F3) → classification (C0…C2) → decision fusion → null class rejection
• Symbolic processing: reasoning over the recognized activities yields the context/activity used by the activity-aware application
• Output: a stream of tuples (A1, p1, t1), (A2, p2, t2), … over time t: recognized activity, its probability/confidence, and its time of occurrence

[1] Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation Magazine, 2011
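To make the chain concrete, here is a minimal runtime sketch in Python. The function names, the window parameters, the nearest-centroid placeholder classifier and the rejection threshold are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def preprocess(samples):
    """Preprocessing: remove the DC offset per channel (illustrative choice)."""
    return samples - samples.mean(axis=0)

def sliding_windows(stream, length, step):
    """Segmentation: fixed-length sliding windows over the sample stream."""
    for start in range(0, len(stream) - length + 1, step):
        yield start, stream[start:start + length]

def extract_features(window):
    """Feature extraction: per-channel mean and std (common for acceleration)."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

def classify(features, centroids):
    """Classification: nearest centroid; the distance doubles as a
    confidence proxy (smaller is better)."""
    dists = {c: np.linalg.norm(features - m) for c, m in centroids.items()}
    label = min(dists, key=dists.get)
    return label, dists[label]

def arc(stream, centroids, length=32, step=16, reject_dist=5.0):
    """Runtime ARC: yields (activity, score, time) tuples."""
    for t, window in sliding_windows(preprocess(stream), length, step):
        label, dist = classify(extract_features(window), centroids)
        if dist <= reject_dist:          # null class rejection by distance
            yield label, dist, t
```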
Segmentation
• A major challenge!
• Find the boundaries of activities for later classification (e.g. "drink", "turn page")
• Methods:
– Sliding window segmentation
– Energy-based segmentation
– Rest-position segmentation
– HMM [1], DTW [2,3], SWAB [4]
• Classification is undefined outside activities
– The classifier is not trained on "no activity"
– The "null class" is hard to model: it can be anything
• Alternatively, use "null class rejection" after classification

[1] J. Deng and H. Tsui. An HMM-based approach for gesture segmentation and recognition. In 15th International Conference on Pattern Recognition, volume 3, pages 679–682, 2000.
[2] M. Ko, G. West, S. Venkatesh, and M. Kumar, “Online context recognition in multisensor systems using dynamic time warping,” in Proc. Int. Conf. on Intelligent Sensors, Sensor Networks and Information Processing, 2005, pp. 283–288.
[3] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
[4] E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Proceedings of the IEEE International Conference on Data Mining, pages 289–296, 2001.
Segmentation: sliding/jumping window
• Commonly used for audio processing
– E.g. 20 ms windows
• Or for periodic activities
– E.g. walking, with windows of a few seconds
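A small sketch of the window bookkeeping (NumPy assumed): a window duration is converted into samples, with step = window giving a jumping (non-overlapping) window and step < window a sliding (overlapping) one. The sampling rate and durations are illustrative.

```python
import numpy as np

def windows(signal, fs, win_s, step_s):
    """Yield fixed-duration windows over a 1-D signal sampled at fs Hz."""
    n = int(win_s * fs)        # window length in samples
    step = int(step_s * fs)    # hop size in samples
    for start in range(0, len(signal) - n + 1, step):
        yield signal[start:start + n]

# E.g. 20 ms windows with 50% overlap on audio sampled at 8 kHz (illustrative):
audio = np.random.randn(8000)                      # 1 s of placeholder audio
wins = list(windows(audio, fs=8000, win_s=0.020, step_s=0.010))
```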
Activity characteristics

Characteristic: Activity kinds
– Periodic: activities exhibiting periodicity, such as walking, running, rowing, biking, etc. Sliding windows and frequency-domain features are generally used.
– Sporadic: the activity or gesture occurs sporadically, interspersed with other activities or gestures. Segmentation plays a key role in isolating the subset of data containing the gesture.
– Static: the system deals with the detection of static postures or static pointing gestures. Sliding windows and time-domain features are generally used.
Segmentation
• Energy-based segmentation [1]
– Between activities the user does not move
– Low energy in the acceleration signal
– E.g. standard deviation of acceleration compared to a threshold (see the sketch below)
• Rest-position segmentation [1]
– User comes back to a rest position between gestures
– Can be trained
• Challenge:
– Usually there is no 'pause' or 'rest' between activities!
– Combine segmentation and null class rejection
– E.g. DTW [2]

[1] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
[2] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
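A minimal sketch of energy-based segmentation along these lines, assuming a NumPy array of acceleration magnitudes; the window length and threshold values are illustrative, not taken from [1].

```python
import numpy as np

def energy_segments(acc_mag, fs, win_s=0.5, std_thr=0.1):
    """Mark windows as 'active' when the standard deviation of the
    acceleration magnitude exceeds a threshold, then merge consecutive
    active windows into (start, end) segments in samples."""
    n = int(win_s * fs)
    active = [np.std(acc_mag[i:i + n]) > std_thr
              for i in range(0, len(acc_mag) - n + 1, n)]
    segments, start = [], None
    for k, a in enumerate(active):
        if a and start is None:
            start = k * n                      # segment begins
        elif not a and start is not None:
            segments.append((start, k * n))    # segment ends
            start = None
    if start is not None:
        segments.append((start, len(acc_mag)))
    return segments
```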
Feature extraction
• Compute features on the signal that emphasize the signal characteristics related to the activities
• Tradeoffs
– Reduce dimensionality
– Computational complexity
– Maximize separation between classes
– Specificity of the features to the classes: robustness, overfitting
• Some common features for acceleration data [1]: mean, standard deviation, … (see the sketch below)

[1] Figo, Diniz, Ferreira, Cardoso. Preprocessing techniques for context recognition from accelerometer data, Personal and Ubiquitous Computing, 14:645–662, 2010
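A sketch of a few such features for a 3-axis acceleration window (NumPy assumed); the full catalogue in [1] is much broader.

```python
import numpy as np

def features(window):
    """window: (n_samples, 3) acceleration. Returns per-axis mean and std,
    plus per-axis energy and mean-crossing rate."""
    mean = window.mean(axis=0)
    std = window.std(axis=0)
    energy = (window ** 2).sum(axis=0) / len(window)
    centered = window - mean
    # fraction of consecutive sample pairs that cross the per-axis mean
    mcr = (np.diff(np.sign(centered), axis=0) != 0).mean(axis=0)
    return np.concatenate([mean, std, energy, mcr])
```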
Car manufacturing activities
Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008
Dataset available at: http://www.wearable.ethz.ch/resources/Dataset
Feature space: car manufacturing activities
Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008
Dataset available at: http://www.wearable.ethz.ch/resources/Dataset
[Figure: feature-space scatter plots for different feature combinations: (angle X, angle Y, angle Z); (energy X, energy Y, energy Z); (energy, angle X, angle Y); (energy X, energy Y); (energy, angle X); (angle X, angle Y)]
Feature space: modes of locomotion (FS1)
• Features: mean crossing rate of the x, y and z axes, std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS2)
• Features: mean value of the x, y and z axes, std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS3)
• Features: ratio of the x and y axes, ratio of the y and z axes, std of the magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS4)
• Features: mean value of the x, y and z axes, std of the x, y and z axes
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie

Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Less overlapping features yield better accuracies with all classifiers.

Classification accuracy:

Placement   Feature set 1   Feature set 2   Feature set 3   Feature set 4
            NCC   11-NN     NCC   11-NN     NCC   11-NN     NCC   11-NN
Knee        0.64  0.71      0.94  0.95      0.94  0.94      0.95  0.94
Shoe        0.53  0.65      0.68  0.86      0.70  0.86      0.77  0.87
Back        0.60  0.70      0.79  0.81      0.66  0.74      0.78  0.82
RUA         0.53  0.58      0.77  0.84      0.72  0.75      0.73  0.86
RLA         0.45  0.59      0.72  0.81      0.67  0.80      0.61  0.84
LUA         0.55  0.64      0.86  0.85      0.78  0.85      0.75  0.87
LLA         0.60  0.66      0.70  0.82      0.75  0.80      0.68  0.82
Hip         0.57  0.62      0.77  0.81      0.81  0.79      0.77  0.79

k-NN is better than NCC; this is more evident for more overlapping features.
Feature extraction
• Ideally: explore as many features as possible
– Not limited to the "human design space"
• Evolutionary techniques search a larger set of solutions [1]
– E.g. genetic programming: the space of all possible designs is larger than the human design space; new features are evolved, e.g. with a cross-over genetic operator

[1] Förster et al., Evolving discriminative features robust to sensor displacement for activity recognition in body area sensor networks, ISSNIP, 2009
Feature selection
• Select the "best" set of features
• Improve the performance of learning models by:
– Alleviating the effect of the curse of dimensionality
– Enhancing generalization capability
– Speeding up the learning process
– Improving model interpretability
• Tradeoffs
– Select features that correlate strongest with the classification variable (maximum relevance)…
– … and are mutually far away from each other (minimum redundancy) [1]
– Emphasize characteristics of the signal related to the activity
– Computational complexity (minimize the number of features)
– Complementarity
– Robustness

[1] Peng et al., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Analysis and Machine Intelligence, 2005
Feature selection

Filter methods
• Do not involve a classifier, but a 'filter' criterion, e.g. mutual information
• Pipeline: set of candidate features → subset selection algorithm → learning algorithm
• +
– Computationally light
– General: good for a larger set of classifiers
• −
– Feature set may not be ideal for all classifiers
– Larger subsets of features

Wrapper methods
• Involve the classifier
• Pipeline: set of candidate features → subset selection algorithm, with each subset evaluated by the learning algorithm → learning algorithm
• +
– Higher accuracy (exploits the classifier's characteristics)
– Can avoid overfitting with cross-validation
• −
– Computationally expensive
– Features are not general
Sequential forward selection (SFS)
• "Brute force" is not applicable!
– With N candidate features: 2^N feature sets to test
1. Start from the empty feature set Y0 = {Ø}
2. Select the best feature x+ that maximizes the objective function: x+ = argmax over x not in Yk of J(Yk + x)
3. Update the feature set: Yk+1 = Yk + x+; k = k + 1
4. Go to 2
• Works well with a small number of features
• Objective function J: a measure of the "goodness" of the features
– E.g. accuracy (see the sketch below)

[1] Peng et al., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Analysis and Machine Intelligence, 2005
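A minimal SFS sketch in Python. The objective J is passed in by the caller and maps a set of feature indices to a score (e.g. the cross-validated accuracy of a classifier restricted to those features); k_max is an illustrative stopping criterion.

```python
def sfs(n_features, J, k_max):
    """Sequential forward selection: greedily grow the feature set,
    adding at each step the feature that maximizes the objective J."""
    selected = set()
    while len(selected) < k_max:
        remaining = set(range(n_features)) - selected
        best = max(remaining, key=lambda x: J(selected | {x}))
        selected.add(best)
    return selected
```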
Classification
• Map feature vector to a class label
Bayesian classification
• F: sensor reading, features
• C: activity class

Bayes theorem:
P(C|F) = P(F|C) · P(C) / P(F)

– P(F|C): conditional probability of the feature F knowing the class C (from training data)
– P(C): prior probability of the class (from training data)
– P(F): marginal probability (sum over all classes of the probabilities of obtaining F)
– P(C|F): posterior probability

• With multiple sensors: assume conditional independence (Naive Bayes)
P(C|F1, …, Fn) = P(F1, …, Fn|C) · P(C) / P(F) = P(F1|C) · … · P(Fn|C) · P(C) / P(F)

• In practice only the numerator is important (the denominator is constant)
• Classification with a detector: e.g. pick the class with the maximum posterior probability
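A compact sketch of such a classifier. The slides do not fix the form of P(F|C); modeling each feature as a per-class Gaussian (Gaussian Naive Bayes) is one common assumption.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        """Estimate per-class priors and per-feature Gaussian parameters."""
        self.classes = np.unique(y)
        self.prior = {c: np.mean(y == c) for c in self.classes}
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def predict(self, x):
        """Return the class with the maximum log-posterior; the evidence
        P(F) is constant across classes and can be ignored."""
        def log_post(c):
            ll = -0.5 * np.sum(np.log(2 * np.pi * self.var[c])
                               + (x - self.mu[c]) ** 2 / self.var[c])
            return ll + np.log(self.prior[c])
        return max(self.classes, key=log_post)
```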
Nearest centroid classifier (NCC)
• One of the simplest classification methods
– No parameters
– Classify to the nearest class center
• Memory: C class centers
• Classification: C comparisons
• Pros:
– Simple implementation
– Online model update: add/remove classes, adapt class centers
– Fast, little memory
• Cons:
– Simple class boundaries
– Suited when classes cluster in the feature space
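A minimal NCC sketch (NumPy assumed):

```python
import numpy as np

def ncc_fit(X, y):
    """One centroid (mean feature vector) per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def ncc_predict(centroids, x):
    """Classify to the nearest class center."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
```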
k-nearest neighbor (k-NN)
• A simple classification method
– Instance-based learning
– Classify to the class most represented around the test point
– Parameter: k
– k = 1: nearest neighbor (overfits)
– k >> 1: "smoothes" noise in the training data
• Memory: N training points
• Classification: N comparisons
• Pros:
– Simple implementation
– Online model update (add/remove instances, classes)
– Complex boundaries
• Cons:
– Potentially slow, or lots of memory
• Some faster versions:
– GPGPU [1]
– k-d trees to optimize the neighborhood search

[1] Garcia et al, K-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching, ICIP, 2010
Figure from http://jakehofman.com/ddm/2009/09/lecture-02/
Decision tree
• A simple classification method
– Programmatic tree, e.g. C4.5 [1]
– Parameters: decision boundaries (e.g. thresholds t1, t2 on features F1, F2)
• Memory: decision boundaries
• Classification: lightweight if/else comparisons (see the sketch below)
• Pros:
– Simple implementation
– Handles continuous and discrete values, symbols
• Cons:
– Appropriate when classes separate along feature dimensions (or after PCA)
– The size of the tree must be limited to avoid overfitting

[1] Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993
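To illustrate how lightweight classification with a trained tree is, here is a hand-written two-threshold tree over features F1 and F2; the thresholds and class names are placeholders, not learned values.

```python
def tree_predict(f1, f2, t1, t2):
    """Hard-coded decision tree: each classification is only a couple of
    if/else comparisons against learned thresholds."""
    if f1 < t1:
        return "class_A"
    elif f2 < t2:
        return "class_B"
    else:
        return "class_C"
```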
Null-class rejection
• In continuous activity recognition with sliding window segmentation, gestures are not always present in a segment
– Such segments must be labeled "null class"
• Reject when the confidence in the classification result is too low (see the sketch below)
• Many classifiers (e.g. NCC, kNN) can be "calibrated" to have probabilistic outputs [2]
– Statistical test / likelihood of an activity

[1] Calatroni et al., ETHZ Tech Report, 2010
[2] I. Cohen and M. Goldszmidt, “Properties and benefits of calibrated classifiers,” in Proc. Knowledge Discovery in Databases (PKDD), 2004.
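A minimal sketch of null class rejection on calibrated outputs; the threshold value is an illustrative assumption.

```python
def classify_with_rejection(posteriors, threshold=0.6):
    """posteriors: dict mapping class -> calibrated probability for one
    segment. Reject to the null class when the best posterior is too low."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else "null"
```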
Sliding window and temporal data structure
• Activities where the temporal data structure is generally not important:
– Walking, running, rowing, biking…
– Generally periodic activities
• Activities where it is important:
– Open dishwasher: walk, grasp handle up, pull down, walk
– Close dishwasher: walk, grasp handle down, pull up, walk
– Opening or closing a car door
– Generally manipulative gestures
– Complex hierarchical activities
• Problem with some features:
– Different sensor readings can yield identical features: μ1 = μ2 for two activities A and B whose signals differ only in their temporal unfolding
Sliding window and temporal data structure
• Time-to-space mapping: encode the temporal unfolding in the feature vector
– E.g. subwindows: compute the features per subwindow (sw1, sw2), so activity A yields (μ1,1, μ1,2) and activity B yields (μ2,1, μ2,2), which are now distinguishable (see the sketch below)
• Other approaches:
– Hidden Markov models
– Dynamic time warping / string matching
– Signal predictors
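A sketch of the subwindow variant of time-to-space mapping (NumPy assumed):

```python
import numpy as np

def subwindow_features(window, n_sub=2):
    """Split the window into n_sub subwindows and concatenate the
    per-subwindow means, so the feature vector preserves the temporal
    order of the signal."""
    return np.concatenate([sw.mean(axis=0)
                           for sw in np.array_split(window, n_sub)])
```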
Gesture recognition using neural-network signal predictors [1]
• Signal: 3-D acceleration vector; operation on the raw signal
• Predict the future acceleration vector: the acceleration (ax, ay, az) at time t−1 is fed to a predictor (CTRNN), which outputs a prediction (px, py, pz) for time t; the prediction error is the difference with the measured acceleration
• One predictor is "trained" per gesture class (e.g. prediction error for gestures of class 1, prediction error for gestures of class 2)
• The prediction error is smaller on the trained class
• Class = best prediction

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Predictor: Continuous Time Recurrent Neural Network (CTRNN)
• Continuous-time model neurons
• Fully connected network
• Rich dynamics (non-linear, temporal dynamics)
• Theoretically: can approximate any dynamical system
• Well suited as a universal predictor

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Architecture of the CTRNN predictor [1]
• 5 neurons, fully connected
• 3 inputs: the acceleration vector at the previous step
• "Hidden" neurons
• 3 outputs: the acceleration vector at the next step
• Connections between neurons and from inputs to neurons

Notation:
– y_i: state of neuron i at time t
– w_ij: connection weight between neurons i and j
– w_ik^in: connection weight of input k to neuron i
– I_k: value of input k (X, Y, Z)
– θ_j: bias of neuron j
– τ_i: time constant of neuron i

Neuron dynamics (standard CTRNN form):
τ_i · dy_i/dt = −y_i + Σ_j w_ij · σ(y_j + θ_j) + Σ_k w_ik^in · I_k

Discretization using forward Euler numerical integration with Δt = 0.01 s:
y_i(t+Δt) = y_i(t) + (Δt/τ_i) · [ −y_i(t) + Σ_j w_ij · σ(y_j(t) + θ_j) + Σ_k w_ik^in · I_k(t) ]

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
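A sketch of one forward-Euler update of such a CTRNN (NumPy assumed). The sigmoid nonlinearity and the exact placement of bias and input terms follow the standard CTRNN form above, which is a reconstruction rather than the authors' exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ctrnn_step(y, I, W, W_in, theta, tau, dt=0.01):
    """One forward-Euler step of a fully connected CTRNN.
    y: (N,) neuron states; I: (K,) inputs; W: (N, N) recurrent weights;
    W_in: (N, K) input weights; theta: (N,) biases; tau: (N,) time constants."""
    dydt = (-y + W @ sigmoid(y + theta) + W_in @ I) / tau
    return y + dt * dydt
```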
Training of the signal predictors
• Record instances of each gesture class
• Train one predictor for each class
• For each class: minimize prediction error
• Genetic algorithm
– Robust in complex search spaces
– Representation of the parameters by a genetic string (binary string)
• Global optimization of the neural network parameters
– Neuron interconnection weights
– Neuron input weights
– Time constants
– Biases
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Genetic algorithm
• Genetic string encodes, for each neuron: neuron weights, input weights, bias & time constant (6 bits per parameter)
– Neuron parameters: 60 bits
– Genetic string (5 neurons): 300 bits

Fitness function
• Minimize the prediction error for a given class
• Measured on N instances of a training set T (T1…TN)
• Lower is better (smaller prediction error)

GA parameters
• 100 individuals
• Rank selection of the 30 best individuals
• One-point crossover rate: 70%
• Mutation rate: 1% per bit
• Elitism

[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Experiments
• 8 gesture classes
• Planar gestures
• Acceleration sensor on the wrist
• 20 instances per class (one person)
• "Restricted" setup
– No motion between gestures
– Automatic segmentation (signal magnitude > 1 g indicates a gesture)
• "Unconstrained" setup
– Freely moving in an office, typical activities (sitting, walking, reading…)
– Manual segmentation by pressing a button
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Results: unconstrained setup
• Training: 62%-100% (80.5% average); testing: 48%-92% (63.6% average)
• User egomotion
[Figure: per-class recognition accuracies, training vs. testing]
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
[Figure: prediction error over time for a gesture of class A]
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
[Figure: prediction error for one instance per class]
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Activity segmentation and classification with string matching [1]
• Sensors + signal processing yield trajectories, which are encoded as strings (e.g. "becfcca", "aabadca")
• Activity spotting: templates (e.g. "bad", "cfcc") are matched against the string; string matching → fusion → overlap detection → filtering yields the spotted segments

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
Motion encoding [1]
• The hand trajectory is discretized into direction vectors in the x-y plane
• Each direction vector is quantized to the nearest symbol of a codebook of 8 directions (a, b, c, d, e, f, g, h)
• The trajectory thus becomes a motion string, e.g. "b b c c b b d d c c b b b b" (see the sketch below)

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
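A sketch of such a direction codebook encoder; the mapping of the symbols a-h to 45° sectors is an illustrative assumption.

```python
import numpy as np

CODEBOOK = "abcdefgh"  # 8 direction symbols, one per 45-degree sector

def encode_trajectory(points):
    """Quantize the successive direction vectors of a 2-D trajectory into
    codebook symbols, producing a motion string."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        angle = np.arctan2(y1 - y0, x1 - x0) % (2 * np.pi)
        sector = int((angle + np.pi / 8) // (np.pi / 4)) % 8
        symbols.append(CODEBOOK[sector])
    return "".join(symbols)

# E.g. a mostly rightward trajectory (illustrative):
print(encode_trajectory([(0, 0), (1, 0), (2, 0.1), (2.5, 1.0)]))
```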
String matching
• An approximate string matching algorithm is used to spot activity occurrences in the motion string
– Based on a distance measure called the Levenshtein or edit distance
– The edit distance involves symbol operations with dedicated costs:
• substitution/replacement r
• insertion i
• deletion d
– Crucial algorithm modification: find template occurrences at arbitrary positions within the motion string (see the sketch below)

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
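A sketch of this approximate matching with the dynamic-programming edit distance, where initializing the first row to zero is the modification that lets a template match start at any position in the motion string; unit costs for r, i and d are illustrative.

```python
def spot(template, motion, r=1, i=1, d=1):
    """Return, for each end position t in the motion string, the minimum
    edit cost of matching the template ending at t."""
    m, n = len(template), len(motion)
    # cost[j][t]: best cost of aligning template[:j] with a substring of
    # motion ending at t; cost[0][t] = 0 lets the match start anywhere
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        cost[j][0] = cost[j - 1][0] + d               # unmatched template prefix
    for j in range(1, m + 1):
        for t in range(1, n + 1):
            sub = 0 if template[j - 1] == motion[t - 1] else r
            cost[j][t] = min(cost[j - 1][t - 1] + sub,  # match / substitute
                             cost[j][t - 1] + i,        # insert motion symbol
                             cost[j - 1][t] + d)        # delete template symbol
    return cost[m]   # matching cost curve C(t); local minima below a
                     # threshold indicate spotted activity end points

print(spot("cfcc", "becfcca"))
```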
Approximate string matching [1]
• Example: matching two templates against the motion string "… b c b d c b b" ending at time t0 (positions t−6 … t0)
– Template string 1, "a b c b b": the alignment needs one substitution and one deletion → matching cost C1(t0) = r + d
– Template string 2, "b d a h c g b": the alignment needs two insertions and one substitution → matching cost C2(t0) = 2i + r

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
Spotting operation [1]
• The matching cost C1(t) of a template is computed continuously over the motion string (e.g. "b b d c c b b") as time t advances
• When the cost falls below a threshold kthr,1, an activity end point is detected; together with the corresponding activity start point, it defines the spotted segment

[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
String matching
• +
– Easily implemented in FPGAs / ASICs
– Lightweight
– Computational complexity scales linearly with the number of templates
– Multiple templates per activity are possible
• −
– Needs a string encoding
– Hard to decide how to quantize the sensor data
– An online implementation requires "forgetting the past"
Activity recognition with hidden Markov models
• Markov chain
– Discrete-time stochastic process
– Describes the state of a system at successive times
– State transitions are probabilistic
– Markov property: the state transition depends only on the current state
– The state is visible to the observer
– Only parameters: the state transition probabilities
• Hidden Markov model
– Statistical model which assumes that the system being modeled is a Markov chain
– Unknown parameters
– The state is NOT visible to the observer
– But variables influenced by the state are visible (one observation probability distribution per state)
– The observations generated by the HMM give information about the sequence of states
[Figure: a 4-state Markov chain (states 0-3, transition probabilities a00, a01, a02, a12, a13, a23) and the corresponding HMM, where e.g. state 2 emits the observations Z0, Z1, Z2 with probabilities b20, b21, b22]
Hidden Markov model: parameters
• N: number of states (here N = 4); M: number of symbols (here M = 3)
• X: state space, X = {x1, x2, x3, …}; Z: observations, Z = {z1, z2, z3, …}
• A = {aij}: state transition probabilities, an N×N matrix (a00 … a33)
• B = {bij}: observation probabilities, an N×M matrix (b00 … b32)
• Π = (Π0, Π1, Π2, Π3): initial state probabilities
• λ = (A, B, Π): the HMM model
Hidden Markov model: 3 main questions

Find the probability of an output sequence: P(Z|λ)
• Model parameters λ known, output sequence Z known
• Forward algorithm (see the sketch below)

Find the most likely sequence of states generating Z: {xi}T
• Model parameters λ known, output sequence Z known
• Viterbi algorithm

HMM training: find the HMM parameters λ
• (Set of) output sequence(s) known
• Find the observation probabilities, the state transition probabilities, …
• Statistics, expectation maximization: Baum-Welch algorithm
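A sketch of the forward algorithm in the notation above (NumPy assumed; the observations are symbol indices):

```python
import numpy as np

def forward(A, B, pi, Z):
    """Probability of the observation sequence Z given lambda = (A, B, pi).
    A: (N, N) transition probabilities; B: (N, M) observation probabilities;
    pi: (N,) initial state probabilities; Z: sequence of symbol indices."""
    alpha = pi * B[:, Z[0]]                # P(state, first observation)
    for z in Z[1:]:
        alpha = (alpha @ A) * B[:, z]      # propagate state, emit symbol
    return alpha.sum()                     # P(Z | lambda)
```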
Hand raise vs. handshake
• Waving hello (by raising the hand)
– Raising the arm
– Lowering the arm immediately after
• Handshake
– Raising the arm
– Shaking
– Lowering the arm
• Measurement: the angular speed α of the lower arm at the elbow
• Only 3 discrete observation values:
– < 0: negative angular speed
– = 0: zero angular speed
– > 0: positive angular speed
• Example observation sequences:
– "> > > > < < < = < <" → hand raise?
– "> > = < = = < > < < > < < < < < = < < <" → handshake?
Classification with separate HMMs
• Train one HMM per class (HMM0, HMM1, …) with Baum-Welch
– Each HMM models one gesture
• Classify a sequence of observations:
– Compute the probability of the sequence under each HMM with the forward algorithm: P(Z|HMMi)
– Use these probabilities as the likelihoods for the classification
• In general the class corresponds to the HMM with the highest probability
• Pipeline: training/testing dataset → likelihood estimation (HMM0…HMM3 yield P(G=0)…P(G=3)) → classification by maximum likelihood (C = 0,1 or C = 0,1,2,3)
Validation of activity recognition [1]
• Recognition performance
– Confusion matrix
– ROC curve
– Continuous activity recognition measures
– Latency
• User-related measures
– Comfort / user acceptance
– Robustness
– Cost
• Processing-related measures
– Computational complexity, memory
– Energy
• … application dependent!
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
Performance measures: confusion matrix
• Instance-based
• Indicates how each instance is classified vs. its true class
• Ideally: a diagonal matrix
• TP / TN: true positive / negative
– Correctly detected that there is (or isn't) an activity
• FP / FN: false positive / negative
– Detected an activity when there isn't one, or missed one when there is
• Substitution: correctly detected, but incorrectly classified
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
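A minimal sketch of building such a matrix (NumPy assumed; classes are integer labels):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: true class; columns: predicted class. Ideally diagonal."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```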
Performance measures: ROC curve
• Receiver operating characteristic
• Indicates classifier performance as a parameter is varied
– E.g. the null class rejection threshold
• True positive rate (TPR), or sensitivity
– TPR = TP / P = TP / (TP + FN)
• False positive rate (FPR)
– FPR = FP / N = FP / (FP + TN)
– Specificity = 1 − FPR
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
Performance measures: online activity recognition
• Problem with the previous measures: they are suited to isolated activity recognition
– I.e. the activity is perfectly segmented
• They do not reflect the performance of online (continuous) recognition
• Ward et al. introduce [2]:
– Overfill / underfill: activities detected as longer/shorter than the ground truth
– Insertions / deletions
– Merges / fragmentations / substitutions

[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
[2] Ward et al., Performance metrics for activity recognition, ACM Transactions on Intelligent Systems and Technology, 2(1), 2011
Validation
• Split the entire dataset into a train set and a test set
• Train set: optimization of the ARC
– Includes feature selection, classifier training, null class rejection, etc.
• Test set: never seen during training
– Assesses generalization
– Used only once for testing (otherwise one is indirectly optimizing on the test set)
• Cross-validation (e.g. 4-fold: the dataset is split into folds 1-4, each fold serving once as the test set)
– Assesses whether results generalize to an independent dataset (see the sketch below)
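A minimal k-fold cross-validation sketch (NumPy assumed); `train_and_score` is a placeholder for any train-on-the-train-split, evaluate-on-the-test-split procedure.

```python
import numpy as np

def kfold_scores(X, y, train_and_score, k=4, seed=0):
    """Split the data into k folds; each fold serves once as the test set
    while the remaining folds form the train set."""
    idx = np.random.default_rng(seed).permutation(len(X))
    scores = []
    for fold in np.array_split(idx, k):
        test = np.zeros(len(X), dtype=bool)
        test[fold] = True
        scores.append(train_and_score(X[~test], y[~test], X[test], y[test]))
    return scores
```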
Validation
• Leave-one-out cross-validation:
– Train on the entire set of samples minus one
– Test on the left-out sample
• In wearable computing, various goals:
– Robustness to multiple users (user-independent)
– Robustness to multiple sensor placements (placement-independent)
– …

Leave out: person → assess user-independent performance
Leave out: day, week, … → assess time-independent performance (e.g. if the user can change behavior over time)
Leave out: sensor placement → assess sensor-placement-independent performance
Leave out: sensor modality → assess modality-independent performance
…
For further reading

ARC
• Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation Magazine, 2011

Activity recognition
• Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, IEEE Pervasive Computing, 2008
• J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
• L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l Conference, Apr. 2004, pp. 1–17.
• D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
• Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010

Classification / machine learning / pattern recognition
• Duda, Hart, Stork, Pattern Classification, Wiley Interscience, 2000
• Bishop, Pattern Recognition and Machine Learning, Springer, 2007 (http://research.microsoft.com/en-us/um/people/cmbishop/prml/)

Performance measures
• Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
• Ward et al., Performance metrics for activity recognition, ACM Transactions on Intelligent Systems and Technology, 2(1), 2011