HMM-based Activity Recognition with a Ceiling RGB-D Camera

Daniele Liciotti, Emanuele Frontoni, Primo Zingaretti, Nicola Bellotto, and Tom Duckett
Università Politecnica delle Marche (Italy), University of Lincoln (UK)
{d.liciotti, e.frontoni, p.zingaretti}@univpm.it, {nbellotto, tduckett}@lincoln.ac.uk

Introduction

The main goal is to classify an observed subject's actions and predict their probability. We perform activity detection and recognition using an inexpensive RGB-D camera. Human activities, despite their unstructured nature, tend to have a natural hierarchical structure; for instance, making a coffee generally involves a three-step process of turning on the coffee machine, putting sugar in the cup and opening the fridge for milk. Action sequence recognition is then handled using a discriminative hidden Markov model (HMM).

The innovative aspects are the proposal of an adequate HMM structure and the use of 3D head and hand positions to estimate the probability that a certain action will be performed, which, to the best of our knowledge, has never been done before for ADL recognition in indoor environments.

ADLs Model

[Figure 1: Block diagram of the recognition process. The 3D head and hand points of activities 1..n are fed as observations (O) to HMMs 1-3; classification selects the model with maximum likelihood.]

The information provided by the head and hand detection algorithms is used as input for a set of HMMs, each of which recognises a different action sequence. After training the models, we consider an action sequence s = {s_1, s_2, ..., s_n} and calculate, for each model λ, the likelihood P(s|λ) of the observation sequence. We then classify the action as the one with the largest posterior probability. Figure 1 depicts the general scheme of the recognition process.
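As a concrete illustration of the P(s|λ) likelihood computation above, the sketch below implements the scaled forward algorithm for a discrete-observation HMM. Note this is a simplified stand-in: our actual observations are continuous 3D head/hand points, and the model parameters here are purely illustrative.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(O | lambda) of a discrete observation sequence under an
    HMM lambda = (pi, A, B), via the scaled forward algorithm.

    obs : list of symbol indices o_1..o_n
    pi  : (S,)   initial state distribution
    A   : (S, S) transition matrix, A[i, j] = P(state j | state i)
    B   : (S, V) emission matrix,   B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
    c = alpha.sum()
    log_p = np.log(c)
    alpha = alpha / c                    # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # forward recursion step
        c = alpha.sum()
        log_p += np.log(c)
        alpha = alpha / c
    return log_p

# Hypothetical 2-state, 2-symbol toy model (numbers are illustrative only)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward_log_likelihood([0, 1, 0], pi, A, B))
```

The rescaling of alpha at each step keeps the recursion numerically stable for long sequences, accumulating the log-likelihood from the per-step normalizers.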
In particular, we used three different HMMs, whose observations are the 3D points of: the head; the hands; both head and hands together. Finally, the classification module provides the action x_j that maximizes P_HMM_i, the probability that the HMM trajectory follows the activity sequence s given the sequence of n observations, i.e.:

    x_j = arg max_i P_HMM_i(X_1:n ∈ seq_n(s) | o_1:n)

Setup and Acquisition

[Figure 2: Snapshots (a)-(d) of a RADiAL session registration.]

To evaluate the usefulness of our approach for activity recognition, we built a new dataset (RADiAL) that contains common daily activities such as making coffee, making tea, opening the fridge and using the kettle. The RGB-D camera was installed on the ceiling of the L-CAS laboratory, approximately 4 m above the floor (Figure 3).

[Figure 3: Reconstructed layout of the kitchenette (fridge, tea, sugar, kettle, coffee machine) where the RGB-D camera is installed.]

The RADiAL dataset [1] was collected in an open-plan office of the Lincoln Centre for Autonomous Systems (L-CAS). The office consists of a kitchenette, resting area, lounge and 20 working places occupied by students and postdoctoral researchers. We installed a ceiling RGB-D camera (Figure 3) that took a snapshot of the kitchenette area (320 × 240 pixels, Figure 2) every second for 5 days, and we hand-annotated the activities of one of the researchers over time. Furthermore, the RADiAL dataset contains the 3D positions of the head and hands of each person, with a minute-by-minute timeline of 5 different activities performed in the kitchen over the course of the days. RADiAL contains 100 trials, each including the actions related to one person.
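The classification rule x_j = arg max_i P_HMM_i(·) can be sketched as follows: score the observation sequence under every activity model and pick the best-scoring label. The two activity models below are hypothetical (discrete binary observations, two activities instead of five, illustrative parameters only):

```python
import numpy as np

def seq_likelihood(obs, pi, A, B):
    """P(obs | lambda) via the (unscaled) forward recursion --
    adequate for the short sequences used in this sketch."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def classify(obs, models):
    """x_j = argmax_i P_HMM_i(obs): return the activity whose HMM
    assigns the sequence the highest likelihood."""
    return max(models, key=lambda name: seq_likelihood(obs, *models[name]))

# Two hypothetical activity models over a binary observation alphabet:
# "coffee" tends to emit symbol 0, "fridge" tends to emit symbol 1.
models = {
    "coffee": (np.array([0.5, 0.5]),
               np.array([[0.9, 0.1], [0.1, 0.9]]),
               np.array([[0.8, 0.2], [0.7, 0.3]])),
    "fridge": (np.array([0.5, 0.5]),
               np.array([[0.9, 0.1], [0.1, 0.9]]),
               np.array([[0.2, 0.8], [0.3, 0.7]])),
}
print(classify([0, 0, 1, 0], models))  # sequence dominated by symbol 0
```

In the actual system the per-activity models would be trained on the 3D head/hand observations before scoring; here the parameters are fixed by hand to keep the example self-contained.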
Experimental Results

Five models were used to recognize activities in the RADiAL dataset, corresponding respectively to the activities "other" (all the other activities performed in a kitchen environment), "coffee" (making a coffee), "kettle" (taking the kettle), "tea/sugar" (making tea or taking sugar), and "fridge" (opening the fridge). The results were obtained using two different validation techniques.

Figure 4: k-fold cross-validation confusion matrices (rows: true label; columns: predicted label).

(a) HMM 1
            other  coffee kettle tea/sugar fridge
other       0.574  0.236  0.029  0.105     0.056
coffee      0.072  0.797  0.114  0.015     0.002
kettle      0.053  0.070  0.701  0.161     0.015
tea/sugar   0.069  0.048  0.121  0.704     0.058
fridge      0.091  0.033  0.025  0.243     0.609

(b) HMM 2
            other  coffee kettle tea/sugar fridge
other       0.704  0.106  0.032  0.089     0.068
coffee      0.045  0.831  0.052  0.070     0.003
kettle      0.020  0.194  0.579  0.202     0.006
tea/sugar   0.036  0.067  0.198  0.678     0.021
fridge      0.056  0.084  0.022  0.184     0.654

(c) HMM 3
            other  coffee kettle tea/sugar fridge
other       0.759  0.073  0.024  0.087     0.058
coffee      0.031  0.875  0.041  0.052     0.002
kettle      0.014  0.139  0.696  0.143     0.007
tea/sugar   0.027  0.047  0.156  0.748     0.021
fridge      0.036  0.063  0.019  0.168     0.713

Table 1: Classification Results (Cross-Validation).

                 HMM 1               HMM 2               HMM 3
            PPV  TPR  F1        PPV  TPR  F1        PPV  TPR  F1
other       0.73 0.57 0.64      0.89 0.70 0.79      0.93 0.76 0.84
coffee      0.67 0.80 0.73      0.69 0.83 0.75      0.76 0.87 0.81
kettle      0.60 0.70 0.65      0.47 0.58 0.52      0.58 0.70 0.63
tea/sugar   0.66 0.70 0.68      0.64 0.68 0.66      0.70 0.75 0.72
fridge      0.74 0.61 0.67      0.74 0.65 0.69      0.78 0.71 0.74
avg         0.68 0.68 0.68      0.73 0.71 0.71      0.78 0.77 0.77

[1] http://vrai.dii.univpm.it/radial-dataset
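The PPV (precision), TPR (recall) and F1 values in Table 1 are derived from confusion matrices like those in Figure 4. The matrices shown there are row-normalized, so PPV cannot be read off them directly; the sketch below computes all three metrics from a hypothetical matrix of raw counts instead:

```python
import numpy as np

def per_class_metrics(C):
    """Per-class PPV, TPR and F1 from a confusion matrix of raw counts,
    with C[i, j] = number of class-i samples predicted as class j
    (rows = true labels, columns = predicted labels, as in Figure 4)."""
    tp = np.diag(C).astype(float)
    ppv = tp / C.sum(axis=0)   # column sums = totals predicted per class
    tpr = tp / C.sum(axis=1)   # row sums    = true totals per class
    f1 = 2 * ppv * tpr / (ppv + tpr)
    return ppv, tpr, f1

# Toy 3-class count matrix (illustrative only, not from RADiAL)
C = np.array([[8, 1, 1],
              [2, 7, 1],
              [0, 2, 8]])
ppv, tpr, f1 = per_class_metrics(C)
```

The per-class averages in the last row of Table 1 then summarize each HMM with a single PPV/TPR/F1 triple.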
