Laban Movement Analysis and LDA Distributed Monitoring
Ran Bernstein
Technion - Computer Science Department - M.Sc. Thesis MSC-2016-17 - 2016
Laban Movement Analysis and LDA Distributed Monitoring
Research Thesis
Submitted in partial fulfillment of the requirements
for the degree of Master of Computer Science
Ran Bernstein
Submitted to the Senate
of the Technion — Israel Institute of Technology
Kislev Hatashva Haifa December 2016
This research was carried out under the supervision of Prof. Assaf Schuster, in the
Faculty of Computer Science.
Some results in this thesis have been published as articles by the author and research
collaborators in conferences and journals during the course of the author's master's
research period, the most up-to-date versions of which are:
Bernstein, Ran, et al. "Laban movement analysis using Kinect." Int. J. Comput.
Electr. Autom. Control Inform. Eng. 9 (2015): 1394-1398.
Bernstein, Ran, et al. "Multitask learning for Laban movement analysis." Proceedings
of the 2nd International Workshop on Movement and Computing. ACM, 2015.
Acknowledgements
I would like to thank my advisor, my parents and my girlfriend.
The generous financial help of the Technion is gratefully acknowledged.
Contents
Abstract 1
1 Laban Movement Analysis of Movements Recorded by Kinect 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Laban Movement Analysis (LMA) . . . . . . . . . . . . . . . . . 4
1.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Kinect Sensor Data . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Clip collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Multi-Label Classification . . . . . . . . . . . . . . . . . . . . . . 7
1.2.4 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.6 Single Task Learning (STL) . . . . . . . . . . . . . . . . . . . . . 11
1.2.7 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Single-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 LDA Model Monitoring in Distributed Systems 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Setup and Motivation . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Monitoring Problem . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Monitoring Distributed LDA With Convex Subsets . . . . . . . . . . . . 20
2.3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.2 Convex Safe Zones . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.3 Convex Bound for Local Condition . . . . . . . . . . . . . . . . . 23
2.3.4 Proof of the Convex Bound Lemma . . . . . . . . . . . . . . . . 24
2.4 Distributed LDA Monitoring Algorithm . . . . . . . . . . . . . . . . . . 26
2.4.1 Probabilistic Distributed LDA Monitoring . . . . . . . . . . . . . 27
2.4.2 Analysis of the probabilistic version, PDLDA . . . . . . . . . . . 28
2.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Synthetic Data Experiments . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Real Data Experiments . . . . . . . . . . . . . . . . . . . . . . . 31
3 Conclusion 37
List of Figures
1.1 Skeleton positions relative to the human body . . . . . . . . . . . . . . . 6
1.2 Kinect Coordinate System . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Influence of the number of features on the performance. The selection was
made according to statistical significance: The blue line is the difference
between the score with and without feature selection. It can be seen that
the optimal percentage of features to select is 10% . . . . . . . . . . . . 11
1.4 Recall, precision and F1 score of each Laban quality separately. The
evaluation was conducted on a dataset that was captured from only one
CMA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Performance on ordinary people (non-CMAs) instructed to perform sev-
eral tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Example of incorrect monitoring by applying LDA locally. The initial
state of the data is presented in (A) and the state at a later point is
shown in (B). In (B) every node (green and red dashed lines) calculates
the same angle for the separator as it was in (A). But it can be seen that
the global separator’s (blue solid line) angle has changed significantly. . 21
2.2 Illustration of the generation process of the synthetic data. The class
P (denoted in blue) is fixed, while Q changes three times, every 1,500
rounds (the changes are depicted by the dark arrows). . . . . . . . . . . 29
2.3 DLDA error (blue) vs. PER(100) error (red), for the synthetic data de-
scribed above. Horizontal axis represents rounds, vertical axis represents
the norm of the difference between the real (global) model and the current
model held at the nodes. Window size is 1,000. The maximum allowed
error (which DLDA guarantees will never be surpassed) is T = 0.997
(which corresponds to a difference of 0.077 radians, or 4.4 degrees, in the
classifier’s direction). Both algorithms transmit the same overall number
of bytes, but at different rounds; while PER sends alerts periodically,
DLDA alerts only when the classifier may have changed. For this reason,
PER yields a larger error when the two classes (and the classifier) change. 30
2.4 A toy example demonstrating early detection of a change in the data. . 31
2.5 Communication as a function of model drift for DLDA and PER. The
periodic algorithm is tuned to achieve the same max model drift as DLDA
for each model drift threshold. . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Communication as a function of the number of nodes for fixed (blue) and
changing (green dashed line) datasets . . . . . . . . . . . . . . . . . . . 32
2.7 Communication as a function of window size . . . . . . . . . . . . . . . 32
2.8 Communication as a function of input dimension for fixed (blue) and
changing (green dashed line) datasets . . . . . . . . . . . . . . . . . . . 32
2.9 Comparison between maximal (over nodes) DLDA model drift (blue) and
the true global model drift (green dashed line) for k = 2, W = 450. It
can be seen that DLDA responds to the change in the data that occurs
after 600 rounds (red dotted vertical line) and causes a synchronization
in round 698 (blue dashed vertical line). . . . . . . . . . . . . . . . . . . 33
2.10 The top and the center figures show the DLDA algorithm on the Power
Supply data set for a small (top) and large (center) number of nodes. The
blue line represents the value of the local bound expression, corresponding
to the node with the maximum value. The green dashed line shows the
model drift (normalized by the threshold); the model is computed after
the data was aggregated from all nodes. The bottom plot shows the
results of the PDLDA on the same dataset. The blue line in the bottom
plot represents the fraction of violated nodes. . . . . . . . . . . . . . . . 35
2.11 Demonstration of PDLDA on the Gas Sensor dataset. A comparison
between the true model drift (green) and the fraction of the nodes that
are violated in the current round (blue). The experiment is configured
for k=100 nodes, and the violation threshold is VT=80. . . . . . . . . . 36
Abstract
The first chapter of the thesis deals with Laban Movement Analysis (LMA), which is a
method for describing, interpreting and documenting all varieties of human movement.
Analyzing movements using LMA is advantageous over a kinematic description of the
movement, as it captures qualitative aspects in addition to the quantitative aspects of
the movement. As such, it has many applications, and in recent years it has been gaining
popularity as the preferred method for movement analysis in motor research, theater
training, and the development of interactive gaming animations and robotics. In this
study we aimed to develop an automated method for recognizing 18 different Laban
motor elements (motor characteristics) from markerless 3D movement data captured by
the ubiquitous Kinect camera. Using machine-learning methods we obtained a recall
rate of 38-94% (65% on average) and a precision rate of 29-100% (59% on average) for
the 18 motor elements that were tested.
The second chapter of the thesis deals with systems for mining dynamic data streams,
which should be able to detect changes that affect the accuracy of their model. A
distributed setting is one of the main challenges in this kind of change detection. In a
distributed setting, model training requires centralizing the data from all nodes (hereafter,
synchronization), which is very costly in terms of communication. To minimize this
communication, a monitoring algorithm should be executed locally at each node, while
preserving the validity of the global model (the model that would be computed if a
synchronization occurred). To this end, we propose the first
communication-efficient algorithm for monitoring a classification model over distributed,
dynamic data streams. The classification algorithm that we chose to monitor is Linear
Discriminant Analysis (LDA), which is a popular method used for classification and
dimensionality reduction in many fields. This choice was made due to the strong
theoretical guarantee of correctness that we prove for the monitoring algorithm of this
kind of model. In addition to its theoretical guarantee, we demonstrated how our algorithm
and a probabilistic variant of it reduce communication volume by up to two orders of
magnitude (compared to synchronization in every round) on three real data sets from
different content domains. Moreover, our approach monitors the classification model
itself as opposed to its misclassifications, which makes it possible to detect the change
before the misclassification occurs.
Chapter 1
Laban Movement Analysis of
Movements Recorded by Kinect
1.1 Introduction
In recent years there has been a surge of interest in automated analysis of human motor
behavior in the fields of robotics, computer science and animation. Computerized
recognition of movement characteristics has many potential applications: It could be
used for detection of personality traits that are associated with specific motor tendencies
[LD03] during, for example, a job interview, and for early detection and/or for severity
assessment of various illnesses characterized by abnormal motor behavior, such as autism
[Dot95], schizophrenia, or Parkinson's disease. Automated emotion recognition from
movement, based on associations between certain emotions and specific motor behaviors
[ACC15] is another important application, which may have a variety of uses such as
online feedback to presenters to help them convey through their body-language the
emotional message they want to communicate (e.g., politicians and public speakers or
dancers and actors in training), or recognition of people’s emotions during interactive
games such as those played using the Xbox. Automated analysis of motor behavior can
be used also to assess the progression and improvement of participants in a variety of
training programs that employ virtual reality environment [AC13]; it can be used for
motion retrieval from large motion database [KCT+13] and for movement indexing and
classification [AC13] in the field of animation. Lastly, machine learning of a person's
movement patterns has enormous potential for future applications, ranging from security
identification to interactive environments. Most of the studies dealing with automatic analysis of
human movement captured movement using complex and expensive 3D motion capture
systems. However, in order to implement the many potential uses mentioned above
in our everyday life, we should be able to do such automated analysis using a small,
inexpensive, and easy-to-use 3D camera. One such camera that has been
successfully used in interactive games is the Kinect camera.
1.1.1 Our Contribution
In this study we aimed to develop an automated method for recognizing the motor
characteristics of any human movement captured by a Kinect camera. Once the
movement is captured in 3D, its assessment and analysis can be done in various ways. In
this study we chose to develop the computerized recognition of movement characteristics
based on Laban Movement Analysis (LMA).
1.1.2 Laban Movement Analysis (LMA)
LMA is a well-established and widely accepted systematic language for describing and
documenting movement. LMA’s comprehensiveness as a motor analysis method could be
inferred from its diverse use in research: it has been used to evaluate fighting behaviors of
rats [FP03], to analyze behavior of nonhuman animals in naturalistic settings [FCK97],
to diagnose autistic individuals [Dot95], to evaluate motor recovery of stroke patients
[FW06], and to characterize the development of infants’ reaching movements [FW12].
In recent years it gained additional popularity among computer science researchers who
have used it in studies that describe, recognize or create bodily emotional expressions
for applications in human-robot interactions, interactive games such as the Xbox,
and in animations [CLV03, RDA08, ZGCA13, LVBB10, ZB05, MKI09, MK10], and
recently it has even been attempted, through the use of EEG, to identify the brain
mechanisms underlying the production of some of the LMA motor elements [CGHN+14].
In addition, some studies have found correlations between some Laban motor elements
and personality traits or emotional states [LD03, STW15].
Analyzing movements using LMA is advantageous over other methods, as it cap-
tures various qualitative motor elements (movement characteristics) in addition to
quantitative (kinematic) aspects of the movement. LMA categorizes movement with
four main components: Body, Effort, Shape, and Space. Body (i.e. which body parts
move) and Space (i.e. the direction of movement such as Vertical: Up/Down, Sagittal:
Forward/Back or Horizontal: to the side), describe how the many spatial-temporal body
and limb relationships change. The category of Body also includes specific common body
actions such as jump and walk. Effort describes the qualitative aspect of movement
expressive of a person’s inner attitude towards movement via four Effort factors: Weight,
Time, Space and Flow. Each Factor identifies movement on a continuum between two
poles: fighting against the motor quality of that factor and indulging in that quality.
1) Weight Effort , identifies the amount of force or pressure exerted in movement,
on the continuum from Strong to Light (and movements lacking weight activation,
i.e., Passive/Heavy movement); 2) Time Effort identifies the degree of urgency or
acceleration/deceleration involved in a movement, i.e., Sudden or Sustained movement;
3) Space Effort , describes the focus or attitude towards a chosen pathway, i.e., is the
movement Direct or Indirect and 4) Flow Effort describes the element of control or
the degree to which a movement is Bound, i.e., controlled by muscle contraction, versus
Free, i.e., being released/liberated. Finally, Shape refers to the way the body ’sculpts’
itself in space: It describes the changes in the relationship of body parts to one another
and to the surrounding environment that occur when a body moves (e.g., whether
the body Encloses or Spreads, Rises or Sinks, etc.). In addition, LMA examines other
movement characteristics, such as the phrasing of the movement, which means the way
movement elements are sequenced into action. Analogous to phrasing in music, a motor
phrase can be rhythmic (repetitive), even (monotonous), etc. (For a more detailed and
systematic description of LMA see [BL80, SC13, Fer14]).
As can be seen from this short description, LMA is very thorough. It captures
a variety of movement dimensions, and has therefore become the preferred method
for movement analysis used by many scientists. Indeed, in a recent study that used
both Effort-Shape (part of LMA) and kinematic analyses to identify movement char-
acteristics associated with positive and negative emotions experienced during walking,
more differences among emotions were identified with Effort-Shape analysis than with
kinematic analysis [GCF12]. Moreover, both Chi et al. [CCZB00] and Masuda et
al. [MK10] chose to develop, based on LMA, a computer-generated animation [CCZB00]
or robotic [MK10] system that transforms simple movements into emotionally expressive
movements by modifying certain movement parameters of the animated character or robot. Thus,
we have chosen to use LMA for the purpose of developing the automated method for
recognizing movement characteristics.
Because LMA is a comprehensive system with tens of different motor characteristics,
and because many of the current applications for automated analysis of movement have
to do with creation or recognition of emotional expressions in movements, we focused
this study on identification of the 18 Laban motor elements (Table 1.1 in the results
section) found to be associated with specific emotions [18]. Thus, we created a database
of movements captured by a Kinect camera and developed machine-learning-based
algorithms for automated identification of the 18 Laban motor elements expressive of
emotion, from our Kinect data.
1.2 Method
1.2.1 Kinect Sensor Data
The Kinect Software Development Kit (SDK) detects the skeleton of the videotaped
moving person and provides the 3D coordinates of 24 joints along this skeleton, as seen
in Fig. 1.1.
The coordinates of these joints are given in a "real-world" coordinate system whose
origin [0,0,0] is at the sensor and whose x, y, and z axes are as depicted in Fig. 1.2 below.
Data were collected by the Kinect camera at 30 Hz.
Figure 1.1: Skeleton positions relative to the human body
Figure 1.2: Kinect Coordinate System
1.2.2 Clip collection
In order to develop the ability to automatically identify Laban motor elements we had
to ensure that the movements in the data set used for the machine learning, included
those elements. Thus, for this study, we generated two specific data sets:
• CMA dataset: This dataset consisted of clips of movements performed by Certified
(Laban) Movement Analysts (CMAs). Six CMAs performed movement sequences
of approximately 3 seconds each, which consisted of different combinations of LMA
motor elements. Before each movement sequence (clip) the CMAs were given a
list of 2-4 Laban motor elements out of the 18 motor elements that were studied,
and were instructed to perform any movement they wanted, as long as it
incorporated the required motor elements. Each of the CMAs performed about
80 such different combinations of 2-4 motor elements, for a total of 550 clips. To
achieve a uniform distribution of the Laban qualities over the dataset, in every
movement sequence (clip) each CMA was asked to perform actions that included
the several specified motor elements, and nothing but them.
• Non-CMA dataset: This dataset consisted of movement sequences performed by
two people without a background in LMA, who were asked to move as if they
were performing different everyday tasks, such as greeting a friend or playing
with a balloon. Their movement sequences also lasted about 3 seconds each, and
a total of 30 such clips were collected. Their movements were tagged by a CMA,
who determined which of the 18 Laban qualities that we tested in this study
appeared in each of their movement sequences (clips).
Both the CMAs and non-CMAs performed their movement sequences within a
316 × 128 cm rectangular frame marked on the floor, whose front side was located 272
cm from the front of the Kinect Camera. By limiting the space within which the people
could move, we ensured that the Kinect camera could capture all of the mover’s joints
at any point in time throughout the movement sequence, and no joint came out of the
camera range.
1.2.3 Multi-Label Classification
In multi-label learning each instance is associated with multiple labels simultaneously,
and the number of labels is not fixed from instance to instance. The task in this learning
paradigm is to predict the label set (Laban motor elements in our study) for each new
unseen instance (skeletal recording, i.e., clip), based on analysis of training instances
with known label sets. In other words, by providing the system with clips identified by
the Laban motor elements they include, the system learns to recognize the appropriate
motor elements in new clips which it didn’t “see” before. In this study we dealt with
three different classification problems, with increasing complexity. First we provided the
system with clips and the Laban motor elements included in them from one CMA, and
taught it to recognize those Laban elements in new unseen clips of the same CMA. This
method can be developed to teach a system to recognize qualities in an individual’s
unique movement expression. In the second step we taught the system to recognize the
Laban elements in the clips (i.e., movements) of each new CMA based on the labeled
clips of the other CMAs. Lastly, based on the dataset of all the CMAs, the system learned to
recognize those motor elements in clips of the non-CMAs’ movements.
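The multi-label setup described above can be sketched as a "binary relevance" reduction: one binary classifier per Laban element, each predicting whether its element appears in a clip. This is an illustration of the problem structure, not the exact classifier used in the thesis; the data here is synthetic.

```python
# Binary-relevance sketch of 18-label classification over clip feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_clips, n_features, n_elements = 120, 40, 18

X = rng.normal(size=(n_clips, n_features))      # one feature vector per clip
Y = rng.random((n_clips, n_elements)) < 0.15    # sparse 18-label matrix
Y[0, :] = True                                  # ensure both classes are present
Y[1, :] = False                                 # in every label column

models = []
for q in range(n_elements):                     # one binary model per Laban element
    clf = LogisticRegression(max_iter=1000).fit(X, Y[:, q])
    models.append(clf)

# Predicted label set for a new, unseen clip:
x_new = rng.normal(size=(1, n_features))
pred = np.array([m.predict(x_new)[0] for m in models])
print(pred.shape)   # one yes/no decision per motor element
```

Each model is trained and queried independently, which is exactly the single-task baseline the thesis later contrasts with multi-task learning.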
Clip Labeling
Clips were labeled by the motor elements in the instructions for each clip, with the
assumption that as experts in LMA, the CMAs indeed performed the required elements.
Thus, the instructions given to the CMAs regarding which motor elements to move,
were used as the ground truth for labeling the motor elements in each clip of the CMA
data set. Labeling of the motor elements in the movements of the non-CMAs was done
by one of the authors who is a CMA who observed those movements.
1.2.4 Feature Extraction
The machine learned to recognize the different Laban qualities by extracting many
features from each movement, and by learning from the training-set clips which features
characterize each motor element. It then identified the Laban elements in new clips
based on the features extracted from the movement in those new clips.
To enable the CMAs to express the motor elements in a variety of different movement
sequences, we did not want to constrain the lengths of the clips to be exactly 3 seconds.
Thus, in order to get feature vectors of uniform length (regardless of the original length
of the clips), every extracted feature was a function of the whole clip, i.e., all the
extracted features were computed at whole-clip granularity.
Two groups of features were extracted: the first was relatively small, containing
a handful of features, each of which was designed to portray a specific Laban motor
element based on ”translation” of the meaning of that element into kinematic terms.
The second group contained about 6000 features and exploited the rich data provided
by the Kinect software by extracting, from every joint in the skeleton, its derivatives:
angular velocity, acceleration and jerk. For every time series of [joint × dimension
(X, Y, Z) × derivative], we calculated about 20 statistics, such as mean, variance,
skewness and kurtosis.
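A rough sketch of this second feature group, using plain finite-difference derivatives of the joint positions as a stand-in for the derivative series and only four of the roughly 20 statistics (the exact statistic list and derivative computation are assumptions):

```python
# Reduce per-joint derivative time series to whole-clip statistics.
import numpy as np
from scipy import stats

def clip_features(positions):
    """positions: array of shape (n_frames, n_joints, 3) in Kinect coordinates."""
    feats = []
    series = positions
    for _ in range(3):                        # velocity, acceleration, jerk
        series = np.diff(series, axis=0)      # finite-difference derivative
        for j in range(series.shape[1]):
            for d in range(3):                # X, Y, Z
                s = series[:, j, d]
                feats += [s.mean(), s.var(), stats.skew(s), stats.kurtosis(s)]
    return np.asarray(feats)                  # fixed length, whatever the clip length

clip = np.random.default_rng(1).normal(size=(90, 24, 3))   # ~3 s at 30 Hz, 24 joints
f = clip_features(clip)
print(f.shape)   # 3 derivatives x 24 joints x 3 axes x 4 statistics = (864,)
```

Because every statistic is a function of the whole series, the feature vector length is independent of the clip length, which is what makes clips of varying duration comparable.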
The following are examples for some of the manually composed features that were
designed to portray some of the specific motor elements, and for each of which we also
calculated the 20 statistics:
Advance and Retreat
Advance and retreat are two Laban motor elements that incorporate changes in the
Shape of the body in the sagittal plane, where part of the body’s core (axial skeleton),
usually the upper body, moves forward (Advance) or backward (Retreat) in relation to
the lower part of the body. These elements were quantified by projecting the velocity
vector of the Center of Mass (CM) on the vector of the front of the body. The CM
was approximated in this case by the average of all the joints. The front of the body
was approximated by the perpendicular vector to the vector between the Left Shoulder
(LS) and the Right Shoulder (RS). From the definition of CM of a physical system we
calculate:
$$\vec{P}_{CM}(t) = \sum_{j \in Joints} \alpha_j \vec{P}_j(t), \tag{1.1}$$

$$\vec{P}_{shoulders}(t) = \vec{P}_{LS}(t) - \vec{P}_{RS}(t), \tag{1.2}$$
the front is perpendicular to $\vec{P}_{shoulders}$, so we can easily calculate it with:

$$\vec{P}_{front} = \vec{P}_{shoulders}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix},$$

$$S_{sag}(t) = \vec{P}_{CM}(t) \cdot \vec{P}_{front}(t), \tag{1.3}$$

$$\vec{F}_{sag} = \phi([S_{sag}(1), \ldots, S_{sag}(n)]), \tag{1.4}$$

where $\vec{P}_j(t)$ is the position vector of joint $j$ (as we get it from the Kinect) at time $t$ in a clip with $n$ frames, and $\alpha_j$ is a coefficient proportional to the mass around the joint. $\phi$ is the function that creates the 20 statistics from the time series. $S(t)$ is a scalar in the time series at time $t$. $F$ denotes the calculated features for Advance and Retreat, and sag stands for sagittal.
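A minimal sketch of this feature: the prose describes projecting the CM velocity onto the body-front direction, and the sketch follows that description. The joint indices for the shoulders and the uniform mass weights (α_j = 1/|Joints|) are illustrative assumptions.

```python
# Advance/Retreat: project CM velocity onto the front-of-body vector.
import numpy as np

def sagittal_series(P, ls=4, rs=8):
    """P: (n_frames, n_joints, 3); ls/rs are assumed indices of the
    Left/Right Shoulder joints."""
    cm = P.mean(axis=1)                        # CM with uniform alpha_j weights
    shoulders = P[:, ls] - P[:, rs]            # shoulder-to-shoulder vector
    # Rotate the shoulder vector about the vertical (Y) axis to get the
    # front direction (the 3x3 matrix printed in the text):
    R = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
    front = shoulders @ R
    v_cm = np.diff(cm, axis=0)                 # CM velocity per frame
    return np.einsum('ij,ij->i', v_cm, front[:-1])   # per-frame projection

P = np.random.default_rng(2).normal(size=(90, 24, 3))
S = sagittal_series(P)
print(S.shape)   # one scalar per frame transition; phi() then reduces it
```

Positive values of the series correspond to the CM moving toward the front of the body (Advance), negative values to Retreat.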
Spread and Enclose
These are two Laban motor elements describing opposite changes in the Shape of the
body in the horizontal plane. In Spread the body becomes wider and when Enclosing,
the body becomes narrower. These elements were quantified by measuring the changes
in the average distance between every joint and the vertical axis of the body that extends
from the Head (H) to the Spine Base (SB):
$$d_j = \frac{\left|(\vec{P}_j - \vec{P}_{SB}) \times (\vec{P}_j - \vec{P}_H)\right|}{\left|\vec{P}_H - \vec{P}_{SB}\right|}, \tag{1.5}$$

$$S_{horiz}(t) = \sum_{j \in Joints} d_j(t), \tag{1.6}$$

$$\vec{F}_{horiz} = \phi([S_{horiz}(1), \ldots, S_{horiz}(n)]), \tag{1.7}$$

where $P$, $S$, $\phi$, $CM$ and $F$ are defined as in the previous paragraph, and horiz stands for horizontal.
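Equation 1.5 is the standard point-to-line distance from each joint to the Head-to-Spine-Base axis; a small sketch (Head and Spine Base joint indices are illustrative):

```python
# Spread/Enclose: per-frame sum of joint distances from the body axis.
import numpy as np

def horizontal_series(P, head=0, spine_base=23):
    """P: (n_frames, n_joints, 3); head/spine_base are assumed joint indices."""
    out = []
    for frame in P:
        h, sb = frame[head], frame[spine_base]
        # Point-to-line distance: |(P_j - SB) x (P_j - H)| / |H - SB|  (Eq. 1.5)
        d = np.linalg.norm(np.cross(frame - sb, frame - h), axis=1) \
            / np.linalg.norm(h - sb)
        out.append(d.sum())                    # Eq. 1.6
    return np.asarray(out)

P = np.random.default_rng(3).normal(size=(90, 24, 3))
S = horizontal_series(P)
print(S.shape)   # one width measure per frame
```

An increasing series indicates Spreading (the body widening around its axis), a decreasing one Enclosing.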
Rise and Sink
Rise and Sink are changes in the Shape of the body in the vertical plane, where during
Rising, the body elongates upward and during Sinking the body goes down and shortens.
The distinction between these two Laban motor elements was quantified by measuring
the average distance on the Y axis of each joint from the CM:

$$S_{vert}(t) = \sum_{j \in Joints} \left|\vec{P}_j - \vec{P}_{CM}\right|, \tag{1.8}$$

$$\vec{F}_{vert} = \phi([S_{vert}(1), \ldots, S_{vert}(n)]), \tag{1.9}$$
where $P$, $S$, $\phi$, $CM$ and $F$ are defined as previously, and vert stands for vertical.
Sudden and Sustain
Sudden and Sustain are two opposing motor elements of the Time dimension of the Effort
factor of the movement. The distinction between them was quantified by calculating
the skewness of the acceleration, based on the assumption that the acceleration of a
sudden movement has higher values at the beginning of the movement, i.e. is skewed to
the left.
$$\vec{V}_j(t) = \vec{P}_j(t+1) - \vec{P}_j(t), \tag{1.10}$$

$$\vec{a}_j(t) = \vec{V}_j(t+1) - \vec{V}_j(t), \tag{1.11}$$

$$Skew_j = \frac{1}{n}\sum_{t=1}^{n}\left(\frac{a_j(t) - \mu}{\sigma}\right)^3, \tag{1.12}$$

where $\vec{P}_j(t)$ is the position vector of joint $j$ at time $t$, $\vec{V}_j(t)$ is the velocity vector of joint $j$ at time $t$, $\mu$ and $\sigma$ are the mean and standard deviation of the accelerations $a_j(t)$, and $n$ is the length of the time series (clip).
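A minimal sketch of Eqs. 1.10-1.12, assuming positions arrive as a (frames × joints × 3) array and applying the skewness per joint and per axis (one plausible reading of the per-joint formula):

```python
# Sudden/Sustain: skewness of the finite-difference acceleration over time.
import numpy as np
from scipy import stats

def acceleration_skew(P):
    """P: (n_frames, n_joints, 3). Returns per-joint, per-axis skewness of
    the acceleration; a left-skewed distribution (high values early in the
    clip) suggests a Sudden movement."""
    V = np.diff(P, axis=0)        # Eq. 1.10: velocity
    A = np.diff(V, axis=0)        # Eq. 1.11: acceleration
    return stats.skew(A, axis=0)  # Eq. 1.12: standardized third moment

P = np.random.default_rng(4).normal(size=(90, 24, 3))
sk = acceleration_skew(P)
print(sk.shape)   # (n_joints, 3): one skewness value per joint and axis
```

Note that `scipy.stats.skew` computes exactly the standardized third moment of Eq. 1.12.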
1.2.5 Performance Evaluation
From a statistical point of view, for every clip we had 18 possible labels (Laban motor
elements). The movement in each clip was constructed from a combination of 2-4 of
these elements, which meant that there was about 85% chance that a certain element
will not appear in a clip. Due to this sparsity, accuracy (defined as the percentage
of clips that have been labeled correctly out of the total number of clips) alone was
not a relevant metric for performance evaluation, since one could get 85% accuracy by
stating that for every clip none of the motor elements appear in it. However, if we define
Truly Positive Clips (TPC) as clips in which the relevant Laban element truly appeared,
and if we define Classified Positively Clips (CPC) as clips that our classifier found to
include the relevant motor element, then we can get a better performance evaluation by
combining precision (defined as the percentage of retrieved clips that were relevant) and
recall (defined as the percentage of relevant instances that were retrieved) to create the
more concise performance evaluation measure, the F1 score, defined as follows:

$$precision = \frac{|\{TPC\} \cap \{CPC\}|}{|\{CPC\}|}, \tag{1.13}$$

$$recall = \frac{|\{TPC\} \cap \{CPC\}|}{|\{TPC\}|}, \tag{1.14}$$

$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}. \tag{1.15}$$
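The set-based definitions above can be checked directly on a tiny worked example (the clip IDs are illustrative):

```python
# Worked example of Eqs. 1.13-1.15 for a single Laban element.
TPC = {1, 2, 3, 5, 8}        # clips truly containing the element
CPC = {2, 3, 4, 5}           # clips the classifier flagged as containing it

tp = len(TPC & CPC)          # 3 correctly retrieved clips
precision = tp / len(CPC)    # 3/4 = 0.75
recall = tp / len(TPC)       # 3/5 = 0.60
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))          # 0.667
```

The F1 score penalizes the degenerate all-negative classifier (recall = 0 gives F1 = 0), which is exactly why it is preferred over accuracy on this sparse label distribution.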
1.2.6 Single Task Learning (STL)
In the single task learning, for each CMA, the machine learned to evaluate for each
Laban motor element separately whether it existed in a certain movement sequence,
based on the features that were detected in that movement sequence (clip). This was
done by making a binary decision, for every Laban element, as to whether it existed in
that movement sequence or not.
Feature selection
For the purpose of the single task learning (STL) from each clip we extracted a vector
of 6120 features, most of which were noisy and redundant and required massive feature
selection in order to conduct the machine learning task. The feature selection was done
in three stages.
In the first stage we computed a p-value for every feature. As seen in Fig. 1.3, filtering
out most of the features yielded better results than not filtering them; using the
top 10% of the features was optimal.
Figure 1.3: Influence of the number of features on the performance. The selection was made according to statistical significance: the blue line is the difference between the score with and without feature selection. It can be seen that the optimal percentage of features to select is 10%.
The second stage of feature selection was conducted on the features which were not
filtered out in the first stage. In this stage the features were ranked according to their
information gain (IG), which is defined as:
$$IG(T, f) = H(T) - H(T|f), \tag{1.16}$$
where T is the training set, f is a feature, and H() is the information entropy of a
dataset. During this stage 60% of the features were selected.
For the third stage of feature selection, performed on the features surviving the first and second stages, we applied Least Absolute Shrinkage and Selection Operator (LASSO) regularization [Tib96]. At the end of this three-stage process, a different number of features was selected for each motor element, amounting to 5-20% of the original number of features.
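The LASSO stage can be sketched with scikit-learn: the L1 penalty drives the coefficients of uninformative features to exactly zero, and the surviving features are those with nonzero coefficients. The synthetic data, alpha value, and variable names below are illustrative assumptions, not the thesis's actual setup:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))            # clips x features surviving stages 1-2
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]            # only three features truly matter
y = X @ w_true + 0.1 * rng.normal(size=80)

lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty zeroes out weak features
selected = np.flatnonzero(lasso.coef_)   # indices of the surviving features
```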
1.2.7 Multi-Task Learning
The Multi-Task Learning (MTL) framework [Car97] is an approach to multi-label learning in which each task is learned simultaneously with other related tasks, using shared representations, even when the tasks differ. In our study, MTL was achieved by simultaneously learning to recognize all 18 Laban motor elements. Unlike STL, which trains a separate model for every task (in our study, a separate detector for each Laban motor element) and may represent the data differently for each task, the goal in MTL was to improve the performance of the learning algorithms by learning classifiers for multiple tasks jointly. MTL works particularly well when all the tasks have some commonality and each is slightly undersampled. For the MTL we used Multitask Elastic Net (MEN) regularization, a multi-task extension of the elastic net of Zou and Hastie [ZH05]. MEN promotes sparsity and also acts as a feature selection mechanism. In MEN the optimization objective is to minimize the following expression, where Y represents the labels, X represents the samples, and W is the matrix that we want to learn:
‖Y −XW‖2F + λ1 · ‖W‖2,1 + λ2 · ‖W‖2F , (1.17)
where λ1 and λ2 are hyper-parameters,

‖W‖2,1 = Σi √(Σj w²ij),    (1.18)

i.e., the sum of the Euclidean norms of the rows of W (also known as the mixed norm), and

‖W‖²F = Σi Σj w²ij.    (1.19)
Feature selection for the MTL was carried out by averaging the statistical significance
of each feature with respect to all of the tasks. This was in contrast to the single task
learning, where every task had its own feature selection.
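scikit-learn's `MultiTaskElasticNet` minimizes an objective of the same form as Eq. 1.17 (a row-sparse (2,1)-norm term plus a squared Frobenius term), so the joint feature selection across tasks can be sketched as follows; the synthetic data, hyper-parameters, and variable names are ours:

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 30))               # clips x features
W_true = np.zeros((30, 5))                   # 5 stand-in tasks (motor elements)
W_true[:4] = rng.normal(size=(4, 5))         # tasks share the same 4 relevant features
Y = X @ W_true + 0.1 * rng.normal(size=(120, 5))

# l1_ratio trades the row-sparse (2,1)-norm term against the Frobenius term
men = MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, Y)
# rows are zeroed jointly, so a feature is kept or discarded for all tasks at once
shared_features = np.flatnonzero(np.abs(men.coef_).sum(axis=0))
```

The (2,1)-norm penalty zeroes entire rows of W, which is exactly why MEN yields the same selected features for all of the tasks, as reported above.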
Figure 1.4: Recall, precision and F1 score of each Laban quality separately. The evaluation was conducted on a dataset captured from only one CMA.
1.3 Results
1.3.1 Single-Task Learning
In the STL we used 80% of the clips produced by each CMA as a training set, to teach the machine to recognize each motor element in the remaining 20% of the clips performed by the same person. Fig. 1.4 shows the precision, recall and F1 for each of the Laban qualities for the first CMA whose data was collected. As can be seen in Figure 1.4, the performance varied from one Laban motor element to another, ranging over roughly 40-85% for precision and recall. F1 values fell between the corresponding precision and recall values, and averaged 0.6 over all Laban elements.
1.3.2 Multi-Task Learning
Performance evaluation of each Laban quality
The MEN regularization was performed on all the clips of all 6 CMAs that participated
in the study. As a result, 282 features were selected (same features for all of the tasks).
The performance of every motor element as classified by the MEN regularization is
presented in Table 1.1.
Multi-task vs. Single-task learning
The generalization ability of the model was enhanced, compared to that of the STL, by the fact that the decision of which features to select was influenced by all the motor elements. The most significant improvements were in the Laban elements that had performed worst in the single task learning setting (Strong and Sudden, for example). As seen in
Table 1.1: Precision, recall, and F1 score of each Laban motor element, resulting from the MTL performed on all clips from all CMAs. The F1 average and standard deviation over the motor elements are shown in the last row of the table.
Motor Element         Precision  Recall  F1 score
Jump                  0.89       0.81    0.85
Twist and Back        0.69       0.85    0.76
Sink                  0.62       0.79    0.69
Rhythmicity           0.59       0.72    0.65
Spread                0.55       0.76    0.64
Head drop             0.60       0.66    0.63
Rotation              0.66       0.60    0.63
Free and Light        0.45       0.94    0.61
Up and Rise           0.67       0.54    0.60
Condense and Enclose  0.44       0.84    0.58
Arms To Upper Body    0.67       0.54    0.60
Advance               1.00       0.38    0.55
Retreat               0.50       0.59    0.54
Passive               0.40       0.85    0.54
Bind                  0.44       0.61    0.51
Direct                0.56       0.49    0.52
Sudden                0.61       0.41    0.49
Strong                0.29       0.42    0.34
Average               0.59       0.65    0.60
SD                    0.17       0.17    0.11
Table 1.2, the multitask learning improved the overall F1 score by 4% compared to the
STL.
Table 1.2: Multitask vs. single task learning performance evaluation on a data set of several CMAs.

Metric     Single task  Multitask
Precision  0.46         0.59
Recall     0.71         0.65
F1         0.56         0.60
Performance evaluation for movements of an unseen CMA
In this experiment the test set was taken from the clips of one CMA, while the training set was composed of the clips of the other 5 CMAs. Testing was performed for each CMA separately and the results were averaged over all six CMAs. Performance as measured by F1 degraded on the unseen CMA from 0.6 to 0.57. This degradation seems mild considering the large variation among clips from one CMA to another, as
Figure 1.5: Performance on ordinary people (non-CMAs) instructed to perform several tasks. [Bar chart of F1 scores (0-0.8) for the seven motor elements observed: Rotation, Condense & Enclose, Sink, Free & Light, Arms To Upper Body, Spread, and Jump.]
every CMA performed different gestures, in different postures (some sitting and some
standing) and in different contexts (some were dancing while some were acting).
Performance evaluation for everyday movements of non-CMAs
The data set of non-CMAs consisted of several daily movements that two people were asked to perform, such as pretending to greet a friend or to play with a balloon. This data set was small, and the people were instructed to perform movements that we hoped would include the Laban motor elements examined in this study, but we had no direct control over the exact movements they chose or over the motor elements those movements contained. A CMA who watched the movements determined which of the 18 Laban motor elements examined in this study appeared in each movement sequence. As shown in Fig. 1.5, which describes the learning performance for this data set, only 7 of the 18 motor elements examined in this study appeared in those people's movements. Performance as measured by average F1 degraded from 0.57 for an unseen CMA to 0.54 for a non-CMA.
Chapter 2
LDA Model Monitoring in
Distributed Systems
2.1 Introduction
2.1.1 Setup and Motivation
In this chapter of the thesis, we address the problem of mining data streams when the data is distributed over a large number of nodes, with the same data generation process at every node. The streaming data, however, is not stationary; it can change over time. Classic examples of real-life prediction problems that involve this kind of
change are user preference prediction and fraud detection. In the former, the choices
of the user can change over time; in the latter, the fraudulent transactions change
constantly to avoid detection. In both, the change can render the prediction model
invalid.
In a setting where the model must be updated to stay valid and communication is costly, the question is when to recompute the model. The naive solution is to recompute the model periodically. The problem with this solution is that it does needless work if the model changes infrequently, yet may introduce unacceptable errors between scheduled updates. In contrast to periodic recomputation, we focus on monitoring the quality of a given model, and recomputing it only as needed.
In this work we focus on linear binary classifiers, using LDA [Fis36] as the learning algorithm. This choice is due to the popularity of linear classifiers in real applications, and to the fact that they serve as a platform for more complex classifiers, such as the ensemble models of [HMR12, MGE11], the neural networks of [OHK15], and even the deep architectures of [GDDM14]. Our method is distinct from previous work in the following two aspects:
Model-Based Monitoring: Monitoring the model rather than the misclassifications has an important benefit: the need for synchronization can be detected before the misclassifications occur. In contrast to most previous work on monitoring a classifier (that
utilizes misclassification rates to draw conclusions about the change in the distribution
[BGdCAF+06, GMCR04, NY07]), we propose to monitor the change in the model itself.
Distributed Setting: Monitoring a classifier has been actively studied in centralized settings. In contrast to these studies, ours is one of the very few works that monitor in a distributed setting, in which data is distributed over a large number of nodes and the model is learned globally after synchronization. While the few existing methods for classifier monitoring in distributed settings rely on heuristics [AGZ+13], our approach is the only one that provides provable guarantees of correctness.
2.1.2 Our Contribution
We propose Distributed Linear Discriminant Analysis (DLDA): a novel communication-efficient monitoring algorithm for LDA models of distributed, dynamic data streams. We show how monitoring the LDA model can be used for concept drift detection. To the best of our knowledge, this is the first algorithm that monitors the model itself, rather than its predictions or fit. Given a previously computed global model, we derive constraints on the local data at each node; a node communicates only if its constraint is broken. These constraints guarantee that as long as no node communicates, the hypothetical global model is sufficiently close to the precomputed model. Evaluation on three real datasets shows that our method reduces communication by up to two orders of magnitude. We also present and demonstrate Probabilistic Distributed LDA Monitoring (PDLDA). This framework harnesses the distributed nature of the system, deciding to synchronize according to the number of violations over the entire set of nodes, rather than every time a single node has a violation. To the best of our knowledge, the LDA monitoring problem addressed in this work has never been studied in a distributed setting.
2.1.3 Related Work
Monitoring dynamic data streams is a broad topic that has been addressed in different
research communities. Within this field, we focus on detecting a change in the data
stream that renders the prediction model invalid. In distributed settings, this problem is even harder and is referred to as distributed monitoring: the challenge is to design local tests for monitoring a function that is defined globally over all the nodes in the system. Our approach is to define a constraint over the local data
(at each node) that guarantees the validity of the global model. If local data (in one
or more nodes) does not meet the local condition, it leads to synchronization. The
synchronization process has large communication costs, and the goal of the distributed
monitoring methods is thus to minimize the number of synchronizations. Most of the
work on distributed monitoring has been concerned with simple functions of the data,
such as linear functions in the work of [KCR06] and [KRRS08] or monotonic functions
in the work of [MTW05]. For non-linear functions, examples include work on monitoring
the value of a single-variable polynomial as in the work of [SR08], and eigenvalue
perturbation as in the work of [HNG+07]. While previous work handled specific families of functions, we use a geometric approach for monitoring arbitrary functions over distributed streams, as proposed and later extended and generalized in [SSK07, KSA+14, KSSL12]. A recent work by [GKS15] on monitoring Least Squares Regression (LSR) using geometric monitoring is the closest to ours, but our problem is more complex: unlike the global scatter matrix (required by LSR), the global covariance matrix (required by LDA) is not the mean of the local covariance matrices, which makes the monitoring problem much harder.
2.2 Problem Definition
We first describe the Linear Discriminant Analysis (LDA) algorithm and then define
the monitoring problem.
2.2.1 Linear Discriminant Analysis
LDA seeks a linear combination of features that characterizes or separates two or more classes of samples. The resulting combination may be used as a linear classifier, or for dimensionality reduction before subsequent classification.
In LDA the problem is approached by assuming that the conditional probability density functions Pr(x|y = p) and Pr(x|y = q) are both normally distributed, with mean and covariance parameters (p, B_p) and (q, B_q) for the two target classes P and Q, respectively. (x1, y1), . . . , (xn, yn) are i.i.d. samples, with xi ∈ R^d and yi ∈ {0, 1}.
We seek a linear transformation (model), w ∈ Rd, that maximizes the separation
between the classes, where the separation is defined to be the ratio of the variance
between the classes to the variance within the classes:
S := σ²_between / σ²_within = (w^T(p − q))² / (w^T(B_p + B_q)w).    (2.1)
Solving the maximization problem yields that the decision criterion is a threshold on
the dot product
w · x > c
where
w ∝ (B_p + B_q)^{-1}(p − q),    (2.2)

c = (1/2)(T − p^T S_p^{-1} p + q^T S_q^{-1} q).    (2.3)
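Eq. 2.2 can be evaluated in closed form from sample estimates. A small numpy sketch on synthetic Gaussians (the sample sizes and class means are illustrative assumptions); the recovered direction should align with the difference of the class means:

```python
import numpy as np

rng = np.random.default_rng(3)
p_samples = rng.multivariate_normal([1.0, 0.0], np.eye(2), size=500)   # class P
q_samples = rng.multivariate_normal([-1.0, 0.0], np.eye(2), size=500)  # class Q

p, q = p_samples.mean(axis=0), q_samples.mean(axis=0)
B = np.cov(p_samples.T) + np.cov(q_samples.T)   # B_p + B_q
w = np.linalg.solve(B, p - q)                   # w ∝ (B_p + B_q)^{-1}(p − q), Eq. 2.2
direction = w / np.linalg.norm(w)               # only the direction matters here
```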
In this work we monitor w, and will refer to it as the classification model.
2.2.2 Monitoring Problem
We denote by k the number of nodes and by W the number of samples at a node. Our model uses discrete time (hereafter, rounds); every node receives a new sample each round. We use the sliding window model: every node keeps two sliding windows (one for each class), each of length W/2. When a node receives a new observation, it replaces the oldest one of the same class. x^i_j and y^i_j are the j'th sample and label at the i'th node, and x^i_old(p) and x^i_old(q) are the oldest samples of each class in the sliding window of the i'th node. As the data evolves, it is possible that the previously computed model no longer matches the current true model. Let w0 be the existing model (the weight vector of a linear classifier), previously computed at some point in the past (the synchronization time), and let w be the true LDA model (the hypothetical model that a synchronization would yield if it occurred). We wish to maintain an accurate estimation w0 of the current global LDA model w. For classification purposes, the most important property of a linear classifier is its direction. Therefore, we monitor the change in this direction: given a threshold T, our goal is to raise an alert if
⟨w, w0⟩ / (‖w‖ ‖w0‖) < T,    (2.4)

i.e., if the angle between w0 and w is above a certain threshold (the inner product of unit vectors is the cosine of the angle between them).
Due to the complexity of condition 2.4, we monitor a restriction of it: we replace the cone containment condition with a sphere containment condition, i.e.,

‖w − w0‖ > R0,    (2.5)

where R0 := ‖w0‖ √(1 − T²) is the radius of the maximal-volume sphere that is centered at w0 and resides inside the cone of condition 2.4.
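A minimal numpy sketch of this restriction (the function and variable names are ours): staying inside the sphere of radius R0 implies staying inside the cone of condition 2.4, so the sphere test is a sufficient, though conservative, alert criterion:

```python
import numpy as np

def sphere_radius(w0, T):
    """R0 := ||w0|| * sqrt(1 - T^2): the largest sphere centred at w0 that
    fits inside the cone defined by the cosine threshold T of condition 2.4."""
    return np.linalg.norm(w0) * np.sqrt(1.0 - T ** 2)

def drift_alert(w, w0, T):
    """Restricted test of condition 2.5: alert when w leaves the sphere."""
    return np.linalg.norm(w - w0) > sphere_radius(w0, T)

w0 = np.array([1.0, 0.0])
T = np.cos(np.deg2rad(30))   # alert when the angle exceeds 30 degrees
```

Note that the sphere test may also fire for models that are still inside the cone; it errs only on the safe side.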
2.3 Monitoring Distributed LDA With Convex Subsets
Monitoring distributed LDA models is difficult because the global model cannot be
inferred from the local model at each node. Even when all current local models wi are
similar to the precomputed local models w0, the current global model w may be very
different from the precomputed model w0: consider the example in Figure 2.1 with
k = 2 nodes and dimension d = 2. The angle deviation of the global model (shown in
solid lines) is large (45 degrees) even though the local models (shown in dashed lines)
are identical to what they were at the initial point.
To overcome this difficulty, we impose constraints on the local data at the nodes, rather than on the function of the global aggregate. Given a function of the average of all local data and the threshold, we compute a “good” convex subset, called a safe zone, for each node.
Figure 2.1: Example of incorrect monitoring by applying LDA locally. The initial state of the data is presented in (A) and the state at a later point is shown in (B). In (B) every node (green and red dashed lines) calculates the same angle for the separator as in (A), but the global separator's (blue solid line) angle has changed significantly.
As we show below, convexity plays a key role in the correctness of this scheme. As long as the local data stays inside the safe zones, we guarantee that the function of the global average — the Euclidean distance between the true global model and the one computed at the last synchronization (hereafter, model drift) — does not cross the threshold. Nodes communicate only when local data leaves the safe zone, which we call a safe zone violation (hereafter, violation). Once that happens, violations can be resolved, for example by synchronization. In other words, we want to impose conditions on the local data at each node so that as long as they hold, ‖w − w0‖ < R0, i.e., the global model is valid.
2.3.1 Notation
We recall that P and Q are the classes in the binary classification problem. (p, q) and
(pi, qi) are the global and local means of classes P and Q.
S and S^i are the global and local normalized scatter matrices of the feature space:

S^i := (1/W) Σ_{j=1}^{W} x^i_j (x^i_j)^T,

S := (1/(Wk)) Σ_{i=1}^{k} Σ_{j=1}^{W} x^i_j (x^i_j)^T = (1/k) Σ_{i=1}^{k} S^i.
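The identity S = (1/k) Σ_i S^i — which is what makes the normalized scatter matrix amenable to distributed monitoring, in contrast to the covariance matrix — can be checked numerically; the random data below is an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)
k, W, d = 5, 40, 3                     # nodes, window size, dimension
X = rng.normal(size=(k, W, d))         # X[i, j] is sample x^i_j at node i

S_local = np.einsum('ijd,ije->ide', X, X) / W        # S^i = (1/W) sum_j x x^T
S_global = np.einsum('ijd,ije->de', X, X) / (W * k)  # S = (1/(Wk)) sum_{i,j} x x^T
```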
Similarly, u and u^i are the differences between the means of the classes, i.e., u := p − q and u^i := p^i − q^i.

B is the global covariance matrix, which is the sum of the covariance matrices of the two classes, i.e., B := B_p + B_q. It can be shown that B = S − pp^T − qq^T.
Let w be our current true model. Then, following Eq. 2.2, we can express:

w := (S − pp^T − qq^T)^{-1}(p − q) = B^{-1}u.    (2.6)

Let w0 be the existing model, previously computed from (S0, p0, q0), or equivalently from (B0, u0), at the time of synchronization. Then,

w0 := (S0 − p0 p0^T − q0 q0^T)^{-1}(p0 − q0) = B0^{-1} u0.    (2.7)
If S^i_0, p^i_0 and q^i_0 are the local normalized scatter and means of the samples at a node at the time of the last synchronization, we define the local drifts to be:

Δ^i_s := S^i − S^i_0,    δ^i_p := p^i − p^i_0,    δ^i_q := q^i − q^i_0.

We define Δ_s, δ_p, and δ_q — the global drift vectors of S, p, and q — to be:

Δ_s := S − S0,    δ_p := p − p0,    δ_q := q − q0.

Remark. It is easy to see that every global drift vector is the average of the local drift vectors:

Δ_s = (1/k) Σ_i Δ^i_s,    δ_p = (1/k) Σ_i δ^i_p,    δ_q = (1/k) Σ_i δ^i_q.
2.3.2 Convex Safe Zones
Each node monitors its own drift vector: as long as the current values at the local nodes (S^i, p^i, q^i) are sufficiently similar to their values at synchronization time (S^i_0, p^i_0, q^i_0), w0 is guaranteed to be close to w. Formally, we define a convex set C such that:

(Δ_s, δ_p, δ_q) ∈ C ⇒ ‖w − w0‖ < R0.    (2.8)

Lemma 2.3.1. Let C be a convex set that satisfies Eq. 2.8. If (Δ^i_s, δ^i_p, δ^i_q) ∈ C for all i, then

‖w − w0‖ < R0.
Proof. We express (S, p, q) as their values at synchronization plus the average of the local drift vectors:

(S, p, q) = (1/k) Σ_i (S^i, p^i, q^i) = (S0, p0, q0) + (1/k) Σ_i (Δ^i_s, δ^i_p, δ^i_q).    (2.9)

From C's convexity and using the Remark we get:

∀i (Δ^i_s, δ^i_p, δ^i_q) ∈ C ⇒ (1/k) Σ_i (Δ^i_s, δ^i_p, δ^i_q) ∈ C ⇒ (Δ_s, δ_p, δ_q) ∈ C.    (2.10)

Finally, from the definition of C we obtain:

(Δ_s, δ_p, δ_q) ∈ C ⇒ ‖w − w0‖ < R0.    (2.11)
2.3.3 Convex Bound for Local Condition

We denote the change in the global covariance matrix:

Δ := B − B0
  = (S0 + Δ_s − (p0 + δ_p)(p0 + δ_p)^T − (q0 + δ_q)(q0 + δ_q)^T) − (S0 − p0 p0^T − q0 q0^T)
  = −δ_p δ_p^T − δ_q δ_q^T + Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T.

We break Δ into its quadratic part,

M := −δ_p δ_p^T − δ_q δ_q^T,    M^i := −δ^i_p (δ^i_p)^T − δ^i_q (δ^i_q)^T,

and its linear part,

L := Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T,
L^i := Δ^i_s − p^i_0 (δ^i_p)^T − δ^i_p (p^i_0)^T − q^i_0 (δ^i_q)^T − δ^i_q (q^i_0)^T,

and hence

Δ = L + M,    Δ^i := L^i + M^i.

We denote the change of the difference between the means as

δ := u − u0 = δ_p − δ_q,    δ^i := δ^i_p − δ^i_q.
Now we can define a convex bound for our problem:

Lemma 2.3.2. Let G be the set of triplets (Δ^i_s, δ^i_p, δ^i_q) that satisfy the bound:

‖B0^{-1} δ^i‖ + (‖w0‖ + R0)(‖B0^{-1} L^i‖ + ‖B0^{-1} M^i‖) ≤ R0,    (2.12)

where ‖A‖ is the operator norm of the matrix A, and ‖v‖ is the Euclidean norm of the vector v. If ‖B0^{-1} Δ^i‖ < 1, then G ⊆ C and G is convex.
2.3.4 Proof of the Convex Bound Lemma

We must find a convex subset C satisfying the condition of Eq. 2.8. Let us start by recalling the definition of the operator norm of a matrix:

Definition 2.3.3. Let A be a matrix. Its operator norm, or spectral norm (hereafter just norm), is defined as:

‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.    (2.13)

The following result is very useful in the forthcoming analysis:

Lemma 2.3.4. If A is square and ‖A‖ < 1, then

‖(I + A)^{-1}‖ < 1 / (1 − ‖A‖).
The proof of this lemma can be found in [GKS15]. We recall that C is the convex subset that satisfies inequality 2.8, and that G is the set of triplets (Δ^i_s, δ^i_p, δ^i_q) which satisfy inequality 2.12.
Lemma 2.3.5. G ⊆ C.

Proof. We can write the sphere inclusion condition 2.5 in terms of B0, Δ, u0 and δ, using the triangle inequality:

‖w − w0‖ = ‖(B0 + Δ)^{-1}(u0 + δ) − B0^{-1} u0‖ < ‖(B0 + Δ)^{-1} δ‖ + ‖((B0 + Δ)^{-1} − B0^{-1}) u0‖.    (2.14)

We split the right side of the last inequality into two parts:

E1 := ‖(B0 + Δ)^{-1} δ‖,    E2 := ‖((B0 + Δ)^{-1} − B0^{-1}) u0‖.    (2.15)

Under the assumption ‖B0^{-1} Δ‖ < 1, it follows from Lemma 2.3.4 that:

E1 ≤ ‖B0^{-1} δ‖ / (1 − ‖B0^{-1} Δ‖),    E2 ≤ ‖B0^{-1} Δ w0‖ / (1 − ‖B0^{-1} Δ‖).    (2.16)

From standard properties of the norm we get:

‖B0^{-1} Δ w0‖ ≤ ‖B0^{-1} Δ‖ ‖w0‖.    (2.17)

Substituting Eqs. 2.15, 2.16 and 2.17 into Eq. 2.14, we require:

‖w − w0‖ ≤ E1 + E2 ≤ (‖B0^{-1} δ‖ + ‖B0^{-1} Δ‖ ‖w0‖) / (1 − ‖B0^{-1} Δ‖) ≤ R0.    (2.18)

After rearranging the terms, we have

‖B0^{-1} δ‖ + ‖B0^{-1} Δ‖ ‖w0‖ ≤ R0 (1 − ‖B0^{-1} Δ‖).    (2.19)

From the triangle inequality we can rewrite:

‖B0^{-1} Δ‖ ≤ ‖B0^{-1} L‖ + ‖B0^{-1} M‖.    (2.20)

Finally, combining inequalities 2.19 and 2.20, we obtain the following sufficient bound:

‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) ≤ R0.
Lemma 2.3.6. ‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) is convex in (Δ_s, δ_p, δ_q).

Proof. Multiplication by B0^{-1} is a linear operation, and a norm is a convex function; therefore ‖B0^{-1} δ‖ is convex in δ.

We recall that:

L := Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T.

L is linear in (Δ_s, δ_p, δ_q), and therefore ‖B0^{-1} L‖ is convex in these variables.

We recall that:

M := −δ_p δ_p^T − δ_q δ_q^T.

It is left to prove that ‖B0^{-1} M‖ is convex in (δ_p, δ_q). From the definition of the operator norm, we can rewrite:

‖B0^{-1} M‖ = ‖B0^{-1} (max_{‖u‖=1} {u^T δ_p δ_p^T u} + max_{‖u‖=1} {u^T δ_q δ_q^T u})‖ = ‖B0^{-1} (max_{‖u‖=1} {‖u^T δ_p‖²} + max_{‖u‖=1} {‖u^T δ_q‖²})‖.

We observe that the maximum over any number (here infinite) of convex functions is also a convex function, and since multiplication by a matrix and the norm operation preserve convexity, this concludes the proof.
Corollary 2.1. Lemmas 2.3.5 and 2.3.6 together complete the proof of Lemma 2.3.2. From Lemma 2.3.2 and from Lemma 2.3.1 we conclude that

(‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) ≤ R0) ⇒ (‖w − w0‖ ≤ R0),    (2.21)

which validates the convex bound.
2.4 Distributed LDA Monitoring Algorithm
In the following, we present two frameworks for LDA model monitoring that use the bound in Eq. 2.12. In both frameworks, we define a coordinator, whose role is to monitor the violation alerts from the nodes and to aggregate the data from all the nodes when a violation occurs. After aggregating the data, the coordinator recomputes the model and sends the new covariance matrix and the norm of the new model to the nodes. In both frameworks every node runs the same update algorithm, detailed in Alg. 2.1. The frameworks differ in their synchronization policy. The first, Distributed LDA Monitoring (DLDA), synchronizes in any round in which at least one node has reported a violation (condition 2.12 does not hold at that node), as detailed in Alg. 2.2. The second, Probabilistic Distributed LDA Monitoring (PDLDA), synchronizes in a round in which the number of nodes with a violation is above a certain threshold. The derivation of this threshold is presented in Section 2.4.1.
Algorithm 2.1 Node Update: i is the index of the node, (x, y) is a new sample.

1: procedure Update
2:   if y is class P then
3:     p^i = p^i + x − x^i_old(p)
4:     S^i = S^i + xx^T − x^i_old(p)(x^i_old(p))^T
5:   else
6:     q^i = q^i + x − x^i_old(q)
7:     S^i = S^i + xx^T − x^i_old(q)(x^i_old(q))^T
8:   (Δ^i_s, δ^i_p, δ^i_q) = (S^i − S^i_0, p^i − p^i_0, q^i − q^i_0)
9:   if ‖B0^{-1} δ^i‖ + (‖w0‖ + R0)(‖B0^{-1} L^i‖ + ‖B0^{-1} M^i‖) > R0 then
10:    Report violation to coordinator
11:    Receive new global B0^{-1}, ‖w0‖
12:    (S^i_0, p^i_0, q^i_0) = (S^i, p^i, q^i)
Algorithm 2.2 Coordinator synchronization algorithm.

1: procedure Sync
2:   if one of the nodes has reported a violation then
3:     Ask the nodes for their data
4:     Receive from every node i the triplet (S^i, p^i, q^i)
5:     Compute updated ‖w0‖ and B0^{-1} and distribute them
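Steps 2-8 of Alg. 2.1 amount to O(d²) sliding-window bookkeeping per round. A Python sketch (class and method names are ours; unlike the printed pseudocode, this sketch keeps normalized means and scatter, and the violation test of step 9 is left out — it would consume the returned drift triplet):

```python
import numpy as np

class NodeState:
    """Per-node sliding-window bookkeeping in the spirit of Alg. 2.1, steps 2-8.
    Keeps one window per class and maintains normalized class means and the
    normalized scatter matrix incrementally, in O(d^2) work per round."""

    def __init__(self, win_p, win_q):
        self.win = {'p': [np.asarray(x, float) for x in win_p],
                    'q': [np.asarray(x, float) for x in win_q]}
        self.mean = {c: np.mean(self.win[c], axis=0) for c in 'pq'}
        self.n = len(self.win['p']) + len(self.win['q'])
        self.S = sum(np.outer(x, x) for c in 'pq' for x in self.win[c]) / self.n
        # statistics frozen at the last synchronization: (S0, p0, q0)
        self.S0 = self.S.copy()
        self.mean0 = {c: self.mean[c].copy() for c in 'pq'}

    def update(self, x, c):
        """Insert sample x of class c ('p' or 'q'); the oldest sample of that
        class leaves the window. Returns the drift triplet (dS, dp, dq)."""
        x = np.asarray(x, float)
        x_old = self.win[c].pop(0)
        self.win[c].append(x)
        self.mean[c] += (x - x_old) / len(self.win[c])
        self.S += (np.outer(x, x) - np.outer(x_old, x_old)) / self.n
        return (self.S - self.S0,
                self.mean['p'] - self.mean0['p'],
                self.mean['q'] - self.mean0['q'])
```

Because each round replaces exactly one sample, the incremental statistics remain equal to a full recomputation over the current windows.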
2.4.1 Probabilistic Distributed LDA Monitoring
DLDA triggers synchronization whenever a single node reports a violation. Our empirical evaluation with a large number of nodes showed that such a strict policy causes synchronization even when the global model is still valid. Loosely speaking, this is because the condition in Equation 2.12 is stricter than the original condition in Equation 2.5. Formally, in most of the datasets G (the convex subset of C) is a proper subset of C, and usually a much smaller one. To resolve this problem, we change the synchronization policy of the system and synchronize only when a certain portion of the nodes report a violation. This portion is learned empirically on the training set of the system and is denoted VT.
2.4.2 Analysis of the probabilistic version, PDLDA
2.5 Evaluation
We evaluated the performance of the proposed monitoring algorithms, DLDA and PDLDA, on synthetic and real data. For each dataset we simulated a distributed data stream by partitioning the data between the nodes and streaming one sample per round.
2.5.1 Synthetic Data Experiments
We use synthetic data, in which all model assumptions hold, to exemplify the communication efficiency of our method (Section 2.5.1) and its ability to detect that the model is no longer valid before misclassifications occur (Section 2.5.1). We then (Section 2.5.1) analyze the communication efficiency of our method as a function of the algorithm parameters.
Communication Efficiency
We compare DLDA to the T -periodic algorithm, denoted PER(T ), a sampling algorithm
that sends updates every T rounds. Our main performance metric is communication,
measured in normalized messages (the average number of messages sent per round by
each node). PER can achieve arbitrarily low communication at the cost of larger model
drift. However, periodic synchronization can miss the point of change in the data; hence
PER cannot guarantee that the model drift stays below a fixed threshold, in contrast to DLDA. Further, DLDA has additional intrinsic advantages over PER:
1. DLDA can be instantly calibrated to fit a given drift threshold, while for PER
the interval between synchronizations can only be determined empirically.
2. The rate at which the data evolves might change. While DLDA adapts to the new rate of change, PER is stuck with a fixed period that becomes suboptimal.
3. For a sudden change in the data, DLDA adapts immediately — the algorithm’s
latency is 0 — while for PER the latency might be up to the period length.
In this experiment we used a simple data generation process. There are 10 nodes,
each of which contains two data classes: P , a Gaussian centered at the origin and
with unit covariance matrix; and Q, a Gaussian also with unit covariance matrix,
but whose mean changes every 1,500 rounds, starting at (1, 0), and then changing to (0, −1), (−1, 0), (0, 1) (see Fig. 2.2).
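The generation process can be sketched as follows (the generator name, seed handling, and single-stream simplification are ours; the thesis distributes these samples over 10 nodes, which is omitted here):

```python
import numpy as np

def synthetic_stream(rounds=6000, seed=0):
    """Yield one (P-sample, Q-sample) pair per round: P is fixed at the origin,
    while Q's mean steps through (1,0), (0,-1), (-1,0), (0,1) every 1,500
    rounds; both classes have unit covariance."""
    rng = np.random.default_rng(seed)
    q_means = [(1.0, 0.0), (0.0, -1.0), (-1.0, 0.0), (0.0, 1.0)]
    for t in range(rounds):
        yield rng.normal((0.0, 0.0), 1.0), rng.normal(q_means[t // 1500], 1.0)

samples = list(synthetic_stream())
```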
Figure 2.3 shows the behavior of the DLDA monitoring algorithm over the synthetic
dataset, with three points in time at which the data abruptly changes. DLDA achieves
a communication overhead of 0.01 messages per node per round, with the model error
guaranteed to always be below the given threshold. Conversely, the equivalent PER(100)
Figure 2.2: Illustration of the generation process of the synthetic data. The class P (denoted in blue) is fixed, while Q changes three times, every 1,500 rounds (the changes are depicted by the dark arrows).
algorithm doesn’t maintain the model error below the threshold (red dashed line).
Figure 2.3 shows that the periodic algorithm does not always synchronize when the
model drift exceeds a given threshold. Moreover, it triggers redundant synchronizations
when there is no change in the data.
Early Drift Detection
To further expound on the advantage of the proposed DLDA algorithm, we consider
a toy example (Fig. 2.4), in which 2D data arrives from two classes (P ’s samples are
shown as plus signs and Q’s samples as minus signs). The means of the classes change
according to the depicted grey arrows, from time t1 to tL. The dark line at an angle of
−45◦ represents the optimal projection direction at time t1. As the classes change, this
initial projection direction remains “correct”, in the sense that it still separates the two
classes; alas, at time tL, the two classes have switched their positions relative to the
projection's direction, and the classifier fails. Hence, a monitoring algorithm that only checks for misclassifications at the nodes will fail to detect the drift in the classes until it is too late, i.e., until the classifier fails, while DLDA alerts earlier, when the real (global) classifier has changed by more than the provided threshold (in this case 0.52 radians, or 30°); this point is marked by an arrow in Fig. 2.4.
Parameter Analysis
Next, we analyze the parameters of the DLDA algorithm.
Model Drift Threshold: The model drift threshold is given by the user; above it, the model drift is considered too large. It can be quantified in two ways: as the maximal angle between w and w0, or as the Euclidean distance between them. Figure 2.5 shows the communication
Figure 2.3: DLDA error (blue) vs. PER(100) error (red), for the synthetic data described above. The horizontal axis represents rounds; the vertical axis represents the norm of the difference between the real (global) model and the current model held at the nodes. Window size is 1,000. The maximum allowed error (which DLDA guarantees will never be surpassed) is T = 0.997 (which corresponds to a difference of 0.077 radians, or 4.4 degrees, in the classifier's direction). Both algorithms transmit the same overall number of bytes, but at different rounds; while PER sends alerts periodically, DLDA alerts only when the classifier may have changed. For this reason, PER yields a larger error when the two classes (and the classifier) change.
requirements of the DLDA algorithm as a function of the model drift threshold, and
the minimal communication required to match DLDA using PER. It can be seen that for both fixed and dynamic data, DLDA outperforms PER at any given model drift threshold.
Node Scalability: Node scalability describes how DLDA performs as the number of nodes varies. Figure 2.6 shows the communication volume as a function of the number of nodes k. We observe that communication increases slowly, reaching 0.25% on the fixed data and 0.6% on the dynamic data when distributed across 25 nodes.
Window Size: Figure 2.7 shows how communication decreases as a result of enlarging
the window size W . One can increase the window size to compensate for other factors in
the system that increase the communication. One of those is noise (which is quantified
in our context by the standard deviation of the data generating distribution).
Another parameter directly related to the window size is the dimension of the
data. The number of samples required for accurate estimation of the covariance matrix
grows with the dimension. In our setting, the number of training samples is linked to
the window size. When the window size is fixed, communication grows linearly with the
dimension (see Figure 2.8).
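To illustrate why the required window grows with the dimension, here is a toy sketch (a hypothetical synthetic setup, not the thesis experiment): with the window size W fixed, the LDA direction estimated from windowed samples strays further from the true direction as the dimension d increases, because the covariance estimate degrades.

```python
import numpy as np

rng = np.random.default_rng(0)

def lda_direction(X0, X1):
    """Fisher LDA direction w = Sigma^{-1} (mu1 - mu0) from two labeled windows."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for invertibility.
    S = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T) + 1e-6 * np.eye(X0.shape[1])
    return np.linalg.solve(S, mu1 - mu0)

def direction_error(d, W):
    """Angle between the estimated direction (window W per class) and the true one."""
    X0 = rng.normal(0.0, 1.0, size=(W, d))
    X1 = rng.normal(0.0, 1.0, size=(W, d))
    X1[:, 0] += 2.0                      # the classes differ only along axis 0
    w = lda_direction(X0, X1)
    true = np.zeros(d)
    true[0] = 1.0
    cos = abs(w @ true) / np.linalg.norm(w)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# With the window fixed, the estimate degrades as the dimension grows:
e5, e100 = direction_error(d=5, W=200), direction_error(d=100, W=200)
print(e5, e100)
```

The same effect is why, in the experiments, either the window must grow with the dimension or more synchronizations (and hence more communication) are needed.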
Figure 2.4: A toy example demonstrating early detection of a change in the data.
Figure 2.5: Communication as a function of model drift for DLDA and PER. The periodic algorithm is tuned to achieve the same maximal model drift as DLDA for each model drift threshold.
2.5.2 Real Data Experiments
In this section we test the algorithm on three real datasets. The first (USENET) is
too small to test the probabilistic approach; thus we use this set only for the DLDA
test. The second (Power Consumption Monitoring) is a medium-size dataset (it is
distributed over 36 nodes), and we test both DLDA and PDLDA on it. The third (Gas
Sensor Time Series Monitoring) is a big set (it is distributed over 100 nodes). The DLDA
synchronization policy is too strict for a large number of nodes; hence we use this set
only for the PDLDA test.
Message Preference Monitoring — Usenet
The USENET dataset (Figure 2.9) is a text dataset that simulates a stream of messages from
three newsgroups (medicine, space, baseball); the messages are presented sequentially
to a user, who then labels them as interesting or junk according to personal interest.
Figure 2.6: Communication as a function of the number of nodes for fixed (blue) and changing (green dashed line) datasets.
Figure 2.7: Communication as a function of window size.
Figure 2.8: Communication as a function of input dimension for fixed (blue) and changing (green dashed line) datasets.
Figure 2.9: Comparison between the maximal (over nodes) DLDA model drift (blue) and the true global model drift (green dashed line) for k = 2, W = 450. It can be seen that DLDA responds to the change in the data that occurs after 600 rounds (red dotted vertical line) and causes a synchronization in round 698 (blue dashed vertical line).
Attribute values are binary, indicating the presence or absence of the 128 informative
words. The change in the data arises from a change in the user's preference (from space
to baseball). Figure 2.9 shows the results of the DLDA algorithm with W = 450. The
first 450 rounds over the data correspond to the initialization phase and are omitted.
During the next 50 rounds the DLDA model drift (the value is calculated using the left
side of the inequality in Eq. 2.12) increases due to noise in the data; there is no change
in the user's preferences. From round 500 to 600 the DLDA model drift is stable,
again due only to noise. In round 600 there is a concept drift. From this point
both the DLDA model drift and the true model drift increase until the synchronization
in round 698.
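The round-by-round behavior described here can be caricatured by a single-node simulation (hypothetical synthetic data, not the USENET set): a sliding window of labeled samples, a direction recomputed each round, and a synchronization whenever the drift since the last synchronization exceeds the threshold.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)

def direction(window):
    """LDA-style direction Sigma^{-1} (mu1 - mu0) from a window of (x, label) pairs."""
    X = np.array([x for x, _ in window])
    y = np.array([label for _, label in window])
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = X - np.where(y[:, None] == 0, mu0, mu1)   # center each sample at its class mean
    S = (Xc.T @ Xc) / len(Xc) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    return w / np.linalg.norm(w)

def sample(t):
    """Alternating class labels; the class means rotate abruptly at round 600."""
    label = t % 2
    mean = np.array([2.0, 0.0]) if t < 600 else np.array([0.0, 2.0])
    return rng.normal((2 * label - 1) * mean, 1.0), label

window = deque(maxlen=450)
syncs, w_ref = [], None
for t in range(1000):
    window.append(sample(t))
    if t < 450:
        continue                                   # initialization phase, as above
    w = direction(window)
    if w_ref is None:
        w_ref = w
    drift = np.arccos(np.clip(abs(w @ w_ref), -1.0, 1.0))
    if drift > np.deg2rad(30):
        syncs.append(t)                            # synchronize: adopt the new model
        w_ref = w

print(syncs)  # synchronization rounds; the first falls some rounds after the drift at 600
```

Before the drift, only estimation noise moves the direction, so no synchronization fires; after it, the window gradually fills with post-drift samples until the threshold is crossed.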
Power Consumption Monitoring
The Power Consumption dataset contains the hourly power supply of an Italian electric
company as recorded from two sources: power supplied by the main grid and power
transformed from other grids. This stream contains three-year power supply records
from 1995 to 1998, and our learning task is to predict which hour (1 out of 24 hours) the
current power supply belongs to. The change in the data in this stream is mainly caused
by factors such as season, weather, time of day, and the differences between working
days and weekends. We demonstrate the algorithms on the following binary classification
problem: given a power supply measurement, decide whether it corresponds to night or
day. This dataset is an example of gradual change in the data (seasons do not change
abruptly). Figure 2.10 depicts the results of the DLDA and PDLDA algorithms. For a
small number of nodes, k = 4, and for large window size, W = 5000, DLDA requires
only 0.003 normalized messages. For a more distributed system, k = 36, and a smaller
window size, W = 600, DLDA requires 0.09 normalized messages. For PDLDA with
k = 36 and W = 600 and a violation threshold (VT) of 50%, PDLDA requires 0.02
normalized messages, much better than DLDA in the same setting.
Gas Sensor Time Series Monitoring
Data in this experiment consists of measurements collected by an array of 16 chemical
sensors in a lab, recording at a sampling rate of 100 Hz for 24 hours, resulting in 8,378,504
data points for each sensor. During the first 12 hours the task is to detect the presence
of carbon monoxide (CO) in a mixture of chemicals, and from the 13th hour the task
is to detect the presence of methane, which corresponds to an abrupt change in the
data. Figure 2.11 demonstrates the results of the PDLDA algorithm. First, we can observe
that the fraction of violated nodes (shown in blue) correlates with the true model drift
(shown in green). Second, we can see two patterns of behavior, which are separated by
an abrupt switch in the data (marked by the vertical red line). Before the switch,
synchronization occurs every 150 rounds; after the switch, it goes down to every
50 rounds. There is a transition period of about 1000 rounds that follows the point of
the data switch. In this interval, the sliding window mixes the old (before the switch) data
and the new (after the switch) data, but once the window aggregates enough data, the
algorithm stabilizes and reduces the communication requirements. This experiment
shows that the PDLDA algorithm detects the abrupt change in the data and adapts to
the new conditions after a short period of time.
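The violation-fraction rule described above can be sketched as follows (a simulated single round with illustrative numbers; `pdlda_round` is a hypothetical helper, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def pdlda_round(local_drifts, local_threshold, vt):
    """One monitoring round: each node reports only a violation bit; the
    coordinator synchronizes when the violated fraction exceeds vt."""
    violated = [bool(d > local_threshold) for d in local_drifts]
    frac = sum(violated) / len(violated)
    return frac, frac > vt

# 100 nodes: under stable data only noise-level local drifts occur; after the
# abrupt switch, almost every local condition is violated and a sync fires.
before = rng.normal(0.2, 0.05, size=100)   # local drift values, stable stream
after = rng.normal(0.9, 0.05, size=100)    # local drift values, after the switch
print(pdlda_round(before, 0.5, vt=0.8))    # (0.0, False): no synchronization
print(pdlda_round(after, 0.5, vt=0.8))     # (1.0, True): synchronization
```

Because each node sends only a single bit per round, this relaxation scales to many more nodes than the strict DLDA policy.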
(a) DLDA behavior on Power Consumption data with k=4, W=5000, VT=0
(b) DLDA behavior on Power Consumption data with k=36 nodes, W=600, VT=0
(c) PDLDA behavior on Power Consumption data with k=36 nodes, W=600, VT=18
Figure 2.10: The top and center figures show the DLDA algorithm on the Power Supply dataset for a small (top) and large (center) number of nodes. The blue line represents the value of the local bound expression, corresponding to the node with the maximum value. The green dashed line shows the model drift (normalized by the threshold); the model is computed after the data is aggregated from all nodes. The bottom plot shows the results of PDLDA on the same dataset. The blue line in the bottom plot represents the fraction of violated nodes.
Figure 2.11: Demonstration of PDLDA on the Gas Sensor dataset: a comparison between the true model drift (green) and the fraction of the nodes that are violated in the current round (blue). The experiment is configured for k=100 nodes, and the violation threshold is VT=80.
Chapter 3
Conclusion
In the first chapter we captured the essence of, and developed automatic recognition for,
18 LMA motor elements with a relatively high success rate, using an inexpensive
and widely available sensor. We hope that our work will provide the foundation and
inspiration for developing an in-home, inexpensive, LMA-based feedback system that
will be used for multiple purposes, such as therapy, the arts, video games, communication,
and human-robot interaction.
In the second chapter we introduced the first communication-efficient monitoring
algorithm for a linear classifier model that monitors the model itself, but does not
require knowledge of the global model at the local nodes. As long as all nodes meet
their local condition, the global model is guaranteed to be valid. Our algorithm has
important benefits:
• Our method works with distributed data in a communication-efficient way.
• Monitoring the model, as opposed to monitoring the misclassifications, allows for
early detection of the change even before misclassification occurs.
We evaluated the theoretical scheme, DLDA, and its probabilistic version, PDLDA,
on three real datasets. For a small number of nodes we used DLDA with its theoretical
guarantee, and for a greater number of nodes we used PDLDA. We showed that
the proposed scheme outperforms PER: it maintains a smaller Euclidean distance
between the last computed model and the current true model with a lower volume
of communication. This work is a first step in designing communication-efficient
algorithms with theoretical guarantees for monitoring classification models over dynamic
distributed data streams. One future direction is to extend the proposed
framework to ensembles of linear classifiers and to neural networks, including deep learning
networks.
Abstract

Laban Movement Analysis (Laban Movement Studies, or Labanotation) is a method and a language for describing, interpreting, and visualizing human movement. The method was developed on the basis of Rudolf Laban's theories about the essence of movement. It is one of the most widely used movement-analysis methods in the world, employed by dancers, actors, athletes, dance teachers, special-education teachers, therapists, and others. The analysis is performed independently of any particular movement style or method (such as yoga, ballet, modern dance, martial arts, and so on), and it can therefore be used to examine any of the diverse movement domains. Laban's premise is that movement produces and expresses a person's mental and emotional side. In his view, the relationship between body and mind is bidirectional: movement influences the body, mind, and spirit, and at the same time it also expresses them. This assumption is what turned Laban Movement Analysis into a body of knowledge relevant to many domains of the mind, the body, and the arts.

Analyzing movement with this method is preferable to a kinematic description, since it describes qualitative aspects of the movement in addition to quantitative characteristics. Owing to this advantage, the method is popular in many applications; it is the preferred method in research on motor movement and in theater studies, and it generates interest in the worlds of computer games and robotics. In this research we developed an automatic system that recognizes 18 different Laban qualities from the Kinect depth camera, using a learning system, with an accuracy of 59%. The Laban recognition algorithm was tested on several datasets of recordings of the qualities, which we created, in several different configurations:

1. Learning from a particular actor and evaluating performance on samples from the same actor (samples, of course, that were not in the set on which the model was trained).
2. Learning from one actor and evaluating performance on another actor.
3. Learning from professional actors and evaluating performance on people with no background in Laban.

The second chapter of the thesis deals with detecting a change in the distribution of data, when the data is distributed among different sources and the change in distribution may harm the classification accuracy of a model trained on old data. In the distributed setting, training a model requires concentrating all the data from the different sources in one place, which is expensive in terms of communication. In order to minimize communication, we propose a monitoring algorithm that is executed at each of the sources locally (without communication between the sources), while guaranteeing the preservation of the accuracy of the global model (the model that would have been computed had the data from all the sources been centralized). The classification algorithm we chose to monitor is Linear Discriminant Analysis, a popular algorithm for classification and dimensionality reduction in many applications. This choice was made because of the strong theoretical guarantee on the correctness of the monitoring that we proved for this classifier. We demonstrated how the algorithm and its probabilistic version reduce communication by two orders of magnitude (compared to synchronizing every time a new sample arrives) on three datasets from different domains. Moreover, unlike other algorithms, our algorithm monitors the model itself rather than its errors, a fact that allows us to detect the change in the data even before an incorrect classification occurs.

In this work we innovate by addressing the following concepts:

• Monitoring the model rather than the errors: by monitoring the model, the old model can be ruled out even before errors appear.
• Monitoring a model in a distributed environment: while model monitoring in a non-distributed setting has been studied in depth, there are few works on model monitoring in a distributed environment. In this environment the data is distributed over a large number of sources, while the model is learned globally by centralizing the data from all the sources. Whereas the existing monitoring methods rely on heuristics, our method comes with a theoretical guarantee.

The algorithm was tested on three real-world datasets:

1. A dataset of news messages. The task was to classify whether or not the user would be interested in an incoming news item, under the definition that the user is interested only in certain topics. The topics of interest changed over time, which made the model inaccurate. We succeeded in monitoring this change.
2. A dataset of electricity readings from a power station over three years. The task was to classify whether a reading was taken during the day or at night. The change in the data distribution is due to the seasons of the year. We succeeded in monitoring these changes as well.
3. A dataset of readings from gas sensors over 24 hours. The goal was to classify whether or not a certain gas is present in the air. The change in the data distribution stems from a change in the composition of the gases after 12 hours. We succeeded in monitoring this change as well.

The research was carried out under the supervision of Prof. Assaf Schuster of the Faculty of Computer Science.

I thank the Technion for its financial support of my research.

Laban Movement Analysis and Linear Classifier Monitoring in a Distributed System

Research thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Ran Bernstein

Submitted to the Senate of the Technion — Israel Institute of Technology
Kislev 5776, Haifa, December 2016