Laban Movement Analysis and LDA Distributed Monitoring
Ran Bernstein
Technion - Computer Science Department - M.Sc. Thesis MSC-2016-17 - 2016
Laban Movement Analysis and LDA Distributed Monitoring
Research Thesis
Submitted in partial fulfillment of the requirements
for the degree of Master of Computer Science
Ran Bernstein
Submitted to the Senate
of the Technion — Israel Institute of Technology
Kislev Hatashva Haifa December 2016
This research was carried out under the supervision of Prof. Assaf Schuster, in the
Faculty of Computer Science.
Some results in this thesis have been published as articles by the author and research
collaborators in conferences and journals during the course of the author's master's
research period, the most up-to-date versions of which are:
Bernstein, Ran, et al. "Laban movement analysis using Kinect." Int. J. Comput.
Electr. Autom. Control Inform. Eng. 9 (2015): 1394-1398.
Bernstein, Ran, et al. "Multitask learning for Laban movement analysis." Proceedings
of the 2nd International Workshop on Movement and Computing. ACM, 2015.
Acknowledgements
I would like to thank my advisor, my parents and my girlfriend.
The generous financial help of the Technion is gratefully acknowledged.
Contents
Abstract 1
1 Laban Movement Analysis of Movements Recorded by Kinect 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Laban Movement Analysis (LMA) . . . . . . . . . . . . . . . . . 4
1.2 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Kinect Sensor Data . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Clip collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Multi-Label Classification . . . . . . . . . . . . . . . . . . . . . . 7
1.2.4 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.6 Single Task Learning (STL) . . . . . . . . . . . . . . . . . . . . . 11
1.2.7 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Single-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.2 Multi-Task Learning . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 LDA Model Monitoring in Distributed Systems 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Setup and Motivation . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.2 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Monitoring Problem . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Monitoring Distributed LDA With Convex Subsets . . . . . . . . . . . . 20
2.3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.2 Convex Safe Zones . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.3 Convex Bound for Local Condition . . . . . . . . . . . . . . . . . 23
2.3.4 Proof of the Convex Bound Lemma . . . . . . . . . . . . . . . . 24
2.4 Distributed LDA Monitoring Algorithm . . . . . . . . . . . . . . . . . . 26
2.4.1 Probabilistic Distributed LDA Monitoring . . . . . . . . . . . . . 27
2.4.2 Analysis of the probabilistic version, PDLDA . . . . . . . . . . . 28
2.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Synthetic Data Experiments . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Real Data Experiments . . . . . . . . . . . . . . . . . . . . . . . 31
3 Conclusion 37
List of Figures
1.1 Skeleton positions relative to the human body . . . . . . . . . . . . . . . 6
1.2 Kinect Coordinate System . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Influence of the number of features on the performance. The selection was
made according to statistical significance: The blue line is the difference
between the score with and without feature selection. It can be seen that
the optimal percentage of features to select is 10% . . . . . . . . . . . . 11
1.4 Recall, precision and F1 score of each Laban quality separately. The
evaluation was conducted on a dataset that was captured from only one
CMA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Performance on ordinary people (non-CMAs) instructed to perform sev-
eral tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 Example of incorrect monitoring by applying LDA locally. The initial
state of the data is presented in (A) and the state at a later point is
shown in (B). In (B) every node (green and red dashed lines) calculates
the same angle for the separator as it was in (A). But it can be seen that
the global separator’s (blue solid line) angle has changed significantly. . 21
2.2 Illustration of the generation process of the synthetic data. The class
P (denoted in blue) is fixed, while Q changes three times, every 1,500
rounds (the changes are depicted by the dark arrows). . . . . . . . . . . 29
2.3 DLDA error (blue) vs. PER(100) error (red), for the synthetic data de-
scribed above. Horizontal axis represents rounds, vertical axis represents
the norm of the difference between the real (global) model and the current
model held at the nodes. Window size is 1,000. The maximum allowed
error (which DLDA guarantees will never be surpassed) is T = 0.997
(which corresponds to a difference of 0.077 radians, or 4.4 degrees, in the
classifier’s direction). Both algorithms transmit the same overall number
of bytes, but at different rounds; while PER sends alerts periodically,
DLDA alerts only when the classifier may have changed. For this reason,
PER yields a larger error when the two classes (and the classifier) change. 30
2.4 A toy example demonstrating early detection of a change in the data. . 31
2.5 Communication as a function of model drift for DLDA and PER. The
periodic algorithm is tuned to achieve the same max model drift as DLDA
for each model drift threshold. . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Communication as a function of the number of nodes for fixed (blue) and
changing (green dashed line) datasets . . . . . . . . . . . . . . . . . . . 32
2.7 Communication as a function of window size . . . . . . . . . . . . . . . 32
2.8 Communication as a function of input dimension for fixed (blue) and
changing (green dashed line) datasets . . . . . . . . . . . . . . . . . . . 32
2.9 Comparison between maximal (over nodes) DLDA model drift (blue) and
the true global model drift (green dashed line) for k = 2, W = 450. It
can be seen that DLDA responds to the change in the data that occurs
after 600 rounds (red dotted vertical line) and causes a synchronization
in round 698 (blue dashed vertical line). . . . . . . . . . . . . . . . . . . 33
2.10 The top and the center figures show the DLDA algorithm on the Power
Supply data set for a small (top) and large (center) number of nodes. The
blue line represents the value of the local bound expression, corresponding
to the node with the maximum value. The green dashed line shows the
model drift (normalized by the threshold); the model is computed after
the data was aggregated from all nodes. The bottom plot shows the
results of the PDLDA on the same dataset. The blue line in the bottom
plot represents the fraction of violated nodes. . . . . . . . . . . . . . . . 35
2.11 Demonstration of PDLDA on the Gas Sensor dataset. A comparison
between the true model drift (green) and the fraction of the nodes that
are violated in the current round (blue). The experiment is configured
for k=100 nodes, and the violation threshold is VT=80. . . . . . . . . . 36
Abstract
The first chapter of the thesis deals with Laban Movement Analysis (LMA), which is a
method for describing, interpreting and documenting all varieties of human movement.
Analyzing movements using LMA is advantageous over a kinematic description of the
movement, as it captures qualitative aspects in addition to the quantitative aspects of
the movement. As such, it has many applications, and in recent years it has been gaining
popularity as the preferred method for movement analysis in motor research, theater
training, and the development of interactive gaming animations and robotics. In this
study we aimed to develop an automated method for recognizing 18 different Laban
motor elements (motor characteristics) from markerless 3D movement data captured by
the ubiquitous Kinect camera. Using machine-learning methods we obtained a recall
rate of 38-94% (65% on average) and a precision rate of 29-100% (59% on average) for
the 18 motor elements that were tested.
The second chapter of the thesis deals with systems for mining dynamic data streams,
which should be able to detect changes that affect the accuracy of their model. A
distributed setting is one of the main challenges in this kind of change detection. In a
distributed setting, model training requires centralizing the data from all nodes (hereafter,
synchronization), which is very costly in terms of communication. To minimize this
communication, a monitoring algorithm should be executed locally at each node, while
preserving the validity of the global model (the model that would be computed if a
synchronization occurred). To this end, we propose the first
communication-efficient algorithm for monitoring a classification model over distributed,
dynamic data streams. The classification algorithm that we chose to monitor is Linear
Discriminant Analysis (LDA), which is a popular method used for classification and
dimensionality reduction in many fields. This choice was made due to the strong
theoretical guarantee of correctness that we prove for the monitoring algorithm of this
kind of model. In addition to its theoretical guarantee, we demonstrated how our algorithm
and a probabilistic variant of it reduce communication volume by up to two orders of
magnitude (compared to synchronization in every round) on three real data sets from
different content domains. Moreover, our approach monitors the classification model
itself as opposed to its misclassifications, which makes it possible to detect the change
before the misclassification occurs.
Chapter 1
Laban Movement Analysis of
Movements Recorded by Kinect
1.1 Introduction
In recent years there has been a surge of interest in automated analysis of human motor
behavior in the fields of robotics, computer science and animation. Computerized
recognition of movement characteristics has many potential applications: It could be
used for detection of personality traits that are associated with specific motor tendencies
[LD03] during, for example, a job interview, and for early detection and/or for severity
assessment of various illnesses characterized by abnormal motor behavior, such as autism
[Dot95], schizophrenia, or Parkinson's disease. Automated emotion recognition from
movement, based on associations between certain emotions and specific motor behaviors
[ACC15] is another important application, which may have a variety of uses such as
online feedback to presenters to help them convey through their body-language the
emotional message they want to communicate (e.g., politicians and public speakers or
dancers and actors in training), or recognition of people’s emotions during interactive
games such as those played using the Xbox. Automated analysis of motor behavior can
be used also to assess the progression and improvement of participants in a variety of
training programs that employ virtual reality environment [AC13]; it can be used for
motion retrieval from large motion database [KCT+13] and for movement indexing and
classification [AC13] in the field of animation. Lastly, machine learning of a person's
movement patterns has enormous potential for future applications, ranging from security
identification to interactive environments. Most of the studies dealing with automatic analysis of
human movement captured movement using complex and expensive 3D motion capture
systems. However, in order to implement the many potential uses mentioned above
in our everyday life, we should be able to do such automated analysis using a small,
inexpensive, and easy-to-use 3D camera. One such camera that has been
successfully used in interactive games is the Kinect camera.
1.1.1 Our Contribution
In this study we aimed to develop an automated method for recognizing the motor
characteristics of any human movement captured by a Kinect camera. Once the
movement is captured in 3D, its assessment and analysis can be done in various ways. In
this study we chose to develop the computerized recognition of movement characteristics
based on Laban Movement Analysis (LMA).
1.1.2 Laban Movement Analysis (LMA)
LMA is a well-established and widely accepted systematic language for describing and
documenting movement. LMA’s comprehensiveness as a motor analysis method could be
inferred from its diverse use in research: it has been used to evaluate fighting behaviors of
rats [FP03], to analyze behavior of nonhuman animals in naturalistic settings [FCK97],
to diagnose autistic individuals [Dot95], to evaluate motor recovery of stroke patients
[FW06], and to characterize the development of infants’ reaching movements [FW12].
In recent years it gained additional popularity among computer science researchers who
have used it in studies that describe, recognize or create bodily emotional expressions
for applications in human-robot interactions, interactive games such as the Xbox,
and in animations [CLV03, RDA08, ZGCA13, LVBB10, ZB05, MKI09, MK10], and
recently it has even been attempted, through the use of EEG, to identify the brain
mechanisms underlying the production of some of the LMA motor elements [CGHN+14].
In addition, some studies have found correlations between some Laban motor elements
and personality traits or emotional states [LD03, STW15].
Analyzing movements using LMA is advantageous over other methods, as it cap-
tures various qualitative motor elements (movement characteristics) in addition to
quantitative (kinematic) aspects of the movement. LMA categorizes movement with
four main components: Body, Effort, Shape, and Space. Body (i.e. which body parts
move) and Space (i.e. the direction of movement such as Vertical: Up/Down, Sagittal:
Forward/Back or Horizontal: to the side), describe how the many spatial-temporal body
and limb relationships change. The category of Body also includes specific common body
actions such as jump and walk. Effort describes the qualitative aspect of movement
expressive of a person’s inner attitude towards movement via four Effort factors: Weight,
Time, Space and Flow. Each Factor identifies movement on a continuum between two
poles: fighting against the motor quality of that factor and indulging in that quality.
1) Weight Effort , identifies the amount of force or pressure exerted in movement,
on the continuum from Strong to Light (and movements lacking weight activation,
i.e., Passive/Heavy movement); 2) Time Effort identifies the degree of urgency or
acceleration/deceleration involved in a movement, i.e., Sudden or Sustained movement;
3) Space Effort , describes the focus or attitude towards a chosen pathway, i.e., is the
movement Direct or Indirect and 4) Flow Effort describes the element of control or
the degree to which a movement is Bound, i.e., controlled by muscle contraction, versus
Free, i.e., being released/liberated. Finally, Shape refers to the way the body ’sculpts’
itself in space: It describes the changes in the relationship of body parts to one another
and to the surrounding environment that occur when a body moves (e.g., whether
the body Encloses or Spreads, Rises or Sinks, etc.). In addition, LMA examines other
movement characteristics, such as the phrasing of the movement, which means the way
movement elements are sequenced into action. Analogous to phrasing in music, a motor
phrase can be rhythmic (repetitive), even (monotonous), etc. (For a more detailed and
systematic description of LMA see [BL80, SC13, Fer14]).
As can be seen from this short description, LMA is very thorough. It captures
a variety of movement dimensions, and has therefore become the preferred method
for movement analysis used by many scientists. Indeed, in a recent study that used
both Effort-Shape (part of LMA) and kinematic analyses to identify movement char-
acteristics associated with positive and negative emotions experienced during walking,
more differences among emotions were identified with Effort-Shape analysis than with
kinematic analysis [GCF12]. Moreover, both Chi et al. [CCZB00] and Masuda et
al. [MK10] chose to develop, based on LMA, a computer-generated animation [CCZB00]
or robotic [MK10] system that transforms simple movements into emotionally expressive
movements by modifying certain movement parameters of the animated character or robot. Thus,
we have chosen to use LMA for the purpose of developing the automated method for
recognizing movement characteristics.
Because LMA is a comprehensive system with tens of different motor characteristics,
and because many of the current applications for automated analysis of movement have
to do with creation or recognition of emotional expressions in movements, we focused
this study on identification of the 18 Laban motor elements (Table 1.1 in the results
section) found to be associated with specific emotions [18]. Thus, we created a database
of movements captured by a Kinect camera and developed machine-learning-based
algorithms for automated identification of the 18 Laban motor elements expressive of
emotion, from our Kinect data.
1.2 Method
1.2.1 Kinect Sensor Data
The Kinect Software Development Kit (SDK) detects the skeleton of the videotaped
moving person and provides the 3D coordinates of 24 joints along this skeleton, as seen
in Fig. 1.1.
The coordinates of these joints are given in a "real-world" coordinate system whose
origin [0,0,0] is at the sensor and whose x, y, and z axes are as depicted in Fig. 1.2 below.
Data were collected by the Kinect camera at 30 Hz.
Figure 1.1: Skeleton positions relative to the human body
Figure 1.2: Kinect Coordinate System
1.2.2 Clip collection
In order to develop the ability to automatically identify Laban motor elements we had
to ensure that the movements in the data set used for the machine learning, included
those elements. Thus, for this study, we generated two specific data sets:
• CMA dataset: This dataset consisted of clips of movements performed by Certified
(Laban) Movement Analysts (CMAs). Six CMAs performed movement sequences
of approximately 3 seconds each, which consisted of different combinations of LMA
motor elements. Before each movement sequence (clip) the CMAs were given a
list of 2-4 Laban motor elements out of the 18 motor elements that were studied,
and were instructed to perform any movement they wanted, as long as it
incorporated the required motor elements. Each of the CMAs performed about
80 such different combinations of 2-4 motor elements, for a total of 550 clips. To
achieve a uniform distribution of the Laban qualities over the dataset, in every
movement sequence (clip) each CMA was asked to perform actions that included
the several specified motor elements, and nothing but them.
• Non-CMA dataset: This dataset consisted of movement sequences performed by
two people without a background in LMA, who were asked to move as if they
were performing different everyday tasks, such as greeting a friend or playing
with a balloon. Their movement sequences also lasted about 3 seconds each, and
a total of 30 such clips were collected. Their movements were tagged by a CMA,
who determined which of the 18 Laban qualities that we tested in this study
appeared in each of their movement sequences (clips).
Both the CMAs and non-CMAs performed their movement sequences within a
316 × 128 cm rectangular frame marked on the floor, whose front side was located 272
cm from the front of the Kinect Camera. By limiting the space within which the people
could move, we ensured that the Kinect camera could capture all of the mover’s joints
at any point in time throughout the movement sequence, and no joint came out of the
camera range.
1.2.3 Multi-Label Classification
In multi-label learning each instance is associated with multiple labels simultaneously,
and the number of labels is not fixed from instance to instance. The task in this learning
paradigm is to predict the label set (Laban motor elements in our study) for each new
unseen instance (skeletal recording, i.e., clip), based on analysis of training instances
with known label sets. In other words, by providing the system with clips identified by
the Laban motor elements they include, the system learns to recognize the appropriate
motor elements in new clips which it didn’t “see” before. In this study we dealt with
three different classification problems, with increasing complexity. First we provided the
system with clips and the Laban motor elements included in them from one CMA, and
taught it to recognize those Laban elements in new unseen clips of the same CMA. This
method can be developed to teach a system to recognize qualities in an individual’s
unique movement expression. In the second step we taught the system to recognize the
Laban elements in the clips (i.e., movements) of each new CMA based on the labeled
clips of the other CMAs. Lastly, based on the dataset of all the CMAs, the system learned to
recognize those motor elements in clips of the non-CMAs’ movements.
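The multi-label setup described above can be sketched as a "binary relevance" reduction: one binary classifier per Laban element, each predicting whether its element appears in a clip. This is an illustration of the problem structure, not the exact classifier used in the thesis; the data here is synthetic.

```python
# Binary-relevance sketch of 18-label classification over clip feature vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_clips, n_features, n_elements = 120, 40, 18

X = rng.normal(size=(n_clips, n_features))      # one feature vector per clip
Y = rng.random((n_clips, n_elements)) < 0.15    # sparse 18-label matrix
Y[0, :] = True                                  # ensure both classes are present
Y[1, :] = False                                 # in every label column

models = []
for q in range(n_elements):                     # one binary model per Laban element
    clf = LogisticRegression(max_iter=1000).fit(X, Y[:, q])
    models.append(clf)

# Predicted label set for a new, unseen clip:
x_new = rng.normal(size=(1, n_features))
pred = np.array([m.predict(x_new)[0] for m in models])
print(pred.shape)   # one yes/no decision per motor element
```

Each model is trained and queried independently, which is exactly the single-task baseline the thesis later contrasts with multi-task learning.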
Clip Labeling
Clips were labeled by the motor elements in the instructions for each clip, with the
assumption that as experts in LMA, the CMAs indeed performed the required elements.
Thus, the instructions given to the CMAs regarding which motor elements to move,
were used as the ground truth for labeling the motor elements in each clip of the CMA
data set. Labeling of the motor elements in the movements of the non-CMAs was done
by one of the authors who is a CMA who observed those movements.
1.2.4 Feature Extraction
The machine learned to recognize the different Laban qualities by extracting many
features from each movement, and by learning from the training-set clips which features
characterize each motor element. It then identified the Laban elements in new clips
based on the features extracted from the movement in those new clips.
To enable the CMAs to express the motor elements in a variety of different movement
sequences, we did not want to constrain the lengths of the clips to be exactly 3 seconds.
Thus, in order to get feature vectors of uniform length (regardless of the original length
of the clips), every extracted feature was a function of the whole clip, i.e., all the
extracted features were computed at whole-clip granularity.
Two groups of features were extracted: the first was relatively small, containing
a handful of features, each of which was designed to portray a specific Laban motor
element based on ”translation” of the meaning of that element into kinematic terms.
The second group contained about 6000 features and exploited the rich data provided
by the Kinect software by extracting, from every joint in the skeleton, its derivatives:
angular velocity, acceleration and jerk. For every time series of [joint × dimension
(X, Y, Z) × derivative], we calculated about 20 statistics, such as mean, variance,
skewness and kurtosis.
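A rough sketch of this second feature group, using plain finite-difference derivatives of the joint positions as a stand-in for the derivative series and only four of the roughly 20 statistics (the exact statistic list and derivative computation are assumptions):

```python
# Reduce per-joint derivative time series to whole-clip statistics.
import numpy as np
from scipy import stats

def clip_features(positions):
    """positions: array of shape (n_frames, n_joints, 3) in Kinect coordinates."""
    feats = []
    series = positions
    for _ in range(3):                        # velocity, acceleration, jerk
        series = np.diff(series, axis=0)      # finite-difference derivative
        for j in range(series.shape[1]):
            for d in range(3):                # X, Y, Z
                s = series[:, j, d]
                feats += [s.mean(), s.var(), stats.skew(s), stats.kurtosis(s)]
    return np.asarray(feats)                  # fixed length, whatever the clip length

clip = np.random.default_rng(1).normal(size=(90, 24, 3))   # ~3 s at 30 Hz, 24 joints
f = clip_features(clip)
print(f.shape)   # 3 derivatives x 24 joints x 3 axes x 4 statistics = (864,)
```

Because every statistic is a function of the whole series, the feature vector length is independent of the clip length, which is what makes clips of varying duration comparable.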
The following are examples for some of the manually composed features that were
designed to portray some of the specific motor elements, and for each of which we also
calculated the 20 statistics:
Advance and Retreat
Advance and retreat are two Laban motor elements that incorporate changes in the
Shape of the body in the sagittal plane, where part of the body’s core (axial skeleton),
usually the upper body, moves forward (Advance) or backward (Retreat) in relation to
the lower part of the body. These elements were quantified by projecting the velocity
vector of the Center of Mass (CM) on the vector of the front of the body. The CM
was approximated in this case by the average of all the joints. The front of the body
was approximated by the perpendicular vector to the vector between the Left Shoulder
(LS) and the Right Shoulder (RS). From the definition of CM of a physical system we
calculate:
$$\vec{P}_{CM}(t) = \sum_{j \in Joints} \alpha_j \vec{P}_j(t), \tag{1.1}$$

$$\vec{P}_{shoulders}(t) = \vec{P}_{LS}(t) - \vec{P}_{RS}(t), \tag{1.2}$$
the front is perpendicular to $\vec{P}_{shoulders}$, so we can easily calculate it with:

$$\vec{P}_{front} = \vec{P}_{shoulders}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{pmatrix},$$

$$S_{sag}(t) = \vec{P}_{CM}(t) \cdot \vec{P}_{front}(t), \tag{1.3}$$

$$\vec{F}_{sag} = \phi([S_{sag}(1), \ldots, S_{sag}(n)]), \tag{1.4}$$

where $\vec{P}_j(t)$ is the position vector of joint $j$ (as we get it from the Kinect) at time $t$ in a clip with $n$ frames, and $\alpha_j$ is a coefficient proportional to the mass around the joint. $\phi$ is the function that creates the 20 statistics from the time series. $S(t)$ is a scalar in the time series at time $t$. $F$ denotes the calculated features for Advance and Retreat, and sag stands for sagittal.
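A minimal sketch of this feature: the prose describes projecting the CM velocity onto the body-front direction, and the sketch follows that description. The joint indices for the shoulders and the uniform mass weights (α_j = 1/|Joints|) are illustrative assumptions.

```python
# Advance/Retreat: project CM velocity onto the front-of-body vector.
import numpy as np

def sagittal_series(P, ls=4, rs=8):
    """P: (n_frames, n_joints, 3); ls/rs are assumed indices of the
    Left/Right Shoulder joints."""
    cm = P.mean(axis=1)                        # CM with uniform alpha_j weights
    shoulders = P[:, ls] - P[:, rs]            # shoulder-to-shoulder vector
    # Rotate the shoulder vector about the vertical (Y) axis to get the
    # front direction (the 3x3 matrix printed in the text):
    R = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
    front = shoulders @ R
    v_cm = np.diff(cm, axis=0)                 # CM velocity per frame
    return np.einsum('ij,ij->i', v_cm, front[:-1])   # per-frame projection

P = np.random.default_rng(2).normal(size=(90, 24, 3))
S = sagittal_series(P)
print(S.shape)   # one scalar per frame transition; phi() then reduces it
```

Positive values of the series correspond to the CM moving toward the front of the body (Advance), negative values to Retreat.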
Spread and Enclose
These are two Laban motor elements describing opposite changes in the Shape of the
body in the horizontal plane. In Spread the body becomes wider and when Enclosing,
the body becomes narrower. These elements were quantified by measuring the changes
in the average distance between every joint and the vertical axis of the body that extends
from the Head (H) to the Spine Base (SB):
$$d_j = \frac{\left|(\vec{P}_j - \vec{P}_{SB}) \times (\vec{P}_j - \vec{P}_H)\right|}{\left|\vec{P}_H - \vec{P}_{SB}\right|}, \tag{1.5}$$

$$S_{horiz}(t) = \sum_{j \in Joints} d_j(t), \tag{1.6}$$

$$\vec{F}_{horiz} = \phi([S_{horiz}(1), \ldots, S_{horiz}(n)]), \tag{1.7}$$

where $P$, $S$, $\phi$, $CM$ and $F$ are defined as in the previous paragraph, and horiz stands for horizontal.
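Equation 1.5 is the standard point-to-line distance from each joint to the Head-to-Spine-Base axis; a small sketch (Head and Spine Base joint indices are illustrative):

```python
# Spread/Enclose: per-frame sum of joint distances from the body axis.
import numpy as np

def horizontal_series(P, head=0, spine_base=23):
    """P: (n_frames, n_joints, 3); head/spine_base are assumed joint indices."""
    out = []
    for frame in P:
        h, sb = frame[head], frame[spine_base]
        # Point-to-line distance: |(P_j - SB) x (P_j - H)| / |H - SB|  (Eq. 1.5)
        d = np.linalg.norm(np.cross(frame - sb, frame - h), axis=1) \
            / np.linalg.norm(h - sb)
        out.append(d.sum())                    # Eq. 1.6
    return np.asarray(out)

P = np.random.default_rng(3).normal(size=(90, 24, 3))
S = horizontal_series(P)
print(S.shape)   # one width measure per frame
```

An increasing series indicates Spreading (the body widening around its axis), a decreasing one Enclosing.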
Rise and Sink
Rise and Sink are changes in the Shape of the body in the vertical plane, where during
Rising, the body elongates upward and during Sinking the body goes down and shortens.
The distinction between these two Laban motor elements was quantified by measuring
the average distance on the Y axis of each joint from the CM:

$$S_{vert}(t) = \sum_{j \in Joints} \left|\vec{P}_j - \vec{P}_{CM}\right|, \tag{1.8}$$

$$\vec{F}_{vert} = \phi([S_{vert}(1), \ldots, S_{vert}(n)]), \tag{1.9}$$
where $P$, $S$, $\phi$, $CM$ and $F$ are defined as previously, and vert stands for vertical.
Sudden and Sustain
Sudden and Sustain are two opposing motor elements of the Time dimension of the Effort
factor of the movement. The distinction between them was quantified by calculating
the skewness of the acceleration, based on the assumption that the acceleration of a
sudden movement has higher values at the beginning of the movement, i.e. is skewed to
the left.
$$\vec{V}_j(t) = \vec{P}_j(t+1) - \vec{P}_j(t), \tag{1.10}$$

$$\vec{a}_j(t) = \vec{V}_j(t+1) - \vec{V}_j(t), \tag{1.11}$$

$$Skew_j = \frac{1}{n}\sum_{t=1}^{n}\left(\frac{a_j(t) - \mu}{\sigma}\right)^3, \tag{1.12}$$

where $\vec{P}_j(t)$ is the position vector of joint $j$ at time $t$, $\vec{V}_j(t)$ is the velocity vector of joint $j$ at time $t$, $\mu$ and $\sigma$ are the mean and standard deviation of the accelerations $a_j(t)$, and $n$ is the length of the time series (clip).
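A minimal sketch of Eqs. 1.10-1.12, assuming positions arrive as a (frames × joints × 3) array and applying the skewness per joint and per axis (one plausible reading of the per-joint formula):

```python
# Sudden/Sustain: skewness of the finite-difference acceleration over time.
import numpy as np
from scipy import stats

def acceleration_skew(P):
    """P: (n_frames, n_joints, 3). Returns per-joint, per-axis skewness of
    the acceleration; a left-skewed distribution (high values early in the
    clip) suggests a Sudden movement."""
    V = np.diff(P, axis=0)        # Eq. 1.10: velocity
    A = np.diff(V, axis=0)        # Eq. 1.11: acceleration
    return stats.skew(A, axis=0)  # Eq. 1.12: standardized third moment

P = np.random.default_rng(4).normal(size=(90, 24, 3))
sk = acceleration_skew(P)
print(sk.shape)   # (n_joints, 3): one skewness value per joint and axis
```

Note that `scipy.stats.skew` computes exactly the standardized third moment of Eq. 1.12.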
1.2.5 Performance Evaluation
From a statistical point of view, for every clip we had 18 possible labels (Laban motor
elements). The movement in each clip was constructed from a combination of 2-4 of
these elements, which meant that there was about 85% chance that a certain element
will not appear in a clip. Due to this sparsity, accuracy (defined as the percentage
of clips that have been labeled correctly out of the total number of clips) alone was
not a relevant metric for performance evaluation, since one could get 85% accuracy by
stating that for every clip none of the motor elements appear in it. However, if we define
Truly Positive Clips (TPC) as clips in which the relevant Laban element truly appeared,
and if we define Classified Positively Clips (CPC) as clips that our classifier found to
include the relevant motor element, then we can get a better performance evaluation by
combining precision (defined as the percentage of retrieved clips that were relevant) and
recall (defined as the percentage of relevant instances that were retrieved) to create the
more concise performance evaluation measure, the F1 score, defined as follows:

$$precision = \frac{|\{TPC\} \cap \{CPC\}|}{|\{CPC\}|}, \tag{1.13}$$

$$recall = \frac{|\{TPC\} \cap \{CPC\}|}{|\{TPC\}|}, \tag{1.14}$$

$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall}. \tag{1.15}$$
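The set-based definitions above can be checked directly on a tiny worked example (the clip IDs are illustrative):

```python
# Worked example of Eqs. 1.13-1.15 for a single Laban element.
TPC = {1, 2, 3, 5, 8}        # clips truly containing the element
CPC = {2, 3, 4, 5}           # clips the classifier flagged as containing it

tp = len(TPC & CPC)          # 3 correctly retrieved clips
precision = tp / len(CPC)    # 3/4 = 0.75
recall = tp / len(TPC)       # 3/5 = 0.60
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))          # 0.667
```

The F1 score penalizes the degenerate all-negative classifier (recall = 0 gives F1 = 0), which is exactly why it is preferred over accuracy on this sparse label distribution.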
1.2.6 Single Task Learning (STL)
In the single task learning, for each CMA, the machine learned to evaluate for each
Laban motor element separately whether it existed in a certain movement sequence,
based on the features that were detected in that movement sequence (clip). This was
done by making a binary decision, for every Laban element, as to whether it existed in
that movement sequence or not.
Feature selection
For the purpose of the single task learning (STL) from each clip we extracted a vector
of 6120 features, most of which were noisy and redundant and required massive feature
selection in order to conduct the machine learning task. The feature selection was done
in three stages.
In the first stage we computed a p-value for every feature. As seen in Fig. 1.3, filtering
out most of the features yielded better results than not filtering them; using the
top 10% of the features was optimal.
Figure 1.3: Influence of the number of features on the performance. The selection was made according to statistical significance: the blue line is the difference between the score with and without feature selection. It can be seen that the optimal percentage of features to select is 10%.
The second stage of feature selection was conducted on the features which were not
filtered out in the first stage. In this stage the features were ranked according to their
information gain (IG), which is defined as:
$$IG(T, f) = H(T) - H(T|f), \tag{1.16}$$
where T is the training set, f is a feature, and H() is the information entropy of a
dataset. During this stage 60% of the features were selected.
For the third stage of feature selection, performed on the features surviving the first and second stages, we applied Least Absolute Shrinkage and Selection Operator (LASSO) regularization [Tib96]. At the end of this three-stage process, a different number of features was selected for each motor element, amounting to 5-20% of the original number of features.
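The LASSO stage can be sketched with scikit-learn: the L1 penalty drives the coefficients of uninformative features to exactly zero, and the surviving features are those with nonzero coefficients. The synthetic data, alpha value, and variable names below are illustrative assumptions, not the thesis's actual setup:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))            # clips x features surviving stages 1-2
w_true = np.zeros(50)
w_true[:3] = [2.0, -1.5, 1.0]            # only three features truly matter
y = X @ w_true + 0.1 * rng.normal(size=80)

lasso = Lasso(alpha=0.1).fit(X, y)       # L1 penalty zeroes out weak features
selected = np.flatnonzero(lasso.coef_)   # indices of the surviving features
```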
1.2.7 Multi-Task Learning
The Multi-Task Learning (MTL) framework [Car97] is an approach to multi-label learning in which each task is learned simultaneously with other related tasks, using shared representations, even when the tasks differ. In our study, MTL was achieved by simultaneously learning to recognize all 18 Laban motor elements. Unlike STL, which trains a separate model for every task (in our study, a separate detector for each Laban motor element) and may represent the data differently for each task, the goal in MTL was to improve the performance of the learning algorithms by learning classifiers for multiple tasks jointly. MTL works particularly well when all the tasks have some commonality and each is slightly undersampled. For the MTL we used Multitask Elastic Net (MEN) regularization, a multi-task extension of the elastic net of Zou and Hastie [ZH05]. MEN promotes sparsity and also acts as a feature selection mechanism. In MEN the optimization objective is to minimize the following expression, where Y represents the labels, X represents the samples, and W is the matrix that we want to learn:
‖Y −XW‖2F + λ1 · ‖W‖2,1 + λ2 · ‖W‖2F , (1.17)
where λ1 and λ2 are hyper-parameters,

‖W‖2,1 = Σi √(Σj w²ij),    (1.18)

i.e., the sum of the Euclidean norms of the rows of W (also known as the mixed norm), and

‖W‖²F = Σi Σj w²ij.    (1.19)
Feature selection for the MTL was carried out by averaging the statistical significance
of each feature with respect to all of the tasks. This was in contrast to the single task
learning, where every task had its own feature selection.
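scikit-learn's `MultiTaskElasticNet` minimizes an objective of the same form as Eq. 1.17 (a row-sparse (2,1)-norm term plus a squared Frobenius term), so the joint feature selection across tasks can be sketched as follows; the synthetic data, hyper-parameters, and variable names are ours:

```python
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 30))               # clips x features
W_true = np.zeros((30, 5))                   # 5 stand-in tasks (motor elements)
W_true[:4] = rng.normal(size=(4, 5))         # tasks share the same 4 relevant features
Y = X @ W_true + 0.1 * rng.normal(size=(120, 5))

# l1_ratio trades the row-sparse (2,1)-norm term against the Frobenius term
men = MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, Y)
# rows are zeroed jointly, so a feature is kept or discarded for all tasks at once
shared_features = np.flatnonzero(np.abs(men.coef_).sum(axis=0))
```

The (2,1)-norm penalty zeroes entire rows of W, which is exactly why MEN yields the same selected features for all of the tasks, as reported above.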
Figure 1.4: Recall, precision and F1 score of each Laban quality separately. The evaluation was conducted on a dataset captured from only one CMA.
1.3 Results
1.3.1 Single-Task Learning
In the STL we used 80% of the clips produced by each CMA as a training set, to teach the machine to recognize each motor element in the remaining 20% of the clips performed by the same person. Fig. 1.4 shows the precision, recall and F1 for each of the Laban qualities for the first CMA whose data was collected. As can be seen in Figure 1.4, the performance varied from one Laban motor element to another, ranging over roughly 40-85% for precision and recall. F1 values fell between the corresponding precision and recall values, and averaged 0.6 over all Laban elements.
1.3.2 Multi-Task Learning
Performance evaluation of each Laban quality
The MEN regularization was performed on all the clips of all 6 CMAs that participated
in the study. As a result, 282 features were selected (same features for all of the tasks).
The performance of every motor element as classified by the MEN regularization is
presented in Table 1.1.
Multi-task vs. Single-task learning
The generalization ability of the model was enhanced, compared to that of the STL, by the fact that the decision of which features to select was influenced by all the motor elements. The most significant improvements were in the Laban elements that had performed worst in the single task learning setting (Strong and Sudden, for example). As seen in
Table 1.1: Precision, recall, and F1 score of each Laban motor element, resulting from the MTL performed on all clips from all CMAs. The F1 average and standard deviation over the motor elements are shown in the last row of the table.
Motor Element         Precision  Recall  F1 score
Jump                  0.89       0.81    0.85
Twist and Back        0.69       0.85    0.76
Sink                  0.62       0.79    0.69
Rhythmicity           0.59       0.72    0.65
Spread                0.55       0.76    0.64
Head drop             0.60       0.66    0.63
Rotation              0.66       0.60    0.63
Free and Light        0.45       0.94    0.61
Up and Rise           0.67       0.54    0.60
Condense and Enclose  0.44       0.84    0.58
Arms To Upper Body    0.67       0.54    0.60
Advance               1.00       0.38    0.55
Retreat               0.50       0.59    0.54
Passive               0.40       0.85    0.54
Bind                  0.44       0.61    0.51
Direct                0.56       0.49    0.52
Sudden                0.61       0.41    0.49
Strong                0.29       0.42    0.34
Average               0.59       0.65    0.60
SD                    0.17       0.17    0.11
Table 1.2, the multitask learning improved the overall F1 score by 4% compared to the
STL.
Table 1.2: Multitask vs. single task learning performance evaluation on a data set of several CMAs.

Metric     Single task  Multitask
Precision  0.46         0.59
Recall     0.71         0.65
F1         0.56         0.60
Performance evaluation for movements of an unseen CMA
In this experiment the test set was taken from the clips of one CMA, while the training set was composed of the clips of the other 5 CMAs. Testing was performed for each CMA separately and the results were averaged over all six CMAs. Performance as measured by F1 degraded on the unseen CMA from 0.6 to 0.57. This degradation seems mild considering the large variation among clips from one CMA to another, as
Figure 1.5: Performance on ordinary people (non-CMAs) instructed to perform several tasks. [Bar chart of F1 scores (0-0.8) for the seven motor elements observed: Rotation, Condense & Enclose, Sink, Free & Light, Arms To Upper Body, Spread, and Jump.]
every CMA performed different gestures, in different postures (some sitting and some
standing) and in different contexts (some were dancing while some were acting).
Performance evaluation for everyday movements of non-CMAs
The data set of non-CMAs consisted of several daily movements that two people were asked to perform, such as pretending to greet a friend or to play with a balloon. This data set was small, and the people were instructed to perform movements that we hoped would include the Laban motor elements examined in this study, but we had no direct control over the exact movements they chose or over the motor elements those movements contained. A CMA who watched the movements determined which of the 18 Laban motor elements examined in this study appeared in each movement sequence. As shown in Fig. 1.5, which describes the learning performance for this data set, only 7 of the 18 motor elements examined in this study appeared in those people's movements. Performance as measured by average F1 degraded from 0.57 for an unseen CMA to 0.54 for a non-CMA.
Chapter 2
LDA Model Monitoring in
Distributed Systems
2.1 Introduction
2.1.1 Setup and Motivation
In this chapter of the thesis, we address the problem of mining data streams when the data is distributed over a large number of nodes, with the same data generation process at every node. The streaming data, however, is not stationary; it can change over time. Classic examples of real-life prediction problems that involve this kind of
change are user preference prediction and fraud detection. In the former, the choices
of the user can change over time; in the latter, the fraudulent transactions change
constantly to avoid detection. In both, the change can render the prediction model
invalid.
In a setting where the model must be updated to stay valid and communication is costly, the question is when to recompute the model. The naive solution is to recompute the model periodically. The problem with this solution is that it does needless work if the model changes infrequently, yet may introduce unacceptable errors between scheduled updates. In contrast to periodic recomputation, we focus on monitoring the quality of a given model, and recomputing it only as needed.
In this work we focus on linear binary classifiers, using LDA [Fis36] as the learning algorithm. This choice is due to the popularity of linear classifiers in real applications, and to the fact that they serve as a platform for more complex classifiers, such as the ensemble models of [HMR12, MGE11], the neural networks of [OHK15], and even the deep architectures of [GDDM14]. Our method is distinct from previous work in the following two aspects:
Model-Based Monitoring: Monitoring the model rather than the misclassifications has an important benefit: the need for synchronization can be detected before the misclassifications occur. In contrast to most previous work on monitoring a classifier (that
utilizes misclassification rates to draw conclusions about the change in the distribution
[BGdCAF+06, GMCR04, NY07]), we propose to monitor the change in the model itself.
Distributed Setting: Monitoring a classifier has been actively studied in centralized settings. In contrast to these studies, ours is one of the very few works that monitor in a distributed setting, in which data is distributed over a large number of nodes and the model is learned globally after synchronization. While the few existing methods for classifier monitoring in distributed settings rely on heuristics [AGZ+13], our approach is the only one that provides provable guarantees of correctness.
2.1.2 Our Contribution
We propose Distributed Linear Discriminant Analysis (DLDA): a novel communication-efficient monitoring algorithm for LDA models of distributed, dynamic data streams. We show how monitoring the LDA model can be used for concept drift detection. To the best of our knowledge, this is the first algorithm that monitors the model itself, rather than its predictions or fit. Given a previously computed global model, we derive constraints on the local data at each node; a node communicates only if its constraint is broken. These constraints guarantee that as long as no node communicates, the hypothetical global model is sufficiently close to the precomputed model. Evaluation on three real datasets shows that our method reduces communication by up to two orders of magnitude. We also present and demonstrate Probabilistic Distributed LDA Monitoring (PDLDA). This framework harnesses the distributed nature of the system, deciding to synchronize according to the number of violations over the entire set of nodes, rather than every time a single node has a violation. To the best of our knowledge, the LDA monitoring problem addressed in this work has never been studied in a distributed setting.
2.1.3 Related Work
Monitoring dynamic data streams is a broad topic that has been addressed in different
research communities. Within this field, we focus on detecting a change in the data
stream that renders the prediction model invalid. In distributed settings, this problem is even harder and is referred to as distributed monitoring: the challenge is to design local tests for monitoring a function that is defined globally over all the nodes in the system. Our approach is to define a constraint over the local data
(at each node) that guarantees the validity of the global model. If local data (in one
or more nodes) does not meet the local condition, it leads to synchronization. The
synchronization process has large communication costs, and the goal of the distributed
monitoring methods is thus to minimize the number of synchronizations. Most of the
work on distributed monitoring has been concerned with simple functions of the data,
such as linear functions in the work of [KCR06] and [KRRS08] or monotonic functions
in the work of [MTW05]. For non-linear functions, examples include work on monitoring
the value of a single-variable polynomial as in the work of [SR08], and eigenvalue
perturbation as in the work of [HNG+07]. While previous work handled specific families of functions, we use a geometric approach for monitoring arbitrary functions over distributed streams, as proposed and later extended and generalized in [SSK07, KSA+14, KSSL12]. A recent work by [GKS15] on monitoring Least Squares Regression (LSR) using geometric monitoring is the closest to ours, but our problem is more complex: unlike the global scatter matrix (required by LSR), the global covariance matrix (required by LDA) is not the mean of the local covariance matrices, which makes the monitoring problem much harder.
2.2 Problem Definition
We first describe the Linear Discriminant Analysis (LDA) algorithm and then define
the monitoring problem.
2.2.1 Linear Discriminant Analysis
LDA seeks a linear combination of features that characterizes or separates two or more classes of samples. The resulting combination may be used as a linear classifier, or for dimensionality reduction before subsequent classification.
In LDA the problem is approached by assuming that the conditional probability density functions Pr(x|y = p) and Pr(x|y = q) are both normally distributed, with mean and covariance parameters (p, B_p) and (q, B_q) for the two target classes P and Q, respectively. (x1, y1), . . . , (xn, yn) are i.i.d. samples, with xi ∈ R^d and yi ∈ {0, 1}.
We seek a linear transformation (model), w ∈ Rd, that maximizes the separation
between the classes, where the separation is defined to be the ratio of the variance
between the classes to the variance within the classes:
S := σ²_between / σ²_within = (w^T(p − q))² / (w^T(B_p + B_q)w).    (2.1)
Solving the maximization problem yields that the decision criterion is a threshold on
the dot product
w · x > c
where
w ∝ (B_p + B_q)^{-1}(p − q),    (2.2)

c = (1/2)(T − p^T S_p^{-1} p + q^T S_q^{-1} q).    (2.3)
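Eq. 2.2 can be evaluated in closed form from sample estimates. A small numpy sketch on synthetic Gaussians (the sample sizes and class means are illustrative assumptions); the recovered direction should align with the difference of the class means:

```python
import numpy as np

rng = np.random.default_rng(3)
p_samples = rng.multivariate_normal([1.0, 0.0], np.eye(2), size=500)   # class P
q_samples = rng.multivariate_normal([-1.0, 0.0], np.eye(2), size=500)  # class Q

p, q = p_samples.mean(axis=0), q_samples.mean(axis=0)
B = np.cov(p_samples.T) + np.cov(q_samples.T)   # B_p + B_q
w = np.linalg.solve(B, p - q)                   # w ∝ (B_p + B_q)^{-1}(p − q), Eq. 2.2
direction = w / np.linalg.norm(w)               # only the direction matters here
```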
In this work we monitor w, and will refer to it as the classification model.
2.2.2 Monitoring Problem
We denote by k the number of nodes and by W the number of samples at a node. Our model uses discrete time (hereafter, rounds); every node receives a new sample each round. We use the sliding window model: every node keeps two sliding windows (one for each class), each of length W/2. When a node receives a new observation, it replaces the oldest one of the same class. x^i_j and y^i_j are the j'th sample and label at the i'th node, and x^i_old(p) and x^i_old(q) are the oldest samples of each class in the sliding window of the i'th node. As the data evolves, it is possible that the previously computed model no longer matches the current true model. Let w0 be the existing model (the weight vector of a linear classifier), previously computed at some point in the past (the synchronization time), and let w be the true LDA model (the hypothetical model that a synchronization would yield if it occurred). We wish to maintain an accurate estimation w0 of the current global LDA model w. For classification purposes, the most important property of a linear classifier is its direction. Therefore, we monitor the change in this direction: given a threshold T, our goal is to raise an alert if
⟨w, w0⟩ / (‖w‖ ‖w0‖) < T,    (2.4)

i.e., if the angle between w0 and w is above a certain threshold (the inner product of unit vectors is the cosine of the angle between them).
Due to the complexity of condition 2.4, we monitor a restriction of it: we replace the cone containment condition with a sphere containment condition, i.e.,

‖w − w0‖ > R0,    (2.5)

where R0 := ‖w0‖ √(1 − T²) is the radius of the maximal-volume sphere that is centered at w0 and resides inside the cone of condition 2.4.
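A minimal numpy sketch of this restriction (the function and variable names are ours): staying inside the sphere of radius R0 implies staying inside the cone of condition 2.4, so the sphere test is a sufficient, though conservative, alert criterion:

```python
import numpy as np

def sphere_radius(w0, T):
    """R0 := ||w0|| * sqrt(1 - T^2): the largest sphere centred at w0 that
    fits inside the cone defined by the cosine threshold T of condition 2.4."""
    return np.linalg.norm(w0) * np.sqrt(1.0 - T ** 2)

def drift_alert(w, w0, T):
    """Restricted test of condition 2.5: alert when w leaves the sphere."""
    return np.linalg.norm(w - w0) > sphere_radius(w0, T)

w0 = np.array([1.0, 0.0])
T = np.cos(np.deg2rad(30))   # alert when the angle exceeds 30 degrees
```

Note that the sphere test may also fire for models that are still inside the cone; it errs only on the safe side.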
2.3 Monitoring Distributed LDA With Convex Subsets
Monitoring distributed LDA models is difficult because the global model cannot be
inferred from the local model at each node. Even when all current local models wi are
similar to the precomputed local models w0, the current global model w may be very
different from the precomputed model w0: consider the example in Figure 2.1 with
k = 2 nodes and dimension d = 2. The angle deviation of the global model (shown in
solid lines) is large (45 degrees) even though the local models (shown in dashed lines)
are identical to what they were at the initial point.
To overcome this difficulty, we impose constraints on the local data at the nodes, rather than on the function of the global aggregate. Given a function of the average of all local data and the threshold, we compute a “good” convex subset, called a safe zone, for each node.
Figure 2.1: Example of incorrect monitoring by applying LDA locally. The initial state of the data is presented in (A) and the state at a later point is shown in (B). In (B) every node (green and red dashed lines) calculates the same angle for the separator as in (A), but the global separator's (blue solid line) angle has changed significantly.
As we show below, convexity plays a key role in the correctness of this scheme. As long as the local data stays inside the safe zones, we guarantee that the function of the global average — the Euclidean distance between the true global model and the one computed at the last synchronization (hereafter, model drift) — does not cross the threshold. Nodes communicate only when local data leaves the safe zone, which we call a safe zone violation (hereafter, violation). Once that happens, violations can be resolved, for example by synchronization. In other words, we want to impose conditions on the local data at each node so that as long as they hold, ‖w − w0‖ < R0, i.e., the global model is valid.
2.3.1 Notation
We recall that P and Q are the classes in the binary classification problem. (p, q) and
(pi, qi) are the global and local means of classes P and Q.
S and S^i are the global and local normalized scatter matrices of the feature space:

S^i := (1/W) Σ_{j=1}^{W} x^i_j (x^i_j)^T,

S := (1/(Wk)) Σ_{i=1}^{k} Σ_{j=1}^{W} x^i_j (x^i_j)^T = (1/k) Σ_{i=1}^{k} S^i.
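The identity S = (1/k) Σ_i S^i — which is what makes the normalized scatter matrix amenable to distributed monitoring, in contrast to the covariance matrix — can be checked numerically; the random data below is an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)
k, W, d = 5, 40, 3                     # nodes, window size, dimension
X = rng.normal(size=(k, W, d))         # X[i, j] is sample x^i_j at node i

S_local = np.einsum('ijd,ije->ide', X, X) / W        # S^i = (1/W) sum_j x x^T
S_global = np.einsum('ijd,ije->de', X, X) / (W * k)  # S = (1/(Wk)) sum_{i,j} x x^T
```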
Similarly, u and u^i are the differences between the means of the classes, i.e., u := p − q and u^i := p^i − q^i.

B is the global covariance matrix, which is the sum of the covariance matrices of the two classes, i.e., B := B_p + B_q. It can be shown that B = S − pp^T − qq^T.
Let w be our current true model. Then, following Eq. 2.2, we can express:

w := (S − pp^T − qq^T)^{-1}(p − q) = B^{-1}u.    (2.6)

Let w0 be the existing model, previously computed from (S0, p0, q0), or equivalently from (B0, u0), at the time of synchronization. Then,

w0 := (S0 − p0 p0^T − q0 q0^T)^{-1}(p0 − q0) = B0^{-1} u0.    (2.7)
If S^i_0, p^i_0 and q^i_0 are the local normalized scatter and means of the samples at a node at the time of the last synchronization, we define the local drifts to be:

Δ^i_s := S^i − S^i_0,    δ^i_p := p^i − p^i_0,    δ^i_q := q^i − q^i_0.

We define Δ_s, δ_p, and δ_q — the global drift vectors of S, p, and q — to be:

Δ_s := S − S0,    δ_p := p − p0,    δ_q := q − q0.

Remark. It is easy to see that every global drift vector is the average of the local drift vectors:

Δ_s = (1/k) Σ_i Δ^i_s,    δ_p = (1/k) Σ_i δ^i_p,    δ_q = (1/k) Σ_i δ^i_q.
2.3.2 Convex Safe Zones
Each node monitors its own drift vector: as long as the current values at the local nodes (S^i, p^i, q^i) are sufficiently similar to their values at synchronization time (S^i_0, p^i_0, q^i_0), w0 is guaranteed to be close to w. Formally, we define a convex set C such that:

(Δ_s, δ_p, δ_q) ∈ C ⇒ ‖w − w0‖ < R0.    (2.8)

Lemma 2.3.1. Let C be a convex set that satisfies Eq. 2.8. If (Δ^i_s, δ^i_p, δ^i_q) ∈ C for all i, then

‖w − w0‖ < R0.
Proof. We express (S, p, q) as their values at synchronization plus the average of the local drift vectors:

(S, p, q) = (1/k) Σ_i (S^i, p^i, q^i) = (S0, p0, q0) + (1/k) Σ_i (Δ^i_s, δ^i_p, δ^i_q).    (2.9)

From C's convexity and using the Remark we get:

∀i (Δ^i_s, δ^i_p, δ^i_q) ∈ C ⇒ (1/k) Σ_i (Δ^i_s, δ^i_p, δ^i_q) ∈ C ⇒ (Δ_s, δ_p, δ_q) ∈ C.    (2.10)

Finally, from the definition of C we obtain:

(Δ_s, δ_p, δ_q) ∈ C ⇒ ‖w − w0‖ < R0.    (2.11)
2.3.3 Convex Bound for Local Condition

We denote the change in the global covariance matrix:

Δ := B − B0
  = (S0 + Δ_s − (p0 + δ_p)(p0 + δ_p)^T − (q0 + δ_q)(q0 + δ_q)^T) − (S0 − p0 p0^T − q0 q0^T)
  = −δ_p δ_p^T − δ_q δ_q^T + Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T.

We break Δ into its quadratic part,

M := −δ_p δ_p^T − δ_q δ_q^T,    M^i := −δ^i_p (δ^i_p)^T − δ^i_q (δ^i_q)^T,

and its linear part,

L := Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T,
L^i := Δ^i_s − p^i_0 (δ^i_p)^T − δ^i_p (p^i_0)^T − q^i_0 (δ^i_q)^T − δ^i_q (q^i_0)^T,

and hence

Δ = L + M,    Δ^i := L^i + M^i.

We denote the change of the difference between the means as

δ := u − u0 = δ_p − δ_q,    δ^i := δ^i_p − δ^i_q.
Now we can define a convex bound for our problem:

Lemma 2.3.2. Let G be the set of triplets (Δ^i_s, δ^i_p, δ^i_q) that satisfy the bound:

‖B0^{-1} δ^i‖ + (‖w0‖ + R0)(‖B0^{-1} L^i‖ + ‖B0^{-1} M^i‖) ≤ R0,    (2.12)

where ‖A‖ is the operator norm of the matrix A, and ‖v‖ is the Euclidean norm of the vector v. If ‖B0^{-1} Δ^i‖ < 1, then G ⊆ C and G is convex.
2.3.4 Proof of the Convex Bound Lemma

We must find a convex subset C satisfying the condition of Eq. 2.8. Let us start by recalling the definition of the operator norm of a matrix:

Definition 2.3.3. Let A be a matrix. Its operator norm, or spectral norm (hereafter just norm), is defined as:

‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖.    (2.13)

The following result is very useful in the forthcoming analysis:

Lemma 2.3.4. If A is square and ‖A‖ < 1, then

‖(I + A)^{-1}‖ < 1 / (1 − ‖A‖).
The proof of this lemma can be found in [GKS15]. We recall that C is the convex subset that satisfies inequality 2.8, and that G is the set of triplets (Δ^i_s, δ^i_p, δ^i_q) which satisfy inequality 2.12.
Lemma 2.3.5. G ⊆ C.

Proof. We can write the sphere inclusion condition 2.5 in terms of B0, Δ, u0 and δ, using the triangle inequality:

‖w − w0‖ = ‖(B0 + Δ)^{-1}(u0 + δ) − B0^{-1} u0‖ < ‖(B0 + Δ)^{-1} δ‖ + ‖((B0 + Δ)^{-1} − B0^{-1}) u0‖.    (2.14)

We split the right side of the last inequality into two parts:

E1 := ‖(B0 + Δ)^{-1} δ‖,    E2 := ‖((B0 + Δ)^{-1} − B0^{-1}) u0‖.    (2.15)

Under the assumption ‖B0^{-1} Δ‖ < 1, it follows from Lemma 2.3.4 that:

E1 ≤ ‖B0^{-1} δ‖ / (1 − ‖B0^{-1} Δ‖),    E2 ≤ ‖B0^{-1} Δ w0‖ / (1 − ‖B0^{-1} Δ‖).    (2.16)

From standard properties of the norm we get:

‖B0^{-1} Δ w0‖ ≤ ‖B0^{-1} Δ‖ ‖w0‖.    (2.17)

Substituting Eqs. 2.15, 2.16 and 2.17 into Eq. 2.14, we require:

‖w − w0‖ ≤ E1 + E2 ≤ (‖B0^{-1} δ‖ + ‖B0^{-1} Δ‖ ‖w0‖) / (1 − ‖B0^{-1} Δ‖) ≤ R0.    (2.18)

After rearranging the terms, we have

‖B0^{-1} δ‖ + ‖B0^{-1} Δ‖ ‖w0‖ ≤ R0 (1 − ‖B0^{-1} Δ‖).    (2.19)

From the triangle inequality we can rewrite:

‖B0^{-1} Δ‖ ≤ ‖B0^{-1} L‖ + ‖B0^{-1} M‖.    (2.20)

Finally, combining inequalities 2.19 and 2.20, we obtain the following sufficient bound:

‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) ≤ R0.
Lemma 2.3.6. ‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) is convex in (Δ_s, δ_p, δ_q).

Proof. Multiplication by B0^{-1} is a linear operation, and a norm is a convex function; therefore ‖B0^{-1} δ‖ is convex in δ.

We recall that:

L := Δ_s − p0 δ_p^T − δ_p p0^T − q0 δ_q^T − δ_q q0^T.

L is linear in (Δ_s, δ_p, δ_q), and therefore ‖B0^{-1} L‖ is convex in these variables.

We recall that:

M := −δ_p δ_p^T − δ_q δ_q^T.

It is left to prove that ‖B0^{-1} M‖ is convex in (δ_p, δ_q). From the definition of the operator norm, we can rewrite:

‖B0^{-1} M‖ = ‖B0^{-1} (max_{‖u‖=1} {u^T δ_p δ_p^T u} + max_{‖u‖=1} {u^T δ_q δ_q^T u})‖ = ‖B0^{-1} (max_{‖u‖=1} {‖u^T δ_p‖²} + max_{‖u‖=1} {‖u^T δ_q‖²})‖.

We observe that the maximum over any number (here infinite) of convex functions is also a convex function, and since multiplication by a matrix and the norm operation preserve convexity, this concludes the proof.
Corollary 2.1. Lemmas 2.3.5 and 2.3.6 together complete the proof of Lemma 2.3.2. From Lemma 2.3.2 and from Lemma 2.3.1 we conclude that

(‖B0^{-1} δ‖ + (‖w0‖ + R0)(‖B0^{-1} L‖ + ‖B0^{-1} M‖) ≤ R0) ⇒ (‖w − w0‖ ≤ R0),    (2.21)

which validates the convex bound.
2.4 Distributed LDA Monitoring Algorithm
In the following, we present two frameworks for LDA model monitoring that use the bound in Eq. 2.12. In both frameworks, we define a coordinator, whose role is to monitor the violation alerts from the nodes and to aggregate the data from all the nodes when a violation occurs. After aggregating the data, the coordinator recomputes the model and sends the new covariance matrix and the norm of the new model to the nodes. In both frameworks every node runs the same update algorithm, detailed in Alg. 2.1. The frameworks differ in their synchronization policy. The first, Distributed LDA Monitoring (DLDA), synchronizes in any round in which at least one node has reported a violation (condition 2.12 does not hold at that node), as detailed in Alg. 2.2. The second, Probabilistic Distributed LDA Monitoring (PDLDA), synchronizes in a round in which the number of nodes with a violation is above a certain threshold. The derivation of this threshold is presented in Section 2.4.1.
Algorithm 2.1 Node Update: i is the index of the node, (x, y) is a new sample.

1: procedure Update
2:   if y is class P then
3:     p^i = p^i + x − x^i_old(p)
4:     S^i = S^i + xx^T − x^i_old(p)(x^i_old(p))^T
5:   else
6:     q^i = q^i + x − x^i_old(q)
7:     S^i = S^i + xx^T − x^i_old(q)(x^i_old(q))^T
8:   (Δ^i_s, δ^i_p, δ^i_q) = (S^i − S^i_0, p^i − p^i_0, q^i − q^i_0)
9:   if ‖B0^{-1} δ^i‖ + (‖w0‖ + R0)(‖B0^{-1} L^i‖ + ‖B0^{-1} M^i‖) > R0 then
10:    Report violation to coordinator
11:    Receive new global B0^{-1}, ‖w0‖
12:    (S^i_0, p^i_0, q^i_0) = (S^i, p^i, q^i)
Algorithm 2.2 Coordinator synchronization algorithm.

1: procedure Sync
2:   if one of the nodes has reported a violation then
3:     Ask the nodes for their data
4:     Receive from every node i the triplet (S^i, p^i, q^i)
5:     Compute updated ‖w0‖ and B0^{-1} and distribute them
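Steps 2-8 of Alg. 2.1 amount to O(d²) sliding-window bookkeeping per round. A Python sketch (class and method names are ours; unlike the printed pseudocode, this sketch keeps normalized means and scatter, and the violation test of step 9 is left out — it would consume the returned drift triplet):

```python
import numpy as np

class NodeState:
    """Per-node sliding-window bookkeeping in the spirit of Alg. 2.1, steps 2-8.
    Keeps one window per class and maintains normalized class means and the
    normalized scatter matrix incrementally, in O(d^2) work per round."""

    def __init__(self, win_p, win_q):
        self.win = {'p': [np.asarray(x, float) for x in win_p],
                    'q': [np.asarray(x, float) for x in win_q]}
        self.mean = {c: np.mean(self.win[c], axis=0) for c in 'pq'}
        self.n = len(self.win['p']) + len(self.win['q'])
        self.S = sum(np.outer(x, x) for c in 'pq' for x in self.win[c]) / self.n
        # statistics frozen at the last synchronization: (S0, p0, q0)
        self.S0 = self.S.copy()
        self.mean0 = {c: self.mean[c].copy() for c in 'pq'}

    def update(self, x, c):
        """Insert sample x of class c ('p' or 'q'); the oldest sample of that
        class leaves the window. Returns the drift triplet (dS, dp, dq)."""
        x = np.asarray(x, float)
        x_old = self.win[c].pop(0)
        self.win[c].append(x)
        self.mean[c] += (x - x_old) / len(self.win[c])
        self.S += (np.outer(x, x) - np.outer(x_old, x_old)) / self.n
        return (self.S - self.S0,
                self.mean['p'] - self.mean0['p'],
                self.mean['q'] - self.mean0['q'])
```

Because each round replaces exactly one sample, the incremental statistics remain equal to a full recomputation over the current windows.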
2.4.1 Probabilistic Distributed LDA Monitoring
DLDA triggers synchronization whenever a single node reports a violation. Our empirical evaluation with a large number of nodes showed that such a strict policy causes synchronization even when the global model is still valid. Loosely speaking, this is because the condition in Equation 2.12 is stricter than the original condition in Equation 2.5. Formally, in most of the datasets G (the convex subset of C) is a proper subset of C, and usually a much smaller one. To resolve this problem, we change the synchronization policy of the system and synchronize only when a certain portion of the nodes report a violation. This portion is learned empirically on the training set of the system and is denoted VT.
2.4.2 Analysis of the probabilistic version, PDLDA
2.5 Evaluation
We evaluated the performance of the proposed monitoring algorithms, DLDA and PDLDA, on synthetic and real data. For each dataset we simulated a distributed data stream by partitioning the data between the nodes and streaming one sample per round.
2.5.1 Synthetic Data Experiments
We use synthetic data, in which all model assumptions hold, to exemplify the communication efficiency of our method (Section 2.5.1) and its ability to detect that the model is no longer valid before misclassifications occur (Section 2.5.1). We then (Section 2.5.1) analyze the communication efficiency of our method as a function of the algorithm parameters.
Communication Efficiency
We compare DLDA to the T -periodic algorithm, denoted PER(T ), a sampling algorithm
that sends updates every T rounds. Our main performance metric is communication,
measured in normalized messages (the average number of messages sent per round by
each node). PER can achieve arbitrarily low communication at the cost of larger model
drift. However, periodic synchronization can miss the point of change in the data; hence
PER cannot guarantee that the model drift stays below a fixed threshold, in contrast to DLDA. Further, DLDA has additional intrinsic advantages over PER:
1. DLDA can be instantly calibrated to fit a given drift threshold, while for PER
the interval between synchronizations can only be determined empirically.
2. The rate at which the data evolves might change. While DLDA adapts to the new rate of change, PER is stuck with a fixed period that becomes suboptimal.
3. For a sudden change in the data, DLDA adapts immediately — the algorithm’s
latency is 0 — while for PER the latency might be up to the period length.
In this experiment we used a simple data generation process. There are 10 nodes,
each of which contains two data classes: P , a Gaussian centered at the origin and
with unit covariance matrix; and Q, a Gaussian also with unit covariance matrix,
but whose mean changes every 1,500 rounds, starting at (1, 0), and then changing to (0, −1), (−1, 0), (0, 1) (see Fig. 2.2).
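The generation process can be sketched as follows (the generator name, seed handling, and single-stream simplification are ours; the thesis distributes these samples over 10 nodes, which is omitted here):

```python
import numpy as np

def synthetic_stream(rounds=6000, seed=0):
    """Yield one (P-sample, Q-sample) pair per round: P is fixed at the origin,
    while Q's mean steps through (1,0), (0,-1), (-1,0), (0,1) every 1,500
    rounds; both classes have unit covariance."""
    rng = np.random.default_rng(seed)
    q_means = [(1.0, 0.0), (0.0, -1.0), (-1.0, 0.0), (0.0, 1.0)]
    for t in range(rounds):
        yield rng.normal((0.0, 0.0), 1.0), rng.normal(q_means[t // 1500], 1.0)

samples = list(synthetic_stream())
```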
Figure 2.3 shows the behavior of the DLDA monitoring algorithm over the synthetic
dataset, with three points in time at which the data abruptly changes. DLDA achieves
a communication overhead of 0.01 messages per node per round, with the model error
guaranteed to always be below the given threshold. Conversely, the equivalent PER(100)
Figure 2.2: Illustration of the generation process of the synthetic data. The class P (denoted in blue) is fixed, while Q changes three times, every 1,500 rounds (the changes are depicted by the dark arrows).
algorithm doesn’t maintain the model error below the threshold (red dashed line).
Figure 2.3 shows that the periodic algorithm does not always synchronize when the
model drift exceeds a given threshold. Moreover, it triggers redundant synchronizations
when there is no change in the data.
Early Drift Detection
To further expound on the advantage of the proposed DLDA algorithm, we consider
a toy example (Fig. 2.4), in which 2D data arrives from two classes (P ’s samples are
shown as plus signs and Q’s samples as minus signs). The means of the classes change
according to the depicted grey arrows, from time t1 to tL. The dark line at an angle of
−45◦ represents the optimal projection direction at time t1. As the classes change, this
initial projection direction remains “correct”, in the sense that it still separates the two
classes; alas, at time tL, the two classes have switched their positions relative to the
projection's direction, and the classifier fails. Hence, a monitoring algorithm that only checks for misclassifications at the nodes will fail to detect the drift in the classes until it is too late, i.e., until the classifier fails, while DLDA alerts earlier, when the real (global) classifier has changed by more than the provided threshold (in this case 0.52 radians, or 30°); this point is marked by an arrow in Fig. 2.4.
Parameter Analysis
Next, we analyze the parameters of the DLDA algorithm.
Model Drift Threshold: The model drift threshold is given by the user; above it, the model drift is considered too large. It can be quantified in two ways: as the maximal angle between w and w0, or as the Euclidean distance between them. Figure 2.5 shows the communication
Figure 2.3: DLDA error (blue) vs. PER(100) error (red), for the synthetic data described above. The horizontal axis represents rounds; the vertical axis represents the norm of the difference between the real (global) model and the current model held at the nodes. Window size is 1,000. The maximum allowed error (which DLDA guarantees will never be surpassed) is T = 0.997 (which corresponds to a difference of 0.077 radians, or 4.4 degrees, in the classifier's direction). Both algorithms transmit the same overall number of bytes, but at different rounds; while PER sends alerts periodically, DLDA alerts only when the classifier may have changed. For this reason, PER yields a larger error when the two classes (and the classifier) change.
requirements of the DLDA algorithm as a function of the model drift threshold, and
the minimal communication required to match DLDA using PER. It can be seen that for both fixed and dynamic data, DLDA outperforms PER at any given model drift threshold.
Node Scalability: Node scalability describes how DLDA performs as the number of nodes varies. Figure 2.6 shows the communication volume as a function of the number of nodes k. We observe that communication increases slowly, reaching 0.25% on the fixed data and 0.6% on the dynamic data when distributed across 25 nodes.
Window Size: Figure 2.7 shows how communication decreases as a result of enlarging
the window size W . One can increase the window size to compensate for other factors in
the system that increase the communication. One of those is noise (which is quantified
in our context by the standard deviation of the data generating distribution).
Another parameter directly related to the window size is the dimension of the
data. The number of samples required for accurate estimation of the covariance matrix
grows with the dimension. In our setting, the number of training samples is linked to
the window size. When the window size is fixed, communication grows linearly with the
dimension (see Figure 2.8).
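To illustrate why the required window grows with the dimension, here is a toy sketch (a hypothetical synthetic setup, not the thesis experiment): with the window size W fixed, the LDA direction estimated from windowed samples strays further from the true direction as the dimension d increases, because the covariance estimate degrades.

```python
import numpy as np

rng = np.random.default_rng(0)

def lda_direction(X0, X1):
    """Fisher LDA direction w = Sigma^{-1} (mu1 - mu0) from two labeled windows."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, with a small ridge for invertibility.
    S = np.cov(np.vstack([X0 - mu0, X1 - mu1]).T) + 1e-6 * np.eye(X0.shape[1])
    return np.linalg.solve(S, mu1 - mu0)

def direction_error(d, W):
    """Angle between the estimated direction (window W per class) and the true one."""
    X0 = rng.normal(0.0, 1.0, size=(W, d))
    X1 = rng.normal(0.0, 1.0, size=(W, d))
    X1[:, 0] += 2.0                      # the classes differ only along axis 0
    w = lda_direction(X0, X1)
    true = np.zeros(d)
    true[0] = 1.0
    cos = abs(w @ true) / np.linalg.norm(w)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# With the window fixed, the estimate degrades as the dimension grows:
e5, e100 = direction_error(d=5, W=200), direction_error(d=100, W=200)
print(e5, e100)
```

The same effect is why, in the experiments, either the window must grow with the dimension or more synchronizations (and hence more communication) are needed.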
Figure 2.4: A toy example demonstrating early detection of a change in the data.
Figure 2.5: Communication as a function of model drift for DLDA and PER. The periodic algorithm is tuned to achieve the same maximal model drift as DLDA for each model drift threshold.
2.5.2 Real Data Experiments
In this section we test the algorithm on three real datasets. The first (USENET) is
too small to test the probabilistic approach; thus we use this set only for the DLDA
test. The second (Power Consumption Monitoring) is a medium-size dataset (it is
distributed over 36 nodes), and we test both DLDA and PDLDA on it. The third (Gas
Sensor Time Series Monitoring) is a big set (it is distributed over 100 nodes). The DLDA
synchronization policy is too strict for a large number of nodes; hence we use this set
only for the PDLDA test.
Message Preference Monitoring — Usenet
The USENET dataset (Figure 2.9) is a text dataset that simulates a stream of messages from
three newsgroups (medicine, space, baseball); the messages are presented sequentially
to a user, who then labels them as interesting or junk according to personal interest.
Figure 2.6: Communication as a function of the number of nodes for fixed (blue) and changing (green dashed line) datasets.
Figure 2.7: Communication as a function of window size.
Figure 2.8: Communication as a function of input dimension for fixed (blue) and changing (green dashed line) datasets.
Figure 2.9: Comparison between the maximal (over nodes) DLDA model drift (blue) and the true global model drift (green dashed line) for k = 2, W = 450. It can be seen that DLDA responds to the change in the data that occurs after 600 rounds (red dotted vertical line) and causes a synchronization in round 698 (blue dashed vertical line).
Attribute values are binary, indicating the presence or absence of the 128 informative
words. The change in the data arises from a change in the user's preference (from space
to baseball). Figure 2.9 shows the results of the DLDA algorithm with W = 450. The
first 450 rounds over the data correspond to the initialization phase and are omitted.
During the next 50 rounds the DLDA model drift (the value is calculated using the left
side of the inequality in Eq. 2.12) increases due to noise in the data; there is no change
in the user's preferences. From round 500 to 600 the DLDA model drift is stable,
again due only to noise. In round 600 there is a concept drift. From this point
both the DLDA model drift and the true model drift increase until the synchronization
in round 698.
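The round-by-round behavior described here can be caricatured by a single-node simulation (hypothetical synthetic data, not the USENET set): a sliding window of labeled samples, a direction recomputed each round, and a synchronization whenever the drift since the last synchronization exceeds the threshold.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)

def direction(window):
    """LDA-style direction Sigma^{-1} (mu1 - mu0) from a window of (x, label) pairs."""
    X = np.array([x for x, _ in window])
    y = np.array([label for _, label in window])
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = X - np.where(y[:, None] == 0, mu0, mu1)   # center each sample at its class mean
    S = (Xc.T @ Xc) / len(Xc) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    return w / np.linalg.norm(w)

def sample(t):
    """Alternating class labels; the class means rotate abruptly at round 600."""
    label = t % 2
    mean = np.array([2.0, 0.0]) if t < 600 else np.array([0.0, 2.0])
    return rng.normal((2 * label - 1) * mean, 1.0), label

window = deque(maxlen=450)
syncs, w_ref = [], None
for t in range(1000):
    window.append(sample(t))
    if t < 450:
        continue                                   # initialization phase, as above
    w = direction(window)
    if w_ref is None:
        w_ref = w
    drift = np.arccos(np.clip(abs(w @ w_ref), -1.0, 1.0))
    if drift > np.deg2rad(30):
        syncs.append(t)                            # synchronize: adopt the new model
        w_ref = w

print(syncs)  # synchronization rounds; the first falls some rounds after the drift at 600
```

Before the drift, only estimation noise moves the direction, so no synchronization fires; after it, the window gradually fills with post-drift samples until the threshold is crossed.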
Power Consumption Monitoring
The Power Consumption dataset contains the hourly power supply of an Italian electric
company as recorded from two sources: power supplied by the main grid and power
transformed from other grids. This stream contains three-year power supply records
from 1995 to 1998, and our learning task is to predict which hour (1 out of 24 hours) the
current power supply belongs to. The change in the data in this stream is mainly caused
by factors such as season, weather, time of day, and the differences between working
days and weekends. We demonstrate the algorithms on the following binary classification
problem: given a power supply measurement, decide whether it corresponds to night or
day. This dataset is an example of gradual change in the data (seasons do not change
abruptly). Figure 2.10 depicts the results of the DLDA and PDLDA algorithms. For a
small number of nodes, k = 4, and for large window size, W = 5000, DLDA requires
only 0.003 normalized messages. For a more distributed system, k = 36, and a smaller
window size, W = 600, DLDA requires 0.09 normalized messages. For PDLDA with
k = 36 and W = 600 and a violation threshold (VT) of 50%, PDLDA requires 0.02
normalized messages, much better than DLDA in the same setting.
Gas Sensor Time Series Monitoring
Data in this experiment consists of measurements collected by an array of 16 chemical
sensors in a lab, recording at a sampling rate of 100 Hz for 24 hours, resulting in 8,378,504
data points for each sensor. During the first 12 hours the task is to detect the presence
of carbon monoxide (CO) in a mixture of chemicals, and from the 13th hour the task
is to detect the presence of methane, which corresponds to an abrupt change in the
data. Figure 2.11 demonstrates the results of the PDLDA algorithm. First, we can observe
that the fraction of violated nodes (shown in blue) correlates with the true model drift
(shown in green). Second, we can see two patterns of behavior, which are separated by
an abrupt switch in the data (marked by the vertical red line). Before the switch,
synchronization occurs every 150 rounds; after the switch, it goes down to every
50 rounds. There is a transition period of about 1000 rounds that follows the point of
the data switch. In this interval, the sliding window mixes the old (before the switch) data
and the new (after the switch) data, but once the window aggregates enough data, the
algorithm stabilizes and reduces the communication requirements. This experiment
shows that the PDLDA algorithm detects the abrupt change in the data and adapts to
the new conditions after a short period of time.
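The violation-fraction rule described above can be sketched as follows (a simulated single round with illustrative numbers; `pdlda_round` is a hypothetical helper, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def pdlda_round(local_drifts, local_threshold, vt):
    """One monitoring round: each node reports only a violation bit; the
    coordinator synchronizes when the violated fraction exceeds vt."""
    violated = [bool(d > local_threshold) for d in local_drifts]
    frac = sum(violated) / len(violated)
    return frac, frac > vt

# 100 nodes: under stable data only noise-level local drifts occur; after the
# abrupt switch, almost every local condition is violated and a sync fires.
before = rng.normal(0.2, 0.05, size=100)   # local drift values, stable stream
after = rng.normal(0.9, 0.05, size=100)    # local drift values, after the switch
print(pdlda_round(before, 0.5, vt=0.8))    # (0.0, False): no synchronization
print(pdlda_round(after, 0.5, vt=0.8))     # (1.0, True): synchronization
```

Because each node sends only a single bit per round, this relaxation scales to many more nodes than the strict DLDA policy.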
(a) DLDA behavior on Power Consumption data with k=4, W=5000, VT=0
(b) DLDA behavior on Power Consumption data with k=36 nodes, W=600, VT=0
(c) PDLDA behavior on Power Consumption data with k=36 nodes, W=600, VT=18
Figure 2.10: The top and center figures show the DLDA algorithm on the Power Supply dataset for a small (top) and large (center) number of nodes. The blue line represents the value of the local bound expression, corresponding to the node with the maximum value. The green dashed line shows the model drift (normalized by the threshold); the model is computed after the data is aggregated from all nodes. The bottom plot shows the results of PDLDA on the same dataset. The blue line in the bottom plot represents the fraction of violated nodes.
Figure 2.11: Demonstration of PDLDA on the Gas Sensor dataset: a comparison between the true model drift (green) and the fraction of the nodes that are violated in the current round (blue). The experiment is configured for k=100 nodes, and the violation threshold is VT=80.
Chapter 3
Conclusion
In the first chapter we captured the essence of, and developed automatic recognition for,
18 LMA motor elements with a relatively high success rate, using an inexpensive
and widely available sensor. We hope that our work will provide the foundation and
inspiration for developing an in-home, inexpensive, LMA-based feedback system that
will be used for multiple purposes, such as therapy, the arts, video games, communication,
and human-robot interaction.
In the second chapter we introduced the first communication-efficient monitoring
algorithm for a linear classifier model that monitors the model itself, but does not
require knowledge of the global model at the local nodes. As long as all nodes meet
their local condition, the global model is guaranteed to be valid. Our algorithm has
important benefits:
• Our method works with distributed data in a communication-efficient way.
• Monitoring the model, as opposed to monitoring the misclassifications, allows for
early detection of the change even before misclassification occurs.
We evaluated the theoretical scheme, DLDA, and its probabilistic version, PDLDA,
on three real datasets. For a small number of nodes we used DLDA with its theoretical
guarantee, and for a greater number of nodes we used PDLDA. We showed that
the proposed scheme outperforms PER: it maintains a smaller Euclidean distance
between the last computed model and the current true model with a lower volume
of communication. This work is a first step in designing communication-efficient
algorithms with theoretical guarantees for monitoring classification models over dynamic
distributed data streams. One future direction is to extend the proposed
framework to ensembles of linear classifiers and to neural networks, including deep learning
networks.
Abstract

Laban Movement Analysis (Laban Movement Studies, or Labanotation) is a method and a language for describing, interpreting, and visualizing human movement. The method was developed on the basis of Rudolf Laban's theories about the essence of movement. It is one of the most widely used movement-analysis methods in the world, employed by dancers, actors, athletes, dance teachers, special-education teachers, therapists, and others. The analysis is performed independently of any particular movement style or method (such as yoga, ballet, modern dance, martial arts, and so on), and it can therefore be used to examine any of the diverse movement domains. Laban's premise is that movement produces and expresses a person's mental and emotional side. In his view, the relationship between body and mind is bidirectional: movement influences the body, mind, and spirit, and at the same time it also expresses them. This assumption is what turned Laban Movement Analysis into a body of knowledge relevant to many domains of the mind, the body, and the arts.

Analyzing movement with this method is preferable to a kinematic description, since it describes qualitative aspects of the movement in addition to quantitative characteristics. Owing to this advantage, the method is popular in many applications; it is the preferred method in research on motor movement and in theater studies, and it generates interest in the worlds of computer games and robotics. In this research we developed an automatic system that recognizes 18 different Laban qualities from the Kinect depth camera, using a learning system, with an accuracy of 59%. The Laban recognition algorithm was tested on several datasets of recordings of the qualities, which we created, in several different configurations:

1. Learning from a particular actor and evaluating performance on samples from the same actor (samples, of course, that were not in the set on which the model was trained).
2. Learning from one actor and evaluating performance on another actor.
3. Learning from professional actors and evaluating performance on people with no background in Laban.

The second chapter of the thesis deals with detecting a change in the distribution of data, when the data is distributed among different sources and the change in distribution may harm the classification accuracy of a model trained on old data. In the distributed setting, training a model requires concentrating all the data from the different sources in one place, which is expensive in terms of communication. In order to minimize communication, we propose a monitoring algorithm that is executed at each of the sources locally (without communication between the sources), while guaranteeing the preservation of the accuracy of the global model (the model that would have been computed had the data from all the sources been centralized). The classification algorithm we chose to monitor is Linear Discriminant Analysis, a popular algorithm for classification and dimensionality reduction in many applications. This choice was made because of the strong theoretical guarantee on the correctness of the monitoring that we proved for this classifier. We demonstrated how the algorithm and its probabilistic version reduce communication by two orders of magnitude (compared to synchronizing every time a new sample arrives) on three datasets from different domains. Moreover, unlike other algorithms, our algorithm monitors the model itself rather than its errors, a fact that allows us to detect the change in the data even before an incorrect classification occurs.

In this work we innovate by addressing the following concepts:

• Monitoring the model rather than the errors: by monitoring the model, the old model can be ruled out even before errors appear.
• Monitoring a model in a distributed environment: while model monitoring in a non-distributed setting has been studied in depth, there are few works on model monitoring in a distributed environment. In this environment the data is distributed over a large number of sources, while the model is learned globally by centralizing the data from all the sources. Whereas the existing monitoring methods rely on heuristics, our method comes with a theoretical guarantee.

The algorithm was tested on three real-world datasets:

1. A dataset of news messages. The task was to classify whether or not the user would be interested in an incoming news item, under the definition that the user is interested only in certain topics. The topics of interest changed over time, which made the model inaccurate. We succeeded in monitoring this change.
2. A dataset of electricity readings from a power station over three years. The task was to classify whether a reading was taken during the day or at night. The change in the data distribution is due to the seasons of the year. We succeeded in monitoring these changes as well.
3. A dataset of readings from gas sensors over 24 hours. The goal was to classify whether or not a certain gas is present in the air. The change in the data distribution stems from a change in the composition of the gases after 12 hours. We succeeded in monitoring this change as well.

The research was carried out under the supervision of Prof. Assaf Schuster of the Faculty of Computer Science.

I thank the Technion for its financial support of my research.

Laban Movement Analysis and Linear Classifier Monitoring in a Distributed System

Research thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

Ran Bernstein

Submitted to the Senate of the Technion — Israel Institute of Technology
Kislev 5776, Haifa, December 2016