
Page 1: Kernel-Based Object Tracking for Cerebral Palsy Detectionworldcomp-proceedings.com/proc/p2012/IPC2953.pdf · 2014. 1. 19. · Kernel-Based Object Tracking for Cerebral Palsy Detection

Kernel-Based Object Tracking for Cerebral Palsy Detection

Hodjat Rahmati1, Ole Morten Aamo1, Øyvind Stavdahl1, and Lars Adde2

1 Department of Engineering Cybernetics, NTNU, Trondheim, Norway

2 Department of Laboratory Medicine, Children and Woman’s Health, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway

Abstract. Cerebral palsy is a chronic condition affecting body movements, posture and muscle coordination. It is caused by damage to one or more specific areas of the brain, usually occurring during fetal development or infancy. The General Movement Assessment procedure consists of observation and clinical classification of movement patterns, and the absence of normal movement qualities between 2 and 4 months post-term age has been shown to be a strong predictor of later cerebral palsy. In this paper we present a method for estimating motion trajectories of infant limbs and head in order to assess general movements. We extract the motion information from video recordings of infants. To extract the motion data, a combined Bayesian filtering and kernel-based tracker is used, with appropriate modifications applied to previous methods. Tracking experiments show high performance of the proposed method.

Keywords: Cerebral palsy detection, video-based object tracking, kernel-based tracking, mean shift algorithm, histogram-based target representation

1. Introduction

Some diseases are characterized by qualitative changes of body movements, hence movement data can be analyzed in order to distinguish between subgroups of subjects, e.g. healthy versus pathologic. Clinical gait analysis is an example of such a technique that is used routinely in many centres. In human neonates and infants there are certain types of spontaneous motor activity in the limbs, head and trunk, referred to in the literature as General Movements (GM), the quality of which is known to be highly correlated with the probability of developing cerebral palsy (CP) [3]-[14]. The General Movement Assessment (GMA) procedure consists of observation and judgment of movement patterns, and the absence of normal GM has been shown to be a strong predictor of later CP development [3], [13], [14]. Thus, motion data captured during the first weeks and months of life lend themselves to computer-based assessment and classification. The potential benefits of a semi-automated computer-based system, as opposed to manual methods like GMA, include a reduction of the (admittedly low) bias caused by subjective aspects of the observers, and hence improved reliability. Even more importantly, a human movement assessor’s skills need to be maintained by frequently performing actual assessments. Due to the relatively low prevalence of CP and similar disorders, only at major hospitals will an assessor have access to a sufficient number of patients to maintain his or her predictive skills. A computer-based system thus has the potential to make this kind of movement assessment more widely accessible by providing a tool that can be used by non-GMA experts. Operational use of such a tool may typically involve recording the movements of an infant at risk on one or more occasions during a prescribed age window, subsequent processing of the data, and the presentation of a classification result that indicates whether the recorded movements suggest normal or abnormal motor development. The clinician may then issue a predictive diagnosis based on this result and any other relevant information available about the infant.

Numerous groups have used quantitative analysis for the discrimination of normal versus pathological motion, including the identification of normal and spastic upper extremity movements [2] and [1]. Quantitative evaluation of certain temporal and coordination-related properties of kicking movements at the ages of 1 and 3 months post term has been found not to be predictive of the neurological outcome of infants, although GMA based on the same data material exhibited significant predictive values [16]. Reference [15] explored the structure and the irregularity in the spontaneous behavior of young infants, while [7] presented a quantitative technique for measuring GM using micro-electromechanical accelerometers.

More recently, [10] presented a methodology for classification of infant motion as “normal” or “affected” (i.e. abnormal) based on optical motion tracking. In their study, 22 infants took part, 15 healthy and seven at high risk. 3D motion analysis was performed using a Vicon 370 motion analysis system. It is a passive detection system, allowing the contact-free capture of an arbitrary number of reflecting markers with a temporal resolution of 50 Hz and a high spatial precision. The movement of the individual markers is synchronously recorded by seven infrared cameras mounted on tripods. Their results demonstrate an accuracy of 73%.

In this paper we aim to provide the motion data needed to extract suitable features for use as classifier input for detecting CP. As [10] shows, if the motion data of the infant’s head, hands and feet are available, it is possible to detect the disease with high accuracy. Previous work on CP detection used different instruments such as sensors, markers or infrared cameras to extract information about the infant’s motion.


These instruments affect the motion pattern and may result in a false detection. So, in this paper we try to obtain motion data using only video recordings of the infant, which requires no instrumentation of the infant’s body and hence has a minimal influence on the movements.

Video-based object tracking is a well-known topic in image processing. Among video-based trackers, kernel-based algorithms [19], [6] are quite popular and perform well in tracking objects. One of the most widely applied kernel-based tracking methods is the mean shift procedure, which was first proposed by [6]. Mean shift is a gradient ascent approach which tries to find the modes or maxima of a distribution. To apply the mean shift procedure, histograms representing the target and candidate must first be obtained. Then a similarity measure between target and candidate is defined, so that the best candidate for the mean shift tracker can be found by maximizing the similarity measure.

In the video recordings of the infant, the objects of interest (hands, feet and head) are highly correlated with the background in terms of color; for example, in Fig. 1 the color of the feet is the same as the color of other parts of the legs. So, a simple color-based histogram cannot represent the targets well. Also, despite the popularity of traditional mean shift trackers, such as those in [6] and [19], they have two major drawbacks. Firstly, they build the kernel-based model using just a single frame, and secondly, adaptive updating of the model is difficult. This paper generalizes the simple mean shift trackers to deal with these two problems. We use motion information of each pixel to obtain a better representation of the target. In addition, a new way of updating the kernel profile is proposed in which the kernel automatically adapts to changes in the image, thereby improving the performance of the tracker.

The rest of the paper is organized as follows. Section 2 focuses on the histogram-based representation of the target and candidate. In Section 3, a metric for the similarity measurement is defined and its optimization is discussed. Section 4 modifies prior work on the mean shift algorithm and improves its performance. In Section 5, the performance of the proposed algorithm is tested and compared to another method. Finally, Section 6 discusses the results and capabilities of the proposed method.

2. Target representation

As explained in the previous section, our goal is to track the infant’s limbs and extract their trajectories, so by target we mean the hands, feet and head. The objective of target representation is to extract object features that describe the shape and appearance of the specific target. In order to be able to track the target, a good representation of it must be prepared. There are a variety of representation methods available, such as contour-based [9], [17] and histogram-based [11], [18]. We follow the method in [19] and [6], where an ellipse is used as an approximation of the shape of the object, and the appearance of the object is described by a histogram-based method. This way of representing the target is simple and applicable in general applications, and a large number of tracking methods can work with this representation [19] and [6].

In the rest of this paper, $I_t$ denotes the $t$'th video frame, and $x_i$ and $I_t(x_i)$ are the location and value of the $i$'th pixel in frame number $t$.

2.1 Object shape

As explained in [19], if $O = \{x_i\}$ is the set of pixels that belong to the object being tracked, its shape can be approximated by an elliptical shape set using the first and second moments of the original set. Let $O_e$ be the approximated elliptical shape set for $O$, so we can define

$$\bar{x} = \frac{1}{N} \sum_{x_i \in O} x_i, \tag{1a}$$

$$V = \frac{1}{N} \sum_{x_i \in O} (x_i - \bar{x})(x_i - \bar{x})^T \tag{1b}$$

where $N$ is the number of pixels inside $O$, $\bar{x}$ is the first moment of $O$ and defines the center of $O_e$, and $V$ is the second moment of $O$ and encodes the scales and orientation of $O_e$.
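As an illustration of equations (1a) and (1b), the following minimal sketch computes the two moments for a small set of pixel locations. The function name `shape_moments` and the example pixel block are assumptions made for this illustration and are not part of the original method description.

```python
def shape_moments(pixels):
    """First and second moments of 2-D pixel locations, as in (1a)-(1b)."""
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    # Entries of the 2x2 second-moment (covariance) matrix V.
    v_xx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    v_yy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    v_xy = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    return (mean_x, mean_y), ((v_xx, v_xy), (v_xy, v_yy))


# Hypothetical object: a block of pixels 3 wide and 2 high.
pixels = [(x, y) for x in range(3) for y in range(2)]
center, V = shape_moments(pixels)
print(center, V)
```

For this block the second moment is larger along x than along y, which is what makes the approximating ellipse elongated in the x direction.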

2.2 Object appearance

In this part a histogram-based feature space is created to represent the target appearance. For this purpose, a set of $m$ scalar features is first extracted from the set $O_e$. Then, each feature is considered as a bin of a histogram. Without loss of generality the target can be located at the origin of the image space. Also, to eliminate the effect of differences in target dimensions, all targets, which have already been approximated by an ellipsoidal region, are normalized to a unit circle $O_e^*$.

Let $O_e^* = \{x_i^*\}_{i=1 \ldots N}$ be the set of normalized pixel locations and $I(O_e^*) = \{I(x_i^*)\}_{i=1 \ldots N}$ the corresponding pixel values, and let $b : I(O_e^*) \to \{1 \ldots m\}$ associate to each pixel in $O_e^*$ a particular bin index $u \in \{1 \ldots m\}$ with value $q_u$, which can be obtained as follows,

$$q_u = C \sum_{i=1}^{N} K(\|x_i^*\|^2)\, \delta[b(I(x_i^*)) - u] \tag{2}$$

where $\delta$ is the Kronecker delta function, and $K$ is an isotropic kernel function with a convex and monotonically decreasing profile that assigns smaller weights to pixels farther from the center. The peripheral pixels are often affected by occlusion, so they are less reliable. This kernel function reduces the effect of pixels close to the border, so the density estimation is more robust. $C$ is a normalization constant that ensures that $\sum_{u=1}^{m} q_u = 1$, i.e.

$$C = \frac{1}{\sum_{i=1}^{N} K(\|x_i^*\|^2)} \tag{3}$$
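The kernel-weighted histogram of equations (2) and (3) can be sketched as follows. This is only an illustration under stated assumptions: the profile $k(s) = e^{-s/2}$ is one possible convex, decreasing choice, the bin indices run from 0 to $m-1$ instead of 1 to $m$, and the function name `kernel_histogram` is hypothetical.

```python
import math


def kernel_histogram(points, bins, m, profile=lambda s: math.exp(-0.5 * s)):
    """Kernel-weighted histogram over m bins, following (2) and (3).

    points: normalized pixel locations (x, y) relative to the target center.
    bins:   bin index b(I(x_i)) of each pixel, here in 0..m-1.
    """
    weights = [profile(x * x + y * y) for x, y in points]
    c = 1.0 / sum(weights)  # normalization constant C of equation (3)
    q = [0.0] * m
    for w, u in zip(weights, bins):
        q[u] += c * w  # the Kronecker delta selects the bin of this pixel
    return q


# One central pixel in bin 0 and two pixels at unit distance in bin 1.
q = kernel_histogram([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [0, 1, 1], m=3)
print(q)
```

The central pixel contributes more to its bin than each of the outer pixels does, which reflects the lower reliability assigned to pixels near the border.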


The target candidate, which is going to be compared with the target model, is represented in the same way. Let $\{x_i^*\}_{i=1 \ldots N_h}$ be the set of normalized pixel locations for the candidate. Assuming that the candidate is centered at $y$, the value of the $u$'th bin of the candidate histogram can be obtained as

$$p_u(y) = C_h \sum_{i=1}^{N_h} K\left(\left\|\frac{y - x_i^*}{h}\right\|^2\right) \delta[b(I(x_i^*)) - u] \tag{4}$$

where $K$ is the same kernel profile as defined above, with a different bandwidth $h$ that is the scale or size of the candidate, and $C_h$ is a normalization constant that ensures that $\sum_{u=1}^{m} p_u(y) = 1$, i.e.

$$C_h = \frac{1}{\sum_{i=1}^{N_h} K\left(\left\|\frac{y - x_i^*}{h}\right\|^2\right)} \tag{5}$$

Also, for notational simplicity, we denote the histograms by $Q = \{q_u\}_{u=1 \ldots m}$ and $P(y) = \{p_u(y)\}_{u=1 \ldots m}$ for the target and candidate, respectively.

2.3 Observation model

As explained above, an ellipsoidal region is used for approximating the target shape. The eigendecomposition of the covariance matrix is

$$V = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} \begin{bmatrix} \lambda_x & 0 \\ 0 & \lambda_y \end{bmatrix} \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}^{-1}$$

where $\lambda_x$ and $\lambda_y$ are the scales in the semi-axis directions of the ellipse, and $o_x = [v_{11}, v_{21}]^T$ and $o_y = [v_{12}, v_{22}]^T$ represent the orientation of the ellipsoidal region. Therefore, we define $s_t = [\bar{x}^T, \lambda_x, \lambda_y, o_x^T, o_y^T]^T$ as the target state vector in frame number $t$. The measurement function (likelihood) at frame number $t$ can be defined as

$$p(I_t | s_t) \propto \exp(-\lambda D^2(Q, P(y))) \tag{6}$$

where $\lambda$ is a constant and $D^2(Q, P(y))$ is a distance measure between the target and candidate histograms. To obtain the best candidate in the current frame, the log-likelihood of the measurement must be maximized, so the distance measure must be minimized with respect to $y$.
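The scale and orientation entries of the state vector come from the eigendecomposition of $V$ given earlier in this subsection. For a symmetric 2x2 matrix this has a closed form, sketched below. The function name `ellipse_from_covariance` is hypothetical, and ordering the eigenvalues so that $\lambda_x \ge \lambda_y$ is a convention chosen for this illustration only.

```python
import math


def ellipse_from_covariance(V):
    """Closed-form eigendecomposition of a symmetric 2x2 covariance matrix.

    Returns (lambda_x, lambda_y, o_x, o_y) with lambda_x >= lambda_y and
    unit eigenvectors o_x, o_y giving the orientation of the ellipse.
    """
    (a, b), (_, d) = V
    mean = 0.5 * (a + d)
    diff = math.hypot(0.5 * (a - d), b)
    lam_x, lam_y = mean + diff, mean - diff
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle of the major axis
    o_x = (math.cos(theta), math.sin(theta))
    o_y = (-math.sin(theta), math.cos(theta))
    return lam_x, lam_y, o_x, o_y


lam_x, lam_y, o_x, o_y = ellipse_from_covariance(((2.0, 1.0), (1.0, 2.0)))
print(lam_x, lam_y, o_x, o_y)
```

In this example the major axis lies along the diagonal direction, so the ellipse is rotated by 45 degrees relative to the image axes.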

3. Recursive distance minimization

To optimize the measurement function, i.e. the similarity between the target and candidate in equation (6), various distance measures can be used. We define the distance measure using the Bhattacharyya coefficient, i.e.

$$D^2(Q, P(y)) = 1 - \rho(Q, P(y)), \tag{7a}$$

$$\rho(Q, P(y)) = \sum_{u=1}^{m} \sqrt{q_u\, p_u(y)}, \tag{7b}$$

Since the Bhattacharyya coefficient $\rho(Q, P(y))$ is a smooth function, with a differentiable kernel profile (such as a Gaussian kernel profile) a gradient-based approach can be used for the optimization. As we can see from equation (7a), minimizing the distance is equivalent to maximizing the Bhattacharyya coefficient.
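Equations (6), (7a) and (7b) can be evaluated directly for two normalized histograms, as in the sketch below. The function names and the value `lam=20.0` are assumptions for illustration; the text does not specify a value for the constant $\lambda$.

```python
import math


def bhattacharyya(q, p):
    """Bhattacharyya coefficient of two normalized histograms, equation (7b)."""
    return sum(math.sqrt(qu * pu) for qu, pu in zip(q, p))


def distance_sq(q, p):
    """Squared distance of equation (7a)."""
    return 1.0 - bhattacharyya(q, p)


def likelihood(q, p, lam=20.0):
    """Unnormalized measurement likelihood of equation (6); lam is a guess."""
    return math.exp(-lam * distance_sq(q, p))


target = [0.5, 0.5, 0.0]
same = [0.5, 0.5, 0.0]
disjoint = [0.0, 0.0, 1.0]
print(distance_sq(target, same), distance_sq(target, disjoint))
```

Identical histograms give a coefficient of one and a distance of zero, while histograms with no common bins give a distance of one and hence the smallest likelihood.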

The search can be started at the target location in the previous frame, $y_0$, with candidate histogram $P(y_0)$. We also approximate equation (7b) by its first-order Taylor expansion around $P(y_0)$, so

$$\rho(Q, P(y)) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{q_u\, p_u(y_0)} + \frac{1}{2} \sum_{u=1}^{m} p_u(y) \sqrt{\frac{q_u}{p_u(y_0)}} \tag{8}$$

where the first term is constant and does not affect the optimization. We denote the second term by $f(y)$, and by substituting (4) into (8) we get

$$f(y) = \frac{C_h}{2} \sum_{i=1}^{N_h} \omega_i\, K\left(\left\|\frac{y - x_i^*}{h}\right\|^2\right) \tag{9}$$

with

$$\omega_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\, \delta[b(I(x_i^*)) - u] \tag{10}$$
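Because the Kronecker delta in equation (10) selects a single bin per pixel, the weight $\omega_i$ reduces to a table lookup. The sketch below shows this under two assumptions that are not in the text: bins are indexed from 0, and an empty candidate bin is given weight zero to avoid a division by zero.

```python
import math


def pixel_weights(q, p0, bins):
    """Per-pixel weights of equation (10).

    q, p0: target histogram and candidate histogram at the start point y0.
    bins:  bin index of each candidate pixel, here in 0..m-1.
    """
    # Assumption: an empty candidate bin gets weight 0 instead of dividing by 0.
    ratio = [math.sqrt(qu / pu) if pu > 0 else 0.0 for qu, pu in zip(q, p0)]
    return [ratio[u] for u in bins]


omega = pixel_weights([0.5, 0.5], [0.25, 0.75], bins=[0, 1, 1])
print(omega)
```

Pixels whose color is under-represented in the candidate relative to the target (bin 0 here) receive weights above one, which pulls the next location estimate towards them.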

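Equation (9) is similar to a kernel-based density estimate, so the mean shift algorithm [5] can be used to calculate the gradient step and obtain the maximum with respect to the target location ($\bar{x}$). To take gradient steps for all parameters we use the extended mean shift [20] instead of mean shift. First, we take the kernel to be a Gaussian profile, so $K(\|x_i^*\|^2) = \mathcal{N}(x_i^*; \bar{x}, V)$, and then define

$$r_i = \frac{\omega_i\, \mathcal{N}(x_i^*; \bar{x}^{\{k\}}, V^{\{k\}})}{\sum_{j=1}^{N} \omega_j\, \mathcal{N}(x_j^*; \bar{x}^{\{k\}}, V^{\{k\}})} \tag{11}$$

where $\{k\}$ denotes the iteration number. Finally,

$$\bar{x}^{\{k+1\}} = \sum_{i=1}^{N} r_i x_i = \frac{\sum_{i=1}^{N} \omega_i x_i\, \mathcal{N}(x_i^*; \bar{x}^{\{k\}}, V^{\{k\}})}{\sum_{i=1}^{N} \omega_i\, \mathcal{N}(x_i^*; \bar{x}^{\{k\}}, V^{\{k\}})} \tag{12}$$

and

$$V^{\{k+1\}} = \sum_{i=1}^{N} r_i (x_i^* - \bar{x}^{\{k\}})(x_i^* - \bar{x}^{\{k\}})^T \tag{13}$$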
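A single iteration of equations (11) to (13) can be sketched for 2-D points as follows. This is an illustrative reading of the update, not the reference implementation of [20]: the function name `em_shift_step`, the pixel grid and the weights are all hypothetical, and the same point coordinates are used for both $x_i$ and $x_i^*$.

```python
import math


def em_shift_step(points, omega, mean, cov):
    """One iteration of (11)-(13) for 2-D points with weights omega."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))

    def gauss(p):
        # Unnormalized Gaussian; the constant factor cancels in (11).
        dx, dy = p[0] - mean[0], p[1] - mean[1]
        maha = (dx * (inv[0][0] * dx + inv[0][1] * dy)
                + dy * (inv[1][0] * dx + inv[1][1] * dy))
        return math.exp(-0.5 * maha)

    g = [w * gauss(p) for w, p in zip(omega, points)]
    total = sum(g)
    r = [gi / total for gi in g]  # equation (11)

    new_mean = (sum(ri * p[0] for ri, p in zip(r, points)),
                sum(ri * p[1] for ri, p in zip(r, points)))  # equation (12)

    # Equation (13): second moment taken around the previous mean.
    dxs = [p[0] - mean[0] for p in points]
    dys = [p[1] - mean[1] for p in points]
    v_xx = sum(ri * dx * dx for ri, dx in zip(r, dxs))
    v_xy = sum(ri * dx * dy for ri, dx, dy in zip(r, dxs, dys))
    v_yy = sum(ri * dy * dy for ri, dy in zip(r, dys))
    return new_mean, ((v_xx, v_xy), (v_xy, v_yy))


points = [(x, y) for x in range(5) for y in range(5)]
uniform = [1.0] * len(points)
biased = [3.0 if x >= 3 else 1.0 for x, _ in points]
identity = ((1.0, 0.0), (0.0, 1.0))

mean_u, cov_u = em_shift_step(points, uniform, (2.0, 2.0), identity)
mean_b, cov_b = em_shift_step(points, biased, (2.0, 2.0), identity)
print(mean_u, mean_b)
```

With uniform weights the estimate stays at the center of the symmetric grid, while larger weights on the right-hand pixels shift the mean in that direction.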

4. Improvement in kernel-based mean shift algorithm

The kernel-based mean shift algorithm has proven to perform satisfactorily in video-based object tracking. However, for applications in which the object is highly correlated with the background, as is the case in video recordings of infant movements, the performance of this algorithm is not satisfactory and it might lose track, so some modifications are needed to improve the tracking performance. Fig. 1 shows a typical image in which the objects of interest, shown in the marked regions, are highly correlated with the background, so the localization accuracy of the object will be poor. Therefore, in this section we improve the kernel-based mean shift algorithm by selecting appropriate features and modifying the kernel profile.


Fig. 1: A typical image of interest in this paper

4.1 Feature selection

Feature selection is a crucial step in object tracking, and an enormous number of features can be selected from the vast amount of information in a video. One commonly used feature is a histogram of the color in a region of interest. In this paper we also use color information, based on an RGB image. But as mentioned before, since for our purpose the objects of interest and the background are highly correlated, using just color information is insufficient for achieving robust tracking. To reduce the background effect on target representation, [6] proposed the following procedure. First, an area equal to three times the target area is considered as background around the target; then the background is represented as the discrete histogram $\{b_u\}_{u=1 \ldots m}$ in feature space, and its smallest nonzero entry is denoted by $b^*$. The weights

$$\left\{ v_u = \min\left(\frac{b^*}{b_u}, 1\right) \right\}_{u=1 \ldots m} \tag{14}$$

are used as a transformation for the representation of the target and candidate models. The new representations for target and candidate are as follows:

$$q_u = C v_u \sum_{i=1}^{N} K(\|x_i^*\|^2)\, \delta[b(I(x_i^*)) - u] \tag{15}$$

$$p_u(y) = C_h v_u \sum_{i=1}^{N_h} K\left(\left\|\frac{y - x_i^*}{h}\right\|^2\right) \delta[b(I(x_i^*)) - u] \tag{16}$$

where $C$ and $C_h$ are again obtained by normalization to ensure that $\sum_{u=1}^{m} q_u = 1$ and $\sum_{u=1}^{m} p_u = 1$. This transformation aims to reduce the effect of salient background features in the representation of both target and candidate. However, since mean shift is invariant to a scale transformation of the weights, this transformation does not improve the performance of the mean shift algorithm, and the result is the same as when not using it [12].
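For reference, the background weights of equation (14) can be computed as in the sketch below. The function name is hypothetical, and treating an empty background bin as having weight one is an assumption made here (it is the limit of $\min(b^*/b_u, 1)$ as $b_u$ tends to zero).

```python
def background_weights(b):
    """Weights v_u of equation (14) from a background histogram b."""
    b_star = min(v for v in b if v > 0)  # smallest nonzero entry b*
    # Assumption: bins absent from the background keep full weight 1.
    return [1.0 if v == 0 else min(b_star / v, 1.0) for v in b]


v = background_weights([0.5, 0.25, 0.0, 0.25])
print(v)
```

Bins that are prominent in the background (the first bin here) are down-weighted, while rare or absent background colors keep their full weight.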

Here we use the motion information of each pixel only for the target representation and not for the candidate representation; the candidate is represented as in equation (4). First, using [8] we obtain a foreground image of the same size as the original image as follows,

$$I_t^*(x) = \begin{cases} I_t(x) & x \text{ is foreground} \\ 0 & x \text{ is background} \end{cases} \tag{17}$$

where $x$ is a pixel in the frame. Then, the frame $I_t$ is modified as follows,

$$\tilde{I}_t = \beta I_t + (1 - \beta) I_t^* \tag{18}$$

where $0 < \beta < 1$ is a constant that can be tuned based on the application; a smaller $\beta$ increases the effect of motion changes in the modified image. Finally, equation (2) is modified as follows,

$$q_u = C \sum_{i=1}^{N} K(\|x_i^*\|^2)\, \delta[b(\tilde{I}(x_i^*)) - u] \tag{19}$$
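The blending of equations (17) and (18) amounts to attenuating background pixels by the factor $\beta$ while leaving foreground pixels unchanged. A minimal sketch on a single-channel image is given below; it assumes the foreground mask has already been produced by a background subtraction method such as [8], and the function name and pixel values are hypothetical.

```python
def blend_with_foreground(frame, mask, beta):
    """Modified frame of equation (18) for a single-channel image.

    frame: rows of pixel values I_t(x).
    mask:  rows of booleans, True where the pixel is foreground.
    """
    return [
        # Equation (17) gives I*_t(x) = I_t(x) on foreground and 0 elsewhere.
        [beta * value + (1.0 - beta) * (value if fg else 0.0)
         for value, fg in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


modified = blend_with_foreground([[100.0, 200.0]], [[True, False]], beta=0.5)
print(modified)
```

The moving pixel keeps its value of 100, whereas the static pixel is reduced from 200 to 100, so static regions contribute darker values to the target histogram.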

4.2 Kernel modification

Although traditional mean shift trackers, for example [6] and [19], are quite popular, they have two drawbacks. Firstly, they build the kernel-based model using just a single frame; secondly, adaptively updating the model is difficult using only the information of a single frame. So, in this section we generalize the simple mean shift trackers to enable them to deal with these two problems.

Let $p_u^t$ and $q_u^t$ be the values of the candidate and target representations for bin $u$ in frame $t$, with $q_u^0 = p_u^0 = q_u$. In the following we update the representation of the target instead of using a single representation for the whole video sequence. The new representation can be written as follows,

$$q_u^t = (1 - \alpha)\, p_u^{t-1} + \alpha\, q_u^{t-1} \tag{20}$$

where $0 \le \alpha < 1$ is a learning rate that can be set based on the application; for example, if the target representation does not change fast over time it should be close to zero. Also,

$$p_u(y) = C_h \sum_{i=1}^{N_h} K\left(\left\|\frac{y - x_i^*}{h}\right\|^2\right) \delta[b(I(x_i^*)) - u] \tag{21}$$

In the appendix it is proven that this representation needs no new normalization, and the constants $C$ and $C_h$ are calculated as before. As we can see from equation (20), this new representation uses information from the representation of the target in previous frames as well as the current frame, and it can be updated automatically and adapt to changes in the video. For the sake of clarity, a summary of the proposed method is given in Tab. 1.
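The update of equation (20) is a convex combination of two normalized histograms, which is why the result remains normalized. The following sketch illustrates this numerically; the function name and the histogram values are made up for the example.

```python
def update_target_model(q_prev, p_prev, alpha):
    """Target model update of equation (20)."""
    return [(1.0 - alpha) * p + alpha * q for q, p in zip(q_prev, p_prev)]


q_prev = [0.5, 0.5, 0.0]    # target model in the previous frame
p_prev = [0.25, 0.25, 0.5]  # best candidate found in the previous frame
q_new = update_target_model(q_prev, p_prev, alpha=0.5)
print(q_new, sum(q_new))
```

The updated model sums to one without any renormalization, in agreement with the result proven in the appendix.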


(a) frame1 (b) frame10 (c) frame30 (d) frame50 (e) frame70

Fig. 2: Results of tracking by [19]

(a) frame1 (b) frame10 (c) frame30 (d) frame50 (e) frame70

Fig. 3: Results of tracking by proposed method

Table 1: The Algorithm Summary.

Step 0. Get the input frame $I_t$.
Step 1. Calculate the modified frame using equation (18).
Step 2. Obtain the target model using equation (20).
Step 3. Obtain the best candidate:
    for k = 1 : iteration limit or convergence
        calculate $r_i$ using equation (11)
        calculate $\bar{x}^{\{k\}}$ using equation (12)
        calculate $V^{\{k\}}$ using equation (13)
    end for
Step 4. If more frames are available, get the next frame and go to step 1.

4.3 Extension of the algorithm

To further improve the tracking performance, in this part we combine the proposed kernel tracker with a Bayesian filtering approach. To do this, the movement of the object is modeled as a random walk, so $s_t = s_{t-1} + w_t$, where $w_t$ is white zero-mean Gaussian noise. The aim of tracking is to obtain the density $p(s_t | I_{1:t})$ for all video frames. Bayesian filtering has two stages, a prediction stage and an update stage, which can be written as follows,

prediction stage:

$$p(s_t | I_{1:t-1}) = \int p(s_t | s_{t-1})\, p(s_{t-1} | I_{1:t-1})\, ds_{t-1} \tag{22a}$$

update stage:

$$p(s_t | I_{1:t}) = \frac{1}{c}\, p(I_t | s_t)\, p(s_t | I_{1:t-1}) \tag{22b}$$

where $c = \int p(I_t | s_t)\, p(s_t | I_{1:t-1})\, ds_t$. The complexity of the measurement model (6) may make (22b) analytically intractable, so we approximate the measurement model as follows. Since $p(I_t | s_t)$ is usually multimodal, all modes must be captured to obtain a good approximation. Therefore, in Section 3, instead of starting the search at the previous location $y_0$, the search is started from $L$ different starting points. The starting points are selected as in [19], i.e. they are drawn from a linear combination of the prior density $p(s_t | I_{1:t-1})$ and a wide distribution (for example, a uniform distribution over the whole image). In the next step, after finding all modes, a mixture Kalman filter [4] is used to obtain the current position of the object. Since there is more than one mode, a simple Kalman filter might lose track.
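The two stages (22a) and (22b) can be illustrated on a coarse one-dimensional grid of positions, where the integrals become sums. This is a deliberately simplified stand-in and not the mixture Kalman filter [4] used in the method: the state is a single discretized coordinate, the random walk is a three-point transition table, and all names and numbers are hypothetical.

```python
def predict(prior, transition):
    """Discrete version of (22a): spread the prior by a random walk.

    transition maps a cell offset to its probability; mass that would
    leave the grid is clamped to the border cells.
    """
    n = len(prior)
    predicted = [0.0] * n
    for i, mass in enumerate(prior):
        for offset, prob in transition.items():
            j = min(max(i + offset, 0), n - 1)
            predicted[j] += mass * prob
    return predicted


def update(predicted, likelihood):
    """Discrete version of (22b): weight by the likelihood and normalize."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    c = sum(unnormalized)
    return [value / c for value in unnormalized]


prior = [0.0, 1.0, 0.0, 0.0]              # object known to be in cell 1
walk = {-1: 0.25, 0: 0.5, 1: 0.25}        # hypothetical random-walk model
predicted = predict(prior, walk)
posterior = update(predicted, [0.1, 0.1, 0.8, 0.1])  # measurement favors cell 2
print(predicted, posterior)
```

After the update most of the probability mass moves to the cell supported by the measurement, while the prediction keeps some mass on the neighboring cells.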

5. Experimental results

As mentioned before, for detecting CP we need the trajectories of the infant’s hands, feet and head. For a faster implementation, instead of tracking them in a single state vector, each target is tracked separately. For each of them we consider a state vector of size 8, $[\mathrm{location}_x, \mathrm{location}_y, \lambda_x, \lambda_y, v_{11}, v_{21}, v_{12}, v_{22}]$, where $[v_{11}, v_{21}]^T$ and $[v_{12}, v_{22}]^T$ represent the orientation of the ellipsoidal region. Also, the effectiveness of the proposed kernel-based object tracker is compared with [19]. For both methods an 8 × 8 × 8-binned color histogram is used. Although automatic object detection methods could be used for detecting our objects of interest, here they are selected manually in the first frame, and their shape is approximated by ellipsoids. Fig. 1 shows the selected regions.

Fig. 2 and Fig. 3 show the tracking results for the method of [19] and the proposed method, respectively. It can be seen that [19] does not perform well: it has lost track of the left foot and right hand in frame 30, and the


head in frame 70. As we can see from Fig. 3, the proposed method does not lose track of the objects, except for the head in the last frame shown, and it gives better results than the method of [19].

It is worth mentioning that the objects of interest in this paper are challenging targets to track: hands and feet move very fast, and changes between consecutive frames are not negligible. For better performance, videos with a higher frame rate should be used.

6. Conclusions

The goal of this paper was to prepare the information needed for detecting cerebral palsy using video recordings of young infants. Previous work on CP shows that it is possible to detect CP using motion data of the infant. Prior work on CP detection used different instrumentation such as electromagnetic sensors, markers or infrared cameras to extract information on the infant’s motion. These instruments affect the motion patterns and may lead to a false result. So, in this paper a new approach was proposed, which is video-based and does not require any explicit instrumentation of the infant. We used a combined Bayesian filtering and kernel-based tracking method to increase the performance of the tracker. In the video recordings of the infant, the objects of interest (hands, feet and head) are highly correlated with the background in terms of color, and this decreases the effectiveness of previous methods. To deal with this problem, two modifications were applied to the previous methods. First, using pixel motion information, a new feature space is obtained that uses color information as well. Second, a way of updating the kernel profile is proposed that uses information from the previous frame to obtain a better representation of the target; it also adapts to changes in the image frame and can be updated automatically. The proposed method was compared to a well-known method for video-based object tracking, and the experimental results show the improved performance of the proposed method.

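Appendix

In this section we prove that the representation in equation (20) is normalized, i.e. $\sum_{u=1}^{m} q_u^t = 1,\ \forall t$.

$$\sum_{u=1}^{m} q_u^t = \sum_{u=1}^{m} \left\{ (1 - \alpha)\, p_u^{t-1} + \alpha\, q_u^{t-1} \right\} = \sum_{u=1}^{m} \left\{ (1 - \alpha)\, p_u^{t-1} \right\} + \sum_{u=1}^{m} \left\{ \alpha\, q_u^{t-1} \right\}$$

Since $p_u^{t-1}$ are normalized histogram bins, equation (4),

$$\sum_{u=1}^{m} \left\{ (1 - \alpha)\, p_u^{t-1} \right\} = (1 - \alpha) \underbrace{\sum_{u=1}^{m} p_u^{t-1}}_{=1} = 1 - \alpha$$

On the other hand, $q_u^{t-1}$ is a recursive function of $q_u^{t-2}$ and $p_u^{t-2}$, and the same holds for $q_u^{t-2}$. Also $q_u^0 = q_u$, so $\sum_{u=1}^{m} q_u^0 = 1$; then by induction we can conclude that

$$\sum_{u=1}^{m} \left\{ \alpha\, q_u^{t-1} \right\} = \alpha \underbrace{\sum_{u=1}^{m} q_u^{t-1}}_{=1} = \alpha$$

and finally,

$$\sum_{u=1}^{m} q_u^t = (1 - \alpha) + \alpha = 1$$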

References

[1] L. Adde, J.L. Helbostad, A.R. Jensenius, G. Taraldsen, K.H. Grunewaldt, and R. Støen. Early prediction of cerebral palsy by computer-based video analysis of general movements: a feasibility study. Developmental Medicine & Child Neurology, 52(8):773–778, 2010.

[2] L. Adde, J.L. Helbostad, A.R. Jensenius, G. Taraldsen, and R. Støen. Using computer-based video analysis in the study of fidgety movements. Early Human Development, 85(9):541–547, 2009.

[3] C. Einspieler, H.F.R. Prechtl, A. Bos, F. Ferrari, and G. Cioni. Prechtl’s Method on the Qualitative Assessment of General Movements in Preterm, Term and Young Infants. London: Mac Keith Press, 2004.

[4] R. Chen and J.S. Liu. Mixture Kalman filters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(3):493–508, 2000.

[5] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002.

[6] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):564–577, 2003.

[7] M.S. Conover. Using accelerometers to quantify infant general movements as a tool for assessing motility to assist in making a diagnosis of cerebral palsy. PhD thesis, Virginia Polytechnic Institute and State University, 2003.

[8] D.S. Lee. Effective Gaussian mixture learning for video background subtraction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):827–832, 2005.

[9] C.E. Lu, N. Adluru, H. Ling, G. Zhu, and L.J. Latecki. Contour based object detection using part bundles. Computer Vision and Image Understanding, 114(7):827–834, 2010.

[10] L. Meinecke, N. Breitbach-Faller, C. Bartz, R. Damen, G. Rau, and C. Disselhorst-Klug. Movement analysis in the early detection of newborns at risk for developing spasticity due to infantile cerebral palsy. Human Movement Science, 25(2):125–144, 2006.

[11] K. Ni, X. Bresson, T. Chan, and S. Esedoglu. Local histogram based segmentation using the Wasserstein distance. International Journal of Computer Vision, 84(1):97–111, 2009.

[12] J. Ning, L. Zhang, D. Zhang, and C. Wu. Robust mean shift tracking with corrected background-weighted histogram. IET Computer Vision, 2010.

[13] H.F.R. Prechtl. State of the art of a new functional assessment of the young nervous system. An early predictor of cerebral palsy. Early Human Development, 50(1):1–11, 1997.

[14] H.F.R. Prechtl. General movement assessment as a method of developmental neurology: new paradigms and their consequences. Developmental Medicine & Child Neurology, 43(12):836–842, 2001.

[15] S.S. Robertson, L.F. Bacher, and N.L. Huntington. Structure and irregularity in the spontaneous behavior of young infants. Behavioral Neuroscience, 115(4):758, 2001.

[16] J.C. van der Heide, P.B. Paolicelli, A. Boldrini, and G. Cioni. Kinematic and qualitative analysis of lower-extremity movements in preterm infants with brain lesions. Physical Therapy, 79(6):546–557, 1999.


[17] A. Yilmaz, X. Li, and M. Shah. Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11):1531–1536, 2004.

[18] B. Zhang, S. Shan, X. Chen, and W. Gao. Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition. IEEE Transactions on Image Processing, 16(1):57–68, 2007.

[19] Z. Zivkovic, A.T. Cemgil, and B. Kröse. Approximate Bayesian methods for kernel-based object tracking. Computer Vision and Image Understanding, 113(6):743–749, 2009.

[20] Z. Zivkovic and B. Kröse. An EM-like algorithm for color-histogram-based object tracking. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 1, pages I–798. IEEE, 2004.