
Evolution Analysis of Binary Partition Tree for Hierarchical Video Simplified Segmentation

Arief Setyanto
School of Computer Science and Electronics Engineering
University of Essex
Colchester, United Kingdom CO4 3SQ
Email: [email protected]

John C Wood
School of Computer Science and Electronics Engineering
University of Essex
Colchester, United Kingdom CO4 3SQ

Mohammed Ghanbary
School of Computer Science and Electronics Engineering
University of Essex
Colchester, United Kingdom CO4 3SQ

Abstract—This paper proposes volumetric hierarchical video segmentation and evolution analysis. The contribution of this paper is twofold. Firstly, a single Binary Partition Tree (BPT) is proposed to represent the entire video segmentation. Every node represents not only a single region but also a series of correlated regions in subsequent frames. Secondly, we propose a method to identify a stop-merging criterion by exploiting discontinuities in volume evolution. The pre-segmentation of the video is produced by a 26-neighbourhood watershed. A volume adjacency graph (VAG) is constructed, with the merging cost between each pair of neighbouring volumes as the edge weight. Neighbouring volumes are merged iteratively, starting from the lowest volume distance. The volume model combines colour and temporal direction. In every iteration the edge in the VAG with the lowest merging cost is selected; the merge produces a new, larger volume, a new node is issued and the VAG is updated. The history of merging is recorded in the BPT. In order to identify salient nodes, volume evolution analysis is proposed, progressing from the initial partition at the lowest leaf of the BPT towards the root. A discontinuity in the evolution of colour mean and temporal direction indicates a reluctance to merge between a pair of volumes, which may therefore belong to different objects. The evolution analysis identifies all salient nodes in the temporal domain of the BPT. In order to simplify the BPT, nodes below a salient node are pruned. The processing gives salient nodes with temporal registration over a video sequence for higher-level cognition.

Index Terms—Video Hierarchical Segmentation; Tree Simplification; Salient nodes identification

I. INTRODUCTION

The problem of partitioning video into a set of coherent and correlated regions over time is a fundamental task for recognition, summarization, information retrieval, coding and cognition. Efficient access to semantically meaningful content is a critical step towards more complex video analysis [1]. However, obtaining semantic segmentation is difficult, while salient segmentation is more tractable. Instead of providing a single final result, this research presents a hierarchical multi-scale segmentation within a binary partition tree (BPT) and leaves the decision of final detail as an option for the end user or video analysis application.

Hierarchical segmentation begins with over-segmented video, which consists of many (> 1000) tiny and most likely meaningless regions. The first stage of the process uses a pre-segmentation algorithm; in this paper the watershed algorithm is employed. The volume neighbourhood and pairwise distance are represented by a volume adjacency graph (VAG), which is an extension of its 2D counterpart [2]. The weights between pairwise neighbouring volumes represent their dissimilarity. In order to obtain larger volume partitions, the small partitions are merged until the root node is reached.

Some research aims to obtain a single final solution using graph techniques such as normalized cuts [3], [4] and the recursive shortest spanning tree (RSST) [5]. An extension to video of an efficient graph-based segmentation algorithm [6] has been proposed in [7], [8]. Rather than producing a single final segmentation result, this research provides a multilevel hierarchical segmentation of the content, leaving the level of detail as an option for the end user.

Several works in the literature propose spatio-temporal video segmentation. One method is frame-based processing followed by region matching [9], [10], [11]. Another approach, proposed in [12], [13], is to segment the first frame and then project it onto subsequent frames, with the projection helped by motion information. The third approach is volume segmentation, in which video data is represented by a 3D matrix [14], [15], [16], [17], [18].

A BPT is proposed as an efficient region-based representation of video and images [18]. Under a BPT representation, multiple hierarchical layers of segmentation can be presented. In hierarchical segmentation, merging stops when all partitions are merged into a single region representing the entire image sequence. Previous work on BPT-based video segmentation [12] uses a projection approach, while this work uses a volumetric approach in order to avoid the additional task of projecting regions across frames.

In our previous work [11], a BPT is constructed individually for every frame. Saliency analysis is conducted for every individual frame based on reluctant merging during region evolution. Rank-ordered salient regions for each frame are collected, followed by matching between regions across frames, which is computationally intensive. In this paper, we avoid the matching task by representing video as a 3D volume along the horizontal (x), vertical (y) and temporal (t) axes.

2014 6th Computer Science and Electronic Engineering Conference (CEEC) University of Essex, UK

978-1-4799-6692-9/14/$31.00 ©2014 IEEE 52

Fig. 1: Illustration of volumetric segmentation and the corresponding volume adjacency graph. (a) Volumetric segmentation. (b) Corresponding VAG.

The pre-segmentation task is applied to the 3D volume data and results in small volumes along the spatio-temporal axes; thus a single BPT is constructed for a sequence of frames. A node no longer represents a single region but a set of correlated regions in subsequent frames. Instead of relying on colour criteria alone, the merging order also considers volumetric direction. The history of merging is recorded in the BPT, as adapted from [18].

Due to the large number of volumes produced by pre-segmentation, the tree is most likely very complex. It consists of thousands of partitions in the base nodes (lowest leaves). The tree gets simpler as it gets closer to the root. The nodes at the higher levels (close to the root) are under-segmented, while the lower levels (near the leaves) are over-segmented. We propose that there is a meaningful solution between root and leaf and that our evolution analysis can identify those nodes.

Evolution analysis has been proposed for a single image by [19]. An extension to volumetric data is proposed in this paper. Both colour and motion are considered in this analysis. The rest of this paper is organized as follows: Section II describes the construction of the BPT; Section III discusses the proposed evolution analysis; Section IV presents experimental results; and Section V discusses our findings.

II. BINARY PARTITION TREE CONSTRUCTION

A. Pre-Segmentation

In order to provide a set of homogeneous partitions, a pre-segmentation task is conducted. Generally, pre-segmentation produces an over-segmented partition. Video can be viewed as a matrix of voxels in the volumetric (x, y, t) space. The watershed algorithm [20] classifies video data into homogeneous groups of voxels using a 6-, 18- or 26-neighbourhood (the options available in Matlab). In order to reduce computation, the video is converted into 8-bit grey level. The result is a uniquely labelled volumetric partition, with a watershed boundary of value 0 between adjacent partitions.
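The three neighbourhood sizes mentioned above can be illustrated with a short sketch. It is not the watershed itself, only a hypothetical helper (`neighbour_offsets`, `neighbours`) that enumerates which voxels count as adjacent in (x, y, t) space under each connectivity; the function names and the tuple layout are assumptions made for illustration.

```python
from itertools import product

def neighbour_offsets(connectivity):
    """Return the (dx, dy, dt) offsets for 6-, 18- or 26-connectivity."""
    # Maximum number of non-zero coordinates allowed in an offset:
    # 1 -> face neighbours (6), 2 -> faces and edges (18), 3 -> all (26).
    max_nonzero = {6: 1, 18: 2, 26: 3}[connectivity]
    offsets = []
    for d in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for c in d if c != 0)
        if 1 <= nonzero <= max_nonzero:
            offsets.append(d)
    return offsets

def neighbours(voxel, shape, connectivity=26):
    """Yield the in-bounds neighbours of voxel = (x, y, t) in a volume of the given shape."""
    for d in neighbour_offsets(connectivity):
        p = tuple(v + dv for v, dv in zip(voxel, d))
        if all(0 <= c < s for c, s in zip(p, shape)):
            yield p
```

With a 26-neighbourhood a voxel is adjacent to voxels in the previous and next frame even when they are diagonally displaced, which is what lets a watershed volume follow a moving region across frames.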

Once pre-segmentation has been conducted, the voxels of the original video are grouped and labelled according to their homogeneity.

Fig. 2: Part of the top six levels of the BPT of 'Carphone' and plot of the 2nd level of the BPT in frame 1. (a) Top 6 levels of the BPT. (b) Plot of the 2nd level at frame 1.

Each partition consists of a number of voxels and is separated from its neighbours by the watershed boundary. Mean colour and centroid displacement across frames are calculated for each partition. The merging cost is calculated from both features using the formula below.

$$\delta(V_i, V_j) = N(V_i)\,\lVert M(V_i) - M(V_i \cup V_j)\rVert_2 + N(V_j)\,\lVert M(V_j) - M(V_i \cup V_j)\rVert_2 \tag{1}$$

Here δ(Vi, Vj) represents the cost of merging volumes i and j, ‖·‖2 denotes the L2 norm, N(Vi) denotes the number of voxels in volume i, and M(Vi) denotes the model of volume i. The volumes can be drawn as a graph (G) in which each volume becomes a node/vertex (V) and the pairwise dissimilarity δ(Vi, Vj) is represented by the weight of an edge (E). The volume adjacency graph (VAG) is constructed from all edges of every node. An illustration of a VAG can be seen in Figure 1.
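As a minimal sketch of equation (1), the cost can be computed from the voxel counts and model vectors of two volumes. The paper does not state how M(Vi ∪ Vj) is obtained, so the size-weighted mean used in `merged_model` is an assumption, as are the function names.

```python
import math

def merged_model(n_i, m_i, n_j, m_j):
    """Model of the union, ASSUMED here to be the size-weighted mean of both models."""
    total = n_i + n_j
    return tuple((n_i * a + n_j * b) / total for a, b in zip(m_i, m_j))

def l2(u, v):
    """Euclidean (L2) distance between two model vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def merging_cost(n_i, m_i, n_j, m_j):
    """Equation (1): size-weighted distance of each volume model to the merged model."""
    m_u = merged_model(n_i, m_i, n_j, m_j)
    return n_i * l2(m_i, m_u) + n_j * l2(m_j, m_u)
```

For example, a 10-voxel volume with model (80,) and a 30-voxel volume with model (120,) give a merged model of (110,) and a cost of 10·30 + 30·10 = 600 under this assumption.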

B. Merging Order and BPT construction

The merging order depends on the edge weights in the graph. A lower merging cost results in a higher priority in the merging order. When the partitions Vi and Vj are selected to merge, the corresponding edge is discarded from the VAG.

The merging operation produces a new composite volume called the parent node; the two merged nodes become its left and right children. The features of the parent node are calculated from the features of its children. The merging costs between the new parent node and all neighbours of both children are recalculated, and the VAG is updated accordingly. From the updated VAG, the edge with the smallest merging cost is selected for the next iteration. The iteration terminates when the root of the tree is reached.
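The merging loop described in this subsection can be sketched with a priority queue over the VAG edges. This is an illustrative sketch, not the authors' implementation: it assumes scalar volume models, a size-weighted mean for the parent model, integer node identifiers, and lazy deletion of stale edges from the heap; `build_bpt` and `merging_cost` are hypothetical names.

```python
import heapq

def merging_cost(n_i, m_i, n_j, m_j):
    """Equation (1) for scalar models; the union model is ASSUMED to be the weighted mean."""
    m_u = (n_i * m_i + n_j * m_j) / (n_i + n_j)
    return n_i * abs(m_i - m_u) + n_j * abs(m_j - m_u)

def build_bpt(sizes, models, edges):
    """Greedily merge a VAG into a BPT.

    sizes, models: dicts keyed by integer leaf id; edges: iterable of (a, b) pairs.
    Returns (children, parent, sizes, models); children maps a parent id to (left, right).
    """
    sizes, models = dict(sizes), dict(models)
    adj = {v: set() for v in sizes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    heap = [(merging_cost(sizes[a], models[a], sizes[b], models[b]), a, b)
            for a in adj for b in adj[a] if a < b]
    heapq.heapify(heap)
    children, parent = {}, {}
    next_id = max(sizes) + 1
    while heap:
        _, a, b = heapq.heappop(heap)
        if a not in adj or b not in adj:
            continue  # stale edge: one endpoint has already been merged
        new, next_id = next_id, next_id + 1
        children[new] = (a, b)
        parent[a] = parent[b] = new
        sizes[new] = sizes[a] + sizes[b]
        models[new] = (sizes[a] * models[a] + sizes[b] * models[b]) / sizes[new]
        # The parent inherits the neighbours of both children.
        nbrs = (adj[a] | adj[b]) - {a, b}
        del adj[a], adj[b]
        adj[new] = nbrs
        for n in nbrs:
            adj[n] -= {a, b}
            adj[n].add(new)
            cost = merging_cost(sizes[n], models[n], sizes[new], models[new])
            heapq.heappush(heap, (cost, n, new))
    return children, parent, sizes, models
```

On a chain of four unit-size volumes with models 10, 12, 50 and 52, the two similar pairs merge first and the dissimilar pair merges last, at the root.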

The final node is the root node, which represents an aggregate of the whole video. The lowest leaves are the same as the original pre-segmentation. The other nodes are the result of the selective merging process. The BPT represents a multilevel view of video segmentation. The higher levels give a small number of large regions (under-segmentation) while the lower levels give a large number of small and often meaningless over-segmented volumes. Figure 2 shows part of the top levels of the BPT for the Carphone video. The number of original volumes before merging is 4894. The total number of nodes for the first 20 frames of the Carphone video is 9785 (3D). An illustration of the BPT and a plot of the corresponding volumes can be seen in Figure 2.

Figure 2(b) shows a plot of the volumes at level 2. The right node (yellow), which represents the car window, carries some simple semantic content, while the left node (red) is not semantically meaningful because it is under-segmented. When these trees are browsed manually [21], a human can decide to go up or down a level according to the quality of the segmentation. However, when automatic object selection is required, the absolute level in the BPT is not sufficient. Further analysis is needed to allow the algorithm to identify important nodes in the tree.

III. EVOLUTION ANALYSIS

Evolution analysis was proposed to identify the salient nodes in the BPT of a still image [21]. This proposal extends the idea to a BPT for volumetric video by also considering the temporal direction of the volume. The initial segmentation produces a vast number of tiny volumes. If the number of initial partitions is n, then there are n possible paths from a leaf to the root. The paths can be defined as P = {P1, P2, ..., Pn}, and each path is a collection of nodes from the lowest leaf towards the root. Every individual path is defined as Pi = {nd1, nd2, ..., ndl}, where l is the number of nodes along the path from the lowest leaf to the root; l can vary from path to path. Evolution is defined as:

$$f(k) = M(V_k), \quad k \in \{1, \dots, l\} \tag{2}$$

Here M(Vk) is the model of volume Vk, defined as the mean colour (CVk) plus the volumetric direction (DVk); α and β are constants that control the proportions of mean colour and volumetric direction.

$$M(V_k) = \alpha\, C_{V_k} + \beta\, D_{V_k} \tag{3}$$
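Equations (2) and (3) can be sketched as a walk from a leaf to the root followed by a weighted sum per node. The `parent`, `colour` and `direction` dictionaries are assumed data structures for illustration; the default weights are the values α = 0.9 and β = 0.1 reported in the experiments.

```python
def path_to_root(leaf, parent):
    """Nodes nd_1 .. nd_l from a leaf up to the root, following child -> parent links."""
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def evolution(path, colour, direction, alpha=0.9, beta=0.1):
    """f(k) = alpha * C(V_k) + beta * D(V_k) for every node on the path (equations 2 and 3)."""
    return [alpha * colour[v] + beta * direction[v] for v in path]
```

Each leaf yields one path, so a tree with n leaves yields n evolution curves, many of which share their upper nodes.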

Figure 3 plots the evolution function against the node number (k) for the 'Carphone' video. It begins at layer 1 (the lowest leaf) and runs up to layer 23. For the first 6 nodes, the colour and centroid movement remain steady at around 80. This means that for the first 6 layers the partitions are homogeneous, indicating the same salient object. A discontinuity is observed at k = 6, which is a cue that a merge has occurred between heterogeneous volumes. A prominent discontinuity is observed at k = 22; it is likely that at this point a merge has occurred between dissimilar objects, as can be seen in Figure 3(b).

In order to identify reluctant merges, a mathematical tool was proposed in [19], [21] and applied to single images. An adaptation of this formula is employed here to identify reluctant volume merging.

$$f'(k) = \lvert f(k) - f(k-1) \rvert, \quad k \in \{2, \dots, l\} \tag{4}$$

$$f''(k) = f(k-1) + f(k+1) - 2f(k), \quad k \in \{2, \dots, l-1\} \tag{5}$$

Fig. 3: Evolution function for the 'Carphone' video. (a) Grey-scale mean evolution. (b) First derivative and peaks during evolution.

Fig. 4: Plot of the peak node (k = 22) in the BPT and its projection in frame 1.

According to the merging process, the value of the first derivative is the difference between a child and its parent in the BPT. A reluctant merge between left and right children (siblings) can be identified by a high value of the first derivative. Figure 3(a) shows the evolution of the volume along a path; a straight, stable line means that homogeneous merging has taken place. Equation 4 calculates the magnitude of change along the path. According to equation 5, a peak of the evolution is observed at the zero crossing of the second derivative. The highest peak in Figure 3(b) occurs at k = 22, where the man's face starts to merge with the background (illustrated in Figure 4).
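The two discrete derivatives and the selection of the highest peak along one path can be sketched as follows. The list `f` holds the evolution values f(1) .. f(l) at indices 0 .. l-1, and the returned k is 1-based to match the text; the function names are hypothetical and the sketch assumes a path with at least two nodes.

```python
def first_difference(f):
    """|f(k) - f(k-1)| for k = 2..l; element 0 of the result corresponds to k = 2."""
    return [abs(f[k] - f[k - 1]) for k in range(1, len(f))]

def second_difference(f):
    """f(k-1) + f(k+1) - 2 f(k) for k = 2..l-1; element 0 corresponds to k = 2."""
    return [f[k - 1] + f[k + 1] - 2 * f[k] for k in range(1, len(f) - 1)]

def highest_peak(f):
    """Return (k, value) of the largest first difference, with k counted from 1."""
    d = first_difference(f)
    i = max(range(len(d)), key=d.__getitem__)
    return i + 2, d[i]
```

For the toy curve [80, 80, 81, 80, 120, 121] the largest jump is the step from 80 to 120, so the highest peak is reported at k = 5 with value 40.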

As can be seen in Figure 3(a), many peaks are identified along one path. The algorithm selects only the highest peak for each path. Because many paths are observed in the BPT, a set of redundant peak node candidates is produced. The redundancy arises from the structure of the tree: two or more paths must share the same peak node. Only unique peak node candidates are retained.

Among all peak node candidates, there may exist candidates that have a child-parent relationship, either direct or indirect. The algorithm must select only one of them. In this work, three selection rules are proposed: first, second and third. Each selection rule selects a candidate based on its position relative to the root. The first selection rule selects the candidate closest to the root (highest rank of k). The second selection rule selects the second-highest candidate. If there are three or more candidates with a child-parent relationship, the third selection rule selects the candidate with the third-highest rank of k (further below the root). If only two candidates are available in such a relationship, the third selection rule is ignored and the second selection rule is applied. In Figure 6(a) the third salient set is selected, and the corresponding segmentation of frame 1 is displayed in Figure 6(b). The second and first salient sets are displayed in Figures 6(c, d) and 6(e, f) respectively.
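One possible reading of the selection rules, restricted to the candidates lying on a single leaf-to-root path, is sketched below. The fallback for a chain with a single candidate (any rule returns that candidate) is an assumption, since the text only describes the two-candidate case; `related_chain` and `select_candidate` are hypothetical names.

```python
def related_chain(path, candidates):
    """Candidates lying on one leaf-to-root path, ordered from the root downward."""
    return [v for v in reversed(path) if v in candidates]

def select_candidate(chain, rule):
    """Pick one candidate from a chain of parent-child related candidates.

    chain: candidates ordered from closest-to-root downward.
    rule:  1 (first), 2 (second) or 3 (third).
    If the chain is shorter than the rule, the lowest available rank is used.
    """
    if not chain:
        raise ValueError("chain must contain at least one candidate")
    if rule not in (1, 2, 3):
        raise ValueError("rule must be 1, 2 or 3")
    rank = min(rule, len(chain))
    return chain[rank - 1]
```

Under this reading, a chain of two candidates gives the same node for the second and third rules, which matches the fallback described in the text.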

Rather than defining a single final partition, the algorithm is designed to identify a set of important nodes in the tree. The nodes of each set can lie at any level of the original BPT. The accumulated size of all nodes in a salient set is equal to the size of the video. No member of the set is a parent or child of another node within the set.
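The two properties of a salient set stated above (full coverage of the video and no ancestor relation between members) can be checked with a short sketch. The `parent` and `sizes` dictionaries are assumed data structures, and the function names are hypothetical.

```python
def is_ancestor(a, b, parent):
    """True if node a is a strict ancestor of node b in the tree."""
    while b in parent:
        b = parent[b]
        if b == a:
            return True
    return False

def is_valid_salient_set(nodes, parent, sizes, root):
    """Check that the nodes tile the whole video and that none contains another."""
    nodes = list(nodes)
    unrelated = not any(
        is_ancestor(a, b, parent) for a in nodes for b in nodes if a != b
    )
    covers_video = sum(sizes[v] for v in nodes) == sizes[root]
    return unrelated and covers_video
```

A check of this kind is cheap to run after the selection step and catches sets in which a node and one of its descendants were both kept.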

The proposed algorithm is summarized as follows:

1) Create a BPT of a video
   a) Pre-segment the video using the watershed algorithm
   b) Calculate the mean colour and centroid motion
   c) Form a Volume Adjacency Graph
   d) Iteratively merge the partitions until the root is obtained, updating the VAG at every step
2) Evolution analysis to identify the simplified BPT
   a) Prepare a matrix (P) of peaks consisting of (path number, node number, peak value)
   b) Calculate the peaks for each path (n times, where n is the number of pre-segmentation volumes listed in Table I), recording all l peaks, node numbers and peak values
   c) For all identified peaks, select the highest peak of each path and record it in the peak set. Keep only unique nodes in this set
   d) Add every node in the peak set as if it had no sibling
   e) Check whether there is a parent-child relationship (either direct or indirect), and select one of the nodes in a path according to the peak selection rule. We propose three options: the highest-level node (the closest to the root), the second and the third
3) Plot the salient video segmentation according to the peak nodes
   a) For simplicity, prune the BPT at the peak nodes and present the simplified BPT (base nodes = nodes in the peak set)
   b) Plot the resultant volumes corresponding to the peak set

Fig. 5: First 20 frames of the Carphone video: over-segmented 3D watershed result and corresponding Binary Partition Tree. (a) Over-segmented volume. (b) Corresponding BPT.

IV. EXPERIMENT RESULT

The experiments use publicly available video. Four video clips from www.xiph.org (carphone, foreman, soccer and stefan) are used to test the algorithm. The standard watershed algorithm with a 26-neighbourhood is selected for the pre-segmentation task; a fast implementation is provided in Matlab. The pre-segmentation algorithm produces a huge number of volumes, as can be seen in Table I below.

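| Video | Size | Pre-segmentation: nodes | Pre-segmentation: avg. size (voxels) | Pre-segmentation: avg. duration (frames) | Entire BPT: nodes | Entire BPT: levels | Edges |
|---|---|---|---|---|---|---|---|
| carphone | 144x176x20 | 4894 | 48.2 | 3.3 | 9785 | 113 | 18939 |
| foreman | 144x176x20 | 5673 | 38.9 | 3.6 | 11344 | 92 | 21484 |
| soccer | 144x176x20 | 7352 | 28.4 | 3.1 | 14701 | 146 | 27402 |
| stefan | 120x176x20 | 9232 | 16.1 | 2.7 | 18449 | 113 | 29712 |

TABLE I: Pre-Segmentation Result and BPT Before Simplification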

The over-segmented results in Table I show a vast number of small partitions across the frames, with an average size of around 16 to 48 voxels and an average duration of about 3 frames. As can be seen in the Edges column, the number of edges is much greater than the number of volumes/nodes, as one node can have many neighbouring nodes. According to the table, a node has around six neighbours on average. Figure 5(a) shows the original over-segmented volume and Figure 5(b) the corresponding BPT.

As seen in Figure 5(a), almost all the regions in the first frame are too small to be semantically meaningful. The BPT before simplification is too complex: it consists of thousands of nodes, the majority of which represent meaningless volumes. The initial segmentation, with every volume assigned a unique label and colour, is shown in Figure 5(a). In order to simplify the BPT and reduce the over-segmented partition, the evolution analysis discussed in Section III is applied.

A. Salient Volumetric Set

In this work the parameters of equation 3 are α = 0.9 and β = 0.1. These parameters determine the proportions of colour and volumetric direction respectively. According to the proposed algorithm, three sets of peaks are obtained. The first salient set is obtained by collecting, among the salient node candidates, the peaks that have the highest k (closest to the root). The second and third sets are obtained by collecting the peaks with the second- and third-highest k among the salient candidates.

Fig. 6: Simplified BPT and its corresponding segmentation result at frame 1 of the 'carphone' video. (a) Simplified BPT, third peak. (b) Segmentation result. (c) Simplified BPT, second peak. (d) Segmentation result. (e) Simplified BPT, first peak. (f) Segmentation result.

In order to simplify the BPT, all branches under the peak nodes are cut from the tree. Figures 6(a) and 6(b) show the simplified BPT and the first frame of the video for the third-highest peak set. The BPT still looks complicated, and the result is still over-segmented. Figures 6(c) and 6(d) show the result for the second peak set. The BPT looks slightly simpler; according to Table II, the third and second peak salient sets differ by only 8 nodes and 11 levels. Visually, the differences between the third and second peak salient sets cannot be clearly observed. In contrast, Figures 6(e) and 6(f) show the first (highest) peak set. The BPT becomes too simple, some video detail is lost, and excessive merging is evident: in Figure 6(f) the man's face has been excessively merged with the background.
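Cutting the branches below the peak nodes can be sketched as a traversal from the root that stops at every salient node. The `children` dictionary (parent id mapped to a pair of child ids) is an assumed representation of the BPT, and `prune` is a hypothetical name.

```python
def prune(children, root, salient):
    """Return the children map of the simplified tree, in which salient nodes are leaves."""
    pruned = {}
    stack = [root]
    while stack:
        v = stack.pop()
        if v in salient or v not in children:
            continue  # salient node or original leaf: keep it as a leaf
        pruned[v] = children[v]
        stack.extend(children[v])
    return pruned
```

Nodes above the salient set keep their structure, so the simplified tree still records how the salient volumes merge towards the root.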

Table II shows the simplification results for the four test videos. In general, the average node size is much larger than that of the pre-segmentation in Table I, and the average duration of a volume rises from about 3 frames to between 8 and 19 frames. The duration of a volume indicates the temporal consistency of a partition.

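| Video | Peak | Simplified BPT: salient nodes | Simplified BPT: levels | Average per node: size (pixels) | Average per node: duration |
|---|---|---|---|---|---|
| carphone | first | 39 | 9 | 12133 | 14 |
| carphone | second | 280 | 84 | 1904 | 11 |
| carphone | third | 288 | 93 | 1851 | 11 |
| foreman | first | 42 | 8 | 13439 | 19 |
| foreman | second | 624 | 64 | 932 | 10 |
| foreman | third | 632 | 64 | 920 | 10 |
| soccer | first | 39 | 10 | 24025 | 19 |
| soccer | second | 570 | 143 | 1276 | 11 |
| soccer | third | 580 | 143 | 1239 | 10 |
| stefan | first | 39 | 11 | 13740 | 15 |
| stefan | second | 906 | 86 | 722 | 8 |
| stefan | third | 908 | 86 | 721 | 8 |

TABLE II: Simplification Result of some Test Video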

Table II compares the result of the BPT simplification with the original BPT (Table I) for the test videos. The number of nodes and the number of levels of the simplified BPT decline sharply at the first (highest) peak: the tree becomes simpler, the volumes become larger and the volume duration increases.

V. CONCLUSION

As demonstrated by the experimental results, a set of salient nodes is obtained as a result of our controlled simplification. Three sets of salient partitions are obtained with different levels of detail. Higher levels of simplification lose information as a result of excessive merging. Every salient partition is a volume and can be projected onto the frames in which the partition exists. The nodes represent not only region information but also the temporal correlation of the important nodes over the video. Instead of using colour mean alone, each node has a temporal volumetric direction, which also gives the motion of the regions across temporally correlated frames. In future work the type of motion of a region will be identified: rotation, translation, warping, etc.

ACKNOWLEDGMENT

This work has been supported by a Directorate General of Higher Education (DGHE) Scholarship, Indonesian Ministry of Education.



REFERENCES

[1] K. Ngan and H. Li, “Image/Video Segmentation: Current Status, Trends, and Challenges,” in Video Segmentation and Its Applications, pp. 1–24, New York: Springer, 2011.

[2] A. Tremeau and P. Colantoni, “Regions adjacency graph applied to color image segmentation,” IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 735–744, 2000.

[3] J. Shi and J. Malik, “Motion segmentation and tracking using normalized cuts,” in Computer Vision, 1998. Sixth International Conference on, 1998.

[4] W. Tao, H. Jin, and Y. Zhang, “Color image segmentation based on mean shift and normalized cuts,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 37, pp. 1382–9, Oct. 2007.

[5] E. Tuncel and L. Onural, “Utilization of the Recursive Shortest Spanning Tree Algorithm for Video-Object Segmentation by 2-D,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 10, no. 5, pp. 776–781, 2000.

[6] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” International Journal of Computer Vision, vol. 59, pp. 167–181, Sept. 2004.

[7] M. Grundmann, V. Kwatra, M. Han, and I. Essa, “Efficient hierarchical graph-based video segmentation,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2141–2148, IEEE, June 2010.

[8] C. Xu, C. Xiong, and J. Corso, “Streaming hierarchical video segmentation,” Computer Vision – ECCV 2012, 2012.

[9] A. Cavallaro, O. Steiger, and T. Ebrahimi, “Tracking video objects in cluttered background,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, pp. 575–584, Apr. 2005.

[10] J. Wang and Y. Yagi, “Consecutive tracking and segmentation using adaptive mean-shift and graph cut,” in Robotics and Biomimetics. IEEE International Conference on, 2007.

[11] A. Setyanto, J. C. Wood, and M. Ghanbari, “Platform for Temporal Analysis of Binary Partition Tree,” in Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), 2013, (Poznan, Poland), pp. 45–50, 2013.

[12] C. Dorea, M. Pardas, and F. Marques, “Trajectory tree as an object-oriented hierarchical representation for video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 4, pp. 1–14, 2009.

[13] V. Badrinarayanan, I. Budvytis, and R. Cipolla, “Semi-supervised video segmentation using tree structured graphical models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 2751–64, Nov. 2013.

[14] M. El Saban and B. Manjunath, “Video region segmentation by spatio-temporal watersheds,” in Proceedings 2003 International Conference on Image Processing, vol. 1, pp. I–349–52, IEEE, 2003.

[15] D. DeMenthon and R. Megret, “Spatio-temporal segmentation of video by hierarchical mean shift analysis,” in Statistical Methods in Video Processing Workshop, p. 20, 2002.

[16] M. Ristivojevic and J. Konrad, “Space-time image sequence analysis: object tunnels and occlusion volumes,” IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society, vol. 15, pp. 364–76, Feb. 2006.

[17] Y.-p. Hung, Y.-p. Tsai, and C.-c. Lai, “A Bayesian approach to video object segmentation via merging 3D watershed volumes,” Object recognition supported by user interaction for service robots, vol. 1, pp. 496–499, 2002.

[18] P. Salembier and F. Marques, “Region-based representations of image and video: segmentation tools for multimedia services,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 8, pp. 1147–1169, 1999.

[19] H. Lu, J. C. Woods, and M. Ghanbari, “Binary Partition Tree for Semantic Object Extraction and Image Segmentation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, pp. 378–383, Mar. 2007.

[20] L. Vincent and P. Soille, “Watersheds in digital spaces: an efficient algorithm based on immersion simulations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 6, pp. 583–598, 1991.

[21] H. Lu, J. C. Woods, and M. Ghanbari, “A Platform for Region Space Analysis in Binary Partition Tree,” IADIS International Journal on Computer Science and Information Systems, vol. 2, no. 1, pp. 96–110, 2007.

