J. Appl. Environ. Biol. Sci., 7(3S)1-10, 2017
© 2017, TextRoad Publication
ISSN: 2090-4274. Journal of Applied Environmental and Biological Sciences
www.textroad.com
Corresponding Author: Halina Hassan, Intelligent Biometric Group, School of Electrical and Electronics Engineering,
Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Pulau Pinang, Malaysia,
E-mail:[email protected]
Lucas Kanade Optical Flow Computation from Superpixel based Intensity
Region for Facial Expression Feature Extraction
Halina Hassan1,2, Abduljalil Radman1, Shahrel Azmin Suandi1, Sazali Yaacob2
1Intelligent Biometric Group, School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Pulau Pinang, Malaysia
2Electrical, Electronics and Automation Section, Universiti Kuala Lumpur Malaysian Spanish Institute, 09000 Kulim Hi-Tech Park, Kedah, Malaysia
Received: February 21, 2017
Accepted: May 14, 2017
ABSTRACT
Analyzing facial expression video sequences is a challenging area in the computer vision field. This is because each expression involves one or more motions or positions of the muscles beneath the skin of the face, and is further affected by occlusions and lighting conditions. Existing methods in facial expression analysis either divide the face image into a rectangular grid or treat it holistically. Both approaches are incapable of fully representing the facial features, because the inherent structure of the face image is ignored. This paper introduces superpixels as a pre-processing technique that improves the performance of facial expression feature extraction. It estimates the face structure by segmentation based on a pre-set number of intensity-based clusters. First, the face is delineated using the Viola-Jones algorithm, followed by the pre-processing techniques. Then, the Lucas-Kanade optical flow algorithm is applied to extract the spatio-temporal data from the video sequences. Once the detected face is segmented, optical flow is computed within this region to track the motion between video frames. Results reveal that more information can be extracted using superpixels compared to a regular grid. The performance of the proposed method has been validated qualitatively using the extended Cohn-Kanade (CK+) facial expression database.
KEYWORDS: Superpixel, Feature Extraction, Optical Flow, Regular Grid, Facial Expression.
INTRODUCTION
Human-computer interaction, commonly known as HCI, is becoming increasingly important [1]. Over the past two decades, automated and real-time facial expression recognition (FER) has impacted many areas such as driver safety, image retrieval, video conferencing, cognitive science, human emotion analysis and virtual reality [5].
The focus is on developing natural human interaction with computers, modelled on normal human-to-human behaviour. As a result, facial expression research has received much attention, since expressions are well known as the most expressive indicators of human emotions [1, 5]. Facial expressions are easily recognized by human cognitive perception. However, in the field of computer vision and pattern recognition, developing an automatic facial expression recognition system is very complex, due to the complexity of image acquisition, uncontrolled environments, occlusions, intensity variations, pose angles, etc.
Computer vision applications have recently come to benefit increasingly from representing an image as a collection of superpixels [13, 19], in tasks such as object localization [4], 3D reconstruction [14], segmentation [16-17] and pose estimation [18]. Existing methods simply divide the face image into regular patches, ignoring the structure of the face components. In fact, such regular grid based methods suffer from the same deficiency: the inherent structure of the face image is ignored. Furthermore, a regular grid does not align with the intensity edges of the face image, leading to poor facial feature representation, as depicted in Figure 1. Therefore, instead of rigid rectangular patches, superpixels are introduced for improved alignment with the intensity edges of the face components. A superpixel typically contains pixels with similar texture and colour, which effectively represents the facial features. Hence, the face image can be segmented using superpixels in a way that follows the inherent face structure [11].
The superpixel has been proposed as the pre-processing method due to its following capabilities [19]:
1. Adhering well to image boundaries.
2. Reducing computational complexity during pre-processing.
Superpixel algorithms fall broadly into graph-based and gradient-ascent-based approaches, each with its own advantages and disadvantages [19]. In this paper, we use the SLIC algorithm, abbreviated from simple linear iterative clustering, due to its superior boundary recall compared to other methods [1], its flexibility in complexity and its public availability [19].
Figure 1: Image segmentation of a CK+ colour face image using SLIC superpixels of different sizes (top: 1000, middle: 500, bottom: 100, approximately)
Figure 1 shows that the more superpixels are assigned to the image, the more accurately the facial feature boundaries are represented. Magnifying the mouth region shows the clear difference between the regular grid and superpixel boundaries, as shown in Figure 2. In the regular grid, the upper lip corner does not lie in line with the grid shown in cyan; the horizontal red line shows the offset between the upper lip edges within the assigned grid. Meanwhile, the edge of the lip is accurately represented by the assigned superpixel region.
Figure 2: Comparison of regular grid (left) and superpixel boundaries (right) at the mouth region
Figure 3 shows how the image is segmented based on the mean RGB intensity with the pre-set number of regions in the program.
There are three major contributions presented in this paper:
1. Applying superpixels in facial feature extraction.
2. Comparing and analyzing the existing pre-processing method (regular grid) against the superpixel technique.
3. Integrating the optical flow algorithm with superpixel intensity-based segmented regions.
Figure 3: Depiction of an image with superpixels. Left: Image with the pre-set number of boundaries.
Right: Mean RGB values for each superpixel region
RELATED WORK
Pre-processing is an important step that affects the performance of facial expression recognition [32]. It involves preparing the 2-dimensional spatial data before the feature extraction and classification processes.
Performance can be improved by using different pre-processing methods. Recent research published in [5] compares pre-processing of facial images using the holistic representation (regular grid) and the local representation (region block). The work in [7] used salient facial patches as the pre-processing method and extracted the facial features using local binary patterns (LBP). Superpixels have been exploited to aid segmentation in several complicated guises, mostly to initialize segmentation [20, 22, 24, 26, 27]. The authors of [12] proposed a superpixel mid-level cues method incorporated into an image classification pipeline for superior scene description. Apart from that, research has been done on hand-crafted mid-level features targeted at encapsulating information on groups of pixels, such as patches [29], segments [30] and superpixels [31]. In [11], a novel superpixel-based face sketch-photo synthesis method was introduced that estimates the face structure through image segmentation.
Motivated by such work, we carry out experiments to demonstrate the significance of superpixels in the facial feature extraction pipeline. The superpixel segmentation is then integrated with the Lucas-Kanade feature extraction algorithm for temporal data extraction. The Lucas-Kanade method is employed because it is known to reduce computational complexity by using the minimum square error (MSE) criterion [28].
METHODOLOGY
In relation to affective computing, most facial expression research applies the ground truth of the six facial expressions defined in [5]: fear, happiness, disgust, sadness, anger and surprise. The easiest to recognize are the expressions of sadness and happiness [3]. In this paper, we applied the six emotions in the pre-processing stage and reveal the points that can be used for feature extraction. Figure 4 depicts the general process flow of the experiment.
Analyzing facial expressions involves deformation of the facial muscles, which changes the facial landmarks [15, 32]. Hence, superpixels are introduced to track the deformations and texture changes during the formation of a facial expression. The contribution of the superpixel method toward automatic facial expression recognition is the subject of this investigation.
The database used is the extended Cohn-Kanade (CK+) database, which extends the original with spontaneous facial expressions [9]. It consists of 593 image sequences performed by 120 subjects, aged 18 to 30 years, with video sequences ranging from 10 to 60 frames. The database is available online [8].
The hardware used to execute this experiment is a laptop with an Intel Core(TM) i5-5200 CPU at 2.2 GHz, 4 GB RAM and a 64-bit operating system.
Figure 4: Overview of experiment process flow
Superpixel
A superpixel groups pixels and divides them according to a pre-set number of regions. This reduces the processing complexity of the subsequent processes [1]. Existing superpixel methods include graph-based algorithms [21, 23] and gradient ascent algorithms. In graph-based algorithms, superpixels are generated by treating each pixel as a node in a graph, with edge weights between two nodes proportional to the similarity between neighboring pixels [19]. Meanwhile, in gradient ascent algorithms, clusters are iteratively refined until convergence to form superpixels [19]. Unfortunately, most state-of-the-art superpixel methods suffer from low-quality segmentation, inconsistent size and shape, multiple hard-to-tune parameters and high computational cost [1]. A comparison of several superpixel algorithms in terms of their boundaries can be found in [25].
SLIC is currently identified as one of the best methods to compute regular superpixels in terms of computation time and quality [34]. It generates roughly equal-sized superpixels by clustering pixels based on similarity and proximity in the image plane [1], adapting the k-means search algorithm to localized pixels. The clustering is done in the 5-dimensional [labxy] space, where [lab] is the CIELAB colour vector and [xy] is the pixel coordinate.
The algorithm is as shown in Algorithm 1 below:
Algorithm 1: SLIC superpixel segmentation
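The paper gives Algorithm 1 only by name; the sketch below is a minimal, grayscale Python rendition under stated assumptions (the helper `slic_sketch` is ours, intensity stands in for CIELAB colour, and a global nearest-center search replaces SLIC's 2S x 2S local window):

```python
import numpy as np

def slic_sketch(image, K=16, m=10.0, n_iter=5):
    """Simplified, grayscale SLIC-style clustering (sketch of Algorithm 1)."""
    H, W = image.shape
    S = int(np.sqrt(H * W / K))  # grid interval S = sqrt(N/K)
    # Seed cluster centers [intensity, x, y] at regular grid intervals S.
    ys, xs = np.meshgrid(np.arange(S // 2, H, S),
                         np.arange(S // 2, W, S), indexing="ij")
    centers = np.stack([image[ys, xs].ravel(),
                        xs.ravel().astype(float),
                        ys.ravel().astype(float)], axis=1)
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    feats = np.stack([image.ravel(), xx.ravel().astype(float),
                      yy.ravel().astype(float)], axis=1)
    for _ in range(n_iter):
        # D_s = d_int + (m / S) * d_xy  (grayscale stand-in for d_lab)
        d_int = np.abs(feats[:, None, 0] - centers[None, :, 0])
        d_xy = np.linalg.norm(feats[:, None, 1:] - centers[None, :, 1:],
                              axis=2)
        labels = np.argmin(d_int + (m / S) * d_xy, axis=1)
        # Re-average each cluster center over its assigned pixels.
        for k in range(len(centers)):
            members = feats[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(H, W)

# Example: segment a synthetic gradient image into roughly 16 regions.
img = np.add.outer(np.linspace(0, 1, 120), np.linspace(0, 1, 120)) / 2
labels = slic_sketch(img, K=16)
```

In the experiments reported below, K = 400 and m = 10 play the same roles: K sets the number of clusters and m trades colour similarity against spatial compactness.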
Feature Extraction
The feature extraction phase is a key component of any pattern recognition system [5]. It is the stage at which information from the facial activity during formation of the expression is extracted. Optical flow [6] is an important method of motion image analysis used to estimate the motion of brightness patterns between two frames. In this paper, Lucas-Kanade optical flow is employed because it is highly accurate for motion tracking and robust to noise [6, 10], and it is capable of tracking small motions [10]. Therefore, it can be used to extract temporal information in video facial expression analysis by estimating the optical flow motion field.
The aperture problem is an inherent ambiguity in estimating motion from the edges of objects within a frame. Normal velocity is a local phenomenon that occurs when there is insufficient local intensity structure to recover the full image velocity. In this case, only the component of velocity normal to the local intensity structure (for example, an edge) can be recovered [2].
For an image with $N$ pixels, the approximate size of each (roughly equally-sized) superpixel is $S = \sqrt{N/K}$, where $K$ is the pre-set number of superpixels. The superpixels are controlled by two parameters, $K$ and $m$. $C_k$ is the superpixel cluster center, where $k \in [1, K]$, seeded at regular grid intervals $S$. The distance measure $D_s$ is given by the sum of the CIELAB colour distance $d_{lab}$ and the image-plane distance $d_{xy}$ normalized by $S$, as shown in Equation (1):

$$D_s = d_{lab} + \frac{m}{S}\, d_{xy} \qquad (1)$$

where

$$d_{lab} = \sqrt{(l_k - l_i)^2 + (a_k - a_i)^2 + (b_k - b_i)^2}$$

$$d_{xy} = \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2}$$

Here $m$ is the variable that controls the superpixel compactness, and hence the shape of the superpixels: a greater $m$ results in more compact superpixels with greater spatial proximity. In our experiment, we used a compactness value of 10, and $K$ is set to 400 clusters to produce a number of regions comparable to the regular grid method. The optical flow is then computed from the cluster centers of the superpixel clusters.
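As a numeric illustration of Equation (1) with hypothetical pixel values: a 400 x 400 image with K = 400 gives S = sqrt(N/K) = 20, so with m = 10 the spatial term carries weight m/S = 0.5.

```python
import math

def d_lab(p, q):
    """Euclidean distance in CIELAB colour space."""
    return math.dist(p, q)

def d_xy(p, q):
    """Euclidean distance in the image plane."""
    return math.dist(p, q)

# Parameters matching the paper: K = 400 superpixels, compactness m = 10,
# on a hypothetical 400x400 image.
N, K, m = 400 * 400, 400, 10
S = math.sqrt(N / K)  # grid interval: 20.0

# Two hypothetical pixels: (l, a, b) colour vectors and (x, y) positions.
lab_i, xy_i = (52.0, 10.0, 8.0), (100.0, 100.0)
lab_k, xy_k = (50.0, 12.0, 9.0), (104.0, 97.0)

# d_lab = 3.0, d_xy = 5.0, so D_s = 3.0 + 0.5 * 5.0 = 5.5
D_s = d_lab(lab_i, lab_k) + (m / S) * d_xy(xy_i, xy_k)
print(round(S, 1), round(D_s, 2))  # 20.0 5.5
```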
Figure 5: Aperture problem and the brightness constraint line. Adapted from [33] with some modification.
One point on the brightness constraint line is the correct velocity. The velocity of smallest magnitude on that line is the normal velocity $\vec{V}_n$:

$$\vec{V}_n = \frac{-I_t}{|\nabla I|}\,\hat{n} \qquad (2)$$

where $\hat{n}$ is the unit vector in the direction of the local intensity gradient.
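A small numeric sketch of Equation (2), using hypothetical local derivatives: the recoverable normal velocity points along the unit intensity gradient with signed magnitude $-I_t / |\nabla I|$.

```python
import numpy as np

# Hypothetical local derivatives at one pixel.
Ix, Iy, It = 0.6, 0.8, -0.5
grad = np.array([Ix, Iy])
mag = np.linalg.norm(grad)  # |grad I| = 1.0 here
v_n = -It / mag             # signed normal speed: 0.5
n_hat = grad / mag          # unit vector along the intensity gradient
V_n = v_n * n_hat           # the normal velocity vector
print(v_n, V_n)             # 0.5 [0.3 0.4]
```

Only this component can be recovered from a single measurement; the full velocity requires pooling constraints over a neighborhood, which is what the least-squares estimator below does.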
The weighted least-squares (LS) estimator minimizes the squared errors [19] and is expressed as:

$$\sum_{(x,y)\in k} W^2(x, y)\left[\nabla I(x, y, t) \cdot \vec{V} + I_t(x, y, t)\right]^2 \qquad (3)$$

where $k$ is the spatial neighborhood of the central pixel and $W(x, y)$ is the weighting window function that determines the support of the estimator [2]. It assigns larger weights to pixels closer to the central pixel, as these pixels carry more information than the others [8].
In the regular grid segmentation, a 20×20 grid is assigned; the computational efficiency depends on the grid size. The total number of facial points from the grid is 400. The optical flow to be tracked is the displacement from the grid vertices [14].
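Equation (3) amounts to a small weighted least-squares solve per tracked point. The code below is a minimal sketch, not the authors' implementation; the window size and Gaussian weighting are assumptions. It stacks the spatial gradients into a matrix A, the temporal derivative into a vector b, and solves the normal equations A^T W^2 A v = -A^T W^2 b.

```python
import numpy as np

def lucas_kanade_point(I1, I2, x, y, win=7, sigma=1.5):
    """Weighted LS Lucas-Kanade velocity estimate at one point (Eq. 3)."""
    h = win // 2
    Iy, Ix = np.gradient(I1)  # spatial derivatives (row = y, col = x)
    It = I2 - I1              # temporal derivative between the two frames
    ys, xs = np.mgrid[y - h:y + h + 1, x - h:x + h + 1]
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = It[ys, xs].ravel()
    # Gaussian window: larger weight for pixels near the central pixel.
    w = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2)).ravel()
    W2 = np.diag(w ** 2)
    v, *_ = np.linalg.lstsq(A.T @ W2 @ A, -(A.T @ W2 @ b), rcond=None)
    return v  # (vx, vy)
```

Applied at every grid vertex or superpixel cluster center, this yields the motion field visualized in the results below.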
Facial Feature Detection
The images used from the database are the frontal views of the first and last frames of each expression from the same subject. The first and last frames are selected to obtain a significant displacement between the two frames; using two consecutive frames would result in an insignificant vector displacement [5].
At this stage, we used the Viola-Jones face detector to detect the face. We conducted a simple experiment to verify the capability of this detector. Figure 6 demonstrates that the algorithm is capable of detecting faces with occlusions, such as eyeglasses, and slightly rotated frontal views. This suggests that the Viola-Jones detector is robust to different image sizes, orientations and occlusions, a property that is crucial in automatic face recognition applications.
Figure 6: Face detection on CK+ image (left) and random image with occlusion and slight orientation
(right) using Viola Jones face detector
RESULTS AND DISCUSSION
Figure 7 shows the comparison after the optical flow has been computed using the two pre-processing methods: regular grid on the left and superpixel on the right. More optical flow was observed in the images with superpixel segmentation. This indicates that superpixels are capable of improving facial feature extraction, whereby more information can be captured compared to the conventional regular-patch segmentation.
[Figure 7 panels: regular grid (left column) vs. superpixel (right column) for the neutral, disgust, fear, happy, sad and surprise expressions]
Figure 7: Comparison of optical flow computed on each facial expression using regular grid (left) and
superpixel (right)
For further analysis, the resulting images are magnified to increase the visibility of the different optical flow computed using the two segmentation methods, as shown in Figure 8.
Figure 8: Comparison of the equally distributed grid on the eye region of the neutral expression (top) with the non-regular grid based on superpixels (bottom)
In Figure 8, it is observed that the superpixels adhere well to the eye shape, while the regular grid does not annotate the eye shape accurately. The difference is highlighted by the cyan line.
Figure 9: Comparison on the right eye region for the happy expression. Top: regular grid, bottom: superpixel
Figure 10: Comparison on the left eye region for the happy expression. Top: regular grid, bottom: superpixel
Figure 9 and Figure 10 show that the latter images have more contours adhering to the edges of the eyelid and the bottom line of the eye. This suggests that the facial feature extraction will be more accurate than with the regular grid pre-processing, because the edges of the eye shape can be extracted.
Figure 11: Comparison of the optical flow computed on consecutive frames for the happy expression using superpixel segmentation. The sequence runs from (a) to (f)
Figure 11 shows the difference in optical flow computed on consecutive frames. The continuity of the gradient motion tracking is clearly observed from one frame to the next, especially in the eye and mouth regions.
CONCLUSION
This investigation does not aim at accuracy competitive with the literature. It explores a segmentation method that is expected to improve the recognition rate through its capability to segment the image precisely along intensity boundaries. We obtained comparable results for facial expression feature extraction using the newly introduced pre-processing technique. The experiments have qualitatively shown that superpixels give better results in extracting features from facial expression image sequences. Apart from that, the capability of the Viola-Jones detector to detect faces with occlusions will contribute to fast processing in the recognition stage. The combination of superpixels and Lucas-Kanade optical flow is suitable for systems that require low computational complexity with high accuracy. Further work can be done in the classification stage, where quantitative data will be generated: the optical flow, given by $V_x$ and $V_y$ (the velocities in the x and y directions respectively), can be translated into quantitative data via the classification process. We anticipate an improvement in facial expression accuracy from applying the proposed pre-processing technique.
ACKNOWLEDGMENT
The authors would like to acknowledge Universiti Sains Malaysia and Malaysian Spanish Institute,
Universiti Kuala Lumpur for the material, resources and expertise support in preparing this research. This
research is fully supported by research university individual grant from Universiti Sains Malaysia, Grant No.
1001/PELECT/814208.
REFERENCES
1. Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, 2012. SLIC Superpixels Compared to
State-of-the-Art Superpixel Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (11):
2274-2282.
2. Barron, J.L. and N.A. Thacker, 2005. Tutorial: Computing 2D and 3D optical flow. Retrieved from
https://pdfs.semanticscholar.org/7123/85a47d0d01400c8d86ab2bd8b3380d126760.pdf.
3. Fan, X. and T. Tjahjadi, 2015. A Spatial-Temporal Framework Based on Histogram of Gradients and Optical
Flow for Facial Expression Recognition in Video Sequences. Pattern Recognition, 48 (11): 3407-3416.
4. Fulkerson, B., A. Vedaldi and S. Soatto, 2009. Class Segmentation and Object Localization with Superpixel
Neighborhoods. In the Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, pp:
670-677.
5. Ghimire, D., S. Jeong, J. Lee and S.H. Park, 2016. Facial Expression Recognition based on Local Region Specific Features and Support Vector Machines. Multimedia Tools and Applications, 76 (6): 7803-7821.
6. Guojiang, W. and F. Kechang, 2010. Facial Expression Recognition Based on Extended Optical Flow
Constraint. In the Proceedings of the 2010 IEEE International Conference on Intelligent Computation
Technology and Automation, pp: 297-300.
7. Happy, S.L. and A. Routray, 2014. Automatic Facial Expression Recognition Using Features of Salient Facial
Patches. IEEE Transactions on Affective Computing, 6 (1): 1-12.
8. Hassan, H., S. Yaacob, A. Radman and S.A. Suandi, 2016. Eye State Detection for Driver Inattention based on Lucas Kanade Optical Flow Algorithm. In the Proceedings of the 2016 IEEE International Conference on Intelligent and Advanced Systems, pp: 1-6.
9. Lucey, P., J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, I. Matthews and F. Ave, 2010. The Extended Cohn-
Kanade Dataset (CK+): A Complete Dataset for Action Unit and Emotion-Specified Expression. In the
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, pp: 94-101.
10. Mahmoudi, S.A., M. Kierzynka, P. Manneback and K. Kurowski, 2014. Real-Time Motion Tracking Using Optical Flow on Multiple GPUs. Bulletin of the Polish Academy of Sciences, Technical Sciences, 62 (1): 139-150.
11. Peng, C., X. Gao, N. Wang and J. Li, 2015. Superpixel-based Face Sketch-Photo Synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 27 (2): 288-289.
12. Tasli, H.E., R. Sicre and T. Gevers, 2015. SuperPixel Based Mid-Level Image Description for Image Recognition. Journal of Visual Communication and Image Representation, 33: 301-308.
13. Ren, X. and J. Malik, 2003. Learning a Classification Model for Segmentation. In the Proceedings of the 2003 IEEE International Conference on Computer Vision, pp: 10-17.
14. Saxena, A., M. Sun and A.Y. Ng, 2007. Learning 3-D Scene Structure from a Single Still Image. In the Proceedings of the 2007 International Conference on Computer Vision, pp: 1-8.
15. Sormaz, M., A.W. Young and T.J. Andrews, 2016. Contributions of Feature Shapes and Surface Cues to the
Recognition of Facial Expressions. Vision Research, 127: 1-10.
16. Wang, X. and X.P. Zhang, 2009. A New Localized Superpixel MARKOV Random Field for Image
Segmentation. In the Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, pp:
642-645.
17. Yang, Y., S. Hallman, D. Ramanan and C. Fowlkes, 2010. Layered object detection for multi-class
segmentation. In the Proceedings of the 2010 IEEE Computer Society, Conference on Computer Vision and
Pattern Recognition, pp: 3113-3120.
18. Mori, G., 2005. Guiding Model Search Using Segmentation. In the Proceedings of the 2005 10th IEEE International Conference on Computer Vision, pp: 1417-1423.
19. Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, 2012. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (11): 2274-2282.
20. Arbeláez, P., M. Maire, C. Fowlkes and J. Malik, 2009. From Contours to Regions: An Empirical Evaluation.
In the Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, pp: 2294-2301.
21. Barbu, A. and S.C. Zhu, 2003. Graph Partition by Swendsen-Wang Cuts. In the Proceedings of the 2003 9th IEEE International Conference on Computer Vision, pp: 1-8.
22. Ding, L. and A. Yilmaz, 2010. Interactive Image Segmentation Using Probabilistic Hypergraphs. Pattern
Recognition, 43(5): 1863-1873.
23. Felzenszwalb, P.F. and D.P. Huttenlocher, 2004. Efficient Graph-Based Image Segmentation. International
Journal of Computer Vision, 59(2): 167-181.
24. Li, Z., X. M. Wu and S.F. Chang, 2012. Segmentation Using Superpixels: A Bipartite Graph Partitioning
Approach.In the Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp:
789-796.
25. Stutz, D., 2014. Superpixel Algorithms: Overview and Comparison. Retrieved from http://davidstutz.de/superpixel-algorithms-overview-comparison/.
26. Wang, J., Y. Jia, X.S. Hua, C. Zhang and L. Quan, 2008. Normalized Tree Partitioning for Image Segmentation. In the Proceedings of the 2008 26th IEEE Conference on Computer Vision and Pattern Recognition, pp: 1-8.
27. Mobahi, H., S.R. Rao, A.Y. Yang, S.S. Sastry and Y. Ma, 2011. Segmentation of Natural Images by Texture
and Boundary Compression. International Journal of Computer Vision, 95(1): 86-98.
28. Yang, A.Y., J. Wright, Y. Ma and S.S. Sastry, 2008. Unsupervised Segmentation of Natural Images via Lossy Data Compression. Computer Vision and Image Understanding, 110 (2): 212-225.
29. Singh, S., A. Gupta and A.A. Efros, 2012. Unsupervised Discovery of Mid-Level Discriminative Patches. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science, vol. 7573 (eds A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato and C. Schmid), pp. 73-86. Springer, Berlin.
30. Tighe, J. and S. Lazebnik, 2010. SuperParsing: Scalable Nonparametric Image Parsing with Superpixels. In: Computer Vision - ECCV 2010. Lecture Notes in Computer Science (eds K. Daniilidis, P. Maragos and N. Paragios), pp. 352-365. Springer, Berlin.
31. Fernando, B., E. Fromont and T. Tuytelaars, 2012. Effective Use of Frequent Itemset Mining for Image Classification. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science (eds A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato and C. Schmid), pp. 214-227. Springer, Berlin.
32. Tian, Y.L., T. Kanade and J.F. Cohn, 2005. Facial Expression Analysis. In: Handbook of Face Recognition, pp. 247-275. Springer, New York.
33. Marques, O., 2011. Practical Image and Video Processing Using MATLAB. John Wiley and Sons.
34. Machairas, V., E. Decenciere and T. Walter, 2014. Waterpixels: Superpixels Based on the Watershed Transformation. In the Proceedings of the 2014 IEEE International Conference on Image Processing, pp: 4343-4347.