J. Appl. Environ. Biol. Sci., 7(3S)1-10, 2017
© 2017, TextRoad Publication
ISSN: 2090-4274. Journal of Applied Environmental and Biological Sciences
www.textroad.com
Corresponding Author: Halina Hassan, Intelligent Biometric Group, School of Electrical and Electronics Engineering,
Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Pulau Pinang, Malaysia,
E-mail:[email protected]
Lucas Kanade Optical Flow Computation from Superpixel based Intensity
Region for Facial Expression Feature Extraction
Halina Hassan1,2, Abduljalil Radman1, Shahrel Azmin Suandi1, Sazali Yaacob2
1Intelligent Biometric Group, School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Pulau Pinang, Malaysia
2Electrical, Electronics and Automation Section, Universiti Kuala Lumpur Malaysian Spanish Institute, 09000 Kulim Hi-Tech Park, Kedah, Malaysia
Received: February 21, 2017
Accepted: May 14, 2017
ABSTRACT
Analyzing facial expression video sequences is a challenging area in the computer vision field. This is because each expression involves one or more motions or positions of the muscles beneath the skin of the face, and is further affected by occlusions and lighting conditions. Existing methods in facial expression analysis either divide the face image into a rectangular grid or treat it holistically. Both approaches are incapable of fully representing the facial features, because the inherent structure of the face image is ignored. This paper introduces superpixels as a pre-processing technique that improves the performance of facial expression feature extraction. It estimates the face structure by segmentation based on a pre-set number of intensity-based clusters. First, the face is delineated using the Viola-Jones algorithm, followed by the pre-processing techniques. Then, the Lucas-Kanade optical flow algorithm is applied to extract the spatio-temporal data from the video sequences. Once the detected face is segmented, optical flow is computed within this region to track the motion between video frames. Results reveal that more information can be extracted using superpixels compared to a regular grid. The performance of the proposed method has been validated qualitatively using the extended Cohn-Kanade (CK+) facial expression database.
KEYWORDS: Superpixel, Feature Extraction, Optical Flow, Regular Grid, Facial Expression.
INTRODUCTION
Human-computer interaction, commonly known as HCI, is becoming increasingly important [1]. Over the past two decades, automated and real-time facial expression recognition (FER) has impacted many areas such as driver safety, image retrieval, video conferencing, cognitive science, human emotion analysis and virtual reality [5].
The focus is on developing natural human interaction with computers, modelled on normal human-to-human behaviour. As a result, facial expression research has received much attention, since expressions are well known as the most expressive indicators of human emotions [1, 5]. Facial expressions are easily recognized by human cognitive perception. However, in the field of computer vision and pattern recognition, developing an automatic facial expression recognition system is very complex, due to the complexity of image acquisition, uncontrolled environments, occlusions, intensity variations, pose angles, etc.
Computer vision applications have recently come to benefit increasingly from representing an image as a collection of superpixels [13, 19], in tasks such as object localization [4], 3D reconstruction [14], segmentation [16-17] and pose estimation [18]. Existing methods simply divide the face image into regular patches, ignoring the structure of the face components. In fact, such regular grid based methods suffer from the same deficiency: the inherent structure of the face image is ignored. Furthermore, a regular grid does not align with the intensity edges of the face image, leading to poor facial feature representation, as depicted in Figure 1. Therefore, instead of rigid rectangular patches, superpixels are introduced for improved alignment with the intensity edges of the face components. A superpixel typically contains pixels with similar texture and colour, which effectively represents the facial features. Hence, the face image can be segmented using superpixels in a way that follows the inherent face structure [11].
The superpixel has been proposed as the pre-processing method due to its following capabilities [19]:
1. Adhering well to image boundaries.
2. Reducing computational complexity during pre-processing.
Superpixel algorithms fall broadly into graph-based and gradient-ascent-based approaches, each with its own advantages and disadvantages [19]. In this paper, we use the SLIC algorithm, abbreviated from simple linear iterative clustering, due to its superior boundary recall compared to other methods [1], its flexibility in complexity and its public availability [19].
Figure 1: Image segmentation of a CK+ colour face image using SLIC superpixels of different sizes (top: 1000, middle: 500, bottom: 100, approximately)
Figure 1 shows that the more superpixels are assigned to the image, the more accurately the facial feature boundaries are represented. Magnifying the mouth region shows the clear difference between the regular grid and superpixel boundaries, as shown in Figure 2. In the regular grid, the upper lip corner does not lie in line with the grid shown in cyan; the horizontal red line shows the offset between the upper lip edges within the assigned grid. Meanwhile, the edge of the lip is accurately represented by the assigned superpixel region.
Figure 2: Comparison of regular grid (left) and superpixel boundaries (right) at the mouth region
Figure 3 shows how the image is segmented based on the mean RGB intensity with the pre-set number of regions in the program.
There are three major contributions presented in this paper:
1. Applying superpixels in facial feature extraction.
2. Comparing and analyzing the existing pre-processing method (regular grid) against the superpixel technique.
3. Integrating the optical flow algorithm with superpixel intensity-based segmented regions.
Figure 3: Depiction of an image with superpixels. Left: Image with the pre-set number of boundaries.
Right: Mean RGB values for each superpixel region
RELATED WORK
Pre-processing is an important step that affects the performance of facial expression recognition [32]. It involves preparing the 2-dimensional spatial data before the feature extraction and classification processes.
Performance can be improved by using different pre-processing methods. Recent research published in [5] compares pre-processing of facial images using the holistic representation (regular grid) and the local representation (region block). The work in [7] used salient facial patches as the pre-processing method and extracted the facial features using local binary patterns (LBP). Superpixels have been exploited to aid segmentation in several complicated guises, mostly to initialize segmentation [20, 22, 24, 26, 27]. The authors of [12] proposed a superpixel mid-level cues method incorporated into an image classification pipeline for superior scene description. Apart from that, research has been done on hand-crafted mid-level features targeted at encapsulating information on groups of pixels, such as patches [29], segments [30] and superpixels [31]. In [11], a novel superpixel-based face sketch-photo synthesis method was introduced that estimates the face structure through image segmentation.
Motivated by such work, we carry out experiments to demonstrate the significance of superpixels in the facial feature extraction pipeline. The superpixel segmentation is then integrated with the Lucas-Kanade feature extraction algorithm for temporal data extraction. The Lucas-Kanade method is employed because it is known to reduce computational complexity by using the minimum square error (MSE) criterion [28].
METHODOLOGY
In relation to affective computing, most facial expression research applies the ground truth of the six facial expressions defined in [5]: fear, happiness, disgust, sadness, anger and surprise. The easiest to recognize are the expressions of sadness and happiness [3]. In this paper, we applied the six emotions in the pre-processing stage and reveal the points that can be used for feature extraction. Figure 4 depicts the general process flow of the experiment.
Analyzing facial expressions involves deformation of the facial muscles, which changes the facial landmarks [15, 32]. Hence, superpixels are introduced to track the deformations and texture changes during the formation of a facial expression. The contribution of the superpixel method toward automatic facial expression recognition is the subject of this investigation.
The database used is the extended Cohn-Kanade (CK+) database, which extends the original with spontaneous facial expressions [9]. It consists of 593 image sequences performed by 120 subjects, aged 18 to 30 years, with video sequences ranging from 10 to 60 frames. The database is available online [8].
The hardware used to execute this experiment is a laptop with an Intel Core(TM) i5-5200 CPU at 2.2 GHz, 4 GB RAM and a 64-bit operating system.
Figure 4: Overview of experiment process flow
Superpixel
A superpixel groups pixels and divides them according to a pre-set number of regions. This reduces the processing complexity of the subsequent processes [1]. Existing superpixel methods include graph-based algorithms [21, 23] and gradient ascent algorithms. In graph-based algorithms, superpixels are generated by treating each pixel as a node in a graph, with edge weights between two nodes proportional to the similarity between neighboring pixels [19]. Meanwhile, in gradient ascent algorithms, clusters are iteratively refined until convergence to form superpixels [19]. Unfortunately, most state-of-the-art superpixel methods suffer from low-quality segmentation, inconsistent size and shape, multiple hard-to-tune parameters and high computational cost [1]. A comparison of several superpixel algorithms in terms of their boundaries can be found in [25].
SLIC is currently identified as one of the best methods to compute regular superpixels in terms of computation time and quality [34]. It generates roughly equal-sized superpixels by clustering pixels based on similarity and proximity in the image plane [1], adapting the k-means search algorithm to localized pixels. The clustering is done in the 5-dimensional [labxy] space, where [lab] is the CIELAB colour vector and [xy] is the pixel coordinate.
The algorithm is as shown in Algorithm 1 below:
Algorithm 1: SLIC superpixel segmentation
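The paper gives Algorithm 1 only by name; the sketch below is a minimal, grayscale Python rendition under stated assumptions (the helper `slic_sketch` is ours, intensity stands in for CIELAB colour, and a global nearest-center search replaces SLIC's 2S x 2S local window):

```python
import numpy as np

def slic_sketch(image, K=16, m=10.0, n_iter=5):
    """Simplified, grayscale SLIC-style clustering (sketch of Algorithm 1)."""
    H, W = image.shape
    S = int(np.sqrt(H * W / K))  # grid interval S = sqrt(N/K)
    # Seed cluster centers [intensity, x, y] at regular grid intervals S.
    ys, xs = np.meshgrid(np.arange(S // 2, H, S),
                         np.arange(S // 2, W, S), indexing="ij")
    centers = np.stack([image[ys, xs].ravel(),
                        xs.ravel().astype(float),
                        ys.ravel().astype(float)], axis=1)
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    feats = np.stack([image.ravel(), xx.ravel().astype(float),
                      yy.ravel().astype(float)], axis=1)
    for _ in range(n_iter):
        # D_s = d_int + (m / S) * d_xy  (grayscale stand-in for d_lab)
        d_int = np.abs(feats[:, None, 0] - centers[None, :, 0])
        d_xy = np.linalg.norm(feats[:, None, 1:] - centers[None, :, 1:],
                              axis=2)
        labels = np.argmin(d_int + (m / S) * d_xy, axis=1)
        # Re-average each cluster center over its assigned pixels.
        for k in range(len(centers)):
            members = feats[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return labels.reshape(H, W)

# Example: segment a synthetic gradient image into roughly 16 regions.
img = np.add.outer(np.linspace(0, 1, 120), np.linspace(0, 1, 120)) / 2
labels = slic_sketch(img, K=16)
```

In the experiments reported below, K = 400 and m = 10 play the same roles: K sets the number of clusters and m trades colour similarity against spatial compactness.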
Feature Extraction
The feature extraction phase is a key component of any pattern recognition system [5]. It is the stage at which information from the facial activity during formation of the expression is extracted. Optical flow [6] is an important method of motion image analysis used to estimate the motion of brightness patterns between two frames. In this paper, Lucas-Kanade optical flow is employed because it is highly accurate for motion tracking and robust to noise [6, 10], and it is capable of tracking small motions [10]. Therefore, it can be used to extract temporal information in video facial expression analysis by estimating the optical flow motion field.
The aperture problem is an inherent ambiguity in estimating motion from the edges of objects within a frame. Normal velocity is a local phenomenon that occurs when there is insufficient local intensity structure to recover the full image velocity. In this case, only the component of velocity normal to the local intensity structure (for example, an edge) can be recovered [2].
For an image with $N$ pixels, the approximate size of each (roughly equally-sized) superpixel is $S = \sqrt{N/K}$, where $K$ is the pre-set number of superpixels. The superpixels are controlled by two parameters, $K$ and $m$. $C_k$ is the superpixel cluster center, where $k \in [1, K]$, seeded at regular grid intervals $S$. The distance measure $D_s$ is given by the sum of the CIELAB colour distance $d_{lab}$ and the image-plane distance $d_{xy}$ normalized by $S$, as shown in Equation (1):

$$D_s = d_{lab} + \frac{m}{S}\, d_{xy} \qquad (1)$$

where

$$d_{lab} = \sqrt{(l_k - l_i)^2 + (a_k - a_i)^2 + (b_k - b_i)^2}$$

$$d_{xy} = \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2}$$

Here $m$ is the variable that controls the superpixel compactness, and hence the shape of the superpixels: a greater $m$ results in more compact superpixels with greater spatial proximity. In our experiment, we used a compactness value of 10, and $K$ is set to 400 clusters to produce a number of regions comparable to the regular grid method. The optical flow is then computed from the cluster centers of the superpixel clusters.
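As a numeric illustration of Equation (1) with hypothetical pixel values: a 400 x 400 image with K = 400 gives S = sqrt(N/K) = 20, so with m = 10 the spatial term carries weight m/S = 0.5.

```python
import math

def d_lab(p, q):
    """Euclidean distance in CIELAB colour space."""
    return math.dist(p, q)

def d_xy(p, q):
    """Euclidean distance in the image plane."""
    return math.dist(p, q)

# Parameters matching the paper: K = 400 superpixels, compactness m = 10,
# on a hypothetical 400x400 image.
N, K, m = 400 * 400, 400, 10
S = math.sqrt(N / K)  # grid interval: 20.0

# Two hypothetical pixels: (l, a, b) colour vectors and (x, y) positions.
lab_i, xy_i = (52.0, 10.0, 8.0), (100.0, 100.0)
lab_k, xy_k = (50.0, 12.0, 9.0), (104.0, 97.0)

# d_lab = 3.0, d_xy = 5.0, so D_s = 3.0 + 0.5 * 5.0 = 5.5
D_s = d_lab(lab_i, lab_k) + (m / S) * d_xy(xy_i, xy_k)
print(round(S, 1), round(D_s, 2))  # 20.0 5.5
```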
Figure 5: Aperture problem and the brightness constraint line. Adapted from [33] with some modification.
One point on the brightness constraint line is the correct velocity. The velocity of smallest magnitude on that line is the normal velocity $\vec{V}_n$:

$$\vec{V}_n = \frac{-I_t}{|\nabla I|}\,\hat{n} \qquad (2)$$

where $\hat{n}$ is the unit vector in the direction of the local intensity gradient.
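A small numeric sketch of Equation (2), using hypothetical local derivatives: the recoverable normal velocity points along the unit intensity gradient with signed magnitude $-I_t / |\nabla I|$.

```python
import numpy as np

# Hypothetical local derivatives at one pixel.
Ix, Iy, It = 0.6, 0.8, -0.5
grad = np.array([Ix, Iy])
mag = np.linalg.norm(grad)  # |grad I| = 1.0 here
v_n = -It / mag             # signed normal speed: 0.5
n_hat = grad / mag          # unit vector along the intensity gradient
V_n = v_n * n_hat           # the normal velocity vector
print(v_n, V_n)             # 0.5 [0.3 0.4]
```

Only this component can be recovered from a single measurement; the full velocity requires pooling constraints over a neighborhood, which is what the least-squares estimator below does.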
The weighted least-squares (LS) estimator minimizes the squared errors [19] and is expressed as:

$$\sum_{(x,y)\in k} W^2(x, y)\left[\nabla I(x, y, t) \cdot \vec{V} + I_t(x, y, t)\right]^2 \qquad (3)$$

where $k$ is the spatial neighborhood of the central pixel and $W(x, y)$ is the weighting window function that determines the support of the estimator [2]. It assigns larger weights to pixels closer to the central pixel, as these pixels carry more information than the others [8].
In the regular grid segmentation, a 20×20 grid is assigned; the computational efficiency depends on the grid size. The total number of facial points from the grid is 400. The optical flow to be tracked is the displacement from the grid vertices [14].
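Equation (3) amounts to a small weighted least-squares solve per tracked point. The code below is a minimal sketch, not the authors' implementation; the window size and Gaussian weighting are assumptions. It stacks the spatial gradients into a matrix A, the temporal derivative into a vector b, and solves the normal equations A^T W^2 A v = -A^T W^2 b.

```python
import numpy as np

def lucas_kanade_point(I1, I2, x, y, win=7, sigma=1.5):
    """Weighted LS Lucas-Kanade velocity estimate at one point (Eq. 3)."""
    h = win // 2
    Iy, Ix = np.gradient(I1)  # spatial derivatives (row = y, col = x)
    It = I2 - I1              # temporal derivative between the two frames
    ys, xs = np.mgrid[y - h:y + h + 1, x - h:x + h + 1]
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = It[ys, xs].ravel()
    # Gaussian window: larger weight for pixels near the central pixel.
    w = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2)).ravel()
    W2 = np.diag(w ** 2)
    v, *_ = np.linalg.lstsq(A.T @ W2 @ A, -(A.T @ W2 @ b), rcond=None)
    return v  # (vx, vy)
```

Applied at every grid vertex or superpixel cluster center, this yields the motion field visualized in the results below.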
Facial Feature Detection
The images used from the database are the frontal views of the first and last frames of each expression from the same subject. The first and last frames are selected to obtain a significant displacement between the two frames; using two consecutive frames would result in an insignificant vector displacement [5].
At this stage, we used the Viola-Jones face detector to detect the face. We conducted a simple experiment to verify the capability of this detector. Figure 6 demonstrates that the algorithm is capable of detecting faces with occlusions, such as eyeglasses, and slightly rotated frontal views. This suggests that the Viola-Jones detector is robust to different image sizes, orientations and occlusions, a property that is crucial in automatic face recognition applications.
Figure 6: Face detection on CK+ image (left) and random image with occlusion and slight orientation
(right) using Viola Jones face detector
RESULTS AND DISCUSSION
Figure 7 shows the comparison after the optical flow has been computed using the two pre-processing methods: regular grid on the left and superpixel on the right. More optical flow was observed in the images with superpixel segmentation. This indicates that superpixels are capable of improving facial feature extraction, whereby more information can be captured compared to the conventional regular-patch segmentation.
[Figure 7 panels: regular grid (left column) vs. superpixel (right column) for the neutral, disgust, fear, happy, sad and surprise expressions]
Figure 7: Comparison of optical flow computed on each facial expression using regular grid (left) and
superpixel (right)
For further analysis, the resulting images are magnified to increase the visibility of the different optical flow computed using the two segmentation methods, as shown in Figure 8.
Figure 8: Comparison of the equally distributed grid on the eye region of the neutral expression (top) with the non-regular grid based on superpixels (bottom)
In Figure 8, it is observed that the superpixels adhere well to the eye shape, while the regular grid does not annotate the eye shape accurately. The difference is highlighted by the cyan line.
Figure 9: Comparison on the right eye region for the happy expression. Top: regular grid, bottom: superpixel
Figure 10: Comparison on the left eye region for the happy expression. Top: regular grid, bottom: superpixel
Figure 9 and Figure 10 show that the latter images have more contours adhering to the edges of the eyelid and the bottom line of the eye. This suggests that the facial feature extraction will be more accurate than with the regular grid pre-processing, because the edges of the eye shape can be extracted.
Figure 11: Comparison of the optical flow computed on consecutive frames for the happy expression using superpixel segmentation. The sequence runs from (a) to (f)
Figure 11 shows the difference in optical flow computed on consecutive frames. The continuity of the gradient motion tracking is clearly observed from one frame to the next, especially in the eye and mouth regions.
CONCLUSION
This investigation does not aim at accuracy competitive with the literature. It explores a segmentation method that is expected to improve the recognition rate through its capability to segment the image precisely along intensity boundaries. We obtained comparable results for facial expression feature extraction using the newly introduced pre-processing technique. The experiments have qualitatively shown that superpixels give better results in extracting features from facial expression image sequences. Apart from that, the capability of the Viola-Jones detector to detect faces with occlusions will contribute to fast processing in the recognition stage. The combination of superpixels and Lucas-Kanade optical flow is suitable for systems that require low computational complexity with high accuracy. Further work can be done in the classification stage, where quantitative data will be generated: the optical flow, given by $V_x$ and $V_y$ (the velocities in the x and y directions respectively), can be translated into quantitative data via the classification process. We anticipate an improvement in facial expression accuracy from applying the proposed pre-processing technique.
ACKNOWLEDGMENT
The authors would like to acknowledge Universiti Sains Malaysia and Malaysian Spanish Institute,
Universiti Kuala Lumpur for the material, resources and expertise support in preparing this research. This
research is fully supported by research university individual grant from Universiti Sains Malaysia, Grant No.
1001/PELECT/814208.
REFERENCES
1. Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, 2012. SLIC Superpixels Compared to
State-of-the-Art Superpixel Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (11):
2274-2282.
2. Barron, J.L. and N.A. Thacker, 2005. Tutorial: Computing 2D and 3D optical flow. Retrieved from
https://pdfs.semanticscholar.org/7123/85a47d0d01400c8d86ab2bd8b3380d126760.pdf.
3. Fan, X. and T. Tjahjadi, 2015. A Spatial-Temporal Framework Based on Histogram of Gradients and Optical
Flow for Facial Expression Recognition in Video Sequences. Pattern Recognition, 48 (11): 3407-3416.
4. Fulkerson, B., A. Vedaldi and S. Soatto, 2009. Class Segmentation and Object Localization with Superpixel
Neighborhoods. In the Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, pp:
670-677.
5. Ghimire, D., S. Jeong, J. Lee and S.H. Park, 2016. Facial Expression Recognition based on Local Region Specific Features and Support Vector Machines. Multimedia Tools and Applications, 76 (6): 7803-7821.
6. Guojiang, W. and F. Kechang, 2010. Facial Expression Recognition Based on Extended Optical Flow
Constraint. In the Proceedings of the 2010 IEEE International Conference on Intelligent Computation
Technology and Automation, pp: 297-300.
7. Happy, S.L. and A. Routray, 2014. Automatic Facial Expression Recognition Using Features of Salient Facial
Patches. IEEE Transactions on Affective Computing, 6 (1): 1-12.
8. Hassan, H., S. Yaacob, A. Radman and S.A. Suandi, 2016. Eye State Detection for Driver Inattention based on Lucas Kanade Optical Flow Algorithm. In the Proceedings of the 2016 IEEE International Conference on Intelligent and Advanced Systems, pp: 1-6.
9. Lucey, P., J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, I. Matthews and F. Ave, 2010. The Extended Cohn-
Kanade Dataset (CK+): A Complete Dataset for Action Unit and Emotion-Specified Expression. In the
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Workshops, pp: 94-101.
10. Mahmoudi, S.A., M. Kierzynka, P. Manneback and K. Kurowski, 2014. Real-Time Motion Tracking Using Optical Flow on Multiple GPUs. Bulletin of the Polish Academy of Sciences, Technical Sciences, 62 (1): 139-150.
11. Peng, C., X. Gao, N. Wang and J. Li, 2015. Superpixel-based Face Sketch-Photo Synthesis. IEEE Transactions on Circuits and Systems for Video Technology, 27 (2): 288-289.
12. Tasli, H.E., R. Sicre and T. Gevers, 2015. SuperPixel Based Mid-Level Image Description for Image Recognition. Journal of Visual Communication and Image Representation, 33: 301-308.
13. Ren, X. and J. Malik, 2003. Learning a Classification Model for Segmentation. In the Proceedings of the 2003 IEEE International Conference on Computer Vision, pp: 10-17.
14. Saxena, A., M. Sun and A.Y. Ng, 2007. Learning 3-D Scene Structure from a Single Still Image. In the Proceedings of the 2007 International Conference on Computer Vision, pp: 1-8.
15. Sormaz, M., A.W. Young and T.J. Andrews, 2016. Contributions of Feature Shapes and Surface Cues to the
Recognition of Facial Expressions. Vision Research, 127: 1-10.
16. Wang, X. and X.P. Zhang, 2009. A New Localized Superpixel MARKOV Random Field for Image
Segmentation. In the Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, pp:
642-645.
17. Yang, Y., S. Hallman, D. Ramanan and C. Fowlkes, 2010. Layered object detection for multi-class
segmentation. In the Proceedings of the 2010 IEEE Computer Society, Conference on Computer Vision and
Pattern Recognition, pp: 3113-3120.
18. Mori, G., 2005. Guiding Model Search Using Segmentation. In the Proceedings of the 2005 10th IEEE International Conference on Computer Vision, pp: 1417-1423.
19. Achanta, R., A. Shaji, K. Smith, A. Lucchi, P. Fua and S. Süsstrunk, 2012. SLIC Superpixels Compared to State-of-the-Art Superpixel Methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (11): 2274-2282.
20. Arbeláez, P., M. Maire, C. Fowlkes and J. Malik, 2009. From Contours to Regions: An Empirical Evaluation.
In the Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, pp: 2294-2301.
21. Barbu, A. and S.C. Zhu, 2003. Graph Partition by Swendsen-Wang Cuts. In the Proceedings of the 2003 9th IEEE International Conference on Computer Vision, pp: 1-8.
22. Ding, L. and A. Yilmaz, 2010. Interactive Image Segmentation Using Probabilistic Hypergraphs. Pattern
Recognition, 43(5): 1863-1873.
23. Felzenszwalb, P.F. and D.P. Huttenlocher, 2004. Efficient Graph-Based Image Segmentation. International
Journal of Computer Vision, 59(2): 167-181.
24. Li, Z., X. M. Wu and S.F. Chang, 2012. Segmentation Using Superpixels: A Bipartite Graph Partitioning
Approach.In the Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp:
789-796.
25. Stutz, D., 2014. Superpixel Algorithms: Overview and Comparison. Retrieved from http://davidstutz.de/superpixel-algorithms-overview-comparison/.
26. Wang, J., Y. Jia, X.S. Hua, C. Zhang and L. Quan, 2008. Normalized Tree Partitioning for Image Segmentation. In the Proceedings of the 2008 26th IEEE Conference on Computer Vision and Pattern Recognition, pp: 1-8.
27. Mobahi, H., S.R. Rao, A.Y. Yang, S.S. Sastry and Y. Ma, 2011. Segmentation of Natural Images by Texture
and Boundary Compression. International Journal of Computer Vision, 95(1): 86-98.
28. Yang, A.Y., J. Wright, Y. Ma and S.S. Sastry, 2008. Unsupervised Segmentation of Natural Images via Lossy Data Compression. Computer Vision and Image Understanding, 110 (2): 212-225.
29. Singh, S., A. Gupta and A.A. Efros, 2012. Unsupervised Discovery of Mid-Level Discriminative Patches. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science, vol. 7573 (eds A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato and C. Schmid), pp. 73-86. Springer, Berlin.
30. Tighe, J. and S. Lazebnik, 2010. SuperParsing: Scalable Nonparametric Image Parsing with Superpixels. In: Computer Vision - ECCV 2010. Lecture Notes in Computer Science (eds K. Daniilidis, P. Maragos and N. Paragios), pp. 352-365. Springer, Berlin.
31. Fernando, B., E. Fromont and T. Tuytelaars, 2012. Effective Use of Frequent Itemset Mining for Image Classification. In: Computer Vision - ECCV 2012. Lecture Notes in Computer Science (eds A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato and C. Schmid), pp. 214-227. Springer, Berlin.
32. Tian, Y.L., T. Kanade and J.F. Cohn, 2005. Facial Expression Analysis. In: Handbook of Face Recognition, pp. 247-275. Springer, New York.
33. Marques, O., 2011. Practical Image and Video Processing Using MATLAB. John Wiley and Sons.
34. Machairas, V., E. Decenciere and T. Walter, 2014. Waterpixels: Superpixels Based on the Watershed Transformation. In the Proceedings of the 2014 IEEE International Conference on Image Processing, pp: 4343-4347.