Shape-based Quantification and Classification of 3D Face Data for Craniofacial Research
Katarzyna Wilamowska
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
2009
Program Authorized to Offer Degree: Computer Science and Engineering
University of Washington Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Katarzyna Wilamowska
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Linda Shapiro
Reading Committee:
Linda Shapiro
Maya R Gupta
James F Brinkley III
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.”
Signature
Date
University of Washington
Abstract
Shape-based Quantification and Classification of 3D Face Data for Craniofacial Research
Katarzyna Wilamowska
Chair of the Supervisory Committee:
Professor Linda Shapiro
Computer Science and Engineering
22q11.2DS has been shown to be one of the most common multiple anomaly syndromes in hu-
mans. Early detection is important as many affected individuals are born with a conotruncal
cardiac anomaly, mild-to-moderate immune deficiency and learning disabilities, all of which
can benefit from early intervention.
Given a set of labeled 3D training meshes acquired from stereo imaging of heads, the
goal of this dissertation is to develop a successful methodology for discriminating between
22q11.2DS affected individuals and the general population and for quantifying the degree of
dysmorphology of facial features. Although many approaches for such discrimination exist
in the medical and computer vision literature, the goal is to develop methods that focus on
3D shape of both the face as a whole and specific local features.
The main contributions of this work are: an automated methodology for pose alignment, au-
tomatic generation of global and local data representations, robust automatic placement of
landmarks, generation of local descriptors for nasal and oral facial features, and a 22q11.2DS
classification rate which rivals that of medical experts. The methods developed for the 22q11.2DS
phenotype should be widely applicable to the shape-based quantification of any other cran-
iofacial dysmorphology.
TABLE OF CONTENTS

List of Figures
List of Tables
Chapter 1: Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Paper Outline
Chapter 2: Related Literature
    2.1 Medical Craniofacial Assessment
    2.2 Computer Vision Craniofacial Analysis
Chapter 3: Ground Truth and Measures of Success
    3.1 Participant Specific Data
    3.2 Expert Survey
    3.3 Hand-labeled Landmarks
    3.4 Statistical Measures
Chapter 4: Data Preprocessing
    4.1 Data Source
    4.2 Data Cleaning
    4.3 Alignment Using Scanalyze
    4.4 Automatic 3D Pose Alignment
Chapter 5: Global Data Representations
    5.1 Snapshots
    5.2 2.5D Depth Images
    5.3 Curved Lines
    5.4 Symmetry
    5.5 Labeled Images
    5.6 Distance from Average
Chapter 6: Global Representation Results
    6.1 Preliminary Studies
    6.2 Experiments
Chapter 7: Local Data Representations
    7.1 Automatic Nasal Landmark Detection
    7.2 Automatic Oral Landmark Detection
    7.3 Landmark Distances
    7.4 Landmark-based Descriptors
    7.5 Shape-based Descriptors
Chapter 8: Local Representation Results
    8.1 Preliminary Studies
    8.2 Experiments
Chapter 9: Conclusions
    9.1 Contributions
    9.2 Future Work
Bibliography
Appendix A: Cephalometric Landmarks and Measures
Appendix B: Classifier Descriptions
LIST OF FIGURES

1.1 Individuals with 22q11.2DS. Images reproduced from [10, 29].
2.1 2D landmark pattern used by Boehringer [10].
2.2 Dense Surface Model construction [35].
3.1 FISH test for 22q11.2DS; arrow points to the deleted genetic material [82].
3.2 Survey administered to experts.
3.3 Cephalometric landmarks (in blue) located on image of individual.
4.1 3dMD imaging system setup at Seattle Children’s Hospital.
4.2 Example image in need of cleanup.
4.3 Example where ICP alignment performs worse than hand alignment. Observe that both lips and nose are misaligned in the automatic ICP version.
4.4 Results of PCA used to align 3D meshes by their first principal component vector. Note that each head is misaligned in a different direction.
4.5 Tait-Bryan angles which describe the three degrees of freedom of a human head.
4.6 Using symmetry to align face in forward direction. (a) 3D image, (b) interpolated 2.5D image, (c) left side of face, (d) right side of face, (e) resulting difference between left and right side.
4.7 Example results of yaw and roll alignment.
4.8 Illustration of concept behind pitch alignment and example alignment result.
5.1 Snapshots of 3D meshes.
5.2 2.5D depth images (enhanced for the reader).
5.3 Curved line detail.
5.4 Comparison of head data with and without facial texture.
5.5 Topographic maps of the face with different contour line spacing.
5.6 Curvature based image labeling.
6.1 Distance per individual to average of the control individuals. Black line separates affected from control, with affected individuals on the left.
6.2 Aggregate percent of correctly classified individuals as test set increases from 2% to 50% of data set (on x-axis) shown from 0-100% accuracy (y-axis).
6.3 Distance of control individual from control average, when that individual (circled in red) is used as the test sample. The y-axis represents the distance to the average, while the x-axis lists all individuals in the W86 data set, with the first 43 individuals affected, and the rest control. The blue line represents the original distance from average used in experiment 6.2.6, while the black dots represent the newly calculated distance from average when leaving out the test individual.
6.4 Variance of full data, control set and affected set. All three data sets have extremely large variances, on the order of 10^7.
7.1 Landmarks of interest.
7.2 Detecting the location of the nasal alae.
7.3 Detecting landmarks of the mouth.
7.4 The nose area compared to the bounding box and different descriptor shapes.
7.5 Nose area in relation to bounding box area for two individuals of the same age and gender with and without BNT.
7.6 Left and right contour lines of the nose.
8.1 ROC performance curve.
8.2 Empirical approach to threshold detection for each descriptor.
LIST OF TABLES

3.1 Distribution of participant data according to age, gender and 22q11.2DS affected status for full dataset of 189 individuals.
3.2 Three expert survey results.
3.3 Number of missing points for each of the hand-labeled landmarks. For landmarks present on both the left and right side of the face, the order is given as left right. A detailed description of each landmark is given in Appendix A.
5.1 Line positions. Position (125,150) is the location of the nose tip.
5.2 Besl-Jain curvature value assignment.
6.1 Attribute selection of PCA vectors for data separation for gender, age and affected. Each attribute name contains its eigenvalue rank in order of importance, i.e. d5 is the 5th eigenvector.
6.2 F-measure scores for different classifiers with standard deviations provided. Data used are all PCA compressed versions of 3D snapshots and 2.5D images, on all 189 individuals and the initial four subsets tested: A106, AS106, W86, and WR86.
6.3 Comparison of predictive capability of curvature value ranges.
6.4 Choosing an appropriate data set. 3D snapshot with ear cutoff threshold data format used. Classified using Naive Bayes. Standard deviations shown.
6.5 Checking for data loss between data representations. All data shown here is from the W86 dataset classified using Naive Bayes. Standard deviations shown.
6.6 Curved lines with Naive Bayes and W86.
6.7 Symmetry measures with Naive Bayes and W86. EC refers to symmetry analysis done on 2.5D images with an ear cutoff. FC refers to images with the forehead removed due to noise from the hair removal process.
6.8 Curvature labeled images compared to 2.5D results using Naive Bayes and W86.
6.9 Topography labeled images compared to 2.5D results using Naive Bayes and W86.
6.10 Classification using distance from average of control using Naive Bayes on W86.
7.1 Landmark distances obtained using automatically detected landmarks.
7.2 List of nasal landmark-based descriptors.
7.3 List of oral landmark-based descriptors.
7.4 List of bulbous nasal tip shape-based descriptors.
7.5 List of tubular shape-based descriptors.
7.6 List of nasal root shape-based descriptors.
7.7 List of oral shape-based descriptors.
8.1 Using experts’ median scores for facial features to predict 22q11.2DS. In each table, the upper set of results was obtained using Naive Bayes, the lower using SVM.
8.2 Correct automatic placement compared to availability of hand-labeled landmarks.
8.3 Prediction of 22q11.2DS using landmark distance measures.
8.4 Predicting expert marked nasal features using LN data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.5 Predicting the four oral features using LO data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.6 Predicting 22q11.2DS using landmark-based descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.7 Using shape-based descriptors for predicting nasal and oral facial features. For each descriptor type, the right arrow indicates the facial feature experts’ median score to which it is compared. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.8 Performance of shape-based descriptors in predicting 22q11.2DS. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.9 Predicting 22q11.2DS using all nasal, all oral and all descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.
9.1 Legend for classification errors for each individual in Table 9.2 and Table 9.3.
9.2 Errors in male individuals of W86 dataset. Representative global and local descriptors are shown. Dark boxes signify errors.
9.3 Errors in female individuals of W86 dataset. Representative global and local descriptors are shown. Dark boxes signify errors.
ACKNOWLEDGMENTS
I owe my deepest gratitude to my advisor Dr. Linda Shapiro, who guided me to becoming
the researcher I am today. She provided me with the perfect mix of freedom to explore on
my own and direction when I faltered.
I am indebted to my committee members: Dr. Maya Gupta, Dr. James Brinkley, and
Dr. John Kramlich for their excellent feedback which ensured that this dissertation is ac-
cessible to computer science and medical audiences alike.
I am grateful to Dr. Carrie Heike, Dr. Michael Cunningham, Dr. Anne Hing, Dr. Mark
Hannibal and staff members at Seattle Children’s Hospital Craniofacial Center for providing
me with the 3D data used in this dissertation, as well as their medical and anthropometric
expertise.
I would like to thank Jia Wu for collaborating with me on local data representations and
members of my research group for their countless suggestions which improved my work.
Finally, I would like to express my gratitude to Stefan Schoenmackers for his support and
friendship.
This work was supported by the National Science Foundation Graduate Research Fellowship,
by the National Science Foundation under Grant Number DBI-0543631, by the National
Institute of Dental and Craniofacial Research under Grant Number 5K23DE17741-2, by
the General Clinical Research Center under Grant Number # M01-RR 00037 and by the
American Academy of Pediatrics Section on Genetics and Birth Defects.
DEDICATION
to my family
my grandparents for teaching me knowledge is the one thing that can never be taken away
my parents for always believing in my potential and supporting my goals
my sister for sharing the highs and lows of the PhD
Chapter 1
INTRODUCTION
1.1 Motivation
Velocardiofacial syndrome (VCFS), or more precisely 22q11.2 deletion syndrome, was first
described in 1978 [69]. Since then, 22q11.2DS has been shown to be one of the most com-
mon multiple anomaly syndromes in humans, with a disputed prevalence of anywhere from
1:2000 to 1:6000 live births in the United States [49, 13]. Early detection is important as
many affected individuals are born with a conotruncal cardiac anomaly, mild-to-moderate
immune deficiency and learning disabilities, all of which can benefit from early intervention.
Although VCFS has more than 180 clinical features, including 16 craniofacial, 15 eye, 20
ear and 5 nasal anomalies [77, 30], no single feature occurs in 100% of the cases, and there
are no individuals who have most or all of the clinical features. In addition, the expression of
Figure 1.1: Individuals with 22q11.2DS. Images reproduced from [10, 29].
a specific feature may be quite varied; for example a palatal cleft feature can be an obvious
cleft palate or simply a dysfunction of the palatal muscles [70].
While 22q11.2DS affected individuals often have a characteristic facial appearance, it can be
very subtle to detect (Figure 1.1). Even individuals with expert training (members of the American Cleft Palate-Craniofacial Association [5]) have difficulty in
diagnosing 22q11.2DS from frontal facial photographs (predictions only slightly better than
chance) [7]. The final diagnosis is verified with fluorescence in situ hybridization (FISH)
testing [73], a genetic test which is both time consuming and expensive. For these two
reasons researchers have been highly motivated to develop faster and cheaper genetic tests
[28, 42, 20], as well as to identify features that may improve physician accuracy in the
diagnosis of 22q11.2DS [36].
The shape-based quantification of 3D facial features proposed in this dissertation will lead
to better understanding of the connection between the 22q11.2 deletion syndrome genotype
and the phenotype of this syndrome. Being able to connect facial features to the genetic
code will allow for understanding the etiology of craniofacial malformation and pathogenesis
of 22q11.2DS, which, in turn, will be informative of the genetic control needed for normal
craniofacial development. From a clinical standpoint, offering a standard automated fil-
ter may aid physicians in concentrating on the more difficult cases and provide insights
into the shapes that are considered most telling for a specific dysmorphological syndrome.
Lastly, the identification of those patients who have higher likelihood of a positive test for
the 22q11.2 deletion would lead to more efficient use of medical resources (in this case,
expensive genetic tests).
1.2 Problem Statement
Given a set of labeled 3D training meshes acquired from stereo imaging of heads, the
goal of this research is to develop a successful methodology for discriminating between
22q11.2DS affected individuals and the general population and for quantifying the degree of
dysmorphology of facial features. Although many approaches for such discrimination exist
in the medical and computer vision literature, the goal is to develop methods that focus on
3D shape of both the face as a whole and specific local features.
1.3 Paper Outline
In Chapter 2, the literature related to medical craniofacial assessment and craniofacial
analysis using computer vision will be reviewed. In Chapter 3 the sources of ground truth
as well as the statistical measures used for evaluating success will be stated. In Chapter
4, data preprocessing including an automatic method for pose alignment will be described.
Global data representations used in this dissertation and methods for their generation will
be explained in Chapter 5, followed by a description of experimental results on global
descriptors in Chapter 6. Local data representations will be described in Chapter 7, with
experimental results for local descriptors provided in Chapter 8. Finally, Chapter 9 will
summarize the contributions of this dissertation and suggest possible directions for further
research.
Chapter 2
RELATED LITERATURE
In this chapter the related literature on craniofacial feature assessment in medicine and
computer vision will be described. With respect to studies of 22q11.2DS, a brief descrip-
tion of manual medical assessment methods will be followed by current medical automated
methods. These will be followed by a general description of relevant work both in medicine
and computer vision. Although there will be brief mention of the work with different data
sources and formats, the focus of this literature review will be on 3D surface meshes of the
face.
2.1 Medical Craniofacial Assessment
Traditionally, the approach to identify and study an individual with facial dysmorphism
has been thorough clinical examination combined with craniofacial anthropometric mea-
surements [4, 62]. These measurements are based on landmarks picked visually and by
hand palpation of the underlying skull shape. It is important to note that there are very
few non-Caucasian normative physical data sets, so in general, the collected data is com-
pared to the Caucasian population [27, 33].
Newer methods of craniofacial assessment [4] involve using data from computerized to-
mography [76, 48, 71], magnetic resonance imaging [16, 31, 11, 6, 19], ultrasound studies
[47, 25, 21, 12], and stereoscopic imaging [3, 15, 58]. The information in these data repre-
sentations is often hand measured, or at least hand labeled, so the human effort in the use
of these newer systems is still quite significant.
With respect to 22q11.2 deletion syndrome, craniofacial anthropometric measurements pre-
vail as the standard manual assessment method. Automated methods of 22q11.2DS analysis
are limited to just two. Boehringer et al. [10, 57] used standard 2D photographs of in-
dividuals representing ten different facial dysmorphic syndromes, which were converted to
grayscale and cropped to 256 by 256 pixels in size. A predefined landmark pattern was
placed on each face (Figure 2.1) and a Gabor wavelet transformation was applied at each
node yielding a data set of 40 coefficients per node. The generated data sets were then
transformed using principal component analysis (PCA) and classified using linear discrimi-
nant analysis (LDA), support vector machines (SVM), and k-nearest neighbors (kNN). The
best prediction accuracy was found to be 76% using LDA, dropping to 52% when using a
completely automated system.
Figure 2.1: 2D landmark pattern used by Boehringer [10].
The second, more extensive work, is that of Hutton and Hammond using their Dense Surface
Models (DSM) [35, 40, 36, 37]. Here the input data is that of a 3D surface mesh created by
the 3dMD photometric system. The data collection attempted to capture individuals with
natural pose and neutral expression, although this was waived as some syndromes have a
characteristic facial expression. For each generated 3D mesh, eleven 3D landmarks were
manually located. A mean landmark set was calculated, and then each surface was warped
to bring the corresponding landmarks on each face into precise alignment with the mean
landmarks. A closest point correspondence to the vertices of a base mesh chosen from the
set was then constructed. The mesh connectivity in the base mesh was transferred back to
the densely correspondent meshes of each individual surface, and the original meshes and
landmarks were abandoned. The surfaces were then unwarped back to their original shapes.
These new surfaces were then used to calculate an average shape using Procrustes align-
ment and then subjected to PCA to compute the major modes of shape variation (Figure
2.2). The generated data sets (60 VCFS, 130 control) were classified according to their
Figure 2.2: Dense Surface Model construction [35].
PCA coefficients using many different classifiers (closest mean (CM), decision trees, neural
networks, logistic regression, SVM) with best sensitivity and specificity results at 0.83 and
0.92 using SVM [36], respectively. Newer results (115 VCFS, 185 control) used CM, LDA
and SVM in studying discrimination abilities of local features (face, eyes, nose, mouth) at
a correct classification rate of 89% [37]. Neither Boehringer's nor the Dense Surface Models
method is fully automatic; both benefit from manual landmark placement.
2.2 Computer Vision Craniofacial Analysis
Although the raw facial data format is provided in three dimensions, the data can be an-
alyzed in variations from one dimension up to three. It is of note that there are methods
that use texture information for facial analysis [78, 63], but there will be little focus on
them in this review as the data used in this research is textureless due to human subjects
requirements (IRB).
In reference to the face, 1D data can be defined as the line that describes the profile of
the face, or a signal waveform. The collection of profiles that describe different individuals
can then be analyzed for similarity of waveform using Pearson’s correlation coefficient, or
transformed to a new coordinate system using one of many compression schemes such as
PCA, Fourier transforms, or wavelet transforms [18]. PCA transforms the data so that the
greatest variance is in the first coordinate, the next in the second coordinate and so on. A
Fourier transform returns the frequency content of the entire signal as a sum of sines and
cosines of different frequencies. Wavelet transforms return the frequency content at different
parts of the signal [54].
2D facial data can be best thought of as a standard photograph, where the depth may be
noted by the use of lighting. For analysis, many of the methods mentioned in the 1D sec-
tion have 2D equivalents. These methods can be supplemented by Fisherfaces [8], which
have been demonstrated to, in some cases, have lower error rates than PCA; and man-
ual/automatic selection of facial landmarks or features [56, 55].
3D facial data is defined as a double precision wire mesh of the head that includes the
face. Morphable model approaches [9, 45, 24] leverage databases of already enrolled 3D
meshes (often hand labeled with landmarks or features) for new image intake and recog-
nition. To reduce the computational requirements, new data representation schemes are
used. Canonical Face Depth Maps [23] create a smaller representation for 3D face data,
while work like Symbolic Surface Curvatures [65] concentrates on exactly describing a specific
facial feature. There is also a significant body of work on 3D landmarks and features ranging
from landmark detection to appropriate analysis of facial features [2, 17, 80, 51]. In each
of these cases, landmarks are either hand-labeled or induced from previously labeled faces.
Lastly, hybrid 2D-3D methods, where information from one dimensional space is used to
add detail to another dimensional space, are used in an effort to improve facial recognition
results [14, 64, 79, 66].
Most facial analysis methods in computer vision have been developed with focus on bio-
metric authentication and recognition [52, 59, 23, 45, 14, 51, 66], with very few [50, 57]
attempting to detect medically relevant facial dysmorphology. This fact, that computer
vision methods have not transferred well to medical applications, motivates the research in
this dissertation where computer vision methods are used to quantify 3D face data based
on shape.
Chapter 3
GROUND TRUTH AND MEASURES OF SUCCESS
This chapter will discuss the three types of ground truth used: participant specific data,
expert surveys and hand-labeled landmarks. A description of how each ground truth is
used in this work will be given and aggregate information will be presented. Lastly, the
statistical measures used for determining success in this work are introduced.
3.1 Participant Specific Data
Initial ground truth data was limited to gender, age and disease status. Gender and age
were collected as part of the participant intake survey. The disease status was defined as
either affected by 22q11.2DS or control, and was detected by a fluorescence in situ hy-
bridization (FISH) genetic test for 22q11.2DS. Intuitively, a FISH test consists of attaching
Figure 3.1: FISH test for 22q11.2DS; arrow points to the deleted genetic material [82].
customized fluorescent markers to a sample of an individual’s DNA. After allowing time
(about 12 hours) for the markers to attach themselves to the genetic section in question
and washing the sample to prevent false negatives, the DNA is viewed under a microscope
capable of inducing fluorescence in the markers. In the case of a 22q11.2 deletion test, one
or more sections of the chromosome will not fluoresce (see Figure 3.1).
The demographic distribution of the data is given in Table 3.1. Age and gender data
were used in Section 6.1. Individual disease status was used for classification experiments
in both Chapter 6 and Chapter 8.
Table 3.1: Distribution of participant data according to age, gender and 22q11.2DS affected status for full dataset of 189 individuals. Each row lists, in order: affected female, affected male, control female, control male (rows with fewer entries have blank cells in the original).

Age less than 1: 1, 2 (total 3)
Age 1: 2, 1, 7, 6 (total 16)
Age 2: 1, 3, 3, 11 (total 18)
Age 3: 1, 2, 4, 4 (total 11)
Age 4: 3, 1, 3, 1 (total 8)
Age 5: 1, 2, 2, 8 (total 13)
Age 6: 4, 3, 4, 3 (total 14)
Age 7: 2, 1, 1, 6 (total 10)
Age 8: 2, 1, 5, 3 (total 11)
Age 9: 3, 2, 3, 3 (total 11)
Age 10: 2, 2, 3, 4 (total 11)
Age 11: 1, 4 (total 5)
Age 12: 1, 4 (total 5)
Age 13: 2, 3, 3, 4 (total 12)
Age 14: 1, 1, 3 (total 5)
Age 15: 1, 1 (total 2)
Age 16: 2, 2 (total 4)
Age 17: 1, 1 (total 2)
Age 18: 1, 1 (total 2)
Age 20: 1, 1, 1 (total 3)
Age 21-25: 1, 10, 2 (total 13)
Age 26-30: 1, 2, 1 (total 4)
Age 31-40: 2, 3, 1 (total 6)
3.2 Expert Survey
In September 2008, expert ground truth was provided as qualitative data on a set of 164
individuals (with a ratio of 1:3 affected vs. control) as the results of a paper survey filled
out by Dr. Carrie Heike. Each facial feature was rated from 0 to 2 in quality (0 = none, 1
= moderate, 2 = severe), all referring to 22q11.2DS characteristics. Additionally, a can't tell
category (designated by the symbol “?”) was added during the process to account for traits
that could not be categorized based on an individual’s 3D snapshot image. The results from
this initial survey were used to find a good starting point for local feature description, which
will be further discussed in Chapter 7.
Based on Dr. Heike’s comments, several more anthropometric questions were added to
the survey and an Opposite option was added to the rating system (see Figure 3.2). This
revised survey was administered in October 2008 to two trained dysmorphologists who clas-
sified a Caucasian-only subset of 1:1 affected vs. control consisting of 86 individuals (this set
will henceforth be referred to as W86). Dr. Heike updated her previous survey by adding in-
formation for the missing data. The results of this second survey were collected in November
2008, including post-mortem interviews with each participant, and are summarized in Table 3.2.
Table 3.2: Three expert survey results.
Median of expert scores              22q11.2DS group              Control group
                                     -1    0    1    2    ?      -1    0    1    2    ?
Overall face: 22q facial phenotype    0%  26%  56%   5%   0%      0%  93%   7%   0%   0%
Overall face: asymmetric              0%  67%  33%   0%   0%      0%  81%  16%   0%   0%
Overall face: square / rectangular    0%  53%  44%   2%   2%      0%  77%  23%   0%   0%
Overall face: hypotonic appearance    0%  65%  30%   5%   0%      0%  93%   7%   0%   0%
Eyes: hooded appearance               0%  56%  28%   7%   2%      0%  91%   7%   0%   2%
Nose: prominent nasal root            0%  53%  44%   5%   0%      0%  70%  26%   0%   0%
Nose: tubular appearance              0%  53%  47%   0%   0%      0%  77%  23%   0%   0%
Nose: bulbous nasal tip               0%  33%  47%  19%   0%      0%  84%  16%   0%   0%
Nose: small nasal alae                0%  26%  53%   5%   0%      0%  81%  12%   0%   0%
Ears: small                           0%  40%  42%   2%  16%      0%  95%   5%   0%   0%
Ears: protuberant                     0%  47%  40%   7%   9%      0%  67%  26%   5%   0%
Midface: relatively flat              0%  33%  67%   0%   0%      0%  77%  21%   2%   0%
Forehead: square                      0%  37%  49%   0%  21%      0%  88%  12%   0%   0%
Forehead: prominent on profile        0%  72%  16%   0%   2%      2%  88%   9%   0%   0%
Mouth: small                          0%  63%  37%   0%   0%      0%  86%  14%   0%   0%
Mouth: open                           0%  81%  12%   5%   0%      0%  88%  12%   2%   0%
Mouth: downturned corners of mouth    0%  44%  53%   2%   9%      0%  79%  19%   2%   5%
Mouth: retrusive chin                 2%  72%  19%   2%   0%      0% 100%   0%   0%   0%
Figure 3.2: Survey administered to experts. [The form first asks "Does this individual have 22q11?" and "Do you know this individual?" (Definitely YES / Probably YES / Probably NO / Definitely NO), then rates each of the facial traits listed in Table 3.2 on the scale: Opposite of 22q11, Not 22q11, Moderate 22q11, Severe 22q11, Not enough data, with space for additional comments.]
As can be seen in the survey results, all features of the nose (prominent nasal root, tubular
appearance, bulbous nasal tip, and small nasal alae) were found to have a higher percentage
of moderate and severe expression in 22q11.2DS affected individuals. Midface flatness and
square forehead had the next best separations between affected and control groups, while
small mouth had a weak, but present, disease signal.
3.3 Hand-labeled Landmarks
The last form of expert ground truth was provided in November 2008 as quantitative data in
the form of hand-labeled anthropometric landmarks for a 144-individual subset of the 189 individuals
used in this work (see Figure 3.3). Of the 144 hand-labeled individuals, 77 occur in the
above mentioned W86 data set, and 60 of these are matched 1:1 affected vs. control. The
Figure 3.3: Cephalometric landmarks (in blue) located on image of individual.
availability of each landmark label is shown in Table 3.3. Robust landmarks of the nose are
highlighted in green and robust landmarks of the mouth are highlighted in blue. These hand-
labeled landmarks were used to check automatically generated landmarks and automatic
symmetry measures, as described in Chapter 8 and Section 6.2.4, respectively.
Table 3.3: Number of missing points for each of the hand-labeled landmarks. For landmarks present on both the left and right side of the face, the order is given as left right. A detailed description of each landmark is given in Appendix A.

Landmark Name       Label     L144     L77      L60
glabella            g         1        1        1
nasion              n         0        0        0
sellion             se or s   0        0        0
pronasale           prn       1        1        1
subnasale           sn        1        1        1
labiale superius    ls        2        1        1
stomion             sto       15       9        8
labiale inferius    li        31       14       13
sublabiale          slab      26       14       13
gnathion'           gn'       50       23       21
exocanthion         ex        8 5      4 2      4 2
endocanthion        en        5 1      3 1      3 1
alar curvature      ac        4 2      1 2      1 2
alare               al        1 1      1 1      1 1
subalare            sbal      1 1      1 1      1 1
subnasale'          sn'       32 25    14 13    11 10
crista philtri      cph       3 3      2 2      2 2
cheilion            ch        23 23    11 11    10 10
tragion             t         2 2      2 1      1 1
preaurale           pra       12 10    11 8     10 8
postaurale          pa        36 34    29 28    22 21
superaurale         sa        34 35    25 24    19 19
subaurale           sba       83 70    54 44    44 35
3.4 Statistical Measures
Different measures of success are used in different communities; the measures most com-
monly used in classification and retrieval systems are used in this dissertation. In the
following equations, TP refers to the number of true positives (affected correctly labeled as
affected), FP refers to the number of false positives (control incorrectly labeled as affected),
TN refers to the number of true negatives (control correctly labeled as control), and FN
refers to the number of false negatives (affected incorrectly labeled as control). For all the
measures listed here the results range from 0 to 1, with a score of 1 being the best.
Accuracy
Measures the portion of all decisions that were correct decisions.
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}. \tag{3.1} \]
Recall / Sensitivity
Measures the proportion of actual affected which are correctly labeled as affected.
\[ R = Sn = \frac{TP}{TP + FN}. \tag{3.2} \]
Precision
Measures the proportion of labeled affected which are actually affected.
\[ P = \frac{TP}{TP + FP}. \tag{3.3} \]
Specificity / 1−Fall-out
Measures the proportion of actual control which are correctly labeled as control.
\[ Sp = \frac{TN}{TN + FP}. \tag{3.4} \]
F-measure
Measures an even combination of precision and recall. F-measure, also called F1, is the
harmonic mean of precision and recall.
\[ F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{3.5} \]
\[ F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}. \tag{3.6} \]
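As a concrete illustration (not part of the original dissertation), the following Python sketch computes all five measures from the four confusion-matrix counts defined above; the example counts are made up.

```python
def classification_measures(tp, fp, tn, fn):
    """Success measures of Section 3.4 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)      # Eq. 3.1
    recall = tp / (tp + fn)                         # Eq. 3.2, sensitivity
    precision = tp / (tp + fp)                      # Eq. 3.3
    specificity = tn / (tn + fp)                    # Eq. 3.4
    f_measure = 2 * tp / (2 * tp + fp + fn)         # Eq. 3.6
    return accuracy, recall, precision, specificity, f_measure

# Hypothetical run: 43 affected and 43 controls, with 7 affected and
# 5 controls mislabeled.
print(classification_measures(tp=36, fp=5, tn=38, fn=7))
```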
Chapter 4
DATA PREPROCESSING
This chapter will provide a quick overview of the source of the raw data and methods used
to prepare the raw format for research use. Data was cleaned using MeshLab [41], after
which it was pose aligned to face forward using two separate methods.
4.1 Data Source
The 3D data used in this research was collected as part of a study by Carrie Heike, M.D. [38]
at the Craniofacial Center of Seattle Children’s Hospital and Regional Medical Center. The
3dMD imaging system used can be seen in Figure 4.1. The subject sits at the location of the
Figure 4.1: 3dMD imaging system setup at Seattle Children’s Hospital.
blue booster seat facing towards the lower left camera stand. The data collection system
is made up of four camera stands, each containing three cameras. Of the three cameras in
each stand, one captures a direct photo, one captures an under angle and one captures an
over angle to yield a three-dimensional view of the face through stereo analysis. The twelve
resulting range maps are stitched together using proprietary methods of 3dMD to yield the
final 3D head mesh and a texture map of the face. Due to human subjects requirements
(IRB), the only data used in the research described in this work are the 3D meshes.
4.2 Data Cleaning
As the data comes from the real world, there needs to be quite a bit of data
cleaning before the 3D meshes can be used by computer vision methods. As can be seen
in Figure 4.2, the data contains extraneous clothing, hair, and sometimes parents. All this
Figure 4.2: Example image in need of cleanup.
information was removed by hand using MeshLab [41]. In addition, although not initially
obvious, neck data was removed in order to maintain conformity between meshes.
4.3 Alignment Using Scanalyze
Initially only fifteen 3D mesh heads were available, a group of seven one-year-old females and
a group of eight ten-year-old males. Each age group was aligned to an unaffected individual
in that group. The alignment was done using scanalyze and vrip, small programs that are
part of the Digital Michelangelo Project [53], aligning the one year old group to mesh F1-
x-1-3, and aligning the ten year old group to mesh M10-x-1-5. In order to take advantage
of scanalyze’s automatic Iterative Closest Point (ICP) registration, the two meshes first
needed to be moved by hand to be within the same three-dimensional space. Although ICP
worked well for many of the instances, in some cases the final result was more misaligned
(a) Hand aligned meshes (b) Meshes in (a) after ICP alignment
Figure 4.3: Example where ICP alignment performs worse than hand alignment. Observe that both lips and nose are misaligned in the automatic ICP version.
than the original hand alignment (see Figure 4.3). Unfortunately, as more data required
alignment, manual alignment supported by ICP became too time consuming.
4.4 Automatic 3D Pose Alignment
A standard pose alignment technique in computer vision is to use Principal Component
Analysis (PCA) [72], where the first principal component vector is used to align all meshes
to the x-axis. PCA is mathematically defined as an orthogonal linear transformation that
transforms the data to a new coordinate system such that the greatest variance by any
projection of the data comes to lie on the first coordinate (called the first principal compo-
nent), the second greatest variance on the second coordinate, and so on [44]. This method
failed to work on the data, as can be seen in Figure 4.4, due to the variable amount of
hair and head data available in each mesh; each of the first principal component vectors
points in a different direction relative to the general shape of the head. As a result, another
semi-automatic method was developed to align all of the 3D meshes.
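For reference, a minimal sketch of the standard PCA alignment just described (the approach that failed on this data) is given below; `vertices` is assumed to be an n-by-3 array of mesh coordinates.

```python
import numpy as np

def pca_align(vertices):
    """Rotate a mesh so its principal axes coincide with the coordinate axes.

    On these head meshes the method fails because hair and head coverage
    vary, so the first principal direction is inconsistent across meshes.
    """
    centered = vertices - vertices.mean(axis=0)
    # Rows of vt are the principal directions, largest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T  # first output coordinate = first principal component
```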
(a) One year old female individuals
(b) Ten year old male individuals
Figure 4.4: Results of PCA used to align 3D meshes by their first principal component vector. Note that each head is misaligned in a different direction.
4.4.1 Tait-Bryan Angles
The position of the head can be described by the Tait-Bryan angles often referred to as yaw,
pitch and roll, according to the illustrations in Figure 4.5. Yaw is the side-to-side movement
about the y-axis. Pitch is the up and down movement about the x-axis. Lastly, roll is the
twisting movement of the head about the z-axis. These angles were used to design alignment
methods which will be discussed in the following sections. The order of presentation will be
slightly modified as yaw and roll naturally belong together, while pitch requires a different
approach.
(a) Yaw (b) Pitch (c) Roll
Figure 4.5: Tait-Bryan angles which describe the three degrees of freedom of a human head.
4.4.2 Use of Facial Symmetry for Yaw and Roll Alignment
Symmetry between the left and right sides of the face is used to determine the most central
position of the face. Although faces are not truly symmetrical, the pose alignment procedure
can be cast as finding the angular rotations of yaw and roll such that the error between the
left and right side of the face is minimal. To do this efficiently (see Figure 4.6), the original
3D mesh was interpolated to a 2.5D ordered grid (further discussion of 2.5D in Section 5.2).
The resulting image I was then split down the middle producing a left true image and a
right mirrored image. These two images were then overlaid and the difference error was
calculated by
\[ \text{Difference} = \sum_{y=0}^{height} \sum_{x=0}^{width/2} \left| I(x, y) - I(width - x - 1, y) \right|. \tag{4.1} \]
(a) 3D image (b) 2.5D image (c) left (d) right (e) diff
Figure 4.6: Using symmetry to align face in forward direction. (a) 3D image, (b) interpolated 2.5D image, (c) left side of face, (d) right side of face, (e) resulting difference between left and right side.
(a) Original 3D position (b) After just Yaw rotation −45◦ to +45◦ (c) After just Roll rotation −45◦ to +45◦ (d) After both Yaw and Roll rotations −45◦ to +45◦
Figure 4.7: Example results of yaw and roll alignment.
Although it is possible to search through all 360◦ for the optimal rotation of pose, the cur-
rent set of 3D meshes contains only heads that are facing somewhat forward, and as such
the search space can be decreased significantly. To maintain robustness a search through
-45◦≤ θY ≤ 45◦ in yaw and -30◦≤ θR ≤ 30◦ in roll is recommended, but this can be further
decreased to about 10◦ in each direction if the method is semiautomatic, where the user can
choose to rotate only the negative or positive directions.
There are two points of interest when using this symmetry design. First, at 0◦, ±90◦,
and 180◦ there are local symmetry minima, and the global minimum is not necessarily
located at 0◦. This was resolved with a small amount of user interaction. Second, when yaw
and roll symmetry are maximized separately, the error increases, while when the two are
maximized jointly, the results are far more accurate (see Figure 4.7).
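The joint search can be sketched as follows; `render(yaw, roll)` is a hypothetical callback that rotates the 3D mesh by the given angles and returns its interpolated 2.5D depth image, and the error follows Eq. 4.1.

```python
import numpy as np

def symmetry_error(depth):
    """Left-right asymmetry of a 2.5D depth image, as in Eq. 4.1."""
    h, w = depth.shape
    left = depth[:, : w // 2]
    right = np.fliplr(depth)[:, : w // 2]  # mirrored right half
    return np.abs(left - right).sum()

def align_yaw_roll(render, span=45, step=1):
    """Jointly search yaw and roll for the most symmetric rendering."""
    angles = range(-span, span + 1, step)
    return min(((y, r) for y in angles for r in angles),
               key=lambda a: symmetry_error(render(yaw=a[0], roll=a[1])))
```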
4.4.3 Aligning Head Pitch
The assumption of symmetry does not hold between the bottom and top parts of the face as
it did for the left and right sides; therefore the same methods cannot be used to automati-
cally align the pitch of the head. Instead, the pitch of the head is aligned by minimizing the
difference between the height of the chin and the height of the forehead (see Figure 4.8a).
Although the algorithm works quite well, if the rotation angle for pitch is set too wide, the
top of the head can be selected as the optimal solution.
As seen in Figure 4.8b, the results of running just one iteration of alignment for yaw,
roll and pitch are often not enough for final alignment. This is solved by a second iteration
of both yaw/roll and pitch alignment, but with a much smaller search space (often 5◦ is
sufficient).
(a) Minimize chin and forehead height difference (b) Example result
Figure 4.8: Illustration of concept behind pitch alignment and example alignment result.
Chapter 5
GLOBAL DATA REPRESENTATIONS
Although the raw data was in 3D double-precision mesh format, six representations were
chosen based on face information desired: (1) frontal and side snapshots of the 3D meshes,
(2) 2.5D depth images, (3) 1D curved line segments, (4) symmetry scores, (5) labeled
images, and (6) distances from average. 2D snapshots of the 3D mesh images were used as
a starting point, while interpolation to a 2.5D depth image was used as a means of retaining
the 3D aspect of the original mesh. The 1D curved line segments were used to determine if
there was any affected signal in the subsampled face profile. Symmetry scores were used to
determine the global structural symmetry of each individual. Labeled images were used as
a substitution for the original facial texture. Lastly, average faces for the whole set and each
subgroup were calculated, and a distance measure was used to determine an individual’s
membership in a specific subgroup. In each data representation case, the information was
normalized to the same height and width as the rest of the dataset.
5.1 Snapshots
(a) Frontal snapshot (b) Side snapshot
Figure 5.1: Snapshots of 3D meshes.
The motivation for this method came from the eigenfaces [75, 74] approach, which
uses 2D photographs of individuals. After neutral pose alignment (described in
Section 4.4.2 and Section 4.4.3), a set of frontal photographs of the 3D meshes was generated
(Figure 5.1a) using the visualization library VTK [67]. For the expert survey (described in
Section 3.2), an additional set of side snapshots rotated by 90◦ from the front was generated
(Figure 5.1b).
5.2 2.5D Depth Images
Since the original data was a double-precision unstructured triangular mesh while 2.5D
images are represented as pixels, the original data needed to be interpolated onto
an integer-precision structured grid. The data required correct normalization in all three
dimensions, with the final width and height of each face given by the x- and y-axes, and
the final depth of the face given by the z-axis. In order to properly scale in the z-direction,
all of the data was manually clipped at the ears. For the x-axis normalization, the face of
each individual was scaled to be exactly 200 units wide. The y-axis information was left in
the current scale, since scaling this dimension would lead to unnatural shapes.
(a) 9 months (b) 13 years (c) 39 years
Figure 5.2: 2.5D depth images (enhanced for the reader).
The z- and x-axis normalized unstructured triangular mesh was rasterized into a depth
buffer (an x by y matrix, with the highest z value – the tip of the nose – placed at high
illumination). As the final measurements for the 2.5D image were empirically determined
to be 250 pixels wide by 380 pixels tall, the tip of the nose for each individual was moved
to position (125,150) in x,y coordinates. Examples of 2.5D images are shown in Figure 5.2.
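A much simplified sketch of this rasterization step is shown below; it splats vertices into the depth buffer instead of truly interpolating the triangular mesh, and it assumes the mesh is already pose aligned and x-normalized as described above.

```python
import numpy as np

def mesh_to_depth(vertices, width=250, height=380, nose_xy=(125, 150)):
    """Splat pose-aligned mesh vertices into a 2.5D depth buffer,
    keeping the largest z (closest point) per pixel and placing the
    nose tip (assumed to be the global z maximum) at nose_xy."""
    v = np.asarray(vertices, dtype=float)
    nose = v[v[:, 2].argmax()]
    xy = np.rint(v[:, :2] - nose[:2] + nose_xy).astype(int)
    ok = ((xy[:, 0] >= 0) & (xy[:, 0] < width) &
          (xy[:, 1] >= 0) & (xy[:, 1] < height))
    depth = np.zeros((height, width))
    np.maximum.at(depth, (xy[ok, 1], xy[ok, 0]), v[ok, 2])
    return depth
```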
5.3 Curved Lines
Using the 2.5D images, specific lines can be extracted which may be descriptive of faces.
For example, a vertical line down the middle of the face becomes a waveform (depth as
a function of height) that can be analyzed (see Figure 5.3a). As seen in Table 5.1, four
versions of both vertical and horizontal lines were selected for signal testing. Odd numbers
of lines were used to maintain symmetry in the data. Finally, a combination of lines was
used to create grids of sizes 1x1, 3x3, 5x5, and 7x7 (see Figure 5.3b).
Table 5.1: Line positions. Position (125,150) is the location of the nose tip.
Line type    Number of lines   Line placement
Vertical     1                 125 (middle of width)
Vertical     3                 75, 125, 175
Vertical     5                 75, 100, 125, 150, 175
Vertical     7                 50, 75, 100, 125, 150, 175, 200
Horizontal   1                 150 (slightly below middle of height)
Horizontal   3                 100, 150, 200
Horizontal   5                 100, 125, 150, 175, 200
Horizontal   7                 75, 100, 125, 150, 175, 200, 225
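Once a 2.5D depth image exists, extracting these line features reduces to slicing rows and columns; a small illustrative sketch (default positions taken from the 3x3 grid of Table 5.1):

```python
import numpy as np

def curved_lines(depth, vertical=(75, 125, 175), horizontal=(100, 150, 200)):
    """Concatenate depth profiles along the chosen columns (vertical lines)
    and rows (horizontal lines) into one 1D feature vector."""
    profiles = [depth[:, x] for x in vertical] + [depth[y, :] for y in horizontal]
    return np.concatenate(profiles)
```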
5.4 Symmetry
There is a hypothesis in the 22q11.2 deletion syndrome literature [30] that affected individ-
uals are more likely to have an asymmetrical facial shape. Using 2.5D depth images, the
symmetry of any individual head I can be calculated using the Difference method developed
(a) Vertical curved lines for 2.5D images in Figure 5.2. From left to right, the curved lines are those of a 9-month-old, 13-year-old, and 39-year-old.
(b) Vertical, horizontal and grid lines. One line (green), three lines (green-orange), five lines (green-orange-brown), seven lines (all).
Figure 5.3: Curved line detail.
in Section 4.4.2. For readability, the reader should assume the following equivalencies
\[ R \equiv I(x, y), \tag{5.1} \]
\[ L \equiv I(W - x - 1, y), \tag{5.2} \]
\[ \sum \equiv \sum_{y=0}^{H} \sum_{x=0}^{W/2}. \tag{5.3} \]
where H and W are the image height and width, respectively, x, y describe the location
of the particular pixel in question, and I(x, y) is the illumination at a particular pixel.
Therefore
\[ \text{Difference} = \sum_{y=0}^{H} \sum_{x=0}^{W/2} \left( I(x, y) - I(W - x - 1, y) \right) \equiv \sum (R - L). \tag{5.4} \]
Other symmetry measures were calculated as follows
\[ \text{Absolute Difference} = \sum |R - L|, \tag{5.5} \]
\[ \text{Binary Difference} = \mathrm{num}\{R - L > 0\} - \mathrm{num}\{R - L < 0\}, \tag{5.6} \]
\[ \text{Difference Ratio} = \frac{\sum_{R - L > 0} (R - L)}{\sum_{R - L < 0} (R - L)}, \tag{5.7} \]
\[ \text{Binary Ratio} = \frac{\mathrm{num}\{R - L > 0\}}{\mathrm{num}\{R - L < 0\}}. \tag{5.8} \]
As much of the 3D head data was asymmetrical due to the hair removal process, a version
with the forehead removed was generated for each head. This format is called FC (Forehead
Cut).
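The five scores translate directly into array operations on a 2.5D depth image; a sketch, with `r` and `l` standing for the two half-images of Eqs. 5.1 and 5.2:

```python
import numpy as np

def symmetry_measures(depth):
    """The five global symmetry scores of Eqs. 5.4 through 5.8."""
    h, w = depth.shape
    r = depth[:, : w // 2]              # one half of the face
    l = np.fliplr(depth)[:, : w // 2]   # mirrored other half
    d = r - l
    return {
        "difference": d.sum(),                                        # Eq. 5.4
        "absolute_difference": np.abs(d).sum(),                       # Eq. 5.5
        "binary_difference": int((d > 0).sum()) - int((d < 0).sum()), # Eq. 5.6
        "difference_ratio": d[d > 0].sum() / d[d < 0].sum(),          # Eq. 5.7
        "binary_ratio": (d > 0).sum() / (d < 0).sum(),                # Eq. 5.8
    }
```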
5.5 Labeled Images
Texture can often provide more information about an underlying data set, a fact that
can be easily seen when comparing a 3D mesh with and without skin texture (Figure 5.4).
Although original face textures cannot be used due to IRB restrictions, alternate descriptive
labels can be generated. The image labeling approaches that were used in this work are
topographic face maps [66] and Gaussian and Besl-Jain curvature maps [2, 17, 51].
(a) Face texture (b) No texture
Figure 5.4: Comparison of head data with and without facial texture.
5.5.1 Topographic Face Maps
Given a 2.5D depth image I, topographic face map T is generated by zeroing all points of
depth z = I(x, y) which fail z mod τ = 0, where τ is the desired spacing of the contour
lines. The remaining values are then assigned to the maximum image value of 255. In other
words
\[ T(x, y) = \begin{cases} 0 & \text{if } I(x, y) \bmod \tau \neq 0 \\ 255 & \text{if } I(x, y) \bmod \tau = 0. \end{cases} \tag{5.9} \]
Figure 5.5 gives examples of generated topographic face maps.
(a) τ = 5 (b) τ = 10 (c) τ = 15 (d) τ = 20
Figure 5.5: Topographic maps of the face with different contour line spacing.
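Eq. 5.9 translates into a few lines of code; the sketch below assumes an integer-valued depth image as produced in Section 5.2.

```python
import numpy as np

def topographic_map(depth, tau=10):
    """Contour-line image T of Eq. 5.9: pixels whose depth is an exact
    multiple of the spacing tau become white (255), all others black (0)."""
    z = np.rint(depth).astype(int)
    return np.where(z % tau == 0, 255, 0).astype(np.uint8)
```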
5.5.2 Gaussian and Besl-Jain Curvature Face Maps
Curvature face maps were calculated using the standard equations given below. For each
point P in the 3D face mesh, κ1 and κ2 are the principal curvatures (the maximum and
minimum of the normal curvature, respectively). Mean curvature H is calculated by
\[ H = \tfrac{1}{2}(\kappa_1 + \kappa_2), \tag{5.10} \]

while Gaussian curvature K is calculated by

\[ K = \kappa_1 \kappa_2. \tag{5.11} \]
The Besl-Jain approach labels each point according to a combination of mean and Gaussian
curvatures, as shown in Table 5.2.
Once curvature values were calculated for the entire 3D mesh, the data was bounded on
each side by rangemin and rangemax, and point values Pv within the upper and lower range
were reassigned to fit the entire range of a grayscale image.
\[ \text{LabeledPoint} = \frac{P_v - range_{\min}}{\left| range_{\max} - range_{\min} \right|} \times 255. \tag{5.12} \]
Figure 5.6 illustrates curvature-based labeled images used in this work.
Table 5.2: Besl-Jain curvature value assignment.
K ↓ / H →       H less than 0    H equal to 0    H greater than 0
K less than 0   saddle ridge     minimal         saddle valley
K equal to 0    ridge            flat            valley
K more than 0   peak             (none)          pit
(a) K (b) |K| (c) Besl-Jain (d) Besl-Jain Pit (e) Besl-Jain Peak
Figure 5.6: Curvature based image labeling.
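The labeling rule of Table 5.2 can be sketched as below, assuming per-point mean and Gaussian curvature arrays `H` and `K` have already been estimated; the zero tolerance `eps` is an added assumption, since measured curvatures are never exactly zero.

```python
import numpy as np

def besl_jain_labels(H, K, eps=1e-4):
    """Surface-type label per point from mean (H) and Gaussian (K)
    curvature, following Table 5.2."""
    hs = np.where(np.abs(H) < eps, 0, np.sign(H)).astype(int)
    ks = np.where(np.abs(K) < eps, 0, np.sign(K)).astype(int)
    names = {(-1, -1): "saddle ridge", (0, -1): "minimal",
             (1, -1): "saddle valley", (-1, 0): "ridge", (0, 0): "flat",
             (1, 0): "valley", (-1, 1): "peak", (1, 1): "pit"}
    return np.vectorize(lambda h, k: names.get((h, k), "none"))(hs, ks)
```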
5.6 Distance from Average
Three possible averages can be calculated for the data sets used in this work: average of the
entire set, average of the control subset, and average of the affected subset. Since the preva-
lence of 22q11.2DS affected individuals in a population is 1:4000, it is most appropriate to
use the average of the control set to evaluate an individual’s dissimilarity to the population.
The distance between the average vector A and a participant’s vector P can be measured
by any one of many distance measures. In this work three measures were used:
\[ \text{Euclidean} = \sqrt{(P - A)(P - A)'} \tag{5.13} \]
\[ \text{Cosine} = 1 - \frac{P A'}{\sqrt{P P'}\,\sqrt{A A'}} \tag{5.14} \]
\[ \text{Mahalanobis} = \sqrt{(P - A)\, V^{-1} (P - A)'} \tag{5.15} \]
where V is the sample covariance matrix.
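For row vectors P and A, the three measures are a few lines each; a sketch:

```python
import numpy as np

def distances_from_average(P, A, V):
    """Euclidean, cosine and Mahalanobis distances of participant vector P
    from average vector A (Eqs. 5.13 through 5.15); V is the sample
    covariance matrix."""
    d = P - A
    euclidean = np.sqrt(d @ d)
    cosine = 1.0 - (P @ A) / (np.sqrt(P @ P) * np.sqrt(A @ A))
    mahalanobis = np.sqrt(d @ np.linalg.solve(V, d))
    return euclidean, cosine, mahalanobis
```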
Each of the global representations described in this chapter has been tested for prediction
of 22q11.2DS. Experiments and results are described in detail in the next chapter.
Chapter 6
GLOBAL REPRESENTATION RESULTS
This chapter will discuss results for the global representations defined in Chapter 5. First,
preliminary studies to set up the experimental environment will be described. Then, the
motivation and results for experiments on global data will be given.
6.1 Preliminary Studies
In these experiments, the data type variations followed those discussed in Chapter 5, with
an ear cutoff threshold and, in the 2.5D versions, the tip of the nose placed at the greatest
z-value in the image. In each case, the data was compressed using Principal Component
Analysis (PCA). This allowed for a maximum 189 attribute representation for the entire
data set, or an 86 attribute representation for the W86 subset. These attributes were then
assessed as to their ability to distinguish between affected and control individuals using
several common classifiers. The WEKA suite of classifiers [81], which includes multiple
classifiers of many different types, was used for all classification experiments. 10-fold cross
validation was used for all classifiers and each training/testing set was executed ten times,
for a result of 100 runs per data set per classifier. These results were then used to assess
the representational quality of each data type as well as its signal content.
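The dissertation ran this protocol in WEKA; purely as an illustration, an equivalent setup in Python with scikit-learn (using stand-in random data, not the real W86 features) might look like:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 86))     # stand-in for PCA coefficients (W86)
y = rng.integers(0, 2, size=86)   # stand-in affected/control labels

scores = []
for rep in range(10):             # 10 repetitions of 10-fold CV = 100 runs
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    scores.extend(cross_val_score(GaussianNB(), X, y, cv=cv, scoring="f1"))
print(f"F-measure: {np.mean(scores):.2f} ± {np.std(scores):.2f}")
```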
6.1.1 Data Set Selection
The full data set included 189 individuals (53 affected, 136 control); such an uneven ratio is
not optimal in the use of any classifier. Therefore, an equal 1:1 ratio set needed to be used
and several options were proposed. Set A106 matched each of the 53 affected individuals
to a control individual of closest age without regard to gender or ethnicity. Set AS106
matched each of the 53 affected individuals to a control individual of closest age within
the same gender. Set W86 matched each of the 43 affected Caucasian2 individuals to a
Caucasian same-gender control individual of closest age. Set WR86 matched each of the 43
affected Caucasian individuals to a Caucasian same-gender control individual of the same
age, allowing repeats of controls where not enough same-aged subjects were available. It
should be noted that there was an attempt to create a ASE106 subset, that matched each
of the 53 affected individuals to a control individual of closest age, gender and ethnicity, but
this was unattainable as the most common ethnicity after Caucasian was listed as “other”,
which was considered too non-specific for ethnic matching.
6.1.2 Attribute Selection
Because the data is so varied in age and the 22q11.2DS phenotype is so subtle, the
simple solution of taking the top 10 eigenvectors (principal components) will not work. This
can be illustrated by using correlation-based feature selection [34] to find the attributes
which best predict age, gender and affected status in data set W86; a simplified attribute-ranking
sketch is given after Table 6.1. As can be seen in Table 6.1, the attributes used to best
predict affected status span the entire principal component list.
Table 6.1: Attribute selection of PCA vectors for data separation for gender, age and affected status. Each attribute name contains its eigenvalue rank in order of importance, i.e., d5 is the 5th eigenvector.

Data separation   # selected attributes   top 5 principal components   next 5 principal components
gender            64                      d1, d7, d8, d9, d10          d11, d12, d14, d15, d16
age               47                      d2, d3, d5, d6, d9           d13, d18, d20, d22, d23
affected          11                      d1, d5, d8, d15, d25         d63, d66, d73, d75, d81 (d85)
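The study used correlation-based feature selection (CFS) [34], which scores whole attribute subsets; as a simplified single-attribute stand-in, the sketch below merely ranks PCA attributes by the absolute value of their correlation with the class label:

    import numpy as np

    def rank_attributes(X, y):
        # Rank the columns of X by |Pearson correlation| with the label y.
        # A simplified stand-in for CFS, which additionally penalizes
        # attributes that are correlated with one another.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        corr = (Xc * yc[:, None]).sum(axis=0) / (
            np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
        return np.argsort(-np.abs(corr))

    rng = np.random.default_rng(1)
    X = rng.normal(size=(86, 86))      # synthetic PCA coefficients
    y = np.repeat([0, 1], 43)          # affected / control labels
    print(rank_attributes(X, y)[:10])  # indices of the ten best attributes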
6.1.3 Classifier Selection
There are many classifiers that are used in computer vision, with Support Vector Machines
(SVM) currently leading the field. Using the WEKA package [81], the performance of nine
² Participants in the study were asked to complete an intake form based on the Washington State Birth Certificate. Ethnicity for each individual was self-identified and included a family ethnic history for parents and grandparents.
classifiers was compared. Appendix B provides a description of each classifier used. The
analysis of the results showed Naive Bayes, one of the simplest classifiers, outperforming
all other classifiers on the current data set (Table 6.2). This was a surprise, but such
performance can be explained by the small size of the data set as well as the large number
of descriptors per individual [26].
Table 6.2: F-measure scores for different classifiers with standard deviations provided. Data used are all PCA compressed versions of 3D snapshots and 2.5D images, on all 189 individuals and the initial four subsets tested: A106, AS106, W86, and WR86. Classifiers from left to right are: Naive Bayes, JRip (repeated incremental pruning to produce error reduction propositional rule learner), J48 tree (C4.5 decision tree), NN k=1 (nearest neighbor classifier), NN k=3 (3-nearest neighbor classifier), Neural Net:9,3 (neural network that uses backpropagation for training, with two hidden layers of size 9 and 3), SVM default (support vector machine with default WEKA [81] setup).

Data Set       Naive Bayes   JRip         J48 tree     NN k=1        NN k=3        Neural Net:9,3   SVM default
ALL-3Dsnp      0.53±0.16     0.39±0.21    0.48±0.19    0.29±0.22 •   0.35±0.21 •   0.31±0.22 •      0.30±0.22 •
A106-3Dsnp     0.65±0.18     0.59±0.20    0.65±0.17    0.68±0.16     0.67±0.16     0.67±0.17        0.62±0.19
AS106-3Dsnp    0.66±0.19     0.57±0.19    0.55±0.17    0.62±0.17     0.66±0.17     0.60±0.17        0.64±0.19
W86-3Dsnp      0.68±0.20     0.58±0.21    0.69±0.15    0.46±0.25 •   0.62±0.20     0.61±0.19        0.61±0.20
WR86-3Dsnp     0.69±0.22     0.78±0.18    0.79±0.16    0.34±0.26 •   0.10±0.18 •   0.70±0.18        0.73±0.19
ALL-25D        0.59±0.16     0.38±0.20 •  0.45±0.19 •  0.04±0.12 •   0.06±0.12 •   0.26±0.23 •      0.26±0.23 •
A106-25D       0.68±0.16     0.62±0.18    0.57±0.16    0.50±0.18 •   0.52±0.17 •   0.52±0.18 •      0.51±0.16 •
AS106-25D      0.69±0.18     0.59±0.18    0.62±0.16    0.49±0.20 •   0.39±0.22 •   0.51±0.17 •      0.48±0.18 •
W86-25D        0.77±0.17     0.59±0.19 •  0.56±0.20 •  0.07±0.18 •   0.23±0.23 •   0.47±0.21 •      0.46±0.22 •
WR86-25D       0.77±0.19     0.61±0.21    0.62±0.20    0.00±0.00 •   0.00±0.00 •   0.57±0.22 •      0.55±0.23 •
• statistically significant degradation as compared to Naive Bayes
6.1.4 Gaussian Range Selection
The range of Gaussian curvature values for the entire data set was −48,751 to 1,395,243,522,
with a median of −0.0001, while the median of the absolute values was 0.001. To determine
the best possible range of curvature values for prediction of 22q11.2DS, several range options
were enumerated and their classification performance compared. As can be seen in Table 6.3,
the range ±0.5 was found to be best for Gaussian curvature. Similarly, the absolute values
of Gaussian curvature performed best in the range from 0 to 0.5.
Table 6.3: Comparison of predictive capability of curvature value ranges.
Dataset       ±0.001   ±0.005   ±0.01   ±0.05   ±0.1      ±0.5      ±1
F-measure     0.55     0.58     0.65    0.64    0.69      0.73 ◦    0.72 ◦
Precision     0.68     0.68     0.78    0.68    0.75      0.84      0.80
Recall        0.50     0.54     0.61    0.64    0.68 ◦    0.68 ◦    0.70 ◦
% Accuracy    62.54    63.43    70.14   65.96   71.25     76.61 ◦   74.99
◦ statistically significant improvement
(a) Gaussian Curvature
Dataset       0.001    0.005    0.01    0.05    0.1       0.5       1
F-measure     0.56     0.56     0.58    0.49    0.64      0.71      0.63
Precision     0.71     0.73     0.71    0.49    0.60      0.86      0.79
Recall        0.50     0.49     0.53    0.51    0.71 ◦    0.64      0.56
% Accuracy    63.46    64.63    65.01   49.97   60.35     75.51 ◦   69.49
◦ statistically significant improvement
(b) Absolute value of Gaussian Curvature
6.2 Experiments
6.2.1 Full Data Set (3:1) versus 1:1 Data Set
The purpose of this experiment was to determine whether the uneven distribution of affected
and control individuals in the data set adversely affected classifier performance. Although it
is common practice in data mining to test on a balanced set, the small number of affected
individuals yields a very small subset, which is possibly too small for statistical significance.
The full data set, as well as the four subsets described in Section 6.1.1, was classified with
Naive Bayes; the F-measure, precision, recall and accuracy results are shown in Table 6.4.
First, one can see that the uneven data set is the worst performer, supporting the
intuition that a 1:1 ratio allows for better classifier performance. In addition, the W86
subset proves to be the best performer of the entire group of 1:1 subsets. This is expected
for the following reasons. Ethnic background influences the morphology of the face much
more significantly than the effects of 22q11.2DS, introducing a source of noise in both of
the ethnically mixed sets (A106 and AS106). Although it may be tempting to draw similar
conclusions about gender-based differences from the minor improvement from A106 to
AS106, this would be a mistake, as the female/male distribution is not even. Lastly, the
poor performance of WR86 compared to W86 is caused by the repetition of exactly five
control individuals (12% of the control data set), illustrating the drawback of a very small
data set: the repeated individuals influence the control set too much. Combined with
recommendations from Dr. Heike, the W86 data set was chosen as the most appropriate
for this work.

Table 6.4: Choosing an appropriate data set. 3D snapshot with ear cutoff threshold data format used. Classified using Naive Bayes. Standard deviations shown.

Data Set     ALL           A106          AS106           W86             WR86
F-measure    0.53±0.19     0.65±0.18     0.66±0.19       0.68±0.20       0.60±0.21
Precision    0.56±0.22     0.74±0.18     0.78±0.21 ◦     0.82±0.20 ◦     0.71±0.22
Recall       0.52±0.21     0.60±0.20     0.61±0.22       0.62±0.24       0.56±0.25
Accuracy     74.66±9.50    69.20±14.16   71.30±14.63     73.99±12.84     66.08±14.83

◦ statistically significant improvement as compared to the ALL data set
6.2.2 Original 3D Snapshot versus 2.5D
For the human viewer, the 3D Snapshot is considered to hold much more information than
the 2.5D representation. The purpose of this experiment was to determine how much data
would be lost by moving from the 3D Snapshot representation to the 2.5D representation.
Additionally, since the ears are known to carry a 22q11.2DS signal and the 2.5D data
format is without ears, it was also necessary to test how much data was being lost by
using the ear cutoff threshold. All images were 250 × 380 pixels in size. As seen in Table
6.5, the 2.5D data format was found to be best at classifying 22q11.2DS disease status and
will be used as a baseline for the following experiments.

Table 6.5: Checking for data loss between data representations. All data shown here is from the W86 dataset classified using Naive Bayes. Standard deviations shown.

Data Set     3Dsnp         3Dsnpc        2.5Dcut
F-measure    0.71±0.18     0.68±0.20     0.77±0.17
Precision    0.88±0.18     0.82±0.20     0.87±0.17
Recall       0.63±0.22     0.62±0.24     0.72±0.22
% Accuracy   76.13±14.15   73.99±12.84   79.90±13.62
6.2.3 Curved Lines
In this experiment, the purpose was to discover whether curved lines, such as the profile,
contain any 22q11.2DS signal. All of the line sets were generated using the methods described
in Section 5.3.
Table 6.6: Curved lines with Naive Bayes and W86.
Dataset       2.5D    V1      V3      V5      V7      H1       H3       H5       H7       1x1     3x3     5x5     7x7
F-measure     0.77    0.71    0.75    0.76    0.71    0.52 •   0.60 •   0.65 •   0.67     0.69    0.71    0.74    0.73
Precision     0.87    0.82    0.87    0.85    0.84    0.73     0.83     0.91     0.83     0.86    0.84    0.91    0.83
Recall        0.72    0.66    0.69    0.72    0.65    0.44 •   0.52 •   0.54 •   0.60 •   0.62    0.65    0.65    0.68
% Accuracy    79.90   74.89   78.74   78.21   74.85   63.61 •  69.24 •  73.57    72.31    75.51   75.04   79.10   76.14

(V = vertical lines with 1, 3, 5, 7 lines; H = horizontal lines with 1, 3, 5, 7 lines; grid lines of size 1x1 to 7x7.)
• statistically significant degradation
As seen in Table 6.6, excluding the horizontal lines, whose results were statistically worse
than those of the other lines, there was no significant difference between the results for the
different data representations. The vertical profile lines of 3 and 5 were found to be the
most informative of the different curved line types used in this experiment. Based on known
22q11.2DS signals, such as a hooded appearance of the eyes, a prominent forehead profile, a
relatively flat midface, or a generally hypotonic facial appearance, there is promise in using
sparse vertical lines to describe one or more of these anthropometric features.
6.2.4 Symmetry
The purpose of this experiment was to determine whether asymmetry can be used to
discriminate between affected and control individuals. Using expert median scores for
symmetry yields the highest accuracy of all the symmetry measures used, but, as highlighted
by the F-measure values, the recall is very weak (see Table 6.7). The highest F-measure
value, given to the EC set, is actually a reflection of a difference between the affected and
control data sets: children affected by 22q11.2DS often refused to wear a head cap during
the image intake process and, as such, the upper part of the head was often uneven due to
hair artifact removal. Generally, the symmetry measures, whether automatically computed
or given by experts, were judged inferior to all other global data representations described
in this work for predicting disease status.

Table 6.7: Symmetry measures with Naive Bayes and W86. EC refers to symmetry analysis done on 2.5D images with an ear cutoff. FC refers to images with the forehead removed due to noise from the hair removal process.

Data Set     2.5D    Expert    EC        FC        FC+EC
F-measure    0.77    0.40 •    0.59 •    0.11 •    0.47 •
Precision    0.87    0.66      0.48 •    0.22 •    0.58 •
Recall       0.72    0.31 •    0.78      0.08 •    0.43 •
% Accuracy   79.90   56.49 •   50.14 •   46.49 •   54.81 •

• statistically significant degradation
6.2.5 Labeled Images
In this experiment, the purpose was to discover whether labeling images with various
topography and curvature labels would improve upon the 22q11.2DS detection results of
the current best data representation (2.5D depth images). As can be seen in Table 6.8,
although the curvature labels, particularly Gaussian (K), absolute value of Gaussian (|K|)
and Besl-Jain, were superior to the symmetry measures, the classification of disease status
based on labeled images did not improve on previous results (see Table 6.9 for the
topography-labeled image results).

Table 6.8: Curvature labeled images compared to 2.5D results using Naive Bayes and W86.

Dataset      2.5D    K ±0.5   |K| ±0.5   Besl-Jain   Pit       Peak
F-measure    0.77    0.73     0.71       0.70        0.61      0.59 •
Precision    0.87    0.84     0.86       0.71        0.60 •    0.62 •
Recall       0.72    0.68     0.64       0.72        0.66      0.59
% Accuracy   79.90   76.61    75.51      70.81       60.56 •   61.53 •

• statistically significant degradation

Table 6.9: Topography labeled images compared to 2.5D results using Naive Bayes and W86.

Data Set     2.5D    5-step    10-step   15-step   20-step
F-measure    0.77    0.54 •    0.43 •    0.58 •    0.39 •
Precision    0.87    0.78      0.65      0.69      0.56 •
Recall       0.72    0.45 •    0.35 •    0.53      0.33 •
% Accuracy   79.90   66.63 •   59.68 •   64.06 •   54.35 •

• statistically significant degradation
6.2.6 Distance from Average of Control Individuals
Using distance from average is most similar to the experiments done by Hutton et al.
described in Section 2.1. Starting with the 2.5D depth image data representation, for every
individual the distance to the control data set average was measured. These distances were
then used for classifying individuals as affected or not affected by 22q11.2DS. For visual
comparison, the distances from the average of control individuals for the Euclidean, Cosine
and Mahalanobis measures are shown in Figure 6.1. Note that in the case of the Mahalanobis
distance, the separation between the distance of affected and control individuals is most
apparent. Table 6.10 provides a numerical comparison of all three distance measures,
illustrating that the Mahalanobis distance to the average control outperforms all other
global methods, yielding an F-measure of 0.94 (missing 5 individuals) on the W86 data set.

Figure 6.1: Distance per individual to the average of the control individuals, for (a) Euclidean, (b) Cosine, and (c) Mahalanobis distance. Black line separates affected from control, with affected individuals on the left.

Table 6.10: Classification using distance from average of control using Naive Bayes on W86.

Dataset      2.5D    Euclid   Cosine   Mahal
F-measure    0.77    0.63     0.59     0.94 ◦
Precision    0.87    0.83     0.76     0.96 ◦
Recall       0.72    0.54     0.51     0.93 ◦
% Accuracy   79.90   71.31    67.88    94.00 ◦

◦ statistically significant improvement
6.2.7 Mahalanobis Distance as Classifier
The results using the Mahalanobis distance described in Section 6.2.6 were obtained using
standard medical literature methods. However, this method of classification would be
discounted in the pattern recognition literature, because computation of the control average
requires labeling of the entire data set, not just the training set. In the experiment of
Section 6.2.6, the control average was computed on all controls, and then 10-fold cross
validation was used in training and testing the classifier. The following new experiment was
designed so that a percentage of the control set was removed from the training data set
prior to computation of the control average and used exclusively in testing.
Figure 6.2 shows prediction accuracy as a function of the percentage of data used in testing
for both control average and affected average. As can be seen in these bar graphs, the
moment even one individual is removed from the average (test set size equal to 2%), the
classification accuracy drops to about 50%. In order to explain this drastic drop, Figure
6.3 illustrates how the distance calculation changes for a single test individual. The blue
line represents the original distances of each subject to the average of all control individuals
and one can speculate that a horizontal line may be drawn to separate the affected (first 43
individuals) and the controls. The black dots represent the new distances to the average,
calculated by leaving a control individual (circled in red) out of the average.

Figure 6.2: Aggregate percent of correctly classified individuals as the test set increases from 2% to 50% of the data set (x-axis), shown on a 0-100% accuracy scale (y-axis): (a) using the control average for classification; (b) using the affected average for classification.

Figure 6.3: Distance of a control individual from the control average when that individual (circled in red) is used as the test sample. The y-axis represents the distance to the average, while the x-axis lists all individuals in the W86 data set, with the first 43 individuals affected and the rest control. The blue line represents the original distance from average used in the experiment of Section 6.2.6, while the black dots represent the newly calculated distance from average when leaving out the test individual.

Figure 6.4: Variance of the full data, control set and affected set. All three data sets have extremely large variances, on the order of 10^7.

Although most individual distances do not vary drastically, the test subject's distance
(circled in red) increases so much that it is now mistaken for an affected individual.
Returning to Figure 6.2, this sort of large shift in distance occurs frequently, yielding poor
class prediction on the test set. What causes this drastic change in distances is that the
separation of distances between affected and control individuals is quite small, while the
variance is very large, as shown in Figure 6.4.
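A minimal sketch of the corrected protocol (synthetic data; the 95th-percentile decision threshold is an illustrative choice, not the method of the experiment) makes the point explicit: the held-out subject must not contribute to the average or covariance used to classify it.

    import numpy as np

    def mahalanobis(P, A, Vinv):
        d = P - A
        return np.sqrt(d @ Vinv @ d)

    rng = np.random.default_rng(0)
    controls = rng.normal(0.0, 1.0, size=(43, 20))   # synthetic controls
    test = controls[0]                               # held-out test subject
    train = controls[1:]                             # training controls only

    # Average and covariance computed WITHOUT the held-out subject.
    A = train.mean(axis=0)
    Vinv = np.linalg.pinv(np.cov(train, rowvar=False))

    # Classify by comparing to the spread of training-control distances.
    train_d = np.array([mahalanobis(c, A, Vinv) for c in train])
    threshold = np.percentile(train_d, 95)
    print("affected" if mahalanobis(test, A, Vinv) > threshold else "control")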
Although distances from the average control or from the average affected may not be valid
as a classification feature, they are very useful in the medical community for the quantifi-
cation of dysmorphology. In this vein, the next chapter will focus on the detection and
quantification of local facial features.
Chapter 7
LOCAL DATA REPRESENTATIONS
This chapter will describe local facial features developed from 2.5D depth images. The nose,
with arguably the strongest signal, was chosen as the first of the local features to examine,
followed by the mouth. As a first step, automatic detection of landmarks will be described.
A list of landmark distances used in anthropometry will be given. Next, landmark-based
descriptors will be explained. Lastly, the developed shape descriptors will be discussed.
The nasal landmarks of interest are the sellion (s), pronasale (prn), subnasale (sn), and left
and right alae (al). Additionally, a helper landmark mf′ was used that is similar to the
maxillofrontale (mf): a landmark that is located by palpation of the anterior lacrimal crest
of the maxilla at the frontomaxillary suture (Figure 7.1a). The oral landmarks of interest
are the labiale superius (ls), stomion (sto), labiale inferius (li), and left and right cheilion (ch)
(Figure 7.1b).
Figure 7.1: Landmarks of interest: (a) nasal landmarks, (b) oral landmarks.
7.1 Automatic Nasal Landmark Detection
Given a 2.5D depth image I, generated as described in Section 5.2, the automatic detection
of landmarks proceeds as follows. For each depth image I, there is a set of points I_max at
the maximum z-value (max_z), which can be represented by

I_max = { (x, y) : I(x, y) = max_{x′,y′} I(x′, y′) }.   (7.1)
The geometric center of these points, (prn_x, prn_y), is the pronasale. The sellion and subnasale
can be found as the local minima on either side of the pronasale on the line

M = I_{prn_x}.   (7.2)
To find the left and right alae, the binary image NT_sn is defined as the nasal tip thresholded
by sn_z, the depth of the subnasale (see Figure 7.2a):

NT_sn = ( I(x, y) ≥ sn_z ).   (7.3)
As a starting point for the locations of the alae, the points located at the left and right
boundaries of NT_sn must be found. First, the averages of the y-values of the points on
the left border, min_x, and right border, max_x, of NT_sn are calculated. In the case of
symmetrical faces, al_y (the y-value of both the left and right al) is the y-average, while for
asymmetrical faces al_y is the average of the left and right border averages:

al_y = avg{ y : (NT_sn(min_x, y) = 1) ∩ (NT_sn(max_x, y) = 1) },   (7.4)

where

min_x = min_x ( NT_sn(x, y) = 1 ),   (7.5)
max_x = max_x ( NT_sn(x, y) = 1 ).   (7.6)

Figure 7.2: Detecting the location of the nasal alae: (a) NT_sn outlined in red, (b) al_y from min_x and max_x, (c) al^L_x and al^R_x.
As the depth of sn is not necessarily equal to the depth at which the nose connects with the
face, min_x and max_x may be incorrectly placed on the al_y horizontal line (see Figure 7.2b).
Therefore, to find al^L_x and al^R_x, the location of the attachment of the nose to the face must
be found. As shown in Figure 7.2c, al^L_x and al^R_x are detected by selecting the points with
the sharpest slope S on the horizontal line through al_y.

Finally, the detection of the helper landmark mf′ is done using the region growing
information to find the horizontal line O through the eyes. Given O, the same method as for
the alae is used; the points below the sharpest slope are chosen as mf′^L and mf′^R. In a
few cases the location of the eyes is obscured, and the mf′ locations are then detected by finding
the local x-value minima nearest to s_x on the horizontal line through s_y.
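A condensed sketch of the first steps of this procedure follows (NumPy; an image orientation with y increasing downward and an upright face is assumed, and "nearest local minimum" is one reading of "local minima on either side"):

    import numpy as np

    def detect_nasal_landmarks(I):
        # Equation 7.1: the pronasale is the geometric center of the
        # maximum-depth points.
        ys, xs = np.where(I == I.max())
        prn_x, prn_y = int(round(xs.mean())), int(round(ys.mean()))

        # Equation 7.2: the midline M is the image column through prn_x.
        M = I[:, prn_x]
        minima = [j for j in range(1, len(M) - 1)
                  if M[j] < M[j - 1] and M[j] < M[j + 1]]

        # Sellion above the pronasale, subnasale below (nearest minima).
        s_y = max((j for j in minima if j < prn_y), default=None)
        sn_y = min((j for j in minima if j > prn_y), default=None)
        return (prn_x, prn_y), (prn_x, s_y), (prn_x, sn_y)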
7.2 Automatic Oral Landmark Detection
The peak image, generated by the method described in Section 5.5.2, is used to find the
prominent parts of the upper and lower lips, marked blue in Figure 7.3a. The labiale
superius (ls) location is found where the lower edge of the upper lip area (UL) intersects
with midline M , while the labiale inferius (li) is found where the upper edge of the lower
lip area (LL) intersects with midline M (see Figure 7.3b).
ls_x = li_x = prn_x,   (7.7)
ls_y = min_y ( M_y ∈ UL ),   (7.8)
li_y = max_y ( M_y ∈ LL ).   (7.9)
To detect the stomion (sto), the local z-value minimum between ls and li is used:

sto_x = prn_x,   (7.10)
sto_y = { y : I(prn_x, y) = min_{li_y ≤ y ≤ ls_y} I(prn_x, y) }.   (7.11)
In the case that this local minimum is not present, sto_y is set to the y-value of the local
z-value minimum nearest the midline M.
The left and right cheilion are detected using a combination of two methods. The first
method builds on the local minimum search by detecting a mouth line U as the trough
between the upper and lower lip, ending once the trough disappears as the lips meet
(Figure 7.3c). Specifically, using sto as the starting point, the line is extended to the left by
selecting the minimum of the closest three neighbor points. This process stops when no
local minimum can be found. A corresponding approach is used for extending U to the right.
The one drawback to this approach is that it may fail to stop at the appropriate point.

Figure 7.3: Detecting landmarks of the mouth: (a) lips and corners, (b) ls and li, (c) local minima line U, (d) ch detected.

The second approach is based on the peak curvature values. As the corners of the
mouth are natural peaks, this method searches along the horizontal for the two peak
areas (or dots) nearest to sto, marked green in Figure 7.3a. Once each mouth corner dot
is found, a bounding box is defined. The geometric center of each bounding box is
calculated, yielding the location of ch (Figure 7.3d). The drawback of this method is that,
due to face shape, the peak image may not contain the mouth corner dots, or the dots may
extend downward to the bottom of the chin. When the mouth line and dot approaches
are used together, the drawbacks of each method are minimized.
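As an illustration of the mouth-line method (a sketch only; the exact stopping test is an interpretation of "the trough disappears as the lips meet"):

    import numpy as np

    def trace_mouth_line_left(I, sto):
        # Starting at the stomion, follow the trough between the lips to
        # the left by stepping to the lowest-depth of the three neighbor
        # pixels in the next column; stop when that pixel is no longer a
        # local minimum along its column.
        x, y = sto
        while x > 0:
            neighbors = [(y + dy, I[y + dy, x - 1]) for dy in (-1, 0, 1)
                         if 0 <= y + dy < I.shape[0]]
            ny, _ = min(neighbors, key=lambda p: p[1])
            col = I[:, x - 1]
            if not (0 < ny < len(col) - 1
                    and col[ny] < col[ny - 1] and col[ny] < col[ny + 1]):
                break
            x, y = x - 1, ny
        return (x, y)   # endpoint near the left cheilion

Extending to the right is symmetric, and in the combined method this endpoint would be reconciled with the peak-image corner dots.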
7.3 Landmark Distances
A set of craniofacial anthropometric landmarks and inter-landmark distances characterizing
the craniofacial features frequently affected in 22q11.2DS was initially selected [32]. Following
a reliability study by Dr. Heike, 33 of these measurements were identified based on
demonstrated high inter- and intra-rater reliability, as well as high inter-method reliability
when comparing measurements taken directly with calipers and those taken indirectly on
the 3dMD imaging system [39].
Twelve of these landmarks were amenable to automatic detection, and were used to cal-
culate ten inter-landmark distances (Table 7.1) for subsequent inter-method comparisons
between hand-labeled and automatically detected landmarks.
7.4 Landmark-based Descriptors
Outside of the robust landmark distances discussed above, combinations of landmark mea-
surements can be used to better describe the shape of a particular facial feature. Eight such
descriptors were developed for the nose, while six were used for the mouth.
Table 7.1: Landmark distances obtained using automatically detected landmarks.
Description                     Name     Mathematical Definition   Approximation used
Nose width                      L^A_1    = ‖al^R − al^L‖
Nose tip protrusion             L^A_2    = ‖sn − prn‖
Mouth width                     L^A_3    = ‖ch^R − ch^L‖
Upper lip height                L^A_4    = ‖sn − sto‖
Vermilion height of upper lip   L^A_5    = ‖ls − sto‖
Vermilion height of lower lip   L^A_6    = ‖sto − li‖
Length of R alar base           L^A_7    = ‖ac^R − sn‖             ≡ ‖al^R − sn‖
Length of R alar stretch        L^A_8    = ‖ac^R − prn‖            ≡ ‖al^R − prn‖
Length of L alar base           L^A_9    = ‖ac^L − sn‖             ≡ ‖al^L − sn‖
Length of L alar stretch        L^A_10   = ‖ac^L − prn‖            ≡ ‖al^L − prn‖
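Given detected landmarks as 3D points, the distances of Table 7.1 are plain Euclidean norms; a small sketch (the coordinate values are placeholders):

    import numpy as np

    lm = {  # hypothetical landmark coordinates (x, y, z)
        "alL": np.array([-1.6, 0.0, 4.0]), "alR": np.array([1.6, 0.0, 4.0]),
        "prn": np.array([0.0, 0.3, 5.8]),  "sn":  np.array([0.0, -0.8, 4.2]),
    }

    def dist(a, b):
        return float(np.linalg.norm(lm[a] - lm[b]))

    LA1 = dist("alR", "alL")   # nose width
    LA2 = dist("sn", "prn")    # nose tip protrusion
    LA8 = dist("alR", "prn")   # R alar stretch, using al^R to approximate ac^R
    print(LA1, LA2, LA8)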
Helper distance functions are used in the calculation of several descriptors and are de-
fined as follows, where † denotes a standard anthropometric distance measure not included
in Dr. Heike’s subset (see Section 7.3).
Depth_face = max_z − min_{z≠0} z,   (7.12)
Width_face = max_x − min_x, where z ≠ 0,   (7.13)
Depth_nose = max_z − sn_z,   (7.14)
Width_nose† = al^R_x − al^L_x,   (7.15)
Depth_Nroot = s_z − (mf′^R_z + mf′^L_z)/2,   (7.16)
Width_Nroot† = mf′^R_x − mf′^L_x.   (7.17)
7.4.1 Nasal Descriptors
The landmark-based nasal descriptors are defined in Table 7.2.

Table 7.2: List of nasal landmark-based descriptors.

Description                    Name     Mathematical Definition
Normalized nose depth          L^N_1    = Depth_nose / Depth_face
Normalized nose width          L^N_2    = Width_nose / Width_face
Normalized nasal root width    L^N_3    = Width_Nroot / Width_face
Normalized nasal root depth    L^N_4    = Depth_Nroot / Depth_face
Average nostril inclination†   L^N_5    = avg[∠(L mf′, L al, R al), ∠(R mf′, R al, L al)]
Nasal tip angle†               L^N_6    = ∠(s, prn, sn)
Alar-slope angle†              L^N_7    = ∠(L al, prn, R al)
Nasal root-slope angle†        L^N_8    = ∠(L mf′, s, R mf′)

In summary, the normalized nose depth (L^N_1) is the ratio of nose depth to face depth.
The normalized nose width (L^N_2) is the ratio of the width of the nose to the width of the
face. The normalized nasal root width (L^N_3) is the ratio of the nasal root width to face
width. The normalized nasal root depth (L^N_4) is the ratio of the nasal root depth to face
depth. The average nostril inclination (L^N_5) is the average of the left and right angles
created by the lines outlining the side of the nose and the base of the nose. The nasal tip
angle (L^N_6) is the angle on the midline M between the sellion and subnasale. The
alar-slope angle (L^N_7) is the 3D angle between the left and right alae passing through the
pronasale. Finally, the nasal root-slope angle (L^N_8) is calculated as the 3D angle through
the sellion, stopping at the left and right mf′.
7.4.2 Oral Descriptors
The landmark-based oral descriptors are defined in Table 7.3. In summary, the normalized
mouth length (L^O_1) is the ratio of the mouth width to face width. L^O_2 is the ratio of the
height of the vermilion portion of the upper lip to the full mouth height, and L^O_3 is
calculated similarly to L^O_2, but for the vermilion portion of the lower lip. (Vermilion is
the red pigmented portion of the lips.) The inclination of the labial fissure (L^O_4) is the
angle between the line defined by the locations of the left and right cheilion and the
horizontal line through the right cheilion. The upper vermilion angle (L^O_5) is the angle
between the corners of the mouth and the top of the vermilion part of the upper lip. The
lower vermilion angle (L^O_6) is calculated similarly to L^O_5, but for the vermilion portion
of the lower lip.
Table 7.3: List of oral landmark-based descriptors.
Description                                Name     Mathematical Definition
Normalized mouth length                    L^O_1    = (R ch_x − L ch_x) / Width_face
Normalized vermilion height of upper lip   L^O_2    = (ls_y − sto_y) / (ls_y − li_y)
Normalized vermilion height of lower lip   L^O_3    = (sto_y − li_y) / (ls_y − li_y)
Inclination of labial fissure†             L^O_4    = ∠(ch^R_{x,y}, ch^L_{x,y}, horizontal)
Upper vermilion angle†                     L^O_5    = ∠(ch^R_{x,y}, ls_{x,y}, ch^L_{x,y})
Lower vermilion angle†                     L^O_6    = ∠(ch^R_{x,y}, li_{x,y}, ch^L_{x,y})
7.5 Shape-based Descriptors
Four sets of shape-based descriptors were developed. The first set describes the bulbous
nasal tip facial feature. The second and third sets use an automatic nose edge approach to
describe nasal tubularity and the prominence of the nasal root. The fourth set describes
the mouth, focusing on such features as an open mouth, a small mouth, and downturned corners.
7.5.1 Bulbous Nasal Tip (BNT)
The nose region is grown using the pronasale (prn) as a seed pixel, while the threshold
is decreased gradually. NT_d is a binary image representing the set of pixels in image I
thresholded by depth max_z − d, where d is varied from 0 to Depth_nose:

NT_d = ( I(x, y) ≥ max_z − d ).   (7.18)
To normalize the bulbous features, the bounding box B_d for each NT_d is constructed,
with the geometric center of B_d denoted by (B_x, B_y). The following four descriptors are
calculated.
Rectangularity
The ratio of the nose area NT_d to the area of its bounding box B_d:

R_d = num(NT_d = 1) / area(B_d).   (7.19)

The range of R_d is from 0 to 1; 1 is predictive of BNT.
Circularity
The difference between NT_d and the matrix Ellipse_d, which represents an ellipse inscribed
in the bounding box B_d with the same center as B_d, horizontal diameter equal to the width
of the bounding box, W(B_d), and vertical diameter equal to the height of the bounding
box, H(B_d):

C_d = Σ_{x,y} |NT_d(x, y) − Ellipse_d(x, y)| / area(B_d),   (7.20)

where

Ellipse_d(x, y) = 1 if (x − B_x)² / (W(B_d)/2)² + (y − B_y)² / (H(B_d)/2)² ≤ 1, and 0 otherwise.   (7.21)

The range of C_d is from 0 to 1; 0 is predictive of BNT.
Triangularity
The difference between NT_d and an isosceles triangle Triangle_d inscribed within the
bounding box B_d:

T_d = Σ_{x,y} |NT_d(x, y) − Triangle_d(x, y)| / area(B_d).   (7.22)

The range of T_d is from 0 to 1; 1 is predictive of BNT.
Upper Rectangularity
The area of the portion of the nose above prn_y compared to its bounding box BU_d. This
is the same as the R_d calculation, except that only points with y < prn_y are considered:

U_d = num(NT_d = 1, y < prn_y) / area(BU_d).   (7.23)

The range of U_d is from 0 to 1; 1 is predictive of BNT.
Severity Scores
For each descriptor δ listed above, a severity score Sev_δ is defined as the portion of values
bigger than the threshold Th_δ as d varies from 1 to Depth_nose:

Sev_δ = num(δ_d > Th_δ) / Depth_nose.   (7.24)

In each case, Th_δ was empirically chosen to maximize the difference of the average
severity score Sev_δ between individuals with and without BNT.
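A compact sketch of Equations 7.24 and 7.25 (NumPy; the default thresholds shown are the empirically chosen values reported later in Section 8.1.4):

    import numpy as np

    def severity(descriptor_values, threshold):
        # Equation 7.24: fraction of per-increment descriptor values above
        # the threshold as d sweeps from 1 to Depth_nose.
        v = np.asarray(descriptor_values)
        return float((v > threshold).sum()) / len(v)

    def bulbous_coefficient(R_values, C_values, ThR=0.71, ThC=0.10):
        # Equation 7.25: beta = Sev_R * (1 - Sev_C).
        return severity(R_values, ThR) * (1.0 - severity(C_values, ThC))

    # Toy example: a boxy nose keeps R_d high over most of the depth sweep.
    R = np.linspace(0.9, 0.5, 20)
    C = np.linspace(0.05, 0.3, 20)
    print(bulbous_coefficient(R, C))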
For clarity, the calculation of Sev_R is described. Given two individuals, one with and
one without BNT, R_d was calculated at each increment of d, with the resulting values
plotted in Figure 7.5.

Figure 7.4: The nose area compared to the bounding box and the different descriptor shapes: (a) R_d and C_d, (b) T_d, (c) U_d.

Figure 7.5: Nose area in relation to bounding box area for two individuals of the same age and gender, with and without BNT.

The count of points above Th_R = 0.7 for the individual with severe BNT is significantly
greater than that of the individual with no BNT, yielding severity scores Sev_R of 0.9 and
0.3 for the individuals with and without BNT, respectively.
Bulbous Nose Coefficient
Using the two most basic descriptors, R_d and C_d, the bulbous coefficient can be defined as
the combination of their severities:

β = Sev_R (1 − Sev_C).   (7.25)

Returning to the example from Figure 7.5, the bulbous coefficients β for the individuals
with and without BNT were 0.54 and 0.08, respectively.
Table 7.4: List of bulbous nasal tip shape-based descriptors.
Description                Name     Mathematical Definition
Rectangle severity         D^B_1    = Sev_R
Circle severity            D^B_2    = Sev_C
Triangle severity          D^B_3    = Sev_T
Upper rectangle severity   D^B_4    = Sev_U
Bulbous nose coefficient   D^B_5    = β
7.5.2 Automatic Nose Edges
The vertical shape of the nose can be thought of as the left and right contours of the nose
from the alae to the sellion. These contour lines can be used to calculate the width of the
nasal root and to quantify the tubularity of the nose.

Figure 7.6: Left and right contour lines of the nose.

The x and y components of the alae positions are chosen as the starting points of the left
and right contour lines:

L^L_0 = (al^L_x, al^L_y),   (7.26)
L^R_0 = (al^R_x, al^R_y).   (7.27)
As the y-value moves between al_y and s_y, j increases from 0 to J = s_y − al_y. In order to
keep the contour line continuous, the decision of which point to add as L_{j+1} comes down
to a choice between maintaining a straight line or moving towards the middle of the nose.
This choice is based on the location of the neighboring sharpest slope S. The point L_{j+1}
is therefore picked based on the x-positions of S_{j+1} and L_j: if S_{j+1} is closer to the middle
of the face than a direct vertical movement of 1 unit in the y-direction from L_j, the edge
moves inward by 1 unit in the x-direction; otherwise the edge maintains its current course.
More concretely, for the left side of the nose:
[L^L_{j+1}]_y = (j + 1) + al_y,   (7.28)

[L^L_{j+1}]_x = [L^L_j]_x + 1 if S^L_{j+1} > [L^L_j]_x, and [L^L_j]_x otherwise.   (7.29)

The right side of the nose, L^R_{j+1}, is calculated similarly.
Once the contour lines are found, improved maxillofrontale (mf*) locations can be found
using the original mf′ y-positions:

mf*^L_y = mf*^R_y = mf′_y,   (7.30)
mf*^L_x = [L^L_{mf′_y}]_x,   (7.31)
mf*^R_x = [L^R_{mf′_y}]_x.   (7.32)

The width of the nasal root can be found by looking at the x-values of L at y = s_y (see
Equation 7.38).
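A sketch of the contour-following rule of Equations 7.28-7.29 (pure Python; it assumes y counts upward from the ala toward the sellion, and a precomputed per-row sharpest-slope position S):

    def trace_left_contour(al_x, al_y, s_y, S):
        # S[y] is the x-position of the sharpest slope in row y (assumed
        # precomputed from the depth image); s_y > al_y in this indexing.
        x = al_x
        contour = [(x, al_y)]
        for j in range(s_y - al_y):
            y_next = al_y + j + 1            # Equation 7.28
            if S[y_next] > x:                # Equation 7.29: sharpest slope
                x += 1                       # closer to the midface -> step in
            contour.append((x, y_next))
        return contour

The right contour mirrors this rule with the step direction reversed.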
In addition to the helper distance functions described in Equations 7.12 through 7.17, the
following functions were needed:

slope(P) = x,y-slope of the tangent line at point P,   (7.34)
Lslope(L) = x,y-slope of the line L,   (7.35)
Zslope(P) = z-directional slope of the tangent line at point P,   (7.36)
Depth_NewNroot = s_z − (mf*^R_z + mf*^L_z)/2,   (7.37)
Width_NewNroot = mf*^R_x − mf*^L_x.   (7.38)
The descriptors for tubularity of the nose are listed in Table 7.5. In each case, the goal of
the descriptor is to describe the trapezoidal shape of the nose and determine its closeness to
a rectangle (or tube). The total nose spread (D^T_1) calculates the distance by which the
width of the nose extends past the width of the nasal root. The new average nostril
inclination (D^T_2) performs the same calculation as L^N_5, but uses the newly detected
mf* locations. The average point slope in the right and left contour lines L (D^T_3)
determines the average slope change from point to point on both the left and right sides of
the nose. The average of L slopes (D^T_4) is the average of the entire slope of the right and
left nasal contour lines.
Table 7.5: List of tubular shape-based descriptors.
Description                               Name     Mathematical Definition
Total nose spread                         D^T_1    = [L^L_s]_x − [L^L_al]_x + [L^R_s]_x − [L^R_al]_x
New average nostril inclination           D^T_2    = avg[∠(mf*^L, al^L, al^R), ∠(mf*^R, al^R, al^L)]
Average point slope in left and right L   D^T_3    = Σ_{j=0}^{J} [slope(L^L_j) + slope(L^R_j)] / 2J
Average of L slopes                       D^T_4    = avg[Lslope(L^L), Lslope(L^R)]
Ratio nasal root to nose width            D^T_5    = Width_Nroot / Width_nose
New ratio nasal root to nose width        D^T_6    = Width_NewNroot / Width_nose
The ratio of the nasal root width to the nose width (D^T_5) provides a fractional assessment
of the top versus the bottom of the nasal trapezoid. Lastly, the new ratio of the nasal root
to nose width (D^T_6) performs the same calculation as D^T_5, but uses the new mf*
landmarks.
The descriptors for the prominence of the nasal root are shown in Table 7.6. The minimum
distance from the left to the right contour line (D^R_1) detects the sharpness of the top of
the nasal trapezoid. Although this distance is often positive, it is possible for L^L and L^R
to cross one another, yielding a negative distance. The number of points with a severe slope
(D^R_2) is calculated as the count of left and right edge points with a z-direction slope
greater than or equal to three. The average slope at mf* (D^R_3) is the average of the
z-slopes at the right and left mf* locations. D^R_4 is the ratio of the nasal root depth to the
nose depth. The new nasal root-slope angle (D^R_5), new normalized nasal root depth
(D^R_6), and new normalized nasal root width (D^R_8) are calculated in the same way as
L^N_8, L^N_4, and L^N_3, respectively, but using the newly calculated mf*. Lastly, the new
ratio of nasal root to nose depth (D^R_7) is the ratio between the new nasal root depth and
the depth of the nose.
Table 7.6: List of nasal root shape-based descriptors.
Description                          Name     Mathematical Definition
Min distance from left and right L   D^R_1    = min_j (L^R_j − L^L_j)
Number of severe slope points ≥ 3    D^R_2    = num[Zslope(L^L_j) ≥ 3] + num[Zslope(L^R_j) ≥ 3]
Average slope at mf*                 D^R_3    = avg[Zslope(mf*^L), Zslope(mf*^R)]
Ratio nasal root to nose depth       D^R_4    = Depth_Nroot / Depth_nose
New nasal root-slope angle           D^R_5    = ∠(mf*^L, s, mf*^R)
New normalized nasal root depth      D^R_6    = Depth_NewNroot / Depth_face
New ratio nasal root to nose depth   D^R_7    = Depth_NewNroot / Depth_nose
New normalized nasal root width      D^R_8    = Width_NewNroot / Width_face
7.5.3 Oral Shape-based Descriptors
The oral shape-based descriptors are shown in Table 7.7. D^O_1 is used as a descriptor for
the Open Mouth facial feature and uses the peak image of an individual to compare the lip
areas to the bounding box (Peak) which contains them. The area of Peak is restricted by
the locations of li, ls, ch^L and ch^R.

D^O_2, D^O_3 and D^O_4 are used as descriptors for the Small Mouth facial feature. The
normalized depths of the upper (D^O_2) and lower (D^O_3) lips are calculated as ratios of
the vermilion depth to the depth of the face. D^O_4 is the ratio of the columella to the upper
lip height.

D^O_5, D^O_6 and D^O_7 are used as descriptors for the Downturned Corners of the Mouth
facial feature. The inclination angles of the left (D^O_5) and right (D^O_6) sides of the lip
are found by calculating the angle between the arm from the selected ch to sto and the
horizontal arm through sto_y. Lastly, the corners of mouth angle (D^O_7) is calculated by
finding the southern angle between the points ch^L, sto and ch^R, yielding an obtuse angle
if the corners are downturned and a reflex angle if the corners are upturned.
Table 7.7: List of oral shape-based descriptors.
Description                              Name     Mathematical Definition
Rectangularity of lips at ch_z           D^O_1    = num(Peak = 1) / area(Peak)
Normalized depth of upper lip            D^O_2    = (ls_z − sto_z) / Depth_face
Normalized depth of lower lip            D^O_3    = (sto_z − li_z) / Depth_face
Ratio of columella to upper lip height   D^O_4    = (sn_y − ls_y) / (sn_y − sto_y)
R-side lip inclination angle             D^O_5    = ∠(R ch, sto, horizontal)
L-side lip inclination angle             D^O_6    = ∠(L ch, sto, horizontal)
Corners of mouth angle                   D^O_7    = ∠(L ch, sto, R ch)
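For illustration, the corners-of-mouth angle D^O_7 reduces to a three-point angle; the sketch below (coordinates are made up, with y pointing up) returns the unsigned angle, so a separate check on whether the corners lie below the stomion is needed to tell the obtuse (downturned) case from the reflex (upturned) one:

    import numpy as np

    def angle_at(vertex, p1, p2):
        # Unsigned angle (degrees) at `vertex` between the rays to p1 and p2.
        v1 = np.asarray(p1, float) - np.asarray(vertex, float)
        v2 = np.asarray(p2, float) - np.asarray(vertex, float)
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    chL, sto, chR = (-20.0, -3.0), (0.0, 0.0), (20.0, -3.0)  # corners below sto
    print(angle_at(sto, chL, chR))  # ~163 degrees: obtuse, corners downturned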
Chapter 8
LOCAL REPRESENTATION RESULTS
In this chapter the results from using local data representations are provided. In the pre-
liminary studies section ground truth data will be used to develop baselines, the accuracy
of automatic landmark prediction will be discussed, and the threshold selection for the bul-
bous nasal tip descriptors will be described. The experimental section will be divided into
landmark-based descriptor assessment and shape-based descriptor assessment. For both
types of descriptors, the similarity to expert median scores will be computed and clas-
sification performance of 22q11.2DS will be measured. As the best performing classifier
alternated sporadically between Naive Bayes and SVM, both sets of results are shown when
necessary. Note that even in those cases where SVM is the better classifier, the performance
improvement is unlikely to be statistically significant.
8.1 Preliminary Studies
8.1.1 Experts’ Median Scores as 22q11.2DS Predictors
The use of experts’ median scores for classification was assessed. Although there are four
facial features each for the nose and mouth, Small Nasal Alae and Retrusive Chin are fea-
tures for which successful landmark and shape-based descriptors have yet to be developed.
The sets missing the above two features are labeled as auto-3N (containing Bulbous Nasal
Tip, Tubular Appearance, Prominent Nasal Root) and auto-3O (containing Small Mouth,
Open Mouth, Downturned Corners of the Mouth), and the set containing all but the two
above features is labeled as auto-6 (auto-3N and auto-3O combined).
As seen in Table 8.1a, the use of the SVM classifier yields slightly better performance
when classifying all four nasal features, but when only the auto-3N features are used (Bulbous
Nasal Tip, Prominent Nasal Root, and Tubular Appearance), Naive Bayes is the better
Table 8.1: Using experts’ median scores for facial features to predict 22q11.2DS. In eachtable, the upper set of results was obtained using Naive Bayes, the lower using SVM.
Dataset BNT PNR TA SNA auto-3N ALL-N
F-measure 0.68± 0.18 0.49± 0.23 • 0.46± 0.23 • 0.78± 0.16 0.73± 0.16 0.74± 0.18Precision 0.81± 0.20 0.60± 0.27 • 0.62± 0.32 0.86± 0.17 0.79± 0.17 0.83± 0.19Recall 0.63± 0.23 0.44± 0.24 0.39± 0.22 • 0.76± 0.21 0.71± 0.21 0.71± 0.23% Accuracy 72.49±14.42 58.83±14.61 • 57.00±16.23 • 80.69±12.29 74.93±13.75 76.92±13.51
F-measure 0.71± 0.17 0.52± 0.21 • 0.49± 0.22 • 0.80± 0.16 0.71± 0.17 0.80± 0.16Precision 0.85± 0.17 0.69± 0.26 0.67± 0.29 0.88± 0.16 0.85± 0.17 0.88± 0.16Recall 0.65± 0.22 0.46± 0.23 0.42± 0.22 • 0.77± 0.20 0.65± 0.22 0.77± 0.20% Accuracy 75.60±13.62 62.46±13.05 • 60.24±14.82 • 82.51±11.66 75.60±13.62 82.51±11.66
• statistically significant degradationBNT - Bulbous Nasal Tip, PNR - Prominent Nasal Root, TA - Tubular Appearance, SNA - Small Nasal Alae
(a) Nasal facial features.
Dataset OM SM DCM RC auto-3O ALL-O
F-measure 0.21± 0.21 0.48± 0.24 ◦ 0.27± 0.22 0.35± 0.28 0.48± 0.23 ◦ 0.52± 0.23 ◦Precision 0.34± 0.37 0.68± 0.33 0.26± 0.21 0.63± 0.46 0.62± 0.29 0.68± 0.29 ◦Recall 0.16± 0.17 0.40± 0.24 ◦ 0.32± 0.28 0.26± 0.24 0.43± 0.24 ◦ 0.45± 0.24 ◦% Accuracy 47.90±12.09 61.14±14.64 ◦ 34.53±10.25 • 60.07±13.99 ◦ 58.21±15.76 62.04±15.50 ◦F-measure 0.33± 0.23 0.49± 0.25 0.37± 0.25 0.35± 0.29 0.48± 0.24 0.51± 0.24Precision 0.32± 0.23 0.73± 0.33 ◦ 0.39± 0.26 0.65± 0.47 0.69± 0.32 ◦ 0.71± 0.30 ◦Recall 0.37± 0.31 0.40± 0.24 0.39± 0.30 0.26± 0.25 0.40± 0.23 0.43± 0.24% Accuracy 40.54±12.21 63.92±14.28 ◦ 44.93±17.34 61.89±12.10 ◦ 62.06±14.81 ◦ 63.35±15.04 ◦
◦, • statistically significant improvement or degradationOM - Open Mouth, SM - Small Mouth, DCM - Downturned Corners of Mouth, RC - Retrusive Chin
(b) Oral facial features.
Dataset auto-6 ALL
F-measure 0.61± 0.19 0.80± 0.16 ◦Precision 0.70± 0.22 0.85± 0.17Recall 0.58± 0.23 0.80± 0.19 ◦% Accuracy 65.31±14.07 81.40±13.39 ◦F-measure 0.71± 0.15 0.80± 0.14Precision 0.76± 0.18 0.86± 0.16Recall 0.71± 0.20 0.78± 0.19% Accuracy 71.99±13.97 80.67±12.79
◦ statistically significant improvement
(c) Comparison to 2.5D global results.
performer. Note that in both the nasal and oral cases, if the smaller sets of features
(auto-3N or auto-3O) are used, the performance is worse than that of global 2.5D, while
when all four features are used (ALL-N or ALL-O), the performance matches that of global 2.5D.
When classification is done using the oral facial features (Open Mouth, Small Mouth,
Downturned Corners of the Mouth, Retrusive Chin), the performance on any of the features
or their combinations is very poor, with only the use of all oral features for classification
receiving an F-measure above 0.5 (Table 8.1b).

When the auto-6 features are used for classification, the performance decreases from that of
just using the auto-3N set. When all the nasal and oral median experts' scores are used, the
performance shows improvement over that of the global 2.5D method (see Table 8.1c). The
statistically significant difference between the use of auto-6 and all of the scores can be
explained by the fact that Small Nasal Alae and Retrusive Chin are the top two attributes
used for classification. The Bulbous Nasal Tip and Open Mouth facial features fall into
second place, followed by Small Mouth in third.
8.1.2 Automatic Landmark Placement
The ability to properly locate anthropometric landmarks using the automated system was
checked. A visual inspection of each landmark location was used to determine accuracy of
placement. These results were compared to the availability of hand-labeled landmarks as
completed by an expert. As seen in Table 8.2, the nasal landmarks were detected at 98%
accuracy or better. Oral landmarks had a slightly lower accuracy rate (93% on average),
and the helper landmarks mf′ were found at 92% accuracy. The availability of the
hand-labeled data is generally less than that of the automatic detection, with the note that
the li field in the hand-labeled data was purposefully omitted when a subject's mouth was
open. Based on these results, perhaps some of the tedious manual landmarking may be
substituted with a first-pass automatic landmark placement, corrected by an expert only
when necessary.
Table 8.2: Correct automatic placement compared to availability of hand-labeled landmarks.
               s      prn    sn     alL    alR    ls     sto    li     chL    chR    mfL    mfR
Hand-labeled   100%   98%    98%    98%    98%    98%    87%    78%    83%    83%    n/a    n/a
Automatic      100%   100%   100%   98%    100%   94%    95%    93%    92%    92%    92%    92%
8.1.3 Performance of Anthropometric Landmark Distance Measures
As anthropometric landmark distance measures are the standard for comparison in the
clinical setting, the quality of this method needed to be assessed. As described in Section
3.3, L60 is a 1:1 affected vs. control set contained within the W86 set. For valid comparison
to the baseline 2.5D depth image method, set L60-2.5D was generated as a subset of the
W86 set. L60-ALL is the data set where all of the original 33 distance measures were used
to classify individuals, while L60-10 is the set of the 10 inter-landmark distances, which
match the set of distances that can be calculated on the automatically detected landmarks
(L60-LA). As seen in Table 8.3, the performance of L60-2.5D is less than that of the global
2.5D baseline on the full W86 set (F-measure 0.77). All of the landmark distance methods
were worse than L60-2.5D, as illustrated by the ROC curve in Figure 8.1. Note that when
using only the 10 distance measures, the automatically generated landmark set L60-LA
outperforms the hand-labeled landmark set L60-10.
Table 8.3: Prediction of 22q11.2DS using landmark distance measures.

Dataset      L60-2.5D   L60-ALL   L60-10    L60-LA
F-measure    0.71       0.12 •    0.04 •    0.49 •
Precision    0.83       0.13 •    0.04 •    0.48 •
Recall       0.67       0.13 •    0.06 •    0.54
% Accuracy   75.67      49.33 •   46.33 •   48.67 •
• statistically significant degradation
Figure 8.1: ROC performance curve.
8.1.4 Bulbous Nasal Tip Threshold Selection
For each descriptor δ, the threshold Th_δ was found empirically to maximize the difference
of average values between individuals with and without BNT. To find these thresholds,
severity scores of all individuals were calculated in threshold increments of 0.01 between 0
and 1. For each increment step, the average of the group without BNT and the average
of the group with BNT were calculated. The difference between these two groups was
then maximized for each descriptor, yielding Th_R = 0.71, Th_C = 0.10, Th_T = 0.37, and
Th_U = 0.67 (Figure 8.2). To check that the Th_δ are stable in the population, the above study
was repeated for an expanded set of individuals totaling 164 (53 affected with 22q11.2DS).
For each of the four descriptors, the new thresholds were found to be unchanged.
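A sketch of this threshold search (per-individual descriptor values are expected as one row per individual, one column per depth increment d; the data layout is an assumption):

    import numpy as np

    def choose_threshold(values_with_bnt, values_without_bnt, step=0.01):
        # For each candidate threshold, compute every individual's severity
        # score (Equation 7.24) and keep the threshold that maximizes the
        # difference between the two group averages.
        best_th, best_gap = 0.0, -np.inf
        for th in np.arange(0.0, 1.0 + 1e-9, step):
            sev1 = (values_with_bnt > th).mean(axis=1)
            sev0 = (values_without_bnt > th).mean(axis=1)
            gap = sev1.mean() - sev0.mean()
            if gap > best_gap:
                best_th, best_gap = th, gap
        return best_th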
Figure 8.2: Empirical approach to threshold detection for each descriptor: (a) R threshold, (b) C threshold, (c) T threshold, (d) U threshold.
8.2 Experiments
8.2.1 Landmark-Based Nasal Descriptor Similarity to Expert Scores
The purpose of this experiment was to assess the ability of the landmark-based nasal
descriptors (L^N) to match the experts' median scores for these features. As seen in Table
8.4, the ability of L^N to match the experts' median response for any nasal facial feature
is relatively weak. In the case of Tubular Appearance the performance is slightly higher,
which can be explained by the fact that L^N_5 is a measure of tubularity, as its definition is
based on the shape of the nasal trapezoid angles.
Table 8.4: Predicting expert-marked nasal features using the LN data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      BNT           PNR           TA            SNA
F-measure    0.66± 0.16    0.69± 0.13    0.74± 0.10    0.66± 0.13
Precision    0.63± 0.14    0.68± 0.10    0.69± 0.08    0.64± 0.15
Recall       0.70± 0.21    0.73± 0.19    0.82± 0.15    0.73± 0.21
% Accuracy   57.11±17.18   59.56±13.89   62.03±12.50   60.08±14.19
F-measure    0.68± 0.12    0.79± 0.04    0.81± 0.03    0.64± 0.14
Precision    0.61± 0.09    0.66± 0.04    0.69± 0.04    0.59± 0.14
Recall       0.81± 0.21    0.99± 0.06    1.00± 0.02    0.73± 0.20
% Accuracy   57.28±11.49   65.44± 5.51   68.63± 4.29   55.82±12.90
BNT - Bulbous Nasal Tip, PNR - Prominent Nasal Root
TA - Tubular Appearance, SNA - Small Nasal Alae
8.2.2 Landmark-Based Oral Descriptor Similarity to Expert Scores
The purpose of this experiment was to assess the ability of the landmark-based oral
descriptors (L^O) to match the experts' median scores for oral facial features. As seen in
Table 8.5, Open Mouth is well predicted, most likely due to L^O_2 and L^O_3, which are
ratios of the upper and lower lips to the entire mouth height, and L^O_5 and L^O_6, whose
angles become steeper as the mouth is opened. The high performance in predicting Retrusive
Chin is a red herring, as no landmark-based descriptor includes any information below the
lower lip.
Table 8.5: Predicting the four oral features using the LO data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      OM           SM            DCM           RC
F-measure    0.93±0.06    0.76± 0.14    0.77± 0.09    0.90±0.07
Precision    0.95±0.06    0.79± 0.13    0.71± 0.11    0.86±0.04
Recall       0.93±0.10    0.75± 0.18    0.87± 0.14    0.94±0.11
% Accuracy   89.11±9.21   66.50±16.03   68.53±11.92   82.39±9.75
F-measure    0.92±0.03    0.85± 0.03    0.76± 0.05    0.93±0.02
Precision    0.86±0.05    0.74± 0.04    0.62± 0.05    0.87±0.03
Recall       0.98±0.04    1.00± 0.00    0.97± 0.07    1.00±0.00
% Accuracy   85.07±5.24   74.44± 4.10   61.43± 6.02   87.22±3.23
OM - Open Mouth, SM - Small Mouth
DCM - Downturned Corners of the Mouth, RC - Retrusive Chin
8.2.3 Landmark-Based Descriptor Classification of 22q11.2DS
The 22q11.2DS prediction performance of the nasal, oral and combined landmark-based
descriptors is compared to the 2.5D global approach. As seen in Table 8.6, although using
the combination of both the nasal and oral landmark-based descriptors provides an
improvement over using just one type of landmark-based descriptor, none of them outperforms
the 2.5D global descriptor. Note also that in this case Naive Bayes is the better performing
classifier.
Table 8.6: Predicting 22q11.2DS using landmark-based descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D           LN             LO             LN+O
F-measure    0.77± 0.17     0.53± 0.22 •   0.47± 0.23 •   0.55± 0.22 •
Precision    0.87± 0.17     0.57± 0.25 •   0.62± 0.30 •   0.64± 0.26 •
Recall       0.72± 0.22     0.54± 0.27     0.41± 0.24 •   0.53± 0.25
% Accuracy   79.90±13.62    56.51±16.35 •  57.29±15.57 •  61.15±16.12 •
F-measure    0.45± 0.23     0.48± 0.20     0.47± 0.21     0.53± 0.19
Precision    0.55± 0.29     0.51± 0.22     0.50± 0.24     0.56± 0.21
Recall       0.42± 0.25     0.51± 0.26     0.48± 0.25     0.54± 0.23
% Accuracy   53.78±16.32    50.31±16.51    50.51±16.89    54.35±17.23

• statistically significant degradation
8.2.4 Shape-Based Descriptor Similarity to Expert Scores
Compared to landmark-based descriptors, shape-based descriptors should perform better in
matching the experts' median scores. As seen in Table 8.7, the greatest improvement in
matching the experts' median scores is in predicting Bulbous Nasal Tip and Tubular Appearance.
The predictions of Prominent Nasal Root and Small Mouth are slightly improved. Lastly,
Open Mouth and Downturned Corners of the Mouth match the landmark-based descriptors,
as both of these features can be easily described by Euclidean geometry measures.
Table 8.7: Using shape-based descriptors for predicting nasal and oral facial features. For each descriptor type, the right arrow indicates the facial feature experts' median score to which it is compared. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      D^B → BNT      D^T → TA       D^R → PNR      D^O_1 → OM     D^O_{2:4} → SM   D^O_{5:7} → DCM
F-measure    0.88± 0.10     0.73± 0.15     0.65± 0.20     0.93± 0.07     0.65± 0.15       0.76± 0.11
Precision    0.88± 0.12     0.85± 0.14     0.81± 0.21     0.93± 0.07     0.77± 0.16       0.70± 0.11
Recall       0.90± 0.13     0.65± 0.18     0.57± 0.22     0.93± 0.09     0.58± 0.18       0.85± 0.15
% Accuracy   85.78±11.15    68.03±15.09    62.89±17.54    88.00±10.85    56.07±15.52      67.07±13.68
F-measure    0.88± 0.08     0.80± 0.05     0.74± 0.09     0.92± 0.03     0.85± 0.03       0.77± 0.03
Precision    0.86± 0.12     0.69± 0.05     0.65± 0.06     0.86± 0.04     0.74± 0.04       0.63± 0.04
Recall       0.91± 0.11     0.97± 0.08     0.89± 0.16     1.00± 0.00     1.00± 0.00       1.00± 0.00
% Accuracy   84.42±10.87    67.79± 6.13    60.83± 9.42    86.11± 4.23    74.44± 4.10      62.78± 4.08
8.2.5 Shape-Based Descriptor Classification of 22q11.2DS
The 22q11.2DS predictions of the nasal and oral shape-based descriptors are compared
to the 2.5D global approach. As seen in Table 8.8, using all the nasal descriptors (D^N)
matches the performance of 2.5D, while using just the oral descriptors (D^O) decreases
disease prediction from the 2.5D baseline. When both the nasal and oral descriptors are
used together (D^ALL), the performance exceeds that of the 2.5D global descriptor.
Table 8.8: Performance of shape-based descriptors in predicting 22q11.2DS. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D    β        D^B      D^T      D^R      D^N      D^O      D^ALL
F-measure    0.77    0.64     0.73     0.69     0.71     0.77     0.66     0.79
Precision    0.87    0.77     0.82     0.64 •   0.64 •   0.72     0.74     0.75
Recall       0.72    0.58     0.69     0.77     0.83     0.84     0.64     0.85
% Accuracy   79.90   70.19    76.42    65.99    67.13    74.71    69.92    77.29
F-measure    0.45    0.64     0.70 ◦   0.71 ◦   0.65 ◦   0.78 ◦   0.59     0.77 ◦
Precision    0.55    0.82 ◦   0.83 ◦   0.65     0.62     0.79 ◦   0.67     0.78
Recall       0.42    0.55     0.63     0.80 ◦   0.72 ◦   0.79 ◦   0.56     0.80 ◦
% Accuracy   53.78   71.83 ◦  74.69 ◦  66.99    62.85    77.82 ◦  62.78    76.71 ◦
◦, • statistically significant improvement or degradation
8.2.6 All Local Descriptor Classification of 22q11.2DS
SVM’s performed better in matching experts’ median scores, and Naive Bayes performed
better in classifying the disease status of 22q11.2DS. When all descriptors for a facial feature
are used, the SVM classification yields improved scores from 2.5D global results for the nasal
descriptors and all descriptors combined.
Table 8.9: Predicting 22q11.2DS using all nasal, all oral and all descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D    L^N + D^N   L^O + D^O   ALL
F-measure    0.77    0.75        0.64        0.78
Precision    0.87    0.71 •      0.73        0.77
Recall       0.72    0.84        0.61        0.83
% Accuracy   79.90   72.71       68.74       77.21
F-measure    0.45    0.81 ◦      0.60        0.79 ◦
Precision    0.55    0.83 ◦      0.68        0.81 ◦
Recall       0.42    0.81 ◦      0.58        0.81 ◦
% Accuracy   53.78   80.97 ◦     64.46       79.11 ◦

◦, • statistically significant improvement or degradation
ALL is the set of all descriptors L^{N+O} + D^{N+O}
Chapter 9
CONCLUSIONS
This dissertation has discussed the development of a successful methodology for classifying
22q11.2DS disease status and quantifying the degree of dysmorphology of global and local
facial features.
9.1 Contributions
The contributions of this work are
• Automated methodology for pose alignment. Each 3D head mesh is aligned to a natural
pose using, first, facial symmetry and, second, chin-forehead elevation differences.
The facial symmetry approach required human intervention in only 1% of the cases.
Due to a stronger reliance on the initial seed position, the pitch rotation approach
based on chin-forehead elevation differences required manual intervention in 15% of
the cases.
• Automated generation of global data representations, including human-readable representations
such as snapshots of three-dimensional data and curved lines, data-intensive
representations such as 2.5D depth images and labeled images, as well as data aggregate
representations such as facial symmetry or distance from the control-set average.
• Robust automated detection of landmarks, where the accuracy of landmark place-
ment (above 90% in all cases) rivaled that of hand-labeled landmark availability. This
suggests that as an alternative to tedious landmarking performed by an expert, an
automated detection of landmarks could be performed with expert intervention nec-
essary in less than 10% of the cases.
• Automated generation of local data descriptors for the nose and mouth. For each
facial feature, landmark-based and shape-based descriptors were developed.
• Use of global and local descriptors for 22q11.2DS classification on real clinical data.
2.5D depth images were used as a baseline representation scheme (F-measure 0.77),
with snapshots of three-dimensional data and curved lines having a slightly decreased
classification performance (best F-measure 0.71 and 0.76, respectively). When used
with standard medical research methodology, the global Mahalanobis distance from
control-set average was found to be the best data representation for classification
(F-measure 0.94), while methods such as symmetry, topographically labeled images,
and local landmark-based descriptors all performed poorly (best F-measure 0.59, 0.58,
0.55, respectively). Classification on curvature labeled images (best F-measure 0.73)
and local shape-based descriptors (F-measure 0.78) matched that of the 2.5D depth
image baseline.
• Use of local descriptors for shape quantification of nasal and oral facial features. Each
landmark-based and shape-based descriptor method was compared to the median of
the experts’ scores and shape-based descriptors were found to outperform landmark-
based descriptors. Nasal features such as Bulbous Nasal Tip and Tubular Appearance,
produced F-measure scores of over 0.80, while Prominent Nasal Root was harder to
detect at an F-measure score of 0.65. Open Mouth was the only facial feature examined
that matched the expert scores at an F-measure of more than 0.90 using both the
landmark-based and shape-based descriptors, while Small Mouth and Downturned
Corners of the Mouth shape-based descriptors had F-measure scores of 0.85 and 0.77,
respectively. The mismatches to the expert scores in both the nasal and oral features
are not necessarily incorrect predictions, as selective screening of mismatches has
suggested mislabeling of the facial feature by experts. Examples of such mislabeling
include marking the presence of a bulbous nasal tip, when the small size of the nasal
alae is the actual feature or marking the presence of a prominent nasal root, when the
nose is tubular in appearance.
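To make these contributions concrete, a few minimal sketches follow. First, the
symmetry-based rotation search referenced in the pose-alignment item, assuming the head is
given as an (N, 3) NumPy vertex array centered at the origin with x as the left-right
axis; the exhaustive one-degree search and all function names are illustrative, not the
implementation evaluated in this work.

    import numpy as np
    from scipy.spatial import cKDTree

    def yaw_matrix(deg):
        # rotation about the vertical (y) axis
        t = np.radians(deg)
        return np.array([[np.cos(t), 0.0, np.sin(t)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(t), 0.0, np.cos(t)]])

    def asymmetry(vertices, deg):
        # mean nearest-neighbor distance between the rotated head and its mirror image
        v = vertices @ yaw_matrix(deg).T
        mirrored = v * np.array([-1.0, 1.0, 1.0])  # reflect across the x = 0 plane
        dist, _ = cKDTree(v).query(mirrored)
        return dist.mean()

    def align_yaw(vertices, angles=range(-45, 46)):
        # keep the yaw angle whose reflection best matches the original surface
        best = min(angles, key=lambda a: asymmetry(vertices, a))
        return vertices @ yaw_matrix(best).T, best

Second, a rough sketch of rasterizing an aligned mesh into a 2.5D depth image; the grid
size and the normalization are assumptions:

    def depth_image(vertices, size=128):
        # keep the largest z (the surface point closest to the viewer) in each pixel
        x, y, z = vertices.T
        cols = ((x - x.min()) / (np.ptp(x) + 1e-9) * (size - 1)).astype(int)
        rows = ((y.max() - y) / (np.ptp(y) + 1e-9) * (size - 1)).astype(int)
        img = np.full((size, size), z.min())
        np.maximum.at(img, (rows, cols), z)
        return img

Third, the standard Mahalanobis distance of a feature vector from the control-set average,
written out for a (num_controls, num_features) control matrix:

    def mahalanobis_from_controls(x, controls):
        # distance of x from the control mean, scaled by the control covariance
        mu = controls.mean(axis=0)
        VI = np.linalg.pinv(np.cov(controls, rowvar=False))
        d = x - mu
        return float(np.sqrt(d @ VI @ d))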
Representative global and local descriptor classification of 22q11.2DS per individual can
be seen in Table 9.2 for males and Table 9.3 for females, with the legend given in Ta-
ble 9.1. Prediction errors are marked as dark boxes, while correct predictions are white.
Note that classification using most global descriptors tends to complement that of the local
descriptors. The proportionally smaller male set does contain more classification errors,
supporting the need to recruit more study participants. Affected individuals are more likely
to be misclassified as controls, supporting the fact that 22q11.2DS has a very
subtle phenotype. Lastly, the errors in local descriptor classification support the fact that
phenotypic variation of any facial feature within the general population increases the dif-
ficulty of discriminating between 22q11.2DS affected individuals and the control population.
Although the focus of this work was 22q11.2 deletion syndrome affected individuals, the
methods developed for this phenotype should be widely applicable to the shape-based quan-
tification of any other craniofacial dysmorphology.
Table 9.1: Legend for classification errors for each individual in Table 9.2 and Table 9.3.

    Name of descriptor   Type     Description
    3D snp               Global   Based on 3D snapshot data representation
    3D snpc              Global   Based on 3D snapshot data representation cut off at ears
    2.5D                 Global   Based on 2.5D depth image data representation generated from 3D snpc
    v3                   Global   Three vertical curved line representation
    v5                   Global   Five vertical curved line representation
    h5                   Global   Five horizontal curved line representation
    h7                   Global   Seven horizontal curved line representation
    g5                   Global   5x5 curved line grid representation
    g7                   Global   7x7 curved line grid representation
    sym                  Global   Symmetry of face data representation
    topo 15              Global   Topographic data label with 15 step size data representation
    K                    Global   Gaussian curvature label thresholded between values -0.5 and +0.5
    |K|                  Global   Absolute value of K label, also thresholded at value 0.5
    Besl-Jain            Global   Besl-Jain curvature label
    Mah                  Global   Mahalanobis distance from control average
    L                    Local    All local landmark-based descriptors
    DB                   Local    Set of bulbous nasal tip descriptors
    DT                   Local    Set of tubular appearance descriptors
    DR                   Local    Set of prominent nasal root descriptors
    DN                   Local    Set of all nasal descriptors {DB, DT, DR}
    DO                   Local    Set of all oral descriptors
    D                    Local    Set of all shape-based descriptors {DB, DT, DR, DO}
    LD                   Local    Set of all local descriptors {LN, LO, DB, DT, DR, DO}
Table 9.2: Errors in male individuals of the W86 dataset. Representative global and local
descriptors are shown. Dark boxes signify errors.

[Per-individual error grid: one row for each affected male (upper half) and each control
male (lower half), and one column per descriptor: 3D snp, 3D snpc, 2.5D, v3, v5, h5, h7,
g5, g7, sym, topo 15, K, |K|, Besl-Jain, Mah, L, DB, DT, DR, DN, DO, D, LD. The shaded
error cells are graphical and are not reproducible in this text version.]
Table 9.3: Errors in female individuals of the W86 dataset. Representative global and local
descriptors are shown. Dark boxes signify errors.

[Per-individual error grid: one row for each affected female (upper half) and each control
female (lower half), and one column per descriptor: 3D snp, 3D snpc, 2.5D, v3, v5, h5, h7,
g5, g7, sym, topo 15, K, |K|, Besl-Jain, Mah, L, DB, DT, DR, DN, DO, D, LD. The shaded
error cells are graphical and are not reproducible in this text version.]
9.2 Future Work
As the classification of 22q11.2DS disease status now has a solution that rivals medical
experts, further work in this area should focus on local facial feature description and
the development of a full quantitative description of the face.
Local Facial Feature Description
Additional local features should be investigated, and new landmark- and shape-based
descriptors should be developed. The most promising facial features for study are the
ears, the eyes, and midface hypoplasia; in addition, improvements can be made to the
descriptions of pinched nasal alae and the retrusive chin.
Ears Symptomatic ears are reported to exhibit any of the following features: small size,
protuberance, cup shape, attached lobules, overfolded helix (cauliflower-like appearance),
and mildly asymmetric placement on the head. Small and protuberant ears could be detected
using descriptors developed for the current 2.5D depth image. The cup-shaped,
attached-lobule, and overfolded-helix features most likely require a more stringent
analysis of the original 3D mesh shape. Lastly, asymmetric placement on the head can be
approached using a local version of the global symmetry measure developed in this
dissertation.
Eyes Symptomatic eyes are reported to exhibit any of the following features: small size,
mild orbital hypertelorism (increased distance between the eyes), mild vertical orbital
dystopia (vertical placement and inclination angle of the left vs. right eye), and hooded
upper eyelids. For each of these features, new shape descriptors can be developed on the
current 2.5D depth image, as illustrated below.
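As an illustration of what such descriptors might look like, here is a sketch that derives
the orbital measures from the eye landmarks of Appendix A; the dictionary keys and the use
of the y coordinate for vertical dystopia are assumptions, not part of this work:

    import numpy as np

    def orbital_measures(lm):
        # lm maps landmark labels to 3D points: "en_r"/"en_l" for the right/left
        # endocanthion, "ex_r"/"ex_l" for the right/left exocanthion
        intercanthal = np.linalg.norm(lm["en_r"] - lm["en_l"])  # hypertelorism cue
        outercanthal = np.linalg.norm(lm["ex_r"] - lm["ex_l"])
        dystopia = abs(float(lm["en_r"][1] - lm["en_l"][1]))    # vertical orbital offset
        return intercanthal, outercanthal, dystopia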
Midface Hypoplasia Although midface hypoplasia is a feature of 22q11.2DS, the cleft lip
and palate research community is also very interested in detecting the quality of midface
morphology. Here, a subset of vertical curved lines through the cheek area can be assessed
for global curvature quality.
Pinched Nasal Alae Although mentioned in this dissertation, the methods developed so far
were not able to properly assess the quality of the two nasal alae. An analysis of the
original 3D mesh may yield improved assessment results, but the coarseness of the mesh may
prove inadequate.
Retrusive Chin For the retrusive chin feature, the general shape of the skull is a
necessary prerequisite, so as a first step a skull shape estimation method would need to
be developed to fill in the areas removed due to noise caused by hair. Once this is done,
a study must be conducted to determine whether the retrusive chin is a product of poor
mandible development, or whether it is the result of a forward rotation of the skull,
yielding a prominent bulging of the forehead. The latter option can be studied
independently by developing a forehead shape descriptor to discriminate between concave,
flat, and convex (bulging) foreheads, as sketched below.
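A minimal sketch of such a profile descriptor (equally applicable to the cheek profiles
proposed for midface hypoplasia above), assuming a midline forehead profile sampled as
depth z against height y with larger z toward the viewer; the flatness tolerance is an
arbitrary placeholder:

    import numpy as np

    def profile_shape(y, z, flat_tol=1e-3):
        # fit z(y) with a parabola; the quadratic coefficient acts as a curvature proxy
        a = np.polyfit(y, z, 2)[0]
        if abs(a) < flat_tol:
            return "flat"
        # with z increasing toward the viewer, a bulging profile peaks mid-forehead (a < 0)
        return "convex (bulging)" if a < 0 else "concave"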
Quantitative Facial Description
As phenotype-genotype studies of different craniofacial dysmorphology syndromes are of great
interest to researchers, the development of a full quantitative facial description is necessary.
Since expert qualitative ratings of shapes can be subject to low inter-rater reliability,
automatic local facial shape descriptors can be used to avoid such problems. Lastly, since
each descriptor offers a quantitative value for the feature it is describing, combinations of
these values can be used to study gene expression variation and the etiology of craniofacial
malformation.
BIBLIOGRAPHY
[1] D Aha, D Kibler, and M Albert. Instance-based learning algorithms. Mach Learn,
1991.
[2] E Akagunduz and I Ulusoy. 3d object representation using transform and scale invariant
3d features. ICCV, pages 1–8, 2007.
[3] Kristina Aldridge, Simeon A Boyadjiev, George T Capone, Valerie B DeLeon, and
Joan T Richtsmeier. Precision and error of three-dimensional phenotypic measures
acquired from 3dmd photogrammetric images. Am J Med Genet, 138A:247–53, 2005.
[4] Judith E Allanson. Objective techniques for craniofacial assessment: what are the
choices? Am J Med Genet, 70:1–5, 1997.
[5] American Cleft Palate-Craniofacial Association.
[6] LL Baxter, TH Moran, Joan T Richtsmeier, J Troncoso, and RH Reeves. Discovery and
genetic localization of down syndrome cerebellar phenotypes using the ts65dn mouse.
Hum Mol Genet, 9:195–202, 2000.
[7] D Becker, T Pilgram, L Marty-Grames, D Govier, Jeffrey L Marsh, and Alex A Kane.
Accuracy in identification of patients with 22q11.2 deletion by likely care providers
using facial photographs. Plast Reconstr Surg, 2004.
[8] P Belhumeur, J Hespanha, and David J Kriegman. Eigenfaces vs. fisherfaces: recogni-
tion using class specific linear projection. IEEE T Pattern Anal, 1997.
[9] Volker Blanz. A learning-based high-level human computer interface for face modeling
and animation. Lecture Notes in Computer Science, 4451:296, 2007.
[10] Stefan Boehringer, Tobias Vollmar, Christiane Tasse, Rolf P Wurtz, Gabriele Gillessen-
Kaesbach, Bernhard Horsthemke, and Dagmar Wieczorek. Syndrome identification
based on 2d analysis software. Eur J Hum Genet, 14:1082–1089, 2006.
[11] FL Bookstein. Shape and the information in medical images: A decade of the morpho-
metric synthesis. Computer Vision and Image Understanding, 66:97–118, 1997.
[12] Bita Boozari, Matthias J Bahr, Stefan Kubicka, Juergen Klempnauer, Michael P
Manns, and Michael Gebel. Ultrasonography in patients with budd-chiari syndrome -
diagnostic signs and prognostic implications. J Hepatol, 49:572–80, 2008.
[13] L Botto, K May, P Fernhoff, A Correa, and K Coleman. A population-based study of
the 22q11.2 deletion: Phenotype, incidence, and contribution to major birth defects
in the population. Pediatrics, 2003.
[14] KW Bowyer, Kyong I Chang, Patrick J Flynn, and X Chen. Face recognition using
2-d, 3-d, and infrared: Is multimodal better than multisample? Proceedings of the
IEEE, 94:2000–2012, 2006.
[15] C D Brack and I L Kessel. Evaluating the clinical utility of stereoscopic clinical pho-
tography. Studies in health technology and informatics, 132:42–4, 2008.
[16] Linda E Campbell, Eileen Daly, Fiona Toal, Angela F Stevens, Rayna Azuma, Marco
Catani, Virginia Ng, Therese van Amelsvoort, Xavier Chitnis, William Cutter, Declan
G M Murphy, and Kieran C Murphy. Brain and behaviour in children with 22q11.2
deletion syndrome: a volumetric and voxel-based morphometry mri study. Brain,
129:1218–1228, 2006.
[17] Kyong I Chang, Kevin W Bowyer, and Patrick J Flynn. Multiple nose region matching
for 3d face recognition under varying facial expression. IEEE T Pattern Anal, pages
1695–1700, 2006.
[18] G Chen and T Bui. Invariant fourier-wavelet descriptor for pattern recognition. Pattern
Recogn, 1999.
[19] T Chen and D Metaxas. Gibbs prior models, marching cubes, and deformable models:
A hybrid framework for 3d medical image segmentation. MICCAI, pages 703–710,
2003.
[20] Ying-Fan Chen, Po-Lin Kou, Shaw-Jenq Tsai, Ko-Fan Chen, Hsiang-Han Chan, Chung-
Ming Chen, and H Sunny Sun. Computational analysis and refinement of sequence
structure on chromosome 22q11.2 region: application to the development of quantita-
tive real-time pcr assay for clinical diagnosis. Genomics, 87:290–7, 2006.
[21] A Chousta, D Ville, I James, P Foray, C Bisch, P Depardon, R-C Rudigoz, and
L Guibaud. Pericallosal lipoma associated with pai syndrome: prenatal imaging find-
ings. Ultrasound Obst Gyn, 32:708–10, 2008.
[22] W Cohen. Fast effective rule induction. In Proceedings of the Twelfth International
Conference on Machine Learning, 1995.
[23] D Colbry and G Stockman. Canonical face depth map: A robust 3d representation for
face verification. CVPR, pages 1–7, 2007.
[24] Ashwin B Dalal and Shubha R Phadke. Morphometric analysis of face in dysmorphol-
ogy. Comput Methods Programs Biomed, 85:165–172, 2007.
[25] A L David, C Turnbull, R Scott, J Freeman, C M Bilardo, M van Maarle, and L S
Chitty. Diagnosis of apert syndrome in the second-trimester using 2d and 3d ultrasound.
Prenatal diag, 27:629–632, 2007.
[26] P Domingos and M Pazzani. On the optimality of the simple bayesian classifier under
zero-one loss. Mach Learn, 1997.
[27] M Feingold and W H Bossert. Normal values for selected physical parameters: an aid
to syndrome delineation. Birth Defects Orig Artic Ser, 10:1–16, 1974.
[28] L Fernandez, P Lapunzina, D Arjona, I Lopez Pajares, L García-Guereta, D Elorza,
M Burgueros, M L De Torres, M A Mori, M Palomares, A García-Alix, and A Delicado.
Comparative study of three diagnostic approaches (fish, strs and mlpa) in 30 patients
with 22q11.2 deletion syndrome. Clin Genet, 68:373–8, 2005.
[29] WL Fung, Eva WC Chow, GD Webb, MA Gatzoulis, and AS Bassett. Extracardiac
features predicting 22q11.2 deletion syndrome in adult congenital heart disease. Int J
Cardiol, 2008.
[30] K Golding-Kushner and Robert Shprintzen. Velo-cardio-facial syndrome volume 1.
Plural Pub Inc, 2007.
[31] D Gothelf, F Hoeft, C Hinard, JF Hallmayer, JV Stoecker, SE Antonarakis, MA Morris,
and AL Reiss. Abnormal cortical activation during response inhibition in 22q11.2
deletion syndrome. Hum Brain Mapp, 28:533–42, 2007.
[32] L Guyot, M Dubuc, J Pujol, O Dutour, and N Philip. Craniofacial anthropometric
analysis in patients with 22 q 11 microdeletion. Am J Med Genet, 100:1–8, 2001.
[33] J Hall, U Froster-Iskenius, and J Allanson. Handbook of normal physical measurements.
Oxford University Press New York, 1989.
[34] M Hall. Correlation-based feature selection for machine learning. PhD thesis,
University of Waikato, 1999.
[35] Peter Hammond. The use of 3d face shape modelling in dysmorphology. Arch Dis
Child, 92:1120–6, 2007.
[36] Peter Hammond, T Hutton, J Allanson, and L Campbell. 3d analysis of facial mor-
phology. Am J Med Genet, 2004.
[37] Peter Hammond, Tim J Hutton, Judith E Allanson, Bernard F Buxton, Linda E
Campbell, Jill Clayton-Smith, Dian Donnai, Annette Karmiloff-Smith, Kay Metcalfe,
Kieran C Murphy, Michael A Patton, Barbara Pober, Katrina Prescott, Pete Scam-
bler, Adam Shaw, Ann C M Smith, Angela F Stevens, I Karen Temple, Raoul C M
Hennekam, and May Tassabehji. Discriminating power of localized three-dimensional
facial morphology. Am J Hum Genet, 77:999–1010, 2005.
[38] Carrie L Heike. Research plan - chromosome 22q11.2 deletion syndrome. 2005.
[39] Carrie L Heike, Michael L Cunningham, AV Hing, E Stuhaug, and JR Starr. Picture
perfect? reliability of craniofacial anthropometry using 3d digital stereophotogramme-
try in individuals with and without 22q11.2 deletion syndrome. J Plast Reconstr Surg,
2009.
[40] Tim J Hutton. Dense surface models of the human face. Biomedical Informatics Unit,
Eastman Dental Institute, University College London, 2004.
[41] ISTI - CNR. Meshlab. Visual Computing Lab.
[42] G Jalali, J Vorstman, A Errami, and R Vijzelaar. Detailed analysis of 22q11.2 with a
high density mlpa probe set. Hum Mutat, 2007.
[43] G John and P Langley. Estimating continuous distributions in bayesian classifiers.
Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
[44] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 2002.
[45] Ioannis A Kakadiaris, Georgios Passalis, George Toderici, Mohammed N Murtuza,
Yunliang Lu, Nikos Karampatziakis, and Theoharis Theoharis. Three-dimensional face
recognition in the presence of facial expressions: an annotated deformable model ap-
proach. IEEE T Pattern Anal, 29:640–649, 2007.
[46] S Keerthi, S Shevade, and C Bhattacharyya. Improvements to platt’s smo algorithm
for svm classifier design. Neural Comput, 2001.
[47] MM Kennelly and P Moran. A clinical algorithm of prenatal diagnosis of radial ray
defects with two and three dimensional ultrasound. Prenatal diag, 27:730–737, 2007.
[48] H Kitaura, K Yonetsu, H Kitamori, K Kobayashi, and T Nakamura. Standardization
of 3-d ct measurements for length and angles by matrix transformation in the 3-d
coordinate system. Cleft Palate-Cran J, 37:349–356, 2000.
[49] L Kobrynski and K Sullivan. Velocardiofacial syndrome, digeorge syndrome: the chro-
mosome 22q11.2 deletion syndromes. Lancet, 2007.
[50] E Learned-Miller, Q Lu, A Paisley, and P Trainer. Detecting acromegaly: Screening
for disease with a morphable model. MICCAI, 2006.
[51] Y Lee, I Kim, J Shim, and D Marshall. 3d facial image recognition using a nose vol-
ume and curvature based eigenface. Lecture Notes in Computer Science, 4077:616, 2006.
[52] M Leordeanu, M Hebert, and R Sukthankar. Beyond local appearance: Category
recognition from pairwise interactions of simple features. CVPR, 2007.
[53] M Levoy, K Pulli, B Curless, and S Rusinkiewicz. The digital michelangelo project: 3d
scanning of large statues. SIGGRAPH, 2000.
[54] Ze-Nian Li and Mark S. Drew. Fundamentals of Multimedia. Pearson Prentice Hall,
2003.
[55] HJ Lin, S Ruiz-Correa, Linda G Shapiro, ML Speltz, Michael L Cunningham, and
Raymond Sze. Predicting neuropsychological development from skull imaging. EMBC,
pages 3450–3455, 2006.
[56] Xiaoming Liu, Peter H Tu, and F Wheeler. Face model fitting on low resolution images.
BMVC, 2006.
[57] H Loos, Dagmar Wieczorek, Rolf P Wurtz, and C von der Malsburg. Computer-based
recognition of dysmorphic faces. Eur J Hum Genet, 2003.
[58] Thomas R Nelson, Eun K Ji, Jong H Lee, Michael J Bailey, and Dolores H Pretorius.
Stereoscopic evaluation of fetal bony structures. J Ultras Med, 27:15–24, 2008.
[59] B Ommer and JM Buhmann. Learning the compositional nature of visual objects.
CVPR, pages 1–8, 2007.
[60] J Platt. Fast training of support vector machines using sequential minimal optimization.
In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[61] J Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, 1993.
[62] Joan T Richtsmeier, Valerie B DeLeon, and SR Lele. The promise of geometric mor-
phometrics. Yearb Phys Anthropol, 45:63–91, 2002.
[63] S Romdhani and T Vetter. Estimating 3d shape and texture using pixel intensity,
edges, specular highlights, texture constraints and a prior. CVPR, 2005.
[64] S Romdhani and Thomas Vetter. 3d probabilistic feature point model for object de-
tection and recognition. CVPR, pages 1–8, 2007.
[65] S Ruiz-Correa, Linda G Shapiro, M Meila, G Berson, Michael L Cunningham, and
Raymond Sze. Symbolic signatures for deformable shapes. IEEE T Pattern Anal,
pages 75–90, 2006.
[66] Chafik Samir, Anuj Srivastava, and Mohamed Daoudi. Three-dimensional face recog-
nition using shapes of facial curves. IEEE T Pattern Anal, 28:1858–1863, 2006.
[67] William J Schroeder, Kenneth M Martin, and William E Lorensen. The design and
implementation of an object-oriented toolkit for 3d graphics and visualization. IEEE
Visualization, 96:93–100, 1996.
[68] J Shepanski. Fast learning in artificial neural systems: multilayer perceptron
training using optimal estimation. Neural Networks, 1988.
[69] R J Shprintzen, R B Goldberg, M L Lewin, E J Sidoti, M D Berkman, R V Argamaso,
and D Young. A new syndrome involving cleft palate, cardiac anomalies, typical facies,
and learning disabilities: velo-cardio-facial syndrome. The Cleft palate journal, 15:56–
62, 1978.
[70] Robert J Shprintzen. Velo-cardio-facial syndrome: 30 years of study. Developmental
disabilities research reviews, 14:3–10, 2008.
[71] A Slavotinek, M Parisi, Carrie L Heike, Anne V Hing, and E Huang. New syndrome of
craniofacial defects of blastogenesis: Duplication of pituitary with cleft palate and
oropharyngeal tumors. Am J Med Genet, 135:13–20, 2005.
[72] L Smith. A tutorial on principal components analysis. Cornell University, 2002.
[73] H Stender, M Fiandaca, J Hyldig-Nielsen, and J Coull. Pna for rapid microbiology. J
Microbiol Meth, 2002.
[74] Matthew Turk and Alex Pentland. Eigenfaces for recognition. J Cognitive Neurosci,
3, 1991.
[75] Matthew Turk and Alex Pentland. Face recognition using eigenfaces. CVPR, pages
586–591, 1991.
[76] M Vannier, J Marsh, and J Warren. Three dimensional computer graphics for cranio-
facial surgical planning and evaluation. ACM SIGGRAPH Computer Graphics, 1983.
[77] Velo-Cardio-Facial Syndrome Educational Foundation, Inc. Velo-cardio-facial syn-
drome: Specialist fact sheet. 2007.
[78] Peng Wang, C Kohler, F Barrett, R Gur, and R Verma. Quantifying facial expression
abnormality in schizophrenia by combining 2d and 3d features. CVPR, pages 1–8,
2007.
[79] Sen Wang, Yang Wang, Miao Jin, Xianfeng David Gu, and Dimitris Samaras. Confor-
mal geometry and its applications on 3d shape matching, recognition, and stitching.
IEEE T Pattern Anal, 29:1209–1220, 2007.
[80] T Whitmarsh, RC Veltkamp, M Spagnuolo, S Marini, and FB Haar. Landmark detec-
tion on 3d face scans by facial model registration. Proceedings of the 1st International
Workshop on Shape and Semantics, pages 71–76, 2006.
[81] Ian H. Witten and Eibe Frank. Data mining: Practical machine learning tools and
techniques. 2005.
[82] T Yakut, S Kilic, E Cil, E Yapici, and U Egeli. Fish investigation of 22q11.2 deletion
in patients with immunodeficiency and/or cardiac . . . . Pediatric Surgery International,
2006.
Appendix A
CEPHALOMETRIC LANDMARKS AND MEASURES
    Landmark Name     Label      Description
    glabella          g          most prominent point in the median sagittal plane between the supraorbital ridges
    nasion            n          midpoint of the nasofrontal suture
    sellion           se (or s)  deepest point of nasofrontal angle
    pronasale         prn        most protruded point of nasal tip
    subnasale         sn         junction of lower border of nasal septum and cutaneous portion of upper lip
    labiale superius  ls         midpoint of the vermillion border of the upper lip
    stomion           sto        midpoint of labial fissure when lips are closed naturally
    labiale inferius  li         midpoint of the vermillion border of the lower lip
    sublabiale        slab       angle of the dip between the lower lip and chin
    gnathion'         gn'        lowest point in the midline on the lower border of the chin; since this is a bony landmark, the soft tissue location is labeled with '
    exocanthion       ex         outer corner of eye fissure where the eyelids meet (right and left)
    endocanthion      en         inner corner of eye fissure where the eyelids meet (right and left)
    alar curvature    ac         measured at the widest point of the alar curvature (right and left)
    alare             al         most lateral point of nasal ala (right and left)
    subalare          sbal       point on the lower margin of the base of the nasal ala where the ala disappears into the upper lip skin (right and left)
    subnasale'        sn'        located at the thinnest point of the nasal septum (right and left)
    crista philtri    cph        point on the crest of the philtrum just above the vermillion border (right and left)
    cheilion          ch         outer corner of mouth where the outer edges of the upper and lower vermillions meet (right and left)
    tragion           t          located at notch above tragus of the ear where the upper edge of cartilage disappears into skin of face (right and left) (labeled as tragus)
    preaurale         pra        point on the ear insertion line opposite postaurale (right and left)
    postaurale        pa         most posterior point on the free margin of the ear (helix) (right and left)
    superaurale       sa         highest point of the free margin of the ear (right and left)
    subaurale         sba        lowest point of the earlobe (right and left)
Appendix B
CLASSIFIER DESCRIPTIONS
All classification was done using the WEKA classifier suite [81]. The classifiers used and
the non-default options selected are briefly described below. The WEKA classifier name is
given in parentheses when it differs from the name used in this paper.
JRip implements a propositional rule learner, Repeated Incremental Pruning to Produce
Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized
version of IREP [22].
J48 generates a pruned or unpruned C4.5 decision tree [61].
NN k = 1 (IB1) is a nearest-neighbor classifier that uses normalized Euclidean distance
[1].
NN k = 3 (IBk ’-K 3) is a K-nearest-neighbors classifier, with K set to 3 [1].
NN 9,3 (MultilayerPerceptron 9, 3) is a neural network that uses backpropagation for
training, with two hidden layers of sizes 9 and 3 [68].
SMO (SVM) implements John Platt's sequential minimal optimization algorithm for
training a support vector classifier [60, 46]. The SVM classifier was used at the default
setting (complexity parameter = 1, polynomial kernel exponent = 1), as well as with
variations of both the complexity parameter and the polynomial kernel exponent from 2 to 4.
An RBF kernel was also used.
Naive Bayes assumes independence of the attributes, modeling each attribute as a normal
distribution over the range of the attribute values [43].
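For readers outside the WEKA/Java ecosystem, the following are rough scikit-learn
analogues of the configurations above; these are approximations (for example, scikit-learn
ships no direct RIPPER or C4.5 implementation), not the setup used to produce the reported
results:

    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {
        "NN k=1": KNeighborsClassifier(n_neighbors=1),
        "NN k=3": KNeighborsClassifier(n_neighbors=3),
        "NN 9,3": MLPClassifier(hidden_layer_sizes=(9, 3)),
        # default WEKA-style SMO: complexity parameter 1, polynomial kernel exponent 1
        "SMO": SVC(C=1.0, kernel="poly", degree=1),
        "SMO (RBF)": SVC(C=1.0, kernel="rbf"),
        "J48 (approx.)": DecisionTreeClassifier(),  # CART, not an exact C4.5 match
        "Naive Bayes": GaussianNB(),
    }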