Shape-based Quantification and Classification of 3D Face Data for Craniofacial Research
Katarzyna Wilamowska
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
2009
Program Authorized to Offer Degree: Computer Science and Engineering
University of Washington Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Katarzyna Wilamowska
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Linda Shapiro
Reading Committee:
Linda Shapiro
Maya R Gupta
James F Brinkley III
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.”
Signature
Date
University of Washington
Abstract
Shape-based Quantification and Classification of 3D Face Data for Craniofacial Research
Katarzyna Wilamowska
Chair of the Supervisory Committee:
Professor Linda Shapiro
Computer Science and Engineering
22q11.2DS has been shown to be one of the most common multiple anomaly syndromes in hu-
mans. Early detection is important as many affected individuals are born with a conotruncal
cardiac anomaly, mild-to-moderate immune deficiency and learning disabilities, all of which
can benefit from early intervention.
Given a set of labeled 3D training meshes acquired from stereo imaging of heads, the
goal of this dissertation is to develop a successful methodology for discriminating between
22q11.2DS affected individuals and the general population and for quantifying the degree of
dysmorphology of facial features. Although many approaches for such discrimination exist
in the medical and computer vision literature, the goal is to develop methods that focus on
3D shape of both the face as a whole and specific local features.
The main contributions of this work are: an automated methodology for pose alignment, au-
tomatic generation of global and local data representations, robust automatic placement of
landmarks, generation of local descriptors for nasal and oral facial features, and a 22q11.2DS
classification rate which rivals that of medical experts. The methods developed for the 22q11.2DS
phenotype should be widely applicable to the shape-based quantification of any other cran-
iofacial dysmorphology.
TABLE OF CONTENTS

List of Figures
List of Tables
Chapter 1: Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Paper Outline
Chapter 2: Related Literature
    2.1 Medical Craniofacial Assessment
    2.2 Computer Vision Craniofacial Analysis
Chapter 3: Ground Truth and Measures of Success
    3.1 Participant Specific Data
    3.2 Expert Survey
    3.3 Hand-labeled Landmarks
    3.4 Statistical Measures
Chapter 4: Data Preprocessing
    4.1 Data Source
    4.2 Data Cleaning
    4.3 Alignment Using Scanalyze
    4.4 Automatic 3D Pose Alignment
Chapter 5: Global Data Representations
    5.1 Snapshots
    5.2 2.5D Depth Images
    5.3 Curved Lines
    5.4 Symmetry
    5.5 Labeled Images
    5.6 Distance from Average
Chapter 6: Global Representation Results
    6.1 Preliminary Studies
    6.2 Experiments
Chapter 7: Local Data Representations
    7.1 Automatic Nasal Landmark Detection
    7.2 Automatic Oral Landmark Detection
    7.3 Landmark Distances
    7.4 Landmark-based Descriptors
    7.5 Shape-based Descriptors
Chapter 8: Local Representation Results
    8.1 Preliminary Studies
    8.2 Experiments
Chapter 9: Conclusions
    9.1 Contributions
    9.2 Future Work
Bibliography
Appendix A: Cephalometric Landmarks and Measures
Appendix B: Classifier Descriptions
LIST OF FIGURES

1.1 Individuals with 22q11.2DS. Images reproduced from [10, 29].
2.1 2D landmark pattern used by Boehringer [10].
2.2 Dense Surface Model construction [35].
3.1 FISH test for 22q11.2DS; arrow points to the deleted genetic material [82].
3.2 Survey administered to experts.
3.3 Cephalometric landmarks (in blue) located on image of individual.
4.1 3dMD imaging system setup at Seattle Children’s Hospital.
4.2 Example image in need of cleanup.
4.3 Example where ICP alignment performs worse than hand alignment. Observe that both lips and nose are misaligned in the automatic ICP version.
4.4 Results of PCA used to align 3D meshes by their first principal component vector. Note that each head is misaligned in a different direction.
4.5 Tait-Bryan angles which describe the three degrees of freedom of a human head.
4.6 Using symmetry to align face in forward direction. (a) 3D image, (b) interpolated 2.5D image, (c) left side of face, (d) right side of face, (e) resulting difference between left and right side.
4.7 Example results of yaw and roll alignment.
4.8 Illustration of concept behind pitch alignment and example alignment result.
5.1 Snapshots of 3D meshes.
5.2 2.5D depth images (enhanced for the reader).
5.3 Curved line detail.
5.4 Comparison of head data with and without facial texture.
5.5 Topographic maps of the face with different contour line spacing.
5.6 Curvature based image labeling.
6.1 Distance per individual to average of the control individuals. Black line separates affected from control, with affected individuals on the left.
6.2 Aggregate percent of correctly classified individuals as test set increases from 2% to 50% of data set (on x-axis) shown from 0-100% accuracy (y-axis).
6.3 Distance of control individual from control average, when that individual (circled in red) is used as the test sample. The y-axis represents the distance to the average, while the x-axis lists all individuals in the W86 data set, with the first 43 individuals affected, and the rest control. The blue line represents the original distance from average used in experiment 6.2.6, while the black dots represent the newly calculated distance from average when leaving out the test individual.
6.4 Variance of full data, control set and affected set. All three data sets have extremely large variances, on the order of 10^7.
7.1 Landmarks of interest.
7.2 Detecting the location of the nasal alae.
7.3 Detecting landmarks of the mouth.
7.4 The nose area compared to the bounding box and different descriptor shapes.
7.5 Nose area in relation to bounding box area for two individuals of the same age and gender with and without BNT.
7.6 Left and right contour lines of the nose.
8.1 ROC performance curve.
8.2 Empirical approach to threshold detection for each descriptor.
LIST OF TABLES

3.1 Distribution of participant data according to age, gender and 22q11.2DS affected status for full dataset of 189 individuals.
3.2 Three expert survey results.
3.3 Number of missing points for each of the hand-labeled landmarks. For landmarks present on both the left and right side of the face, the order is given as left right. A detailed description of each landmark is given in Appendix A.
5.1 Line positions. Position (125,150) is the location of the nose tip.
5.2 Besl-Jain curvature value assignment.
6.1 Attribute selection of PCA vectors for data separation for gender, age and affected. Each attribute name contains its eigenvalue rank in order of importance, i.e. d5 is the 5th eigenvector.
6.2 F-measure scores for different classifiers with standard deviations provided. Data used are all PCA compressed versions of 3D snapshots and 2.5D images, on all 189 individuals and the initial four subsets tested: A106, AS106, W86, and WR86.
6.3 Comparison of predictive capability of curvature value ranges.
6.4 Choosing an appropriate data set. 3D snapshot with ear cutoff threshold data format used. Classified using Naive Bayes. Standard deviations shown.
6.5 Checking for data loss between data representations. All data shown here is from the W86 dataset classified using Naive Bayes. Standard deviations shown.
6.6 Curved lines with Naive Bayes and W86.
6.7 Symmetry measures with Naive Bayes and W86. EC refers to symmetry analysis done on 2.5D images with an ear cutoff. FC refers to images with the forehead removed due to noise from the hair removal process.
6.8 Curvature labeled images compared to 2.5D results using Naive Bayes and W86.
6.9 Topography labeled images compared to 2.5D results using Naive Bayes and W86.
6.10 Classification using distance from average of control using Naive Bayes on W86.
7.1 Landmark distances obtained using automatically detected landmarks.
7.2 List of nasal landmark-based descriptors.
7.3 List of oral landmark-based descriptors.
7.4 List of bulbous nasal tip shape-based descriptors.
7.5 List of tubular shape-based descriptors.
7.6 List of nasal root shape-based descriptors.
7.7 List of oral shape-based descriptors.
8.1 Using experts’ median scores for facial features to predict 22q11.2DS. In each table, the upper set of results was obtained using Naive Bayes, the lower using SVM.
8.2 Correct automatic placement compared to availability of hand-labeled landmarks.
8.3 Prediction of 22q11.2DS using landmark distance measures.
8.4 Predicting expert marked nasal features using LN data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.5 Predicting the four oral features using LO data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.6 Predicting 22q11.2DS using landmark-based descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.7 Using shape-based descriptors for predicting nasal and oral facial features. For each descriptor type, the right arrow indicates the facial feature experts’ median score to which it is compared. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.8 Performance of shape-based descriptors in predicting 22q11.2DS. The upper set of results was obtained using Naive Bayes, the lower using SVM.
8.9 Predicting 22q11.2DS using all nasal, all oral and all descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.
9.1 Legend for classification errors for each individual in Table 9.2 and Table 9.3.
9.2 Errors in male individuals of W86 dataset. Representative global and local descriptors are shown. Dark boxes signify errors.
9.3 Errors in female individuals of W86 dataset. Representative global and local descriptors are shown. Dark boxes signify errors.
ACKNOWLEDGMENTS
I owe my deepest gratitude to my advisor Dr. Linda Shapiro, who guided me to becoming
the researcher I am today. She provided me with the perfect mix of freedom to explore on
my own and direction when I faltered.
I am indebted to my committee members: Dr. Maya Gupta, Dr. James Brinkley, and
Dr. John Kramlich for their excellent feedback which ensured that this dissertation is ac-
cessible to computer science and medical audiences alike.
I am grateful to Dr. Carrie Heike, Dr. Michael Cunningham, Dr. Anne Hing, Dr. Mark
Hannibal and staff members at Seattle Children’s Hospital Craniofacial Center for providing
me with the 3D data used in this dissertation, as well as their medical and anthropometric
expertise.
I would like to thank Jia Wu for collaborating with me on local data representations and
members of my research group for their countless suggestions which improved my work.
Finally, I would like to express my gratitude to Stefan Schoenmackers for his support and
friendship.
This work was supported by the National Science Foundation Graduate Research Fellowship,
by the National Science Foundation under Grant Number DBI-0543631, by the National
Institute of Dental and Craniofacial Research under Grant Number 5K23DE17741-2, by
the General Clinical Research Center under Grant Number # M01-RR 00037 and by the
American Academy of Pediatrics Section on Genetics and Birth Defects.
DEDICATION
to my family
my grandparents for teaching me knowledge is the one thing that can never be taken away
my parents for always believing in my potential and supporting my goals
my sister for sharing the highs and lows of the PhD
Chapter 1
INTRODUCTION
1.1 Motivation
Velocardiofacial syndrome (VCFS), or more precisely 22q11.2 deletion syndrome, was first
described in 1978 [69]. Since then, 22q11.2DS has been shown to be one of the most com-
mon multiple anomaly syndromes in humans, with a disputed prevalence of anywhere from
1:2000 to 1:6000 live births in the United States [49, 13]. Early detection is important as
many affected individuals are born with a conotruncal cardiac anomaly, mild-to-moderate
immune deficiency and learning disabilities, all of which can benefit from early intervention.
Although VCFS has more than 180 clinical features, including 16 craniofacial, 15 eye, 20
ear and 5 nasal anomalies [77, 30], no single feature occurs in 100% of the cases, and there
are no individuals who have most or all of the clinical features. In addition, the expression of
Figure 1.1: Individuals with 22q11.2DS. Images reproduced from [10, 29].
a specific feature may be quite varied; for example a palatal cleft feature can be an obvious
cleft palate or simply a dysfunction of the palatal muscles [70].
While 22q11.2DS affected individuals often have a characteristic facial appearance, it can be
very subtle to detect (Figure 1.1). Even individuals with expert training (members of the American Cleft Palate-Craniofacial Association [5]) have difficulty in
diagnosing 22q11.2DS from frontal facial photographs (predictions only slightly better than
chance) [7]. The final diagnosis is verified with fluorescence in situ hybridization (FISH)
testing [73], a genetic test which is both time consuming and expensive. For these two
reasons researchers have been highly motivated to develop faster and cheaper genetic tests
[28, 42, 20], as well as to identify features that may improve physician accuracy in the
diagnosis of 22q11.2DS [36].
The shape-based quantification of 3D facial features proposed in this dissertation will lead
to better understanding of the connection between the 22q11.2 deletion syndrome genotype
and the phenotype of this syndrome. Being able to connect facial features to the genetic
code will allow for understanding the etiology of craniofacial malformation and pathogenesis
of 22q11.2DS, which, in turn, will be informative of the genetic control needed for normal
craniofacial development. From a clinical standpoint, offering a standard automated fil-
ter may aid physicians in concentrating on the more difficult cases and provide insights
into the shapes that are considered most telling for a specific dysmorphological syndrome.
Lastly, the identification of those patients who have higher likelihood of a positive test for
the 22q11.2 deletion would lead to more efficient use of medical resources (in this case,
expensive genetic tests).
1.2 Problem Statement
Given a set of labeled 3D training meshes acquired from stereo imaging of heads, the
goal of this research is to develop a successful methodology for discriminating between
22q11.2DS affected individuals and the general population and for quantifying the degree of
dysmorphology of facial features. Although many approaches for such discrimination exist
in the medical and computer vision literature, the goal is to develop methods that focus on
3D shape of both the face as a whole and specific local features.
1.3 Paper Outline
In Chapter 2, the literature related to medical craniofacial assessment and craniofacial
analysis using computer vision will be reviewed. In Chapter 3 the sources of ground truth
as well as the statistical measures used for evaluating success will be stated. In Chapter
4, data preprocessing including an automatic method for pose alignment will be described.
Global data representations used in this dissertation and methods for their generation will
be explained in Chapter 5, followed by a description of experimental results on global
descriptors in Chapter 6. Local data representations will be described in Chapter 7, with
experimental results for local descriptors provided in Chapter 8. Finally, Chapter 9 will
summarize the contributions of this dissertation and suggest possible directions for further
research.
Chapter 2
RELATED LITERATURE
In this chapter the related literature on craniofacial feature assessment in medicine and
computer vision will be described. With respect to studies of 22q11.2DS, a brief descrip-
tion of manual medical assessment methods will be followed by current medical automated
methods. These will be followed by a general description of relevant work both in medicine
and computer vision. Although there will be brief mention of the work with different data
sources and formats, the focus of this literature review will be on 3D surface meshes of the
face.
2.1 Medical Craniofacial Assessment
Traditionally, the approach to identify and study an individual with facial dysmorphism
has been thorough clinical examination combined with craniofacial anthropometric mea-
surements [4, 62]. These measurements are based on landmarks picked visually and by
hand palpation of the underlying skull shape. It is important to note that there are very
few non-Caucasian normative physical data sets, so in general, the collected data is com-
pared to the Caucasian population [27, 33].
Newer methods of craniofacial assessment [4] involve using data from computerized to-
mography [76, 48, 71], magnetic resonance imaging [16, 31, 11, 6, 19], ultrasound studies
[47, 25, 21, 12], and stereoscopic imaging [3, 15, 58]. The information in these data repre-
sentations is often hand measured, or at least hand labeled, so the human effort in the use
of these newer systems is still quite significant.
With respect to 22q11.2 deletion syndrome, craniofacial anthropometric measurements pre-
vail as the standard manual assessment method. Automated methods of 22q11.2DS analysis
are limited to just two. Boehringer et al. [10, 57] used standard 2D photographs of in-
dividuals representing ten different facial dysmorphic syndromes, which were converted to
grayscale and cropped to 256 by 256 pixels in size. A predefined landmark pattern was
placed on each face (Figure 2.1) and a Gabor wavelet transformation was applied at each
node yielding a data set of 40 coefficients per node. The generated data sets were then
transformed using principal component analysis (PCA) and classified using linear discrimi-
nant analysis (LDA), support vector machines (SVM), and k-nearest neighbors (kNN). The
best prediction accuracy was found to be 76% using LDA, dropping to 52% when using a
completely automated system.
Figure 2.1: 2D landmark pattern used by Boehringer [10].
The second, more extensive work, is that of Hutton and Hammond using their Dense Surface
Models (DSM) [35, 40, 36, 37]. Here the input data is that of a 3D surface mesh created by
the 3dMD photometric system. The data collection attempted to capture individuals with
natural pose and neutral expression, although this was waived as some syndromes have a
characteristic facial expression. For each generated 3D mesh, eleven 3D landmarks were
manually located. A mean landmark set was calculated, and then each surface was warped
to bring the corresponding landmarks on each face into precise alignment with the mean
landmarks. A closest point correspondence to the vertices of a base mesh chosen from the
set was then constructed. The mesh connectivity in the base mesh was transferred back to
the densely correspondent meshes of each individual surface, and the original meshes and
landmarks were abandoned. The surfaces were then unwarped back to their original shapes.
These new surfaces were then used to calculate an average shape using Procrustes align-
ment and then subjected to PCA to compute the major modes of shape variation (Figure
2.2). The generated data sets (60 VCFS, 130 control) were classified according to their
Figure 2.2: Dense Surface Model construction [35].
PCA coefficients using many different classifiers (closest mean (CM), decision trees, neural
networks, logistic regression, SVM) with best sensitivity and specificity results at 0.83 and
0.92 using SVM [36], respectively. Newer results (115 VCFS, 185 control) used CM, LDA
and SVM in studying discrimination abilities of local features (face, eyes, nose, mouth) at
a correct classification rate of 89% [37]. Neither Boehringer's nor the Dense Surface Models
method is fully automatic; both benefit from manual landmark placement.
2.2 Computer Vision Craniofacial Analysis
Although the raw facial data format is provided in three dimensions, the data can be an-
alyzed in variations from one dimension up to three. It is of note that there are methods
that use texture information for facial analysis [78, 63], but there will be little focus on
them in this review as the data used in this research is textureless due to human subjects
requirements (IRB).
In reference to the face, 1D data can be defined as the line that describes the profile of
the face, or a signal waveform. The collection of profiles that describe different individuals
can then be analyzed for similarity of waveform using Pearson’s correlation coefficient, or
transformed to a new coordinate system using one of many compression schemes such as
PCA, Fourier transforms, or wavelet transforms [18]. PCA transforms the data so that the
greatest variance is in the first coordinate, the next in the second coordinate and so on. A
Fourier transform returns the frequency content of the entire signal as a sum of sines and
cosines of different frequencies. Wavelet transforms return the frequency content at different
parts of the signal [54].
2D facial data can be best thought of as a standard photograph, where the depth may be
noted by the use of lighting. For analysis, many of the methods mentioned in the 1D sec-
tion have 2D equivalents. These methods can be supplemented by Fisherfaces [8], which
have been demonstrated to, in some cases, have lower error rates than PCA; and man-
ual/automatic selection of facial landmarks or features [56, 55].
3D facial data is defined as a double precision wire mesh of the head that includes the
face. Morphable model approaches [9, 45, 24] leverage databases of already enrolled 3D
meshes (often hand labeled with landmarks or features) for new image intake and recog-
nition. To reduce the computational requirements, new data representation schemes are
used. Canonical Face Depth Maps [23] create a smaller representation for 3D face data,
while work like Symbolic Surface Curvatures [65] concentrates on exactly describing a specific
facial feature. There is also a significant body of work on 3D landmarks and features ranging
from landmark detection to appropriate analysis of facial features [2, 17, 80, 51]. In each
of these cases, landmarks are either hand-labeled or induced from previously labeled faces.
Lastly, hybrid 2D-3D methods, where information from one dimensional space is used to
add detail to another dimensional space, are used in an effort to improve facial recognition
results [14, 64, 79, 66].
Most facial analysis methods in computer vision have been developed with focus on bio-
metric authentication and recognition [52, 59, 23, 45, 14, 51, 66], with very few [50, 57]
attempting to detect medically relevant facial dysmorphology. This fact, that computer
vision methods have not transferred well to medical applications, motivates the research in
this dissertation where computer vision methods are used to quantify 3D face data based
on shape.
Chapter 3
GROUND TRUTH AND MEASURES OF SUCCESS
This chapter will discuss the three types of ground truth used: participant specific data,
expert surveys and hand-labeled landmarks. A description of how each ground truth is
used in this work will be given and aggregate information will be presented. Lastly, the
statistical measures used for determining success in this work are introduced.
3.1 Participant Specific Data
Initial ground truth data was limited to gender, age and disease status. Gender and age
were collected as part of the participant intake survey. The disease status was defined as
either affected by 22q11.2DS or control, and was detected by a fluorescence in situ hy-
bridization (FISH) genetic test for 22q11.2DS. Intuitively, a FISH test consists of attaching
Figure 3.1: FISH test for 22q11.2DS; arrow points to the deleted genetic material [82].
customized fluorescent markers to a sample of an individual’s DNA. After allowing time
(about 12 hours) for the markers to attach themselves to the genetic section in question
and washing the sample to prevent false negatives, the DNA is viewed under a microscope
capable of inducing fluorescence in the markers. In the case of a 22q11.2 deletion test, one
or more sections of the chromosome will not fluoresce (see Figure 3.1).
The demographic distribution of the data is given in Table 3.1. Age and gender data
were used in Section 6.1. Individual disease status was used for classification experiments
in both Chapter 6 and Chapter 8.
Table 3.1: Distribution of participant data according to age, gender and 22q11.2DS affected status for full dataset of 189 individuals. Each row lists, in order: affected female, affected male, control female, control male (rows with fewer entries have blank cells in the original).

Age less than 1: 1, 2 (total 3)
Age 1: 2, 1, 7, 6 (total 16)
Age 2: 1, 3, 3, 11 (total 18)
Age 3: 1, 2, 4, 4 (total 11)
Age 4: 3, 1, 3, 1 (total 8)
Age 5: 1, 2, 2, 8 (total 13)
Age 6: 4, 3, 4, 3 (total 14)
Age 7: 2, 1, 1, 6 (total 10)
Age 8: 2, 1, 5, 3 (total 11)
Age 9: 3, 2, 3, 3 (total 11)
Age 10: 2, 2, 3, 4 (total 11)
Age 11: 1, 4 (total 5)
Age 12: 1, 4 (total 5)
Age 13: 2, 3, 3, 4 (total 12)
Age 14: 1, 1, 3 (total 5)
Age 15: 1, 1 (total 2)
Age 16: 2, 2 (total 4)
Age 17: 1, 1 (total 2)
Age 18: 1, 1 (total 2)
Age 20: 1, 1, 1 (total 3)
Age 21-25: 1, 10, 2 (total 13)
Age 26-30: 1, 2, 1 (total 4)
Age 31-40: 2, 3, 1 (total 6)
3.2 Expert Survey
In September 2008, expert ground truth was provided as qualitative data on a set of 164
individuals (with a ratio of 1:3 affected vs. control) as the results of a paper survey filled
out by Dr. Carrie Heike. Each facial feature was rated from 0 to 2 in quality (0 = none, 1
= moderate, 2 = severe), all referring to 22q11.2DS characteristics. Additionally, a can't tell
category (designated by the symbol “?”) was added during the process to account for traits
that could not be categorized based on an individual’s 3D snapshot image. The results from
this initial survey were used to find a good starting point for local feature description, which
will be further discussed in Chapter 7.
Based on Dr. Heike’s comments, several more anthropometric questions were added to
the survey and an Opposite option was added to the rating system (see Figure 3.2). This
revised survey was administered in October 2008 to two trained dysmorphologists who clas-
sified a Caucasian-only subset of 1:1 affected vs. control consisting of 86 individuals (this set
will henceforth be referred to as W86). Dr. Heike updated her previous survey by adding in-
formation for the missing data. The results of this second survey were collected in November
2008, including post-mortem interviews with each participant, and are summarized in Table 3.2.
Table 3.2: Three expert survey results.
Median of expert scores              22q11.2DS group              Control group
                                     -1    0    1    2    ?      -1    0    1    2    ?
Overall face: 22q facial phenotype    0%  26%  56%   5%   0%      0%  93%   7%   0%   0%
Overall face: asymmetric              0%  67%  33%   0%   0%      0%  81%  16%   0%   0%
Overall face: square / rectangular    0%  53%  44%   2%   2%      0%  77%  23%   0%   0%
Overall face: hypotonic appearance    0%  65%  30%   5%   0%      0%  93%   7%   0%   0%
Eyes: hooded appearance               0%  56%  28%   7%   2%      0%  91%   7%   0%   2%
Nose: prominent nasal root            0%  53%  44%   5%   0%      0%  70%  26%   0%   0%
Nose: tubular appearance              0%  53%  47%   0%   0%      0%  77%  23%   0%   0%
Nose: bulbous nasal tip               0%  33%  47%  19%   0%      0%  84%  16%   0%   0%
Nose: small nasal alae                0%  26%  53%   5%   0%      0%  81%  12%   0%   0%
Ears: small                           0%  40%  42%   2%  16%      0%  95%   5%   0%   0%
Ears: protuberant                     0%  47%  40%   7%   9%      0%  67%  26%   5%   0%
Midface: relatively flat              0%  33%  67%   0%   0%      0%  77%  21%   2%   0%
Forehead: square                      0%  37%  49%   0%  21%      0%  88%  12%   0%   0%
Forehead: prominent on profile        0%  72%  16%   0%   2%      2%  88%   9%   0%   0%
Mouth: small                          0%  63%  37%   0%   0%      0%  86%  14%   0%   0%
Mouth: open                           0%  81%  12%   5%   0%      0%  88%  12%   2%   0%
Mouth: downturned corners of mouth    0%  44%  53%   2%   9%      0%  79%  19%   2%   5%
Mouth: retrusive chin                 2%  72%  19%   2%   0%      0% 100%   0%   0%   0%
Figure 3.2: Survey administered to experts. [The form first asks "Does this individual have 22q11?" and "Do you know this individual?" (Definitely YES / Probably YES / Probably NO / Definitely NO), then rates each of the facial traits listed in Table 3.2 on the scale: Opposite of 22q11, Not 22q11, Moderate 22q11, Severe 22q11, Not enough data, with space for additional comments.]
As can be seen in the survey results, all features of the nose (prominent nasal root, tubular
appearance, bulbous nasal tip, and small nasal alae) were found to have a higher percentage
of moderate and severe expression in 22q11.2DS affected individuals. Midface flatness and
square forehead had the next best separations between affected and control groups, while
small mouth had a weak, but present, disease signal.
3.3 Hand-labeled Landmarks
The last form of expert ground truth was provided in November 2008 as quantitative data in
the form of hand-labeled anthropometric landmarks for a 144-individual subset of the 189 individuals
used in this work (see Figure 3.3). Of the 144 hand-labeled individuals, 77 occur in the
above mentioned W86 data set, and 60 of these are matched 1:1 affected vs. control. The
Figure 3.3: Cephalometric landmarks (in blue) located on image of individual.
availability of each landmark label is shown in Table 3.3. Robust landmarks of the nose are
highlighted in green and robust landmarks of the mouth are highlighted in blue. These hand-
labeled landmarks were used to check automatically generated landmarks and automatic
symmetry measures, as described in Chapter 8 and Section 6.2.4, respectively.
Table 3.3: Number of missing points for each of the hand-labeled landmarks. For landmarks present on both the left and right side of the face, the order is given as left right. A detailed description of each landmark is given in Appendix A.

Landmark Name       Label     L144     L77      L60
glabella            g         1        1        1
nasion              n         0        0        0
sellion             se or s   0        0        0
pronasale           prn       1        1        1
subnasale           sn        1        1        1
labiale superius    ls        2        1        1
stomion             sto       15       9        8
labiale inferius    li        31       14       13
sublabiale          slab      26       14       13
gnathion'           gn'       50       23       21
exocanthion         ex        8 5      4 2      4 2
endocanthion        en        5 1      3 1      3 1
alar curvature      ac        4 2      1 2      1 2
alare               al        1 1      1 1      1 1
subalare            sbal      1 1      1 1      1 1
subnasale'          sn'       32 25    14 13    11 10
crista philtri      cph       3 3      2 2      2 2
cheilion            ch        23 23    11 11    10 10
tragion             t         2 2      2 1      1 1
preaurale           pra       12 10    11 8     10 8
postaurale          pa        36 34    29 28    22 21
superaurale         sa        34 35    25 24    19 19
subaurale           sba       83 70    54 44    44 35
3.4 Statistical Measures
Different measures of success are used in different communities; the measures most com-
monly used in classification and retrieval systems are used in this dissertation. In the
following equations, TP refers to the number of true positives (affected correctly labeled as
affected), FP refers to the number of false positives (control incorrectly labeled as affected),
TN refers to the number of true negatives (control correctly labeled as control), and FN
refers to the number of false negatives (affected incorrectly labeled as control). For all the
measures listed here the results range from 0 to 1, with a score of 1 being the best.
Accuracy
Measures the portion of all decisions that were correct decisions.
\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}. \tag{3.1} \]
Recall / Sensitivity
Measures the proportion of actual affected which are correctly labeled as affected.
\[ R = Sn = \frac{TP}{TP + FN}. \tag{3.2} \]
Precision
Measures the proportion of labeled affected which are actually affected.
\[ P = \frac{TP}{TP + FP}. \tag{3.3} \]
Specificity / 1−Fall-out
Measures the proportion of actual control which are correctly labeled as control.
\[ Sp = \frac{TN}{TN + FP}. \tag{3.4} \]
F-measure
Measures an even combination of precision and recall. F-measure, also called F1, is the
harmonic mean of precision and recall.
\[ F_1 = \frac{2 \cdot P \cdot R}{P + R} \tag{3.5} \]
\[ F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}. \tag{3.6} \]
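As a concrete illustration (not part of the original dissertation), the following Python sketch computes all five measures from the four confusion-matrix counts defined above; the example counts are made up.

```python
def classification_measures(tp, fp, tn, fn):
    """Success measures of Section 3.4 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)      # Eq. 3.1
    recall = tp / (tp + fn)                         # Eq. 3.2, sensitivity
    precision = tp / (tp + fp)                      # Eq. 3.3
    specificity = tn / (tn + fp)                    # Eq. 3.4
    f_measure = 2 * tp / (2 * tp + fp + fn)         # Eq. 3.6
    return accuracy, recall, precision, specificity, f_measure

# Hypothetical run: 43 affected and 43 controls, with 7 affected and
# 5 controls mislabeled.
print(classification_measures(tp=36, fp=5, tn=38, fn=7))
```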
Chapter 4
DATA PREPROCESSING
This chapter will provide a quick overview of the source of the raw data and methods used
to prepare the raw format for research use. Data was cleaned using MeshLab [41], after
which it was pose aligned to face forward using two separate methods.
4.1 Data Source
The 3D data used in this research was collected as part of a study by Carrie Heike, M.D. [38]
at the Craniofacial Center of Seattle Children’s Hospital and Regional Medical Center. The
3dMD imaging system used can be seen in Figure 4.1. The subject sits at the location of the
Figure 4.1: 3dMD imaging system setup at Seattle Children’s Hospital.
blue booster seat facing towards the lower left camera stand. The data collection system
is made up of four camera stands, each containing three cameras. Of the three cameras in
each stand, one captures a direct photo, one captures an under angle and one captures an
over angle to yield a three-dimensional view of the face through stereo analysis. The twelve
resulting range maps are stitched together using proprietary methods of 3dMD to yield the
final 3D head mesh and a texture map of the face. Due to human subjects requirements
(IRB), the only data used in the research described in this work are the 3D meshes.
4.2 Data Cleaning
As the data comes from the real world, there needs to be quite a bit of data
cleaning before the 3D meshes can be used by computer vision methods. As can be seen
in Figure 4.2, the data contains extraneous clothing, hair, and sometimes parents. All this
Figure 4.2: Example image in need of cleanup.
information was removed by hand using MeshLab [41]. In addition, although not initially
obvious, neck data was removed in order to maintain conformity between meshes.
4.3 Alignment Using Scanalyze
Initially only fifteen 3D mesh heads were available, a group of seven one-year-old females and
a group of eight ten-year-old males. Each age group was aligned to an unaffected individual
in that group. The alignment was done using scanalyze and vrip, small programs that are
part of the Digital Michelangelo Project [53], aligning the one year old group to mesh F1-
x-1-3, and aligning the ten year old group to mesh M10-x-1-5. In order to take advantage
of scanalyze’s automatic Iterative Closest Point (ICP) registration, the two meshes first
needed to be moved by hand to be within the same three-dimensional space. Although ICP
worked well for many of the instances, in some cases the final result was more misaligned
(a) Hand aligned meshes (b) Meshes in (a) after ICP alignment
Figure 4.3: Example where ICP alignment performs worse than hand alignment. Observe that both lips and nose are misaligned in the automatic ICP version.
than the original hand alignment (see Figure 4.3). Unfortunately, as more data required
alignment, manual alignment supported by ICP became too time consuming.
4.4 Automatic 3D Pose Alignment
A standard pose alignment technique in computer vision is to use Principal Component
Analysis (PCA) [72], where the first principal component vector is used to align all meshes
to the x-axis. PCA is mathematically defined as an orthogonal linear transformation that
transforms the data to a new coordinate system such that the greatest variance by any
projection of the data comes to lie on the first coordinate (called the first principal compo-
nent), the second greatest variance on the second coordinate, and so on [44]. This method
failed to work on the data, as can be seen in Figure 4.4, due to the variable amount of
hair and head data available in each mesh; each of the first principal component vectors
points in a different direction relative to the general shape of the head. As a result, another
semi-automatic method was developed to align all of the 3D meshes.
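For reference, a minimal sketch of the standard PCA alignment just described (the approach that failed on this data) is given below; `vertices` is assumed to be an n-by-3 array of mesh coordinates.

```python
import numpy as np

def pca_align(vertices):
    """Rotate a mesh so its principal axes coincide with the coordinate axes.

    On these head meshes the method fails because hair and head coverage
    vary, so the first principal direction is inconsistent across meshes.
    """
    centered = vertices - vertices.mean(axis=0)
    # Rows of vt are the principal directions, largest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T  # first output coordinate = first principal component
```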
(a) One year old female individuals
(b) Ten year old male individuals
Figure 4.4: Results of PCA used to align 3D meshes by their first principal component vector. Note that each head is misaligned in a different direction.
4.4.1 Tait-Bryan Angles
The position of the head can be described by the Tait-Bryan angles often referred to as yaw,
pitch and roll, according to the illustrations in Figure 4.5. Yaw is the side-to-side movement
about the y-axis. Pitch is the up and down movement about the x-axis. Lastly, roll is the
twisting movement of the head about the z-axis. These angles were used to design alignment
methods which will be discussed in the following sections. The order of presentation will be
slightly modified as yaw and roll naturally belong together, while pitch requires a different
approach.
(a) Yaw (b) Pitch (c) Roll
Figure 4.5: Tait-Bryan angles which describe the three degrees of freedom of a human head.
4.4.2 Use of Facial Symmetry for Yaw and Roll Alignment
Symmetry between the left and right sides of the face is used to determine the most central
position of the face. Although faces are not truly symmetrical, the pose alignment procedure
can be cast as finding the angular rotations of yaw and roll such that the error between the
left and right side of the face is minimal. To do this efficiently (see Figure 4.6), the original
3D mesh was interpolated to a 2.5D ordered grid (further discussion of 2.5D in Section 5.2).
The resulting image I was then split down the middle producing a left true image and a
right mirrored image. These two images were then overlaid and the difference error was
calculated by
\[ \text{Difference} = \sum_{y=0}^{height} \sum_{x=0}^{width/2} \left| I(x, y) - I(width - x - 1, y) \right|. \tag{4.1} \]
(a) 3D image (b) 2.5D image (c) left (d) right (e) diff
Figure 4.6: Using symmetry to align face in forward direction. (a) 3D image, (b) interpolated 2.5D image, (c) left side of face, (d) right side of face, (e) resulting difference between left and right side.
(a) Original 3D position (b) After just Yaw rotation −45◦ to +45◦ (c) After just Roll rotation −45◦ to +45◦ (d) After both Yaw and Roll rotations −45◦ to +45◦
Figure 4.7: Example results of yaw and roll alignment.
Although it is possible to search through all 360◦ for the optimal rotation of pose, the cur-
rent set of 3D meshes contains only heads that are facing somewhat forward, and as such
the search space can be decreased significantly. To maintain robustness a search through
-45◦≤ θY ≤ 45◦ in yaw and -30◦≤ θR ≤ 30◦ in roll is recommended, but this can be further
decreased to about 10◦ in each direction if the method is semiautomatic, where the user can
choose to rotate only the negative or positive directions.
There are two points of interest when using this symmetry design. First, at 0◦, ±90◦,
and 180◦ there are local symmetry minima, and the global minimum is not necessarily
located at 0◦. This was resolved with a small amount of user interaction. Second, when yaw
and roll symmetry are maximized separately, the error increases, while when the two are
maximized jointly, the results are far more accurate (see Figure 4.7).
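The joint search can be sketched as follows; `render(yaw, roll)` is a hypothetical callback that rotates the 3D mesh by the given angles and returns its interpolated 2.5D depth image, and the error follows Eq. 4.1.

```python
import numpy as np

def symmetry_error(depth):
    """Left-right asymmetry of a 2.5D depth image, as in Eq. 4.1."""
    h, w = depth.shape
    left = depth[:, : w // 2]
    right = np.fliplr(depth)[:, : w // 2]  # mirrored right half
    return np.abs(left - right).sum()

def align_yaw_roll(render, span=45, step=1):
    """Jointly search yaw and roll for the most symmetric rendering."""
    angles = range(-span, span + 1, step)
    return min(((y, r) for y in angles for r in angles),
               key=lambda a: symmetry_error(render(yaw=a[0], roll=a[1])))
```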
4.4.3 Aligning Head Pitch
The assumption of symmetry does not hold between the bottom and top parts of the face as
it did for the left and right sides; therefore the same methods cannot be used to automati-
cally align the pitch of the head. Instead, the pitch of the head is aligned by minimizing the
difference between the height of the chin and the height of the forehead (see Figure 4.8a).
Although the algorithm works quite well, if the rotation angle for pitch is set too wide, the
top of the head can be selected as the optimal solution.
As seen in Figure 4.8b, the results of running just one iteration of alignment for yaw,
roll and pitch are often not enough for final alignment. This is solved by a second iteration
of both yaw/roll and pitch alignment, but with a much smaller search space (often 5◦ is
sufficient).
(a) Minimize chin and forehead height difference (b) Example result
Figure 4.8: Illustration of concept behind pitch alignment and example alignment result.
Chapter 5
GLOBAL DATA REPRESENTATIONS
Although the raw data was in 3D double-precision mesh format, six representations were
chosen based on face information desired: (1) frontal and side snapshots of the 3D meshes,
(2) 2.5D depth images, (3) 1D curved line segments, (4) symmetry scores, (5) labeled
images, and (6) distances from average. 2D snapshots of the 3D mesh images were used as
a starting point, while interpolation to a 2.5D depth image was used as a means of retaining
the 3D aspect of the original mesh. The 1D curved line segments were used to determine if
there was any affected signal in the subsampled face profile. Symmetry scores were used to
determine the global structural symmetry of each individual. Labeled images were used as
a substitution for the original facial texture. Lastly, average faces for the whole set and each
subgroup were calculated, and a distance measure was used to determine an individual’s
membership in a specific subgroup. In each data representation case, the information was
normalized to the same height and width as the rest of the dataset.
5.1 Snapshots
(a) Frontal snapshot (b) Side snapshot
Figure 5.1: Snapshots of 3D meshes.
The motivation for this method came from the eigenfaces [75, 74] approach, which
uses 2D photographs of individuals. After neutral pose alignment (described in
Section 4.4.2 and Section 4.4.3), a set of frontal photographs of the 3D meshes was generated
(Figure 5.1a) using the visualization library VTK [67]. For the expert survey (described in
Section 3.2), an additional set of side snapshots rotated by 90◦ from the front was generated
(Figure 5.1b).
5.2 2.5D Depth Images
Since the original data was a double-precision unstructured triangular mesh while 2.5D
images are represented as pixels, the original data needed to be interpolated onto
an integer-precision structured grid. The data required correct normalization in all three
dimensions, with the final width and height of each face given by the x- and y-axes, and
the final depth of the face given by the z-axis. In order to properly scale in the z-direction,
all of the data was manually clipped at the ears. For the x-axis normalization, the face of
each individual was scaled to be exactly 200 units wide. The y-axis information was left in
the current scale, since scaling this dimension would lead to unnatural shapes.
(a) 9 months (b) 13 years (c) 39 years
Figure 5.2: 2.5D depth images (enhanced for the reader).
The z- and x-axis normalized unstructured triangular mesh was rasterized into a depth
buffer (an x by y matrix, with the highest z value – the tip of the nose – placed at high
illumination). As the final measurements for the 2.5D image were empirically determined
to be 250 pixels wide by 380 pixels tall, the tip of the nose for each individual was moved
to position (125,150) in x,y coordinates. Examples of 2.5D images are shown in Figure 5.2.
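A much simplified sketch of this rasterization step is shown below; it splats vertices into the depth buffer instead of truly interpolating the triangular mesh, and it assumes the mesh is already pose aligned and x-normalized as described above.

```python
import numpy as np

def mesh_to_depth(vertices, width=250, height=380, nose_xy=(125, 150)):
    """Splat pose-aligned mesh vertices into a 2.5D depth buffer,
    keeping the largest z (closest point) per pixel and placing the
    nose tip (assumed to be the global z maximum) at nose_xy."""
    v = np.asarray(vertices, dtype=float)
    nose = v[v[:, 2].argmax()]
    xy = np.rint(v[:, :2] - nose[:2] + nose_xy).astype(int)
    ok = ((xy[:, 0] >= 0) & (xy[:, 0] < width) &
          (xy[:, 1] >= 0) & (xy[:, 1] < height))
    depth = np.zeros((height, width))
    np.maximum.at(depth, (xy[ok, 1], xy[ok, 0]), v[ok, 2])
    return depth
```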
5.3 Curved Lines
Using the 2.5D images, specific lines can be extracted which may be descriptive of faces.
For example, a vertical line down the middle of the face becomes a waveform (depth as
a function of height) that can be analyzed (see Figure 5.3a). As seen in Table 5.1, four
versions of both vertical and horizontal lines were selected for signal testing. Odd numbers
of lines were used to maintain symmetry in the data. Finally, a combination of lines was
used to create grids of sizes 1x1, 3x3, 5x5, and 7x7 (see Figure 5.3b).
Table 5.1: Line positions. Position (125,150) is the location of the nose tip.
Line type    Number of lines   Line placement
Vertical     1                 125 (middle of width)
Vertical     3                 75, 125, 175
Vertical     5                 75, 100, 125, 150, 175
Vertical     7                 50, 75, 100, 125, 150, 175, 200
Horizontal   1                 150 (slightly below middle of height)
Horizontal   3                 100, 150, 200
Horizontal   5                 100, 125, 150, 175, 200
Horizontal   7                 75, 100, 125, 150, 175, 200, 225
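Once a 2.5D depth image exists, extracting these line features reduces to slicing rows and columns; a small illustrative sketch (default positions taken from the 3x3 grid of Table 5.1):

```python
import numpy as np

def curved_lines(depth, vertical=(75, 125, 175), horizontal=(100, 150, 200)):
    """Concatenate depth profiles along the chosen columns (vertical lines)
    and rows (horizontal lines) into one 1D feature vector."""
    profiles = [depth[:, x] for x in vertical] + [depth[y, :] for y in horizontal]
    return np.concatenate(profiles)
```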
5.4 Symmetry
There is a hypothesis in the 22q11.2 deletion syndrome literature [30] that affected individ-
uals are more likely to have an asymmetrical facial shape. Using 2.5D depth images, the
symmetry of any individual head I can be calculated using the Difference method developed
(a) Vertical curved lines for 2.5D images in Figure 5.2. From left to right, the curved lines are those of a 9-month-old, 13-year-old, and 39-year-old.
(b) Vertical, horizontal and grid lines. One line (green), three lines (green-orange), five lines (green-orange-brown), seven lines (all).
Figure 5.3: Curved line detail.
in Section 4.4.2. For readability, the reader should assume the following equivalencies
\[ R \equiv I(x, y), \tag{5.1} \]
\[ L \equiv I(W - x - 1, y), \tag{5.2} \]
\[ \sum \equiv \sum_{y=0}^{H} \sum_{x=0}^{W/2}. \tag{5.3} \]
where H and W are the image height and width, respectively, x, y describe the location
of the particular pixel in question, and I(x, y) is the illumination at a particular pixel.
Therefore
\[ \text{Difference} = \sum_{y=0}^{H} \sum_{x=0}^{W/2} \left( I(x, y) - I(W - x - 1, y) \right) \equiv \sum (R - L). \tag{5.4} \]
Other symmetry measures were calculated as follows
\[ \text{Absolute Difference} = \sum |R - L|, \tag{5.5} \]
\[ \text{Binary Difference} = \mathrm{num}\{R - L > 0\} - \mathrm{num}\{R - L < 0\}, \tag{5.6} \]
\[ \text{Difference Ratio} = \frac{\sum_{R - L > 0} (R - L)}{\sum_{R - L < 0} (R - L)}, \tag{5.7} \]
\[ \text{Binary Ratio} = \frac{\mathrm{num}\{R - L > 0\}}{\mathrm{num}\{R - L < 0\}}. \tag{5.8} \]
As much of the 3D head data was asymmetrical due to the hair removal process, a version
with the forehead removed was generated for each head. This format is called FC (Forehead
Cut).
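The five scores translate directly into array operations on a 2.5D depth image; a sketch, with `r` and `l` standing for the two half-images of Eqs. 5.1 and 5.2:

```python
import numpy as np

def symmetry_measures(depth):
    """The five global symmetry scores of Eqs. 5.4 through 5.8."""
    h, w = depth.shape
    r = depth[:, : w // 2]              # one half of the face
    l = np.fliplr(depth)[:, : w // 2]   # mirrored other half
    d = r - l
    return {
        "difference": d.sum(),                                        # Eq. 5.4
        "absolute_difference": np.abs(d).sum(),                       # Eq. 5.5
        "binary_difference": int((d > 0).sum()) - int((d < 0).sum()), # Eq. 5.6
        "difference_ratio": d[d > 0].sum() / d[d < 0].sum(),          # Eq. 5.7
        "binary_ratio": (d > 0).sum() / (d < 0).sum(),                # Eq. 5.8
    }
```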
5.5 Labeled Images
Texture can often provide more information about an underlying data set, a fact that
can be easily seen when comparing a 3D mesh with and without skin texture (Figure 5.4).
Although original face textures cannot be used due to IRB restrictions, alternate descriptive
labels can be generated. The image labeling approaches that were used in this work are
topographic face maps [66] and Gaussian and Besl-Jain curvature maps [2, 17, 51].
(a) Face texture (b) No texture
Figure 5.4: Comparison of head data with and without facial texture.
5.5.1 Topographic Face Maps
Given a 2.5D depth image I, topographic face map T is generated by zeroing all points of
depth z = I(x, y) which fail z mod τ = 0, where τ is the desired spacing of the contour
lines. The remaining values are then assigned to the maximum image value of 255. In other
words
\[ T(x, y) = \begin{cases} 0 & \text{if } I(x, y) \bmod \tau \neq 0 \\ 255 & \text{if } I(x, y) \bmod \tau = 0. \end{cases} \tag{5.9} \]
Figure 5.5 gives examples of generated topographic face maps.
(a) τ = 5 (b) τ = 10 (c) τ = 15 (d) τ = 20
Figure 5.5: Topographic maps of the face with different contour line spacing.
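Eq. 5.9 translates into a few lines of code; the sketch below assumes an integer-valued depth image as produced in Section 5.2.

```python
import numpy as np

def topographic_map(depth, tau=10):
    """Contour-line image T of Eq. 5.9: pixels whose depth is an exact
    multiple of the spacing tau become white (255), all others black (0)."""
    z = np.rint(depth).astype(int)
    return np.where(z % tau == 0, 255, 0).astype(np.uint8)
```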
5.5.2 Gaussian and Besl-Jain Curvature Face Maps
Curvature face maps were calculated using the standard equations given below. For each
point P in the 3D face mesh, κ1 and κ2 are the principal curvatures (the maximum and
minimum of the normal curvature, respectively). Mean curvature H is calculated by
\[ H = \tfrac{1}{2}(\kappa_1 + \kappa_2), \tag{5.10} \]

while Gaussian curvature K is calculated by

\[ K = \kappa_1 \kappa_2. \tag{5.11} \]
The Besl-Jain approach labels each point according to a combination of mean and Gaussian
curvatures, as shown in Table 5.2.
Once curvature values were calculated for the entire 3D mesh, the data was bounded on
each side by rangemin and rangemax, and point values Pv within the upper and lower range
were reassigned to fit the entire range of a grayscale image.
\[ \text{LabeledPoint} = \frac{P_v - range_{\min}}{\left| range_{\max} - range_{\min} \right|} \times 255. \tag{5.12} \]
Figure 5.6 illustrates curvature-based labeled images used in this work.
Table 5.2: Besl-Jain curvature value assignment.
K ↓ / H →       H less than 0    H equal to 0    H greater than 0
K less than 0   saddle ridge     minimal         saddle valley
K equal to 0    ridge            flat            valley
K more than 0   peak             (none)          pit
(a) K (b) |K| (c) Besl-Jain (d) Besl-Jain Pit (e) Besl-Jain Peak
Figure 5.6: Curvature based image labeling.
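The labeling rule of Table 5.2 can be sketched as below, assuming per-point mean and Gaussian curvature arrays `H` and `K` have already been estimated; the zero tolerance `eps` is an added assumption, since measured curvatures are never exactly zero.

```python
import numpy as np

def besl_jain_labels(H, K, eps=1e-4):
    """Surface-type label per point from mean (H) and Gaussian (K)
    curvature, following Table 5.2."""
    hs = np.where(np.abs(H) < eps, 0, np.sign(H)).astype(int)
    ks = np.where(np.abs(K) < eps, 0, np.sign(K)).astype(int)
    names = {(-1, -1): "saddle ridge", (0, -1): "minimal",
             (1, -1): "saddle valley", (-1, 0): "ridge", (0, 0): "flat",
             (1, 0): "valley", (-1, 1): "peak", (1, 1): "pit"}
    return np.vectorize(lambda h, k: names.get((h, k), "none"))(hs, ks)
```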
5.6 Distance from Average
Three possible averages can be calculated for the data sets used in this work: average of the
entire set, average of the control subset, and average of the affected subset. Since the preva-
lence of 22q11.2DS affected individuals in a population is 1:4000, it is most appropriate to
use the average of the control set to evaluate an individual’s dissimilarity to the population.
The distance between the average vector A and a participant’s vector P can be measured
by any one of many distance measures. In this work three measures were used:
\[ \text{Euclidean} = \sqrt{(P - A)(P - A)'} \tag{5.13} \]
\[ \text{Cosine} = 1 - \frac{P A'}{\sqrt{P P'}\,\sqrt{A A'}} \tag{5.14} \]
\[ \text{Mahalanobis} = \sqrt{(P - A)\, V^{-1} (P - A)'} \tag{5.15} \]
where V is the sample covariance matrix.
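For row vectors P and A, the three measures are a few lines each; a sketch:

```python
import numpy as np

def distances_from_average(P, A, V):
    """Euclidean, cosine and Mahalanobis distances of participant vector P
    from average vector A (Eqs. 5.13 through 5.15); V is the sample
    covariance matrix."""
    d = P - A
    euclidean = np.sqrt(d @ d)
    cosine = 1.0 - (P @ A) / (np.sqrt(P @ P) * np.sqrt(A @ A))
    mahalanobis = np.sqrt(d @ np.linalg.solve(V, d))
    return euclidean, cosine, mahalanobis
```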
Each of the global representations described in this chapter has been tested for prediction
of 22q11.2DS. Experiments and results are described in detail in the next chapter.
Chapter 6
GLOBAL REPRESENTATION RESULTS
This chapter will discuss results for the global representations defined in Chapter 5. First,
preliminary studies to set up the experimental environment will be described. Then, the
motivation and results for experiments on global data will be given.
6.1 Preliminary Studies
In these experiments, the data type variations followed those discussed in Chapter 5, with
an ear cutoff threshold and, in the 2.5D versions, the tip of the nose placed at the greatest
z-value in the image. In each case, the data was compressed using Principal Component
Analysis (PCA). This allowed for a maximum 189 attribute representation for the entire
data set, or an 86 attribute representation for the W86 subset. These attributes were then
assessed as to their ability to distinguish between affected and control individuals using
several common classifiers. The WEKA suite of classifiers [81], which includes multiple
classifiers of many different types, was used for all classification experiments. 10-fold cross
validation was used for all classifiers and each training/testing set was executed ten times,
for a result of 100 runs per data set per classifier. These results were then used to assess
the representational quality of each data type as well as its signal content.
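The dissertation ran this protocol in WEKA; purely as an illustration, an equivalent setup in Python with scikit-learn (using stand-in random data, not the real W86 features) might look like:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(86, 86))     # stand-in for PCA coefficients (W86)
y = rng.integers(0, 2, size=86)   # stand-in affected/control labels

scores = []
for rep in range(10):             # 10 repetitions of 10-fold CV = 100 runs
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    scores.extend(cross_val_score(GaussianNB(), X, y, cv=cv, scoring="f1"))
print(f"F-measure: {np.mean(scores):.2f} ± {np.std(scores):.2f}")
```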
6.1.1 Data Set Selection
The full data set included 189 individuals (53 affected, 136 control); such an uneven ratio is
not optimal in the use of any classifier. Therefore, an equal 1:1 ratio set needed to be used
and several options were proposed. Set A106 matched each of the 53 affected individuals
to a control individual of closest age without regard to gender or ethnicity. Set AS106
matched each of the 53 affected individuals to a control individual of closest age within
the same gender. Set W86 matched each of the 43 affected Caucasian2 individuals to a
Caucasian same-gender control individual of closest age. Set WR86 matched each of the 43
affected Caucasian individuals to a Caucasian same-gender control individual of the same
age, allowing repeats of controls where not enough same-aged subjects were available. It
should be noted that there was an attempt to create a ASE106 subset, that matched each
of the 53 affected individuals to a control individual of closest age, gender and ethnicity, but
this was unattainable as the most common ethnicity after Caucasian was listed as “other”,
which was considered too non-specific for ethnic matching.
6.1.2 Attribute Selection
Because the data is so varied in age and the 22q11.2DS phenotype is so subtle, the
simple solution of taking the top 10 eigenvectors (principal components) will not work. This
can be illustrated by using correlation-based feature selection [34] to find the attributes
which best predict age, gender and affected status in data set W86; a simplified attribute-ranking
sketch is given after Table 6.1. As can be seen in Table 6.1, the attributes used to best
predict affected status span the entire principal component list.
Table 6.1: Attribute selection of PCA vectors for data separation for gender, age and affected status. Each attribute name contains its eigenvalue rank in order of importance, i.e., d5 is the 5th eigenvector.

Data separation   # selected attributes   top 5 principal components   next 5 principal components
gender            64                      d1, d7, d8, d9, d10          d11, d12, d14, d15, d16
age               47                      d2, d3, d5, d6, d9           d13, d18, d20, d22, d23
affected          11                      d1, d5, d8, d15, d25         d63, d66, d73, d75, d81 (d85)
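The study used correlation-based feature selection (CFS) [34], which scores whole attribute subsets; as a simplified single-attribute stand-in, the sketch below merely ranks PCA attributes by the absolute value of their correlation with the class label:

    import numpy as np

    def rank_attributes(X, y):
        # Rank the columns of X by |Pearson correlation| with the label y.
        # A simplified stand-in for CFS, which additionally penalizes
        # attributes that are correlated with one another.
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        corr = (Xc * yc[:, None]).sum(axis=0) / (
            np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
        return np.argsort(-np.abs(corr))

    rng = np.random.default_rng(1)
    X = rng.normal(size=(86, 86))      # synthetic PCA coefficients
    y = np.repeat([0, 1], 43)          # affected / control labels
    print(rank_attributes(X, y)[:10])  # indices of the ten best attributes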
6.1.3 Classifier Selection
There are many classifiers that are used in computer vision, with Support Vector Machines
(SVM) currently leading the field. Using the WEKA package [81], the performance of nine
² Participants in the study were asked to complete an intake form based on the Washington State Birth Certificate. Ethnicity for each individual was self-identified and included a family ethnic history for parents and grandparents.
classifiers was compared. Appendix B provides a description of each classifier used. The
analysis of the results showed Naive Bayes, one of the simplest classifiers, outperforming
all other classifiers on the current data set (Table 6.2). This was a surprise, but such
performance can be explained by the small size of the data set as well as the large number
of descriptors per individual [26].
Table 6.2: F-measure scores for different classifiers with standard deviations provided. Data used are all PCA compressed versions of 3D snapshots and 2.5D images, on all 189 individuals and the initial four subsets tested: A106, AS106, W86, and WR86. Classifiers from left to right are: Naive Bayes, JRip (repeated incremental pruning to produce error reduction propositional rule learner), J48 tree (C4.5 decision tree), NN k=1 (nearest neighbor classifier), NN k=3 (3-nearest neighbor classifier), Neural Net:9,3 (neural network that uses backpropagation for training, with two hidden layers of size 9 and 3), SVM default (support vector machine with default WEKA [81] setup).

Data Set       Naive Bayes   JRip         J48 tree     NN k=1        NN k=3        Neural Net:9,3   SVM default
ALL-3Dsnp      0.53±0.16     0.39±0.21    0.48±0.19    0.29±0.22 •   0.35±0.21 •   0.31±0.22 •      0.30±0.22 •
A106-3Dsnp     0.65±0.18     0.59±0.20    0.65±0.17    0.68±0.16     0.67±0.16     0.67±0.17        0.62±0.19
AS106-3Dsnp    0.66±0.19     0.57±0.19    0.55±0.17    0.62±0.17     0.66±0.17     0.60±0.17        0.64±0.19
W86-3Dsnp      0.68±0.20     0.58±0.21    0.69±0.15    0.46±0.25 •   0.62±0.20     0.61±0.19        0.61±0.20
WR86-3Dsnp     0.69±0.22     0.78±0.18    0.79±0.16    0.34±0.26 •   0.10±0.18 •   0.70±0.18        0.73±0.19
ALL-25D        0.59±0.16     0.38±0.20 •  0.45±0.19 •  0.04±0.12 •   0.06±0.12 •   0.26±0.23 •      0.26±0.23 •
A106-25D       0.68±0.16     0.62±0.18    0.57±0.16    0.50±0.18 •   0.52±0.17 •   0.52±0.18 •      0.51±0.16 •
AS106-25D      0.69±0.18     0.59±0.18    0.62±0.16    0.49±0.20 •   0.39±0.22 •   0.51±0.17 •      0.48±0.18 •
W86-25D        0.77±0.17     0.59±0.19 •  0.56±0.20 •  0.07±0.18 •   0.23±0.23 •   0.47±0.21 •      0.46±0.22 •
WR86-25D       0.77±0.19     0.61±0.21    0.62±0.20    0.00±0.00 •   0.00±0.00 •   0.57±0.22 •      0.55±0.23 •
• statistically significant degradation as compared to Naive Bayes
6.1.4 Gaussian Range Selection
The range of Gaussian curvature values for the entire data set was −48,751 to 1,395,243,522,
with a median of −0.0001, while the median of the absolute values was 0.001. To determine
the best possible range of curvature values for prediction of 22q11.2DS, several range options
were enumerated and their classification performance compared. As can be seen in Table 6.3,
the range ±0.5 was found to be best for Gaussian curvature. Similarly, the absolute values
of Gaussian curvature performed best in the range from 0 to 0.5.
Table 6.3: Comparison of predictive capability of curvature value ranges.
Dataset       ±0.001   ±0.005   ±0.01   ±0.05   ±0.1      ±0.5      ±1
F-measure     0.55     0.58     0.65    0.64    0.69      0.73 ◦    0.72 ◦
Precision     0.68     0.68     0.78    0.68    0.75      0.84      0.80
Recall        0.50     0.54     0.61    0.64    0.68 ◦    0.68 ◦    0.70 ◦
% Accuracy    62.54    63.43    70.14   65.96   71.25     76.61 ◦   74.99
◦ statistically significant improvement
(a) Gaussian Curvature
Dataset       0.001    0.005    0.01    0.05    0.1       0.5       1
F-measure     0.56     0.56     0.58    0.49    0.64      0.71      0.63
Precision     0.71     0.73     0.71    0.49    0.60      0.86      0.79
Recall        0.50     0.49     0.53    0.51    0.71 ◦    0.64      0.56
% Accuracy    63.46    64.63    65.01   49.97   60.35     75.51 ◦   69.49
◦ statistically significant improvement
(b) Absolute value of Gaussian Curvature
6.2 Experiments
6.2.1 Full Data Set (3:1) versus 1:1 Data Set
The purpose of this experiment was to determine whether the uneven distribution of affected
and control individuals in the data set adversely affected classifier performance. Although it
is common practice in data mining to test on a balanced set, the small number of affected
individuals yields a very small subset, which is possibly too small for statistical significance.
The full data set, as well as the four subsets described in Section 6.1.1, was classified with
Naive Bayes; the F-measure, precision, recall and accuracy results are shown in Table 6.4.
First, one can see that the uneven data set is the worst performer, supporting the
intuition that a 1:1 ratio allows for better classifier performance. In addition, the W86
subset proves to be the best performer of the entire group of 1:1 subsets. This is expected
for the following reasons. Ethnic background influences the morphology of the face much
more significantly than the effects of 22q11.2DS, introducing a source of noise in both of
the ethnically mixed sets (A106 and AS106). Although it may be tempting to draw similar
conclusions about gender-based differences from the minor improvement from A106 to
AS106, this would be a mistake, as the female/male distribution is not even. Lastly, the
poor performance of WR86 compared to W86 is caused by the repetition of exactly five
control individuals (12% of the control data set), illustrating the drawback of a very small
data set: the repeated individuals influence the control set too much. Combined with
recommendations from Dr. Heike, the W86 data set was chosen as the most appropriate
for this work.

Table 6.4: Choosing an appropriate data set. 3D snapshot with ear cutoff threshold data format used. Classified using Naive Bayes. Standard deviations shown.

Data Set     ALL           A106          AS106           W86             WR86
F-measure    0.53±0.19     0.65±0.18     0.66±0.19       0.68±0.20       0.60±0.21
Precision    0.56±0.22     0.74±0.18     0.78±0.21 ◦     0.82±0.20 ◦     0.71±0.22
Recall       0.52±0.21     0.60±0.20     0.61±0.22       0.62±0.24       0.56±0.25
Accuracy     74.66±9.50    69.20±14.16   71.30±14.63     73.99±12.84     66.08±14.83

◦ statistically significant improvement as compared to the ALL data set
6.2.2 Original 3D Snapshot versus 2.5D
For the human viewer, the 3D Snapshot is considered to hold much more information than
the 2.5D representation. The purpose of this experiment was to determine how much data
would be lost by moving from the 3D Snapshot representation to the 2.5D representation.
Additionally, since the ears are known to carry a 22q11.2DS signal and the 2.5D data
format is without ears, it was also necessary to test how much data was being lost by
using the ear cutoff threshold. All images were 250 × 380 pixels in size. As seen in Table
6.5, the 2.5D data format was found to be best at classifying 22q11.2DS disease status and
will be used as a baseline for the following experiments.

Table 6.5: Checking for data loss between data representations. All data shown here is from the W86 dataset classified using Naive Bayes. Standard deviations shown.

Data Set     3Dsnp         3Dsnpc        2.5Dcut
F-measure    0.71±0.18     0.68±0.20     0.77±0.17
Precision    0.88±0.18     0.82±0.20     0.87±0.17
Recall       0.63±0.22     0.62±0.24     0.72±0.22
% Accuracy   76.13±14.15   73.99±12.84   79.90±13.62
6.2.3 Curved Lines
In this experiment, the purpose was to discover whether curved lines, such as the profile,
contain any 22q11.2DS signal. All of the line sets were generated using the methods described
in Section 5.3.
Table 6.6: Curved lines with Naive Bayes and W86.
Dataset       2.5D    V1      V3      V5      V7      H1       H3       H5       H7       1x1     3x3     5x5     7x7
F-measure     0.77    0.71    0.75    0.76    0.71    0.52 •   0.60 •   0.65 •   0.67     0.69    0.71    0.74    0.73
Precision     0.87    0.82    0.87    0.85    0.84    0.73     0.83     0.91     0.83     0.86    0.84    0.91    0.83
Recall        0.72    0.66    0.69    0.72    0.65    0.44 •   0.52 •   0.54 •   0.60 •   0.62    0.65    0.65    0.68
% Accuracy    79.90   74.89   78.74   78.21   74.85   63.61 •  69.24 •  73.57    72.31    75.51   75.04   79.10   76.14

(V = vertical lines with 1, 3, 5, 7 lines; H = horizontal lines with 1, 3, 5, 7 lines; grid lines of size 1x1 to 7x7.)
• statistically significant degradation
As seen in Table 6.6, excluding the horizontal lines, whose results were statistically worse
than those of the other lines, there was no significant difference between the results for the
different data representations. The vertical profile lines of 3 and 5 were found to be the
most informative of the different curved line types used in this experiment. Based on known
22q11.2DS signals, such as a hooded appearance of the eyes, a prominent forehead profile, a
relatively flat midface, or a generally hypotonic facial appearance, there is promise in using
sparse vertical lines to describe one or more of these anthropometric features.
6.2.4 Symmetry
The purpose of this experiment was to determine whether asymmetry can be used to
discriminate between affected and control individuals. Using expert median scores for
symmetry yields the highest accuracy of all the symmetry measures used, but, as highlighted
by the F-measure values, the recall is very weak (see Table 6.7). The highest F-measure
value, given to the EC set, is actually a reflection of a difference between the affected and
control data sets: children affected by 22q11.2DS often refused to wear a head cap during
the image intake process and, as such, the upper part of the head was often uneven due to
hair artifact removal. Generally, the symmetry measures, whether automatically computed
or given by experts, were judged inferior to all other global data representations described
in this work for predicting disease status.

Table 6.7: Symmetry measures with Naive Bayes and W86. EC refers to symmetry analysis done on 2.5D images with an ear cutoff. FC refers to images with the forehead removed due to noise from the hair removal process.

Data Set     2.5D    Expert    EC        FC        FC+EC
F-measure    0.77    0.40 •    0.59 •    0.11 •    0.47 •
Precision    0.87    0.66      0.48 •    0.22 •    0.58 •
Recall       0.72    0.31 •    0.78      0.08 •    0.43 •
% Accuracy   79.90   56.49 •   50.14 •   46.49 •   54.81 •

• statistically significant degradation
6.2.5 Labeled Images
In this experiment, the purpose was to discover whether labeling images with various
topography and curvature labels would improve upon the 22q11.2DS detection results of
the current best data representation (2.5D depth images). As can be seen in Table 6.8,
although the curvature labels, particularly Gaussian (K), absolute value of Gaussian (|K|)
and Besl-Jain, were superior to the symmetry measures, the classification of disease status
based on labeled images did not improve on previous results (see Table 6.9 for the
topography-labeled image results).

Table 6.8: Curvature labeled images compared to 2.5D results using Naive Bayes and W86.

Dataset      2.5D    K ±0.5   |K| ±0.5   Besl-Jain   Pit       Peak
F-measure    0.77    0.73     0.71       0.70        0.61      0.59 •
Precision    0.87    0.84     0.86       0.71        0.60 •    0.62 •
Recall       0.72    0.68     0.64       0.72        0.66      0.59
% Accuracy   79.90   76.61    75.51      70.81       60.56 •   61.53 •

• statistically significant degradation

Table 6.9: Topography labeled images compared to 2.5D results using Naive Bayes and W86.

Data Set     2.5D    5-step    10-step   15-step   20-step
F-measure    0.77    0.54 •    0.43 •    0.58 •    0.39 •
Precision    0.87    0.78      0.65      0.69      0.56 •
Recall       0.72    0.45 •    0.35 •    0.53      0.33 •
% Accuracy   79.90   66.63 •   59.68 •   64.06 •   54.35 •

• statistically significant degradation
6.2.6 Distance from Average of Control Individuals
Using distance from average is most similar to the experiments done by Hutton et al.
described in Section 2.1. Starting with the 2.5D depth image data representation, for every
individual the distance to the control data set average was measured. These distances were
then used for classifying individuals as affected or not affected by 22q11.2DS. For visual
comparison, the distances from the average of control individuals for the Euclidean, Cosine
and Mahalanobis measures are shown in Figure 6.1. Note that in the case of the Mahalanobis
distance, the separation between the distance of affected and control individuals is most
apparent. Table 6.10 provides a numerical comparison of all three distance measures,
illustrating that the Mahalanobis distance to the average control outperforms all other
global methods, yielding an F-measure of 0.94 (missing 5 individuals) on the W86 data set.

Figure 6.1: Distance per individual to the average of the control individuals, for (a) Euclidean, (b) Cosine, and (c) Mahalanobis distance. Black line separates affected from control, with affected individuals on the left.

Table 6.10: Classification using distance from average of control using Naive Bayes on W86.

Dataset      2.5D    Euclid   Cosine   Mahal
F-measure    0.77    0.63     0.59     0.94 ◦
Precision    0.87    0.83     0.76     0.96 ◦
Recall       0.72    0.54     0.51     0.93 ◦
% Accuracy   79.90   71.31    67.88    94.00 ◦

◦ statistically significant improvement
6.2.7 Mahalanobis Distance as Classifier
The results using the Mahalanobis distance described in Section 6.2.6 were obtained using
standard medical literature methods. However, this method of classification would be
discounted in the pattern recognition literature, because computation of the control average
requires labeling of the entire data set, not just the training set. In the experiment of
Section 6.2.6, the control average was computed on all controls, and then 10-fold cross
validation was used in training and testing the classifier. The following new experiment was
designed so that a percentage of the control set was removed from the training data set
prior to computation of the control average and used exclusively in testing.
Figure 6.2 shows prediction accuracy as a function of the percentage of data used in testing
for both control average and affected average. As can be seen in these bar graphs, the
moment even one individual is removed from the average (test set size equal to 2%), the
classification accuracy drops to about 50%. In order to explain this drastic drop, Figure
6.3 illustrates how the distance calculation changes for a single test individual. The blue
line represents the original distances of each subject to the average of all control individuals
and one can speculate that a horizontal line may be drawn to separate the affected (first 43
individuals) and the controls. The black dots represent the new distances to the average,
calculated by leaving a control individual (circled in red) out of the average.

Figure 6.2: Aggregate percent of correctly classified individuals as the test set increases from 2% to 50% of the data set (x-axis), shown on a 0-100% accuracy scale (y-axis): (a) using the control average for classification; (b) using the affected average for classification.

Figure 6.3: Distance of a control individual from the control average when that individual (circled in red) is used as the test sample. The y-axis represents the distance to the average, while the x-axis lists all individuals in the W86 data set, with the first 43 individuals affected and the rest control. The blue line represents the original distance from average used in the experiment of Section 6.2.6, while the black dots represent the newly calculated distance from average when leaving out the test individual.

Figure 6.4: Variance of the full data, control set and affected set. All three data sets have extremely large variances, on the order of 10^7.

Although most individual distances do not vary drastically, the test subject's distance
(circled in red) increases so much that it is now mistaken for an affected individual.
Returning to Figure 6.2, this sort of large shift in distance occurs frequently, yielding poor
class prediction on the test set. What causes this drastic change in distances is that the
separation of distances between affected and control individuals is quite small, while the
variance is very large, as shown in Figure 6.4.
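A minimal sketch of the corrected protocol (synthetic data; the 95th-percentile decision threshold is an illustrative choice, not the method of the experiment) makes the point explicit: the held-out subject must not contribute to the average or covariance used to classify it.

    import numpy as np

    def mahalanobis(P, A, Vinv):
        d = P - A
        return np.sqrt(d @ Vinv @ d)

    rng = np.random.default_rng(0)
    controls = rng.normal(0.0, 1.0, size=(43, 20))   # synthetic controls
    test = controls[0]                               # held-out test subject
    train = controls[1:]                             # training controls only

    # Average and covariance computed WITHOUT the held-out subject.
    A = train.mean(axis=0)
    Vinv = np.linalg.pinv(np.cov(train, rowvar=False))

    # Classify by comparing to the spread of training-control distances.
    train_d = np.array([mahalanobis(c, A, Vinv) for c in train])
    threshold = np.percentile(train_d, 95)
    print("affected" if mahalanobis(test, A, Vinv) > threshold else "control")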
Although distances from the average control or from the average affected may not be valid
as a classification feature, they are very useful in the medical community for the quantifi-
cation of dysmorphology. In this vein, the next chapter will focus on the detection and
quantification of local facial features.
Chapter 7
LOCAL DATA REPRESENTATIONS
This chapter will describe local facial features developed from 2.5D depth images. The nose,
with arguably the strongest signal, was chosen as the first of the local features to examine,
followed by the mouth. As a first step, automatic detection of landmarks will be described.
A list of landmark distances used in anthropometry will be given. Next, landmark-based
descriptors will be explained. Lastly, the developed shape descriptors will be discussed.
The nasal landmarks of interest are the sellion (s), pronasale (prn), subnasale (sn), and left
and right alae (al). Additionally, a helper landmark mf′ was used that is similar to the
maxillofrontale (mf): a landmark that is located by palpation of the anterior lacrimal crest
of the maxilla at the frontomaxillary suture (Figure 7.1a). The oral landmarks of interest
are the labiale superius (ls), stomion (sto), labiale inferius (li), and left and right cheilion (ch)
(Figure 7.1b).
Figure 7.1: Landmarks of interest: (a) nasal landmarks, (b) oral landmarks.
7.1 Automatic Nasal Landmark Detection
Given a 2.5D depth image I, generated as described in Section 5.2, the automatic detection
of landmarks proceeds as follows. For each depth image I, there is a set of points I_max at
the maximum z-value (max_z), which can be represented by

I_max = { (x, y) : I(x, y) = max_{x′,y′} I(x′, y′) }.   (7.1)
The geometric center of these points, (prn_x, prn_y), is the pronasale. The sellion and subnasale
can be found as the local minima on either side of the pronasale on the line

M = I_{prn_x}.   (7.2)
To find the left and right alae, the binary image NT_sn is defined as the nasal tip thresholded
by sn_z, the depth of the subnasale (see Figure 7.2a):

NT_sn = ( I(x, y) ≥ sn_z ).   (7.3)
As a starting point for the locations of the alae, the points located at the left and right
boundaries of NT_sn must be found. First, the averages of the y-values of the points on
the left border, min_x, and right border, max_x, of NT_sn are calculated. In the case of
symmetrical faces, al_y (the y-value of both the left and right al) is the y-average, while for
asymmetrical faces al_y is the average of the left and right border averages:

al_y = avg{ y : (NT_sn(min_x, y) = 1) ∩ (NT_sn(max_x, y) = 1) },   (7.4)

where

min_x = min_x ( NT_sn(x, y) = 1 ),   (7.5)
max_x = max_x ( NT_sn(x, y) = 1 ).   (7.6)

Figure 7.2: Detecting the location of the nasal alae: (a) NT_sn outlined in red, (b) al_y from min_x and max_x, (c) al^L_x and al^R_x.
As the depth of sn is not necessarily equal to the depth at which the nose connects with the
face, min_x and max_x may be incorrectly placed on the al_y horizontal line (see Figure 7.2b).
Therefore, to find al^L_x and al^R_x, the location of the attachment of the nose to the face must
be found. As shown in Figure 7.2c, al^L_x and al^R_x are detected by selecting the points with
the sharpest slope S on the horizontal line through al_y.

Finally, the detection of the helper landmark mf′ is done using the region growing
information to find the horizontal line O through the eyes. Given O, the same method as for
the alae is used; the points below the sharpest slope are chosen as mf′^L and mf′^R. In a
few cases the location of the eyes is obscured, and the mf′ locations are then detected by finding
the local x-value minima nearest to s_x on the horizontal line through s_y.
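A condensed sketch of the first steps of this procedure follows (NumPy; an image orientation with y increasing downward and an upright face is assumed, and "nearest local minimum" is one reading of "local minima on either side"):

    import numpy as np

    def detect_nasal_landmarks(I):
        # Equation 7.1: the pronasale is the geometric center of the
        # maximum-depth points.
        ys, xs = np.where(I == I.max())
        prn_x, prn_y = int(round(xs.mean())), int(round(ys.mean()))

        # Equation 7.2: the midline M is the image column through prn_x.
        M = I[:, prn_x]
        minima = [j for j in range(1, len(M) - 1)
                  if M[j] < M[j - 1] and M[j] < M[j + 1]]

        # Sellion above the pronasale, subnasale below (nearest minima).
        s_y = max((j for j in minima if j < prn_y), default=None)
        sn_y = min((j for j in minima if j > prn_y), default=None)
        return (prn_x, prn_y), (prn_x, s_y), (prn_x, sn_y)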
7.2 Automatic Oral Landmark Detection
The peak image, generated by the method described in Section 5.5.2, is used to find the
prominent parts of the upper and lower lips, marked blue in Figure 7.3a. The labiale
superius (ls) location is found where the lower edge of the upper lip area (UL) intersects
with midline M , while the labiale inferius (li) is found where the upper edge of the lower
lip area (LL) intersects with midline M (see Figure 7.3b).
ls_x = li_x = prn_x,   (7.7)
ls_y = min_y ( M_y ∈ UL ),   (7.8)
li_y = max_y ( M_y ∈ LL ).   (7.9)
To detect the stomion (sto), the local z-value minimum between ls and li is used:

sto_x = prn_x,   (7.10)
sto_y = { y : I(prn_x, y) = min_{li_y ≤ y ≤ ls_y} I(prn_x, y) }.   (7.11)
In the case that this local minimum is not present, sto_y is set to the y-value of the local
z-value minimum nearest the midline M.
The left and right cheilion are detected using a combination of two methods. The first
method builds on the local minimum search by detecting a mouth line U as the trough
between the upper and lower lip, ending once the trough disappears as the lips meet
(Figure 7.3c). Specifically, using sto as the starting point, the line is extended to the left by
selecting the minimum of the closest three neighbor points. This process stops when no
local minimum can be found. A corresponding approach is used for extending U to the right.
The one drawback to this approach is that it may fail to stop at the appropriate point.

Figure 7.3: Detecting landmarks of the mouth: (a) lips and corners, (b) ls and li, (c) local minima line U, (d) ch detected.

The second approach is based on the peak curvature values. As the corners of the
mouth are natural peaks, this method searches along the horizontal for the two peak
areas (or dots) nearest to sto, marked green in Figure 7.3a. Once each mouth corner dot
is found, a bounding box is defined. The geometric center of each bounding box is
calculated, yielding the location of ch (Figure 7.3d). The drawback of this method is that,
due to face shape, the peak image may not contain the mouth corner dots, or the dots may
extend downward to the bottom of the chin. When the mouth line and dot approaches
are used together, the drawbacks of each method are minimized.
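As an illustration of the mouth-line method (a sketch only; the exact stopping test is an interpretation of "the trough disappears as the lips meet"):

    import numpy as np

    def trace_mouth_line_left(I, sto):
        # Starting at the stomion, follow the trough between the lips to
        # the left by stepping to the lowest-depth of the three neighbor
        # pixels in the next column; stop when that pixel is no longer a
        # local minimum along its column.
        x, y = sto
        while x > 0:
            neighbors = [(y + dy, I[y + dy, x - 1]) for dy in (-1, 0, 1)
                         if 0 <= y + dy < I.shape[0]]
            ny, _ = min(neighbors, key=lambda p: p[1])
            col = I[:, x - 1]
            if not (0 < ny < len(col) - 1
                    and col[ny] < col[ny - 1] and col[ny] < col[ny + 1]):
                break
            x, y = x - 1, ny
        return (x, y)   # endpoint near the left cheilion

Extending to the right is symmetric, and in the combined method this endpoint would be reconciled with the peak-image corner dots.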
7.3 Landmark Distances
A set of craniofacial anthropometric landmarks and inter-landmark distances characterizing
the craniofacial features frequently affected in 22q11.2DS was initially selected [32]. Following
a reliability study by Dr. Heike, 33 of these measurements were identified based on
demonstrated high inter- and intra-rater reliability, as well as high inter-method reliability
when comparing measurements taken directly with calipers and those taken indirectly on
the 3dMD imaging system [39].
Twelve of these landmarks were amenable to automatic detection, and were used to cal-
culate ten inter-landmark distances (Table 7.1) for subsequent inter-method comparisons
between hand-labeled and automatically detected landmarks.
7.4 Landmark-based Descriptors
Outside of the robust landmark distances discussed above, combinations of landmark mea-
surements can be used to better describe the shape of a particular facial feature. Eight such
descriptors were developed for the nose, while six were used for the mouth.
Table 7.1: Landmark distances obtained using automatically detected landmarks.
Description                     Name     Mathematical Definition   Approximation used
Nose width                      L^A_1    = ‖al^R − al^L‖
Nose tip protrusion             L^A_2    = ‖sn − prn‖
Mouth width                     L^A_3    = ‖ch^R − ch^L‖
Upper lip height                L^A_4    = ‖sn − sto‖
Vermilion height of upper lip   L^A_5    = ‖ls − sto‖
Vermilion height of lower lip   L^A_6    = ‖sto − li‖
Length of R alar base           L^A_7    = ‖ac^R − sn‖             ≡ ‖al^R − sn‖
Length of R alar stretch        L^A_8    = ‖ac^R − prn‖            ≡ ‖al^R − prn‖
Length of L alar base           L^A_9    = ‖ac^L − sn‖             ≡ ‖al^L − sn‖
Length of L alar stretch        L^A_10   = ‖ac^L − prn‖            ≡ ‖al^L − prn‖
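Given detected landmarks as 3D points, the distances of Table 7.1 are plain Euclidean norms; a small sketch (the coordinate values are placeholders):

    import numpy as np

    lm = {  # hypothetical landmark coordinates (x, y, z)
        "alL": np.array([-1.6, 0.0, 4.0]), "alR": np.array([1.6, 0.0, 4.0]),
        "prn": np.array([0.0, 0.3, 5.8]),  "sn":  np.array([0.0, -0.8, 4.2]),
    }

    def dist(a, b):
        return float(np.linalg.norm(lm[a] - lm[b]))

    LA1 = dist("alR", "alL")   # nose width
    LA2 = dist("sn", "prn")    # nose tip protrusion
    LA8 = dist("alR", "prn")   # R alar stretch, using al^R to approximate ac^R
    print(LA1, LA2, LA8)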
Helper distance functions are used in the calculation of several descriptors and are de-
fined as follows, where † denotes a standard anthropometric distance measure not included
in Dr. Heike’s subset (see Section 7.3).
Depth_face = max_z − min_{z≠0} z,   (7.12)
Width_face = max_x − min_x, where z ≠ 0,   (7.13)
Depth_nose = max_z − sn_z,   (7.14)
Width_nose† = al^R_x − al^L_x,   (7.15)
Depth_Nroot = s_z − (mf′^R_z + mf′^L_z)/2,   (7.16)
Width_Nroot† = mf′^R_x − mf′^L_x.   (7.17)
7.4.1 Nasal Descriptors
The landmark-based nasal descriptors are defined in Table 7.2.

Table 7.2: List of nasal landmark-based descriptors.

Description                    Name     Mathematical Definition
Normalized nose depth          L^N_1    = Depth_nose / Depth_face
Normalized nose width          L^N_2    = Width_nose / Width_face
Normalized nasal root width    L^N_3    = Width_Nroot / Width_face
Normalized nasal root depth    L^N_4    = Depth_Nroot / Depth_face
Average nostril inclination†   L^N_5    = avg[∠(L mf′, L al, R al), ∠(R mf′, R al, L al)]
Nasal tip angle†               L^N_6    = ∠(s, prn, sn)
Alar-slope angle†              L^N_7    = ∠(L al, prn, R al)
Nasal root-slope angle†        L^N_8    = ∠(L mf′, s, R mf′)

In summary, the normalized nose depth (L^N_1) is the ratio of nose depth to face depth.
The normalized nose width (L^N_2) is the ratio of the width of the nose to the width of the
face. The normalized nasal root width (L^N_3) is the ratio of the nasal root width to face
width. The normalized nasal root depth (L^N_4) is the ratio of the nasal root depth to face
depth. The average nostril inclination (L^N_5) is the average of the left and right angles
created by the lines outlining the side of the nose and the base of the nose. The nasal tip
angle (L^N_6) is the angle on the midline M between the sellion and subnasale. The
alar-slope angle (L^N_7) is the 3D angle between the left and right alae passing through the
pronasale. Finally, the nasal root-slope angle (L^N_8) is calculated as the 3D angle through
the sellion, stopping at the left and right mf′.
7.4.2 Oral Descriptors
The landmark-based oral descriptors are defined in Table 7.3. In summary, the normalized
mouth length (L^O_1) is the ratio of the mouth width to face width. L^O_2 is the ratio of the
height of the vermilion portion of the upper lip to the full mouth height, and L^O_3 is
calculated similarly to L^O_2, but for the vermilion portion of the lower lip. (Vermilion is
the red pigmented portion of the lips.) The inclination of the labial fissure (L^O_4) is the
angle between the line defined by the locations of the left and right cheilion and the
horizontal line through the right cheilion. The upper vermilion angle (L^O_5) is the angle
between the corners of the mouth and the top of the vermilion part of the upper lip. The
lower vermilion angle (L^O_6) is calculated similarly to L^O_5, but for the vermilion portion
of the lower lip.
Table 7.3: List of oral landmark-based descriptors.
Description                                Name     Mathematical Definition
Normalized mouth length                    L^O_1    = (R ch_x − L ch_x) / Width_face
Normalized vermilion height of upper lip   L^O_2    = (ls_y − sto_y) / (ls_y − li_y)
Normalized vermilion height of lower lip   L^O_3    = (sto_y − li_y) / (ls_y − li_y)
Inclination of labial fissure†             L^O_4    = ∠(ch^R_{x,y}, ch^L_{x,y}, horizontal)
Upper vermilion angle†                     L^O_5    = ∠(ch^R_{x,y}, ls_{x,y}, ch^L_{x,y})
Lower vermilion angle†                     L^O_6    = ∠(ch^R_{x,y}, li_{x,y}, ch^L_{x,y})
7.5 Shape-based Descriptors
Four sets of shape-based descriptors were developed. The first set describes the bulbous
nasal tip facial feature. The second and third sets use an automatic nose edge approach to
describe nasal tubularity and the prominence of the nasal root. The fourth set describes
the mouth, focusing on such features as an open mouth, a small mouth, and downturned corners.
7.5.1 Bulbous Nasal Tip (BNT)
The nose region is grown using the pronasale (prn) as a seed pixel, while the threshold
is decreased gradually. NT_d is a binary image representing the set of pixels in image I
thresholded by depth max_z − d, where d is varied from 0 to Depth_nose:

NT_d = ( I(x, y) ≥ max_z − d ).   (7.18)
To normalize the bulbous features, the bounding box B_d for each NT_d is constructed,
with the geometric center of B_d denoted by (B_x, B_y). The following four descriptors are
calculated.
Rectangularity
The ratio of the nose area NT_d to the area of its bounding box B_d:

R_d = num(NT_d = 1) / area(B_d).   (7.19)

The range of R_d is from 0 to 1; 1 is predictive of BNT.
Circularity
The difference between NT_d and the matrix Ellipse_d, which represents an ellipse inscribed
in the bounding box B_d with the same center as B_d, horizontal diameter equal to the width
of the bounding box, W(B_d), and vertical diameter equal to the height of the bounding
box, H(B_d):

C_d = Σ_{x,y} |NT_d(x, y) − Ellipse_d(x, y)| / area(B_d),   (7.20)

where

Ellipse_d(x, y) = 1 if (x − B_x)² / (W(B_d)/2)² + (y − B_y)² / (H(B_d)/2)² ≤ 1, and 0 otherwise.   (7.21)

The range of C_d is from 0 to 1; 0 is predictive of BNT.
Triangularity
The difference between NT_d and an isosceles triangle Triangle_d inscribed within the
bounding box B_d:

T_d = Σ_{x,y} |NT_d(x, y) − Triangle_d(x, y)| / area(B_d).   (7.22)

The range of T_d is from 0 to 1; 1 is predictive of BNT.
Upper Rectangularity
The area of the portion of the nose above prn_y compared to its bounding box BU_d. This
is the same as the R_d calculation, except that only points with y < prn_y are considered:

U_d = num(NT_d = 1, y < prn_y) / area(BU_d).   (7.23)

The range of U_d is from 0 to 1; 1 is predictive of BNT.
Severity Scores
For each descriptor δ listed above, a severity score Sev_δ is defined as the portion of values
bigger than the threshold Th_δ as d varies from 1 to Depth_nose:

Sev_δ = num(δ_d > Th_δ) / Depth_nose.   (7.24)

In each case, Th_δ was empirically chosen to maximize the difference of the average
severity score Sev_δ between individuals with and without BNT.
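A compact sketch of Equations 7.24 and 7.25 (NumPy; the default thresholds shown are the empirically chosen values reported later in Section 8.1.4):

    import numpy as np

    def severity(descriptor_values, threshold):
        # Equation 7.24: fraction of per-increment descriptor values above
        # the threshold as d sweeps from 1 to Depth_nose.
        v = np.asarray(descriptor_values)
        return float((v > threshold).sum()) / len(v)

    def bulbous_coefficient(R_values, C_values, ThR=0.71, ThC=0.10):
        # Equation 7.25: beta = Sev_R * (1 - Sev_C).
        return severity(R_values, ThR) * (1.0 - severity(C_values, ThC))

    # Toy example: a boxy nose keeps R_d high over most of the depth sweep.
    R = np.linspace(0.9, 0.5, 20)
    C = np.linspace(0.05, 0.3, 20)
    print(bulbous_coefficient(R, C))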
For clarity, the calculation of Sev_R is described. Given two individuals, one with and
one without BNT, R_d was calculated at each increment of d, with the resulting values
plotted in Figure 7.5.

Figure 7.4: The nose area compared to the bounding box and the different descriptor shapes: (a) R_d and C_d, (b) T_d, (c) U_d.

Figure 7.5: Nose area in relation to bounding box area for two individuals of the same age and gender, with and without BNT.

The count of points above Th_R = 0.7 for the individual with severe BNT is significantly
greater than that of the individual with no BNT, yielding severity scores Sev_R of 0.9 and
0.3 for the individuals with and without BNT, respectively.
Bulbous Nose Coefficient
Using the two most basic descriptors, R_d and C_d, the bulbous coefficient can be defined as
the combination of their severities:

β = Sev_R (1 − Sev_C).   (7.25)

Returning to the example from Figure 7.5, the bulbous coefficients β for the individuals
with and without BNT were 0.54 and 0.08, respectively.
Table 7.4: List of bulbous nasal tip shape-based descriptors.
Description                Name     Mathematical Definition
Rectangle severity         D^B_1    = Sev_R
Circle severity            D^B_2    = Sev_C
Triangle severity          D^B_3    = Sev_T
Upper rectangle severity   D^B_4    = Sev_U
Bulbous nose coefficient   D^B_5    = β
7.5.2 Automatic Nose Edges
The vertical shape of the nose can be thought of as the left and right contours of the nose
from the alae to the sellion. These contour lines can be used to calculate the width of the
nasal root and to quantify the tubularity of the nose.

Figure 7.6: Left and right contour lines of the nose.

The x and y components of the alae positions are chosen as the starting points of the left
and right contour lines:

L^L_0 = (al^L_x, al^L_y),   (7.26)
L^R_0 = (al^R_x, al^R_y).   (7.27)
As the y-value moves between al_y and s_y, j increases from 0 to J = s_y − al_y. In order to
keep the contour line continuous, the decision of which point to add as L_{j+1} comes down
to a choice between maintaining a straight line or moving towards the middle of the nose.
This choice is based on the location of the neighboring sharpest slope S. The point L_{j+1}
is therefore picked based on the x-positions of S_{j+1} and L_j: if S_{j+1} is closer to the middle
of the face than a direct vertical movement of 1 unit in the y-direction from L_j, the edge
moves inward by 1 unit in the x-direction; otherwise the edge maintains its current course.
More concretely, for the left side of the nose:
[L^L_{j+1}]_y = (j + 1) + al_y,   (7.28)

[L^L_{j+1}]_x = [L^L_j]_x + 1 if S^L_{j+1} > [L^L_j]_x, and [L^L_j]_x otherwise.   (7.29)

The right side of the nose, L^R_{j+1}, is calculated similarly.
Once the contour lines are found, improved maxillofrontale (mf*) locations can be found
using the original mf′ y-positions:

mf*^L_y = mf*^R_y = mf′_y,   (7.30)
mf*^L_x = [L^L_{mf′_y}]_x,   (7.31)
mf*^R_x = [L^R_{mf′_y}]_x.   (7.32)

The width of the nasal root can be found by looking at the x-values of L at y = s_y (see
Equation 7.38).
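A sketch of the contour-following rule of Equations 7.28-7.29 (pure Python; it assumes y counts upward from the ala toward the sellion, and a precomputed per-row sharpest-slope position S):

    def trace_left_contour(al_x, al_y, s_y, S):
        # S[y] is the x-position of the sharpest slope in row y (assumed
        # precomputed from the depth image); s_y > al_y in this indexing.
        x = al_x
        contour = [(x, al_y)]
        for j in range(s_y - al_y):
            y_next = al_y + j + 1            # Equation 7.28
            if S[y_next] > x:                # Equation 7.29: sharpest slope
                x += 1                       # closer to the midface -> step in
            contour.append((x, y_next))
        return contour

The right contour mirrors this rule with the step direction reversed.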
In addition to the helper distance functions described in Equations 7.12 through 7.17, the
following functions were needed:

slope(P) = x,y-slope of the tangent line at point P,   (7.34)
Lslope(L) = x,y-slope of the line L,   (7.35)
Zslope(P) = z-directional slope of the tangent line at point P,   (7.36)
Depth_NewNroot = s_z − (mf*^R_z + mf*^L_z)/2,   (7.37)
Width_NewNroot = mf*^R_x − mf*^L_x.   (7.38)
The descriptors for tubularity of the nose are listed in Table 7.5. In each case, the goal of
the descriptor is to describe the trapezoidal shape of the nose and determine its closeness to
a rectangle (or tube). The total nose spread (D^T_1) calculates the distance by which the
width of the nose extends past the width of the nasal root. The new average nostril
inclination (D^T_2) performs the same calculation as L^N_5, but uses the newly detected
mf* locations. The average point slope in the right and left contour lines L (D^T_3)
determines the average slope change from point to point on both the left and right sides of
the nose. The average of L slopes (D^T_4) is the average of the entire slope of the right and
left nasal contour lines.
Table 7.5: List of tubular shape-based descriptors.
Description                               Name     Mathematical Definition
Total nose spread                         D^T_1    = [L^L_s]_x − [L^L_al]_x + [L^R_s]_x − [L^R_al]_x
New average nostril inclination           D^T_2    = avg[∠(mf*^L, al^L, al^R), ∠(mf*^R, al^R, al^L)]
Average point slope in left and right L   D^T_3    = Σ_{j=0}^{J} [slope(L^L_j) + slope(L^R_j)] / 2J
Average of L slopes                       D^T_4    = avg[Lslope(L^L), Lslope(L^R)]
Ratio nasal root to nose width            D^T_5    = Width_Nroot / Width_nose
New ratio nasal root to nose width        D^T_6    = Width_NewNroot / Width_nose
The ratio of the nasal root width to the nose width (D^T_5) provides a fractional assessment
of the top versus the bottom of the nasal trapezoid. Lastly, the new ratio of the nasal root
to nose width (D^T_6) performs the same calculation as D^T_5, but uses the new mf*
landmarks.
The descriptors for the prominence of the nasal root are shown in Table 7.6. The minimum
distance from the left to the right contour line (D^R_1) detects the sharpness of the top of
the nasal trapezoid. Although this distance is often positive, it is possible for L^L and L^R
to cross one another, yielding a negative distance. The number of points with a severe slope
(D^R_2) is calculated as the count of left and right edge points with a z-direction slope
greater than or equal to three. The average slope at mf* (D^R_3) is the average of the
z-slopes at the right and left mf* locations. D^R_4 is the ratio of the nasal root depth to the
nose depth. The new nasal root-slope angle (D^R_5), new normalized nasal root depth
(D^R_6), and new normalized nasal root width (D^R_8) are calculated in the same way as
L^N_8, L^N_4, and L^N_3, respectively, but using the newly calculated mf*. Lastly, the new
ratio of nasal root to nose depth (D^R_7) is the ratio between the new nasal root depth and
the depth of the nose.
Table 7.6: List of nasal root shape-based descriptors.
Description                          Name     Mathematical Definition
Min distance from left and right L   D^R_1    = min_j (L^R_j − L^L_j)
Number of severe slope points ≥ 3    D^R_2    = num[Zslope(L^L_j) ≥ 3] + num[Zslope(L^R_j) ≥ 3]
Average slope at mf*                 D^R_3    = avg[Zslope(mf*^L), Zslope(mf*^R)]
Ratio nasal root to nose depth       D^R_4    = Depth_Nroot / Depth_nose
New nasal root-slope angle           D^R_5    = ∠(mf*^L, s, mf*^R)
New normalized nasal root depth      D^R_6    = Depth_NewNroot / Depth_face
New ratio nasal root to nose depth   D^R_7    = Depth_NewNroot / Depth_nose
New normalized nasal root width      D^R_8    = Width_NewNroot / Width_face
7.5.3 Oral Shape-based Descriptors
The oral shape-based descriptors are shown in Table 7.7. D^O_1 is used as a descriptor for
the Open Mouth facial feature and uses the peak image of an individual to compare the lip
areas to the bounding box (Peak) which contains them. The area of Peak is restricted by
the locations of li, ls, ch^L and ch^R.

D^O_2, D^O_3 and D^O_4 are used as descriptors for the Small Mouth facial feature. The
normalized depths of the upper (D^O_2) and lower (D^O_3) lips are calculated as ratios of
the vermilion depth to the depth of the face. D^O_4 is the ratio of the columella to the upper
lip height.

D^O_5, D^O_6 and D^O_7 are used as descriptors for the Downturned Corners of the Mouth
facial feature. The inclination angles of the left (D^O_5) and right (D^O_6) sides of the lip
are found by calculating the angle between the arm from the selected ch to sto and the
horizontal arm through sto_y. Lastly, the corners of mouth angle (D^O_7) is calculated by
finding the southern angle between the points ch^L, sto and ch^R, yielding an obtuse angle
if the corners are downturned and a reflex angle if the corners are upturned.
Table 7.7: List of oral shape-based descriptors.
Description                              Name     Mathematical Definition
Rectangularity of lips at ch_z           D^O_1    = num(Peak = 1) / area(Peak)
Normalized depth of upper lip            D^O_2    = (ls_z − sto_z) / Depth_face
Normalized depth of lower lip            D^O_3    = (sto_z − li_z) / Depth_face
Ratio of columella to upper lip height   D^O_4    = (sn_y − ls_y) / (sn_y − sto_y)
R-side lip inclination angle             D^O_5    = ∠(R ch, sto, horizontal)
L-side lip inclination angle             D^O_6    = ∠(L ch, sto, horizontal)
Corners of mouth angle                   D^O_7    = ∠(L ch, sto, R ch)
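For illustration, the corners-of-mouth angle D^O_7 reduces to a three-point angle; the sketch below (coordinates are made up, with y pointing up) returns the unsigned angle, so a separate check on whether the corners lie below the stomion is needed to tell the obtuse (downturned) case from the reflex (upturned) one:

    import numpy as np

    def angle_at(vertex, p1, p2):
        # Unsigned angle (degrees) at `vertex` between the rays to p1 and p2.
        v1 = np.asarray(p1, float) - np.asarray(vertex, float)
        v2 = np.asarray(p2, float) - np.asarray(vertex, float)
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    chL, sto, chR = (-20.0, -3.0), (0.0, 0.0), (20.0, -3.0)  # corners below sto
    print(angle_at(sto, chL, chR))  # ~163 degrees: obtuse, corners downturned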
Chapter 8
LOCAL REPRESENTATION RESULTS
In this chapter the results from using local data representations are provided. In the pre-
liminary studies section ground truth data will be used to develop baselines, the accuracy
of automatic landmark prediction will be discussed, and the threshold selection for the bul-
bous nasal tip descriptors will be described. The experimental section will be divided into
landmark-based descriptor assessment and shape-based descriptor assessment. For both
types of descriptors, the similarity to expert median scores will be computed and clas-
sification performance of 22q11.2DS will be measured. As the best performing classifier
alternated sporadically between Naive Bayes and SVM, both sets of results are shown when
necessary. Note that even in those cases where SVM is the better classifier, the performance
improvement is unlikely to be statistically significant.
8.1 Preliminary Studies
8.1.1 Experts’ Median Scores as 22q11.2DS Predictors
The use of experts’ median scores for classification was assessed. Although there are four
facial features each for the nose and mouth, Small Nasal Alae and Retrusive Chin are fea-
tures for which successful landmark and shape-based descriptors have yet to be developed.
The sets missing the above two features are labeled as auto-3N (containing Bulbous Nasal
Tip, Tubular Appearance, Prominent Nasal Root) and auto-3O (containing Small Mouth,
Open Mouth, Downturned Corners of the Mouth), and the set containing all but the two
above features is labeled as auto-6 (auto-3N and auto-3O combined).
As seen in Table 8.1a, the use of the SVM classifier yields slightly better performance
when classifying all four nasal features, but when only the auto-3N features are used (Bulbous
Nasal Tip, Prominent Nasal Root, and Tubular Appearance), Naive Bayes is the better
Table 8.1: Using experts’ median scores for facial features to predict 22q11.2DS. In eachtable, the upper set of results was obtained using Naive Bayes, the lower using SVM.
Dataset BNT PNR TA SNA auto-3N ALL-N
F-measure 0.68± 0.18 0.49± 0.23 • 0.46± 0.23 • 0.78± 0.16 0.73± 0.16 0.74± 0.18Precision 0.81± 0.20 0.60± 0.27 • 0.62± 0.32 0.86± 0.17 0.79± 0.17 0.83± 0.19Recall 0.63± 0.23 0.44± 0.24 0.39± 0.22 • 0.76± 0.21 0.71± 0.21 0.71± 0.23% Accuracy 72.49±14.42 58.83±14.61 • 57.00±16.23 • 80.69±12.29 74.93±13.75 76.92±13.51
F-measure 0.71± 0.17 0.52± 0.21 • 0.49± 0.22 • 0.80± 0.16 0.71± 0.17 0.80± 0.16Precision 0.85± 0.17 0.69± 0.26 0.67± 0.29 0.88± 0.16 0.85± 0.17 0.88± 0.16Recall 0.65± 0.22 0.46± 0.23 0.42± 0.22 • 0.77± 0.20 0.65± 0.22 0.77± 0.20% Accuracy 75.60±13.62 62.46±13.05 • 60.24±14.82 • 82.51±11.66 75.60±13.62 82.51±11.66
• statistically significant degradationBNT - Bulbous Nasal Tip, PNR - Prominent Nasal Root, TA - Tubular Appearance, SNA - Small Nasal Alae
(a) Nasal facial features.
Dataset OM SM DCM RC auto-3O ALL-O
F-measure 0.21± 0.21 0.48± 0.24 ◦ 0.27± 0.22 0.35± 0.28 0.48± 0.23 ◦ 0.52± 0.23 ◦Precision 0.34± 0.37 0.68± 0.33 0.26± 0.21 0.63± 0.46 0.62± 0.29 0.68± 0.29 ◦Recall 0.16± 0.17 0.40± 0.24 ◦ 0.32± 0.28 0.26± 0.24 0.43± 0.24 ◦ 0.45± 0.24 ◦% Accuracy 47.90±12.09 61.14±14.64 ◦ 34.53±10.25 • 60.07±13.99 ◦ 58.21±15.76 62.04±15.50 ◦F-measure 0.33± 0.23 0.49± 0.25 0.37± 0.25 0.35± 0.29 0.48± 0.24 0.51± 0.24Precision 0.32± 0.23 0.73± 0.33 ◦ 0.39± 0.26 0.65± 0.47 0.69± 0.32 ◦ 0.71± 0.30 ◦Recall 0.37± 0.31 0.40± 0.24 0.39± 0.30 0.26± 0.25 0.40± 0.23 0.43± 0.24% Accuracy 40.54±12.21 63.92±14.28 ◦ 44.93±17.34 61.89±12.10 ◦ 62.06±14.81 ◦ 63.35±15.04 ◦
◦, • statistically significant improvement or degradationOM - Open Mouth, SM - Small Mouth, DCM - Downturned Corners of Mouth, RC - Retrusive Chin
(b) Oral facial features.
Dataset auto-6 ALL
F-measure 0.61± 0.19 0.80± 0.16 ◦Precision 0.70± 0.22 0.85± 0.17Recall 0.58± 0.23 0.80± 0.19 ◦% Accuracy 65.31±14.07 81.40±13.39 ◦F-measure 0.71± 0.15 0.80± 0.14Precision 0.76± 0.18 0.86± 0.16Recall 0.71± 0.20 0.78± 0.19% Accuracy 71.99±13.97 80.67±12.79
◦ statistically significant improvement
(c) Comparison to 2.5D global results.
performer. Note that in both the nasal and oral cases, if the smaller sets of features
(auto-3N or auto-3O) are used, the performance is worse than that of global 2.5D, while
when all four features are used (ALL-N or ALL-O), the performance matches that of global 2.5D.
When classification is done using the oral facial features (Open Mouth, Small Mouth,
Downturned Corners of the Mouth, Retrusive Chin), the performance on any of the features
or their combinations is very poor, with only the use of all oral features for classification
receiving an F-measure above 0.5 (Table 8.1b).

When the auto-6 features are used for classification, the performance decreases from that of
just using the auto-3N set. When all the nasal and oral median experts' scores are used, the
performance shows improvement over that of the global 2.5D method (see Table 8.1c). The
statistically significant difference between the use of auto-6 and all of the scores can be
explained by the fact that Small Nasal Alae and Retrusive Chin are the top two attributes
used for classification. The Bulbous Nasal Tip and Open Mouth facial features fall into
second place, followed by Small Mouth in third.
8.1.2 Automatic Landmark Placement
The ability to properly locate anthropometric landmarks using the automated system was
checked. A visual inspection of each landmark location was used to determine accuracy of
placement. These results were compared to the availability of hand-labeled landmarks as
completed by an expert. As seen in Table 8.2, the nasal landmarks were detected at 98%
accuracy or better. Oral landmarks had a slightly lower accuracy rate (93% on average),
and the helper landmarks mf′ were found at 92% accuracy. The availability of the
hand-labeled data is generally less than that of the automatic detection, with the note that
the li field in the hand-labeled data was purposefully omitted when a subject's mouth was
open. Based on these results, perhaps some of the tedious manual landmarking may be
substituted with a first-pass automatic landmark placement, corrected by an expert only
when necessary.
Table 8.2: Correct automatic placement compared to availability of hand-labeled landmarks.
               s      prn    sn     alL    alR    ls     sto    li     chL    chR    mfL    mfR
Hand-labeled   100%   98%    98%    98%    98%    98%    87%    78%    83%    83%    n/a    n/a
Automatic      100%   100%   100%   98%    100%   94%    95%    93%    92%    92%    92%    92%
8.1.3 Performance of Anthropometric Landmark Distance Measures
As anthropometric landmark distance measures are the standard for comparison in the
clinical setting, the quality of this method needed to be assessed. As described in Section
3.3, L60 is a 1:1 affected vs. control set contained within the W86 set. For valid comparison
to the baseline 2.5D depth image method, set L60-2.5D was generated as a subset of the
W86 set. L60-ALL is the data set where all of the original 33 distance measures were used
to classify individuals, while L60-10 is the set of the 10 inter-landmark distances, which
match the set of distances that can be calculated on the automatically detected landmarks
(L60-LA). As seen in Table 8.3, the performance of L60-2.5D is less than that of the global
2.5D baseline on the full W86 set (F-measure 0.77). All of the landmark distance methods
were worse than L60-2.5D, as illustrated by the ROC curve in Figure 8.1. Note that when
using only the 10 distance measures, the automatically generated landmark set L60-LA
outperforms the hand-labeled landmark set L60-10.
Table 8.3: Prediction of 22q11.2DS using landmark distance measures.

Dataset      L60-2.5D   L60-ALL   L60-10    L60-LA
F-measure    0.71       0.12 •    0.04 •    0.49 •
Precision    0.83       0.13 •    0.04 •    0.48 •
Recall       0.67       0.13 •    0.06 •    0.54
% Accuracy   75.67      49.33 •   46.33 •   48.67 •
• statistically significant degradation
Figure 8.1: ROC performance curve.
8.1.4 Bulbous Nasal Tip Threshold Selection
For each descriptor δ, the threshold Th_δ was found empirically to maximize the difference
of average values between individuals with and without BNT. To find these thresholds,
severity scores of all individuals were calculated in threshold increments of 0.01 between 0
and 1. For each increment step, the average of the group without BNT and the average
of the group with BNT were calculated. The difference between these two groups was
then maximized for each descriptor, yielding Th_R = 0.71, Th_C = 0.10, Th_T = 0.37, and
Th_U = 0.67 (Figure 8.2). To check that the Th_δ are stable in the population, the above study
was repeated for an expanded set of individuals totaling 164 (53 affected with 22q11.2DS).
For each of the four descriptors, the new thresholds were found to be unchanged.
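A sketch of this threshold search (per-individual descriptor values are expected as one row per individual, one column per depth increment d; the data layout is an assumption):

    import numpy as np

    def choose_threshold(values_with_bnt, values_without_bnt, step=0.01):
        # For each candidate threshold, compute every individual's severity
        # score (Equation 7.24) and keep the threshold that maximizes the
        # difference between the two group averages.
        best_th, best_gap = 0.0, -np.inf
        for th in np.arange(0.0, 1.0 + 1e-9, step):
            sev1 = (values_with_bnt > th).mean(axis=1)
            sev0 = (values_without_bnt > th).mean(axis=1)
            gap = sev1.mean() - sev0.mean()
            if gap > best_gap:
                best_th, best_gap = th, gap
        return best_th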
Figure 8.2: Empirical approach to threshold detection for each descriptor: (a) R threshold, (b) C threshold, (c) T threshold, (d) U threshold.
8.2 Experiments
8.2.1 Landmark-Based Nasal Descriptor Similarity to Expert Scores
The purpose of this experiment was to assess the ability of the landmark-based nasal
descriptors (L^N) to match the experts' median scores for these features. As seen in Table
8.4, the ability of L^N to match the experts' median response for any nasal facial feature
is relatively weak. In the case of Tubular Appearance the performance is slightly higher,
which can be explained by the fact that L^N_5 is a measure of tubularity, as its definition is
based on the shape of the nasal trapezoid angles.
Table 8.4: Predicting expert-marked nasal features using the LN data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      BNT           PNR           TA            SNA
F-measure    0.66± 0.16    0.69± 0.13    0.74± 0.10    0.66± 0.13
Precision    0.63± 0.14    0.68± 0.10    0.69± 0.08    0.64± 0.15
Recall       0.70± 0.21    0.73± 0.19    0.82± 0.15    0.73± 0.21
% Accuracy   57.11±17.18   59.56±13.89   62.03±12.50   60.08±14.19
F-measure    0.68± 0.12    0.79± 0.04    0.81± 0.03    0.64± 0.14
Precision    0.61± 0.09    0.66± 0.04    0.69± 0.04    0.59± 0.14
Recall       0.81± 0.21    0.99± 0.06    1.00± 0.02    0.73± 0.20
% Accuracy   57.28±11.49   65.44± 5.51   68.63± 4.29   55.82±12.90
BNT - Bulbous Nasal Tip, PNR - Prominent Nasal Root
TA - Tubular Appearance, SNA - Small Nasal Alae
8.2.2 Landmark-Based Oral Descriptor Similarity to Expert Scores
The purpose of this experiment was to assess the ability of the landmark-based oral
descriptors (L^O) to match the experts' median scores for oral facial features. As seen in
Table 8.5, Open Mouth is well predicted, most likely due to L^O_2 and L^O_3, which are
ratios of the upper and lower lips to the entire mouth height, and L^O_5 and L^O_6, whose
angles become steeper as the mouth is opened. The high performance in predicting Retrusive
Chin is a red herring, as no landmark-based descriptor includes any information below the
lower lip.
Table 8.5: Predicting the four oral features using the LO data set. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      OM           SM            DCM           RC
F-measure    0.93±0.06    0.76± 0.14    0.77± 0.09    0.90±0.07
Precision    0.95±0.06    0.79± 0.13    0.71± 0.11    0.86±0.04
Recall       0.93±0.10    0.75± 0.18    0.87± 0.14    0.94±0.11
% Accuracy   89.11±9.21   66.50±16.03   68.53±11.92   82.39±9.75
F-measure    0.92±0.03    0.85± 0.03    0.76± 0.05    0.93±0.02
Precision    0.86±0.05    0.74± 0.04    0.62± 0.05    0.87±0.03
Recall       0.98±0.04    1.00± 0.00    0.97± 0.07    1.00±0.00
% Accuracy   85.07±5.24   74.44± 4.10   61.43± 6.02   87.22±3.23
OM - Open Mouth, SM - Small Mouth
DCM - Downturned Corners of the Mouth, RC - Retrusive Chin
8.2.3 Landmark-Based Descriptor Classification of 22q11.2DS
The 22q11.2DS prediction performance of the nasal, oral and combined landmark-based
descriptors is compared to the 2.5D global approach. As seen in Table 8.6, although using
the combination of both the nasal and oral landmark-based descriptors provides an
improvement over using just one type of landmark-based descriptor, none of them outperforms
the 2.5D global descriptor. Note also that in this case Naive Bayes is the better performing
classifier.
Table 8.6: Predicting 22q11.2DS using landmark-based descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D           LN             LO             LN+O
F-measure    0.77± 0.17     0.53± 0.22 •   0.47± 0.23 •   0.55± 0.22 •
Precision    0.87± 0.17     0.57± 0.25 •   0.62± 0.30 •   0.64± 0.26 •
Recall       0.72± 0.22     0.54± 0.27     0.41± 0.24 •   0.53± 0.25
% Accuracy   79.90±13.62    56.51±16.35 •  57.29±15.57 •  61.15±16.12 •
F-measure    0.45± 0.23     0.48± 0.20     0.47± 0.21     0.53± 0.19
Precision    0.55± 0.29     0.51± 0.22     0.50± 0.24     0.56± 0.21
Recall       0.42± 0.25     0.51± 0.26     0.48± 0.25     0.54± 0.23
% Accuracy   53.78±16.32    50.31±16.51    50.51±16.89    54.35±17.23

• statistically significant degradation
8.2.4 Shape-Based Descriptor Similarity to Expert Scores
Compared to landmark-based descriptors, shape-based descriptors should perform better in
matching the experts' median scores. As seen in Table 8.7, the greatest improvement in
matching the experts' median scores is in predicting Bulbous Nasal Tip and Tubular Appearance.
The predictions of Prominent Nasal Root and Small Mouth are slightly improved. Lastly,
Open Mouth and Downturned Corners of the Mouth match the landmark-based descriptors,
as both of these features can be easily described by Euclidean geometry measures.
Table 8.7: Using shape-based descriptors for predicting nasal and oral facial features. For each descriptor type, the right arrow indicates the facial feature experts' median score to which it is compared. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      D^B → BNT      D^T → TA       D^R → PNR      D^O_1 → OM     D^O_{2:4} → SM   D^O_{5:7} → DCM
F-measure    0.88± 0.10     0.73± 0.15     0.65± 0.20     0.93± 0.07     0.65± 0.15       0.76± 0.11
Precision    0.88± 0.12     0.85± 0.14     0.81± 0.21     0.93± 0.07     0.77± 0.16       0.70± 0.11
Recall       0.90± 0.13     0.65± 0.18     0.57± 0.22     0.93± 0.09     0.58± 0.18       0.85± 0.15
% Accuracy   85.78±11.15    68.03±15.09    62.89±17.54    88.00±10.85    56.07±15.52      67.07±13.68
F-measure    0.88± 0.08     0.80± 0.05     0.74± 0.09     0.92± 0.03     0.85± 0.03       0.77± 0.03
Precision    0.86± 0.12     0.69± 0.05     0.65± 0.06     0.86± 0.04     0.74± 0.04       0.63± 0.04
Recall       0.91± 0.11     0.97± 0.08     0.89± 0.16     1.00± 0.00     1.00± 0.00       1.00± 0.00
% Accuracy   84.42±10.87    67.79± 6.13    60.83± 9.42    86.11± 4.23    74.44± 4.10      62.78± 4.08
8.2.5 Shape-Based Descriptor Classification of 22q11.2DS
The 22q11.2DS predictions of the nasal and oral shape-based descriptors are compared
to the 2.5D global approach. As seen in Table 8.8, using all the nasal descriptors (D^N)
matches the performance of 2.5D, while using just the oral descriptors (D^O) decreases
disease prediction from the 2.5D baseline. When both the nasal and oral descriptors are
used together (D^ALL), the performance exceeds that of the 2.5D global descriptor.
Table 8.8: Performance of shape-based descriptors in predicting 22q11.2DS. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D    β        D^B      D^T      D^R      D^N      D^O      D^ALL
F-measure    0.77    0.64     0.73     0.69     0.71     0.77     0.66     0.79
Precision    0.87    0.77     0.82     0.64 •   0.64 •   0.72     0.74     0.75
Recall       0.72    0.58     0.69     0.77     0.83     0.84     0.64     0.85
% Accuracy   79.90   70.19    76.42    65.99    67.13    74.71    69.92    77.29
F-measure    0.45    0.64     0.70 ◦   0.71 ◦   0.65 ◦   0.78 ◦   0.59     0.77 ◦
Precision    0.55    0.82 ◦   0.83 ◦   0.65     0.62     0.79 ◦   0.67     0.78
Recall       0.42    0.55     0.63     0.80 ◦   0.72 ◦   0.79 ◦   0.56     0.80 ◦
% Accuracy   53.78   71.83 ◦  74.69 ◦  66.99    62.85    77.82 ◦  62.78    76.71 ◦
◦, • statistically significant improvement or degradation
8.2.6 All Local Descriptor Classification of 22q11.2DS
SVM’s performed better in matching experts’ median scores, and Naive Bayes performed
better in classifying the disease status of 22q11.2DS. When all descriptors for a facial feature
are used, the SVM classification yields improved scores from 2.5D global results for the nasal
descriptors and all descriptors combined.
Table 8.9: Predicting 22q11.2DS using all nasal, all oral and all descriptors. The upper set of results was obtained using Naive Bayes, the lower using SVM.

Dataset      2.5D    L^N + D^N   L^O + D^O   ALL
F-measure    0.77    0.75        0.64        0.78
Precision    0.87    0.71 •      0.73        0.77
Recall       0.72    0.84        0.61        0.83
% Accuracy   79.90   72.71       68.74       77.21
F-measure    0.45    0.81 ◦      0.60        0.79 ◦
Precision    0.55    0.83 ◦      0.68        0.81 ◦
Recall       0.42    0.81 ◦      0.58        0.81 ◦
% Accuracy   53.78   80.97 ◦     64.46       79.11 ◦

◦, • statistically significant improvement or degradation
ALL is the set of all descriptors L^{N+O} + D^{N+O}
Chapter 9
CONCLUSIONS
This dissertation has discussed the development of a successful methodology for classifying
22q11.2DS disease status and quantifying the degree of dysmorphology of global and local
facial features.
9.1 Contributions
The contributions of this work are
• Automated methodology for pose alignment. Each 3D head mesh is aligned to a natural
pose using, first, facial symmetry and, second, chin-forehead elevation differences.
The facial symmetry approach required human intervention in only 1% of the cases.
Due to a stronger reliance on the initial seed position, the pitch rotation approach
based on chin-forehead elevation differences required manual intervention in 15% of
the cases.
• Automated generation of global data representations, including human-readable representations
such as snapshots of three-dimensional data and curved lines, data-intensive
representations such as 2.5D depth images and labeled images, as well as data aggregate
representations such as facial symmetry or distance from the control-set average.
• Robust automated detection of landmarks, where the accuracy of landmark place-
ment (above 90% in all cases) rivaled that of hand-labeled landmark availability. This
suggests that as an alternative to tedious landmarking performed by an expert, an
automated detection of landmarks could be performed with expert intervention nec-
essary in less than 10% of the cases.
• Automated generation of local data descriptors for the nose and mouth. For each
facial feature, landmark-based and shape-based descriptors were developed.
• Use of global and local descriptors for 22q11.2DS classification on real clinical data.
2.5D depth images were used as a baseline representation scheme (F-measure 0.77),
with snapshots of three-dimensional data and curved lines having a slightly decreased
classification performance (best F-measure 0.71 and 0.76, respectively). When used
with standard medical research methodology, the global Mahalanobis distance from
control-set average was found to be the best data representation for classification
(F-measure 0.94), while methods such as symmetry, topographically labeled images,
and local landmark-based descriptors all performed poorly (best F-measure 0.59, 0.58,
0.55, respectively). Classification on curvature labeled images (best F-measure 0.73)
and local shape-based descriptors (F-measure 0.78) matched that of the 2.5D depth
image baseline.
• Use of local descriptors for shape quantification of nasal and oral facial features. Each
landmark-based and shape-based descriptor method was compared to the median of
the experts’ scores and shape-based descriptors were found to outperform landmark-
based descriptors. Nasal features such as Bulbous Nasal Tip and Tubular Appearance,
produced F-measure scores of over 0.80, while Prominent Nasal Root was harder to
detect at an F-measure score of 0.65. Open Mouth was the only facial feature examined
that matched the expert scores at an F-measure of more than 0.90 using both the
landmark-based and shape-based descriptors, while Small Mouth and Downturned
Corners of the Mouth shape-based descriptors had F-measure scores of 0.85 and 0.77,
respectively. The mismatches to the expert scores in both the nasal and oral features
are not necessarily incorrect predictions, as selective screening of mismatches has
suggested mislabeling of the facial feature by experts. Examples of such mislabeling
include marking the presence of a bulbous nasal tip, when the small size of the nasal
alae is the actual feature or marking the presence of a prominent nasal root, when the
nose is tubular in appearance.
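To make these contributions concrete, a few minimal sketches follow. First, the
symmetry-based rotation search referenced in the pose-alignment item, assuming the head is
given as an (N, 3) NumPy vertex array centered at the origin with x as the left-right
axis; the exhaustive one-degree search and all function names are illustrative, not the
implementation evaluated in this work.

    import numpy as np
    from scipy.spatial import cKDTree

    def yaw_matrix(deg):
        # rotation about the vertical (y) axis
        t = np.radians(deg)
        return np.array([[np.cos(t), 0.0, np.sin(t)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(t), 0.0, np.cos(t)]])

    def asymmetry(vertices, deg):
        # mean nearest-neighbor distance between the rotated head and its mirror image
        v = vertices @ yaw_matrix(deg).T
        mirrored = v * np.array([-1.0, 1.0, 1.0])  # reflect across the x = 0 plane
        dist, _ = cKDTree(v).query(mirrored)
        return dist.mean()

    def align_yaw(vertices, angles=range(-45, 46)):
        # keep the yaw angle whose reflection best matches the original surface
        best = min(angles, key=lambda a: asymmetry(vertices, a))
        return vertices @ yaw_matrix(best).T, best

Second, a rough sketch of rasterizing an aligned mesh into a 2.5D depth image; the grid
size and the normalization are assumptions:

    def depth_image(vertices, size=128):
        # keep the largest z (the surface point closest to the viewer) in each pixel
        x, y, z = vertices.T
        cols = ((x - x.min()) / (np.ptp(x) + 1e-9) * (size - 1)).astype(int)
        rows = ((y.max() - y) / (np.ptp(y) + 1e-9) * (size - 1)).astype(int)
        img = np.full((size, size), z.min())
        np.maximum.at(img, (rows, cols), z)
        return img

Third, the standard Mahalanobis distance of a feature vector from the control-set average,
written out for a (num_controls, num_features) control matrix:

    def mahalanobis_from_controls(x, controls):
        # distance of x from the control mean, scaled by the control covariance
        mu = controls.mean(axis=0)
        VI = np.linalg.pinv(np.cov(controls, rowvar=False))
        d = x - mu
        return float(np.sqrt(d @ VI @ d))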
Representative global and local descriptor classification of 22q11.2DS per individual can
be seen in Table 9.2 for males and Table 9.3 for females, with the legend given in Ta-
ble 9.1. Prediction errors are marked as dark boxes, while correct predictions are white.
Note that classification using most global descriptors tends to complement that of the local
descriptors. The proportionally smaller male set does contain more classification errors,
supporting the need to recruit more study participants. Affected individuals are more likely
to be misclassified as controls, supporting the fact that 22q11.2DS has a very
subtle phenotype. Lastly, the errors in local descriptor classification support the fact that
phenotypic variation of any facial feature within the general population increases the dif-
ficulty of discriminating between 22q11.2DS affected individuals and the control population.
Although the focus of this work was 22q11.2 deletion syndrome affected individuals, the
methods developed for this phenotype should be widely applicable to the shape-based quan-
tification of any other craniofacial dysmorphology.
Table 9.1: Legend for classification errors for each individual in Table 9.2 and Table 9.3.

    Name of descriptor   Type     Description
    3D snp               Global   Based on 3D snapshot data representation
    3D snpc              Global   Based on 3D snapshot data representation cut off at ears
    2.5D                 Global   Based on 2.5D depth image data representation generated from 3D snpc
    v3                   Global   Three vertical curved line representation
    v5                   Global   Five vertical curved line representation
    h5                   Global   Five horizontal curved line representation
    h7                   Global   Seven horizontal curved line representation
    g5                   Global   5x5 curved line grid representation
    g7                   Global   7x7 curved line grid representation
    sym                  Global   Symmetry of face data representation
    topo 15              Global   Topographic data label with 15 step size data representation
    K                    Global   Gaussian curvature label thresholded between values -0.5 and +0.5
    |K|                  Global   Absolute value of K label, also thresholded at value 0.5
    Besl-Jain            Global   Besl-Jain curvature label
    Mah                  Global   Mahalanobis distance from control average
    L                    Local    All local landmark-based descriptors
    DB                   Local    Set of bulbous nasal tip descriptors
    DT                   Local    Set of tubular appearance descriptors
    DR                   Local    Set of prominent nasal root descriptors
    DN                   Local    Set of all nasal descriptors {DB, DT, DR}
    DO                   Local    Set of all oral descriptors
    D                    Local    Set of all shape-based descriptors {DB, DT, DR, DO}
    LD                   Local    Set of all local descriptors {LN, LO, DB, DT, DR, DO}
Table 9.2: Errors in male individuals of the W86 dataset. Representative global and local
descriptors are shown. Dark boxes signify errors.

[Per-individual error grid: one row for each affected male (upper half) and each control
male (lower half), and one column per descriptor: 3D snp, 3D snpc, 2.5D, v3, v5, h5, h7,
g5, g7, sym, topo 15, K, |K|, Besl-Jain, Mah, L, DB, DT, DR, DN, DO, D, LD. The shaded
error cells are graphical and are not reproducible in this text version.]
Table 9.3: Errors in female individuals of the W86 dataset. Representative global and local
descriptors are shown. Dark boxes signify errors.

[Per-individual error grid: one row for each affected female (upper half) and each control
female (lower half), and one column per descriptor: 3D snp, 3D snpc, 2.5D, v3, v5, h5, h7,
g5, g7, sym, topo 15, K, |K|, Besl-Jain, Mah, L, DB, DT, DR, DN, DO, D, LD. The shaded
error cells are graphical and are not reproducible in this text version.]
9.2 Future Work
As the classification of 22q11.2DS disease status now has a solution that rivals medical
experts, further work in this area should focus on local facial feature description and
the development of a full quantitative description of the face.
Local Facial Feature Description
Additional local features should be investigated, and new landmark- and shape-based
descriptors should be developed. The most promising facial features for study are the
ears, the eyes, and midface hypoplasia; in addition, improvements can be made to the
descriptions of pinched nasal alae and the retrusive chin.
Ears Symptomatic ears are reported to exhibit any of the following features: small size,
protuberance, cup shape, attached lobules, overfolded helix (cauliflower-like appearance),
and mildly asymmetric placement on the head. Small and protuberant ears could be detected
using descriptors developed for the current 2.5D depth image. The cup-shaped,
attached-lobule, and overfolded-helix features most likely require a more stringent
analysis of the original 3D mesh shape. Lastly, asymmetric placement on the head can be
approached using a local version of the global symmetry measure developed in this
dissertation.
Eyes Symptomatic eyes are reported to exhibit any of the following features: small size,
mild orbital hypertelorism (increased distance between the eyes), mild vertical orbital
dystopia (vertical placement and inclination angle of the left vs. right eye), and hooded
upper eyelids. For each of these features, new shape descriptors can be developed on the
current 2.5D depth image, as illustrated below.
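As an illustration of what such descriptors might look like, here is a sketch that derives
the orbital measures from the eye landmarks of Appendix A; the dictionary keys and the use
of the y coordinate for vertical dystopia are assumptions, not part of this work:

    import numpy as np

    def orbital_measures(lm):
        # lm maps landmark labels to 3D points: "en_r"/"en_l" for the right/left
        # endocanthion, "ex_r"/"ex_l" for the right/left exocanthion
        intercanthal = np.linalg.norm(lm["en_r"] - lm["en_l"])  # hypertelorism cue
        outercanthal = np.linalg.norm(lm["ex_r"] - lm["ex_l"])
        dystopia = abs(float(lm["en_r"][1] - lm["en_l"][1]))    # vertical orbital offset
        return intercanthal, outercanthal, dystopia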
Midface Hypoplasia Although midface hypoplasia is a feature of 22q11.2DS, the cleft lip
and palate research community is also very interested in detecting the quality of midface
morphology. Here, a subset of vertical curved lines through the cheek area can be assessed
for global curvature quality.
Pinched Nasal Alae Although mentioned in this dissertation, the methods developed so far
were not able to properly assess the quality of the two nasal alae. An analysis of the
original 3D mesh may yield improved assessment results, but the coarseness of the mesh may
prove inadequate.
Retrusive Chin For the retrusive chin feature, the general shape of the skull is a
necessary prerequisite, so as a first step a skull shape estimation method would need to
be developed to fill in the areas removed due to noise caused by hair. Once this is done,
a study must be conducted to determine whether the retrusive chin is a product of poor
mandible development, or whether it is the result of a forward rotation of the skull,
yielding a prominent bulging of the forehead. The latter option can be studied
independently by developing a forehead shape descriptor to discriminate between concave,
flat, and convex (bulging) foreheads, as sketched below.
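A minimal sketch of such a profile descriptor (equally applicable to the cheek profiles
proposed for midface hypoplasia above), assuming a midline forehead profile sampled as
depth z against height y with larger z toward the viewer; the flatness tolerance is an
arbitrary placeholder:

    import numpy as np

    def profile_shape(y, z, flat_tol=1e-3):
        # fit z(y) with a parabola; the quadratic coefficient acts as a curvature proxy
        a = np.polyfit(y, z, 2)[0]
        if abs(a) < flat_tol:
            return "flat"
        # with z increasing toward the viewer, a bulging profile peaks mid-forehead (a < 0)
        return "convex (bulging)" if a < 0 else "concave"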
Quantitative Facial Description
As phenotype-genotype studies of different craniofacial dysmorphology syndromes are of great
interest to researchers, the development of a full quantitative facial description is necessary.
Since expert qualitative ratings of shapes can be subject to low inter-rater reliability,
automatic local facial shape descriptors can be used to avoid such problems. Lastly, since
each descriptor offers a quantitative value for the feature it is describing, combinations of
these values can be used to study gene expression variation and the etiology of craniofacial
malformation.
BIBLIOGRAPHY
[1] D Aha, D Kibler, and M Albert. Instance-based learning algorithms. Mach Learn,
1991.
[2] E Akagunduz and I Ulusoy. 3d object representation using transform and scale invariant
3d features. ICCV, pages 1–8, 2007.
[3] Kristina Aldridge, Simeon A Boyadjiev, George T Capone, Valerie B DeLeon, and
Joan T Richtsmeier. Precision and error of three-dimensional phenotypic measures
acquired from 3dmd photogrammetric images. Am J Med Genet, 138A:247–53, 2005.
[4] Judith E Allanson. Objective techniques for craniofacial assessment: what are the
choices? Am J Med Genet, 70:1–5, 1997.
[5] American Cleft Palate-Craniofacial Association.
[6] LL Baxter, TH Moran, Joan T Richtsmeier, J Troncoso, and RH Reeves. Discovery and
genetic localization of down syndrome cerebellar phenotypes using the ts65dn mouse.
Hum Mol Genet, 9:195–202, 2000.
[7] D Becker, T Pilgram, L Marty-Grames, D Govier, Jeffrey L Marsh, and Alex A Kane.
Accuracy in identification of patients with 22q11.2 deletion by likely care providers
using facial photographs. Plast Reconstr Surg, 2004.
[8] P Belhumeur, J Hespanha, and David J Kriegman. Eigenfaces vs. fisherfaces: recogni-
tion using class specific linear projection. IEEE T Pattern Anal, 1997.
[9] Volker Blanz. A learning-based high-level human computer interface for face modeling
and animation. Lecture Notes in Computer Science, 4451:296, 2007.
[10] Stefan Boehringer, Tobias Vollmar, Christiane Tasse, Rolf P Wurtz, Gabriele Gillessen-
Kaesbach, Bernhard Horsthemke, and Dagmar Wieczorek. Syndrome identification
based on 2d analysis software. Eur J Hum Genet, 14:1082–1089, 2006.
[11] FL Bookstein. Shape and the information in medical images: A decade of the morpho-
metric synthesis. Computer Vision and Image Understanding, 66:97–118, 1997.
[12] Bita Boozari, Matthias J Bahr, Stefan Kubicka, Juergen Klempnauer, Michael P
Manns, and Michael Gebel. Ultrasonography in patients with budd-chiari syndrome -
diagnostic signs and prognostic implications. J Hepatol, 49:572–80, 2008.
[13] L Botto, K May, P Fernhoff, A Correa, and K Coleman. A population-based study of
the 22q11.2 deletion: Phenotype, incidence, and contribution to major birth defects
in the population. Pediatrics, 2003.
[14] KW Bowyer, Kyong I Chang, Patrick J Flynn, and X Chen. Face recognition using
2-d, 3-d, and infrared: Is multimodal better than multisample? Proceedings of the
IEEE, 94:2000–2012, 2006.
[15] C D Brack and I L Kessel. Evaluating the clinical utility of stereoscopic clinical pho-
tography. Studies in health technology and informatics, 132:42–4, 2008.
[16] Linda E Campbell, Eileen Daly, Fiona Toal, Angela F Stevens, Rayna Azuma, Marco
Catani, Virginia Ng, Therese van Amelsvoort, Xavier Chitnis, William Cutter, Declan
G M Murphy, and Kieran C Murphy. Brain and behaviour in children with 22q11.2
deletion syndrome: a volumetric and voxel-based morphometry mri study. Brain,
129:1218–1228, 2006.
[17] Kyong I Chang, Kevin W Bowyer, and Patrick J Flynn. Multiple nose region matching
for 3d face recognition under varying facial expression. IEEE T Pattern Anal, pages
1695–1700, 2006.
[18] G Chen and T Bui. Invariant fourier-wavelet descriptor for pattern recognition. Pattern
Recogn, 1999.
[19] T Chen and D Metaxas. Gibbs prior models, marching cubes, and deformable models:
A hybrid framework for 3d medical image segmentation. MICCAI, pages 703–710,
2003.
[20] Ying-Fan Chen, Po-Lin Kou, Shaw-Jenq Tsai, Ko-Fan Chen, Hsiang-Han Chan, Chung-
Ming Chen, and H Sunny Sun. Computational analysis and refinement of sequence
structure on chromosome 22q11.2 region: application to the development of quantita-
tive real-time pcr assay for clinical diagnosis. Genomics, 87:290–7, 2006.
[21] A Chousta, D Ville, I James, P Foray, C Bisch, P Depardon, R-C Rudigoz, and
L Guibaud. Pericallosal lipoma associated with pai syndrome: prenatal imaging find-
ings. Ultrasound Obst Gyn, 32:708–10, 2008.
[22] W Cohen. Fast effective rule induction. In Proceedings of the Twelfth International
Conference on Machine Learning, 1995.
[23] D Colbry and G Stockman. Canonical face depth map: A robust 3d representation for
face verification. CVPR, pages 1–7, 2007.
[24] Ashwin B Dalal and Shubha R Phadke. Morphometric analysis of face in dysmorphol-
ogy. Comput Methods Programs Biomed, 85:165–172, 2007.
[25] A L David, C Turnbull, R Scott, J Freeman, C M Bilardo, M van Maarle, and L S
Chitty. Diagnosis of apert syndrome in the second-trimester using 2d and 3d ultrasound.
Prenatal diag, 27:629–632, 2007.
[26] P Domingos and M Pazzani. On the optimality of the simple bayesian classifier under
zero-one loss. Mach Learn, 1997.
[27] M Feingold and W H Bossert. Normal values for selected physical parameters: an aid
to syndrome delineation. Birth Defects Orig Artic Ser, 10:1–16, 1974.
[28] L Fernandez, P Lapunzina, D Arjona, I Lopez Pajares, L García-Guereta, D Elorza,
M Burgueros, M L De Torres, M A Mori, M Palomares, A García-Alix, and A Delicado.
Comparative study of three diagnostic approaches (fish, strs and mlpa) in 30 patients
with 22q11.2 deletion syndrome. Clin Genet, 68:373–8, 2005.
[29] WL Fung, Eva WC Chow, GD Webb, MA Gatzoulis, and AS Bassett. Extracardiac
features predicting 22q11.2 deletion syndrome in adult congenital heart disease. Int J
Cardiol, 2008.
[30] K Golding-Kushner and Robert Shprintzen. Velo-cardio-facial syndrome volume 1.
Plural Pub Inc, 2007.
[31] D Gothelf, F Hoeft, C Hinard, JF Hallmayer, JV Stoecker, SE Antonarakis, MA Morris,
and AL Reiss. Abnormal cortical activation during response inhibition in 22q11.2
deletion syndrome. Hum Brain Mapp, 28:533–42, 2007.
[32] L Guyot, M Dubuc, J Pujol, O Dutour, and N Philip. Craniofacial anthropometric
analysis in patients with 22 q 11 microdeletion. Am J Med Genet, 100:1–8, 2001.
[33] J Hall, U Froster-Iskenius, and J Allanson. Handbook of normal physical measurements.
Oxford University Press New York, 1989.
[34] M Hall. Correlation-based feature selection for machine learning. PhD thesis,
University of Waikato, 1999.
[35] Peter Hammond. The use of 3d face shape modelling in dysmorphology. Arch Dis
Child, 92:1120–6, 2007.
[36] Peter Hammond, T Hutton, J Allanson, and L Campbell. 3d analysis of facial mor-
phology. Am J Med Genet, 2004.
[37] Peter Hammond, Tim J Hutton, Judith E Allanson, Bernard F Buxton, Linda E
Campbell, Jill Clayton-Smith, Dian Donnai, Annette Karmiloff-Smith, Kay Metcalfe,
Kieran C Murphy, Michael A Patton, Barbara Pober, Katrina Prescott, Pete Scam-
bler, Adam Shaw, Ann C M Smith, Angela F Stevens, I Karen Temple, Raoul C M
Hennekam, and May Tassabehji. Discriminating power of localized three-dimensional
facial morphology. Am J Hum Genet, 77:999–1010, 2005.
[38] Carrie L Heike. Research plan - chromosome 22q11.2 deletion syndrome. 2005.
[39] Carrie L Heike, Michael L Cunningham, AV Hing, E Stuhaug, and JR Starr. Picture
perfect? reliability of craniofacial anthropometry using 3d digital stereophotogramme-
try in individuals with and without 22q11.2 deletion syndrome. J Plast Reconstr Surg,
2009.
[40] Tim J Hutton. Dense surface models of the human face. Biomedical Informatics Unit,
Eastman Dental Institute, University College London, 2004.
[41] ISTI - CNR. Meshlab. Visual Computing Lab.
[42] G Jalali, J Vorstman, A Errami, and R Vijzelaar. Detailed analysis of 22q11.2 with a
high density mlpa probe set. Hum Mutat, 2007.
[43] G John and P Langley. Estimating continuous distributions in bayesian classifiers.
Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
[44] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, 2002.
[45] Ioannis A Kakadiaris, Georgios Passalis, George Toderici, Mohammed N Murtuza,
Yunliang Lu, Nikos Karampatziakis, and Theoharis Theoharis. Three-dimensional face
recognition in the presence of facial expressions: an annotated deformable model ap-
proach. IEEE T Pattern Anal, 29:640–649, 2007.
[46] S Keerthi, S Shevade, and C Bhattacharyya. Improvements to platt’s smo algorithm
for svm classifier design. Neural Comput, 2001.
[47] MM Kennelly and P Moran. A clinical algorithm of prenatal diagnosis of radial ray
defects with two and three dimensional ultrasound. Prenatal diag, 27:730–737, 2007.
[48] H Kitaura, K Yonetsu, H Kitamori, K Kobayashi, and T Nakamura. Standardization
of 3-d ct measurements for length and angles by matrix transformation in the 3-d
coordinate system. Cleft Palate-Cran J, 37:349–356, 2000.
[49] L Kobrynski and K Sullivan. Velocardiofacial syndrome, digeorge syndrome: the chro-
mosome 22q11.2 deletion syndromes. Lancet, 2007.
[50] E Learned-Miller, Q Lu, A Paisley, and P Trainer. Detecting acromegaly: Screening
for disease with a morphable model. MICCAI, 2006.
[51] Y Lee, I Kim, J Shim, and D Marshall. 3d facial image recognition using a nose vol-
ume and curvature based eigenface. Lecture Notes in Computer Science, 4077:616, 2006.
[52] M Leordeanu, M Hebert, and R Sukthankar. Beyond local appearance: Category
recognition from pairwise interactions of simple features. CVPR, 2007.
[53] M Levoy, K Pulli, B Curless, and S Rusinkiewicz. The digital michelangelo project: 3d
scanning of large statues. SIGGRAPH, 2000.
[54] Ze-Nian Li and Mark S. Drew. Fundamentals of Multimedia. Pearson Prentice Hall,
2003.
[55] HJ Lin, S Ruiz-Correa, Linda G Shapiro, ML Speltz, Michael L Cunningham, and
Raymond Sze. Predicting neuropsychological development from skull imaging. EMBC,
pages 3450–3455, 2006.
[56] Xiaoming Liu, Peter H Tu, and F Wheeler. Face model fitting on low resolution images.
BMVC, 2006.
[57] H Loos, Dagmar Wieczorek, Rolf P Wurtz, and C von der Malsburg. Computer-based
recognition of dysmorphic faces. Eur J Hum Genet, 2003.
[58] Thomas R Nelson, Eun K Ji, Jong H Lee, Michael J Bailey, and Dolores H Pretorius.
Stereoscopic evaluation of fetal bony structures. J Ultras Med, 27:15–24, 2008.
[59] B Ommer and JM Buhmann. Learning the compositional nature of visual objects.
CVPR, pages 1–8, 2007.
[60] J Platt. Fast training of support vector machines using sequential minimal optimization.
In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
[61] J Quinlan. C4.5: Programs for machine learning. Morgan Kaufmann, 1993.
[62] Joan T Richtsmeier, Valerie B DeLeon, and SR Lele. The promise of geometric mor-
phometrics. Yearb Phys Anthropol, 45:63–91, 2002.
[63] S Romdhani and T Vetter. Estimating 3d shape and texture using pixel intensity,
edges, specular highlights, texture constraints and a prior. CVPR, 2005.
[64] S Romdhani and Thomas Vetter. 3d probabilistic feature point model for object de-
tection and recognition. CVPR, pages 1–8, 2007.
[65] S Ruiz-Correa, Linda G Shapiro, M Meila, G Berson, Michael L Cunningham, and
Raymond Sze. Symbolic signatures for deformable shapes. IEEE T Pattern Anal,
pages 75–90, 2006.
[66] Chafik Samir, Anuj Srivastava, and Mohamed Daoudi. Three-dimensional face recog-
nition using shapes of facial curves. IEEE T Pattern Anal, 28:1858–1863, 2006.
[67] William J Schroeder, Kenneth M Martin, and William E Lorensen. The design and
implementation of an object-oriented toolkit for 3d graphics and visualization. IEEE
Visualization, 96:93–100, 1996.
[68] J Shepanski. Fast learning in artificial neural systems: multilayer perceptron
training using optimal estimation. Neural Networks, 1988.
[69] R J Shprintzen, R B Goldberg, M L Lewin, E J Sidoti, M D Berkman, R V Argamaso,
and D Young. A new syndrome involving cleft palate, cardiac anomalies, typical facies,
and learning disabilities: velo-cardio-facial syndrome. The Cleft palate journal, 15:56–
62, 1978.
[70] Robert J Shprintzen. Velo-cardio-facial syndrome: 30 years of study. Developmental
disabilities research reviews, 14:3–10, 2008.
[71] A Slavotinek, M Parisi, Carrie L Heike, Anne V Hing, and E Huang. New syndrome of
craniofacial defects of blastogenesis: Duplication of pituitary with cleft palate and
oropharyngeal tumors. Am J Med Genet, 135:13–20, 2005.
[72] L Smith. A tutorial on principal components analysis. Cornell University, 2002.
[73] H Stender, M Fiandaca, J Hyldig-Nielsen, and J Coull. Pna for rapid microbiology. J
Microbiol Meth, 2002.
[74] Matthew Turk and Alex Pentland. Eigenfaces for recognition. J Cognitive Neurosci,
3, 1991.
[75] Matthew Turk and Alex Pentland. Face recognition using eigenfaces. CVPR, pages
586–591, 1991.
[76] M Vannier, J Marsh, and J Warren. Three dimensional computer graphics for cranio-
facial surgical planning and evaluation. ACM SIGGRAPH Computer Graphics, 1983.
[77] Velo-Cardio-Facial Syndrome Educational Foundation, Inc. Velo-cardio-facial syn-
drome: Specialist fact sheet. 2007.
[78] Peng Wang, C Kohler, F Barrett, R Gur, and R Verma. Quantifying facial expression
abnormality in schizophrenia by combining 2d and 3d features. CVPR, pages 1–8,
2007.
[79] Sen Wang, Yang Wang, Miao Jin, Xianfeng David Gu, and Dimitris Samaras. Confor-
mal geometry and its applications on 3d shape matching, recognition, and stitching.
IEEE T Pattern Anal, 29:1209–1220, 2007.
[80] T Whitmarsh, RC Veltkamp, M Spagnuolo, S Marini, and FB Haar. Landmark detec-
tion on 3d face scans by facial model registration. Proceedings of the 1st International
Workshop on Shape and Semantics, pages 71–76, 2006.
[81] Ian H. Witten and Eibe Frank. Data mining: Practical machine learning tools and
techniques. 2005.
[82] T Yakut, S Kilic, E Cil, E Yapici, and U Egeli. Fish investigation of 22q11.2 deletion
in patients with immunodeficiency and/or cardiac . . . . Pediatric Surgery International,
2006.
Appendix A
CEPHALOMETRIC LANDMARKS AND MEASURES
    Landmark Name     Label      Description
    glabella          g          most prominent point in the median sagittal plane between the supraorbital ridges
    nasion            n          midpoint of the nasofrontal suture
    sellion           se (or s)  deepest point of nasofrontal angle
    pronasale         prn        most protruded point of nasal tip
    subnasale         sn         junction of lower border of nasal septum and cutaneous portion of upper lip
    labiale superius  ls         midpoint of the vermillion border of the upper lip
    stomion           sto        midpoint of labial fissure when lips are closed naturally
    labiale inferius  li         midpoint of the vermillion border of the lower lip
    sublabiale        slab       angle of the dip between the lower lip and chin
    gnathion'         gn'        lowest point in the midline on the lower border of the chin; since this is a bony landmark, the soft tissue location is labeled with '
    exocanthion       ex         outer corner of eye fissure where the eyelids meet (right and left)
    endocanthion      en         inner corner of eye fissure where the eyelids meet (right and left)
    alar curvature    ac         measured at the widest point of the alar curvature (right and left)
    alare             al         most lateral point of nasal ala (right and left)
    subalare          sbal       point on the lower margin of the base of the nasal ala where the ala disappears into the upper lip skin (right and left)
    subnasale'        sn'        located at the thinnest point of the nasal septum (right and left)
    crista philtri    cph        point on the crest of the philtrum just above the vermillion border (right and left)
    cheilion          ch         outer corner of mouth where the outer edges of the upper and lower vermillions meet (right and left)
    tragion           t          located at notch above tragus of the ear where the upper edge of cartilage disappears into skin of face (right and left) (labeled as tragus)
    preaurale         pra        point on the ear insertion line opposite postaurale (right and left)
    postaurale        pa         most posterior point on the free margin of the ear (helix) (right and left)
    superaurale       sa         highest point of the free margin of the ear (right and left)
    subaurale         sba        lowest point of the earlobe (right and left)
Appendix B
CLASSIFIER DESCRIPTIONS
All classification was done using the WEKA classifier suite [81]. The classifiers used and
the non-default options selected are briefly described below. The WEKA classifier name is
given in parentheses when it differs from the name used in this paper.
JRip implements a propositional rule learner, Repeated Incremental Pruning to Produce
Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized
version of IREP [22].
J48 generates a pruned or unpruned C4.5 decision tree [61].
NN k = 1 (IB1) is a nearest-neighbor classifier that uses normalized Euclidean distance
[1].
NN k = 3 (IBk ’-K 3) is a K-nearest-neighbors classifier, with K set to 3 [1].
NN 9,3 (MultilayerPerceptron 9, 3) is a neural network that uses backpropagation for
training, with two hidden layers of sizes 9 and 3 [68].
SMO (SVM) implements John Platt's sequential minimal optimization algorithm for
training a support vector classifier [60, 46]. The SVM classifier was used at the default
setting (complexity parameter = 1, polynomial kernel exponent = 1), as well as with
variations of both the complexity parameter and the polynomial kernel exponent from 2 to 4.
An RBF kernel was also used.
Naive Bayes assumes independence of the attributes, modeling each attribute as a normal
distribution over the range of the attribute values [43].
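For readers outside the WEKA/Java ecosystem, the following are rough scikit-learn
analogues of the configurations above; these are approximations (for example, scikit-learn
ships no direct RIPPER or C4.5 implementation), not the setup used to produce the reported
results:

    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    classifiers = {
        "NN k=1": KNeighborsClassifier(n_neighbors=1),
        "NN k=3": KNeighborsClassifier(n_neighbors=3),
        "NN 9,3": MLPClassifier(hidden_layer_sizes=(9, 3)),
        # default WEKA-style SMO: complexity parameter 1, polynomial kernel exponent 1
        "SMO": SVC(C=1.0, kernel="poly", degree=1),
        "SMO (RBF)": SVC(C=1.0, kernel="rbf"),
        "J48 (approx.)": DecisionTreeClassifier(),  # CART, not an exact C4.5 match
        "Naive Bayes": GaussianNB(),
    }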