Post on 15-Jun-2020
Reconstructing the Evolutionary History of MCPH genes
and its Implications in Human Brain Size and
Intelligence
By
Nashaiman Pervaiz
National Center for Bioinformatics
Faculty of Biological Sciences
Quaid-i-Azam University
Islamabad, Pakistan
2019
A thesis submitted in partial fulfillment of
the requirements for the degree of
DOCTOR OF PHILOSOPHY
IN
BIOINFORMATICS
Acknowledgements
Millionth gratitude to Allah Almighty, the most beneficent and the most merciful, who
bestowed on me the potential to seek knowledge and to explore some of the
many aspects of His creation. May the countless blessings and clemencies of Allah be
upon our Holy Prophet Hazrat Muhammad (P.B.U.H), the fortune of knowledge, who
took humanity out of the abyss of ignorance and elevated it to the zenith of
consciousness.
With deep regards and profound respect, I take this opportunity to express my deep
sense of gratitude and indebtedness to my supervisor Dr. Amir Ali Abbasi for his
inspiring guidance, encouragement, and valuable suggestions throughout the research
work. Without his continuous support and assistance it would not have been possible
to finish this dissertation. I am extremely grateful to him for giving me his precious
time. I am also thankful to all faculty members of the National Center for Bioinformatics
for their sincere and kind attitude, guidance and cooperation during the period of my
study at this university.
I would like to extend my sincere thanks to all my colleagues at the Comparative and
Evolutionary Genomics (CEG) lab, in particular Rabail Zehra, Shahid Ali, Dr. Rashid
Minhas, Irfan Hussain and Fatima Batool, for their conducive discussions, utmost
cooperation, valuable memories and for providing a peaceful environment during my stay
in the CEG lab. I shall be failing in my duty if I do not put across my thanks and
gratitude to my junior lab fellows Anabia Sohail, Irum Javaid Siddiqui and Noor us Sehar
for their cooperation and memorable company. It was a pleasure and honour to work with
Shahid Ali, Anabia Sohail, Irum Javaid Siddiqui and Dr. Rashid Minhas.
I should like to thank the entire staff of the National Center for Bioinformatics, in
particular Mr. Talib Hussain, Mr. Yasir Abbasi, Mr. Ali, Mr. M. Naseer, Mr. Masood
and Mr. Naseer Ahmed Raja, for their kind cooperation.
I am also thankful to my roommates Anam Murad and Waheeda Rana for being the
most supportive and caring and also for their refreshing discussion on various topics.
This journey would not have been possible without the support of my family. I am
especially indebted to the two most precious and substantially unique hominins of this
universe, my father Rana Pervaiz Akhter Khan and my mother Anjum Pervaiz, who
taught and supported me throughout my education and gave me the liberty to choose
what I desired. I have no words to express my immense affection for my parents, but I
salute you both for the selfless love, care, pain and sacrifices you made to shape my life. I
would like to thank my mentor, my adorable brother Adnan Pervaiz, who has inspired me
since my childhood with his hard work, dedication and optimistic approach to
accomplishing any project. He always encouraged me whenever I was down and pushed me
to fly in the sky. Many thanks to my best friend, my delectable brother Rana Qaisar
Pervaiz, who has always stood by me in every decision I took and unconditionally
supported me. He believed in me more than I believed in myself. I am greatly indebted
to my elder sister Shabnam Pervaiz who has kept me (mostly) sane through her
critiques. Her encouragement, support and unwavering faith in my ability to muddle
through have been a great help. My most fervent thanks to two homininae, my beloved
younger sisters Hina Pervaiz and Nida Pervaiz, who, through long phone conversations
at crucial times, helped me find the strength to continue my work and helped me
come to the decision, once and for all, that obtaining this degree would not be the first
challenge in my life that I would not rise to meet. No words can thank them enough for
their contribution to my life.
In the end I want to present my unbending thanks to all those hands who prayed for my
betterment and serenity.
Nashaiman Pervaiz
Contents
List of Figures
List of Tables
List of Abbreviations
Summary
Introduction
1.1 Human Brain Evolution
1.1.1 Human brain regions evolved during Pliocene-Pleistocene epochs
1.2 Autosomal recessive primary microcephaly (MCPH)
1.3 Primary microcephaly genes and their functions
1.3.1 Microcephalin
1.3.2 WD repeat domain 62 (WDR62)
1.3.3 Cyclin-dependent kinase 5 regulatory associated protein 2 (CDK5RAP2)
1.3.4 Kinetochore scaffold 1 (KNL1)
1.3.5 Abnormal spindle-like microcephaly associated gene (ASPM)
1.3.6 Centrosomal associated protein J (CENPJ)
1.3.7 SCL/TAL1 interrupting locus (STIL)
1.3.8 Centrosomal protein 135 (CEP135)
1.3.9 Centrosomal protein 152 (CEP152)
1.3.10 Zinc finger protein 335 (ZNF335)
1.3.11 Polyhomeotic homolog 1 (PHC1)
1.3.12 Cyclin-dependent kinase 6 (CDK6)
1.3.13 SAS-6 centriolar assembly protein (SASS6)
1.3.14 Major facilitator superfamily domain containing 2A (MFSD2A)
1.3.15 Citron rho-interacting serine/threonine kinase (CIT)
1.3.16 Kinesin family member 14 (KIF14)
1.4 The cost of human brain size enlargement
1.5 Parkinson’s disease
1.5.1 Alpha synuclein
1.6 Archaic human genomes
1.7 Aims & approach of study
Materials and Methods
2.1 Dataset for genes linked with autosomal recessive primary microcephaly
2.2 Sequence Alignment
2.3 Phylogenetic tree reconstruction methods
2.3.1 Phylogenetic analysis by Neighbor Joining (NJ) method
2.3.2 Phylogenetic analysis by Maximum likelihood (ML) method
2.4 Ancestral state reconstruction
2.5 Analysis of molecular macroevolution
2.5.1 Estimation of selective pressure on MCPH protein coding genes
2.5.2 Codon substitutions site models
2.5.3 Codon substitutions Branch-site model
2.5.4 Clade model C (CmC) analyses
2.6 Statistical Analysis
2.7 Detecting selection at microevolutionary level
2.7.1 Sequence acquisition of human population data
2.7.2 Frequency spectrum based method for natural selection
2.8 Molecular evolution of synuclein genes
2.8.1 Sequence and structure analysis of synuclein genes
2.8.2 Estimation of functional divergence among synuclein genes
2.8.3 Identification of coevolutionary relationship among residues within gene
Results
3.1 Identification of candidate genes
3.2 WD repeat domain 62 (WDR62)
3.2.1 Evolutionary history of MCPH2 gene WDR62
3.2.2 Molecular evolution of WDR62 in mammals
3.2.3 Human polymorphisms and signatures of selection
3.2.4 SWAKK analysis of WDR62
3.2.5 Comparative analysis of WDR62 with archaic humans and modern human populations
3.3 SCL/TAL1 interrupting locus (STIL)
3.3.1 Evolutionary history of STIL
3.3.2 Molecular Evolution of STIL in Mammals by Site Models
3.3.3 Episodic selection at various stages of primate evolution in STIL locus
3.3.4 Divergent selection pressure between clades of mammals for STIL locus
3.3.5 Human polymorphisms and signatures of selection
3.3.6 SWAKK analysis of STIL
3.3.7 Comparative analysis of STIL with archaic humans and modern human populations
3.4 Centrosomal Protein 135 (CEP135)
3.4.1 Evolutionary history of CEP135
3.4.2 Estimation of pervasive signals of positive selection in CEP135 during placental mammals
3.4.3 Signature of positive selection by branch-site model
3.4.4 Divergent selective pressure across CEP135 mammalian phylogeny
3.5 Zinc finger protein 335 (ZNF335)
3.5.1 Evolutionary history of ZNF335
3.5.2 Molecular evolution of ZNF335 in mammals by site models
3.5.3 Signatures of episodic positive selection at various evolutionary stages from ancestral primate to human terminal branch
3.5.4 Divergent selection pressure between different partitions of mammalian phylogeny
3.6 Polyhomeotic homolog 1 (PHC1)
3.6.1 Phylogenetic analysis of PHC1
3.6.2 Molecular evolution of PHC1 by site models
3.6.3 Episodic Selection at PHC1 mammalian phylogeny
3.6.4 Divergent selective constraints across PHC1 mammalian phylogeny
3.7 Cyclin Dependent Kinase 6 (CDK6)
3.7.1 Phylogenetic analysis of CDK6
3.7.2 Molecular evolution of CDK6 by site model
3.7.3 Episodic positive selection on CDK6 phylogeny
3.7.4 Divergent selective constraint across CDK6 mammalian phylogeny
3.8 SAS-6 centriolar assembly protein (SASS6)
3.8.1 Evolutionary history of SASS6
3.8.2 Molecular Evolution of SASS6 in mammals by site models
3.8.3 Signature of episodic positive selection at SASS6 mammalian phylogeny
3.8.4 Divergent selective constraints between partitions of SASS6 mammalian phylogeny
3.9 Major Facilitator Superfamily Domain Containing 2A (MFSD2A)
3.9.1 Phylogenetic analysis of MFSD2A
3.9.2 Pervasive adaptive evolution of MFSD2A in placental mammals
3.9.3 Episodic adaptive evolution across the MFSD2A mammalian Phylogeny
3.9.4 Divergent selective constraint across the MFSD2A mammalian phylogeny
3.10 Citron rho-interacting serine/threonine kinase (CIT)
3.10.1 Evolutionary history of CIT gene
3.10.2 Molecular evolution of CIT across eutherian
3.10.3 Molecular evolution of CIT protein coding gene by branch-site model
3.10.4 Divergent selective pressure across CIT mammalian phylogeny
3.11 Kinesin Family Member 14 (KIF14)
3.11.1 Evolutionary history of KIF14
3.11.2 Pervasive adaptive evolution in KIF14 across eutherian mammals
3.11.3 Episodic positive selection across KIF14 mammalian phylogeny
3.11.4 Site-specific functional divergence among the partitions of KIF14 mammalian phylogeny
3.12 Synuclein gene family
3.12.1 Evolutionary history of synuclein family
3.12.2 Sequence evolution and Coevolutionary relationship
3.12.3 Structural evolution of α synuclein
3.12.4 Divergent selective constraint among synuclein genes
Discussion
Conclusion and future prospects
References
List of Figures
Figure 1.1: Comparative brain size of extant primates.
Figure 1.2: Endocranial differences in genus Homo during Pliocene-Pleistocene.
Figure 1.3: Comparative view of normal and microcephalic brain.
Figure 1.4: MCPH genes from different pathways to cause microcephaly.
Figure 1.5: The role of WDR62 during neocorticogenesis.
Figure 1.6: Circular illustration of Parkinson’s disease associated genes on human chromosomes.
Figure 2.1: Phylogenetic tree of 48 placental mammal genomes.
Figure 3.1: Evolutionary history of MCPH2 gene WDR62.
Figure 3.2: Estimation of WDR62 sequence evolution in therian.
Figure 3.3: SWAKK plot of human and chimpanzee WDR62.
Figure 3.4: Comparative analysis of WDR62 among human populations.
Figure 3.5: Phylogenetic analysis of STIL gene.
Figure 3.6: Sliding window analysis of STIL.
Figure 3.7: Evolutionary history of MCPH8 gene CEP135.
Figure 3.8: Phylogenetic tree of MCPH10 gene ZNF335 using NJ approach.
Figure 3.9: Phylogenetic tree of human PHC1 and its putative paralogs.
Figure 3.10: Evolutionary history of MCPH12 gene CDK6.
Figure 3.11: Evolutionary history of human SASS6 gene.
Figure 3.12: Phylogenetic tree of MCPH15 gene MFSD2A.
Figure 3.13: Evolutionary history of human CIT gene.
Figure 3.14: Evolutionary history of human KIF14 gene.
Figure 3.15: Evolutionary history of synuclein family.
Figure 3.16: Sequence alignment of human synuclein paralogs.
Figure 3.17: Coevolutionary relationship within synuclein genes.
Figure 3.18: Structural deviation among synuclein paralogs.
Figure 3.19: Structural evolution of α synuclein protein since the split from the last common sarcopterygian ancestor.
Figure 3.20: Structural analysis of mutant models of human α synuclein.
Figure 4.1: Human neocortical cell types.
Figure 4.2: Schematic overview of neurodegenerative (a, b, e) and neuroprotective (c, d) role of alpha synuclein.
List of Tables
Table 1.1: Comparative brain size of extinct and extant primates.
Table 3.1: Amino acid substitutions in human and chimpanzee lineage since the divergence from hominini ancestor.
Table 3.2: Parameter estimation and LRT for Mammals STIL.
Table 3.3: Branch-site analysis of STIL.
Table 3.4: Divergent selection constraint parameters estimation and likelihood scores for STIL.
Table 3.5: Tests for departure from neutrality through population’s variation data (1000 genome).
Table 3.6: Human and chimpanzee specific substitutions in STIL after the divergence from hominini ancestor.
Table 3.7: Selective pressure estimation and LRT for Mammals CEP135.
Table 3.8: Branch-site analysis of CEP135.
Table 3.9: Divergent selection constraint parameters estimation and likelihood scores for CEP135.
Table 3.10: Parameter estimation and LRT for Mammals ZNF335.
Table 3.11: Branch-site analysis of ZNF335.
Table 3.12: Divergent selection constraint parameters estimation and likelihood scores for ZNF335.
Table 3.13: Parameter estimation and LRT for Mammals PHC1.
Table 3.14: Branch-site analysis of PHC1.
Table 3.15: Divergent selection constraint parameters estimation and likelihood scores for PHC1.
Table 3.16: Parameter estimation and LRT for Mammals CDK6.
Table 3.17: Branch-site analysis of CDK6.
Table 3.18: Divergent selection constraint parameters estimation and likelihood scores for CDK6.
Table 3.19: Parameter estimation and LRT for Mammals SASS6.
Table 3.20: Branch-site analysis of SASS6.
Table 3.21: Divergent selection constraint parameters estimation and likelihood scores for SASS6.
Table 3.22: Parameter estimation and LRT for Mammals MFSD2A.
Table 3.23: Branch-site analysis of MFSD2A.
Table 3.24: Divergent selection constraint parameters estimation and likelihood scores for MFSD2A.
Table 3.25: Parameter estimation and LRT for Mammals CIT.
Table 3.26: Branch-site analysis of CIT.
Table 3.27: Divergent selection constraint parameters estimation and likelihood scores for CIT.
Table 3.28: Parameter estimation and LRT for Mammals KIF14.
Table 3.29: Branch-site analysis of KIF14.
Table 3.30: Divergent selection constraint parameters estimation and likelihood scores for KIF14.
Table 3.31: Sites under negative selection constraint in α synuclein among vertebrates alignment with SLAC analysis.
Table 3.32: Parameter estimation and likelihood score for synuclein family to detect functional divergence.
Table 3.33: Statistical significance of functional divergence among synuclein family.
Table 3.34: Type 1 functional divergence of synuclein family.
Table 4.1: Chimpanzee, hominin and human specific amino acid replacements in MCPH genes since the divergence from hominini ancestor.
List of Abbreviations
MCPH Autosomal recessive Primary microcephaly
HC Head circumference
OFC Occipitofrontal circumference
SD Standard deviation
MRI Magnetic resonance imaging
MCPH1 Microcephalin
WDR62 WD repeat domain 62
CDK5RAP2 Cyclin-dependent kinase 5 regulatory associated protein 2
KNL1 Kinetochore scaffold 1
CASC5 Cancer susceptibility candidate 5
ASPM Abnormal spindle-like microcephaly associated gene
CENPJ Centrosomal associated protein J
STIL SCL/TAL1 interrupting locus
CEP135 Centrosomal protein 135
CEP152 Centrosomal protein 152
ZNF335 Zinc finger protein 335
PHC1 Polyhomeotic homolog 1
CDK6 Cyclin-dependent kinase 6
SASS6 SAS-6 centriolar assembly protein
MFSD2A Major facilitator superfamily domain containing 2A
CIT Citron rho-interacting serine/threonine kinase
KIF14 Kinesin family member 14
CINP CDK2 interacting protein
asl Asterless
IQ Isoleucine glutamine
DHA Docosahexaenoic acid
BBB Blood brain barrier
BRCT Breast cancer 1 carboxyl-terminal
RT-PCR Reverse transcriptase polymerase chain reaction
AD Alzheimer’s disease
PD Parkinson’s disease
ALS Amyotrophic lateral sclerosis
MS Multiple sclerosis
LBD Lewy body disease
MSA Multiple System Atrophy
NCBI National Center for Biotechnology Information
NJ Neighbor Joining
ML Maximum likelihood
WAG Whelan And Goldman
JTT Jones, Taylor, and Thornton
ASR Ancestral sequence reconstructions
LRT Likelihood ratio test
BEB Bayes Empirical Bayes
NEB Naïve empirical Bayes
CmC Clade model C
GY94 Goldman and Yang 94
SLAC Single likelihood ancestor counting
DIVERGE DetectIng Variability in Evolutionary Rates among Genes
MISTIC Mutual Information Server To Infer Coevolution
MI Mutual information
cMI Cumulative mutual information
pMI Proximity mutual information
CHB Han Chinese in Beijing, China
CHS Han Chinese south China
JPT Japanese in Tokyo, Japan
MXL People with Mexican ancestry in Los Angeles
PUR Puerto Ricans in Puerto Rico
CLM Colombians in Medellin, Colombia
IBS Iberian population in Spain
TSI Toscani in Italia
CEU Utah residents with ancestry from northern and western Europe
GBR British from England and Scotland UK
FIN Finnish in Finland
ASW People with African ancestry in the southwest United States
LWK Luhya in Webuye, Kenya
YRI Yoruba in Ibadan Nigeria
SWAKK Sliding window analysis of Ka/Ks
CDS Coding sequences
SNCA α synuclein
SNCB β synuclein
SNCG γ synuclein
BP Basal progenitor
bRG Basal radial glia
VZ Ventricular zone
iSVZ Inner subventricular zone
oSVZ Outer subventricular zone
IZ Intermediate zone
CP Cortical plate
SRGAP2 SLIT-ROBO Rho GTPase activating protein 2
HARE5 Human accelerated region 5
ADCYAP1 Adenylate-cyclase-activating polypeptide 1
ROS Reactive oxygen species
MPP+ 1-methyl-4-phenylpyridinium
PKCδ Protein Kinase C delta
HAT Histone acetyltransferase
Summary
Background: The enlarged and globular brain is the most distinctive anatomical
feature in human evolution, setting modern humans apart from their extinct and extant
relatives. Over a short evolutionary time, the human brain has expanded threefold
compared with that of our closest living kin, the chimpanzee. Major episodes of human
brain size expansion occurred during the late Pliocene to early Pleistocene and
again in the middle Pleistocene. The exact genetic basis of the evolutionary
changes that separate the highly cognitive human brain from the supposedly less
cognitive brains of nonhuman hominids remains enigmatic. However, it is presumed
that the larger and more complex human brain emerged through essential changes in
genes and non-coding regulatory elements. One approach to comprehending the
evolution of the human brain is to scrutinize the evolution of genes indispensable for
normal brain development. Although brain development is a genetically complex process,
genes associated with early brain development are the best candidates for
understanding the mechanisms involved in the evolutionary expansion of human brain
size. Primary microcephaly genes were selected because of their key role in early brain
development and because mutations in these genes cause a severe reduction in the size
of the cerebral cortex, the region most notably expanded during recent human history.
The brain size of microcephalic patients is similar to that of Pan troglodytes and of the
very early hominid, the gracile australopithecine Australopithecus afarensis (the average
brain size of australopithecines is 450 cm³), suggesting that primary microcephaly genes
are likely to have been evolutionary targets in the enlargement of the human brain.
In this study, the implications of primary microcephaly genes for the evolutionary
enlargement of human brain size were explored through a comprehensive
evolutionary analysis of ten newly identified microcephaly genes (WDR62, STIL,
CEP135, ZNF335, PHC1, CDK6, SASS6, MFSD2A, CIT, and KIF14) across 48
eutherian species. The study also explored the mechanisms that may link the
evolutionary expansion of human brain size with Parkinson’s disease by studying the
molecular evolution of the Parkinson’s disease-linked alpha synuclein gene.
Results: By employing codon substitution site models based on the maximum
likelihood method, signatures of pervasive positive selection were identified in five
MCPH genes (STIL, ZNF335, SASS6, CIT and KIF14). For primates, positive
selection was found solely in KIF14, whereas in nonprimate placental mammals four
genes (STIL, ZNF335, SASS6, and CIT) exhibited the signature of adaptive
evolution. For placental mammals, however, pervasive positive selection has acted on
STIL, ZNF335 and KIF14. This study also identified acceleration in the coding
sequences of WDR62 and STIL on the human terminal branch by both codon
substitution and frequency-based methods, although the acceleration in STIL is not
significant by the codon substitution-based method. Furthermore, the signatures of
divergent selection constraints between clades are significant for only two genes, STIL
and SASS6.
The present study also sought to elucidate whether and why Parkinson’s disease
affects solely Homo sapiens. Evolutionary analysis of the Parkinson’s disease-associated
α synuclein gene revealed that it originated at the root of jawed vertebrates and that
no substitutions accumulated in the α synuclein amino acid sequence during the last
35 million years of evolution. Furthermore, structural dynamics indicate that, over the
course of vertebrate evolutionary history, a region of the amino-terminal domain
(amino acids 32 to 58) of α synuclein evolved continuously at the structural level,
despite high conservation at the sequence level.
Conclusion: This study concludes that the evolutionary enlargement of human brain
size during the Pliocene-Pleistocene period might not have been associated exclusively
with human MCPH coding sequences. Joint human-specific changes in the coding and
noncoding regions of human microcephaly loci might have been conducive to
modification of the function of MCPH genes in humans, which is likely to be responsible
for human brain evolution during the last two million years.
Current study on evolution of α synuclein gene provide that region encompassing 32-
58 amino acid residues of amino terminal domain is critical for normal cellular
function and Parkinson‘s disease pathogenesis.
Chapter 1 Introduction
Introduction
Homo sapiens differs substantially from nonhuman primates in its unique
morphological, anatomical, physiological and behavioral features, including relative
brain size, bipedalism, craniofacial attributes, small canine teeth, dimensions of the
pelvis, vocal organs, hairless skin, opposable elongated thumb, shortened fingers, language
and advanced tool making capabilities (Carroll, 2003; Gagneux & Varki, 2001). These
human specific phenotypic traits emerged during the last 6 million years of
evolution, after the divergence of the human lineage from the Pan lineage. The evolution
of the unique traits of modern humans was not a linear, additive process, and the pattern,
magnitude and rate of change can only be studied through comparative analyses of
extant and extinct nonhuman primates as well as of the extinct hominids that
existed during the last 5 million years. Together, extant and extinct primate species
can help to answer the following questions. First, what distinguishes hominid
from hominidae? Second, what distinguishes hominin from hominids? Third, what
distinguishes anatomically modern humans from hominin? And last, what was the
precise timing of the appearance of the unique human traits? This view was consolidated
by the discovery of the oldest and most primitive extinct hominid, Sahelanthropus
tchadensis, which exhibits a chimpanzee-sized brain but later hominid like dental,
basicranial and facial features, indicating that bipedalism arose soon after the divergence
of Homo sapiens from the Pan (Pan troglodytes and Pan paniscus) lineage (Brunet et al.,
2002; Zollikofer et al., 2005). The decoding of extinct hominin and extant nonhuman
primate genomes has provided an opportunity for evolutionary biologists to understand
precisely which genetic underpinnings shaped the evolution of distinctively human traits.
The capability to pinpoint the genomic changes that have carved the unique traits of
Homo sapiens is contingent on the association of a number of genes with human
specific phenotypes, as well as on the detection of modern human specific changes
within coding and noncoding sequences. From a genetic perspective, human specific traits
arise through concomitant changes in protein coding and conserved non-coding
sequences; however, the precise genetic underpinnings of these traits still
remain enigmatic (Olson & Varki, 2003; Vallender, Mekel-Bobrov, & Lahn, 2008).
1.1 Human Brain Evolution
The paramount, defining attribute of human evolution is the structurally complex brain,
which differs from that of nonhuman primates in size, shape, organization and function.
The Homo sapiens brain is three fold bigger than that of our closest extant relative, the
chimpanzee, and approximately 6-8 fold bigger than that of extant Old World monkeys and
platyrrhini (Figure 1.1 & Table 1.1) (Semendeferi & Damasio, 2000; Stephan, Frahm, &
Baron, 1981). This expansion is heterogeneous across brain regions; the most notable
expansion occurred in the neocortex and has been directly related to the emergence of
higher cognitive capabilities, such as language, intelligence and social learning
(Geschwind & Rakic, 2013). Expansion is not restricted to grey matter; an increase in
white matter volume has also contributed to the uniqueness of the modern human brain
(Schoenemann, Sheehan, & Glotzer, 2005). Brain size expansion over the last 6 million
years did not occur at a constant rate; it was static or slow in some periods and rapid in
others. Until the middle Pliocene epoch, approximately 3-2.5 million years ago, all early
hominids had a brain size like that of nonhuman hominidae; for example,
Australopithecus afarensis, which populated the earth between 4 and 2.8 million years
ago, had a brain volume of 384 cm3 (McHenry, 1994). Homo erectus, however,
appeared approximately 1.9 million years ago and had an average brain volume
of 950 cm3 (Rightmire, 2004). Between 1.9 and 1 million years ago, brain size did not
change significantly. After that, accelerated brain expansion occurred in middle
Pleistocene species such as Homo heidelbergensis, which is considered to be another
ancestor of archaic hominins and had an average brain size three times larger than that of
Pan (Table 1.1) (Rightmire, 2013). Thus, significant rapid expansion in brain size occurred
in the early Pleistocene and again in the middle Pleistocene epoch. Homo neanderthalensis
had a greater brain size (approximately 1512 cm3) than Homo sapiens, whose
average brain size is 1355 cm3 (Table 1.1).
The globular brain shape of Homo sapiens (modern humans) distinguishes us from our
closest extinct archaic hominin relatives, Homo neanderthalensis, indicating that globularity
emerged after the divergence of anatomically modern humans from Neandertals and
Denisovans approximately 500,000 years ago (Gunz et al., 2012; Neubauer, Hublin, &
Gunz, 2018; Prüfer et al., 2014). Fossil evidence revealed that Homo sapiens from
130,000 years ago had a more globular brain shape than those that existed 200,000 years ago.
It is evident that brain shape evolved gradually and
directionally within Homo sapiens in the upper Pleistocene period between 100,000-
35,000 years ago (Neubauer, et al., 2018).
Figure 1.1: Comparative brain size of extant primates.
Blue lines highlight the superior temporal sulcus. Species within the red circle are apes, while
those in the purple and cyan circles are Old World monkeys and platyrrhini, respectively.
Adapted from (Bryant & Preuss, 2018).
Table 1.1: Comparative brain size of extinct and extant primates.
Primate species               Brain size (cm3)   Estimated age (million years)   Geological epoch
Homo sapiens                  1355               0-0.2                           Middle Pleistocene
Homo neanderthalensis         1512               0.03-0.550/750                  Middle Pleistocene
Homo heidelbergensis          1198               0.3-1                           Middle Pleistocene
Homo erectus                  1016               0.2-1.9                         Upper Pliocene
Homo ergaster                 854                1.5-1.9                         Upper Pliocene
Homo rudolfensis              752                1.8-2.4                         Pliocene
Homo habilis                  552                1.6-2.3                         Pliocene
Paranthropus boisei           510                1.2-2.2                         Pliocene
Australopithecus africanus    457                2.6-3                           Pliocene
Australopithecus afarensis    384                3-3.6                           Pliocene
Sahelanthropus tchadensis     370                ~ 6-7                           Upper Miocene
Pan troglodytes               336                ~ 0-7                           Upper Miocene
Pan paniscus                  311                ~ 6-7                           Upper Miocene
Gorilla gorilla               425                ~ 0-7/9                         Miocene
Pongo abelii                  445                ~ 0-14                          Miocene
Old World monkeys             33-205             ~ 0-25                          Upper Oligocene
Platyrrhini                   4-123              0-35/40                         Upper Eocene
Taken from (Carroll, 2003; Semendeferi & Damasio, 2000; Vallender, et al., 2008; Zollikofer, et al.,
2005)
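As a rough illustration of the fold differences discussed in Section 1.1, the brain volumes
listed in Table 1.1 can be compared directly. The short Python sketch below is not part of
the original analysis: the dictionary name and the helper function are hypothetical, and only
a subset of the table values is copied in.

```python
# Brain volumes (cm3) copied from Table 1.1; a subset chosen for illustration.
BRAIN_VOLUME_CM3 = {
    "Homo sapiens": 1355,
    "Homo neanderthalensis": 1512,
    "Homo heidelbergensis": 1198,
    "Homo erectus": 1016,
    "Australopithecus afarensis": 384,
    "Pan troglodytes": 336,
}


def fold_difference(species, reference="Pan troglodytes"):
    """Return the brain volume of `species` divided by that of `reference`."""
    return BRAIN_VOLUME_CM3[species] / BRAIN_VOLUME_CM3[reference]


if __name__ == "__main__":
    for name in BRAIN_VOLUME_CM3:
        print(f"{name}: {fold_difference(name):.2f}x Pan troglodytes")
```

These are simple ratios of absolute volume taken from a single table and are not corrected
for body size; because published volume estimates vary between sources, the ratios need not
match fold values quoted elsewhere in the text.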
1.1.1 Human brain regions evolved during Pliocene-Pleistocene epochs
The surface area of the human cerebral cortex has increased three fold during the last 5
million years since the divergence from chimpanzee, but the majority of this enlargement was
initiated in the upper Pleistocene. The prefrontal, temporal and parietal lobes, thought to be
involved in higher cognitive capabilities, are larger in humans than in nonhuman
primates, and enlargement of these areas is generally associated with cultural and
behavioral complexity. The prefrontal region expanded in humans and occupies a
disproportionately large amount of both grey matter and white matter
compared to nonhuman primates (Donahue, Glasser, Preuss, Rilling, & Van Essen,
2018). Furthermore, relative to nonhuman hominids, the orbitofrontal cortex (a region of
the prefrontal cortex) is distinctly wider in anatomically modern humans. Although a certain
widening of the parietal regions is observed in Neandertals, the generalized
expansion of the entire parietal surface is a unique characteristic of anatomically modern
humans (Bruner, 2010). Furthermore, bulging of the upper parietal surface is also specific
to modern humans. The parietal lobe is involved in speech decoding, numerical processing
and sensory information processing, so the modern human specific parietal bulging and
expansion might have implications for species specific higher cognitive
specialization (Figure 1.2). Anatomically modern humans have a relatively larger overall
volume, white matter volume and apomorphic location of the temporal lobe (Bastir et al.,
2011). Homo sapiens also has significantly larger cerebellar hemispheres than
Homo neanderthalensis, most prominently on the right side. Larger cerebellar hemispheres
have been implicated in executive functions, including language processing,
working memory capacity and social complexity (Kochiyama et al., 2018). During the
evolution of the Homo sapiens brain, enlargement was not the only evolutionary
change; alterations also occurred at the levels of microstructure, organization and
connectivity, including the higher order organization of the cortex, cellular
and laminar organization, and long distance cortical connections (Todd M Preuss, 2011).
Modern human brain enlargement was accompanied by extensive modification of, and an
increased number of, connections between the subregions of the brain.
Figure 1.2: Endocranial differences in genus Homo during Pliocene-Pleistocene.
Both modern humans and Neandertals show widening of the frontal and lateral parieto-temporal
lobes compared to other hominids, while enlargement of the entire parietal lobe and parietal
surface bulging are found only in modern humans. [Adapted from (Bruner, 2010)]
1.2 Autosomal recessive primary microcephaly (MCPH)
The term microcephaly is derived from two Greek words: micro from "mikros" (small), and
cephaly from "kephale" (head). The prominent phenotype of humans suffering
from microcephaly is small head size (Figure 1.3). Autosomal recessive primary
microcephaly (MCPH) is a rare congenital brain developmental disorder characterized
by a reduced head circumference (HC) or occipitofrontal circumference (OFC) that is
more than three standard deviations (SD) below the mean at birth, with mild to moderate
mental retardation in the absence of any other neuroanatomical etiology (Woods, Bond, &
Enard, 2005). The small occipitofrontal circumference is a consequence of the
reduction in the size of the cerebral cortex, which leads to simplified gyral patterning
without affecting cerebral cortex thickness. Magnetic resonance imaging (MRI) of
primary microcephaly patients has shown a reduction in brain size that particularly affects
the frontal lobes of the cerebral cortex, due to a neuronal proliferation defect, but with
normal brain architecture (Desir, Cassart, David, Van Bogaert, & Abramowicz,
2008; Saadi et al., 2009). Primary microcephaly patients usually have intellectual
disability and language delay, with varying degrees of motor delay. The incidence
of primary microcephaly is higher in Middle Eastern and Asian populations,
where consanguineous marriages are more common, than in Caucasian populations.
The prevalence of primary microcephaly is reported to be 1 in 10,000 in Asian and Middle
Eastern populations. Primary microcephaly is a captivating disorder and is considered to
be the consequence of an atavistic process because it disturbs the brain to body size ratio,
whereby the brain size of a microcephaly patient is equivalent to that of our closest
living relatives, the great apes, and of extinct early hominids (Sahelanthropus and
Australopithecus) (Mochida & Walsh, 2001).
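The diagnostic criterion described above amounts to a threshold on a standard deviation
score. The following is a minimal Python sketch of that calculation, interpreting the
criterion as an OFC more than 3 SD below a reference mean. The function names are
hypothetical, and the reference mean and SD in the usage lines are placeholders for
illustration only, not clinical reference values.

```python
def ofc_z_score(ofc_cm, ref_mean_cm, ref_sd_cm):
    """Number of standard deviations the measured OFC lies from the reference mean."""
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (ofc_cm - ref_mean_cm) / ref_sd_cm


def meets_mcph_threshold(ofc_cm, ref_mean_cm, ref_sd_cm, cutoff_sd=-3.0):
    """True when the OFC is more than 3 SD below the reference mean (cutoff from the text)."""
    return ofc_z_score(ofc_cm, ref_mean_cm, ref_sd_cm) < cutoff_sd


if __name__ == "__main__":
    # Placeholder reference values (mean 35.0 cm, SD 1.25 cm), for illustration only.
    z = ofc_z_score(30.0, 35.0, 1.25)
    print(z, meets_mcph_threshold(30.0, 35.0, 1.25))
```

In practice the reference mean and SD would come from age and sex matched growth
standards, which are not specified in this chapter.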
Figure 1.3: Comparative view of normal and microcephalic brain.
Left: microcephalic patient; right: age-matched control. The microcephalic patient shows a
severe reduction in brain volume, with a poorly developed frontal lobe and angiogenesis of
the rostrum of the corpus callosum (white arrow) compared to the age-matched control.
[Adapted from (Kaindl et al., 2010)].
1.3 Primary microcephaly genes and their functions
Autosomal recessive primary microcephaly is a genetically heterogeneous disorder.
At least eighteen loci (MCPH1-18), located on different human chromosomes, have been
identified as responsible for primary microcephaly (Table 1.1) (H. Li et al., 2016). The
underlying genes include MCPH1, WDR62, CDK5RAP2, CASC5, ASPM, CENPJ, STIL,
CEP135, CEP152, ZNF335, PHC1, CDK6, SASS6, MFSD2A, CIT and KIF14 (Awad
et al., 2013; Basit et al., 2016; Bond et al., 2002; Bond et al., 2005; Genin et al., 2012;
Guernsey et al., 2010; Gul et al., 2006; Muhammad Sajid Hussain et al., 2012;
Muhammad S Hussain et al., 2013; Jackson et al., 2002; Khan et al., 2014; Kumar,
Girimaji, Duvvari, & Blanton, 2009; H. Li, et al., 2016; Moawia et al., 2017; Adeline
K Nicholas et al., 2010; Y. J. Yang et al., 2012). Almost all MCPH genes are expressed
mainly in the fetal brain and make a dominant contribution to the regulation of
neurogenesis and cytokinesis, which in turn control brain size (Basit, et al., 2016).
Figure 1.4: MCPH genes from different pathways that cause microcephaly.
Sixteen MCPH genes have been identified, of which nine encode centrosomal proteins and two
encode cytokinesis proteins. The organelles implicated in primary microcephaly are shown in
red. [This picture is adapted from (Jayaraman, Bae, & Walsh, 2018)].
1.3.1 Microcephalin
The microcephalin gene (MCPH1) contains 14 exons across a genomic region of 241905
bp on human chromosome 8p23.1 (Jackson, et al., 2002). MCPH1 was the first gene
identified as a cause of autosomal recessive primary microcephaly, in two
Pakistani families. The MCPH1 gene encodes a protein of 835 amino acids that contains
three BRCT (breast cancer 1 carboxyl-terminal) domains, one at the amino terminus and two
at the carboxyl terminus. BRCT domains are present in DNA repair and cell cycle
proteins and are involved in protein-DNA and protein-protein interactions, particularly
with proteins that are phosphorylated on serine/threonine residues (Huyton,
Bates, Zhang, Sternberg, & Freemont, 2000; Yu, Chini, He, Mer, & Chen, 2003).
Mutations in the MCPH1 gene are not only responsible for autosomal recessive primary
microcephaly but also cause premature chromosome condensation syndrome
(Trimborn et al., 2004). MCPH patients with mutations in microcephalin can have a
head circumference more than 4 standard deviations below the mean at birth (Evans
et al., 2005). Expression analysis of human fetal tissue by RT-PCR showed that MCPH1 is
expressed at similar levels in fetal brain, kidney and liver (Jackson, et al., 2002).
Expression of MCPH1 has also been noted at low levels in other tissues, such as heart,
lungs, spleen, thymus and skeletal muscle, and in some adult tissues (Jackson, et al., 2002).
In situ hybridization studies revealed that MCPH1 is expressed at a high level during
neurogenesis in the developing forebrain, specifically in the walls of the lateral ventricles
(Jackson, et al., 2002). Microcephalin regulates BRCA1 and BRCA2 and contributes to the
DNA repair process; disruption of this repair mechanism through loss of function of a
DNA repair protein can lead to excessive programmed cell death during neurogenesis,
which explains the reduction in brain size caused by mutated microcephalin. Recently,
microcephalin has been used as a novel biomarker for the diagnosis of breast cancer
associated with BRCA1 inactivation (Richardson et al., 2011).
1.3.2 WD repeat domain 62 (WDR62)
The MCPH2 gene WDR62 is the second most common cause of primary microcephaly
after the ASPM gene. Homozygous mutations in the WDR62 gene cause primary
microcephaly and some other malformations of cortical development, such as
lissencephaly, pachygyria, agenesis of the corpus callosum and schizencephaly (Bilguvar
et al., 2010; Kousar et al., 2011). Heterozygous mutations in WDR62 cause
polymicrogyria in humans (Murdock et al., 2011). The WDR62 gene resides on human
chromosome 19q13.12 and spans a genomic region of 50230 bp (Memon et al.,
2013). It encompasses 32 exons and encodes a spindle pole protein 1523 amino acids
long.
WDR62 is characterized by fifteen amino terminal WD40
domains, an MKK7β1 binding domain, a JNK binding domain and a loop helix domain at the
carboxyl terminus (Pervaiz & Abbasi, 2016). The carboxyl terminal region of WDR62 does not
share definite sequence homology with any known protein. WDR62 is localized in the
nucleus, cytoplasm and spindle pole; its localization is contingent on the type of cell and
the stage of the cell cycle (Bogoyevitch et al., 2012). WDR62 is expressed at a high
level in the ventricular and subventricular zones of the neuroepithelium during
neocorticogenesis (Bilguvar, et al., 2010; Adeline K Nicholas, et al., 2010). During
neurogenesis, cortical neurons originate from progenitor cells in the ventricular
zone of the developing brain. The progenitor cells undergo cycles of proliferative
symmetric division before moving to neurogenic asymmetric division. The transition
from proliferative to neurogenic division is controlled by spindle pole
orientation; a defect in spindle pole orientation results in a defect in this switch and
ultimately leads to primary microcephaly (Figure 1.5). WDR62 has been implicated in
spindle pole formation and the regulation of spindle orientation, and might have been
conducive to the prolonged, human specific neural proliferative division that is consistent
with the expansion of human brain size (Cohen-Katsenelson, Wasserman, Khateb, Whitmarsh,
& Aronheim, 2011; A. K. Nicholas et al., 2010).
1.3.3 Cyclin-dependent kinase 5 regulatory associated protein 2 (CDK5RAP2)
Mutation of the MCPH3 gene CDK5RAP2 is considered to be a very rare cause of primary
microcephaly because only eleven families affected by MCPH3 gene mutations have been
reported worldwide (Abdullah et al., 2017; Bond, et al., 2005;
Moynihan et al., 2000). CDK5RAP2 was the third primary microcephaly gene to be
identified; it encompasses 34 exons and is located on human chromosome 9q33.2
(Bond, et al., 2005; Moynihan, et al., 2000). As with WDR62, homozygous missense
mutations in CDK5RAP2 have been found to cause hypoplasia of the corpus callosum (ACC),
which is present in 3-5% of individuals affected by neurodevelopmental disorders (Jouan et
al., 2016). CDK5RAP2 is expressed in the developing brain, kidney, lungs, placenta and
testis (Park et al., 2015).
Figure 1.5: The role of WDR62 during neocorticogenesis.
During neocorticogenesis, WDR62 is thought to play a key role both in the symmetric divisions
of apical precursors and in the migration of neurons. Homozygous mutations disrupt the normal
function of WDR62, which consequently alters the timing of proliferative division and also
affects the migration of neurons to their final destination. M: marginal zone, CP: cortical
plate, SVZ: subventricular zone, and VZ: ventricular zone. [Adapted from (Wollnik, 2010)].
CDK5RAP2 is a 215 kDa pericentriolar protein, also known as centrosomal associated
protein 215 (CEP215); it comprises 1893 amino acids and contains an EB1 binding domain,
a CDK5R1 interacting domain, a p53 binding domain and several SMC (structural
maintenance of chromosome) domains (Sukumaran et al., 2017). The amino terminal
region of CDK5RAP2 encompasses a γTuRC binding site that is considered
indispensable for the recruitment of γTuRC to the centrosome, which leads to the
production of microtubules and formation of the spindle pole (Sukumaran, et al., 2017).
CDK5RAP2 is also involved in DNA damage signaling, centriole replication,
asymmetric centriole inheritance and spindle checkpoint function (Barr, Kilmartin, &
Gergely, 2010; Barrera et al., 2010; Lizarraga et al., 2010; X. Zhang et al., 2009). It
has been implicated in cell fate determination during neurogenesis, and loss of function
of CDK5RAP2 could cause premature depletion of neural progenitor cells and thereby
primary microcephaly (Barr, et al., 2010). A previous study of mutated CDK5RAP2 in
the Hertwig's anemia mouse showed malformations in the development of the cerebral
cortex, which ultimately caused a severe reduction in the brain size of mutant mice at birth
(Lizarraga, et al., 2010). During neocorticogenesis, CDK5RAP2 deficient cortical
progenitors display a defect in spindle pole orientation, which increases early
cell cycle exit and causes excessive apoptosis in neuronal cells, ultimately reducing the
cortical precursor pool (Lizarraga, et al., 2010).
1.3.4 Kinetochore scaffold 1 (KNL1)
The KNL1 gene, also known as CASC5 (Cancer susceptibility candidate 5), holds 18 exons
straddling a genomic region of 70,322 bp within the MCPH4 locus on human
chromosome 15q15.1 (Genin, et al., 2012; Jamieson, Govaerts, & Abramowicz, 1999).
KNL1 encodes a 265 kDa kinetochore protein of 2342 amino acids. KNL1 is
expressed at a higher level in the ventricular zone than in the subventricular zone of the
neocortex of the developing brain at gestational weeks 13-16 (Fietz et al., 2012).
Homozygous mutations in the KNL1 gene have been reported in different geographic
regions and cause primary microcephaly in Moroccan and Pakistani families
(Genin, et al., 2012; Szczepanski et al., 2016). Cognitive function is moderately to
severely impaired in individuals affected by MCPH4. These mutations induce
skipping of exons 18 and 25, which in turn creates a frameshift and introduces a
premature stop codon, ultimately producing C-terminally truncated proteins. The carboxyl
terminus of KNL1 encompasses the regions that are essential for interaction with
ZWINT-1 and the NSL1-MIS12 complex, which is indispensable for normal
chromosomal alignment and segregation (Genin, et al., 2012; Szczepanski, et al.,
2016). KNL1 is also implicated in the regulation of DNA damage signaling, as impaired
KNL1 function in mutant fibroblast cells causes overactive pathways and eventually
chromosomal instability (Szczepanski, et al., 2016). Carboxyl-terminally truncated
KNL1 altered the shape of nuclei from ovoid (as normally present in control cells) to
lobulated and fragmented (Szczepanski, et al., 2016). KNL1-deficient cells showed
misalignment of chromosomes and premature mitotic arrest in primary fibroblasts,
resulting in inappropriate symmetric and asymmetric division and ultimately in
inefficient neural proliferation and abnormal brain size (Kiyomitsu, Obuse, &
Yanagida, 2007). Interestingly, a novel phosphorylation site was
identified in the human KNL1 protein at serine residue 1076, which originated after the
human-Pan split approximately six to seven million years ago (D. S. Kim & Hahn,
2011). The gain of a phosphorylation site solely in humans might play a role in human
cell division and in the evolution of brain size during the Pliocene-Pleistocene period.
1.3.5 Abnormal spindle-like microcephaly associated gene (ASPM)
The MCPH5 gene ASPM contains 28 exons covering a genomic region of 62566 bp on
human chromosome 1q31.3 (Bond, et al., 2002; Pattison et al., 2000). ASPM is
considered the most common cause of primary microcephaly, as recessive mutations in
ASPM have been found in 60% of affected individuals to date (Létard et al., 2018). The
human ASPM protein contains 3477 amino acids and is annotated with a putative amino
terminal microtubule binding domain, two calponin homology domains and 81
IQ (isoleucine glutamine) motifs, the number of which is highly variable among orthologs.
ASPM localizes to the centrosome during interphase and to the spindle pole from
prophase through telophase (Fish, Kosodo, Enard, Pääbo, & Huttner,
2006; Zhong, Liu, Zhao, Pfeifer, & Xu, 2005). The ortholog of human ASPM in
Drosophila melanogaster, asp, is responsible for organizing and binding together
microtubules at the spindle pole, and mutations in asp cause premature mitotic arrest
resulting in reduced central nervous system development (Gonzalez et al., 1990).
Recessive mutations in mouse ASPM, however, cause not only mild microcephaly but
also major defects in male and female fertility and reduced testis size in adult mice
(Pulvers et al., 2010). Within the human ASPM coding region, 147 mutations associated
with primary microcephaly have been reported in HGMD Professional 2017.
Primary microcephaly patients with ASPM mutations have normal gyral
patterning and cortical structure, in contrast to those with WDR62 mutations
(Bilguvar, et al., 2010; Bond, et al., 2002). RT-PCR analysis showed ASPM
expression in various embryonic and adult tissues. During fetal development, human
ASPM is expressed in brain, heart, lungs, liver, stomach, spleen, colon, skeletal muscle
and skin tissues (Bond, et al., 2002; Kouprina et al., 2005; Rhoads & Kenguele, 2005).
Northern blot and in situ hybridization analyses showed elevated ASPM
expression during cortical neurogenesis, particularly at embryonic days E14.5 and
E16.5 in the ventricular zone (Bond, et al., 2002). The role of ASPM during neurogenesis
has been directly assessed. ASPM is expressed in the ventricular zone at a high level during
proliferative division and is progressively downregulated with the switch from
symmetric proliferative to asymmetric neuroepithelial division, demonstrating its role
in neuron production (Fish, et al., 2006). Like WDR62, ASPM controls the transition
from proliferative to neurogenic division. ASPM and WDR62 interact and play an
indispensable role in centriole duplication. Deletion of one of the two genes (WDR62
or ASPM) greatly enhances the mutant phenotype of the other, increasing the
severity of primary microcephaly, while deletion of both genes is embryonically lethal
(Jayaraman et al., 2016). ASPM and WDR62 play roles in the regulation of centriole
biogenesis, cell fate determination and the apical complex, which explains the function of
these genes in brain expansion (Jayaraman, et al., 2016).
1.3.6 Centrosomal associated protein J (CENPJ)
Like those in MCPH3, mutations in the MCPH6 gene are considered to be a rare cause of
primary microcephaly. The human CENPJ gene harbours 17 exons encoding 1338 amino acids
and covers 39847 bp of genomic DNA within the MCPH6 locus on human chromosome
13q12.12-12.13. A splicing mutation in CENPJ has been reported to cause Seckel
syndrome (Al-Dosari, Shaheen, Colak, & Alkuraya, 2010). CENPJ is highly
expressed in brain and spinal cord, but it is also widely expressed at a low level in the
developing embryo (Bond, et al., 2005). In early neurogenesis, primary expression of CENPJ
is detected in the neuroepithelium of the frontal cortex (Bond, et al., 2005). The protein
contains a microtubule binding domain, a microtubule destabilizing domain that harbours
the 112 amino acid long PN2-3 motif, five coiled-coil domains and two 14-3-3 binding sites
at the carboxyl terminus (Chen, Olayioye, Lindeman, & Tang, 2006; Hung,
Chen, Chang, Li, & Tang, 2004). CENPJ localizes to the centrosome throughout the cell
cycle in a microtubule independent manner. CENPJ is also associated with the gamma tubulin
complex. In vivo analysis revealed that microtubule assembly is initiated at the centrosome
by the gamma tubulin complex (Schiebel, 2000). CENPJ binds to tubulin heterodimers
through its PN2-3 motif and not only impedes microtubule nucleation from the centrosome
but also promotes depolymerization of microtubules, indicating a role in microtubule
assembly at the centrosome and kinetochore that is predicted to be important during mitosis
for proper chromosomal segregation (Hung, et al., 2004). The ortholog of human CENPJ in
Caenorhabditis elegans, Sas-4, plays a key role in controlling centrosome organization
and centriole duplication (Kirkham, Müller-Reichert, Oegema, Grill, & Hyman, 2003;
Leidel & Gönczy, 2005). Mutations in Drosophila melanogaster DSas-4, the ortholog of
human CENPJ, revealed loss of centrioles during embryonic development and also
indicated abnormal asymmetric division in 30% of neuroblasts (Basto et al.,
2006). DSas-4 deficient cells showed abnormal spindle formation, which is responsible for
dramatic chromosomal segregation defects, probably due to the loss of centrioles
(Rodrigues-Martins, Riparbelli, Callaini, Glover, & Bettencourt-Dias, 2008). The role of
CENPJ in the regulation of centriole biogenesis and the maintenance of centrosome integrity
might explain its control of brain size during human development; loss of
function of CENPJ causes primary microcephaly, probably due to the loss of mature
centrosomes and impaired spindle positioning (Cho, Chang, Chen, & Tang, 2006).
1.3.7 SCL/TAL1 interrupting locus (STIL)
The STIL gene, located at the MCPH7 locus, harbours 20 exons encoding a 1288 amino acid
long pericentriolar protein and spans a genomic region of 63,018 bp on human
chromosome 1p33 (Kumar, et al., 2009). STIL, also known as SIL, was
initially linked with T cell acute lymphoblastic leukemia (Aplan, Lombardi, & Kirsch,
1991). Homozygous truncating mutations in the human STIL gene are responsible
not only for primary microcephaly but also for lobar holoprosencephaly (Kakar et al.,
2015; Kumar, et al., 2009). STIL is expressed in essentially all human fetal tissues at
gestational week 16, and its expression in the developing brain indicates a role in neuronal
proliferation (Kumar, et al., 2009). Expression of the human STIL gene is reported in
proliferating cells in early embryonic development and is highest in human fetal
thymus, bone marrow, colon and fetal liver (Izraeli & Colaizzo-Anas, 1997). An in situ
hybridization study in mice showed expression of STIL in the developing cerebral
cortex, specifically in subventricular neuroepithelial cells at embryonic day E14.5
(Kumar, et al., 2009; C. M. Smith et al., 2006). STIL-deficient mice die in utero
after embryonic day E10.5. Several developmental abnormalities appear in
STIL-deficient mice between embryonic days E7.5 and E8.5, such as restricted development,
reduced proliferation and enhanced apoptosis, neural tube closure defects,
holoprosencephaly, left-right asymmetry defects and overall reduced size compared
to wild type mice (Izraeli et al., 1999). Correspondingly, zebrafish embryos mutant for
csp, the ortholog of STIL in zebrafish, show an elevated mitotic index with
disorganized mitotic spindles, and homozygous csp-mutant embryos
die at an early developmental stage (Pfaff et al., 2007). STIL is essential for
proper spindle pole organization in vertebrates and regulates centrosome integrity and
mitosis (Castiel et al., 2011). STIL interacts directly with another MCPH protein, CENPJ,
through residues 231-619; in turn, this complex binds to spindle assembly abnormal
protein 6 homolog (SASS6), forming a complex that is important in cell division and
the regulation of centriole biogenesis. Mutation in human CENPJ significantly decreases its
binding capacity for STIL (Tang et al., 2011). Furthermore, STIL is phosphorylated at
serine/threonine sites during mitosis, which promotes its binding to Pin1 and affects the
duration of the spindle checkpoint (Campaner, Kaldis, Izraeli, & Kirsch, 2005).
1.3.8 Centrosomal protein 135 (CEP135)
A biallelic truncating mutation in the CEP135 gene was identified as responsible for MCPH8
(Muhammad Sajid Hussain, et al., 2012). The MCPH8 gene CEP135 contains 26 exons
spanning a genomic region of 84.382 kbp on human chromosome 4q12 (Muhammad
Sajid Hussain, et al., 2012). CEP135 is expressed in the neuroepithelium of the mouse
cerebral cortex during embryonic days E11.5-15.5. The CEP135 gene encodes a highly
conserved centrosomal protein of 1140 amino acids that contains two coiled-coil
regions in its amino terminus. Like CENPJ, CEP135 is a centrosomal component and
localizes to the centrosome throughout the cell cycle in a microtubule independent manner
(Ohta et al., 2002). Human CEP135 interacts with two other MCPH proteins, CENPJ and
hSas6, and this interaction is essential for centriole assembly (Lin et al., 2013).
Similar to the MCPH3 associated CDK5RAP2 protein, mutant CEP135 also altered
the shape of nuclei from oval to lobulated and fragmented in approximately 20% of
primary fibroblasts from microcephaly patients (Muhammad Sajid Hussain, et al., 2012).
Furthermore, impaired function of CEP135 in primary fibroblasts also produced other
anomalies, such as centrosome number abnormalities (complete loss of the centrosome was
observed in 22% of cells and an elevated centrosome number in 18% of cells) and
microtubule organization defects (Muhammad Sajid Hussain, et al., 2012).
CEP135-deficient CHO cells and mutant primary fibroblasts showed a significant reduction in
growth rate (Muhammad Sajid Hussain, et al., 2012; Ohta, et al., 2002). CEP135
deficiency induced by RNA interference triggered premature centrosome splitting
and disorganization of the microtubule arrangement (K. Kim, Lee, Chang, &
Rhee, 2008; Ohta, et al., 2002). CEP135 plays a significant role in centriole
biogenesis, spindle organization and cytokinesis regulation.
1.3.9 Centrosomal protein 152 (CEP152)
The gene responsible for the MCPH9 locus on human chromosome 15q21.1 was identified as
CEP152; it contains 26 exons and encodes a 1710-amino-acid centrosomal protein
(Guernsey, et al., 2010). CEP152 was previously assigned to the MCPH4 locus, until
the discovery of KNL1 gene mutations associated with primary microcephaly (Genin,
et al., 2012; Guernsey, et al., 2010). CEP152 is expressed in mouse brain tissues at
embryonic stages E12.5 and E14.5 (Guernsey, et al., 2010). Homozygous missense and
truncating mutations of the CEP152 gene have been reported to cause primary
microcephaly (Guernsey, et al., 2010). Biallelic mutations in CEP152 also cause
Seckel syndrome (Kalay et al., 2011). Patients with MCPH9 mutations have a head
circumference 5-7 standard deviations below the mean, with a simplified
gyral pattern but normal cortical thickness (Guernsey, et al., 2010). A truncated CEP152
mutant was not found at the centrosome in transfected cells (Guernsey, et al., 2010).
However, overexpression and antibody staining analyses revealed centrosomal
localization of human CEP152 (Kalay, et al., 2011). CEP152-deficient human
fibroblasts showed multiple nuclei of variable size,
fragmented centrosomes and an elevated level of aberrant cell division, and these cells
appeared to be arrested in early anaphase (Kalay, et al., 2011). CEP152 is involved in
the regulation of genomic integrity and the DNA damage response through its interaction
with the genome maintenance protein CINP (CDK2 interacting protein) (Kalay, et al.,
2011). Mutations in the asl (asterless) gene, the ortholog of CEP152 in Drosophila
melanogaster, lead to embryogenesis arrest and male infertility in flies (Blachon et al.,
2008). Subcellular localization analysis showed that the asl product associates with
the centrosome at the centriole periphery and regulates the initiation of centriole duplication
(Blachon, et al., 2008).
1.3.10 Zinc finger protein 335 (ZNF335)
ZNF335, also known as NIF-1, contains 28 exons spanning 23,519 bp of genomic DNA
at human chromosome 20q13.12 and encodes a protein of 1342 amino acids (Y. J.
Yang, et al., 2012). Mutations in the MCPH10 gene ZNF335 cause one of the most severe
forms of primary microcephaly, with head circumference 9 standard deviations below
the mean (Sato et al., 2016; Stouffs et al., 2018; Y. J. Yang, et al., 2012). MRI studies of
patients with MCPH10 mutations revealed a cortex greatly reduced in size compared
to the skull, with a more severely simplified gyral pattern and extra-axial space, invisible basal
ganglia and neuronal disorganization (Stouffs, et al., 2018; Y. J. Yang, et
al., 2012). ZNF335 is ubiquitously expressed in a variety of human fetal (brain, lungs,
liver and kidney) and adult organs. ZNF335 expression is elevated during mouse
cortical neurogenesis at embryonic stages E13-E15 (Y. J. Yang, et al., 2012). During
neurogenesis, ZNF335 is expressed in the ventricular and subventricular zones as well as
in the developing cortical plate, where its level is lowest. ZNF335 is essential for embryonic
development: ZNF335-deficient mice show increased cell death and
die early in development, before E7.5 (Y. J. Yang, et al., 2012).
ZNF335 is involved in regulating histone trimethylation and controls the expression
level of a variety of somatic and brain developmental genes. Microarray analysis
revealed that ZNF335-deficient neurons displayed reduced expression of brain
developmental genes, particularly the DLX homeobox genes and the REST/NRSF and Co-REST 2
genes, which are involved in early brain development and neurogenesis, respectively (Y. J. Yang,
et al., 2012). ZNF335 is essential for neurogenesis and for neuronal differentiation and
migration in mammals.
1.3.11 Polyhomeotic homolog 1 (PHC1)
A mutation in the PHC1 gene has been reported to affect two siblings with primary
microcephaly in a consanguineous Saudi family, with a recessive mode of inheritance
(Awad, et al., 2013). It is the eleventh gene (MCPH11) to be associated with primary
microcephaly. The PHC1 gene has 15 exons spanning a genomic region of 25,194 bp
at human chromosome 12p13.31 (Awad, et al., 2013). PHC1, also known as
EDR1, encodes a 1004-amino-acid protein characterized by a
carboxyl-terminal SAM (sterile alpha motif) domain. It is considered an essential
component of polycomb repressive complex 1 (PRC1), which maintains the
transcriptionally repressed state of HOX genes (Isono et al., 2005). Like the product of another
MCPH gene, ZNF335, PHC1 is localized in the cell nucleus (Alkema et al., 1997; Cmarko,
Verschure, Otte, van Driel, & Fakan, 2003; N. Hashimoto et al., 1998). Mutation in the
PHC1 gene increases the expression of GMNN (geminin) and decreases H2A
ubiquitination and recruitment of PHC1 to chromatin owing to the reduction in PHC1 protein
level, ultimately impairing DNA repair and cell cycle activity in the patients’
cells (Awad, et al., 2013). Microarray analysis of cells from PHC1-mutated patients
revealed significant dysregulation of genes involved in the cell cycle, cellular
proliferation, apoptosis, DNA replication and DNA repair (Awad, et al., 2013).
The PHC1 mutation provided the first evidence that chromatin remodeling is
implicated in the pathogenesis of primary microcephaly.
1.3.12 Cyclin-dependent kinase 6 (CDK6)
The CDK6 gene contains eight exons encoding 326 amino acids and covers a large genomic
region (221,454 bp) at human chromosome 7q21.2. CDK6 contains a protein
kinase domain, is involved in G1 progression of the cell cycle and regulates the G1 to S
phase transition (Russo, Tong, Lee, Jeffrey, & Pavletich, 1998). A missense mutation in the
CDK6 gene has been reported as the underlying cause of the MCPH12 locus defect in a
Pakistani family (Muhammad S Hussain, et al., 2013). CDK6 is localized in the cytoplasm
and nucleus of interphase cells, and it is also observed at the centrosome
throughout the mitotic cycle (Ericson, Krull, Slomiany, & Grossel, 2003; Mahony,
Parry, & Lees, 1998). Immunofluorescence studies revealed that during neurogenesis
CDK6 is expressed in the neuroepithelium of the developing cerebral cortex at
embryonic day E11.5 and also in basal progenitor cells at embryonic days E11.5
and E15.5 (Muhammad S Hussain, et al., 2013). CDK6 is a key regulator that maintains the
balance between proliferative symmetric and neurogenic asymmetric divisions, which is
very important for producing the correct number of neurons (Beukelaers et al., 2011).
The CDK6 mutation perturbs the proliferation of apical neuronal precursor cells and may
disturb the balance between proliferative and neurogenic divisions, ultimately reducing
the number of neurons, which would explain the microcephaly phenotype associated with the MCPH12
gene (Muhammad S Hussain, et al., 2013). In CDK6-mutant fibroblasts, CDK6 was
absent from the centrosome and the mitotic spindle was disorganized (Muhammad S
Hussain, et al., 2013). As with two other MCPH genes, CEP135 and CEP152,
CDK6-mutant fibroblasts also showed abnormalities including misshapen nuclei,
centrosome number anomalies, microtubule organization defects and a reduced
growth rate (Muhammad S Hussain, et al., 2013).
1.3.13 SAS-6 centriolar assembly protein (SASS6)
A homozygous missense mutation in the SASS6 gene has been reported to cause primary
microcephaly in a Pakistani family. The mutation occurred in a highly conserved region of the
PISA motif, situated within the amino-terminal domain (Khan, et al., 2014; van Breugel
et al., 2011). Affected individuals had occipitofrontal circumferences ranging from
6.63 to 19.6 standard deviations below the mean and also had severe mental
retardation (Khan, et al., 2014). SASS6 lies at the MCPH14 locus and encompasses
17 exons covering a genomic region of 49,392 bp at human chromosome 1p21.2.
The SASS6 gene encodes a 657-amino-acid cartwheel protein that is localized at
centrioles and in the cytoplasm (Nakazawa, Hiraki, Kamiya, & Hirono, 2007; Strnad et al.,
2007). SASS6 is characterized by an amino-terminal domain that contains two highly
conserved motifs, a coiled-coil domain, and a carboxyl-terminal domain (van Breugel, et
al., 2011). SASS6 is essential for procentriole formation and centriole duplication
in humans, as depletion of SASS6 blocks centriole duplication and its
overexpression leads to centriole amplification (Arquint & Nigg, 2016; Strnad, et al.,
2007). In addition to SASS6, another MCPH gene, STIL, is also a core component of
centriole duplication (Arquint & Nigg, 2016). Knockdown of DSAS-6, the
ortholog of SASS6 in Drosophila melanogaster, caused a significant reduction in the
number of centrosomes in fly brains (Rodrigues-Martins et al., 2007). Furthermore,
SAS-6 is necessary for the centrosome duplication cycle in Caenorhabditis elegans,
suggesting that the function of SASS6 is evolutionarily conserved from nematodes to humans
(Leidel, Delattre, Cerutti, Baumer, & Gönczy, 2005). SASS6 interacts with the products of two other
MCPH genes, CEP135 and CENPJ, to form a complex that is necessary for centriole
assembly (Lin, et al., 2013). The mutation in human SASS6 partially impairs its function
and has a drastic effect on centriole formation and cell division, ultimately affecting
neurogenesis (Khan, et al., 2014).
1.3.14 Major facilitator superfamily domain containing 2A (MFSD2A)
The MFSD2A gene comprises 14 exons across a 14,816 bp genomic region at
human chromosome 1p34.2. It encodes a 543-amino-acid plasma
membrane protein that transports docosahexaenoic acid (DHA) in the form of
lysophosphatidylcholine across the blood brain barrier (BBB) and is also involved in the
formation of the BBB (Ben-Zvi et al., 2014; Nguyen et al., 2014). MFSD2A
is expressed in various human fetal and adult tissues (cortex, corpus callosum, cerebellum,
pons, spinal cord and liver). However, it is expressed at an elevated level in human
fetal brain, particularly in the endothelial cells of the BBB (Guemez-Gamboa et al.,
2015). Biallelic missense mutations in MFSD2A have been reported to cause microcephaly
with and without lethality in three families from different ethnic groups, including
Libyan, Egyptian, and Pakistani (Alakbarzade et al., 2015; Guemez-Gamboa, et al.,
2015). Brain imaging revealed that, in addition to reduced cortex size, affected
individuals exhibit other anomalies such as brainstem and cerebellar
hypoplasia, cortical surface effacement, and a significant deficiency of posterior white
matter (Alakbarzade, et al., 2015; Guemez-Gamboa, et al., 2015). The mutations do not
affect the expression level or localization of MFSD2A, but they impair its transport
activity and reduce brain uptake of lysophosphatidylcholine, suggesting that this transport is
essential for normal brain development (Guemez-Gamboa, et al., 2015). Congruently,
MFSD2A-knockout mice show DHA deficiency in the brain and exhibit severe
microcephaly with cognitive impairment. Furthermore, 40% of MFSD2A-knockout
mice die early in life (Berger, Charron, & Silver, 2012; Nguyen, et al.,
2014). Danio rerio has two inparalogs of human MFSD2A, mfsd2aa and mfsd2ab. In
situ hybridization analysis revealed that both inparalogs
are expressed throughout the nervous system of zebrafish embryos
(Guemez-Gamboa, et al., 2015). Knockdown analyses showed that these inparalogs have
non-redundant functions in zebrafish, as lethality was observed for each paralog
(Guemez-Gamboa, et al., 2015). Together, these results indicate that MFSD2A is a unique primary
microcephaly gene that provides new insight into human brain evolution
and development.
1.3.15 Citron rho-interacting serine/threonine kinase (CIT)
Biallelic mutations in the CIT gene, located at the MCPH17 locus, cause primary
microcephaly in Egyptian and Saudi families (Basit, et al., 2016; H. Li, et al., 2016;
Shaheen et al., 2016). The brains of affected individuals showed a simplified gyral pattern,
agenesis of the corpus callosum, and a profound lack of white matter (Basit, et al., 2016;
Shaheen, et al., 2016). Lack of CIT has been reported to cause spindle orientation
defects in mammals and insects (Gai et al., 2016). CIT is localized at the central spindle,
co-localizes with the product of another MCPH gene, ASPM, at the midbody, and the two may function
together in neural progenitor division (Paramasivam, Chang, & LoTurco, 2007). The CIT
gene contains 48 exons spanning a genomic region of 191,501 bp at human
chromosome 12q24.23. It encodes the 2069-amino-acid CRIK (citron rho-interacting
kinase) protein, which is characterized by two kinase domains, a cysteine-rich
domain, a pleckstrin homology domain and a carboxyl-terminal citron homology domain. CIT is
expressed in the ventricular zone of the neuroepithelium of the developing neocortex (Di
Cunto et al., 2000). Expression of CIT is also observed in adult tissues including brain,
lungs, kidney and spleen (Di Cunto et al., 1998). CIT has been implicated in
regulating the pattern and progression of cytokinesis during central nervous system
development. The flathead rat model is characterized by reduced brain size with
abnormal cerebral cortex development (Sarkisian, Rattan, D'Mello, & LoTurco, 1999).
A single-nucleotide deletion in the rat CIT gene has been reported as the causative mutation in
the flathead rat; it disrupts cytokinesis in neural progenitor cells and increases
apoptosis (Sarkisian, Li, Di Cunto, D'Mello, & LoTurco, 2002). A congruent phenotype
was found in CIT-knockout mice, which displayed a significant reduction in brain size,
particularly in the hippocampus, olfactory bulb and cerebellum, owing to depletion of
neurons, and died before reaching adulthood (Di Cunto, et al., 2000). The CIT-mutated
phenotype in humans might result from disrupted cytokinesis and neurogenesis
together with an elevated level of apoptosis in neuronal cells.
1.3.16 Kinesin family member 14 (KIF14)
Recently, homozygous and heterozygous mutations in the KIF14
gene were reported in four families with primary microcephaly from different geographic
regions (Pakistan, Germany, and Saudi Arabia) (Moawia, et al., 2017). Another study
reported that biallelic KIF14 mutations cause lethal fetal anomalies of the brain and kidney
(Filges et al., 2014). The KIF14 gene encompasses 30 exons covering a 69,234 bp
genomic region at human chromosome 1q32.1. It encodes a protein of 1648 amino acids
characterized by four domains: an amino-terminal PRC1-binding domain, a kinesin
motor domain, an FHA domain and a CRIK-binding domain (Moawia, et al., 2017). Like the MCPH17 protein
CIT, KIF14 is localized at the central spindle and midbody, and their localizations are
codependent (Gruneberg et al., 2006). KIF14 is expressed in brain and kidney, with
elevated expression during fetal brain development, particularly at embryonic days E12.5-16.5
(Fujikura et al., 2013). Depletion of KIF14 in human cells impaired CIT localization at the
central spindle and midbody, ultimately inducing cytokinesis failure followed by
apoptosis (Gruneberg, et al., 2006). KIF14 was not detected at the midbody in
primary fibroblasts of affected individuals, which would explain the resulting
phenotype. A similar phenotype is observed in KIF14-knockout and laggard (a novel
spontaneous mouse mutant) mice, which showed a reduction in brain size that was most
dramatic in the cortices and olfactory bulb, together with hypomyelination and apoptosis (Fujikura, et
al., 2013).
1.4 The cost of human brain size enlargement
The expansion of the human brain during evolution underlies the higher cognitive capabilities and
complex social interactions that set us apart not only from nonhuman primates
but also from our extinct hominid relatives. However, modern humans pay a cost for this
expansion in the form of neurodegenerative disorders. Neurodegenerative disorders are a group of
chronic disorders characterized by the slow, progressive loss of specific types of neurons in
discrete regions of the brain (Gao & Hong, 2008). Neurodegeneration is projected to become the
world's second leading cause of death by the year 2040, overtaking cancer
(Kontopoulos, Parvin, & Feany, 2006; Siddiqui, Pervaiz, & Abbasi, 2016). The
neurodegenerative disorders include Alzheimer’s disease (AD), Parkinson’s disease,
frontotemporal dementia, amyotrophic lateral sclerosis (ALS), multiple sclerosis (MS),
Huntington’s disease, Lewy body disease (LBD), and multiple system atrophy
(MSA) (Gao & Hong, 2008). Alzheimer’s and Parkinson’s diseases are the
two most prevalent neurodegenerative disorders and are considered to be
restricted to modern humans. It is generally accepted that dramatic brain evolution
during the last 1 million years made humans susceptible to neurodegeneration.
A study by a Swiss scientist consolidated this view by establishing that the brain regions
involved in neurodegenerative disorders evolved recently in modern humans, during the
Pliocene-Pleistocene (Ghika, 2008).
1.5 Parkinson’s disease
Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder after
Alzheimer’s disease, affecting 1–2% of the population above age 65 and 4–5% above age
85 (Bisaglia, Mammi, & Bubacco, 2009). Neuropathologically, it is defined by the
presence of Lewy bodies and Lewy neurites and by the loss of 78% of the dopaminergic neurons of the
substantia nigra pars compacta, which is paralleled by a loss of dopamine in the neostriatum,
a region involved in controlling motor behavior (Siddiqui, et al., 2016).
Parkinson’s disease is clinically defined by cardinal signs including bradykinesia,
resting tremor, rigidity, masked facial expression and postural instability (Lücking &
Brice, 2000). PD is a genetically heterogeneous disease: at least 11 genes, which do not
belong to a common gene family, are reported to cause PD, and many more genes have been
identified as susceptibility loci for PD (Figure 1.6). Alpha synuclein was identified as the first
causative gene implicated in early-onset familial PD.
Figure 1.6: Circular illustration of Parkinson’s disease associated genes on human chromosomes.
Genes that contain causative mutations are shown in blue. The two genes shown in red
contain protein coding risk alleles of moderate effect, while genes shown in black were identified by
genome wide association studies. [Adapted from (Singleton & Hardy, 2016)].
1.5.1 Alpha synuclein
Among the three synucleins, α synuclein has received the most attention because it has
emerged as a central protein in the pathophysiology of both early-onset autosomal
dominant familial and sporadic Parkinson’s disease. Six missense mutations (A30P,
E46K, H50Q, G51D, A53T, and A53E) in the amino-terminal lipid binding domain
and gene multiplications (duplications and triplications of α synuclein) cause early-onset
hereditary Parkinson’s disease (Siddiqui, et al., 2016; Vekrellis, Xilouri,
Emmanouilidou, Rideout, & Stefanis, 2011). α synuclein is a vertebrate-specific
presynaptic protein that is required for synaptic vesicle endocytosis/exocytosis
and also regulates presynaptic architecture and synaptic vesicle distribution (Vargas et
al., 2014; Vargas et al., 2017). The alpha synuclein gene encompasses five exons and encodes a
140-amino-acid, 14.5 kDa protein. It belongs to a family of intrinsically
disordered proteins with three members, α, β, and γ, whose genes reside on the
human FGFR paralogon and map to chromosomes 4, 5 and 10, respectively (Campion et
al., 1995; Lavedan et al., 1998; Spillantini, Divane, & Goedert, 1995). α synuclein is
expressed predominantly in the neocortex, hippocampus, striatum, thalamus, and
cerebellum and is particularly enriched at presynaptic terminals (George, 2002;
Solano, Miller, Augood, Young, & Penney, 2000). Expression of α synuclein is also
observed in various other tissues, including heart, hematopoietic cells, lungs, ocular tissue, cochlea,
and skeletal muscle, indicating that α synuclein has a more general function in addition to its
role in the nervous system (M. Hashimoto & Masliah, 1999; Lücking & Brice, 2000;
Surguchov, McMahan, Masliah, & Surgucheva, 2001).
1.6 Archaic human genomes
The genome sequences of extinct archaic humans (Neandertals and Denisovans) provide
new insight into the concrete events that transpired during human evolution. Archaic
humans (Neandertals and Denisovans) split from modern humans
approximately 550,000-750,000 years ago and went extinct almost 30,000
years ago (Prüfer, et al., 2014). These two archaic humans are considered to be the
closest extinct relatives of modern humans. Svante Pääbo’s group at the Max Planck
Institute for Evolutionary Anthropology sequenced the complete genomes of the two
archaic humans, the Neandertals and Denisovans, with 50× and 30× sequence
coverage, respectively (Prüfer, et al., 2014; Reich et al., 2010). Both archaic humans
contributed to the ancestry of modern human populations. As mentioned above, both
Neandertals and modern humans have comparably large brains. Therefore,
identifying hominin specific (Neandertals, Denisovans and modern humans) and modern
human specific changes through comparative genomics makes it possible to decode,
in unprecedented detail, the genetic basis of the evolutionary enlargement of brain size and
of the increased susceptibility to neurodegenerative disorders.
1.7 Aims & approach of study
There is considerable interest in ascertaining the genetic basis of the key differences that are
responsible for the evolutionary expansion of human brain size in comparison to
nonhuman Hominidae. Evolutionary forces shape species genomes and ultimately
influence the variation in phenotypic traits between species. Identifying
genes that exhibit signatures of natural selection may therefore
advance our understanding of the individuality of Homo sapiens. The human primary
microcephaly genes are compelling candidates for understanding the evolutionary
enlargement of the human brain, because mutations in these MCPH genes
cause a drastic reduction of the human cerebral cortex to a size approximately similar to
that of nonhuman Hominidae. Along the way, this study also tries to elucidate why only
modern humans have Parkinson’s disease. The approach comprised the following steps.
I. Paralogs and orthologs of the MCPH candidate genes were identified across a long
evolutionary distance, from subphylum Vertebrata to phylum Porifera, and
phylogenetic trees were reconstructed in order to identify how deeply the genes of interest
are rooted.
II. The strength and direction of natural selection acting on the candidate MCPH genes
were tested in different datasets of the established eutherian phylogeny using codon
based substitution models. Natural selection was also tested within modern
humans through frequency based methods for those candidate genes that have
an accelerated rate of evolution on the human branch compared with our closest extant
relatives, Pan.
III. Modern human specific residues were identified by combining archaic
human (Neandertal and Denisovan) data with placental mammal data through
comparative analysis.
IV. Interspecies sequence differences that arose during evolution can ultimately
yield clade specific functions along with altered selective constraints between
clades. Functional divergence was therefore estimated for the orthologous
sequences between different partitions of the eutherian phylogeny, particularly
with respect to the evolutionary expansion of brain size in primates.
V. The evolutionary advantage of an enlarged brain in modern humans came at
the price of susceptibility to Parkinson’s disease. In order to understand why
hereditary Parkinsonism is specific to humans, molecular evolutionary
analysis was performed on the Parkinson’s disease associated alpha synuclein
gene.
The overall objective of this study is to attempt to find the evolutionary genetic and
molecular basis underlying two complex human phenotypic traits, the evolutionary
expansion of human brain size and the human specific neurodegenerative disorder
Parkinson’s disease, using an evolutionary comparative genomics approach.
Materials and Methods
2.1 Dataset for genes linked with autosomal recessive primary
microcephaly
Genes from 10 microcephaly (MCPH) loci were included in this analysis. The
chromosomal locations and the amino acid and coding sequences of these 10 loci in human
were obtained from the Ensembl genome browser (http://www.ensembl.org) (Hubbard et
al., 2002). The closest putative paralogs of these 10 MCPH genes in human were
obtained both from the Ensembl paralogy prediction approach and by a similarity search
approach. The similarity search was carried out by performing reciprocal
BLASTP searches of the gene of interest against the human protein databases available at the
National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov)
(Altschul, Gish, Miller, Myers, & Lipman, 1990; Pruitt, Tatusova, & Maglott, 2007).
To identify true paralogs of these 10 human MCPH protein coding genes,
phylogenetic analyses were carried out. The orthologous amino acid and coding
sequences of these MCPH genes in other metazoan species were retrieved from
Ensembl through reciprocal BLAST/BLAT searches. Because genomic information for
many metazoan species was not available in the Ensembl genome browser, the
orthologous sequences of those species were collected from sequence databases
available at NCBI (http://www.ncbi.nlm.nih.gov) using a bidirectional best BLAST hit
strategy (Altschul, et al., 1990; Pruitt, et al., 2007).
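The bidirectional (reciprocal) best hit criterion described above can be illustrated with a minimal Python sketch. It assumes that the top-scoring hit of every search has already been extracted into two lookup tables; the function name and the sequence identifiers are hypothetical and are not taken from the pipeline used in this study.

```python
def reciprocal_best_hits(forward, reverse):
    """Return (query, subject) pairs that are each other's best hit.

    forward: dict mapping each query ID to its best-scoring subject ID
             (e.g. a human protein -> its top hit in another species).
    reverse: dict mapping each subject ID to its best-scoring hit when
             searched back against the query proteome.
    """
    pairs = []
    for query, subject in forward.items():
        # Keep the pair only if the reverse search recovers the query.
        if reverse.get(subject) == query:
            pairs.append((query, subject))
    return sorted(pairs)


# Hypothetical best-hit tables; identifiers are illustrative only.
forward_hits = {"human_STIL": "fish_stil", "human_CDK6": "fish_cdk4"}
reverse_hits = {"fish_stil": "human_STIL", "fish_cdk4": "human_CDK4"}

rbh = reciprocal_best_hits(forward_hits, reverse_hits)
print(rbh)
```

In this toy input only the STIL pair satisfies the criterion, because the reverse search from the second subject returns a different human gene.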
2.2 Sequence Alignment
Multiple sequence alignments are important in comparative and evolutionary genomics
studies. They enable phylogenetic tree estimation, estimation of duplication timing,
ancestral sequence reconstruction, structure prediction, natural selection analyses, and
identification of critical residues. CLUSTAL W and MUSCLE are two widely used
algorithms for aligning homologous nucleotide and amino acid sequences (Edgar,
2004; Thompson, Higgins, & Gibson, 1994). CLUSTAL W multiple sequence
alignment is based on a progressive method in which the most similar sequences are
aligned first and progressively more distantly related sequences are then added to the
alignment. For phylogenetic analysis of each MCPH gene family (WDR62, STIL,
ZNF335, SASS6, PHC1, MFSD2A, CEP135, CDK6, CIT, and KIF14), amino acid
sequences were aligned using CLUSTAL W and MUSCLE with default parameters
(Edgar, 2004; Thompson, et al., 1994).
Natural selection analyses require multiple sequence alignments generated
by a phylogeny aware alignment algorithm. PRANK is a probabilistic multiple sequence
(DNA, codon, and amino acid) alignment program that provides more evolutionarily accurate
alignments than other alignment methods (Löytynoja & Goldman, 2008).
PRANK treats insertions and deletions as distinct evolutionary events and
introduces indels instead of aligning overly divergent sequences, which reduces the number of
false positives in evolutionary analyses (Fletcher & Yang, 2010; Löytynoja, 2014). The
orthologous coding sequences retrieved from placental mammals for each MCPH
protein coding gene (STIL, ZNF335, SASS6, PHC1, MFSD2A, CEP135, CDK6, CIT,
and KIF14) were aligned by PRANK with default parameters for the empirical codon
model, using eutherian phylogenetic information as a guide (Löytynoja & Goldman,
2008). The mammalian orthologous coding sequences of each MCPH gene were also
aligned by MUSCLE based on a codon model with default parameters for the
reconstruction of a mammalian phylogenetic tree for each gene (Edgar, 2004).
2.3 Phylogenetic tree reconstruction methods
2.3.1 Phylogenetic analysis by Neighbor Joining (NJ) method
Phylogenies for each MCPH gene family and the synuclein family were constructed to
understand the depth and evolutionary histories of MCPH genes. The phylogenetic
trees for each MCPH gene family and the synuclein family were reconstructed with the
NJ method (Saitou & Nei, 1987) from the amino acid sequences of representative members of
Vertebrata, Urochordata, Cephalochordata, Hemichordata, Echinodermata, Arthropoda, Mollusca,
Annelida, Cnidaria, Placozoa and Porifera. The NJ
method reconstructs a tree from a distance matrix that contains the pairwise evolutionary
distances between the sequences in a group. The uncorrected proportion (p)
distance, Poisson correction and Jones, Taylor, and Thornton (JTT) methods were used
as amino acid substitution models to calculate evolutionary distances between
sequences (Jones, Taylor, & Thornton, 1992; Zuckerkandl & Pauling, 1965). All
positions containing gaps and missing data were removed with the
complete deletion option. The topological reliability of each NJ tree was evaluated
by the bootstrap method, which produces a bootstrap score for each
branch of the tree on the basis of 500-1000
pseudoreplicates (Felsenstein, 1985).
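As an illustration of two of the distance measures named above, the following Python sketch computes the uncorrected p distance and its Poisson correction, d = -ln(1 - p), after complete deletion of gapped columns. It is a simplified sketch and not the implementation used by the phylogenetic software in this study; the function names and the toy alignment are assumptions, all sequences are assumed to be pre-aligned to equal length, and only "-" and "?" are treated as gaps or missing data.

```python
import math


def complete_deletion(seqs):
    """Drop every alignment column with a gap or missing residue in any sequence."""
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in "-?" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance(a, b):
    """Proportion of aligned sites at which the two sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p); undefined when p = 1."""
    return -math.log(1.0 - p_distance(a, b))


# Toy amino acid alignment (illustrative only).
alignment = ["MKT-LLA", "MKTALLV", "MRT-LIV"]
clean = complete_deletion(alignment)

p_02 = p_distance(clean[0], clean[2])
d_02 = poisson_distance(clean[0], clean[2])
print(clean, p_02, round(d_02, 4))
```

After the gapped column is removed, the first and third sequences differ at three of six sites, so p = 0.5 and the Poisson-corrected distance is ln 2, slightly larger than p because it allows for multiple substitutions at the same site.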
2.3.2 Phylogenetic analysis by Maximum likelihood (ML) method
Phylogenetic trees for each MCPH gene family and the synuclein family were also
reconstructed by the character based Maximum Likelihood (ML) method.
The Whelan and Goldman (WAG) model was used as the amino acid substitution model
(Whelan & Goldman, 2001). The phylogenetic trees with the highest log likelihood
scores were selected as the final trees. For ML, initial trees were generated automatically
using the Neighbor Joining and BioNJ methods based on a matrix of pairwise distances
calculated with the Jones, Taylor, and Thornton (JTT) model (Jones, et al., 1992).
Alignment columns containing missing data and gaps were removed with the
complete deletion option. The topological reliability of each ML tree was
evaluated by the bootstrap method based on 500-1000 pseudoreplicates
(Felsenstein, 1985).
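The bootstrap procedure used for both the NJ and ML trees resamples alignment columns with replacement to build each pseudoreplicate. The sketch below, in Python, shows only this resampling step and how a support value is obtained as the fraction of pseudoreplicates that recover a grouping. The grouping test here is a deliberately simple stand-in (a pairwise mismatch comparison) for rebuilding a full tree on each pseudoreplicate; all function names and the toy alignment are assumptions.

```python
import random


def resample_columns(seqs, rng):
    """Build one pseudoreplicate by sampling alignment columns with replacement."""
    n = len(seqs[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(s[i] for i in cols) for s in seqs]


def bootstrap_support(seqs, supports_clade, n_reps=500, seed=1):
    """Fraction of pseudoreplicates for which supports_clade(replicate) is True."""
    rng = random.Random(seed)
    hits = sum(bool(supports_clade(resample_columns(seqs, rng)))
               for _ in range(n_reps))
    return hits / n_reps


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def groups_first_two(rep):
    """Stand-in criterion: sequences 0 and 1 are closer to each other than to 2."""
    return mismatches(rep[0], rep[1]) < min(mismatches(rep[0], rep[2]),
                                            mismatches(rep[1], rep[2]))


toy = ["MKTLLA", "MKTLLV", "MRTLIV"]  # illustrative alignment only
support = bootstrap_support(toy, groups_first_two, n_reps=500)
print(support)
```

In a real analysis the stand-in criterion would be replaced by tree reconstruction on every pseudoreplicate, and the support of a branch would be the proportion of replicate trees that contain it.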
2.4 Ancestral state reconstruction
Ancestral sequence reconstruction (ASR) of proteins provides an understanding of
how natural selection shaped sequences and their functions, ultimately changing
phenotypic traits during evolution (Groussin et al., 2014). ASR methods
reconstruct the amino acid sequences of extinct ancestors at the internal
nodes of a phylogeny from the sequences and the phylogeny of extant species.
The Maximum Likelihood (ML) method implemented in MEGA was used to reconstruct
ancestral sequences at the internal nodes of the phylogenetic trees of the proteins of interest, based
on the WAG amino acid substitution model (Tamura et al., 2011; Whelan & Goldman, 2001;
Z. Yang, Kumar, & Nei, 1995). To minimize errors associated with ASR analyses,
ancestral sequences were also inferred with the PRANK program, which treats
insertions and deletions as distinct evolutionary events (Löytynoja & Goldman, 2008).
2.5 Analysis of molecular macroevolution
2.5.1 Estimation of selective pressure on MCPH protein coding genes
Estimating the numbers of nonsynonymous (dN/Ka) and synonymous (dS/Ks)
substitutions provides direct insight into the mechanism of molecular sequence evolution
(Z. Yang, Nielsen, Goldman, & Pedersen, 2000; Z. Yang & Swanson, 2002). The selective
pressure (ω = dN/dS) acting on the coding sequence of a gene is the rate ratio of
nonsynonymous to synonymous substitutions (Anisimova & Kosiol, 2008; R. Nielsen
& Yang, 1998). The ω ratio specifies the direction and strength of natural selection
operating on a protein coding gene: 0 < ω < 1 indicates negative selection (synonymous
substitutions accumulate faster in the protein coding sequence than nonsynonymous
substitutions), ω = 1 is consistent with neutral evolution (nonsynonymous and
synonymous substitutions accumulate at equal rates), and ω > 1 represents positive
Darwinian selection (nonsynonymous substitutions accumulate faster than synonymous
substitutions). Negative selection operates on a sequence when an existing
function or phenotype is evolutionarily favorable or essential for a particular trait. It is
generally accepted that positive selection favors an adaptive function or phenotype.
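The three regimes of ω can be written as a small helper. This is only a sketch of the interpretation rule stated above; the function name and the tolerance are assumptions, and a point estimate of ω classified this way is not a statistical test of selection.

```python
def classify_omega(omega, tol=1e-9):
    """Map a dN/dS estimate to the selection regime it is consistent with."""
    if omega < 0:
        raise ValueError("omega is a rate ratio and cannot be negative")
    if abs(omega - 1.0) <= tol:
        return "neutral evolution"
    return "positive selection" if omega > 1.0 else "negative selection"

# Ka/Ks values reported later in this thesis for the human and chimpanzee
# terminal branches of WDR62
human_regime = classify_omega(1.31)
chimpanzee_regime = classify_omega(0.844)
```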
The selective pressure ω was measured for nine MCPH genes using maximum likelihood
based codon substitution models implemented in the CodeML program of the PAML4.7
software (Wong, Yang, Goldman, & Nielsen, 2004; Z. Yang, 2007; Z. Yang, et al.,
2000). WDR62 was excluded from the codon based maximum likelihood analyses because
these methods had already been applied to its coding sequence. Several analyses were
performed to check whether positive selection was acting on the coding sequences of
autosomal recessive primary microcephaly genes with respect to brain size evolution.
Coding sequences from 48 placental mammalian species (20 primates and 28 nonprimate
placental mammals) provided sufficient genomic coverage to perform these analyses
(Figure 2.1). The ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates
for WDR62 in mammals was calculated using the Pamilo–Bianchi–Li method in
MEGA5.05 (W. H. Li, 1993).
Figure 2.1: Phylogenetic tree of 48 placental mammal genomes.
The tree shows the 48 species used in this study. The coding sequences of these species for MCPH loci
were retrieved from the NCBI and Ensembl databases.
2.5.2 Codon substitutions site models
Five codon substitution site models (M1, M2, M7, M8, and M8a) based on the maximum
likelihood method were run in the CodeML program of the PAML4.7 software to
detect positive Darwinian selection in nine MCPH genes across three datasets, i.e.,
primates, nonprimate placental mammals and placental mammals (Wong, et al., 2004;
Z. Yang, 2007; Z. Yang, et al., 2000). Codon substitution site models permit the selective
pressure ω to vary across the codon sites of a protein coding gene but do not allow ω to
vary across lineages. Patterns of selection in nine genes (STIL, ZNF335, SASS6,
PHC1, MFSD2A, CEP135, CDK6, CIT, and KIF14) were investigated in the three
datasets mentioned above by submitting a well-accepted phylogeny and an alignment
(generated by the PRANK program) of the respective dataset to CodeML. Likelihood
ratio tests (LRTs) were calculated for three pairs of site models from the log likelihood
scores of the five codon substitution site models M1, M2, M7, M8, and M8a to test for
signs of positive selection. The first pair compares the null model M1 (nearly neutral
model that assumes two classes of sites with ω = 1 and ω < 1) with the alternative
model M2 (positive selection model that assumes an additional third class of sites with ω
> 1) (Wong, et al., 2004; Z. Yang, Wong, & Nielsen, 2005). The second pair compares the
null model M7 (beta) with the alternative model M8 (beta, and ω2 > 1), and the third pair
compares the null model M8a (beta and ω2 = 1) with the alternative model M8
(beta, and ω2 > 1) (Swanson, Nielsen, & Yang, 2003; Z. Yang & Swanson, 2002). The
LRT values for the three pairs of site models were calculated as follows:
LRT = 2(log likelihood score of alternative model – log likelihood score of null model)
The significance of these tests was determined by calculating p values from the LRT values
using the Chi-square program of the PAMLX 1.2 package (B. Xu & Yang, 2013). For this
study, positive selection was inferred only if at least two of the three site model pairs
significantly rejected the null model in favor of the alternative model. The Naïve Empirical
Bayes (NEB) and Bayes Empirical Bayes (BEB) methods implemented in the M8 codon
substitution site model were used to identify positively selected sites by estimating the
posterior probabilities for site classes (Z. Yang, et al., 2005).
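The LRT and the two-out-of-three decision rule can be sketched as follows, using the log likelihood values reported for primate STIL in Table 3.2. This is not the PAMLX Chi-square program. The degrees of freedom (2 for M1/M2 and M7/M8, 1 for M8a/M8) and the 0.05 threshold are assumptions made here; with them the sketch gives p values close to those listed in the table. Closed-form chi-square tail probabilities are used, so only 1 and 2 degrees of freedom are supported.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """LRT = 2 * (lnL of alternative model - lnL of null model)."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

# (null model, alternative model, assumed degrees of freedom)
COMPARISONS = [("M1", "M2", 2), ("M7", "M8", 2), ("M8a", "M8", 1)]

def site_model_pvalues(lnl):
    """Return the p value of each null/alternative comparison."""
    return {f"{null}/{alt}": chi2_sf(lrt_statistic(lnl[null], lnl[alt]), df)
            for null, alt, df in COMPARISONS}

def positive_selection(pvalues, alpha=0.05):
    """Decision rule of this study: at least two of three tests significant."""
    return sum(p < alpha for p in pvalues.values()) >= 2

# Log likelihood values for primate STIL taken from Table 3.2
PRIMATE_STIL = {"M1": -10916.74197, "M2": -10916.66704, "M7": -10917.05514,
                "M8": -10916.45545, "M8a": -10916.974231}
primate_p = site_model_pvalues(PRIMATE_STIL)
primate_call = positive_selection(primate_p)
```

For these values none of the three comparisons is significant, in line with the primate result described in section 3.3.2.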
2.5.3 Codon substitutions Branch-site model
Positive selection acting on a protein coding gene is often transient, operating for a short
period of time and affecting only a fraction of sites. The site models described above are
unable to detect this type of transient, episodic positive selection. The branch-site approach
of Zhang, Nielsen and Yang implemented in CodeML was used to test for signatures of
episodic positive selection restricted to specific lineages (Jianzhi Zhang,
Nielsen, & Yang, 2005). This model allows ω to vary both across branches and across sites.
In the branch-site model, the phylogeny is divided into prespecified
foreground branch (ω2 >= 1, a proportion of sites may be under positive selection) and
background branches (where a proportion of sites experienced either purifying selection
or neutral evolution, 0 < ω2 <= 1). Positive selection was inferred by calculating the
LRT between this branch-site model and the null model (identical to the branch-site
model but with ω2 = 1 fixed for the foreground branch) using the LRT formula given above
(Jianzhi Zhang, et al., 2005). A eutherian multiple sequence alignment and a well-accepted
phylogeny were used as input for the detection of episodic selection at different
evolutionary time points from the primate ancestral branch to the human terminal branch.
The branch-site test was performed specifically in relation to prefrontal forebrain size
evolution.
2.5.4 Clade model C (CmC) analyses
An adaptive function does not evolve only when positive selection operates on a
protein coding gene. A difference in selective pressure between the clades of a
phylogeny can also be responsible for the adaptive evolution of a particular phenotype
or trait. In the context of brain evolution, the cerebral cortex tended to start expanding
in the ancestor of primates. To determine the pattern of divergence in selective
constraint in nine MCPH genes across the mammalian phylogeny, the clade model C
(CmC) approach implemented in CodeML was used (Bielawski & Yang, 2004). Clade
model C assumes that a proportion of sites has evolved under divergent selective
pressure, but not necessarily under positive selection, in two or more partitions of the
phylogeny defined a priori. The LRT was conducted by comparing the CmC model with
the null model M2a-rel using the same formula given for the site model analyses
(Weadick & Chang, 2011). Both the alternative CmC model and the null M2a-rel model
have three classes of sites, the first two with ω < 1 (negative selection) and ω = 1
(neutral evolution). The third site class in M2a-rel has a single ω ratio (ω2 > 0, which
allows sites to evolve adaptively) that is shared among all clades of the phylogeny,
whereas the third site class in CmC has one ω ratio per partition of the phylogeny (for
example, if the phylogeny is divided into two partitions, the third class contains two ω
ratios, ω2 > 0 and ω3 > 0, one for each partition) (Bielawski & Yang, 2004; Weadick &
Chang, 2011). A mammalian multiple sequence alignment and a well-established
phylogeny were used as input for the detection of divergent selective pressure between
primates and nonprimate eutherians,
simians and nonsimian eutherians, catarrhini and noncatarrhini eutherians, hominidae
and nonhominidae eutherians, and hominini and nonhominini eutherians.
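The model pairs described in sections 2.5.2 to 2.5.4 are selected in CodeML through a few control-file options. The sketch below lists the combinations as we understand them from the PAML documentation (`model`, `NSsites`, `fix_omega`, `omega`) and renders a partial control file. It is an illustration, not the control files used in this study: the file names are placeholders, the initial `omega = 0.5` is an arbitrary starting value, and options such as codon frequency model are omitted. The branch-site and clade models additionally need the foreground branch or the clades to be labelled in the tree file.

```python
# Option values that select each model in codeml (assumed from PAML docs).
CODEML_SETTINGS = {
    "M1": {"model": 0, "NSsites": 1},
    "M2": {"model": 0, "NSsites": 2},
    "M7": {"model": 0, "NSsites": 7},
    "M8": {"model": 0, "NSsites": 8},
    "M8a": {"model": 0, "NSsites": 8, "fix_omega": 1, "omega": 1},
    "branch-site": {"model": 2, "NSsites": 2, "fix_omega": 0},
    "branch-site null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
    "CmC": {"model": 3, "NSsites": 2},
    "M2a_rel": {"model": 0, "NSsites": 22},
}

def render_ctl(model_name, seqfile, treefile, outfile):
    """Return partial control-file text with the options that pick the model."""
    options = {"seqfile": seqfile, "treefile": treefile, "outfile": outfile,
               "seqtype": 1,      # 1 = codon sequences
               "fix_omega": 0,    # estimate omega unless the model fixes it
               "omega": 0.5}      # arbitrary initial value (assumption)
    options.update(CODEML_SETTINGS[model_name])
    return "\n".join(f"{key} = {value}" for key, value in options.items())

# Placeholder file names, for illustration only
null_ctl = render_ctl("branch-site null", "aln.phy", "tree.nwk", "null.out")
```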
2.6 Statistical Analysis
P values for the site, branch-site and CmC analyses were calculated from the LRT values
with the Chi-square program of the PAMLX 1.2 package (B. Xu & Yang, 2013). The P values
of all codon based maximum likelihood methods were corrected for false discovery rate
using the qvalue package in R3.5.0 (Storey & Tibshirani, 2003; Team, 2018).
The bootstrap method was used for π0 estimation, with fdr.level = 0.05 specified in the
qvalue package (Storey, Taylor, & Siegmund, 2004).
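The idea behind false discovery rate correction can be illustrated with the Benjamini-Hochberg step-up adjustment. This is a deliberately simpler technique than the one used here: it corresponds to a q value computed with π0 fixed at 1 and does not reproduce the bootstrap π0 estimation of the qvalue package, so its output will generally differ from the q values reported in this thesis. The input p values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

hypothetical_p = [0.01, 0.04, 0.03, 0.5]
adjusted_p = benjamini_hochberg(hypothetical_p)
significant = [value <= 0.05 for value in adjusted_p]
```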
The coding sequences of two MCPH genes (WDR62 and STIL) contain a higher number of
nonsynonymous than synonymous substitutions in modern humans as compared to
chimpanzee, and these sequences are accelerated on the modern human terminal branch,
although not at a significant level. To detect the accelerated segments, a sliding window
analysis of the Ka/Ks ratio was performed on the human and chimpanzee orthologous
coding sequences of WDR62 and STIL. Ka-Ks was computed with a sliding increment of
10 codons (30 nucleotides), and the results were obtained as a graph drawn by the
GNUPLOT software used in SWAKK (Liang, Zhou, & Landweber, 2006). The
non-synonymous substitutions within positively selected segments (Ka/Ks>1) were
categorized according to their physicochemical properties using BLOSUM62
(J. Zhang, 2000). Further tests were conducted to identify whether a signature of
positive selection is present in modern humans.
2.7 Detecting selection at microevolutionary level
2.7.1 Sequence acquisition of human population data
Variation data of 1092 individuals from fourteen different modern human populations
(CHB:97; CHS:100; JPT:89; FIN:93; GBR:89; TSI:98; IBS:14; CEU:85; CLM:60;
MXL:66; PUR:55; ASW:61; LWK:97 and YRI:88) for the WDR62 and STIL protein
coding genes were obtained from 1000 Genomes Project phase 1 data in variant call
format (VCF) (www.1000genomes.org) (Abecasis et al., 2012). Polymorphisms
among populations and allele frequencies were calculated manually from the VCF files.
The topology of the modern human population tree was depicted in accordance with
previously described data (McEvoy, Powell, Goddard, & Visscher, 2011). For
completeness, the HapMap and CEPH databases were scanned for derived allele
frequencies using the SPSmart webserver (Amigo, Salas, Phillips, & Carracedo, 2008).
The vcf-consensus Perl script from VCFtools was used on Linux to obtain the nucleotide
sequences of the two MCPH genes (WDR62 and STIL) for the 1092 individuals
from their respective VCF files. Protein coding sequences
of WDR62 and STIL in the 1092 individuals were predicted using the similarity based
programs FGENESH+ and FGENESH_C implemented in the desktop MolQuest
Bioinformatics toolbox.
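The per-population allele frequency calculation from VCF genotypes described in this section can be sketched as below. The record, sample names and population assignments are hypothetical, and the sketch counts the alternate (non-reference) allele of a biallelic site; whether that allele is the derived one has to be decided separately by comparison with an outgroup or ancestral sequence.

```python
from collections import defaultdict

def alt_allele_counts(genotype_field):
    """Return (alternate allele count, called allele count) for one sample."""
    genotype = genotype_field.split(":")[0]          # GT is the first subfield
    alleles = genotype.replace("|", "/").split("/")  # phased or unphased
    called = [allele for allele in alleles if allele != "."]
    return sum(allele != "0" for allele in called), len(called)

def population_frequencies(vcf_line, sample_names, sample_to_pop):
    """Alternate allele frequency per population for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    counts = defaultdict(lambda: [0, 0])
    for name, genotype_field in zip(sample_names, fields[9:]):
        alt, called = alt_allele_counts(genotype_field)
        counts[sample_to_pop[name]][0] += alt
        counts[sample_to_pop[name]][1] += called
    return {pop: alt / called for pop, (alt, called) in counts.items() if called}

# Hypothetical samples, populations and record
SAMPLES = ["s1", "s2", "s3", "s4"]
POPULATION = {"s1": "YRI", "s2": "YRI", "s3": "CEU", "s4": "CEU"}
RECORD = "19\t100\trs_hypothetical\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1\t0|1"
frequencies = population_frequencies(RECORD, SAMPLES, POPULATION)
```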
2.7.2 Frequency spectrum based method for natural selection
Positive selection causes an advantageous allele to fix rapidly within a population. To
examine whether the observed patterns of variability in the WDR62 and STIL coding
sequences within human populations are consistent with the neutral model, the classical
neutrality tests Tajima's D and Fu and Li's D, D*, F and F* were performed on the panel
of validated coding SNPs present in the 1092 individuals of the 1000 Genomes Project
phase 1 (Fu & Li, 1993; Tajima, 1989). These classical neutrality tests were executed
using DnaSP 5.10 (Librado & Rozas, 2009).
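Tajima's D contrasts two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites scaled by a sample-size constant (Watterson's estimate). A minimal sketch of the statistic, following the formulas of Tajima (1989), is given below. It is not the DnaSP implementation, works on raw counts rather than per-site values, ignores gaps and missing data, and uses hypothetical toy haplotypes.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Tajima's D for equal-length aligned haplotypes without missing data."""
    n = len(sequences)
    length = len(sequences[0])
    segregating = sum(1 for i in range(length)
                      if len({seq[i] for seq in sequences}) > 1)
    if segregating == 0:
        raise ValueError("D is undefined without segregating sites")
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * segregating + e2 * segregating * (segregating - 1)
    return (pi - segregating / a1) / math.sqrt(variance)

# Hypothetical haplotypes: only singletons (excess of rare variants)
rare_variants = ["AAAA", "AAAT", "AATA", "ATAA"]
# Hypothetical haplotypes: variants at intermediate frequency
common_variants = ["AA", "AA", "TT", "TT"]
d_rare = tajimas_d(rare_variants)
d_common = tajimas_d(common_variants)
```

An excess of rare variants gives a negative D, while variants at intermediate frequency give a positive D.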
2.8 Molecular evolution of synuclein genes
2.8.1 Sequence and structure analysis of synuclein genes
The Single Likelihood Ancestor Counting (SLAC) method implemented in HyPhy was
used to detect non-neutral evolution acting on each codon in the vertebrate alignment of α
synuclein by ASR and computing substitutions (Goldman & Yang, 1994). The
evolutionary changes that occurred during vertebrate evolution with Ka/Ks < 1 were
categorized as neutral or radical according to their physicochemical
properties (Betts & Russell, 2003; Grantham, 1974).
Domains and motifs were assigned to human α synuclein as described previously
(Du et al., 2003; Uverskya & Finka, 2002). ClustalW2 based multiple sequence
alignments were used to map the putative positions of these domains and motifs in
the putative paralogs of human α synuclein protein (Thompson, Higgins, & Gibson,
1994).
The NMR structure of human α synuclein (1XQ8) was obtained from the Protein Data Bank
(PDB) (Ulmer, Bax, Cole, & Nussbaum, 2005). Structures of β and γ synuclein,
together with the ancestral protein structures of α synuclein from sarcopterygians to
placental and nonprimate placental mammals, were modelled with Modeller (Webb &
Sali, 2014). The quality of the modelled structures was assessed with Ramachandran
plots (Sheik, Sundararajan, Hussain, & Sekar, 2002). Superimposition of the modelled
structures on 1XQ8 was carried out with Chimera, and root mean square deviation
(RMSD) values were calculated (Pettersen et al., 2004). To inspect the structural
deviations caused by the human specific mutations involved in Parkinson's disease,
i.e. A30P, E46K, H50Q, G51D, and A53T, mutant models were also generated with
Modeller (Webb & Sali, 2014).
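The RMSD reported after superimposition is the square root of the mean squared distance between paired atoms. The sketch below assumes that the two structures are already superimposed and that the coordinate lists are paired atom by atom (for example Cα atoms); it does not perform the fitting step, and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in angstroms; the model is shifted by 1 along x
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
deviation = rmsd(reference, model)
```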
2.8.2 Estimation of functional divergence among synuclein genes
Gene duplications play a pivotal role in the functional diversification of proteins, which is
ultimately responsible for the adaptive evolution of a specific phenotype or trait.
Functional divergence among synuclein paralogs was detected using clade model D
implemented in the CodeML program of the PAML4.7 software (Bielawski & Yang, 2004; Z.
Yang, 2007). Clade model D assumes that a proportion of sites has evolved under
divergent selective pressure, but not necessarily under positive selection, between the
paralogs (Bielawski & Yang, 2004). The significance of functional divergence among
paralogs was checked by conducting a likelihood ratio test (LRT) of the null codon
substitution site model M3 (which assumes variation in selective pressure among sites but
not across branches or paralogs) against the alternative clade model D (Bielawski
& Yang, 2004; Z. Yang, et al., 2000). Functional divergence among synuclein paralogs
was also examined using the DIVERGE (DetectIng Variability in Evolutionary Rates
among Genes) software, which first detects site-specific evolutionary rate shifts among
paralogs and then predicts the amino acid residues responsible for functional
divergence based on posterior probability (Gu & Vander Velden, 2002).
2.8.3 Identification of coevolutionary relationship among residues within gene
Mutual Information Server To Infer Coevolution (MISTIC) web server was used to
infer coevolutionary relationship among amino acid residues within protein for all
synuclein paralogs (Simonetti, Teppa, Chernomoretz, Nielsen, & Marino Buslje,
2013). MISTIC identifies coevolutionary relationships between residues based on the mutual
information (MI), proximity mutual information (pMI) and cumulative mutual
information (cMI) scores for individual residues (Buslje, Santos, Delfino, & Nielsen,
2009; Buslje, Teppa, Di Doménico, Delfino, & Nielsen, 2010). The coevolutionary
relationships among the residues within each protein were visualized using Circos. The
vertebrate orthologous multiple sequence alignments of synuclein proteins were
submitted as input to MISTIC for the identification of coevolutionary relationships
between residues.
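The quantity underlying these scores, the mutual information between two alignment columns, can be computed as in the sketch below. It is the raw, uncorrected MI only; a dedicated server applies further corrections and derived scores that are not reproduced here, and the example columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(column_x, column_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    n = len(column_x)
    freq_x = Counter(column_x)
    freq_y = Counter(column_y)
    freq_xy = Counter(zip(column_x, column_y))
    total = 0.0
    for (a, b), count in freq_xy.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / ((freq_x[a] / n) * (freq_y[b] / n)))
    return total

# Hypothetical columns: residues that change together across four sequences
covarying = mutual_information("AAGG", "LLVV")
# Hypothetical columns: residues that vary independently of each other
independent = mutual_information("AAGG", "LVLV")
```

Perfectly covarying columns with two equally frequent states share one bit of information, whereas independent columns share none.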
Chapter 3 Results
Results
3.1 Identification of candidate genes
Concentrating on specific candidate loci makes extensive phylogenetic and evolutionary
analysis possible. Candidate genes were selected through an extensive literature survey,
focusing only on genes that meet two conditions: first, the genes are involved in early
brain development, and second, impairment of the coding sequence of those genes
causes brain associated anomalies. Primary microcephaly genes are excellent candidates
for understanding the evolutionary expansion of brain size, as primary microcephaly
patients have a reduced brain size similar to that of early hominids (Woods, et al.,
2005). In humans, primary microcephaly is inherited in a recessive mode. The initially
identified genes, ASPM and MCPH1, have been shown to exhibit accelerated evolution
in the lineage leading to humans (Evans et al., 2004; Wang & Su, 2004). In this study,
ten newly identified genes, WDR62, STIL, CEP135, ZNF335, PHC1, CDK6, SASS6,
MFSD2A, CIT and KIF14, were considered as candidate genes for evolutionary analysis
(Awad, et al., 2013; Basit, et al., 2016; Muhammad
Sajid Hussain, et al., 2012; Muhammad S Hussain, et al., 2013; Khan, et al., 2014;
Kumar, et al., 2009; H. Li, et al., 2016; Moawia, et al., 2017; Adeline K Nicholas, et
al., 2010; Shaheen, et al., 2016).
3.2 WD repeat domain 62 (WDR62)
3.2.1 Evolutionary history of MCPH2 gene WDR62
A phylogenetic tree for WDR62 and its putative paralogs was reconstructed using the
neighbor joining (NJ) method in order to identify the origin of, and the evolutionary
relationship between, WDR62 and its paralogs (Figure 3.1). The NJ tree was
reconstructed with amino acid sequences from representative members of the phyla
porifera, arthropoda, hemichordata, cephalochordata, and vertebrata. The phylogenetic
tree revealed that two duplication events were responsible for the expansion of this family
(Figure 3.1). The first duplication event occurred during early metazoan history,
before the bilaterian-nonbilaterian divergence, and produced the most ancient member of
this family, the WDR16 gene, and the WDR62/MAPKBP1 ancestral gene (Figure 3.1).
The second duplication event separated WDR62 and MAPKBP1 and occurred prior to the
actinopterygii-sarcopterygii split and after the divergence of vertebrates from
cephalochordates, with a 100% bootstrap score (Figure 3.1). From the tree topology
it appears that the subfamily comprising the WDR62 and MAPKBP1 genes is very
distantly related to WDR16, the most ancient paralog of this family (Figure 3.1). The
phylogeny confirms the presence of human WDR62 orthologs in all five main
classes of vertebrates, i.e., teleostei, amphibia, reptilia, aves, and mammalia (Figure
3.1).
Figure 3.1: Evolutionary history of MCPH2 gene WDR62.
The evolutionary history of human WDR62 and its putative paralogs was inferred through the NJ method
based on evolutionary distances computed by the uncorrected p distance method. Fifty seven protein
sequences were used in this analysis. All positions containing gaps and missing data were removed
prior to phylogenetic tree reconstruction. The numbers on the internal branches represent bootstrap
scores. Only bootstrap scores greater than or equal to 50% are displayed on the nodes. The scale bar
depicts the number of amino acid substitutions per site.
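The uncorrected p distance used for the NJ tree is the proportion of aligned sites at which two sequences differ. A minimal sketch, assuming that columns with gaps or missing data have already been removed and using hypothetical sequences:

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

def distance_matrix(sequences):
    """Symmetric matrix of pairwise p distances, the input of an NJ method."""
    return [[p_distance(a, b) for b in sequences] for a in sequences]

toy_sequences = ["MKLV", "MKIV", "MRIV"]  # hypothetical aligned proteins
matrix = distance_matrix(toy_sequences)
```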
3.2.2 Molecular evolution of WDR62 in mammals
In order to identify lineage specific Ka/Ks ratios, a phylogenetic tree was
reconstructed with WDR62 orthologous coding sequences of representative primate
and non-primate mammalian species (Figure 3.2). The ratio of non-silent to silent
replacements was determined for every external and internal branch of the
phylogenetic tree. This revealed that non-synonymous substitutions outnumber
synonymous substitutions only on the human terminal branch (Ka/Ks=1.31). In contrast,
on all other internal and terminal branches the synonymous substitutions outnumber
the non-synonymous substitutions, which is suggestive of purifying selection
(Figure 3.2). From this analysis it appears that, within mammals, the evolution of WDR62
is accelerated particularly on the human terminal branch after it diverged from the Pan
lineage.
Figure 3.2: Estimation of WDR62 sequence evolution in therian.
Ka/Ks ratio for every internal and external branch of therian phylogeny was estimated and is shown
above each branch. The human terminal branch Ka/Ks score is highlighted in bold.
3.2.3 Human polymorphisms and signatures of selection
Molecular evolutionary rate analysis within mammals revealed variation in the rate of
WDR62 sequence evolution between the recently diverged Homo sapiens and Pan
troglodytes. Human WDR62 is evolving at a slightly higher rate (Ka/Ks = 1.31) than
chimpanzee WDR62 (Ka/Ks = 0.844) and hence rejects neutrality. To
investigate whether WDR62 sequence variation within humans is consistent with the
neutral theory, genetic diversity was estimated using human polymorphism data for the
WDR62 gene obtained from dbSNP build-137 (Sherry et al.,
2001). Different classical neutrality tests, i.e. Fu and Li's D, D* (without outgroup), F,
F* (without outgroup) and Tajima's D, were applied to coding sequence
polymorphisms only in order to detect departure from neutrality (Fu & Li, 1993;
Tajima, 1989). The results showed that the nucleotide polymorphism θw (0.00129 per site)
is not equivalent to the heterozygosity π (0.00043 per site), indicating that neutrality is
rejected. Nucleotide diversity within the WDR62 coding sequence (0.00043) is smaller
than the nucleotide diversity of the human genome () and of chromosome 19
(0.000764) (Sachidanandam et al., 2001). All of the above classical neutrality
tests have significant negative values (Tajima's D = -2.51, P < 0.001; Fu and Li's D =
-3.95, P < 0.02; Fu and Li's D* = -4.0, P < 0.02; Fu and Li's F = -4.21, P < 0.02; Fu
and Li's F* = -4.14, P < 0.02) and reject the neutral model. The difference between
heterozygosity and nucleotide polymorphism and the significant negative values are likely
a consequence of demographic history and natural selection.
3.2.4 SWAKK analysis of WDR62
SWAKK (sliding window analysis of Ka/Ks) was applied to human and
chimpanzee WDR62 orthologous sequences in order to pinpoint WDR62 protein
regions that have been accelerated in the recent history of humans after their divergence
from chimpanzee. Accelerated regions might have implications for the functional
modification of human WDR62.
The SWAKK graph identified six regions (R1-R6) in human WDR62 where Ka/Ks
exceeds one, congruent with positive selection (Figure 3.3). The remaining portion of
human WDR62 is under strong purifying selection (Figure 3.3). The non-silent
replacements within the six regions R1-R6 were classified according to their
physicochemical properties and impact on WDR62 structure (Table 3.1). ML ancestral
sequence reconstruction indicates that, since divergence from the hominini ancestor,
eight and nine replacements have accumulated in the human and chimpanzee WDR62
proteins, respectively. Comparison of these replacements with the ancestral sequence
revealed that 5/8 (62%) and 5/9 (55%) of the substitutions in human and chimpanzee,
respectively, likely have implications for the structural and functional modification of the
protein (Table 3.1). Chimpanzee contains one neutral and one radical substitution in R1
and R2, respectively, within the uncharacterized portion of the protein. Human contains one
neutral and two radical alterations in R3 and R4 within the uncharacterized portion of
WDR62. One neutral and one radical substitution were observed in R5 within an
uncharacterized region and the MKK7β1 binding domain (MB) of chimpanzee WDR62.
Interestingly, R6 contains more evolutionary changes than the rest of the protein.
Two neutral and three radical substitutions were found in R6 in each of human and
chimpanzee, within the proline rich domain and loop helix domain (Figure 3.3 and
Table 3.1) (Pervaiz & Abbasi, 2016).
Figure 3.3: SWAKK plot of human and chimpanzee WDR62.
The SWAKK plot displays six regions, R1 to R6 (above the dotted line), where the rate of sequence
evolution is accelerated, indicating a higher number of non-silent substitutions than the neutral
expectation, i.e., Ka-Ks>0. The dotted line depicts Ka-Ks = 0.
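Table 3.1: Amino acids substitutions in human and chimpanzee lineage since the divergence from
hominini ancestor.
Region (Ka/Ks > 1) | Position | Hominini residue | Substitution in Chimpanzee | Substitution in Human | Neutral/Radical
Region 1 | 81 | G | S | - | Neutral (0)
Region 2 | 393 | R | G | - | Radical (-2)
Region 3 | 790 | R | - | H | Neutral (0)
Region 3 | 850 | S | - | L | Radical (-2)
Region 4 | 1091 | Y | - | H | Radical (2)
Region 5 | 1169 | R | H | - | Neutral (0)
Region 5 | 1273 | T | P | - | Radical (-1)
Region 6 | 1304 | V | - | A | Neutral (0)
Region 6 | 1310 | L | - | Q | Radical (-2)
Region 6 | 1336 | A | T | - | Neutral (0)
Region 6 | 1345 | R | H | - | Neutral (0)
Region 6 | 1369 | G | - | R | Radical (-2)
Region 6 | 1372 | V | - | I | Radical (3)
Region 6 | 1390 | F | - | L | Neutral (0)
Region 6 | 1408 | P | S | - | Radical (-1)
Region 6 | 1458 | R | Q | - | Radical (1)
Region 6 | 1489 | S | T | - | Radical (1)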
Putative ancestral residues were reconstructed using the maximum likelihood method. The physicochemical
impact and the log-odds score (in brackets) for each amino acid replacement are given in the last column.
Positive numbers depict a preferred replacement, negative numbers depict an un-preferred replacement, and
zero depicts a neutral replacement.
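The classification in Table 3.1 and the proportions quoted in section 3.2.4 can be reproduced with a short sketch. The scores are those listed in the table, and a score of zero is taken as neutral and any other score as radical, which is how the table labels them. The assignment of each substitution to the human or chimpanzee lineage is our reading of sections 3.2.4 and 3.2.5 and should be treated as an assumption.

```python
# (position, ancestral residue, derived residue, lineage, log-odds score)
SUBSTITUTIONS = [
    (81, "G", "S", "chimpanzee", 0), (393, "R", "G", "chimpanzee", -2),
    (790, "R", "H", "human", 0), (850, "S", "L", "human", -2),
    (1091, "Y", "H", "human", 2), (1169, "R", "H", "chimpanzee", 0),
    (1273, "T", "P", "chimpanzee", -1), (1304, "V", "A", "human", 0),
    (1310, "L", "Q", "human", -2), (1336, "A", "T", "chimpanzee", 0),
    (1345, "R", "H", "chimpanzee", 0), (1369, "G", "R", "human", -2),
    (1372, "V", "I", "human", 3), (1390, "F", "L", "human", 0),
    (1408, "P", "S", "chimpanzee", -1), (1458, "R", "Q", "chimpanzee", 1),
    (1489, "S", "T", "chimpanzee", 1),
]

def classify(score):
    """Label used in Table 3.1: zero is neutral, any other score is radical."""
    return "Neutral" if score == 0 else "Radical"

def radical_fraction(lineage):
    """Return (number of radical substitutions, total) for one lineage."""
    scores = [score for _, _, _, lin, score in SUBSTITUTIONS if lin == lineage]
    return sum(classify(score) == "Radical" for score in scores), len(scores)

human_counts = radical_fraction("human")
chimpanzee_counts = radical_fraction("chimpanzee")
```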
3.2.5 Comparative analysis of WDR62 with archaic humans and modern human populations
Comparative protein sequence analysis of human WDR62 with various non-human
primates revealed eight human specific amino acid replacements (Table 3.1). In order
to determine how many of the human specific amino acid changes are shared with archaic
humans (Neandertals and Denisovans) and how many are specific to modern
humans, we compared the human WDR62 protein sequence with those of the two archaic
humans. This analysis revealed that the extinct archaic humans, the
Neandertals and Denisovans, share six amino acid replacements (R790H, S850L,
Y1091H, V1304A, G1369R, and V1372I) with anatomically modern humans (hominin
specific replacements). Two replacements, L1310Q and F1390L, are specific to modern
humans, whereas at these sites archaic humans carry the human-primate ancestral alleles
(Figure 3.4a).
Figure 3.4: Comparative analysis of WDR62 among human populations.
a) The tree shows the previously well-defined relationship between various modern human populations and
archaic humans, using chimpanzee as outgroup (see Materials and Methods). The tree illustrates six
hominin specific amino acid substitutions, of which five are fixed in modern human populations while
the remaining one is polymorphic in modern humans. Two amino acid substitutions are unique to
modern humans and are not shared with archaic humans. These two amino acid sites are
polymorphic in modern human populations. A comparative view of modern human specific and hominin
specific amino acid substitutions in modern human populations, archaic humans and chimpanzee is
shown on the right side of the tree. The human reference sequence (GRCh37) is color coded in red.
YRI: Yoruba in Ibadan, Nigeria; LWK: Luhya in Webuye, Kenya; ASW: people with African ancestry in
the southwest United States; FIN: Finnish in Finland; GBR: British from England and Scotland, UK; CEU:
Utah residents with ancestry from northern and western Europe; TSI: Toscani in Italia; IBS: Iberian
population in Spain; CLM: Colombians in Medellin, Colombia; PUR: Puerto Ricans in Puerto Rico;
MXL: people with Mexican ancestry in Los Angeles; JPT: Japanese in Tokyo, Japan; CHS: Han
Chinese in South China; and CHB: Han Chinese in Beijing, China. Three polymorphic variations are
further investigated in 1000 Genomes Project, HapMap release 28 and CEPH Stanford HGDP data using
the SPSmart webserver. b) Derived allele frequency of SNP rs2285745 (S850L) among modern human
populations in the above mentioned human genome variation projects, showing a relatively high derived
allele frequency in Oceania and Asia and a low frequency in Africa. c) SNP rs2074435 (L1310Q) shows a
high derived allele frequency in European populations as compared to Africans and Asians. The American
population is not genotyped for this SNP by the HapMap project, as depicted in the graph.
d) SNP rs1008328 (F1390L) shows a high derived allele frequency in Africa and a low frequency in Asia.
Furthermore, in order to gain insight into the status of the six hominin specific and two
modern human specific amino acid replacements in modern human populations, we
used the population variation data from the 1000 Genomes Project (Abecasis, et al.,
2012). These data show that, among the six hominin specific amino acid replacements,
five (R790H, Y1091H, V1304A, G1369R and V1372I) are fixed in modern human
populations, while the remaining hominin specific replacement (S850L) and the
two modern human specific replacements (L1310Q and F1390L) are polymorphic in
modern human populations (Figure 3.4a). These three polymorphic sites were also
examined in HapMap data (International HapMap et al., 2010) and CEPH Stanford
HGDP data (http://spsmart.cesga.es/). Combined analysis of the 1000 Genomes Project,
HapMap and CEPH Stanford HGDP data shows that, of the three polymorphic
sites, one variant, S850L (a human specific site shared with archaic humans), is present
at relatively high derived allele frequency in non-African populations, particularly in
Oceanian and Asian populations, as compared to African populations (Figure 3.4b).
The other two polymorphic sites, L1310Q and F1390L, located in exon 30, show high
derived allele frequency in European and African populations, respectively (Figure 3.4c
and 3.4d).
3.3 SCL/TAL1 interrupting locus (STIL)
3.3.1 Evolutionary history of STIL
No paralog of the human STIL gene was identified in any public database or by
similarity search tools. Orthologs of the human STIL gene were identified in
representative members of the phyla vertebrata, hemichordata, annelida, mollusca,
cnidaria, and porifera (Figure 3.5). The phylogenetic analysis of the STIL gene revealed
that it originated at the root of metazoans (Figure 3.5). However, the putative orthologs of
human STIL from Ciona intestinalis, Drosophila melanogaster, and Caenorhabditis elegans
lack evident sequence homology and share only two regions in the carboxyl terminal
region. Moreover, the bidirectional BLAST hit strategy did not identify a true ortholog of
human STIL gene from these three organisms, because the genomes of traditional model
invertebrates such as Ciona intestinalis, Drosophila melanogaster, and
Caenorhabditis elegans have experienced extensive changes in gene content and gene
architecture and are highly rearranged.
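The bidirectional best hit criterion mentioned above accepts a pair of genes as orthologs only if each is the best hit of the other. A minimal sketch, assuming that the best hits in each direction have already been extracted from the similarity searches; all identifiers are hypothetical:

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Keep pairs (a, b) where b is the best hit of a and a the best hit of b."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}

# Hypothetical best hits: query species A against species B, and the reverse
a_to_b = {"geneA1": "geneB1", "geneA2": "geneB7"}
b_to_a = {"geneB1": "geneA1", "geneB7": "geneA9"}
orthologs = reciprocal_best_hits(a_to_b, b_to_a)
```

In this toy case geneA2 has a forward hit, but the reverse search does not return it, so no ortholog is called for it.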
Figure 3.5: Phylogenetic analysis of STIL gene.
The evolutionary history of STIL was inferred by the NJ (neighbor joining) method. Evolutionary distances
were calculated by the uncorrected p distance method. All positions containing gaps and missing data were
removed. The numbers on the nodes represent bootstrap values. Only bootstrap values greater than
or equal to 50 are shown.
3.3.2 Molecular Evolution of STIL in Mammals by Site Models
Maximum likelihood codon substitution site models were used to detect selective
pressure on the STIL locus in primates, nonprimate mammals and placental mammals.
These site models allow the selective pressure to vary among sites (amino acid
positions) of the protein sequence, and differ from each other in the distribution
assumed for ω and in the number of free parameters. We estimated the log likelihood
values for six codon substitution site models using the CodeML program of the PAML
package, in order to compute likelihood ratio tests (LRTs). Each LRT compares a null
model that does not allow ω to exceed 1 (M1, M7, or M8a) with an alternative model that
does (M2 or M8). Positive selection was inferred only if at least two of the three
comparisons (M1/M2, M7/M8, and M8a/M8) were significant. After gap removal, 3663 sites
were analyzed in primates. None of the three LRTs was significant for primates,
indicating that no signature of positive selection was found (Table 3.2). In nonprimate
mammals, 3426 sites were analyzed after elimination of gaps. The LRTs were significant
for the M7/M8 and M8a/M8 comparisons, indicating signatures of positive selection in
nonprimate mammals (Table 3.2). In contrast, the M1/M2 comparison did not detect
positive selection, probably because M2 is too conservative (Table 3.2). However, no
individual site was identified as positively selected with posterior probability >= 95%
under model M8 (Table 3.2). In placental mammals (combined data of primates and
nonprimate mammals), 3228 sites were analyzed. The two null models M7 and M8a were
significantly rejected in favor of the alternative model M8, which suggests positive
selection in placental mammals (Table 3.2). However, only one site (412P) was identified
as positively selected, with posterior probability 0.958 by both the NEB and BEB methods
(Table 3.2).
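The likelihood ratio tests described above can be sketched as follows. This is a minimal illustration, not the author's script: it assumes the usual degrees of freedom for these comparisons (2 for M1/M2 and M7/M8, 1 for M8a/M8) and uses closed-form chi-square tail probabilities that hold only for 1 or 2 degrees of freedom. The log likelihood values are those reported for nonprimate mammals in Table 3.2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for df = 1 or 2)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


def lrt(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its chi-square p value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Nonprimate mammals, lnL values from Table 3.2.
stat_m7_m8, p_m7_m8 = lrt(-22606.97892, -22597.841266, df=2)     # M7 vs M8
stat_m8a_m8, p_m8a_m8 = lrt(-22602.375609, -22597.841266, df=1)  # M8a vs M8

print("M7/M8:  2*dlnL = %.3f, p = %.2g" % (stat_m7_m8, p_m7_m8))
print("M8a/M8: 2*dlnL = %.3f, p = %.2g" % (stat_m8a_m8, p_m8a_m8))
```

Under these assumptions the resulting p values round to the 0.0001 and 0.003 listed in Table 3.2 for the two comparisons.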
Table 3.2: Parameter estimation and LRT for Mammals STIL.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.42728 | -10963.86677 | |
Primates | M1: Nearly neutral | ω0 = 0.116, f0 = 0.6164, ω1 = 1.0, f1 = 0.38358 | -10916.74197 | 0.92 | 1.0
Primates | M2: Positive selection | ω0 = 0.1186, f0 = 0.6189, ω1 = 1.0, f1 = 0.379, ω2 = 3.6, f2 = 0.0019 | -10916.66704 | |
Primates | M7 | p = 0.2246, q = 0.2764 | -10917.05514 | 0.55 | 0.51
Primates | M8 | p0 = 0.94839, p = 0.3461, q = 0.5367, (p1 = 0.0516) ω = 1.16628 | -10916.45545 | |
Primates | M8a | p0 = 0.83646, p = 0.2541, q = 0.4899, (p1 = 0.1635) ω = 1.0 | -10916.974231 | 0.31 | 0.39
Nonprimate mammals | M0: One ratio | ω = 0.34063 | -22972.35103 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.1324, f0 = 0.675, ω1 = 1.0, f1 = 0.32 | -22630.38639 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.1324, f0 = 0.68, ω1 = 1.0, f1 = 0.283, ω2 = 1.0, f2 = 0.042 | -22630.38639 | |
Nonprimate mammals | M7 | p = 0.42, q = 0.759 | -22606.97892 | 0.0001 | 0.0002
Nonprimate mammals | M8 | p0 = 0.975, p = 0.475, q = 0.945, (p1 = 0.0247) ω = 1.89 | -22597.841266 | |
Nonprimate mammals | M8a | p0 = 0.8214, p = 0.602, q = 1.984, (p1 = 0.1786) ω = 1.0 | -22602.375609 | 0.003 | 0.007
Mammals | M0: One ratio | ω = 0.363 | -27276.3467 | |
Mammals | M1: Nearly neutral | ω0 = 0.139, f0 = 0.646, ω1 = 1.0, f1 = 0.353 | -26831.009 | 0.99 | 1.0
Mammals | M2: Positive selection | ω0 = 0.139, f0 = 0.646, ω1 = 1.0, f1 = 0.284, ω2 = 1.0, f2 = 0.0696 | -26831.0100 | |
Mammals | M7 | p = 0.4307, q = 0.7177 | -26786.5586 | 1×10⁻³ | 4×10⁻³
Mammals | M8 | p0 = 0.959, p = 0.496, q = 0.947, (p1 = 0.0402) ω = 1.613 (412P 0.958 NEB, 412P 0.958 BEB) | -26775.2467 | |
Mammals | M8a | p0 = 0.81227, p = 0.60291, q = 1.8573, (p1 = 0.18773) ω = 1.0 | -26781.3313 | 0.0005 | 0.003
3.3.3 Episodic selection at various stages of primate evolution in STIL locus
Positive selection generally acts over short periods of evolutionary time and
affects only a few sites rather than the entire protein sequence. The branch-site codon
substitution method was used to identify episodic positive selection on individual
codons at particular evolutionary stages, from the ancestral primate branch to the human
terminal branch (Table 3.3). The significance of this test was determined by comparing
the branch-site model with a null model that is identical except that ω2 is fixed to 1
for the foreground branch. The branch-site method predicted that 0.04% and 0.4% of the
STIL coding sequence evolved at an accelerated rate on the Hominoidea ancestral branch
and the human terminal branch, respectively (Table 3.3). However, the LRT statistics were
not significant for these branches (Table 3.3).
Table 3.3: Branch-site analysis of STIL.
Foreground branch | ω2 | LRT | p value | q value | Positive selected sites
Human | 3.13 | 0.1646 | 0.68 | 0.98 |
Hominini | 1 | 0 | 1 | 0.98 |
Hominidae | 1 | 0.0917 | 0.76 | 0.98 |
Hominoidea | 4.72 | 0.2159 | 0.64 | 0.98 |
Catarrhini | 1 | 0.0689 | 0.79 | 0.98 |
Simians | 1 | 0 | 1 | 0.98 |
Haplorhini | 1 | 0.00008 | 0.99 | 0.98 |
Primates | 1 | 0.000002 | 1 | 0.98 |
3.3.4 Divergent selection pressure between clades of mammals for STIL locus
Divergence in protein function among clades can produce site-specific differences in
selective pressure among those clades. Positive selection is not necessarily required for
the functional diversification of a protein among clades, which may eventually contribute
to adaptive phenotypic diversity. Signatures of functional divergence among different
partitions of the mammalian phylogeny for the STIL locus were identified with the codon
substitution clade model C (CmC). The significance of functional diversification
between partitions of the phylogeny was determined by comparing CmC with the
M2a_rel null model of Weadick and Chang. Parameter estimates revealed that 34% of
sites evolved under significantly divergent selective pressure between simians
(ω3 = 0.138) and nonsimian placental mammals (ω2 = 0.019) (Table 3.4). Parameter
estimates also suggested functional divergence between hominini and nonhominini
placental mammals (p = 0.03), but correction of the p value by the false discovery rate
(q = 0.30) exposed this as a false positive result (Table 3.4).
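The effect of false discovery rate correction on a family of tests can be sketched as below. The thesis does not state which FDR procedure produced its q values, so the Benjamini-Hochberg step-up adjustment used here is an assumption, and the adjusted values are not expected to reproduce the q values in Table 3.4 (which may have been computed with a different procedure or over a larger set of tests). The input p values are the five CmC p values reported in Table 3.4.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


partitions = ["primates", "simians", "catarrhini", "greatapes", "hominini"]
p_values = [0.24, 0.0001, 0.076, 0.93, 0.03]  # P values from Table 3.4
q_values = benjamini_hochberg(p_values)

for name, p, q in zip(partitions, p_values, q_values):
    print("%-11s p = %-7g adjusted = %.4f" % (name, p, q))
```

Even with only these five tests, the hominini partition no longer falls below 0.05 after adjustment, whereas the simian partition does.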
Table 3.4: Divergent selection constraint parameters estimation and likelihood scores for STIL.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.449, ω0 = 0.339, p1 = 0.223, ω1 = 1.0, p2 = 0.327, ωnp = 0.020, ωp = 0.041 | -26778.0127 | 0.24 | 0.28
CmC-Simians | p0 = 0.437, ω0 = 0.347, p1 = 0.221, ω1 = 1.0, p2 = 0.342, ωns = 0.019, ωs = 0.138 | -26771.3217 | 0.0001 | 0.002
CmC-catarrhini | p0 = 0.436, ω0 = 0.35, p1 = 0.22, ω1 = 1.0, p2 = 0.344, ωnc = 0.027, ωc = 0.112 | -26777.1301 | 0.076 | 0.42
CmC-greatapes | p0 = 0.46, ω0 = 0.33, p1 = 0.23, ω1 = 1.0, p2 = 0.32, ωng = 0.021, ωg = 0.019 | -26778.6965 | 0.93 | 0.80
CmC-hominini | p0 = 0.32, ω0 = 0.021, p1 = 0.23, ω1 = 1.0, p2 = 0.46, ωnh = 0.33, ωh = 0.77 | -26776.4205 | 0.03 | 0.30
M2a_rel | p0 = 0.46, ω0 = 0.332, p1 = 0.225, ω1 = 1.0, p2 = 0.32, ω2 = 0.213 | -26778.7002 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.3.5 Human polymorphisms and signatures of selection
Molecular evolutionary rate analysis suggested that human STIL has evolved at an
accelerated rate (ω = 3.13) relative to its ortholog on the ancestral hominini branch,
although the result was not significant (Table 3.3). To investigate whether the sequence
variation of STIL within human populations is consistent with neutrality, genetic
diversity was estimated using 1000 Genomes Project phase 1 data of 1092 humans from
diverse ethnic groups. Several classical neutrality tests, i.e., Fu and Li's D, D*
(without outgroup), F, F* (without outgroup) and Tajima's D (Fu & Li, 1993; Tajima,
1989), were applied to polymorphisms located in the coding sequence of these 1092
individuals (Table 3.5). The results revealed that the heterozygosity π (nucleotide
diversity) of the STIL coding sequence in humans is 0.00049 per site, which is lower than
the nucleotide diversity of the whole human genome (0.000751) and of chromosome 1
(0.000772) (Sachidanandam, et al., 2001). All of the neutrality tests yielded
significantly negative values (Table 3.5). Low heterozygosity and negative values of the
neutrality tests reject the neutrality hypothesis and might indicate natural selection
or population expansion.
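One of the neutrality statistics named above, Tajima's D, can be computed from three summary values: the number of sequences n, the number of segregating sites S, and the mean number of pairwise differences k. The sketch below implements the standard formula from Tajima (1989); the helper `summarize` and the example numbers are illustrative assumptions and are not the thesis data.

```python
import math
from itertools import combinations


def summarize(seqs):
    """Return (n, S, k): sample size, segregating sites, mean pairwise differences."""
    n = len(seqs)
    seg = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return n, seg, k


def tajimas_d(n, seg, k):
    """Tajima's D from sample size n, segregating sites seg, mean pairwise differences k."""
    if n < 4 or seg < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - seg / a1) / math.sqrt(e1 * seg + e2 * seg * (seg - 1))


# Hypothetical summary values: 10 sequences, 16 segregating sites, k = 3.888889.
d_example = tajimas_d(10, 16, 3.888889)
print("Tajima's D = %.4f" % d_example)

# Hypothetical toy alignment to show how the summaries are obtained.
print(summarize(["ACGT", "ACGA", "ACGA", "TCGA"]))
```

A negative D, as in this example, means that k is smaller than expected from S, which is the pattern produced by an excess of rare variants.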
Table 3.5: Tests for departure from neutrality through population variation data (1000 Genomes).
Test | Statistic | P value
With chimpanzee as an outgroup
Tajima's D | -2.75469 | <0.001
Fu and Li's D | -5.74963 | <0.02
Fu and Li's F | -5.59888 | <0.02
Without an outgroup
Fu and Li's D* | -6.15193 | <0.02
Fu and Li's F* | -4.98206 | <0.02
P indicates the 'probability value' that demonstrates the departure from the null hypothesis (neutrality).
3.3.6 SWAKK analysis of STIL
SWAKK (sliding window analysis of Ka/Ks) was applied to the human and
chimpanzee coding sequences in order to pinpoint regions of the human STIL protein that
have evolved at an accelerated rate since its divergence from chimpanzee.
The SWAKK graph indicated nine regions (R1-R9) where the Ka-Ks difference exceeds zero,
congruent with a pattern of positive selection (Figure 3.6). It further indicated
that the rest of the protein is under strong selective constraint (Figure 3.6). The
non-silent substitutions in R1 to R9 were classified according to their predicted
physicochemical properties and impact on the structure of the STIL protein (Table 3.6).
ML ancestral sequence reconstruction indicates that six substitutions were fixed in the
chimpanzee STIL protein and fourteen in the human protein since the divergence
from the hominini ancestor (Table 3.6). Comparison with the inferred hominini
ancestor showed that 10 of 14 substitutions in human and 6 of 6 substitutions in
chimpanzee might have been involved in structural and functional modification of the
STIL protein (Table 3.6). Regions R1 to R4 experienced two neutral and five radical
changes in human (Table 3.6). R5 and R7 each contain two radical substitutions across
the chimpanzee and human proteins (Table 3.6). Notably, R6 contains more
substitutions than any other region: two radical changes in chimpanzee, and one neutral
and two radical substitutions in human (Figure 3.6 and Table 3.6). Human carries one
radical change in R8 and one neutral change in R9, whereas chimpanzee carries two
radical substitutions in R9 (Table 3.6). Thus, this investigation not only identified
the amino acid substitutions that occurred independently in the chimpanzee and human
STIL proteins since their divergence from the hominini ancestor approximately six to
seven million years ago, but also distinguished the substitutions that are likely to
have had little or no impact on the structure/function of STIL from those that probably
modified the structure/function of STIL during the last 6-7 million years in the course
of chimpanzee and human evolution.
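The idea of scanning a pairwise coding alignment window by window can be illustrated with a deliberately crude sketch. It only counts nonsynonymous and synonymous codon differences per window under the standard genetic code; it does not estimate Ka or Ks and is not the algorithm implemented in SWAKK. The input sequences, window size and step are hypothetical, and the sequences are assumed to be ungapped, in frame and written in upper-case A, C, G, T.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for all three codon positions.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame coding sequence into codons, ignoring a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def classify(codon_a, codon_b):
    """Return None for identical codons, else 'syn' or 'nonsyn'."""
    if codon_a == codon_b:
        return None
    same_residue = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "syn" if same_residue else "nonsyn"


def sliding_counts(seq_a, seq_b, window=3, step=1):
    """Per window: (first codon position, nonsynonymous count, synonymous count)."""
    kinds = [classify(x, y) for x, y in zip(codons(seq_a), codons(seq_b))]
    result = []
    for start in range(0, len(kinds) - window + 1, step):
        chunk = kinds[start:start + window]
        result.append((start + 1, chunk.count("nonsyn"), chunk.count("syn")))
    return result


# Hypothetical 3-codon sequences: Met-Val-Pro versus Met-Ala-Pro.
windows = sliding_counts("ATGGTTCCT", "ATGGCTCCC")
print(windows)
```

Windows in which nonsynonymous differences clearly outnumber synonymous ones are the kind of candidates that a proper Ka/Ks estimator would then evaluate.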
Figure 3.6: Sliding window analysis of STIL.
The SWAKK graph displays nine regions (R1-R9) where the rate of non-silent substitutions exceeds
neutral expectations, i.e., Ka-Ks > 0. The dotted line depicts neutrality (Ka-Ks = 0). Regions below
the dotted line indicate purifying selection (Ka-Ks < 0).
Table 3.6: Human and chimpanzee specific substitutions in STIL after the divergence from the hominini ancestor.
Ka-Ks > 0 region   Position   Hominini residue   Substitution in Chimpanzee   Substitution in Human   Neutral/Radical
Region-1
86 V A Neutral (0)
Region-2
268 I T Radical (-1)
289 R Q Neutral (0)
Region-3
385 P S Radical (-1)
Region-4
511 Y C Radical (-2)
522 V I Radical (3)
594 P L Radical (-3)
Region-5
616 P L Radical (-3)
672 D G Radical (-1)
Region-6
750 T M Radical (-1)
751 P T Radical (-1)
769 M T Radical (-1)
787 S G Neutral (0)
813 L M Radical (2)
Region-7
917 G E Radical (-2)
980 T R Radical (-1)
Region-8
1152 H Y Radical (2)
Region-9
1250 I V Radical (3)
1251 A T Neutral (0)
1262 T M Radical (-1)
Putative ancestral residues were reconstructed using the maximum likelihood method. The physicochemical
impact and log-odds score (in brackets) of each amino acid replacement are given in the last column.
Positive numbers depict a preferred replacement, negative numbers an un-preferred replacement, and
zero a neutral replacement.
3.3.7 Comparative analysis of STIL with archaic humans and modern human populations
Comparative protein sequence analysis of human STIL with various non-human
placental mammals revealed fourteen human specific amino acid substitutions (Table
3.6). This analysis was extended by comparing the human STIL protein sequence with
those of two archaic humans (Neandertals and Denisovans) in order to determine how many
of the human specific amino acid changes are shared with archaic humans and how many
are specific to anatomically modern humans. The comparison revealed that the extinct
archaic humans, the Neandertals and Denisovans, share thirteen amino acid changes
(including I268T, R289Q, P385S, Y511C, V522I, P594L, D672G, T750M, S787G and L813M) with
anatomically modern humans (hominin specific replacements). Only one substitution,
V86A, is specific to anatomically modern humans; at this site archaic humans
carry the hominini ancestral allele.
Furthermore, in order to gain insight into the status of the thirteen hominin specific
and one modern human specific amino acid substitutions in modern human populations,
we used the population variation data from the 1000 Genomes Project phase 1
(Abecasis, et al., 2012). These data show that all hominin specific amino acid
replacements are fixed in modern human populations, whereas the one modern human
specific substitution (V86A) is polymorphic (rs3125630).
3.4 Centrosomal Protein 135 (CEP135)
3.4.1 Evolutionary history of CEP135
The evolutionary history of CEP135 and its putative paralog TSGA10 (Testis specific 10)
was examined with the distance based neighbor joining (NJ) method (Figure 3.7).
Phylogenetic analysis revealed that the human CEP135 paralogs originated by a single
duplication event. The gene duplication that split CEP135 and TSGA10
occurred at least prior to the separation of chondrichthyes (cartilaginous fish) from
osteichthyes (bony vertebrates) and after the vertebrate-invertebrate split (Figure 3.7).
Furthermore, the phylogeny showed that the putative CEP135/TSGA10 ortholog
originated early in metazoan history (Figure 3.7). The bidirectional BLAST hit
strategy failed to identify any putative orthologs of CEP135 and TSGA10 in the phyla
cephalochordata, nematoda, arthropoda, cnidaria and placozoa.
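The bidirectional (reciprocal) best hit criterion mentioned above can be sketched in a few lines. This is an illustration under stated assumptions: the hit tables are plain dictionaries of hypothetical gene identifiers and scores standing in for parsed BLAST output, and ties, E-value cutoffs and coverage filters are ignored.

```python
def best_hit(hits):
    """Return the subject with the highest score, or None when there are no hits."""
    return max(hits, key=lambda hit: hit[1])[0] if hits else None


def reciprocal_best_hits(forward, reverse):
    """Keep query/subject pairs that are each other's best hit in both search directions."""
    pairs = {}
    for query, hits in forward.items():
        subject = best_hit(hits)
        if subject is not None and best_hit(reverse.get(subject, [])) == query:
            pairs[query] = subject
    return pairs


# Hypothetical identifiers and scores, for illustration only.
forward = {
    "human_CEP135": [("fish_geneA", 512.0), ("fish_geneB", 130.0)],
    "human_TSGA10": [("fish_geneB", 220.0)],
    "human_other": [("fish_geneA", 90.0)],
}
reverse = {
    "fish_geneA": [("human_CEP135", 498.0), ("human_other", 85.0)],
    "fish_geneB": [("human_CEP135", 125.0), ("human_TSGA10", 210.0)],
}

orthologs = reciprocal_best_hits(forward, reverse)
print(orthologs)
```

A query with no reciprocal partner, such as `human_other` here, is simply left out, which corresponds to reporting that no putative ortholog was identified.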
Figure 3.7: Evolutionary history of MCPH8 gene CEP135
The phylogenetic tree of CEP135 and its putative paralog was inferred using the NJ method, applying the
JTT amino acid substitution model to compute evolutionary distances. All positions containing gaps and
missing data were removed prior to tree reconstruction. Thirty eight amino acid sequences were used in
this analysis. The values on the nodes are bootstrap values estimated from
500 pseudoreplicates. Only bootstrap scores greater than or equal to 50 are displayed on the
nodes.
3.4.2 Estimation of pervasive signals of positive selection in CEP135 across placental mammals
Three pairs of codon substitution site models (M1 & M2, M7 & M8, and M8a & M8)
were fitted in order to examine whether positive selection has operated on primates
(18 species), nonprimate placental mammals (24 species) and all placental mammals
(42 species) (Table 3.7). These site models allow the selective pressure ω to vary
among the amino acid sites of a protein coding gene but not among the species of the
phylogeny. Signals of positive selection were considered reliable only if at least two
of the three null models (M1, M7, and M8a) were rejected in favor of the more complex
alternative model (M2 or M8). The one ratio site model (M0) was also fitted to all
three datasets (primates, nonprimate placental mammals and placental mammals) and
revealed that purifying selection dominated the evolution of CEP135 throughout the
eutherians, with ω values ranging from 0.18 to 0.217 (Table 3.7). Parameter estimates
and p values indicated signals of positive selection in primates and placental mammals,
with ω values of 1.5 and 1.14 respectively, but only for one model pair, M7 & M8
(Table 3.7). The naïve empirical Bayes (NEB) method implemented in the M8 site model
pinpointed three and twenty two positively selected sites in primates and placental
mammals respectively (Table 3.7). However, the LRT of the M7 & M8 pair is less accurate
and yields more false positives than those of the M8a & M8 and M1 & M2 pairs, which is
why the stringent criterion described above was necessary for this study. Overall, the
results of the site models suggest that no significant signals of positive selection
were found across eutherian evolution for the CEP135 protein coding gene (Table 3.7).
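The two-out-of-three decision rule can be written down explicitly. The sketch below assumes a significance threshold of 0.05 applied to the reported p values; the thesis does not state the threshold at this point, nor whether the rule was applied to p or q values, so both are assumptions. The p values are those reported for CEP135 in Table 3.7.

```python
COMPARISONS = ("M1/M2", "M7/M8", "M8a/M8")


def positive_selection_supported(p_values, alpha=0.05, required=2):
    """Return (decision, significant comparisons) under the two-of-three rule."""
    significant = [name for name in COMPARISONS if p_values[name] < alpha]
    return len(significant) >= required, significant


# P values for CEP135 from Table 3.7.
cep135_primates = {"M1/M2": 0.58, "M7/M8": 0.02, "M8a/M8": 0.14}
cep135_mammals = {"M1/M2": 1.0, "M7/M8": 2e-9, "M8a/M8": 0.12}

for label, values in (("primates", cep135_primates), ("mammals", cep135_mammals)):
    supported, which = positive_selection_supported(values)
    print(label, "supported" if supported else "not supported", which)
```

For both datasets only the M7/M8 comparison passes, so the rule does not accept positive selection, in line with the conclusion drawn in the text.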
3.4.3 Signature of positive selection by branch-site model
A branch-site model, using the codon based maximum likelihood method, was applied to
test for episodic positive selection acting at specific stages of evolution from the
primate ancestral branch to the modern human lineage (Table 3.8). The branch-site model
allows the selective pressure ω to vary both across branches and among the sites of a
prespecified lineage. The significance of episodic selection was determined by
an LRT of the null model (identical to the branch-site model except that ω2 = 1)
against the alternative branch-site model. Parameter estimates showed an elevated rate
of sequence evolution on the primate ancestral branch (ω2 = 2.40), which suggests
positive selection, but the LRT values indicated that no significant signals of episodic
selection were found on the analyzed branches of the CEP135 phylogeny (Table 3.8).
Table 3.7: Selective pressure estimation and LRT for Mammals CEP135.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.21740 | -10024.701547 | |
Primates | M1: Nearly neutral | ω0 = 0.07502, f0 = 0.81152, ω1 = 1.0, f1 = 0.18848 | -9931.080961 | 0.58 | 1.0
Primates | M2: Positive selection | ω0 = 0.08085, f0 = 0.82218, ω1 = 1.0, f1 = 0.1612, ω2 = 1.9284, f2 = 0.01665 | -9930.537778 | |
Primates | M7 | p = 0.17565, q = 0.54351 | -9934.038071 | 0.02 | 0.03
Primates | M8 | p0 = 0.91865, p = 0.43353, q = 2.41422, (p1 = 0.08135) ω = 1.50001 (10A, 598S, 1139M NEB) | -9929.996489 | |
Primates | M8a | p0 = 0.8167, p = 2.1971, q = 24.586, (p1 = 0.1833) ω = 1.0 | -9931.075444 | 0.14 | 0.20
Nonprimate mammals | M0: One ratio | ω = 0.184 | -21043.22926 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.068, f0 = 0.7883, ω1 = 1.0, f1 = 0.212 | -20524.39690 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.0681, f0 = 0.7883, ω1 = 1.0, f1 = 0.1583, ω2 = 1.0, f2 = 0.0534 | -20524.39690 | |
Nonprimate mammals | M7 | p = 0.2456, q = 0.8699 | -20506.62095 | 3×10⁻⁴ | 1×10⁻³
Nonprimate mammals | M8 | p0 = 0.8674, p = 0.432, q = 3.332, (p1 = 0.1326) ω = 1.0 | -20494.089869 | |
Nonprimate mammals | M8a | p0 = 0.8562, p = 0.4137, q = 3.1877, (p1 = 0.14379) ω = 1.0 | -20494.927683 | 0.20 | 0.24
Mammals | M0: One ratio | ω = 0.19050 | -26901.5283 | |
Mammals | M1: Nearly neutral | ω0 = 0.071, f0 = 0.776, ω1 = 1.0, f1 = 0.224 | -26156.4468 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.071, f0 = 0.776, ω1 = 1.0, f1 = 0.168, ω2 = 1.0, f2 = 0.055 | -26156.4468 | |
Mammals | M7 | p = 0.253, q = 0.859 | -26100.3429 | 2×10⁻⁹ | 4×10⁻⁸
Mammals | M8 | p0 = 0.899, p = 0.385, q = 2.37035, (p1 = 0.101) ω = 1.1404 (54S, 213Q, 221Q, 245L, 406S, 409L, 482P, 483P, 508R, 546S, 557S, 597N, 599V, 756L, 769V, 776L, 784T, 800S, 997S, 1004V, 1093N, 1130V NEB) | -26080.1784 | |
Mammals | M8a | p0 = 0.8641, p = 0.39485, q = 2.78179, (p1 = 0.13588) ω = 1.0 | -26081.370241 | 0.12 | 0.19
Table 3.8: Branch-site analysis of CEP135.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0.0003 | 0.98 | 0.98 |
Hominini | 1 | 0 | 1 | 0.98 |
Homininae | 1 | 0.0001 | 0.99 | 0.98 |
Hominidae | 1 | 0.00002 | 1 | 0.98 |
Hominoidea | 1 | 0.00006 | 0.99 | 0.98 |
Catarrhini | 1 | 0.000004 | 1 | 0.98 |
Simians | 1 | 0.000002 | 1 | 0.98 |
Haplorhini | 1 | 0 | 1 | 0.98 |
Primates | 2.40 | 0.0488 | 0.83 | 0.98 |
3.4.4 Divergent selective pressure across CEP135 mammalian phylogeny
Clade model C (CmC), using the codon based maximum likelihood method, was applied
to estimate site-specific differences in selective pressure between partitions of the
CEP135 mammalian phylogeny (Table 3.9). The significance of divergent selective
constraint between partitions of the phylogeny was determined by LRTs of the M2a_rel
null model against CmC with the primate, simian, catarrhini, great ape and hominini
partitions (Table 3.9). Parameter estimates and LRTs showed no signature of
divergent selection in any of the analyzed partitions of the CEP135 mammalian
phylogeny (Table 3.9). These observations suggest that the function of the CEP135
protein coding gene has been conserved throughout eutherian evolution.
Table 3.9: Divergent selection constraint parameters estimation and likelihood scores for CEP135.
Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.560, ω0 = 0.0231, p1 = 0.142, ω1 = 1.0, p2 = 0.298, ωnp = 0.251, ωp = 0.315 | -26080.0893 | 0.057 | 0.34
CmC-Simians | p0 = 0.296, ω0 = 0.27, p1 = 0.141, ω1 = 1.0, p2 = 0.56, ωns = 0.023, ωs = 0.021 | -26081.8471 | 0.75 | 0.66
CmC-catarrhini | p0 = 0.293, ω0 = 0.271, p1 = 0.141, ω1 = 1.0, p2 = 0.57, ωnc = 0.023, ωc = 0.0397 | -26081.3823 | 0.31 | 0.58
CmC-greatapes | p0 = 0.295, ω0 = 0.269, p1 = 0.141, ω1 = 1.0, p2 = 0.564, ωng = 0.0235, ωg = 0.0184 | -26081.8801 | 0.85 | 0.66
CmC-hominini | p0 = 0.56, ω0 = 0.023, p1 = 0.141, ω1 = 1.0, p2 = 0.296, ωnh = 0.27, ωh = 0.59 | -26081.1383 | 0.22 | 0.57
M2a_rel | p0 = 0.295, ω0 = 0.269, p1 = 0.0141, ω1 = 1.0, p2 = 0.564, ω2 = 0.0235 | -26081.8982 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.5 Zinc finger protein 335 (ZNF335)
3.5.1 Evolutionary history of ZNF335
The evolutionary history of ZNF335 was investigated using orthologous
protein sequences from representative species of mammalia, aves, reptilia,
actinistia, osteichthyes and chondrichthyes with the distance based neighbor joining (NJ)
method (Figure 3.8). The unconstrained phylogenetic tree revealed that the ZNF335 gene
originated early in the history of vertebrates (Figure 3.8). The phylogeny
also showed that the placement of the chondrichthyan representative, Callorhinchus
milii, was not consistent with the vertebrate phylogeny (Figure 3.8). This inconsistency
might arise either because the Callorhinchus milii genome is the slowest evolving
genome among extant vertebrates or because osteichthyan genomes are
highly derived (Venkatesh et al., 2014). The bidirectional similarity search strategy
was unable to identify any putative ortholog of human ZNF335 among
cephalochordata, hemichordata, echinodermata, arthropoda, nematoda, mollusca,
cnidaria, placozoa and porifera. No paralog of human ZNF335 was identified by
similarity search approaches or in any public database. The absence of orthologs in
invertebrates and of any paralog of human ZNF335 strengthens the
inference of a vertebrate specific origin of ZNF335.
3.5.2 Molecular evolution of ZNF335 in mammals by site models
Codon substitution site models were used to detect selective pressure across primates
(18 species), nonprimate mammals (27 species) and placental mammals (45 species).
These site models allow the selective pressure (ω) to vary among sites but not among
lineages. The one ratio model, which assumes a single average ω for all sites in the
protein, revealed a dominant role of purifying selection in the evolution of ZNF335 in
primates, nonprimate mammals and all placental mammals (Table 3.10). In order to detect
variable selective pressure and positive selection on individual codons, three pairs of
site models (M1/M2, M7/M8 and M8a/M8) were used. For this study, positive selection was
inferred only if at least two of these three pairs significantly rejected neutrality.
After deleting gaps, 3993 sites were considered for site analysis in primates. The LRT
statistic was significant only for the M7/M8 comparison, while the other two comparisons
(M1/M2 and M8a/M8) did not support adaptive sequence evolution of ZNF335 in primates
(Table 3.10). In nonprimate mammals, 3282 sites were examined. The M7/M8 and M8a/M8
comparisons showed evidence of positive selection across nonprimate mammals (Table
3.10). In total, only 2% of sites were identified as positively selected in nonprimate
mammals with posterior probability greater than 95% (Table 3.10). In the combined data
of primates and nonprimate placental mammals, 3795 sites were analyzed after removal
of gaps. For mammals, the M7/M8 and M8a/M8 comparisons showed a significant signature of
pervasive positive selection on a subset of sites (Table 3.10). Parameter estimates
indicate that 2% of sites are under positive selection with an ω value of 1.4 (Table 3.10).
Figure 3.8: Phylogenetic tree of MCPH10 gene ZNF335 using NJ approach
Evolutionary distances were computed with the JTT matrix based method, and branch lengths are drawn in
the same units as the evolutionary distances used to infer the phylogenetic tree. Twenty nine orthologous
protein sequences of human ZNF335 were used in this analysis. All positions containing gaps and
missing data were removed. The values on the nodes are bootstrap scores. Bootstrap
values less than 50 are not shown.
Table 3.10: Parameter estimation and LRT for Mammals ZNF335.
Data group | Model | Parameter estimation (ω) | Log likelihood value | P value | q value
Primates | M0: One ratio | ω = 0.10153 | -12084.684867 | |
Primates | M1: Nearly neutral | ω0 = 0.054, f0 = 0.93, ω1 = 1.0, f1 = 0.066 | -12016.245554 | 1.0 | 1.0
Primates | M2: Positive selection | ω0 = 0.054, f0 = 0.93, ω1 = 1.0, f1 = 0.05392, ω2 = 1.0, f2 = 0.01249 | -12016.245554 | |
Primates | M7 | p = 0.17006, q = 1.28065 | -12021.052413 | 0.004 | 0.007
Primates | M8 | p0 = 0.97137, p = 0.41625, q = 4.69287, (p1 = 0.02863) ω = 1.45597 (62L, 974V BEB, 62L, 974V NEB) | -12015.542921 | |
Primates | M8a | p0 = 0.9415, p = 0.8569, q = 12.3542, (p1 = 0.0586) ω = 1.0 | -12015.951088 | 0.37 | 0.38
Nonprimate mammals | M0: One ratio | ω = 0.07781 | -26830.40982 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.045, f0 = 0.92, ω1 = 1.0, f1 = 0.084 | -26440.22043 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.045, f0 = 0.92, ω1 = 1.0, f1 = 0.082, ω2 = 1.0, f2 = 0.0024 | -26440.22043 | |
Nonprimate mammals | M7 | p = 0.21, q = 1.8333 | -26314.22379 | 5×10⁻⁴ | 1×10⁻³
Nonprimate mammals | M8 | p0 = 0.97249, p = 0.27027, q = 3.3096, (p1 = 0.02751) ω = 1.24 (198V, 932T) | -26302.159482 | |
Nonprimate mammals | M8a | p0 = 0.9647, p = 0.248, q = 2.8165, (p1 = 0.0353) ω = 1.0 | -26306.829686 | 0.002 | 0.008
Mammals | M0: One ratio | ω = 0.0793 | -33009.5808 | |
Mammals | M1: Nearly neutral | ω0 = 0.044, f0 = 0.905, ω1 = 1.0, f1 = 0.094 | -32513.4631 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.044, f0 = 0.905, ω1 = 1.0, f1 = 0.094, ω2 = 55.0, f2 = 0 | -32513.4631 | |
Mammals | M7 | p = 0.208, q = 1.765 | -32303.6612 | 1×10⁻⁴ | 5×10⁻⁴
Mammals | M8 | p0 = 0.978, p = 0.248, q = 2.7792, (p1 = 0.0215) ω = 1.2 (198V, 932T) | -32289.9547 | |
Mammals | M8a | p0 = 0.97035, p = 0.23367, q = 2.33818, (p1 = 0.02965) ω = 1.0 | -32299.26404 | 2×10⁻³ | 0.0002
3.5.3 Signatures of episodic positive selection at various evolutionary stages from ancestral primate to human terminal branch
In order to detect patterns of episodic positive selection at different evolutionary
stages from the ancestral primate branch to the human terminal branch, the branch-site
test was applied to the ZNF335 coding sequence alignment of 45 placental mammal species
(Table 3.11). The LRTs of the branch-site analysis showed that no significant
signature of positive selection was found on any branch analyzed (Table 3.11).
An elevated rate of sequence evolution (ratio of nonsynonymous to synonymous
substitution rates, ω, exceeding 1) was found on the hominini ancestral branch compared
with the human and other ancestral branches analyzed, but the LRT statistic indicates
that the difference between the null model and the selection model is not significant
(Table 3.11).
Table 3.11: Branch-site analysis of ZNF335.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1.0 | 0.000002 | 1 | 0.98 |
Hominini | 3.95 | 0.0305 | 0.86 | 0.98 |
Homininae | 1.0 | 0 | 1 | 0.98 |
Hominidae | 1.0 | 0.000002 | 1 | 0.98 |
Hominoidea | 1.0 | 0 | 1 | 0.98 |
Catarrhini | 1.0 | 0 | 1 | 0.98 |
Simians | 1.0 | 0 | 1 | 0.98 |
Haplorhini | 1.0 | 0 | 1 | 0.98 |
Primates | 1.0 | 0 | 1 | 0.98 |
3.5.4 Divergent selection pressure between different partitions of mammalian phylogeny
Patterns of divergent selective pressure between partitions of the mammalian
phylogenetic tree were examined with the codon substitution clade
model C (CmC) (Table 3.12). This model targets sites that have
evolved under different selective constraints between clades; positive selection is
not required at those sites. The significance of functional divergence was determined
by likelihood ratio tests (LRTs) (Table 3.12). LRTs of M2a_rel
(null model) against CmC with the primate, simian, catarrhini, great ape, and hominini
partitions indicated no significant pattern of functional divergence in any partition of
the ZNF335 mammalian phylogenetic tree (Table 3.12). This suggests
that the function of ZNF335 has remained unaltered over the evolutionary history of
placental mammals.
Table 3.12: Divergent selection constraint parameters estimation and likelihood scores for ZNF335.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.75, ω0 = 0.015, p1 = 0.029, ω1 = 1.0, p2 = 0.221, ωnp = 0.255, ωp = 0.285 | -32294.0420 | 0.30 | 0.58
CmC-simians | p0 = 0.751, ω0 = 0.0155, p1 = 0.029, ω1 = 1.0, p2 = 0.22, ωns = 0.256, ωs = 0.312 | -32293.5981 | 0.16 | 0.57
CmC-catarrhini | p0 = 0.751, ω0 = 0.0154, p1 = 0.029, ω1 = 1.0, p2 = 0.22, ωnc = 0.266, ωc = 0.27 | -32293.5586 | 0.15 | 0.57
CmC-greatapes | p0 = 0.75, ω0 = 0.0154, p1 = 0.029, ω1 = 1.0, p2 = 0.22, ωng = 0.259, ωg = 0.297 | -32294.4826 | 0.66 | 0.66
CmC-hominini | p0 = 0.75, ω0 = 0.0154, p1 = 0.029, ω1 = 1.0, p2 = 0.22, ωnh = 0.26, ωh = 0.25 | -32294.5720 | 0.92 | 0.69
M2a_rel | p0 = 0.75, ω0 = 0.015, p1 = 0.029, ω1 = 1.0, p2 = 0.221, ω2 = 0.26 | -32294.5769 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.6 Polyhomeotic homolog 1 (PHC1)
3.6.1 Phylogenetic analysis of PHC1
The evolutionary history of PHC1 and its relationship to its putative paralogs PHC2
and PHC3 were studied using protein sequences from the phyla mollusca,
arthropoda, hemichordata, cephalochordata and vertebrata with the neighbor joining (NJ)
method (Figure 3.9). Phylogenetic analysis indicated that two gene duplication events
are responsible for the expansion of the polyhomeotic homolog family in vertebrates
(Figure 3.9). The tree topology further showed that the first duplication occurred
early in the evolutionary history of vertebrates, at least prior to the
chondrichthyes-osteichthyes split, and gave rise to the PHC3 paralog and the ancestral
PHC1/PHC2 gene (Figure 3.9). The second duplication separated PHC2 and PHC1 and
occurred prior to the sarcopterygii-actinopterygii split but after the divergence of
osteichthyes from chondrichthyes (Figure 3.9). The tree topology indicates
that PHC1 and PHC2 are closely related genes, whereas PHC3 is the most ancient
member, and that this gene family originated at the root of bilateria (Figure 3.9).
Figure 3.9: Phylogenetic tree of human PHC1 and its putative paralogs.
The evolutionary history of human PHC1 and its putative paralogs was inferred with the NJ method
based on evolutionary distances computed by the JTT matrix based method. Thirty seven amino acid
sequences were used for tree reconstruction. All positions containing gaps and missing data were
removed. The values on the nodes are bootstrap values; only values greater than or
equal to 50 are shown.
3.6.2 Molecular evolution of PHC1 by site models
Forty six protein coding sequences of PHC1 were retrieved for placental mammals by
bidirectional BLAST searches of the Ensembl and NCBI databases (twenty sequences from
primate species and twenty six from nonprimate placental mammals). Six codon
substitution site models (M0, M1, M2, M7, M8, and M8a) were used to analyze the PHC1
coding sequence in three data groups, i.e., primates, nonprimate placental mammals and
placental mammals (Table 3.13). The one ratio model (M0) indicated that purifying
selection dominated the evolution of PHC1 in all three data groups, with estimated ω
values ranging from 0.094 to 0.123 (Table 3.13). M0 estimates the average selective
pressure over all sites of the protein coding gene, whereas the other five site models
postulate that selective pressure is not constant along the coding sequence but varies
among codon sites. Likelihood ratio tests (LRTs) were calculated for three pairs of site
models (M1 & M2, M7 & M8, and M8a & M8) in order to identify variable selective
pressure on individual codon sites and patterns of positive selection in the three data
groups (Table 3.13). Positive selection was accepted only if at least two of the three
pair comparisons rejected the simpler neutral model (M1, M7 or M8a) in favor of the
nested, more complex alternative model (M2 or M8). The p values were obtained from the
LRT statistics using chi square tables (Table 3.13). The LRTs indicated that none of the
three data groups experienced a significant signature of positive selection under the
three site model pairs (Table 3.13). For nonprimate placental mammals, the ω value
estimated by the M8 site model exceeded one (ω = 1.414), which suggests positive
selection, and one site was identified under positive selection with p value ≤ 0.05 in
the M8a & M8 comparison (Table 3.13). However, this signature of positive selection is
not significant after the p value is corrected for false discovery rate (q value)
(Table 3.13).
3.6.3 Episodic Selection at PHC1 mammalian phylogeny
In the first approach, codon substitution site models indicated that positive selection
did not operate on PHC1 in the primate, nonprimate placental mammal and placental
mammal data groups. These models allow selective pressure to vary only among sites of
the protein coding gene but not across the branches of the phylogeny. However, positive
selection may also operate at specific evolutionary time points and in specific species,
which codon substitution site models cannot identify (Table 3.14).
Table 3.13: Parameter estimation and LRT for Mammals PHC1.
Data group | Model | Parameter estimation (ω) | Log likelihood value | P value | q value
Primates | M0: One ratio | ω = 0.12368 | -7903.776034 | |
Primates | M1: Nearly neutral | ω0 = 0.058, f0 = 0.92124, ω1 = 1.0, f1 = 0.07876 | -7880.579564 | 1.0 | 1.0
Primates | M2: Positive selection | ω0 = 0.058, f0 = 0.9212, ω1 = 1.0, f1 = 0.025, ω2 = 1.0, f2 = 0.054 | -7880.579564 | |
Primates | M7 | p = 0.15432, q = 1.03425 | -7879.271927 | 1.0 | 0.83
Primates | M8 | p0 = 0.9999, p = 0.15434, q = 1.03436, (p1 = 0.00001) ω = 1.0 | -7879.271936 | |
Primates | M8a | p0 = 0.97758, p = 0.11086, q = 0.77998, (p1 = 0.02242) ω = 1.0 | -7879.250221 | 0.83 | 0.75
Nonprimate mammals | M0: One ratio | ω = 0.09428 | -17273.88625 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.0432, f0 = 0.899, ω1 = 1.0, f1 = 0.1001 | -17095.19254 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.04319, f0 = 0.899, ω1 = 1.0, f1 = 0.1001, ω2 = 34.10, f2 = 0 | -17095.19255 | |
Nonprimate mammals | M7 | p = 0.1873, q = 1.582 | -17042.51078 | 0.37 | 0.35
Nonprimate mammals | M8 | p0 = 0.9966, p = 0.1958, q = 1.738, (p1 = 0.0034) ω = 1.414 (478I BEB) | -17041.52137 | |
Nonprimate mammals | M8a | p0 = 0.9817, p = 0.1783, q = 1.5162, (p1 = 0.0183) ω = 1.0 | -17043.89238 | 0.03 | 0.07
Mammals | M0: One ratio | ω = 0.098 | -21063.3796 | |
Mammals | M1: Nearly neutral | ω0 = 0.047, f0 = 0.898, ω1 = 1.0, f1 = 0.102 | -20818.3199 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.047, f0 = 0.898, ω1 = 1.0, f1 = 0.102, ω2 = 58.0, f2 = 0 | -20818.3199 | |
Mammals | M7 | p = 0.197, q = 1.59 | -20741.9953 | 0.08 | 0.08
Mammals | M8 | p0 = 0.982, p = 0.226, q = 2.20 (p1 = 0.0175) ω = 1.0 | -20739.4396 | |
Mammals | M8a | p0 = 0.97385, p = 0.20298, q = 1.80393, (p1 = 0.02615) ω = 1.0 | -20741.87595 | 0.03 | 0.07
A codon substitution branch-site model was implemented to detect signatures of positive
selection at specific evolutionary stages, from the primate ancestral branch to modern
humans, and among sites of the protein coding gene on each prespecified branch
(Table 3.14). The significance of this test was determined by comparing the null model
(in which ω2 is fixed to one) with the alternative branch-site model (in which ω2 is
greater than or equal to one). False discovery rate correction (q value) of the p values
was used to control false positive results in the branch-site analysis. The LRTs and q
values suggested that none of the analyzed branches of PHC1 was under positive selection
(Table 3.14). Although ω2 for the haplorhini ancestral branch is greater than one
(ω2 = 14.927), it is not significant when compared with the null model (Table 3.14).
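The idea of correcting a family of p values for false discovery rate can be illustrated with the Benjamini-Hochberg step-up adjustment. This is a sketch under an explicit assumption: the text does not state which q value estimator was used, and several tabulated q values are smaller than their p values, which a Benjamini-Hochberg adjustment cannot produce, so the numbers below are not expected to match the q value column. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Branch-site p values for PHC1, copied from Table 3.14
# (Human, Hominini, Homininae, Hominidae, Hominoidea, Catarrhini,
#  Simians, Haplorhini, Primates).
p_values = [1, 1, 1, 1, 1, 0.93, 0.59, 0.49, 0.99]
q_values = benjamini_hochberg(p_values)

# No branch is significant at a 5% false discovery rate.
significant = [q < 0.05 for q in q_values]
print(q_values, any(significant))
```

Under this adjustment every branch receives an adjusted value of 1.0, which leads to the same conclusion as the text: no analyzed branch shows significant positive selection.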
Table 3.14: Branch-site analysis of PHC1.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0.000002 | 1 | 0.98 | None
Hominini | 1 | 0 | 1 | 0.98 | None
Homininae | 1 | 0 | 1 | 0.98 | None
Hominidae | 1 | 0.000002 | 1 | 0.98 | None
Hominoidea | 1 | 0.00002 | 1 | 0.98 | None
Catarrhini | 1 | 0.0076 | 0.93 | 0.98 | None
Simians | 1 | 0.2901 | 0.59 | 0.98 | None
Haplorhini | 14.927 | 0.4681 | 0.49 | 0.98 | None
Primates | 1 | 0.00006 | 0.99 | 0.98 | None
3.6.4 Divergent selective constraints across PHC1 mammalian phylogeny
Variable selection pressure across the PHC1 mammalian phylogeny was estimated with clade
model C (CmC) (Table 3.15). Clade model C allows site-specific selective pressure to
vary between predefined partitions of the phylogeny, i.e., background branches and
foreground branches. For the PHC1 mammalian phylogeny, variation in site-specific
selective pressure was estimated for five foreground partitions (primates, simians,
catarrhini, greatapes, and hominini) (Table 3.15). The significance of CmC was
determined by calculating LRTs of M2a_rel (the null model, which estimates a single ω2
for all branches of the phylogeny) against CmC-primates, simians, catarrhini, greatapes
and hominini (Table 3.15). False positive results were controlled by false discovery
rate correction (q value) of the p values (Table 3.15). Parameter estimates and LRTs
suggested no signature of divergent selection among the analyzed partitions of the PHC1
mammalian phylogeny (Table 3.15). These observations indicate that the function of the
PHC1 coding gene was conserved throughout the evolution of eutherian mammals.
Table 3.15: Divergent selection constraint parameters estimation and likelihood scores for PHC1.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.784, ω0 = 0.020, p1 = 0.021, ω1 = 1.0, p2 = 0.195, ωnp = 0.339, ωp = 0.395 | -20737.4013 | 0.29 | 0.58
CmC-simians | p0 = 0.194, ω0 = 0.348, p1 = 0.021, ω1 = 1.0, p2 = 0.78, ωns = 0.020, ωs = 0.018 | -20737.9248 | 0.82 | 0.66
CmC-catarrhini | p0 = 0.78, ω0 = 0.020, p1 = 0.021, ω1 = 1.0, p2 = 0.19, ωnc = 0.347, ωc = 0.364 | -20737.9367 | 0.86 | 0.66
CmC-greatapes | p0 = 0.78, ω0 = 0.020, p1 = 0.021, ω1 = 1.0, p2 = 0.19, ωng = 0.349, ωg = 0.179 | -20737.3879 | 0.29 | 0.58
CmC-hominini | p0 = 0.78, ω0 = 0.020, p1 = 0.021, ω1 = 1.0, p2 = 0.19, ωnh = 0.35, ωh = 0.13 | -20737.5787 | 0.38 | 0.65
M2a_rel | p0 = 0.784, ω0 = 0.020, p1 = 0.021, ω1 = 1.0, p2 = 0.194, ω2 = 0.348 | -20737.9512 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.7 Cyclin Dependent Kinase 6 (CDK6)
3.7.1 Phylogenetic analysis of CDK6
Human CDK6 has one putative paralog, CDK4, and the evolutionary relationship between
these two paralogs was probed by reconstructing a phylogenetic tree using the neighbor
joining method (Figure 3.10). The phylogenetic tree revealed that a gene duplication
event separated CDK6 and CDK4 prior to the osteichthyes-chondrichthyes split but after
the divergence of gnathostomata from cyclostomata (Figure 3.10). The bidirectional
similarity based approach was unable to recognize any putative ortholog of human
CDK4/CDK6 in the phylum porifera. The phylogeny further showed that the CDK4/CDK6 gene
family originated at the root of parahoxozoa (placozoa, cnidaria and bilateria)
approximately 680 million years ago (Figure 3.10).
3.7.2 Molecular evolution of CDK6 by site model
Codon substitution site models based on the maximum likelihood method were used to
analyze the direction of natural selection operating on the protein coding gene of
microcephaly locus 12 (Table 3.16). The selective pressure ω (the ratio of
nonsynonymous to synonymous substitution rates) indicates the direction of natural
selection: ω = 1, ω < 1, and ω > 1 denote neutral evolution, negative selection, and
positive selection, respectively. Site models were applied to three datasets of the
CDK6 protein coding gene, i.e., primates (17 sequences), nonprimate placental mammals
(26 sequences) and placental mammals (43 sequences). The one ratio model (M0) indicates
the overall strength and direction of natural selection operating on the whole sequence
of the protein coding gene. The ω values estimated by M0 revealed that strong negative
constraint acted on the CDK6 protein coding gene throughout the evolution of eutherians
(Table 3.16). It is generally accepted that natural selection acts differently on each
site of a protein coding gene and that positive selection may operate on only a few
sites, which M0 cannot detect. Three site model pairs (M1 & M2, M7 & M8 and M8a & M8)
were therefore used to test for positive selection (Table 3.16). Parameter estimates and
p values revealed that one site is under positive selection in placental mammals
according to one site model pair, M7 & M8 (Table 3.16). However, according to the
stringent criteria defined for this study, at least two of the three site model pairs
must favor positive selection. Overall, these results suggest that no signals of
positive selection were found on the CDK6 protein coding gene in the primate, nonprimate
placental mammal and placental mammal datasets.
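The interpretation of ω and the two-out-of-three criterion can be written down as a small decision rule. The sketch below is illustrative only: the function names, the 0.05 threshold and the tolerance used to call ω "neutral" are assumptions, and the inputs are the M0 estimates and uncorrected p values for CDK6 copied from Table 3.16.

```python
def selection_regime(omega, tolerance=1e-6):
    """Classify the direction of selection from an omega (dN/dS) estimate."""
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "positive" if omega > 1.0 else "negative"

def supports_positive_selection(p_values, alpha=0.05, required=2):
    """True if at least `required` model pair comparisons reject the neutral model."""
    return sum(1 for p in p_values if p < alpha) >= required

# p values are ordered as (M1 & M2, M7 & M8, M8a & M8), copied from Table 3.16.
cdk6 = {
    "primates": {"m0_omega": 0.04042, "p_values": (1.0, 0.99, 0.55)},
    "nonprimate placental mammals": {"m0_omega": 0.0493, "p_values": (1.0, 0.04, 0.37)},
    "placental mammals": {"m0_omega": 0.04735, "p_values": (1.0, 0.02, 0.16)},
}

summary = {
    group: (selection_regime(values["m0_omega"]),
            supports_positive_selection(values["p_values"]))
    for group, values in cdk6.items()
}

for group, (regime, supported) in summary.items():
    print(f"{group}: M0 indicates {regime} selection; two-of-three criterion met: {supported}")
```

At most one comparison per dataset falls below the threshold, so the criterion is not met for any CDK6 dataset, in line with the conclusion above.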
3.7.3 Episodic positive selection on CDK6 phylogeny
The branch-site test was used to look for evidence of adaptive evolution along the
lineage leading from the primate ancestor to modern humans, under the hypothesis that
positive selection acting on the CDK6 protein coding gene in these lineages might have
contributed to the prefrontal cortex expansion that began in the common ancestor of
primates. For this purpose, the branch-site test based on the codon substitution maximum
likelihood method was performed on specific evolutionary stages of the CDK6 mammalian
phylogeny (Table 3.17). The significance of positive selection was determined by LRTs
of the null model (identical to the branch-site model except that ω2 is fixed to 1 for
the lineage of interest) against the more complex alternative branch-site model
(Table 3.17). False positive results were controlled by q value correction of the p
values. The LRTs revealed no signals of positive selection on any branch from the
primate ancestral branch to the modern human branch (Table 3.17).
Figure 3.10: Evolutionary history of MCPH12 gene CDK6.
The phylogenetic tree of CDK6 was reconstructed by the NJ method based on evolutionary distances
computed by the JTT matrix based method. Thirty three amino acid sequences were employed for this
analysis. All positions containing gaps and missing data were eliminated. The numbers at the nodes
indicate bootstrap values; values < 50 are not shown.
Table 3.16: Parameter estimation and LRT for Mammals CDK6.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.04042 | -2035.596227 | |
Primates | M1: Nearly neutral | ω0 = 0.01613, f0 = 0.97157, ω1 = 1.0, f1 = 0.02843 | -2032.561027 | 1.0 | 1.0
Primates | M2: Positive selection | ω0 = 0.01613, f0 = 0.97157, ω1 = 1.0, f1 = 0.01248, ω2 = 1.0, f2 = 0.01595 | -2032.561027 | |
Primates | M7 | p = 0.01416, q = 0.27849 | -2032.354878 | 0.99 | 0.83
Primates | M8 | p0 = 0.9999, p = 0.01366, q = 0.26870, (p1 = 0.00001) ω = 1.0 | -2032.354826 | |
Primates | M8a | p0 = 0.9746, p = 0.0358, q = 0.535, (p1 = 0.0254) ω = 1.0 | -2032.534776 | 0.55 | 0.38
Nonprimate mammals | M0: One ratio | ω = 0.0493 | -4466.545497 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.0238, f0 = 0.9608, ω1 = 1.0, f1 = 0.039 | -4424.20087 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.0238, f0 = 0.9608, ω1 = 1.0, f1 = 0.392, ω2 = 30.87, f2 = 0 | -4424.20087 | |
Nonprimate mammals | M7 | p = 0.09732, q = 1.477 | -4425.17022 | 0.04 | 0.04
Nonprimate mammals | M8 | p0 = 0.867, p = 0.1736, q = 4.386, (p1 = 0.2095) ω = 1.0 | -4421.95742 | |
Nonprimate mammals | M8a | p0 = 0.97269, p = 0.15433, q = 3.44603, (p1 = 0.02731) ω = 1.0 | -4422.36299 | 0.37 | 0.38
Mammals | M0: One ratio | ω = 0.04735 | -5214.63577 | |
Mammals | M1: Nearly neutral | ω0 = 0.0228, f0 = 0.958, ω1 = 1.0, f1 = 0.0416 | -5165.32948 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.0228, f0 = 0.958, ω1 = 1.0, f1 = 0.0416, ω2 = 28.35, f2 = 0 | -5165.32948 | |
Mammals | M7 | p = 0.0929, q = 1.452 | -5160.54685 | 0.02 | 0.03
Mammals | M8 | p0 = 0.99156, p = 0.11638, q = 2.42160, (p1 = 0.00844) ω = 1.3445 (302Y NEB, 302Y BEB) | -5156.49740 | |
Mammals | M8a | p0 = 0.97951, p = 0.04652, q = 0.61023, (p1 = 0.02049) ω = 1.0 | -5157.508379 | 0.16 | 0.21
Table 3.17: Branch-site analysis of CDK6.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0.00006 | 0.99 | 0.98 |
Hominini | 1 | 0.000002 | 1 | 0.98 |
Hominidae | 2.94 | 0.00001 | 1 | 0.98 |
Hominoidea | 1 | 0.000004 | 1 | 0.98 |
Catarrhini | 1 | 0.000004 | 1 | 0.98 |
Simians | 1 | 0.000002 | 1 | 0.98 |
Primates | 1 | 0 | 1 | 0.98 |
3.7.4 Divergent selective constraint across CDK6 mammalian phylogeny
As mentioned above, signals of Darwinian positive selection were not found in the CDK6
protein coding gene throughout the evolution of eutherian animals by either site or
branch-site models (Table 3.16 and Table 3.17). A phenotypic change such as brain
expansion does not necessarily require adaptive evolution of a protein coding gene; it
may also arise from variable selective pressure acting on the orders and suborders of
eutherian mammals. The patterns of divergent selective constraint across the mammalian
phylogeny were determined with clade model C (CmC) (Table 3.18). The significance of
divergent selective constraint between different partitions of the phylogeny was
determined by likelihood ratio tests (LRTs) of the null model M2a_rel against
CmC-primate, simians, catarrhini, greatapes, and hominini, calculated from the log
likelihood score of each model (Table 3.18). The parameter estimates and LRTs indicated
no pattern of divergent selection across the CDK6 mammalian phylogeny (Table 3.18).
These observations suggest that the CDK6 protein coding gene has had a conserved
function throughout the evolution of eutherian animals.
Table 3.18: Divergent selection constraint parameters estimation and likelihood scores for CDK6.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.895, ω0 = 0.011, p1 = 0.012, ω1 = 1.0, p2 = 0.0926, ωnp = 0.298, ωp = 0.257 | -5156.14749 | 0.76 | 0.66
CmC-Simians | p0 = 0.897, ω0 = 0.012, p1 = 0.014, ω1 = 1.0, p2 = 0.088, ωns = 0.271, ωs = 0.428 | -5155.82423 | 0.39 | 0.65
CmC-catarrhini | p0 = 0.902, ω0 = 0.012, p1 = 0.011, ω1 = 1.0, p2 = 0.087, ωnc = 0.32, ωc = 0.16 | -5156.0186 | 0.55 | 0.66
CmC-greatapes | p0 = 0.089, ω0 = 0.298, p1 = 0.012, ω1 = 1.0, p2 = 0.899, ωng = 0.012, ωg = 0.00 | -5156.0817 | 0.63 | 0.66
CmC-hominini | p0 = 0.089, ω0 = 0.296, p1 = 0.012, ω1 = 1.0, p2 = 0.898, ωnh = 0.012, ωh = 0.00 | -5156.1273 | 0.71 | 0.66
M2a_rel | p0 = 0.897, ω0 = 0.011, p1 = 0.012, ω1 = 1.0, p2 = 0.0906, ω2 = 0.294 | -5156.19494 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.8 SAS-6 centriolar assembly protein (SASS6)
3.8.1 Evolutionary history of SASS6
A paralog of human SASS6 was not identified by similarity based approaches or in the
databases. The phylogenetic tree of the SASS6 gene was therefore reconstructed from
orthologous protein sequences from the phyla porifera, mollusca, annelida, urochordata
and vertebrata (Figure 3.11). The vertebrate clade was outgrouped by Ciona intestinalis
with 99% bootstrap value. Branch lengths of teleost species are longer than those of
other vertebrate species, suggesting that SASS6 might have evolved more rapidly in
teleosts than in sarcopterygians (Figure 3.11). The phylogeny further revealed that
SASS6 originated during early metazoan history, approximately 760 million years ago.
The bidirectional BLAST best hit strategy was unable to identify any putative paralog in
cnidaria and placozoa.
Figure 3.11: Evolutionary history of human SASS6 gene.
The phylogenetic tree of SASS6 was reconstructed using the NJ method based on evolutionary distances
computed by the JTT matrix based method. Twenty seven orthologous amino acid sequences of SASS6 were
used in this analysis. All positions containing missing data and gaps were eliminated prior to
phylogenetic tree reconstruction. The numbers at the nodes represent bootstrap scores based on
500 replicates. The scale bar denotes amino acid substitutions per site.
3.8.2 Molecular Evolution of SASS6 in mammals by site models
After a comprehensive bidirectional BLAST search of the Ensembl and NCBI databases, we
retrieved SASS6 orthologous coding sequences (CDS) for 44 placental mammal species, of
which 19 are primates and 25 are nonprimate placental mammals. Whether some sites in
the SASS6 sequences evolved adaptively in the three mammalian datasets, i.e., primates,
nonprimate placental mammals and placental mammals, is still unknown. To detect the
signature of positive selection on individual sites while ignoring differences among
branches of the phylogeny, we implemented codon substitution site models (M0, M1, M2,
M7, M8, and M8a) separately on the three datasets (Table 3.19). The ω ratio estimated
by the one ratio model (M0) for all three datasets (primates, nonprimate mammals and
placental mammals) indicates that purifying selection dominated the evolution of SASS6
(Table 3.19). However, this estimate is an average over all sites in the coding sequence
and all branches in the phylogenetic tree; in reality, selective constraint varies among
individual sites and only a few sites in the sequence evolve adaptively. To test for
positive selection, LRTs were calculated for three pairs of models (M1 vs. M2, M7 vs.
M8, and M8a vs. M8) in the three datasets (Table 3.19). A signature of positive
selection was accepted only if at least two of the three pairs significantly rejected
the neutral model in favor of the alternative model. The LRTs indicated significant
signatures of positive selection only in the nonprimate placental mammal dataset,
through M7 vs. M8 and M8a vs. M8 (Table 3.19). Signatures of positive selection were
also identified in primates and placental mammals, but only with one model pair, M7 vs.
M8. The M7 vs. M8 LRT is considered less conservative and less accurate than M1 vs. M2
and M8a vs. M8. Sites under positive selection in the nonprimate placental mammal
dataset were identified by the Bayes empirical Bayes (BEB) and Naïve empirical Bayes
(NEB) methods implemented in the M8 codon substitution site model (Table 3.19). Four
sites with NEB and one site with BEB were identified under positive selection with ω
value 2.1637 (Table 3.19).
Table 3.19: Parameter estimation and LRT for Mammals SASS6.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.15291 | -5765.65349 | |
Primates | M1: Nearly neutral | ω0 = 0.07856, f0 = 0.90514, ω1 = 1.0, f1 = 0.09486 | -5727.58667 | 0.2 | 1.0
Primates | M2: Positive selection | ω0 = 0.0827, f0 = 0.912, ω1 = 1.0, f1 = 0.085, ω2 = 4.02, f2 = 0.0045 | -5725.97915 | |
Primates | M7 | p = 0.2189, q = 1.056 | -5730.802537 | 0.009 | 0.01
Primates | M8 | p0 = 0.98602, p = 0.38308, q = 2.27271, (p1 = 0.01398) ω = 2.7464 (189N NEB BEB) | -5726.072798 | |
Primates | M8a | p0 = 0.9064, p = 8.644, q = 98.99 (p1 = 0.09362) ω = 1.0 | -5727.586436 | 0.08 | 0.13
Nonprimate mammals | M0: One ratio | ω = 0.14845 | -11890.58065 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.0577, f0 = 0.8386, ω1 = 1.0, f1 = 0.1614 | -11639.96481 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.0577, f0 = 0.8386, ω1 = 1.0, f1 = 0.10492, ω2 = 1.0, f2 = 0.0565 | -11639.96481 | |
Nonprimate mammals | M7 | p = 0.2024, q = 0.94835 | -11614.68495 | 2 × 10^-3 | 4 × 10^-3
Nonprimate mammals | M8 | p0 = 0.984, p = 0.2425, q = 1.36288, (p1 = 0.016) ω = 2.1637 (575V BEB; 99A, 520S, 546T, 575V NEB) | -11603.89414 | |
Nonprimate mammals | M8a | p0 = 0.9197, p = 0.2736, q = 2.1856 (p1 = 0.0803) ω = 1.0 | -11607.49238 | 0.007 | 0.02
Mammals | M0: One ratio | ω = 0.1441 | -15073.1713 | |
Mammals | M1: Nearly neutral | ω0 = 0.066, f0 = 0.852, ω1 = 1.0, f1 = 0.148 | -14764.4178 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.066, f0 = 0.852, ω1 = 1.0, f1 = 0.148, ω2 = 24.78, f2 = 0 | -14764.4178 | |
Mammals | M7 | p = 0.237, q = 1.12056 | -14717.7196 | 3 × 10^-4 | 1 × 10^-3
Mammals | M8 | p0 = 0.968, p = 0.308, q = 1.1978 (p1 = 0.032) ω = 1.4611 (99A, 189M, 494A, 520S, 546T NEB) | -14705.1686 | |
Mammals | M8a | p0 = 0.9265, p = 0.32266, q = 2.51318, (p1 = 0.07350) ω = 1.0 | -14706.7643 | 0.07 | 0.13
3.8.3 Signature of episodic positive selection at SASS6 mammalian
phylogeny
The codon substitution site models above allow selective pressure to vary among sites
but not among branches; however, selective pressure may also vary among the branches of
the phylogeny. Darwinian positive selection may take place only at specific evolutionary
stages or in specific species of the phylogeny, and may affect only a few sites in a
protein coding sequence, with an ω ratio greater than one. The codon substitution
branch-site model was implemented to detect signatures of transient positive selection
from the ancestral primate branch to the human terminal branch (Table 3.20). The
significance of the branch-site model was determined by LRTs of the null model, in which
ω2 is fixed to 1, against the branch-site model for each prespecified branch
(Table 3.20). False positive results were controlled by false discovery rate correction
(q value) of the p values. These results indicated that no significant positive
selection took place on any analyzed SASS6 branch, from the ancestral branches to the
modern human terminal branch (Table 3.20). The branch-site results agree with the site
model results for the primate dataset of SASS6, which also suggested no signature of
positive selection in primates (Table 3.19 and Table 3.20).
Table 3.20: Branch-site analysis of SASS6.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0 | 1 | 0.98 |
Hominini | 4.64 | 0.1646 | 0.68 | 0.98 |
Hominidae | 1 | 0.0917 | 0.76 | 0.98 |
Hominoidea | 1 | 0 | 1.0 | 0.98 |
Catarrhini | 1 | 0.0689 | 0.79 | 0.98 |
Simians | 1 | 0.2159 | 0.64 | 0.98 |
Haplorhini | 1 | 0.00008 | 0.99 | 0.98 |
Primates | 1 | 0.000002 | 1 | 0.98 |
3.8.4 Divergent selective constraints between partitions of SASS6
mammalian phylogeny
Positive selection is not necessarily required for functional divergence at specific
stages of evolution; variation in site-specific selective pressure between different
partitions of the phylogeny might be responsible for divergence and adaptive function.
Clade model C (CmC) was used to detect such complex forms of divergence in selective
pressure between clades or partitions of the SASS6 mammalian phylogeny (Table 3.21).
The significance of CmC was determined by LRTs of M2a_rel against CmC-primates,
simians, catarrhini, greatapes and hominini. The LRTs indicated divergent selective
pressure between simians (ω = 0.499) and nonsimian placental mammals (ω = 0.214), and
between hominini (ω = 1.082) and nonhominini placental mammals (ω = 0.26)
(Table 3.21). Parameter estimates under CmC-simians indicated that a large proportion
of sites (58%) evolved under strong negative selection with ω value 0.013, 8% of sites
evolved neutrally, and 34% of sites evolved under divergent selective pressure between
simians (ω = 0.499) and nonsimian placental mammals (ω = 0.214) (Table 3.21). Although
the parameter estimates point to a large difference in selective pressure between
hominini and nonhominini placental mammals, the difference is not significant after
correction of the p value for false discovery rate (Table 3.21).
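The CmC-simians result can be unpacked numerically. The sketch below takes the site class proportions and ω values reported for CmC-simians and M2a_rel in Table 3.21, computes the LRT against M2a_rel, and derives a proportion-weighted mean ω for each partition. The weighted mean is only an illustrative summary that is not reported in this study, the single degree of freedom is an assumption (CmC adds one ω parameter to M2a_rel), and all function names are hypothetical.

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return 1.0 if stat <= 0 else math.erfc(math.sqrt(stat / 2.0))

def weighted_omega(site_classes):
    """Proportion-weighted mean omega; proportions are renormalised to sum to one."""
    total = sum(proportion for proportion, _ in site_classes)
    return sum(proportion * omega for proportion, omega in site_classes) / total

# CmC-simians estimates for SASS6, copied from Table 3.21.
p0, omega0 = 0.58, 0.013      # sites under strong negative selection
p1, omega1 = 0.082, 1.0       # neutrally evolving sites
p2 = 0.34                     # sites with divergent selective pressure
omega_nonsimians, omega_simians = 0.214, 0.499

mean_omega = {
    "simians": weighted_omega([(p0, omega0), (p1, omega1), (p2, omega_simians)]),
    "nonsimians": weighted_omega([(p0, omega0), (p1, omega1), (p2, omega_nonsimians)]),
}

# LRT of the null model M2a_rel against CmC-simians (log likelihoods from Table 3.21).
lnl_m2a_rel, lnl_cmc_simians = -14707.0141, -14697.4440
lrt_stat = 2.0 * (lnl_cmc_simians - lnl_m2a_rel)
p_value = chi2_sf_df1(lrt_stat)

print(mean_omega)
print(f"2*dlnL = {lrt_stat:.4f}, p = {p_value:.1e}")
```

The divergent site class accounts for the whole difference between the two partitions, and the resulting p value is of the same order as the 0.00001 listed for CmC-simians in Table 3.21.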
Table 3.21: Divergent selection constraint parameters estimation and likelihood scores for SASS6.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.612, ω0 = 0.017, p1 = 0.077, ω1 = 1.0, p2 = 0.310, ωnp = 0.253, ωp = 0.270 | -14706.9216 | 0.67 | 0.66
CmC-simians | p0 = 0.58, ω0 = 0.013, p1 = 0.082, ω1 = 1.0, p2 = 0.34, ωns = 0.214, ωs = 0.499 | -14697.4440 | 0.00001 | 0.0003
CmC-catarrhini | p0 = 0.31, ω0 = 0.26, p1 = 0.077, ω1 = 1.0, p2 = 0.62, ωnc = 0.017, ωc = 0.024 | -14706.9516 | 0.72 | 0.66
CmC-greatapes | p0 = 0.61, ω0 = 0.17, p1 = 0.078, ω1 = 1.0, p2 = 0.31, ωng = 0.25, ωg = 0.47 | -14706.3050 | 0.23 | 0.57
CmC-hominini | p0 = 0.62, ω0 = 0.017, p1 = 0.077, ω1 = 1.0, p2 = 0.31, ωnh = 0.26, ωh = 1.082 | -14705.0071 | 0.045 | 0.33
M2a_rel | p0 = 0.614, ω0 = 0.017, p1 = 0.077, ω1 = 1.0, p2 = 0.308, ω2 = 0.259 | -14707.0141 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c: catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.9 Major Facilitator Superfamily Domain Containing 2A (MFSD2A)
3.9.1 Phylogenetic analysis of MFSD2A
The phylogenetic tree for MFSD2A and its putative paralogs was reconstructed using the
neighbor joining (NJ) method in order to identify the origin of, and evolutionary
relationship between, the major facilitator superfamily domain family paralogs
(Figure 3.12). The tree revealed that two duplication events were responsible for the
expansion of this family (Figure 3.12). The first duplication event arose during early
metazoan history, before the bilaterian-nonbilaterian split, and produced the most
ancient member of this family, the MFSD12 gene, and the MFSD2A/MFSD2B ancestral gene
(Figure 3.12). The second duplication event separated MFSD2A and MFSD2B and occurred
prior to the actinopterygii-sarcopterygii split and after the divergence of vertebrates
from cephalochordates (Figure 3.12). The tree further showed that a teleost specific
duplication of the MFSD2A gene occurred approximately 310 million years ago
(Figure 3.12). From the tree topology it appears that MFSD2A and MFSD2B are closely
related genes, whereas the MFSD12 gene is very distantly related to this subfamily
(MFSD2A and MFSD2B) (Figure 3.12).
Figure 3.12: Phylogenetic tree of MCPH15 gene MFSD2A.
The phylogenetic tree of MFSD2A was reconstructed using the NJ method based on evolutionary distances
computed by the JTT matrix based method. The numbers at the branches represent bootstrap scores (only
values ≥50% are shown) based on 500 replicates. The scale bar denotes amino acid substitutions per
site.
3.9.2 Pervasive adaptive evolution of MFSD2A in placental mammals
Signals of positive selection were examined in microcephaly locus 15, which encodes the
MFSD2A protein, by applying codon substitution site models (M0, M1, M2, M7, M8, and
M8a) with the maximum likelihood method to three data groups of placental mammals
(primates, nonprimate placental mammals, and the combined data of primates and
nonprimate placental mammals, i.e., all placental mammals) (Table 3.22). The ω ratio
estimated by the one ratio model (M0) for all three data groups revealed that purifying
selection dominated the evolution of eutherian MFSD2A (Table 3.22). A signal of positive
selection was accepted only if at least two of the three site model pairs (M1 & M2, M7
& M8, and M8a & M8) rejected the neutral model (M1, M7 or M8a) in favor of the
alternative positive selection model (M2 or M8). Parameter estimates and p values
suggested a signal of positive selection only in the combined dataset of primate and
nonprimate placental mammals, by two site model pairs, M7 & M8 and M8a & M8
(Table 3.22).
Six and four positively selected sites were pinpointed by the NEB and BEB methods,
respectively, both of which are implemented in the M8 codon substitution model
(Table 3.22). However, correction of the p values for false discovery rate (q value)
suggested that the signal of positive selection in placental mammals is not significant
according to the stringent criteria of positive selection used in this study.
3.9.3 Episodic adaptive evolution across the MFSD2A mammalian
Phylogeny
Previous studies have proposed that some microcephaly genes (MCPH1, WDR62, CDK5RAP2
and ASPM) evolved at an accelerated rate along specific stages of primate evolution. To
detect signatures of episodic selection in the MFSD2A protein coding gene, the
branch-site model was applied at various evolutionary time points from the ancestral
primate branch to the modern human terminal branch, using the codon substitution
maximum likelihood method (Table 3.23). The branch-site model allows ω to vary not only
across the branches of the phylogeny but also among the sites of the prespecified
lineage of interest. The significance of positive selection was determined by LRTs of
the null model (identical to the branch-site model except that ω2 is fixed to one)
against the alternative branch-site model for each prespecified lineage of interest
(Table 3.23). Parameter estimates suggested accelerated evolution of the protein coding
sequence on the simian, catarrhini and hominini ancestral branches, with ω2 of 22.54,
11.65, and 3.195, respectively. However, the LRTs did not support this acceleration and
suggested that none of the analyzed branches of the MFSD2A protein coding gene evolved
significantly under Darwinian positive selection (Table 3.23).
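The contrast between an elevated ω2 estimate and a non-significant test can be made concrete with the values in Table 3.23. The sketch below is illustrative: it assumes the LRT statistic is compared with a chi-square distribution with one degree of freedom, an assumption that reproduces the p values tabulated for the simian and catarrhini branches, and the names used are hypothetical.

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return 1.0 if stat <= 0 else math.erfc(math.sqrt(stat / 2.0))

# (omega2 estimate, LRT statistic) per branch for MFSD2A, copied from Table 3.23.
branches = {
    "Human": (1.0, 0.0),
    "Hominini": (3.195, 0.00003),
    "Homininae": (1.0, 0.0648),
    "Hominidae": (1.0, 0.0528),
    "Catarrhini": (11.65, 0.2656),
    "Simians": (22.54, 2.1289),
    "Primates": (1.0, 0.00002),
}

p_values = {name: chi2_sf_df1(stat) for name, (_, stat) in branches.items()}

# Branches whose omega2 estimate exceeds one but whose test is not significant.
elevated_not_significant = [
    name for name, (omega2, _) in branches.items()
    if omega2 > 1.0 and p_values[name] >= 0.05
]

for name in elevated_not_significant:
    omega2 = branches[name][0]
    print(f"{name}: omega2 = {omega2}, p = {p_values[name]:.2f}")
```

All three branches with ω2 above one fail the test even before any false discovery rate correction, which is why the elevated estimates are not interpreted as positive selection.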
Table 3.22: Parameter estimation and LRT for Mammals MFSD2A.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.13249 | -4444.611090 | |
Primates | M1: Nearly neutral | ω0 = 0.06104, f0 = 0.90335, ω1 = 1.0, f1 = 0.09665 | -4421.032654 | 1.0 | 1.0
Primates | M2: Positive selection | ω0 = 0.061, f0 = 0.9034, ω1 = 1.0, f1 = 0.0009, ω2 = 1.0, f2 = 0.09579 | -4421.032654 | |
Primates | M7 | p = 0.18941, q = 1.12348 | -4421.078982 | 0.61 | 0.55
Primates | M8 | p0 = 0.965, p = 0.3059, q = 2.5129 (p1 = 0.03519) ω = 1.27048 | -4420.582485 | |
Primates | M8a | p0 = 0.9308, p = 0.3499, q = 3.4023, (p1 = 0.0692) ω = 1.0 | -4420.593859 | 0.88 | 0.77
Nonprimate mammals | M0: One ratio | ω = 0.10915 | -9498.29415 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.0438, f0 = 0.8756, ω1 = 1.0, f1 = 0.1244 | -9291.79433 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.04384, f0 = 0.8756, ω1 = 1.0, f1 = 0.1244, ω2 = 14.88, f2 = 0 | -9291.79433 | |
Nonprimate mammals | M7 | p = 0.1559, q = 1.02 | -9260.06512 | 0.03 | 0.04
Nonprimate mammals | M8 | p0 = 0.944, p = 0.219, q = 2.445, (p1 = 0.056) ω = 1.0 | -9256.61184 | |
Nonprimate mammals | M8a | p0 = 0.92897, p = 0.1985, q = 2.161, (p1 = 0.07103) ω = 1.0 | -9257.53496 | 0.17 | 0.21
Mammals | M0: One ratio | ω = 0.10830 | -11628.04642 | |
Mammals | M1: Nearly neutral | ω0 = 0.0458, f0 = 0.882, ω1 = 1.0, f1 = 0.1179 | -11355.56887 | 0.99 | 1.0
Mammals | M2: Positive selection | ω0 = 0.0458, f0 = 0.882, ω1 = 1.0, f1 = 0.1179, ω2 = 25.96, f2 = 0 | -11355.56893 | |
Mammals | M7 | p = 0.16574, q = 1.0699 | -11307.34092 | 0.0002 | 0.0004
Mammals | M8 | p0 = 0.9559, p = 0.2321, q = 2.4611, (p1 = 0.04409) ω = 1.1129 (196S, 198T, 209R, 335V, 437E, 438R NEB; 209R, 335V, 437E, 438R BEB) | -11298.89537 | |
Mammals | M8a | p0 = 0.9366, p = 0.221, q = 2.311, (p1 = 0.0634) ω = 1.0 | -11301.04443 | 0.04 | 0.08
Table 3.23: Branch-site analysis of MFSD2A.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0 | 1 | 0.98 |
Hominini | 3.195 | 0.00003 | 1 | 0.98 |
Homininae | 1 | 0.0648 | 0.79 | 0.98 |
Hominidae | 1 | 0.0528 | 0.82 | 0.98 |
Catarrhini | 11.65 | 0.2656 | 0.61 | 0.98 |
Simians | 22.54 | 2.1289 | 0.14 | 0.98 |
Primates | 1 | 0.00002 | 1.0 | 0.98 |
3.9.4 Divergent selective constraint across the MFSD2A mammalian
phylogeny
Significant signals of positive selection in the MFSD2A protein coding sequence were
not found by the codon substitution site models or the branch-site model (Table 3.22
and Table 3.23). However, positive selection is not necessary for functional divergence
of a protein coding gene between groups of orthologous sequences in the phylogeny.
Clade model C (CmC) was used to detect such complex forms of divergent selective
pressure between the partitions of the MFSD2A mammalian phylogeny (Table 3.24). The
significance of divergent selective constraint between the partitions of the phylogeny
was determined by comparing the null model M2a_rel with the alternative CmC-primate,
simians, catarrhini, greatapes and hominini models. The parameter estimates and p value
indicated divergent selective constraint between simians (ω = 0.38) and nonsimian
placental mammals (ω = 0.23) (Table 3.24). However, after false positive results were
controlled by false discovery rate correction (q value) of the p values, none of the
analyzed partitions of the MFSD2A mammalian phylogeny showed significant divergent
selective pressure. These observations suggest that the MFSD2A protein coding gene has
retained a conserved function across eutherian evolution.
Table 3.24: Divergent selection constraint parameters estimation and likelihood scores for MFSD2A.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.731, ω0 = 0.0152, p1 = 0.062, ω1 = 1.0, p2 = 0.207, ωnp = 0.257, ωp = 0.279 | -11299.7769 | 0.66 | 0.66
CmC-simians | p0 = 0.719, ω0 = 0.014, p1 = 0.066, ω1 = 1.0, p2 = 0.21, ωns = 0.23, ωs = 0.38 | -11297.2803 | 0.02 | 0.30
CmC-catarrhini | p0 = 0.725, ω0 = 0.015, p1 = 0.065, ω1 = 1.0, p2 = 0.21, ωnc = 0.25, ωc = 0.33 | -11299.5559 | 0.43 | 0.66
CmC-greatapes | p0 = 0.73, ω0 = 0.015, p1 = 0.062, ω1 = 1.0, p2 = 0.21, ωng = 0.262, ωg = 0.289 | -11299.8560 | 0.85 | 0.66
CmC-hominini | p0 = 0.73, ω0 = 0.015, p1 = 0.061, ω1 = 1.0, p2 = 0.21, ωnh = 0.266, ωh = 0.196 | -11299.8222 | 0.75 | 0.66
M2a_rel | p0 = 0.732, ω0 = 0.0154, p1 = 0.061, ω1 = 1.0, p2 = 0.206, ω2 = 0.265 | -11299.8738 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.10 Citron rho-interacting serine/threonine kinase (CIT)
3.10.1 Evolutionary history of CIT gene
The phylogenetic tree of the CIT gene was constructed from orthologous protein sequences
of metazoan species (Figure 3.13). The phylogeny showed that CIT originated at the root
of Parahoxozoa (Placozoa, Cnidaria and Bilateria) (Figure 3.13). It also revealed a
lineage specific duplication at the root of teleost fish approximately 310 million years
ago (Figure 3.13). The Ensembl genome browser lists five paralogs of CIT (CDC42BPA,
CDC42BPB, CDC42BPG, ROCK1 and ROCK2). The evolutionary relationship between CIT and
these Ensembl predicted paralogs was estimated, which revealed that CIT is only
distantly related to all five, suggesting that these genes might not be true paralogs of
CIT.
3.10.2 Molecular evolution of CIT across eutherian
A statistical approach was used to study the selective pressure acting on the CIT
protein coding gene in 37 species of placental mammals, of which 15 are primates and 22
are nonprimate placental mammals. Six site models (M0, M1, M2, M7, M8 and M8a), using a
codon based maximum likelihood method, were run separately on three data groups, i.e.,
primates (15 species), nonprimate placental mammals (22 species) and placental mammals
(37 species) (Table 3.25). The simplest one ratio model (M0) measures the average ω
ratio over all amino acid sites of the protein coding gene. The selective pressure
estimated by M0 revealed that strong negative selection dominated the evolution of the
CIT protein coding gene in all three data groups (primates, nonprimate placental mammals
and all placental mammals), with ω values of approximately 0.03 (Table 3.25). The other
five site models were used to compute likelihood ratio tests (LRTs) for three site model
pairs, M1 (neutral model) against M2 (selection model), M7 (beta) against M8 (beta & ω)
and M8a (beta & ω = 1) against M8 (beta & ω), in order to check for signs of positive
selection on the CIT protein coding gene in all three data groups of eutherian mammals
(Table 3.25). Parameter estimates and LRTs indicated positive selection in primates with
only one site model pair (M7 vs. M8) and in nonprimate placental mammals with two site
model pairs (M7 vs. M8 and M8a vs. M8) (Table 3.25). In this study, signals of positive
selection were considered reliable only if at least two of the three site model pairs
detected adaptive evolution. Under this criterion, positive selection is acting only in
nonprimate placental mammals (Table 3.25). The Bayes empirical Bayes (BEB) method
detected two sites with greater than 95% posterior probability of being under positive
selection in nonprimate placental mammals (Table 3.25).
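The likelihood ratio tests described above can be sketched as follows. This is a
minimal illustration, not the pipeline used in the thesis: it assumes the conventional
chi-square approximation with 2 degrees of freedom for M7 vs. M8 and 1 degree of freedom
for M8a vs. M8, and the helper names are hypothetical. The log likelihood values are the
primate CIT values from Table 3.25.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its p value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Primate CIT log likelihoods from Table 3.25.
stat_m7_m8, p_m7_m8 = lrt(-14245.90441, -14240.57215, df=2)
stat_m8a_m8, p_m8a_m8 = lrt(-14240.95137, -14240.57215, df=1)
print(stat_m7_m8, p_m7_m8)
print(stat_m8a_m8, p_m8a_m8)
```

Under these assumptions the two p values round to the 0.005 and 0.38 reported for
primates in Table 3.25.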
Figure 3.13: Evolutionary history of human CIT gene.
The phylogenetic tree of CIT was reconstructed using the NJ method based on evolutionary
distances computed by the JTT matrix based method. All positions containing gaps and
missing data were eliminated. Twenty-four amino acid sequences were used in this analysis.
Table 3.25: Parameter estimation and LRT for Mammals CIT.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.03313 | -14283.42317 | |
Primates | M1: Nearly neutral | ω0 = 0.0165, f0 = 0.981, ω1 = 1.0, f1 = 0.0187 | -14240.91466 | 0.69 | 1.0
Primates | M2: Positive selection | ω0 = 0.0165, f0 = 0.981, ω1 = 1.0, f1 = 0.0187, ω2 = 21.34, f2 = 0 | -14240.91669 | |
Primates | M7 | p = 0.0148, q = 0.310 | -14245.90441 | 0.005 | 0.008
Primates | M8 | p0 = 0.987, p = 1.923, q = 99.0, (p1 = 0.013) ω = 1.35 (99I, 253N, 255R, 304T, 308S, 625A, 1834S NEB, 625A BEB) | -14240.57215 | |
Primates | M8a | p0 = 0.982, p = 1.77, q = 99.0, (p1 = 0.0185) ω = 1.0 | -14240.95137 | 0.38 | 0.38
Nonprimate mammals | M0: One ratio | ω = 0.03576 | -29309.608609 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.1634, f0 = 0.96225, ω1 = 1.0, f1 = 0.03775 | -28964.379310 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.1634, f0 = 0.96225, ω1 = 1.0, f1 = 0.3775, ω2 = 48.43, f2 = 0 | -28964.373910 | |
Nonprimate mammals | M7 | p = 0.06866, q = 1.2419 | -28925.116597 | 4×10⁻⁹ | 4×10⁻⁸
Nonprimate mammals | M8 | p0 = 0.98525, p = 0.08625, q = 2.5095, (p1 = 0.01475) ω = 1.3 (12S, 227T BEB) | -28905.698238 | |
Nonprimate mammals | M8a | p0 = 0.97683, p = 0.03155, q = 0.399, (p1 = 0.02317) ω = 1.0 | -28914.804177 | 2×10⁻³ | 0.0002
Mammals | M0: One ratio | ω = 0.03559 | -35783.34624 | |
Mammals | M1: Nearly neutral | ω0 = 0.0165, f0 = 0.9613, ω1 = 1.0, f1 = 0.03874 | -35349.637428 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.0165, f0 = 0.9613, ω1 = 1.0, f1 = 0.03875, ω2 = 136.65, f2 = 0 | -35349.637429 | |
Mammals | M7 | p = 0.07159, q = 1.2654 | -35273.812696 | 5×10⁻⁴ | 1×10⁻⁴
Mammals | M8 | p0 = 0.97858, p = 0.03136, q = 0.38527, (p1 = 0.02142) ω = 1.0 | -35261.612543 | |
Mammals | M8a | p0 = 0.97858, p = 0.02221, q = 0.2331, (p1 = 0.02142) ω = 1.0 | -35261.613592 | 0.96 | 0.80
3.10.3 Molecular evolution of CIT protein coding gene by branch-site
model
The codon substitution site models above cannot identify positive selection that acts
over a short period of time and affects only a fraction of codons. The branch-site model
is able to detect lineage specific changes in selective pressure on specific codons. To
detect signatures of episodic positive selection, the branch-site model was applied to
specific evolutionary stages of the CIT mammalian phylogeny, from the primate ancestral
lineage to the modern human terminal branch (Table 3.26). The significance of positive
selection was determined by likelihood ratio tests (LRTs) of the null model (identical
to the branch-site model except that ω2 is fixed to one in the predefined lineage of
interest) against the branch-site model, using the log likelihood score of each test
(Table 3.26). Parameter estimates and LRTs indicated that only the simian ancestral
branch evolved adaptively (Table 3.26). The Bayes empirical Bayes (BEB) method
implemented in the branch-site model pinpointed one positively selected codon site with
greater than 99% posterior probability (Table 3.26). However, once false positives were
controlled by applying the q value correction to the p values, no analyzed branch
remained under significant positive selection (Table 3.26).
Table 3.26: Branch-site analysis of CIT.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0 | 1 | 0.98 |
Hominini | 1 | 0.0003 | 0.98 | 0.98 |
Homininae | 1 | 0.000006 | 1 | 0.98 |
Hominidae | 1 | 0.000002 | 1 | 0.98 |
Catarrhini | 1 | 0.000004 | 1 | 0.98 |
Simians | 999 | 8.3012 | 0.004 | 0.11 | 1897A**
Haplorhini | 1.06 | 0.0056 | 0.94 | 0.98 |
Primates | 1 | 0.0005 | 0.98 | 0.98 |
3.10.4 Divergent selective pressure across CIT mammalian phylogeny
To check for divergence in selective pressure between different partitions of the CIT
mammalian phylogeny, Clade model C (CmC) was applied (Table 3.27). CmC accommodates both
heterogeneity and divergence in selective pressure among sites. The significance of
divergence in selective pressure between the partitions of the CIT mammalian phylogeny
was tested by likelihood ratio tests (LRTs) of the null model M2a_rel against
CmC-primates, simians, catarrhini, greatapes and hominini, using the log likelihood
score of each test (Table 3.27). Parameter estimates and LRTs revealed no pattern of
divergent selective constraint in any partition of the CIT mammalian phylogeny
(Table 3.27). This suggests that the CIT protein coding gene performs a similar function
throughout eutherian mammals.
Table 3.27: Divergent selection constraint parameters estimation and likelihood scores for CIT.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.89, ω0 = 0.0058, p1 = 0.014, ω1 = 1.0, p2 = 0.095, ωnp = 0.24, ωp = 0.19 | -35246.2438 | 0.18 | 0.57
CmC-simians | p0 = 0.095, ω0 = 0.226, p1 = 0.015, ω1 = 1.0, p2 = 0.89, ωns = 0.0059, ωs = 0.0043 | -35246.9740 | 0.57 | 0.66
CmC-catarrhini | p0 = 0.094, ω0 = 0.227, p1 = 0.015, ω1 = 1.0, p2 = 0.89, ωnc = 0.0057, ωc = 0.0088 | -35246.8939 | 0.49 | 0.66
CmC-greatapes | p0 = 0.89, ω0 = 0.0058, p1 = 0.015, ω1 = 1.0, p2 = 0.094, ωng = 0.2279, ωg = 0.2074 | -35247.1167 | 0.86 | 0.66
CmC-hominini | p0 = 0.89, ω0 = 0.0057, p1 = 0.015, ω1 = 1.0, p2 = 0.095, ωnh = 0.224, ωh = 0.379 | -35246.9556 | 0.55 | 0.66
M2a_rel | p0 = 0.89, ω0 = 0.0058, p1 = 0.015, ω1 = 1.0, p2 = 0.095, ω2 = 0.227 | -35247.1321 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.11 Kinesin Family Member 14 (KIF14)
3.11.1 Evolutionary history of KIF14
No putative paralog of KIF14 was detected by similarity search based approaches or in
any public database or genome browser. The KIF14 phylogeny was reconstructed by the
neighbor joining (NJ) method using orthologous sequences from representative species of
kingdom Animalia (Figure 3.14). The phylogenetic tree revealed that the KIF14 gene
originated early in the evolutionary history of metazoans (Figure 3.14). The
bidirectional best BLAST hit approach failed to detect any ortholog of KIF14 in
Petromyzon marinus, Drosophila melanogaster, Caenorhabditis elegans and Trichoplax
adhaerens.
3.11.2 Pervasive adaptive evolution in KIF14 across eutherian mammals
The presence of pervasive positive selection on the KIF14 protein coding gene was first
checked by running codon substitution site models (M0, M1, M2, M7, M8 and M8a)
separately on three data groups, i.e., primates, nonprimate placental mammals and
placental mammals (Table 3.28). The selective pressure estimated by the simplest one
ratio model revealed that negative selection dominated the overall evolution of KIF14,
with ω values ranging from 0.275 to 0.335 (Table 3.28). The one ratio model estimates
the average ω over all sites in the protein coding gene and cannot identify positive
selection on individual codon sites. Three site model pairs (M1 vs. M2, M7 vs. M8 and
M8a vs. M8) were therefore used to test for positive selection acting on individual
codons. For this purpose, likelihood ratio tests (LRTs) were calculated for the three
site model pairs from the log likelihood scores estimated for the five site models (M1,
M2, M7, M8 and M8a). The parameter estimates and LRT values revealed evidence for
positive selection across primates with all three site model pairs (M1 vs. M2, M7 vs.
M8 and M8a vs. M8) and across placental mammals with two site model pairs (M7 vs. M8 and
M8a vs. M8) (Table 3.28). The Naïve empirical Bayes (NEB) method implemented in the M2
selection model identified one site with posterior probability > 95% in primates, while
the M8 site model identified two sites, with both the NEB and the Bayes empirical Bayes
(BEB) methods, as likely to have evolved under positive selection in primates
(Table 3.28). In placental mammals, NEB pinpointed three positively selected sites and
BEB one (Table 3.28).
Figure 3.14: Evolutionary history of human KIF14 gene.
The phylogenetic tree of KIF14 was reconstructed using the NJ method based on
evolutionary distances computed by the JTT matrix based method. All positions containing
gaps and missing data were eliminated. The numbers at each branch represent bootstrap
scores.
Table 3.28: Parameter estimation and LRT for Mammals KIF14.
Data group | Model | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
Primates | M0: One ratio | ω = 0.33576 | -14591.300046 | |
Primates | M1: Nearly neutral | ω0 = 0.1172, f0 = 0.7345, ω1 = 1.0, f1 = 0.2655 | -14497.808970 | 0.002 | 0.0
Primates | M2: Positive selection | ω0 = 0.1258, f0 = 0.739, ω1 = 1.0, f1 = 0.253, ω2 = 5.59, f2 = 0.0074 (855H NEB) | -14491.077188 | |
Primates | M7 | p = 0.256, q = 0.4773 | -14502.410656 | 3×10⁻³ | 0.0006
Primates | M8 | p0 = 0.9868, p = 0.3511, q = 0.715, (p1 = 0.0132) ω = 4.35 (855H, 1477R NEB, 855H, 1477R BEB) | -14492.030684 | |
Primates | M8a | p0 = 0.7359, p = 13.408, q = 99.0, (p1 = 0.264) ω = 1.0 | -14497.848037 | 0.0006 | 0.004
Nonprimate mammals | M0: One ratio | ω = 0.2751 | -24123.222491 | |
Nonprimate mammals | M1: Nearly neutral | ω0 = 0.1393, f0 = 0.72708, ω1 = 1.0, f1 = 0.27292 | -23863.957733 | 1.0 | 1.0
Nonprimate mammals | M2: Positive selection | ω0 = 0.13933, f0 = 0.72708, ω1 = 1.0, f1 = 0.1965, ω2 = 1.0, f2 = 0.0764 | -23863.957733 | |
Nonprimate mammals | M7 | p = 0.512, q = 1.1425 | -23817.981687 | 0.01 | 0.02
Nonprimate mammals | M8 | p0 = 0.98039, p = 0.57517, q = 1.4066, (p1 = 0.01961) ω = 1.756 (239T BEB) | -23813.387527 | |
Nonprimate mammals | M8a | p0 = 0.90368, p = 0.62754, q = 1.926, (p1 = 0.09632) ω = 1.0 | -23815.732058 | 0.03 | 0.07
Mammals | M0: One ratio | ω = 0.28548 | -31622.75422 | |
Mammals | M1: Nearly neutral | ω0 = 0.13362, f0 = 0.70483, ω1 = 1.0, f1 = 0.29517 | -31137.117565 | 1.0 | 1.0
Mammals | M2: Positive selection | ω0 = 0.13362, f0 = 0.70483, ω1 = 1.0, f1 = 0.25082, ω2 = 1.0, f2 = 0.04435 | -31137.117565 | |
Mammals | M7 | p = 0.47224, q = 1.0128 | -31066.759798 | 1×10⁻⁶ | 5×10⁻⁶
Mammals | M8 | p0 = 0.9666, p = 0.5499, q = 1.358, (p1 = 0.03334) ω = 1.6429 (232T, 1414S, 1436S NEB, 1414S BEB) | -31053.422533 | |
Mammals | M8a | p0 = 0.86864, p = 0.63251, q = 2.1528, (p1 = 0.13136) ω = 1.0 | -31058.603827 | 0.001 | 0.005
3.11.3 Episodic positive selection across KIF14 mammalian phylogeny
Transient or episodic positive selection on the KIF14 protein coding gene that affects
only a subset of lineages and a fraction of sites cannot be identified by the codon
substitution site models (M2 and M8). Episodic positive selection at various
evolutionary stages of the KIF14 mammalian phylogeny was therefore tested with the
branch-site model (Table 3.29). The significance of the transient imprint of adaptive
evolution was determined by likelihood ratio tests (LRTs) of the null model (identical
to the branch-site model except that ω2 is restricted to one for the predefined lineage
of interest) against the branch-site model, using the log likelihood score of each test
(Table 3.29). False positives from the branch-site model were controlled by applying
the false discovery rate (q value) correction to the p values. The ω2 and LRT values for
the predefined foreground branches revealed that only the homininae ancestral branch
evolved significantly under positive selection, with ω2 = 123.22 (Table 3.29). The Bayes
empirical Bayes (BEB) method identified one codon site with posterior probability > 95%
of evolving under positive selection in the homininae ancestral branch (Table 3.29).
Table 3.29: Branch-site analysis of KIF14.
Branch | ω2 | LRT | P value | q value | Positive selected sites
Human | 1 | 0 | 1 | 0.98 |
Hominini | 1 | 0 | 1 | 0.98 |
Homininae | 123.22 | 11.0705 | 0.0009 | 0.04 | 619M*
Hominidae | 1 | 0 | 1 | 0.98 |
Hominoidea | 1 | 0 | 1 | 0.98 |
Catarrhini | 3.34 | 0.4272 | 0.51 | 0.98 |
Simians | 1 | 0.000002 | 1 | 0.98 |
Haplorhini | 1 | 0 | 1 | 0.98 |
Primates | 1 | 0.0679 | 0.79 | 0.98 |
3.11.4 Site-specific functional divergence among the partitions of KIF14
mammalian phylogeny
Site specific changes in divergent selective pressure between clades of protein coding
genes contribute to adaptive phenotypic diversity. To check for site specific divergence
in selective constraint between different partitions of the KIF14 mammalian phylogeny,
clade model C (CmC) was applied using a codon based maximum likelihood approach
(Table 3.30). Likelihood ratio tests (LRTs) of the null model M2a_rel against
CmC-primates, simians, catarrhini, greatapes and hominini were conducted to test the
significance of divergence in selective pressure between partitions (Table 3.30).
Parameter estimates and LRTs suggested that KIF14 evolved under divergent selective
pressure in hominini (ω = 0.25) relative to nonhominini placental mammals (ω = 0.042)
(Table 3.30). However, after false positives were controlled by applying the q value
correction to the p values, the divergence in selective constraint between hominini and
nonhominini eutherians was not significant (Table 3.30). This suggests that the KIF14
protein coding gene has had a conserved function throughout the evolution of eutherian
mammals.
Table 3.30: Divergent selection constraint parameters estimation and likelihood scores for KIF14.
Model & Partition | Parameter estimation (ω) | Log likelihood value (lnL) | P value | q value
CmC-Primate | p0 = 0.43, ω0 = 0.043, p1 = 0.15, ω1 = 1.0, p2 = 0.42, ωnp = 0.36, ωp = 0.39 | -31056.0434 | 0.58 | 0.66
CmC-simians | p0 = 0.43, ω0 = 0.043, p1 = 0.14, ω1 = 1.0, p2 = 0.42, ωns = 0.36, ωs = 0.42 | -31055.4805 | 0.23 | 0.57
CmC-catarrhini | p0 = 0.43, ω0 = 0.042, p1 = 0.15, ω1 = 1.0, p2 = 0.42, ωnc = 0.36, ωc = 0.41 | -31055.9687 | 0.50 | 0.66
CmC-greatapes | p0 = 0.42, ω0 = 0.37, p1 = 0.15, ω1 = 1.0, p2 = 0.43, ωng = 0.042, ωg = 0.13 | -31054.6979 | 0.083 | 0.42
CmC-hominini | p0 = 0.42, ω0 = 0.37, p1 = 0.15, ω1 = 1.0, p2 = 0.43, ωnh = 0.042, ωh = 0.25 | -31053.9072 | 0.03 | 0.30
M2a_rel | p0 = 0.42, ω0 = 0.37, p1 = 0.15, ω1 = 1.0, p2 = 0.43, ω2 = 0.043 | -31056.1959 | NA | NA
np: nonprimate eutherian, p: primates, ns: nonsimians eutherian, s: simians, nc: noncatarrhini, c:
catarrhini, ng: nongreatapes eutherian, g: greatapes, nh: nonhominini eutherian, h: hominini.
3.12 Synuclein gene family
3.12.1 Evolutionary history of synuclein family
The evolutionary relationship between α synuclein and its putative paralogs β synuclein
and γ synuclein was estimated by the Maximum Likelihood (ML) method, using protein
sequences from representative members of subphylum Vertebrata (Mammalia, Aves, Reptilia,
Amphibia, Osteichthyes, Chondrichthyes and Agnatha) (Figure 3.15). The molecular
phylogenetic analysis suggests that the synuclein family diversified through two
independent gene duplication events early in vertebrate history (Figure 3.15). The γ
synuclein was the first gene to diverge, at the root of vertebrates prior to the
jawless-jawed vertebrate split (Figure 3.15). α and β synuclein then originated through
a subsequent duplication event that occurred after the jawless-jawed vertebrate split
and prior to the cartilaginous-bony vertebrate divergence (Figure 3.15). Furthermore,
the position of lizard in the β synuclein subfamily does not agree with the
well-established vertebrate phylogeny (Figure 3.15). The position and branch length of
lizard β synuclein indicate rapid sequence evolution of β synuclein in lizard compared
to the other orthologs (Figure 3.15). The phylogeny further revealed a species specific
duplication in lamprey, which has two copies of γ synuclein (Figure 3.15). The
bidirectional BLAST/BLAT best hit strategy was unable to detect any ortholog of the
synuclein family in any invertebrate metazoan phylum (Figure 3.15). The vertebrate
specific origin of the synucleins and their localization at presynaptic terminals
suggest that the synuclein family might have contributed to the differences in synaptic
complexity between invertebrates and vertebrates.
Figure 3.15: Evolutionary history of synuclein family.
The evolutionary history of the synuclein gene family was inferred by the maximum
likelihood (ML) method. The values on the nodes are bootstrap values; only values ≥50%
are displayed. Two gene duplications in the early vertebrate lineage diversified this
family into three members, α, β and γ synuclein. The first duplication occurred before
the divergence of lamprey from other vertebrates, while the second duplication occurred
after the lamprey split from other vertebrates.
3.12.2 Sequence evolution and Coevolutionary relationship
Vertebrate specific gene innovations and duplications facilitate the acquisition of
unique biological functions and are hence considered a major driving force behind
synaptic evolution (Bayés et al., 2017). Gene duplications provide substantial raw
material from which new gene functions may evolve by mutation. Protein sequence
alignment of the human synuclein paralogs reveals multiple regions of conservation and
divergence, with high sequence identity at the amino terminal domain but low sequence
identity (α/γ) at the carboxyl terminal domain. Both α and β synuclein contain two
proline rich Ca2+ binding motifs within the carboxyl terminal domain, while γ synuclein
lacks these subfamily specific motifs (Figure 3.16) (M. S. Nielsen, Vorum, Lindersson, &
Jensen, 2001). Within the NAC domain, the most striking difference among paralogs is the
deletion of eleven amino acid residues in β synuclein, in a region responsible for the
amyloidogenic character of α synuclein (Figure 3.16). As mentioned above, the amino
terminal region is highly conserved among paralogs. This finding was further reinforced
by mapping six previously reported human specific mutations linked with hereditary
Parkinson's disease, i.e., A30P, E46K, H50Q, G51D, A53E and A53T, onto human α synuclein
(Figure 3.16) (Kruger et al., 1998; Lesage et al., 2013; Polymeropoulos et al., 1997;
Silke et al., 2013; Zarranz et al., 2004). The results revealed that these mutations are
confined to the amino terminal domain, which in turn implies that the high conservation
of this region is significant not only from a functional perspective but also for the
pathogenesis of familial Parkinson's disease (Figure 3.16). SLAC-window analysis
indicated that the amino terminal and NAC domains of α synuclein contain 25 negatively
constrained sites, which further suggests that strong purifying selection has preserved
this region during vertebrate evolution (Table 3.31).
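The sequence identity comparison described above can be illustrated with a small sketch.
It is not the tool used in the thesis: the function name is hypothetical, identity is
assumed to be counted over alignment columns in which neither sequence has a gap, and
only the first 50 alignment columns of the human sequences shown in Figure 3.16 are used.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# First 50 alignment columns of the human paralogs (amino terminal region, Figure 3.16).
ALPHA = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVH"
BETA = "MDVFMKGLSMAKEGVVAAAEKTKQGVTEAAEKTKEGVLYVGSKTREGVVQ"
GAMMA = "MDVFKKGFSIAKEGVVGAVEKTKQGVTEAAEKTKEGVMYVGAKTKENVVQ"

print(percent_identity(ALPHA, BETA))
print(percent_identity(ALPHA, GAMMA))
```

The same function can be applied to the carboxyl terminal columns of the alignment to
compare conservation between the two ends of the protein.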
These sequence differences must underlie the functional and pathogenic differences among
paralogs and orthologs. However, the mechanism by which sequence alterations lead to
differential phenotypes among paralogs remains unclear. Intriguingly, in α synuclein
H50Q is a PD hotspot (PD results when histidine is mutated to glutamine), whereas wild
type β and γ synuclein contain glutamine at this position (Figure 3.16). This
differential phenotypic impact among the synuclein paralogous copies is likely to have
arisen either independently through the effects of protective alleles or through
coevolution with other proteins.
The mutual information (MI) score can be used to predict coevolutionary relationships
between amino acid residues in a protein family or subfamily (Teppa, Wilkins, Nielsen, &
Buslje, 2012). Two or more residues are suggested to have coevolved if they have high MI
signals. Such residues are likely to carry biological information related to protein
structure and function (Teppa, et al., 2012). The extent of coevolution between residues
within α, β and γ synuclein was inferred with the MISTIC server (Figure 3.17)
(Simonetti, et al., 2013). Figure 3.17 shows that the most conserved positions are
proline and glycine residues in α and β synuclein. In γ synuclein, glycine residues are
the most conserved. Furthermore, information accumulated in the carboxyl terminal domain
of α and γ synuclein: residues 101-114 (α synuclein) and 98-126 (γ synuclein)
(Figure 3.17). Within these regions, a large number of MI connections (red lines
represent MI values in the top 5%) with high values of proximity mutual information
(pMI) and cumulative mutual information (cMI) for individual residues were found
(Figure 3.17). High MI values suggest coevolutionary relationships between residues. In
β synuclein, by contrast, a large number of MI connections with high cMI and pMI values
was observed in three main regions of the amino, NAC and carboxyl domains: residues
10-18, 47-94 and 98-111. Interestingly, a higher degree of coevolution between residues
was observed in β synuclein (especially NAC domain residues 63-94) than in α and γ
synuclein (Figure 3.17). These observations suggest that the high degree of coevolution
between residues of β synuclein is likely to have arisen in response to the deletion of
eleven residues in the NAC domain, in order to maintain the structural stability and
perhaps the function of β synuclein.
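To make the mutual information score concrete, the sketch below computes plain MI (in
bits) between two columns of a multiple sequence alignment. It is an illustration under
stated assumptions, not the MISTIC implementation: it uses raw observed frequencies and
omits the corrections and normalisations that a dedicated server would typically apply.
The function name and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Plain mutual information (bits) between two alignment columns."""
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        p_a = count_x[a] / n
        p_b = count_y[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy columns: one residue per sequence in a hypothetical four-sequence alignment.
col_1 = "AAGG"
col_2 = "LLVV"  # changes together with col_1
col_3 = "LVLV"  # varies independently of col_1

print(mutual_information(col_1, col_2))
print(mutual_information(col_1, col_3))
```

Columns that change together across sequences score high, while columns that vary
independently score near zero, which is the intuition behind reading dense high MI
connections as a sign of coevolving residues.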
Figure 3.16: Sequence alignment of human synuclein paralogs.
Sequence comparison of human synuclein paralogs revealed that eight substitutions have
accumulated in the amino terminal and NAC domains of α synuclein, while four
substitutions and an eleven residue deletion have occurred in the amino terminal and NAC
domains of β synuclein after the second duplication. * shows the hotspots of
neurodegenerative diseases. Paralog specific changes are color coded. ND:
Neurodegenerative disorders.
α-syn specific Amino terminal domain
β-syn specific NAC domain
Ca2+ binding motif Carboxyl terminal domain
* Mutations linked to ND disorders
A30P* E46K* H50Q*
Human_αsyn        MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVH
Human_βsyn        MDVFMKGLSMAKEGVVAAAEKTKQGVTEAAEKTKEGVLYVGSKTREGVVQ
Ancestral_α/βsyn  MDVFMKGLSMAKEGVVAAAEKTKQGVTEAAEKTKEGVLYVGSKTKEGVVQ
Human_γsyn        MDVFKKGFSIAKEGVVGAVEKTKQGVTEAAEKTKEGVMYVGAKTKENVVQ
G51D* V70M*
Human_αsyn        GVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQL
Human_βsyn        GVASVAEKTKEQASHLGGAVFS-----------GAGNIAAATGLVKREEF
Ancestral_α/βsyn  GVASVAEKTKEQASNVGGAVVSGVTAVAQKTVEGAGNIAAATGLVKKEEL
Human_γsyn        SVTSVAEKTKEQANAVSEAVVSSVNTVATKTVEEAENIAVTSGVVRKEDL
A53T/E* P123H*
Human_αsyn        GKN-----EEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA
Human_βsyn        PTDLKPEEVAQEAAEEPLIEPLMEPEGESYEDPPQEEYQEYEPEA
Ancestral_α/βsyn  PKQ------EEEAAQEPLIEEMVEPEGESYEDPPQEEYQEYEPEA
Human_γsyn        RP----SAPQQ--------------EGEASKEKEEVAEEAQSGGD
Figure 3.17: Coevolutionary relationship within synuclein genes.
Circular representation of the coevolutionary relationships among residues within each
synuclein family member. The outer circle shows the one letter amino acid code of the
human sequence. The colored boxes of the second circle depict the conservation score
(from red for the highest conservation score to cyan for the lowest). The third circle
indicates the cumulative mutual information score, whereas the fourth circle shows the
proximity mutual information score. Lines in the center of the circle connect residues
with a mutual information score greater than 6.5. Red lines represent MI values in the
top 5%; black lines represent MI values between 95 and 70%, while gray lines indicate
the remaining 70%.
Table 3.31: Sites under negative selective constraint in α synuclein in the vertebrate
alignment, identified by SLAC analysis.
Index Residue no dN-dS P-value
1 15 -3 0.03
2 18 -4 0.01
3 20 -4.272426 0.01
4 23 -7.122523 0.0009
5 30 -3 0.03
6 37 -5 0.004
7 39 -4.83201 0.02
8 41 -3 0.03
9 47 -4 0.01
10 49 -4 0.01
11 50 -4.177328 0.05
12 52 -3 0.03
13 62 -4.282012 0.01
14 65 -4.83201 0.01
15 67 -5 0.004
16 69 -6 0.001
17 72 -4 0.01
18 73 -5 0.004
19 75 -3 0.03
20 77 -4 0.01
21 78 -4 0.01
22 85 -3 0.03
23 86 -3 0.03
24 88 -2.894755 0.05
25 98 -7.248015 0.002
dN: non-synonymous substitutions per non-synonymous site, dS: synonymous substitutions per
synonymous site, significant negatively selected sites are presented here only with p value <= 0.05.
3.12.3 Structural evolution of α synuclein
To further examine how sequence differences affect structure, a comparative structural
study was conducted. The NMR structure of human α synuclein was retrieved from PDB
(1XQ8) and used as a reference to model the paralogous and orthologous ancestral protein
structures of α synuclein by homology modelling (Figure 3.18, 3.19). RMSD values were
used to study structural deviations (Figure 3.18, 3.19). The results revealed that the β
and γ synuclein structures are highly diverged from α synuclein at the amino terminal
and NAC domains (Figure 3.18). Comparative analysis of the ancestral orthologous
structures suggests that the structure of α synuclein passed through a series of
transitions to acquire its favored conformation (Figure 3.19). Superimposition of the
ancestral α synuclein models on 1XQ8 revealed a common deviated region spanning residues
32 to 58 of the amino terminal lipid binding domain, suggesting that this region has
evolved continuously at the structural level during vertebrate evolution despite its
high sequence conservation. Intriguingly, all mutations involved in Parkinson's disease
pathogenesis are situated within or immediately adjacent to this continuously evolving
region (residues 32-58) of α synuclein, which indicates that any alteration in this
region will be deleterious because of the strong selection and functional constraints
imposed on it. Superimposition of the mutant models on 1XQ8 identified major shifts in
the lipid binding domain for A30P and H50Q, whereas major changes were observed in both
the lipid binding and NAC domains for E46K and A53T. G51D alone showed an altered NAC
region only (Figure 3.20). All five mutant models shared a highly deviated region from
residue 32 to 58. It can be postulated from this comparative structural analysis that
the primary effects and roles of these five α synuclein mutations in Parkinson's disease
pathogenesis may differ because of their different structural morphologies.
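The RMSD measure used above to quantify structural deviation can be sketched as follows.
This is a minimal illustration that assumes the two structures have already been
superimposed and that atoms are paired in the same order; it does not perform the
superposition step itself. The function name and the toy coordinates are hypothetical
and are not taken from 1XQ8 or any model in this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (in angstroms) for two already superimposed fragments.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, model))
```

Restricting the coordinate lists to a window of residues, such as a single domain, gives
a per-region deviation that can be compared between models.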
Figure 3.18: Structural deviation among synuclein paralogs.
Major structural shifts were observed in the amino terminal lipid binding and NAC
domains due to paralog specific substitutions. Residues that deviate from human α
synuclein (1XQ8) are color coded. Structural deviations were evaluated by RMSD values.
SNCA: α synuclein, SNCB: β synuclein, SNCG: γ synuclein.
Figure 3.19: Structural evolution of α synuclein protein since the split from the last
common sarcopterygian ancestor.
Significant structural divergence towards human α synuclein was observed after the split
from the common sarcopterygian ancestor. Lineage specific substitutions are written
above the arrows. Residues that deviate in backbone torsion angles (Φ°, Ψ°) from human α
synuclein (1XQ8) are shown in red. Structural deviations were examined by RMSD values.
Figure 3.20: Structural analysis of mutant models of human α synuclein.
Human specific mutations involved in FPD are red color coded. NMR structure of α
synuclein was obtained from PDB (1XQ8). Overall quality factor is expressed as
percentage of the protein for which the calculated error value falls below the 95%
rejection limit, calculated by Errat.
3.12.4 Divergent selective constraint among synuclein genes
Sequence and structure variations, followed by functional changes among synuclein
paralogs might have contributed in the evolution of physiological pathways and
lineage specific phenotypic traits (Kaessmann, 2010).
Two patterns of amino acid sequence variation are considered evidence of protein
functional divergence (Gu & Vander Velden, 2002). First, type 1 functional
divergence refers to an evolutionary process in which functional constraints are
altered among duplicated genes (Gu, 2001). Second, type 2 functional divergence
refers to an evolutionary process in which functional constraint does not change
among duplicated genes, but radical changes occur among them after gene duplication
(Gu, 2001). Several statistical methods have been developed to detect differences in
site-specific selective constraint among duplicated genes (Bielawski & Yang, 2004; Gu &
Vander Velden, 2002). The extent of functional divergence among synucleins is
illustrated by clade model D, which assumes two or three site classes (k = 2 or k = 3)
and allows a proportion of sites to evolve under divergent selection pressures in two
or more clades (Table 3.32 and Table 3.33) (Bielawski & Yang, 2004; Z. Yang, 2007).
These sites may take any ω value, so that selection pressure can differ between
clades. Clade model D was compared to its null model, the discrete model (M3 with
k = 2 and k = 3). The discrete model and clade model D (each with k = 2 and k = 3)
indicate significant divergent selective pressure and heterogeneity among sites
(Table 3.32 and Table 3.33). Clade model D with three site classes (k = 3) proposed
9%, 15% and 8% of sites evolving under divergent selective pressure, with strong
purifying selection in clades α and β (ωα = 0.187, ωβ = 0.177) and with positive
selection in clade γ (ωγ = 1.90), respectively (Table 3.32 and Table 3.33). Similarly,
clade model D with two site classes (k = 2) suggests divergent selective pressure with
ω = 0.206 for the α clade, ω = 0.16 for the β clade, and ω = 0.746 for the γ clade
(Table 3.32). Furthermore, eight sites were found to have evolved under divergent
selective pressure in synuclein paralogs (42S, 72T, 102K, 112I, 113L, 125Y, 128P, and
130E; human α synuclein used as the reference for position number and residue) with a
posterior probability >= 95%. These sites are distributed non-randomly within the
synuclein domains. Of the eight sites, six are located in the carboxyl terminal
domain, which is responsible for chaperone-like activity.
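The domain distribution of the eight sites can be checked with a short sketch. The domain boundaries used here (amino terminal 1-60, NAC 61-95, carboxyl terminal 96-140) are the commonly cited ones for human α synuclein and are an assumption of this example, not values taken from the analysis above. The function and variable names are illustrative only.

```python
# Sites reported above: position followed by the residue in human alpha synuclein.
SITES = ["42S", "72T", "102K", "112I", "113L", "125Y", "128P", "130E"]

# Assumed domain boundaries (inclusive); these are not taken from the thesis text.
DOMAINS = {
    "amino terminal": (1, 60),
    "NAC": (61, 95),
    "carboxyl terminal": (96, 140),
}


def domain_of(site):
    """Return the domain name for a site label such as '102K'."""
    position = int(site[:-1])  # drop the trailing one-letter residue code
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} lies outside the assumed domains")


def count_by_domain(sites):
    """Count how many of the given sites fall in each domain."""
    counts = {name: 0 for name in DOMAINS}
    for site in sites:
        counts[domain_of(site)] += 1
    return counts


counts = count_by_domain(SITES)
print(counts)
```

Under these assumed boundaries the tally agrees with the statement that six of the eight sites lie in the carboxyl terminal domain; a different choice of boundaries could shift sites near a domain edge.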
Type 1 functional divergence (site-specific evolutionary rate shift) was also observed
among α, β, and γ synuclein using the DIVERGE package (Table 3.34) (Gu & Vander
Velden, 2002). The phylogenetic tree, together with the clade models and DIVERGE,
suggests that γ synuclein is the most ancient and most variable paralog in the
synuclein family.
Table 3.32: Parameter estimates and likelihood scores for the synuclein family to detect functional
divergence.
Model | Parameter estimates (ω) | Log likelihood value
M0: One ratio | ω = 0.079 | -3228.9759
Site-specific models
M3: Discrete (k = 2) | ω0 = 0.027, f0 = 0.75, ω1 = 0.309, f1 = 0.246 | -3151.7289
M3: Discrete (k = 3) | ω0 = 0.0226, f0 = 0.69, ω1 = 0.168, f1 = 0.192, ω2 = 0.472, f2 = 0.113 | -3148.9903
Branch-site models
Model D (k = 2)
sncα | ω0 = 0.028, f0 = 0.756, ω1sncα = 0.206, ω1sncβγ = 0.370, f1 = 0.243 | -3149.7281
sncβ | ω0 = 0.029, f0 = 0.763, ω1sncβ = 0.160, ω1sncαγ = 0.417, f1 = 0.237 | -3146.6509
sncγ | ω0 = 0.032, f0 = 0.778, ω1sncγ = 0.746, ω1sncαβ = 0.191, f1 = 0.222 | -3140.1041
Model D (k = 3)
sncα | ω0 = 0.0235, f0 = 0.70, ω1 = 0.1859, f1 = 0.21, ω2sncα = 0.187, ω2sncβγ = 0.71, f2 = 0.094 | -3144.1261
sncβ | ω0 = 0.0225, f0 = 0.67, ω1 = 0.136, f1 = 0.17, ω2sncβ = 0.177, ω2sncαγ = 0.542, f2 = 0.158 | -3144.0289
sncγ | ω0 = 0.0251, f0 = 0.716, ω1 = 0.226, f1 = 0.20, ω2sncγ = 1.90, ω2sncαβ = 0.129, f2 = 0.081 | -3130.30517
f: proportion of sites; k: number of site categories; ω values in bold show positive selection.
Table 3.33: Statistical significance of functional divergence among the synuclein family.
Test | 2δ | df | p value
LRT for sites models
M0 / M3 (k = 2) | 154.5 | 2 | 0
M0 / M3 (k = 3) | 159.9 | 4 | 0
M3 (k = 2) / M3 (k = 3) | 5.48 | 2 | 0.06
Model D (k = 2) / Model D (k = 3)
sncα | 11.204 | 2 | 0.003
sncβ | 5.24 | 2 | 0.07
sncγ | 19.59 | 2 | <0.0001
LRT for sites and branch models
M3 (k = 2) / Model D (k = 2)
sncα | 4.0016 | 2 | 0.1
sncβ | 10.156 | 2 | 0.006
sncγ | 23.25 | 2 | <0.0001
M3 (k = 3) / Model D (k = 3)
sncα | 9.73 | 2 | 0.007
sncβ | 9.92 | 2 | 0.007
sncγ | 37 | 2 | <0.00001
M3 (k = 2) / Model D (k = 3)
sncα | 15.21 | 4 | 0.004
sncβ | 15.4 | 4 | 0.0039
sncγ | 42.85 | 4 | <0.000001
df: degrees of freedom; δ: difference between the log likelihood values of the two models being
compared; LRT: likelihood ratio test; p value ≤ 0.05 shows significant divergence.
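The 2δ values in Table 3.33 follow from the log likelihoods in Table 3.32 as twice the difference between the alternative and the null model. The sketch below reproduces that arithmetic for the M3 (k = 2) versus Model D (k = 2) comparisons. It is a minimal illustration rather than the procedure used in the analysis: the p value is computed under the assumption that the statistic is compared with a chi-square distribution, using the closed form that holds for even degrees of freedom, and all function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log likelihood difference between alternative and null model."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x, df):
    """Upper tail probability of the chi-square distribution for even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Log likelihood values copied from Table 3.32.
LNL_M3_K2 = -3151.7289
LNL_D_K2 = {"sncα": -3149.7281, "sncβ": -3146.6509, "sncγ": -3140.1041}

for clade, lnl in LNL_D_K2.items():
    stat = lrt_statistic(LNL_M3_K2, lnl)
    print(clade, round(stat, 4), chi2_sf_even_df(stat, df=2))
```

The same two functions apply to the other rows of Table 3.33 by swapping in the corresponding pair of log likelihoods and the degrees of freedom listed in the table.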
Table 3.34: Type 1 functional divergence of synuclein family.
Comparison | θ ± SE | Z score | LRT | P value (Z score)
SNCα/SNCβ | 0.807998 ± 0.27 | 3.19 | 5.27127 | 0.001
SNCα/SNCγ | 0.6622 ± 0.258 | 2.76 | 2.934 | 0.005
SNCβ/SNCγ | 0.994 ± 0.22 | 5.1067 | 15.325 | <0.00001
SNCαβ/SNCγ | 0.66019 ± 0.23 | 3.13 | 7.6 | 0.001
SE: standard error, θ: coefficient of functional divergence, LRT: likelihood ratio test.
Chapter 4 Discussion
Discussion
Although the last 60-70 million years of evolution have transformed most parts of the
brain in both size and complexity, the hominin neocortex has enlarged significantly in
the short period since the divergence of the human lineage from the Pan lineage
approximately 6-7 million years ago (McHenry, 1994). The overall expansion in neocortex
size occurred before anatomically modern humans split from the archaic hominins, the
Neandertals and Denisovans, approximately 550,000-750,000 years ago, as both modern
humans and Neandertals have comparably large brains (Florio, Borrell, & Huttner, 2017;
Prüfer, et al., 2014). However, evident differences in relative lobe size exist between
anatomically modern humans and Neandertals; most prominently, the parieto-temporal lobe
of the neocortex is enlarged and the orbitofrontal cortex is wider in modern humans
than in Neandertals (Bastir, et al., 2011; Florio, et al., 2017). This indicates that
certain neocortical regions have evolved after the split between Neandertals and
anatomically modern humans. The size of the neocortex is predominantly
determined by the magnitude of neurogenesis and cytokinesis during fetal
development. During neurogenesis, cortical neurons originate from progenitor cells in
the ventricular zone of the developing brain. The progenitor cells undergo successive
cycles of proliferative division before entering neurogenic division and forming the
subventricular zone (Bystron, Blakemore, & Rakic, 2008; Pasko Rakic, 1988, 1995;
Stancik, Navarro-Quiroga, Sellke, & Haydar, 2010). The massive expansion in neocortex
size during evolution has been explained most prominently by the radial unit hypothesis
of cortical development. The radial unit hypothesis proposes that the rapid increase in
neocortical surface area during evolution is owing to a prolonged period of
proliferative (symmetric) division, which yields an increased number of radial columnar
units that ultimately generate neurons and consequently expand the neocortical
surface area (P Rakic, 2000). An alternative hypothesis, the intermediate progenitor model,
proposed that the expansion and folding of the neocortical surface area during
evolution were due to an increase in the size of the basal progenitor (BP) pool (BPs
originate from apical radial glia, the main neural progenitor cells in the ventricular
zone) and their subsequent expansion in the subventricular zone, as compared to radial
units in the ventricular zone (Kriegstein, Noctor, & Martínez-Cerdeño, 2006). Recently,
Nonaka‐Kinoshita et al. suggested another hypothesis for cortical expansion and
folding, proposing that an increased abundance of basal radial glia (bRG, which exhibit
stem cell properties) and expansion of the outer subventricular zone are primarily
responsible for neocortical surface area expansion and gyrencephaly (Nonaka‐Kinoshita
et al., 2013). During neurogenesis, BPs are better suited than apical radial glia for
quantitative expansion of neuron production because they are not under the constraint
imposed on apical radial glia proliferation by the limited ventricular space. Thus, an
increase in BP generation and their proliferation in the subventricular zone (inner and
outer) are key determinants of the evolutionary expansion of neocortex size. Although
the timing of brain development is conserved across mammals, species-specific
differences in the duration of cortical neurogenesis (6 days in mice, 60 days in
macaque, and 100 days in humans) most likely contributed to the differences in
neocortex size and complexity throughout the lineage from the primate ancestor to
modern humans (Finlay & Darlington, 1995; Geschwind & Rakic, 2013). The differences in
BP pool size and the alterations in the timing of neocorticogenesis among species are
likely to have genetic underpinnings based on lineage-specific genomic changes, which
need to be deciphered by comparative genomic analysis.
The availability of whole genome sequences of many extinct and extant species, along
with advances in bioinformatics, molecular biology and comparative genomics
approaches, has ushered in a new era in the study of human brain evolution (T. M.
Preuss, 2012). Despite the rapid growth in our understanding of the evolution of
the human genome, our knowledge of the relationship between genetic changes and
phenotypic changes, particularly the expansion of brain size, remains limited (O'Bleness,
Searles, Varki, Gagneux, & Sikela, 2012; T. M. Preuss, 2012). Three possible genetic
mechanisms have been proposed to explain brain/neocortex size differences between
humans and nonhuman primates. The first hypothesis focuses on human-specific gene loss
and duplication to explain the enlargement of human brain size during the
Pliocene-Pleistocene epochs. Second, human-specific changes in the regulatory regions
of genes have been proposed to be responsible for alterations in gene expression and
ultimately brain size. Third, human-specific accelerated sequence evolution in protein
coding genes involved in nervous system development is likely to have contributed to
the rapid expansion of human brain size during the last 5 million years.
The ARHGAP11B gene encodes a 267 amino acid long Rho GTPase-activating protein and
arose by partial duplication of the ARHGAP11A gene after the divergence of the human
lineage from the Pan lineage but before the split between modern humans and archaic
hominins (Neandertals and Denisovans), within the time window of 5 million to 750,000
years ago (Florio et al., 2015). ARHGAP11A is found throughout the metazoans, while its
truncated paralog is present only in hominins and lost RhoGAP activity after the
duplication and before the split between modern and archaic humans (Florio, et al.,
2015). ARHGAP11B promoted BP generation from apical radial glia and their proliferation
in the subventricular zone and also caused folding of the mouse neocortex, while
ARHGAP11A had no effect on BPs (Florio, et al., 2015; Florio, Namba, Pääbo, Hiller,
& Huttner, 2016). Therefore, the hominin-specific gene ARHGAP11B has been implicated
in increased neural progenitor proliferation and the evolutionary expansion of
neocortex size in both modern humans and Neandertals (Florio, et al., 2015).
Figure 4.1: Human neocortical cell types.
Schematic depiction of the main neural progenitor cells involved in neuron production in the fetal human neocortex at mid-neurogenesis. Adapted from Florio, et al. (2016).
Another human-specific duplicated gene, SRGAP2 (SLIT-ROBO Rho GTPase activating
protein 2), has also been implicated in cortical development. SRGAP2 duplicated twice
in humans after their divergence from chimpanzee, producing SRGAP2B by partial
duplication, and SRGAP2C and SRGAP2D by subsequent duplication from SRGAP2B (Dennis et
al., 2012). These human-specific duplications occurred between 3.5 and 1 million years
ago. The human-specific SRGAP2C gene induced human-specific features of neuronal
development in the mouse brain, including an increased density of longer spines and
neoteny during spine maturation (Charrier et al., 2012). Taken together, these data
suggest that the SRGAP2C and ARHGAP11B genes, which arose by human-specific
duplications, contributed to human brain development and the evolutionary expansion of
human neocortex size, ultimately yielding phenotypic differences between humans and
nonhuman primates. Furthermore, human-specific substitutions in the noncoding human
accelerated region 5 (HARE5) have been experimentally shown to increase the number of
neural progenitor cells, thereby enhancing neocorticogenesis and producing a marked
difference in the size of the mouse brain (Boyd et al., 2015). This noncoding
regulatory region serves as an enhancer for FZD8 and has accumulated sixteen
human-specific substitutions since the divergence from the lineage leading to
chimpanzee and bonobo approximately 6-7 million years ago. FZD8 was more abundant in
the developing human neocortex than in macaque, suggesting that the human-specific
substitutions are likely to have enhanced the expression of FZD8 in cortical areas of
the neonatal human brain (Boyd, et al., 2015).
Human-specific evolutionary changes in protein coding genes might contribute to
phenotypic differences between humans and nonhuman primates. A study of nervous system
development genes and housekeeping genes revealed that protein coding genes involved in
nervous system development underwent accelerated evolution along the lineage from the
primate ancestor to humans (Dorus et al., 2004). Furthermore, the ADCYAP1
(adenylate-cyclase-activating polypeptide 1) gene is highly conserved in primates and
is involved in neural precursor amplification and in regulating the transition from the
proliferative to the differentiated state during neurogenesis (Y. Wang et al., 2005).
ADCYAP1 has been shown to exhibit a signature of positive selection in the human
lineage after the divergence from our closest extant relative, the chimpanzee, which is
likely to have contributed to evolutionary changes in neocorticogenesis and might be
responsible for the expansion of neocortex size in humans (Y. Wang, et al., 2005).
Positive selection is inferred when nonsynonymous substitutions accumulate faster than
synonymous substitutions in a protein coding gene.
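The criterion stated above is usually expressed through the ratio ω of the nonsynonymous to the synonymous substitution rate. The sketch below is a minimal illustration of how an ω estimate is read; the function names are hypothetical, and in practice a value above 1 is only taken as evidence of positive selection when it is supported by a statistical test such as the likelihood ratio tests reported in Table 3.33.

```python
def omega(dn, ds):
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w):
    """Interpret an omega estimate by comparing it with the neutral value of 1."""
    if w > 1:
        return "positive selection"
    if w < 1:
        return "purifying selection"
    return "neutral evolution"


# Divergent-class omega estimates taken from Table 3.32 (Model D, k = 3).
for clade, w in (("sncα", 0.187), ("sncβ", 0.177), ("sncγ", 1.90)):
    print(clade, w, selection_regime(w))
```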
Primary microcephaly protein coding genes are considered a key group of candidate
genes for understanding the evolution of brain size because mutations in the coding
sequences of these genes cause a severe reduction in brain size, particularly of the
cerebral cortex. This appears to be an atavistic process because the brain size of
primary microcephaly patients is similar to that of nonhuman apes and early hominids.
All primary microcephaly genes are expressed in neuroprogenitor or neuroepithelial
cells during early brain development and perform multiple, seemingly unrelated
functions, including DNA damage repair, centriole biogenesis, spindle organization,
neuronal differentiation and migration, chromosomal alignment and segregation,
transport of DHA across the blood brain barrier, cytokinesis, regulation of gene
transcription and control of progenitor amplification (Faheem et al., 2015). Turning to
evolutionary patterns, previous studies highlight that genes controlling the duration
and mode of cell division were targeted by positive selection during evolution (Bond,
et al., 2002; Evans, Vallender, & Lahn, 2006; S. Montgomery & Mundy, 2012; Y.-q.
Wang et al., 2005). Initial evolutionary studies revealed that four MCPH genes
(MCPH1, CDK5RAP2, ASPM and CENPJ) seem to have evolved adaptively in the human
lineage since the divergence from chimpanzee (Bond, et al., 2005). Later, as more
eutherian species were incorporated into the evolutionary analysis, the signature of
episodic positive selection in six MCPH genes (ASPM, CDK5RAP2, MCPH1, CENPJ, CEP152,
and WDR62) was found to extend beyond humans throughout the eutherian mammals, as
pervasive positive selection (S. H. Montgomery & Mundy, 2014). In contrast, the
results of the current study are not consistent with the above-mentioned study of
Montgomery and Mundy, as none of the analyzed genes shows such a pattern of pervasive
positive selection across the eutherian mammals. An adaptive phenotype need not result
only from positive selection acting on protein coding genes; changes in site-specific
divergent selective pressure between the clades of protein coding genes can also
contribute to adaptive phenotypic diversity. Furthermore, the signatures of divergent
selection constraints between simian and nonsimian mammals are significant for only
two loci, STIL and SASS6. There is ample evidence to suggest that the majority of the
MCPH loci have maintained their conserved functions throughout the placental
mammals. Additionally, significant signatures of episodic selection were not found for
the analyzed MCPH loci on any of the ancestral branches analyzed from the primate to
the hominini branch, except for KIF14 on the homininae ancestral branch. STIL and
WDR62 exhibit a pattern of adaptive evolution in humans, but this pattern is
significant only by the frequency based method in human populations and not by the
codon substitution based method. However, the protein alignment showed that, among all
analyzed MCPH genes, WDR62 and STIL have accumulated a greater number of human-specific
amino acid replacements after the divergence from chimpanzee (Table 3.1, 3.4
and Table 4.1).
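Table 4.1: Chimpanzee, hominin and human specific amino acid replacements in MCPH genes
since the divergence from hominini ancestor.
Gene | Residue number | Hominini ancestor | Chimpanzee | Denisovans | Neandertals | Human
CEP135 | 581 | I | V | I | I | I
CEP135 | 691 | R | K | R | R | R
CEP135 | 844 | A | S | A | A | A
CEP135 | 936 | I | L | I | I | I
ZNF335 | 83 | G | G | S | S | S
ZNF335 | 294 | T | T | S | S | S
ZNF335 | 359 | R | R | P | R | R
ZNF335 | 384 | P | P | R | P | P
ZNF335 | 403 | M | L | M | M | M
ZNF335 | 770 | P | P | S | S | S
ZNF335 | 856 | A | V | A | A | A
ZNF335 | 1317 | E | E | D | D | D
PHC1 | 103 | I | M | I | I | I
PHC1 | 518 | T | T | A | A | A
SASS6 | 443 | V | A | V | V | V
MFSD2A | 276 | A | A | S | S | S
MFSD2A | 290 | S | R | S | S | S
MFSD2A | 415 | Q | L | Q | Q | Q
CIT | 13 | D | D | E | D | D
CIT | 78 | R | R | W | W | R
CIT | 229 | I | I | V | V | V
CIT | 331 | T | S | T | T | T
CIT | 332 | S | G | S | S | S
CIT | 338 | I | V | I | I | I
KIF14 | 73 | K | K | R | R | R
KIF14 | 204 | S | N | S | S | S
KIF14 | 208 | E | Q | E | E | E
KIF14 | 289 | P | P | R | R | R
KIF14 | 321 | F | L | F | F | F
KIF14 | 330 | A | A | T | A | A
KIF14 | 339 | E | Q | E | E | E
KIF14 | 395 | M | M | T | T | T
KIF14 | 605 | A | A | S | A | A
KIF14 | 637 | I | V | I | I | I
KIF14 | 733 | N | N | S | S | S
KIF14 | 1081 | M | M | V | V | V
KIF14 | 1165 | V | V | A | A | A
KIF14 | 1315 | E | E | E | G | E
KIF14 | 1361 | S | S | L | L | L
KIF14 | 1363 | I | T | I | I | I
KIF14 | 1391 | L | Q | L | L | L
KIF14 | 1403 | N | N | N | H | N
KIF14 | 1408 | S | G | S | S | S
KIF14 | 1543 | S | N | S | S | S
KIF14 | 1624 | R | R | R | H | R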
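The rows of Table 4.1 can be sorted into lineage categories by comparing each residue with the inferred hominini ancestor. The sketch below illustrates that comparison on four rows copied from the table. The category labels and the function name are illustrative assumptions and do not come from the analysis itself.

```python
def classify(ancestor, chimp, denisovan, neandertal, human):
    """Name the lineage(s) whose residue differs from the hominini ancestor."""
    changed = {
        name
        for name, residue in (
            ("chimpanzee", chimp),
            ("Denisovan", denisovan),
            ("Neandertal", neandertal),
            ("human", human),
        )
        if residue != ancestor
    }
    if changed == {"chimpanzee"}:
        return "chimpanzee specific"
    if changed == {"human"}:
        return "human specific"
    if changed == {"Denisovan", "Neandertal", "human"}:
        return "shared by hominins"
    if changed and changed <= {"Denisovan", "Neandertal"}:
        return "archaic specific"
    return "other"


# (gene, residue number, ancestor, chimpanzee, Denisovan, Neandertal, human)
ROWS = [
    ("CEP135", 581, "I", "V", "I", "I", "I"),
    ("ZNF335", 83, "G", "G", "S", "S", "S"),
    ("CIT", 78, "R", "R", "W", "W", "R"),
    ("KIF14", 1315, "E", "E", "E", "G", "E"),
]

for gene, position, *residues in ROWS:
    print(gene, position, classify(*residues))
```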
Most of these changes are shared with the archaic humans, Neandertals and Denisovans.
Indeed, three MCPH genes (CDK6, CEP135 and SASS6) have no human-specific
amino acid substitutions since the divergence from the hominini ancestor (Table 4.1).
Among all MCPH genes, CDK6 is the most conserved gene in mammals. The current and
previous evolutionary studies on MCPH genes have shown that the coding sequences of the
majority of those genes that contain human-specific substitutions experienced positive
selection at different time periods during eutherian evolution rather than specifically
in humans (S. H. Montgomery & Mundy, 2014; S. Xu et al., 2017). The majority of primary
microcephaly causing mutations in MCPH genes are truncating mutations that perhaps
lower the expression level of the normal protein in primary microcephaly, suggesting
that the extent of expression of MCPH genes might have been important for the expansion
of brain size. The data indicate that the evolutionary enlargement of the human brain
during the last two million years might not be related to the coding sequences of human
microcephaly genes alone. Rather, transcriptional and posttranslational changes in
combination with human-specific changes in MCPH genes might have been
responsible for the evolutionary expansion of human brain size after the divergence
from Australopithecus, as coding and noncoding regulatory changes modify the
functional impact of each other (Dimas et al., 2008). Therefore, the complex
conditional effects of human-specific coding and noncoding changes in MCPH loci
may have had paramount consequences for human brain size expansion during the
Pliocene-Pleistocene age.
The evolutionary expansion of human brain size is accompanied by a functional
trade-off between higher cognitive capabilities and susceptibility to
neurodegenerative disorders. Parkinson's disease is the second most common
neurodegenerative disorder after Alzheimer's disease. Both disorders are
considered specific to humans, as no other animal species is naturally affected by
either Parkinson's or Alzheimer's disease. Neurodegenerative disorders appear to
affect those regions of the brain that evolved during the recent history of human
evolution. Human neurodegenerative genes are evolutionarily conserved and are under
stronger selective constraint than non-neurodegenerative genes (Panda, Begum, &
Ghosh, 2012). Alpha synuclein is a highly penetrant gene in early onset hereditary
Parkinson's disease, while the paralogs of human α synuclein (β and γ) are not
associated with Parkinson's disease. All synuclein members are under strong purifying
selection in sarcopterygians. Gamma synuclein is the most ancient and most functionally
diverged gene among the three synucleins. Expression data provide evidence in favor of
functional divergence among synuclein paralogs, as α synuclein is abundantly present
in catecholaminergic regions, while β synuclein expression is weak or absent in these
regions and abundant in somatic cholinergic regions (J.-Y. Li, Jensen, & Dahlström,
2002). The most ancient paralog of this family, γ synuclein, appears to be
localized in both catecholaminergic and somatic cholinergic regions but also has a
differential expression pattern in selected populations of peripheral and central
neurons (J.-Y. Li, et al., 2002; Ninkina et al., 2003). However, α and β synuclein are
able to substitute for each other in the auditory system, indicating that their role in
this system is ancestrally derived (Mooney, 2009). Furthermore, researchers have identified
distinct functions of α and γ synuclein in neuronal synapses in rescuing the phenotype
caused by ablation of the CSPα gene (Chandra, Gallardo, Fernández-Chacón,
Schlüter, & Südhof, 2005; Ninkina et al., 2012). This indicates that protein
diversification between α and γ synuclein (especially in the carboxyl terminal domain)
could have arisen after the first duplication event. Despite the above observations, a
high degree of functional redundancy among paralogs was observed for synaptic structure
and terminal size, as well as age dependent neuronal dysfunction, in αβγ-synuclein
triple knockout mice (Greten-Harrison et al., 2010). These phenotypic changes did not
appear in mice when one or two paralogs were deleted (Greten-Harrison, et al., 2010).
This suggests that, in addition to the ancestral function, each paralog could have
acquired a substantial paralog-specific function that cannot be compensated for by the
other members, which is sufficient to explain why the functions of mutated α synuclein
are not compensated for by the two other paralogs in PD patients. The mechanisms by
which mutated α synuclein induces pathogenesis involve multiple pathways, such as
self-aggregation and its action as a chaperone protein, that are known to affect the
viability of substantia nigra neurons (Figure 4.2a, b and e).
Figure 4.2: Schematic overview of the neurodegenerative (a, b, e) and neuroprotective (c, d) roles of
alpha synuclein.
α synuclein protein plays a dual role in the nervous system. a) Aberrant α synuclein interacts with
14-3-3 proteins and forms aggregates. As a result of this interaction, pro-apoptotic proteins are
translocated into mitochondria, where they inhibit the anti-apoptotic activity of Bcl2 and Bcl-xL and
initiate caspase-9 dependent neuronal death by releasing cytochrome c. b) Binding of α synuclein to
dopamine generates ROS, which increase mitochondrial membrane permeability and activate caspase
dependent neuronal apoptosis by releasing cytochrome c. c) Wild type α synuclein protects dopaminergic
neurons from apoptosis by binding to pro-apoptotic proteins and inhibiting their translocation into
mitochondria. d) Wild type cytoplasmic α synuclein inhibits p300 and NFkB-p65 acetylation by inhibiting
the histone acetyltransferase activity of p300, which interrupts their binding to the promoter region
and ultimately inhibits transcription of PKCδ and other pro-apoptotic proteins. e) Mutant α synuclein
induces ER stress, which promotes neuronal apoptosis either through Ca2+ ions or through pro-apoptotic
proteins. ROS: reactive oxygen species, M/P/O: mutations/post translational
modifications/overexpression, ER: endoplasmic reticulum, P: phosphorylation, Ac: acetylation.
Alpha synuclein is traditionally considered an intrinsically disordered protein; however,
on binding to its target it undergoes a transition to a more ordered alpha-helical
conformation (Dikiy & Eliezer, 2012; Siddiqui, et al., 2016). Evolutionary studies
suggest that α synuclein attained its intrinsically disordered conformation through a
series of transitions in the NAC domain and the carboxyl-terminal acidic domain, which
regulate the structural dynamics of a small region of the amino-terminal lipid binding
domain, residues 32-58 (the critical region), through an epistatic effect (Siddiqui, et
al., 2016). Generally, intrinsically disordered proteins exhibit a specific amino acid
sequence that develops long range interactions within the protein to prevent
aggregation (Kokhan, Van'kin, Bachurin, & Shamakina, 2013). In the case of α synuclein,
the GAV motif present in the NAC domain is considered to be the most aggregation prone
region, but it is partially protected by long range interactions between domains
through the positively and negatively charged residues of the amino and carboxyl
termini (Du, et al., 2003; Lashuel, Overk, Oueslati, & Masliah, 2012; Siddiqui, et al.,
2016; Uverskya & Finka, 2002).
A recent study revealed that disease associated mutations (all present in the critical
region, residues 32-58) alter the dynamics of the lipid binding and NAC domains
(Siddiqui, et al., 2016). Therefore, it can be speculated that these pathogenic
mutations might disturb the long range interactions between synuclein domains and thus
increase the propensity for self-assembly into neurotoxic oligomers. These neurotoxic
oligomers induce endoplasmic reticulum stress, which initiates neuronal cell death in
three distinct ways. First, it releases Ca2+, which increases the permeability of the
mitochondrial membrane and activates the mitochondrial neuronal death pathway by
releasing cytochrome c (Figure 4.2e) (Hald & Lotharius, 2005; W. W. Smith, 2005).
Second, ER stress increases the maturation of the proapoptotic protein Bad by
cleavage; Bad is then translocated into mitochondria and inhibits the antiapoptotic
activity of Bcl-xL and Bcl2, ultimately activating the cytochrome c dependent
mitochondrial apoptotic pathway (Figure 4.2e) (W. W. Smith, 2005). The third pathway is
mitochondria independent and involves activation of caspase-12, which directly
activates caspase-3 to induce neuronal death (Figure 4.2e) (W. W. Smith, 2005).
It has recently been reported that a 54-84 kDa protein complex of α synuclein and the
14-3-3 chaperone protein is present in the substantia nigra of PD patients (Binolfi et
al., 2008). α synuclein interacts with 14-3-3 protein through the critical region. This
suggests that the pathogenic mechanism in PD is mediated by the interaction between α
synuclein and 14-3-3. This complex reduces the anti-apoptotic activity of 14-3-3
protein and promotes neuronal apoptosis by inhibiting the interaction of 14-3-3 with
proapoptotic proteins such as Bad and Bax (Figure 4.2a) (Berg, Holzmann, & Riess,
2003). Mutations and post-translational modifications might increase the interaction
propensity of α synuclein towards 14-3-3 protein, which might explain the presence of
this complex in PD. Additionally, 14-3-3 and α synuclein are also important in dopamine
synthesis (Berg, et al., 2003; Sidhu, Wersinger, MOUSSA, & Vernier, 2004). Normally,
14-3-3 binds to phosphorylated tyrosine hydroxylase and enhances its activity and
subsequent dopamine production. On the other hand, α synuclein reduces the activity of
tyrosine hydroxylase and dopamine production by binding to dephosphorylated tyrosine
hydroxylase (Berg, et al., 2003).
The interaction between α synuclein and dopamine plays a very important role in the
production of cytotoxic species and hence in the pathogenesis of PD, because
auto-oxidation of dopamine is required for this interaction (Bisaglia et al., 2010;
Chan et al., 2012). The amino-terminal critical region of α synuclein forms the
interface for dopamine dependent oligomerisation of α synuclein (Leong et al., 2015).
Dopamine stabilizes the neurotoxic oligomeric form of α synuclein (Cookson et al.,
2009; Lee et al., 2011). Additionally, ROS can cause oxidative stress and alter the
function of proteins, DNA and lipids, resulting in mitochondrial impairment and
ultimately increasing neuronal vulnerability. Mutated α synuclein (especially A53T) has
a greater tendency than the wild type to trigger dopamine dependent neurotoxicity at
low concentration (Pan, Bruening, Giasson, Lee, & Godwin, 2002). As all these mutations
reside in the critical region, which is important for dopamine interaction, they might
disrupt its structural integrity and perhaps increase its propensity towards dopamine,
forming the neurotoxic α synuclein:dopamine quinone adduct (Leong, et al., 2015;
Siddiqui, et al., 2016). This suggests that dopamine induces pathogenicity in two ways:
first, by forming a neurotoxic adduct with α synuclein, and second, by producing
cytotoxic ROS, which promote neuronal death through cytochrome c dependent
caspase activation in the cytosol (Figure 4.2b).
Under physiological conditions, wild type α synuclein is considered to confer an
antiapoptotic and/or neuroprotective phenotype (da Costa, Paitel, Vincent, & Checler,
2002). Physiological concentrations of wild type α synuclein were found to protect
non-differentiated brain dopaminergic cells and cortical and hippocampal neurons
against neurotoxicity induced by oxidative stress, rotenone or
1-methyl-4-phenylpyridinium (MPP+) (da Costa, Ancolio, & Checler, 2000; Sidhu, et al.,
2004). Recently, a neuroprotective role of native wild type α synuclein against MPP+
and rotenone toxicity has been reported, mediated by modulation of the expression of
Protein Kinase C delta (PKCδ) in dopaminergic neurons (Kaul, Anantharam, Kanthasamy, &
Kanthasamy, 2005). Wild type α synuclein diminishes PKCδ transcription to
inhibit apoptosis by down regulating the enzymatic activity of p300 protein, with a
parallel loss of its histone acetyltransferase (HAT) activity, and as a result
inhibits the p300 mediated acetylation of NFkB-p65 (Arumugam
et al., 2014; Jin et al., 2011; Lanzillotta et al., 2015). As a consequence, NFkB and
p300 do not bind to the PKCδ promoter region and the general transcription machinery,
eventually inhibiting PKCδ transcription (Jin, et al., 2011). Therefore, downregulation
of PKCδ by α synuclein confers neuroprotection due to the reduced proteolytic
activation of PKCδ (Figure 4.2d). However, a recent study shows that PD associated
mutations in α synuclein not only alter its physicochemical properties but also modify
its regulatory functions (Segura-Ulate, Yang, Vargas-Medrano, & Perez, 2017).
Furthermore, physiological concentrations of native α synuclein and 14-3-3
antiapoptotic protein together prevent the degeneration of dopaminergic neurons by
inhibiting the translocation of proapoptotic proteins into mitochondria, and hence
block the apoptosis of dopaminergic neuronal cells, while mutations disturb the
interaction of α synuclein with proapoptotic proteins (Figure 4.2c) (Berg, et al.,
2003). Wild type α synuclein drastically inhibits p53 dependent caspase 3 activation
and apoptosis by reducing both p53 expression and transcriptional activity in
non-dopaminergic neurons (da Costa, et al., 2002). These two proteins regulate the
expression of each other in a feedback mechanism, which explains a functional interplay
driving their cellular homeostasis in neurons (Duplan, Giordano, Checler, & Alves da
Costa, 2016). However, mutations in α synuclein disturb the homeostasis of both
proteins and might explain the neuronal death in the PD brain.
Chapter 5 Conclusion and Future Prospects
Conclusion and Future Prospects
This study demonstrates that almost all of the analyzed primary microcephaly genes,
except STIL and SASS6, have maintained their conserved functions throughout the
placental mammals. Collectively, the data indicate that the dramatic evolutionary
expansion of human brain size during the Pliocene-Pleistocene period might not be
attributable to human-specific substitutions in the coding sequences of human
microcephaly genes alone. Rather, transcriptional and posttranslational changes in
combination with human-specific changes in MCPH genes might have been responsible for
the evolutionary expansion of human brain size during the Pliocene-Pleistocene period,
as coding and noncoding regulatory changes modify the functional impact of each other.
In future, cis-regulatory elements of MCPH loci will be identified by a comparative
genomics approach, because their evolutionary patterns might provide the
underpinnings for the dramatic expansion of human brain size during the upper
Pliocene and Pleistocene age. Functional testing in model organisms will then be
performed on those noncoding sequences that have evolved adaptively in the human
lineage, in an effort to elucidate the complex conditional effects of human-specific
coding and noncoding changes in MCPH loci.
The amino terminal lipid binding domain region (amino acids 32-58) of α synuclein is
the most critical region, not only from an evolutionary perspective but also for the
normal cellular function of α synuclein and for Parkinson's disease pathogenesis.
Through this critical region, α synuclein interacts with a variety of proteins that are
involved in apoptosis and transcriptional regulation. Mutations in α synuclein cause
drastic structural shifts in the amino terminal and NAC domains and might alter its
interaction propensity towards its interacting proteins and dopamine, which ultimately
induces pathogenesis. In future, further evolutionary analyses should be conducted on
all identified Parkinson's associated genes to fully understand whether and why
Parkinsonism is specific to humans.
References
Abdullah, U., Farooq, M., Mang, Y., Bakhtiar, S. M., Fatima, A., Hansen, L., et al.
(2017). A novel mutation in CDK5RAP2 gene causes primary microcephaly
with speech impairment and sparse eyebrows in a consanguineous Pakistani
family. European journal of medical genetics, 60(12), 627-630.
Abecasis, G. R., Auton, A., Brooks, L. D., DePristo, M. A., Durbin, R. M., Handsaker,
R. E., et al. (2012). An integrated map of genetic variation from 1,092 human
genomes. Nature, 491(7422), 56-65.
Al-Dosari, M. S., Shaheen, R., Colak, D., & Alkuraya, F. S. (2010). Novel CENPJ
mutation causes Seckel syndrome. Journal of medical genetics, 47(6), 411-414.
Alakbarzade, V., Hameed, A., Quek, D. Q., Chioza, B. A., Baple, E. L., Cazenave-
Gassiot, A., et al. (2015). A partially inactivating mutation in the sodium-
dependent lysophosphatidylcholine transporter MFSD2A causes a non-lethal
microcephaly syndrome. Nature genetics, 47(7), 814.
Alkema, M. J., Bronk, M., Verhoeven, E., Otte, A., van't Veer, L. J., Berns, A., et al.
(1997). Identification of Bmi1-interacting proteins as constituents of a
multimeric mammalian polycomb complex. Genes & development, 11(2), 226-
240.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic
Local Alignment Search Tool. Journal of Molecular Biology, 215(3), 403-410.
Amigo, J., Salas, A., Phillips, C., & Carracedo, A. (2008). SPSmart: adapting
population based SNP genotype databases for fast and comprehensive web
access. BMC bioinformatics, 9, 428.
Anisimova, M., & Kosiol, C. (2008). Investigating protein-coding sequence evolution
with probabilistic codon substitution models. Molecular biology and evolution,
26(2), 255-271.
Aplan, P. D., Lombardi, D. P., & Kirsch, I. R. (1991). Structural characterization of
SIL, a gene frequently disrupted in T-cell acute lymphoblastic leukemia.
Molecular and cellular biology, 11(11), 5462-5469.
Arquint, C., & Nigg, E. A. (2016). The PLK4–STIL–SAS-6 module at the core of
centriole duplication. Biochemical Society Transactions, 44(5), 1253-1263.
Arumugam, T. V., Liang, J., Luan, Y., Lu, B., Zhang, H., Luo, Y.-n., et al. (2014).
Protection of Ischemic Postconditioning against Neuronal Apoptosis Induced
by Transient Focal Ischemia Is Associated with Attenuation of NF-κB/p65
Activation. PloS one, 9(5), e96734.
Awad, S., Al-Dosari, M. S., Al-Yacoub, N., Colak, D., Salih, M. A., Alkuraya, F. S., et
al. (2013). Mutation in PHC1 implicates chromatin remodeling in primary
microcephaly pathogenesis. Human molecular genetics, 22(11), 2200-2213.
Barr, A. R., Kilmartin, J. V., & Gergely, F. (2010). CDK5RAP2 functions in
centrosome to spindle pole attachment and DNA damage response. The
Journal of cell biology, 189(1), 23-39.
Barrera, J. A., Kao, L.-R., Hammer, R. E., Seemann, J., Fuchs, J. L., & Megraw, T. L.
(2010). CDK5RAP2 regulates centriole engagement and cohesion in mice.
Developmental cell, 18(6), 913-926.
Basit, S., Al-Harbi, K. M., Alhijji, S. A., Albalawi, A. M., Alharby, E., Eldardear, A.,
et al. (2016). CIT, a gene involved in neurogenic cytokinesis, is mutated in
human primary microcephaly. Human genetics, 135(10), 1199-1207.
Bastir, M., Rosas, A., Gunz, P., Peña-Melian, A., Manzi, G., Harvati, K., et al. (2011).
Evolution of the base of the brain in highly encephalized human species.
Nature Communications, 2, 588.
Basto, R., Lau, J., Vinogradova, T., Gardiol, A., Woods, C. G., Khodjakov, A., et al.
(2006). Flies without centrioles. Cell, 125(7), 1375-1386.
Bayés, À., Collins, M. O., Reig-Viader, R., Gou, G., Goulding, D., Izquierdo, A., et al.
(2017). Evolution of complexity in the zebrafish synapse proteome.
Nature Communications, 8, 14613.
Ben-Zvi, A., Lacoste, B., Kur, E., Andreone, B. J., Mayshar, Y., Yan, H., et al. (2014).
Mfsd2a is critical for the formation and function of the blood–brain barrier.
Nature, 509(7501), 507.
Berg, D., Holzmann, C., & Riess, O. (2003). 14-3-3 proteins in the nervous system.
Nature Reviews Neuroscience, 4(9), 752-762.
Berger, J. H., Charron, M. J., & Silver, D. L. (2012). Major facilitator superfamily
domain-containing protein 2a (MFSD2A) has roles in body growth, motor
function, and lipid metabolism. PloS one, 7(11), e50629.
Betts, M. J., & Russell, R. B. (2003). Amino acid properties and consequences of
substitutions. Bioinformatics for geneticists, 317, 289-298.
Beukelaers, P., Vandenbosch, R., Caron, N., Nguyen, L., Belachew, S., Moonen, G., et
al. (2011). Cdk6‐Dependent Regulation of G1 Length Controls Adult
Neurogenesis. Stem cells, 29(4), 713-724.
Bielawski, J. P., & Yang, Z. (2004). A maximum likelihood method for detecting
functional divergence at individual codon sites, with application to gene family
evolution. Journal of Molecular Evolution, 59(1), 121-132.
Bilguvar, K., Ozturk, A. K., Louvi, A., Kwan, K. Y., Choi, M., Tatli, B., et al. (2010).
Whole-exome sequencing identifies recessive WDR62 mutations in severe
brain malformations. Nature, 467(7312), 207-210.
Binolfi, A., Lamberto, G. R., Duran, R., Quintanar, L., Bertoncini, C. W., Souza, J. M.,
et al. (2008). Site-specific interactions of Cu (II) with α and β-synuclein:
bridging the molecular gap between metal binding and aggregation. Journal of
the American Chemical Society, 130(35), 11801-11812.
Bisaglia, M., Greggio, E., Maric, D., Miller, D. W., Cookson, M. R., & Bubacco, L.
(2010). α-Synuclein overexpression increases dopamine toxicity in BE(2)-M17
cells. BMC Neuroscience, 11(1), 41.
Bisaglia, M., Mammi, S., & Bubacco, L. (2009). Structural insights on physiological
functions and pathological effects of α-synuclein. The FASEB Journal, 23(2),
329-340.
Blachon, S., Gopalakrishnan, J., Omori, Y., Polyanovsky, A., Church, A., Nicastro, D.,
et al. (2008). Drosophila Asterless the ortholog of vertebrate Cep152 is
essential for centriole duplication. Genetics.
Bogoyevitch, M. A., Yeap, Y. Y. C., Qu, Z. D., Ngoei, K. R., Yip, Y. Y., Zhao, T. T.,
et al. (2012). WD40-repeat protein 62 is a JNK-phosphorylated spindle pole
protein required for spindle maintenance and timely mitotic progression.
Journal of Cell Science, 125(21), 5096-5109.
Bond, J., Roberts, E., Mochida, G. H., Hampshire, D. J., Scott, S., Askham, J. M., et
al. (2002). ASPM is a major determinant of cerebral cortical size. Nature
genetics, 32(2), 316.
Bond, J., Roberts, E., Springell, K., Lizarraga, S., Scott, S., Higgins, J., et al. (2005). A
centrosomal mechanism involving CDK5RAP2 and CENPJ controls brain size.
Nature genetics, 37(4), 353.
Boyd, J. L., Skove, S. L., Rouanet, J. P., Pilaz, L.-J., Bepler, T., Gordân, R., et al.
(2015). Human-chimpanzee differences in a FZD8 enhancer alter cell-cycle
dynamics in the developing neocortex. Current Biology, 25(6), 772-779.
Bruner, E. (2010). Morphological differences in the parietal lobes within the human
genus: a neurofunctional perspective. Current Anthropology, 51(S1), S77-S88.
Brunet, M., Guy, F., Pilbeam, D., Mackaye, H. T., Likius, A., Ahounta, D., et al.
(2002). A new hominid from the Upper Miocene of Chad, Central Africa.
Nature, 418(6894), 145.
Bryant, K. L., & Preuss, T. M. (2018). A Comparative Perspective on the Human
Temporal Lobe. In E. Bruner, N. Ogihara & H. C. Tanabe (Eds.), Digital
Endocasts: From Skulls to Brains (pp. 239-258). Tokyo: Springer Japan.
Buslje, C. M., Santos, J., Delfino, J. M., & Nielsen, M. (2009). Correction for
phylogeny, small number of observations and data redundancy improves the
identification of coevolving amino acid pairs using mutual information.
Bioinformatics, 25(9), 1125-1131.
Buslje, C. M., Teppa, E., Di Doménico, T., Delfino, J. M., & Nielsen, M. (2010).
Networks of high mutual information define the structural proximity of
catalytic sites: implications for catalytic residue identification. PLoS
computational biology, 6(11), e1000978.
Bystron, I., Blakemore, C., & Rakic, P. (2008). Development of the human cerebral
cortex: Boulder Committee revisited. Nature Reviews Neuroscience, 9(2), 110.
Campaner, S., Kaldis, P., Izraeli, S., & Kirsch, I. R. (2005). Sil phosphorylation in a
Pin1 binding domain affects the duration of the spindle checkpoint. Molecular
and cellular biology, 25(15), 6660-6672.
Campion, D., Martin, C., Heilig, R., Charbonnier, F., Moreau, V., Flaman, J. M., et al.
(1995). The NACP/synuclein gene: chromosomal assignment and screening for
alterations in Alzheimer disease. Genomics, 26(2), 254-257.
Carroll, S. B. (2003). Genetics and the making of Homo sapiens. Nature, 422(6934),
849-857.
Castiel, A., Danieli, M. M., David, A., Moshkovitz, S., Aplan, P. D., Kirsch, I. R., et
al. (2011). The Stil protein regulates centrosome integrity and mitosis through
suppression of Chfr. Journal of Cell Science, jcs. 079731.
Chan, T., Chow, A. M., Cheng, X. R., Tang, D. W. F., Brown, I. R., & Kerman, K.
(2012). Oxidative Stress Effect of Dopamine on α-Synuclein: Electroanalysis
of Solvent Interactions. ACS Chemical Neuroscience, 3(7), 569-574.
Chandra, S., Gallardo, G., Fernández-Chacón, R., Schlüter, O. M., & Südhof, T. C.
(2005). α-Synuclein cooperates with CSPα in preventing neurodegeneration.
Cell, 123(3), 383-396.
Charrier, C., Joshi, K., Coutinho-Budd, J., Kim, J.-E., Lambert, N., De Marchena, J., et
al. (2012). Inhibition of SRGAP2 function by its human-specific paralogs
induces neoteny during spine maturation. Cell, 149(4), 923-935.
Chen, C.-Y., Olayioye, M. A., Lindeman, G. J., & Tang, T. K. (2006). CPAP interacts
with 14-3-3 in a cell cycle-dependent manner. Biochemical and biophysical
research communications, 342(4), 1203-1210.
Cho, J.-H., Chang, C.-J., Chen, C.-Y., & Tang, T. K. (2006). Depletion of CPAP by
RNAi disrupts centrosome integrity and induces multipolar spindles.
Biochemical and biophysical research communications, 339(3), 742-747.
Cmarko, D., Verschure, P. J., Otte, A. P., van Driel, R., & Fakan, S. (2003). Polycomb
group gene silencing proteins are concentrated in the perichromatin
compartment of the mammalian nucleus. Journal of Cell Science, 116(2), 335-
343.
Cohen-Katsenelson, K., Wasserman, T., Khateb, S., Whitmarsh, A. J., & Aronheim, A.
(2011). Docking interactions of the JNK scaffold protein WDR62. Biochem J,
439(3), 381-390.
Cookson, M. R., Outeiro, T. F., Klucken, J., Bercury, K., Tetzlaff, J., Putcha, P., et al.
(2009). Dopamine-Induced Conformational Changes in Alpha-Synuclein. PloS
one, 4(9), e6906.
da Costa, C. A., Ancolio, K., & Checler, F. (2000). Wild-type but not Parkinson's
disease-related Ala-53→Thr mutant α-synuclein protects neuronal cells from
apoptotic stimuli. Journal of Biological Chemistry, 275(31), 24065-24069.
da Costa, C. A., Paitel, E., Vincent, B., & Checler, F. (2002). α-Synuclein lowers
p53-dependent apoptotic response of neuronal cells: abolishment by
6-hydroxydopamine and implication for Parkinson's disease. Journal of
Biological Chemistry, 277(52), 50980-50984.
Dennis, M. Y., Nuttle, X., Sudmant, P. H., Antonacci, F., Graves, T. A., Nefedov, M.,
et al. (2012). Evolution of human-specific neural SRGAP2 genes by
incomplete segmental duplication. Cell, 149(4), 912-922.
Desir, J., Cassart, M., David, P., Van Bogaert, P., & Abramowicz, M. (2008). Primary
microcephaly with ASPM mutation shows simplified cortical gyration with
antero‐posterior gradient pre‐and post‐natally. American journal of medical
genetics Part A, 146(11), 1439-1443.
Di Cunto, F., Calautti, E., Hsiao, J., Ong, L., Topley, G., Turco, E., et al. (1998).
Citron rho-interacting kinase, a novel tissue-specific ser/thr kinase
encompassing the Rho-Rac-binding protein Citron. Journal of Biological
Chemistry, 273(45), 29706-29711.
Di Cunto, F., Imarisio, S., Hirsch, E., Broccoli, V., Bulfone, A., Migheli, A., et al.
(2000). Defective neurogenesis in citron kinase knockout mice by altered
cytokinesis and massive apoptosis. Neuron, 28(1), 115-127.
Dikiy, I., & Eliezer, D. (2012). Folding and misfolding of alpha-synuclein on
membranes. Biochimica et Biophysica Acta (BBA)-Biomembranes, 1818(4),
1013-1018.
Dimas, A. S., Stranger, B. E., Beazley, C., Finn, R. D., Ingle, C. E., Forrest, M. S., et
al. (2008). Modifier effects between regulatory and protein-coding variation.
PLoS genetics, 4(10), e1000244.
Donahue, C. J., Glasser, M. F., Preuss, T. M., Rilling, J. K., & Van Essen, D. C.
(2018). Quantitative assessment of prefrontal cortex in humans relative to
nonhuman primates. Proceedings of the National Academy of Sciences,
201721653.
Dorus, S., Vallender, E. J., Evans, P. D., Anderson, J. R., Gilbert, S. L., Mahowald,
M., et al. (2004). Accelerated evolution of nervous system genes in the origin
of Homo sapiens. Cell, 119(7), 1027-1040.
Du, H. N., Tang, L., Luo, X. Y., Li, H. T., Hu, J., Zhou, J. W., et al. (2003). A peptide
motif consisting of glycine, alanine, and valine is required for the fibrillization
and cytotoxicity of human α-synuclein. Biochemistry, 42(29), 8870-8878.
Duplan, E., Giordano, C., Checler, F., & Alves da Costa, C. (2016). Direct α-synuclein
promoter transactivation by the tumor suppressor p53. Molecular
Neurodegeneration, 11(1).
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Research, 32(5), 1792-1797.
Ericson, K. K., Krull, D., Slomiany, P., & Grossel, M. J. (2003). Expression of Cyclin-
Dependent Kinase 6, but not Cyclin-Dependent Kinase 4, Alters Morphology
of Cultured Mouse Astrocytes. Molecular Cancer Research, 1(9), 654-664.
Evans, P. D., Anderson, J. R., Vallender, E. J., Gilbert, S. L., Malcom, C. M., Dorus,
S., et al. (2004). Adaptive evolution of ASPM, a major determinant of cerebral
cortical size in humans. Human molecular genetics, 13(5), 489-494.
Evans, P. D., Gilbert, S. L., Mekel-Bobrov, N., Vallender, E. J., Anderson, J. R., Vaez-
Azizi, L. M., et al. (2005). Microcephalin, a gene regulating brain size,
continues to evolve adaptively in humans. Science, 309(5741), 1717-1720.
Evans, P. D., Vallender, E. J., & Lahn, B. T. (2006). Molecular evolution of the brain
size regulator genes CDK5RAP2 and CENPJ. Gene, 375, 75-79.
Faheem, M., Naseer, M. I., Rasool, M., Chaudhary, A. G., Kumosani, T. A., Ilyas, A.
M., et al. (2015). Molecular genetics of human primary microcephaly: an
overview. BMC medical genomics, 8(1), S4.
Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the
bootstrap. Evolution, 39(4), 783-791.
Fietz, S. A., Lachmann, R., Brandl, H., Kircher, M., Samusik, N., Schröder, R., et al.
(2012). Transcriptomes of germinal zones of human and mouse fetal neocortex
suggest a role of extracellular matrix in progenitor self-renewal. Proceedings of
the National Academy of Sciences, 109(29), 11836-11841.
Filges, I., Nosova, E., Bruder, E., Tercanli, S., Townsend, K., Gibson, W., et al.
(2014). Exome sequencing identifies mutations in KIF14 as a novel cause of an
autosomal recessive lethal fetal ciliopathy phenotype. Clinical genetics, 86(3),
220-228.
Finlay, B. L., & Darlington, R. B. (1995). Linked regularities in the development and
evolution of mammalian brains. Science, 268(5217), 1578-1584.
Fish, J. L., Kosodo, Y., Enard, W., Pääbo, S., & Huttner, W. B. (2006). Aspm
specifically maintains symmetric proliferative divisions of neuroepithelial
cells. Proceedings of the National Academy of Sciences, 103(27), 10438-
10443.
Fletcher, W., & Yang, Z. (2010). The effect of insertions, deletions, and alignment
errors on the branch-site test of positive selection. Molecular biology and
evolution, 27(10), 2257-2267.
Florio, M., Albert, M., Taverna, E., Namba, T., Brandl, H., Lewitus, E., et al. (2015).
Human-specific gene ARHGAP11B promotes basal progenitor amplification
and neocortex expansion. Science, 347(6229), 1465-1470.
Florio, M., Borrell, V., & Huttner, W. B. (2017). Human-specific genomic signatures
of neocortical expansion. Current opinion in neurobiology, 42, 33-44.
Florio, M., Namba, T., Pääbo, S., Hiller, M., & Huttner, W. B. (2016). A single splice
site mutation in human-specific ARHGAP11B causes basal progenitor
amplification. Science advances, 2(12), e1601941.
Fu, Y. X., & Li, W. H. (1993). Statistical tests of neutrality of mutations. Genetics,
133(3), 693-709.
Fujikura, K., Setsu, T., Tanigaki, K., Abe, T., Kiyonari, H., Terashima, T., et al.
(2013). Kif14 mutation causes severe brain malformation and
hypomyelination. PloS one, 8(1), e53490.
Gagneux, P., & Varki, A. (2001). Genetic differences between humans and great apes.
Molecular Phylogenetics and Evolution, 18(1), 2-13.
Gai, M., Bianchi, F. T., Vagnoni, C., Vernì, F., Bonaccorsi, S., Pasquero, S., et al.
(2016). ASPM and CITK regulate spindle orientation by affecting the
dynamics of astral microtubules. EMBO reports, e201541823.
Gao, H.-M., & Hong, J.-S. (2008). Why neurodegenerative diseases are progressive:
uncontrolled inflammation drives disease progression. Trends in immunology,
29(8), 357-365.
Genin, A., Desir, J., Lambert, N., Biervliet, M., Van Der Aa, N., Pierquin, G., et al.
(2012). Kinetochore KMN network gene CASC5 mutated in primary
microcephaly. Human molecular genetics, 21(24), 5306-5317.
George, J. M. (2002). The synucleins. Genome Biol, 3(1), 3002.3001-3002.3006.
Geschwind, D. H., & Rakic, P. (2013). Cortical evolution: judge the brain by its cover.
Neuron, 80(3), 633-647.
Ghika, J. (2008). Paleoneurology: neurodegenerative diseases are age-related diseases
of specific brain regions recently developed by homo sapiens. Medical
hypotheses, 71(5), 788-801.
Goldman, N., & Yang, Z. (1994). A codon-based model of nucleotide substitution for
protein-coding DNA sequences. Molecular biology and evolution, 11(5), 725-
736.
Gonzalez, C., Saunders, R., Casal, J., Molina, I., Carmena, M., Ripoll, P., et al. (1990).
Mutations at the asp locus of Drosophila lead to multiple free centrosomes in
syncytial embryos, but restrict centrosome duplication in larval neuroblasts.
Journal of Cell Science, 96(4), 605-616.
Grantham, R. (1974). Amino acid difference formula to help explain protein evolution.
Science, 185(4154), 862-864.
Greten-Harrison, B., Polydoro, M., Morimoto-Tomita, M., Diao, L., Williams, A. M.,
Nie, E. H., et al. (2010). αβγ-Synuclein triple knockout mice reveal age-
dependent neuronal dysfunction. Proceedings of the National Academy of
Sciences, 107(45), 19573-19578.
Groussin, M., Hobbs, J. K., Szöllősi, G. J., Gribaldo, S., Arcus, V. L., & Gouy, M.
(2014). Toward more accurate ancestral protein genotype–phenotype
reconstructions with the use of species tree-aware gene trees. Molecular
biology and evolution, 32(1), 13-22.
Gruneberg, U., Neef, R., Li, X., Chan, E. H., Chalamalasetty, R. B., Nigg, E. A., et al.
(2006). KIF14 and citron kinase act together to promote efficient cytokinesis.
The Journal of cell biology, 172(3), 363-372.
Gu, X. (2001). Maximum-likelihood approach for gene family evolution under
functional divergence. Molecular biology and evolution, 18(4), 453-464.
Gu, X., & Vander Velden, K. (2002). DIVERGE: phylogeny-based analysis for
functional–structural divergence of a protein family. Bioinformatics, 18(3),
500-501.
Guemez-Gamboa, A., Nguyen, L. N., Yang, H., Zaki, M. S., Kara, M., Ben-Omran, T.,
et al. (2015). Inactivating mutations in MFSD2A, required for omega-3 fatty
acid transport in brain, cause a lethal microcephaly syndrome. Nature genetics,
47(7), 809.
Guernsey, D. L., Jiang, H., Hussin, J., Arnold, M., Bouyakdan, K., Perry, S., et al.
(2010). Mutations in centrosomal protein CEP152 in primary microcephaly
families linked to MCPH4. The American Journal of Human Genetics, 87(1),
40-51.
Gul, A., Hassan, M. J., Hussain, S., Raza, S. I., Chishti, M. S., & Ahmad, W. (2006).
A novel deletion mutation in CENPJ gene in a Pakistani family with autosomal
recessive primary microcephaly. Journal of human genetics, 51(9), 760-764.
Gunz, P., Neubauer, S., Golovanova, L., Doronichev, V., Maureille, B., & Hublin, J.-J.
(2012). A uniquely modern human pattern of endocranial development.
Insights from a new cranial reconstruction of the Neandertal newborn from
Mezmaiskaya. Journal of human evolution, 62(2), 300-313.
Hald, A., & Lotharius, J. (2005). Oxidative stress and inflammation in Parkinson's
disease: is there a causal link? Experimental neurology, 193(2), 279-290.
Hashimoto, M., & Masliah, E. (1999). Alpha synuclein in Lewy Body Disease and
Alzheimer's Disease. Brain pathology, 9(4), 707-720.
Hashimoto, N., Brock, H., Nomura, M., Kyba, M., Hodgson, J., Fujita, Y., et al.
(1998). RAE28, BMI1, and M33 are members of heterogeneous multimeric
mammalian Polycomb group complexes. Biochemical and biophysical
research communications, 245(2), 356-365.
Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., et al. (2002).
The Ensembl genome database project. Nucleic Acids Research, 30(1), 38-41.
Hung, L.-Y., Chen, H.-L., Chang, C.-W., Li, B.-R., & Tang, T. K. (2004).
Identification of a novel microtubule-destabilizing motif in CPAP that binds to
tubulin heterodimers and inhibits microtubule assembly. Molecular biology of
the cell, 15(6), 2697-2706.
Hussain, M. S., Baig, S. M., Neumann, S., Nürnberg, G., Farooq, M., Ahmad, I., et al.
(2012). A truncating mutation of CEP135 causes primary microcephaly and
disturbed centrosomal function. The American Journal of Human Genetics,
90(5), 871-878.
Hussain, M. S., Baig, S. M., Neumann, S., Peche, V. S., Szczepanski, S., Nürnberg, G.,
et al. (2013). CDK6 associates with the centrosome during mitosis and is
mutated in a large Pakistani family with primary microcephaly. Human
molecular genetics, 22(25), 5199-5214.
Huyton, T., Bates, P. A., Zhang, X., Sternberg, M. J., & Freemont, P. S. (2000). The
BRCA1 C-terminal domain: structure and function. Mutation Research/DNA
Repair, 460(3-4), 319-332.
International HapMap 3 Consortium, Altshuler, D. M., Gibbs, R. A., Peltonen, L.,
et al. (2010). Integrating common and rare genetic variation
in diverse human populations. Nature, 467(7311), 52-58.
Isono, K.-i., Fujimura, Y.-i., Shinga, J., Yamaki, M., Jiyang, O., Takihara, Y., et al.
(2005). Mammalian polyhomeotic homologues Phc2 and Phc1 act in synergy
to mediate polycomb repression of Hox genes. Molecular and cellular biology,
25(15), 6694-6706.
Izraeli, S., & Colaizzo-Anas, T. (1997). Expression of the SIL gene is correlated with
growth induction and cellular proliferation. leukemia, 3, 4.
Izraeli, S., Lowe, L. A., Bertness, V. L., Good, D. J., Dorward, D. W., Kirsch, I. R., et
al. (1999). The SIL gene is required for mouse embryonic axial development
and left–right specification. Nature, 399(6737), 691.
Jackson, A. P., Eastwood, H., Bell, S. M., Adu, J., Toomes, C., Carr, I. M., et al.
(2002). Identification of microcephalin, a protein implicated in determining the
size of the human brain. The American Journal of Human Genetics, 71(1), 136-
142.
Jamieson, C. R., Govaerts, C., & Abramowicz, M. J. (1999). Primary autosomal
recessive microcephaly: homozygosity mapping of MCPH4 to chromosome 15.
American journal of human genetics, 65(5), 1465.
Jayaraman, D., Bae, B.-I., & Walsh, C. A. (2018). The Genetics of Primary
Microcephaly. Annual review of genomics and human genetics(0).
Jayaraman, D., Kodani, A., Gonzalez, D. M., Mancias, J. D., Mochida, G. H.,
Vagnoni, C., et al. (2016). Microcephaly proteins Wdr62 and Aspm define a
mother centriole complex regulating centriole biogenesis, apical complex, and
cell fate. Neuron, 92(4), 813-828.
Jin, H., Kanthasamy, A., Ghosh, A., Yang, Y., Anantharam, V., & Kanthasamy, A. G.
(2011). α-Synuclein Negatively Regulates Protein Kinase Cδ Expression to
Suppress Apoptosis in Dopaminergic Neurons by Reducing p300 Histone
Acetyltransferase Activity. Journal of Neuroscience, 31(6), 2035-2051.
Jones, D. T., Taylor, W. R., & Thornton, J. M. (1992). The rapid generation of
mutation data matrices from protein sequences. Bioinformatics, 8(3), 275-282.
Jouan, L., Bencheikh, B. O. A., Daoud, H., Dionne-Laporte, A., Dobrzeniecka, S.,
Spiegelman, D., et al. (2016). Exome sequencing identifies recessive
CDK5RAP2 variants in patients with isolated agenesis of corpus callosum.
European Journal of Human Genetics, 24(4), 607.
Kaessmann, H. (2010). Origins, evolution, and phenotypic impact of new genes.
Genome research, 20(10), 1313-1326.
Kaindl, A. M., Passemard, S., Kumar, P., Kraemer, N., Issa, L., Zwirner, A., et al.
(2010). Many roads lead to primary autosomal recessive microcephaly.
Progress in neurobiology, 90(3), 363-383.
Kakar, N., Ahmad, J., Morris-Rosendahl, D. J., Altmüller, J., Friedrich, K., Barbi, G.,
et al. (2015). STIL mutation causes autosomal recessive microcephalic lobar
holoprosencephaly. Human genetics, 134(1), 45-51.
Kalay, E., Yigit, G., Aslan, Y., Brown, K. E., Pohl, E., Bicknell, L. S., et al. (2011).
CEP152 is a genome maintenance protein disrupted in Seckel syndrome.
Nature genetics, 43(1), 23.
Kaul, S., Anantharam, V., Kanthasamy, A., & Kanthasamy, A. G. (2005). Wild-type
α-synuclein interacts with pro-apoptotic proteins PKCδ and BAD to protect
dopaminergic neuronal cells against MPP+-induced apoptotic cell death.
Molecular Brain Research, 139(1), 137-152.
Khan, M. A., Rupp, V. M., Orpinell, M., Hussain, M. S., Altmüller, J., Steinmetz, M.
O., et al. (2014). A missense mutation in the PISA domain of HsSAS-6 causes
autosomal recessive primary microcephaly in a large consanguineous Pakistani
family. Human molecular genetics, 23(22), 5940-5949.
Kim, D. S., & Hahn, Y. (2011). Identification of novel phosphorylation modification
sites in human proteins that originated after the human–chimpanzee
divergence. Bioinformatics, 27(18), 2494-2501.
Kim, K., Lee, S., Chang, J., & Rhee, K. (2008). A novel function of CEP135 as a
platform protein of C-NAP1 for its centriolar localization. Experimental cell
research, 314(20), 3692-3700.
Kirkham, M., Müller-Reichert, T., Oegema, K., Grill, S., & Hyman, A. A. (2003).
SAS-4 is a C. elegans centriolar protein that controls centrosome size. Cell,
112(4), 575-587.
Kiyomitsu, T., Obuse, C., & Yanagida, M. (2007). Human Blinkin/AF15q14 is
required for chromosome alignment and the mitotic checkpoint through direct
interaction with Bub1 and BubR1. Developmental cell, 13(5), 663-676.
Kochiyama, T., Ogihara, N., Tanabe, H. C., Kondo, O., Amano, H., Hasegawa, K., et
al. (2018). Reconstructing the Neanderthal brain using computational anatomy.
Scientific Reports, 8(1), 6296.
Kokhan, V. S., Van'kin, G. I., Bachurin, S. O., & Shamakina, I. Y. (2013). Differential
involvement of the gamma-synuclein in cognitive abilities on the model of
knockout mice. BMC Neuroscience, 14(1), 1.
Kontopoulos, E., Parvin, J. D., & Feany, M. B. (2006). α-synuclein acts in the nucleus
to inhibit histone acetylation and promote neurotoxicity. Human Molecular
Genetics, 15(20), 3012-3023.
Kouprina, N., Pavlicek, A., Collins, N. K., Nakano, M., Noskov, V. N., Ohzeki, J.-I.,
et al. (2005). The microcephaly ASPM gene is expressed in proliferating
tissues and encodes for a mitotic spindle protein. Human molecular genetics,
14(15), 2155-2165.
Kousar, R., Hassan, M. J., Khan, B., Basit, S., Mahmood, S., Mir, A., et al. (2011).
Mutations in WDR62 gene in Pakistani families with autosomal recessive
primary microcephaly. Bmc Neurology, 11.
Kriegstein, A., Noctor, S., & Martínez-Cerdeño, V. (2006). Patterns of neural stem and
progenitor cell division may underlie evolutionary cortical expansion. Nature
Reviews Neuroscience, 7(11), 883.
Kruger, R., Kuhn, W., Muller, T., Woitalla, D., Graeber, M., Kosel, S., et al. (1998).
Ala30Pro mutation in the gene encoding α-synuclein in Parkinson's disease.
Nature genetics, 18(2), 106-108.
Kumar, A., Girimaji, S. C., Duvvari, M. R., & Blanton, S. H. (2009). Mutations in
STIL, encoding a pericentriolar and centrosomal protein, cause primary
microcephaly. The American Journal of Human Genetics, 84(2), 286-290.
Lanzillotta, A., Porrini, V., Bellucci, A., Benarese, M., Branca, C., Parrella, E., et al.
(2015). NF-κB in Innate Neuroprotection and Age-Related Neurodegenerative
Diseases. Frontiers in Neurology, 6.
Lashuel, H. A., Overk, C. R., Oueslati, A., & Masliah, E. (2012). The many faces of α-
synuclein: from structure and toxicity to therapeutic target. Nature Reviews
Neuroscience, 14(1), 38-48.
Lavedan, C., Leroy, E., Dehejia, A., Buchholtz, S., Dutra, A., Nussbaum, R. L., et al.
(1998). Identification, localization and characterization of the human γ-
synuclein gene. Human genetics, 103(1), 106-112.
Lee, H.-J., Baek, S. M., Ho, D.-H., Suk, J.-E., Cho, E.-D., & Lee, S.-J. (2011).
Dopamine promotes formation and secretion of non-fibrillar alpha-synuclein
oligomers. Experimental and Molecular Medicine, 43(4), 216.
Leidel, S., Delattre, M., Cerutti, L., Baumer, K., & Gönczy, P. (2005). SAS-6 defines a
protein family required for centrosome duplication in C. elegans and in human
cells. Nature cell biology, 7(2), 115.
Leidel, S., & Gönczy, P. (2005). Centrosome duplication and nematodes: recent
insights from an old relationship. Developmental cell, 9(3), 317-325.
Leong, S. L., Hinds, M. G., Connor, A. R., Smith, D. P., Toth, E. I., Pham, C., et al.
(2015). The N-Terminal Residues 43 to 60 Form the Interface for Dopamine
Mediated α-Synuclein Dimerisation. PloS one, 10(2), e0116497.
Lesage, S., Anheim, M., Letournel, F., Bousset, L., Honoré, A., Rozas, N., et al.
(2013). G51D α synuclein mutation causes a novel Parkinsonian–pyramidal
syndrome. Annals of neurology, 73(4), 459-471.
Létard, P., Drunat, S., Vial, Y., Duerinckx, S., Ernault, A., Amram, D., et al. (2018).
Autosomal recessive primary microcephaly due to ASPM mutations: An
update. Human mutation, 39(3), 319-332.
Li, H., Bielas, S. L., Zaki, M. S., Ismail, S., Farfara, D., Um, K., et al. (2016). Biallelic
mutations in citron kinase link mitotic cytokinesis to human primary
microcephaly. The American Journal of Human Genetics, 99(2), 501-510.
Li, J.-Y., Jensen, P. H., & Dahlström, A. (2002). Differential localization of α-, β-and
γ-synucleins in the rat CNS. Neuroscience, 113(2), 463-478.
Li, W. H. (1993). Unbiased Estimation of the Rates of Synonymous and
Nonsynonymous Substitution. Journal of Molecular Evolution, 36(1), 96-99.
Liang, H., Zhou, W., & Landweber, L. F. (2006). SWAKK: a web server for detecting
positive selection in proteins using a sliding window substitution rate analysis.
Nucleic Acids Research, 34(Web Server issue), W382-384.
Librado, P., & Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics, 25(11), 1451-1452.
Lin, Y. C., Chang, C. W., Hsu, W. B., Tang, C. J. C., Lin, Y. N., Chou, E. J., et al.
(2013). Human microcephaly protein CEP135 binds to hSAS‐6 and CPAP, and
is required for centriole assembly. The EMBO journal, 32(8), 1141-1154.
Lizarraga, S. B., Margossian, S. P., Harris, M. H., Campagna, D. R., Han, A.-P.,
Blevins, S., et al. (2010). Cdk5rap2 regulates centrosome function and
chromosome segregation in neuronal progenitors. Development, 137(11), 1907-
1917.
Löytynoja, A. (2014). Phylogeny-aware alignment with PRANK. In Multiple sequence
alignment methods (pp. 155-170). Springer.
Löytynoja, A., & Goldman, N. (2008). Phylogeny-aware gap placement prevents
errors in sequence alignment and evolutionary analysis. Science, 320(5883),
1632-1635.
Lücking, C., & Brice, A. (2000). Alpha-synuclein and Parkinson's disease. Cellular
and Molecular Life Sciences CMLS, 57(13-14), 1894-1908.
Mahony, D., Parry, D. A., & Lees, E. (1998). Active cdk6 complexes are
predominantly nuclear and represent only a minority of the cdk6 in T cells.
Oncogene, 16(5), 603.
McEvoy, B. P., Powell, J. E., Goddard, M. E., & Visscher, P. M. (2011). Human
population dispersal "Out of Africa" estimated from linkage disequilibrium and
allele frequencies of SNPs. Genome research, 21(6), 821-829.
McHenry, H. M. (1994). Tempo and mode in human evolution. Proceedings of the
National Academy of Sciences, 91(15), 6780-6786.
Memon, M. M., Raza, S. I., Basit, S., Kousar, R., Ahmad, W., & Ansar, M. (2013). A
novel WDR62 mutation causes primary microcephaly in a Pakistani family.
Molecular Biology Reports, 40(1), 591-595.
Moawia, A., Shaheen, R., Rasool, S., Waseem, S. S., Ewida, N., Budde, B., et al.
(2017). Mutations of KIF14 cause primary microcephaly by impairing
cytokinesis. Annals of neurology, 82(4), 562-577.
Mochida, G. H., & Walsh, C. A. (2001). Molecular genetics of human microcephaly.
Current Opinion in Neurology, 14(2), 151-156.
Montgomery, S., & Mundy, N. (2012). Positive selection on NIN, a gene involved in
neurogenesis, and primate brain evolution. Genes, Brain and Behavior, 11(8),
903-910.
Montgomery, S. H., & Mundy, N. I. (2014). Microcephaly genes evolved adaptively
throughout the evolution of eutherian mammals. BMC evolutionary biology,
14(1), 120.
Mooney, M. (2009). Role of alpha synuclein in noise induced hearing loss.
Moynihan, L., Jackson, A. P., Roberts, E., Karbani, G., Lewis, I., Corry, P., et al.
(2000). A third novel locus for primary autosomal recessive microcephaly
maps to chromosome 9q34. The American Journal of Human Genetics, 66(2),
724-727.
Murdock, D. R., Clark, G. D., Bainbridge, M. N., Newsham, I., Wu, Y. Q., Muzny, D.
M., et al. (2011). Whole-Exome Sequencing Identifies Compound
Heterozygous Mutations in WDR62 in Siblings With Recurrent
Polymicrogyria. American journal of medical genetics Part A, 155A(9), 2071-
2077.
Nakazawa, Y., Hiraki, M., Kamiya, R., & Hirono, M. (2007). SAS-6 is a cartwheel
protein that establishes the 9-fold symmetry of the centriole. Current Biology,
17(24), 2169-2174.
Neubauer, S., Hublin, J.-J., & Gunz, P. (2018). The evolution of modern human brain
shape. Science advances, 4(1), eaao5961.
Nguyen, L. N., Ma, D., Shui, G., Wong, P., Cazenave-Gassiot, A., Zhang, X., et al.
(2014). Mfsd2a is a transporter for the essential omega-3 fatty acid
docosahexaenoic acid. Nature, 509(7501), 503.
Nicholas, A. K., Khurshid, M., Désir, J., Carvalho, O. P., Cox, J. J., Thornton, G., et
al. (2010). WDR62 is associated with the spindle pole and is mutated in human
microcephaly. Nature genetics, 42(11), 1010.
Nielsen, M. S., Vorum, H., Lindersson, E., & Jensen, P. H. (2001). Ca2+ binding to α-
synuclein regulates ligand binding and oligomerization. Journal of Biological
Chemistry, 276(25), 22680-22684.
Nielsen, R., & Yang, Z. (1998). Likelihood models for detecting positively selected
amino acid sites and applications to the HIV-1 envelope gene. Genetics,
148(3), 929-936.
Ninkina, N., Papachroni, K., Robertson, D. C., Schmidt, O., Delaney, L., O'Neill, F., et
al. (2003). Neurons expressing the highest levels of γ-synuclein are unaffected
by targeted inactivation of the gene. Molecular and cellular biology, 23(22),
8233-8245.
Ninkina, N., Peters, O. M., Connor-Robson, N., Lytkina, O., Sharfeddin, E., &
Buchman, V. L. (2012). Contrasting effects of α-synuclein and γ-synuclein on
the phenotype of cysteine string protein α (CSPα) null mutant mice suggest
distinct function of these proteins in neuronal synapses. Journal of Biological
Chemistry, 287(53), 44471-44477.
Nonaka‐Kinoshita, M., Reillo, I., Artegiani, B., Martínez‐Martínez, M. Á., Nelson, M.,
Borrell, V., et al. (2013). Regulation of cerebral cortex size and folding by
expansion of basal progenitors. The EMBO journal, 32(13), 1817-1828.
O'Bleness, M., Searles, V. B., Varki, A., Gagneux, P., & Sikela, J. M. (2012).
Evolution of genetic and genomic features unique to the human lineage. Nature
Reviews Genetics, 13(12), 853-866.
Ohta, T., Essner, R., Ryu, J.-H., Palazzo, R. E., Uetake, Y., & Kuriyama, R. (2002).
Characterization of Cep135, a novel coiled-coil centrosomal protein involved
in microtubule organization in mammalian cells. Journal of Cell Biology, 156(1), 87-100.
Olson, M. V., & Varki, A. (2003). Sequencing the chimpanzee genome: insights into
human evolution and disease. Nature Reviews Genetics, 4(1), 20-28.
Pan, Z.-Z., Bruening, W., Giasson, B. I., Lee, V. M.-Y., & Godwin, A. K. (2002). γ-
Synuclein promotes cancer cell survival and inhibits stress- and chemotherapy
drug-induced apoptosis by modulating MAPK pathways. Journal of Biological
Chemistry, 277(38), 35050-35060.
Panda, A., Begum, T., & Ghosh, T. C. (2012). Insights into the evolutionary features
of human neurodegenerative diseases. PloS one, 7(10), e48336.
Paramasivam, M., Chang, Y., & LoTurco, J. J. (2007). ASPM and citron kinase co-
localize to the midbody ring during cytokinesis. Cell cycle, 6(13), 1605-1612.
Park, J. S., Lee, M.-K., Kang, S., Jin, Y., Fu, S., Rosales, J. L., et al. (2015). Species-
specific expression of full-length and alternatively spliced variant forms of
CDK5RAP2. PloS one, 10(11), e0142577.
Pattison, L., Crow, Y. J., Deeble, V. J., Jackson, A. P., Jafri, H., Rashid, Y., et al.
(2000). A fifth locus for primary autosomal recessive microcephaly maps to
chromosome 1q31. The American Journal of Human Genetics, 67(6), 1578-
1580.
Pervaiz, N., & Abbasi, A. A. (2016). Molecular evolution of WDR62, a gene that
regulates neocorticogenesis. Meta gene, 9, 1-9.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng,
E. C., et al. (2004). UCSF Chimera—a visualization system for exploratory
research and analysis. Journal of computational chemistry, 25(13), 1605-1612.
Pfaff, K. L., Straub, C. T., Chiang, K., Bear, D. M., Zhou, Y., & Zon, L. I. (2007). The
zebra fish cassiopeia mutant reveals that SIL is required for mitotic spindle
organization. Molecular and cellular biology, 27(16), 5887-5897.
Polymeropoulos, M. H., Lavedan, C., Leroy, E., Ide, S. E., Dehejia, A., Dutra, A., et
al. (1997). Mutation in the α-synuclein gene identified in families with
Parkinson's disease. Science, 276(5321), 2045-2047.
Preuss, T. M. (2011). The human brain: rewired and running hot. Annals of the New
York Academy of Sciences, 1225(1).
Preuss, T. M. (2012). Human brain evolution: from gene discovery to phenotype
discovery. Proceedings of the National Academy of Sciences, 109(Suppl 1), 10709-10716.
Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., et al.
(2014). The complete genome sequence of a Neanderthal from the Altai
Mountains. Nature, 505(7481), 43.
Pruitt, K. D., Tatusova, T., & Maglott, D. R. (2007). NCBI reference sequences
(RefSeq): a curated non-redundant sequence database of genomes, transcripts
and proteins. Nucleic Acids Research, 35, D61-D65.
Pulvers, J. N., Bryk, J., Fish, J. L., Wilsch-Bräuninger, M., Arai, Y., Schreier, D., et al.
(2010). Mutations in mouse Aspm (abnormal spindle-like microcephaly
associated) cause not only microcephaly but also major defects in the germline.
Proceedings of the National Academy of Sciences, 107(38), 16595-16600.
Rakic, P. (1988). Specification of cerebral cortical areas. Science, 241(4862), 170-176.
Rakic, P. (1995). A small step for the cell, a giant leap for mankind: a hypothesis of
neocortical expansion during evolution. Trends in neurosciences, 18(9), 383-
388.
Rakic, P. (2000). Radial unit hypothesis of neocortical expansion. Paper presented at
the Novartis Foundation Symposium.
Reich, D., Green, R. E., Kircher, M., Krause, J., Patterson, N., Durand, E. Y., et al.
(2010). Genetic history of an archaic hominin group from Denisova Cave in
Siberia. Nature, 468(7327), 1053.
Rhoads, A., & Kenguele, H. (2005). Expression of IQ-motif genes in human cells and
ASPM domain structure. Ethnicity and Disease, 15(4), S5.
Richardson, J., Shaaban, A. M., Kamal, M., Alisary, R., Walker, C., Ellis, I. O., et al.
(2011). Microcephalin is a new novel prognostic indicator in breast cancer
associated with BRCA1 inactivation. Breast cancer research and treatment,
127(3), 639-648.
Rightmire, G. P. (2004). Brain size and encephalization in early to mid‐Pleistocene
Homo. American Journal of Physical Anthropology: The Official Publication
of the American Association of Physical Anthropologists, 124(2), 109-123.
Rightmire, G. P. (2013). Homo erectus and Middle Pleistocene hominins: brain size,
skull form, and species recognition. Journal of human evolution, 65(3), 223-
252.
Rodrigues-Martins, A., Bettencourt-Dias, M., Riparbelli, M., Ferreira, C., Ferreira, I.,
Callaini, G., et al. (2007). DSAS-6 organizes a tube-like centriole precursor,
and its absence suggests modularity in centriole assembly. Current Biology,
17(17), 1465-1472.
Rodrigues-Martins, A., Riparbelli, M., Callaini, G., Glover, D. M., & Bettencourt-
Dias, M. (2008). From centriole biogenesis to cellular function: centrioles are
essential for cell division at critical developmental stages. Cell cycle, 7(1), 11-
16.
Russo, A. A., Tong, L., Lee, J.-O., Jeffrey, P. D., & Pavletich, N. P. (1998). Structural
basis for inhibition of the cyclin-dependent kinase Cdk6 by the tumour
suppressor p16INK4a. Nature, 395(6699), 237.
Saadi, A., Borck, G., Boddaert, N., Chekkour, M. C., Imessaoudene, B., Munnich, A.,
et al. (2009). Compound heterozygous ASPM mutations associated with
microcephaly and simplified cortical gyration in a consanguineous Algerian
family. European journal of medical genetics, 52(4), 180-184.
Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth,
G., et al. (2001). A map of human genome sequence variation containing 1.42
million single nucleotide polymorphisms. Nature, 409(6822), 928-933.
Saitou, N., & Nei, M. (1987). The Neighbor-Joining Method - a New Method for
Reconstructing Phylogenetic Trees. Molecular biology and evolution, 4(4),
406-425.
Sarkisian, M. R., Li, W., Di Cunto, F., D'Mello, S. R., & LoTurco, J. J. (2002). Citron-
Kinase, A Protein Essential to Cytokinesis in Neuronal Progenitors, Is Deleted
in the Flathead Mutant Rat. Journal of Neuroscience, 22(8), RC217-RC217.
Sarkisian, M. R., Rattan, S., D'Mello, S. R., & LoTurco, J. J. (1999). Characterization
of seizures in the flathead rat: a new genetic model of epilepsy in early
postnatal development. Epilepsia, 40(4), 394-400.
Sato, R., Takanashi, J.-i., Tsuyusaki, Y., Kato, M., Saitsu, H., Matsumoto, N., et al.
(2016). Association between invisible basal ganglia and ZNF335 mutations: a
case report. Pediatrics, e20160897.
Schiebel, E. (2000). γ-tubulin complexes: binding to the centrosome, regulation and
microtubule nucleation. Current opinion in cell biology, 12(1), 113-118.
Schoenemann, P. T., Sheehan, M. J., & Glotzer, L. D. (2005). Prefrontal white matter
volume is disproportionately larger in humans than in other primates. Nature
neuroscience, 8(2), 242.
Segura-Ulate, I., Yang, B., Vargas-Medrano, J., & Perez, R. G. (2017). FTY720
(Fingolimod) reverses α-synuclein-induced downregulation of brain-derived
neurotrophic factor mRNA in OLN-93 oligodendroglial cells.
Neuropharmacology, 117, 149-157.
Semendeferi, K., & Damasio, H. (2000). The brain and its main anatomical
subdivisions in living hominoids using magnetic resonance imaging. Journal of
human evolution, 38(2), 317-332.
Shaheen, R., Hashem, A., Abdel-Salam, G. M., Al-Fadhli, F., Ewida, N., & Alkuraya,
F. S. (2016). Mutations in CIT, encoding citron rho-interacting serine/threonine
kinase, cause severe primary microcephaly in humans. Human genetics,
135(10), 1191-1197.
Sheik, S., Sundararajan, P., Hussain, A., & Sekar, K. (2002). Ramachandran plot on
the web. Bioinformatics, 18(11), 1548-1549.
Sherry, S. T., Ward, M. H., Kholodov, M., Baker, J., Phan, L., Smigielski, E. M., et al.
(2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids
Research, 29(1), 308-311.
Siddiqui, I. J., Pervaiz, N., & Abbasi, A. A. (2016). The Parkinson Disease gene
SNCA: Evolutionary and structural insights with pathological implication.
Scientific Reports, 6, 24475.
Sidhu, A., Wersinger, C., Moussa, C. E. H., & Vernier, P. (2004). The role of α‐
synuclein in both neuroprotection and neurodegeneration. Annals of the New
York Academy of Sciences, 1035(1), 250-270.
Silke, A. C., Carles, V. G., Encarnacion, M., Sherman, H., Yu, I., Shah, B., et al.
(2013). Alpha-synuclein p.H50Q, a novel pathogenic mutation for Parkinson's
disease. Movement Disorders, 28(6), 811-813.
Simonetti, F. L., Teppa, E., Chernomoretz, A., Nielsen, M., & Marino Buslje, C.
(2013). MISTIC: mutual information server to infer coevolution. Nucleic Acids
Research, 41(W1), W8-W14.
Singleton, A., & Hardy, J. (2016). The evolution of genetics: Alzheimer's and
Parkinson's diseases. Neuron, 90(6), 1154-1163.
Smith, C. M., Finger, J. H., Hayamizu, T. F., McCright, I. J., Eppig, J. T., Kadin, J. A.,
et al. (2006). The mouse gene expression database (GXD): 2007 update.
Nucleic Acids Research, 35(suppl_1), D618-D623.
Smith, W. W. (2005). Endoplasmic reticulum stress and mitochondrial cell death
pathways mediate A53T mutant alpha-synuclein-induced toxicity. Human
Molecular Genetics, 14(24), 3801-3811.
Solano, S. M., Miller, D. W., Augood, S. J., Young, A. B., & Penney, J. B. (2000).
Expression of α‐synuclein, parkin, and ubiquitin carboxy‐terminal hydrolase
L1 mRNA in human brain: genes associated with familial Parkinson's disease.
Annals of neurology, 47(2), 201-210.
Spillantini, M. G., Divane, A., & Goedert, M. (1995). Assignment of human α-
synuclein (SNCA) and β-synuclein (SNCB) genes to chromosomes 4q21 and
5q35. Genomics, 27(2), 379-381.
Stancik, E. K., Navarro-Quiroga, I., Sellke, R., & Haydar, T. F. (2010). Heterogeneity
in ventricular zone neural precursors contributes to neuronal fate diversity in
the postnatal neocortex. Journal of Neuroscience, 30(20), 7028-7036.
Stephan, H., Frahm, H., & Baron, G. (1981). New and revised data on volumes of
brain structures in insectivores and primates. Folia primatologica, 35(1), 1-29.
Storey, J. D., Taylor, J. E., & Siegmund, D. (2004). Strong control, conservative point
estimation and simultaneous conservative consistency of false discovery rates:
a unified approach. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 66(1), 187-205.
Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies.
Proceedings of the National Academy of Sciences, 100(16), 9440-9445.
Stouffs, K., Stergachis, A., Vanderhasselt, T., Dica, A., Janssens, S., Vandervore, L.,
et al. (2018). Expanding the clinical spectrum of biallelic ZNF335 variants.
Clinical genetics.
Strnad, P., Leidel, S., Vinogradova, T., Euteneuer, U., Khodjakov, A., & Gönczy, P.
(2007). Regulated HsSAS-6 levels ensure formation of a single procentriole per
centriole during the centrosome duplication cycle. Developmental cell, 13(2),
203-213.
Sukumaran, S. K., Stumpf, M., Salamon, S., Ahmad, I., Bhattacharya, K., Fischer, S.,
et al. (2017). CDK5RAP2 interaction with components of the Hippo signaling
pathway may play a role in primary microcephaly. Molecular Genetics and
Genomics, 292(2), 365-383.
Surguchov, A., McMahan, B., Masliah, E., & Surgucheva, I. (2001). Synucleins in
ocular tissues. Journal of neuroscience research, 65(1), 68-77.
Swanson, W. J., Nielsen, R., & Yang, Q. (2003). Pervasive adaptive evolution in
mammalian fertilization proteins. Molecular biology and evolution, 20(1), 18-
20.
Szczepanski, S., Hussain, M. S., Sur, I., Altmüller, J., Thiele, H., Abdullah, U., et al.
(2016). A novel homozygous splicing mutation of CASC5 causes primary
microcephaly in a large Pakistani family. Human genetics, 135(2), 157-170.
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by
DNA polymorphism. Genetics, 123(3), 585-595.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., & Kumar, S. (2011).
MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum
Likelihood, Evolutionary Distance, and Maximum Parsimony Methods.
Molecular biology and evolution, 28(10), 2731-2739.
Tang, C. J. C., Lin, S. Y., Hsu, W. B., Lin, Y. N., Wu, C. T., Lin, Y. C., et al. (2011).
The human microcephaly protein STIL interacts with CPAP and is required for
procentriole formation. The EMBO journal, 30(23), 4790-4804.
R Core Team. (2018). R: A Language and Environment for Statistical Computing.
Teppa, E., Wilkins, A. D., Nielsen, M., & Buslje, C. M. (2012). Disentangling
evolutionary signals: conservation, specificity determining positions and
coevolution. Implication for catalytic residue prediction. BMC bioinformatics,
13(1), 235.
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). Clustal-W - Improving the
Sensitivity of Progressive Multiple Sequence Alignment through Sequence
Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic
Acids Research, 22(22), 4673-4680.
Trimborn, M., Bell, S. M., Felix, C., Rashid, Y., Jafri, H., Griffiths, P. D., et al. (2004).
Mutations in microcephalin cause aberrant regulation of chromosome
condensation. The American Journal of Human Genetics, 75(2), 261-266.
Ulmer, T. S., Bax, A., Cole, N. B., & Nussbaum, R. L. (2005). Structure and dynamics
of micelle-bound human α-synuclein. Journal of Biological Chemistry,
280(10), 9595-9603.
Uversky, V. N., & Fink, A. L. (2002). Amino acid determinants of alpha synuclein
aggregation: putting together pieces of the puzzle. FEBS Letters, 522, 9-13.
Vallender, E. J., Mekel-Bobrov, N., & Lahn, B. T. (2008). Genetic basis of human
brain evolution. Trends in Neurosciences, 31(12), 637-644.
van Breugel, M., Hirono, M., Andreeva, A., Yanagisawa, H.-a., Yamaguchi, S.,
Nakazawa, Y., et al. (2011). Structures of SAS-6 suggest its organization in
centrioles. Science, 331(6021), 1196-1199.
Vargas, K. J., Makani, S., Davis, T., Westphal, C. H., Castillo, P. E., & Chandra, S. S.
(2014). Synucleins regulate the kinetics of synaptic vesicle endocytosis.
Journal of Neuroscience, 34(28), 9364-9376.
Vargas, K. J., Schrod, N., Davis, T., Fernandez-Busnadiego, R., Taguchi, Y. V.,
Laugks, U., et al. (2017). Synucleins have multiple effects on presynaptic
architecture. Cell reports, 18(1), 161-173.
Vekrellis, K., Xilouri, M., Emmanouilidou, E., Rideout, H. J., & Stefanis, L. (2011).
Pathological roles of α-synuclein in neurological disorders. The Lancet
Neurology, 10(11), 1015-1025.
Venkatesh, B., Lee, A. P., Ravi, V., Maurya, A. K., Lian, M. M., Swann, J. B., et al.
(2014). Elephant shark genome provides unique insights into gnathostome
evolution. Nature, 505(7482), 174.
Wang, Y.-q., Qian, Y.-p., Yang, S., Shi, H., Liao, C.-h., Zheng, H.-K., et al. (2005).
Accelerated evolution of the pituitary adenylate cyclase-activating polypeptide
precursor gene during human origin. Genetics, 170(2), 801-806.
Wang, Y.-q., & Su, B. (2004). Molecular evolution of microcephalin, a gene
determining human brain size. Human molecular genetics, 13(11), 1131-1137.
Weadick, C. J., & Chang, B. S. (2011). An improved likelihood ratio test for detecting
site-specific functional divergence among clades of protein-coding genes.
Molecular biology and evolution, 29(5), 1297-1300.
Webb, B., & Sali, A. (2014). Comparative protein structure modeling using Modeller.
Current protocols in bioinformatics.
Whelan, S., & Goldman, N. (2001). A general empirical model of protein evolution
derived from multiple protein families using a maximum-likelihood approach.
Molecular biology and evolution, 18(5), 691-699.
Wollnik, B. (2010). A common mechanism for microcephaly. Nature genetics, 42(11),
923-924.
Wong, W. S., Yang, Z., Goldman, N., & Nielsen, R. (2004). Accuracy and power of
statistical methods for detecting adaptive evolution in protein coding sequences
and for identifying positively selected sites. Genetics, 168(2), 1041-1051.
Woods, C. G., Bond, J., & Enard, W. (2005). Autosomal recessive primary
microcephaly (MCPH): a review of clinical, molecular, and evolutionary
findings. The American Journal of Human Genetics, 76(5), 717-728.
Xu, B., & Yang, Z. (2013). PAMLX: a graphical user interface for PAML. Molecular
biology and evolution, 30(12), 2723-2724.
Xu, S., Sun, X., Niu, X., Zhang, Z., Tian, R., Ren, W., et al. (2017). Genetic basis of
brain size evolution in cetaceans: insights from adaptive evolution of seven
primary microcephaly (MCPH) genes. BMC evolutionary biology, 17(1), 206.
Yang, Y. J., Baltus, A. E., Mathew, R. S., Murphy, E. A., Evrony, G. D., Gonzalez, D.
M., et al. (2012). Microcephaly gene links trithorax and REST/NRSF to control
neural stem cell proliferation and differentiation. Cell, 151(5), 1097-1112.
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Molecular
biology and evolution, 24(8), 1586-1591.
Yang, Z., Kumar, S., & Nei, M. (1995). A new method of inference of ancestral
nucleotide and amino acid sequences. Genetics, 141(4), 1641-1650.
Yang, Z., Nielsen, R., Goldman, N., & Pedersen, A.-M. K. (2000). Codon-substitution
models for heterogeneous selection pressure at amino acid sites. Genetics,
155(1), 431-449.
Yang, Z., & Swanson, W. J. (2002). Codon-substitution models to detect adaptive
evolution that account for heterogeneous selective pressures among site
classes. Molecular biology and evolution, 19(1), 49-57.
Yang, Z., Wong, W. S., & Nielsen, R. (2005). Bayes empirical Bayes inference of
amino acid sites under positive selection. Molecular biology and evolution,
22(4), 1107-1118.
Yu, X., Chini, C. C. S., He, M., Mer, G., & Chen, J. (2003). The BRCT domain is a
phospho-protein binding domain. Science, 302(5645), 639-642.
Zarranz, J. J., Alegre, J., Gómez‐Esteban, J. C., Lezcano, E., Ros, R., Ampuero, I., et
al. (2004). The new mutation, E46K, of α-synuclein causes Parkinson and
Lewy body dementia. Annals of neurology, 55(2), 164-173.
Zhang, J. (2000). Rates of conservative and radical nonsynonymous nucleotide
substitutions in mammalian nuclear genes. Journal of Molecular Evolution,
50(1), 56-68.
Zhang, J., Nielsen, R., & Yang, Z. (2005). Evaluation of an improved branch-site
likelihood method for detecting positive selection at the molecular level.
Molecular biology and evolution, 22(12), 2472-2479.
Zhang, X., Liu, D., Lv, S., Wang, H., Zhong, X., Liu, B., et al. (2009). CDK5RAP2 is
required for spindle checkpoint function. Cell cycle, 8(8), 1206-1216.
Zhong, X., Liu, L., Zhao, A., Pfeifer, G. P., & Xu, X. (2005). The abnormal spindle-
like, microcephaly-associated (ASPM) gene encodes a centrosomal protein.
Cell cycle, 4(9), 1227-1229.
Zollikofer, C. P., de León, M. S. P., Lieberman, D. E., Guy, F., Pilbeam, D., Likius,
A., et al. (2005). Virtual cranial reconstruction of Sahelanthropus tchadensis.
Nature, 434(7034), 755.
Zuckerkandl, E., & Pauling, L. (1965). Evolutionary divergence and convergence in
proteins. In Evolving genes and proteins (pp. 97-166). Elsevier.
PUBLICATIONS
Pervaiz, N., & Abbasi, A. A. (2016). Molecular evolution of WDR62, a gene
that regulates neocorticogenesis. Meta gene, 9, 1-9.
Pervaiz, N., Shakeel, N., Qasim, A., Zehra, R., Anwar, S., Rana, N., ... &
Abbasi, A. A. (2019). Evolutionary history of the human multigene families
reveals widespread gene duplications throughout the history of animals. BMC
Evolutionary Biology, 19(1), 128.
Siddiqui, I. J., Pervaiz, N., & Abbasi, A. A. (2016). The Parkinson Disease
gene SNCA: Evolutionary and structural insights with pathological
implication. Scientific reports, 6, 24475.
Seemab, S., Pervaiz, N., Zehra, R., Anwar, S., Bao, Y., & Abbasi, A. A.
(2019). Molecular evolutionary and structural analysis of familial exudative
vitreoretinopathy associated FZD4 gene. BMC evolutionary biology, 19(1), 72.
Ma, L., Cao, J., Liu, L., Li, Z., Shireen, H., Pervaiz, N., ... & Abbasi, A. A.
(2019). Community Curation and Expert Curation of Human Long Noncoding
RNAs with LncRNAWiki and LncBook. Current Protocols in
Bioinformatics, 67(1), e82.