7/28/2019 0 Front Pages New_merged
1/317
GENOME WIDE SURVEY OF CERTAIN
MAMMALIAN GPCRS AND OLFACTORY
RECEPTORS
A THESIS
Submitted by
NAGARATHNAM B
in partial fulfillment for the award of the degree
of
DOCTOR OF PHILOSOPHY
FACULTY OF SCIENCE AND HUMANITIES
ANNA UNIVERSITY
CHENNAI 600 025
JUNE 2012
ABSTRACT
In the recent era of G-protein coupled receptor (GPCR) research,
computational approaches in sequence analysis play a vital role in identifying
related sequences (homologues), conserved features (domains, motifs) and
evolutionary relationships (orthologs) for protein families of interest at intra-
and inter-genomic levels. Candidate GPCRs and ORs (class A type GPCRs)
are important for their diverse cellular activities and have been considered for
the genome-wide survey in selected eukaryotic genomes, which further helps
to establish structure-function resemblance.
Generally, GPCRs are predicted to have an extracellular N-terminus
(N-out topology) and an intracellular C-terminus, with seven transmembrane helices
(TMHs) connected by three intracellular and three extracellular loops; they are
therefore termed serpentine-like receptors.
Previous cross-genome studies on human-Drosophila GPCRs
motivated a cross-genome clustering of human-C. elegans
GPCRs (Chapter 2). Profile-based clustering (RPS-BLAST) was employed
to associate more than 1000 C. elegans GPCRs with already grouped human
GPCR clusters of eight major receptor types. The resulting 32 human-C.
elegans GPCR clusters were analyzed for five types of cluster
association, described with the proposed terms human GPCR clade [HC],
coclusters [CC], neighbor clades [NC], neighbor members [NM] and
species-specific members [SS], as observed in the tree topology; these help to connect
functional relevance at intra- and inter-genomic levels. Interestingly, the
CC association was significant and exhibited evolutionary integrity at the inter-genomic level.
Also, the 27 identified orthologs illustrate the effectiveness of
cross-genome clustering techniques in connecting related GPCRs even at
remote homology. Overall, 84% of the GPCR sequences across genomes were
successfully associated by RPS-BLAST at significant E-value thresholds
(ranging from 0.001 to 1) (work published).
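The association step summarized above can be illustrated with a small sketch. It is not the pipeline used in the thesis: it assumes that RPS-BLAST hits have already been parsed into (query, cluster, E-value) records, and the sequence names, cluster labels and helper functions below are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query id, cluster/profile id, E-value)


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep, for every query, the cluster with the lowest E-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, cluster, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (cluster, evalue)
    return best


def fraction_associated(hits: Iterable[Hit], n_queries: int, threshold: float) -> float:
    """Fraction of all queries whose best hit has an E-value <= threshold."""
    best = best_hits(hits)
    associated = sum(1 for _, evalue in best.values() if evalue <= threshold)
    return associated / n_queries


# Hypothetical parsed hits for five query sequences (one query has no hit).
example_hits = [
    ("seqA", "peptide", 1e-20),
    ("seqA", "chemokine", 1e-5),
    ("seqB", "biogenic_amine", 5e-4),
    ("seqC", "glutamate", 0.05),
    ("seqD", "secretin", 0.8),
]

# Report how the associated fraction grows as the threshold is relaxed.
for t in (0.001, 0.01, 0.1, 1.0):
    print(t, fraction_associated(example_hits, n_queries=5, threshold=t))
```

Relaxing the threshold from 0.001 to 1 admits progressively weaker hits, which is the trade-off behind reporting the associated fraction over a range of E-values rather than at a single cut-off.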
Cross-genome clustering of human and C. elegans GPCRs motivated
a phylogenetic analysis exclusively on serpentine receptors (SRs)
(Chapter 3). Since nearly 20 protein families of SRs from C. elegans
are related to chemosensation, a phylogenetic analysis of 683 serpentine
receptors was carried out to identify related sequences/clusters that
represent family-specific/receptor-specific sequence features, and ultimately to
connect them at the superfamily level. Interestingly, the only receptor annotated for
olfaction in C. elegans (odr-10), which senses diacetyl, was
observed along with 43 SRs in the phylogeny. All the homologues associated with
odr-10 are from the Str superfamily, and str-112 in particular was found to be the
sequence homologue most closely related to odr-10 in the phylogenetic
analysis. As a case study, odr-10 was modelled to understand its
secondary structural details. A str family-specific QLF motif was identified
in ICL3, TM6 of odr-10, and 92 other SR family-specific motifs were also
identified using the TM-MOTIF package. The identified sequence features can
be used further to train SVM models and to predict putative receptors from
other nematode species.
Attempts have been made to design a user-friendly alignment
viewer, TM-MOTIF (work published), to detect and display conserved
motifs on the predicted membrane topology in a set of aligned
transmembrane proteins (Chapter 4). The tool is very effective in identifying
not only the conserved motifs (default 60%) but also the amino acid
substitutions (AAS) with their respective physico-chemical properties (by using
an in-house program, namely MotifS) at each position of the alignment. TM-MOTIF
provides an option for users to submit their sequences of interest
(multiple FASTA and MSA) to visualize the seven predicted helices of TM
proteins in the VIBGYOR colouring scheme. Users can also align a sequence of interest
with any one of the given reference sequences (of known structure) to obtain a pairwise
alignment; this display is highly helpful as a prerequisite for
homology modelling. Users can also perform a BLAST search to identify the nearest
homologue from the incorporated cross-genome GPCR and OR cluster datasets of
selected organisms. In short, TM-MOTIF is highly suitable for comparative
genomics and for identifying cluster-specific or receptor-specific and common
motifs observed at various percentages of conservation within and across
genome(s). The package is integrated into DOR (Database of Olfactory Receptors).
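The idea of reporting motifs at a chosen level of conservation can be sketched as follows. This is only an illustration under stated assumptions, not the MotifS or TM-MOTIF implementation: the function names and the toy alignment are hypothetical, gaps are assumed to count towards the denominator, and a "motif" is assumed to be a run of consecutive columns in which a single residue reaches the threshold (0.6 mirrors the 60% default mentioned above).

```python
from collections import Counter
from typing import List, Tuple


def conserved_columns(alignment: List[str], min_fraction: float = 0.6) -> List[Tuple[int, str, float]]:
    """Return (column index, residue, fraction) for every alignment column in
    which one residue occurs in at least min_fraction of the sequences.
    Gaps ('-') count towards the denominator but are never reported."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    result = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column contains gaps only
        residue, count = counts.most_common(1)[0]
        if count / n >= min_fraction:
            result.append((col, residue, count / n))
    return result


def group_runs(columns: List[Tuple[int, str, float]]) -> List[Tuple[int, str]]:
    """Merge consecutive conserved columns into (start index, motif string)."""
    motifs: List[Tuple[int, str]] = []
    for col, residue, _ in columns:
        if motifs and motifs[-1][0] + len(motifs[-1][1]) == col:
            start, text = motifs[-1]
            motifs[-1] = (start, text + residue)
        else:
            motifs.append((col, residue))
    return motifs


# Hypothetical toy alignment of five short sequences.
toy = [
    "ADRYLAV",
    "SDRYMAI",
    "TERYI-V",
    "GDRFVSV",
    "ADRYFAL",
]

print(group_runs(conserved_columns(toy)))
```

Raising `min_fraction` keeps only the most strictly conserved positions, which is one way to separate cluster-specific motifs from motifs common to several clusters.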
Conserved motifs and AAS play a crucial
role in functional aspects. The previously established 32 clusters of eight
major receptor types in the cross-genome GPCR cluster datasets (human-Drosophila
GPCR clusters, human-C. elegans GPCR clusters and the human-only
GPCR cluster dataset) were studied primarily for
conserved motifs (MotifS program), and the TM-MOTIF package was used
to map the observed motifs to their respective membrane topology
(Chapter 5).
Interestingly, a total of 33 conserved motifs were identified
from the human-Drosophila GPCR clusters, and 76% of them were observed
in TM helices, predominantly in TM2 and TM7. Besides the classical motifs
such as E/DRY and NPXXY, motifs observed in a single receptor type
(cluster-specific or receptor-specific motifs), in two receptor types and in
multiple receptor types
were also documented for the cross-genome GPCR clusters (work published).
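The two classical motifs named above can be located in a single sequence with simple pattern matching. The sketch below is illustrative only and is not part of TM-MOTIF: the toy sequence and the helper `find_motifs` are hypothetical, and X in NPXXY is taken to mean any residue.

```python
import re
from typing import Dict, List, Tuple

# E/DRY: Glu or Asp followed by Arg-Tyr; NPXXY: Asn-Pro, any two residues, Tyr.
MOTIF_PATTERNS = {
    "E/DRY": re.compile(r"[ED]RY"),
    "NPXXY": re.compile(r"NP..Y"),
}


def find_motifs(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return 1-based start positions and matched text for each motif.
    Matches are non-overlapping, as reported by re.finditer."""
    found: Dict[str, List[Tuple[int, str]]] = {}
    for name, pattern in MOTIF_PATTERNS.items():
        found[name] = [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
    return found


# Hypothetical toy sequence, not a real receptor.
toy_sequence = "MKTAIDRYLAVVHPLNPLIYAFL"

print(find_motifs(toy_sequence))
```

Combining such positions with predicted helix boundaries would indicate in which helix or loop a motif falls, which is the kind of mapping the paragraph above describes.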
An olfactory receptor data repository was generated for selected eukaryotic
organisms (yeast, worm, fly, mouse and human), and these sequences were aligned
to produce intra- and inter-genomic phylogenies. Interestingly, 371 functional ORs
from the human genome were distributed in 10 distinct clusters, and class I (sensing
water-borne odors) and class II (sensing air-borne odors) receptors were discriminated
when a few selected fish and amphibian ORs were introduced into the human OR
phylogeny. In another study, fly ORs showed no significant coclustering in the human
OR phylogeny, indicating that insect ORs are evolutionarily distinct from
mammalian ORs. This could be due to the independent evolution, life style or
reverse topology of fly ORs. Selected nematode ORs also showed no coclustering
with human ORs, owing to the long lineage and nematode life style. A study on human-mouse
OR clusters showed significant coclustering, and further studies were carried out with
ORs of canines, rodents and non-human primates to analyze cluster association with
human ORs. The results of the sequence studies were organized in a publicly
available database, namely DOR. It provides sequences, predicted TM boundaries,
intra- and inter-genomic alignments, and phylogenies of selected genomes. It also includes
a motif identification tool (TM-MOTIF) and is associated with other features such as
predicted secondary structure and dimer prediction from collaborators (work in press).
In essence, the genome-wide survey yields representative sequences,
cluster associations, cluster-specific motifs, orthologs and coclusters at
intra- and inter-genomic levels, which ultimately help to connect the functional
properties of known genes/proteins to unknown ones and to understand structure-function
relationships.
ACKNOWLEDGEMENT
I express my deep sense of gratitude to Dr. V. Balakrishnan,
Department of Biotechnology, KSR College of Technology, Tiruchengode for
his valuable guidance for my Ph.D. study. Besides I am extremely thankful to
my co-supervisor and mentor Prof. Dr. R. Sowdhamini, Lab-25, National
Center for Biological Sciences, Bangalore who has been a source of
inspiration, help, guidance, advice to me throughout the course of this
research work. Further, I sincerely express my earnest gratitude to my
doctoral committee member Dr. S. SenthilKumar, PSG College of
Technology, Coimbatore. I express my heartfelt thanks to Prof. Dr. K.
Karunakaran, Vice Chancellor, and Dr. P. Renuka Devi, Director-Research,
Anna University of Technology Coimbatore for graciously permitting me to
do this research.
I submit my gratitude to Prof. Dr. K. Vijayaragavan, RSF,
Director, NCBS, Bangalore, Prof. Dr. Obaid Siddiqi, RSF, Prof. Dr. Apurva
Sarin, and Prof. N. Srinivasan from IISc., Bangalore for extending care and moral
support to pursue the research work, and I submit my deepest gratitude to
Mr. Ashok Rao, Mr. Shaju, the teaching and non-teaching staff, my lab mates and
all@ncbs for their kind-hearted support in encouraging my research thirst.
Thanks to my family members and my beloved APPA.
B. NAGARATHNAM
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.
ABSTRACT iii
LIST OF TABLES xxii
LIST OF FIGURES xxiv
LIST OF ABBREVIATIONS xxx
1 INTRODUCTION 1
1.1. PRIOR ART ON GENOME-WIDE SURVEY 2
1.2. BREAKTHROUGHS IN GPCR
CRYSTALLOGRAPHY STUDIES 4
1.3. GPCRS: POPULAR DRUG TARGETS 6
1.4. STRUCTURE AND CELLULAR ACTIVITIES
OF MEMBRANE PROTEINS 7
1.5. MEMBRANE PROTEIN: TOPOLOGY 7
1.6. GPCR MECHANISM 9
1.7. GPCR CLASSIFICATION 10
1.7.1 Olfactory Receptors (ORs) 11
1.7.2 Classical Knowledge on Olfactory
Receptors 12
1.7.3 Olfactory Signaling Pathway in
Human ORs 13
1.7.4 ORs, GRs and IRs in Drosophila 14
1.7.5 Insect olfaction (Drosophila ORs) 14
1.7.6 Nematode Olfaction 15
1.7.7 Mouse Olfaction 16
1.8 DATA REPOSITORIES FOR MEMBRANE
PROTEINS 16
1.9 COLLECTION OF GPCR- HOMOLOGUES 17
1.9.1 BLAST (Basic Local Alignment
Search Tool) 18
1.9.2 PSI-BLAST (Profile Vs Sequence
comparison method) 19
1.9.3 Reverse PSI-BLAST (Sequence Vs
Profile comparison method) 20
1.10 MULTIPLE SEQUENCE ALIGNMENT
TECHNIQUES 22
1.10.1 CLUSTAL W 23
1.10.2 PRALINETM 24
1.10.3 MAFFT 24
1.11 DERIVING PHYLOGENY OF GPCRs/ORs 25
1.11.1 PHYLIP 26
1.11.2 TREE-PUZZLE 26
1.11.3 MEGA (Molecular Evolutionary
Genetics Analysis) 27
1.12 CLUSTER ASSOCIATIONS 27
1.13 SEQUENCE CONSERVATION AND
DIVERSITY 28
1.14 HOMOLOGY MODELLING OF GPCRs/ORs 29
2 CROSS-GENOME CLUSTERING OF HUMAN AND
C. ELEGANS G-PROTEIN COUPLED
RECEPTORS 30
2.1 INTRODUCTION 30
2.2 C. elegans - AN ATTRACTIVE ANIMAL MODEL 30
2.2.1 Features Related to C. elegans and
Human GPCRs 31
2.3 OBJECTIVES 33
2.4 PRIOR ART 33
2.4.1 Superfamilies of Serpentine Receptors 34
2.5 METHODOLOGY 35
2.5.1 Selection Criteria for C. elegans GPCRs 35
2.5.2 Generation of Representative Profiles 38
2.5.3 Performing RPS-BLAST 38
2.5.4 Cross-Genome Alignment of
Human-C. elegans GPCRs 39
2.5.5 Cross-Genome Phylogeny of Human-
C. elegans GPCRs 40
2.5.6 Terminologies used to Describe Phylogeny
2.5.6.1 Human GPCR clade [HC] 40
2.5.6.2 Coclusters [CC] 40
2.5.6.3 Neighbor Clades [NC] 41
2.5.6.4 Neighbor Members [NM] 41
2.5.6.5 Species specific Members [SS] 41
2.5.6.6 Superfamilies of Serpentine
receptors (SR) 41
2.6 RESULTS AND DISCUSSION 42
2.6.1 Result Summary for Peptide Receptors 43
2.6.2 Result Summary for Chemokine Receptors 67
2.6.3 Result Summary for Nucleotide and Lipid
receptors 68
2.6.4 Result Summary for Biogenic Amine
Receptors 81
2.6.5 Result Summary for Class B (Secretin)
Receptors 94
2.6.6 Result Summary for Cell
Adhesion Receptors 99
2.6.7 Result Summary for Class C (Glutamate)
Receptors 101
2.6.8 Result Summary for Frizzled/Smoothened
Receptors 108
2.7 CONCLUSION 110
3 PHYLOGENETIC ANALYSIS OF SERPENTINE
RECEPTORS OF C. ELEGANS AND
IDENTIFICATION OF CONSERVED MOTIFS IN
SERPENTINE RECEPTOR SUPERFAMILIES 117
3.1 INTRODUCTION 117
3.2 HOMOLOGUES OF C. elegans GPCRs 118
3.3 OBJECTIVES 118
3.4 CHEMOSENSORY RECEPTORS IN C. elegans 119
3.5 CHEMOSENSORY NEURONS AND
OLFACTORY APPARATUS IN C. elegans 119
3.6 FAMILIES AND SUPERFAMILIES OF
SERPENTINE RECEPTORS IN C. elegans 120
3.7 FEATURES AND IMPORTANCE OF SRs 122
3.8 SRs: FUNCTIONAL RELEVANCE WITH
OTHER EUKARYOTIC GPCRs 122
3.9 METHODOLOGY 123
3.9.1 Data Collection 123
3.9.2 Prediction of TM-helices by HMMTOP 123
3.9.3 Alignment Procedure by MAFFT 124
3.9.4 Phylogeny of Selected Serpentine Receptors 124
3.9.5 Identification of Motifs in SRs 124
3.10 RESULTS 125
3.10.1 Identified Motifs in SR Families : A
Pilot Study 127
3.10.2 Homology Modelling of odr-10 128
3.10.2.1 Pairwise alignment of odr-10
with bovine rhodopsin sequence 128
3.10.2.2 Alignment by MAFFT 129
3.10.2.3 Structure validation for Odr-10 model 130
3.10.2.4 Preliminary phylogenetic analysis 131
3.10.2.5 Odr-10 an outgroup to HOR 131
3.11 CONCLUSION 132
4 TM-MOTIF: A PACKAGE AND AN ALIGNMENT
VIEWER TO IDENTIFY CONSERVED MOTIFS
AND AMINO ACID SUBSTITUTIONS IN ALIGNED SET OF SEVEN TRANSMEMBRANE
HELIX PROTEINS 135
4.1 INTRODUCTION 135
4.1.1. Functional Importance of Conserved Motifs in TM-Proteins 136
4.1.2. Motif Related to Structural Integrity and Stability 137
4.1.3. Impacts of Motifs in Evolutionary Bioinformatics 138
4.2. OBJECTIVES OF TM-MOTIF 138
4.3. KEY FEATURES OF TM-MOTIF 139
4.4. METHODOLOGY 140
4.4.1. In-Built Dataset of Cross-Genome GPCR and OR Cluster Dataset 141
4.4.1.1 Human-Drosophila cross-genome
GPCR clusters 141
4.4.1.2 Human-C. elegans cross-genome
GPCR clusters 141
4.4.1.3 Human-mouse cross-genome OR
clusters 141
4.4.2 Alignment Procedures for Cross-Genome
GPCR/OR Clusters 141
4.4.3. Prediction of Membrane Topology for TM Helices and Loops 142
4.4.4 Detection of Motifs and Amino Acid Substitution (AAS) in the Cross-Genome
Alignment 143
4.4.5 Mapping of Identified Motifs on TM-helices and Loops in MSA 143
4.4.6 Identification of Homologous Sequences for User Submitted Queries by Performing
BLAST 144
4.4.7 Pairwise Alignment in TM-MOTIF 144
4.5 RESULTS 145
4.5.1. Software Input and Output Options 145
4.5.2. Input Options 146
4.5.3. Output Options 146
4.5.3.1 Display of predicted 7 TM-helices
in VIBGYOR colouring
scheme: (by using Run TM option) 146
4.5.3.2 Display of Identified Motifs and AAS in MSA: (by using Run
Motif option) 147
4.5.3.3 Display of Detected Motifs
on TM-helices: (by using
Run TM-Motif option) 148
4.5.3.4 Alignment with Reference Sequence 150
4.5.3.5 Identifying closest homologues
of user sequence in selected
organisms 151
4.5.3.6 Display of Over predicted helices 151
4.6. DEFAULT PARAMETERS 152
4.6.1 TM-MOTIF- Output Files 152
4.7. CAVEAT AND FUTURE DEVELOPMENT 153
4.8. AVAILABILITY 154
4.9. CONCLUSIONS 154
5 ANALYSIS ON CONSERVED MOTIFS
AND PERMITTED AMINO ACID
EXCHANGES IN CROSS-GENOME
GPCR CLUSTERS 156
5.1 INTRODUCTION 156
5.2 OBJECTIVES 157
5.3 RESIDUE CONSERVATION IN CROSS-
GENOME SEQUENCES 158
5.4 IMPACT OF AMINO ACID CONSERVATION
AND TYPES OF SUBSTITUTIONS 159
5.5 METHODS 159
5.5.1 Cross-genome GPCR cluster dataset 160
5.5.2 Alignment Procedure 160
5.5.3 Prediction of membrane topology 161
5.5.4 Program to Detect Motifs and AAS 161
5.6 RESULTS 162
5.7 OCCURRENCE OF MOTIFS FOR SINGLE
RECEPTOR TYPE 163
5.8 MOTIFS OBSERVED IN HUMAN-
DROSOPHILA CROSS-GENOME CLUSTERS 164
5.8.1 Motifs Observed in Transmembrane
Helices 164
5.8.2 Motifs Observed in Loop Regions 165
5.9 MOTIFS OBSERVED IN HUMAN- C. elegans
GPCR CROSS-GENOME CLUSTERS 167
5.10 CHARACTERISTIC MOTIFS FROM
CROSS-GENOME GPCR CLUSTERS 169
5.10.1 Conserved D/ERY and NPXXY motifs in
GPCR Clusters 169
5.10.2 Identified KLK/R and RLAR/K motif in
Secretin Receptor 169
5.10.3 Conserved PMNYM / PMSYM motif in
BGA Receptor 170
5.11 SUMMARY 171
6 GENOME WIDE SURVEY OF
OLFACTORY RECEPTORS (ORS) IN
SELECTED EUKARYOTIC GENOMES 173
6.1. PHYLOGENETIC STUDY ON SELECTED HUMAN ORS 173
6.1.1. Introduction 173
6.1.2. Objectives and Scopes 173
6.1.3. Olfactory Receptors 174
6.1.4. OR: Membrane Topology 175
6.1.5. Prior Studies on ORs 175
6.1.6. Methodology 177
6.1.6.1. Retrieval of OR sequences 177
6.1.6.2. Prediction of membrane topology
: Human ORs 178
6.1.6.3. Alignment procedure 179
6.1.6.4. Phylogeny on selected human
olfactory receptors 179
6.1.6.5. Analysis of phylogeny 180
6.1.7. Results 181
6.1.7.1. Class I and II type receptors in
human OR phylogeny 181
6.1.7.2. Sequence features of 10 human OR-subclusters 181
6.1.7.3. Representative OR sequences 182
6.1.7.4. Motif analysis on human
olfactory receptors 183
6.1.7.5. SVM Analysis 185
6.2. CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND
FISH GENOMES 186
6.2.1. Objective 186
6.2.2. Review of Literatures 187
6.2.3. Fish ORs 187
6.2.4. Results 188
6.2.5. Sequence conservation: across fish and
human ORs 189
6.3 CROSS-GENOME PHYLOGENY ON
SELECTED ORS FROM HUMAN AND
AMPHIBIAN GENOME 191
6.3.1 Objective 191
6.3.2 Literature survey on class
I and II type ORs 192
6.3.3 Amphibian ORs 192
6.3.4 Results 193
6.3.5.1 Cocluster HXC1 Class
I type receptors 195
6.3.5.2 Cocluster HXC2- class
II type receptors 195
6.3.5.3 Cocluster HXC3 - class II type receptors 196
6.4 PHYLOGENETIC ANALYSIS ON
DROSOPHILA OLFACTORY RECEPTORS 199
6.4.1 Background 199
6.4.2 Drosophila ORs 199
6.4.3 Results on Drosophila OR
Phylogeny Analysis 200
6.4.3.1 Cluster association: 10 subclusters 200
6.4.4 Summary 203
6.5 CROSS-GENOME PHYLOGENETIC
ANALYSIS ON SELECTED ORS
FROM DROSOPHILA, YEAST AND
HOMO SAPIENS 204
6.5.1 Background 204
6.5.2 Insect ORs and mammalian ORs:
(Evolutionarily unrelated) 204
6.5.3 Membrane proteins in Yeast 205
6.5.4 Results 205
6.5.5 Summary 206
6.6 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED OLFACTORY
RECEPTORS FROM HUMAN AND
C. elegans GENOMES 206
6.6.1 Odr-10 and homologues 207
6.6.2 Results and Discussion 208
6.6.3 Summary 211
6.7 CROSS-GENOME PHYLOGENETIC
ANALYSIS ON SELECTED ORS
FROM HUMAN AND MOUSE GENOMES 212
6.7.1 Introduction 212
6.7.2 Objectives 213
6.7.3 Human-Mouse OR Orthology 213
6.7.4 Complex Picture on Human-Mouse
OR Orthology 214
6.7.5 Methodology 215
6.7.6 Results 215
6.7.6.1 Cross-genome OR cluster
association 215
6.7.6.2 Cross-genome phylogeny
with Class-I type receptor
homologues 217
6.7.7 Common motifs in the Cross-genome
phylogeny 218
6.7.8 Summary 218
6.8 PHYLOGENETIC ANALYSIS ON
OLFACTORY RECEPTORS FROM
SELECTED HUMAN AND NON-HUMAN
PRIMATES 220
6.8.1 Objectives 220
6.8.2 Background 220
6.8.3 Methodology 220
6.8.4 Results 221
6.8.4 Summary 222
6.9 DATABASE OF OLFACTORY
RECEPTORS (DOR) 222
6.9.1 Objectives 222
6.9.2 Features on OR sequences in DOR 224
6.9.2.1 OR sequences of target genomes: 225
6.9.2.2 Predicted TM boundaries 226
6.9.2.3 Single/cross-genome OR
alignments 227
6.9.2.4 Cluster association and
Phylogeny 228
6.9.2.5 Softwares and Tools
(TM-MOTIF) in DOR 229
6.9.3 Structural features (Application of
sequence searches) 230
6.9.4 Summary 233
7 CONCLUSION 236
7.1 COMPENDIUM 236
7.2 CROSS-GENOME GPCR CLUSTERING 237
7.3 PHYLOGENETIC ANALYSIS ON
SERPENTINE RECEPTORS 240
7.4 TM-MOTIF PACKAGE 242
7.5 STUDY ON CONSERVED MOTIFS AND
AAS IN CROSS-GENOME GPCR CLUSTERS 245
7.6 PHYLOGENETIC ANALYSIS ON ORS
IN SELECTED EUKARYOTIC GENOMES 247
7.7 SUMMARY 253
APPENDIX 1 THE LIST OF IDENTIFIED
FAMILY-SPECIFIC MOTIFS IN SR 256
REFERENCES 260
LIST OF PUBLICATIONS 284
CURRICULUM VITAE 285
LIST OF TABLES
TABLE NO. TITLE PAGE NO.
2.1 Distribution of Human and C. elegans GPCRs in
32 Clusters 114
2.2 List of Identified Orthologs 116
3.1 List of identified motifs in serpentine receptor superfamilies 134
5.1 Motifs observed in the transmembrane
helices and loop regions of human and Drosophila
GPCR clusters 162
6.1 Analysis on sequence features of 10 human
OR subclusters 183
6.2 List of conserved motifs in 10 human OR
subclusters (60% level of conservations) 184
6.3 Sequence identity of neighboring fish ORs
and human class I type receptors observed in
cross-genome OR phylogeny 191
6.4 Sequence identity of neighboring frog ORs and
human class I type receptors observed in
cross-genome OR phylogeny 197
6.5 Sequence identity of neighboring frog ORs and
human class II type receptors observed in
cross-genome OR phylogeny (referred as HXC2) 198
6.6 Sequence identity of neighboring frog ORs and
human class II type receptors observed in
cross-genome OR phylogeny (referred as HXC3) 198
6.7 Significant cluster association for str type
receptors in CeC3 and sequence pairs with high
/low identity has been given 210
6.8 Sequence identity and similarity between
odr-10 and associated SR 213
6.9 Percentage identity for selected human and
mouse ORs for significant association from
cross-genome OR phylogeny 219
6.10 Percentage Identity between selected human ORs and
non-human ORs 221
LIST OF FIGURES
FIGURE NO. TITLE PAGE NO.
1.1 Central dogma of genome-wide survey on sequences 3
1.2 Crystal structure of bovine rhodopsin (Li et al 2004) 5
1.3 Membrane topology of olfactory receptor
(odr-10) in C. elegans 8
1.4 GPCR signaling pathway 10
1.5 ORs and organization of the olfactory system in
mammals and OR signaling pathway
(Meyer et al 2000) 13
1.6 Overview on the techniques involved in
genomewide survey 22
2.1 Flow-chart to depict the step-wise procedure for
cross-genome clustering of GPCRs 37
2.2(a-c) Pictorial representation for various types of
cluster association 42
2.3(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 46
2.4(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 48
2.5(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 50
2.6(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 52
2.7(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 55
2.8(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 57
2.9(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 59
2.10(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 61
2.11(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 63
2.12 (a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 64
2.13(a-b) Cross-genome phylogeny of peptide receptors:
(Rectangular Display & Radial Display) 66
2.14(a-b) Cross-genome phylogeny of chemokine receptors:
(Rectangular Display & Radial Display) 69
2.15(a-b) Cross-genome phylogeny of chemokine receptors:
(Rectangular Display & Radial Display) 70
2.16(a-b) Cross-genome phylogeny of nucleotide and lipid
receptors (Rectangular Display & Radial Display) 72
2.17(a-b) Cross-genome phylogeny of nucleotide and
lipid receptors (Rectangular Display & Radial Display) 74
2.18(a-b) Cross-genome phylogeny of nucleotide and lipid
receptors (Rectangular Display & Radial Display) 76
2.19(a-b) Cross-genome phylogeny of peptide receptors
nucleotide and lipid receptors (Rectangular Display
& Radial Display) 78
2.20(a-b) Cross-genome phylogeny of nucleotide and lipid
receptors (Rectangular Display & Radial Display) 80
2.21(a-b) Cross-genome phylogeny of nucleotide and lipid
receptors (Rectangular Display & Radial Display) 82
2.22(a-b) Cross-genome phylogeny of biogenic amine
receptor (Rectangular Display &
Radial Display) 84
2.23(a-b) Cross-genome phylogeny of biogenic amine
receptor (Rectangular Display & Radial Display) 86
2.24(a-b) Cross-genome phylogeny of biogenic amine
receptor (Rectangular Display & Radial Display) 88
2.25(a-b) Cross-genome phylogeny of biogenic amine
receptor (Rectangular Display & Radial Display) 91
2.26(a-b) Cross-genome phylogeny of biogenic amine
receptor (Rectangular Display & Radial Display) 93
2.27(a-b) Cross-genome phylogeny of secretin type
receptors (Rectangular Display & Radial Display) 96
2.28(a-b) Cross-genome phylogeny of secretin type
receptors (Rectangular Display & Radial Display) 98
2.29(a-b) Cross-genome phylogeny of cell adhesion type
receptor (Rectangular Display & Radial Display) 100
2.30(a-b) Cross-genome phylogeny of glutamate receptor
(Rectangular Display & Radial Display) 102
2.31(a-b) Cross-genome phylogeny of glutamate receptor
(Rectangular Display & Radial Display) 104
2.32(a-b) Cross-genome phylogeny of glutamate receptor
(Rectangular Display & Radial Display) 105
2.33(a-b) Cross-genome phylogeny of glutamate receptor
(Rectangular Display & Radial Display) 107
2.34(a-b) Cross-genome phylogeny of FRZ/SMT type
receptor (Rectangular Display & Radial Display) 109
2.35 (a-b) Distribution of C. elegans GPCRs at various E-value
thresholds 112
3.1 Pie-diagram to show the distribution of serpentine
receptors (SR) in the dataset 123
3.2 Phylogeny on selected serpentine receptors
(circular view tree) 125
3.3 The subcluster showing odr-10 and its homologues 127
3.4 Pairwise alignment of odr-10 with bovine
rhodopsin sequence 129
3.5 Three-dimensional model of olfactory receptor
odr-10 and structure validation 130
3.6 Phylogeny on selected human olfactory receptors
with an olfactory receptor (odr-10) from C. elegans 132
4.1 Flow-chart 140
4.2 Tool guide of TM-MOTIF : an overview 142
4.3 Snapshot for the available main menu of the front window of TM-MOTIF with user interactive features 145
4.4 Options given for the submission of input sequences
in TM-MOTIF package 146
4.5 Sample output for the option RUN TM 147
4.6 Sample output for the option RUN MOTIF 148
4.7 Sample output for the option RUN TM-Motif 149
4.8 Snapshot for the display of pairwise alignment of
user's input sequence with selected reference sequence 150
4.9 Snapshot Depicts the Display of Over Predicted
TM-Helices 151
5.1 Pictorial representation to denote the occurrence
of highly conserved DRY motif in TM3, ICL2 158
5.2 Flow-chart describes about the steps involved in
the study 159
5.3 Percentage residue conservation in TM helices and
loops in GPCR Clusters 168
5.4(a-c) Illustration of characteristic motifs (observed at
60% conservation) 171
6.1 Flow-chart for the sequence analysis on
olfactory receptors 179
6.2(a-b) Phylogenetic display of selected human
olfactory receptor 180
6.3 Phylogeny of selected olfactory receptors in
Homo sapiens and fish genomes 189
6.4 Snapshot of Alignment window for the motif
KAFSTC in human ORs and in few fish
ORs at cross-genome alignment 190
6.5 Snapshot depicts the co-clustering of fish ORs
with class I type receptors of human ORs in
HSC1 (given in A), also exhibiting the coclusters
like HXC1, HXC2 and HXC3 to indicate the class
I and II type receptors from frog ORs with human ORs (given in B). 193
6.6 Snapshot depicts the co-clustering of fish ORs
with class I type receptors of human ORs in
HSC1 (given in A), also exhibiting the
coclusters like HXC1, HXC2 and HXC3 to indicate
the class I and II type receptors from frog ORs
with human ORs (given in B). 194
6.7 Phylogeny of Drosophila Olfactory receptors 201
6.8 Observed 10 subclusters of Drosophila olfactory
receptors 203
6.9 Cross-genome phylogeny on selected ORs from
human, Drosophila and yeast 206
6.10 Observed cluster association in the cross-genome
phylogeny of selected ORs from human and
C. elegans genomes 208
6.11 Cross-genome phylogeny of selected olfactory
receptors (ORs) from human and mouse genomes 216
6.12 Phylogeny on selected human and mouse
olfactory receptors with special emphasis on mouse
class I type receptors 216
6.13 Cross genome phylogeny on selected human ORs
with ORs from non human primates and aves 222
6.14 Available main menu in the front page of DOR 225
6.15 A snapshot of the give option sequence and
its application in DOR 226
6.16 Display of predicted membrane boundaries in DOR 227
6.17 Display of Alignment option in DOR 228
6.18 Display of cross-genome OR phylogeny in DOR 229
6.19 Overview on pictorial representation of available
features in DOR for sequence analysis 230
6.20 Overview on DOR features for sequence
and structural information for olfactory receptors
in DOR 231
6.21 Display of 3D Structure and related features in DOR 233
LIST OF ABBREVIATIONS
AAS - Amino acid substitutions
BGA receptors - Biogenic amine receptors
BLAST - Basic Local Alignment Search Tool
BS - Bootstrap
CAR - Cell adhesion receptors
CC - Co-clusters
CMK - Chemokine receptors
FRZ/SMT - Frizzled/smoothened receptors
GLR - Class C (glutamate) receptors
GPCRs - G-protein coupled receptors
HC - Human GPCR clade
MAFFT - Multiple Alignment using Fast Fourier Transform
MEGA - Molecular Evolutionary Genetics Analysis
N&L - Nucleotide and lipid receptors
NC - Neighbor clades
NJ - Neighbor joining
NM - Neighbor members
ORs - Olfactory receptors
PR - Peptide receptors
RMSD - Root-mean-square deviation
RPS-BLAST - Reverse PSI-BLAST
SEC - Class B (secretin) receptors
SR - Serpentine receptors
SS - Species-specific members
SVM - Support vector machine
TM proteins - Trans-membrane proteins
CHAPTER 1
INTRODUCTION
The vast and frequent updating of sequence databases to build
repositories for various genomes, and the accurate prediction of structural
information for these sequences, are two critical steps in computational
genomics. The available knowledge and approaches for genomics (Lipman et al 2011)
and structural genomics (Redfern et al 2008) are drastically different,
but they can be inter-connected effectively for the purpose of identifying functional
annotations (Alfarano et al 2005).
The huge accumulation of sequence information on one hand and the limited
resources on structural details on the other is the crucial scenario in
bioinformatics. This imbalance is indeed a challenge to the goal of
immediately identifying the function(s) of gene(s) of interest.
However, the accumulated large data repositories can be
handled effectively only through bioinformatics techniques such as genome-wide
survey, which is a more sophisticated approach than the traditional
gene-by-gene approach and provides clues to connect sequences from various
genomes to a common function. Methods such as data clustering,
principal component analysis, artificial neural networks and support vector
machines are useful for gene/protein prediction, classification, association and
annotation of novel proteins, and further support the analysis of functional
genomics data.
My current objective is to apply effective bioinformatics
approaches such as genome-wide survey and cross-genome phylogenetic analysis
on certain GPCRs and ORs to propose representative sequences, cluster
association, cluster-specific motifs, orthologs, species-specific behavior and
co-clusters arrived at intra- and inter-genomic levels, ultimately to connect
the functional properties of known genes/proteins to unknown ones (Figure 1.1).
In principle, sequence comparison studies, along with reference to
structural similarities, provide clues to connect functional resemblance
(Redfern et al 2008) at cellular, biochemical and molecular levels
(Ye et al 2006).
This unidirectional hypothesis of associating sequences, predicting
structural details, relating biochemical functions with the phenotypes, forms
the baseline of computational biology. Sequence studies for various genomes
will provide opportunity to identify a group of associated proteins based on
phylogeny and can be exploited for functional relevance. This conceptual
framework really helps to compare sequences from various genomes and
provides clues to connect the sequences of known function to the
unknown. This rationale for genome-wide surveys of gene/protein sequences of
interest provides a platform to integrate knowledge on the
sequence-structure-function paradigm for public access (Kerrien et al 2011). Thus,
sequence studies act as a primary step to connect structural and functional
studies.
1.1 PRIOR ART ON GENOME-WIDE SURVEY
Genome-wide surveys of selected protein families of interest
(Tripathi and Sowdhamini 2008, Metpally and Sowdhamini 2005)
illustrate the approach of accumulating related
proteins (associated gene clusters), identifying putative orthologs and
observing conserved motifs across genomes. Cross-genome sequence
analysis provides knowledge on sequence conservation across taxa, preserved
species-specific tendencies and evolutionary integrity at the cross-genome
level (Figure 1.1). In particular, cross-genome sequence studies with selected
model organisms are useful for a wide range of practical applications. For instance, a
cross-genome phylogenetic analysis of selected GPCRs from the human and
Drosophila genomes (Metpally and Sowdhamini 2005), organized into eight
major groups of GPCRs, generated 32 cross-genome GPCR clusters.
Such an approach proved valuable for identifying the natural ligands of
Drosophila and human orphan receptors.
Figure 1.1 Central dogma of genome-wide survey on sequences
Note: Pictorial representation describing the procedures involved in genome-wide sequence analysis. Label 1 refers to the selection of genomes of interest. Label 2 refers to the collection of non-redundant sequences from the selected genomes. Label 3 refers to the cross-genome alignment procedure. Label 4 refers to cross-genome phylogeny on sequences. Label 5 refers to cross-genome cluster association and analysis for species-specificity, co-cluster arrangements, identification of orthologs, conserved motifs, and observing functional clues to hypothetical proteins in the phylogeny.
Other case studies, such as the genome-wide survey identifying putative
serine/threonine protein kinases (STKs) in cyanobacteria (Zhang et al 2007),
practically useful insights into symbiotic nitrogen-fixing
alpha-proteobacteria like Sinorhizobium meliloti (Schluter et al 2010) based on
experimental data, phylogenetic classification of transporters and membrane
proteins from lower organisms (De Hertogh et al 2002) to higher-order
organisms (Chang et al 2004), phylogenetic analysis of olfactory receptor
subfamilies (class I and class II type) in fish (Freitag et al 1999) and amphibians
(Freitag et al 1995), phylogenetic analysis discriminating gustatory and
olfactory receptors in Drosophila (Robertson et al 2003), phylogenetic
grouping of serpentine receptor superfamilies in C. elegans (Robertson and
Thomas 2006), identification of olfactory receptor subfamilies in mouse
(Sullivan et al 1996) and human (Glusman et al 2001), and the influence of
phylogenetic analysis in ethno-medicinal studies (Saslis-Lagoudakis et al
2011), are highly commendable. These case studies illustrate the important
applications of genome-wide surveys and the use of phylogeny in identifying
similar or related sequences for a protein of interest across genomes.
1.2 BREAKTHROUGHS IN GPCR CRYSTALLOGRAPHY STUDIES
Diverse cell surface proteins account for about 30% of the
human genome and are well known for their therapeutic importance and
applications. Among the more than 82,160 structures available in the PDB, crystal
structures are available for only a few membrane proteins. For
crystallization, membrane proteins embedded in the lipid bilayer have to be
extracted and must form a protein-detergent complex (PDC)
(Koszelak-Rosenblum et al 2009). Also, the surrounding lipids in cell
membranes interfere with both crystallography and nuclear magnetic
resonance (NMR) spectroscopy when solving three-dimensional structures of
membrane proteins. As purification and crystallization are critical and
difficult steps in membrane protein crystallography (Dilanian et al
2011), only a limited number of membrane protein structures have been reported so far.
Figure 1.2 Crystal structure of bovine rhodopsin (Li et al 2004)
a) Crystal structure of bovine rhodopsin displayed in ribbon representation (Li et al 2004). The observed seven TM-helices and one peripheral helix are colored in the rainbow order: TM-helix 1 in dark blue (residues 34-64); TM-helix 2 in light blue (71-100); TM-helix 3 in blue-green (106-140); TM-helix 4 in yellow-green (150-173); TM-helix 5 in yellow (200-230); TM-helix 6 in orange (241-276); TM-helix 7 in red (286-309); TM-helix 8 in magenta (311-321).
b) Space-filling representation of rhodopsin, a photoreceptor protein.
Rhodopsin was the first GPCR crystal structure to be solved (Palczewski et al
2000) (Figure 1.2 a and b). The β1 adrenergic receptor (Warne et al 2008), β2
adrenergic receptor (Rasmussen et al 2007), adenosine receptor (Jaakola et al
2008), dopamine D3 receptor, CXCR4 chemokine receptor (Wu et al 2010),
histamine receptor and the most recently reported lipid GPCR, the
sphingosine 1-phosphate receptor (S1P1 receptor), are a few other important crystal structures.
These structural studies guide the comparison of reference structures with
disease-implicated genes through modelling, in order to interpret dysfunctions.
Most of the solved structures are used as templates for molecular modelling.
1.3 GPCRS: POPULAR DRUG TARGETS
As GPCRs are involved in a wide variety of physiological
processes, such as regulation of immune system activity and inflammation,
cell density sensing, sense of smell, visual sense, autonomic nervous system
transmission and behavioral and mood regulation, they are effectively
targeted in medicinal chemistry. Several previous reviews
highlight the clinical importance of GPCRs (Insel et al 2007), and a few
examples illustrate the importance of GPCR biology in
medicine. For instance, a number of monogenic mutations have been
identified in rhodopsin that cause retinitis pigmentosa, and GPCRs are
implicated in a number of endocrine disorders and in serious illnesses such as schizophrenia (Seeman 1987),
Alzheimer's disease and Parkinson's disease (Lee et al 1978). Other
reported examples, such as genetic disorders of the calcium-sensing
receptor (CaSR), Graves' disease, cancer, diabetes, heart diseases,
neurodegenerative diseases, asthma, diseases related to autoimmunity and AIDS, further emphasize the multi-functional
role of GPCRs and their clinical implications.
The diversity of GPCRs and their ligand-binding properties make these
receptors interesting targets for structure-based drug design (Schlyer
and Horuk 2006) and even open up scope for personalized medicine.
Notably, receptors such as the AT1 angiotensin, adrenergic, dopamine
and serotonin (5-hydroxytryptamine, 5-HT) receptor subtypes are the most
exploited for their clinical importance in related diseases, and are all
useful drug targets.
1.4 STRUCTURE AND CELLULAR ACTIVITIES OF MEMBRANE PROTEINS
Membrane proteins are embedded within the lipid bilayer and are
designated as transmembrane proteins, since they loop inside and outside of
the cell boundaries (Figure 1.2). A class of cell-surface receptors shares
common structural features: an extracellular N-terminus, an intracellular C-terminus
and seven transmembrane helices (TMHs) connected by three intracellular and three
extracellular loops. This snake-like arrangement has given rise to
names such as 7TM receptors, heptahelical receptors and serpentine-like
receptors (Probst et al 1992). Since the downstream targets of such membrane
receptors are guanine nucleotide-binding proteins, they are also referred to as
guanine nucleotide-binding protein-coupled receptors or G-protein coupled
receptors (GPCRs), and are well known for their versatile
functional importance.
GPCRs are ubiquitous, as they are major participants in signal
transduction and recognize various types of ligands (Bockaert and Pin 1999).
Substantial evidence on GPCR oligomerization (Prinster et al 2005),
participation in signaling pathways (Greenwald 2005), clinical importance
(Kuwabara and N 2001) and availability of repositories for multiple
organisms (Fredriksson and Schioth 2005) provide significant impetus for the
study of GPCR sequences and their ligand-binding properties. Ligands could
be endogenous compounds such as amines, peptides, Wnt proteins or
endogenous cell surface adhesion molecules or photons and exogenous
compounds like odorants.
1.5 MEMBRANE PROTEIN: TOPOLOGY
There are several prediction methods available online to predict
the topology of membrane proteins. The prediction methods are mainly based on
the hydrophobicity profile of the helices. Notably, canonical GPCR
members, including the olfactory receptors of higher-order organisms, exhibit
N-out and C-in topology (Figure 1.3). Interestingly,
Drosophila ORs and GRs retain the opposite N-in and
C-out topology (Bargmann 2006, Benton et al 2006, Lundin et al 2007), which is
also referred to as inverted/reverse topology.
Methods such as HMMTOP (Tusnady and Simon 2001), SOSUI
(Hirokawa et al 1998), TMHMM (Krogh et al 2001), TMAP, MEMSAT,
TMpred, TSEG, TM-finder, Pred-TMP, SPLIT, DAS, TopPred II,
PRED-TMR2, MPEx, Phobius and TOPCONS are popularly used to predict the
transmembrane helices and topology of membrane proteins. Methods are also available to
discriminate signal peptides (Lao et al 2002) in proteins.
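The hydrophobicity-based principle behind these predictors can be illustrated with a minimal sketch. It is not the algorithm of HMMTOP, TMHMM or any other server named above; it is a plain sliding-window average over the Kyte-Doolittle hydropathy scale, with a 19-residue window and a cutoff of 1.6 taken as assumed, commonly quoted defaults. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one value per amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window, indexed by window start."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean hydropathy reaches the cutoff.

    Returns 1-based, inclusive (start, end) residue ranges.
    """
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend previous segment
        else:
            segments.append((start, end))
    return segments


# Toy sequence (hypothetical, not a real receptor): polar flanks around
# a 20-residue hydrophobic stretch at positions 11-30.
toy = "MKDSEQNRKD" + "LLIVALLAVFLLGIVALLVL" + "RKDEQNSKDE"
print(candidate_tm_segments(toy))
```

Because whole windows are merged, the reported range is wider than the hydrophobic core itself; dedicated predictors refine the helix ends and additionally decide which side of the membrane each loop faces.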
Figure 1.3 Membrane topology of olfactory receptor (odr-10) in
C. elegans
The seven transmembrane helices predicted by HMMTOP for odr-10 are shown in a TOPO2
display: residues 12-31 for TM1, 44-63 for TM2, 94-113 for TM3, 126-145
for TM4, 202-225 for TM5, 256-275 for TM6 and 286-305 for TM7.
The conserved YRY motif in TM3, ICL2 and the Str superfamily specific
QLF motif in ICL3 are highlighted in red colour.
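As a worked example, the loop regions implied by the helix boundaries quoted in this caption can be derived mechanically. The sketch below assumes an extracellular N-terminus (N-out) for odr-10, which the caption does not state explicitly; under that assumption the first loop is intracellular and the loops alternate sides. The function name is hypothetical.

```python
# TM helix boundaries for odr-10 as quoted in the Figure 1.3 caption
# (HMMTOP prediction); 1-based, inclusive.
TM_HELICES = [(12, 31), (44, 63), (94, 113), (126, 145),
              (202, 225), (256, 275), (286, 305)]


def loop_segments(helices, n_terminus_out=True):
    """Label the regions between consecutive helices as ICL or ECL.

    Assumption: with an extracellular N-terminus the chain is inside the
    cell after TM1, so the first loop is intracellular; each further
    helix crossing flips the side.
    """
    loops = {}
    counts = {"ICL": 0, "ECL": 0}
    inside = n_terminus_out  # side reached after crossing TM1
    for (_, prev_end), (next_start, _) in zip(helices, helices[1:]):
        kind = "ICL" if inside else "ECL"
        counts[kind] += 1
        loops[f"{kind}{counts[kind]}"] = (prev_end + 1, next_start - 1)
        inside = not inside
    return loops


for name, (start, end) in loop_segments(TM_HELICES).items():
    print(f"{name}: {start}-{end} ({end - start + 1} residues)")
```

Under this assumption ICL2 spans residues 114-125, directly after TM3, and ICL3 spans residues 226-255, which is consistent with the motif locations named in the caption.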
1.6 GPCR MECHANISM
Membrane proteins are effectively involved in signal transduction
(Figure 1.4), where GPCRs are activated by various external stimuli
(Rodbell et al 1971). Under the influence of these stimuli, receptors
undergo a conformational change (minimal rearrangements occur in the TM6
and TM3 helices, although this area remains unclear), which causes the activation
of a guanine nucleotide-binding protein (G-protein). GPCRs are dedicated to
recognizing intercellular messenger molecules (such as hormones,
neurotransmitters, lipids, biogenic amines, growth and developmental
factors) and several sensory messages (such as light, odors and gustative
molecules). The downstream response depends primarily on the type of
G-protein. For instance, the Golf subunit is mainly related to sensing
chemosensory signals and participates in olfactory signaling pathways
(Figure 1.4). The Gs type of G-protein regulates the enzyme adenylate
cyclase (AC). AC activity is triggered when it binds to a subunit of the
activated G-protein, which subsequently triggers the cAMP pathway for further
transduction, resulting in various biological responses. Activation of AC stops
when the G-protein returns to the GDP-bound state (Figure 1.4). GPCRs also
act through various effectors such as ion channels, adenylyl cyclases
and phospholipases.
Figure 1.4 GPCR signaling pathway
The image represents GPCR signal transduction, depicting the entry of ligands/stimuli, activation of the G-protein subunit, subsequent activation of cAMP and the event of internalization for biological responses. (Image adopted from DB-DRD4, a database of
dopamine D4 receptor (home page). SOURCE: TRENDS in Pharmacological Sciences. URL: http://www.ibibiobase.com/projects/db-drd4/G_protein.htm)
1.7 GPCR CLASSIFICATION
GPCRs comprise the most prolific family of cell membrane
proteins. Knowledge of GPCR classification is necessary since they are involved
in various signaling pathways, recognize a diverse set of ligands and are
related to various biological functions. Candidate GPCRs with the
characteristic seven TM-helices have been classified with the aid of several
prediction methods and classifiers. Though all candidate GPCRs from
various families retain seven TM-helices connected by ICLs and
ECLs, sequence differences occur and give rise to subtle structural diversity
(Gether 2000). The GPCR superfamily is classified mainly into class A
(rhodopsin-like), class B (secretin-like), class C (metabotropic glutamate),
class D (fungal pheromone), class E (cAMP receptors) and class F
(frizzled/smoothened) (Kristiansen 2004). In particular, class A is the largest,
accounting for 80% of the distribution, and contains diverse receptors such as
rhodopsin, olfactory, biogenic amine, bioactive lipid, nucleic acid, and
peptide receptors. Receptors such as the secretin, calcitonin, glucagon,
parathyroid hormone and vasoactive intestinal peptide receptors belong to
class B. Class C includes receptors like metabotropic glutamate receptors (mGluRs),
the Ca2+-sensing receptor, γ-aminobutyric acid type B receptors
(GABA-B) and vomeronasal receptors type 2. Class D retains receptors such
as fungal pheromone P and α-factor receptors (STE2/MAM2), whereas fungal
pheromone A and M-factor receptors (STE3/MAP3) are related to class E.
Class F retains slime mold cyclic adenosine monophosphate (cAMP)
receptors. Recently, a few other GPCR families, such as frizzled type
receptors/FRZ (Vinson and Adler 1987, Bhanot et al 1996), smoothened type
receptors/SMT (Alcedo et al 1996, Nehme et al 2010), vomeronasal
receptors type 1/VNS (Dulac and Axel 1995), ocular albinism (Schiaffino et
al 1996, Schiaffino et al 1999) and plant receptors (Grill and Christmann
2007), i.e., the Arabidopsis thaliana receptor GCR1 (Josefsson and Rask 1997,
Perfus-Barbeoch et al 2004), have also been added to the existing GPCR
families. It has been observed that classes A, B and C cover nearly 600 GPCRs
in the human genome, excluding putative candidate GPCRs. Notably,
olfactory receptors (ORs) are members of the class A type receptors and are
dealt with exclusively in Chapter 6, under the title of genome-wide survey on
olfactory receptors in selected eukaryotes.
1.7.1 Olfactory Receptors (ORs)
The sense of smell, the process of olfaction, is beyond simple scientific
understanding. In general, chemical senses are broadly divided into olfaction
(the sense of smell) and gustation (the sense of taste). Understanding
and analyzing olfaction is necessary
not only from a biological or chemical perspective, but also because it is a powerful
socio-cultural phenomenon (Low 2005).
Olfactory receptors participate in sensing diverse chemical stimuli
or odors (Firestein 2001). ORs are fascinating for their functional significance
in detecting food, assessing its quality, enhancing its flavor, indicating the
presence of potential toxins and pathogens, and conveying information about reproductive status,
gender, genetic identity, conspecifics, mates as well as threats. ORs activate chemosensory cells, leading to neural recognition, and influence behaviours,
hormone state and also mood (Munger et al 2009). Due to their diverse roles,
ORs are important and ever-present in our everyday life experiences
and need to be explored in more detail for their vast practical applications in
the pharmaceutical industry (aroma therapy), cosmetic industry
(scent/perfume manufacturing) and food industry, and for the study of olfacto-sexual function,
olfacto-neural communication, olfactory disorders and so on. Thus,
performing a genome-wide survey on ORs of selected eukaryotic organisms
will improve scientific credibility and ultimately serve human benefit.
1.7.2 Classical Knowledge on Olfactory Receptors
The landmark paper published in 1991 by Nobel
Laureates Buck and Axel explained the role of olfactory receptors
and the organization of the olfactory system in humans (Buck and Axel 1991).
Around three percent of our genes code for different odorant
receptors on the membrane of the olfactory receptor cells. Further research,
including phylogenetic approaches discriminating class I and class II type
receptors that sense water- and air-borne odors in higher eukaryotes, i.e.,
human and mouse (Zozulya et al 2001, Niimura and Nei 2005), studies
related to insect olfaction (Robertson et al 2003), nematode olfaction
(Robertson and Thomas 2006), olfactory signaling, availability of ORs in
various genomes, and observed common peptides in OR subfamilies
(Gottlieb et al 2009), provides a remarkable background and facilitates the
genome-wide survey of ORs in selected eukaryotic genomes to
identify OR subclusters, cluster-specific motifs, species-specific tendencies
and co-clusters in tree topology (see Chapter 6 for more details).
1.7.3. Olfactory Signaling Pathway in Human ORs
The process of olfaction starts with the binding of an odor to
a specific receptor on a sensory neuron, where chemical energy is transformed into
electrical signals to sense the smell. Such binding activates Golf, a G
protein. The alpha subunit of Golf activates the enzyme adenylyl cyclase,
generating the major second messenger 3',5'-cyclic adenosine
monophosphate (cAMP), which directly opens the cyclic nucleotide-gated
channel. This allows Na+ and Ca2+ to flow in and depolarize the cell.
Depolarization of these cells causes action potentials (nerve impulses), which are
sent to the olfactory bulb. Signaling can also proceed by a pathway involving guanylyl cyclase
GC-D (Meyer et al 2000). The human nose expresses different types of receptors,
enabling the main olfactory system to use a common pathway to encode
thousands of odorants (Figure 1.5 a and b).
Figure 1.5 ORs and organization of the olfactory system in mammals
and OR signaling pathway (Meyer et al 2000)
a) Pictorial representation of ORs and the organization of the olfactory system in mammals.
b) OR signaling pathway, depicting the two proposed hypotheses of OR signal transduction (Meyer et al 2000). The upper panel describes the entry of various odors, their recognition by ORs and the initiation of the cAMP signaling pathway, which involves a G protein (Golf), an adenylyl cyclase (ACIII), a cyclic nucleotide-gated (CNG) channel (341b) and a chloride channel (ClC). After the response, cAMP is degraded by a CaM-dependent phosphodiesterase (PDE1C2). The other hypothesis (lower panel in b) explains the
components of the cGMP-signaling pathway and putative targets of cGMP, which involve the receptor guanylyl cyclase GC-D, cGMP-regulated PDE2, an unknown cGMP-regulated ion
channel and the known CNG channel of the cAMP-signaling pathway.
1.7.4. ORs, GRs and IRs in Drosophila
Olfactory neurons play a central role in sensing
volatile cues that afford the organism the ability to detect food, predators and
mates, whereas gustatory neurons sense soluble chemical cues that elicit feeding
behaviours. In insects, the taste neurons also initiate innate sexual and
reproductive responses.
It is believed that nearly 60 olfactory receptors (Berkeley
Drosophila Genome Project database) play a major role in identifying and
discriminating diverse odors for insect survival, and the Drosophila
olfactory receptor (DOR) gene family has been identified as G-protein coupled
receptors (Clyne et al 1997, Gao and Chess 1999, Vosshall and Stocker
2007). These proteins are expressed in distinct subsets of olfactory neurons,
and certain family members are restricted to distinct portions of the
olfactory system. Nearly the same number of gustatory receptors (GRs)
serve gustatory functions (Clyne et al 1997).
Notably, insect GRs have the same transmembrane topology as
ORs. Ionotropic glutamate receptors (IRs) in Drosophila are referred to as a new
family of odorant receptors; these proteins accumulate in sensory
dendrites and are not present at synapses. Unlike the related ionotropic glutamate
receptors that mediate chemical communication
between neurons at synapses, IRs are expressed in a combinatorial fashion in
sensory neurons that respond to many distinct odors but do not express either
insect odorant receptors (ORs) or gustatory receptors (GRs).
1.7.5. Insect Olfaction (Drosophila ORs)
Several fundamental studies have been published (Siddiqi
1990, Clyne et al 1999) investigating the molecular mechanism of Drosophila
olfaction. Electrophysiological studies explained the differentiation in the
morphology of the olfactory sensilla and their distribution patterns
(Venkatesh and Singh 1984, Stocker 1994). Studies suggest that there are 30
different classes of ORNs in the antenna (in adult ~40), based upon the odor
response profile of individual neurons and few exhibit odor specificity.
Notably, 24 antennal receptors, namely Or2a, Or47b, Or33b, Or49b, Or65a,
Or23a, Or85f, Or88a, Or67c, Or43a, Or7a, Or43b, Or59b, Or9a, Or85a,
Or47a, Or22a, Or19a, Or67a, Or35a, Or98a, Or85b, Or82a and Or10a, were
tested experimentally with 110 odorant molecules using the empty neuron system
(Dobritsa et al 2003), and the responses of the receptors varied across chemical
classes.
Generally, functional insect ORs combine a variable insect OR with
a constant receptor called OR83b to form a heteromeric
complex, which then participates in the signaling pathway. OR83b is also called a
co-receptor (Vosshall and Stocker 2007) for its functional importance. In the
literature (Larsson et al 2004), it is also mentioned that heteromeric insect
ORs comprise a new class of ligand-activated non-selective cation channels
(Sato et al 2008).
Notably, insect ORs lack homology to the G-protein coupled
chemosensory receptors of vertebrates and exhibit drastically different
mechanisms in olfaction. Recent studies have described insect ORs as heteromeric
ligand-gated ion channels (more details in Chapter 6).
1.7.6. Nematode Olfaction
Chemosensory receptors in nematodes are highly diverse and large
in number. Since worms lack both auditory and visual senses, chemosensation
plays a central role in nematode survival. In C. elegans, chemosensory
receptors belong to the G-protein coupled receptors and retain seven
transmembrane helices. Around 1330 genes and 400 pseudogenes have been
identified as chemoreceptors (Robertson and Thomas 2006) in C. elegans.
Many of these receptors are also known as serpentine receptors, and around
19 large gene families have been reported so far. Among this large number of
proteins, only one, namely odr-10 (Figure 1.3), has been reported as an
olfactory receptor in C. elegans (Sengupta et al 1996).
1.7.7. Mouse Olfaction
As found in human olfactory receptors, mouse ORs also fall into
two broad classes with excellent bootstrap support (Glusman et al
2001). The class I ORs in mouse resemble those found in fish and frog and
had been considered an evolutionary relic in mammals (Ngai et al 1993), whereas
the class II receptors are found in amphibians and terrestrial vertebrates
(Freitag et al 1995). There are 147 class I OR genes in the mouse OR
subgenome, of which 120 are potentially functional. In
mouse, all of the class I ORs are located in a single large cluster on
chromosome 7.
1.8 DATA REPOSITORIES FOR MEMBRANE PROTEINS
A large number of data repositories and prediction servers
for membrane topology are available exclusively for membrane proteins.
Notable examples include repositories related to GPCRs (Elefsinioti et al 2004) such as gpDB
(Theodoropoulou et al 2008) and GPCRDB, and integrated web resources such as the G Protein Coupled Receptor - Oligomerization Knowledge Base Project and the GPCR
Natural Variants database (NaVa). The SEVENS database (Ono et al
2005) provides useful sequence information, chromosomal location and
intra-genomic phylogenetic clusters for membrane proteins from more than 50
eukaryotic organisms. IUPHAR (Committee on Receptor Nomenclature and
Drug Classification) incorporates detailed pharmacological, functional and
patho-physiological information on GPCRs, voltage-gated ion channels,
ligand-gated ion channels and nuclear hormone receptors.
Other related structural resources, such as PDBTM and
TOPDB (Tusnady et al 2008), provide collections of domains and sequence
motifs. TMpad (Trans Membrane Protein Helix-Packing Database) and
MPDB (Membrane Protein Data Bank) are useful to provide structural
information on integral, peripheral and anchored membrane proteins and also
peptides (Raman et al 2006).
Data repositories for olfactory receptors are also available for
public access. ORDB (Skoufos et al 2000), HORDE (The Human Olfactory
Data Explorer) and integrated web resources from SenseLab for ORs, with
associated links such as odorDB and odorMapDB, are highly useful and
particularly relevant for retrieving olfactory receptor (OR) sequences
from multiple genomes.
1.9 COLLECTION OF GPCR HOMOLOGUES
Sequence similarity searches are robust techniques to identify the
nearest homologues of a query sequence in a database of interest. Pairwise
comparison of proteins is a fundamental step in sequence similarity
searches. The similarity scores depend upon sequence features such as the
amino acids and permitted amino acid substitutions (AAS). Generally, when a query and a subject are aligned with high similarity scores,
they can be inferred to be related and are called homologues.
Homologues are further classified into orthologs and paralogs.
While orthologous proteins evolved from a common ancestral gene and belong
to two different genomes, paralogs were generated by gene
duplication and belong to the same genome. Thus, homologues share
significant sequence similarity and can be further connected for their
functional relevance. A necessity arises to select an appropriate technique for
similarity search when we deal with evolutionarily distant sequences and
particularly membrane proteins. Each method is unique for its scoring scheme
with respect to amino acid substitutions and the gap penalties.
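One common heuristic for proposing ortholog candidates between two genomes is the reciprocal best hit criterion. It is not described in this chapter and is shown here only as an illustrative sketch under that assumption: two sequences from different genomes are paired when each is the top-scoring hit of the other. The identifiers and scores below are hypothetical toy data, not results.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    `scores` maps (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between two genomes (toy data).
human_vs_worm = {("hsA1", "ceX1"): 250, ("hsA1", "ceX2"): 90,
                 ("hsA2", "ceX1"): 120, ("hsA2", "ceX2"): 80}
worm_vs_human = {("ceX1", "hsA1"): 245, ("ceX1", "hsA2"): 118,
                 ("ceX2", "hsA1"): 88, ("ceX2", "hsA2"): 79}

print(reciprocal_best_hits(human_vs_worm, worm_vs_human))
```

In this toy case only hsA1 and ceX1 select each other; hsA2 also scores best against ceX1 but is not chosen in return, so it would be treated as a possible paralog rather than an ortholog candidate.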
Functionally and evolutionarily important protein similarities can
be recognized by comparing three-dimensional structures, but when structures
are not available, patterns of conservation such as motifs, profiles, position-
specific scoring matrices, and Hidden Markov Models can be used to identify
related sequences from the database of protein sequences.
Several methods, such as the sequence-based searches BLAST (Altschul et al 1997) and FASTA
(Lipman and Pearson 1985), the profile-based
search IMPALA (Schaffer et al 1999) and other approaches like PSI-BLAST and
RPS-BLAST, are effectively used to find homologues and further to identify
common functional relevance.
1.9.1 BLAST (Basic Local Alignment Search Tool)
Comparison between two sequences is achieved by
producing quality alignments which maximize the correspondence between
similar residues and minimize gaps (Altschul et al 1997). The objective here
is to align or match a sequence of unknown function with
characterized/annotated proteins from model organisms, so that the structure
and function can be extrapolated to the new sequence. Generally, dynamic
programming is used to compute optimal alignments, either locally
or globally, whereas BLAST and FASTA (Lipman and Pearson
1985) are robust heuristic methods that approximate local alignment. Conceptually, the heuristic approach (BLAST) can
deal with sequences differing considerably in length and identifies islands of
short matches. It approximates the Smith-Waterman algorithm (Smith and
Waterman 1981), which is guaranteed to find the optimal local alignment with
respect to the scoring system, and reports maximal scoring segment pairs
(MSPs). The scoring system mainly comprises the substitution matrix and the
gap-scoring scheme used to align the sequences based on possible similarities.
BLAST, a robust sequence comparison tool, offers five main
search programs, namely blastp, blastn, blastx, tblastn and tblastx, for varying
inputs such as nucleotide and protein sequences.
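The Smith-Waterman local alignment referred to above can be sketched in a few lines. This is a minimal illustration, not the BLAST implementation: it assumes a simple match/mismatch score and a linear gap penalty in place of a substitution matrix such as PAM or BLOSUM with affine gaps, and the scoring values and test sequences are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment of a and b by dynamic programming.

    Returns (best_score, aligned_a, aligned_b). Scores are floored at
    zero so that an alignment may start anywhere in either sequence.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences sharing only the island "WLFV".
print(smith_waterman("AAAWLFVAAA", "GGGWLFVGGG"))
```

The full table costs time proportional to the product of the two sequence lengths, which is why database-scale searches rely on heuristics that first locate short matching islands and then extend them.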
BLAST produces statistically significant alignments in the output,
and features like raw scores, bit scores and E-values are used to
quantify alignment significance. Among them, E-values are most often
used. Generally, the lowest E-values indicate the most significant
alignments. An E-value refers to the number of alignments one expects to find
with a score greater than or equal to the observed alignment score in a search
against a random database. The PAM (point accepted mutations per 100 residues) amino acid scoring matrices, which are based on an explicit
evolutionary model (Dayhoff et al 1978), are provided in the BLAST
software distribution and include PAM40, PAM120 and PAM250, whereas
the BLOSUM matrices are based on an implicit model of evolution and
include BLOSUM 45, 62 and 85 (Henikoff and Henikoff 1992). Generally,
these matrices are very appropriate for globular proteins, whereas
PAM and JTT-200 (Jones et al 1992) can be used for membrane proteins.
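The relation between raw score, bit score and E-value can be made concrete with the standard Karlin-Altschul formulas, S' = (λS - ln K) / ln 2 and E = m·n·2^(-S'), where m is the query length and n the database length. The sketch below is illustrative only: the λ and K values are assumed to be of the magnitude reported for gapped BLOSUM62 searches, the lengths are hypothetical, and the effective-length corrections applied by BLAST are ignored.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score S into bits: (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


# Assumed, illustrative statistical parameters; the values actually used
# are printed at the end of each BLAST report.
LAMBDA, K = 0.267, 0.041
m, n = 350, 5_000_000  # hypothetical query and database lengths

for raw in (40, 80, 160):
    bits = bit_score(raw, LAMBDA, K)
    print(f"raw={raw:4d}  bits={bits:7.1f}  E={e_value(bits, m, n):.3g}")
```

Because the E-value scales with the search space m·n, the same alignment score becomes less significant as the database grows, which is one reason E-value thresholds have to be chosen with the database size in mind.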
1.9.2 PSI-BLAST (Profile Vs Sequence comparison method)
Among the five BLAST programs, the work described in this thesis
mostly relies on the basic protein BLAST technique, which includes blastp
(protein-protein BLAST), PSI-BLAST (Position Specific Iterated BLAST),
PHI-BLAST (Pattern Hit Initiated BLAST) and DELTA-BLAST (Domain
Enhanced Lookup Time Accelerated BLAST). As the name suggests, blastp
compares a protein query with a protein database. PSI-BLAST allows the user
to build a PSSM (position-specific scoring matrix) using the results of the first
blastp run and iteratively uses the profile as a query against the database of
protein sequences (Altschul et al 1997). The profile generated at each
iteration is searched against the database of protein sequences, and the
iterations continue until convergence (that is, until no new sequences are
found). Thus, this method is effective in associating even distantly related
sequences with remote homology. The application can be further improved
by using jump-start PSI-BLAST (Altschul et al 1997), the jack-knife approach,
HOE (homologous over-extension) reduced profile searches (Gonzalez and
Pearson 2010) and improved PSI-BLAST search techniques such as
cascade PSI-BLAST (Bhadra et al 2006), as per user requirement.
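The iterate-until-convergence control flow can be sketched with a toy stand-in for the search step. This is not PSI-BLAST: the "profile" is simply the set of sequences found so far, and the hypothetical `toy_search` accepts any database sequence within one mismatch of a current member. The sketch only shows how repeated rounds reach sequences that the original query alone would miss.

```python
def iterate_to_convergence(query, database, search, max_iterations=10):
    """Repeat the search with the growing member set until nothing new appears.

    `search(members, database)` is a caller-supplied function returning
    the database sequences accepted for the current member set.
    Returns (members, number_of_rounds_run).
    """
    members = {query}
    for iteration in range(1, max_iterations + 1):
        new = set(search(members, database)) - members
        if not new:
            return members, iteration  # converged: no new sequences found
        members |= new
    return members, max_iterations


def toy_search(members, database, max_mismatches=1):
    """Toy stand-in for a profile search (assumption, equal-length strings)."""
    def mismatches(x, y):
        return sum(a != b for a, b in zip(x, y))
    return [s for s in database
            if any(mismatches(s, m) <= max_mismatches for m in members)]


# Hypothetical five-residue "sequences"; each differs by one residue
# from the previous one, except the unrelated last entry.
DATABASE = ["ACDEY", "ACWEY", "GCWEY", "KLMNP"]
members, rounds = iterate_to_convergence("ACDEF", DATABASE, toy_search)
print(sorted(members), rounds)
```

GCWEY differs from the query at three of five positions and is recruited only through the intermediate members, which mirrors how iterated profile searches reach remote homologues; the same mechanism is also why a single false positive can contaminate later rounds.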
1.9.3 Reverse PSI-BLAST (Sequence Vs Profile comparison method)
To associate remotely related sequences, the reverse PSI-BLAST
technique (RPS-BLAST) is highly effective. This method differs from other
sequence searches in that the query sequences are searched against a
database of PSSM (position-specific scoring matrix) profiles. PSSMs give
the amino acid propensities at each sequence position based on a multiple
alignment. PSSM generation also uses the sequence weights of the multiple
alignment, the expected number of amino acids and the frequencies of
unobserved amino acids (pseudocounts). Representative sequences from
protein families (for example, 3PFDB; Shameer et al 2009), related domains and
cluster types can be used to generate profiles that represent sequence
properties as a block of consensus amino acids. Hence, the sequence search
space is broadened and the opportunity is extended to connect sequences at
remote homology (Figure 1.6).
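As an illustration of how a PSSM captures position-specific propensities, a minimal sketch is given below. It assumes equal sequence weights, a uniform background frequency and a flat additive pseudocount; these simplifications and the toy alignment are assumptions made for the example and do not reproduce the weighting scheme used by PSI-BLAST.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Simplifying assumptions: equal sequence weights, uniform background
    frequency (1/20) and a flat additive pseudocount for every residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues; gaps and unknowns are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm

def score_sequence(pssm, sequence):
    """Sum the position-specific scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))

toy_alignment = ["DRYLA", "DRYVA", "ERYLA", "DRFLA"]
pssm = build_pssm(toy_alignment)
```

Residues observed in a column receive positive scores, while unobserved residues receive negative but finite scores because of the pseudocount.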
The other approach, which compares protein sequences against a
database of protein sequences, has some limitations. If stringent search
criteria are employed in such sequence-against-sequence searches,
there is a chance of missing very distantly related sequences.
RPS-BLAST, in contrast, helps to associate even distantly
related sequences with their related profiles. Its practical applications
include generating cross-genome phylogenies, finding new members,
associating evolutionarily distant sequences, classification and assigning
functional annotation to new sequences based on known data. To use this
method effectively, care is needed in designing profiles, setting significant
E-value thresholds and interpreting the search results for related profiles.
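The role of the E-value threshold when associating queries with profiles can be sketched as below. The hit tuples, identifiers and the default threshold are hypothetical values chosen for illustration; they are not taken from this thesis or from RPS-BLAST output.

```python
def assign_to_profiles(hits, evalue_threshold=1e-3):
    """Keep, for each query, the profile hit with the lowest E-value.

    `hits` is an iterable of (query_id, profile_id, evalue) tuples, for
    example parsed from tabular search output. The default threshold is
    an illustrative assumption.
    """
    best = {}
    for query_id, profile_id, evalue in hits:
        if evalue > evalue_threshold:
            continue                      # not significant: ignore this hit
        if query_id not in best or evalue < best[query_id][1]:
            best[query_id] = (profile_id, evalue)
    return best

# Hypothetical hits: (query, profile, E-value).
toy_hits = [
    ("worm_seq1", "cluster_peptide", 1e-20),
    ("worm_seq1", "cluster_amine", 1e-5),
    ("worm_seq2", "cluster_amine", 0.5),      # above threshold, discarded
    ("worm_seq3", "cluster_chemokine", 1e-4),
]

assignments = assign_to_profiles(toy_hits)
```

A query with no hit below the threshold (here `worm_seq2`) stays unassigned, which is one simple way such sequences could be flagged as candidate species-specific members.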
Separately, Hidden Markov Models (HMMs) can also be used for
pattern recognition; they provide a mathematical representation of a protein
sequence (Eddy 1998, Karplus et al 1998). HMMs have been used for gene
prediction, recognition of transmembrane helices (Sonnhammer et al 1998),
phylogenetic analysis (Felsenstein and Churchill, 1996) and distant
homology detection (Krogh et al 1994b). Machine learning approaches are
appropriate techniques for pattern recognition problems and for
recognizing remote homology. Methods like support vector machines (SVMs)
(Pugalenthi et al 2010) are effectively used in classification problems, where
an already trained dataset with known features (Positive set) is used to associate
unknown gene/protein sequences (Negative set), and are useful for proposing
putative members; the predictions rely upon the training dataset.
Figure 1.6 Overview of the techniques involved in the genome-wide survey
The diagram depicts the use of available data repositories related to membrane
proteins (GPCRDB, SEVENS DB, ORDB, HORDE and so on), followed by the collection of
sequences, prediction of membrane topology and use of a redundancy filter as the primary
steps for the cross-genome studies. The methodology starts with sequence search programs
(such as BLAST, PHI-BLAST, PSI-BLAST and RPS-BLAST) to identify homologous sequences
and to perform cross-genome analysis.
1.10 MULTIPLE SEQUENCE ALIGNMENT TECHNIQUES
Alignment procedures play a crucial role (Figure 1.1 and
Figure 1.6) in analyzing the relationships among diverse sequences. Two or
more sequences can be arranged by aligning them at common properties or
sites. Weights can be assigned to the aligned elements to determine the
degree of relatedness or to detect homology between the sequences. A
pairwise alignment involves two sequences and a multiple sequence
alignment (MSA) involves many sequences; both facilitate sequence
comparison studies, and sequences can be aligned by various alignment
methods. MSA can be regarded as a generalization of pairwise sequence
alignment. Here, instead of aligning two sequences, n sequences are aligned
simultaneously, where
n is always >2; hence the term multiple sequence alignment. The alignment of
multiple sequences is made possible by introducing gaps (_) into the sequences.
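How gaps make an alignment possible can be seen in the pairwise case. The sketch below is a minimal global alignment by dynamic programming (the Needleman-Wunsch scheme) with a simple match/mismatch score and a linear gap penalty; the scoring values and the toy sequences are assumptions chosen for illustration, and "-" is used as the gap symbol.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, score)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap            # leading gaps in seq_b
    for j in range(1, cols):
        score[0][j] = j * gap            # leading gaps in seq_a
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,   # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal path.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[-1][-1]

row_a, row_b, total = needleman_wunsch("MKTLLV", "MKLLV")
```

Progressive MSA programs apply the same idea repeatedly, aligning sequences or partial alignments pair by pair.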
Membrane proteins differ considerably from globular proteins in
sequence composition. The region that inserts into the cell membrane
possesses different hydrophobicity patterns when compared to soluble
proteins. Multiple sequence alignment techniques designed for
globular proteins are therefore not optimal for aligning transmembrane
proteins, and the recommended alignment procedures (Pirovano 2008)
should be employed carefully. When sequences from different genomes are
aligned together, the alignment is referred to as a cross-genome sequence
alignment and the resulting phylogeny as a cross-genome
phylogeny (Figure 1.6).
1.10.1 CLUSTAL W
CLUSTAL W (Thompson et al 1994) is a popular MSA tool. Its MSA
procedure consists of three main stages: 1) all pairs of sequences are
aligned separately in order to calculate a distance matrix giving the
divergence of each pair of sequences; 2) a guide tree is generated
from the distance matrix; and 3) the sequences are progressively aligned
according to the branching order in the guide tree.
Initially, the CLUSTAL W program applies a fast approximate
(heuristic) method based on the number of k-tuple matches (k being the size
of the exactly matching fragment that is used) for generating pairwise
distances (Wilbur and Lipman, 1983). Subsequently, a dynamic programming
algorithm is used to enhance accuracy, with scores that apply gap opening
penalties (GOP) and gap extension penalties (GEP). The method improves the
quality of the alignment by implementing amino acid weight matrices such as
BLOSUM with the series 80, 62, 45 and 30, PAM with the series 20, 60, 120
and 350, and the GONNET
matrix (which can be used for larger datasets) with the series 80, 120, 160, 250 and 350.
Though CLUSTAL W is handy for aligning a large number of sequences with
reliable accuracy, a few alignment tools are specifically recommended for
transmembrane proteins; these are conceptually different in that they align TM helices
and loops using different matrices (for example PRALINE TM and MAFFT).
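The first stage, a fast k-tuple based distance matrix, can be sketched as below. The distance used here (one minus the fraction of shared k-tuples) is a simplified stand-in chosen for illustration and is not the exact Wilbur and Lipman scoring implemented in CLUSTAL W; the sequences are invented.

```python
from itertools import combinations

def ktuples(sequence, k=2):
    """Return the set of all overlapping k-residue fragments."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def ktuple_distance(seq_a, seq_b, k=2):
    """Simplified distance: 1 - shared k-tuples / k-tuples in the smaller set."""
    a, b = ktuples(seq_a, k), ktuples(seq_b, k)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(a & b) / smaller

def distance_matrix(sequences, k=2):
    """Build a symmetric nested-dict distance matrix from {name: sequence}."""
    names = list(sequences)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = ktuple_distance(sequences[x], sequences[y], k)
        dist[x][y] = dist[y][x] = d
    return dist

def closest_pair(dist):
    """Return the pair of names with the smallest distance."""
    return min(combinations(list(dist), 2), key=lambda p: dist[p[0]][p[1]])

toy_sequences = {"s1": "MKTLLVAG", "s2": "MKTLLVSG", "s3": "WWPPHHCC"}
dist = distance_matrix(toy_sequences)
first_join = closest_pair(dist)
```

The closest pair is what a guide tree would join first, so it is also the pair aligned first in the progressive stage.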
1.10.2 PRALINETM
Servers designed to align TM proteins (like PRALINETM) are thus more
specific; the transmembrane regions are first predicted (Pirovano
2008). Reliable topology prediction methods define the boundaries of the TM
domains and loops as an initial requirement. PRALINETM uses HMMTOP
v2.1 (Tusnady and Simon, 2001), TMHMM v2.0 (Krogh et al 2001) and
Phobius (Käll et al 2007) for membrane topology predictions. Then, the
profile scoring scheme applies TM-specific substitution scores from matrices
like PHAT to reliably compare TM positions. Finally, an iterative
scheme is applied to enhance the alignment quality. A study suggests
that the PHAT matrix outperforms the JTT matrix (Jones et al
1992), especially in database searching (Ng et al 2000). An earlier method,
STAM (Shafrir and Guy, 2004), is also useful and was the first multiple sequence
alignment program targeted at transmembrane proteins.
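The idea of scoring TM positions and loop positions with different matrices can be sketched as follows. The two small matrices contain invented values and are not PHAT or any published matrix; the function names and the topology encoding ("M" for membrane, any other character for loop) are likewise assumptions made for this example.

```python
def column_score(res_a, res_b, in_membrane, tm_matrix, loop_matrix, gap_penalty=-4):
    """Score one aligned column with the matrix matching its predicted region."""
    if res_a == "-" or res_b == "-":
        return gap_penalty
    matrix = tm_matrix if in_membrane else loop_matrix
    if (res_a, res_b) in matrix:
        return matrix[(res_a, res_b)]
    return matrix.get((res_b, res_a), 0)   # symmetric lookup; 0 if unlisted

def topology_aware_score(aligned_a, aligned_b, topology, tm_matrix, loop_matrix):
    """Sum column scores; `topology` has one character per alignment column."""
    if not (len(aligned_a) == len(aligned_b) == len(topology)):
        raise ValueError("alignment rows and topology must have equal length")
    return sum(
        column_score(a, b, t == "M", tm_matrix, loop_matrix)
        for a, b, t in zip(aligned_a, aligned_b, topology)
    )

# Invented toy matrices (NOT real PHAT or BLOSUM values).
toy_tm_matrix = {("L", "L"): 2, ("L", "I"): 1, ("L", "K"): -5, ("K", "K"): 6}
toy_loop_matrix = {("L", "L"): 4, ("L", "I"): 2, ("L", "K"): -2, ("K", "K"): 5}

mixed = topology_aware_score("KLL", "KIK", "oMM", toy_tm_matrix, toy_loop_matrix)
all_loop = topology_aware_score("KLL", "KIK", "ooo", toy_tm_matrix, toy_loop_matrix)
```

In the toy matrices, pairing a hydrophobic residue with a charged one is penalised more heavily inside the membrane, so the same residue pairs score lower when the columns are predicted to be transmembrane.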
1.10.3 MAFFT
MAFFT (Multiple Alignment using Fast Fourier Transform) can be
used for aligning large datasets of transmembrane proteins. The method is
more advanced than many other alignment programs in improving the accuracy
of alignments, even for sequences with large insertions or extensions as well as
distantly related sequences of similar length. The MAFFT alignment program
(Katoh et al 2002) offers two different heuristics, the
progressive method (FFT-NS-2) and the iterative refinement method
(FFT-NS-i). Another important feature of the program is that the number of
input sequences can be very large, and it offers a range of multiple alignment
methods such as L-INS-i (accurate; for alignment of
by assigning probabilities to every possible evolutionary change at
informative sites, and by maximizing the total probability of the tree, the
optimal tree can be found. The NJ method eliminates possible
errors that can occur when the UPGMA method is used. The NJ algorithm
not only evaluates pairwise distances (using distance matrices) but also selects
neighbors that minimize the total length of the tree. The NJ method is
recommended for sequences whose evolutionary distances are short.
Multiple packages are available for both standalone and online access.
Suites like PHYLIP, TREE-PUZZLE and MEGA are user-friendly and are
appropriate tools for phylogenetic analysis by both the ML and NJ methods.
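The neighbor selection step of NJ can be sketched with the standard Q-criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), where the pair with the lowest Q is joined. The five-taxon distance matrix below is a small textbook-style example, not data from this thesis, and the function name is an assumption.

```python
from itertools import combinations

def nj_pair_to_join(names, dist):
    """Return the pair with the minimum Q value and the full Q table.

    `dist` is a symmetric nested dict of pairwise distances.
    """
    n = len(names)
    row_sums = {a: sum(dist[a][b] for b in names if b != a) for a in names}
    q = {}
    for a, b in combinations(names, 2):
        q[(a, b)] = (n - 2) * dist[a][b] - row_sums[a] - row_sums[b]
    best = min(q, key=q.get)
    return best, q

names = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {x: dict(zip(names, row)) for x, row in zip(names, rows)}

pair, q_table = nj_pair_to_join(names, toy_dist)
```

Note that d and e have the smallest raw distance (3), yet a and b are joined first because the Q-criterion also accounts for how far each taxon is from all the others; this is the correction that distinguishes NJ from UPGMA.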
1.11.1 PHYLIP
PHYLIP (Phylogeny Inference Package) (Felsenstein, 1981) is a
free computational phylogenetics package consisting of 35 portable programs.
It supports parsimony, distance matrix and likelihood methods,
including bootstrapping and consensus trees.
1.11.2 TREE-PUZZLE
TREE-PUZZLE is a popular computer program to reconstruct phylogenetic trees
from molecular sequence data (nucleotide or protein sequences) based on
the maximum likelihood (ML) method (Schmidt et al 2002). It implements
the quartet puzzling algorithm. The average distance between all pairs of
sequences (maximum likelihood distances) is computed; these distances can
be viewed as a rough measure of the overall sequence divergence. The tree
search is performed in three steps. In the ML step, all possible quartets are
formed from the n sequences supplied in the alignment. All quartets are
evaluated using the ML method and the three quartet topologies ab|cd, ac|bd
and ad|bc are weighted by their posterior probabilities. In the puzzling step,
an intermediate tree is built from the quartet trees by adding sequences
one by one. As this step
is highly dependent on the order of sequences, many intermediate trees from
different input orders are constructed. In the consensus step, a majority rule
consensus tree is built from the generated intermediate trees.
These two steps are time-consuming and the result files (.dist, .puzzle, and
.outtree) are useful for interpreting tree topologies. Evolutionary models
such as DAYHOFF, JTT and mtREV24 (Adachi and Hasegawa, 1996; for
use with proteins encoded on mtDNA) are provided as matrices. Others like
BLOSUM 62 and the WAG model (Whelan and Goldman, 2004) are for more
distantly related amino acid sequences. VT is also for use with proteins of
distant relationships (Muller and Vingron 2000).
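The size of the quartet set in the ML step, and the three possible unrooted topologies per quartet, can be made concrete with a short sketch. The taxon labels are placeholders and the function names are assumptions; the likelihood weighting itself is not implemented here.

```python
from itertools import combinations
from math import comb

def all_quartets(taxa):
    """Return every subset of four taxa as a tuple."""
    return list(combinations(taxa, 4))

def quartet_topologies(quartet):
    """Return the three unrooted topologies ab|cd, ac|bd and ad|bc."""
    a, b, c, d = quartet
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

toy_taxa = ["s1", "s2", "s3", "s4", "s5", "s6"]
quartets = all_quartets(toy_taxa)
topologies = quartet_topologies(("a", "b", "c", "d"))
```

The number of quartets is n choose 4, so it grows roughly with the fourth power of the number of sequences, which is one reason quartet-based analysis becomes time-consuming for large alignments.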
1.11.3 MEGA (Molecular Evolutionary Genetics Analysis)
MEGA is a user-friendly software package for phylogenetic studies, which
also integrates sequence alignment approaches like CLUSTAL W and
MUSCLE. MEGA 5 can be employed for phylogenetic reconstruction,
phylogeny visualization and testing an array of evolutionary hypotheses using
maximum likelihood (ML), maximum composite likelihood (MCL),
neighbor-joining (NJ), minimum evolution (ME) and maximum parsimony
(MP), and can produce a bootstrap consensus tree for the required number of
replications. MEGA is handy for displaying tree topologies legibly in
rectangular, radial and circular layouts (Kumar et al 2008).
1.12 CLUSTER ASSOCIATIONS
The generated tree topologies can be examined for cluster associations.
Understanding the distribution of clusters with significant bootstrap (BS) values
helps to classify or group related sequences. For example, in the phylogenetic
analysis of mouse olfactory receptors (Zhang and Firestein 2002), nearly 1000 OR
genes were classified into several OR families using a consensus tree.
For the classification, reliable clusters were identified as those having >50%
bootstrap support and more than 40% protein identity. By this definition, mouse
ORs were classified into 228 families. This kind of segregation of gene/protein
sequences creates cluster associations for the protein families of interest. Cluster
associations provide information about conserved species-specific
behaviors and evolutionary integrity at the intra- and inter-genomic levels
(Figure 1.6).
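The two-criterion rule quoted above (more than 50% bootstrap support and more than 40% protein identity) can be expressed as a simple filter. The `Cluster` record and the example clusters are hypothetical; only the two thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    bootstrap: float      # bootstrap support of the cluster node, in percent
    min_identity: float   # lowest pairwise protein identity in the cluster, in percent

def reliable_clusters(clusters, min_bootstrap=50.0, min_identity=40.0):
    """Keep clusters exceeding both thresholds (strictly greater than)."""
    return [
        c for c in clusters
        if c.bootstrap > min_bootstrap and c.min_identity > min_identity
    ]

# Hypothetical clusters read off a consensus tree.
toy_clusters = [
    Cluster("family_1", bootstrap=92.0, min_identity=55.0),
    Cluster("family_2", bootstrap=48.0, min_identity=61.0),   # weak support
    Cluster("family_3", bootstrap=75.0, min_identity=33.0),   # too divergent
    Cluster("family_4", bootstrap=51.0, min_identity=41.0),
]

kept = [c.name for c in reliable_clusters(toy_clusters)]
```

Requiring both criteria means that a well-supported but highly divergent clade, or a similar but poorly supported one, is not treated as a family.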
1.13 SEQUENCE CONSERVATION AND DIVERSITY
The intra- and inter-genomic phylogenetic studies performed guide
the association of sequences according to species-specific tendencies as well
as co-clustering arrangements. Evolutionarily conserved sequence properties
such as motifs (Scott Gleim 2009) are highly important for establishing
structural and functional relevance.
Several computational techniques and software tools are available
to locate and display conserved amino acid residues in an aligned set of
homologous sequences. Available tools and databases such as TOPDOM,
MeMotif, PROSITE, IMOTdb, SmoS and WEBLOGO, together with the
in-house MotifS program (by Sowdhamini, yet to be published),
can be used to visualize sets of aligned TM proteins and the observed motifs
and AAS. Such annotation tools can be applied in comparative genomics of
GPCRs or ORs to identify cluster-specific/family-specific motifs along with
the knowledge of predicted topology (Figure 1.6).
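Locating conserved residues in an aligned set of sequences can be sketched as a per-column frequency count. The conservation cutoff and the toy alignment are assumptions made for illustration; this does not reproduce the method of any of the tools named above.

```python
from collections import Counter

def conserved_columns(alignment, min_fraction=0.8, gap="-"):
    """Return (column_index, residue, fraction) for conserved columns.

    A column is reported when its most common residue occurs in at least
    `min_fraction` of all sequences (gaps count against conservation).
    """
    n_seqs = len(alignment)
    conserved = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            continue                      # column made only of gaps
        residue, count = Counter(residues).most_common(1)[0]
        fraction = count / n_seqs
        if fraction >= min_fraction:
            conserved.append((col, residue, fraction))
    return conserved

toy_alignment = ["ADRYL", "GDRYV", "ADRFL", "SDRYI"]
motif_columns = conserved_columns(toy_alignment)
```

Runs of adjacent conserved columns are candidate motifs; combining the column indices with a predicted topology would indicate whether a motif lies in a helix or a loop.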
1.14 HOMOLOGY MODELLING OF GPCRs/ORs
The sequence searches and clustering provide representative
sequences to generate three-dimensional structures and this further helps to
map hotspots and to associate functional properties. Comparative
modelling/homology modelling is an appropriate procedure for generating 3D
models of the proteins of interest and can be achieved by the following steps:
i) Primarily, homologous sequences of the query can be
collected by using effective sequence search methods. The
nearest homologous reference sequence whose
structure is known can be used as a template.
ii) Pairwise alignment of the template and target sequences can be
made by using appropriate alignment methods. Procedures
such as PRALINE TM and MAFFT can be used for membrane
proteins. Alignments can be manually edited to improve the
alignment quality (using MEGA).
iii) The co-ordinates of the three-dimensional model can be built
from the generated alignment by using software
like MODELLER (Sali and Blundell, 1993) and web servers
like SWISS-MODEL (Arnold et al 2006).
iv) The potential accuracy of the generated models is assessed and
the models with the least energy constraints can be selected. If
unfavorable conformations and short contacts are observed, the
model can be minimized by using the SYBYL software package
(Tripos Associates Inc).
v) Structure validation can be done by checking for disallowed
conformations or structural environments, guided by
Ramachandran plot values from the PROCHECK server
(Laskowski et al 1993) and by VERIFY 3D (Bowie et al 1991).
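Step i), choosing the nearest homologue of known structure as the template, can be sketched by ranking candidate templates by percent identity over a pairwise alignment. The template identifiers and aligned sequences below are invented, and percent identity is only one possible ranking criterion assumed for this example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def pick_template(alignments):
    """Return the template id with the highest identity to the target.

    `alignments` maps template_id -> (aligned_target, aligned_template).
    """
    return max(alignments, key=lambda t: percent_identity(*alignments[t]))

# Hypothetical pairwise alignments of one target against two templates.
toy_alignments = {
    "templateA": ("MKTLLV", "MRTALV"),
    "templateB": ("MKTLLV", "MKTLIV"),
}

best_template = pick_template(toy_alignments)
```

In practice the choice would also weigh alignment coverage and the quality of the template structure, which this sketch leaves out.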
In essence, this introductory chapter provides the necessary
background to the work described in Chapters 2-6.
CHAPTER 2
CROSS-GENOME CLUSTERING OF HUMAN AND
C. ELEGANS G-PROTEIN COUPLED RECEPTORS
2.1 INTRODUCTION
Membrane proteins are ubiquitous (Perez 2005), account for nearly
20% of whole genomes and are highly attractive drug targets since they are
implicated in various diseases. Membrane proteins are embedded within the
lipid bilayer and are designated as transmembrane proteins, since they loop
inside and outside of the cell boundaries. One class of cell-surface receptors
shares common structural features: an extracellular N-terminal, an intracellular
C-terminal and seven transmembrane helices (TMHs) connected by three
intracellular and three extracellular loops. This snake-like arrangement
has led to names such as 7TM receptors, heptahelical receptors or
serpentine-like receptors. If the downstream targets of such membrane