Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R...
![Page 1: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/1.jpg)
Domains, their prediction and domain databases
Lecture 16:
Introduction to Bioinformatics
Centre for Integrative Bioinformatics VU
![Page 2: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/2.jpg)
Sequence
Structure
Function
Threading
Homology searching (BLAST)
Ab initio prediction and folding
Function prediction from structure
Sequence-Structure-Function
impossible but for the smallest structures
very difficult
![Page 3: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/3.jpg)
TERTIARY STRUCTURE (fold)
Genome
Expressome
Proteome
Metabolome
Functional Genomics – Systems Biology
Metabolomics
fluxomics
![Page 4: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/4.jpg)
Systems Biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behaviour of that system (for example, the enzymes and metabolites in a metabolic pathway). The aim is to understand the system quantitatively and to be able to predict how it behaves over time.
• The interactions are nonlinear
• The interactions give rise to emergent properties, i.e. properties that cannot be explained by the components of the system in isolation
• Biological processes include many time-scales, many compartments and many interconnected network levels (e.g. regulation, signalling, expression, ...)
![Page 5: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/5.jpg)
Systems Biology
Understanding is often achieved through modeling and simulation of the system’s components and interactions.
Often, the ‘four Ms’ cycle is adopted:
Measuring
Mining
Modeling
Manipulating
![Page 6: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/6.jpg)
‘The silicon cell’
(some people think ‘silly-con’ cell)
![Page 7: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/7.jpg)
![Page 8: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/8.jpg)
A system response
Apoptosis: programmed cell death
Necrosis: accidental cell death
![Page 9: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/9.jpg)
This pathway diagram shows a comparison of pathways in (left) Homo sapiens (human) and (right) Saccharomyces cerevisiae (baker’s yeast). Changes in controlling enzymes (square boxes in red) and the pathway itself have occurred (yeast has one altered (‘overtaking’) path in the graph)
We need to be able to do automatic pathway comparison (pathway alignment)
Human Yeast
‘Comparative metabolomics’
Important difference with human pathway
![Page 10: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/10.jpg)
Experimental
• Structural genomics
• Functional genomics
• Protein-protein interaction
• Metabolic pathways
• Expression data
![Page 11: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/11.jpg)
Issues when elucidating function experimentally
• Partial information (indirect interactions) and subsequent filling of the missing steps
• Negative results (elements that have been shown not to interact, enzymes missing in an organism)
• Putative interactions resulting from computational analyses
![Page 12: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/12.jpg)
Protein function categories
• Catalysis (enzymes)
• Binding
– Transport (active/passive)
– Protein-DNA/RNA binding (e.g. histones, transcription factors)
– Protein-protein interactions (e.g. antibody-lysozyme) (experimentally determined by yeast two-hybrid (Y2H) or bacterial two-hybrid (B2H) screening)
– Protein-fatty acid binding (e.g. apolipoproteins)
– Protein – small molecules (drug interaction, structure decoding)
• Structural component (e.g. crystallin)
• Regulation
• Signalling
• Transcription regulation
• Immune system
• Motor proteins (actin/myosin)
![Page 13: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/13.jpg)
Catalytic properties of enzymes
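[Figure: reaction rate V (moles/s) plotted against substrate concentration [S]; the curve saturates at Vmax, and Km is the value of [S] at which V = Vmax/2]
Michaelis-Menten equation:
$$V = \frac{V_{max}\,[S]}{K_m + [S]}$$
Reaction scheme (Km labels the first step, kcat the second):
E + S ⇌ ES → E + P
• E = enzyme
• S = substrate
• ES = enzyme-substrate complex (transition state)
• P = product
• Km = Michaelis constant
• kcat = catalytic rate constant (turnover number)
• kcat/Km = specificity constant (useful for comparison)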
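As a small illustration of the Michaelis-Menten equation above, the following sketch evaluates the rate at a few substrate concentrations. The function name and the parameter values (`VMAX`, `KM`) are hypothetical and chosen only for the example; they do not describe any particular enzyme.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate V = Vmax * [S] / (Km + [S]).

    s and km must be in the same concentration units.
    """
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km must be positive")
    return vmax * s / (km + s)


# Hypothetical values, for illustration only.
VMAX = 10.0  # maximal rate, moles/s
KM = 2.0     # Michaelis constant, concentration units

for s in (0.0, KM, 10 * KM, 1000 * KM):
    rate = michaelis_menten(s, VMAX, KM)
    print(f"[S] = {s:8.1f}   V = {rate:.3f}")
```

At [S] = Km the rate is exactly Vmax/2, and for [S] much larger than Km the rate approaches Vmax without reaching it, which matches the saturation curve on the slide.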
![Page 14: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/14.jpg)
Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
![Page 15: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/15.jpg)
Energy difference upon binding
Examples of functionally important protein interactions include:
• Protein – protein (pathway analysis);
• Protein – small molecules (drug interaction, structure decoding);
• Protein – peptides, DNA/RNA
The change in Gibbs free energy of the protein-ligand binding interaction can be monitored and expressed by the following equation:
$$\Delta G = \Delta H - T\,\Delta S$$
(H = enthalpy, S = entropy and T = temperature)
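The free-energy relation can be evaluated directly. The sketch below is illustrative only: the function name and the numerical values are assumptions, not measurements for any real protein-ligand pair. It uses the standard convention that a negative ΔG means binding is favourable.

```python
def gibbs_free_energy_change(delta_h, delta_s, temperature):
    """Return dG = dH - T * dS.

    delta_h in J/mol, delta_s in J/(mol*K), temperature in kelvin.
    """
    if temperature < 0:
        raise ValueError("temperature in kelvin cannot be negative")
    return delta_h - temperature * delta_s


# Hypothetical binding event: favourable enthalpy, unfavourable entropy.
dg = gibbs_free_energy_change(delta_h=-50_000.0, delta_s=-100.0, temperature=298.15)
print(f"dG = {dg:.1f} J/mol")
print("binding favourable" if dg < 0 else "binding unfavourable")
```

With these made-up numbers the enthalpy gain outweighs the entropy loss at 298.15 K, so ΔG stays negative.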
![Page 16: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/16.jpg)
![Page 17: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/17.jpg)
Protein-protein interaction networks
![Page 18: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/18.jpg)
Protein function
• Many proteins combine functions
• Some immunoglobulin structures are thought to have more than 100 different functions (and active/binding sites)
• Alternative splicing can generate (partially) alternative structures
![Page 19: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/19.jpg)
Protein function & Interaction
Active site / binding cleft
Shape complementarity
![Page 20: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/20.jpg)
Protein function evolution
Chymotrypsin
![Page 21: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/21.jpg)
How to infer function
• Experiment
• Deduction from sequence
– Multiple sequence alignment – conservation patterns
– Homology searching
• Deduction from structure
– Threading
– Structure-structure comparison
– Homology modelling
![Page 22: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/22.jpg)
Cholesterol Biosynthesis:
Cholesterol biosynthesis primarily occurs in eukaryotic cells. It is necessary for membrane synthesis, and is a precursor for steroid hormone production as well as for vitamin D. While the pathway had previously been assumed to be localized in the cytosol and ER, more recent evidence suggests that many of the enzymes in the pathway exist largely, if not exclusively, in the peroxisome (the enzymes listed in blue in the pathway to the left are thought to be at least partly peroxisomal). Patients with peroxisome biogenesis disorders (PBDs) have a variable deficiency in cholesterol biosynthesis.
![Page 23: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/23.jpg)
Mevalonate plays a role in epithelial cancers: it can inhibit EGFR
Cholesterol Biosynthesis: from acetyl-CoA to mevalonate
![Page 24: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/24.jpg)
Epidermal Growth Factor as a Clinical Target in Cancer
A malignant tumour is the product of uncontrolled cell proliferation. Cell growth is controlled by a delicate balance between growth-promoting and growth-inhibiting factors. In normal tissue the production and activity of these factors result in differentiated cells growing in a controlled and regulated manner that maintains the normal integrity and functioning of the organ. The malignant cell has evaded this control; the natural balance is disturbed (via a variety of mechanisms) and unregulated, aberrant cell growth occurs. A key driver for growth is the epidermal growth factor (EGF), and the receptor for EGF (the EGFR) has been implicated in the development and progression of a number of human solid tumours including those of the lung, breast, prostate, colon, ovary, head and neck.
![Page 25: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/25.jpg)
Energy housekeeping: Adenosine diphosphate (ADP) – Adenosine triphosphate (ATP)
![Page 26: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/26.jpg)
Chemical Reaction
![Page 27: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/27.jpg)
Add Enzymatic Catalysis
![Page 28: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/28.jpg)
Add Gene Expression
![Page 29: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/29.jpg)
Add Inhibition
![Page 30: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/30.jpg)
Metabolic Pathway: Proline Biosynthesis
Proline as end product effects a negative feedback loop
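The effect of end-product inhibition can be sketched with a toy simulation. This is not a model of the real proline pathway: the rate law (synthesis scaled by 1 / (1 + P/Ki)), the parameter names and all values are assumptions made purely for illustration, and simple Euler integration is used.

```python
def simulate_end_product(k_syn, k_deg, k_i=None, dt=0.01, steps=20000):
    """Integrate dP/dt = k_syn * f(P) - k_deg * P with Euler steps.

    f(P) = 1 / (1 + P / k_i) models inhibition by the end product P;
    k_i=None switches the feedback off (f(P) = 1).
    Returns the product level after the final step.
    """
    p = 0.0
    for _ in range(steps):
        inhibition = 1.0 if k_i is None else 1.0 / (1.0 + p / k_i)
        p += dt * (k_syn * inhibition - k_deg * p)
    return p


# Hypothetical rate constants, arbitrary units.
without_feedback = simulate_end_product(k_syn=1.0, k_deg=0.1)
with_feedback = simulate_end_product(k_syn=1.0, k_deg=0.1, k_i=1.0)
print(f"steady level without feedback: {without_feedback:.3f}")
print(f"steady level with feedback:    {with_feedback:.3f}")
```

In this toy setting the feedback loop holds the end product at a markedly lower level than the unregulated pathway, because synthesis slows as the product accumulates.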
![Page 31: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/31.jpg)
Transcriptional Regulation
![Page 32: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/32.jpg)
Methionine Biosynthesis in E. coli
![Page 33: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/33.jpg)
Shortcut Representation
![Page 34: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/34.jpg)
High-level Interaction representation
![Page 35: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/35.jpg)
Levels of Resolution
![Page 36: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/36.jpg)
SREBP Pathway
![Page 37: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/37.jpg)
Signal Transduction
Important signalling pathways: MAP kinase (MAPK) signalling pathway, or TGF-β pathway
![Page 38: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/38.jpg)
Transport
![Page 39: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/39.jpg)
Phosphate Utilization in Yeast
![Page 40: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/40.jpg)
Multiple Levels of Regulation
• Gene expression
• Protein posttranslational modification
• Protein activity
• Protein intracellular location
• Protein degradation
• Substrate transport
![Page 41: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/41.jpg)
Graphical Representation – Gene Expression
![Page 42: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/42.jpg)
Protein interaction domains
http://pawsonlab.mshri.on.ca/index.php?option=com_content&task=view&id=30&Itemid=63
![Page 43: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/43.jpg)
Domain function
Active site / binding cleft
![Page 44: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/44.jpg)
Protein-protein (domain-domain) interaction
Shape complementarity
![Page 45: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/45.jpg)
A domain is a:
• Compact, semi-independent unit (Richardson, 1981).
• Stable unit of a protein structure that can fold autonomously (Wetlaufer, 1973).
• Recurring functional and evolutionary module (Bork, 1992). “Nature is a tinkerer and not an inventor” (Jacob, 1977).
• Smallest unit of function
![Page 46: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/46.jpg)
Delineating domains is essential for:
• Obtaining high resolution structures (x-ray but particularly NMR – size of proteins)
• Sequence analysis
• Multiple sequence alignment methods
• Prediction algorithms (SS, Class, secondary/tertiary structure)
• Fold recognition and threading
• Elucidating the evolution, structure and function of a protein family (e.g. ‘Rosetta Stone’ method)
• Structural/functional genomics
• Cross genome comparative analysis
![Page 47: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/47.jpg)
Domain connectivity
linker
![Page 48: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/48.jpg)
Pyruvate kinase
Phosphotransferase
barrel regulatory domain
barrel catalytic substrate binding domain
nucleotide binding domain
1 continuous + 2 discontinuous domains
Structural domain organisation can be nasty…
![Page 49: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/49.jpg)
Domain size
• The size of individual structural domains varies widely
– from 36 residues in E-selectin to 692 residues in lipoxygenase-1 (Jones et al., 1998)
– the majority (90%) having fewer than 200 residues (Siddiqui and Barton, 1995)
– with an average of about 100 residues (Islam et al., 1995).
• Small domains (fewer than 40 residues) are often stabilised by metal ions or disulphide bonds.
• Large domains (more than 300 residues) are likely to consist of multiple hydrophobic cores (Garel, 1992).
![Page 50: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/50.jpg)
![Page 51: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/51.jpg)
![Page 52: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/52.jpg)
![Page 53: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/53.jpg)
![Page 54: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/54.jpg)
![Page 55: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/55.jpg)
![Page 56: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/56.jpg)
![Page 57: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/57.jpg)
Analysis of chain hydrophobicity in multidomain proteins
![Page 59: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/59.jpg)
Domain characteristics
Domains are genetically mobile units, and multidomain families are found in all three kingdoms (Archaea, Bacteria and Eukarya) underlining the finding that ‘Nature is a tinkerer and not an inventor’ (Jacob, 1977). The majority of genomic proteins, 75% in unicellular organisms and more than 80% in metazoa, are multidomain proteins created as a result of gene duplication events (Apic et al., 2001). Domains in multidomain structures are likely to have once existed as independent proteins, and many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes (Davidson et al., 1993).
![Page 60: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/60.jpg)
Protein function evolution: gene (domain) duplication
Chymotrypsin
Active site
![Page 61: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/61.jpg)
Pyruvate phosphate dikinase
• 3-domain protein
• Two domains catalyse 2-step reaction
A → B → C
• Third so-called ‘swivelling domain’ actively brings intermediate enzymatic product (B) over 45Å from one active site to the other
![Page 63: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/63.jpg)
The DEATH Domain
• Present in a variety of eukaryotic proteins involved with cell death.
• Six helices enclose a tightly packed hydrophobic core.
• Some DEATH domains form homotypic and heterotypic dimers.
http://www.mshri.on.ca/pawson
![Page 64: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/64.jpg)
Detecting Structural Domains
• A structural domain may be detected as a compact, globular substructure with more interactions within itself than with the rest of the structure (Janin and Wodak, 1983).
• Therefore, a structural domain can be determined by two shape characteristics: compactness and its extent of isolation (Tsai and Nussinov, 1997).
• Measures of local compactness in proteins have been used in many of the early methods of domain assignment (Rossmann et al., 1974; Crippen, 1978; Rose, 1979; Go, 1978) and in several of the more recent methods (Holm and Sander, 1994; Islam et al., 1995; Siddiqui and Barton, 1995; Zehfus, 1997; Taylor, 1999).
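The idea of "more interactions within itself than with the rest of the structure" can be illustrated with a toy sketch. It is not an implementation of any of the cited methods: it assumes a precomputed list of residue-residue contacts, considers only a single cut into two continuous domains, and uses a made-up score (the fraction of contacts crossing the cut). All names and the example data are hypothetical.

```python
def count_contacts(contacts, boundary):
    """Split residues into A (index < boundary) and B (index >= boundary).

    Returns (contacts within A, contacts within B, contacts between A and B).
    """
    intra_a = intra_b = inter = 0
    for i, j in contacts:
        if i < boundary and j < boundary:
            intra_a += 1
        elif i >= boundary and j >= boundary:
            intra_b += 1
        else:
            inter += 1
    return intra_a, intra_b, inter


def best_boundary(contacts, n_residues, min_size=2):
    """Return the cut position with the smallest fraction of crossing contacts."""
    best_pos, best_score = None, None
    for boundary in range(min_size, n_residues - min_size + 1):
        intra_a, intra_b, inter = count_contacts(contacts, boundary)
        score = inter / (intra_a + intra_b + inter)
        if best_score is None or score < best_score:
            best_pos, best_score = boundary, score
    return best_pos


# Toy 8-residue chain: two tightly packed halves joined by one contact.
contacts = [(0, 1), (0, 2), (1, 3), (2, 3),
            (4, 5), (4, 6), (5, 7), (6, 7),
            (3, 4)]
print(count_contacts(contacts, 4))
print(best_boundary(contacts, n_residues=8))
```

A single cut like this cannot describe discontinuous domains, where one domain is built from separate stretches of the chain.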
![Page 65: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/65.jpg)
Detecting Structural Domains
• However, these approaches encounter problems when faced with discontinuous or highly associated domains, and many definitions will require manual interpretation.
• Consequently, there are discrepancies between assignments made by domain databases (Hadley and Jones, 1999).
![Page 66: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/66.jpg)
Detecting Domains using Sequence only
• Even more difficult than prediction from structure!
![Page 67: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/67.jpg)
SnapDRAGON
Richard A. George
George R.A. and Heringa, J. (2002) J. Mol. Biol., 316, 839-851.
Integrating protein multiple sequence alignment, secondary and tertiary structure prediction in order to predict structural domain boundaries in sequence data
![Page 68: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/68.jpg)
Protein structure hierarchical levels
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
PRIMARY STRUCTURE (amino acid sequence)
QUATERNARY STRUCTURE
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
![Page 69: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/69.jpg)
Protein structure hierarchical levels
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
PRIMARY STRUCTURE (amino acid sequence)
QUATERNARY STRUCTURE
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
![Page 70: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/70.jpg)
Protein structure hierarchical levels
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
PRIMARY STRUCTURE (amino acid sequence)
QUATERNARY STRUCTURE
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
![Page 71: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/71.jpg)
![Page 72: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/72.jpg)
SNAPDRAGON
Domain boundary prediction protocol using sequence information alone (Richard George)
1. Input: Multiple sequence alignment (MSA) and predicted secondary structure
2. Generate 100 DRAGON 3D models for the protein structure associated with the MSA
3. Assign domain boundaries to each of the 3D models (Taylor, 1999)
4. Sum proposed boundary positions within 100 models along the length of the sequence, and smooth boundaries using a weighted window
George R.A. and Heringa J. (2002) SnapDRAGON - a method to delineate protein structural domains from sequence data, J. Mol. Biol. 316, 839-851.
![Page 73: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/73.jpg)
SnapDRAGON
[Figure: the multiple alignment and the predicted secondary structure (e.g. CCHHHCCEEE) are the input; folds are generated by DRAGON; boundary recognition (Taylor, 1999) is applied to each fold; the boundaries are then summed and smoothed.]
![Page 74: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/74.jpg)
SNAPDRAGON
Domain boundary prediction protocol using sequence information alone (Richard George)
1. Input: Multiple sequence alignment (MSA) and predicted secondary structure
– MSA: sequence searches using PSI-BLAST (Altschul et al., 1997)
– followed by sequence redundancy filtering using OBSTRUCT (Heringa et al., 1992)
– and alignment by PRALINE (Heringa, 1999)
– Predicted secondary structure: PREDATOR secondary structure prediction program
George R.A. and Heringa J. (2002) SnapDRAGON - a method to delineate protein structural domains from sequence data, J. Mol. Biol. 316, 839-851.
![Page 75: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/75.jpg)
Distance Regularisation Algorithm for Geometry OptimisatioN
(Aszodi & Taylor, 1994)
Domain prediction using DRAGON
• Fold proteins based on the requirement that (conserved) hydrophobic residues cluster together.
• First construct a random high-dimensional Cα distance matrix.
• Distance geometry is used to find the 3D conformation corresponding to a prescribed target matrix of desired distances between residues.
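The distance geometry step can be illustrated with classical metric-matrix embedding: the distance matrix is converted into a Gram matrix, and the three leading eigenvectors give 3D coordinates. This is only a minimal sketch of the general technique, not DRAGON's own procedure (which starts from a random high-dimensional matrix and projects gradually). The function names and the power-iteration eigensolver are assumptions chosen to keep the example dependency-free.

```python
import math
import random

def gram_from_distances(d):
    """Double-centre the squared distances to obtain the Gram (metric) matrix."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]

def top_eigenpairs(m, k=3, iters=500, seed=0):
    """Leading k eigenpairs by power iteration with deflation (small matrices only)."""
    rng = random.Random(seed)
    n = len(m)
    m = [r[:] for r in m]  # work on a copy, the input is left untouched
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
            lam = norm  # for a positive semi-definite matrix this tends to the eigenvalue
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def embed_3d(d):
    """Return one 3D coordinate per residue that reproduces the distances in d."""
    pairs = top_eigenpairs(gram_from_distances(d), k=3)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(d))]
```

Given an exact Euclidean distance matrix, `embed_3d` returns coordinates that match the original structure up to rotation, translation and reflection. With a target matrix of merely desired distances, as in DRAGON, the result is only a best-fit approximation.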
![Page 76: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/76.jpg)
SNAPDRAGON
Domain boundary prediction protocol using sequence information alone (Richard George)
2. Generate 100 DRAGON (Aszodi & Taylor, 1994) models for the protein structure associated with the MSA
– DRAGON folds proteins based on the requirement that (conserved) hydrophobic residues cluster together
– (Predicted) secondary structures are used to further estimate distances between residues (e.g. between the first and last residue in a β-strand).
– It first constructs a random high-dimensional Cα (and pseudo Cβ) distance matrix
– Distance geometry is used to find the 3D conformation corresponding to a prescribed matrix of desired distances between residues (by gradual inertia projection and based on input MSA and predicted secondary structure)
DRAGON = Distance Regularisation Algorithm for Geometry OptimisatioN
![Page 77: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/77.jpg)
• The Cα distance matrix is divided into smaller clusters.
• Separately, each cluster is embedded into a local centroid.
• The final predicted structure is generated from full embedding of the multiple centroids and their corresponding local structures.
[Figure: input data (multiple alignment and predicted secondary structure, e.g. CCHHHCCEEE) define the N×N target matrix; 100 randomised initial N×N Cα distance matrices lead to 100 predictions, each a set of 3×N coordinates.]
![Page 78: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/78.jpg)
[Figure: Lysozyme 4lzm, PDB structure and DRAGON model]
![Page 79: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/79.jpg)
[Figure: Methyltransferase 1sfe, DRAGON model and PDB structure]
![Page 80: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/80.jpg)
[Figure: Phosphatase 2hhm-A, PDB structure and DRAGON model]
![Page 81: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/81.jpg)
Taylor method (1999)
DOMAIN-3D
3. Assign domain boundaries to each of the 3D models (Taylor, 1999)
• Easy and clever method
• Uses a notion of spin glass theory (disordered magnetic systems) to delineate domains in a protein 3D structure
• Steps:
1. Take sequence with residue numbers (1..N)
2. Look at neighbourhood of each residue (first shell)
3. If (“average neighbourhood residue number” > resno) resno = resno+1 else resno = resno-1
4. If (convergence) then take regions with identical “residue number” as domains and terminate

Taylor, W.R. (1999) Protein structural domain identification. Protein Engineering 12: 203-216
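The steps above can be sketched as a small label-propagation loop. This is a toy illustration, not Taylor's published implementation: the first shell is approximated by a hypothetical distance `cutoff`, all residues are updated synchronously, a label that equals its neighbourhood average is left unchanged (the slide moves it down), and `max_sweeps` caps the loop because integer steps of 1 can oscillate rather than converge.

```python
import math

def neighbours(coords, cutoff):
    """First-shell neighbours: all other residues closer than `cutoff` (assumed definition)."""
    n = len(coords)
    return [[j for j in range(n)
             if j != i and math.dist(coords[i], coords[j]) < cutoff]
            for i in range(n)]

def sweep(labels, nbrs):
    """One synchronous update: move each label one step towards its neighbourhood mean."""
    new = []
    for lab, nb in zip(labels, nbrs):
        if not nb:
            new.append(lab)  # isolated residue: nothing to compare against
            continue
        mean = sum(labels[j] for j in nb) / len(nb)
        if mean > lab:
            new.append(lab + 1)
        elif mean < lab:
            new.append(lab - 1)
        else:
            new.append(lab)  # assumption: equal labels stay put
    return new

def taylor_domains(coords, cutoff=5.0, max_sweeps=1000):
    """Return the final labels and the residue numbers (1-based) grouped by identical label."""
    nbrs = neighbours(coords, cutoff)
    labels = list(range(1, len(coords) + 1))  # step 1: start from residue numbers
    for _ in range(max_sweeps):
        new = sweep(labels, nbrs)
        if new == labels:  # step 4: convergence
            break
        labels = new
    domains = {}
    for resno, lab in enumerate(labels, start=1):
        domains.setdefault(lab, []).append(resno)
    return labels, list(domains.values())
```

For two compact, well-separated clusters of five residues each, the labels settle on one shared value per cluster, so each cluster is reported as one domain.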
![Page 82: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/82.jpg)
Taylor method (1999)
[Figure: residue 41 with first-shell neighbours 5, 6, 56, 78 and 89]
repeat until convergence
if 41 < (5+6+56+78+89)/5
then Res 41 → 42 (up 1)
else Res 41 → 40 (down 1)
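Worked through for the numbers on this slide, a single update of residue 41 gives:

$$
\frac{5 + 6 + 56 + 78 + 89}{5} = \frac{234}{5} = 46.8 > 41 \quad\Rightarrow\quad 41 \to 42
$$

The neighbourhood average lies above the current label, so in this iteration the label of residue 41 moves up by one.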
![Page 83: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/83.jpg)
Taylor method (1999)
[Figure: examples of continuous and discontinuous domains]
![Page 84: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/84.jpg)
SNAPDRAGON
Domain boundary prediction protocol using sequence information alone (Richard George)
4. Sum proposed boundary positions within 100 models along the length of the sequence, and smooth boundaries using a weighted window (assign central position)
Window score = $\sum_{1 \le i \le l} S_i \times W_i$
where $S_i$ is the summed boundary score at window position $i$, $W_i = (p - |p-i|)/p^2$ and $p = \tfrac{1}{2}(n+1)$, with $n = l$ the window length.
It follows that $\sum_{1 \le i \le l} W_i = 1$
George R.A. and Heringa J. (2002) SnapDRAGON - a method to delineate protein structural domains from sequence data, J. Mol. Biol. 316, 839-851.
[Figure: weight $W_i$ plotted against window position $i$]
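The weighted window can be sketched in a few lines. This is a minimal illustration assuming an odd window length (so that p is an integer and the window has a central position) and assuming that window positions falling outside the sequence contribute nothing. The function names are hypothetical.

```python
def window_weights(length):
    """Weights W_i = (p - |p - i|) / p^2 for i = 1..length, with p = (length + 1) / 2."""
    if length < 1 or length % 2 == 0:
        raise ValueError("window length must be a positive odd number")
    p = (length + 1) / 2
    return [(p - abs(p - i)) / p ** 2 for i in range(1, length + 1)]

def smooth(counts, length):
    """Assign to each position the weighted sum of the counts in the window centred on it."""
    weights = window_weights(length)
    half = length // 2
    out = []
    for centre in range(len(counts)):
        score = 0.0
        for k, w in enumerate(weights):
            j = centre - half + k
            if 0 <= j < len(counts):  # assumption: positions off the ends count as zero
                score += counts[j] * w
        out.append(score)
    return out
```

For a window of length 5 the weights are 1/9, 2/9, 3/9, 2/9 and 1/9, which sum to 1, so a single spike of 9 boundary votes is spread out as 1, 2, 3, 2, 1.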
![Page 85: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/85.jpg)
SNAPDRAGON
Statistical significance:
• Convert peak scores to Z-scores using z = (x-mean)/stdev
• If z > 2 then assign domain boundary
Statistical significance using random models:
• Test hydrophobic collapse given distribution of hydrophobicity over sequence
• Make 5 scrambled multiple alignments (MSAs) and predict their secondary structure
• Make 100 models for each MSA
• Compile mean and stdev from the boundary distribution over the 500 random models
• If the observed peak lies more than 2.0 stdev above the mean of the random models (z > 2.0) then assign domain boundary
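The significance test can be sketched as follows. This is a minimal illustration with hypothetical names: `peaks` maps a sequence position to its smoothed peak score, and `random_scores` stands for the boundary scores collected from the random models. Whether the original uses the population or the sample standard deviation is not stated; the population form is assumed here.

```python
from statistics import mean, pstdev

def significant_boundaries(peaks, random_scores, threshold=2.0):
    """Return (position, z) for every peak whose z-score exceeds the threshold."""
    mu = mean(random_scores)
    sd = pstdev(random_scores)  # assumption: population standard deviation
    if sd == 0:
        return []  # z-scores are undefined without any spread in the background
    hits = []
    for position, score in sorted(peaks.items()):
        z = (score - mu) / sd
        if z > threshold:
            hits.append((position, z))
    return hits
```

With background scores 1 to 5 (mean 3, standard deviation about 1.41), a peak of 7.0 has z of about 2.83 and is kept, while a peak of 4.0 has z of about 0.71 and is rejected.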
![Page 86: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/86.jpg)
SnapDRAGON prediction assessment
• Test set of 414 multiple alignments: 183 single-domain and 231 multiple-domain proteins.
• Boundary predictions are compared to the region of the protein connecting two domains (maximally 10 residues from true boundary)
![Page 87: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/87.jpg)
SnapDRAGON prediction assessment
• Baseline method I:
  • Divide sequence in equal parts based on number of domains predicted by SnapDRAGON
• Baseline method II:
  • Similar to Wheelan et al., based on domain length partition density function (PDF)
  • PDF derived from 2750 non-redundant structures (deposited at NCBI)
  • Given sequence, calculate probability of one-domain, two-domain, ..., protein
  • Highest probability taken and sequence split equally as in baseline method I
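The equal split shared by both baselines amounts to very little code. The sketch below assumes that boundaries are reported as rounded residue positions; the function name is hypothetical, and the domain count would come from SnapDRAGON (baseline I) or from the length-based probabilities (baseline II), which are not reproduced here.

```python
def equal_split_boundaries(seq_len, n_domains):
    """Boundary positions that cut a sequence of seq_len residues into n_domains equal parts."""
    if n_domains < 1:
        raise ValueError("need at least one domain")
    return [round(k * seq_len / n_domains) for k in range(1, n_domains)]
```

A 300-residue sequence predicted to have three domains gets boundaries at positions 100 and 200; a single-domain prediction yields no boundary at all.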
![Page 88: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/88.jpg)
Average prediction results per protein

| Method | Measure | Continuous set | Discontinuous set | Full set |
|---|---|---|---|---|
| SnapDRAGON | Coverage (%) | 63.9 (± 43.0) | 35.4 (± 25.0) | 51.8 (± 39.1) |
| SnapDRAGON | Success (%) | 46.8 (± 36.4) | 44.4 (± 33.9) | 45.8 (± 35.4) |
| Baseline 1 | Coverage (%) | 43.6 (± 45.3) | 20.5 (± 27.1) | 34.7 (± 40.8) |
| Baseline 1 | Success (%) | 34.3 (± 39.6) | 22.2 (± 29.5) | 29.6 (± 36.6) |
| Baseline 2 | Coverage (%) | 45.3 (± 46.9) | 22.7 (± 27.3) | 35.7 (± 41.3) |
| Baseline 2 | Success (%) | 37.1 (± 42.0) | 23.1 (± 29.6) | 31.2 (± 37.9) |

Coverage is the % of linkers predicted: TP/(TP+FN)
Success is the % of correct predictions made: TP/(TP+FP)
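The two measures can be computed per protein as in the sketch below. It simplifies the assessment by treating each true linker as a single boundary position and counting a prediction as correct when it falls within `tolerance` residues of one; the function name and the handling of empty inputs (returning `None`) are assumptions.

```python
def coverage_and_success(true_boundaries, predicted, tolerance=10):
    """Return (coverage %, success %) for one protein; None where a measure is undefined."""
    # True boundaries that have at least one prediction nearby: TP out of TP + FN.
    found = sum(1 for t in true_boundaries
                if any(abs(t - p) <= tolerance for p in predicted))
    # Predictions that lie near at least one true boundary: TP out of TP + FP.
    correct = sum(1 for p in predicted
                  if any(abs(t - p) <= tolerance for t in true_boundaries))
    coverage = 100.0 * found / len(true_boundaries) if true_boundaries else None
    success = 100.0 * correct / len(predicted) if predicted else None
    return coverage, success
```

With true boundaries at 100 and 200 and predictions at 95, 150 and 230, one of the two linkers is found (coverage 50%) and one of the three predictions is correct (success about 33%).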
![Page 89: Domains, their prediction and domain databases Lecture 16: Introduction to Bioinformatics C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I.](https://reader036.fdocuments.us/reader036/viewer/2022070410/56649eaa5503460f94baeb83/html5/thumbnails/89.jpg)
Average prediction results per protein