SYSTEMS BIOLOGY
Lukasz Huminiecki, DPhil
Nobel medical institute, Karolinska, Stockholm & Ludwig Institute for Cancer Research, Uppsala
Please, tell me who you are!
Raise your hand if you are:
• Computer scientist/mathematician
• Computational biologist/bioinformatician
• Experimental biologist
• Postgraduate
• Undergraduate
• Post-doc
WHAT IS ”SYSTEMS BIOLOGY”?
”Systems biology is the coordinated study of biological systems by (1) investigating the components of cellular networks and their interactions, (2) applying experimental high-throughput and whole-genome techniques, and (3) integrating computational methods with experimental efforts.” – first sentence of the Preface to Klipp E et al., ”Systems Biology in Practice”, WILEY-VCH, 2005.
What do you think?
Back to the Roots?
In fact, early critics argued that molecular approaches are too reductionist, attempting to explain complex biological phenomena through the actions of a few genes or proteins.
There is a cyclical element to all progress!
Before the era of the molecular revolution physiology-oriented biologists were much more used to looking at living things as systems.
Four areas of systems biology on which I will focus today
• Analysis of expression patterns
• Mathematical modeling
• Phylogenetics
• Web-resources and data integration
PART 1: EXPRESSION PATTERN EVOLUTION
Classic view of evolution through gene duplication
• Susumu Ohno, 1970. Evolution by Gene Duplication. Springer, Berlin
• “Natural selection merely modified while redundancy created"
• The neo-functionalization model
Genome-scale tests (1)
Genome-scale tests (2)
• Nembaware et al. 2002: Impact of the presence of paralogs on sequence divergence in a set of mouse-human orthologs. Genome Research
Gene Expression Atlas
• http://expression.gnf.org
• 101 human (microchip U95A) and 89 mouse (microchip U74A) Affymetrix experiments
• Huminiecki L, Lloyd AT, Wolfe KH. Congruence of tissue expression profiles from Gene Expression Atlas, SAGEmap and TissueInfo databases. BMC Genomics. 2003 Jul 29;4(1):31
• Mapping to Ensembl via LocusLink
• TRIBE families and Ka/Ks calculations using yn00 from PAML
Huminiecki et al. “Congruence of tissue expression profiles from GEA, SAGEmap and TissueInfo databases”. BMC Genomics
R vs. Ks in paralogs
One-to-one orthologs
Human or mouse duplication
Cumulative plots
Randomisation test
                     R > 0.6           R > 0.7           R > 0.8           R > 0.9
Human duplication    91%, p = 0.37     58%, p = 0.0042   52%, p = 0.0043   60%, p = 0.038
Mouse duplication    61%, p = 0.0111   48%, p = 0.0027   36%, p = 0.0015   24%, p = 0.0018
The percentages indicate the ratios between the fractions of genes having a particular R-value in sets of orthologues with the human (163 sets) or mouse (139 sets) duplication versus the group of one-to-one orthologues (1,324 pairs).
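A randomisation test of this kind can be sketched as follows. This is a minimal illustration, not the procedure actually used for the slide: the function names, the label-shuffling scheme, the one-sided comparison and the toy data are all assumptions.

```python
import random

def fraction_above(values, threshold):
    """Fraction of correlation coefficients R that exceed the threshold."""
    return sum(v > threshold for v in values) / len(values)

def randomisation_test(dup_r, one2one_r, threshold, n_iter=2000, seed=0):
    """Ratio of fractions (duplicated / one-to-one) and a one-sided p-value.

    Assumed scheme: group labels are shuffled n_iter times, and the p-value
    is the share of shuffles whose ratio is at most the observed one.
    """
    rng = random.Random(seed)
    observed = fraction_above(dup_r, threshold) / fraction_above(one2one_r, threshold)
    pooled = list(dup_r) + list(one2one_r)
    n_dup = len(dup_r)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        f_dup = fraction_above(pooled[:n_dup], threshold)
        f_ref = fraction_above(pooled[n_dup:], threshold)
        if f_ref > 0 and f_dup / f_ref <= observed:
            hits += 1
    return observed, hits / n_iter

# Toy data (hypothetical): duplicated genes mostly have low R.
dup = [0.1] * 8 + [0.9] * 2
ref = [0.9] * 8 + [0.1] * 2
ratio, p_value = randomisation_test(dup, ref, threshold=0.6)
```

A ratio below 100% with a small p-value would indicate that well-conserved expression profiles are rarer after duplication than among one-to-one orthologues.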
Sub-functionalisation
• Force et al. argue that neo-functionalisation alone could not account for the high accumulation of duplicated genes in eukaryotes
• Duplication-degeneration-complementation (DDC)
• Should lead to tissue-specific expression!
Tissue-specific genes evolve faster and are more likely to belong to large gene families
Gene expression patterns are, in evolutionary perspective,
surprisingly labile!
Literature
• Khaitovich P, Weiss G, Lachmann M, Hellmann I, Enard W, Muetzel B, Wirkner U, Ansorge W, Paabo S. A neutral model of transcriptome evolution. PLoS Biol. 2004 May;2(5):E132. Epub 2004 May 11.
• Huminiecki L, Wolfe KH. Divergence of spatial gene expression profiles following species-specific gene duplications in human and mouse. Genome Res. 2004 Oct;14(10A):1870-9.
• Jordan IK, Marino-Ramirez L, Koonin EV. Evolutionary significance of gene expression divergence. Gene. 2005 Jan 17;345(1):119-26. Epub 2004 Dec 29.
• Khaitovich P, Paabo S, Weiss G. Toward a neutral evolutionary model of gene expression. Genetics. 2005 Jun;170(2):929-39. Epub 2005 Apr 16.
• Liao BY, Zhang J. Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol. 2006 Mar;23(3):530-40. Epub 2005 Nov 9.
The take home message
• An entirely new paradigm is emerging in evolutionary biology: expression patterns can change dramatically in the course of evolution.
• This impacts on our understanding of biodiversity, human origins, and drug discovery.
Broad goals of collaboration with Pfizer
We aim towards a set of heuristic rules to identify the most “druggable” GPCRs and the best model species in which to conduct preclinical tests. By “druggable” it is meant those which possess any single or combination of characteristics favourable to drug development, such as: (1) conserved sequence, (2) tissue-specificity, and (3) expression domain not overlapping with other members of the family.
Conserved sequence suggests that function is the same, and that drugs will have similar efficacy. A tissue-specific gene facilitates targeting into specific organs or tumour types, and is less likely to engage in multiple functions - both of these features are likely to result in advantageous toxicological profiles. Non-overlapping expression domain minimises the possibility of functional redundancy. Finally, the best animal model for preclinical trials is likely to be the species with the most “human” expression pattern of the target gene, especially in tissues directed for therapeutic intervention, as well as in toxicologically important organs, such as heart, lung, liver, kidney, and brain.
Specific goals of collaboration with Pfizer
• Generate high quality RNA preparations from 20 organs from duplicate male and female rat, guinea pig and dog samples, for comparison with commercial human RNA samples.
• Using qPCR techniques, determine the expression profile of at least 25 genes (with representatives from the histaminergic, serotonergic, and adrenergic GPCR families) in each of these tissues.
• Analyse data to consider congruence in expression profiles between species from an evolutionary bioinformatics perspective, in addition to gaining a deeper understanding into the degree of human-animal model translation and therefore into the suitability of animal species used for functional efficacy and toxicological studies at Pfizer.
Results: RNA isolation
a) b) c)
Polytron/RNAeasy with additional acid phenol step and DNAaway for difficult tissues
The Database
[Diagram: database schema relating the tables cumulative, genes, RT_samples, RT_summary, tissue and preps]
The Ct value
• Two-tube comparative method with ”virtual” housekeeping gene
• Amplification assumed to be exponential with 100% efficiency, Cts scaled accordingly
• a) histogram of Ct-values for over 6000 reactions; b) standard deviations in triplicates; c) ACTB plotted against HPRT1.
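The comparative Ct calculation can be sketched as below. The slide does not define the ”virtual” housekeeping gene; taking it as the mean Ct of ACTB and HPRT1 is an assumption made here for illustration, and the function names are hypothetical.

```python
def virtual_housekeeping_ct(ct_actb, ct_hprt1):
    """Assumed definition: mean Ct of the two housekeeping genes."""
    return (ct_actb + ct_hprt1) / 2.0

def relative_expression(ct_target, ct_reference, efficiency=1.0):
    """Expression of the target relative to the reference.

    With 100% efficiency (efficiency = 1.0) the product doubles each cycle,
    so a target needing n more cycles than the reference is 2**n times rarer.
    """
    return (1.0 + efficiency) ** (ct_reference - ct_target)

# Example: the target crosses the threshold 3 cycles after the reference.
reference_ct = virtual_housekeeping_ct(ct_actb=18.0, ct_hprt1=26.0)
fold = relative_expression(ct_target=25.0, ct_reference=reference_ct)
```

Here the target is present at one eighth of the reference level, since each extra cycle corresponds to a twofold difference.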
Results overview
• Tissue RNAs from rat, guinea pig and dog were isolated. Human RNAs were purchased from Clontech.
• Human, rat and canine expression profiles of just under 40 genes have been examined thus far. Approximately 8 thousand assays have been performed.
• A number of striking differences in expression patterns have been revealed.
• Thus far, the most remarkable expression shifts have been observed in heart and aorta, among histamine, prostacyclin and adrenergic beta receptors. Numerous changes were also localised to the uterus.
• Apart from divergent expression patterns, mean expression levels also appeared rather different for many genes.
• Differences in expression between prostanoid receptors may have implications for the pharmacology of troublesome COX-2 inhibitors (such as Celebrex, Bextra, and Vioxx).
PART 2: MODELING
Mathematical Modeling of Metabolism and Gene Expression
• Dr. Edda Klipp
• Kinetic Modeling Group
• Lecture in the series “Gene und Genome: die Zukunft der Biologie” (“Genes and Genomes: the Future of Biology”)
What is a model?
• Yeast, mouse – as models for human
• Verbal explanation
• A sequence of letters (ATTCGAGGTATA) for a DNA sequence
• Wiring scheme
• Mathematical description: Boolean network, differential equations, stochastic equations
- Abstraction
- (Simplified) representation allowing for understanding
Edda Klipp, Kinetic Modeling Group
Why modeling?
Even the behavior of simple systems usually cannot be predicted intuitively and from experience.
The behavior of complex dynamical processes cannot be predicted with sufficient precision just from experience.
For prediction and explanation of processes one needs a model.
Experimental observations: many simple and complex processes
- isolated enzymatic reaction
- temporal processes in metabolic networks
- pattern of gene expression and regulation
Why modeling?
Advantages
- Time scales may be stretched or compressed.
- Solution algorithms / computer programmes can often be used independently of the actual modeled system.
- Costs of modeling are lower than for experiments.
- Representation of quantities that are experimentally hidden.
- No risk for real systems, no interaction between investigation and system.
Why modeling?
Burning questions
- How is cellular response to environmental changes and stress regulated?
- How should a cell be treated to yield a high output of a desired product (Biotechnology)?
- Where should a drug operate to cure a disease (Health care)?
- Is our knowledge about a network/pathway complete?
Structure of the system
[Diagram: reaction chain from S_ext through S1 to S5 (with S6) to S_mito, with fast and slow steps marked]
Variables, parameters, constants
State variables - set of variables describing the system completely
Dimension of the system = number of independent state variables
How many variables are used in my model?
- too few – system is under-determined
- too many – system is over-determined and may be contradictory
Do the units of variables, parameters etc. fit together?
Boundary of the system
Biological processes are complex phenomena
Central dogma of molecular biology:
Gene → mRNA → Proteins → Cellular processes
Direction of discovery
known → to be predicted
Structure → Function
- Protein interactions → Biochemical action
- Metabolic pathways → Concentration changes
- Enzyme sets → Influence of perturbations
- ... → Possible behavior, bifurcations
Function → Structure
- Transmission of a signal → Sequence of signaling compounds
- Time course of concentrations → Possible protein interactions
- ...
Concept of state
The state of a system is a snapshot of the system at a given time that contains enough information to predict the behaviour of the system for all future times. The state of the system is described by the set of variables that must be kept track of in a model.
Different models of gene regulation have different representations of the state:
- Boolean model: a state is a list recording, for each gene involved, whether it is expressed („1“) or not expressed („0“)
- Differential equation model: a list of concentrations of each chemical entity
- Probabilistic model: a current probability distribution and/or a list of actual numbers of molecules of each type
Each model defines what it means by the state of a system. Given the current state, the model predicts which state(s) can occur next.
Kinetics – change of state
A → B (rate constant k)
Deterministic, continuous time and state (e.g. ODE model): the concentration of A decreases and the concentration of B increases. The concentration change per time interval dt is given by
$$\frac{dB}{dt} = kA$$
Probabilistic, discrete time and state: transformation of a molecule of type A into a molecule of type B. The probability of this event in a time interval dt is given by
$$P(a-1,\, t+dt \mid a,\, t) = k\,a\,dt$$
a – number of molecules of type A
Deterministic, discrete time and state (e.g. Boolean network model): presence (or activity) of B at time t+1 depends on presence (or activity) of A at time t
$$B(t+1) = f(A(t))$$
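The three descriptions of the conversion A → B can be compared in a short sketch. The parameter values, the function names and the choice of the identity for the Boolean rule f are illustrative assumptions; the stochastic run uses exponentially distributed waiting times, which is one standard way to realise an event probability of k·a·dt.

```python
import math
import random

def deterministic_b(a0, k, t):
    """Solution of dB/dt = k*A with A + B = a0 and B(0) = 0."""
    return a0 * (1.0 - math.exp(-k * t))

def stochastic_b(a0, k, t_end, rng):
    """One stochastic run; returns the number of B molecules at t_end.

    With a molecules of A left, the waiting time to the next conversion
    is exponentially distributed with rate k*a.
    """
    a, t = a0, 0.0
    while a > 0:
        t += rng.expovariate(k * a)
        if t > t_end:
            break
        a -= 1
    return a0 - a

def boolean_b(a_now):
    """B(t+1) = f(A(t)); f is taken to be the identity (assumption)."""
    return a_now

rng = random.Random(1)
runs = [stochastic_b(100, 1.0, 1.0, rng) for _ in range(200)]
mean_b = sum(runs) / len(runs)          # fluctuates around the ODE value
ode_b = deterministic_b(100, 1.0, 1.0)  # about 63.2
```

Individual stochastic runs scatter around the deterministic value, while the Boolean rule only records whether B is present at the next time step.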
Boolean Models
(George Boole, 1815-1864)
Each gene can assume one of two states: expressed („1“) or not expressed („0“)
(discrete, deterministic)
Background:
- Not enough information for a more detailed description
- Increasing complexity and computational effort for more specific models
Replacement of continuous functions (e.g. Hill function) by a step function
Boolean Models
A Boolean network is characterized by
- the number of nodes („genes“): N
- the number of inputs per node (regulatory interactions): k
The dynamics are described by rules: „if input value/s at time t is/are ..., then output value at t+1 is ...“
Boolean networks always have a finite number of possible states and, therefore, a finite number of state transitions.
[Diagrams: example network topologies with nodes A to D, including a linear chain and a ring]
Boolean Models: truth functions

One input (rule 1 = p, rule 2 = not p):
input | output
p     | rule 0   rule 1   rule 2   rule 3
0     | 0        0        1        1
1     | 0        1        0        1

Two inputs (rule 1 = And, rule 7 = Or, rule 8 = Nor):
input | output
p q   | rule:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
0 0   |        0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1
0 1   |        0  0  0  0  1  1  1  1  0  0  0  0  1  1  1  1
1 0   |        0  0  1  1  0  0  1  1  0  0  1  1  0  0  1  1
1 1   |        0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1

Example: A → B with B(t+1) = not (A(t)), i.e. rule 2
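The rule numbering of the truth tables can be reproduced programmatically. In this sketch the output for the first input row (all zeros) is read from the most significant bit of the rule number, which matches the numbering on the slide; the helper name truth_function is hypothetical.

```python
def truth_function(rule, n_inputs=2):
    """Return the Boolean function with the given rule number.

    The input rows are ordered 00, 01, 10, 11 (or 0, 1 for one input);
    the output for the first row is the most significant bit of the rule.
    """
    rows = 2 ** n_inputs

    def f(*inputs):
        index = int("".join(str(int(bit)) for bit in inputs), 2)
        return (rule >> (rows - 1 - index)) & 1

    return f

rule_and = truth_function(1)             # And
rule_or = truth_function(7)              # Or
rule_nor = truth_function(8)             # Nor
rule_not = truth_function(2, n_inputs=1) # not p

table_and = [rule_and(p, q) for p in (0, 1) for q in (0, 1)]
```

With k inputs there are 2^(2^k) possible rules, which is why one input gives 4 rules and two inputs give 16.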
Boolean Models
[Diagram: genes a, b, c and d are transcribed and translated into proteins A, B, C and D, which activate (+) or repress the other genes; next to it, the equivalent Boolean network with nodes a, b, c and d]

Boolean network
a(t+1) = a(t)
b(t+1) = (not c(t)) and d(t)
c(t+1) = a(t) and b(t)
d(t+1) = not c(t)

State transitions (state abcd at time t → state at time t+1):
0000 → 0001      1000 → 1001
0001 → 0101      1001 → 1101
0010 → 0000      1010 → 1000
0011 → 0000      1011 → 1000
0100 → 0001      1100 → 1011
0101 → 0101      1101 → 1111
0110 → 0000      1110 → 1010
0111 → 0000      1111 → 1010

Steady state: 0101
Cycle: 1000 → 1001 → 1101 → 1111 → 1010 → 1000
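The four update rules of this example network can be simulated directly, which recovers the transition table, the steady state and the cycle. This is a minimal sketch; the helper names step, trajectory and to_str are illustrative.

```python
from itertools import product

def step(state):
    """Apply the four update rules to a state (a, b, c, d)."""
    a, b, c, d = state
    return (a, int((not c) and d), int(a and b), int(not c))

def trajectory(state):
    """Follow a state until it repeats; return (path, attractor)."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen, seen[seen.index(state):]

def to_str(state):
    return "".join(str(bit) for bit in state)

states = list(product((0, 1), repeat=4))           # all 2**4 states
transitions = {to_str(s): to_str(step(s)) for s in states}
steady_states = [to_str(s) for s in states if step(s) == s]
_, cycle = trajectory((1, 0, 0, 0))
cycle_str = [to_str(s) for s in cycle]
```

Because a never changes, the state space splits into two halves: states with a = 0 end in the steady state, and states with a = 1 end in the cycle.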
Boolean Models
- The number of states is finite (2^N), as is the number of state changes.
- The system may reach steady states or cycles.
- Not every state can be reached from every other state.
- The successor state is unique, the predecessor state is not.
Advantages: easy description with simple rules, no parameters, computationally not demanding
Drawbacks: no intermediate values
Description with Differential Equations
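X + DNA → X-DNA (rate constant k1)
X-DNA → X + DNA (rate constant k-1)
Nucleic acids + DNA → mRNA + DNA (rate constant k1)
mRNA → Nucleic acids (rate constant k-1)
Amino acids + mRNA → Proteins + mRNA (rate constant k2)
Proteins → Amino acids (rate constant k-2)
$$\frac{d[X\text{-}DNA]}{dt} = k_1 [X][DNA] - k_{-1} [X\text{-}DNA]$$
$$\frac{dS}{dt} = f(S)$$
S – vector of concentrations
f – function(s), often non-linear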
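The binding equation for X and DNA can be integrated numerically with a few lines of code. The sketch below uses a simple explicit Euler scheme; the step size, initial concentrations and rate constants are arbitrary illustrative values.

```python
def simulate_binding(x0, dna0, k1, k_minus1, t_end, dt=0.001):
    """Euler integration of d[X-DNA]/dt = k1*[X]*[DNA] - k_minus1*[X-DNA]."""
    x, dna, complex_ = x0, dna0, 0.0
    for _ in range(int(round(t_end / dt))):
        rate = k1 * x * dna - k_minus1 * complex_
        x -= rate * dt         # free X is consumed by binding
        dna -= rate * dt       # free DNA is consumed by binding
        complex_ += rate * dt  # the complex is formed
    return x, dna, complex_

x, dna, complex_ = simulate_binding(x0=1.0, dna0=1.0, k1=1.0,
                                    k_minus1=0.5, t_end=20.0)
# At equilibrium the ratio [X-DNA] / ([X][DNA]) approaches k1 / k_minus1.
ratio = complex_ / (x * dna)
```

For these values the complex settles at 0.5, where binding and dissociation balance each other.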
Basic Elements of Biochemical Networks
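[Diagram: network of metabolites S1 to S4 connected by reactions v1 to v5]
$$\frac{dS_1}{dt} = p_1 - p_2 S_1$$
$$\frac{dS_2}{dt} = p_3 S_1 S_4 - p_4 S_2$$
$$\frac{dS_3}{dt} = p_5 S_2$$
$$\frac{dS_4}{dt} = -\frac{dS_2}{dt}$$
Initial values: S1[0] = 0, S2[0] = 0, S3[0] = 0, S4[0] = 1
Parameters: p1 = 1, p2 = 1, p3 = 1, p4 = 0.5, p5 = 0.5
[Plot: time courses of S1, S2, S3 and S4; x-axis Time (0 to 5), y-axis S[t] (0 to 1)]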
Systems equations
r – number of reactions
S_i – metabolite concentrations
v_j – reaction rates
n_ij – stoichiometric coefficients
$$\frac{dS_i}{dt} = \sum_{j=1}^{r} n_{ij} v_j$$
Network properties: $N = \{n_{ij}\}$
Individual reaction properties: $v = v(S, p)$
Kinetics, dynamics, admissible steady state fluxes, conservation relations
ODE - concept of steady state
$$\frac{dS}{dt} = 0 \quad \text{or} \quad N\,v(S, p) = 0$$
• no change of concentrations
• but (usually) non-vanishing fluxes or rates
To restrict modeling to the main aspects, often the asymptotic behaviour of dynamic systems is analyzed (behavior after a sufficiently long time). It may be
- oscillatory
- chaotic
- in many relevant situations the system will reach a steady state.
[Plot: variable against time]
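The systems equation dS/dt = N·v and the steady-state condition N·v = 0 can be illustrated on a small example. The two-metabolite chain (→ S1 → S2 →), the rate laws and the parameter names below are hypothetical and chosen only for illustration.

```python
# Stoichiometric matrix N: rows are metabolites S1, S2; columns are
# reactions v_in (-> S1), v1 (S1 -> S2), v2 (S2 ->).
N = [[1, -1, 0],
     [0, 1, -1]]

def rates(s, p):
    """Reaction rates v(S, p): constant influx, then first-order steps."""
    s1, s2 = s
    return [p["v_in"], p["k1"] * s1, p["k2"] * s2]

def dsdt(s, p):
    """dS_i/dt = sum over j of n_ij * v_j."""
    v = rates(s, p)
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]

p = {"v_in": 2.0, "k1": 1.0, "k2": 4.0}
steady = [p["v_in"] / p["k1"], p["v_in"] / p["k2"]]  # S1 = 2.0, S2 = 0.5
derivative_at_steady = dsdt(steady, p)
fluxes_at_steady = rates(steady, p)
```

At the steady state all concentration changes vanish, yet every reaction still carries a flux of 2.0, which is the distinction the slide draws between steady state and equilibrium.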
Data Bases
- GO (Gene Ontology), http://www.geneontology.org: functional description of gene products
- KEGG (Kyoto Encyclopedia of Genes and Genomes), http://www.genome.ad.jp/kegg/: reference knowledge base offering information about genes and proteins, biochemical compounds and reactions, and pathways
- BRENDA (Comprehensive Enzyme Information System), http://www.brenda.uni-koeln.de: curated database containing functional data for individual enzymes
- NCBI (National Center for Biotechnology Information), http://www.ncbi.nlm.nih.gov/: provides several databases:
  - molecular databases, with information about nucleotide sequences, proteins, genes, molecular structures, and gene expression
  - taxonomy database: names and lineages of more than 130,000 organisms
- SPAD (Signaling PAthway Database), http://www.grt.kyushu-u.ac.jp/spad/index.html: information about signaling pathways (schemes, links)
- JWS Online, model database, http://jjj.biochem.sun.ac.za/database/index.html: published models, implemented in Mathematica®; models can be simulated
- Biomodels, model database, http://www.biomodels.net/: published models, implemented in SBML
Modeling Tools•BALSA•BASIS•BIOCHAM•BioCharon•biocyc2SBML•BioGrid•BioModels•BioNetGen•BioPathway Explorer•Bio Sketch Pad•BioSens•BioSPICE Dashboard•BioSpreadsheet•BioTapestry•BioUML•BSTLab•CADLIVE•CellDesigner•Cellerator•CellML2SBML•Cellware•CL-SBML•COPASI
•Cytoscape•DBsolve•Dizzy•E-CELL•ecellJ•ESS•FluxAnalyzer•Fluxor•Gepasi•INSILICO discovery•JACOBIAN•Jarnac•JDesigner•JigCell•JWS Online•Karyote*•KEGG2SBML•Kinsolver*•libSBML•MathSBML•MesoRD•MetaboLogica•MetaFluxNet
•MMT2•Modesto•Moleculizer•Monod•Narrator•NetBuilder•Oscill8•PANTHER Pathway•PathArt•PathScout•PathwayLab•Pathway Tools•PathwayBuilder•PaVESy•PNK•Reactome•ProcessDB•PROTON•pysbml•PySCeS•runSBML•SBML ODE Solver•SBMLeditor
•SBMLmerge•SBMLR•SBMLSim•SBMLToolbox•SBToolbox•SBW•SCIpath•Sigmoid*•SigPath•SigTran•SIMBA•SimBiology•Simpathica•SimWiz•SmartCell•SRS Pathway Editor•StochSim•STOCKS•TERANODE Suite•Trelis•Virtual Cell•WinSCAMP•XPPAUT
http://sbml.org
Conclusions
• Mathematical models of cellular processes allow for a testable representation of experimental knowledge.
• Models clarify systemic and dynamic properties of the investigated object.
• Models allow simulating processes independent of the experiment.
• Modeling reveals regulatory properties of cellular networks. Osmostress response:
  – The role of channel Fps1 in osmoresponse
  – The ability to respond to repeated stimulation and the contribution of phosphatases
  – Feedback loops / signal integration and separation
• Models can have predictive value
  – Mutant phenotypes
  – Effect of intervention
  – Integration of external signals to cell cycle progression
  – Critical cell size for G1/S transition
Process of model development
- Analysis of the objects to be modeled
- Formulation of the scientific PROBLEMS
- Design of a simple model
  - as a „cartoon“
  - in mathematical terms
- Solving the respective (mathematical) problems
- Comparison of results with the real system (EXPERIMENT)
- Difference: iterative enhancement of the models (structure, parameters, …)
Example: distribution of molecules on both sides of a membrane (Ai, Ao)
dAi/dt = f(Ai, Ao, C, p)
If we did not make models, we would not know why they are wrong.
Modeling
Mathematical Models for Cellular Processes: Metabolic and Regulatory Networks
[Flow diagram: structural knowledge + experimental data → ODE systems → system analysis (simulation, parameter identification) → system understanding + prediction]
Basic Elements of Biochemical Networks
Glucose-1-P → Glucose-6-P → Fructose-6-P
(metabolite → metabolite → metabolite; reaction v1 is catalysed by phosphoglucomutase, reaction v2 by glucose-phosphate isomerase)
Design of structured metabolic models
1. Determination of system limits
G1P → G6P → F6P (rates v1 and v2); the system is bounded on both sides by the external surroundings
2. Balancing
Concentration change = Production – Degradation + Transport
$$\frac{dG6P}{dt} = v_1 - v_2 + v_{Transport}$$
3. Assignment of kinetics
Rate as function of concentrations and parameters
$$v_1 = \frac{V_{max}\, G1P}{K_M + G1P}$$
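The balancing and kinetics steps can be combined in a short sketch. Treating both reactions as irreversible Michaelis-Menten steps is a simplification assumed here, and the parameter names (vmax1, km1, vmax2, km2) and values are hypothetical.

```python
def michaelis_menten(s, v_max, k_m):
    """Rate as a function of substrate concentration and parameters."""
    return v_max * s / (k_m + s)

def dg6p_dt(g1p, g6p, params, v_transport=0.0):
    """Balance for G6P: production (v1) - degradation (v2) + transport."""
    v1 = michaelis_menten(g1p, params["vmax1"], params["km1"])
    v2 = michaelis_menten(g6p, params["vmax2"], params["km2"])
    return v1 - v2 + v_transport

params = {"vmax1": 2.0, "km1": 1.0, "vmax2": 2.0, "km2": 1.0}
half_max = michaelis_menten(1.0, v_max=2.0, k_m=1.0)  # rate at S = K_M
balanced = dg6p_dt(g1p=1.0, g6p=1.0, params=params)
accumulating = dg6p_dt(g1p=1.0, g6p=0.0, params=params)
```

At a substrate concentration equal to K_M the rate is half of V_max, and G6P accumulates whenever production exceeds degradation.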
Hypothesis Generation
- establish a mathematical model of the network
- define a performance function
- calculate parameters optimizing the performance function
- compare prediction with experimental data
Possible theoretical approaches:
- Structure → Function: modelling of systems dynamics
- Function → Structure: evolutionary optimization
[Diagram labels: homeostasis, appropriate response, experimental data; network, control pattern, parameters]
Model examples - Metabolism
In Vivo Analysis of Metabolic Dynamics in S. cerevisiae: M. Rizzi, M. Baltes, U. Theobald, M. Reuss. Biotechnol Bioeng. 55: 592–608, 1997.
Representation of metabolism in the KEGG data base: www.kegg/kegg2.jp
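Model examples – Signaling pathways
[Diagram: G-protein cycle, with GDP/GTP exchange on the G protein in response to a signal acting through an activated receptor (Ra*)]
[Diagram: MAP K cascade, in which a signal passes from MAP KKKK through MAP KKK, MAP KK and MAP K; each kinase is converted to its -P and -PP forms, with ATP converted to ADP at each phosphorylation step]
[Diagram: phosphorelay system, in which a signal drives phosphate transfer between components A, B and C, with rate constants k1 to k4]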
Common properties
Cellular networks have a high degree of connectivity.
The processes are reactions and molecular interactions:
- binding
- intramolecular transformations
- release
Differences in modeling of different parts are due to appropriate approximations.
Concentrations
Signalling
- Proteins (catalysts and substrates): low, ~ 100-300 nmol/L (~ 10^3-10^4 molecules per cell)
Metabolism
- Enzymes: low
- Metabolites: higher
ATP: ~ 2 mmol/L
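The link between molar concentration and molecules per cell is a one-line conversion. The cell volume of 50 fL used below is an assumption (roughly the size of a yeast cell) and is not stated on the slide.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_per_cell(concentration_mol_per_l, cell_volume_l):
    """Number of molecules = concentration * volume * Avogadro constant."""
    return concentration_mol_per_l * cell_volume_l * AVOGADRO

# Assumed cell volume: 50 fL = 50e-15 L (hypothetical, for illustration).
low = molecules_per_cell(100e-9, 50e-15)   # 100 nmol/L
high = molecules_per_cell(300e-9, 50e-15)  # 300 nmol/L
atp = molecules_per_cell(2e-3, 50e-15)     # 2 mmol/L
```

Under this assumption, 100-300 nmol/L corresponds to a few thousand molecules per cell, while ATP at 2 mmol/L is present in tens of millions of copies.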
Network Characteristics
Signaling
- Reactions can be catalysed by enzymes or autocatalytic.
- The network is given by the existing proteins and their interactions.
Metabolism
- All reactions are catalysed by enzymes.
- The network is determined by the existing enzymes (which do not necessarily interact).
- Metabolites need not be there initially.
Network Characteristics
Signaling
[Diagram: MAP K → MAP K-P → MAP K-PP, with ATP converted to ADP at each phosphorylation step and phosphate (P) released on the way back]
- State changes: change in phosphorylation states
- Coding of information
- But: conservation (MAPK + MAPK-P + MAPK-PP) in the considered time window
Metabolism
[Diagram: Glucose → Gluc 6-P → Fruc 6-P → Fruc 1,6-PP, with ATP converted to ADP at the phosphorylation steps]
- Important feature: flux through the pathway, (final) transformation of metabolites
- Phosphorylation: energy transfer
Rate equations… are a choice of the modeler
Signaling
[Diagram: MAP K → MAP K-P, catalysed by MAP KK-PP, ATP → ADP]
- Catalyst and substrate have about the same concentration (E ≈ S)
- Binding slow compared to intramolecular rearrangements
- First order kinetics
- Mass action kinetics:
$$v = k\,E\,S, \quad k = k(ATP)$$
Metabolism
[Diagram: Glucose → Gluc 6-P, catalysed by hexokinase (Mg2+), ATP → ADP]
- Typical choice: Michaelis-Menten kinetics, E + S ⇌ ES → E + P (fast binding, slow conversion)
- Requirement: E << S
$$v = \frac{V_{max}\, S}{K_M + S}, \quad V_{max} = k_2 E_{tot}$$
Spatial effects
Metabolism
- „well stirred“
- Molecules are considered to meet with probability according to their concentration (mass action).
- Spatial effects usually neglected.
Signaling
- „well stirred“ ???
- Low number of molecules, highly organised complexes, often membrane-bound.
- Spatial effects should be considered (problem with ODEs), at least as „compartmentalisation“.
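Temporal characterisation
Metabolism
Time constants for reactions, for A ⇌ B with rate constants k+ and k-:
$$\tau = \frac{1}{k_+ + k_-}$$
Time constants for metabolites (n_ij - stoichiometric coefficients):
[Equation: τ_i expressed through the sum over reactions j of n_ij ∂v_j/∂S_i]
Definition acc. to Llorens et al. 1998:
[Equation: characteristic time T_c, a ratio of two integrals over time from 0 to ∞]
Signaling (Heinrich et al., 2002)
Transition time:
$$\tau_i = \frac{\int_0^\infty t\, X_i(t)\, dt}{\int_0^\infty X_i(t)\, dt}$$
Duration:
$$\vartheta_i = \sqrt{\frac{\int_0^\infty t^2 X_i(t)\, dt}{\int_0^\infty X_i(t)\, dt} - \tau_i^2}$$
Amplitude:
$$S_i = \frac{\int_0^\infty X_i(t)\, dt}{2\,\vartheta_i}$$
[Plot: signal against time, with transition time, duration and amplitude marked]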
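The signaling measures attributed to Heinrich et al., 2002 (transition time, duration, amplitude) are ratios of time integrals of the signal X(t), so they can be evaluated numerically for any simulated time course. The sketch below uses a midpoint rule and an exponentially decaying test signal; the function name, step size and finite integration limit are illustrative assumptions.

```python
import math

def signal_measures(x, t_end, dt=0.001):
    """Transition time, duration and amplitude of a signal x(t).

    tau = I1 / I0, duration = sqrt(I2 / I0 - tau**2) and
    amplitude = I0 / (2 * duration), where Ik is the integral of t**k * x(t).
    The integrals are truncated at t_end (assumes x has decayed by then).
    """
    i0 = i1 = i2 = 0.0
    for step in range(int(round(t_end / dt))):
        t = (step + 0.5) * dt  # midpoint of the current interval
        value = x(t)
        i0 += value * dt
        i1 += t * value * dt
        i2 += t * t * value * dt
    tau = i1 / i0
    duration = math.sqrt(i2 / i0 - tau ** 2)
    amplitude = i0 / (2.0 * duration)
    return tau, duration, amplitude

tau, duration, amplitude = signal_measures(lambda t: math.exp(-t), t_end=50.0)
```

For X(t) = exp(-t) the integrals are 1, 1 and 2, so the transition time and duration are both 1 and the amplitude is 0.5, which gives a quick check of any implementation.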
Conclusions
Models for metabolism and signaling can use the same design principles.
Metabolism and signaling may take place in
- different areas of the cells
- different regions of the concentration space
- different time scales
Signaling models have to account for the hierarchy in the system.
Regulatory couplings (feedback) distribute control in both cases.
EXAMPLE TGFbeta signal transduction: the SMAD engine
Overview of the pathway
• Ligand dimer binds to receptor heterotetramer (type I and II receptors, both Ser/Thr kinases)
• r-SMAD1/5/8 versus r-SMAD 2/3
• Phosphorylated r-SMAD binds SMAD4 and travels to the nucleus
• Ubiquitylation (SMURF1-dependent and independent)
LET'S LOOK UP THE TGFbeta PATHWAY!
www.reactome.org
Example: Vilar et al. 2006, PLoS Computational Biology
Signal Processing in the TGFbeta Superfamily Ligand/Receptor Network
From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology
14 ligands, 5 type II and 7 type I receptors – this results in 50 different ligand/receptor complexes
Figure 2
Unusual features of the TGFbeta pathway
Simple core transduction engine (two SMAD channels: 2/3 and 1/5/8) but very complex, diverse responses (42 ligands, 5 type II and 7 type I receptors, 300 target genes)
Receptors are constitutively internalised and recycled – only approx. 10% present on the plasma membrane at any time
Comparatively late activation peak: approx. 60 minutes (compare with EGFR: only 5 minutes)
Several negative feedback loops, including:
- constitutive degradation
- ligand-induced degradation (Smad7-Smurf2)
From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology
Figure 3
[Figure labels: Ki = 1/3 min, Klid = 1/4 min; time points 30 min and 60 min]
Sources of experimental data
Mitchell H, Choudhury A, Pagano RE, Leof EB. Ligand-dependent and -independent transforming growth factor-beta receptor recycling regulated by clathrin-mediated endocytosis and Rab11. Mol Biol Cell, 2004, 15: 4166-4178:
• Recycling rate – Figure 3 (approx. 30 min)
• Internalisation rate – Figure 4
Di Guglielmo GM, Le Roy C, Goodfellow AF, Wrana JL. Distinct endocytic pathways regulate TGF-beta receptor signalling and turnover. Nat Cell Biol, 2003, 5: 410-421:
• Internalisation rate – Table 1 – receptors are internalised through the clathrin pathway and lipid-caveolar compartments with similar rates
• Degradation rate – Figure 3 – approx. 400 min
Figure 3, Mitchell et al.
Figure 3. TGF-beta receptors recycle at the same rate in the presence and absence of ligand. (A) Mb202 1-18 cells were processed for imaging and fluorescence quantitation as in Figure 2, B and C, except 10 ng/ml GM-CSF was included in both incubations. Bar, 10 µm. (B) Cultures were labeled with 125I-Fab anti-GM-CSF receptor- for 2 h at 4°C in the presence ( ) or absence ( ) of 10 ng/ml GM-CSF. After washing and incubation at 37°C for 30 min (in the presence or absence of 10 ng/ml GM-CSF), labeled receptor antibody was removed by acid wash and the cultures returned to 37°C. (…) Results are expressed as percentage of the total cell-associated radioactive counts after the first acid strip and before further incubation at 37°C, and indicate the mean ± SD of two experiments done in duplicate.
Figure 4, Mitchell et al.
• Figure 4. TGF-beta receptors internalize at the same rate regardless of activation state. Mb202 1-18 cells were prebound with radiolabeled antibody in the presence ( ) or absence ( ) of 10 ng/ml GM-CSF as in Figure 3B and then incubated at 37°C for the indicated times. Surface antibody was removed by acid treatment at 4°C, after which cells were processed to determine internalized radioactivity (see Materials and Methods). Results are expressed as percentage of total cell-associated radioactive counts before incubation at 37°C and indicate the mean ± SD of two experiments done in duplicate.
Table 1, Di Guglielmo et al. Quantitation of TGF-beta receptor distribution by immunoelectron microscopy
From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology
Figure 4
Plasma membrane concentrations
[lRiRii] - ligand/heterotetramer receptor complex
[l] - ligand
[Ri] - receptor type i
[Rii] - receptor type ii
kα - ligand/receptor complex formation rate
kcd - constitutive degradation rate
klid - ligand induced degradation rate
ki - internalisation rate
Endosomal concentrations
[lRiRii] - ligand/heterotetramer receptor complex
[Ri] - receptor type i
[Rii] - receptor type ii
ki - internalisation rate
kr - recycling rate
From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology
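A back-of-the-envelope calculation shows how internalisation and recycling rates set the share of receptors on the plasma membrane. The two-pool model below (surface and endosome, with optional loss from the endosome) is a deliberately simplified assumption and not the full model of Vilar et al.; the function name surface_fraction is hypothetical.

```python
def surface_fraction(k_i, k_r, k_deg=0.0):
    """Steady-state fraction of receptors on the plasma membrane.

    Assumed two-pool model: receptors leave the surface at rate k_i and
    leave the endosome by recycling (k_r) or degradation (k_deg).
    Balancing the endosomal pool, k_i * R_surface = (k_r + k_deg) * R_endo,
    gives R_surface / (R_surface + R_endo) as returned here.
    """
    return (k_r + k_deg) / (k_i + k_r + k_deg)

# Rates per minute, taken from the values quoted on the slides:
# internalisation about 1/3 per min, recycling about 1/30 per min.
fraction = surface_fraction(k_i=1 / 3, k_r=1 / 30)
```

With these rates about 9% of the receptors sit on the surface, in line with the figure of roughly 10% quoted for the pathway.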
Late and long
Slower rates for internalisation and recycling: Ki = 1/10 min kr = 1/100 min
Figure 5
From Vilar et al. 2006, 2:1 0036-0045, PLoS Computational Biology
CIR makes the difference
Figure 6
PART 3: PHYLOGENETICS
WHAT I WILL TALK ABOUT
• A BIT OF THEORY
• EXAMPLES (CRISPs AND SMADs)
• MEGA PACKAGE - HOWTO
• INTERPRETATIONS (what can a simple BLAST search, multiple sequence alignment, or a tree tell me about BIOLOGY)
First Things First (definitions)
• Phylogenetic analysis
• Phylogenetic tree
  – rooted
  – unrooted
• Homology
  – paralogy
  – orthology
    • one-to-one
    • co-orthology
• Nucleotide substitutions
  – synonymous
  – non-synonymous
What is phylogenetics?
A phylogenetic analysis of a family of related nucleic acid or protein sequences is a determination of how the family members might have been derived during evolution.
Phylogenetic tree – a graphical representation that depicts evolutionary relationships between a set of related sequences. Most-alike sequences are placed at the outer ends of two branches that are joined below into a lower common branch, representing their derivation from an ancestral sequence. An unrooted tree does not provide information on the common ancestor of the group.
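The simplest tree
[Diagram: along the axis of evolutionary time, an ancestral species splits into Species A and Species B; correspondingly, an ancestral gene at the root splits at a node into branches leading to Gene A and Gene B]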
Who is Who of -ologs
Homologs: genes whose sequences are so similar that they almost certainly arose from a common ancestor gene
(1) Orthologs are genes in different species that arose from a single gene in the most recent common ancestor of those species – that is, by a process of speciation
(2) Paralogs, on the other hand, are genes in the same species that arose from a single gene in an ancestral species by a process of duplication
[Diagram: gene tree along evolutionary time, from an ancestral gene to genes A1, A2, A2b, B1 and B2, with paralogs, co-orthologs and 1:1 orthologs marked]
Synonymous or non-synonymous?
Non-synonymous substitution – a nucleotide substitution that results in an amino acid change (dn)
Synonymous substitution – a ”silent” nucleotide substitution, often in the third codon position, that does not result in an amino acid change (ds)
dn/ds – the simplest test for the rate of evolution (< 1, > 1, = 1)
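Whether a single codon change is synonymous can be checked against the standard genetic code. The sketch below builds the codon table from the usual 64-letter amino-acid string (bases ordered T, C, A, G); the function name classify_substitution is illustrative, and real dn/ds estimation (for example with yn00 from PAML) additionally counts synonymous and non-synonymous sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_substitution(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "non-synonymous"

third_position = classify_substitution("GGT", "GGC")   # Gly -> Gly
second_position = classify_substitution("GGT", "GAT")  # Gly -> Asp
```

The third-position change leaves glycine unchanged, whereas the second-position change replaces it with aspartate.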
EXAMPLE
cysteine-rich secretory proteins (CRISPs)
There are three CRISP genes in human, rat and mouse. However, their nomenclature is misleading
• None of the genes are simple one-to-one orthologs
• A single ancestral gene at the base of the vertebrate lineage was most likely subject to two rounds of gene duplication before the human/rodent split, but the picture is complicated by species-specific duplications and lineage-specific losses
• A surprisingly high number of changes in gene expression patterns have occurred during the evolution of the CRISP family. For detailed discussion, please see: (Huminiecki and Wolfe, Genome Research, 2004)
EXAMPLE TGFbeta signal transduction: the SMAD engine
Overview of the pathway
• Ligand dimer binds to receptor heterotetramer (type I and II receptors, both Ser/Thr kinases)
• r-SMAD1/5/8 versus r-SMAD 2/3
• Phosphorylated r-SMAD binds SMAD4 and travels to the nucleus
• Ubiquitylation (SMURF1-dependent and independent)
Interesting phylogenetic phenomena
• DPP/BMP Type-1 receptor and an r-SMAD found in non-bilaterian cnidarian (Acropora millepora) – has the pathway evolved in a context other than dorsoventral patterning?
• Two SMAD4 in frogs: XSMAD4α and XSMAD4β. Also worms could have two co-SMADs (Sma-4 and Daf-3) but only one SMAD4 expected in mammals!
What is the ancestral SMAD?
• Hypothesis: an ancestral SMAD – CoRe-SMAD – worked as a homodimer. The gene duplicated and gave rise to an r-SMAD and a co-SMAD
• But where did the i-SMADs come from?
  – i-SMADs evolve faster (evidence: average dn/ds, length of protein branches, missing phosphorylation motif, and L3 sequence not conserved between DAD and i-SMAD6, 7);
  – (((mad, dsmad2), medea),dad)
  – (((((SMAD1,SMAD5), SMAD9),SMAD2, SMAD3), SMAD4), SMAD6, SMAD7)
Amino-acid PAM matrix, neighbour joining tree
[Tree figure; tip and clade labels: vertebrate SMAD1,5,9; D. melanogaster Mad; vertebrate receptor SMADs; D. melanogaster dSMAD2; sma-2; daf-8; sma-3; daf-3; sma-4; vertebrate co-SMADs; D. melanogaster Med (Medea, dSMAD4); daf-14; tag-68; D. melanogaster Dad; vertebrate SMAD7; vertebrate SMAD6; scale bar 0.5]
Fascinating C. elegans SMADs
Positive selection in sma/daf branches?
Sma genes control body size, while daf genes control dauer formation. Lengths of protein branches suggested that daf genes underwent a period of very fast protein evolution. Could it be positive selection in response to environmental change? dn/ds test positive!
Daf       Corresponding SMAD   Evidence
Daf-3     co-SMAD(?)           nj_PAM, newfeld2_MH1_ml
          i-SMAD               newfeld2_p-loop_degenerate
Daf-8     r-SMAD               nj_PAM, newfeld2_p-loop
Daf-14    co-SMAD(?)           nj_PAM
          co-SMAD              newfeld2_p-loop(2S)
Tag-68    i-SMAD               nj_PAM
Interpretations of phylogenies
How all this could help in my project?
I will propose just a few ideas – please, join in, voice your suggestions, discuss your favourite gene family!!!
Application 1: ”Evolutionary Saga”, or my gene family over the eons
Is the gene family present in bacteria, yeast, plants, non-bilaterian animals? To find out, just run a BLAST search against GenBank and read the names of the species with hits. Can one infer from this how old the family is?
How many gene duplication events were there, and when did they occur? Have there been any deletions? Has the intron number changed, or are there no introns (suggestive of retroposition)?
Can these events be correlated with the development of a new body plan, new organs, or novel physiology? Is this correlation supported by the sites of expression?
Metazoan phylogeny
[Diagram: tree of Porifera (sponges); Cnidaria (jellyfish, coral); and the Bilateria: Flatworms, Molluscs (gastropods), Annelids (leeches), Arthropods (insects), Nematodes (?), Echinoderms (sea urchins, starfish), Hemichordates, Urochordates, Cephalochordates, Vertebrates (fish, birds, mammals); first appearances of Wnt, TGFbeta and FGF (FGF2R?) marked on the branches]
Expansion of the signal transduction toolkit
           Cnidaria and porifera   C. elegans   Drosophila   Human
TGFbeta    1(?)                    4            -            27
Wnt        >1                      5            7            18
FGF        -                       1            1            23
Increased anatomical complexity (diversification of body plans and body parts)
Application 2: ”My Gene and the Genome”, or how does my favourite gene compare to other members of the gene family?
How many related genes are there, how similar are they, and in what physical location in the genome (most duplications are tandem, head-to-tail)?
Evidence for functional redundancy? (important for knockouts)
Tissue-specific expression patterns, or do they overlap (expression.gnf.org)?
Genomic context (www.ensembl.org)
Application 3: ”Special Sites in my Gene”
Multiple sequence alignment:
- regions of conservation
- regions of change
Important for the design of my next deletion mutant, hybridization probe, or a set of primers
Visual inspection of the multiple sequence alignment will be sufficient in most cases (check out Pfam or ENSEMBL for precomputed alignments of your favourite family – www.ensembl.org)
Reference: Bioinformatics: Sequence and Genome Analysis. David W. Mount. CSHL lab manual series. Great introduction to the field.
Reference: Molecular Evolution and Phylogenetics. Masatoshi Nei, Sudhir Kumar. Nuts and bolts of tree drawing methods.
Reference: From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design. S. Carroll, J. Grenier, S. Weatherbee. Interpretations.
THANK YOU!