PRAGMA BioSciences Portal Raj Chhabra Susumu Date Junya Seo Yohei Sawai.
Amphioxus Evolution - Western Washington University of the 2R hypothesis • Susumu Ohno (1928-2000)...
Transcript of Amphioxus Evolution - Western Washington University of the 2R hypothesis • Susumu Ohno (1928-2000)...
Evolutionary genomics of therecently duplicated amphioxus
Hairy genesSenda Jiménez-Delgado1, Miguel Crespo2, Jon
Permanyer1, Jordi Garcia-Fernàndez1# and MiguelManzanares2#
Sean McAllister, Emily Loter, Aaron Gibbs
What is an amphioxus?•Amphioxus = both ends pointed,in Greek•Branciostoma lanceolatum is themost common and well known ofthe approximately 25-30 species•Typical beach bums…they canusually be found in shallow,tropical and temperate oceansand spend most of their timeburied in the sand•Part of the Cephalochordatebranch of the animal kingdom•Grow to be 5-8cm long atadulthood•There are separate sexes, andeggs are fertilized externally,which then develop into free-swimming larvae•They don’t have any hard parts,and as a result, their fossil recordis very sparse
AmphioxusThe closest invertebrate
relative to the vertebratesPossesses a vertebrate like
body plan:•Notochord – a muscularizednod that supports the nervechord•Hollow dorsal nerve chord•Segmented muscle blocks– these are V-shapedstructures called myomeres•Perforated pharyngealregion – also called “gillslits,” which strain foodparticles out of the water•Post anal tail
Devoid of the most complexfeatures of vertebrates
•Elaborate brain – theamphioxus brain is very smalland poorly developed, as aremost of its sensory organs•Paired fins•No true vertebrae
Evolution
• The amphioxus is not extinct, and in fact isenjoying a life living on the temperate sands ofJamaica (among other sandy shores)
• It is believed to be the closest living invertebraterelative to vertebrates
• Its genome has been independently evolving, ashas the genome of vertebrates, since theirdivergence
AmphioxusBranchiostoma
lanceolatum
Garcia-Fernàndez J. Amphioxus: a peaceful anchovy fillet to illuminate Chordate Evolution (I). Int J Biol Sci2006; 2:30-31. http://www.biolsci.org/v02p0030.htm
Amphioxus song
Folk Musician and MarineBiologist Sam Hinton wrote andperformed this song, which issung to the tune, “It’s a long wayfrom Tipperary.”
The 2R Hypothesis and DDCModel
Formulation of the 2Rhypothesis
• Susumu Ohno (1928-2000)• Wrote Evolution by Gene
Duplication (1970)• Hypothesized that there were
two or more full genomeduplications in the earlyevolution of vertebrates
• He bases his hypothesissolely on genome size indifferent chordates andevidence of tetraploidizationin fish lineages
Formulation of the 2Rhypothesis
• In the last decade, the “tworounds of genome duplicationin vertebrate evolutionhypothesis (2R hypothesis)”resurfaced and gainedpopularity among biologists
• A series of articles proposedifferent numbers and timingfor the whole genomeduplication
• Hughes, et al. devised anumber of experiments to testwhether or not the 2Rhypothesis was valid.
The Popular Vote
• The most popular version of the 2Rhypothesis:– one duplication event at the root of the
vertebrate lineage– another around Agnatha and Gnathostomata.
• Some extreme proposals include the firstgenome duplication at the root of Chordatalineage and the last one at the amphibianlineage.
A schematic representation phylogeny of maininvertebrate groups.
Timescales are based on molecular clock (Kumar and Hedges 1998). Some of the proposedvertebrate whole genome duplications are marked by asterisks. The connected asterisksdenote the range of the proposed genome duplication time. The references are as follows:1, Ohno 1970; 2, Ohno 1973; 3, Lundin 1993; 4, Holland et al. 1994; 5, Sidow 1996;6, Kasahara et al. 1996; 7, Spring 1997; 8, Ohno 1998; 9, Meyer and Schartl 1999;10, Amores et al. 1998 and Mayer and Schartl 1999.
Arguments for the 2RHypothesis
• Gene numbers in different animalslineages vary greatly
• Many invertebrates have ~15,000 genes• With the initially accepted human gene
number ~80,000, the fourfold ratiobetween mammalian and invertebrategene numbers was in good agreementwith the 2R hypothesis.
Houston, we have a problem…
• The current mammalian gene numberestimations based on both ESTs and draftsequence of the human genome reveal thatour genome hosts much fewer protein codinggenes than anticipated
• The 35,000 genes in the human genomemeans that, on average, for everyinvertebrate protein gene there are only twomammalian orthologs.
Arguments for the 2R Hypothesis,continued
• One can argue that many redundantgenes have been lost during vertebrateevolution.
• If this were true, we should be able detectgene loss by simply plotting phylogenetictrees for different vertebrate gene families.
A hypothetical phylogenetictree of vertebrate gene
family according to the 2Rhypothesis.
First duplication
Second duplication
Second duplication
•Two rounds of the wholegenome duplication wouldresult in four vertebrate geneclusters of (AB) (CD) treetopology. This topology shouldbe easily detectable even inincomplete data or after a geneloss in specific lineages.
•This figure represents ahypothetical phylogenetic treefor a vertebrate gene family,assuming some gene loss orincomplete data and additionalgene duplication in specificlineages.
2R, or not 2R• To support the 2R hypothesis, clustered paralogous genes not only
have to exhibit (AB) (CD) tree topology, but additionally they oughtto diverge in concordance with the Hox family and by the proposed2R hypothesis genome duplication time.
• The investigators (Hughes, et al) applied four different methods:– construction of phylogenetic trees– estimation of the divergence time of paralogous genes– consistency of the phylogenetic trees– the parsimony test
• The results of their analysis did not support the 2R hypothesis.Thirty-five gene families provided evidence regarding thehypothesis, and in 29 cases the results were inconsistent with thehypothesis.
• For more information on the specifics of their tests, please see thefollowing link: Are we Polyploids? A Brief history of one hypothesis
Maybe not for now…
• Formulated three decades ago, thehypothesis of whole genome duplications inthe early stages of vertebrate evolution hashad as many adherents as opponents.
• It seems that the current data do not supportthe 2R hypothesis, and the existence of moreparalogs in vertebrates than in invertebratescan be explained by waves of tandemduplications of single genes or largerchromosomal fragments.
Evolution is tricky…• Reconstruction of the history of living
organisms is a very difficult task.• We are not able to reconstruct it with
certainty because of its complexity.• Many evolutionary events become obscure
with time– hence, inferences about the early evolution of
vertebrate genomes remain in a scientific "grayzone"
• It is probable that we never will be able to sayhow this happened but only how it could havehappened.
The DDC ModelDDC = duplication, degeneration,
complementation
• The origin of organismal complexity is generally thought to be tightlycoupled to the evolution of new gene functions arising subsequentto gene duplication.
• Under the classical model for the evolution of duplicate genes, onemember of the duplicated pair usually degenerates within a fewmillion years by accumulating deleterious mutations, while the otherduplicate retains the original function.
• This model further predicts that on rare occasions, one duplicatemay acquire a new adaptive function, resulting in the preservation ofboth members of the pair, one with the new function and the otherretaining the old.
• However, empirical data suggest that a much greater proportion ofgene duplicates are preserved than predicted by the classicalmodel.
DDC Model
• The duplication-degeneration-complementation (DDC) model predicts:– degenerative mutations in regulatory elements
can increase rather than reduce theprobability of duplicate gene preservation
– the usual mechanism of duplicate genepreservation is the partitioning of ancestralfunctions rather than the evolution of newfunctions
What this has to do withAmphioxus…
• In the vertebrate lineage, theevolution from the lastancestor most probablyinvolved two rounds ofgenome duplication
• This increase in the number ofgenes was likely instrumental,by means of sub-functionalization of geneduplicates
• Specifically, the acquisition ofvertebrate novel features, suchas the migratory neural crestand derivatives, skeletalsystem, cartilaginous tissueand a complex brain
What these scientists are using theDDC for in this paper:
• Duplicated copies of a single gene sufferfrom differential loss of cis-regulatoryregions.
• Now, a complex or pleiotropic function thatwas performed by a single gene prior toduplication, is now subdivided into discretecomponents.
• These copies are now all very necessaryand essential, as they keep individual andunique cis-regulatory regions.
The Amphioxus Genome
• Has also been evolving since theseparation from its last common ancestorwith vertebrates
• We expect to find cephalocordate-specificduplicates for some gene families
• This has been reported, and the extremecase is to be found in the Hairy family ofhelix-loop-helix transcription factors
What is a Hairy gene? Hairy genes play an
important role in thedevelopmentalprocesses of manyorganisms
– Formation of somites
– In the developing vertebrateembryo, somites are massesof mesoderm distributedalong the two sides of theneural tube and that willeventually become dermis,skeletal muscle, andvertebrae.
Somite examples
Chick embryo somites Human embryo somites
AKA: somites
Helix-Loop-Helix TranscriptionFactors
• Amphioxus – Eight members (HairyA–Hairy H)
• Mouse – One member (Hes1)• Chicken – Two members• Xenopus (frog) – Two members• Zebrafish – Two members
Amphioxus Hairy Genes
• The 8 amphioxus Hairy genes conserve theirgene structure without excessive accumulationof mutations, expansions or losses, and have nostop codons within the coding sequence
• This indicates that they have not degenerated topseudogenes, and they should be expressed atsome point throughout the amphioxus life cycle
• Four of them have known expression duringdevelopment (HairyA-D)
HairyA – HairyD
• Expressions are mostly non-overlapping, oroverlapping in a subtle manner duringparticular developmental stages
• From the expression data, it has beenproposed that amphioxus Hairy genes, aftergene duplication from a single ancestor,underwent a process of divergence in the cis-regulatory regions that matches the DDCmodel.
HairyA – HairyD
• Although HairyA-HairyD have specificexpression and minor overlapping, the overallregions or tissues where they are expressed arethe same.
• Expression is found in the neural tube,presomitic mesoderm, somites, endoderm ornotochord
• Therefore, the researchers expected to findconserved elements in the regulatory sequencesof these four genes
How these particular Hairy genesmight function in amphioxus:
These conserved elements couldrepresent enhancers with spatialinformation for these somiticregions, refining gene expression.
Notch Signaling Pathway
• In zebrafish, chicken and mice, various factorsinvolved in the formation of somites are underthe control of the Notch signaling pathway.
• The RBP-Jk binding sites are the primarytranscriptional mediators of Notch signaling
• One of these targets in mice is Hes1, a Hairygene which contains RBP-Jk sites in its 5’regulatory region.
Details of the Notch Pathway
1. Maturation of the Notch receptor involves cleavage at theprospective extracellular side during intracellulartrafficking in the Golgi complex.
2. This results in a bipartite protein, composed of a largeextracellular domain linked to the smaller transmembraneand intracellular domain.
3. Binding of the ligand promotes two proteolytic processingevents.
4. As a result of proteolysis, the intracellular domain isliberated and can enter the nucleus to engage other DNA-binding proteins and regulate gene expression.
Notch Signaling Cascade1. Once the Notch extracellular domain
interacts with a ligand, the enzyme TACEcleaves the Notch protein just outside themembrane.
2. This releases the extracellular portion ofNotch, which continues to interact with theligand.
3. The ligand plus the Notch extracellulardomain is then endocytosed by the ligand-expressing cell.
4. There may be signaling effects in theligand-expressing cell after endocytosis;this part of Notch signaling is a topic ofactive research.
5. After this first cleavage, an enzyme calledγ-secretase cleaves the remaining part ofthe Notch protein just inside the innerleaflet of the cell membrane of the Notch-expressing cell.
6. This releases the intracellular domain ofthe Notch protein, which then moves tothe nucleus where it can regulate geneexpression by activating the transcriptionfactor CSL.
7. Other proteins also participate in theintracellular portion of the Notch signalingcascade.
Notch-1 MechanismThe Notch protein sitslike a trigger spanningthe cell membrane, withpart of it inside and partoutside. Ligand proteinsbinding to theextracellular domaininduce proteolyticcleavage and release ofthe intracellular domain,which enters the cellnucleus to alter geneexpression.
Goals of the Paper
• Gain insights into the process of amplification ofthe Hairy family in Amphioxus
• Identify cis-regulatory modules responsible forthe maintenance of the duplicates
• Decipher whether amphioxus Hairy genes maybe under the control of the Notch pathway
How they did it…
• First, they cloned the full coding region ofHairyA-F plus 3kb of 5’ regulatory regionsinto lambda phage
• Then, an in silico analysis of potential cis-regulatory control elements was performed
Engineering alambda phageusing a humanDNA insert
Genomic Organization
• Compared genomic sequence (fromlambda phage clone libraries) to predictedpeptides to determine exon/intronstructure
• Compared these results to Hairy genes inmice (Hes1), chicken (Hairy1), and theurochordate Ciona intestinalis
• Chick Hairy1 has all three introns• Ciona intestinalis (urochordate)
– hairya and hairyb have all three introns– hairyc lacks intron three
• Bf = Amphioxus; Mm = Mus musculus
Genomic Organization
• Given that Amphioxus HairyA and E,mouse Hes1, chick Hairy1, and Cionaintestinalis hairya and b all contain threeintrons, it is most likely that thisexon/intron structure represents that of thecommon ancestor to chordates
• In the Amphioxus line:– Intron III was lost in B,C,D,F,G, and H– Intron I was lost in B
Intron Loss after Hairy GeneDuplication Events Is intron loss common?
• It is not as infrequent as one mightthink—researchers* studying intron loss inCaenorhabditis found that it had a veryhigh rate of selective intron loss (nearly400-fold higher than loss rates inmammals)
*Cho S, Jin SW, Cohen A, Ellis RE. A phylogeny ofcaenorhabditis reveals frequent loss of introns during nematodeevolution. Genome Res 2004; 14: 1207-1220.
Mechanisms of Intron Loss
• mRNA mediated recombination• non-homologous recombination between
short repeats at the 5’ and 3’ ends ofintrons
Cho et al. Genome Res 2004; 14: 1207-1220.
Mechanisms of Intron Loss
• mRNA mediated recombination• non-homologous recombination between
short repeats at the 5’ and 3’ ends ofintrons
Cho et al. Genome Res 2004; 14: 1207-1220.
Pipmaker
http://pipmaker.bx.psu.edu/pipmaker/
Comparative GenomicMethods
• Jiménez-Delgado et al. (2006) used thefollowing tools for identifying conservedregions in/around the Hairy gene:
Comparative GenomicMethods
• Jiménez-Delgado et al. (2006) used thefollowing tools for identifying conservedregions in/around the Hairy gene:
http://mulan.dcode.org/
Mulan
Vista
http://genome.lbl.gov/vista/index.shtml
Comparative GenomicMethods
• Jiménez-Delgado et al. (2006) used thefollowing tools for identifying conservedregions in/around the Hairy gene:
We have seen Vista Plots Before…
They say…
• VISTA is a comprehensive suite ofprograms and databases forcomparative analysis of genomicsequences.
• mVISTA is a set of programs forcomparing DNA sequences from twoor more species up to megabaseslong and visualize these alignmentswith annotation information.
A Practical Example
DQ402480
DQ402481
DQ402482
DQ402483
Accession #s
GenBank sequence includes the Hairy gene plus 3kb of 5’ upstream sequence
BEGIN HERE
Results
HairyC
Results
HairyD
Results Results
Can Support Hypotheses withMulan (and Pipmaker)
•Mulan does essentially the samething as Vista:
local sequence alignment
Can Support Hypotheses withMulan (and Pipmaker)
•Mulan does essentially the samething as Vista:
local sequence alignment
Mulan Results
HairyCHairyD
Both Vistaand Mulancame upwith thefollowingsequence
for thisregion of
highconservation
(Output from Mulan)
This is the same conserved“box” that the researchers found
Can useMatinspector
to findtranscription
factorbinding sites
(Output from Mulan)Blue boxes highlight RBP-Jk
binding sites
Remember:
• After comparing their sequences in Vista,Pipmaker, and Mulan, the researchers rana more complicated sequence alignmentalgorithm using Clustal X
• This was done before running sequences through Matinspector
Location of Transcription FactorBinding Sites Using Matinspector
• Find upstream regulators of Hairy genes– Evolutionarily conserved between Hairy A-D
• Like RBP-JKappa
• Matinspector– A “software tool that utilizes a large library of
matrix descriptions for transcription bindingsites to locate matches in DNA sequences.”
– Matinspector
MatInspector
Opening page→Click MatInspector
Many other tools for similarsearches
MatInspector
Insert accession numbers-also can insert DNAsequences
Looking for transcription factor binding sites inentire gene
MatInspector
View in many groups
MatInspector
RBP-JKappa binding sites
1 page of a very long list of transcription factor binding sites
(HairyC)
MatInspector
****
-The RBP-JKappa Binding sites are found in the gene
-Reflect direct correlation of RBP-JKappa found in Box 2in HairyB-D genes
(HairyC)
RBP-JKappa Binding Sites
• RBP-JKappa binding sites are primarytranscriptional mediators of Notchsignaling pathway– Presence of RBP-JKappa BS in non-coding 5’
region of Hairy genes• Illustrates functional role of regulation of Hairy
genes during development• Shows Hairy genes are downstream targets of
Notch pathway
5’ Non-coding Region ofAmphioxus Hairy Genes
RBP-JK BSpresent inconservedregions
Most likelyHairy B,C,&D haverelatedfunctionduringdevelopment
Box1: unknown for now
Box2: somites & presomiticmesoderm
Box3: notochord
Box4: gut endoderm
Box5 (ie: core of Box4): neural tube
Summary• Proving DDC model in amphioxus
– Difference in Boxes → different expression of genesin different tissues
• Example: Expression of HairyC & D in notochord
• Observed differential gene expression duringdevelopment– In particular Hairy genes
• Set out to connect genomesof amphioxus to vertebrates
Due todivergencepoints genesare not likecommonancestor