Anatomy of a Genome Project A.Sequencing 1. De novo vs. ‘resequencing’ 2.Sanger WGS versus...
1
Anatomy of a Genome Project
A. Sequencing
1. De novo vs. ‘resequencing’
2. Sanger WGS versus ‘next generation’ sequencing
3. High versus low sequence coverage
B. Assembly
1. Draft assembly
2. Gap closure
C. Annotation
1. Gene, intron, RNA prediction
2. De novo vs. homology-based prediction
3. Assessing confidence
D. Comparison
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
2
Anatomy of a Genome Project: non-Model challenges
A. Sequencing
1. De novo vs. ‘resequencing’ … resequencing not possible without a close, syntenic relative
2. Sanger WGS versus ‘next generation’ sequencing
3. High versus low sequence coverage … need high coverage and long reads (or mate-pair reads) to assemble
B. Assembly
1. Draft assembly
2. Gap closure … time consuming no matter what
C. Annotation
1. Gene, intron, RNA prediction
2. De novo vs. homology-based prediction
3. Assessing confidence
De novo predictions are challenging if gene models are different in your species … can rely less on homology for identifications and assessing confidence
D. Comparison
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
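The coverage point under A.3 can be made concrete with a small calculation. The sketch below assumes reads land uniformly at random on the genome (the usual Poisson / Lander-Waterman simplification, which ignores repeats and sequencing bias); the genome size, read length, and read counts are hypothetical values chosen only for illustration.

```python
import math

def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size

def expected_uncovered_fraction(depth: float) -> float:
    """Poisson approximation: probability that a base is hit by zero reads."""
    return math.exp(-depth)

# Hypothetical project parameters (not taken from the slides).
genome_size = 100_000_000   # 100 Mb genome
read_length = 100           # 100 bp reads

for num_reads in (2_000_000, 30_000_000):
    depth = coverage(num_reads, read_length, genome_size)
    missed = expected_uncovered_fraction(depth)
    print(f"{depth:.0f}x coverage: about {missed:.2e} of bases expected unsampled")
```

Under these assumptions low coverage leaves a sizeable fraction of bases unsampled, which is one reason de novo assembly needs high coverage; real genomes with repeats behave worse than this idealized estimate.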
3
The power of comparison
For many non-model organisms, most of the predicted genes will be uncharacterized & may not have homology to known genes.
But comparison within and between species can still reveal interesting features
1. Comparing gene content, lineage-specific gene loss, gain, emergence
2. Comparing genome structure (chromosomes, breakpoints, etc.)
3. Comparing evolutionary rates of change (rates of amino-acid, nucleotide substitution)
4. Comparing population data (SNPs, expression response, phenotypic variation … mapping studies)
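As a minimal sketch of item 3, the snippet below estimates nucleotide divergence between two aligned sequences: first the raw proportion of differing sites (p-distance), then the Jukes-Cantor correction for multiple hits. The two sequences are made up for illustration, and Jukes-Cantor is only one simple substitution model (equal base frequencies and equal rates); it is not necessarily the model used in the studies discussed here.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for a, b in pairs if a != b) / len(pairs)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance: expected substitutions per site given p-distance p."""
    if p >= 0.75:
        raise ValueError("p-distance too large; Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences (two differences in ten sites).
species_1 = "ACGTACGTAC"
species_2 = "ACGTTCGTAA"

p = p_distance(species_1, species_2)
d = jukes_cantor(p)
print(f"p-distance = {p:.3f}, Jukes-Cantor distance = {d:.3f}")
```

The corrected distance is always at least as large as the raw p-distance, because some sites will have changed more than once.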
4
Science April 25, 2014
Tsetse fly: blood feeding insect that gives birth to live larvae & ‘lactates’
- 366 Mb genome = double the size of Drosophila melanogaster
- Identified orthologs across 5 insects … comparison of ortholog presence/absence suggests unique evolutionary trajectories
- blood feeding evolved independently 12 times in Diptera … identified shared proteins unique to several blood-suckers
- Some gene families have been expanded, others contracted in numbers … functional annotations (“GO” = gene ontology predictions) suggest selection
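The presence/absence comparison described in the bullets above can be sketched as a simple classification of ortholog groups by which species carry a member. Everything in this example is hypothetical: the species labels, the ortholog group IDs, and the category names are placeholders, not data from the tsetse paper.

```python
# Hypothetical species labels; only "tsetse" is named in the slides.
ALL_SPECIES = {"tsetse", "blood_feeder_2", "blood_feeder_3",
               "non_feeder_1", "non_feeder_2"}
BLOOD_FEEDERS = {"tsetse", "blood_feeder_2", "blood_feeder_3"}

# Hypothetical ortholog groups mapped to the species in which a member was found.
presence = {
    "OG1": set(ALL_SPECIES),
    "OG2": ALL_SPECIES - {"tsetse"},
    "OG3": {"tsetse"},
    "OG4": set(BLOOD_FEEDERS),
    "OG5": {"tsetse", "non_feeder_1"},
}

def classify(present: set, focal: str = "tsetse") -> str:
    """Label an ortholog group by its presence/absence pattern."""
    if present == ALL_SPECIES:
        return "universal"
    if present == {focal}:
        return "focal-specific"
    if present == ALL_SPECIES - {focal}:
        # Without a phylogeny this is only a candidate loss.
        return "absent only in focal"
    if present == BLOOD_FEEDERS:
        return "shared by blood feeders only"
    return "patchy"

summary = {og: classify(species) for og, species in presence.items()}
for og, label in sorted(summary.items()):
    print(og, label)
```

Patterns such as "absent only in focal" are candidates for lineage-specific loss, but missing annotations or assembly gaps can produce the same pattern, so they need checking against the raw sequence.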
5
- sequenced 4 bat genomes & compared orthologs across 22 mammals
- used phylogenetic analysis and protein trees to identify cases of lineage-specific evolution
6
To detect convergent evolution, look for proteins with unusual sequence relationships
Found ~2,300 genes with signatures of convergent evolution.
* enriched for genes linked to hearing, ear development, and … vision
7
8
9
Evolutionary Genetics Recap
10
Evolutionary Genetics: Recurring Themes
* Duplication facilitates change
- Duplications can be tandem, segmental, or whole genome
- Most duplications are lost quickly through neutral (or selective) processes
- Facilitates subfunctionalization and neofunctionalization
- Baker et al. 2013 paper: paralog interference could drive evolution
- Benefits of duplication operate at all levels
  - Gene duplication for novel functions
  - Gene duplication for novel regulation
  - Gene duplication for novel network rewiring
  - Regulatory element duplication for novel gene regulation
  - Regulatory protein duplication for novel module regulation
  - Regulatory system duplication for novel network rewiring
11
Evolutionary Genetics: Recurring Themes
* Biological systems are more plastic than we might think
- Much of the genome is under evolutionary constraint: purifying selection removes variation
- Many features of cellular systems appear to evolve, even if the cellular function or output is conserved
stabilizing selection can explain poor conservation of important features, if the cell finds a ‘quick fix’ to maintain the phenotype
Examples: pervasive evidence of positive selection in fly and rodent coding genes … transcription factor binding-site turnover … phospho-site turnover … genetic/protein rewiring??
strongest constraints may promote wholesale rewiring as stabilizing evolution (e.g. rewiring of ribosomal protein regulon)
De novo genes also appear to emerge frequently from the genomic ether
12
Evolutionary Genetics: Recurring Themes
* Evolutionary pressures vary over time and space
Neutral variation can suddenly become advantageous … therefore accumulation of neutral variation can be a future conduit
Deleterious polymorphisms can be stabilized in the presence of other polymorphisms
splitting up alleles by recombination can unmask deleterious alleles
13
Evolutionary Genetics: Recurring Themes
* Use a model for null/neutral expectation for your tests
- Likelihood ratio: comparing how likely one model is versus another
  - QTL analysis
  - motif model vs background model
  - selection model vs neutral model
  - etc.
- Random sampling or simulations to assess what you expect by chance
- More complicated simulations (e.g. coalescence)
This is especially true for whole-genome scans … many things look striking until you do the statistics
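The likelihood-ratio idea on this slide can be illustrated with the simplest nested pair of models: a binomial proportion fixed at a null (neutral) value versus one estimated freely from the data. This is a generic sketch, not the specific test used in any of the studies mentioned; the counts and the null proportion are hypothetical, and the p-value relies on the standard large-sample chi-square approximation with one degree of freedom.

```python
import math

def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood, omitting the binomial coefficient (it cancels in the ratio)."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def likelihood_ratio_test(k: int, n: int, p_null: float):
    """Compare H0: p = p_null against H1: p free (estimated as k/n). Requires 0 < k < n."""
    p_hat = k / n
    stat = 2.0 * (binom_loglik(k, n, p_hat) - binom_loglik(k, n, p_null))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical data: 60 of 100 observations fall in the class of interest,
# while the neutral model predicts a proportion of 0.5.
stat, p_value = likelihood_ratio_test(k=60, n=100, p_null=0.5)
print(f"LRT statistic = {stat:.2f}, approximate p-value = {p_value:.3f}")
```

In a whole-genome scan the same test is run thousands of times, so a p-value like this one would not survive a multiple-testing correction on its own.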
14
Evolutionary Genetics: Recurring Themes
* Value of a phylogenetic perspective
- use the tree if you have one
  * may not be the same tree across the entire genome
- inferring the state of the common ancestor can aid in analysis
Can be very useful for inferring evolutionary trajectory, timing, order of events
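One simple way to infer the state of a common ancestor is parsimony. The sketch below implements only the bottom-up pass of Fitch's algorithm on a rooted binary tree, returning the set of most-parsimonious root states and the minimum number of changes. The tree shape, species names, and trait states are hypothetical, and parsimony is just one of several approaches (likelihood-based reconstruction is a common alternative).

```python
def fitch(tree, leaf_states):
    """Return (candidate state set at this node, minimum changes in the subtree).

    `tree` is either a leaf name (str) or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical rooted tree: (((A, B), C), D)
tree = ((("A", "B"), "C"), "D")
trait = {"A": "present", "B": "present", "C": "absent", "D": "absent"}

root_states, changes = fitch(tree, trait)
print(f"ancestral state(s): {sorted(root_states)}, minimum changes: {changes}")
```

Here the trait is inferred to be absent in the common ancestor with a single gain on the branch leading to A and B, which illustrates how the tree gives an order of events.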
15
Evolutionary Genetics: Recurring Themes
* Control for co-variates
Example: controlling for expression levels when studying the rate of protein evolution
Often hard to know what to even look/control for
* Best evidence if >1 test is significant
* Know your dataset
Know how the data were collected, what types of noise are associated
e.g. genome sequences from short-read deep sequencing, protein-protein interaction data
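Controlling for a covariate can be sketched with a first-order partial correlation: the correlation between two variables after removing the linear effect of a third. In the toy data below (hypothetical, centered, arbitrary units), evolutionary rate and a second gene property both depend on expression level, so their raw correlation is strong but disappears once expression is controlled for. The variable names are illustrative assumptions, and a partial correlation only handles linear effects of a covariate you already know about.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical per-gene values, constructed so that expression drives both traits.
expression = [1, 1, 1, 1, -1, -1, -1, -1]
evol_rate = [-1, -1, -3, -3, 3, 3, 1, 1]      # lower rate at high expression
other_trait = [3, 1, 3, 1, -1, -3, -1, -3]    # higher value at high expression

raw = pearson(evol_rate, other_trait)
controlled = partial_correlation(evol_rate, other_trait, expression)
print(f"raw correlation = {raw:.2f}, controlling for expression = {controlled:.2f}")
```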
16
Evolutionary Genetics: Remaining Questions & Challenges
What is the relative contribution of adaptive vs. neutral evolution?
Epistasis & Environmental interactions
- how much does epistasis contribute in nature?
- challenges associated with gene-gene/gene-environment signals
Detecting signatures of selection, esp. recent/transient
- human evolution
- how will tests, statistics, caveats change with 10,000 genomes?
What is the relative contribution of regulatory vs. coding evolution?
What features contribute to the evolution of new forms and functions?