Multiple Sequence Alignment
![Page 1: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/1.jpg)
http://cs273a.stanford.edu [Bejerano Fall16/17] 1
CS273A
Lecture 17: Cross Species Comparisons
![Page 2: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/2.jpg)
Announcements
• Your project should be coming along nicely!
![Page 3: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/3.jpg)
TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG
![Page 4: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/4.jpg)
Terminology
Orthologs: genes related via speciation (e.g. C, M, H3)
Paralogs: genes related through duplication (e.g. H1, H2, H3)
Homologs: genes that share a common origin (e.g. C, M, H1, H2, H3)
Figure: species tree and gene tree, tracing speciation, duplication and loss events from a single ancestral gene.
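The definitions above can be made operational on a gene tree whose internal nodes are labeled as speciation or duplication events: two genes are orthologs when their most recent common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication node (all of them remain homologs). The Python sketch below assumes such an event-labeled tree is already available. The `Node` class, the helper names and the example topology (an early duplication, with one copy giving C, M and H3, and the other copy lost in C and M but duplicated again into H1 and H2) are hypothetical, chosen only to be consistent with the slide's examples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Gene-tree node; internal nodes carry an event label, leaves a gene name."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, gene: str) -> Optional[List[Node]]:
    """Return the nodes from `node` down to the leaf named `gene`, or None."""
    if node.name == gene:
        return [node]
    for child in node.children:
        sub = path_to(child, gene)
        if sub is not None:
            return [node] + sub
    return None


def lca(root: Node, a: str, b: str) -> Node:
    """Most recent common ancestor of two leaves in the gene tree."""
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise ValueError("gene not found in tree")
    shared = root
    for x, y in zip(pa, pb):
        if x is not y:
            break
        shared = x
    return shared


def relation(root: Node, a: str, b: str) -> str:
    """Orthologs if the LCA is a speciation event, paralogs if a duplication."""
    return "orthologs" if lca(root, a, b).event == "speciation" else "paralogs"


# Hypothetical topology consistent with the slide's examples.
root = Node("anc", "duplication", [
    Node("copy_a", "speciation", [
        Node("C"),
        Node("mh", "speciation", [Node("M"), Node("H3")]),
    ]),
    # The second copy was lost in C and M; in H it duplicated again.
    Node("copy_b", "duplication", [Node("H1"), Node("H2")]),
])

print(relation(root, "M", "H3"))   # orthologs
print(relation(root, "H1", "H3"))  # paralogs
```

Note that H1 and M come out as paralogs here even though they sit in different species: what matters is the event at their common ancestor, not whether the genes are in one genome or two.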
![Page 5: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/5.jpg)
Chains join together related local alignments
Protease Regulatory Subunit 3
likely ortholog
likely paralogs
shared domain?
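Chaining can be framed as a dynamic program over gapless local alignment blocks: a chain is a series of blocks that are colinear in both genomes (each block starts after the previous one ends, in target and in query), scored as the sum of block scores minus a gap cost. The sketch below is an illustrative O(n²) version under assumed inputs; the `Block` tuple, the linear `gap_penalty` and its default value are hypothetical and are not the gap-cost model used by the UCSC chaining tools.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """A gapless local alignment block (half-open coordinates, same strand)."""
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: float


def best_chain(blocks: List[Block], gap_penalty: float = 0.01) -> Tuple[float, List[Block]]:
    """Highest-scoring chain of blocks colinear in both target and query (O(n^2) DP)."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]  # best chain score ending at block i
    prev = [-1] * len(order)         # back-pointer to the previous block in that chain
    for i, bi in enumerate(order):
        for j in range(i):
            bj = order[j]
            # bj may precede bi only if it ends before bi starts in BOTH genomes.
            if bj.t_end <= bi.t_start and bj.q_end <= bi.q_start:
                gap = (bi.t_start - bj.t_end) + (bi.q_start - bj.q_end)
                candidate = best[j] + bi.score - gap_penalty * gap
                if candidate > best[i]:
                    best[i], prev[i] = candidate, j
    i = max(range(len(order)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:                   # follow back-pointers to recover the chain
        chain.append(order[i])
        i = prev[i]
    return total, chain[::-1]


a = Block(0, 100, 0, 100, 100.0)
b = Block(150, 250, 160, 260, 100.0)
c = Block(120, 200, 500, 580, 90.0)   # colinear with a, but far away in the query
total, chain = best_chain([a, b, c])
print(round(total, 1), chain == [a, b])  # 198.9 True
```

Block `c` could follow `a`, but its large query gap makes the a→b chain score higher; `b` and `c` overlap in the target, so they can never be in the same chain.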
![Page 6: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/6.jpg)
Before and After Chaining
![Page 7: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/7.jpg)
Netting Alignments
Commonly, multiple mouse alignments can be found for a particular human region, e.g. for most coding regions.
The net finds the best mouse match for each human region. Highest-scoring chains are used first. Lower-scoring chains fill in gaps within chains, inducing a natural hierarchy.
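The greedy idea behind netting (use the highest-scoring chains first, then let lower-scoring chains fill what is left) can be sketched as a single-coverage assignment of target bases. This is a deliberately flattened, hypothetical version: it keeps only target coordinates, represents each chain as a score plus its aligned target intervals, and does not record the level hierarchy or the query side that a real net keeps.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in target coordinates


def subtract(iv: Interval, taken: List[Interval]) -> List[Interval]:
    """Parts of `iv` not overlapped by any interval already in `taken`."""
    pieces = [iv]
    for ts, te in taken:
        remaining = []
        for s, e in pieces:
            if te <= s or ts >= e:   # no overlap: keep the piece whole
                remaining.append((s, e))
                continue
            if s < ts:               # left remainder
                remaining.append((s, ts))
            if te < e:               # right remainder
                remaining.append((te, e))
        pieces = remaining
    return pieces


def greedy_net(chains: Dict[str, Tuple[float, List[Interval]]]) -> List[Tuple[int, int, str]]:
    """Assign each target base to the best-scoring chain whose blocks cover it."""
    taken: List[Interval] = []
    result = []
    for cid, (_score, blocks) in sorted(chains.items(), key=lambda kv: -kv[1][0]):
        for block in blocks:
            for start, end in subtract(block, taken):
                result.append((start, end, cid))
                taken.append((start, end))
    return sorted(result)


# Hypothetical chains: score plus aligned target intervals.
chains = {
    "top": (1000.0, [(0, 100), (300, 400)]),
    "dup": (200.0, [(50, 150)]),
    "fill": (150.0, [(120, 280)]),
}
print(greedy_net(chains))
# [(0, 100, 'top'), (100, 150, 'dup'), (150, 280, 'fill'), (300, 400, 'top')]
```

In the example, the `dup` chain only keeps the part of its block that `top` left uncovered, and `fill` occupies part of the gap between the two blocks of `top`.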
![Page 8: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/8.jpg)
Net highlights rearrangements
A large gap in the top level of the net is filled by an inversion containing two genes. Numerous smaller gaps are filled in by local duplications and processed pseudo-genes.
![Page 9: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/9.jpg)
Nets attempt to computationally capture orthologs
(they also hide everything else)
![Page 10: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/10.jpg)
Nets/chains can reveal retrogenes (and when they jumped in!)
![Page 11: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/11.jpg)
Nets
• a net is a hierarchical collection of chains, with the highest-scoring non-overlapping chains on top, and their gaps filled in where possible by lower-scoring chains, for several levels.
• a net is single-coverage for target but not for query.
• because it's single-coverage in the target, it's no longer symmetrical.
• the netter has two outputs, one of which we usually ignore: the target-centric net in query coordinates. The reciprocal best process uses that output: the query-referenced (but target-centric / target single-cov) net is turned back into component chains, and then those are netted to get single coverage in the query too; the two outputs of that netting are reciprocal-best in query and target coords. Reciprocal-best nets are symmetrical again.
• nets do a good job of filtering out massive pileups by collapsing them down to (usually) a single level.
• GB: for human inspection always prefer looking at the chains!
[Angie Hinrichs, UCSC wiki]
![Page 12: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/12.jpg)
Before and After Netting
![Page 13: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/13.jpg)
Convert / LiftOver
"LiftOver chains" are actually chains extracted from nets, or chains filtered by the netting process.
LiftOver – batch utility
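LiftOver converts coordinates by walking through the aligned blocks of a chain. A minimal sketch for a single position, assuming one chain given as sorted, non-overlapping, same-strand gapless blocks `(t_start, q_start, size)` in 0-based coordinates (a hypothetical representation, not a parser for the UCSC chain file format): a position inside a block maps by its offset, and a position falling in a gap between blocks does not map. The real tool also handles whole intervals, strands and partial-overlap thresholds, which this sketch ignores.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical representation: (t_start, q_start, size) for one gapless block.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based target position through a chain's sorted blocks."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1      # last block starting at or before pos
    if i < 0:
        return None                        # before the first block
    t_start, q_start, size = blocks[i]
    if pos < t_start + size:
        return q_start + (pos - t_start)   # inside the block: shift by the offset
    return None                            # in a gap between blocks: unmappable


chain_blocks = [(100, 5000, 50), (170, 5060, 30)]
print(lift_position(chain_blocks, 120))  # 5020
print(lift_position(chain_blocks, 160))  # None
```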
![Page 14: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/14.jpg)
Drawbacks
• Inversions not handled optimally
Chains:
> > > > chr1 > > >
> > > > chr1 > > >
< < < < chr1 < < < <
< < < < chr5 < < < <
Nets:
> > > > chr1 > > >
> > > > chr1 > > >
< < < < chr5 < < < <
![Page 15: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/15.jpg)
What nets can’t show, but chains will
![Page 16: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/16.jpg)
Same Region…
same in all the other fish
![Page 17: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/17.jpg)
Drawbacks
• High copy number genes can break orthology
![Page 18: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/18.jpg)
Gene Families
![Page 19: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/19.jpg)
Self Chain reveals (some) paralogs
(self net is meaningless)
![Page 20: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/20.jpg)
The Biggest Challenge in Genomics… is computational:
How does this encode this
Program Output
![Page 21: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/21.jpg)
Xkcd Take – It’s Actually Not That Bad
![Page 22: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/22.jpg)
Why compare to Chimp?
![Page 23: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/23.jpg)
Humans and Chimpanzees Possess Many Vastly Different Phenotypes
A: Chimp, B: Human
[Varki, A. and Altheide, T., Genome Res., 2005]
![Page 24: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/24.jpg)
Disease Susceptibility Differences
![Page 25: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/25.jpg)
What human-chimp changes do we find?
Small
Large
Medium
![Page 26: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/26.jpg)
Large differences
Fusion (HSA 2)
18 pericentromeric inversions
![Page 27: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/27.jpg)
Medium Sized Differences
Gene families expand and contract
Mobile element insertion and mediated deletion
![Page 28: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/28.jpg)
Small Differences
1% difference at the base level
![Page 29: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/29.jpg)
Phenotype / Genotype
Genetic basis of human phenotypes?
Figure axis: number of rearrangements
Most mutations are neutral or nearly neutral. How do we know? 4D (fourfold-degenerate) sites, ARs (ancestral repeats).
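Fourfold-degenerate (4D) sites are third codon positions where, in the standard genetic code, all four bases encode the same amino acid, so substitutions there do not change the protein; their divergence is therefore commonly used as a proxy for the neutral substitution rate. The sketch below finds 4D sites in an in-frame coding sequence and measures divergence at 4D sites shared by two aligned, ungapped coding sequences. The function names are hypothetical, and indels and the alignment step itself are not handled.

```python
from typing import List, Optional

# Codon prefixes (first two bases) whose third position is fourfold degenerate
# in the standard genetic code: Leu CTN, Val GTN, Ser TCN, Pro CCN,
# Thr ACN, Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(cds: str) -> List[int]:
    """0-based indices of fourfold-degenerate third positions in an in-frame CDS."""
    cds = cds.upper()
    return [i + 2 for i in range(0, len(cds) - 2, 3) if cds[i:i + 2] in FOURFOLD_PREFIXES]


def fourfold_divergence(cds_a: str, cds_b: str) -> Optional[float]:
    """Fraction of shared 4D sites (same fourfold prefix in both) whose third base differs."""
    a, b = cds_a.upper(), cds_b.upper()
    sites = diffs = 0
    for i in range(0, min(len(a), len(b)) - 2, 3):
        prefix = a[i:i + 2]
        if prefix in FOURFOLD_PREFIXES and b[i:i + 2] == prefix:
            sites += 1
            diffs += a[i + 2] != b[i + 2]
    return diffs / sites if sites else None


print(fourfold_sites("GCTAAACTG"))                    # [2, 8]
print(fourfold_divergence("GCTAAACTG", "GCAAAGCTG"))  # 0.5
```

The AAA/AAG difference in the example is synonymous too, but it is only twofold degenerate, so it is deliberately left out of the 4D count.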
![Page 30: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/30.jpg)
The Genotype - Phenotype divide
Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?
Species A
Species B
Problem #1:
Too many nucleotide changes between any pair of related species (or individuals).
The vast majority of these are neutral or nearly neutral.
![Page 31: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/31.jpg)
Is it in our protein coding genes?
70-80% of all human-chimp orthologous proteins differ. On average they differ by 1-2 amino acids.
• Which amino acid changes matter?
• One can also compare non-synonymous (amino-acid-changing) substitutions with synonymous changes, and look for proteins unusually enriched for the former. Those may be evolving under positive selection.
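The non-synonymous vs synonymous comparison can be sketched as a simplified Nei-Gojobori-style count. This is a hedged illustration, not the method used in the studies behind these slides: it uses the standard genetic code, counts changes to stop codons as non-synonymous when computing sites, skips codon pairs that differ at more than one position (proper methods average over mutational pathways), and applies no multiple-hit (e.g. Jukes-Cantor) correction. A ratio pN/pS well above 1 is the kind of signal the slide associates with positive selection.

```python
from itertools import product
from typing import Dict, Optional

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop codon).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of a codon: synonymous single-base changes / 3.
    Changes to stop codons count as non-synonymous (a simplification)."""
    aa = CODE[codon]
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    syn += 1
    return syn / 3


def pn_ps(seq1: str, seq2: str) -> Dict[str, Optional[float]]:
    """Sites, observed differences and pN/pS for two aligned in-frame CDSs."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue                    # ambiguous bases or stop codons
        diffs = sum(x != y for x, y in zip(c1, c2))
        if diffs > 1:
            continue                    # multi-difference codons: skipped in this sketch
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1                 # synonymous difference
            else:
                Nd += 1                 # non-synonymous difference
    pN = Nd / N if N else 0.0
    pS = Sd / S if S else 0.0
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd, "pN/pS": pN / pS if pS else None}


result = pn_ps("GCTAAAGAA", "GCCAAGAAA")  # Ala->Ala, Lys->Lys, Glu->Lys
print(result["Sd"], result["Nd"])         # 2.0 1.0
```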
![Page 32: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/32.jpg)
Positive and negative gene selection in the human genome
![Page 33: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/33.jpg)
Candidate genes for human specific evolution
...
![Page 34: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/34.jpg)
What if we did an unbiased search?
Human-specific substitutions in conserved sequences
[Pollard, K. et al., Nature, 2006] [Beniaminov, A. et al., RNA, 2008]
Figure: a region conserved in chimp that shows rapid change in human.
HAR1:
• Novel ncRNA
• 18 unique human substitutions
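A human-specific substitution in a conserved region can be approximated as a position where chimp and all outgroups share the same base (taken as the ancestral state) while human carries a different base. This counting sketch only illustrates the idea on a precomputed gapped alignment; the published HAR screens rely on statistical tests of the substitution rate on the human branch, and the function name, inputs and toy sequences here are hypothetical.

```python
from typing import List


def human_specific_substitutions(human: str, chimp: str, outgroups: List[str]) -> List[int]:
    """Alignment columns where human differs from a base shared by chimp and all outgroups."""
    hits = []
    for i, h in enumerate(human):
        anc = chimp[i]
        if anc in "-N" or h in "-N":
            continue                              # gaps / unknown bases are not substitutions
        if all(o[i] == anc for o in outgroups) and h != anc:
            hits.append(i)                        # ancestral state agreed on, human changed
    return hits


# Hypothetical toy alignment.
print(human_specific_substitutions("ACGTTA", "ACGATA", ["ACGATA", "ACGATG"]))  # [3]
```

Column 5 is not counted because the outgroups disagree there, so the ancestral state is uncertain.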
![Page 35: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/35.jpg)
Different Unbiased Search: Loss vs Gain
Human Accelerated Regions (HARs)
Figure: conserved in chimp, rapid change in human.
• 4-18 unique human substitutions
[Pollard, K. et al., Nature, 2006] [Prabhakar, S. et al., Science, 2008]
Human Conserved Sequence Deletions (hCONDELs)
Figure: conserved in chimp, deleted in human.
• Complete human loss of sequence
• Likely to confer human-specific phenotypes
[McLean, Reno, Pollen et al., Nature, 2011]
![Page 36: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/36.jpg)
Identifying hCONDELs
Figure: a region conserved in chimp, deleted in human.
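The hCONDEL pattern (sequence conserved in other species, including chimp, but absent in human) can be phrased as a filter over per-element alignment coverage. The sketch below assumes a hypothetical precomputed table giving, for each conserved element, the fraction of its bases aligned in each species; the `min_cov` threshold and the species names are illustrative. A real screen must also rule out human assembly gaps and check that the deletion is not polymorphic in humans, which this sketch does not do.

```python
from typing import Dict, List


def find_hcondels(coverage: Dict[str, Dict[str, float]], human: str = "human",
                  chimp: str = "chimp", min_cov: float = 0.9,
                  max_human_cov: float = 0.0) -> List[str]:
    """Elements retained (coverage >= min_cov) in every non-human species but lost in human."""
    hits = []
    for name, cov in coverage.items():
        if cov.get(human, 0.0) > max_human_cov:
            continue          # human still has (part of) the sequence
        if cov.get(chimp, 0.0) < min_cov:
            continue          # not retained in chimp: the loss is not human-specific
        others = [c for sp, c in cov.items() if sp != human]
        if all(c >= min_cov for c in others):
            hits.append(name)
    return hits


# Hypothetical per-species aligned fractions for four conserved elements.
elements = {
    "el1": {"human": 0.0, "chimp": 1.0, "macaque": 0.98, "mouse": 0.95},
    "el2": {"human": 0.0, "chimp": 0.0, "macaque": 1.0, "mouse": 0.9},
    "el3": {"human": 1.0, "chimp": 1.0, "macaque": 1.0, "mouse": 0.97},
    "el4": {"human": 0.0, "chimp": 0.97, "macaque": 0.99, "mouse": 0.4},
}
print(find_hcondels(elements))  # ['el1']
```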
![Page 37: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/37.jpg)
hCONDEL genomic distribution
• Median size: 2.8 kb
• Not enriched in highly variable genomic regions
• Most do not disrupt proteins: only 1 validated exonic deletion
![Page 38: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/38.jpg)
Deletions of functional non-coding DNA
Figure legend: gene with function (e.g. "neuronal gene"), gene without function, hCONDEL, conserved element ( ).
[McLean et al., Nat. Biotechnol., 2010]
http://great.stanford.edu
![Page 39: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/39.jpg)
Functional enrichments of hCONDELs

| Ontology | Term | p-value |
|---|---|---|
| Gene Ontology | Steroid hormone receptor activity | 3.73 × 10⁻⁴ |
| InterPro | Fibronectin, type III | 1.01 × 10⁻⁴ |
| InterPro | Zinc finger, nuclear hormone receptor type | 1.80 × 10⁻⁴ |
| InterPro | CD80-like, immunoglobulin C2 set | 1.37 × 10⁻³ |
| Entrez Gene | Neuronal genes | 1.11 × 10⁻⁴ |
| Monoallelically-Expressed Genes | Monoallelic expression | 8.62 × 10⁻³ |

These enrichments are unique to hCONDELs.
http://great.stanford.edu
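Enrichment p-values like those in the table come from asking whether more hCONDELs than expected fall near genes carrying a given annotation. In the spirit of GREAT's region-based binomial test, if the regulatory domains of the annotated genes cover a fraction p of the genome, the count k of the n input regions landing in those domains is compared with the upper tail of a Binomial(n, p) distribution. The helper below computes that tail probability; the function name and the example numbers are hypothetical, and GREAT's construction of the regulatory domains themselves is not reproduced here.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more of n regions by luck."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 40 input regions, 12 inside the regulatory domains of
# annotated genes, with those domains covering 10% of the genome.
print(binomial_upper_tail(12, 40, 0.10))
```

For many regions and tiny p-values a log-space or library implementation would be more numerically robust than this direct sum.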
![Page 40: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/40.jpg)
hCONDEL near Androgen Receptor
The deletion appears fixed in humans and appears deleted in Neandertal.
![Page 41: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/41.jpg)
Androgen Receptor chimpanzee enhancer assay
[Phil Reno, David Kingsley]
Androgen Receptor
Human
Chimp
Genomic fragment Hsp68 promoter LacZ reporter gene
![Page 42: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/42.jpg)
The human deletion near AR acts as an enhancer within known AR expression domains
Figure: reporter expression driven by the chimp enhancer and the mouse enhancer at E16.5 and 8 weeks, in sensory whiskers, genital tubercle and penile spines.
[Phil Reno, David Kingsley]
![Page 43: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/43.jpg)
Androgen Receptor
Figure labels: cell, nucleus, testosterone (T), androgen receptor (AR), AR+T dimer; androgen receptor locus in human vs chimp.
![Page 44: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/44.jpg)
Androgen responsiveness in domains of expression: sensory whiskers, penile spines
Figure: galago sensory whisker length (mm)
[Dixson, 1976]
Mice with Ar coding region mutations lack penile spines
[Murakami, 1987]
[Ibrahim & Wright 1983]
![Page 45: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/45.jpg)
Could sequence loss lead to tissue gain?
• hCONDELs are enriched for suppressors of cell proliferation or cell migration expressed in cortex (P = 1.3 × 10⁻³)
Figure: non-human mammals (conserved element present): suppress proliferation; humans (element deleted): do not suppress proliferation.
![Page 46: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/46.jpg)
The Genotype - Phenotype divide
Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?
Species A
Species B
Problem #1:
Too many nucleotide changes between any pair of related species (or individuals).
The vast majority of these are neutral or nearly neutral.
![Page 47: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/47.jpg)
Genotype -> Phenotype screens
Figure: a region conserved in chimp, deleted in human.
Define a “dramatic” (non-neutral) genomic scenario:
hCONDEL
[McLean, Pollen, Reno et al, 2011]
Problem #2:
What is the phenotype?
![Page 48: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/48.jpg)
Testing is Exciting… and Humbling
These are "wild rides": often not what we expected, often not what we can understand.
Are we looking at the right place? Did we test at the right time?
[McLean, Pollen, Reno et al, 2011]
We are creating humanized knockout (KO) mice.
![Page 49: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/49.jpg)
What about a tree of related species?
What if we could find evolutionary patterns that were distinct enough to be phenotypically revealing?
ancestor
Species A
Species H
Genomes: inherited and modified.
Traits: come and go.
Species B...
![Page 50: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/50.jpg)
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
What happens when an ancestral trait “goes”?
Phenotype Genome
![Page 51: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/51.jpg)
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype Genome
A lot of DNA and many traits vary between any two species.
![Page 52: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/52.jpg)
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype Genome
A lot of DNA and many traits vary between any two species.
What about independent trait loss?
vitamin C synthesis, tail, body hair, dentition features, etc.
![Page 53: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/53.jpg)
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype Genome
![Page 54: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/54.jpg)
matches trait presence/absence pattern
The PG screen
[Hiller et al., 2012a]
![Page 55: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/55.jpg)
The PG screen
Capture the independent genomic switch from purifying selection to neutral evolution
in all, and only, the trait-loss species.
Robust to:
• Different trait-disabling times.
• Different trait-disabling mutations.
![Page 56: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/56.jpg)
Forward Genetics: search for mutations that segregate with a trait of interest
Forward Genomics: search for regions that are lost only in species lacking the trait
phenotype genotype
Branding ;-)
But does it work?
![Page 57: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/57.jpg)
Vitamin C Synthesis
Rats & mice: synthesize vitamin C
Human: cannot synthesize vitamin C
![Page 58: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/58.jpg)
Vitamin C synthesis was lost 3-4 times independently in mammalian evolution
The Vitamin C synthesis "phenotree"
Forward Genomics asks: do one or more genomic loci look like THAT?
![Page 59: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/59.jpg)
We quantify divergence by comparing sequences to the reconstructed ancestral sequence
Reconstruct the ancestral sequence from species 1, species 2 and an outgroup:
species 1  ACCCTATCGATTGCA
species 2  TCCGTATCG-TT-CA
outgroup   ACTCT-TCGATT-AA
The outgroup resolves each difference: a mutation in species 1 or 2? An insertion in species 1 or a deletion in species 2?
ancestor   ACCCTATCGATT-CA
Compare each species to the ancestor (percent of identical bases):
species 1  ACCCTATCGATTGCA  14 identical bases  93%
species 2  TCCGTATCG-TT-CA  11 identical bases  79% (more diverged)
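The worked example above can be reproduced in Python under two assumptions that are not stated on the slide. First, the ancestor is reconstructed column by column as the majority of species 1, species 2 and the outgroup (a gap counts as a state; a column with no majority becomes `N`), a simple stand-in for a proper phylogenetic reconstruction. Second, percent identity is taken over alignment columns that are not gapped in both sequences; with that denominator the slide's 93% (14/15) and 79% (11/14) are recovered, as are the 89% and 61% of the sequencing-error example that follows.

```python
def reconstruct(sp1: str, sp2: str, outgroup: str) -> str:
    """Column-wise majority of two species plus an outgroup ('-' is a state)."""
    anc = []
    for a, b, o in zip(sp1, sp2, outgroup):
        if a == b or a == o:
            anc.append(a)
        elif b == o:
            anc.append(b)
        else:
            anc.append("N")  # no majority: leave the column ambiguous
    return "".join(anc)


def percent_identity(ancestor: str, species: str) -> float:
    """Identical bases over columns that are not gapped in both sequences."""
    cols = [(x, y) for x, y in zip(ancestor, species) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y and x not in "-N")
    return 100.0 * same / len(cols)


sp1, sp2, out = "ACCCTATCGATTGCA", "TCCGTATCG-TT-CA", "ACTCT-TCGATT-AA"
anc = reconstruct(sp1, sp2, out)
print(anc)                                 # ACCCTATCGATT-CA
print(round(percent_identity(anc, sp1)))   # 93
print(round(percent_identity(anc, sp2)))   # 79
```

The lower identity of species 2 reflects both its two substitutions and its deletion; the insertion in species 1 costs it a single column.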
![Page 60: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/60.jpg)
Sequencing errors mimic divergence
ancestor   ACCCTATCGATT-CAATGG
species 1  ACCCTATCGATTGCAAGGG  89% identical bases
species 2  TCCGTAACG--T-CTATCG  61% identical bases
Sequence quality scores reveal a high sequencing error rate in species 2: treat species 2 as missing data.
![Page 61: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/61.jpg)
Assembly gaps mimic divergence
Figure: a conserved region aligned in species 1-5; in species 1 it falls into an assembly gap (?????????) between Sanger reads.
Treat species 1 as missing data.
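Both fixes (sequencing errors and assembly gaps) amount to the same rule: if a species' sequence for a region is unreliable, record missing data instead of a percent identity. A minimal sketch, assuming per-column Phred quality scores aligned to the sequence and treating `N` or `?` as assembly-gap bases; the `min_q` and `max_bad_frac` cut-offs and the species data are hypothetical, not the values of the published screen.

```python
from typing import Optional, Sequence


def usable(seq: str, quals: Optional[Sequence[int]] = None,
           min_q: int = 20, max_bad_frac: float = 0.1) -> bool:
    """False if too many of the species' bases are assembly-gap or low-quality."""
    positions = [i for i, c in enumerate(seq) if c != "-"]
    if not positions:
        return False
    bad = 0
    for i in positions:
        if seq[i] in "N?":                            # assembly gap / unknown base
            bad += 1
        elif quals is not None and quals[i] < min_q:  # likely sequencing error
            bad += 1
    return bad / len(positions) <= max_bad_frac


# Hypothetical per-species input: aligned sequence and per-column Phred scores.
species = {
    "species1": ("??????ACGT", None),
    "species2": ("ACGTACGTAC", [40] * 8 + [5, 5]),
    "species3": ("ACGTACGTAC", [40] * 10),
}
kept = {name: seq for name, (seq, q) in species.items() if usable(seq, q)}
print(sorted(kept))  # ['species3']
```

Species filtered out here would enter the species × region matrix as missing values rather than as low percent identities.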
![Page 62: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/62.jpg)
Reconstruct the evolutionary history of all conserved regions, coding and non-coding
Figure: for each conserved region, reconstruct the ancestral locus and score each species' percent identity to it (e.g. 85%, 70%, 93%), giving a matrix of 33 species × 544,549 regions.
• Reconstruct ancestral sequence
• Measure extant species divergence
• Avoid:
  • Low-quality sequence
  • Assembly gaps
• Seek perfect phenotree match
![Page 63: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/63.jpg)
We quantify the match to the vitamin C pattern by counting the number of species that violate the pattern
Figure: percent identity (0 to 100) of each species for two example regions, one with 1 violation and one with 2 violations.
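One plausible way to count violations, sketched here as an assumption rather than the exact scoring of Hiller et al.: choose the percent-identity threshold that best separates trait-loss species (expected at or below it) from trait-present species (expected above it), and report how many species still fall on the wrong side. Species with missing data (`None`) are ignored. The species names and values in the usage example are hypothetical.

```python
from typing import Dict, Optional, Set


def count_violations(identity: Dict[str, Optional[float]], lost: Set[str]) -> int:
    """Fewest species on the wrong side of the best single identity threshold."""
    scored = [(pid, sp in lost) for sp, pid in identity.items() if pid is not None]
    # Candidate splits: "<= t" means predicted diverged (trait lost).
    thresholds = [float("-inf")] + sorted({pid for pid, _ in scored})
    best = len(scored)
    for t in thresholds:
        wrong = sum(1 for pid, is_lost in scored
                    if (is_lost and pid > t) or (not is_lost and pid <= t))
        best = min(best, wrong)
    return best


# Hypothetical percent identities; sp_G has missing data.
identity = {"sp_A": 60.0, "sp_B": 55.0, "sp_C": 95.0, "sp_D": 93.0,
            "sp_E": 90.0, "sp_F": 70.0, "sp_G": None}
lost = {"sp_A", "sp_B"}
print(count_violations(identity, lost))                     # 0
print(count_violations(dict(identity, sp_H=58.0), lost))    # 1
```

Running this over every region of the species × region matrix and keeping the regions with 0 violations is the "perfect phenotree match" idea from the previous slide.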
![Page 64: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/64.jpg)
Regions matching the vitamin C trait are clustered
Figure: histogram of the number of violating species (0 = perfect match, 10 = no match) across 544,549 conserved regions. The best-matching conserved regions are all exons of a single gene.
![Page 65: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/65.jpg)
This gene is more diverged in all non-vitamin C synthesizing species
![Page 66: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/66.jpg)
What is the function of this gene ?
Gulo - gulonolactone (L-) oxidase: encodes the enzyme responsible for vitamin C biosynthesis
Vitamin C pattern, 33 genomes × 544,549 regions
Note:
1. No likely shared disabling mutation.
2. We learned about both evolution and function.
![Page 67: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/67.jpg)
The Power of Forward Genomics
Vitamin C pattern
Gulo - gulonolactone (L-) oxidase
33 genomes × 544,549 regions
Forward genomics works. Can it work for continuous traits? With only two independent losses? And many unknown values?
![Page 68: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/68.jpg)
Bile
Bile is a fluid produced by the liver that aids the digestion of lipids in the small intestine.
![Page 69: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/69.jpg)
Bile Phospholipids
Different mammals have remarkably different levels of biliary phospholipids:
![Page 70: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/70.jpg)
ABCB4 is a phospholipid transporter
![Page 71: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/71.jpg)
Find “Cure” Models for Human Disease
Human ABCB4 mutations lower patient biliary phospholipid levels to guinea pig levels but are detrimental. Our discovery: Guinea pig and horse have inactivated the Abcb4 gene in their natural state. How can they do it?
create KO gene
try to fix/treat
Natural KO
find nature’s cure!
![Page 72: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/72.jpg)
We have now collected:
• A million genomic loci × fifty mammals
• Thousands of scored mammalian traits
And we are playing MATCH and TEST.
Reverse Genetics: pick interesting loci, mutate them and try to figure out the phenotype(s)
Reverse Genomics: compute independent loss for ALL genomic loci and match to traits
phenotype genotype
Reverse Genomics
![Page 73: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/73.jpg)
Reverse Genomics of Enhancers
![Page 74: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/74.jpg)
Back of an Envelope Wish
![Page 75: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/75.jpg)
Poster Child Example
![Page 76: Multiple Sequence Alignment](https://reader036.fdocuments.us/reader036/viewer/2022062501/568168f4550346895ddffe1a/html5/thumbnails/76.jpg)