MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu
http://cs173.stanford.edu [BejeranoWinter12/13] 1
CS173
Lecture 14: Personal Genomics, GSEA/GREAT
Announcements
• Coming Monday 3/4 lecture is again in LK101 (see class website for room reminders)
• I’ll be working on grad student admissions; Harendra will lecture about his work (we’ll prepare the ground today)
Quick recap
Sequencing
Public project:
Celera project:
Human Structural Variation
Human Disease
• Cancer
• Congenital defects
• Disease Association studies
• Genic and cis-regulatory contributions
Personal genomics
Gameplan
1. As your budget allows, characterize all the variants in an individual’s genome:
   A) Against the reference genome.
   B) Against variants known in the population.
   C) If possible, against unaffected relatives.
2. Compare the structural variants you observe to the body of knowledge about genome content & function. Seek culprit mutations.
3. Having detected a smoking gun mutation, attempt to recreate it in a cell population or organism to obtain a “disease model”.
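Step 1 of the gameplan can be sketched as set subtraction. This is a minimal illustration only: the `Variant` record, the function name, and the toy variants are all hypothetical, and a real pipeline would typically filter on population allele frequency rather than on strict presence or absence.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position on the reference genome
    ref: str   # reference allele
    alt: str   # allele observed in the individual


def candidate_variants(individual: Set[Variant],
                       population: Set[Variant],
                       unaffected_relatives: Set[Variant]) -> Set[Variant]:
    """Keep variants seen neither in the population nor in unaffected relatives."""
    return individual - population - unaffected_relatives


# Toy data (hypothetical): variants already called against the reference.
individual = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
    Variant("chr7", 42, "G", "GA"),
}
population = {Variant("chr1", 1000, "A", "G")}
relatives = {Variant("chr2", 500, "C", "T")}

candidates = candidate_variants(individual, population, relatives)
print(candidates)
```

Only the variants that survive all three comparisons move on to step 2, where they are compared to what is known about genome function.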
Variant Types
Single Nucleotide Variants (SNVs)
Small Insertions / Deletions (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Targeted Sequencing, or looking under the lamp is 50x cheaper
Capture Methods vs. Shotgun
• Targeted sequencing allows for much higher coverage at less cost
• Will only capture known sites
• These methods also introduce significant capture bias, including failure to capture sites that differ significantly from the reference genome (analogous to microarrays)
Modified from Meyerson et al. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696
[Figure: genomic DNA with Exon 1 and Exon 2, shown as an exome library and as a shotgun library]
Consumer genomics
Gameplan
1. Collect scientific literature about all structural variant correlations with human disease & traits.
2. Genotype customers for as many informative loci as is commercially viable.
3. Offer counseling for your findings, and their meaning.
4. Ask customers to phenotype themselves.
5. Discover new associations!
Pay, send biosample, get genotyped
Trait associations
Side Effects: Serious Ethical Issues
Gene set enrichment analysis: The genic version
Imagine you did a microarray experiment
Cluster all genes for differential expression
[Figure: genes (rows) across Experiment and Control replicates (columns), ordered from the most significantly up-regulated genes, through unchanged genes, to the most significantly down-regulated genes]
Determine cut-offs, examine individual genes
[Figure: the same view of most significantly up-regulated, unchanged, and most significantly down-regulated genes across Experiment and Control replicates]
Genes usually work in groups
Biochemical pathways, signaling pathways, etc.
Asking about the expression perturbation of groups of genes is both more appealing biologically, and more powerful statistically (you sum perturbations).
Ask about whole gene sets
[Figure: Experiment vs. Control expression for Gene Sets 1, 2 and 3, scored with an ES/NES statistic on a + / - axis; gene set 3 is up regulated, gene set 2 is down regulated]
One approach: GSEA
[Figure: number of genes plotted against gene expression level, showing the dataset distribution together with the gene set 1 and gene set 3 distributions]
Another popular approach: DAVID
Input: list of genes of interest (without expression values).
Multiple Testing Correction
Note that statistically you cannot just run individual tests on 1,000 different gene sets. You have to apply further statistical corrections, to account for the fact that even among 1,000 random experiments a handful may look significant by chance.
(E.g., experiment = toss a coin 10 times and ask if it is biased. A fair coin gives an all-heads series with probability 1/1024, so if you repeat the experiment 1,000 times you can easily see one. You mustn’t deduce that the coin is biased.)
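The coin example can be worked through numerically. The sketch below computes the chance of at least one all-heads series among 1,000 fair-coin experiments, and applies a Bonferroni correction (the correction named later in these slides) to a list of p-values. The function names and the example p-values are illustrative only.

```python
from typing import List


def prob_all_heads(n_tosses: int) -> float:
    """Probability that a fair coin lands heads on every one of n_tosses."""
    return 0.5 ** n_tosses


def prob_at_least_one(p_single: float, n_experiments: int) -> float:
    """Probability that an event of probability p_single occurs at least once
    in n_experiments independent repetitions."""
    return 1.0 - (1.0 - p_single) ** n_experiments


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


p_single = prob_all_heads(10)              # 1/1024 for one experiment
p_any = prob_at_least_one(p_single, 1000)  # chance of seeing it at least once
print(f"one experiment: {p_single:.5f}, any of 1,000: {p_any:.3f}")

# Hypothetical raw p-values from two tests
print(bonferroni([0.001, 0.5]))
```

A result that looks rare in a single test is therefore unremarkable across 1,000 tests, which is why the per-test p-values must be corrected before being interpreted.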
What will you test?
Also note that this is a very general approach to test gene lists.
Instead of a microarray experiment you can do RNA-seq.
Instead of up/down-regulated genes you can test all the genes in a personal genome where you see surprising mutations.
Any gene list can be tested.
Gene Sets: Cataloging biological knowledge
Keyword lists are not enough
Sheer number of terms too much to remember and sort
• Need standardized, stable, carefully defined terms
• Need to describe different levels of detail
• So… defined terms need to be related in a hierarchy

With structured vocabularies/hierarchies
• Parent/child relationships exist between terms
• Increased depth -> Increased resolution
• Can annotate data at appropriate level
• May query at appropriate level
[Figure: Anatomy Hierarchy, a tree with nodes including organ system, embryo, cardiovascular, heart, …]
Anatomy keywords: Organ system, Cardiovascular system, Heart
TJL-2004
Annotate genes to most specific terms
1. Annotate at appropriate level, query at appropriate level
2. Queries for higher level terms include annotations to lower level terms
General Implementations for Vocabularies
[Figure: two implementations side by side. Hierarchy: a tree with nodes including organ system, embryo, cardiovascular, heart, … DAG: molecular function, with nodes chaperone regulator, enzyme regulator, chaperone activator, enzyme activator, …]
Query for this term: returns things annotated to descendants
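The "query returns things annotated to descendants" rule can be sketched with a small DAG. Everything below is a toy: the parent edges among the slide's term names are an assumption about how the figure connects them, and the gene names and annotations are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# child term -> parent terms (assumed edges; a DAG lets a term have two parents)
PARENTS: Dict[str, List[str]] = {
    "chaperone regulator": ["molecular function"],
    "enzyme regulator": ["molecular function"],
    "enzyme activator": ["enzyme regulator"],
    "chaperone activator": ["chaperone regulator", "enzyme activator"],
}

# Hypothetical genes, each annotated to its most specific term
ANNOTATIONS: Dict[str, List[str]] = {
    "geneA": ["chaperone activator"],
    "geneB": ["enzyme activator"],
    "geneC": ["chaperone regulator"],
}


def build_children(parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the child -> parents map into parent -> children."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(child)
    return children


def descendants(term: str, children: Dict[str, Set[str]]) -> Set[str]:
    """All terms reachable below `term`, found by depth-first traversal."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def query(term: str) -> Set[str]:
    """Genes annotated to `term` or to any of its descendants."""
    wanted = descendants(term, build_children(PARENTS)) | {term}
    return {gene for gene, terms in ANNOTATIONS.items() if wanted & set(terms)}


print(sorted(query("enzyme regulator")))
print(sorted(query("molecular function")))
```

Because genes are annotated only at their most specific term, a query for a higher-level term has to walk down the graph to collect them.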
Gene Sets
• Gene Ontology (“GO”)
  – Biological Process
  – Molecular Function
  – Cellular Component
• Pathway Databases
  – KEGG
  – BioCarta
  – Broad Institute
Other Gene Sets
• Transcription factor targets
  – All the genes regulated by particular TFs
• Protein complex components
  – Sets of genes whose protein products function together
• Ion channel receptors
• RNA / DNA Polymerase
• Paralogs
  – Families of genes descended (in eukaryotic times) from a common ancestor
Natural Language Processing (NLP) Opportunities
[Figure: Literature, Genes, Ontology]
Map genes to ontology using literature
Gene set enrichment analysis: The gene regulatory version
Combinatorial Regulatory Code
2,000 different proteins can bind specific DNA sequences.
A regulatory region encodes 3-10 such protein binding sites.
When all are bound by proteins the regulatory region turns “on”, and the nearby gene is activated to produce protein.
[Figure: proteins bound to protein binding sites on DNA, next to a gene]
ChIP-Seq: first glimpses of the regulatory genome in action
[Figure: peak calling, with a cis-regulatory peak marked]
What is the transcription factor I just assayed doing?
[Figure: cis-regulatory peaks shown relative to gene transcription start sites]
• Collect known literature of the form
  • Function A: Gene1, Gene2, Gene3, ...
  • Function B: Gene1, Gene2, Gene3, ...
  • Function C: ...
• Ask whether the binding sites you discovered are preferentially binding (regulating) any one or more of the functions listed above.
• Form hypothesis and perform further experiments.
Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile
[Figure legend: gene transcription start site; SRF binding ChIP-seq peak]
• ChIP-seq identified 2,429 SRF binding peaks in human Jurkat cells [1]
• SRF is known as a “master regulator of the actin cytoskeleton”
• In the ChIP-Seq peaks, we expect to find binding sites regulating (genes involved in) actin cytoskeleton formation.
Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile
Existing, gene-based method to analyze enrichment:
• Ignore distal binding events.
• Count affected genes.
• Rank by enrichment hypergeometric p-value.

[Figure legend: gene transcription start site; SRF binding ChIP-seq peak; π = ontology term (e.g. ‘actin cytoskeleton’)]

N = 8 genes in genome
K = 3 genes annotated with π
n = 2 genes selected by proximal peaks
k = 1 selected gene annotated with π

P = Pr(k ≥ 1 | n = 2, K = 3, N = 8)
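The gene-based p-value on this slide is a hypergeometric upper tail. A minimal sketch using only the standard library follows; the function name is ours, and the inputs are the slide's toy numbers (N = 8, K = 3, n = 2, k = 1).

```python
from math import comb


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Slide example: 8 genes, 3 annotated, 2 selected by proximal peaks, 1 of them annotated
p_value = hypergeom_tail(k=1, n=2, K=3, N=8)
print(f"P = {p_value:.4f}")  # 18/28
```

With such a small toy genome the tail probability is large (18 of the 28 possible gene pairs contain an annotated gene), so this selection would not count as enriched.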
We have (reduced ChIP-Seq into) a gene list! What is the gene list enriched for?
[Figure: a microarray tool, built to take microarray data, is given gene regulation data instead]
Pro: A lot of tools out there for the analysis of gene lists.
Con: These tools are built for microarray analysis. Does it matter?
SRF Gene-based enrichment results
• Original authors can only state: “basic cellular processes, particularly those related to gene expression” are enriched [1]
SRF acts on genes both in nucleus and cytoplasm, that are involved in transcription and various types of binding
Where’s the signal? Top “actin” term is ranked #28 in the list.
[1] Valouev A. et al., Nat. Methods, 2008
Associating only proximal peaks loses a lot of information
[Figure: Relationship of binding peaks to nearest genes for eight human (H) and mouse (M) ChIP-seq datasets. x-axis: distance to nearest transcription start site (kb), in bins 0-2, 2-5, 5-50, 50-500, >500. y-axis: fraction of all elements, 0 to 0.7. Datasets: SRF (H: Jurkat), NRSF (H: Jurkat), GABP (H: Jurkat), Stat3 (M: ESC), p300 (M: ESC), p300 (M: limb), p300 (M: forebrain), p300 (M: midbrain)]
Restricting to proximal peaks often leads to complete loss of key enrichments
Bad Solution: Associating distal peaks brings in many false enrichments
SRF ChIP-seq set has >2,000 binding events.
Throw a random set of 2,000 regions at the genome. What do you get from a gene list analysis?

| Term | Bonferroni corrected p-value |
|---|---|
| nervous system development | 5 × 10^-9 |
| system development | 8 × 10^-9 |
| anatomical structure development | 7 × 10^-8 |
| multicellular organismal development | 1 × 10^-7 |
| developmental process | 2 × 10^-6 |

Why bad? 14% of human genes are tagged ‘multicellular organismal development’. But 33% of base pairs have such a gene nearest upstream/downstream.
Large “gene deserts” are often next to key developmental genes
Real Solution: Do not convert to gene list.Analyze the set of genomic regions
[Figure legend: gene transcription start site; gene regulatory domain; genomic region (ChIP-seq peak); π = ontology term (‘actin cytoskeleton’)]

p = 0.33 of genome annotated with π
n = 6 genomic regions
k = 5 genomic regions hit annotation π

P = Pr_binom(k ≥ 5 | n = 6, p = 0.33)

Since 33% of base pairs are near a ‘multicellular organismal development’ gene, we now expect 33% of genomic regions to hit this term by chance. => Toss 2,000 random regions at genome, get NO (false) enrichments.

GREAT = Genomic Regions Enrichment of Annotations Tool
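The region-based test on this slide is a binomial upper tail. The sketch below evaluates it for the slide's toy numbers (n = 6 regions, k = 5 hits, p = 0.33); the function name is ours, and this is an illustration of the formula rather than GREAT's own implementation.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n
    regions land in the annotated fraction p of the genome."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Slide example: 6 genomic regions, 5 of them hit a term covering 33% of the genome
p_value = binom_tail(k=5, n=6, p=0.33)
print(f"P = {p_value:.4f}")

# Under the null, the expected number of hits among 2,000 random regions is n * p
print(2000 * 0.33)
```

Because p is the fraction of base pairs annotated with the term, random regions hit a term at exactly the rate the test expects, which is why tossing 2,000 random regions at the genome produces no false enrichments.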
How does GREAT know how to assign distal binding peaks to genes?
Future: High-throughput assays based on chromosome conformation capture (3C) methods will elucidate complex regulation mechanisms
Currently: Flexible computational definitions allow assignment of peaks to nearest gene, nearest two genes, etc.
• Default: each gene has a “basal regulatory domain” of 5 kb upstream and 1 kb downstream of its transcription start site; the domain then extends in both directions to the basal domains of the nearest genes, but no more than 1 Mb
• Though some associations may be missed or incorrect, in general signal richness and robustness are greatly improved by associating distal peaks
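The default rule can be sketched for genes on a single chromosome. This is a simplified reading of the rule as stated on the slide, not GREAT's actual code: the `Gene` record, the function names, the gene coordinates, and the choice to measure the 1 Mb cap from the transcription start site are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


UPSTREAM = 5_000           # basal domain: 5 kb upstream of the TSS
DOWNSTREAM = 1_000         # basal domain: 1 kb downstream of the TSS
MAX_EXTENSION = 1_000_000  # extension cap: 1 Mb (measured from the TSS here)


def basal_domain(gene: Gene) -> Tuple[int, int]:
    """Strand-aware basal domain as a half-open interval [start, end)."""
    if gene.strand == "+":
        return max(0, gene.tss - UPSTREAM), gene.tss + DOWNSTREAM
    return max(0, gene.tss - DOWNSTREAM), gene.tss + UPSTREAM


def regulatory_domains(genes: List[Gene]) -> Dict[str, Tuple[int, int]]:
    """Extend each basal domain until it meets a neighbour's basal domain
    or reaches the 1 Mb cap, whichever comes first."""
    ordered = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g) for g in ordered]
    domains: Dict[str, Tuple[int, int]] = {}
    for i, gene in enumerate(ordered):
        start, end = basal[i]
        left_limit = max(0, gene.tss - MAX_EXTENSION)
        if i > 0:
            left_limit = max(left_limit, basal[i - 1][1])
        right_limit = gene.tss + MAX_EXTENSION
        if i + 1 < len(ordered):
            right_limit = min(right_limit, basal[i + 1][0])
        # Never shrink below the basal domain, even if neighbours overlap it
        domains[gene.name] = (min(start, left_limit), max(end, right_limit))
    return domains


def genes_for_peak(position: int, domains: Dict[str, Tuple[int, int]]) -> List[str]:
    """Names of all genes whose regulatory domain contains the peak position."""
    return sorted(name for name, (s, e) in domains.items() if s <= position < e)


# Hypothetical genes on one chromosome
genes = [Gene("A", 100_000, "+"), Gene("B", 400_000, "-"), Gene("C", 3_000_000, "+")]
domains = regulatory_domains(genes)
print(domains)
print(genes_for_peak(250_000, domains))    # between A and B: assigned to both
print(genes_for_peak(1_700_000, domains))  # more than 1 Mb from any TSS: unassigned
```

In this sketch a distal peak between two genes is associated with both, while a peak deep inside a gene desert, beyond the 1 Mb cap, is left unassigned.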
GREAT infers many specific functions of SRF from its binding profile
Top GREAT enrichments of SRF:

| Ontology | Term | # Genes | Binomial P-value | Experimental support* |
|---|---|---|---|---|
| Gene Ontology | actin cytoskeleton | 30 | 7 × 10^-9 | Miano et al. 2007 |
| Gene Ontology | actin binding | 31 | 5 × 10^-5 | Miano et al. 2007 |
| Pathway Commons | TRAIL signaling | 32 | 5 × 10^-7 | Bertolotto et al. 2000 |
| Pathway Commons | Class I PI3K signaling | 26 | 2 × 10^-6 | Poser et al. 2000 |
| TreeFam | FOS gene family | 5 | 1 × 10^-8 | Chai & Tarnawski 2002 |
| TF Targets | Targets of SRF | 84 | 5 × 10^-76 | Positive control |
| TF Targets | Targets of GABP | 28 | 4 × 10^-9 | ChIP-Seq support |
| TF Targets | Targets of YY1 | 44 | 1 × 10^-6 | Natesan & Gilman 1995 |
| TF Targets | Targets of EGR1 | 23 | 2 × 10^-4 | |

* Known from literature – as in function is known, SOME of the genes are known, and the binding sites highlighted are NOT.

Top gene-based enrichments of SRF: top actin-related term 28th in list.

Similar results for GABP, NRSF, Stat3, p300 ChIP-Seq [McLean et al., Nat Biotechnol., 2010]
Limb P300: I was blind and I can see
Gene List
GREAT works with ANY cis-regulatory rich set
Example: GWAS Compendium set
Height-associated unlinked SNPs
GREAT analysis of histone mark combinations
GREAT includes multiple ontologies
Michael Hiller
• Twenty ontologies spanning broad categories of biology
• 44,832 total ontology terms tested in each GREAT run
[Figure: the twenty ontologies, with per-ontology term counts of 2,800; 5,215; 834; 5,781; 427; 456; 150; 1,253; 288; 706; 6,700; 3,079; 911; 615; 19; 222; 9; 6,857; 8,272; 238]
Advantages of the GREAT approach
Tailored to the biology of gene regulation:
• Distal sites are incorporated, not ignored
• Variable length gene regulatory domains
• Multiple bindings next to same target gene rewarded
• Extensive ontologies, some home-made
Algorithmic Optimization: (A) make it work; (B) make it efficient
Enter GREAT.stanford.edu
Choose genome
Input peak list
Hit submit!
GREAT web app: (Optional) alter association rules
http://great.stanford.edu
Three association rule choices
Literature-curated domains for a small subset of genes
[Figure: Lnp, Evx2, HoxD cluster; adapted from Spitz, Gonzalez, & Duboule, Cell, 2003]
GREAT web app: output summary
Ontology-specific enrichments
Additional ontologies, term statistics, multiple hypothesis corrections, etc.
Cool visualization opportunities!
GREAT web app: term details page
Genes annotated as “actin binding” with associated genomic regions
Genomic regions annotated with “actin binding”
Drill down to explore how a particular peak regulates Plectin and its role in actin binding
You can also submit any track straight from the UCSC Table Browser
A simple, well documented programmatic interface allows any tool to submit directly to GREAT. (See our Help / Inquiries welcome!)
GREAT web app: export data
HTML output displays all user selected rows and columns
Tab-separated values also available for additional postprocessing
GREAT Web Stats
200-400 job submissions per day, from 7,000 IP addresses
Adding a new species to GREAT
We need:
1. A good assembly
2. A high quality gene set
3. Good gene annotations*

*Most valuable for species with independent annotations!
Adapting GREAT for zebrafish
We need:
1. A good assembly
2. A high quality gene set
3. Good gene annotations

| Assembly | # Scaffolds | Avg. scaffold length | # Assembly gaps |
|---|---|---|---|
| Zv8 | 11,724 | 129 Kb | ~55,000 |
| Zv9 | 1,133 | 1,250 Kb | ~27,000 |

Zv9 = UCSC danRer7
Older assemblies? liftOver to Zv9/danRer7
Adapting GREAT for zebrafish
We need:
1. A good assembly
2. A high quality gene set
3. Good gene annotations

• Carefully combine (95% identity, 80% coverage): RefSeq transcripts, Ensembl coding genes, RefSeq proteins, UniProt proteins
• Obtain 14,567 genes, all with ZFIN gene identifiers
• Using only RefSeq would miss 1,912 annotated genes
• Using only Ensembl would miss 1,218 annotated genes
Adapting GREAT for zebrafish
We need:
1. A good assembly
2. A high quality gene set
3. Good gene annotations

Curate zebrafish:
• Gene Ontology (GO) - Function, Process, Cellular Component
• ZFIN Phenotype
• Wiki Pathways
• ZFIN Wildtype Expression
• InterPro - protein domains, families and functional sites
• TreeFam - gene families of paralogs
96% of our gene set is annotated
| Category | Ontology | Genes | Terms* | Annotations |
|---|---|---|---|---|
| Gene Ontology | Molecular Function | 10,520 | 1,697 | 80,788 |
| Gene Ontology | Biological Process | 8,174 | 3,597 | 152,595 |
| Gene Ontology | Cellular Component | 7,138 | 592 | 68,922 |
| Phenotype Data | ZFIN Phenotype | 671 | 11,835 | 57,976 |
| Pathway Data | Wiki Pathways | 1,754 | 105 | 3,622 |
| Gene Expression | ZFIN Wildtype Expression | 8,421 | 9,812 | 888,189 |
| Gene Families | InterPro | 12,746 | 6,667 | 43,994 |
| Gene Families | TreeFam | 11,324 | 6,010 | 11,338 |
| Total | | 14,038 | 40,315 | 1,307,424 |

(Slide label: Unfolded)
* At least one gene is annotated with the term