1
3:50-4:20 PM GC at CGC 9-Jun-2009
Thanks to:
The $0 Genome & PersonalGenomes.org
Azco
RBH
2
What does $0 to the consumer mean?
1991 Linux
1993 WWW
2001 Wikipedia
1998 Google Search, Maps, Translate, Health..
3
Specifications other than cost
1. Speed (really real-time)
2. No reagents or stable in harsh conditions
3. Portability (Instrument size)
4. Read length (Mbp)
5. Keep DNA parts together in mixtures
6. Subsequence targeting (e.g. drug resistance)
emulsion, oil, H2O
Microbe chromosomes
barcode
4
DNA Explorer, $80 (Ages 10 and up) www.discovery.com
Genographic Project $99
DIY Bio
23andme $399
Time Magazine Nov 2008 invention of the year
5
DTC SNP chips: Breast Cancer
deCODEme: “does not include the high-risk but rare BRCA1 and BRCA2 breast cancer risk variants”.
Navigenics: “Mutations in BRCA1 or BRCA2 are less common in the population and are only present in approximately 5 – 10% of families with breast and ovarian cancer.”
23andme: “Hundreds of cancer-associated BRCA1 and BRCA2 mutations have been documented, but three specific BRCA mutations are worthy of note because they are responsible for a substantial fraction of hereditary breast cancers and ovarian cancers among women with Ashkenazi Jewish ancestry”.
1M vs 3G
6
“Genes Show Limited Value in Predicting Diseases”
Nicholas Wade April 15, 2009
David B. Goldstein, Ph.D.: “We must therefore turn more sharply toward the study of rare variants.”
(Common SNP backlash)
7
Valuable Personal Genome Sequences
1464 genes are highly predictive & medically actionable (inherited & cancer) at ~$2K per gene.
**Very few of these are on SNP chips.** Why?
PKU, Tay Sachs, Cystic Fibrosis, BRCA1/2, etc.
Pharmacogenomic drug/allele combinations: Herceptin, Iressa, ..
Also: Ancestry, Forensics, Social Networking, Education, Research
8
Multigenic rare causative alleles can yield strong or weak GWA with a common allele
Strong GWA: Cases, Controls
Weak GWA: Cases, Controls
Red=haplotype block
9
[Figure: Seq bp/$ (log scale, 0.01 to 10,000,000) vs year (1980 to 2010)]
(Moore’s law) 1.5x/yr for electronics vs 10x/yr for DNA sequencing
4 logs in 4 years
1995: gel: $3G
2005: capil: $50M
Pol: $50K
2009: Lig: $5K
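The "4 logs in 4 years" claim can be checked from the labeled points. The sketch below is a minimal illustration, assuming the dollar labels are per-genome costs (the deck elsewhere mentions a "$5K genome"); the function name `annual_improvement` is hypothetical and not from the talk.

```python
def annual_improvement(cost_start, cost_end, year_start, year_end):
    """Geometric-mean yearly factor by which cost fell between two labeled points.

    Assumption: the slide's dollar figures are costs for the same unit of work
    (one genome), so their ratio is the total improvement over the interval.
    """
    years = year_end - year_start
    if years <= 0 or cost_start <= 0 or cost_end <= 0:
        raise ValueError("need positive costs and year_end > year_start")
    return (cost_start / cost_end) ** (1.0 / years)


if __name__ == "__main__":
    # Points as labeled on the slide: 1995 gel $3G, 2005 capillary $50M, 2009 ligation $5K.
    print(annual_improvement(3e9, 50e6, 1995, 2005))
    print(annual_improvement(50e6, 5e3, 2005, 2009))
```

Under that assumption, the 2005 to 2009 interval gives about 10x per year (four orders of magnitude in four years), while the 1995 to 2005 interval gives roughly 1.5x per year, close to the Moore's-law rate quoted on the slide.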
10
Ultra-low-cost sequencing
1. Polonator (SbL/P): Open-source $170K device, haplotypes
2. Roche-454 (SbP): Long reads (>0.4 kb)
3. Illumina-GA (SbP): Fluorescent read-length 2*110 bp
4. AB-SOLiD (SbL): Longest ligation reads
5. Helicos (SbP): High parallelism & quantitation
6. CGI (SbL): Rolony grid & 100Kb haplotypes, $5K genome
7. Ion Torrent (SbP): Potentially small device
8. Genizon BioSci (SbH): In situ sequencing
9. LightSpeed (SbL): 16X higher density, >10X speed
10. Intelligent Bio (SbP): Hexagonal grid
11. Pacific Bio (SbP): Long reads (>2.0 kb)
12. Bionanomatrix (SbP): Fluorescent mapping (>300kb)
13. Visigen (SbP)
14. Oxford Nanopore (Pore): Potentially small device
15. Nabsys (Pore): Potentially small device
16. Halcyon (EM): Long reads (>300kb)
17. ZS Genetics (EM): Long reads (>300kb)
Polonator
11
Sequencing mC
G T A C
Clarke, Bayley, et al. Nature Nanotech 2009
12
Electron Microscopy
YOYO labeled stretched ss-M13-DNA on PDMS
15 μm = 30 kb
Pt-G-ssDNA 0.5 nm = 1 base
William Andregg, et al. unpublished, 2009
.gg...gg....g..g.....gg.n....g...g...g..ggg.....gg.gg....n..
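The two scale labels on this slide agree with each other, which a short calculation shows. The sketch below is illustrative only: the function names are hypothetical, and reading the dotted trace as "g = labeled G, n = uncalled, . = other base" is an assumption, not something stated on the slide.

```python
def nm_per_base(length_um, length_kb):
    """Stretched length per base in nanometres, from a length in micrometres and kilobases."""
    if length_kb <= 0:
        raise ValueError("length_kb must be positive")
    return (length_um * 1000.0) / (length_kb * 1000.0)


def labeled_g_positions(trace):
    """Indices of 'g' characters in a dotted read-out.

    Assumption: 'g' marks a Pt-labeled G; '.' and 'n' are ignored.
    """
    return [i for i, ch in enumerate(trace) if ch == "g"]


if __name__ == "__main__":
    print(nm_per_base(15, 30))            # 15 um over 30 kb
    print(labeled_g_positions(".gg.n"))   # toy trace, not the slide's data
```

With the slide's numbers, 15 μm spread over 30 kb works out to 0.5 nm per base, matching the "0.5 nm = 1 base" label.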
13
Electron Microscopy
Osmium T
William Andregg, et al. unpublished, 2009
10 vs 10K fps
14
Why open-architecture hardware, software, wetware?
Polonator
1999-2009
$170K
2 billion reads per run
Precedents:
1981 IBM PC
1991 Linux
1993 WWW
2001 Wikipedia
Rich Terry
Figure 4.6.1 Polonator instrument
A shared resource: Pol & Ligase chemistries
15
Anonymity vs Open-access? Are we in denial?
Trends in laws to make data public (not just at elite institutions): e.g. H.R. 2764, SEC. 218. 26-Dec-07 open-access publishing for all NIH-funded research.
(12) Identifying individual case/control status from pooled SNP data (Homer et al., PLoS Genetics 2008). As this became known, NCBI pulled dbGaP data.
(11) Re-identification after “de-identification” using public data. A Group Insurance list of birth date, gender, and zip code was sufficient to re-identify medical records of Governor Weld & family via voter-registration records (1998).
Self identification trend
(10) Unapproved self-identification, e.g. Celera IRB (Kennedy, Science 2002).
(9) Obtaining data about oneself via FOIA or sympathetic researchers.
(8) DNA data CODIS data in the public domain.
even if acquitted
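Item (11) describes a linkage attack: records stripped of names are joined to a public list on birth date, gender, and zip code. A minimal sketch of that join follows. All records and field names here are hypothetical toy data, not the 1998 data, and the function `reidentify` is an illustration rather than the method used in that case.

```python
from collections import defaultdict

# Quasi-identifiers named on the slide.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def reidentify(deidentified, public_roll):
    """Attach names to 'de-identified' rows whose quasi-identifier triple
    matches exactly one person in the public roll."""
    names_by_key = defaultdict(list)
    for person in public_roll:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        names_by_key[key].append(person["name"])

    matches = []
    for record in deidentified:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        candidates = names_by_key.get(key, [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0], record["diagnosis"]))
    return matches


# Hypothetical toy data for illustration only.
medical = [
    {"birth_date": "1950-01-01", "gender": "M", "zip": "00001", "diagnosis": "condition A"},
    {"birth_date": "1960-02-02", "gender": "F", "zip": "00002", "diagnosis": "condition B"},
]
voters = [
    {"name": "Person One", "birth_date": "1950-01-01", "gender": "M", "zip": "00001"},
    {"name": "Person Two", "birth_date": "1960-02-02", "gender": "F", "zip": "00002"},
    {"name": "Person Three", "birth_date": "1960-02-02", "gender": "F", "zip": "00002"},
]

if __name__ == "__main__":
    print(reidentify(medical, voters))
```

In the toy data the first record is re-identified because its triple is unique in the roll, while the second stays ambiguous because two people share it. The attack succeeds to the extent that such triples are unique in the population.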
16
Anonymity vs Open Access? Are we in denial?
Accessing “Secure data”
(7) Laptop loss. 26 million Veterans' medical records, SSN & disabilities stolen Jun 2006.
(6) Hacking. A hacker gained access to confidential medical info at the U. Washington Medical Center -- 4000 files (names, conditions, etc, 2000)
(5) Combination of surnames from genotype with geographical info. An anonymous sperm donor traced on the internet 2005 by his 15 year old son who used his own Y chromosome data.
(4) Identification by phenotype. If CT or MR imaging data is part of a study, one could reconstruct a person’s appearance. Even blood chemistry can be identifying in some cases.
(3) Inferring phenotype from genotype. Markers for eye, skin, and hair color, height, weight, geographical features, dysmorphologies, etc. are known & the list is growing.
(2) “Abandoned DNA”-bearing samples (e.g. hair, dandruff, hand-prints, etc.)
(1) Government subpoena. False positive IDs and/or family coercion
index
17
Who can contribute to cures?
Huntington's: Nancy Wexler (psychologist)
Adrenoleukodystrophy: Odone (World Bank)
Parkinson’s: Brin family
Hugh Rienhoff (MD)
MyDaughtersDNA.org
ALS: Jamie Heywood (engineer), PatientsLikeMe.com
Motivating, donating data ... access to data?
LRRK2 G2019S
HFE: Aull (engineer)
18
Genes, environments, traits, cells
1) First/only open access data
2) Avoid over-promising on de-identification
3) 100% on Exam to assure informed consent (*Educate pre-consent rather than post-discovery*)
4) Low cost coding sequence + regulatory data
5) Multi-traits: images, iPS-etc. RNA, microbe/VDJ
6) Cells available for personal functional genomics
7) IRB approval for 100,000 diverse volunteers
501(c)(3)
[Figure labels: 0431, 1070, 1660, 1677, 1687, 1833, 1846, 1731, 1730, 1781, 1919]
Traitomatic: 7 diploid + 10 PGP sequences: hypertrophic cardiomyopathy allele
20
Diagnostics Systems Biology Challenge
TRAITS (Phenome)
Genome 6 Gbp
3M Alleles
NOT going from ONLY Genome Sequence to Prediction
21
PersonalGenomes.org
Inherited, Somatic, Environmental Genomics
VDJ-ome
TRAITS (Phenome)
Personal stem-cells
epigenome (RNA, mC)
PERSONAL GENOME
6 Gbp
3M alleles
Once-in-a-lifetime genome + yearly (to daily) tests
Public Health Bio-weathermap.org: Allergens, Microbes, Viruses
Microbiome
~5 new non-synonymous alleles per generation
22
Microbiome vs VDJ-ome
Microbe tests: Detect drug resistance spectrum
Earlier warning (e.g. meningitis)
Immune tests: Focus on response to exposure
Longer times to detect exposure (e.g. HIV, TB)
23
Multiple Phyla Subsisting on 18 Antibiotics
Dantas, Sommer, Church. Science 2008
(& lignin)
24
PersonalGenomes.org
Inherited, Somatic, Environmental Genomics
VDJ-ome
TRAITS (Phenome)
Personal stem-cells
epigenome (RNA, mC)
PERSONAL GENOME
3M alleles
Once-in-a-lifetime genome + yearly (to daily) tests
Public Health Bio-weather map: Allergens, Microbes, Viruses
Microbiome
25
Epigenome: DNA - RNA - Protein
Regulatory RNA & Proteins
26
Selective genome sequencing
Shendure, et al. Science 309:1728
Porreca et al. 2007 Nat Methods 4:931
Nilsson et al. (2006) Trends Biotechnol 24:83.
Red=Synthetic; Yellow=genome/cDNA
Optimize 258K oligos: 148,949 exons, 20,065 CCDS genes.
3 ways to capture alleles from genomic or c-DNA
In vitro Paired-end-tags (PET)
Science 2005
Science 2005
Hybridiz. selection
Zhang, Chou, Shendure, Li, Leproust, Dahl, Davis, Nilsson, Church
For rearrangements
2. 3.1.
GapFill
Nat Methods 2007
3.
27
Array Synthesis of Padlock Probes
barcodes
28
Barcoding RNAs
Efficient microRNA capture and barcoding via enzymatic oligonucleotide adenylation. Vigneault et al. Nature Methods 2009
[Figure: adapter adenylation and ligation scheme; labels: PO4, App, 3’-OH, 5’, X3’, ATP, T4 RNA ligase]
29
RNA editing: A to I (G)
# of known cases increased from 10 to 569
Erez Levanon
Genomic DNA
RNA - intestine
RNA - kidney
RNA - diencephalon
RNA - frontal lobe
RNA - corpus callosum
RNA - cerebellum
Li, Levanon, Yoon, Aach, Xie, LeProust, Zhang, Gao, Church (Science 2009)
e.g. VEZF1
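Because inosine is read as G in cDNA, an A-to-I editing candidate shows up as an A in genomic DNA paired with a G in RNA from one or more tissues. The sketch below illustrates that comparison only. It assumes pre-aligned, equal-length sequences, the toy sequences are hypothetical, and the function name is not from the cited paper; a real analysis would also need to rule out genomic SNPs and sequencing errors.

```python
def candidate_editing_sites(genomic, cdna_by_tissue):
    """Map position -> tissues where genomic A is read as G in cDNA.

    Assumption: every cDNA string is already aligned to `genomic`
    base-for-base (same length, no gaps).
    """
    sites = {}
    for tissue, cdna in cdna_by_tissue.items():
        if len(cdna) != len(genomic):
            raise ValueError("sequences must be pre-aligned to equal length")
        for pos, (dna_base, rna_base) in enumerate(zip(genomic.upper(), cdna.upper())):
            if dna_base == "A" and rna_base == "G":
                sites.setdefault(pos, []).append(tissue)
    return sites


# Hypothetical toy sequences; tissue names follow the slide's labels.
genomic = "ACGTAAGC"
tissues = {
    "intestine": "ACGTAAGC",      # unedited
    "cerebellum": "ACGTAGGC",     # A->G at position 5
    "frontal lobe": "GCGTAGGC",   # A->G at positions 0 and 5
}

if __name__ == "__main__":
    print(candidate_editing_sites(genomic, tissues))
```

Comparing several tissues against the same genomic DNA, as the slide's panel does, separates tissue-specific editing from a fixed genomic difference.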
30
Regulation & Methylation
High Expression = High Gene-Body to Promoter Ratio
Ball, Li, Gao, Lee, LeProust, Park, Xie, Daley, Church. (Nature Biotech 2009)
Genome wide bisulfite & enzyme assays unrestricted by CpG Island bias
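The gene-body to promoter ratio can be made concrete with bisulfite read counts, where a methylated C is still read as C and an unmethylated C is read as T. This is a minimal sketch under that standard reading of bisulfite data; the function names and the counts are hypothetical, and it is not the pipeline from the cited paper.

```python
def methylation_level(c_reads, t_reads):
    """Fraction methylated at a region: C reads / (C + T reads) after bisulfite treatment."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads covering the region")
    return c_reads / total


def body_to_promoter_ratio(body_counts, promoter_counts):
    """Ratio of gene-body to promoter methylation.

    Each argument is a (c_reads, t_reads) pair pooled over the region.
    """
    body = methylation_level(*body_counts)
    promoter = methylation_level(*promoter_counts)
    if promoter == 0:
        raise ValueError("promoter methylation is zero; ratio undefined")
    return body / promoter


if __name__ == "__main__":
    # Hypothetical counts: heavily methylated gene body, lightly methylated promoter.
    print(body_to_promoter_ratio((80, 20), (10, 90)))
```

On the slide's claim, a gene with a high value of this ratio would be expected to be highly expressed.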
31
Allele-specific expression (ASE)
N=1: Combine all cis element variants & eliminate environmental & trans-acting variation among individuals.
Cis: Copy number, enhancer, promoter, splicing, polyA, termination, transport, decay.
Allele-specific transcription factor (TF) binding
Causality: Synthetic homologous allele-replacement
Zhang, Li, Church unpublished
Forton et al. Genome Res. 2007
[Figure: two alleles of one transcript (G/A, TC/TT, GG) with polyA tail; TF bound to one allele]
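One common way to call allele-specific expression at a heterozygous site is to count reads carrying each allele and test for departure from a 50:50 split. The sketch below uses an exact two-sided binomial test. The choice of test is an assumption for illustration (the slides do not say which statistic was used), and the function name and counts are hypothetical.

```python
from math import comb


def ase_binomial_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 allelic ratio.

    Sums the probabilities of all outcomes that are no more likely than
    the observed split, under the null of equal expression of both alleles.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads at this site")
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so ties are counted despite float rounding.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(ase_binomial_p(5, 5))    # balanced toy counts
    print(ase_binomial_p(10, 0))   # fully skewed toy counts
```

Because both alleles sit in the same cell, they share environment and trans-acting factors, so a skewed count points to a cis-acting difference, which is the N=1 design described on the slide.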
32
PersonalGenomes.org: skin to stem cells to many types
Park & Lee
Hair or skin sample
33
Clustering stat-significant allele-specific expression in reprogrammed cells; ~50% of ASE invariant among cell types
Lee, Zhang, Park, Daley, Church
34
PersonalGenomes.org
Inherited, Somatic, Environmental Genomics
VDJ-ome
TRAITS (Phenome)
Personal stem-cells
epigenome (RNA, mC)
PERSONAL GENOME
6 Gbp
3M alleles
Once-in-a-lifetime genome + yearly (to daily) tests
Public Health Bio-weather map: Allergens, Microbes, Viruses
Microbiome