How I Learned to Stop Worrying about Big Data and Love the Data That Actually Counts - Counsyl Tech Talk
Counsyl
www.counsyl.com
How I Learned to Stop Worrying about Big Data
...and love the data that actually counts
Imran S. Haque, Counsyl
18 Jul 2013
Friday, July 26, 13
About the Speaker
• Imran S. Haque
• Director of Research at Counsyl
• BS EECS, UC Berkeley; PhD CS, Stanford
About Counsyl
We have developed a single genomic test that replaces 100+ expensive assays.
It has reduced the cost of carrier testing by literally one hundredfold.
Bloom Syndrome                       $167
Canavan Disease                      $473
Cystic Fibrosis                      $506
Familial Dysautonomia                $334
Fanconi Anemia                       $167
Gaucher Disease                      $467
Glycogen Storage Disease Type Ia     $283
Maple Syrup Urine Disease Type 1B    $557
Mucolipidosis IV                     $279
Niemann-Pick Disease Type A          $337
Spinal Muscular Atrophy              $700
Tay-Sachs Disease                    $473
Total                               $4743
Engineering at Counsyl
How big is the data in genomics?
[Diagram of engineering areas: Wetlab, Biology, Ordering, Reporting, Billing, Fulfillment, Automation, Assay Calling; the question refers to Assay Calling]
Big Data Will Save the World
But what is it, anyway?
Background
Wikipedia, “Big Data”: A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
What Defines Big Data
• Computation: data so large that algorithms must be o(N^{1+ε}), i.e. “almost linear.”
• Handling: data so large that with tractable algorithms communication becomes more significant than computation.
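A rough, illustrative calculation shows why the computation criterion bites. It assumes N = 10^9 items (about one sequencer-day of 100 bp reads, using the throughput figures quoted later in this talk) and a hypothetical machine doing 10^9 simple operations per second, with all constant factors ignored.

```python
import math

N = 10**9            # items to process (illustrative: ~1e9 short reads)
OPS_PER_SEC = 10**9  # assumed speed of a hypothetical machine

def seconds(op_count: float) -> float:
    """Time to execute op_count basic operations at OPS_PER_SEC."""
    return op_count / OPS_PER_SEC

costs = {
    "N": N,
    "N log2 N": N * math.log2(N),
    "N^2": N**2,
}

for name, ops in costs.items():
    s = seconds(ops)
    print(f"{name:>8}: {ops:.1e} ops ~ {s:.1e} s ({s / 86400:.1e} days)")
```

Under these assumptions the N log N row takes about half a minute, while the quadratic row needs about 10^9 seconds, roughly 32 years.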
Why Do People Care?
Big Data is fundamental to fields in which each individual piece of data is relatively information-light, so it is necessary to aggregate a lot of it.
Genomics: Big Data
But not as we know it.
Short-Read Sequencing in Short
I don’t know what they want from me
It’s like the more money we come across
The more problems we see
[Figure: the same lyrics chopped into short, overlapping fragments, some containing errors (e.g. “acro5s” for “across”)]
Current sequencers can produce ~100Gb of short (100bp) reads/day
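The lyrics analogy can be made concrete with a small simulation. This is a minimal sketch that assumes uniformly random read start positions and a simple uniform substitution-error model (real instruments have position- and context-dependent errors, plus insertions and deletions); the function name `simulate_reads` and its parameters are hypothetical.

```python
import random

def simulate_reads(genome: str, read_len: int, n_reads: int,
                   error_rate: float = 0.01, seed: int = 0) -> list[tuple[int, str]]:
    """Sample n_reads substrings of length read_len from genome at uniformly
    random start positions, replacing each base with a different one with
    probability error_rate. Returns (true_start, read) pairs; a real
    sequencer reports only the read, which is why alignment is needed."""
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    rng = random.Random(seed)  # seeded for reproducibility
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, b in enumerate(bases):
            if rng.random() < error_rate:
                # substitution error: any base other than the true one
                bases[i] = rng.choice([c for c in "ACGT" if c != b])
        reads.append((start, "".join(bases)))
    return reads

# Toy "genome": the reference shown on the Real-World Alignments slide.
genome = "ATCCTTTGGGTGTATGGGTCGTAGCGAACTGAGAAGGGCCGAGG"
for start, read in simulate_reads(genome, read_len=12, n_reads=5, error_rate=0.05):
    print(start, read)
```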
Short-Read Alignment
[Figure: short reads such as “It’s like the more”, “like the more d”, “re data we c”, “e come acr” and “ata we come across” aligned against the reference “It’s like the more money we come across”; together the reads spell “It’s like the more data we come across”, differing from the reference at one word]
Ferragina and Manzini, JACM 2005; Langmead et al, Genome Biol 2009; Li and Durbin, Bioinformatics 2009
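The cited FM-index (Ferragina and Manzini), which Bowtie and BWA build on, can be sketched in a few dozen lines. This toy version is not code from either tool: it builds the suffix array by naive sorting (fine only for short strings) and stores a full occurrence table, whereas real indexes use efficient suffix-array construction and sampled occurrence counts. Backward search then finds exact matches with one interval update per pattern character, independent of the text length. It handles exact matches only; real aligners add mismatch and gap handling on top. The function names are hypothetical.

```python
def build_fm_index(text: str, sentinel: str = "$"):
    """Toy FM-index: suffix array, C table and full occurrence table of the BWT."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix sort
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] (the sentinel) when i == 0
    alphabet = sorted(set(s))
    # C[c]: number of characters in s that sort before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ

def backward_search(pattern: str, index) -> list[int]:
    """Sorted start positions of exact occurrences of pattern in the text."""
    sa, C, occ = index
    # [lo, hi): suffix-array rows whose suffixes start with the part of the
    # pattern processed so far (processed right to left)
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

ref = "ATCCTTTGGGTGTATGGGTCGTAGCGAACTGAGAAGGGCCGAGG"
index = build_fm_index(ref)
print(backward_search("GGG", index))  # [7, 15, 35]
```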
Alignment Algorithms
• Smith-Waterman: O(MN), large constant factor
• Hash-based Alignment: much smaller constants than SW (e.g. MAQ, SSAHA)
• Burrows-Wheeler Alignment: sublinear in size of genome (e.g. Bowtie, BWA)
Ning, Cox, Mullikin, Genome Res 2001; Li, Ruan, Durbin, Genome Res 2008; Ferragina and Manzini, JACM 2005; Langmead et al, Genome Biol 2009; Li and Durbin, Bioinformatics 2009
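For contrast with the indexed approaches, a minimal Smith-Waterman scorer (local alignment, linear gap penalty) shows where the O(MN) cost comes from: every cell of an M-by-N table is filled. The scoring values are illustrative assumptions, not parameters from any of the cited tools, and the function returns only the best score and its end coordinates, without traceback.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, int, int]:
    """Best local-alignment score of a and b with a linear gap penalty.

    Returns (score, i, j): the score and the 1-based end positions in a and b.
    Every cell of the (len(a)+1) x (len(b)+1) table is computed, hence O(MN)
    time; only two rows are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best

ref = "ATCCTTTGGGTGTATGGGTCGTAGCGAACTGAGAAGGGCCGAGG"
print(smith_waterman(ref, "GGGTCG"))  # (12, 21, 6)
```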
Real-World Alignments
[Figure: reads piled up against the reference ATCCTTTGGGTGTATGGGTCGTAGCGAACTGAGAAGGGCCGAGG; “.” and “,” mark read bases matching the reference, while “C” and “c” mark reads carrying a C at the variant position]
PAH:Y414C (heterozygote C/T)
phenylketonuria
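The pileup on this slide uses samtools-style notation: “.” and “,” are read bases matching the reference on the forward and reverse strands, and letters are non-reference bases (upper case forward, lower case reverse). A deliberately naive genotype call just counts alleles in one column and thresholds the alternate-allele fraction. The function name and the 0.2/0.8 thresholds are hypothetical, production callers use probabilistic models with base and mapping qualities, and the sketch assumes a column containing only base symbols (no read-start, read-end or indel markers).

```python
from collections import Counter

def call_genotype(column: str, ref: str, het_low: float = 0.2,
                  het_high: float = 0.8) -> tuple[str, float]:
    """Naive genotype call for one pileup column.

    column: read bases at one position; '.'/',' = reference base,
            letters = the base read (case only encodes strand).
    Returns (genotype, alternate-allele fraction).
    """
    ref = ref.upper()
    counts = Counter()
    for sym in column:
        if sym in ".,":
            counts[ref] += 1
        elif sym.upper() in "ACGT":
            counts[sym.upper()] += 1
        # anything else is ignored in this sketch
    depth = sum(counts.values())
    if depth == 0:
        return "no-call", 0.0
    alts = {base: n for base, n in counts.items() if base != ref}
    if not alts:
        return f"{ref}/{ref}", 0.0
    alt = max(alts, key=alts.get)          # most common non-reference base
    frac = alts[alt] / depth
    if frac < het_low:
        return f"{ref}/{ref}", frac        # too few alt reads: call reference
    if frac > het_high:
        return f"{alt}/{alt}", frac        # nearly all alt: homozygous alternate
    return f"{ref}/{alt}", frac            # in between: heterozygote

print(call_genotype("..,,.CCccC", "T"))    # ('T/C', 0.5)
```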
Need to align 1.5M reads per sample, across thousands of samples!
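Putting the numbers from these slides together gives a feel for the scale. This is an idealized calculation that treats all ~100 Gb/day as usable 100 bp reads and ignores barcodes, quality filtering and uneven multiplexing.

```python
BASES_PER_DAY = 100e9      # ~100 Gb/day from one sequencer (figure from the talk)
READ_LENGTH = 100          # bp per short read (figure from the talk)
READS_PER_SAMPLE = 1.5e6   # reads aligned per sample (figure from the talk)

reads_per_day = BASES_PER_DAY / READ_LENGTH         # 1e9 reads per day
samples_per_day = reads_per_day / READS_PER_SAMPLE  # ~667 samples per day
print(f"{reads_per_day:.1e} reads/day, about {samples_per_day:.0f} samples/day")
```

Under these assumptions a single instrument-day corresponds to hundreds of samples, each needing its own 1.5M-read alignment.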
Genomics: Big Data?
Genomics appears to have all the characteristics of Big Data.
• Large quantity: ~100GB/day/sequencer
• Advanced algorithms: BWT alignment in linear/sublinear time
But characteristics of the data itself matter too!
Clinical Genomics: Not That Big
Most of the human genome is currently non-actionable.
Whole Genome Sequencing (~3000 Mb)
Whole Exome Sequencing (~30 Mb)
Clinical Carrier Screening (~1 Mb)
[Figure: zoom from the exome (~30 Mb) to the much smaller clinical carrier screening region (~1 Mb)]
But 100Gb Is Still 100Gb, Right?
Clinical genomics analysis is per-sample.
• Processing is embarrassingly parallel after demultiplexing.
• Handling a single sample is trivial, even on a laptop.
Use ZFS and LSF/SGE, not Cassandra and Hadoop.
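Because each sample is analyzed independently once reads are assigned to it, the whole workload reduces to a parallel map over samples. A minimal sketch, assuming a hypothetical 6-base inline barcode at the start of each read, a made-up barcode sheet, and a placeholder per-sample step. It uses a thread pool from Python's standard `concurrent.futures` for brevity; CPU-heavy steps such as alignment would instead run as separate processes or, as the slide suggests, as one LSF/SGE job per sample.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

BARCODE_LEN = 6  # hypothetical: each read starts with a 6-base sample barcode

def demultiplex(reads, barcode_to_sample):
    """Assign each read to a sample by its barcode prefix and strip the barcode.
    Reads with unknown barcodes go to an 'undetermined' bin."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LEN], read[BARCODE_LEN:]
        by_sample[barcode_to_sample.get(barcode, "undetermined")].append(insert)
    return dict(by_sample)

def analyze_sample(item):
    """Placeholder per-sample step (a real pipeline would align and call variants).
    It reads nothing from other samples, which is what makes the work parallel."""
    sample, reads = item
    return sample, {"reads": len(reads), "bases": sum(len(r) for r in reads)}

def run_pipeline(reads, barcode_to_sample, workers=4):
    samples = demultiplex(reads, barcode_to_sample)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_sample, samples.items()))

reads = ["AAAAAAACGT", "AAAAAATTGCA", "CCCCCCGGG", "GGGGGGTT"]
barcodes = {"AAAAAA": "sample1", "CCCCCC": "sample2"}  # made-up barcode sheet
print(run_pipeline(reads, barcodes))
```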
Why is Genomics Still Interesting?
It’s OK to be Lil’.
Research Genomics
Counsyl runs this many samples every year: clinical = scale.
Target                  # Samples    # SNPs
Education Level           126,559      2.2M
Breast/Ovarian Cancer      11,705    31,812
Diabetes                   10,128      2.2M
Telomere Length            37,684      2.4M
Rietveld et al, Science 2013; Couch et al, PLoS Genet 2013; Zeggini et al, Nat Genet 2008; Codd et al, Nat Genet 2013
Clinical Genomics: Big Where It Matters
Whole Genome (3000 Mb)
Clinical Genome (1 Mb)
• Focusing on a small region means you can examine thousands of people: study important regions in great depth.
• Embarrassingly parallel is a good thing: people pay the bills!
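The first point can be illustrated with idealized arithmetic: if every targeted base must be sequenced to the same mean depth, the number of samples a sequencer-day can cover scales inversely with target size. The 100x depth is an assumption for illustration, not a figure from the talk, and the calculation ignores off-target reads, duplicates and uneven coverage; the helper name is hypothetical.

```python
BASES_PER_DAY = 100e9  # ~100 Gb/day per sequencer (figure from the talk)
DEPTH = 100            # assumed mean coverage per targeted base (illustrative)

def samples_per_day(target_mb: float, depth: float = DEPTH) -> float:
    """Samples one sequencer-day can cover if every targeted base gets `depth` reads."""
    bases_per_sample = target_mb * 1e6 * depth
    return BASES_PER_DAY / bases_per_sample

for name, mb in [("Whole genome", 3000), ("Whole exome", 30), ("Carrier screening", 1)]:
    print(f"{name:<18} {mb:>5} Mb  ~{samples_per_day(mb):,.1f} samples/day")
```

With these assumptions the 1 Mb clinical target fits on the order of a thousand samples per sequencer-day, versus a fraction of one whole genome.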
Let’s Science Up This Data
N=83,538 samples, 493 variants
Carrier frequency per population estimated as a binomial proportion.
A Bonferroni-corrected binomial equality test, comparing each population against the pooled data, finds variants that are significantly enriched or depleted in particular populations.
Haque et al, in preparation
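The procedure described above can be sketched with the standard library alone. This is one possible reading, not the analysis code behind the talk: it treats each population's carriers as k successes out of n, applies an exact two-sided binomial test of k against the pooled frequency (summing the probabilities of all outcomes no more likely than the observed one), and applies a Bonferroni correction by multiplying by the number of tests. All function names are hypothetical, and the talk does not say whether the pooled frequency excludes the population being tested. Where SciPy is available, `scipy.stats.binomtest` provides an exact binomial test.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    tol = 1 + 1e-7  # treat near-ties from floating-point error as ties
    total = sum(q for q in (binom_pmf(i, n, p) for i in range(n + 1))
                if q <= observed * tol)
    return min(1.0, total)

def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni correction: scale by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

def carrier_enrichment(k: int, n: int, pooled_freq: float, n_tests: int) -> dict:
    """Compare one population's carriers (k of n) with the pooled carrier frequency."""
    freq = k / n
    if freq > pooled_freq:
        direction = "enriched"
    elif freq < pooled_freq:
        direction = "depleted"
    else:
        direction = "no change"
    return {
        "freq": freq,
        "one_in": round(1 / freq) if k else None,  # "1 in N" style, as on the slides
        "direction": direction,
        "p_corrected": bonferroni(binom_test_two_sided(k, n, pooled_freq), n_tests),
    }
```

In use, `n_tests` would be the total number of population-by-variant comparisons performed.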
Smith-Lemli-Opitz Syndrome (DHCR7)
• We see carrier rates double the literature values (e.g., 1/57 vs. 1/124 in Northwestern Europeans).
• We find previously undescribed population associations for DHCR7:IVS8-1G>C.
Population   Frequency   Overall Frequency   P-value    N
⬆ AJ         1 in 46     1 in 96             1.18E-11   4330
⬇ EA         0           1 in 96             1.56E-07   2739
Haque et al, in preparation
Genetic Disease in South Asians
Cystic Fibrosis (CFTR)
• 1/57 observed vs 1/118 in literature.
GJB2-related DFNB1 nonsyndromic hearing loss and deafness
• Literature reports 1/133 carriers of 35delG, but we find 1/2191.
• 36 of 2191 tested were carriers, 35 of them for W24X.
Progressive cone dystrophy/achromatopsia (CNGB3)
• R403Q carrier frequency is 1/18: 30% of R403Q carriers come from the 4% of the tested population that is South Asian.
Haque et al, in preparation
Size Doesn’t Matter, It’s How You Use It
• Genomics has a real ground truth.
• Genomics has a real impact.
Clinical genomics is interesting independently of “Big”ness.
Future of Genomics
Cratering prices drive technological shifts.
Technologies at the research frontier will become commercialized.
• Whole-genome association studies
• RNA-seq and transcriptomics
• Epigenomics
• Pathogen sequencing and metagenomics
Where Are We Now?
• Theory has been developed in academia and government.
• Scale-up is just beginning in industry: started with tool vendors, now reaching applications companies.
• New scales of data will feed back into basic R&D.
Recap
Big Data =
• “near linear” algorithms
• communication is harder than computation
Short-read sequencing produces large amounts of data.
Useful clinical insights are mostly derived from embarrassingly parallel small data.
“Small data” genomics is highly impactful in its own right.
Genomics may enter a “big data” phase in the future with new methods.