1 © Cloudera, Inc. All rights reserved.
Big Data for Personalized Health & Genomics
Shawn Dolley | Industry Leader, Health & Life Science
Big Data Healthcare Meetup July 2016
http://www.nih.gov/precisionmedicine/infographic-printable.pdf
Public domain slide, by Brian Wells, Penn Medicine, http://tinyurl.com/zsayqld
Precision Medicine: Why Now?
Collecting a patient genome is affordable
“In 10 years we’ve come from a $300M genome to one that’s realistically available at around $3000. That’s a 100,000 fold drop!” - James Hadfield, Next Generation Sequencing, 2014
Source: Nature, 2014
Patient data moves from paper to digital
HIPAA enabled us to break down industry silos
A new class of data scientists trained for a decade
Big Data finally mature and ready
Hadoop ecosystem components by the year they appear on the timeline (each year's stack includes everything from earlier years):
• 2006: Core Hadoop (HDFS, MapReduce)
• 2007: Solr, Pig
• 2008: HBase, ZooKeeper
• 2009: Hive, Mahout
• 2010: Sqoop, Avro
• 2011: Flume, Bigtop, Oozie, HCatalog, Hue, YARN
• 2012: Spark, Tez, Impala, Kafka, Drill
• 2013: Parquet, Sentry
• 2014: Knox, Flink
• 2015: Kudu, RecordService, Ibis, Falcon
How big is the data?
Courtesy Cloudian (slides 12 and 13)
What are we seeing?
Annotation Data
• Today each researcher has to visit multiple genomic search engines to find variants, annotations, and other data
• It is much better to have this data in the same integrated repository as your own data: easier and faster
• Cloudera Omics Accelerator integration of public databases is designed to save researchers’ time
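The idea of keeping annotation data in one integrated repository can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the source names (`source_a`, `source_b`), the variant key layout, and all records are made-up placeholders, and the code does not represent how Cloudera Omics Accelerator is actually implemented.

```python
from collections import defaultdict

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
# All records below are made-up placeholders, not real annotations.
my_variants = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
]

# Stand-ins for public annotation databases loaded into the same repository.
public_sources = {
    "source_a": {("chr1", 1000, "A", "G"): "annotation A1"},
    "source_b": {
        ("chr1", 1000, "A", "G"): "annotation B1",
        ("chr2", 2000, "C", "T"): "annotation B2",
    },
}


def build_integrated_index(sources):
    """Merge every source into one index keyed by variant."""
    index = defaultdict(dict)
    for source_name, records in sources.items():
        for variant, annotation in records.items():
            index[variant][source_name] = annotation
    return index


def annotate(variants, index):
    """Look up each variant once and return all annotations found for it."""
    return {variant: dict(index.get(variant, {})) for variant in variants}


integrated = build_integrated_index(public_sources)
annotated = annotate(my_variants, integrated)
for variant, annotations in annotated.items():
    print(variant, annotations)
```

With the sources merged up front, each variant needs a single lookup instead of one query per external search engine; a variant that no source knows about simply comes back with an empty annotation set.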
The Genomic Analytic Pipeline & Cloudera Omics
[Diagram: the genomic analytic pipeline, from upstream to downstream]
Labels: Sequencing; Whole Exome/Genome; Genotyping; Alignment; Annotation; Analysis
Tools: Velvet, BWA…n; GATK, Samtools…n; VEP, SnpEff…n
Data sources: Multiple Public Databases; Internal Clinical
Integrated precision medicine repository
Baylor moves to Cloudera Enterprise to embark on their precision medicine journey
The next version of the Broad Institute’s industry-standard GATK pipeline will be Spark-based; over 20,000 global users may migrate to Spark
Cloudera will be driving adoption of big data at precision medicine labs around the US, including custom collaborations
The new ‘omic ‘apps’ use the big data stack
Your next step?
Embrace
• Genomics is part of our lives
• Precision medicine is just medicine
• Data sizes of the future are here
Open Your Mind, Open Your Mission
• Decide to help human health
Your Next Step
• Find the practitioners, their data, their tools
• Get started with Hadoop, Spark and more
• Expand their data universe, be a hero
Thank you