Post on 24-May-2020
1 © Cloudera, Inc. All rights reserved.
Big Data for Personalized Health & Genomics
Shawn Dolley | Industry Leader, Health & Life Science
Big Data Healthcare Meetup July 2016
http://www.nih.gov/precisionmedicine/infographic-printable.pdf
Public domain slide, by Brian Wells, Penn Medicine, http://tinyurl.com/zsayqld
Precision Medicine: Why Now?
Collecting a patient genome is affordable
“In 10 years we’ve come from a $300M genome to one that’s realistically available at around $3000. That’s a 100,000 fold drop!” - James Hadfield, Next Generation Sequencing, 2014
Source: Nature, 2014
Patient data moves from paper to digital
HIPAA enabled us to break down industry silos
A new class of data scientists trained for a decade
[Figure: growth of the Hadoop ecosystem by year; each year keeps the earlier components and adds new ones]
2006: Core Hadoop (HDFS, MapReduce)
2007: adds Solr, Pig
2008: adds HBase, ZooKeeper
2009: adds Hive, Mahout
2010: adds Sqoop, Avro
2011: adds Flume, Bigtop, Oozie, HCatalog, Hue, YARN
2012: adds Spark, Tez, Impala, Kafka, Drill
2013: adds Parquet, Sentry
2014: adds Knox, Flink
2015: adds Kudu, RecordService, Ibis, Falcon
Big Data finally mature and ready
How big is the data?
Courtesy Cloudian
What are we seeing?
Annotation Data
• Each researcher today has to go to multiple genomic search engines to find variants, annotations, other
• Much better to have this data in the same integrated repository as your data—easier & faster
• Cloudera Omics Accelerator integration of public databases is designed to save researchers’ time
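As a rough illustration of the point above, the sketch below joins a researcher's variant calls to annotation records held in the same store, instead of looking each variant up in a separate search engine. Everything in it is hypothetical: the field names, the `annotate_variants` helper, and the sample records (including `GENE_A`) are made up for illustration and do not describe the Cloudera Omics Accelerator or any public database.

```python
from typing import Dict, List, Tuple

# Hypothetical key for matching a variant to its annotation:
# (chromosome, position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


def annotate_variants(
    variants: List[dict], annotations: Dict[VariantKey, dict]
) -> List[dict]:
    """Attach annotation fields to each variant that has a matching record."""
    annotated = []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        extra = annotations.get(key, {})  # empty when no annotation exists
        annotated.append({**v, **extra, "annotated": key in annotations})
    return annotated


# Made-up sample data standing in for a lab's variant calls
variants = [
    {"sample": "S1", "chrom": "chr1", "pos": 12345, "ref": "A", "alt": "G"},
    {"sample": "S1", "chrom": "chr2", "pos": 67890, "ref": "C", "alt": "T"},
]

# Made-up annotation records standing in for an integrated public database
annotations = {
    ("chr1", 12345, "A", "G"): {"gene": "GENE_A", "effect": "missense"},
}

result = annotate_variants(variants, annotations)
for row in result:
    print(row)
```

With both data sets in one repository, the lookup is a single keyed join; at real data sizes the same join would typically run on a distributed engine rather than in-memory dictionaries.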
The Genomic Analytic Pipeline & Cloudera Omics
[Diagram: pipeline stages from upstream to downstream]
Whole Exome/Genome Sequencing → Alignment (Velvet, BWA…n) → Genotyping (GATK, Samtools…n) → Annotation (VEP, SnpEff…n) → Analysis
Multiple public databases and internal clinical data feed an integrated precision medicine repository
Baylor moves to Cloudera Enterprise to embark on their precision medicine journey
The next version of the Broad Institute’s industry-standard GATK pipeline will be Spark-based; its more than 20,000 global users may migrate to Spark
Cloudera will be driving adoption of big data at precision medicine labs around the US, including custom collaborations
The new ’omic ‘apps’ use the big data stack
Embrace
• Genomics is part of our lives
• Precision medicine is just medicine
• Data sizes of the future are here
Open Your Mind, Open Your Mission
• Decide to help human health
Your Next Step
• Find the practitioners, their data, their tools
• Get started with Hadoop, Spark and more
• Expand their data universe, be a hero
Your next step?
Thank you