Big Data for Personalized Health & Genomics · files.meetup.com/19290665/20160719 Personalized... · ©...

Post on 24-May-2020

1 © Cloudera, Inc. All rights reserved.

Big Data for Personalized Health & Genomics

Shawn Dolley | Industry Leader, Health & Life Science

Big Data Healthcare Meetup July 2016

http://www.nih.gov/precisionmedicine/infographic-printable.pdf

Public domain slide, by Brian Wells, Penn Medicine, http://tinyurl.com/zsayqld

Precision Medicine: Why Now?

Collecting a patient genome is affordable

“In 10 years we’ve come from a $300M genome to one that’s realistically available at around $3000. That’s a 100,000 fold drop!” - James Hadfield, Next Generation Sequencing, 2014

Source: Nature, 2014

Patient data moves from paper to digital

HIPAA enabled us to break down industry silos

A new class of data scientists trained for a decade

Big Data finally mature and ready

Hadoop ecosystem timeline, components added by year (each year's stack keeps all earlier components):

2006: Core Hadoop (HDFS, MapReduce)
2007: Solr, Pig
2008: HBase, ZooKeeper
2009: Hive, Mahout
2010: Sqoop, Avro
2011: Flume, Bigtop, Oozie, HCatalog, Hue, YARN
2012: Spark, Tez, Impala, Kafka, Drill
2013: Parquet, Sentry
2014: Knox, Flink
2015: Kudu, RecordService, Ibis, Falcon

How big is the data?

Courtesy Cloudian

What are we seeing?

Annotation Data

• Today, each researcher has to visit multiple genomic search engines to find variants, annotations, and other data

• It is much better to have this data in the same integrated repository as your own data: easier and faster

• Cloudera Omics Accelerator integration of public databases is designed to save researchers’ time
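
As a rough illustration of why co-locating annotation data helps, here is a minimal sketch, not the Cloudera Omics Accelerator itself: once public annotation sources sit next to a lab's own variants, a single keyed lookup replaces several separate searches. The names `VariantKey`, `annotate`, `source_a`, and `source_b`, and all records, are made-up placeholders rather than real databases or real variants.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical join key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def annotate(variants, *sources):
    """Collect, for each variant, whatever each named source knows about it."""
    merged = {}
    for key in variants:
        combined = {}
        for name, table in sources:
            if key in table:
                combined[name] = table[key]
        merged[key] = combined
    return merged


# Toy, invented records standing in for a lab's variants and two public sources.
lab_variants = [VariantKey("1", 1000, "A", "G"), VariantKey("2", 2000, "C", "T")]
source_a = {VariantKey("1", 1000, "A", "G"): {"gene": "GENE_X"}}
source_b = {
    VariantKey("1", 1000, "A", "G"): {"significance": "uncertain"},
    VariantKey("2", 2000, "C", "T"): {"significance": "benign"},
}

result = annotate(lab_variants, ("source_a", source_a), ("source_b", source_b))
for key, found in result.items():
    print(key, found)
```

At real scale the same join would more likely run as a distributed query over the integrated repository; the in-memory dictionaries here only show the shape of the lookup.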

The Genomic Analytic Pipeline & Cloudera Omics

[Figure: the genomic analytic pipeline, running upstream to downstream]

Sequencing (whole exome/genome)
Alignment: Velvet, BWA…n
Genotyping: GATK, Samtools…n
Annotation: VEP, SnpEff…n
Analysis

Multiple public databases and internal clinical data feed an integrated precision medicine repository
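
The staged flow on this slide can be sketched as an ordered list of stages, each consuming the previous stage's output. This is only a toy model under stated assumptions: `Stage`, `make_stage`, `run_pipeline`, and the sample record are hypothetical names for illustration, the stage order (alignment, genotyping, annotation, analysis) is the conventional one, and no real tool is invoked.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One pipeline step plus the example tools the slide lists for it."""
    name: str
    example_tools: List[str]
    run: Callable[[dict], dict]


def make_stage(name: str, tools: List[str]) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(sample: dict) -> dict:
        done = sample.get("completed", []) + [name]
        return {**sample, "completed": done}
    return Stage(name, tools, run)


# Upstream to downstream; tool names are taken from the slide.
PIPELINE = [
    make_stage("alignment", ["Velvet", "BWA"]),
    make_stage("genotyping", ["GATK", "Samtools"]),
    make_stage("annotation", ["VEP", "SnpEff"]),
    make_stage("analysis", []),
]


def run_pipeline(sample: dict, stages: List[Stage] = PIPELINE) -> dict:
    """Feed the sample through every stage in order."""
    for stage in stages:
        sample = stage.run(sample)
    return sample


result = run_pipeline({"sample_id": "toy-001"})
print(result["completed"])
```

In practice each `run` would launch the corresponding tool and write its output to shared storage, so that downstream analysis reads from the same repository as the clinical and public data.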

Baylor moves to Cloudera Enterprise to embark on their precision medicine journey

The next version of the Broad Institute’s industry-standard GATK pipeline will be Spark-based; over 20,000 global users may migrate to Spark

Cloudera will be driving adoption of big data at precision medicine labs around the US, including custom collaborations

The new ’omic ‘apps’ use the big data stack

Embrace
• Genomics is part of our lives
• Precision medicine is just medicine
• Data sizes of the future are here

Open Your Mind, Open Your Mission
• Decide to help human health

Your Next Step
• Find the practitioners, their data, their tools
• Get started with Hadoop, Spark and more
• Expand their data universe, be a hero

Your next step?

Thank you