MD Anderson: Using Oracle Exadata in Cancer Research

Page 1

Customer Case Study: Using Oracle Exadata in Cancer Research [E4 2014 Tue 3:30PM]

Nicholas Collins, Principal ASA, Clinical Analytics and Informatics, 03 June 2014

Page 2

Topics

1. About MD Anderson

2. The Future of Cancer Treatment and Research

3. Oracle Health Sciences at MD Anderson

4. Genomics and NLP Pipelines

5. FIRE Architecture (HDWF Implementation)

6. Oracle HDWF Upgrade to Exadata x4-2

7. Closing/Questions

Page 3

1. About MD Anderson

Page 4

About MD Anderson

Non-profit Houston-based cancer hospital and research institution, founded in 1941 as part of The University of Texas System

Named after Monroe Dunaway Anderson (a banker and cotton trader, not an MD)

“Making Cancer History” – our mission is to eradicate cancer

Consistently ranked as the #1 hospital for cancer care

Page 5

About MD Anderson

Page 6

About MD Anderson

Almost 20,000 employees, the majority in the Houston area

Occupying over 20 buildings in the Texas Medical Center

The Texas Medical Center has over 50 member institutions, with over 100,000 employees combined

Page 7

2. The Future of Cancer Treatment and Research

Page 8

MD Anderson Moon Shots Program

“The Time is Now. Together we will end cancer.”

Target six forms of cancer: Breast/Ovarian, Leukemia (AML/MDS), Leukemia (CLL), Melanoma, Lung, Prostate

Clear focus on the concept that the answer to curing cancer lies in both clinical and genomic data

Page 9

MD Anderson Moon Shots Program

http://www.cancermoonshots.org

Page 10

MD Anderson Moon Shots Program

How do we solve the mysteries of curing cancer?

Page 11

It’s in the Data!

Page 12

MD Anderson Moon Shots Platforms

Massive Data Analytics – An infrastructure for complex analytics and clinical decision support using integrated patient information, including clinical and research data

Big Data – An Information Technology infrastructure/environment that enables centralization, integration and secured access of patient and research data and analytical results

Page 13

It’s in the Genes!

Page 14

MD Anderson Moon Shots Platforms

Clinical Genomics – Clinical gene sequencing infrastructure, including centralized bio-specimen repository and processing

Omics – Bioinformatics – A high-throughput infrastructure for generation and standardization of large-scale “omic” data, including genomics, proteomics and immune profiling

Adaptive Learning in Genomic Medicine – A framework for bringing clinical medicine and genomic research together to enable rapid learning to improve patient management using Clinical Genomics, Omics-Bioinformatics and Massive Data Analytics platforms within the Big Data environment

Page 15

Genomics in the News

Page 16

3. Oracle Health Sciences at MD Anderson

Page 17

Oracle Health Sciences Products at MD Anderson

Oracle Healthcare Data Warehouse Foundation (HDWF)

Oracle Healthcare Analytics Data Integration (OHADI)

Oracle TRC (Translational Research Center) Cohort Explorer

Oracle TRC Omics Data Bank (ODB)

Page 18

Oracle Technology at MD Anderson

Oracle Database 11gR2

Oracle Exadata (x3 and x4)

Oracle Business Intelligence (OBIEE)

Oracle GoldenGate*

*Oracle GoldenGate was used to demonstrate replication capabilities in a significant proof of concept (POC), but has not been purchased or put into production. Informatica is commonly used at MD Anderson for data integration; ODI (Oracle Data Integrator) is not currently in use at the institution.

Page 19

Oracle Healthcare Data Warehouse Foundation (HDWF)

Figure: HDWF components HDI and HDM

Page 20

Oracle Healthcare Analytics Data Integration (OHADI)

Figure: OHADI between HDI and HDM

Integration code that maps from the HDWF interface tables (HDI) to the HDWF warehouse tables (HDM)

Available as either Informatica or ODI mappings

Page 21

Oracle Cohort Explorer

Figure: CDM and Cohort Explorer

CDM (Cohort Data Model) is the clinical data mart used by Oracle Cohort Explorer

Page 22

Oracle Cohort Explorer

Page 23

Oracle Omics Data Bank (ODB)

Page 24

Page 25

Review of Oracle Health Sciences Products

Figure: HDI, HDM, OHADI, CDM, ODB, and Cohort Explorer

Page 26

4. Genomics and NLP Pipelines

Page 27

Combining Genotypic and Phenotypic Data

Figure: clinical (phenotypic) data in HDM and CDM and genotypic data from genomic sequencing in ODB, brought together in Oracle Cohort Explorer

Page 28

What kind of data is loaded to ODB?

Reference Data: full genome sequence reference (EMBL), known variants (dbSNP/COSMIC), genes (HUGO), proteins (SwissProt/UniProt), pathways (Pathway Commons), predictive phenotyping (PolyPhen/SIFT)

Result Data: simple variants (SNP/indel), gene expression, copy number variation, RNA sequencing, structural variants

How much data? About three billion bases in the human genome: roughly 2.8 GB at one byte per base, or about 700 MB at two bits per base (binary gigabytes and megabytes)
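The two storage figures follow from simple arithmetic, with the two-bit figure assuming each base is packed into 2 bits (four bases per byte). The Python below is a minimal sketch of that packing, not how ODB stores sequence data; it assumes a sequence made only of A, C, G, and T (real reference sequences also contain symbols such as N, which would need extra handling), and the bit mapping is an arbitrary choice.

```python
"""Sketch: pack DNA bases at two bits per base and check the genome-size arithmetic."""

# Assumed encoding: A=00, C=01, G=10, T=11 (any fixed mapping would work).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte, first base in the high bits."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        shift = 6 - 2 * (i % 4)
        packed[i // 4] |= BASE_TO_BITS[base] << shift
    return bytes(packed)


def unpack_bases(packed: bytes, length: int) -> str:
    """Reverse pack_bases; length is needed because the last byte may be partly empty."""
    return "".join(
        BITS_TO_BASE[(packed[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


GENOME_BASES = 3_000_000_000                           # "three billion bases"
one_byte_per_base_gib = GENOME_BASES / 2**30           # about 2.79 GiB
two_bits_per_base_mib = GENOME_BASES * 2 / 8 / 2**20   # about 715 MiB

if __name__ == "__main__":
    print(pack_bases("ACGT").hex())
    print(round(one_byte_per_base_gib, 2), round(two_bits_per_base_mib))
```

The computed values (about 2.79 GiB and about 715 MiB) line up with the slide's rounded 2.8 GB and 700 MB when those units are read as binary gigabytes and megabytes.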

Page 29

Page 30

RNA Codon Chart

Point Mutations: Synonymous; Nonsynonymous (Missense, Nonsense)

Frame Shift (in the case of indels)
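The mutation categories on this slide can be made concrete with the standard genetic code. The Python below is a minimal sketch that classifies a single-base substitution within one RNA codon as synonymous, missense, or nonsense, and flags an indel as a frame shift when its length change is not a multiple of three. It assumes the standard code (NCBI translation table 1) and ignores reading-frame context, splicing, and other annotation concerns; it is an illustration, not the logic used by ODB or any annotation tool.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with U, C, A, G
# at each position, first position varying slowest. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify replacing codon[position] (0, 1 or 2) with new_base."""
    codon = codon.upper().replace("T", "U")        # accept DNA letters too
    new_base = new_base.upper().replace("T", "U")
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"     # nonsynonymous: premature stop codon
    if before == "*":
        return "stop-loss"    # nonsynonymous: stop codon lost (not listed on the slide)
    return "missense"         # nonsynonymous: a different amino acid


def is_frameshift(ref: str, alt: str) -> bool:
    """An indel shifts the reading frame when the length change is not a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0
```

For example, changing the middle base of GAG (glutamate) to U gives GUG (valine), a missense change, while UAC (tyrosine) to UAA is a nonsense change.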

Page 31

Genomics Pipeline

Load reference data into ODB (this initializes it for use; no results are loaded yet)

The lab sequences patient specimens to create result files with variant data (VCF and CNV)

Result files are staged locally on the Exadata rack in DBFS (Oracle Database File System)

Data is loaded from DBFS into ODB via Java and PL/SQL loaders (stock ODB functionality, though custom loaders can also be written)

Genomic data (in ODB) links to clinical data (in CDM) via a specimen ID for use in Oracle Cohort Explorer and other applications
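Linking ODB genomic results to CDM clinical data through a specimen ID is, at its core, a join on that key. The Python below uses an in-memory SQLite database only to sketch the shape of such a cohort query; the table names, columns, and toy rows are hypothetical and do not reflect the actual ODB or CDM schemas.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE cdm_specimen (      -- hypothetical clinical side (CDM)
    specimen_id TEXT PRIMARY KEY,
    patient_id  TEXT NOT NULL,
    diagnosis   TEXT NOT NULL
);
CREATE TABLE odb_variant (       -- hypothetical genomic side (ODB)
    specimen_id TEXT NOT NULL,
    gene        TEXT NOT NULL,
    ref         TEXT NOT NULL,
    alt         TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up toy rows, for illustration only.
    conn.executemany(
        "INSERT INTO cdm_specimen VALUES (?, ?, ?)",
        [("S1", "P1", "melanoma"), ("S2", "P2", "melanoma"), ("S3", "P3", "lung")],
    )
    conn.executemany(
        "INSERT INTO odb_variant VALUES (?, ?, ?, ?)",
        [("S1", "BRAF", "T", "A"), ("S2", "NRAS", "A", "G"), ("S3", "BRAF", "T", "A")],
    )
    return conn


def patients_with_variant(conn: sqlite3.Connection, gene: str, diagnosis: str) -> list[str]:
    """Which patients with this diagnosis have a specimen carrying a variant in this gene?"""
    rows = conn.execute(
        """
        SELECT DISTINCT s.patient_id
        FROM cdm_specimen AS s
        JOIN odb_variant AS v ON v.specimen_id = s.specimen_id
        WHERE v.gene = ? AND s.diagnosis = ?
        ORDER BY s.patient_id
        """,
        (gene, diagnosis),
    ).fetchall()
    return [patient_id for (patient_id,) in rows]
```

The specimen ID is the only shared key in this sketch, so any clinical filter (diagnosis here) and any genomic filter (gene here) can be combined in one query.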

Page 32

VCF (Variant Call Format) File Example

http://www.1000genomes.org/node/101
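A VCF data line is tab-separated, with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) optionally followed by FORMAT and per-sample columns. The Python below is a minimal sketch that parses only the fixed columns of one data line; it skips header lines and per-sample genotype parsing, and it is not the ODB loader. The SNP/indel helper compares allele lengths, a simplifying assumption that ignores more complex allele representations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VcfRecord:
    chrom: str
    pos: int                  # 1-based position of the first REF base
    variant_id: str           # e.g. a dbSNP rs ID, or "." if none
    ref: str
    alts: list[str]           # ALT may list several comma-separated alleles
    qual: float | None        # "." means missing
    filter_status: str        # "PASS" or the names of failed filters
    info: dict[str, str | bool] = field(default_factory=dict)


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a VCF data (non-header) line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = cols[:8]
    info_fields: dict[str, str | bool] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            info_fields[key] = value if sep else True   # flags such as DB carry no value
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=variant_id,
        ref=ref,
        alts=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter_status=filter_status,
        info=info_fields,
    )


def variant_kind(ref: str, alt: str) -> str:
    """Rough classification by allele length (simplifying assumption)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"
```

Keeping INFO values as strings avoids guessing types; a real loader would consult the `##INFO` header lines, which declare each field's type.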

Page 33

CNV (Copy Number Variation) File Example

Page 34

CDM/ODB Implementation - Exadata x3-2 Equipment Purchased

December 2012

Development/Test Environment

Production Environment

Page 35

MD Anderson CAI “War Room” Exadata Implementation Team

January 2013

Photo shown courtesy of Mr. Robert Jeffries, Project Manager

Page 36

Page 37

Gains from CDM and ODB on Exadata

Cohort Explorer queries often require many lengthy, complex conditions in the WHERE clause that filter on individual column values (e.g., genes, healthcare terminology codes) rather than on ranges; smart scans improve performance for these queries

HCC (Hybrid Columnar Compression) pays off where common values repeat (take ‘A’, ‘G’, ‘T’, and ‘C’, for instance), and also when non-EAV tables are used for EAV-style (entity-attribute-value) data, which is common in clinical data

The overall performance requirements of the most advanced Cohort Explorer functionality would be difficult to meet with anything other than Exadata

Page 38

Subject: i got to tell you this about exadata!

Page 39

Query

Page 40

Natural Language Processing (NLP)

MD Anderson partnered with IBM to build an Oncology Expert Advisor (OEA) application based on IBM Watson technology

NLP is important for making data from clinical notes available to OEA, so the institution began using IBM’s NLP tools, including IBM Content Analytics (ICA)

CAI also needed NLP to make more clinical data available to Cohort Explorer; phenotypic data (e.g., patient diagnosis, comorbidities, family cancer history) is often found only in the transcribed note

CAI is now collaborating with the IBM Watson team to ensure institutional standards for NLP efforts, and to leverage each other’s work as much as possible

CAI’s NLP team uses Exadata for pre-warehouse staging and processing

Page 41

NLP Pipeline

Providers create unstructured source documents (e.g., patient notes, pathology reports); documents that were not originated electronically or transcribed are scanned

Documents are pulled from various source systems into a single repository (on Exadata)

A crawler pulls new documents for processing by the ICA server; annotators process the documents to distill specific attributes and evidence

Attributes are loaded into the database in a structured relational model as intra-document “preliminary” assertions (on Exadata)

The IBM WODM (WebSphere Operational Decision Management) rules engine takes the preliminary assertions and creates cross-document final assertions, which are also loaded into the relational model (on Exadata)

Final assertions are loaded into Oracle HDWF as structured clinical data
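The step from intra-document preliminary assertions to cross-document final assertions is where the rules engine applies its logic. The actual WODM rules are not described here, so the Python below substitutes one hypothetical rule purely for illustration: for each patient and attribute, choose the value asserted by the most documents, breaking ties in favor of the most recent supporting document. The class, field names, and rule are all assumptions.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PreliminaryAssertion:
    """One attribute value distilled from a single document (hypothetical shape)."""
    patient_id: str
    attribute: str          # e.g. "smoking_status"
    value: str              # e.g. "former"
    document_id: str
    document_date: date


def final_assertions(prelim: list[PreliminaryAssertion]) -> dict[tuple[str, str], str]:
    """Collapse per-document assertions into one value per (patient, attribute)."""
    grouped: dict[tuple[str, str], list[PreliminaryAssertion]] = defaultdict(list)
    for a in prelim:
        grouped[(a.patient_id, a.attribute)].append(a)

    final: dict[tuple[str, str], str] = {}
    for key, items in grouped.items():
        votes = Counter(a.value for a in items)        # supporting documents per value
        latest: dict[str, date] = {}
        for a in items:                                 # most recent supporting date per value
            if a.value not in latest or a.document_date > latest[a.value]:
                latest[a.value] = a.document_date
        # Hypothetical rule: most documents wins; ties go to the most recent evidence.
        final[key] = max(votes, key=lambda v: (votes[v], latest[v]))
    return final
```

A production rules engine would typically also record which documents supported the final value, so the assertion can be traced back to its evidence.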

Page 42

Page 43

5. FIRE Architecture (HDWF Implementation)

Page 44

HDWF for the MD Anderson FIRE Program

FIRE - Federated Institutional Reporting Environment

A program-level initiative, involving many projects and products, to provide a unified BI/reporting solution for all of MD Anderson

Managed by the Clinical Analytics and Informatics (CAI) Department, part of the Oracle SDP (Strategic Development Partner) Program

Implemented the Oracle HDWF warehouse as the core of the FIRE Program, beginning in 2012

Page 45

Architectural Concept

Bring all the data processing together on a single Oracle instance for the performance benefits of local data movement and transformation

Abstract across all commonalities and patterns to the largest extent possible, avoiding needless one-off solutions; use code generation and automation

Initially implemented on existing AIX hardware, but an ideal candidate for a later “forklift” to Exadata

Page 46

The FIRE Architecture

Figure: FIRE architecture layers SR, SI, UI, UD, and HDI/HDM

Page 47

Data Movement

Page 48

Data Movement (Planned)

Our integration team wanted to use Informatica for ETL because of the team’s experience base; the team had little PL/SQL or ODI knowledge

The architecture proposed abstracted code generation via Informatica APIs, used together with the pushdown optimization option, for all non-OHADI internal data movement (i.e., SI to HDI, UI to UD)

Page 49

Data Movement (Actual)

Our integration team initially indicated that code generation with Informatica (or other tools) could not be done because of its complexity, and that the pushdown optimization option was too expensive

To demonstrate feasibility, I programmed a PL/SQL-based version of the code generation as proposed in the FIRE Architecture documentation; we used this code in the first release

Page 50

Data Movement Code Generation

Page 51

PL/SQL Procedures for Code Generation

Procedure iv_tv_ip_gen(name_of_sv_view), for the SI layer, generates objects for change detection and movement from SR to HDI

Procedure iv_uv_dv_ip_gen(name_of_sv_view), for the UI layer, generates objects for change detection and movement from HDM to UD

All that is needed for generation is the SV view, which conforms to the HDI-based structure; data in certain standard HDI columns determines the action taken

A benefit of the generated views is the ability to see what will happen during the next run without actually running anything
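Generating movement objects from a single view definition can be sketched in a few lines. The Python below is not the author's PL/SQL procedures (their internals are not shown); it is a hypothetical illustration that builds two Oracle SQL statements from column metadata: a change-detection view returning source rows that are new or differ from the target (deletions are not detected), and a MERGE that applies those rows. Querying the generated view before running the MERGE is one way to preview the next run. The IV_ naming convention, table names, and column names are assumptions.

```python
from __future__ import annotations


def change_view_name(sv_view: str) -> str:
    """Hypothetical naming convention: SV_<name> -> IV_<name>."""
    name = sv_view.upper()
    return "IV_" + (name[3:] if name.startswith("SV_") else name)


def generate_change_view(sv_view: str, target: str, key_cols: list[str], data_cols: list[str]) -> str:
    """CREATE VIEW returning source rows that are new or differ from the target table."""
    cols = ", ".join(key_cols + data_cols)
    return (
        f"CREATE OR REPLACE VIEW {change_view_name(sv_view)} AS\n"
        f"SELECT {cols} FROM {sv_view}\n"
        f"MINUS\n"
        f"SELECT {cols} FROM {target}"
    )


def generate_merge(sv_view: str, target: str, key_cols: list[str], data_cols: list[str]) -> str:
    """MERGE that applies the rows exposed by the change-detection view."""
    if not key_cols or not data_cols:
        raise ValueError("need at least one key column and one data column")
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    updates = ", ".join(f"t.{c} = s.{c}" for c in data_cols)   # key columns are not updated
    all_cols = key_cols + data_cols
    return (
        f"MERGE INTO {target} t\n"
        f"USING {change_view_name(sv_view)} s\n"
        f"ON ({on})\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(all_cols)})\n"
        f"VALUES ({', '.join('s.' + c for c in all_cols)})"
    )
```

Because both statements are derived from the same column lists, adding a column to the source view and regenerating keeps change detection and movement in sync, which is the main payoff of generating rather than hand-writing them.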

Page 52

Results/Next Steps

We had approximately three months to implement; the process was difficult, but in the end everything worked and we went to production with the first FIRE release in November 2012

OHADI was somewhat slower than expected but got the job done; we used the Informatica version, so might it be faster with ODI?

The integration team wanted a second chance to get code generation going for Informatica, and wanted more Informatica and less SQL and PL/SQL

The CAI committee voted to try Informatica-based alternatives for the next release

Page 53

Architectural Changes

Page 54

Architectural Changes

Informatica code generation used Java to generate Informatica objects, so the PL/SQL code generation with SI and UI views was not used for this release

The integration team wanted instantiated SI and UI layers for Informatica-based code generation, instead of views in the SI and UI layers

As a result, a hybrid architecture is in place, with some generated PL/SQL and views and some generated Informatica objects

Page 55

Informatica Code Generation with Java

Page 56

6. HDWF Upgrade – AIX to Exadata x4-2

Page 57

HDWF Hardware Upgrade (from AIX) - Exadata x4-2 Equipment Purchased

December 2013

Development/Test Environment

Production Environment

Original AIX hardware: IBM P550, 8 physical CPUs at 1.65 GHz, 64 GB RAM, AIX 5.3 (64-bit)

Page 58

Migration Methodology

Logical migration using Data Pump import (impdp) over the network

Develop a test suite of queries for performance tuning, and automate test runs

Change the configuration using single-variable controlled experiments to home in on the best options, and identify which settings are the most significant performance boosters (SGA/PGA size, index visibility, tables pinned to cell flash, storage index disabled, compression, Auto DOP, etc.)

Lock the configuration and attempt further tuning with incremental test data loads (DML rather than queries)
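The single-variable approach above amounts to a one-factor-at-a-time experiment plan: start from a baseline configuration, change exactly one setting per run, and compare each run's total test-suite time with the baseline. The Python below is a minimal sketch that only builds such a plan and ranks measured results; the factor names and values are hypothetical placeholders, and applying a configuration and timing the query suite are left out.

```python
from __future__ import annotations


def one_factor_plan(
    baseline: dict[str, object], alternatives: dict[str, list[object]]
) -> list[tuple[str, dict[str, object]]]:
    """Return (label, config) pairs: the baseline plus one run per single-setting change."""
    runs = [("baseline", dict(baseline))]
    for factor, values in alternatives.items():
        for value in values:
            if value == baseline[factor]:
                continue                  # same as the baseline, nothing new to test
            config = dict(baseline)
            config[factor] = value        # change exactly one variable
            runs.append((f"{factor}={value}", config))
    return runs


def rank_effects(total_seconds: dict[str, float]) -> list[tuple[str, float]]:
    """Seconds saved versus the baseline run, largest improvement first."""
    base = total_seconds["baseline"]
    effects = [(label, base - t) for label, t in total_seconds.items() if label != "baseline"]
    return sorted(effects, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    baseline = {"compression": "none", "auto_dop": False, "sga_gb": 64}   # hypothetical
    alternatives = {"compression": ["none", "oltp", "hcc_query_high"], "auto_dop": [True], "sga_gb": [128]}
    for label, config in one_factor_plan(baseline, alternatives):
        print(label, config)
```

Changing one factor at a time does not reveal interactions between settings, which fits the methodology's next step of locking the configuration and then testing it as a whole with incremental loads.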

Page 59

Performance / Times - Indexes (elapsed time as +DD HH:MI:SS.FFFFFF)

Query | All Original Indexes Visible | All Non-unique Invisible (PKs & UKs visible, FKs invisible) | All Indexes Invisible | All Non-unique Invisible except MDACC custom | All Non-unique Invisible except FKs (PKs, UKs, & FKs visible)
Query 1 | +00 00:04:15.268206 | +00 00:04:18.802606 | +00 00:04:17.616623 | +00 00:04:15.777681 | +00 00:04:15.713001
Query 2 | +00 00:00:57.234542 | +00 00:00:56.790110 | +00 00:00:57.144610 | +00 00:00:58.184639 | +00 00:00:57.262739
Query 3 | +00 00:08:34.878275 | +00 00:08:41.121726 | never finished | +00 00:08:49.401729 | +00 00:08:45.867188
Query 4 | +00 00:06:41.485783 | +00 00:06:34.904931 | +00 00:06:29.586784 | +00 00:06:42.858168 | +00 00:06:42.512104
Query 5 | +00 00:24:40.556198 | +00 00:24:40.578150 | +00 00:26:10.792066 | +00 00:24:59.570722 | +00 00:24:55.165571
Query 6 | +00 00:00:43.029112 | +00 00:00:43.410431 | +00 00:00:44.056125 | +00 00:00:44.123841 | +00 00:00:43.635425
Query 7 | +00 00:00:01.545078 | +00 00:00:01.699091 | +00 00:00:01.582558 | +00 00:00:01.608624 | +00 00:00:01.609783
Query 8 | +00 00:00:10.876617 | +00 00:00:08.854054 | never finished | +00 00:00:09.099019 | +00 00:00:08.470038
Query 9 | +00 00:00:01.710590 | +00 00:00:01.946892 | +00 00:00:01.823464 | +00 00:00:01.734865 | +00 00:00:01.886647
Query 10 | +00 00:00:03.919420 | +00 00:00:04.223222 | +00 01:01:53.262450 | +00 00:00:04.191241 | +00 00:00:04.139644
Query 11 | +00 00:00:57.318168 | +00 00:00:07.071767 | +00 00:00:06.247229 | +00 00:00:06.577124 | +00 00:00:06.578497
Query 12 | +00 00:00:17.095400 | +00 00:00:10.208428 | +00 00:00:09.547868 | +00 00:00:09.634578 | +00 00:00:09.882055
Query 13 | +00 00:00:21.082866 | +00 00:47:44.312674 | never finished | +00 00:52:08.863991 | +00 00:00:27.537543
Query 14 | +00 00:51:57.673060 | never finished | never finished | +00 03:26:31.477185 | +00 00:49:45.230375
Query 15 | +00 00:04:38.656857 | +00 00:02:55.667741 | never finished | +00 00:02:55.759108 | +00 00:02:58.397979
Query 16 | +00 00:00:05.751766 | +00 00:00:24.376534 | never finished | +00 00:00:24.581059 | +00 00:00:24.912151
Query 17 | +00 00:05:34.114854 | +00 00:03:49.423974 | never finished | +00 00:03:56.030990 | +00 00:03:57.470639
Query 18 | +00 00:02:40.790018 | +00 00:02:39.300987 | +00 00:02:39.422138 | +00 00:02:40.992558 | +00 00:02:41.056058
Query 19 | +00 00:00:51.688008 | +00 00:00:51.566929 | +00 00:00:51.265070 | +00 00:00:52.757858 | +00 00:00:51.403172
Query 20 | +00 00:03:44.814679 | +00 00:03:30.898289 | +00 00:03:29.446093 | +00 00:03:52.102754 | +00 00:03:34.554562
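The timings in these tables are written in Oracle's INTERVAL DAY TO SECOND text form (+DD HH:MI:SS.FFFFFF), with "never finished" for runs that did not complete. To compare runs numerically, a small parser helps; the Python below is a minimal sketch written only for this text format, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import timedelta

# One cell: "+00 00:04:15.268206" (days, then HH:MM:SS.ffffff) or "never finished".
CELL = re.compile(r"[+-]\d+ \d{2}:\d{2}:\d{2}(?:\.\d+)?|never finished")
INTERVAL = re.compile(r"([+-])(\d+) (\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_interval(text: str) -> timedelta | None:
    """Convert one cell to a timedelta; None means the run never finished."""
    text = text.strip()
    if text == "never finished":
        return None
    match = INTERVAL.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    sign, days, hours, minutes, seconds = match.groups()
    delta = timedelta(days=int(days), hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return -delta if sign == "-" else delta


def parse_row(row: str) -> list[timedelta | None]:
    """Extract every timing cell from one table row, in column order."""
    return [parse_interval(cell) for cell in CELL.findall(row)]
```

With values in seconds it is easy to quantify, for example, that Query 10 went from about 4 seconds with the original indexes to just over an hour with all indexes invisible.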

Page 60

Performance / Times - Compression

Query | No Compression | Compressed | Compression Type
Query 1 | +00 00:04:18.582828 | +00 00:04:50.366265 | OLTP
Query 2 | +00 00:00:57.554905 | +00 00:01:15.014118 | OLTP
Query 3 | +00 00:06:18.164402 | +00 00:06:59.956384 | OLTP
Query 4 | +00 00:07:11.780093 | +00 00:08:14.282119 | OLTP
Query 5 | +00 00:24:58.411945 | +00 00:25:23.206456 | OLTP
Query 6 | +00 00:00:42.437158 | +00 00:00:55.025157 | OLTP
Query 7 | +00 00:00:01.516546 | +00 00:00:01.625790 | OLTP
Query 8 | +00 00:00:09.569546 | +00 00:00:06.663720 | OLTP
Query 9 | +00 00:00:01.744924 | +00 00:00:01.272805 | OLTP
Query 10 | +00 00:00:04.012592 | +00 00:00:05.653153 | OLTP
Query 11 | +00 00:00:06.040799 | +00 00:00:07.085242 | OLTP
Query 12 | +00 00:00:09.923520 | +00 00:00:09.957895 | OLTP
Query 13 | +00 00:00:30.367873 | +00 00:03:41.028555 | HCC - QH
Query 14 | +00 00:49:22.688745 | +00 02:23:08.758924 | HCC - QH
Query 15 | +00 00:04:29.040059 | +00 00:14:09.746344 | OLTP
Query 16 | +00 00:00:26.161868 | +00 00:00:28.511322 | OLTP
Query 17 | +00 00:04:34.196931 | +00 00:06:04.440474 | OLTP
Query 18 | +00 00:02:40.874603 | +00 00:02:57.846432 | HCC - QH
Query 19 | +00 00:00:51.901180 | +00 00:01:01.855314 | HCC - QH
Query 20 | +00 00:02:54.209488 | +00 00:03:44.067049 | HCC - QH

Page 61

Compression Ratios (sizes in GB; HCC - QH = HCC Query High, HCC - QL = HCC Query Low)

Table | Size in GB | Size in GB at HCC - QH | HCC - QH % of Orig Size | Ratio (Orig : HCC - QH) | Size in GB at HCC - QL | HCC - QL % of Orig Size | Ratio (Orig : HCC - QL) | HCC-QH % of HCC-QL
Table 1 | 51.97851563 | 2.487487793 | 5% | 21 | 4.06964111 | 8% | 13 | 61%
Table 2 | 41.59350586 | 2.446899414 | 6% | 17 | 4.51171875 | 11% | 9 | 54%
Table 3 | 29.55371094 | 2.084411621 | 7% | 14 | 3.78485107 | 13% | 8 | 55%
Table 4 | 25.5625 | 2.509277344 | 10% | 10 | 4.18261719 | 16% | 6 | 60%
Table 5 | 23.5078125 | 2.664794922 | 11% | 9 | 4.40570068 | 19% | 5 | 60%
Table 6 | 16.61914063 | 1.192260742 | 7% | 14 | 2.08892822 | 13% | 8 | 57%
Table 7 | 15.38574219 | 1.850524902 | 12% | 8 | 3.13989258 | 20% | 5 | 59%
Table 8 | 9.083007813 | 0.700500488 | 8% | 13 | 1.18804932 | 13% | 8 | 59%
Table 9 | 6.918334961 | 0.898925781 | 13% | 8 | 1.43572998 | 21% | 5 | 63%
Table 10 | 6.8359375 | 0.70690918 | 10% | 10 | 1.28656006 | 19% | 5 | 55%
Table 11 | 6.801879883 | 0.730773926 | 11% | 9 | 1.22528076 | 18% | 6 | 60%
Table 12 | 5.343933105 | 0.930297852 | 17% | 6 | 1.50109863 | 28% | 4 | 62%
Table 13 | 4.3828125 | 0.579589844 | 13% | 8 | 0.96081543 | 22% | 5 | 60%
Table 14 | 2.6171875 | 0.157653809 | 6% | 17 | 0.24041748 | 9% | 11 | 66%
Table 15 | 2.5 | 0.128295898 | 5% | 19 | 0.22894287 | 9% | 11 | 56%
Table 16 | 2.088867188 | 0.115844727 | 6% | 18 | 0.20172119 | 10% | 10 | 57%
Table 17 | 1.5 | 0.181213379 | 12% | 8 | 0.30102539 | 20% | 5 | 60%
Table 18 | 1.4375 | 0.122924805 | 9% | 12 | 0.24169922 | 17% | 6 | 51%
Table 19 | 1.1875 | 0.111816406 | 9% | 11 | 0.20196533 | 17% | 6 | 55%
Table 20 | 0.853515625 | 0.10949707 | 13% | 8 | 0.14990234 | 18% | 6 | 73%
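The derived columns in the compression table follow directly from the three measured sizes. The Python below is a minimal sketch that recomputes them, rounding to whole numbers as the table appears to, which can serve as a quick consistency check when rows are added; the function and key names are hypothetical.

```python
from __future__ import annotations


def compression_stats(orig_gb: float, qh_gb: float, ql_gb: float) -> dict[str, int]:
    """Recompute the derived columns from the original, HCC Query High and HCC Query Low sizes."""
    return {
        "qh_pct_of_orig": round(100 * qh_gb / orig_gb),
        "qh_ratio": round(orig_gb / qh_gb),
        "ql_pct_of_orig": round(100 * ql_gb / orig_gb),
        "ql_ratio": round(orig_gb / ql_gb),
        "qh_pct_of_ql": round(100 * qh_gb / ql_gb),
    }


if __name__ == "__main__":
    print(compression_stats(51.97851563, 2.487487793, 4.06964111))  # Table 1
```

For Table 1 this reproduces 5%, 21, 8%, 13, and 61%, matching the published row.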

Page 62

Stats – With and Without Histograms

Query | Baseline (default gather) | After stats gathered, forcing histograms (METHOD_OPT => 'FOR ALL COLUMNS SIZE 254') | After stats gathered, no histograms (METHOD_OPT => 'FOR ALL COLUMNS SIZE 1')
Query 1 | +00 00:04:18.069513 | +00 00:04:14.479659 | +00 00:04:12.718458
Query 2 | +00 00:00:56.012300 | +00 00:00:55.937563 | +00 00:00:56.399873
Query 3 | +00 00:12:03.440581 | +00 00:06:32.384087 | +00 00:08:15.304644
Query 4 | +00 00:06:40.072502 | +00 00:07:23.185157 | +00 00:06:32.371709
Query 5 | +00 00:24:59.584111 | +00 00:24:52.754278 | +00 00:24:43.723337
Query 6 | +00 00:00:43.457988 | +00 00:00:43.332688 | +00 00:00:43.154288
Query 7 | +00 00:00:01.825817 | +00 00:00:01.735274 | +00 00:00:01.684231
Query 8 | +00 00:00:08.743574 | +00 00:00:09.016219 | +00 00:00:06.635865
Query 9 | +00 00:00:01.911499 | +00 00:00:01.268445 | +00 00:00:01.816548
Query 10 | +00 00:00:04.326782 | +00 00:00:03.879728 | +00 00:00:03.953567
Query 11 | +00 00:00:07.088747 | +00 00:00:09.095229 | +00 00:00:07.355436
Query 12 | +00 00:00:10.576206 | +00 00:00:09.860471 | +00 00:00:09.813917
Query 13 | +00 00:00:30.217194 | +00 00:00:29.858610 | +00 00:00:24.543118
Query 14 | +00 00:49:28.997130 | +00 01:42:28.740533 | +00 00:28:13.714565
Query 15 | +00 00:03:26.125896 | +00 00:03:30.789577 | +00 00:03:07.800214
Query 16 | +00 00:00:23.819626 | +00 00:00:23.579613 | +00 00:00:23.572567
Query 17 | +00 00:04:23.784166 | +00 00:03:49.886384 | +00 00:03:49.866113
Query 18 | +00 00:02:43.559441 | +00 00:03:44.672787 | +00 00:02:39.627688
Query 19 | +00 00:00:52.545641 | +00 00:00:52.185251 | +00 00:00:51.524320
Query 20 | +00 00:03:23.480463 | +00 00:04:35.762286 | +00 00:03:21.034797

Page 63: MD Anderson: Using Oracle Exadata in Cancer Research ... · Customer Case Study: Using Oracle Exadata in Cancer Research [E4 2014 Tue 3:30PM] 1. ... IBM WODM Rules Engine takes preliminary

Performance / Times – Auto DOP

63

Query Baseline Auto, DL 4, Force Local Auto, DL 8, Force Local Auto, DL 16, Force Local

Query 1 +00 00:04:12.718458 +00 00:04:21.658909 +00 00:04:16.253238 +00 00:04:15.601514

Query 2 +00 00:00:56.399873 +00 00:00:56.412281 +00 00:00:55.648165 +00 00:00:56.170821

Query 3 +00 00:08:15.304644 +00 00:08:17.698308 +00 00:08:12.276071 +00 00:08:17.515982

Query 4 +00 00:06:32.371709 +00 00:06:45.082731 +00 00:06:39.860725 +00 00:06:44.585730

Query 5 +00 00:24:43.723337 +00 00:24:57.991367 +00 00:25:09.900648 +00 00:24:50.583881

Query 6 +00 00:00:43.154288 +00 00:00:43.439962 +00 00:00:42.944854 +00 00:00:43.224442

Query 7 +00 00:00:01.684231 +00 00:00:01.691277 +00 00:00:01.610445 +00 00:00:01.715049

Query 8 +00 00:00:06.635865 +00 00:00:06.469042 +00 00:00:06.241034 +00 00:00:06.570801

Query 9 +00 00:00:01.816548 +00 00:00:01.840649 +00 00:00:01.748562 +00 00:00:01.765074

Query 10 +00 00:00:03.953567 +00 00:00:03.933501 +00 00:00:03.754602 +00 00:00:03.810805

Query 11 +00 00:00:07.355436 +00 00:00:09.900409 +00 00:00:09.615272 +00 00:00:09.421509

Query 12 +00 00:00:09.813917 +00 00:00:09.482519 +00 00:00:09.710079 +00 00:00:09.892875

Query 13 +00 00:00:24.543118 +00 00:00:15.723340 +00 00:00:18.721824 +00 00:00:20.151063

Query 14 +00 00:28:13.714565 +00 00:33:37.034347 +00 00:33:25.797660 +00 00:33:33.464341

Query 15 +00 00:03:07.800214 +00 00:03:28.656200 +00 00:03:43.276557 +00 00:03:27.005350

Query 16 +00 00:00:23.572567 +00 00:00:15.264801 +00 00:00:16.036919 +00 00:00:15.330395

Query 17 +00 00:03:49.866113 +00 00:04:04.479679 +00 00:04:11.732845 +00 00:04:08.134070

Query 18 +00 00:02:39.627688 +00 00:02:45.298123 +00 00:02:45.883673 +00 00:02:45.424077

Query 19 +00 00:00:51.524320 +00 00:00:15.818625 +00 00:00:13.755202 +00 00:00:13.746067

Query 20 +00 00:03:21.034797 +00 00:02:57.888175 +00 00:02:57.619376 +00 00:02:57.591926
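The column headers suggest that each Auto DOP run varied the degree limit ("DL") while keeping parallel execution servers on the local node. Below is a minimal sketch of how one such run might be set up at session level. It assumes that "DL" means parallel_degree_limit and that the tests used parallel_degree_policy = AUTO; the slides confirm neither. The table name is hypothetical.

```sql
-- Hypothetical reproduction of one Auto DOP test column (DL 4, Force Local).
-- Assumptions: "DL" = PARALLEL_DEGREE_LIMIT; policy AUTO (not stated on the slide).
ALTER SESSION SET parallel_degree_policy = AUTO;
ALTER SESSION SET parallel_degree_limit = 4;
ALTER SESSION SET parallel_force_local = TRUE;

-- Run the test query (hyp_encounter_fact is a hypothetical table).
SELECT COUNT(*) FROM hyp_encounter_fact;

-- Show the plan of the last statement executed in this session. When Auto DOP
-- takes effect, the Note section reports the computed degree of parallelism.
-- In SQL*Plus, SET SERVEROUTPUT OFF first, or the "last statement" is the
-- DBMS_OUTPUT call rather than the test query.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'TYPICAL'));
```

Checking the plan note matters here. Otherwise a run that silently stayed serial looks the same in the timing table as a parallel run that gained nothing.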


Performance / Times – DOP Hint


Query Initial Run all parallel 24 all parallel 16 all parallel 8 all parallel 6 all parallel 4

Query 1 +00 00:04:16.672367 +00 00:00:25.938898 +00 00:00:26.449299 +00 00:00:34.633897 +00 00:00:43.261151 +00 00:01:01.939329

Query 2 +00 00:00:55.772976 +00 00:00:14.462675 +00 00:00:14.936694 +00 00:00:14.448895 +00 00:00:13.395207 +00 00:00:15.068520

Query 3 +00 00:08:11.134460 +00 00:04:02.388953 +00 00:03:55.116240 +00 00:03:41.027493 +00 00:01:57.306787 +00 00:03:29.615242

Query 4 +00 00:06:41.349366 +00 00:00:39.683390 +00 00:00:44.865423 +00 00:01:26.220494 +00 00:01:37.733822 +00 00:02:08.823786

Query 5 +00 00:24:28.216851 +00 00:01:30.732530 +00 00:02:04.158495 +00 00:02:28.018871 +00 00:04:15.199074 +00 00:05:43.970349

Query 6 +00 00:00:43.078030 +00 00:00:07.565337 +00 00:00:07.655455 +00 00:00:09.543481 +00 00:00:10.952646 +00 00:00:14.283483

Query 7 +00 00:00:01.686269 +00 00:00:00.747075 +00 00:00:00.558309 +00 00:00:00.679023 +00 00:00:00.272988 +00 00:00:00.818093

Query 8 +00 00:00:06.936982 +00 00:00:02.711387 +00 00:00:02.898431 +00 00:00:03.273516 +00 00:00:02.385028 +00 00:00:04.060303

Query 9 +00 00:00:01.827659 +00 00:00:00.686610 +00 00:00:00.460302 +00 00:00:00.474744 +00 00:00:00.377457 +00 00:00:00.681739

Query 10 +00 00:00:03.920712 +00 00:00:02.807461 +00 00:00:02.467666 +00 00:00:02.347095 +00 00:00:01.599735 +00 00:00:02.430706

Query 11 +00 00:00:07.290813 +00 00:00:09.330830 +00 00:00:10.142888 +00 00:00:09.091451 +00 00:00:04.770838 +00 00:00:06.636206

Query 12 +00 00:00:09.591401 +00 00:00:09.964417 +00 00:00:09.468584 +00 00:00:08.968741 +00 00:00:10.821744 +00 00:00:09.175446

Query 13 +00 00:00:24.930160 +00 00:00:23.852529 +00 00:00:17.268112 +00 00:00:17.450359 +00 00:00:15.524086 +00 00:00:15.191239

Query 14 +00 00:27:57.732468 +00 00:09:06.884239 +00 00:09:37.026665 +00 00:12:55.954573 +00 00:20:28.144051 +00 00:20:36.553182

Query 15 +00 00:02:27.918861 +00 00:01:42.524856 +00 00:01:42.352751 +00 00:01:48.366352 +00 00:01:11.173652 +00 00:02:05.916376

Query 16 +00 00:00:14.481427 +00 00:00:10.335290 +00 00:00:10.449666 +00 00:00:12.522837 +00 00:00:08.010176 +00 00:00:11.036114

Query 17 +00 00:03:39.617132 +00 00:01:09.896561 +00 00:01:12.347061 +00 00:01:15.938717 +00 00:01:03.228098 +00 00:01:34.219806

Query 18 +00 00:02:38.285368 +00 00:02:05.803792 +00 00:02:12.261332 +00 00:02:44.436150 +00 00:03:18.629559 +00 00:05:26.549267

Query 19 +00 00:00:52.881317 +00 00:00:13.131290 +00 00:00:13.126071 +00 00:00:13.192107 +00 00:00:14.278196 +00 00:00:15.489964

Query 20 +00 00:03:23.152004 +00 00:02:30.883228 +00 00:02:35.797302 +00 00:03:28.339995 +00 00:04:08.225930 +00 00:05:56.041736

Total 1 hr 27 min 24 min 26 min 32 min 40 min 50 min
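The "Total" row can be checked against the per-query times. Below is a sketch that sums the Initial Run column in SQL, with the values copied from the table above. TO_DSINTERVAL parses the +DD HH:MI:SS.FF strings, and EXTRACT converts each interval to seconds. By hand, this column comes to roughly 5,246 seconds (about 1 hr 27 min 26 s). The slide's totals therefore appear to be truncated to whole minutes; the parallel 24 column, for example, sums to roughly 24 min 50 s.

```sql
-- Sum the Initial Run column of the DOP hint table (values from the slide).
WITH initial_run AS (
  SELECT  1 AS q, TO_DSINTERVAL('+00 00:04:16.672367') AS elapsed FROM dual UNION ALL
  SELECT  2, TO_DSINTERVAL('+00 00:00:55.772976') FROM dual UNION ALL
  SELECT  3, TO_DSINTERVAL('+00 00:08:11.134460') FROM dual UNION ALL
  SELECT  4, TO_DSINTERVAL('+00 00:06:41.349366') FROM dual UNION ALL
  SELECT  5, TO_DSINTERVAL('+00 00:24:28.216851') FROM dual UNION ALL
  SELECT  6, TO_DSINTERVAL('+00 00:00:43.078030') FROM dual UNION ALL
  SELECT  7, TO_DSINTERVAL('+00 00:00:01.686269') FROM dual UNION ALL
  SELECT  8, TO_DSINTERVAL('+00 00:00:06.936982') FROM dual UNION ALL
  SELECT  9, TO_DSINTERVAL('+00 00:00:01.827659') FROM dual UNION ALL
  SELECT 10, TO_DSINTERVAL('+00 00:00:03.920712') FROM dual UNION ALL
  SELECT 11, TO_DSINTERVAL('+00 00:00:07.290813') FROM dual UNION ALL
  SELECT 12, TO_DSINTERVAL('+00 00:00:09.591401') FROM dual UNION ALL
  SELECT 13, TO_DSINTERVAL('+00 00:00:24.930160') FROM dual UNION ALL
  SELECT 14, TO_DSINTERVAL('+00 00:27:57.732468') FROM dual UNION ALL
  SELECT 15, TO_DSINTERVAL('+00 00:02:27.918861') FROM dual UNION ALL
  SELECT 16, TO_DSINTERVAL('+00 00:00:14.481427') FROM dual UNION ALL
  SELECT 17, TO_DSINTERVAL('+00 00:03:39.617132') FROM dual UNION ALL
  SELECT 18, TO_DSINTERVAL('+00 00:02:38.285368') FROM dual UNION ALL
  SELECT 19, TO_DSINTERVAL('+00 00:00:52.881317') FROM dual UNION ALL
  SELECT 20, TO_DSINTERVAL('+00 00:03:23.152004') FROM dual
)
-- Convert each interval to seconds, add them up, and turn the sum back into an interval.
SELECT NUMTODSINTERVAL(
         SUM(  EXTRACT(DAY    FROM elapsed) * 86400
             + EXTRACT(HOUR   FROM elapsed) * 3600
             + EXTRACT(MINUTE FROM elapsed) * 60
             + EXTRACT(SECOND FROM elapsed)),
         'SECOND') AS total_elapsed
FROM   initial_run;
```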


Parameters at Go-live

-- scope=SPFILE: these take effect at the next instance restart
alter system set parallel_adaptive_multi_user=FALSE scope=SPFILE;
alter system set parallel_degree_limit=16 scope=SPFILE;
alter system set parallel_degree_policy=LIMITED scope=SPFILE;
alter system set parallel_force_local=TRUE scope=SPFILE;
alter system set parallel_min_time_threshold=30 scope=SPFILE;
-- hidden parameter: turns off cardinality feedback
alter system set "_OPTIMIZER_USE_FEEDBACK"=FALSE scope=SPFILE;

-- seed the I/O calibration data that Auto DOP reads, instead of running CALIBRATE_IO
truncate table resource_io_calibrate$;

insert into resource_io_calibrate$ values (CURRENT_TIMESTAMP,CURRENT_TIMESTAMP, 0,0,300,0,0);
commit;
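Because each parameter change uses scope=SPFILE, none of them takes effect until the instance restarts. Below is a sketch of post-restart checks. It assumes the resource_io_calibrate$ columns are ordered START_TIME, END_TIME, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, NUM_PHYSICAL_DISKS, so the insert seeds MAX_PMBPS = 300. The sketch also shows DBMS_RESOURCE_MANAGER.CALIBRATE_IO, the documented way to populate the same calibration data. Its disk count and latency arguments are placeholders, not MD Anderson's values. The call needs SYSDBA, generates heavy I/O, and would replace the hand-seeded row.

```sql
-- 1. Confirm the parallel settings after the restart.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('parallel_adaptive_multi_user', 'parallel_degree_limit',
                'parallel_degree_policy', 'parallel_force_local',
                'parallel_min_time_threshold');

-- 2. Confirm the calibration row Auto DOP will use (MAX_PMBPS expected to be 300).
SELECT max_iops, max_mbps, max_pmbps, latency, num_physical_disks
FROM   dba_rsrc_io_calibrate;

-- 3. Documented alternative to the manual insert (run as SYSDBA, in a quiet window).
--    36 disks and 10 ms are placeholder arguments.
SET SERVEROUTPUT ON
DECLARE
  l_max_iops       PLS_INTEGER;
  l_max_mbps       PLS_INTEGER;
  l_actual_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 36,
    max_latency        => 10,
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_actual_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops=' || l_max_iops ||
                       ' max_mbps=' || l_max_mbps ||
                       ' latency=' || l_actual_latency);
END;
/
```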



Most stock indexes were necessary, and even FKs were needed (contrary to the HS GBU's recommendation). However, many of our custom indexes (including bitmaps) were unnecessary and in some cases even impaired performance.
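One low-risk way to test whether a custom index earns its keep is sketched below. First monitor whether any plan uses it, then make it invisible and re-time the queries. The index name is hypothetical, and the slides do not say which method MD Anderson used. An invisible index is still maintained by DML, so incremental load time only improves once the index is actually dropped.

```sql
-- Hypothetical custom bitmap index suspected of hurting the test queries.
-- Step 1: record whether any execution plan uses it.
ALTER INDEX hyp_custom_bmp_idx MONITORING USAGE;
-- ... run the normal workload ...
-- V$OBJECT_USAGE only lists indexes owned by the connected schema.
SELECT index_name, used FROM v$object_usage WHERE index_name = 'HYP_CUSTOM_BMP_IDX';
ALTER INDEX hyp_custom_bmp_idx NOMONITORING USAGE;

-- Step 2: hide it from the optimizer (DML still maintains it) and re-time the queries.
ALTER INDEX hyp_custom_bmp_idx INVISIBLE;
-- ... re-run and time the test queries ...

-- Step 3: restore it if performance got worse; otherwise drop it to save load time.
ALTER INDEX hyp_custom_bmp_idx VISIBLE;
```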

Compression achieved fantastic disk space savings, but performance lagged behind uncompressed tables in many cases and was better in only one test case.
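The slide does not say which compression type was tested. The sketch below assumes Exadata Hybrid Columnar Compression and builds compressed copies of a hypothetical table with CREATE TABLE AS SELECT. HCC needs direct-path loads, which CTAS provides. The copies can then be compared on space and timed with the same test queries.

```sql
-- Compressed copies of a hypothetical fact table (HCC requires Exadata storage).
CREATE TABLE hyp_fact_ql COMPRESS FOR QUERY LOW  AS SELECT * FROM hyp_fact;
CREATE TABLE hyp_fact_qh COMPRESS FOR QUERY HIGH AS SELECT * FROM hyp_fact;

-- Compare allocated space with the uncompressed original.
SELECT segment_name, ROUND(SUM(bytes) / 1024 / 1024) AS mb
FROM   user_segments
WHERE  segment_name IN ('HYP_FACT', 'HYP_FACT_QL', 'HYP_FACT_QH')
GROUP  BY segment_name
ORDER  BY mb DESC;
```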

Gathering statistics with histograms caused some performance degradation in our test queries, so we went live without histograms.
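In DBMS_STATS, "no histograms" is expressed as METHOD_OPT 'FOR ALL COLUMNS SIZE 1', meaning one bucket per column. Below is a sketch with a hypothetical schema and table. The slides do not show how the stats job was actually configured. A global preference also applies to the automatic statistics job, unless it is overridden at table level.

```sql
-- Gather stats without histograms for one (hypothetical) table and its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => 'HYP_SCHEMA',
    tabname    => 'HYP_FACT',
    method_opt => 'FOR ALL COLUMNS SIZE 1',
    cascade    => TRUE);
END;
/

-- Make "no histograms" the default for later gathers, including the automatic job.
EXEC DBMS_STATS.SET_GLOBAL_PREFS('METHOD_OPT', 'FOR ALL COLUMNS SIZE 1');

-- Verify: every column should report HISTOGRAM = 'NONE'.
SELECT column_name, histogram
FROM   dba_tab_col_statistics
WHERE  owner = 'HYP_SCHEMA' AND table_name = 'HYP_FACT';
```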

Auto DOP had very little impact on the queries (it was not picking them up), but forcing parallelism with hints produced significant results: the total for the test queries fell from 1 hr 27 min to 24 min at parallel 24. So the power of parallelism is there; perhaps Auto DOP will handle this better in 12c?
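The slides do not show the exact hint text. The sketch below assumes the statement-level PARALLEL(n) hint (Oracle 11.2+) and also shows the session-level way to force parallel query without editing SQL. Table and column names are hypothetical. One documented behaviour worth checking is that under PARALLEL_DEGREE_POLICY = LIMITED (the go-live setting), Auto DOP only considers tables and indexes whose degree is DEFAULT, so undecorated objects stay serial. Whether that explains the results here is not stated in the slides.

```sql
-- Statement-level hint: run every operation in this query at DOP 24.
SELECT /*+ PARALLEL(24) */ d.diagnosis_code, COUNT(*) AS encounters
FROM   hyp_encounter_fact f
JOIN   hyp_diagnosis_dim  d ON d.diagnosis_key = f.diagnosis_key
GROUP  BY d.diagnosis_code;

-- Session-level alternative: force DOP 24 for every query, then restore the default.
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 24;
-- ... run the test queries ...
ALTER SESSION ENABLE PARALLEL QUERY;

-- Under LIMITED, decorate a table with the DEFAULT degree so Auto DOP considers it.
ALTER TABLE hyp_encounter_fact PARALLEL;
```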

The overall daily incremental load was reduced from 5.5+ hours to 3.5 hours, and further improvements have brought it down to almost 3 hours. We are still tuning, and problems we have seen in Informatica may be holding us back from better performance.

Results



Questions?

www.mdanderson.org
