Post on 13-Oct-2020

1 Confidential

Scientific Advisory Board Meeting

July 2015

Enabling Research through the deployment of Accessible Data Visualization and Analysis Tools

Jeremiah Degenhardt

Research Bioinformatics

Gilead Sciences

2 Confidential

Gilead Sciences

Gilead Sciences, Inc. is a research-based biopharmaceutical company that discovers, develops and commercializes innovative medicines in areas of unmet medical need. We strive to transform and simplify care for people with life-threatening illnesses around the world. Gilead's portfolio of products and pipeline of investigational drugs includes treatments for HIV/AIDS, liver diseases, cancer, inflammatory and respiratory diseases, and cardiovascular conditions.

3 Confidential

Gilead Research Bioinformatics

Analysis: To provide bioinformatics data analysis, experimental design and consultation support to research project teams and nonclinical sub-teams

Infrastructure: To give researchers access to public and commercial bioinformatics tools, genomic technologies and genomics datasets, and to support the analysis of in-house datasets

4 Confidential

Bioinformatics data analysis support

Evaluate targets/combinations

Identify/evaluate candidate biomarkers

Generate hypotheses for mechanism of action (MOA)

Understand disease mechanisms

Characterize model systems

5 Confidential

Tools

Expression microarray data

Next-generation sequencing (NGS)

Targeted assays

Proteomics

6 Confidential

WGS and Exome data

Used in the identification of multiple features

SNPs and SNVs

Somatic mutations

Insertions and deletions

Copy number changes

Structural changes

Provides data on tens to millions of mutations at a time
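
As a rough illustration of how small variants from WGS or exome data can be sorted into the feature types listed above, here is a minimal Python sketch. The function name, the record layout and the example records are hypothetical; copy number and structural changes need dedicated callers and are not covered.

```python
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Classify a small variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base substituted


# Hypothetical example records: (chromosome, position, ref, alt)
variants = [
    ("chr1", 12345, "A", "G"),
    ("chr2", 67890, "C", "CTT"),
    ("chr3", 13579, "GAT", "G"),
    ("chr4", 24680, "AC", "GT"),
]

# Count how many variants fall into each class
counts = Counter(classify_variant(ref, alt) for _, _, ref, alt in variants)
```

A tally like `counts` is the kind of summary that stays readable whether the input holds tens or millions of mutations.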

7 Confidential

RNAseq

Quantifies expression across the whole transcriptome

– Identify differential expression between indications and treatments

– Evaluate mechanism of action for therapeutics

Provides data on more than 20,000 genes at a time
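
To make "differential expression" concrete, the sketch below ranks genes by log2 fold change between treated and control samples. It is a simplified illustration, not the analysis used by the team: the gene names, the values and the pseudocount of 1 are assumptions, and a real RNAseq analysis would use a statistical model that accounts for replicate variance.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized expression values per gene and condition
expression = {
    "GENE_A": {"control": [10, 12, 11], "treated": [45, 50, 43]},
    "GENE_B": {"control": [100, 95, 105], "treated": [98, 102, 100]},
    "GENE_C": {"control": [63, 63, 63], "treated": [15, 15, 15]},
}


def rank_by_fold_change(expression):
    """Return (gene, log2FC) pairs sorted by absolute fold change, largest first."""
    results = [
        (gene, log2_fold_change(values["treated"], values["control"]))
        for gene, values in expression.items()
    ]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


ranked = rank_by_fold_change(expression)
```

With more than 20,000 genes per sample, a ranked list like this is usually the starting point for a table or plot rather than something read row by row.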

8 Confidential

Public data

• Public datasets provide a large amount of data

• Useful for the generation or validation of hypotheses

TCGA

9 Confidential

Miscellaneous data types

Many other assay types for evaluating biological data

While many assays provide less data than whole genomes or transcriptomes, they still produce a large amount of data

10 Confidential

Combining Data

Mechanism of Action

Mechanism of Resistance

Data mining for hypothesis generation

11 Confidential

From data to interpretation

• Genomic assays generate a large amount of data

• This data is delivered as specialized files, text files, or spreadsheets

• Scientists need tools to parse the data and then transform it into something useful

How do we go from data to interpretation?
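
The "parse, then transform" step can be sketched in a few lines of Python. The column names (`sample`, `gene`, `value`), the inline data and the choice of a per-gene mean are all assumptions made for illustration; real deliveries vary by assay and vendor.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited delivery; "NA" marks a missing measurement
RAW_TEXT = (
    "sample\tgene\tvalue\n"
    "S1\tGENE_A\t10.5\n"
    "S1\tGENE_B\t3.0\n"
    "S2\tGENE_A\t12.5\n"
    "S2\tGENE_B\tNA\n"
)


def summarize_by_gene(handle):
    """Parse tab-delimited rows and return the mean value per gene."""
    values = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            values[row["gene"]].append(float(row["value"]))
        except ValueError:
            continue  # skip entries that are not numeric, such as "NA"
    return {gene: sum(v) / len(v) for gene, v in values.items()}


summary = summarize_by_gene(io.StringIO(RAW_TEXT))
```

Passing an open file handle instead of `io.StringIO` would apply the same function to a file on disk.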

12 Confidential

Tools for data analysis

Tools must be simple to use

Data and results must be robust

Tools must also be flexible

13 Confidential

Spotfire

Early pipelines attempted to combine R and Spotfire to provide complete data access

Results were variable

– Workflows were not robust

– Difficult to maintain

– Tools were not simple

14 Confidential

Tools for data analysis

We have settled on a combination of tools

– Command line pipelines

– R

– Spotfire

– Shiny

– JavaScript

15 Confidential

Current setup

Command line tools and pipelines for heavy lifting

Spotfire for some data deliveries and one-off questions

Shiny for applications and more complex data delivery

R for statistical analysis and data distillation

16 Confidential

Building robust pipelines

Identify aspects of analysis that are consistent across questions and datasets

Build reusable pipelines to handle these aspects

Use pipeline and version control systems

Format outputs and data summaries for easy ingestion into next steps

Identification of Relapse mutations
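
One way to picture the points above is a pipeline built from small, reusable steps whose output is a structured summary that the next tool can read directly. The sketch below is an assumption-laden illustration: the step names, the record fields (`quality`, `in_baseline`, `in_relapse`) and the JSON output format are hypothetical and do not describe the actual in-house pipeline.

```python
import json
from functools import reduce


def filter_by_quality(records, min_quality=30):
    """Reusable step: drop calls below a quality threshold."""
    return [r for r in records if r["quality"] >= min_quality]


def keep_relapse_only(records):
    """Question-specific step: keep mutations seen at relapse but not at baseline."""
    return [r for r in records if r["in_relapse"] and not r["in_baseline"]]


def run_pipeline(records, steps):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda data, step: step(data), steps, records)


def to_summary(records):
    """Format results as JSON so downstream tools can ingest them easily."""
    return json.dumps(
        {"n_mutations": len(records), "mutations": sorted(r["id"] for r in records)},
        sort_keys=True,
    )


# Hypothetical mutation calls
calls = [
    {"id": "mut1", "quality": 50, "in_baseline": False, "in_relapse": True},
    {"id": "mut2", "quality": 10, "in_baseline": False, "in_relapse": True},
    {"id": "mut3", "quality": 60, "in_baseline": True, "in_relapse": True},
]

result = run_pipeline(calls, [filter_by_quality, keep_relapse_only])
summary = to_summary(result)
```

Keeping the generic step (`filter_by_quality`) separate from the question-specific one makes it reusable across datasets, and keeping both under version control makes results reproducible.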

17 Confidential

Spotfire

Still using Spotfire to deliver final data for many projects

Spotfire is a powerful tool for summarized data and mid-sized datasets

Curve fitting for EC50 values
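
For readers unfamiliar with EC50 fitting, the sketch below fits a logistic dose-response curve by least squares. It is not how Spotfire performs the fit: it assumes responses are normalized to 0-100% and rise with concentration, so only the EC50 and Hill slope are estimated, and it uses a coarse grid search as a dependency-free stand-in for a proper nonlinear optimizer. The data are synthetic.

```python
def logistic_response(conc, ec50, hill, bottom=0.0, top=100.0):
    """Four-parameter logistic curve; the response equals the midpoint at conc == ec50."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def fit_ec50(concs, responses):
    """Grid search for the EC50 and Hill slope that minimize squared error."""
    best = None
    for i in range(-300, 301):  # log10(EC50) from -3 to 3 in steps of 0.01
        ec50 = 10 ** (i / 100)
        for j in range(5, 31):  # Hill slope from 0.5 to 3.0 in steps of 0.1
            hill = j / 10
            sse = sum(
                (logistic_response(c, ec50, hill) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, ec50, hill)
    return best[1], best[2]


# Synthetic, noise-free data generated from a known curve (EC50 = 2.0, Hill = 1.0)
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [logistic_response(c, ec50=2.0, hill=1.0) for c in concs]

ec50, hill = fit_ec50(concs, responses)
```

The recovered EC50 is only as precise as the grid spacing; a real analysis would fit all four parameters and report confidence intervals.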

18 Confidential

Spotfire

Initial learning curve for researchers, but then provides easy access to data

Heat maps for clustering expression profiles

19 Confidential

Shiny

Flexible and powerful tool for building data-rich web pages/apps

Seamlessly integrates R and JavaScript

Extensive documentation and examples

20 Confidential

Shiny

Currently used to create in-house pages for data display

21 Confidential

Combining tools

Biological datasets are large and complex

Research biology questions are complex and variable

Data analysis and visualization tools need to be both flexible and robust to answer these questions

Combining tools solves several issues

– Robust pipelines for initial data processing

– Flexible analysis environment for processing intermediate data

– Multiple platforms for data visualization and data delivery

22 Confidential

Acknowledgements

Peng Yue

Li Li

Aaron Arvey

Ricardo Ramirez

23 Confidential

Thank you!