1 Confidential
Scientific Advisory Board Meeting
July 2015
Enabling Research
through the deployment of
Accessible Data Visualization and
Analysis Tools
Jeremiah Degenhardt
Research Bioinformatics
Gilead Sciences
Gilead Sciences
Gilead Sciences, Inc. is a research-based biopharmaceutical
company that discovers, develops and commercializes
innovative medicines in areas of unmet medical need. We strive
to transform and simplify care for people with life-threatening
illnesses around the world. Gilead's portfolio of products and
pipeline of investigational drugs includes treatments for HIV/AIDS,
liver diseases, cancer, inflammatory and respiratory diseases, and
cardiovascular conditions.
Gilead Research Bioinformatics
Analysis: To provide bioinformatics data
analysis, experimental design and consultation
support to research project teams and nonclinical
sub-teams
Infrastructure: To give researchers access to
public and commercial bioinformatics tools,
genomic technologies and genomics datasets,
and to support the analysis of in-house
datasets
Bioinformatics data analysis support
Evaluate targets/combinations
Identify/evaluate candidate biomarkers
Generate hypotheses for MOA
Understand disease mechanisms
Characterize model systems
Tools
Expression microarray data
Next-generation sequencing (NGS)
Targeted assays
Proteomics
WGS and Exome data
Used in the identification of
multiple features
SNPs and SNVs
Somatic mutations
Insertions and deletions
Copy number changes
Structural changes
Provides data on tens to
millions of mutations at a
time
RNAseq
Quantifies expression
across the whole
transcriptome
– Identify differential
expression between
indications and
treatments
– To evaluate
mechanism of action
for therapeutics
Provides data on
more than 20,000
genes at a time
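As a minimal sketch of what "differential expression" means in practice, the Python snippet below computes a log2 fold change of mean expression between treated and control samples. The gene names, counts, and pseudocount are hypothetical illustrations and are not taken from the presentation. A real analysis would also model replicate variance and correct for testing across 20,000+ genes.

```python
import math
from statistics import mean


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2 of the ratio of mean expression (treated over control).

    The pseudocount (an assumed value) avoids division by zero for
    genes with no reads in one group.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized counts per gene: (treated replicates, control replicates).
expression = {
    "GENE_A": ([30.0, 32.0, 31.0], [7.0, 8.0, 6.0]),
    "GENE_B": ([15.0, 14.0, 16.0], [15.0, 16.0, 14.0]),
}

fold_changes = {gene: log2_fold_change(t, c) for gene, (t, c) in expression.items()}

# Rank genes by the magnitude of the change, largest first.
ranked = sorted(fold_changes, key=lambda g: abs(fold_changes[g]), reverse=True)
```

Here `GENE_A` has a log2 fold change of 2 (a four-fold increase), while `GENE_B` is unchanged.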
Public data
• Public datasets provide a large amount of data
• Useful for the generation or validation of
hypotheses
TCGA
Miscellaneous data types
Many other assay types for
evaluating biological data
While many assays provide
less data than whole genomes or
transcriptomes, they still generate
a large amount of data
Combining Data
Mechanism of Action
Mechanism of Resistance
Data mining for hypothesis generation
From data to interpretation
• Genomic assays generate a large
amount of data
• This data is delivered as specialized
files, text files, or spreadsheets
• Scientists need tools to parse the
data and then transform it into
something useful
How do we go from data to interpretation?
Tools for data analysis
Tools must be simple to use
Data and results must be robust
Tools must also be flexible
Spotfire
Early pipelines
attempted to combine
R and Spotfire to
provide complete data
access
Results were variable
– Workflows were not
robust
– Difficult to maintain
– Tools were not
simple
Tools for data analysis
We have settled on a combination of tools
– Command line pipelines
– R
– Spotfire
– Shiny
– JavaScript
Current setup
Command line tools and
pipelines for heavy lifting
Spotfire for some data
deliveries and one-off
questions
Shiny for applications
and more complex data
delivery
R for statistical analysis and
data distillation
Building robust pipelines
Identify aspects of analysis
that are consistent across
questions/datasets
Build reusable pipelines to
handle these aspects
Use pipeline and version
control systems
Format outputs and data
summaries for easy ingestion
into next steps
Identification of Relapse mutations
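The slide does not say how relapse mutations were identified. As one hedged illustration of a reusable pipeline step with an easily ingested output, the Python sketch below assumes variant calls from a baseline and a relapse sample, and reports variants seen only at relapse. The `Variant` type, the `relapse_only` helper, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A single variant call (hypothetical minimal representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def relapse_only(baseline, relapse):
    """Return variants called at relapse but not at baseline, in sorted order."""
    return sorted(set(relapse) - set(baseline))


# Made-up example calls; a real pipeline would parse these from variant files.
baseline_calls = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
relapse_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
    Variant("chr7", 2500, "G", "A"),
]

new_at_relapse = relapse_only(baseline_calls, relapse_calls)

# Tab-separated rows are simple to load into R or Spotfire in a later step.
rows = ["\t".join(map(str, v)) for v in new_at_relapse]
```

Keeping the comparison in one small function and emitting a flat, tab-separated summary is one way to make the same step reusable across datasets.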
Spotfire
Still using Spotfire to
deliver final data for
many projects
Spotfire is a powerful
tool for summarized
data and mid-sized
datasets
Curve fitting for EC50 values
Initial learning curve for
researchers but then
provides easy access
to data
Heat maps for clustering expression profiles
Shiny
Flexible and powerful tool for building data-rich web
pages/apps
Seamlessly integrates R and JavaScript
Extensive documentation and examples
Currently utilized to create in-house pages for data
display
Combining tools
Biological datasets are large and complex
Research biology questions are complex and
variable
Data analysis and visualization tools need to be
flexible and also robust for answering these
questions
Combining tools solves several issues
– Robust pipelines for initial data processing
– Flexible analysis environment for processing
intermediate data
– Multiple platforms for data visualizations and data
delivery
Acknowledgements
Peng Yue
Li Li
Aaron Arvey
Ricardo Ramirez
Thank you!