Agenda - ELIXIR · ELIXIR Innovation & SME event March 19, Wageningen Jakob de Vlieg, VP - Sr....
20/03/2015
Computational Life Sciences: Future of R&D in the big data era
ELIXIR Innovation & SME event
March 19, Wageningen
Jakob de Vlieg, VP - Sr. Expert Lead
Agenda
• Computational Life Sciences (CLS): why do we need it?
• A definition of CLS
• Introducing Bayer HealthCare and Bayer CropScience R&D
• Bayer CropScience (BCS) strategy
• Role of CLS in supporting integrative & data-driven R&D
• Some real-life CLS examples from Bayer CropScience
“Data! Data! Data! I can't make bricks without clay.” - Sherlock Holmes
CLS: Computational Life Sciences
A definition:
• Improving R&D decision making and execution based on data-driven insights
• Combining and analyzing data from multiple sources
• Participating in the value creation
• CLS is linked with all data-driven R&D activities
• CLS does not fit in a “box”
R&D Bayer HealthCare Drug Discovery at a Glance
• Berkeley / San Francisco: Hematology; Biologics Research & Development; Science Hub US
• Berlin: Oncology; Gynecological Therapies; Common Mechanism Research
• Wuppertal / Cologne: Cardiology / Hematology; Biologics Research & Development; Common Mechanism Research
• Monheim: Animal Health Research
• Beijing: GDD Innovation Center China; Bayer Tsinghua Center for Innovative Drug Discovery
Investment in R&D 2013: €2 billion
R&D investment as a share of 2013 total net sales: 10.8%
R&D employees 2013: 7,800
Computational life sciences
External collaborations: e.g. the Broad Institute and Bayer joined forces to develop novel treatment options in cancer therapy
R&D Bayer CropScience: R&D network encompasses ~4,400 employees in 13 major hubs
Hubs: Singapore, Saskatoon, Davis, Lubbock, RTP / Morrisville, Hyderabad, São Paulo, Gent, Haelen, Monheim, Frankfurt, Lyon, Sophia
The global network of R&D sites is supplemented by more than 130 field testing stations
• R&D investment of ca. €880 million in 2013; ~4,400 R&D employees
• Full-year sales: €8,819m
• Total BCS: 22,400 employees; >120 countries
External collaborations: e.g. Targenomix (a spin-off of the Max Planck Institute) to discover new modes of action (MoA) and active ingredients to increase agricultural productivity
Leveraging synergies by developing combined life science technology platforms for HealthCare and CropScience
Ion channel platform
Computational Life Sciences to manage, analyze, combine and re-use data to create scientific breakthroughs and business value
Bayer CropScience: an integrated approach across Small Molecules, Seeds and Biologics to develop integrated crop solutions
• Breeding & Traits: adjusting the plant characteristics to improve quality & yield
• Crop Protection: improving plant health by protecting the crops against external stress factors
• Integrated crop solutions: leveraging the synergies across all R&D areas
[Diagram: Breeding & Traits, Chemistry, Biologics; Yield, Quality; Customer Needs]
A particular CropScience challenge: huge diversity of species
• Mammals (human safety)
• Fish / … (tox models)
• Oomycetes (plant pathogens)
• Insects (pests; beneficials)
• Ascomycetes (plant pathogens)
• Basidiomycetes (plant pathogens)
• Nematodes (plant pests)
• Plants (weeds; crops; seeds, traits)
• Bacteria (traits; biologics; diseases)
Enormous diversity of relevant species, and (proper) genome information is available for only a small fraction of them: huge opportunities for Computational Life Scientists
Center graphic generated at http://itol.embl.de/, Letunic and Bork, Bioinformatics 23(1) 2006
Photos from http://www.cropscience.bayer.com/en/Crop-Compendium.aspx
If this were the Earth… (surface of 50.9 billion hectares)
… this would be the land available for agriculture (1.5 billion hectares / ~3%)
Source: Illustration from CropLife America, adapted by Bayer CropScience
High need for innovation in crop sciences to deliver safer & more effective next-generation products
A global challenge to meet food supply
Available farmland per person is expected to decrease dramatically.
On the other hand, food production needs to increase by 70% to meet demand by 2050.

Year | World population* | Arable land per person
1950 | 2,529,346,000 | 0.52 ha
2000 | 6,115,367,000 | 0.26 ha
2050 | 9,600,000,000 | 0.19 ha
*Source: United Nations
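The figures on this slide and the previous one can be cross-checked with a few lines of Python. The sketch below only recomputes numbers already quoted (the agricultural share of the Earth's surface, the total arable land implied by population times land per person, and the change in land per person); multiplying population by land per person is an illustrative step of our own, not the method behind the UN figures.

```python
# Figures quoted on the slides above; nothing here is new data.
EARTH_SURFACE_BHA = 50.9      # billion hectares
AGRICULTURAL_LAND_BHA = 1.5   # billion hectares

population = {1950: 2_529_346_000, 2000: 6_115_367_000, 2050: 9_600_000_000}
arable_ha_per_person = {1950: 0.52, 2000: 0.26, 2050: 0.19}

def implied_total_arable_ha(year: int) -> float:
    """Total arable land implied by population x arable land per person, in hectares."""
    return population[year] * arable_ha_per_person[year]

def relative_change(start: float, end: float) -> float:
    """Fractional change from start to end (negative means a decrease)."""
    return (end - start) / start

land_share = AGRICULTURAL_LAND_BHA / EARTH_SURFACE_BHA
print(f"Agricultural land share of Earth's surface: {land_share:.1%}")

for year in sorted(population):
    print(f"{year}: {implied_total_arable_ha(year) / 1e9:.2f} billion ha implied")

per_person_change = relative_change(arable_ha_per_person[1950], arable_ha_per_person[2050])
print(f"Arable land per person, 1950 to 2050: {per_person_change:.0%}")
```

The implied totals (about 1.32, 1.59 and 1.82 billion ha) are plain products of the quoted numbers; they ignore how "arable land" is defined in the underlying statistics, so differences from the 1.5 billion ha figure should not be over-interpreted.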
We need constant innovation to deliver safer & more effective next-generation products
The complexity of sustainable agriculture and the interdependencies of various factors increase the need for innovation:
Growing pressure on resources
• Crops needed for food, feed, fiber and renewable raw materials
• Water scarcity
Climate change
• Understand the crops under challenging environmental conditions
Understand practical agriculture
• Across the world
• Across the seasons
Growing world population
• Increasing food & energy demand
• Decreasing farmland per capita
• Increasing market volatility
Increasing regulatory demand
• Increasing standards
• Lack of harmonization of regulatory requirements
The success story...
The success of the Roundup® Ready system and the nearly exclusive reliance on glyphosate for weed control…
… caused massive weed shifts and the development of resistance...
But nature strikes back
[Chart: dose rate, 1996 to 2008; labels x 2, x 16, x 4, x 8]
Today: a limited number of modes of action leads to an increased risk of resistance worldwide
CLS will help explore & connect massive datasets to bring unprecedented innovation power
Overview of key CLS technologies
[Diagram: Genotype, Phenotype, Environment; Data Sciences; Question-based R&D]
• Enable a data-intensive, multidisciplinary research process: technology & culture
• Cross-type data integration (structured & unstructured data)
• Data-driven & multi-model simulations
• High performance computing: connected computers & fast networks
• Visualization & analytics (e.g. experimental design, text mining, pattern recognition, algorithms)
Technology trends in agriculture: digital farming is producing enormous amounts of data
Source: A.T. Kearney
Chlorophyll stress can be measured to identify diseases
Source: Mriya Agroholding, Ternopil Oblast, field 109-A, 971.2 ha, winter wheat, Odessa 1432 variety, sown 30.09.2012
Field data layers:
• Yield from combine
• Stratego YLD as-applied from sprayer
• NDVI imagery (0.5 m resolution)
• NIR imagery (0.5 m resolution)
• Elevation from NED
Source: N. Hummel, BCS US
Powerful visualization and innovative data science solutions are needed to create value
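The slide lists NDVI and NIR imagery as field layers. NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red), so it needs a red band alongside the NIR band; the slide does not say which sensors or bands were used. The sketch below applies that formula to two small co-registered rasters and flags low-NDVI pixels. The reflectance values and the 0.5 threshold are hypothetical, chosen only for illustration; real stress detection would calibrate thresholds per crop and growth stage.

```python
from typing import List, Tuple

def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red), in [-1, 1]."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral instead of dividing by zero
    return (nir - red) / total

def ndvi_grid(nir: List[List[float]], red: List[List[float]]) -> List[List[float]]:
    """Apply NDVI pixel by pixel to two co-registered reflectance rasters of equal shape."""
    if len(nir) != len(red) or any(len(a) != len(b) for a, b in zip(nir, red)):
        raise ValueError("NIR and red rasters must have the same shape")
    return [[ndvi(n, r) for n, r in zip(row_n, row_r)] for row_n, row_r in zip(nir, red)]

def flag_stressed(grid: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels whose NDVI is below a (hypothetical) stress threshold."""
    return [(i, j) for i, row in enumerate(grid) for j, v in enumerate(row) if v < threshold]

# Toy 2x2 reflectance rasters (hypothetical values, not from the field shown above).
nir = [[0.50, 0.45], [0.30, 0.48]]
red = [[0.08, 0.10], [0.20, 0.09]]
grid = ndvi_grid(nir, red)
print(flag_stressed(grid, threshold=0.5))
```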
Computational Life Science in BCS: extract relevant information from big data
Data sources:
• Chemistry: HT screening, chemical processes, structural biology
• Seed bank: germplasm, breeding populations & materials, pedigree
• Phenomics: yield & components, quality traits, plant health traits
• Omics: genomics, transcriptomics, proteomics, metabolomics, epigenomics
• Environment: weather, soil type, crop modeling
“Big Data” layer (phenomics data, omics data, environment data; predictive analytics):
1. Data integration: capture heterogeneous datasets; data integrity, consistency, accessibility
2. Data & tools connection: connect and mine data for better leads in trait and breeding, and targets for small molecules & biologics; improved compound (de)selection
3. Cross-fertilization: cross-functional best-in-class algorithms; high performance computing environment
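Step 1 of the “Big Data” layer (capturing heterogeneous datasets while keeping integrity and consistency) can be illustrated with a minimal join. The sketch assumes hypothetical record schemas (`PhenotypeRecord`, `EnvironmentRecord`) keyed by site and season; it rejects duplicate environment keys and reports unmatched plots instead of silently dropping them. It is an illustration only, not a description of the BCS platform.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class PhenotypeRecord:      # hypothetical schema for a plot-level phenomics record
    plot_id: str
    site: str
    season: int
    yield_t_ha: float

@dataclass
class EnvironmentRecord:    # hypothetical schema for a site/season environment record
    site: str
    season: int
    rainfall_mm: float
    soil_type: str

def integrate(phenotypes: List[PhenotypeRecord],
              environments: List[EnvironmentRecord]) -> Tuple[List[dict], List[str]]:
    """Join phenotype and environment data on (site, season).

    Returns the joined rows plus the plot IDs that could not be matched,
    so integrity gaps are reported rather than lost.
    """
    env_index: Dict[Tuple[str, int], EnvironmentRecord] = {}
    for env in environments:
        key = (env.site, env.season)
        if key in env_index:
            raise ValueError(f"Duplicate environment record for {key}")
        env_index[key] = env

    joined: List[dict] = []
    unmatched: List[str] = []
    for p in phenotypes:
        env: Optional[EnvironmentRecord] = env_index.get((p.site, p.season))
        if env is None:
            unmatched.append(p.plot_id)
            continue
        joined.append({"plot_id": p.plot_id, "yield_t_ha": p.yield_t_ha,
                       "rainfall_mm": env.rainfall_mm, "soil_type": env.soil_type})
    return joined, unmatched

# Toy, hypothetical records.
phenos = [PhenotypeRecord("P1", "site_A", 2013, 8.1),
          PhenotypeRecord("P2", "site_B", 2013, 7.4)]
envs = [EnvironmentRecord("site_A", 2013, 640.0, "loam")]
rows, missing = integrate(phenos, envs)
print(rows, missing)
```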
CLS example from Bayer CropScience: bridging (structural) biology and (computational) chemistry
Example: prediction and design of traits tolerant to HPPD inhibitor herbicides (pigment synthesis inhibitors)
• HPPD X-ray structures with different active HPPD inhibitors (“Research Techn.”)
• Structural bioinformatics to identify residues needed for binding and protein conformation
• Able to select 11 out of 1000+ HPPD sequences as candidates tolerant to the herbicide
• Significantly reduced the number of variants & sequences to be tested
• Predicted sequences showed the desired tolerance to the herbicides in planta
Gudrun Lange, Robert Klein, Michael Beck et al., CLS BCS R&D
HPPD = 4-hydroxyphenylpyruvate dioxygenase
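The slide says structural bioinformatics identified residues needed for binding and conformation, which made it possible to shortlist 11 of 1000+ sequences. It does not describe the selection procedure, so the sketch below shows only one simple way such residue knowledge can be applied: filter aligned sequences by the residues allowed at key positions. The positions, allowed residues, sequence IDs and sequences are all hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical tolerance criteria: 0-based alignment position -> residues allowed there.
# Real positions and residues would come from the structural analysis.
REQUIRED_RESIDUES: Dict[int, Set[str]] = {3: {"F", "Y"}, 7: {"L"}, 10: {"G", "A"}}

def passes_filter(aligned_seq: str, criteria: Dict[int, Set[str]]) -> bool:
    """True if every constrained alignment position holds an allowed residue."""
    for pos, allowed in criteria.items():
        if pos >= len(aligned_seq) or aligned_seq[pos] not in allowed:
            return False
    return True

def select_candidates(seqs: Dict[str, str], criteria: Dict[int, Set[str]]) -> List[str]:
    """Return the sorted IDs of aligned sequences that satisfy all residue constraints."""
    return sorted(sid for sid, s in seqs.items() if passes_filter(s, criteria))

# Toy aligned sequences (hypothetical; '-' marks an alignment gap).
toy = {
    "seq_001": "MKAFLSGLQAGE",
    "seq_002": "MKAYLSGLQGAE",
    "seq_003": "MKAWLSGLQAGE",
    "seq_004": "MKA-LSGIQAGE",
}
print(select_candidates(toy, REQUIRED_RESIDUES))
```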
CLS to identify broad-spectrum fungicide biologics much faster and with fewer resources
Source: Dan Joo, CLS BCS Biologics R&D
Approach
• To identify novel fungicide biologics leads, ca. 400 Paenibacillus strains had to be profiled
• The traditional approach requires ~26 weeks, based on a full characterization of each strain
• The CLS genome sequencing and annotation pipeline significantly reduced time and workload
Outcome & Impact
• Hundreds of strains could be narrowed down to 3 meaningful lead strains with the desired properties
• Characterized in only a fraction of the time and with a fraction of the resources
Optimal balance between wet and dry life sciences
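The slide does not detail how the genome annotation pipeline narrowed ~400 strains down to 3. One simple way to use annotation output for triage is to keep only strains carrying every desired feature and then rank them; the sketch below does that. The feature names, strain IDs and ranking rule (more annotated features first) are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical features derived from genome annotation (e.g. predicted gene clusters).
# Names are placeholders, not results of the BCS pipeline.
DESIRED: Set[str] = {"antifungal_cluster_A", "antifungal_cluster_B"}

def shortlist(annotations: Dict[str, Set[str]], desired: Set[str], top_n: int) -> List[str]:
    """Keep strains that carry every desired feature, rank them by number of annotated
    features (descending) with the strain ID as tie-breaker, and return the top_n."""
    hits = [sid for sid, feats in annotations.items() if desired <= feats]
    hits.sort(key=lambda sid: (-len(annotations[sid]), sid))
    return hits[:top_n]

# Toy annotation table with hypothetical strain IDs.
toy_annotations = {
    "strain_01": {"antifungal_cluster_A", "antifungal_cluster_B", "sporulation"},
    "strain_02": {"antifungal_cluster_A"},
    "strain_03": {"antifungal_cluster_A", "antifungal_cluster_B"},
    "strain_04": {"antifungal_cluster_A", "antifungal_cluster_B", "sporulation", "motility"},
    "strain_05": {"antifungal_cluster_B", "motility"},
}
print(shortlist(toy_annotations, DESIRED, top_n=3))
```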
CLS Ontology Platform
Ontologies and controlled vocabularies in CLS
Centralized management of the terminologies used across functions within Bayer CropScience
Functions: Vegetables; Small Molecules; (Molecular) Breeding; Trait Research; Biologics; Support Functions (Legal, IP, Stewardship)
Applications: Comparative Genomics; Genome Annotation; Functional Annotation; Germplasm; Contracts; Safety; Risk; Plant Quality Assurance; Traceability; Vectoring Platform; Breeding Platform

Acronym | Ontology
TO | Trait Ontology
GO | Gene Ontology
LO | Locations Ontology
SO | Sequence Ontology
TAX | Taxonomy of Species
ECO | Evidence Code Ontology
EDAM | EMBRACE Data & Methods Ontology

Acronym | Controlled Vocabulary
XC | Experimental Conditions Vocabulary
MU | Measure & Units Vocabulary
TECH | Technologies Vocabulary
PROV | Providers Vocabulary

[Diagram: ontology usage (SO, GO, LO, TAX) per function]
Source: Yann-Francois Bizouerne & Erick Antezana, BCS R&D
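Centralized terminology management means, in practice, that free-text values are mapped to one controlled term. The sketch below builds a synonym index and normalizes raw values against it. The term IDs reuse the MU and XC acronyms from the table purely for illustration; the labels and synonyms are hypothetical, not entries from the BCS vocabularies.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical entries: term ID -> (preferred label, accepted synonyms).
# The MU/XC prefixes echo the acronym table above; the entries themselves are invented.
VOCABULARY: Dict[str, Tuple[str, Set[str]]] = {
    "MU:0001": ("tonne per hectare", {"t/ha", "t ha-1", "tonnes/ha"}),
    "XC:0001": ("drought stress", {"water deficit", "drought"}),
}

def build_index(vocab: Dict[str, Tuple[str, Set[str]]]) -> Dict[str, str]:
    """Map each lower-cased label and synonym to its term ID; refuse ambiguous names."""
    index: Dict[str, str] = {}
    for term_id, (label, synonyms) in vocab.items():
        for name in {label, *synonyms}:
            key = name.strip().lower()
            if index.get(key, term_id) != term_id:
                raise ValueError(f"'{name}' maps to more than one term")
            index[key] = term_id
    return index

INDEX = build_index(VOCABULARY)

def normalize(raw: str) -> Optional[str]:
    """Return the controlled term ID for a free-text value, or None if it is not covered."""
    return INDEX.get(raw.strip().lower())

print(normalize("T/HA"), normalize(" Water deficit "), normalize("frost"))
```

Returning `None` for uncovered values (rather than guessing) keeps gaps in the vocabulary visible to curators.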
Modern graph database for efficient storage of biological information
Multiple genome-scale approaches require effective ways to describe, connect and organize our biological data.
An in-house graph database stores > 338M relevant triples (“subject - predicate - object”) for mining and answering questions like:
• Which T. aestivum genes are orthologous to the gene linked to my A. thaliana Affymetrix identifier?
• Which genes are specifically expressed in the root?
• ... and which of them are transcription factors that are also responsive to drought?
• And so on…
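The first question above is a two-hop traversal: probe identifier to A. thaliana gene, then gene to orthologous T. aestivum genes. The sketch below answers it with plain Python pattern matching over a handful of toy triples. The identifiers and predicate names (`maps_to`, `orthologous_to`, `expressed_in`) are hypothetical, and the slide does not say which graph database or query language is used in-house; an RDF triple store would typically express the same query in SPARQL.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Toy triples with hypothetical identifiers; the real store holds > 338M triples.
TRIPLES: Set[Triple] = {
    ("affy:probe_123", "maps_to", "ath:GENE_A"),
    ("ath:GENE_A", "orthologous_to", "tae:GENE_X"),
    ("ath:GENE_A", "orthologous_to", "tae:GENE_Y"),
    ("tae:GENE_X", "expressed_in", "root"),
    ("ath:GENE_B", "orthologous_to", "tae:GENE_Z"),
}

def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Return the triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return sorted(t for t in TRIPLES
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def wheat_orthologs_of_probe(probe: str) -> List[str]:
    """Two hops: probe -> A. thaliana gene(s) -> orthologous T. aestivum ('tae:') genes."""
    genes = [o for _, _, o in match(probe, "maps_to", None)]
    return sorted({o for g in genes for _, _, o in match(g, "orthologous_to", None)
                   if o.startswith("tae:")})

print(wheat_orthologs_of_probe("affy:probe_123"))
print(match(None, "expressed_in", "root"))
```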
[Diagram: HPC and BioGrid infrastructure connecting Gene Passport / Wiki (functional annotation, protein domains, subcellular localization, etc.), gene expression profiling, comparative genomics, genetic maps, RNA-seq, microarrays, clustering, co-expression and genome annotations]
Source: “Ontologies and Semantic Web technologies for knowledge discovery in Big Data” by E. Antezana, CLS Ghent
Intuitive & powerful data analysis interfaces to boost decision making
Pod shattering is a problem in canola
Pre-harvest release of seeds from mature pods towards the end of the season
• Canola pods have “pre-set breaking points”
• A natural & essential process, but one that can drive canola farmers to despair
• When the pod is ripe, it splits open and releases the seeds inside
• E.g. strong winds can break pods open too early… the seeds fall to the ground and are no longer usable
Mechanical “swathing” to prevent yield loss from seed pods opening prematurely
[Diagram: swathing, then field maturation and drying of the swathed crop, then harvesting]
Resistance to pod shattering is an important target trait
[Figure: transversal section of a pod. DZ: dehiscence zone; R: replum; S: septum; V: valve. From Dinneny et al., BioEssays 27:42-49]
• Pods are composed of two halves (valves) that are held strongly together by a specialized tissue
• When the pod is ripe, this tissue disintegrates; the pod splits open, releasing the black seeds inside
• The transcription factor gene IND (INDEHISCENT) was found to play a key role in tissue disintegration
• Approach: make the IND gene defective to reduce premature release of the seeds (by preventing the formation of the lignified enb layer; dehiscence zone)
Bioinformatics and omics on the B. napus genome sequence revealed 2 IND homeologs, located on chromosomes A3 and C3 (Genome Annotation Platform)
Both the A3 and the C3 IND genes are relevant for modifying pod shattering
• 2 functional, highly expressed IND homeologs in pod valves
• B. napus transcript atlas: RNA sequencing of 10 B. napus tissues and 12 developmental stages
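Finding homeologs that are highly expressed in pod valves from a transcript atlas (much like the "specifically expressed in the root" question on the graph database slide) is often framed as a tissue-specificity problem. The sketch below uses the tau index, a common specificity measure swapped in here for illustration; the slide does not say how the atlas was analyzed. Gene labels and expression values are hypothetical, and tau is frequently computed on log-transformed values, which this toy version skips.

```python
from typing import Dict

def tau(expression: Dict[str, float]) -> float:
    """Tissue-specificity index tau: 0 = uniform expression, 1 = expressed in a single tissue."""
    values = list(expression.values())
    if len(values) < 2:
        raise ValueError("need at least two tissues")
    peak = max(values)
    if peak <= 0:
        return 0.0  # not expressed anywhere; treat as non-specific
    return sum(1 - v / peak for v in values) / (len(values) - 1)

def top_tissue(expression: Dict[str, float]) -> str:
    """Tissue with the highest expression value."""
    return max(expression, key=expression.get)

# Hypothetical expression values (e.g. TPM) for two homeologs across four tissues.
atlas = {
    "IND_A3": {"pod_valve": 120.0, "leaf": 2.0, "root": 0.0, "flower": 10.0},
    "IND_C3": {"pod_valve": 95.0, "leaf": 1.0, "root": 0.5, "flower": 8.0},
}
for gene, expr in atlas.items():
    print(gene, top_tissue(expr), round(tau(expr), 2))
```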
Optimizing the ind1/ind2 genotype is critical to achieve the desired effect

Genotype | Pod shattering | Relative yield
Wild type (WT) | High | 100%
25% functional IND | Limited | 100-117%
Knockout (KO) of ind1 and ind2 | None | Not harvestable

Hybrid seed production: a hybrid with the 25% functional IND genotype cannot be produced, as it would require a 4 x KO parent, which is not harvestable.
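The “4 x KO parent” argument can be made explicit by counting alleles. B. napus carries the two IND homeologs on A3 and C3, so a plant has four IND alleles, and an F1 hybrid receives one allele per locus from each parent. Assuming homozygous parental lines and that “25% functional IND” means one functional allele out of four (both are our interpretations, not statements from the slide), the sketch enumerates every parent pair that gives such a hybrid.

```python
from itertools import product
from typing import List, Tuple

# Assumption: parental lines are homozygous, so each parent is described by one allele
# type per locus (the A3 and C3 homeologs): "F" = functional IND, "k" = knocked out.
ParentGenotype = Tuple[str, str]

def functional_alleles(parent: ParentGenotype) -> int:
    """Functional IND alleles in a homozygous parent (two copies per locus)."""
    return 2 * sum(allele == "F" for allele in parent)

def hybrid_functional_fraction(p1: ParentGenotype, p2: ParentGenotype) -> float:
    """An F1 gets one allele per locus from each parent, i.e. 4 IND alleles in total."""
    functional = sum(a == "F" for a in p1) + sum(a == "F" for a in p2)
    return functional / 4

def crosses_giving(target: float) -> List[Tuple[ParentGenotype, ParentGenotype]]:
    """All unordered parent pairs whose hybrid has the target fraction of functional alleles."""
    parents = list(product("Fk", repeat=2))
    return [(p1, p2)
            for i, p1 in enumerate(parents)
            for p2 in parents[i:]
            if hybrid_functional_fraction(p1, p2) == target]

for p1, p2 in crosses_giving(0.25):
    print(p1, "x", p2, "| functional alleles per parent:",
          functional_alleles(p1), functional_alleles(p2))
```

Every cross found pairs one parent with the fully knocked-out ('k', 'k') line, i.e. zero functional alleles at both loci, which is the non-harvestable parent the slide refers to.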
A single amino acid substitution in the basic helix-loop-helix structure of IND provided the solution
• IND is a transcription factor that binds to DNA via a helix-loop-helix motif
• However, IND can only bind to the target DNA as a dimer
[Figure: parental genotypes and hybrid genotype for ind1 and ind2 (allele labels “b”); 25% functional IND]
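One possible reading of how a single amino acid substitution can yield “25% functional IND” is dimer arithmetic. Assume (our assumption; the slide only states that IND binds DNA as a dimer) that the substituted monomer still dimerizes but any dimer containing it cannot bind DNA, and that monomers pair at random. Then only dimers with two wild-type partners are functional, so the functional fraction is the square of the wild-type monomer fraction: half wild-type monomers gives one quarter functional dimers.

```python
from fractions import Fraction

def functional_dimer_fraction(wt_monomer_fraction: Fraction) -> Fraction:
    """Fraction of IND dimers able to bind DNA.

    Assumptions (hypothetical, for illustration): monomers pair at random, and any dimer
    containing a substituted (mutant) monomer is inactive, i.e. a dominant-negative effect.
    """
    if not 0 <= wt_monomer_fraction <= 1:
        raise ValueError("wild-type monomer fraction must lie between 0 and 1")
    # Both partners must be wild type: probability p * p under random pairing.
    return wt_monomer_fraction ** 2

for wt in (Fraction(1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)):
    print(f"wild-type monomers {wt}: functional dimers {functional_dimer_fraction(wt)}")
```

Using `Fraction` keeps the results exact, so 1/2 maps to exactly 1/4 rather than a rounded float.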
A single mutant allele solves a big issue in canola
Pod Shatter Reduction launched in Canada in 2014
• Opportunity to harvest canola fields at just the right time without having to swath them first
• This gives the plants enough time to ripen & form better-quality seeds
Some key CLS challenges for data-driven and integrative R&D
• Simultaneously managing security and promoting the sharing of data across functions (key words: technology hopping, data-sharing culture, managing business risks)
• Digital scientists to bridge the gap between data and R&D workflows
• Data stewardship & interoperability: making data available and usable
• New ways to automate data integration, e.g. to manage time-sensitive data and to manage structured and unstructured data
• User-friendly interfaces for experts and non-experts
• Connecting fast-developing data science solutions with stable IT enterprise architectures: balancing robustness and agility
• The FAIR principles: Findable, Accessible, Interoperable, Reusable
Acknowledgements
Erick Antezana
Bart Lambert
Laurent Viau
Yann Bizouerne
Steven Robbens
Jack van Handenhove
Stephane Bourot
Benjamin Laga
Joan Wong
Aurélie Defferrard
Xi Wang
Marc Cornelissen
Jens Hollunder
Eddy Beck
Rene van Schaik
Dan Joo
Gudrun Lange
Sabien Vulsteke
Henning Redestig
Gitta Erdmann
And many others
Forward-Looking Statements
This presentation may contain forward-looking statements based on current
assumptions and forecasts made by Bayer Group or subgroup management.
Various known and unknown risks, uncertainties and other factors could lead to
material differences between the actual future results, financial situation,
development or performance of the company and the estimates given here.
These factors include those discussed in Bayer’s public reports which are
available on the Bayer website at www.bayer.com.
The company assumes no liability whatsoever to update these forward-looking
statements or to conform them to future events or developments.
Thank you!