April 25th 2018
Automatic Optimization of Predictive Bioactivity Models
Chi Chung Lam, Fabian Steinmetz, Paul Czodrowski
Predictive Models in Production

Multiple models trained for biological targets:
• Random Forests
• Neural Networks
• Gradient Boosted Trees

NNs and GBTs are very sensitive to hyperparameter changes, so automated methods are needed to build models with the right hyperparameters
Millions of unique combinations possible
NN Architectures & Hyperparameters
NN-Architecture
• Layer-Type
• Number of Layers
• Neurons per Layer
• Activation-Functions
Training-Parameters
• Optimizer
• Learning-Rate
• Weight-Decay
• Batch-Size
• Loss-Function
• …
Hyperparameters
Guido Bolick: Automatic Generation of Neural Network Architectures Using a Genetic Algorithm | 27.09.2016
Genetic Algorithm for hyperparameter optimization
Genetic Algorithm Workflow
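The workflow can be sketched as a minimal genetic loop. This is an illustrative toy, not the actual implementation: the search space is cut down to three hyperparameters and `fitness` is a hypothetical stand-in for the inner-loop kappa of a trained NN, while the evolution settings (population 100, drop-worst-50%, 5% mutation rate, 30% crossing-over rate) follow the settings listed later in the deck.

```python
import random

random.seed(0)

# Toy search space; real entities also encode optimizer, loss, layers, etc.
SPACE = {
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
    "batch_pct": list(range(5, 21)),
    "neurons": list(range(32, 513, 32)),
}

def random_entity():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(e):
    # Hypothetical stand-in for the validation kappa of a trained NN:
    # prefers a small learning rate and mid-sized layers.
    return -abs(e["learning_rate"] - 0.1) - abs(e["neurons"] - 256) / 512

def evolve(pop, mutation_rate=0.05, crossover_rate=0.30):
    pop = sorted(pop, key=fitness, reverse=True)
    survivors = pop[: len(pop) // 2]                 # drop worst 50 %
    children = []
    while len(survivors) + len(children) < len(pop):
        a, b = random.sample(survivors, 2)
        child = dict(a)
        for k in SPACE:
            if random.random() < crossover_rate:
                child[k] = b[k]                      # crossing-over
            if random.random() < mutation_rate:
                child[k] = random.choice(SPACE[k])   # mutation
        children.append(child)
    return survivors + children

population = [random_entity() for _ in range(100)]
for _ in range(10):
    population = evolve(population)
best = max(population, key=fitness)
```

Because the best half of each generation survives unchanged, the best fitness never decreases across generations.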
Comparing Global Models
Model         Description
RF            Random Forest with fixed hyperparams
Leiden DNN    DNN with fixed hyperparams
GA DNN        DNN with GA-optimized hyperparams
Random DNN    DNN with random-search-optimized hyperparams
Feature-Wise  Baseline model that takes the best fingerprint bit as its prediction
XGBoost       Gradient Boosted Trees with fixed hyperparams
Feature-Wise Baseline

Assume that each fingerprint bit is itself a prediction, and select the best bit

              Bit 0   Bit 1   Bit 2   Bit 3   Activity
Sample 1        1       0       0       1        0
Sample 2        1       0       0       0        0
Sample 3        1       1       1       1        1
Sample 4        1       1       1       0        1
Sample 5        0       0       1       1        0
Kappa score    0.41    1.00    0.67   -0.17
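The selection step can be sketched with an unweighted Cohen's kappa. The per-bit values then come out slightly different from the slide's, which may use another kappa variant, but Bit 1 still scores a perfect 1.00 and is selected:

```python
def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the
    observed agreement and p_e the agreement expected by chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# The 5 samples from the slide: 4 fingerprint bits and the activity label.
bits = [
    [1, 1, 1, 1, 0],  # Bit 0
    [0, 0, 1, 1, 0],  # Bit 1
    [0, 0, 1, 1, 1],  # Bit 2
    [1, 0, 1, 0, 1],  # Bit 3
]
activity = [0, 0, 1, 1, 0]

scores = [cohen_kappa(activity, b) for b in bits]
best_bit = max(range(len(bits)), key=lambda i: scores[i])
# Bit 1 reproduces the activity column exactly, so its kappa is 1.0.
```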
Global Model Performance

(Bar chart: kappa scores for the targets CACO, CLINT_H, CLINT_M, CLINT_R, HERG, and SOL, comparing RF, Leiden DNN, GA DNN, Random DNN, Feature-Wise, XGBoost, and XGBoost Random.)
GA vs Random Search Comparison
The mean kappa score increases as the GA evolves
However, a good solution is found too easily (already present in the initial 100 architectures)
A random search over the same search space finds a similar or better solution
Fingerprints hash a molecule's substructures into a fixed-length bit vector
A small fingerprint size causes bit "collisions"
A large fingerprint size leaves many redundant bits
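The collision effect can be illustrated by folding substructure identifiers into bit vectors of two sizes. The fragment names here are invented stand-ins; real fingerprints hash actual substructure environments:

```python
import hashlib

def fold_into_bits(substructures, n_bits):
    """Hash each substructure identifier into a bit position (mod n_bits)
    and record which positions received more than one substructure."""
    fp = [0] * n_bits
    positions = {}
    for s in substructures:
        pos = int(hashlib.md5(s.encode()).hexdigest(), 16) % n_bits
        fp[pos] = 1
        positions.setdefault(pos, []).append(s)
    collisions = {p: subs for p, subs in positions.items() if len(subs) > 1}
    return fp, collisions

# Hypothetical substructure identifiers standing in for molecular fragments.
subs = [f"frag_{i}" for i in range(300)]
fp_small, coll_small = fold_into_bits(subs, 64)    # small FP: many collisions
fp_large, coll_large = fold_into_bits(subs, 4096)  # large FP: few collisions,
                                                   # but most bits stay unused
```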
Fingerprint Filtering: CLINT_R

FP size                              1024    4096
Avg substructures per bit           79.84   20.64
Bits removed by 0.01 var filter         3    2388
Substr/bit after 0.01 var filter    80.00   21.86
True size after 0.01 var filter      1021    1708
Feature selection of fingerprint bits by variance
Control: an unfiltered FP of the same length as the filtered FP
Problem: the threshold variance is chosen arbitrarily
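A minimal sketch of the variance filter on a toy fingerprint matrix; the data and the 0.01 threshold are illustrative:

```python
import numpy as np

def variance_filter(X, threshold):
    """Keep only fingerprint bits whose variance exceeds the threshold."""
    variances = X.var(axis=0)   # per-bit variance across samples
    mask = variances > threshold
    return X[:, mask], mask

# Toy fingerprint matrix: 6 samples x 5 bits.
X = np.array([
    [1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
])
X_filtered, kept = variance_filter(X, threshold=0.01)
# Bits 0 and 4 are constant (variance 0), so they are dropped; bit 1 is
# nearly constant but its variance (5/36) still clears the 0.01 threshold.
```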
Fingerprint Filtering: CLINT_R

(Bar charts: mean kappa score of DNN, RF, and XGB on CLINT_R for 1024-bit and 4096-bit fingerprints, comparing 0.01-variance filtering, 0.0-variance filtering, their same-length unfiltered controls, and the unfiltered fingerprint.)
Finding the optimal variance: CLINT_R

(Bar chart: mean kappa score on CLINT_R for 0.01-variance, 0.0-variance, optimal-variance, and unfiltered fingerprints.)
Finding the optimal variance: HERG

(Bar chart: mean kappa score on HERG for 0.01-variance, 0.0-variance, optimal-variance, and unfiltered fingerprints.)
Fingerprint Filtering: Problems
The variance of a bit depends strongly on the sample size
Use a threshold relative to the sample size instead of an absolute value
Can we combine this filtering with the "feature-wise baseline" analysis?
Drop bits that correlate poorly with the dependent variable?
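One way to make the threshold sample-size-relative, as suggested above, is to anchor it to a minimum occurrence count; `min_count` here is a hypothetical knob, not a value from the slides:

```python
def relative_variance_threshold(n_samples, min_count=5):
    """Variance threshold equivalent to requiring a binary bit to be set in
    at least `min_count` of `n_samples` samples.

    A binary bit with positive rate p has variance p * (1 - p), so setting
    p = min_count / n_samples yields the matching variance cutoff.
    """
    p = min_count / n_samples
    return p * (1 - p)

# The cutoff shrinks automatically as the dataset grows:
t_small = relative_variance_threshold(800)    # ~0.0062 for 800 samples
t_large = relative_variance_threshold(16000)  # ~0.0003 for 16000 samples
```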
Nested Cluster Validation
On-line Updating of Models

The final models are used in production and served to chemists, etc.
Retraining occurs every three months; during these three months the models are "outdated"
Retraining more frequently is impractical time-wise
XGB and DNNs allow "on-line" updating: fit new data in an additional training step on the existing model
This can happen in near real time
Full retraining is only necessary when performance starts declining
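The on-line pattern can be sketched generically. The production models are XGB and DNNs, but the idea (keep the existing weights and take a few extra gradient steps on the new batch) is easiest to show with a tiny logistic-regression model on made-up data:

```python
import numpy as np

class OnlineModel:
    """Minimal on-line-updatable classifier: logistic regression trained by
    gradient descent. New data is fitted in an additional training step on
    the existing weights instead of retraining from scratch."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def update(self, X, y, epochs=20):
        for _ in range(epochs):
            err = self.predict_proba(X) - y          # gradient of the log-loss
            self.w -= self.lr * (X.T @ err) / len(y)
            self.b -= self.lr * err.mean()

rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 4))
y0 = (X0[:, 0] > 0).astype(float)      # toy endpoint: sign of feature 0
model = OnlineModel(n_features=4)
model.update(X0, y0)                   # initial (full) training

X_new = rng.normal(size=(20, 4))       # new measurements arriving later
y_new = (X_new[:, 0] > 0).astype(float)
model.update(X_new, y_new, epochs=5)   # near-real-time on-line refresh
```

XGBoost supports the same pattern by continuing training from an existing booster, and Keras-style DNNs by calling fit again on the trained model.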
Our in house environments: CREAM and MOCCA
CREAM (Classification REgression At Merck)
- Python environment and modelling tool
- Used for the majority of predictive models
- Holds versatile features, such as
- Multiple machine learning algorithms
- Different validation methods
- Interface to MOCCA
MOCCA is the Merck Online Computational Chemistry Analyzer, our
web-based in-house prediction tool
Global vs. local models

Global models
• Large dataset
• Large applicability domain (AD)
• Endpoints, such as
  • Physico-chemical properties
  • Pharmacokinetics
  • Toxicity
  • General selectivity

Local models
• Smaller dataset
• Smaller applicability domain
• Endpoints, such as
  • Activity
  • Selectivity
  • Toxicity, pharmacokinetics

Generally, global models are preferable due to greater in-house modelling experience and larger AD, but we are happy to support projects with local models if needed.
Acknowledgement
• Chi Chung Lam
• Wolf-Guido Bolick (Andreas Dominik)
• Fabian Steinmetz
• Kristina Preuer, Günter Klambauer (Sepp Hochreiter)
• Friedrich Rippmann
• Marcel Baltruschat
• Cornelius Kohl
• Samo Turk
• Jan Fiedler
• Christian Röder
back-up
Datasets

SET       Train   Test   Classes
CACO       9637    523      3
CLINT_H   16264    797      3
CLINT_M   18313    981      3
CLINT_R   15910    760      3
HERG       6894    288      2
SOL       19615    667      3
Millions of unique combinations possible
NN Architectures & Hyperparameters
NN-Architecture
• Layer-Type
• Number of Layers
• Neurons per Layer
• Activation-Functions
Training-Parameters
• Optimizer
• Learning-Rate
• Weight-Decay
• Batch-Size
• Loss-Function
• …
Hyperparameters
Optimization of Hyperparameters

Expert: hyperparameters derived from literature & experience

Lucky people: hyperparameter search within promising parameter areas

Everyone:
• Random search (Bergstra et al. 2012)
• Grid search (Larochelle et al. 2007)
• Probability-based algorithms (Brochu et al. 2010, Bergstra et al. 2011)
• Directed random search (e.g. genetic algorithms)
What is a Genetic Algorithm?
Validation Strategies
• Use as much data as possible for training
• Get a realistic glimpse of the performance
• 5-fold cross-validation
  • Every compound represented in 4/5 models
  • Hyperparameter optimization to increase performance on the validation sets
  • Resulting performance trustworthy?!
• 5-fold nested cross-validation: 25 models
  • Every compound represented in 16/25 models
  • Increased computational requirements
  • 5x hyperparameter optimizations to increase performance on the validation sets
  • Final performances evaluated on the corresponding outer-loop test sets
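The 16/25 count can be verified by simulating the splits: any compound sits in the outer training data in 4 of 5 outer folds and, within each of those, in the inner training data in 4 of 5 inner folds, so 4 x 4 = 16 of the 25 models see it during training:

```python
import numpy as np

def nested_cv_counts(n_samples, k=5, seed=0):
    """Simulate k-fold nested CV and count, per compound, how many of the
    k*k inner models include it in their training data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    outer_folds = np.array_split(idx, k)
    counts = np.zeros(n_samples, dtype=int)
    for o in range(k):
        # Outer training data: everything outside the o-th outer test fold.
        outer_train = np.concatenate(
            [f for j, f in enumerate(outer_folds) if j != o])
        inner_folds = np.array_split(outer_train, k)
        for i in range(k):
            # Inner training data: outer training data minus the i-th
            # inner validation fold.
            inner_train = np.concatenate(
                [f for j, f in enumerate(inner_folds) if j != i])
            counts[inner_train] += 1
    return counts

counts = nested_cv_counts(100)
# Every compound lands in the training data of exactly 16 of the 25 models.
```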
Training of a NN
1. Get a job (a hyperparameter set) from the jobserver
2. Repeat for all training/test sets:
  2.1 Build a NN from the hyperparameters
  2.2 Train the NN on a training set
      • A balanced-batch generator maintains the same active/inactive ratio within each batch
      • Early stopping when the mean validation loss of a sliding window (15 epochs) does not improve for 100 epochs
  2.3 Evaluate the best state (center of the best window) on the validation set, metric: Cohen's Kappa
      (Kappa relates the agreement of labels vs. prediction to the agreement of 2 random observers)
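A sketch of the sliding-window early stopping described above (window of 15 epochs, patience of 100 epochs, best state taken from the center of the best window); the loss curve here is synthetic:

```python
def early_stop_epoch(val_losses, window=15, patience=100):
    """Stop when the mean validation loss of a sliding window has not
    improved for `patience` epochs; return the center epoch of the best
    window and that window's mean loss."""
    best_mean, best_center, last_improvement = float("inf"), None, 0
    for end in range(window, len(val_losses) + 1):
        mean = sum(val_losses[end - window:end]) / window
        if mean < best_mean:
            best_mean = mean
            best_center = end - window // 2 - 1   # center epoch of the window
            last_improvement = end
        elif end - last_improvement >= patience:
            break                                 # patience exhausted
    return best_center, best_mean

# Synthetic loss curve: keeps improving, with alternating noise on top.
losses = [1.0 / (1 + e) + (0.01 if e % 2 else 0.0) for e in range(400)]
center, mean_loss = early_stop_epoch(losses)
```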
So many parameters..

Genetic Algorithm
• Population size: 100
• Workers: 10
• Fingerprint size: 1024
• SMARTS patterns: 826
• Evolution strategy: drop worst 50%

Mutation Settings
• Default:
  • Mutation rate: 5%
  • Mutation strength: 1
  • Crossing-over rate: 30%
• Increased:
  • Mutation rate: 10%
  • Mutation strength: 2
  • Crossing-over rate: 30%

Training
• Optimizer: sgd, rmsprop, adagrad, adadelta, adam, adamax, nadam
• Loss functions: mae, mse, msle
• Learning rate: 0.05, 0.1, 0.5, 1.0
• Weight decay: 0.0, 1E-7, 5E-7
• Momentum: 0.0, 0.1, …, 0.9
• Nesterov: 0, 1
• Batch size: 5%, 6%, …, 20%

Architecture
• Layers: 1-4
• Layer types: Dense, Dropout
• Neurons: 32, 64, …, 512
• Dropout ratio: 5%, 10%, …, 90%
• Activation functions: linear, sigmoid, hard-sigmoid, softmax, relu, tanh
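Multiplying out the listed options backs up the earlier "millions of unique combinations" claim. The architecture factor below is only a lower bound (a single dense layer); deeper mixed Dense/Dropout stacks multiply it much further:

```python
from math import prod

# Number of unique training-parameter combinations listed above.
training_options = {
    "optimizer": 7, "loss": 3, "learning_rate": 4, "weight_decay": 3,
    "momentum": 10, "nesterov": 2, "batch_size": 16,  # 5%..20% in 1% steps
}
n_training = prod(training_options.values())  # 80,640 combinations

# Even the smallest architecture choice (one dense layer: 16 neuron counts
# x 6 activation functions = 96 variants) pushes the total past 7 million.
n_total_lower_bound = n_training * 16 * 6
```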
Datasets

Dataset     hERG         Micronucleus-Test
Compounds   6999         798
Actives     3205 (46%)   263 (33%)
Inactives   3794 (54%)   535 (67%)

Binary classification: inactive = 0, active = 1
Found NN-Hyperparameters
Improvement of NNs while running the GA
The initial population starts with inner-kappa values of ~0.6 in all splits
The GA is able to improve the performance of the best entities even further (red line)
Mutations can lead to badly performing entities (blue line) up to the last generation
Novelty of Architectures
The proportion of new entities in the population decreases over the runtime of the GA
A higher mutation rate (red line) increases the space searched by the GA
Influence of Hyperparameters
Example label "1_activation (344)": first hidden layer, the activation function of this layer, and the number of contributing pairs
Contributing pairs differ only in the shown parameter
Boxplots are based on the absolute difference of the inner-kappa values of each contributing pair
User-Interface
Conclusion
Implemented an algorithm to create a consensus model using 5-fold nested cross-validation
Each compound is represented in 16 of 25 NNs
The calculation needs 8-14 hours (e.g. overnight) on a GTX cluster
The GA improves the already high kappa values of the NNs even further
Kappa values of the final NN models are mostly above 0.5 ("moderate" according to Landis et al. 1977)
Further steps:
• Possibility to use chemical descriptors and multiple fingerprints
• Option to create multi-class models (more classes than just 0 and 1) and regression models
• (Polishing up and writing a paper)
Implementation of the GA