Artificial neural networks and deep learning
Dr. Timur I. Madzhidov
Kazan Federal University, Department of Organic Chemistry
AI achievements
• AI beats humans in games
• Optical character recognition: solved!
• Image recognition better than human
• Speech recognition at human level
• Handwritten text recognition at human level
• Self-driving cars
Deep learning and human intelligence
Deep learning in chemistry
2012
A team led by G. Hinton, with no experience in chemoinformatics or drug design, won the Merck Molecular Activity Challenge by a wide margin over the other teams.
This was the first time "deep learning" came to chemistry.
2014
The Tox21 Data Challenge 2014 was won by another deep learning team, from JKU, in almost all subcompetitions!
Artificial neuron
[Diagram of a biological neuron:]
Dendrites (receive the signal)
Cell body (makes the decision)
Axon (transfers the signal)
Artificial neuron
[Diagram: the artificial neuron drawn alongside the biological one. Inputs x1, x2, x3 (the dendrites) enter with weights w1, w2, w3; the transfer function sums the weighted signals, $a = \sum_i w_i x_i$ (the cell body); an activation function applied to a produces the output y, the signal that is transferred onwards (the axon).]
One neuron is simply MLR (multiple linear regression)
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and a bias w0 feed the sum $\sum_i w_i x_i$.]
Linear activation: $f(a) = a$
Linear regression: $y = \sum_i w_i x_i$ (with $w_0$ as the bias term)
One neuron is Logistic Regression
[Diagram: the same neuron with inputs x1, x2, x3 and weights w1, w2, w3, shown with two different activation functions.]
Linear activation $f(a) = a$ gives linear regression: $y = \boldsymbol{w}^T \boldsymbol{x}$
Sigmoid activation $f(a) = \frac{1}{1 + e^{-a}}$ gives logistic regression: $y = \frac{1}{1 + e^{-\boldsymbol{w}^T \boldsymbol{x}}}$
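A minimal sketch of this equivalence in PyTorch (input values are made up): the same single neuron is multiple linear regression with a linear activation and logistic regression with a sigmoid.

import torch
from torch import nn

x = torch.tensor([[0.5, -1.0, 2.0]])              # one object with three descriptors (placeholder values)

# Linear activation f(a) = a: the neuron is linear regression, y = w^T x + w0.
linear_neuron = nn.Linear(3, 1)
y_regression = linear_neuron(x)

# Sigmoid activation: the same neuron becomes logistic regression, y = 1 / (1 + exp(-(w^T x + w0))).
logistic_neuron = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
y_probability = logistic_neuron(x)
print(y_regression, y_probability)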
Multilayered architecture
[Diagram: a feed-forward network with an input layer (x1 … x9), a hidden layer of summation units, and an output layer producing y.]
Hidden layer: $a_h = W_1^T x$, $h = f(a_h)$
Output layer: $a_o = W_2^T h$, $y = f(a_o)$
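A sketch of the forward pass above with explicit weight matrices (bias terms omitted; sizes and values are random placeholders):

import torch

x = torch.randn(9)              # input vector x1 ... x9 (placeholder values)
W1 = torch.randn(9, 3)          # input -> hidden weights
W2 = torch.randn(3, 1)          # hidden -> output weights

a_h = W1.T @ x                  # a_h = W1^T x
h = torch.sigmoid(a_h)          # h = f(a_h), hidden activations
a_o = W2.T @ h                  # a_o = W2^T h
y = torch.sigmoid(a_o)          # y = f(a_o), network output
print(y)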
Functional layers
[The same network viewed as a stack of functional layers:]
Input → Linear transformation layer → Activation function layer → Linear transformation layer → Activation function layer
Setting up the network in popular frameworks
[The same layer stack: Input → Linear transformation → Activation → Linear transformation → Activation, with 9 inputs, 3 hidden units and 1 output.]
In PyTorch:

import torch.nn as nn

net = nn.Sequential(
    nn.Linear(9, 3),   # linear transformation layer: 9 inputs -> 3 hidden units
    nn.Sigmoid(),      # activation function layer
    nn.Linear(3, 1),   # linear transformation layer: 3 hidden units -> 1 output
    nn.Sigmoid()       # activation function layer
)
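As a quick usage sketch (with random placeholder data), the network above maps a batch of 9-dimensional descriptor vectors to values in (0, 1):

import torch

batch = torch.randn(5, 9)    # 5 objects, 9 descriptors each (placeholder data)
predictions = net(batch)     # shape (5, 1); values lie in (0, 1) because of the final sigmoid
print(predictions)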
Important!
A non-linear activation function is essential for building a neural net! Otherwise the network collapses to a simpler (linear) model.
Prove it at home or ask me why after the lecture…
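A numerical check of this claim, as a sketch: stacking two linear layers with no activation in between collapses into a single linear layer, because a composition of linear maps is itself linear.

import torch
from torch import nn

layer1 = nn.Linear(9, 3)
layer2 = nn.Linear(3, 1)

x = torch.randn(4, 9)                                # placeholder data
stacked = layer2(layer1(x))                          # two linear layers, no activation in between

# The equivalent single linear layer: W = W2 W1, b = W2 b1 + b2.
W = layer2.weight @ layer1.weight
b = layer2.weight @ layer1.bias + layer2.bias
single = x @ W.T + b
print(torch.allclose(stacked, single, atol=1e-6))    # True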
Why is a multilayered architecture better?
The deeper the network, the more complex the dependencies it can fit. But more data is needed to avoid overfitting.
Deep learning and big data were born to make each other happy
How to train the network?
[Diagram: descriptor vectors enter the network; the network predicts the property; the prediction error (loss function) is computed; the weights W1 and W2 are updated to minimize the error.]
Loss function for regression
[Diagram: the network with a linear activation on the output produces $y^{pred}$.]
$MSE = \frac{1}{2}\sum_{i}^{N}\left(y_i^{pred} - y_i^{exp}\right)^2$
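A small sketch of this loss on made-up predictions. Note that PyTorch's built-in mse_loss averages the squared errors rather than taking half of their sum, which changes only the scale, not the location of the minimum:

import torch

y_pred = torch.tensor([2.1, 0.8, 3.5])
y_exp  = torch.tensor([2.0, 1.0, 3.0])

# The slide's form: one half of the sum of squared errors.
mse_half_sum = 0.5 * torch.sum((y_pred - y_exp) ** 2)

# PyTorch's built-in version averages instead; both are minimized by the same weights.
mse_mean = torch.nn.functional.mse_loss(y_pred, y_exp)
print(mse_half_sum.item(), mse_mean.item())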
Loss function for classification
[Diagram: the network with a sigmoid activation on the output produces P("1"|X) = p, the probability that the object belongs to class "1".]
Negative log likelihood, a.k.a. binary cross-entropy (BCE):
$NLL = -\sum_{i}^{N}\left[\, y_i \cdot \log p_i + (1 - y_i) \cdot \log(1 - p_i) \,\right]$
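A small sketch with made-up probabilities, comparing the formula above with PyTorch's built-in BCE (which averages over objects by default):

import torch

p      = torch.tensor([0.9, 0.2, 0.7])   # predicted probabilities of class "1"
y_true = torch.tensor([1.0, 0.0, 1.0])   # experimental class labels

# The formula from the slide, summed over objects...
nll = -torch.sum(y_true * torch.log(p) + (1 - y_true) * torch.log(1 - p))

# ...and PyTorch's built-in BCE, which averages over objects.
bce = torch.nn.functional.binary_cross_entropy(p, y_true)
print(nll.item(), bce.item())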
How to update weights?
[Plot: error as a function of a weight value, a curve with a minimum.]
How to update weights?
[Plot: error E versus weight value w, a curve with a minimum. At a point such as w1 the slope of the tangent is the derivative, $\tan\alpha = \frac{\partial E}{\partial w}$; stepping against the sign of the derivative moves the weight (e.g. from w1 toward w2) downhill, and at the minimum $\frac{\partial E}{\partial w} = 0$.]
Gradient descent update: $w_{new} = w_{old} - \alpha\,\frac{\partial E}{\partial w}$, where $\alpha$ is the learning rate.
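A sketch of one gradient-descent step on a toy one-weight error surface, assuming E(w) = (w - 3)^2 purely for illustration:

import torch

w = torch.tensor(0.0, requires_grad=True)
alpha = 0.1                                # learning rate

E = (w - 3.0) ** 2                         # error at the current weight
E.backward()                               # computes dE/dw and stores it in w.grad

with torch.no_grad():
    w -= alpha * w.grad                    # w_new = w_old - alpha * dE/dw
print(w.item())                            # moved from 0.0 toward the minimum at 3.0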
How to calculate the gradient?
$w_i^{t+1} = w_i^{t} - \alpha\,\frac{\partial E}{\partial w_i^{t}}$
We need to calculate the derivative of the loss function with respect to the weights of the NN.
[Diagram: the input x is transformed layer by layer: $W_1^T x$ → $f(W_1^T x)$ → $W_2^T f(W_1^T x)$ → $f(W_2^T f(W_1^T x))$, and the prediction is compared with $y_{true}$:]
$E = \left(y_{true} - f(W_2^T f(W_1^T x))\right)^2$
The NN is nothing more than a (quite complex) analytic function!
Computational graph
Seppo LINNAINMAA
$z = \log(w^2 \cdot \sin x + 1)$. What is z at w = 1, x = 0?
[Computational graph: $a = w^2$, $b = \sin x$, $c = a \cdot b$, $d = 1$, $e = c + d$, $z = \log e$. At w = 1, x = 0 the node values are a = 1, b = 0, c = 0, e = 1, z = 0.]
Chain rule (derivative of a composite function): $\big(f(g(x))\big)' = f'(g) \cdot g'(x)$
$\frac{\partial z}{\partial w} = \frac{\partial z}{\partial e} \cdot \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a} \cdot \frac{\partial a}{\partial w} = \,?$
Computational graph
Seppo LINNAINMAA
$z = \log(w^2 \cdot \sin x + 1)$. What is $\frac{\partial z}{\partial w}$ at w = 1, x = 0?
[The same computational graph, now with each local derivative evaluated at w = 1, x = 0:]
$\frac{\partial z}{\partial e} = \frac{1}{e} = 1$,  $\frac{\partial e}{\partial c} = 1$,  $\frac{\partial c}{\partial a} = b = 0$,  $\frac{\partial a}{\partial w} = 2w = 2$
$\frac{\partial z}{\partial w} = \frac{\partial z}{\partial e} \cdot \frac{\partial e}{\partial c} \cdot \frac{\partial c}{\partial a} \cdot \frac{\partial a}{\partial w} = 1 \cdot 1 \cdot 0 \cdot 2 = 0$
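The same example can be checked with PyTorch's automatic differentiation, which builds exactly this kind of computational graph:

import torch

# z = log(w^2 * sin(x) + 1) at w = 1, x = 0.
w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(0.0, requires_grad=True)

z = torch.log(w ** 2 * torch.sin(x) + 1)
z.backward()                               # backpropagate through the computational graph

print(z.item())        # 0.0, the value of z
print(w.grad.item())   # 0.0, dz/dw, matching the hand calculation
print(x.grad.item())   # 1.0, dz/dx = w^2 * cos(x) / (w^2 * sin(x) + 1)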
Take-home messages
A computational graph is an efficient way to calculate the value of a function at given arguments.
A computational graph can also be used to calculate derivatives using the chain rule.
Derivative calculation starts from the end and propagates toward the input (in the direction opposite to the function calculation).
Computational graphs and derivative calculation are the essence of deep learning frameworks (TensorFlow, PyTorch, etc.).
Backpropagation
[Diagram: a chain of operations x →(·w1)→ o1 →Sigmoid→ o2 →(·w2)→ o3 →Sigmoid→ o4 →(·w3)→ o5 →Linear→ o6 →MSE→ E.]
Forward signal propagation: from the input x to the error E.
Backward derivative (error) propagation: from E back toward the input, multiplying the local derivatives along the chain:
$\frac{\partial E}{\partial o_6},\ \frac{\partial o_6}{\partial o_5},\ \frac{\partial o_5}{\partial w_3},\ \frac{\partial o_5}{\partial o_4},\ \frac{\partial o_4}{\partial o_3},\ \frac{\partial o_3}{\partial w_2},\ \frac{\partial o_3}{\partial o_2},\ \frac{\partial o_2}{\partial o_1},\ \frac{\partial o_1}{\partial w_1}$
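A minimal training-step sketch in PyTorch (made-up data and layer sizes), tying together forward propagation, the loss, backpropagation and the weight update:

import torch
from torch import nn

net = nn.Sequential(nn.Linear(9, 3), nn.Sigmoid(), nn.Linear(3, 1))  # regression head
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

X = torch.randn(32, 9)          # 32 objects, 9 descriptors (placeholder data)
y = torch.randn(32, 1)          # 32 experimental values (placeholder data)

for epoch in range(100):
    optimizer.zero_grad()       # clear old gradients
    y_pred = net(X)             # forward signal propagation
    loss = loss_fn(y_pred, y)   # prediction error
    loss.backward()             # backward derivative (error) propagation
    optimizer.step()            # weight update: w <- w - lr * dE/dw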
Vanishing gradient problem
[The same chain of operations as on the previous slide.]
$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial o_6} \cdot \frac{\partial o_6}{\partial o_5} \cdot \frac{\partial o_5}{\partial o_4} \cdot \frac{\partial o_4}{\partial o_3} \cdot \frac{\partial o_3}{\partial o_2} \cdot \frac{\partial o_2}{\partial o_1} \cdot \frac{\partial o_1}{\partial w_1}$, with $\frac{\partial o_1}{\partial w_1} = \frac{\partial (x w_1)}{\partial w_1} = x$
Several factors in this product (the derivatives through the sigmoid layers) are ≈ 0, so the whole product is ≈ 0: the weights are not updated!
Activation functions: ReLU
$ReLU(x) = \begin{cases} x, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$
[Plot: the ReLU function and its derivative.]
https://mlfromscratch.com/activation-functions-explained/#/
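A small sketch of why ReLU helps: its derivative is exactly 1 for positive inputs, so it does not shrink the backpropagated signal the way a saturated sigmoid does.

import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
y = torch.relu(x)
y.sum().backward()             # gradients of ReLU with respect to each input element
print(y)                       # tensor([0.0, 0.0, 0.5, 2.0])
print(x.grad)                  # tensor([0.0, 0.0, 1.0, 1.0])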
Deep learning
Deep learning is the application of artificial neural networks with multiple hidden layers to learning tasks.
• New activation functions (ReLU, etc)
• New regularization techniques (dropout, etc)
• New learning techniques (SGD, Adam)
• New output functions (SoftMax, cross-entropy)
• Representation learning using autoencoders
• Generative models using variational autoencoders (VAE)
• Convolutional neural networks (CNN)
• Recurrent neural networks (RNN)
• Generative adversarial networks (GAN)
• Deep reinforcement learning (RL)
WHY DEEP LEARNING?
Precision is often better
https://www.frontiersin.org/articles/10.3389/fphar.2019.01303/full
Neural networks are very flexible
[Diagram: the same architecture shown twice, differing only in the output layer.]
For regression: linear activation on the output and MSE loss.
For classification: sigmoid activation on the output and BCE loss.
Neural nets are building blocks

Task | Y | Output layer activation | Loss function
Regression | (-∞; +∞) | Linear | MSE, L1Loss (MAE), Huber loss
Binary classification | {0; 1} | Sigmoid | BCE
Binary classification (SVM style, with margin) | {-1; +1} | Hyperbolic tangent | Hinge loss
Multiclass classification | {class1, class2, …, classK} | Softmax | NLLLoss, categorical cross-entropy
Embedding learning (minimize difference between vectors) | vector | any | Cosine
Binned regression | one-hot vector of binned values | Softmax | Earth mover distance
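A sketch of how a few rows of this table translate to PyTorch (layer sizes are illustrative). Note that PyTorch's CrossEntropyLoss expects raw scores and applies the softmax internally:

from torch import nn

body = nn.Sequential(nn.Linear(9, 16), nn.ReLU())      # shared "building block"

# Regression head: linear output + MSE loss.
regression = nn.Sequential(body, nn.Linear(16, 1))
mse = nn.MSELoss()

# Binary classification head: sigmoid output + BCE loss.
binary = nn.Sequential(body, nn.Linear(16, 1), nn.Sigmoid())
bce = nn.BCELoss()

# Multiclass head: CrossEntropyLoss applies the softmax itself,
# so no explicit Softmax layer is added here.
multiclass = nn.Sequential(body, nn.Linear(16, 5))
ce = nn.CrossEntropyLoss()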
Multi-headed NN for multitask prediction
Tox21 challenge: 12,707 chemical compounds, 12 different properties (assays).
Comparison of multi-task (MT) and single-task (ST) models: the multitask network outperformed the others in 10 out of 12 assays.
https://www.frontiersin.org/articles/10.3389/fenvs.2015.00080/full
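A sketch of a multi-headed network of this kind: one shared trunk and one output head per assay (all sizes are illustrative, not those of the paper):

import torch
from torch import nn

class MultiTaskNet(nn.Module):
    def __init__(self, n_descriptors=1024, n_tasks=12):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_descriptors, 256), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(256, 1), nn.Sigmoid()) for _ in range(n_tasks)]
        )

    def forward(self, x):
        h = self.shared(x)                        # representation shared by all tasks
        return [head(h) for head in self.heads]   # one probability per assay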
Recurrent Neural Networks (RNN)
What can RNNs do?
• Language modeling and generating text
• Machine translation
• Speech recognition and generation
• Generating image descriptions
• Time series processing
• Movie and video clip processing
• Music classification and generation
• Choreography
• ….
• Bioinformatics (DNA, RNA, proteins, peptides)
• Chemoinformatics (generation of SMILES strings for “useful” structures)
Recurrent Neural Networks (RNN)
[Diagram: an RNN reads the sentence "I love you" word by word, passing the hidden state h1 → h2 → h3 from step to step (starting from an initial state of 0), and produces the translation "Ich liebe dich".]
RNNs process not only the current input x_t but also the previous inputs x_{t-1}, x_{t-2}, …, so the network uses previous experience when making decisions.
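A minimal RNN sketch in PyTorch with placeholder "word" vectors; the hidden state is what carries information from earlier steps to later ones:

import torch
from torch import nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
sequence = torch.randn(1, 3, 8)            # (batch, sequence length, input size), placeholder data
outputs, h_last = rnn(sequence)            # outputs: hidden state at every step; h_last: final state
print(outputs.shape, h_last.shape)         # torch.Size([1, 3, 16]) torch.Size([1, 1, 16])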
RNN cell
One-Hot Encoding of SMILES String
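A sketch of the encoding (using phenol, c1ccccc1O, as an example; in practice the vocabulary is fixed over the whole dataset): each character becomes a vector that is all zeros except for a 1 at the position of that character in the vocabulary.

import torch

smiles = "c1ccccc1O"
vocabulary = sorted(set(smiles))
char_to_index = {ch: i for i, ch in enumerate(vocabulary)}

one_hot = torch.zeros(len(smiles), len(vocabulary))
for position, ch in enumerate(smiles):
    one_hot[position, char_to_index[ch]] = 1.0
print(one_hot)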
SMILES for property prediction
[Diagram: input SMILES → multi-layer NN → y]
Too much data is needed! No advantage over simpler approaches.
Autoencoder (AE)
[Diagram: the input is compressed into a latent vector (e.g. 3.2, 6.4, 5.3) and then decoded back into a reconstruction of the input.]
• Is an artificial neural network
• Trained to accurately reconstruct its input object
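A minimal autoencoder sketch (sizes are illustrative): an encoder compresses the input into a three-dimensional latent vector and a decoder reconstructs the input from it.

from torch import nn

encoder = nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 3))   # input -> latent vector
decoder = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 100))   # latent vector -> reconstruction
autoencoder = nn.Sequential(encoder, decoder)

# Training would minimize the reconstruction error, e.g. nn.MSELoss()(autoencoder(x), x).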
Sampling SMILES from probability matrix
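A sketch of such sampling with a random placeholder probability matrix and a made-up character vocabulary: at each step one character is drawn according to the predicted probabilities.

import torch

vocabulary = ["C", "c", "1", "O", "N", "(", ")", "=", "<end>"]
probability_matrix = torch.softmax(torch.randn(20, len(vocabulary)), dim=1)  # placeholder predictions

sampled = []
for step_probabilities in probability_matrix:
    index = torch.multinomial(step_probabilities, num_samples=1).item()
    if vocabulary[index] == "<end>":
        break
    sampled.append(vocabulary[index])
print("".join(sampled))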
Seq2seq autoencoder
[Diagram: input SMILES → encoder → latent vector → decoder → returned SMILES]
RNN + maps
[Figure: a map built on the latent space.]
Sattarov B., et al. JCIM, 2019. https://doi.org/10.1021/acs.jcim.8b00751
Discovery of a novel drug within 2 months using AI
Zhavoronkov A., et al. Nature Biotechnology, 2019. https://www.nature.com/articles/s41587-019-0224-x
CONVOLUTION NETWORKS
Convolution
[Diagram: a filter slides over the image.]
Filter (weights can be learnt):
1 0 1
0 1 0
1 0 1
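A sketch of applying the 3×3 filter above to a small image by convolution; in a CNN the filter values would be learnable parameters rather than fixed numbers.

import torch
import torch.nn.functional as F

image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)   # batch, channel, 5x5 pixels
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]]).reshape(1, 1, 3, 3)

feature_map = F.conv2d(image, kernel)      # slides the filter over the image
print(feature_map.shape)                   # torch.Size([1, 1, 3, 3])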
Pooling
Sum pooling: a 2×2 window slides over the matrix
2 1 1
1 1 1
1 3 1
and outputs the sum of each window (e.g. 2+1+1+1 = 5 for the top-left window).
Max pooling: for the matrix
2 1 1
0 1 1
1 3 1
the window outputs its maximum (e.g. 2 for the top-left window).
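A sketch of both pooling operations with a 2×2 window (sum pooling is expressed here as average pooling times the window size):

import torch
import torch.nn.functional as F

a = torch.tensor([[2., 1., 1.],
                  [1., 1., 1.],
                  [1., 3., 1.]]).reshape(1, 1, 3, 3)

sum_pooled = F.avg_pool2d(a, kernel_size=2, stride=1) * 4   # top-left window: 2+1+1+1 = 5
max_pooled = F.max_pool2d(a, kernel_size=2, stride=1)       # top-left window: max = 2
print(sum_pooled)
print(max_pooled)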
Image convolution networks
Graph convolution networks
Atomic feature vectors are assigned to each atom:
- element number
- number of implicit hydrogens
- hybridization
- number of valence electrons ...
Graph convolution networks
[Diagram: a filter with learnable weights is applied to an atom and its neighbouring atoms in the molecular graph.]
Graph convolution networks
[Diagram, continued over several slides: the feature vectors of an atom and its neighbours are combined (+) into an updated representation of each atom.]
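A minimal sketch of one graph-convolution step under a common formulation (sum the feature vectors of an atom and its neighbours, then apply a learnable linear filter); the adjacency pattern and sizes are made up:

import torch
from torch import nn

n_atoms, n_features = 5, 8
atom_features = torch.randn(n_atoms, n_features)        # one feature vector per atom (placeholder)
adjacency = torch.eye(n_atoms)                          # molecular graph: self-loops ...
adjacency[0, 1] = adjacency[1, 0] = 1                   # ... plus a couple of example bonds
adjacency[1, 2] = adjacency[2, 1] = 1

filter_weights = nn.Linear(n_features, n_features)      # the "filter with learnable weights"
updated = torch.relu(filter_weights(adjacency @ atom_features))
print(updated.shape)                                    # (5, 8): an updated vector per atom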
GCN performance
[Figures: performance on logP prediction and on Tox21 challenge data.]
Summary
Deep learning imitates the thinking process using multilayered artificial neural networks.
Deep learning is not a magic bullet: on small test sets its performance approaches that of other methods. It is definitely not a first-choice method, as it is complex to build and tune.
The power of deep learning is its flexibility: with a small change of architecture, even an inexperienced machine learner can adapt it to many different tasks.
Deep learning can be used to build models directly on chemical structures.
The most impressive ability of deep learning is to directly predict chemical structures.
To be continued…
Homework:
Tutorial on deep learning (using PyTorch):
https://bit.ly/2RWAI84
THANK YOU ALL!
You did it!