Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling School of Pharmacy University of North Carolina at Chapel Hill June 27, 2022


Page 1: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity

Shuxing Zhang, Alexander Golbraikh and Alex Tropsha

The Laboratory for Molecular Modeling
School of Pharmacy
University of North Carolina at Chapel Hill

April 19, 2023

Page 2: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Problem

Given a protein-ligand complex, predict ligand binding affinity.

Page 3: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Knowledge-based (Statistical) Potentials

• Two-Body potentials
  PMF: Muegge, I.; Martin, Y. C. J. Med. Chem. 1999, 42, 791-804
  BLEEP: Mitchell, J. B.; Laskowski, R. A.; Alex, A.; Thornton, J. M. J. Comp. Chem. 1999, 20, 1165-1176
  DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G. J. Mol. Biol. 2000, 295, 337-356
  SMoG: DeWitte, R. S.; Shakhnovich, E. I. J. Am. Chem. Soc. 1996, 118, 11733-11744
  SMoG2001: Ishchenko, A. V.; Shakhnovich, E. I. J. Med. Chem. 2002, 45, 2770-2780

• Four-Body contact potential (by Jun Feng)

Page 4: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Full Atom-based Delaunay tessellation of Protein-ligand Interface (5HVP)

An example of active site tessellation: the ribbon diagram represents the two chains of HIV-1 protease. The ligand, acetyl-pepstatin, is shown in spacefill mode, and the yellow tetrahedra are those formed by protein and ligand atoms.
Page 5: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Three Types of Tetrahedra at Protein-ligand Interface

RRRL: Formed by 3 receptor atoms and 1 ligand atom
RRLL: Formed by 2 receptor atoms and 2 ligand atoms
RLLL: Formed by 1 receptor atom and 3 ligand atoms
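The three interfacial classes on this slide can be told apart by counting how many of a tetrahedron's four vertices belong to the ligand. The sketch below assumes each vertex has already been labeled 'R' (receptor) or 'L' (ligand); the function name `tetrahedron_class` is hypothetical and not taken from the slides.

```python
from collections import Counter

def tetrahedron_class(sides):
    """Return 'RRRL', 'RRLL' or 'RLLL', or None if the tetrahedron is not interfacial.

    `sides` holds four labels, each 'R' (receptor atom) or 'L' (ligand atom).
    """
    if len(sides) != 4 or any(s not in ("R", "L") for s in sides):
        raise ValueError("expected four labels, each 'R' or 'L'")
    n_ligand = Counter(sides)["L"]
    # Tetrahedra built only from receptor atoms or only from ligand atoms
    # do not span the interface, so they fall outside the three classes.
    if n_ligand in (0, 4):
        return None
    return "R" * (4 - n_ligand) + "L" * n_ligand
```

The vertex order does not matter: `tetrahedron_class("LRRL")` and `tetrahedron_class("RRLL")` fall into the same class.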

Page 6: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

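Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay Tessellation

[Equations: one per tetrahedron class (RRRL, RRLL, RLLL), each defining a score E for that class as a logarithmic (ln) function of frequencies f; the exact form did not survive extraction.]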

Page 7: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

[Figure: scatter plot of calculated versus experimental ΔΔG, both axes from -100 to 0; R2 = 0.4678]

E = E_RRRL + E_RRLL + E_RLLL

Correlation between experimental and calculated binding free energy for PMF dataset using four-body scoring function

Page 8: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Comparison of Current Scoring Functions

Method      Training Set size   Test Set size   Test Set R2
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54

Page 9: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Multiple CG descriptors of protein-ligand interface and correlation with ligand affinity

• Define the ligand-receptor interface by means of Delaunay tessellation (DT)

• Calculate chemical descriptors for nearest neighbor atom quadruplets.

• Use statistical data modeling approach to correlate descriptors and affinity

Page 10: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

µ: Electronegativity (chemical potentials) of atoms

Q: Partial charges on atoms

Η: Hardness kernel

Descriptors derived from atomic electronegativity

According to a study from Dr. Berkowitz's lab, EN is closely related to the energy of molecules (see formulas). Qualitatively, we also know that it is related to hydrogen bonding, polarity and polarization. We hope to be able to describe structure and binding with this parameter by applying it to Delaunay tessellation. There are several ways to apply EN to our geometrical method.
Page 11: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Atom Type Definition Based on EN Values

Ligand Atom Types
O   EN = 3.4
N   EN = 3.0
C   EN = 2.5
S   EN = 2.4
X   P and halogens, EN = 2.0 ~ 2.4, 4.0
M   Metal and all other unexpected atom types, EN = 0.6 ~ 1.6

Receptor Atom Types
O   EN = 3.4
N   EN = 3.0
C   EN = 2.5
S   EN = 2.4

There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 are found to occur with high frequency (at least 50 times).
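The figure of 554 composition types can be checked by direct enumeration. The sketch below assumes that a composition type is an unordered set of four typed atoms, that receptor and ligand atoms are counted as distinct even when they share an element, and that only interfacial tetrahedra (RRRL, RRLL, RLLL) are counted. These counting rules are an assumption, not something the slide states, but they reproduce its number with 6 ligand types and 4 receptor types.

```python
from itertools import combinations_with_replacement

LIGAND_TYPES = ["O", "N", "C", "S", "X", "M"]   # six ligand atom types
RECEPTOR_TYPES = ["O", "N", "C", "S"]           # four receptor atom types

def count_compositions(n_receptor):
    """Count unordered compositions with n_receptor receptor atoms and 4 - n_receptor ligand atoms."""
    n_ligand = 4 - n_receptor
    receptor_sets = list(combinations_with_replacement(RECEPTOR_TYPES, n_receptor))
    ligand_sets = list(combinations_with_replacement(LIGAND_TYPES, n_ligand))
    # Every receptor multiset can pair with every ligand multiset.
    return len(receptor_sets) * len(ligand_sets)

counts = {
    "RRRL": count_compositions(3),
    "RRLL": count_compositions(2),
    "RLLL": count_compositions(1),
}
total = sum(counts.values())
```

Under these assumptions the three classes contribute 120, 210 and 224 types, which sum to 554.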

In order to generate descriptors, the atom types must be defined. Here we use EN as the criterion. The reasons will be discussed on the next slides. Basically, we want our descriptors to make more physico-chemical sense, in the hope of explaining the complicated binding process mechanistically. Another reason is to keep the number of descriptors from growing too large.
Page 12: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Descriptor Calculation

Example tetrahedron: vertices S_L (EN = 2.4), C_R (EN = 2.5), O_L (EN = 3.4), N_R (EN = 3.0)

EN_m = \sum_{i=1}^{n} \sum_{j=1}^{4} EN_{ij}

m: m-th tetrahedral composition type
j: Vertex of a tetrahedron
n: Number of tetrahedra of the m-th composition type

Thus, there are 100 descriptors for each protein-ligand complex
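As a rough illustration of the descriptor calculation on this slide, the sketch below sums the EN values of the four vertices over every tetrahedron of a given composition type. It assumes each vertex is given as an (element, side) pair and covers only the O, N, C and S types, because the X and M types span a range of EN values on the atom type slide. The names `composition_key` and `en_descriptors` are hypothetical.

```python
from collections import defaultdict

# EN values of the four atom types shared by ligand and receptor (atom type slide).
EN = {"O": 3.4, "N": 3.0, "C": 2.5, "S": 2.4}

def composition_key(tetrahedron):
    """Order-independent label, e.g. 'C_R,N_R,O_L,S_L', for four (element, side) vertices."""
    return ",".join(sorted(f"{element}_{side}" for element, side in tetrahedron))

def en_descriptors(tetrahedra):
    """Sum the vertex EN values over all tetrahedra of each composition type."""
    descriptors = defaultdict(float)
    for tetrahedron in tetrahedra:
        vertex_sum = sum(EN[element] for element, _ in tetrahedron)
        descriptors[composition_key(tetrahedron)] += vertex_sum
    return dict(descriptors)

# The slide's example tetrahedron (S_L, C_R, O_L, N_R), occurring twice in one complex.
example = [("S", "L"), ("C", "R"), ("O", "L"), ("N", "R")]
result = en_descriptors([example, example])
```

Each key of the returned dictionary corresponds to one composition type m, and its value plays the role of one descriptor for the complex.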

Page 13: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Flowchart of Novel Descriptor Generation

1. 264 complexes
2. Process files and assign atom type based on EN value
3. Define interaction interface with DT and record all interfacial tetrahedra
4. Classify interfacial tetrahedra into different composition types and calculate their EN values (Descriptors)
5. Correlate with Binding

Page 14: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Data Modeling

Structure    Binding    CG Descriptors
Comp.1       Value1     D1  D2  D3  D4
Comp.2       Value2     "   "   "   "
Comp.3       Value3     "   "   "   "
...
Comp.N-264   Value264   "   "   "   "

Goal: Establish correlations between descriptors and the binding affinity capable of predicting binding of novel complexes

{Binding affinity} = K{descriptor diversity}^

Page 15: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

[Figure: bar chart of the number of complexes (axis from 0 to 30) in each complex family]

Diversity of the dataset: 264 Complexes, 33 families

The dataset is highly diverse in both structures and protein families, which makes binding affinity hard to predict for most current scoring functions.
Page 16: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Data Modeling Workflow

1. 264 Complexes
2. Randomly Exclude 24 Complexes as External Set
3. Split 240 into Multiple Training Sets and Multiple Test Sets
4. Variable Selection kNN to build models
5. Y-Randomization
6. Only accept models that have a q2 > 0.6, R2 > 0.6, etc.
7. Validate Predictive Models with Randomly Selected External Sets (24)
8. Binding Prediction
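The data partitioning in this workflow can be sketched as follows. The slide does not say how the 240 remaining complexes are divided, so a seeded random shuffle stands in here as an assumption. The sizes of 191 training and 49 test complexes are taken from the CG row of the comparison table, and the function name `split_dataset` is hypothetical.

```python
import random

def split_dataset(complex_ids, n_external=24, n_test=49, seed=0):
    """Set aside an external set, then split the rest into training and test sets."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(complex_ids)
    rng.shuffle(shuffled)
    external = shuffled[:n_external]   # never used while building models
    modeling = shuffled[n_external:]
    test = modeling[:n_test]
    training = modeling[n_test:]
    return training, test, external

# 264 complexes -> 24 external + 240 for modeling (191 training, 49 test).
training, test, external = split_dataset(range(264))
```

Calling the function with different seeds would give the multiple training and test sets that the workflow mentions.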

Page 17: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

k Nearest Neighbor (kNN) with Variable Selection

1. Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore)
2. Leave out one complex from the training set
3. Calculate the distance between the eliminated complex and all remaining complexes (in the original 100 descriptor space)
4. Find k nearest neighbors in the training set
5. Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors
6. Repeat steps 2 to 5 N times, then calculate the predictive ability (q2) of the model
7. Repeat from step 1 N times (SA, simulated annealing)
8. Select acceptable models (with q2 > 0.6)
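The leave-one-out part of this procedure can be sketched as below for one fixed descriptor subset. Several choices are assumptions, since the slide does not give them: Euclidean distance, inverse-distance neighbor weights, and q2 computed as 1 - PRESS / SS_total. The search over descriptor subsets is left out, and the function name `loo_knn_q2` is hypothetical.

```python
import math

def loo_knn_q2(X, y, columns, k=3):
    """Leave-one-out q2 of a distance-weighted kNN model on the chosen descriptor columns."""
    predictions = []
    for i in range(len(X)):
        # Distances from the left-out complex i to every other complex.
        neighbors = []
        for j in range(len(X)):
            if j != i:
                d = math.sqrt(sum((X[i][c] - X[j][c]) ** 2 for c in columns))
                neighbors.append((d, y[j]))
        neighbors.sort(key=lambda pair: pair[0])
        nearest = neighbors[:k]
        # Inverse-distance weights (assumed); the small constant avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
        predictions.append(weighted_sum / sum(weights))
    y_mean = sum(y) / len(y)
    press = sum((actual - pred) ** 2 for actual, pred in zip(y, predictions))
    ss_total = sum((actual - y_mean) ** 2 for actual in y)
    return 1.0 - press / ss_total, predictions

# Toy data: column 0 carries the signal, column 1 is uninformative.
X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [2.0 * i for i in range(10)]
q2, predictions = loo_knn_q2(X, y, columns=[0], k=2)
```

A variable selection loop would call this function with many different `columns` subsets and keep those whose q2 exceeds the acceptance threshold.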

Page 18: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

[Figure: scatter plot of predicted versus actual pKi, both axes from 0 to 12]

Correlation of Actual ~ Predicted Binding Affinity for 49 Test Set Complexes

Predictions were made with multiple models; this plot shows the best model. R2 is about 0.783 and RMSD is about 0.91 (I will let you know the equivalent binding energy).
Page 19: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

[Figure: scatter plot of predicted versus actual pKi, both axes from 0 to 12]

Correlation of Actual ~ Predicted Binding Affinity for 24 Complexes with Best Model

United consensus prediction: combine the training and test sets and do a consensus prediction of the 24 external complexes. R2 is about 0.70 and RMSD is about 0.89.
Page 20: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Comparison of Current Scoring Functions

Method      Training Set size   Test Set size   Test Set R2
BLEEP       351                 90              0.53
PMF         697                 77              0.61
SMoG96      120                 46              0.42
SMoG2001    725                 111             0.436
DT2001      319                 67              0.71
DT2002      319                 107             0.54
CG          191                 49              0.78

Page 21: Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling

Conclusions

• Novel geometrical chemical descriptors have been developed

• These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches, and they have high predictive power for diverse ligand-protein structures

• The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies