Base stacking classification via automated clustering method

Eli Hershkovits¹, Xavier Le Faucheur¹, Neocles Leontis², Allen Tannenbaum¹

¹ Georgia Institute of Technology, ² BGSU


Data Classification

• Coordinate system and parameterization

• Clustering of the data (“by eye” or automated clustering)
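As a rough illustration of what "automated clustering" into two groups can look like, here is a minimal sketch. The slides do not name the algorithm used, so plain two-cluster k-means is swapped in as a stand-in. The function name `kmeans2` and the toy feature values are made up for the example and are not data from the talk.

```python
import math

def kmeans2(points, iterations=50):
    """Two-cluster k-means (a stand-in; the slides do not name their method).

    points: list of equal-length tuples of floats.
    Returns (labels, centroids).
    """
    # Deterministic start: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centroids = [c0, c1]
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Made-up toy features (a distance and an angle), NOT values from the talk.
toy = [(3.4, 10.0), (3.5, 12.0), (3.6, 8.0), (4.8, 40.0), (5.0, 45.0), (4.9, 42.0)]
labels, centroids = kmeans2(toy)
print(labels)
```

With real data, the features would normally be rescaled first so that an angle in degrees does not dominate a distance.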

Base stacking: Ring coordinate system

• The three orthogonal directions are calculated with the Cremer and Pople method.

• The coordinates y1 and y2 can be used to define the face of the ring (up or down).

[Figure: two rings with local axes (X1, Y1, Z1) and (X2, Y2, Z2), connected by the vector r12]
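The ring frame described above can be sketched as follows. The normal is the Cremer and Pople mean-plane normal, built from the sine- and cosine-weighted sums of the atom positions about the ring center. The choice of in-plane axes (x along the projection of the first ring atom, y completing a right-handed frame) is an assumption of this sketch, as are the helper and function names.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)

def ring_frame(atoms):
    """atoms: ring atom positions (x, y, z), listed in ring order.

    Returns (center, (x_axis, y_axis, z_axis)).
    """
    n = len(atoms)
    center = tuple(sum(a[i] for a in atoms) / n for i in range(3))
    rel = [sub(a, center) for a in atoms]
    # Cremer-Pople weighted sums R' and R''; their cross product is the plane normal.
    r1 = tuple(sum(p[i] * math.sin(2 * math.pi * j / n) for j, p in enumerate(rel))
               for i in range(3))
    r2 = tuple(sum(p[i] * math.cos(2 * math.pi * j / n) for j, p in enumerate(rel))
               for i in range(3))
    z_axis = unit(cross(r1, r2))
    # Assumption of this sketch: x points along the in-plane projection of atom 1.
    proj = sub(rel[0], tuple(dot(rel[0], z_axis) * c for c in z_axis))
    x_axis = unit(proj)
    y_axis = cross(z_axis, x_axis)
    return center, (x_axis, y_axis, z_axis)

# Planar regular hexagon in the xy-plane as a toy ring.
hexagon = [(math.cos(math.pi * j / 3), math.sin(math.pi * j / 3), 0.0) for j in range(6)]
center, (x_axis, y_axis, z_axis) = ring_frame(hexagon)
flipped = ring_frame(hexagon[::-1])[1][2]
print(z_axis, flipped)
```

Listing the atoms in the opposite order flips the normal, so a fixed atom numbering is needed before the two faces of a ring can be labeled consistently.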

Base stacking: Relative coordinate system

• The relative coordinates of the two rings are defined by spherical coordinates: the distance r and two angles.
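A minimal sketch of the relative coordinates: the center of the second ring is expressed in the local frame of the first ring and converted to spherical coordinates. The angle names `theta` (polar, measured from the ring normal) and `phi` (azimuthal, in the ring plane) and the function name are assumptions of this example, since the symbols did not survive in the slides.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relative_spherical(center1, frame1, center2):
    """Spherical coordinates of center2 in the frame of ring 1.

    frame1: (x_axis, y_axis, z_axis), three orthonormal vectors.
    Returns (r, theta_degrees, phi_degrees); requires center1 != center2.
    """
    d = tuple(b - a for a, b in zip(center1, center2))
    dx, dy, dz = (dot(d, axis) for axis in frame1)
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dz / r))
    theta = math.degrees(math.acos(cos_theta))
    phi = math.degrees(math.atan2(dy, dx)) % 360.0
    return r, theta, phi

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
above = relative_spherical((0.0, 0.0, 0.0), identity, (0.0, 0.0, 3.4))
beside = relative_spherical((0.0, 0.0, 0.0), identity, (1.0, 1.0, 0.0))
print(above, beside)
```

A ring sitting directly over the first ring has a polar angle near 0 (or near 180 degrees on the opposite face), while a ring lying in the same plane has a polar angle near 90 degrees.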

Primary Classification

• For each base stacking candidate, the two closest rings are chosen to represent the pair. This choice classifies the candidates into four groups: pyrimidine-pyrimidine, pyrimidine-imidazole, imidazole-pyrimidine and imidazole-imidazole.

• There are four possible combinations of face-face interactions: up-up, up-down, down-up and down-down.
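The primary classification above can be sketched as follows. Each base is given as a mapping from ring name to ring center and unit normal, the closest ring pair is selected, and a face label is derived for each ring. How the slides decide "up" versus "down" is not spelled out, so this sketch assumes a ring shows its "up" face when the partner ring lies on the side its normal points to. The function name and the toy coordinates are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def primary_class(base1, base2):
    """base1, base2: dict mapping ring name ("Pyr" or "Im") to (center, unit_normal).

    Returns (ring_pair, faces), e.g. ("Im-Pyr", "UD").
    """
    # Pick the pair of rings, one from each base, with the smallest center distance.
    name1, name2 = min(
        ((n1, n2) for n1 in base1 for n2 in base2),
        key=lambda pair: math.dist(base1[pair[0]][0], base2[pair[1]][0]),
    )
    (c1, normal1), (c2, normal2) = base1[name1], base2[name2]
    d = tuple(b - a for a, b in zip(c1, c2))  # vector from ring 1 to ring 2
    # Assumed convention: "U" if the partner ring is on the normal's side.
    face1 = "U" if dot(normal1, d) > 0 else "D"
    face2 = "U" if dot(normal2, d) < 0 else "D"
    return f"{name1}-{name2}", face1 + face2

# Toy geometry: a two-ring base under a one-ring base (made-up coordinates).
purine = {"Pyr": ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
          "Im": ((2.0, 0.0, 0.0), (0.0, 0.0, 1.0))}
pyrimidine = {"Pyr": ((2.2, 0.0, 3.4), (0.0, 0.0, 1.0))}
result = primary_class(purine, pyrimidine)
print(result)
```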

Parameters relevant for clustering

[Figures: distributions of the parameters relevant for clustering, including the distance r]

Secondary classification

• The spherical coordinates (r and the two angles) are correlated and separate the data into two clusters: “proper stacking” and “improper stacking”.

• Together, these classifications give 4 × 4 × 2 = 32 classes (4 ring pairs, 4 face combinations, proper or improper stacking).
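The class count can be checked by enumerating the combinations. The label strings below follow the abbreviations used in the tables of this deck; the variable names are only illustrative.

```python
from itertools import product

ring_pairs = ["Pyr-Pyr", "Pyr-Im", "Im-Pyr", "Im-Im"]   # primary: closest ring pair
faces = ["UU", "UD", "DU", "DD"]                        # primary: face-face combination
stacking = ["proper", "improper"]                       # secondary: cluster

# One label per (ring pair, faces, stacking) combination.
classes = [" ".join(combo) for combo in product(ring_pairs, faces, stacking)]
print(len(classes))
```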

Pyr-Pyr

Relative orientation   proper       improper
UU                     143C:G142    155C:C154
DD                     511A:A509    743G:C699
UD                     144A:G135    172U:G164
DU                     147G:U146    897A:G765

Im-Pyr

Relative orientation   proper       improper
UU                     132A:A131    231G:C230
DD                     2813A:A2811  2792A:U2791
UD                     226A:A215    273G:C271
DU                     174A:C173

Pyr-Im

Relative orientation   proper       improper
UU                     129A:A128    1360C:A1358
DD                     129A:A116    2058G:G636
UD                     176U:A174    922A:G921
DU                     893G:G892    866U:A776

Im-Im

Relative orientation   proper       improper
UU                     159G:G158    223G:G222
DD                     2564G:A2513  1190G:A1189
UD                     1626A:A1624
DU                     1664A:G1663

Examples: Pyr-Pyr, up-up

Examples: Pyr-Pyr, up-down

Examples: Im-Pyr, up-up

Examples: Im-Pyr, up-down

Examples: Im-Im, up-up

Examples: Im-Im, up-down

Possible problems

• For stacking of residues that are not neighbors the distribution of is broad.

• Possible overlap between clusters.

Stacking of RNA on protein

• Stacking interactions between nucleic acids and amino acids are not abundant (9 for the large subunit, RR0033).

• Most of the stacking interactions are with histidine (6). Of the stacking cases, 5 are with the pyrimidine ring.