Stephanie Harris Crystal Grid Workshop Southampton, 17th September 2004 Development of Molecular Geometry Knowledge Bases from the Cambridge Structural Database


Page 1:

Stephanie Harris

Crystal Grid Workshop

Southampton, 17th September 2004

Development of Molecular Geometry Knowledge Bases from the Cambridge Structural Database

Page 2:

Molecular Geometry Knowledge Bases
• Library of chemically well-defined geometric information
• Limited user input
• Rapid retrieval of statistical data

Cambridge Structural Database
• Stored geometric information for ~300,000 structures
• Search using ConQuest
• Substructure search, user input required

Page 3:

Molecular Geometry Knowledge Base: Mogul

Bond lengths, valence angles and torsion angles
• Compiled from the CSD

Published bond length tables:
• Organic and metal-containing structures
• Published late 1980s
• Compiled from CSD of ~50,000 structures
• Cannot be accessed by computer programs

Applications
• Model building
• Refinement restraints
• Structure validation
• Comparative values

Page 4:

Mogul 1.0

• Whole molecule input
• Graphical (cif, SHELX, mol2 files) or command-line interface
• Integration with client applications, e.g. Crystals
• Quick, automatic retrieval of statistical data, histogram distributions, CSD structures

Search Algorithm
• All non-metal fragments in the CSD coded
• Set of keys code chemical environments
• Fragments with identical keys are chemically identical
• Use hierarchical search tree
• Generalised searching if insufficient hits
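The key-based lookup described on this slide can be illustrated with a minimal Python sketch. It is not Mogul's actual implementation: the contents of each key tuple, the function name `build_index` and the bond-length values are all hypothetical, chosen only to show how fragments with identical keys end up in one dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fragment records: (keys, bond length in Å).
# Each key tuple stands in for the coded chemical environment of a bond;
# the values are illustrative, not CSD data.
fragments = [
    (("C.sp3", "N.amine", "acyclic"), 1.469),
    (("C.sp3", "N.amine", "acyclic"), 1.471),
    (("C.sp2", "N.amide", "acyclic"), 1.334),
    (("C.sp3", "N.amine", "acyclic"), 1.466),
]

def build_index(records):
    """Group measurements by key: identical keys mean chemically identical fragments."""
    index = defaultdict(list)
    for keys, value in records:
        index[keys].append(value)
    return index

index = build_index(fragments)
hits = index[("C.sp3", "N.amine", "acyclic")]
print(len(hits), round(mean(hits), 3))
```

Because the index is built once in advance, a query reduces to a dictionary lookup, which is consistent with the slide's emphasis on quick retrieval with limited user input.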

Page 5:

Mogul Search

[Structure diagram: query molecule containing S, N, O, pTol and CN groups, with fragment label .S1.C7, submitted to a Mogul search]

Page 6:

Metal – Ligand Bond lengths

To be considered:
• Ligand type: Carboxylate
• Metal oxidation state: Co(II)
• Metal coordination number: 6
• Ligand trans: Oxygen ligand
• Spin state?

Co-O bond length?

[Structure diagram: Co complex with N, OH2 and acetate, OC(O)Me, ligands]

Page 7:

Method

Analysis of M-L bond lengths.

For a range of metal and ligand types identify factors which influence M-L bond lengths and evaluate their importance.

For a defined Metal-Ligand group sub-divide bond length distribution to produce ‘chemically meaningful’ datasets:
• Unimodal distributions.
• ‘Reasonably small’ sample standard deviations.

From hand-crafted examples develop an algorithm to produce a molecular geometry knowledge base for metal complexes.
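As a rough illustration of the sub-division step, the Python sketch below splits a set of M-L records on one criterion and compares the sample standard deviation of each child bin with that of the parent. The record layout, the names `summarise` and `subdivide`, and the numbers are assumptions made for the example; they are not taken from the talk or from the CSD.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarise(lengths):
    """Return (mean, sample standard deviation, count) for a list of bond lengths."""
    sd = stdev(lengths) if len(lengths) > 1 else 0.0
    return mean(lengths), sd, len(lengths)

def subdivide(records, criterion):
    """Split (attributes, length) records into bins by the value of one attribute."""
    bins = defaultdict(list)
    for attrs, length in records:
        bins[attrs[criterion]].append(length)
    return dict(bins)

# Hypothetical M-L records; the numbers are illustrative, not CSD data.
records = [
    ({"ox_state": 2}, 2.05), ({"ox_state": 2}, 2.07), ({"ox_state": 2}, 2.03),
    ({"ox_state": 3}, 1.90), ({"ox_state": 3}, 1.91), ({"ox_state": 3}, 1.89),
]

parent_sd = summarise([length for _, length in records])[1]
child_sds = {k: summarise(v)[1] for k, v in subdivide(records, "ox_state").items()}

# In this sketch a division counts as worthwhile if every child bin is sharper
# than the parent distribution.
improved = all(sd < parent_sd for sd in child_sds.values())
print(round(parent_sd, 3), {k: round(sd, 3) for k, sd in child_sds.items()}, improved)
```

The "sharper than the parent" test is only one possible acceptance rule; checking for unimodality would need an additional test on the histogram shape.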

Page 8:

Data Tree

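[Tree diagram, top to bottom]
Metal-Ligand Group
• First division: Bin A1, Bin A2
• Second division: Bin B1, Bin B2, Bin B3, Bin B4
• Third division: Bin C1, Bin C2

Moving down the tree: sharpened distributions, smaller sample standard deviations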

Page 9:

Criteria Influencing M-L Bond Lengths

1. Ligand, L
2. Coordination mode of ligand
3. Effective Metal Coordination Number
4. Metal Oxidation State
5. Metal clusters and cages
6. Spin state
7. Jahn-Teller effect
8. Metal coordination geometry
9. Ligand trans to L

[Figure: two structure diagrams, each labelled M = 6]

Page 10:

Ligand Template Library

Ligand
• Non-metal atom or fragment bonded to a metal.
• Two ligands are the same if they have the same connectivity (topology) and stereochemistry.

Method
• All ligands in CSD to be classified.
• Classify according to contact atom coordinated to metal.
• Ligands with multiple contact atoms can be present in more than one ligand group, e.g. SCN-
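The classification rule above, including the SCN- case, might be sketched in Python as follows. The mini ligand library and the function name `classify_by_contact_atom` are hypothetical; a real template library would match on connectivity rather than on names.

```python
from collections import defaultdict

def classify_by_contact_atom(ligands):
    """Map each contact-atom element to the set of ligand names that can bind through it."""
    groups = defaultdict(set)
    for name, contact_atoms in ligands.items():
        for atom in contact_atoms:
            groups[atom].add(name)
    return groups

# Hypothetical mini library: ligand name -> possible contact atoms.
ligands = {
    "SCN-": ["S", "N"],   # thiocyanate can coordinate through S or N
    "Cl-": ["Cl"],
    "OH2": ["O"],
    "pyridine": ["N"],
}

groups = classify_by_contact_atom(ligands)
print(sorted(groups["N"]))  # ['SCN-', 'pyridine']
print(sorted(groups["S"]))  # ['SCN-']
```

Storing a set per contact atom lets an ambidentate ligand such as SCN- appear in more than one ligand group without being duplicated within a group.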

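[Diagram: ligand bonded to metal M through contact atom A, with further atoms B attached; example of an oxygen-donor ligand]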

Page 11:

Cambridge Structural Database
• Approximately 22,000 formulae
• Approximately 780,000 ligands

No. of occurrences of unique formulae in CSD    Total number of ligands    Number of formulae
(range not given)                               550,000 (70%)              70
100 – 999                                       109,263 (14%)              394
10 – 99                                         76,000 (10%)               3000
1 – 9                                           45,700 (6%)                18,937

Ligand Template Hierarchy
• Exact ligand templates (724)
• R-substituted templates (H’s replaced with ‘innocent’ R groups)
• Generic templates (ALL ligands classified)

Page 12:

Cobalt Carboxylate Bond Lengths

[Histogram: number of fragments against Co-O bond length (Å) for the fragment Co-OC(O)C(sp3)]

Co-O: 1.929(62) Å, 619 fragments
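Values such as 1.929(62) Å use the compact parenthesised notation common in crystallographic tables. Assuming the digits in parentheses give the sample standard deviation in units of the last quoted decimal places, the short Python sketch below expands such a string into a mean and a spread. The helper name `parse_value` is hypothetical.

```python
import re

def parse_value(text):
    """Parse 'mean(sd)' such as '1.929(62)' into (mean, sd) as floats.

    Assumes the parenthesised digits are in units of the last decimal place quoted.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\((\d+)\)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    whole, frac, sd_digits = match.groups()
    value = float(f"{whole}.{frac}")
    sd = int(sd_digits) * 10 ** -len(frac)
    return value, sd

value, sd = parse_value("1.929(62)")
print(value, round(sd, 3))  # 1.929 0.062
```

Under that reading, the overall Co-O distribution has a mean of 1.929 Å and a spread of 0.062 Å.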

Page 13:

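[Tree diagram: successive division of the Co-O bond length data]

Co-OC(O)C(sp3), all fragments: 1.929(62) Å

Divided on oxidation state:
• Co(II): 2.049(58) Å
• Co(III): 1.904(20) Å

Six-coordinate, CoL5-OC(O)C:
• Co(II): 2.073(42) Å
• Co(III): 1.904(20) Å

Six-coordinate, divided on ligand trans to the carboxylate O:
• Co(II), trans O: 2.074(32) Å
• Co(III), trans O: 1.910(15) Å
• Co(III), trans N: 1.895(17) Å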

Page 14:

Chlorides, e.g. Fe-Cl
• Fe-Cl: 2.242(68) Å
• Fe(III)ClL3: 2.189(24) Å

Pyridines, e.g. Fe (spin state)
• Fe(II)L5py: 2.166(84) Å
• High spin: 2.225(29) Å

Copper complexes (Jahn-Teller effect)
• Cu(II)-OH2: 2.232(225) Å
• Standardisation of Cu connectivity

Tertiary phosphines, carbon ligands

Page 15:

Metal-Ligand Knowledge Base

1. CSD data adjustment:
• Standardisation of metal connections
• Assignment of metal as part of a metal cluster
• Assignment of metal oxidation state

2. Classification of ligands by ligand template library

3. Perform algorithm on all possible M-L fragments to produce knowledge base

Page 16:

Metal-Ligand Group

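From ligand template library: generic or more specific

e.g. Carboxylates:
[Structure diagrams: generic carboxylate, C-C(O)O, and a more specific carboxylate bearing an Et group]

Algorithm:
[Structure diagram: carboxylate with an sp3 carbon attached, C(sp3)-C(O)O]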

Page 17:

Metal-Ligand Group

Division on Oxidation State

‘Metal Clusters’

Division on Metal effective coordination number

Division on spin and Jahn-Teller effect

• Only for particular metals, oxidation states and coordination numbers.
• Not found for all ligand types.
• Not searchable in CSD.

Flag to users instead: the effects are evident from a bimodal histogram, a high sample standard deviation (SSD) or outliers.
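A simple way to raise such a flag is sketched below in Python. It covers only two of the three symptoms listed (high SSD and outliers); detecting a bimodal histogram would need a separate test. The function name `flag_bin`, the thresholds `max_sd` and `k`, and the sample values are all assumptions made for illustration.

```python
from statistics import median, stdev

def flag_bin(lengths, max_sd=0.05, k=2.0):
    """Flag a bin whose spread suggests an unresolved spin-state or Jahn-Teller effect.

    max_sd (in Å) and k are hypothetical thresholds chosen for illustration.
    """
    sd = stdev(lengths)
    centre = median(lengths)
    # Treat values more than k sample standard deviations from the median as outliers.
    outliers = [x for x in lengths if abs(x - centre) > k * sd]
    return {"high_sd": sd > max_sd, "outliers": outliers}

# Illustrative values only (not CSD data): one long bond among short ones,
# as might arise from a Jahn-Teller elongated axis.
sample = [2.05, 2.06, 2.04, 2.05, 2.07, 2.03, 2.05, 2.40]
report = flag_bin(sample)
print(report)
```

The median is used as the centre because a single long bond shifts the mean noticeably in a small bin.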

Page 18:

Metal-Ligand Group

‘Metal Clusters’

Division on Oxidation State

Division on Metal effective coordination number

Division on spin and Jahn-Teller effect

Division on Metal coordination geometry

E.g. 4-coordinate geometry: tetrahedral, square planar, disphenoidal

Page 19:

Metal-Ligand Group

‘Metal Clusters’

Division on Oxidation State

Division on Metal effective coordination number

Division on spin and Jahn-Teller effect

Division on Metal coordination geometry

Divide on trans ligand to L

Final ligand division: more specific ligand, e.g. alkyl carboxylate

Page 20:

Generalised Searching

• No hits or insufficient number of hits.

• Allows the retrieval of data on related fragments.

• Hierarchical search tree structure

• Move up to a higher, less specific level of data tree.

• Order of algorithm important. Should order of criteria be changed? Should order depend on M-L group?

E.g. Should oxidation state always be the first main division?
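The idea of moving up the data tree when a query returns too few hits can be sketched in Python as below. This is a simplified illustration, not the algorithm from the talk: the `CRITERIA` order is one fixed choice (the slides themselves ask whether the order should vary), the `min_hits` threshold is hypothetical, and the records are made-up values.

```python
# Ordered division criteria, most general first. This fixed order is an
# assumption for the sketch; the talk leaves the best order as an open question.
CRITERIA = ["ligand", "ox_state", "coord_number", "geometry", "trans_ligand"]

def build_tree(records):
    """Index bond lengths at every level of specificity.

    Keys are tuples of criterion values of increasing length, so
    ('carboxylate',) is the parent of ('carboxylate', 2), and so on.
    """
    tree = {}
    for attrs, length in records:
        path = tuple(attrs[c] for c in CRITERIA)
        for depth in range(1, len(path) + 1):
            tree.setdefault(path[:depth], []).append(length)
    return tree

def generalised_search(tree, query, min_hits=2):
    """Return the most specific level with at least min_hits hits, and those hits."""
    path = tuple(query[c] for c in CRITERIA)
    for depth in range(len(path), 0, -1):
        hits = tree.get(path[:depth], [])
        if len(hits) >= min_hits:
            return path[:depth], hits
    return (), []

def record(ox, trans, length):
    """Build one hypothetical six-coordinate, octahedral carboxylate record."""
    attrs = {"ligand": "carboxylate", "ox_state": ox, "coord_number": 6,
             "geometry": "octahedral", "trans_ligand": trans}
    return attrs, length

# Illustrative values only, not CSD data.
records = [record(2, "O", 2.07), record(2, "O", 2.08),
           record(2, "N", 2.05), record(3, "N", 1.90)]
tree = build_tree(records)

query = record(2, "N", None)[0]
level, hits = generalised_search(tree, query)
print(level, hits)
```

Here the exact query (trans ligand N) has only one hit, so the search drops the trans-ligand criterion and reports the three Co(II) six-coordinate octahedral values instead. Changing the order of `CRITERIA` changes which related fragments are returned, which is the design question the slide raises.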

Page 21:

Conclusions

• Pre-processing of structural data from the CSD to construct molecular geometry knowledge bases.

• Knowledge bases to contain chemically well-defined datasets.

• Limited user input required.

• Quick, automatic retrieval of statistical data, distributions.

• Efficient analysis of large number of chemical fragments.

• Outliers, high SSD? Further Analysis – Computational Chemistry.

• Further development to include extra chemical information e.g. computational data.

Page 22:

Acknowledgements

Bristol University:

Guy Orpen

Natalie Fey

X-Ray Crystallography Group

Cambridge Crystallographic Data Centre:

Robin Taylor

Frank Allen

Ian Bruno

Greg Shields