Representation of Markush structures — from molecules
August 2010, ACS National meeting, Boston
Representation of Markush structures — from molecules towards patents
Szabolcs Csepregi
Solutions for Cheminformatics
Contents
• ChemAxon
• What are Markush structures?
• How to get them?
• What can be done with them?
  – Enumeration
  – Storage, search
• Challenges in chemical representation
• Under development
ChemAxon
• Cheminformatics toolkits and applications
• HQ: Budapest, Hungary
• Founded: 1998
• Main customers: pharma, biotech, publishing
• Third-party applications and web sites (e.g. Integrity, Reaxys, PDB ligand search, ELNs, registration systems, etc.)
ChemAxon
Main products:
  – Structure drawing & visualization (Marvin family)
  – Chemical DB tools (JChem family)
  – Property predictions (Calculator plugins)
  – Drug discovery tools (Reactor, JKlustor, etc.)
Development strategy: customer-driven
What are Markush structures and how to get them?
Markush structures
Generic notation for describing many molecules (= Markush library) in a compact form.
Main usage:
  – Combinatorial chemistry
  – Chemistry-related patents
Markush structures
• Current features handled:
  – R-groups
  – Atom lists, bond lists
  – Position variation bond
  – Link nodes
  – Repeating units
  – Homology groups (aryl, alkyl, etc.)
ChemAxon Markush project
Goals:
  – Extend structural search capabilities to combinatorial Markush structures
  – Markush enumeration
Complications:
  – Practical examples may be very complex; methods using explicit enumeration may be impossible
  – Extension of current molecular formats (generic features)
Timeline:
  – Pilot study started in 2005 Q4
  – First prototype shown at UGM, June 2006
  – Released in JChem 5.0, 2008
  – Markush DARC format support in 5.3.0, 2010
How to get Markush structures?
• Drawing – Marvin Sketch
How to get Markush structures?
• Patent literature – Markush DARC format (*.vmn)
• Compatible with Thomson Reuters MMS patent Markush database (Test set available.)
How to get Markush structures?
Combinatorial chemistry – Reagent clipping
1. Replace reacting group with attachment point (Reactor tool)
2. Turn fragments into R-group definitions (Molconvert tool)
3. Add a scaffold (Molconvert tool)
How to get Markush structures?
Combinatorial chemistry – R-group decomposition
1. Filter and identify ligands in chemical library
2. Create Markush structure from R-table (R-group decomposition tool)
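As a rough illustration of step 2, the sketch below collapses an R-table (one row per ligand, one column per R position) into per-position substituent lists. The function name `markush_from_r_table`, the `[*]` attachment marker and the placeholder scaffold string are assumptions made for this example; it does not reproduce the behaviour of the R-group decomposition tool.

```python
def markush_from_r_table(scaffold, rows):
    """Collect the distinct substituents seen at each R position."""
    definitions = {}
    for row in rows:
        for position, fragment in row.items():
            members = definitions.setdefault(position, [])
            if fragment not in members:  # keep first-seen order, drop duplicates
                members.append(fragment)
    return {"scaffold": scaffold, "r_groups": definitions}


# Hypothetical R-table: the substituents found for three ligands.
r_table = [
    {"R1": "[*]C", "R2": "[*]Cl"},
    {"R1": "[*]CC", "R2": "[*]Cl"},
    {"R1": "[*]C", "R2": "[*]Br"},
]
# The scaffold is a placeholder template, not a valid SMILES string.
markush = markush_from_r_table("c1ccc([R1])cc1[R2]", r_table)
print(markush["r_groups"])
```

Note that the resulting Markush structure covers 2 × 2 = 4 combinations although the table held only three ligands, so a Markush structure built this way can be broader than the library it came from.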
What to do with them?
Markush Enumeration
• Markush enumeration plugin
  – Full enumeration
  – Selected parts only
  – Random enumeration
  – Calculate library size
  – Scaffold alignment and coloring
  – Markush code
  – Optional example homology group enumeration
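To make the difference between full enumeration, random enumeration and library size calculation concrete, here is a minimal sketch on a toy Markush structure. The scaffold template, the R-group lists and all function names are hypothetical; the enumeration plugin works on real chemical structures, not on string templates.

```python
import itertools
import math
import random

# Hypothetical toy Markush structure: a scaffold template with named
# R-group placeholders and a list of alternatives for each of them.
SCAFFOLD = "c1ccc({R1})cc1{R2}"
R_GROUPS = {
    "R1": ["C", "CC", "OC"],
    "R2": ["F", "Cl"],
}


def library_size(r_groups):
    """Number of library members, computed without building any of them."""
    return math.prod(len(members) for members in r_groups.values())


def enumerate_all(scaffold, r_groups):
    """Full enumeration: one structure per combination of R-group members."""
    names = list(r_groups)
    for combo in itertools.product(*(r_groups[n] for n in names)):
        yield scaffold.format(**dict(zip(names, combo)))


def enumerate_random(scaffold, r_groups, count, seed=None):
    """Random enumeration: pick one member independently at each position."""
    rng = random.Random(seed)
    return [
        scaffold.format(**{n: rng.choice(m) for n, m in r_groups.items()})
        for _ in range(count)
    ]


print(library_size(R_GROUPS))  # 6
print(list(enumerate_all(SCAFFOLD, R_GROUPS)))
print(enumerate_random(SCAFFOLD, R_GROUPS, count=3, seed=0))
```

The library size is a product of list lengths, so it stays cheap to compute even when full enumeration is out of reach. Random enumeration as sketched here may return the same member more than once.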
Markush storage & search
• JChem Base and Instant JChem
• No enumeration involved
• Can handle complex Markush structures (library size of 10^40 or more)
• Substructure and Full structure search
• Broad translation of homology groups is supported. (Homology in DB, specific in query.)
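The idea of searching without enumeration, with a homology group in the database matching a specific group in the query, can be sketched as follows. This is a deliberately simplified model that compares group names at each R position; the `HOMOLOGY` table and the function names are assumptions for illustration, and real searching works on structure graphs rather than on names.

```python
# Hypothetical homology group membership; real definitions are far richer.
HOMOLOGY = {
    "alkyl": {"methyl", "ethyl", "propyl"},
    "aryl": {"phenyl", "naphthyl"},
}


def position_matches(db_members, query_group):
    """True if any database alternative covers the specific query group."""
    for member in db_members:
        if member == query_group:
            return True
        if query_group in HOMOLOGY.get(member, ()):  # broad translation
            return True
    return False


def full_structure_match(db_markush, query):
    """Check each R position independently instead of enumerating."""
    return db_markush.keys() == query.keys() and all(
        position_matches(db_markush[pos], group) for pos, group in query.items()
    )


db_markush = {"R1": ["alkyl", "chloro"], "R2": ["aryl"]}
print(full_structure_match(db_markush, {"R1": "ethyl", "R2": "phenyl"}))  # True
print(full_structure_match(db_markush, {"R1": "ethyl", "R2": "methyl"}))  # False
```

Checking positions one by one costs time proportional to the sum of the R-group list lengths, not their product, which is why no enumeration is needed. The sketch assumes the R positions vary independently of each other.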
Markush storage & search
Substructure hit visualization
Query
Result in original Markush
Markush storage & search
Substructure hit visualization: "Markush structure reduction"
Query
Result in original Markush
Reduced result
Main use cases
• Patent search hits refining / visualization,
• White space analysis,
• Patent busting,
• Markush structure curation,
• In-house storage of small Markush DB,
• etc...
MMS evaluation Instant JChem project
Challenges in chemical representation (solved)
Representation - What we already had
Generic notation in queries:
• Atom lists, bond lists
• R-group queries (Problem: RGFile R-logic and patent R-logic are different! - Solution: Just ignore R-logic.)
• Link nodes
• Some generic atoms (X) – represented as pseudo atoms.
Single or double
Challenge 1: Attachment point
• Multiple attachment points: ligand order and attachment order
  – Heavily used in Markush DARC (up to 8 attachments!)
• Represented as atom property
Parent group (root)
R-group definitions
Order of ligands for G15 (R15)
Attachment points for definitions
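The role of the two orderings can be illustrated with a small data-structure sketch: the R-group atom in the parent records the order of its ligands, each definition records which of its atoms carries attachment point 1, 2, and so on, and the two lists are paired by position. The class and field names are hypothetical and the atom indices are invented; this is not the internal representation used by Marvin or JChem.

```python
from dataclasses import dataclass


@dataclass
class RGroupAtom:
    label: str
    # ligand_order[k] is the parent atom bonded through attachment point k + 1
    ligand_order: list


@dataclass
class RGroupDefinition:
    fragment: str
    # attachment_atoms[k] is the fragment atom carrying attachment point k + 1
    attachment_atoms: list


def connect(r_atom, definition):
    """Pair parent ligands with fragment atoms by attachment point number."""
    if len(r_atom.ligand_order) != len(definition.attachment_atoms):
        raise ValueError("attachment point count mismatch")
    return list(zip(r_atom.ligand_order, definition.attachment_atoms))


# Hypothetical two-point R-group: parent atoms 4 and 9 are its ligands.
g15 = RGroupAtom("G15", ligand_order=[4, 9])
definition = RGroupDefinition("OCC", attachment_atoms=[0, 2])
swapped = RGroupDefinition("OCC", attachment_atoms=[2, 0])

print(connect(g15, definition))  # [(4, 0), (9, 2)]
print(connect(g15, swapped))     # [(4, 2), (9, 0)]
```

Swapping the attachment order of an unsymmetrical fragment yields a different molecule, which is why both orderings have to be stored.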
Challenge 1: Attachment point
• Embedded R-groups: grandparent relations may be needed between attachment points:
G3's attachment point "1" is mapped to G4's attachment point "1"
Challenge 1: Attachment point
• Temporary representation: attached data
  – ligand order
  – attachment point in R-group definition
  – still an atom property
  – ligand order sometimes in parent group (grandparent relation)
Order of ligands for R2
Attachment points for definitions
Challenge 1: Attachment point
• Real attachment object with bond (under development)
– eliminates need for grandparent relations table:
Order of ligands for R4
Attachment point for R3
Order of ligands for R2
Attachment points for definitions
Challenge 2: Abbreviations
• Superatom S-groups were originally in Marvin (~700 built-in shortcuts)
  – Expand / Contract
  – Search code already handled them in specific structures.
• Markush DARC had 21 shortcuts + 31 peptides.
• Attachment point next to abbreviations
  – Needed to be visible "outside" and handled correctly "inside".
  – The new attachment point representation also solves this.
Challenge 3: Homology groups (generics)
• Pseudoatom representation
• Naming (Still looking for the most descriptive "long" names.)
• Extra conditions: general atom property framework (under development)

| Markush DARC name | "Long name" |
| --- | --- |
| CHK | alkyl |
| CYC | carboAlicyclyl |
| ARY | carboAryl |
| HEA | heteroMonoAryl |
Challenge 4: Frequency variation
• Link nodes
• Repeating units: modified SRU
• Multipliers:
  – special SRU, 1 outer bond
  – (Currently visualization only.)
• Moieties:
  – special SRU, 0 outer bonds
  – to describe (variable) stoichiometry
  – (Currently visualization only.)
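Frequency variation through a repeating unit can be sketched as expanding the unit once for each allowed repetition count. The function name and the example chain are hypothetical, and plain string concatenation only works here because the unit is a linear SMILES fragment whose first and last atoms carry the crossing bonds.

```python
def expand_repeating_unit(prefix, unit, suffix, counts):
    """Yield one structure per allowed repetition count of `unit`."""
    for n in counts:
        yield prefix + unit * n + suffix


# Hypothetical example: an ethylene glycol unit repeated 1 to 3 times.
chains = list(expand_repeating_unit("CO", "CCO", "C", range(1, 4)))
print(chains)  # ['COCCOC', 'COCCOCCOC', 'COCCOCCOCCOC']
```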
Challenge 5: Position variation bond
• New special S-group type
• Relocatable multicenter atom represents group for bonds
• Also useful to represent multicenter charge and coordination compounds.
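A position variation bond stands for one substituent that may sit on any atom of a given set. The sketch below expands such a bond on a benzene ring into one concrete SMILES string per candidate position. The helper names and the ring-building scheme are assumptions made for this example and are unrelated to the S-group representation described above.

```python
RING_SIZE = 6  # the sketch only handles a benzene ring


def ring_smiles(substituents):
    """Benzene SMILES with branches at the given ring positions (0 to 5)."""
    tokens = []
    for i in range(RING_SIZE):
        closure = "1" if i in (0, RING_SIZE - 1) else ""  # ring bond 0-5
        branch = f"({substituents[i]})" if i in substituents else ""
        tokens.append("c" + closure + branch)
    return "".join(tokens)


def expand_position_variation(fixed, candidates, group):
    """One concrete structure per atom the variable bond may connect to."""
    return [ring_smiles({**fixed, pos: group}) for pos in candidates]


# Methyl fixed at position 0; chlorine may sit at position 1, 2 or 3.
variants = expand_position_variation({0: "C"}, [1, 2, 3], "Cl")
print(variants)
# ['c1(C)c(Cl)cccc1', 'c1(C)cc(Cl)ccc1', 'c1(C)ccc(Cl)cc1']
```

The three results correspond to the ortho, meta and para isomers of chlorotoluene.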
What (else) keeps us busy
Under development
• Further improvements in Markush DARC support:
  – Ring segment groups (XX form a ring)
  – New, more robust representation for attachment points
  – Homology properties (low alkyl, fused aryl, C1-3, N2-5, etc.)
• Ranking of results
• New ways to navigate/zoom Markush structures
• Maximum common substructure search
• Biased enumeration and covering Markush – based on examples in patent.
• Improve search speed to handle larger Markush sets.
• Other Markush formats – Markush InChI standard committee
• Overlap analysis of Markush structures
• Conditions for Markush variables
Summary
• Markush structure storage, search and enumeration at ChemAxon, now with patent coverage
• Compatible patent data is available from Thomson Reuters
• Well thought out chemical representation
• Continuous development, improvements in the pipeline
Acknowledgements
• Development team: Nóra Máté, Róbert Wágner, Szilárd Dóránt, Tamás Csizmazia, Tim Dudgeon, Erika Bíró, Ali Baharev, Ferenc Csizmadia, et al.
• Tim Miller, Steve Hajkowski, Gez Cross and Linda Clark at Thomson Reuters for useful discussions, help and example Markush DARC files
• Many early adopters and colleagues within the field for suggestions and feedback