Oops and Downs of Resolving InChIs For the Chemistry Community

description

The InChI resolver was rolled out to the community in March 2009 with the purpose of providing a centralized resource for chemists to resolve InChIs (International Chemical Identifiers). This presentation will provide an overview of the development of the underlying technologies associated with the InChI resolver, and how the resolver is being used, integrated and enhanced to provide additional value to the chemistry community. We will discuss present limitations to application of the resolver for providing access to databases and chemistry information distributed across the internet and define our vision for enhancing interconnectivity across Open databases using the InChI resolver as the glue.

Transcript of Oops and Downs of Resolving InChIs For the Chemistry Community

Page 1: Oops and Downs of Resolving InChIs For the Chemistry Community

Oops and downs of resolving InChIs for the chemistry community

Page 2: Oops and Downs of Resolving InChIs For the Chemistry Community

The InChI Has Arrived

My opinions:

The InChI is a crucial part of the future of structure-based relationships on the web

The semantic web of chemistry will sit on the shoulders of InChI until there is something better

InChIs and publishers are already in a relationship – publishers who have not adopted InChI will follow

Page 3: Oops and Downs of Resolving InChIs For the Chemistry Community

PPP – Perfection vs Productive vs Prolific

The InChI is not perfect

There are limitations but they are acknowledged and in discussion

The InChI is very “productive”

InChIs are showing up in databases, manuscripts, spreadsheets, publications and software

Page 4: Oops and Downs of Resolving InChIs For the Chemistry Community

A Lot of Variability in InChIs

Source: Unofficial InChI FAQ page

Page 5: Oops and Downs of Resolving InChIs For the Chemistry Community

InChIStrings Hash to InChIKeys

Page 6: Oops and Downs of Resolving InChIs For the Chemistry Community

HVYWMOMLDIMFJA-DPAQBDIFSA-N
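
The key shown on this slide can be taken apart to see what each piece carries. The following is a minimal sketch, assuming the 27-character InChIKey layout (a 14-letter block, then 8 letters plus a standard/non-standard flag and a version letter, then one protonation letter). The names `split_inchikey` and `InChIKeyParts` are hypothetical and are not part of any InChI software.

```python
import re
from typing import NamedTuple

# Assumed layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    connectivity: str   # first block, hashed from the connectivity layers
    remainder: str      # hash of the remaining layers (stereo, isotopes, ...)
    is_standard: bool   # True when the flag letter is "S"
    version: str        # version letter
    protonation: str    # final letter, "N" for neutral


def split_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks, or raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, rest, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, rest, flag == "S", version, protonation)


parts = split_inchikey("HVYWMOMLDIMFJA-DPAQBDIFSA-N")
print(parts.connectivity, parts.is_standard)
```

The hash itself is one-way, so the sketch only reads the structure of the key. Getting back to a structure still needs a lookup.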

Page 7: Oops and Downs of Resolving InChIs For the Chemistry Community

The InChI Resolver

Page 8: Oops and Downs of Resolving InChIs For the Chemistry Community

inchis.chemspider.com

Page 9: Oops and Downs of Resolving InChIs For the Chemistry Community

Resolve an InChI or InChIKey

Page 10: Oops and Downs of Resolving InChIs For the Chemistry Community

Resolved

Page 11: Oops and Downs of Resolving InChIs For the Chemistry Community

Connection Only Resolving
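
One way to picture connection-only resolving is to match on the first block of the InChIKey and ignore the rest. This is a sketch under that assumption; the slides do not say how the resolver implements it. The keys in the usage lines are made-up placeholders, not real compounds, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def skeleton_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey."""
    return inchikey.strip().split("-", 1)[0]


def group_by_skeleton(inchikeys: Iterable[str]) -> Dict[str, List[str]]:
    """Group keys that share the same first block (same connectivity hash)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in inchikeys:
        groups[skeleton_block(key)].append(key)
    return dict(groups)


# Placeholder keys: the first two share a first block, the third does not.
keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "ZZZZZZZZZZZZZZ-BBBBBBBBSA-N",
]
groups = group_by_skeleton(keys)
print(len(groups["AAAAAAAAAAAAAA"]))
```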

Page 12: Oops and Downs of Resolving InChIs For the Chemistry Community

InChIs and Big Databases

There appears to be a “bigger is better” mentality with online databases

InChI has shown a lot of “overlap” in the ChemSpider database

Distinction: a unique chemical entity versus what it is meant to be

Some simple examples …

Page 13: Oops and Downs of Resolving InChIs For the Chemistry Community

Spot The Difference

Page 14: Oops and Downs of Resolving InChIs For the Chemistry Community

Standard InChIKeys

Page 15: Oops and Downs of Resolving InChIs For the Chemistry Community

Spot The Difference

Page 16: Oops and Downs of Resolving InChIs For the Chemistry Community

55 Hits in 0.08 Seconds

Page 17: Oops and Downs of Resolving InChIs For the Chemistry Community

Large Databases Contain Junk

InChI Resolvers will get us back to results, but it’s just a lookup.

There is an enormous need for curation and linking resolved structures to “correct” structures – a manual task

Page 18: Oops and Downs of Resolving InChIs For the Chemistry Community

Generate-It

Page 19: Oops and Downs of Resolving InChIs For the Chemistry Community

Draw and generate

Page 20: Oops and Downs of Resolving InChIs For the Chemistry Community

Generate

Page 21: Oops and Downs of Resolving InChIs For the Chemistry Community

All Flavors

Page 22: Oops and Downs of Resolving InChIs For the Chemistry Community

Historical and Future InChIs

The Standard InChI removed variability

There will be new variants in the future

There are already millions of historical InChIs “out there”

Resolvers should accommodate historical and future InChIs
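
A resolver that accepts both historical and Standard InChIs first has to tell them apart. A minimal sketch, assuming only the string prefix is inspected: Standard InChIs begin with `InChI=1S/`, while older non-standard version 1 strings begin with `InChI=1/`. The function name and the returned labels are hypothetical.

```python
def inchi_flavor(inchi: str) -> str:
    """Classify an InChI string by its prefix only (no validation of layers)."""
    text = inchi.strip()
    if text.startswith("InChI=1S/"):
        return "standard"
    if text.startswith("InChI=1/"):
        return "non-standard"
    if text.startswith("InChI="):
        return "other-version"   # leaves room for future variants
    return "not-an-inchi"


print(inchi_flavor("InChI=1S/CH4/h1H4"))  # methane, Standard InChI
print(inchi_flavor("InChI=1/CH4/h1H4"))   # same structure, non-standard prefix
```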

Page 23: Oops and Downs of Resolving InChIs For the Chemistry Community

In Our Resolver…

Page 24: Oops and Downs of Resolving InChIs For the Chemistry Community

On to ChemSpider…

Page 25: Oops and Downs of Resolving InChIs For the Chemistry Community

NEW Patents and Pubmed on ChemSpider

Page 26: Oops and Downs of Resolving InChIs For the Chemistry Community

InChIs to Patents and Pubmed Articles

Page 27: Oops and Downs of Resolving InChIs For the Chemistry Community

But there will be multiple resolvers…

Each publisher, database, scientist can choose not to publish their structures into a centralized database

There are many large online databases. There is no need to merge/mirror them – each can be a resolver

They need to be federated

Page 28: Oops and Downs of Resolving InChIs For the Chemistry Community

Many ways to address resolving

Our approach is simple – lookup. We look up the structure. SIMPLE.
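
The lookup idea can be sketched as an index from InChIKey to database records. This is an illustration only, not the ChemSpider implementation: `LookupResolver`, its methods and the record identifier are all hypothetical.

```python
from typing import Dict, List


class LookupResolver:
    """In-memory index from InChIKey to record identifiers."""

    def __init__(self) -> None:
        self._by_key: Dict[str, List[str]] = {}

    def register(self, inchikey: str, record_id: str) -> None:
        """Store a record under its InChIKey."""
        self._by_key.setdefault(inchikey.strip(), []).append(record_id)

    def resolve(self, inchikey: str) -> List[str]:
        """Return every record stored under the key, or an empty list."""
        return list(self._by_key.get(inchikey.strip(), []))


resolver = LookupResolver()
resolver.register("HVYWMOMLDIMFJA-DPAQBDIFSA-N", "record-1")  # made-up record id
print(resolver.resolve("HVYWMOMLDIMFJA-DPAQBDIFSA-N"))
```

A lookup can only return what has been deposited, which is why the contents of the database behind the resolver matter so much.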

Page 29: Oops and Downs of Resolving InChIs For the Chemistry Community

NCI/CADD resolver: 69 million structures

Page 30: Oops and Downs of Resolving InChIs For the Chemistry Community

Differences

The NCI and ChemSpider Resolvers are “different”

Different databases behind the resolver – Feedback from NCI: “Preliminary results indicate that inchis.chemspider.com can resolve approx. 28% of our structures.”

Our approaches for resolving differ

Some features are different

Page 31: Oops and Downs of Resolving InChIs For the Chemistry Community

The InChI Resolver Protocol

There will not be only one InChI Resolver – there will be many: publishers, commercial databases, and free services and resources (PubChem, ChemSpider, NCI Database, ChEBI)

Resolvers will not be mirrors of each other. There is no need to mirror when a protocol is in place

Page 32: Oops and Downs of Resolving InChIs For the Chemistry Community

InChI Resolver Protocol

InChI resolving needs to be federated

A common protocol can connect resolvers so that a user gets a complete results set

Individual resolvers can have different capabilities but need an agreed common protocol for resolving InChIs
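
The federation idea can be sketched as one function that asks several resolvers the same question and merges the answers. No protocol had been agreed at the time of this talk, so everything here is an assumption: the `Resolver` callable interface, the resolver names and the returned identifiers are hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterable, List

# Assumed common interface: a resolver takes an InChIKey and yields record ids.
Resolver = Callable[[str], Iterable[str]]


def federated_resolve(inchikey: str, resolvers: Dict[str, Resolver]) -> Dict[str, List[str]]:
    """Query every resolver and collect the hits per resolver name."""
    results: Dict[str, List[str]] = {}
    for name, resolve in resolvers.items():
        try:
            hits = list(resolve(inchikey))
        except Exception:
            hits = []  # one failing resolver should not hide the others
        if hits:
            results[name] = hits
    return results


def broken_resolver(key: str) -> List[str]:
    raise RuntimeError("service unavailable")


# Stand-in resolvers with made-up identifiers.
resolvers: Dict[str, Resolver] = {
    "resolver-a": lambda key: ["a-1"] if key.startswith("HVYW") else [],
    "resolver-b": lambda key: ["b-7", "b-9"],
    "resolver-c": broken_resolver,
}
print(federated_resolve("HVYWMOMLDIMFJA-DPAQBDIFSA-N", resolvers))
```

Keeping the hits grouped by resolver name lets the user see where each result came from, which matters when the resolvers are not mirrors of each other.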

Page 33: Oops and Downs of Resolving InChIs For the Chemistry Community

Discuss with us on Google Groups

Draft protocol for ACS Spring 2010 from RSC ChemSpider, NCI/CADD, PubChem and Symyx

Proof of concept hopefully by end of this year for initial feedback (NCI and ChemSpider)

Join us at http://tinyurl.com/r7q9zc http://groups.google.com/group/inchiresolverprotocol

Page 34: Oops and Downs of Resolving InChIs For the Chemistry Community

InChI Trust

The founder members of the Trust: Elsevier, Thomson Reuters, Wiley, Nature Publishing Group, Royal Society of Chemistry, Symyx, FIZ-Chemie, Taylor & Francis and OpenEye

Page 35: Oops and Downs of Resolving InChIs For the Chemistry Community

In InChIs We Trust

It was said… “There is a finite, but very small probability of finding two structures with the same InChIKey.”

The first collision was announced on Sunday by Jonathan Goodman

Page 36: Oops and Downs of Resolving InChIs For the Chemistry Community

Spongistatin

Page 37: Oops and Downs of Resolving InChIs For the Chemistry Community

Probabilities are what they are…

“The molecule for which a collision has been reported … gives rise to 2^26 = 67,108,864 possible stereoisomers”

The probability of a clash is low but finite…and it happened.

OR…there may be a bug…work underway
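
The arithmetic behind "low but finite" can be sketched with a birthday-style estimate. This rests on assumptions that the slides do not state: that all stereoisomers share the first block of the key, that the stereo information hashes uniformly into a second block of 37 bits (my reading of the InChIKey design), and that collisions are counted over every possible stereoisomer rather than over those actually present in a database.

```python
import math


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of pairs sharing a hash, for a uniform hash of hash_bits bits."""
    space = 2 ** hash_bits
    return n_items * (n_items - 1) / (2 * space)


def probability_of_any_collision(n_items: int, hash_bits: int) -> float:
    """Poisson approximation: 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


stereoisomers = 2 ** 26          # the figure quoted on the slide
second_block_bits = 37           # assumption, see the note above

print(stereoisomers)
print(probability_of_any_collision(2, second_block_bits))
print(expected_colliding_pairs(stereoisomers, second_block_bits))
```

Under these assumptions any two given structures are very unlikely to clash, but a molecule with this many stereoisomers leaves room for many colliding pairs once the whole set is enumerated.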

Page 38: Oops and Downs of Resolving InChIs For the Chemistry Community

The Future

InChI is here

InChIKeys are proliferating

The need for lookup is inevitable – the need for federated resolvers is obvious

Intention to provide draft resolver protocol by end of year

ACS Spring – unveil proof of concept

Page 39: Oops and Downs of Resolving InChIs For the Chemistry Community

Acknowledgments

The InChI “Team” – leadership team, developers, advisors, funders and the community providing feedback

Royal Society of Chemistry

Page 40: Oops and Downs of Resolving InChIs For the Chemistry Community

Thank you

[email protected]: ChemSpiderman
www.chemspider.com/blog