Navigating the Complex Web of Chemistry Using ChemSpider

description

A growing number of free and open access resources are available to scientists on the internet. Coupled with the increasing availability of Open Source software tools, this puts us in the middle of a revolution in data availability and in the tools used to manipulate these data. ChemSpider is a free-access website built to provide a structure-centric community for chemists. It aggregates chemistry-related information from many sources, at present over 21.5 million unique chemical entities from over 200 separate data sources, and has taken on the task of both robotically and manually curating publicly available data. This presentation gives an overview of the ChemSpider platform and how it is fast becoming the centralized hub for sourcing information about chemical entities.

Transcript of Navigating the Complex Web of Chemistry Using ChemSpider

Page 1: Navigating the Complex Web of Chemistry Using ChemSpider

Navigating the Complex Web of Chemistry Using ChemSpider

Page 2: Navigating the Complex Web of Chemistry Using ChemSpider

Antony Williams vs Identifiers

Old Passport ID

Dad, Tony, others

SSN

Green Card

License

5 email addresses

ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)

OpenID…

Page 3: Navigating the Complex Web of Chemistry Using ChemSpider

Aspirin vs Chemical Identifiers

Page 4: Navigating the Complex Web of Chemistry Using ChemSpider

Aspirin names and synonyms

• Text searches depend on correct association

• 335 suggested identifiers for Aspirin just on PubChem!

• Disambiguation dictionaries are necessary
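
A disambiguation dictionary can be sketched as a lookup from normalized names to a single structure identifier. The synonym list below is a tiny hand-picked subset used only for illustration, and the helper names `normalize` and `resolve` are hypothetical; a real dictionary would be curated from many sources.

```python
# Minimal sketch of a name-disambiguation dictionary (illustrative data only).
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # standard InChIKey for aspirin

# Every synonym points at the same structure identifier.
SYNONYMS = {
    "aspirin": ASPIRIN_KEY,
    "acetylsalicylic acid": ASPIRIN_KEY,
    "2-acetoxybenzoic acid": ASPIRIN_KEY,
}

def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())

def resolve(name: str):
    """Return the identifier for a name, or None when the name is unknown."""
    return SYNONYMS.get(normalize(name))
```

With this in place, `resolve("Aspirin")` and `resolve("  Acetylsalicylic   Acid ")` return the same key, while an unlisted name returns `None` rather than a guess.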

Page 5: Navigating the Complex Web of Chemistry Using ChemSpider

Linked Data Cloud

Page 6: Navigating the Complex Web of Chemistry Using ChemSpider

…the premium database producers are using some automatic tools to prepare a ‘first draft’ of a database record, to be refined by eye.

Coupled with the public internet as a distribution method of choice, it is becoming possible for the first time to create and distribute new structure based databases at much lower costs, or even free of charge.

Page 7: Navigating the Complex Web of Chemistry Using ChemSpider
Page 8: Navigating the Complex Web of Chemistry Using ChemSpider
Page 9: Navigating the Complex Web of Chemistry Using ChemSpider

The Final Search Strategy

Page 10: Navigating the Complex Web of Chemistry Using ChemSpider

All Those Names, One Structure

Page 11: Navigating the Complex Web of Chemistry Using ChemSpider

Content is King and Quality Costs

Chemistry “content” is big business. Not everyone can afford it.

• Patent searching
• Structures and properties
• Drug databases
• Literature databases

Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information

• 101 years of content
• $260 million revenue (2006)
• >50 million substances
• Proprietary platform

Page 12: Navigating the Complex Web of Chemistry Using ChemSpider

Searching Chemistry on the Internet

How complete a result set will we get if we search for “chemicals” by name?

Is there a better way to link chemistry databases? Linking by “names” is dangerous

Chemists want structure and SUBstructure searching

Page 13: Navigating the Complex Web of Chemistry Using ChemSpider

The InChI Identifier

Page 14: Navigating the Complex Web of Chemistry Using ChemSpider

Multiple Layers

Page 15: Navigating the Complex Web of Chemistry Using ChemSpider

InChI Strings Hash to InChIKeys

Page 16: Navigating the Complex Web of Chemistry Using ChemSpider

Oleoylethanolamine

InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)21-18-19-22/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-

BOWVQLFMWHZBEF-KTKRTIGZSA-N
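
The layered structure of the InChI string above can be made visible by splitting it on the `/` separators. This is a minimal sketch, not an official parser: the function name `split_inchi` is hypothetical, and it assumes a standard InChI in which no layer prefix repeats (fixed-hydrogen and isotopic sublayers would need more care).

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into its version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    # Remaining layers carry a one-letter prefix, e.g. c (connectivity),
    # h (hydrogens), b (double-bond geometry).
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers

# InChI for oleoylethanolamine, copied from the slide.
OEA = ("InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)"
       "21-18-19-22/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-")

layers = split_inchi(OEA)
for name, value in layers.items():
    print(f"{name}: {value}")
```

Here the `b` layer (`10-9-`) is the only part that records the geometry of the double bond; the formula, `c` and `h` layers describe the skeleton.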

Page 17: Navigating the Complex Web of Chemistry Using ChemSpider

InChIKey Searches Work
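
Part of what makes InChIKeys convenient to search for is their fixed layout: 14 letters, a hyphen, 10 letters, a hyphen and one final letter, with none of the slashes and parentheses found in a full InChI string. The sketch below uses that layout to pick keys out of free text; the pattern checks the shape only, not whether the key belongs to a real compound, and the function name `find_inchikeys` is hypothetical.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter: the InChIKey layout.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")

def find_inchikeys(text: str) -> list:
    """Return every substring of text that has the shape of an InChIKey."""
    return INCHIKEY_RE.findall(text)

post = "Oleoylethanolamine (BOWVQLFMWHZBEF-KTKRTIGZSA-N) was discussed today."
print(find_inchikeys(post))
```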

Page 18: Navigating the Complex Web of Chemistry Using ChemSpider

Search Engine Dependencies

Page 19: Navigating the Complex Web of Chemistry Using ChemSpider

Search Engine Dependencies

Page 20: Navigating the Complex Web of Chemistry Using ChemSpider

InChIs have traction…

Page 21: Navigating the Complex Web of Chemistry Using ChemSpider

RDF Linking of Structures

Page 22: Navigating the Complex Web of Chemistry Using ChemSpider

PubChem

Page 23: Navigating the Complex Web of Chemistry Using ChemSpider

The Simplest Organic Molecule

Page 24: Navigating the Complex Web of Chemistry Using ChemSpider

Question Everything online: www.dhmo.org

Page 25: Navigating the Complex Web of Chemistry Using ChemSpider

The Structure-Based Data Cloud

Page 26: Navigating the Complex Web of Chemistry Using ChemSpider

Vancomycin

Page 27: Navigating the Complex Web of Chemistry Using ChemSpider
Page 28: Navigating the Complex Web of Chemistry Using ChemSpider

Vancomycin

Who will curate?

How would you clean such a large dataset?

Page 29: Navigating the Complex Web of Chemistry Using ChemSpider

Vancomycin on ChemSpider

Page 30: Navigating the Complex Web of Chemistry Using ChemSpider

Vancomycin

Page 31: Navigating the Complex Web of Chemistry Using ChemSpider

Vancomycin

Search Molecular SKELETON

Search Full Molecule

Page 32: Navigating the Complex Web of Chemistry Using ChemSpider

Full Skeleton Search: 104 Hits

Page 33: Navigating the Complex Web of Chemistry Using ChemSpider

Full Molecule Search: 4 Hits
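
The gap between the skeleton and full-molecule hit counts can be illustrated with InChIKeys: the first block of a key hashes the formula and connectivity, while the second block also covers stereochemistry. The sketch below is one way to express the two kinds of match, not necessarily how ChemSpider implements them. The record list is made up for illustration, and the stereo-free key is an assumption about what the same skeleton would give with no double-bond geometry specified.

```python
def skeleton(key: str) -> str:
    """First InChIKey block: hash of the formula and connectivity layers."""
    return key.split("-")[0]

QUERY = "BOWVQLFMWHZBEF-KTKRTIGZSA-N"  # key from the Oleoylethanolamine slide

# Illustrative records only, not real search results.
RECORDS = [
    "BOWVQLFMWHZBEF-KTKRTIGZSA-N",  # same skeleton, same double-bond geometry
    "BOWVQLFMWHZBEF-UHFFFAOYSA-N",  # assumed: same skeleton, no stereo given
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",  # aspirin, an unrelated skeleton
]

# Full-molecule search: the whole key must match.
full_hits = [k for k in RECORDS if k == QUERY]

# Skeleton search: only the first block must match.
skeleton_hits = [k for k in RECORDS if skeleton(k) == skeleton(QUERY)]

print(len(skeleton_hits), "skeleton hits;", len(full_hits), "full-molecule hits")
```

A skeleton match is always at least as broad as a full match, which is consistent with the larger hit count on the skeleton search.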

Page 34: Navigating the Complex Web of Chemistry Using ChemSpider

What is ChemSpider?

ChemSpider is:

Building a Structure-Centric Community for Chemists

22.2 million compounds, >200 data sources

A deposition and curation platform

A publishing platform for the community

Grows daily – more depositions, more links, more data sources

Page 35: Navigating the Complex Web of Chemistry Using ChemSpider

For Chemical Compounds

Vendor sites – Aldrich, Alfa Aesar, TCI and 100s of others

Government databases – PubChem, DSSTox, FDA databases, ChemIDPlus,…

Biological Databases – Protein Database, Stitch, KEGG, ChEBI,…

Analytical databases – NMRShiftDB,…

Page 36: Navigating the Complex Web of Chemistry Using ChemSpider

How Was ChemSpider Built?

ChemSpider was a “hobby project”

Housed in a basement and running off three servers – one bought, two built

May 2009

Page 37: Navigating the Complex Web of Chemistry Using ChemSpider

• 3 servers – 2 homebuilt
• .NET architecture
• SQL Server
• Homebuilt structure/substructure
• Commercial components
• Open Source components

OpenBabel, Jmol, JSpecView, NCBI Toolkit, InChI Libraries

Page 38: Navigating the Complex Web of Chemistry Using ChemSpider

Search Cholesterol

Page 39: Navigating the Complex Web of Chemistry Using ChemSpider

Search Cholesterol

Page 40: Navigating the Complex Web of Chemistry Using ChemSpider

Search Cholesterol

Page 41: Navigating the Complex Web of Chemistry Using ChemSpider

Search Cholesterol

Page 42: Navigating the Complex Web of Chemistry Using ChemSpider

Linked across the internet

Page 43: Navigating the Complex Web of Chemistry Using ChemSpider

Kyoto Encyclopedia of Genes and Genomes

Page 44: Navigating the Complex Web of Chemistry Using ChemSpider

Links to Patents based on structure

Page 45: Navigating the Complex Web of Chemistry Using ChemSpider
Page 46: Navigating the Complex Web of Chemistry Using ChemSpider

Answering Questions for Chemists

Questions a chemist might ask…

• What is the melting point of n-butanol?
• What is the chemical structure of Xanax?
• Chemically, what is phenolphthalein?
• What are the stereocenters of cholesterol?
• Where can I find publications about xylene?
• What are the different trade names for Ketoconazole?
• What is the NMR spectrum of Aspirin?
• What are the safety handling issues for Thymol Blue?

Page 47: Navigating the Complex Web of Chemistry Using ChemSpider

Complex Data and Information

Page 48: Navigating the Complex Web of Chemistry Using ChemSpider

Remember – QUALITY ISSUES

Page 49: Navigating the Complex Web of Chemistry Using ChemSpider

The FDA’s DailyMed

Page 50: Navigating the Complex Web of Chemistry Using ChemSpider

Incorrect Structures

Page 51: Navigating the Complex Web of Chemistry Using ChemSpider

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon

Page 52: Navigating the Complex Web of Chemistry Using ChemSpider

Crowd-sourcing Chemistry Curation

Page 53: Navigating the Complex Web of Chemistry Using ChemSpider

We Need Recognition and Rewards

Page 54: Navigating the Complex Web of Chemistry Using ChemSpider

Master Curators, Curators, Depositors

Page 55: Navigating the Complex Web of Chemistry Using ChemSpider

Collaborating with Wikipedia

Long term project to curate chemical compounds

Robotically linking ChemSpider to Wikipedia at present

Will layer on InChI Strings and InChIKeys shortly and make Wikipedia structure searchable

Page 56: Navigating the Complex Web of Chemistry Using ChemSpider

Blogs need InChIs too!

Page 57: Navigating the Complex Web of Chemistry Using ChemSpider

Blogs need InChIs too!

Page 58: Navigating the Complex Web of Chemistry Using ChemSpider

Use Intelligent Structures: ChemSpider Embed Web Service

Page 59: Navigating the Complex Web of Chemistry Using ChemSpider

ChemSpider Web Services

Page 60: Navigating the Complex Web of Chemistry Using ChemSpider

Semantic Mark-up for Chemistry

Semantic mark-up for chemistry is here

RSC Project Prospect

Nature Publishing Group compound linking

ChemMantis

Page 61: Navigating the Complex Web of Chemistry Using ChemSpider

Nature Chemistry Compound Pages

Page 62: Navigating the Complex Web of Chemistry Using ChemSpider

Project Prospect

Page 63: Navigating the Complex Web of Chemistry Using ChemSpider

ChemMantis

Page 64: Navigating the Complex Web of Chemistry Using ChemSpider

Deposit Structures

Page 65: Navigating the Complex Web of Chemistry Using ChemSpider

Species – linked to Wikipedia

Page 66: Navigating the Complex Web of Chemistry Using ChemSpider

Semantic Linking of Structures

What would you want to link off a structure?

• Chemical suppliers
• Other publications
• Analytical data
• Related reactions
• Wikipedia
• Patents
• “Everything”

Page 67: Navigating the Complex Web of Chemistry Using ChemSpider

The InChI “Resolver”

Page 68: Navigating the Complex Web of Chemistry Using ChemSpider

InChI Resolver to DOIs

Structure Search the Web

Page 69: Navigating the Complex Web of Chemistry Using ChemSpider
Page 70: Navigating the Complex Web of Chemistry Using ChemSpider

Conclusions

Internet resources provide a collaborative community for chemistry

Crowdsourcing to expand, curate and integrate to the benefit of chemists

Searching the web for chemistry is arriving

InChIs are enabling chemistry on the internet

Question Quality!

Page 71: Navigating the Complex Web of Chemistry Using ChemSpider

[email protected]: ChemSpiderman

www.chemspider.com/blog