Navigating the Complex Web of Chemistry Using ChemSpider
Uploaded by orcid-0000-0002-2668-4821
Antony Williams vs Identifiers
Old Passport ID
Dad, Tony, others
SSN
Green Card
License
5 email addresses
ChemSpiderman (blog, Twitter account, Facebook, Friendfeed)
OpenID….
Aspirin vs Chemical Identifiers
Aspirin names and synonyms
• Text searches depend on correct association
• 335 suggested identifiers for Aspirin just on PubChem!
• Disambiguation dictionaries are necessary
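To make the idea of a disambiguation dictionary concrete, here is a minimal sketch in Python. It maps several names for aspirin to a single structure identifier (its standard InChIKey). The three synonyms are a tiny illustrative subset of the hundreds mentioned above, and the helper names `normalize` and `resolve` are hypothetical, not part of ChemSpider or PubChem.

```python
# Minimal disambiguation dictionary: many names, one structure identifier.
# The synonym list is an illustrative subset, not data taken from ChemSpider.
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # standard InChIKey for aspirin

SYNONYMS = {
    "aspirin": ASPIRIN_KEY,
    "acetylsalicylic acid": ASPIRIN_KEY,
    "2-acetoxybenzoic acid": ASPIRIN_KEY,
}


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(name.lower().split())


def resolve(name: str):
    """Return the identifier for a name, or None if the name is unknown."""
    return SYNONYMS.get(normalize(name))


print(resolve("  Acetylsalicylic   Acid "))  # matches after normalization
print(resolve("asprin"))                     # misspelling: no association
```

A text search only succeeds when the name has been correctly associated with the structure, which is why the misspelled query returns nothing.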
Linked Data Cloud
…the premium database producers are using some automatic tools to prepare a ‘first draft’ of a database record, to be refined by eye.
Coupled with the public internet as a distribution method of choice, it is becoming possible for the first time to create and distribute new structure based databases at much lower costs, or even free of charge.
The Final Search Strategy
All Those Names, One Structure
Content is King and Quality Costs
• Chemistry “content” is big business. Not everyone can afford it.
• Patent searching
• Structures and properties
• Drug databases
• Literature databases
Chemical Abstracts Service (CAS), the “Gold Standard” in chemistry-related information
• 101 years of content
• $260 million revenue (2006)
• >50 million substances
• Proprietary platform
Searching Chemistry on the Internet
How complete a result set will we get if we search for “chemicals” by name?
Is there a better way to link chemistry databases? Linking by “names” is dangerous
Chemists want structure and SUBstructure searching
The InChI Identifier
Multiple Layers
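As a rough illustration of the layered design, the sketch below splits the oleoylethanolamine InChI quoted in this deck into its slash-separated layers. It is a simplification that assumes each layer prefix occurs once and that a formula layer is present, which holds for this example but not for every InChI. The function name `split_inchi_layers` is hypothetical.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its slash-separated layers.

    Simplified sketch: assumes a formula layer is present and that no
    layer prefix is repeated, which is true for the example below.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:  # skip empty fragments defensively
            layers[part[0]] = part[1:]
    return layers


# Oleoylethanolamine InChI, as given in this deck.
OEA = ("InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)"
       "21-18-19-22/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-")

for name, value in split_inchi_layers(OEA).items():
    print(name, "->", value)
```

Here `c` is the connectivity layer, `h` the hydrogen layer, and `b` the double-bond stereochemistry layer.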
InChI Strings Hash to InChIKeys
Oleoylethanolamine
InChI=1S/C20H39NO2/c1-2-3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-20(23)21-18-19-22/h9-10,22H,2-8,11-19H2,1H3,(H,21,23)/b10-9-
BOWVQLFMWHZBEF-KTKRTIGZSA-N
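The InChIKey above has a fixed shape, which is what makes it convenient for text search engines. The sketch below splits it into its blocks: a 14-letter hash of the connectivity, an 8-letter hash of the remaining layers (such as stereochemistry), a standard-InChI flag, a version letter, and a protonation letter. The function name `split_inchikey` and the dictionary field names are hypothetical, and the code only checks the shape of a key, not whether it was derived from a real InChI.

```python
import re

# 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks; raise ValueError on a bad shape."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("not a well-formed InChIKey: %r" % key)
    skeleton, detail, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,        # hash of the connectivity
        "detail": detail,            # hash of stereo, isotopes, etc.
        "standard": flag == "S",     # 'S' marks a standard InChI
        "version": version,          # 'A' corresponds to InChI version 1
        "protonation": protonation,  # 'N' means neutral
    }


print(split_inchikey("BOWVQLFMWHZBEF-KTKRTIGZSA-N"))
```

Because the key is a hash, it cannot be converted back to a structure; it is useful only for lookup and linking.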
InChIKey Searches Work
Search Engine Dependencies
InChIs have traction…
RDF Linking of Structures
PubChem
The Simplest Organic Molecule
Question Everything online: www.dhmo.org
The Structure-Based Data Cloud
Vancomycin
Who will curate?
How would you clean such a large dataset?
Vancomycin on ChemSpider
Search Molecular SKELETON
Search Full Molecule
Full Skeleton Search: 104 Hits
Full Molecule Search: 4 Hits
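One way to picture the difference between the two searches is with InChIKeys: matching only the first block finds every record that shares a connectivity skeleton, while matching the whole key finds only the exact molecule. The sketch below assumes that approach; the transcript does not say how ChemSpider implements its skeleton search. The oleoylethanolamine key comes from this deck, and the other two keys are made-up placeholders for illustration only.

```python
def skeleton_hits(query: str, keys: list) -> list:
    """Return keys whose first (connectivity) block matches the query's."""
    block = query.split("-")[0]
    return [k for k in keys if k.split("-")[0] == block]


def full_hits(query: str, keys: list) -> list:
    """Return keys that match the query exactly."""
    return [k for k in keys if k == query]


QUERY = "BOWVQLFMWHZBEF-KTKRTIGZSA-N"  # oleoylethanolamine, from this deck

# Hypothetical index. The second and third keys are placeholders, not
# real compounds: one shares the skeleton block, one shares nothing.
INDEX = [
    "BOWVQLFMWHZBEF-KTKRTIGZSA-N",
    "BOWVQLFMWHZBEF-XXXXXXXXSA-N",
    "AAAAAAAAAAAAAA-AAAAAAAASA-N",
]

print(len(skeleton_hits(QUERY, INDEX)), "skeleton hits")
print(len(full_hits(QUERY, INDEX)), "full-molecule hits")
```

The skeleton search returns the broader set, mirroring the gap between the hit counts on the slides above.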
What is ChemSpider?
ChemSpider is:
• Building a Structure Centric Community for Chemists
• 22.2 million compounds, >200 data sources
• A deposition and curation platform
• A publishing platform for the community
• Grows daily – more depositions, more links, more data sources
For Chemical Compounds
Vendor sites – Aldrich, Alfa Aesar, TCI and 100s of others
Government databases – PubChem, DSSTox, FDA databases, ChemIDPlus,…
Biological Databases – Protein Database, Stitch, KEGG, ChEBI,…
Analytical databases – NMRShiftDB,…
How Was ChemSpider Built?
• ChemSpider was a “hobby project”
• Housed in a basement and running off three servers – one bought, two built
May 2009
• 3 servers – 2 homebuilt
• .NET architecture
• SQL Server
• Homebuilt structure/substructure
• Commercial components
• Open Source Components: OpenBabel, Jmol, JSpecView, NCBI Toolkit, InChI Libraries
Search Cholesterol
Linked across the internet
Kyoto Encyclopedia of Genes and Genomes
Links to Patents based on structure
Answering Questions for Chemists
Questions a chemist might ask…
• What is the melting point of n-butanol?
• What is the chemical structure of Xanax?
• Chemically, what is phenolphthalein?
• What are the stereocenters of cholesterol?
• Where can I find publications about xylene?
• What are the different trade names for Ketoconazole?
• What is the NMR spectrum of Aspirin?
• What are the safety handling issues for Thymol Blue?
Complex Data and Information
Remember – QUALITY ISSUES
The FDA’s DailyMed
Incorrect Structures
Does one stereocenter matter?
Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon
Crowd-sourcing Chemistry Curation
We Need Recognition and Rewards
Master Curators, Curators, Depositors
Collaborating with Wikipedia
Long term project to curate chemical compounds
Robotically linking ChemSpider to Wikipedia at present
Will layer on InChI Strings and InChIKeys shortly and make Wikipedia structure searchable
Blogs need InChIs too!
Use Intelligent Structures : ChemSpider Embed Web Service
ChemSpider Web Services
Semantic Mark-up for Chemistry
Semantic mark-up for chemistry is here
RSC Project Prospect
Nature Publishing Group compound linking
ChemMantis
Nature Chemistry Compound Pages
Project Prospect
ChemMantis
Deposit Structures
Species – linked to Wikipedia
Semantic Linking of Structures
What would you want to link off a structure?
• Chemical suppliers
• Other publications
• Analytical Data
• Related Reactions
• Wikipedia
• Patents
• “Everything”
The InChI “Resolver”
InChI Resolver to DOIs
Structure Search the Web
Conclusions
• Internet resources provide a collaborative community for chemistry
• Crowdsourcing to expand, curate and integrate to the benefit of chemists
• Searching the web for chemistry is arriving
• InChIs are enabling chemistry on the internet
• Question Quality!
[email protected]: ChemSpiderman
www.chemspider.com/blog