Crawling Across the Web of Chemistry Using ChemSpider
Uploaded by: orcid-0000-0002-2668-4821
Citizen Scientists Enable the Web
Who is writing about chemical compounds on Wikipedia?
Who is writing critical reviews of Chemistry online?
Who is blogging about chemistry on the web?
For Synthesis: TotallySynthetic.com
Org Prep Daily (Blog)
Molbank (Open Access Journal)
Synthetic Pages (Website)
Encyclopedic Articles (Wikipedia)
Chemistry online – An Overview
Encyclopedic articles (Wikipedia)
Chemical vendor databases
Metabolic pathway databases
Property databases
Chemical synthesis procedures
Scientific publications
Chemical vendors
Blogs
Wikis
Open Notebook Science
What and who do you trust?
Compounds and Identifiers
What is ChemSpider?
Building a Structure Centric Community for Chemists
ChemSpider is:
>23 million compounds from ca. 250 data sources
A deposition and curation platform
A publishing platform for the community
Grows daily – more depositions, more links, more data sources
Search Cholesterol
Linked across the internet
Link off a structure in ChemSpider
Chemical suppliers
Other publications
Analytical data
Related reactions
Wikipedia
Patents
“Everything”
Linked to Millions of Articles
Answering Questions for Chemists
Questions a chemist might ask…
What is the melting point of n-butanol?
What is the chemical structure of Xanax?
Chemically, what is phenolphthalein?
What are the stereocenters of cholesterol?
Where can I find publications about xylene?
What are the different trade names for Ketoconazole?
What is the NMR spectrum of Aspirin?
What are the safety handling issues for Thymol Blue?
What is the structure of Flibanserin?
Complex Data and Information
Various Searches
Structure searching
Substructure searching
Subset searching – choose from 200 data sources
Property searching
Searches are used in various ways by different types of chemists…
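The subset and property searches listed above amount to filters over deposited records. The sketch below is a minimal illustration under stated assumptions: the `Record` class, its fields, and the `search` function are hypothetical and do not reflect ChemSpider's actual data model or search engine, and the demo records are placeholders with made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical compound record; not ChemSpider's real data model.
    record_id: str
    source: str  # data source that deposited the record
    properties: dict = field(default_factory=dict)  # property name -> numeric value


def search(records, sources=None, prop=None, lo=None, hi=None):
    """Return records from the selected sources whose `prop` lies within [lo, hi]."""
    hits = []
    for rec in records:
        # Subset searching: skip records from data sources the user did not select.
        if sources is not None and rec.source not in sources:
            continue
        # Property searching: the record must carry the property and satisfy the bounds.
        if prop is not None:
            value = rec.properties.get(prop)
            if value is None:
                continue
            if lo is not None and value < lo:
                continue
            if hi is not None and value > hi:
                continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    # Placeholder records with made-up values, for illustration only.
    demo = [
        Record("r1", "source-A", {"mp": 10.0}),
        Record("r2", "source-B", {"mp": 50.0}),
        Record("r3", "source-A", {"mp": 80.0}),
    ]
    print([r.record_id for r in search(demo, sources={"source-A"}, prop="mp", lo=0, hi=60)])
    # prints ['r1']
```

Structure and substructure searching need graph matching from a cheminformatics toolkit and are not sketched here.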
ChemSpider Searches
Caution! Question Everything!
Vancomycin
Who will curate?
PubChem is not resourced to clean these errors
How would you clean such a large dataset?
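One possible first pass at the cleaning question is to look for names that point at more than one distinct structure, since such names are worth a curator's attention. The sketch assumes each record has already been reduced to a (name, structure key) pair, for example a name and an InChIKey; the function name and placeholder keys are hypothetical, and the output is only a list of candidates, because stereoisomers or salts can share loosely used names and need human judgment.

```python
from collections import defaultdict


def find_conflicting_names(pairs):
    """Group structure keys by normalized name; keep names linked to >1 structure."""
    by_name = defaultdict(set)
    for name, structure_key in pairs:
        # Normalize case and surrounding whitespace so trivial variants collapse.
        by_name[name.strip().lower()].add(structure_key)
    return {name: keys for name, keys in by_name.items() if len(keys) > 1}


if __name__ == "__main__":
    # Placeholder names and keys; a real run would use curated names and InChIKeys.
    pairs = [
        ("Name-A", "KEY-1"),
        ("name-a ", "KEY-2"),
        ("Name-B", "KEY-3"),
        ("name-b", "KEY-3"),
    ]
    for name, keys in sorted(find_conflicting_names(pairs).items()):
        print(name, sorted(keys))
    # prints: name-a ['KEY-1', 'KEY-2']
```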
Vancomycin on ChemSpider: one compound, discussions over three days
The EXPERTS must get it right?!
Wikipedia, C&E News, PubChem
C&E News (from ACS)
“Lathosterol”
“Lathosterol” Removed
“Lathosterol” on PubChem
Crowd-sourcing Chemistry Curation
Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
Citizen Scientists
Become a Data Source
Synthesis Procedures
Links to Data or Deposit Data
Your Blog Posted Online?
Upload Spectral Data, OPEN Data?
Data as DOIs
Primary Data for Chemistry Available for the First Time
…Thieme is the first publisher to make primary chemistry data accessible worldwide
Analytical data, from various experiments, is the foundation of research work and scientific papers
From now on, primary data will be registered and made available online using digital object recognition in the form of Digital Object Identifiers (DOI)
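A DOI becomes a working link when it is handed to a resolver. The sketch below normalizes a DOI string and builds a link through the public https://doi.org/ resolver; it does not use any Thieme or ChemSpider service. The `doi_to_url` function is hypothetical, its pattern is a simplified check that accepts common DOIs of the form `10.<registrant>/<suffix>` but not every form the DOI system allows, and `10.1000/182` is used only as an example string.

```python
import re
from urllib.parse import quote

# Simplified DOI check: directory indicator "10.", a numeric registrant code,
# a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written before the bare "10." form.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
            "http://dx.doi.org/", "doi:")


def doi_to_url(doi: str) -> str:
    """Return a doi.org resolver URL for a DOI given in any of the common forms."""
    doi = doi.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))  # prints https://doi.org/10.1000/182
```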
Linking Data By DOI
Semantic Mark-up for Chemistry
Semantic mark-up for chemistry is here
RSC Project Prospect (structure linking, IUPAC Gold Book ontology and other ontologies), based on the OSCAR system
ChemSpider Journal of Chemistry
Nature Publishing Group compound linking
ChemSpider and Publishing
Curation led to a set of validated dictionaries
Integrated entity extraction with validated name dictionaries
Additional dictionaries covered reactions, groups, families, hardware and software vendors, etc.
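The slides do not describe how ChemMantis performs extraction internally. As a generic illustration of dictionary-driven entity extraction, the sketch below tags text by greedy longest match against a small curated dictionary, so that a multi-word entry such as "acetic acid" wins over the shorter entry "acid". The dictionary entries, entity-type labels, and function names are hypothetical; a production system would also need chemistry-aware tokenization and would map each match to a structure record.

```python
def build_dictionary(entries):
    """Index (term, entity_type) pairs by lower-cased token tuple."""
    lookup = {}
    max_len = 0
    for term, entity_type in entries:
        key = tuple(token.lower() for token in term.split())
        lookup[key] = entity_type
        max_len = max(max_len, len(key))
    return lookup, max_len


def extract_entities(text, lookup, max_len):
    """Greedy longest-match tagging of dictionary terms in whitespace-split text."""
    # Naive tokenization: split on whitespace, trim surrounding punctuation.
    tokens = [t.strip(".,;:!?\"'") for t in text.split()]
    tokens = [t for t in tokens if t]
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "acetic acid" beats "acid".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in lookup:
                found.append((" ".join(tokens[i:i + n]), lookup[window]))
                i += n
                break
        else:
            i += 1  # no dictionary term starts here; move on one token
    return found


if __name__ == "__main__":
    lookup, max_len = build_dictionary([
        ("acetic acid", "compound"),
        ("acid", "family"),
        ("n-butanol", "compound"),
        ("Grignard reaction", "reaction"),
    ])
    print(extract_entities("A Grignard reaction in n-butanol, then acetic acid.", lookup, max_len))
```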
ChemMantis and CJOC
Name-Structure Pairs
Deposit Structures
Species – linked to Wikipedia
Semantic Linking of Structures
What would you want to link off a structure?
Chemical suppliers
Other publications
Analytical data
Related reactions
Wikipedia
Patents
“Everything”
RSC’s Project Prospect
In Development: ChemSpider Synthesis
ChemSpider Synthesis will be a home for all things “synthetic”
An online resource for synthetic procedures from blogs, other online resources, RSC supplementary info, other publishers etc.
Public peer-review and feedback for synthetic procedures
RSC Supplementary Info
Online Journals and Live Data
ChemSpider Everywhere: Embed
ChemSpider Everywhere: Spectral Game
ChemSpider Everywhere: Crowdsourced Curation of Spectra
ChemSpider Everywhere: ChemMobi
ChemSpider Web Services
ChemSpider Everywhere
Linked from Wikipedia
Linked from Open Notebook Science sites
Linked from Blogs using Structure/Spectra
Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
Where is ChemSpider Lacking?
ChemSpider is limited to “defined chemicals”. No support for:
Polymers
Minerals
Markush structures
ChemSpider is very dependent on InChIs:
Stereochemistry around non-carbon centers and organometallics are not correctly represented
There are millions of errors on ChemSpider
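Because ChemSpider leans on InChI, whatever an InChI does not express cannot be recovered from it. One way to see what an InChI does encode is to split it into layers: after the version comes the formula, and each later layer starts with a prefix letter, such as `c` for connections, `h` for hydrogens, and `b`, `t`, `m`, `s` for stereochemistry. The sketch below is a minimal parser under that assumption; `parse_inchi` and `has_stereo_layers` are hypothetical helpers, the parser keeps only the first occurrence of each prefix (repeated prefixes can appear in later sublayers), and it does not validate the string.

```python
def parse_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula, and prefix-keyed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *parts = inchi[len("InChI="):].split("/")
    layers = {"version": version}
    if parts:
        layers["formula"] = parts[0]  # the formula layer carries no prefix letter
    for part in parts[1:]:
        if part:
            # First character is the layer prefix; keep only its first occurrence.
            layers.setdefault(part[0], part[1:])
    return layers


def has_stereo_layers(layers: dict) -> bool:
    """True if any stereo layer (b, t, m, s) is present."""
    return any(prefix in layers for prefix in ("b", "t", "m", "s"))


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    layers = parse_inchi(ethanol)
    print(layers)
    print(has_stereo_layers(layers))  # prints False
```

Ethanol is used only because its standard InChI is short and has no stereo layers.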
What’s next?
Keep cleaning and depositing data
Enable discovery via the semantic web (RDF)
Integrate software: Symyx JDraw, NMRShiftDB
Integrate RSC content – a massive archive!
Integrate RSC publishing workflows and databases
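Enabling discovery through RDF means publishing statements as subject, predicate, object triples. As a minimal sketch, assuming compound records are exposed under a hypothetical base IRI (`http://example.org/compound/`), the code below writes N-Triples lines that label a compound and point to a related page using the standard `rdfs:label` and `rdfs:seeAlso` properties. This is not ChemSpider's actual RDF vocabulary; only the RDFS property IRIs are real, and IRIs are not validated.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical base IRI for compound records, used only for illustration.
BASE = "http://example.org/compound/"


def _iri(value: str) -> str:
    return f"<{value}>"


def _literal(value: str) -> str:
    # Escape backslashes first, then quotes and line breaks, as N-Triples requires.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one RDF triple as an N-Triples line."""
    obj_term = _literal(obj) if literal else _iri(obj)
    return f"{_iri(subject)} {_iri(predicate)} {obj_term} ."


if __name__ == "__main__":
    compound = BASE + "1"
    print(triple(compound, RDFS_LABEL, "Cholesterol", literal=True))
    print(triple(compound, RDFS_SEE_ALSO, "https://en.wikipedia.org/wiki/Cholesterol"))
```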
Continue Building Community for Chemistry
Building a Public ADME/Tox database
Delivering ChemSpider Synthetic Pages
Delivering ChemSpider Analytical Data
Delivering ChemSpider Education
Project Focus
People Make Change Happen
You are invited to:
Curate ChemSpider data and link to us
Deposit your data with us: structures, spectra, synthesis procedures
ChemSpider Synthesis is under development
People Make Change Happen
ChemSpider was a “hobby project”
Housed in a basement and running off three servers – one bought, two built
Sensitive to weather and power stability
Went live at ACS Spring 2007 in Chicago
ca. 6000 visitors a day, >50,000 transactions daily
Organizations Scale Innovation
There is a Downside…
Thank you
[email protected]: ChemSpiderman
www.chemspider.com/blog
SLIDES: www.slideshare.net/AntonyWilliams