eScience at the Royal Society of Chemistry and our current initiatives
![Page 1: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/1.jpg)
eScience at the Royal Society of Chemistry: Current Initiatives
Antony Williams
Cornell University, May 14th 2013
![Page 2: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/2.jpg)
We Have … Too Much Data!!!
![Page 3: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/3.jpg)
The World of Online Chemistry
• Property databases
• Compound aggregators
• Screening assay results
• Scientific publications
• Encyclopedic articles (Wikipedia)
• Metabolic pathway databases
• ADME/Tox data – eTOX for example
• Blogs/Wikis and Open Notebook Science
![Page 6: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/6.jpg)
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
• Much of chemistry is chemical structure-based – where and how could we host these data?
![Page 7: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/7.jpg)
RSC’s ChemSpider
![Page 8: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/8.jpg)
ChemSpider
• >28.5 million unique chemicals from >400 data sources
• Focus on improving data quality, enhancing functionality, integrating and enabling
![Page 9: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/9.jpg)
Crowdsourced “Annotations”
• Users can add
– Descriptions/Syntheses/Commentaries
– Links to PubMed articles
– Links to articles via DOIs
– Add spectral data
– Add Crystallographic Information Files
– Add photos
– Add MP3 files
– Add Videos
![Page 10: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/10.jpg)
![Page 11: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/11.jpg)
Spectra
![Page 12: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/12.jpg)
Chemistry Data online are messy
• We have inherited errors
• All public compound databases have errors
• “Incorrect” structures – assertions, timelines etc
• “Incorrect” names associated with structures
• Properties
• Links
• Publications
• ENORMOUS CHALLENGE
![Page 13: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/13.jpg)
Crowdsourced Curation
• Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
![Page 14: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/14.jpg)
Search “Vitamin H”
![Page 15: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/15.jpg)
“Curate” Identifiers
![Page 16: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/16.jpg)
“Curate” Identifiers
![Page 17: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/17.jpg)
“Curate” Identifiers
![Page 18: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/18.jpg)
Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
• Text-mining (publications, patents)
– Used to index PubMed and link to Google Patents
• Linking to other databases – think Biology!
– When structures are not available drug names link
• Searching the web
– Names link to structures link to InChIs
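As a rough illustration of how a validated name-structure dictionary supports text-mining, the sketch below tags chemical names in free text and resolves them to a structure identifier. The dictionary contents, the function name `tag_chemical_names`, and the choice of InChIKeys as the structure identifier are assumptions made for this example; this is not ChemSpider's actual implementation.

```python
import re

# Hypothetical, hand-made dictionary for illustration only.
# Several names (synonyms) can point at the same structure identifier.
NAME_TO_INCHIKEY = {
    "aspirin": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "acetylsalicylic acid": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "caffeine": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}


def tag_chemical_names(text):
    """Return sorted (start, end, name, inchikey) tuples for dictionary names found in text."""
    hits = []
    for name, key in NAME_TO_INCHIKEY.items():
        # Whole-word, case-insensitive match of the dictionary name.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), name, key))
    return sorted(hits)


if __name__ == "__main__":
    sentence = "Aspirin and caffeine are often combined; acetylsalicylic acid is a synonym."
    for start, end, name, key in tag_chemical_names(sentence):
        print(start, end, name, key)
```

Because synonyms resolve to the same key, two differently named mentions can be linked to one record, which is why the quality of the dictionary matters so much for linking by name.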
![Page 19: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/19.jpg)
I want to know about “Vincristine”
![Page 20: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/20.jpg)
Vincristine: Identifiers and Properties
![Page 21: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/21.jpg)
Vincristine: Vendors and Sources
Linked by Structure
![Page 22: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/22.jpg)
Vincristine: Patents
Linked by Name
![Page 23: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/23.jpg)
Vincristine: Articles
Linked by Name
![Page 24: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/24.jpg)
Semantic Mark-up of Articles
![Page 25: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/25.jpg)
Linking Names to Structures
![Page 26: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/26.jpg)
The InChI Identifier
![Page 27: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/27.jpg)
InChI Strings Hash to InChIKeys
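To make the fixed layout of an InChIKey concrete, here is a minimal sketch that validates a standard 27-character key and splits it into its parts: a 14-letter block hashed from the connectivity layer, an 8-letter block hashed from the remaining layers, a standard/non-standard flag, a version character, and a protonation character. The names `parse_inchikey` and `InChIKeyParts` are hypothetical helpers for this example. The sketch only inspects the key; it does not compute the hash from an InChI string.

```python
import re
from typing import NamedTuple

# Layout: 14 letters, hyphen, 8 letters + flag (S/N) + version letter, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton: str      # hash of the connectivity (molecular skeleton) layer
    detail: str        # hash of the remaining layers (stereo, isotopes, ...)
    is_standard: bool  # True when the flag character is "S"
    version: str       # InChI version character, e.g. "A"
    protonation: str   # protonation indicator, "N" for neutral


def parse_inchikey(key):
    """Split an InChIKey into its blocks, raising ValueError if the layout is wrong."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, detail, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, detail, flag == "S", version, protonation)


if __name__ == "__main__":
    # InChIKey of aspirin, used here only as a well-known example.
    print(parse_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```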
![Page 28: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/28.jpg)
Vancomycin – Search the Internet
![Page 29: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/29.jpg)
Vancomycin
Search Molecular SKELETON
Search Full Molecule
![Page 30: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/30.jpg)
Full Skeleton Search: 104 Hits
![Page 31: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/31.jpg)
Full Molecule Search: 4 Hits
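The difference in hit counts between the two searches can be sketched as follows, assuming (this is not stated on the slides) that the skeleton search matches only the first block of the InChIKey while the full-molecule search matches the whole key. The keys in `RECORDS` are made-up placeholders, not real compounds, and `search` is a hypothetical helper.

```python
# Hypothetical placeholder keys for illustration only; they are not real compounds.
RECORDS = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same skeleton, one stereo form
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # same skeleton, another stereo form
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # same skeleton, no stereo specified
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",  # different skeleton
]


def search(query_key, keys, skeleton_only=False):
    """Return keys matching the query on the full key, or on the first block only."""
    if skeleton_only:
        prefix = query_key.split("-", 1)[0]
        return [k for k in keys if k.split("-", 1)[0] == prefix]
    return [k for k in keys if k == query_key]


if __name__ == "__main__":
    query = RECORDS[0]
    print(len(search(query, RECORDS, skeleton_only=True)), "skeleton hits")
    print(len(search(query, RECORDS)), "full-molecule hits")
```

A skeleton match is the looser of the two, so it returns every record sharing the same connectivity, while the full key pins down one specific form of the molecule.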
![Page 32: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/32.jpg)
ChemSpider Resources for Chemistry
![Page 33: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/33.jpg)
Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
– Visits = 731,656
– Unique Visitors = 527,008
• Independent servers to support other projects
![Page 34: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/34.jpg)
Access ChemSpider
• APIs
– Programmatic access used by Mobile Apps, Funded Consortia projects, many Academic groups
• Widgets
– UI components for embedding in other websites
• Data
– Data access, downloads, reuse, licensing
![Page 36: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/36.jpg)
Flexible ChemSpider API
![Page 37: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/37.jpg)
Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
![Page 38: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/38.jpg)
Micropublishing Syntheses
![Page 39: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/39.jpg)
ChemSpider SyntheticPages
![Page 40: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/40.jpg)
Olympicene
![Page 41: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/41.jpg)
So you Want a Profile???
![Page 42: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/42.jpg)
![Page 43: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/43.jpg)
![Page 44: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/44.jpg)
Interactive Data
![Page 45: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/45.jpg)
Integrate to instruments and software
• Integration to analytical instrumentation vendors already in place
– Agilent, Bruker, Thermo, Waters
• Also, Cheminformatics vendors link to ChemSpider
– Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
![Page 46: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/46.jpg)
![Page 47: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/47.jpg)
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
– Mass spec searching – predicted fragmentation
– NMR feature searching – NMR prediction
– Computer-assisted structure elucidation
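One simple form of mass-spec-based dereplication is matching an observed neutral monoisotopic mass against a list of known compounds within a ppm tolerance. The sketch below is a toy version of that idea under stated assumptions: the two-entry `REFERENCE` list, the 5 ppm default tolerance, and the function names are all hypothetical, and it does not use predicted fragmentation or reflect how PharmaSea or ChemSpider actually search.

```python
import re

# Monoisotopic masses (in Da) of the most abundant isotope of each element used here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Hypothetical two-entry reference list, for illustration only.
REFERENCE = {"caffeine": "C8H10N4O2", "aspirin": "C9H8O4"}


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C8H10N4O2' (no brackets or charges)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def dereplicate(observed_mass, tolerance_ppm=5.0):
    """Return names of reference compounds whose mass lies within the ppm tolerance."""
    matches = []
    for name, formula in REFERENCE.items():
        expected = monoisotopic_mass(formula)
        error_ppm = abs(observed_mass - expected) / expected * 1e6
        if error_ppm <= tolerance_ppm:
            matches.append(name)
    return matches


if __name__ == "__main__":
    print(dereplicate(194.0804))  # neutral mass, already corrected for any adduct
```

A match means the compound may already be known; in practice a mass hit alone is ambiguous, which is why the slide pairs it with fragmentation, NMR features, and structure elucidation.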
![Page 48: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/48.jpg)
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
![Page 49: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/49.jpg)
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
![Page 50: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/50.jpg)
ChemSpider Contributions
• The host of the chemistry services
– Supplier of “standardized” chemical data files
– Chemistry searching (structure, substructure etc)
– Curator and data quality checking
• Now building the Open PHACTS chemical registration system
![Page 51: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/51.jpg)
Natural Products Updates
• Names hard, Structures “Obvious”
• New content based on monthly updates of the database
• Click through to the Natural Products Updates entry
![Page 52: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/52.jpg)
National Chemical Database Service
![Page 53: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/53.jpg)
Chemical Database Service
• National Chemical Database Service for UK Academics
• Integrating Commercial Databases and Services
• Chemicals, analytical data, prediction algorithms
• Development of data repository
![Page 54: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/54.jpg)
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories have no specific domain support
• Develop a community repository for chemistry data – private, public, embargoed
• Provides data to develop models/algorithms
![Page 55: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/55.jpg)
Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by the community
• National services feeding the repository – crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
![Page 56: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/56.jpg)
Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community data, new publications and build predictive algorithms for the community
– How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
![Page 57: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/57.jpg)
Support for Chemical Reactions
• Integrating mined reaction data from patents
• Will also incorporate and integrate RSC Databases: Methods of Organic Synthesis, Catalysts and Catalyzed Reactions and…
![Page 58: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/58.jpg)
Inside our Publication Archive
• How much data is in the archive, in the publications and in the supplementary info?
– How many compounds for ChemSpider?
– How many syntheses for ChemSpider reactions?
– How many characterization measurements?
• Property Data
• Spectral Data
• Graphs and charts to be used for modeling?
![Page 59: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/59.jpg)
What if we could capture it all?
Digitally Enhancing the RSC Archive
![Page 60: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/60.jpg)
Start with data in publications
![Page 61: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/61.jpg)
Data Validation and Curation Required
Encouraging Participation with Rewards and RECOGNITION
![Page 62: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/62.jpg)
Manual Curation
• Integrated commenting, curating and validation platform across ALL eScience and publishing platforms
• All integrated to a central RSC profile and feeding the AltMetrics tools
![Page 63: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/63.jpg)
Structure Review
![Page 64: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/64.jpg)
Future Recognition in AltMetrics?
ChemSpider
![Page 65: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/65.jpg)
Internet Data
The Future
Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors
Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals
![Page 66: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/66.jpg)
The Future of Chemistry on the Web?
• Public compound databases federate & build a linked environment of validated data!
• Data validation needs are not ignored
• Publishers layer on information to make publications discoverable
• Open Data proliferate
• The “Semantic Web” will continue to develop…
![Page 67: eScience at the Royal Society of Chemistry and our current initiatives](https://reader036.fdocuments.us/reader036/viewer/2022062703/554ea596b4c905977e8b48d3/html5/thumbnails/67.jpg)
Thank you
Email: [email protected]
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams