Crowdsourcing, Collaborations and Text-Mining in a World of Open Chemistry
Antony Williams
Bio-IT World 2009
Building a Structure Centric Community for Chemists

Linked Data Cloud

Chemistry on the Internet
Much of the information online is User Beware!
The quality of information is “diverse”
Technologies can “link and connect” information, but validation and curation are key to providing quality
The LinkedData web is of less value when the data linked are “wrong”

Quality Costs
Chemical Abstracts Service (CAS), a division of the ACS, is the “Gold Standard” in chemistry-related information
101 years of content, $260 million revenue (2006), >40 million substances and 60 million sequences
But online…

What is “wrong”?

A platform for:
Data deposition, curation and annotation
Supporting Open Notebook Science efforts
Chemistry document mark-up with ChemMantis
The Open Access ChemSpider Journal of Chemistry

Search Cholesterol

Complex Data and Information

Online Data
Many websites host structure-based information
Question quality!!!

Wikipedia, C&E News, PubChem
C&E News (from ACS)

Does one stereocenter matter?

Vancomycin
Who will curate?
PubChem is not resourced to clean these errors
How would you clean such a large dataset?

Vancomycin
ChemSpider: 1 compound – 3 days

Question Everything
www.dhmo.org

DailyMed
“DailyMed provides high quality information about marketed drugs. This information includes FDA approved labels (package inserts).”

The FDA’s DailyMed

Structures on DailyMed
Poor Representations

Structures on DailyMed
Lack of Stereochemistry

Incorrect Structures
Scanning (?) Issues

Incorrect Structures

Does it Matter?
Does it matter to the consumer that the structures are wrong? No… what matters is that what is in the bottle is the right medication!
To make DailyMed structure searchable it DOES matter
To data mine DailyMed it matters
To mark up DailyMed it matters

Collaborative Knowledge Management for Chemists

Wikipedia Links to DrugBank

Taxol on PubChem

Taxol on DailyMed

The InChI Identifier

Multiple Layers
Source: Unofficial InChI FAQ page
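To make the layered design concrete, here is a minimal Python sketch that splits an InChI string into its slash-separated layers. The ethanol InChI is used purely as a small illustration (it does not appear in the slides), the helper name `split_inchi_layers` is hypothetical, and only a few common layer prefixes are labelled. It assumes a formula layer is present and is not a full InChI parser.

```python
# Labels for a few common InChI layer prefixes (not an exhaustive list).
LAYER_NAMES = {
    "c": "connections",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "i": "isotopes",
}


def split_inchi_layers(inchi):
    """Split an InChI string into labelled layers (illustrative only)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # First two fields are the version and the formula; the rest are
    # layers whose first character identifies the layer type.
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for part in rest:
        if not part:
            continue  # skip empty segments
        name = LAYER_NAMES.get(part[0], f"other ({part[0]})")
        layers[name] = part[1:]
    return layers


# Ethanol, chosen only because its InChI is short.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi_layers(ETHANOL)
for name, value in layers.items():
    print(f"{name}: {value}")
```

Two records that differ only in a stereo layer will agree on every earlier layer, which is why a single stereocenter error can go unnoticed when only names or formulas are compared.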

InChI Strings Hash to InChIKeys
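The idea of hashing a variable-length InChI string down to a fixed-length, search-friendly key can be sketched in a few lines of Python. The real InChIKey is derived from a SHA-256 digest of the InChI string with its own encoding into uppercase letters; the `toy_key` function below is a hypothetical stand-in that only mimics the general shape. It is NOT the InChIKey algorithm and will not reproduce real InChIKeys.

```python
import hashlib
import string


def toy_key(inchi, length=14):
    """Map an InChI string to a fixed-length block of uppercase letters.

    Illustration only: this is not the InChIKey algorithm.
    """
    digest = hashlib.sha256(inchi.encode("ascii")).digest()
    letters = string.ascii_uppercase
    # Reduce each of the first `length` digest bytes to one letter.
    return "".join(letters[b % 26] for b in digest[:length])


# Small example molecules, not taken from the slides.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

print(toy_key(ETHANOL))
print(toy_key(METHANOL))
```

The properties that matter for web search carry over from the sketch: the same string always gives the same key, the key has a fixed length regardless of molecule size, and the key cannot be turned back into a structure without a lookup service.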

InChIs for Taxol

Back to Taxol
DrugBank: RCINICONZNJXQF-CLDWUXIMDD
ChEBI: RCINICONZNJXQF-GXKQXQCDDN
Wikipedia: RCINICONZNJXQF-MZXODVADBJ
Which one is correct???

InChIKeys for Taxol
DrugBank: RCINICONZNJXQF-CLDWUXIMDD
ChEBI: RCINICONZNJXQF-GXKQXQCDDN
Wikipedia: RCINICONZNJXQF-MZXODVADBJ
ChEBI and Wikipedia are the SAME structure
DrugBank is a DIFFERENT structure – ONE stereocenter
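An InChIKey is built from hashed blocks: the first 14 letters hash the connectivity part of the InChI, and the block after the hyphen covers the remaining layers, such as stereochemistry. As a hedged illustration, the Python sketch below splits the three Taxol keys listed on the slide and checks that they share the same first block. The helper names are hypothetical, and the code only compares the key text as transcribed here; it does not resolve or validate any structure.

```python
# Keys copied from the slide text.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Return (first block, remainder) of an InChIKey."""
    first, _, rest = key.partition("-")
    return first, rest


def same_connectivity(key_a, key_b):
    """True when two keys share the first (connectivity) block."""
    return split_key(key_a)[0] == split_key(key_b)[0]


first_blocks = {split_key(key)[0] for key in TAXOL_KEYS.values()}
print("distinct first blocks:", first_blocks)
for source, key in TAXOL_KEYS.items():
    first, rest = split_key(key)
    print(f"{source:<10} {first}  {rest}")
```

A matching first block means the sources agree on the atom connectivity, so any disagreement between them lies in the remaining layers.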

The InChI Resolver

Coming Soon… Linked Articles

How bad can it get???
And who is right????

ChemMantis
Chemical Markup And Nomenclature Transformation Integrated System – ChemMantis
A platform for entity extraction for chemistry documents, markup and integration to online information sources – Wikipedia, ChemSpider, Entrez…
Web-based submission, markup and publishing platform now hosting the ChemSpider Journal of Chemistry

ChemMantis Markup

Enable Electronic Articles…
Structures are the language of chemistry
Show structures to chemists and search/link from there…

Species Markup

Dictionaries are Easily Enhanced
Copy-Paste into appropriate Entity Dictionary
Impacts all future markups
Expanding knowledgebases of information
Linked out to rich sources of information
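The dictionary-driven markup described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ChemMantis code: the dictionary entries, the example.org URLs, the `mark_up` helper and the bracketed output format are all hypothetical.

```python
import re

# Hypothetical entity dictionary: lowercase name -> link target.
# The entries and URLs are placeholders, not ChemMantis data.
entity_dictionary = {
    "cholesterol": "https://example.org/compound/cholesterol",
    "vancomycin": "https://example.org/compound/vancomycin",
}


def mark_up(text, dictionary):
    """Wrap every dictionary term found in text as [term](link)."""
    if not dictionary:
        return text
    # Longest names first so longer terms win over their substrings.
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, names)) + r")\b",
        re.IGNORECASE,
    )

    def link(match):
        term = match.group(0)
        return f"[{term}]({dictionary[term.lower()]})"

    return pattern.sub(link, text)


sentence = "Taxol and cholesterol were both mentioned."
before = mark_up(sentence, entity_dictionary)

# Adding one entry changes every markup run from now on.
entity_dictionary["taxol"] = "https://example.org/compound/taxol"
after = mark_up(sentence, entity_dictionary)

print(before)
print(after)
```

The second call links "Taxol" without any change to the markup code, which is the sense in which a dictionary addition impacts all future markups.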

Build Dictionaries
Ontologies Next

Outlinks…

Publishers and Document Mark-Up

ChemSpider Everywhere
Linked from Wikipedia
Linked from Open Notebook Science sites using EMBED
Linked from Blogs using Structure/Spectra EMBED
Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw, Open Source applets
Integrated to software offerings from Thermo, Waters, Agilent, Bruker

ChemSpider Everywhere
Embed Functionality (like YouTube)

ChemSpider Everywhere
www.spectralgame.com

ChemSpider Everywhere
Crowdsourced Curation of Spectra

ChemSpider Everywhere
RSC Compounds

ChemSpider Everywhere
Nature Chemistry
Nature Chemistry articles are annotated to identify all of the chemical compounds mentioned throughout the text.
Those compounds are linked out to other information resources including PubChem and ChemSpider.

ChemSpider Everywhere
ChemMobi

Structure RSS Feeds with InChIs

Acknowledgments
Richard Kidd, Royal Society of Chemistry
Jason Wilde, Nature Publishing Group
Martin Walker and the Wikipedia Chemistry team
Microsoft – Rudy Potenzone
Symyx – Keith Taylor and James Jack
SureChem – Nicko Goncharoff
Spectral Game – Andrew Lang and Jean-Claude Bradley
“The InChI team and Advisory Group”

Conclusions
www.chemspider.com
www.chemspider.com/journal
InChIs and Internet Chemistry
http://inchis.chemspider.com