Evolution of open chemical information
Valery Tkachenko
Royal Society of Chemistry
ACS Fall 2016
Philadelphia, PA
The Short History of Time
Image credit: Rhys Taylor, Cardiff University
~1992
Chemical database
PubChem
• 57 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web-searching
ChemSpider
ChemSpider
ChemSpider real-time curation
Article X-ray
Compounds
Reaction
Analytical Data
Text and References
Reaction 1: NextMove reaction text-mined from RSC archive – original article
Reaction 1: NextMove reaction text-mined from RSC archive – CML output
<?xml version="1.0" encoding="UTF-8"?><reactionList xmlns="http://www.xml-cml.org/schema" xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/" xmlns:nameDict="http://www.xml-cml.org/dictionary/cml/name/" xmlns:unit="http://www.xml-cml.org/unit/" xmlns:cml="http://www.xml-cml.org/schema" xmlns:dl="http://bitbucket.org/dan2097"><reaction> <dl:source> <dl:documentId>c3ra45871g</dl:documentId> <dl:paragraphText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid. [α]D20 −24.2 (c 1.1, CHCl3); 1H NMR (CDCl3, 300 MHz) δ 0.04 (s, 3H), 0.07 (s, 3H), 0.85 (s, 9H), 1.34 (s, 3H), 1.44 (s, 3H), 2.16 (br, 1H), 3.68–3.81 (m, 3H), 4.16 (t, J = 13.8 Hz, J = 13.8 Hz, 1H), 4.59 (t, J = 6.6 Hz, J = 6.6 Hz, 1H), 5.22 (d, J = 10.7 Hz, 1H), 5.34 (d, J = 17.1 Hz, 1H), 5.90 (ddd, J = 7.2 Hz, J = 10.2 Hz, J = 17.2 Hz, 1H); 13C NMR (CDCl3, 75 MHz) δ 134.1, 118.4, 108.5, 79.5, 78.8, 70.8, 65.0, 27.8, 25.9, 25.4, 18.1, −3.7, −4.4. 
HRMS (ESI) calcd for [M + Na]+ (C15H30O4SiNa) 325.1811, found 325.1807.</dl:paragraphText> </dl:source> <dl:reactionSmiles>[H-].C([Al+]CC(C)C)C(C)C.C([O:17][CH2:18][C@@H:19]([O:29][Si:30]([C:33]([CH3:36])([CH3:35])[CH3:34])([CH3:32])[CH3:31])[C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)(=O)C(C)(C)C>ClCCl>[C:33]([Si:30]([CH3:32])([CH3:31])[O:29][C@@H:19]([C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)[CH2:18][OH:17])([CH3:36])([CH3:35])[CH3:34] |f:0.1|</dl:reactionSmiles> <productList> <product role="product"> <molecule id="m0"> <name dictRef="nameDict:unknown">10</name> <dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethanol</dl:nameResolved> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00102">1.02 mmol</amount> <amount dl:propertyType="MASS" dl:normalizedValue="0.308">308 mg</amount> <amount dl:propertyType="PERCENTYIELD" dl:normalizedValue="79">79%</amount> <amount dl:propertyType="CALCULATEDPERCENTYIELD" dl:normalizedValue="79.1" units="unit:percentYield">79.1</amount> <identifier dictRef="cml:smiles" value="C(C)(C)(C)[Si](O[C@H](CO)[C@H]1OC(O[C@H]1C=C)(C)C)(C)C"/> <identifier dictRef="cml:inchi" value="InChI=1S/C15H30O4Si/c1-9-11-13(18-15(5,6)17-11)12(10-16)19-20(7,8)14(2,3)4/h9,11-13,16H,1,10H2,2-8H3/t11-,12+,13-/m0/s1"/> <dl:entityType>definiteReference</dl:entityType> <dl:appearance>colourless</dl:appearance> <dl:state>liquid</dl:state> </product> </productList> <reactantList> <reactant role="reactant"> <molecule id="m1"> <name dictRef="nameDict:unknown">Diisobutylaluminium hydride</name> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00323">3.23 mmol</amount> <amount dl:propertyType="MOLARITY" dl:normalizedValue="1.1">1.1 M</amount> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.00293">2.93 mL</amount> <identifier dictRef="cml:smiles" value="[H-].C(C(C)C)[Al+]CC(C)C"/> <identifier 
dictRef="cml:inchi" value="InChI=1S/2C4H9.Al.H/c2*1-4(2)3;;/h2*4H,1H2,2-3H3;;/q;;+1;-1"/> <dl:entityType>exact</dl:entityType> </reactant> <reactant role="reactant" count="1"> <molecule id="m2"> <name dictRef="nameDict:unknown">9</name> <dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethyl pivalate</dl:nameResolved> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00129">1.29 mmol</amount> <amount dl:propertyType="MASS" dl:normalizedValue="0.500">500 mg</amount> <identifier dictRef="cml:smiles" value="C(C(C)(C)C)(=O)OC[C@H]([C@H]1OC(O[C@H]1C=C)(C)C)O[Si](C)(C)C(C)(C)C"/> <identifier dictRef="cml:inchi" value="InChI=1S/C20H38O5Si/c1-12-14-16(24-20(8,9)23-14)15(13-22-17(21)18(2,3)4)25-26(10,11)19(5,6)7/h12,14-16H,1,13H2,2-11H3/t14-,15+,16-/m0/s1"/> <dl:entityType>definiteReference</dl:entityType> </reactant> </reactantList> <spectatorList> <spectator role="solvent"> <molecule id="m3"> <name dictRef="nameDict:unknown">dichloromethane</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.020">20 mL</amount> <identifier dictRef="cml:smiles" value="ClCCl"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH2Cl2/c2-1-3/h1H2"/> <dl:entityType>exact</dl:entityType> </spectator> </spectatorList> <dl:reactionActionList> <dl:reactionAction action="Add"> <dl:phraseText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C</dl:phraseText> <dl:chemical ref="m1"/> <dl:chemical ref="m2"/> <dl:chemical ref="m3"/> <dl:parameter propertyType="Temperature" normalizedValue="-78">-78 °C.</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Stir"> <dl:phraseText>The reaction mixture was stirred at −78 °C for another 2 h</dl:phraseText> <dl:parameter propertyType="Time" normalizedValue="7200">2 h</dl:parameter> <dl:parameter propertyType="Temperature" 
normalizedValue="-78">-78 °C</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Heat"> <dl:phraseText>warmed up to rt</dl:phraseText> <dl:parameter propertyType="Temperature" normalizedValue="room temperature">rt</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Quench"> <dl:phraseText>quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL)</dl:phraseText> <chemical> <molecule id="m4"> <name dictRef="nameDict:unknown">methanol</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.003">3 mL</amount> <identifier dictRef="cml:smiles" value="CO"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH4O/c1-2/h2H,1H3"/> <dl:entityType>exact</dl:entityType> </chemical> <chemical> <molecule id="m5"> <name dictRef="nameDict:unknown">citric acid</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.005">5 mL</amount> <identifier dictRef="cml:smiles" value="C(CC(O)(C(=O)O)CC(=O)O)(=O)O"/> <identifier dictRef="cml:inchi" value="InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Concentrate"> <dl:phraseText>concentrated</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Add"> <dl:phraseText>The residue was added with water (10 mL)</dl:phraseText> <chemical> <molecule id="m6"> <name dictRef="nameDict:unknown">water</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.010">10 mL</amount> <identifier dictRef="cml:smiles" value="O"/> <identifier dictRef="cml:inchi" value="InChI=1S/H2O/h1H2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Extract"> <dl:phraseText>extracted with dichloromethane (12 mL × 3)</dl:phraseText> <chemical> <molecule id="m7"> <name dictRef="nameDict:unknown">dichloromethane</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.012">12 mL</amount> 
<identifier dictRef="cml:smiles" value="ClCCl"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH2Cl2/c2-1-3/h1H2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Dry"> <dl:phraseText>dried over Na2SO4</dl:phraseText> <chemical> <molecule id="m8"> <name dictRef="nameDict:unknown">Na2SO4</name> </molecule> <identifier dictRef="cml:smiles" value="[Na+].[Na+].[O-]S(=O)(=O)[O-]"/> <identifier dictRef="cml:inchi" value="InChI=1S/2Na.H2O4S/c;;1-5(2,3)4/h;;(H2,1,2,3,4)/q2*+1;/p-2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Filter"> <dl:phraseText>filtered</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Concentrate"> <dl:phraseText>concentrated</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Purify"> <dl:phraseText>The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33)</dl:phraseText> <chemical> <molecule id="m9"> <name dictRef="nameDict:unknown">crude product</name> </molecule> <dl:entityType>definiteReference</dl:entityType> </chemical> <chemical> <molecule id="m10"> <name dictRef="nameDict:unknown">SiO2</name> </molecule> <dl:entityType>falsePositive</dl:entityType> </chemical> <chemical> <molecule id="m11"> <name dictRef="nameDict:unknown">EtOAc-hexanes</name> </molecule> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Yield"> <dl:phraseText>to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid</dl:phraseText> <dl:chemical ref="m0"/> </dl:reactionAction> </dl:reactionActionList> </reaction></reactionList>
Reaction 1: procedure steps
Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid (aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid.
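The 79% yield quoted in the procedure can be checked from the molar amounts given in the text. The following is a minimal Python sketch, assuming 1:1 stoichiometry between 9 and 10 and that 9 is the limiting reagent; the function name `percent_yield` is illustrative and does not come from the slides.

```python
# Molar amounts quoted in the procedure text, in mmol.
product_mmol = 1.02    # compound 10 (isolated product)
limiting_mmol = 1.29   # compound 9 (assumed limiting reagent, 1:1 stoichiometry)


def percent_yield(product, limiting):
    """Return the percent yield for a 1:1 reaction from two molar amounts."""
    return 100.0 * product / limiting


calculated = percent_yield(product_mmol, limiting_mmol)
print(f"{calculated:.1f}%")  # prints 79.1%
```

Under these assumptions the result agrees with the CALCULATEDPERCENTYIELD value of 79.1 recorded in the CML output, next to the 79% reported by the authors.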
Text mining breaks down the procedure summary into steps (dl:reactionActionList / dl:reactionAction / dl:phraseText):
• action="Add": Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C
• action="Stir": The reaction mixture was stirred at −78 °C for another 2 h
• action="Heat": warmed up to rt
• action="Quench": quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL)
• action="Concentrate": concentrated
• action="Add": The residue was added with water (10 mL)
• action="Extract": extracted with dichloromethane (12 mL × 3)
• action="Dry": dried over Na2SO4
• action="Filter": filtered
• action="Concentrate": concentrated
• action="Purify": The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33)
• action="Yield": to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid
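The step list above can be read directly out of the CML output. The sketch below is one possible way to do this with Python's standard library, not the tooling used for the slides. It embeds a shortened excerpt of the reaction (three of the twelve actions) so that it is self-contained; the `dl` namespace URI, element names, and attribute names are taken from the CML shown earlier, while `extract_actions` and `CML_EXCERPT` are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "dl" prefix in the CML output.
DL_NS = "http://bitbucket.org/dan2097"
NS = {"dl": DL_NS}

# Shortened excerpt of the reaction CML (three actions only).
CML_EXCERPT = """<reactionList xmlns="http://www.xml-cml.org/schema"
    xmlns:dl="http://bitbucket.org/dan2097">
  <reaction>
    <dl:reactionActionList>
      <dl:reactionAction action="Stir">
        <dl:phraseText>The reaction mixture was stirred at −78 °C for another 2 h</dl:phraseText>
        <dl:parameter propertyType="Time" normalizedValue="7200">2 h</dl:parameter>
        <dl:parameter propertyType="Temperature" normalizedValue="-78">-78 °C</dl:parameter>
      </dl:reactionAction>
      <dl:reactionAction action="Heat">
        <dl:phraseText>warmed up to rt</dl:phraseText>
        <dl:parameter propertyType="Temperature" normalizedValue="room temperature">rt</dl:parameter>
      </dl:reactionAction>
      <dl:reactionAction action="Filter">
        <dl:phraseText>filtered</dl:phraseText>
      </dl:reactionAction>
    </dl:reactionActionList>
  </reaction>
</reactionList>"""


def extract_actions(cml_text):
    """Return (action, phrase text, normalized parameters) for each step."""
    root = ET.fromstring(cml_text)
    steps = []
    # iter() walks the whole tree, so nesting depth does not matter.
    for action in root.iter(f"{{{DL_NS}}}reactionAction"):
        phrase = action.findtext("dl:phraseText", default="", namespaces=NS)
        params = {
            p.get("propertyType"): p.get("normalizedValue")
            for p in action.findall("dl:parameter", NS)
        }
        steps.append((action.get("action"), phrase, params))
    return steps


steps = extract_actions(CML_EXCERPT)
for name, phrase, params in steps:
    print(f"{name}: {phrase} {params}")
```

The normalized parameter values are kept as strings because not all of them are numeric: the "Heat" step records its temperature as "room temperature".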
http://www.wired.com/2014/04/google-project-ara/
http://www.wsj.com/articles/googles-modular-phones-to-go-on-sale-next-year-1463783371
The World we are heading into
http://www.gartner.com/newsroom/id/3143521
Our World is hyperconnected
Standards?
Data quality issues
Robochemistry
Proliferation of errors in public and private databases
Automated quality control system
CVSP
CVSP – submission details
CVSP – issues review
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
DB06287
CVSP - mapping
CVSP – rules
Dimensions and complexity of science
D2I2K2W
@Open_PHACTS
Open PHACTS Practical Semantics
OpenPHACTS
GlaxoSmithKline – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
Esteve
Almirall
OpenLink
Scibite
The Open PHACTS Foundation
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
Pfizer
Why is it so hard to….
Competitors?
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
IP?
@gray_alasdair Big Data Integration
Knowledge is federated
Publishing – then…
…and now?
http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf
Data Market
Publishers - the guardians of knowledge
This is a poster for Guardians of the Galaxy. The poster art copyright is believed to belong to the distributor of the Film, Walt Disney Studios Motion Pictures, the publisher, Marvel Studios, or the graphic artist.
Data Publishing
Original artist: Joseph Ferdinand Keppler (1838-1894) Restoration: Adam Cuerden - http://www.loc.gov/pictures/item/2011661385/ by way of http://adamcuerden.deviantart.com/gallery/#/d5onmxh
The World we live in
Moore’s Law
"Internet host count history". Internet Systems Consortium. Retrieved May 16, 2012.
We are on the verge of a new technical revolution, and it feels great to anticipate it and be ready to ride!
Image from surfline.com by Mike Cianciulli
Data Science @ RSC
The team. From left to right: Valery Tkachenko and Alexey Pshenichnov, based in the United States; Aileen Day, based in Southampton; John Boyle, Peter Corbett, Colin Batchelor, Jeff White, Nicholas Bailey and Val the plant, based at TGH