Evolution of open chemical information

Evolution of open chemical information Valery Tkachenko Royal Society of Chemistry ACS Fall 2016 Philadelphia, PA


Page 1: Evolution of open chemical information

Evolution of open chemical information

Valery Tkachenko, Royal Society of Chemistry

ACS Fall 2016, Philadelphia, PA

Page 2: Evolution of open chemical information

The Short History of Time

Image credit: Rhys Taylor, Cardiff University

~1992

Page 3: Evolution of open chemical information
Page 4: Evolution of open chemical information

Chemical database

Page 5: Evolution of open chemical information
Page 6: Evolution of open chemical information

PubChem

Page 7: Evolution of open chemical information

• 57 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching

Page 8: Evolution of open chemical information

ChemSpider

Page 9: Evolution of open chemical information

ChemSpider

Page 10: Evolution of open chemical information

ChemSpider real-time curation

Page 11: Evolution of open chemical information

Article X-ray: Compounds, Reaction, Analytical Data, Text and References

Page 12: Evolution of open chemical information

Reaction 1: NextMove reaction text-mined from RSC archive – original article

Page 13: Evolution of open chemical information

Reaction 1: NextMove reaction text-mined from RSC archive – CML output

<?xml version="1.0" encoding="UTF-8"?><reactionList xmlns="http://www.xml-cml.org/schema" xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/" xmlns:nameDict="http://www.xml-cml.org/dictionary/cml/name/" xmlns:unit="http://www.xml-cml.org/unit/" xmlns:cml="http://www.xml-cml.org/schema" xmlns:dl="http://bitbucket.org/dan2097"><reaction> <dl:source> <dl:documentId>c3ra45871g</dl:documentId> <dl:paragraphText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid. [α]D20 −24.2 (c 1.1, CHCl3); 1H NMR (CDCl3, 300 MHz) δ 0.04 (s, 3H), 0.07 (s, 3H), 0.85 (s, 9H), 1.34 (s, 3H), 1.44 (s, 3H), 2.16 (br, 1H), 3.68–3.81 (m, 3H), 4.16 (t, J = 13.8 Hz, J = 13.8 Hz, 1H), 4.59 (t, J = 6.6 Hz, J = 6.6 Hz, 1H), 5.22 (d, J = 10.7 Hz, 1H), 5.34 (d, J = 17.1 Hz, 1H), 5.90 (ddd, J = 7.2 Hz, J = 10.2 Hz, J = 17.2 Hz, 1H); 13C NMR (CDCl3, 75 MHz) δ 134.1, 118.4, 108.5, 79.5, 78.8, 70.8, 65.0, 27.8, 25.9, 25.4, 18.1, −3.7, −4.4. 
HRMS (ESI) calcd for [M + Na]+ (C15H30O4SiNa) 325.1811, found 325.1807.</dl:paragraphText> </dl:source> <dl:reactionSmiles>[H-].C([Al+]CC(C)C)C(C)C.C([O:17][CH2:18][C@@H:19]([O:29][Si:30]([C:33]([CH3:36])([CH3:35])[CH3:34])([CH3:32])[CH3:31])[C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)(=O)C(C)(C)C&gt;ClCCl&gt;[C:33]([Si:30]([CH3:32])([CH3:31])[O:29][C@@H:19]([C@@H:20]1[C@H:24]([CH:25]=[CH2:26])[O:23][C:22]([CH3:28])([CH3:27])[O:21]1)[CH2:18][OH:17])([CH3:36])([CH3:35])[CH3:34] |f:0.1|</dl:reactionSmiles> <productList> <product role="product"> <molecule id="m0"> <name dictRef="nameDict:unknown">10</name> <dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethanol</dl:nameResolved> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00102">1.02 mmol</amount> <amount dl:propertyType="MASS" dl:normalizedValue="0.308">308 mg</amount> <amount dl:propertyType="PERCENTYIELD" dl:normalizedValue="79">79%</amount> <amount dl:propertyType="CALCULATEDPERCENTYIELD" dl:normalizedValue="79.1" units="unit:percentYield">79.1</amount> <identifier dictRef="cml:smiles" value="C(C)(C)(C)[Si](O[C@H](CO)[C@H]1OC(O[C@H]1C=C)(C)C)(C)C"/> <identifier dictRef="cml:inchi" value="InChI=1S/C15H30O4Si/c1-9-11-13(18-15(5,6)17-11)12(10-16)19-20(7,8)14(2,3)4/h9,11-13,16H,1,10H2,2-8H3/t11-,12+,13-/m0/s1"/> <dl:entityType>definiteReference</dl:entityType> <dl:appearance>colourless</dl:appearance> <dl:state>liquid</dl:state> </product> </productList> <reactantList> <reactant role="reactant"> <molecule id="m1"> <name dictRef="nameDict:unknown">Diisobutylaluminium hydride</name> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00323">3.23 mmol</amount> <amount dl:propertyType="MOLARITY" dl:normalizedValue="1.1">1.1 M</amount> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.00293">2.93 mL</amount> <identifier dictRef="cml:smiles" value="[H-].C(C(C)C)[Al+]CC(C)C"/> <identifier 
dictRef="cml:inchi" value="InChI=1S/2C4H9.Al.H/c2*1-4(2)3;;/h2*4H,1H2,2-3H3;;/q;;+1;-1"/> <dl:entityType>exact</dl:entityType> </reactant> <reactant role="reactant" count="1"> <molecule id="m2"> <name dictRef="nameDict:unknown">9</name> <dl:nameResolved>(R)-2-((tert-Butyldimethylsilyl)oxy)-2-((4S,5S)-2,2-dimethyl-5-vinyl-1,3-dioxolan-4-yl)ethyl pivalate</dl:nameResolved> </molecule> <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00129">1.29 mmol</amount> <amount dl:propertyType="MASS" dl:normalizedValue="0.500">500 mg</amount> <identifier dictRef="cml:smiles" value="C(C(C)(C)C)(=O)OC[C@H]([C@H]1OC(O[C@H]1C=C)(C)C)O[Si](C)(C)C(C)(C)C"/> <identifier dictRef="cml:inchi" value="InChI=1S/C20H38O5Si/c1-12-14-16(24-20(8,9)23-14)15(13-22-17(21)18(2,3)4)25-26(10,11)19(5,6)7/h12,14-16H,1,13H2,2-11H3/t14-,15+,16-/m0/s1"/> <dl:entityType>definiteReference</dl:entityType> </reactant> </reactantList> <spectatorList> <spectator role="solvent"> <molecule id="m3"> <name dictRef="nameDict:unknown">dichloromethane</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.020">20 mL</amount> <identifier dictRef="cml:smiles" value="ClCCl"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH2Cl2/c2-1-3/h1H2"/> <dl:entityType>exact</dl:entityType> </spectator> </spectatorList> <dl:reactionActionList> <dl:reactionAction action="Add"> <dl:phraseText>Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C</dl:phraseText> <dl:chemical ref="m1"/> <dl:chemical ref="m2"/> <dl:chemical ref="m3"/> <dl:parameter propertyType="Temperature" normalizedValue="-78">-78 °C.</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Stir"> <dl:phraseText>The reaction mixture was stirred at −78 °C for another 2 h</dl:phraseText> <dl:parameter propertyType="Time" normalizedValue="7200">2 h</dl:parameter> <dl:parameter propertyType="Temperature" 
normalizedValue="-78">-78 °C</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Heat"> <dl:phraseText>warmed up to rt</dl:phraseText> <dl:parameter propertyType="Temperature" normalizedValue="room temperature">rt</dl:parameter> </dl:reactionAction> <dl:reactionAction action="Quench"> <dl:phraseText>quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL)</dl:phraseText> <chemical> <molecule id="m4"> <name dictRef="nameDict:unknown">methanol</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.003">3 mL</amount> <identifier dictRef="cml:smiles" value="CO"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH4O/c1-2/h2H,1H3"/> <dl:entityType>exact</dl:entityType> </chemical> <chemical> <molecule id="m5"> <name dictRef="nameDict:unknown">citric acid</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.005">5 mL</amount> <identifier dictRef="cml:smiles" value="C(CC(O)(C(=O)O)CC(=O)O)(=O)O"/> <identifier dictRef="cml:inchi" value="InChI=1S/C6H8O7/c7-3(8)1-6(13,5(11)12)2-4(9)10/h13H,1-2H2,(H,7,8)(H,9,10)(H,11,12)"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Concentrate"> <dl:phraseText>concentrated</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Add"> <dl:phraseText>The residue was added with water (10 mL)</dl:phraseText> <chemical> <molecule id="m6"> <name dictRef="nameDict:unknown">water</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.010">10 mL</amount> <identifier dictRef="cml:smiles" value="O"/> <identifier dictRef="cml:inchi" value="InChI=1S/H2O/h1H2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Extract"> <dl:phraseText>extracted with dichloromethane (12 mL × 3)</dl:phraseText> <chemical> <molecule id="m7"> <name dictRef="nameDict:unknown">dichloromethane</name> </molecule> <amount dl:propertyType="VOLUME" dl:normalizedValue="0.012">12 mL</amount> 
<identifier dictRef="cml:smiles" value="ClCCl"/> <identifier dictRef="cml:inchi" value="InChI=1S/CH2Cl2/c2-1-3/h1H2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Dry"> <dl:phraseText>dried over Na2SO4</dl:phraseText> <chemical> <molecule id="m8"> <name dictRef="nameDict:unknown">Na2SO4</name> </molecule> <identifier dictRef="cml:smiles" value="[Na+].[Na+].[O-]S(=O)(=O)[O-]"/> <identifier dictRef="cml:inchi" value="InChI=1S/2Na.H2O4S/c;;1-5(2,3)4/h;;(H2,1,2,3,4)/q2*+1;/p-2"/> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Filter"> <dl:phraseText>filtered</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Concentrate"> <dl:phraseText>concentrated</dl:phraseText> </dl:reactionAction> <dl:reactionAction action="Purify"> <dl:phraseText>The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33)</dl:phraseText> <chemical> <molecule id="m9"> <name dictRef="nameDict:unknown">crude product</name> </molecule> <dl:entityType>definiteReference</dl:entityType> </chemical> <chemical> <molecule id="m10"> <name dictRef="nameDict:unknown">SiO2</name> </molecule> <dl:entityType>falsePositive</dl:entityType> </chemical> <chemical> <molecule id="m11"> <name dictRef="nameDict:unknown">EtOAc-hexanes</name> </molecule> <dl:entityType>exact</dl:entityType> </chemical> </dl:reactionAction> <dl:reactionAction action="Yield"> <dl:phraseText>to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid</dl:phraseText> <dl:chemical ref="m0"/> </dl:reactionAction> </dl:reactionActionList> </reaction></reactionList>
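The CALCULATEDPERCENTYIELD value of 79.1 in the CML above is consistent with dividing the product amount by the amount of starting material 9. The following is a minimal sketch of that check, not the text-mining tool's own code: the XML is a trimmed excerpt of the slide's output, the helper names `moles` and `calculated_yield` are hypothetical, and it assumes 1:1 stoichiometry with the smallest reactant amount as the limiting reagent.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the CML shown above: only the amounts needed for the check.
CML = """<reactionList xmlns="http://www.xml-cml.org/schema"
              xmlns:dl="http://bitbucket.org/dan2097">
<reaction>
 <productList><product role="product">
  <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00102">1.02 mmol</amount>
  <amount dl:propertyType="PERCENTYIELD" dl:normalizedValue="79">79%</amount>
 </product></productList>
 <reactantList>
  <reactant role="reactant">
   <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00323">3.23 mmol</amount>
  </reactant>
  <reactant role="reactant" count="1">
   <amount dl:propertyType="AMOUNT" dl:normalizedValue="0.00129">1.29 mmol</amount>
  </reactant>
 </reactantList>
</reaction></reactionList>"""

NS = {"cml": "http://www.xml-cml.org/schema", "dl": "http://bitbucket.org/dan2097"}
DL = "{http://bitbucket.org/dan2097}"  # ElementTree spells prefixed attributes this way


def moles(element):
    """Return the normalized AMOUNT (in mol) of a product/reactant, or None."""
    for amount in element.findall("cml:amount", NS):
        if amount.get(DL + "propertyType") == "AMOUNT":
            return float(amount.get(DL + "normalizedValue"))
    return None


def calculated_yield(cml_text):
    """Percent yield relative to the smallest reactant amount (assumed limiting, 1:1)."""
    root = ET.fromstring(cml_text)
    product = root.find(".//cml:product", NS)
    reactant_moles = [moles(r) for r in root.findall(".//cml:reactant", NS)]
    limiting = min(m for m in reactant_moles if m is not None)
    return round(100 * moles(product) / limiting, 1)


print(calculated_yield(CML))
```

Comparing the recomputed value with the reported PERCENTYIELD (79%) is one simple way to flag extraction errors in text-mined reactions.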

Page 14: Evolution of open chemical information

Reaction 1: procedure steps

Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C. The reaction mixture was stirred at −78 °C for another 2 h, warmed up to rt, quenched with methanol (3 mL) and citric acid (aq) (w/w, 10%, 5 mL), concentrated. The residue was added with water (10 mL) and extracted with dichloromethane (12 mL × 3). The organic layers were combined, dried over Na2SO4, filtered and concentrated. The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33) to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid.

Text mining breaks the procedure summary down into steps, one dl:phraseText per dl:reactionAction in the dl:reactionActionList:

• action="Add": Diisobutylaluminium hydride (1.1 M in cyclohexane, 2.93 mL, 3.23 mmol) was added dropwise to the solution of 9 (500 mg, 1.29 mmol) and dichloromethane (20 mL) at −78 °C
• action="Stir": The reaction mixture was stirred at −78 °C for another 2 h
• action="Heat": warmed up to rt
• action="Quench": quenched with methanol (3 mL) and citric acid(aq) (w/w, 10%, 5 mL)
• action="Concentrate": concentrated
• action="Add": The residue was added with water (10 mL)
• action="Extract": extracted with dichloromethane (12 mL × 3)
• action="Dry": dried over Na2SO4
• action="Filter": filtered
• action="Concentrate": concentrated
• action="Purify": The crude product was further purified by column chromatography (SiO2, EtOAc–hexanes, 1 : 7; Rf 0.33)
• action="Yield": to give 10 (308 mg, 1.02 mmol, 79%) as a colourless liquid
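A step list like the one above can be read back out of the CML with a standard XML parser. The sketch below is illustrative only: it uses Python's `xml.etree.ElementTree` on a trimmed excerpt of the reaction's `dl:reactionActionList`, and the function name `procedure_steps` is a hypothetical helper, not part of any NextMove or RSC tool.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the dl:reactionActionList from the CML output (three of the steps).
CML = """<reactionList xmlns="http://www.xml-cml.org/schema"
              xmlns:dl="http://bitbucket.org/dan2097"><reaction>
<dl:reactionActionList>
 <dl:reactionAction action="Stir">
  <dl:phraseText>The reaction mixture was stirred at −78 °C for another 2 h</dl:phraseText>
  <dl:parameter propertyType="Time" normalizedValue="7200">2 h</dl:parameter>
  <dl:parameter propertyType="Temperature" normalizedValue="-78">-78 °C</dl:parameter>
 </dl:reactionAction>
 <dl:reactionAction action="Heat">
  <dl:phraseText>warmed up to rt</dl:phraseText>
  <dl:parameter propertyType="Temperature" normalizedValue="room temperature">rt</dl:parameter>
 </dl:reactionAction>
 <dl:reactionAction action="Filter">
  <dl:phraseText>filtered</dl:phraseText>
 </dl:reactionAction>
</dl:reactionActionList>
</reaction></reactionList>"""

NS = {"dl": "http://bitbucket.org/dan2097"}


def procedure_steps(cml_text):
    """Return (action, phrase text, {propertyType: normalizedValue}) for each step, in order."""
    root = ET.fromstring(cml_text)
    steps = []
    for action in root.iterfind(".//dl:reactionAction", NS):
        # The action/propertyType/normalizedValue attributes are unprefixed in the source XML.
        params = {
            p.get("propertyType"): p.get("normalizedValue")
            for p in action.findall("dl:parameter", NS)
        }
        phrase = action.findtext("dl:phraseText", default="", namespaces=NS)
        steps.append((action.get("action"), phrase, params))
    return steps


for name, phrase, params in procedure_steps(CML):
    print(f'action="{name}": {phrase} {params}')
```

Keeping the normalized parameters next to each phrase (for example 7200 for the 2 h stir time, which the source normalizes to seconds) makes the steps usable for search or automation as well as for display.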

Page 16: Evolution of open chemical information

The World we are heading into

http://www.gartner.com/newsroom/id/3143521

Page 17: Evolution of open chemical information

Our World is hyperconnected

Page 18: Evolution of open chemical information

Standards?

Page 19: Evolution of open chemical information

Data quality issues

Robochemistry

Proliferation of errors in public and private databases

Automated quality control system

Page 20: Evolution of open chemical information

CVSP

Page 21: Evolution of open chemical information

CVSP – submission details

Page 22: Evolution of open chemical information

CVSP – issues review

Page 23: Evolution of open chemical information

J. Brecher, IUPAC: Graphical Representation of stereochem. configurations, Section: ST-1.1.10

DB06287

Page 24: Evolution of open chemical information

CVSP - mapping

Page 25: Evolution of open chemical information

CVSP – rules

Page 27: Evolution of open chemical information

D2I2K2W

Page 28: Evolution of open chemical information

[email protected] @Open_PHACTS

Open PHACTS: Practical Semantics

GlaxoSmithKline – Coordinator; Universität Wien – Managing entity; Technical University of Denmark; University of Hamburg, Center for Bioinformatics; BioSolveIT GmbH; Consorci Mar Parc de Salut de Barcelona; Leiden University Medical Centre; Royal Society of Chemistry; Vrije Universiteit Amsterdam; Novartis; Merck Serono; H. Lundbeck A/S; Eli Lilly; Netherlands Bioinformatics Centre; Swiss Institute of Bioinformatics; ConnectedDiscovery; EMBL-European Bioinformatics Institute; Janssen; Esteve; Almirall; OpenLink; Scibite; The Open PHACTS Foundation; Spanish National Cancer Research Centre; University of Manchester; Maastricht University; Aqnowledge; University of Santiago de Compostela; Rheinische Friedrich-Wilhelms-Universität Bonn; AstraZeneca; Pfizer

Page 29: Evolution of open chemical information

Why is it so hard to….

Competitors?

What’s the structure?

Are they in our file?

What’s similar?

What’s the target?

Pharmacology data?

Known Pathways?

Working On Now?

Connections to disease?

Expressed in right cell type?

IP?

Page 30: Evolution of open chemical information

@gray_alasdair Big Data Integration

Knowledge is federated

Page 31: Evolution of open chemical information

Publishing – then…

Page 32: Evolution of open chemical information

…and now?

Page 33: Evolution of open chemical information

http://ec.europa.eu/research/press/2016/pdf/opendata-infographic_072016.pdf

Page 34: Evolution of open chemical information

Data Market

Page 35: Evolution of open chemical information

Publishers - the guardians of knowledge

This is a poster for Guardians of the Galaxy. The poster art copyright is believed to belong to the distributor of the Film, Walt Disney Studios Motion Pictures, the publisher, Marvel Studios, or the graphic artist.

Page 36: Evolution of open chemical information

Data Publishing

Original artist: Joseph Ferdinand Keppler (1838-1894). Restoration: Adam Cuerden - http://www.loc.gov/pictures/item/2011661385/ by way of http://adamcuerden.deviantart.com/gallery/#/d5onmxh

Page 37: Evolution of open chemical information

The World we live in

Page 38: Evolution of open chemical information
Page 39: Evolution of open chemical information

Moore’s Law

Page 40: Evolution of open chemical information

"Internet host count history". Internet Systems Consortium. Retrieved May 16, 2012.

Page 41: Evolution of open chemical information

We are on the verge of a new technical revolution, and it feels great to anticipate it and be ready to ride!

Image from surfline.com by Mike Cianciulli

Page 42: Evolution of open chemical information
Page 43: Evolution of open chemical information

Data Science @ RSC

The team. From left to right: Valery Tkachenko and Alexey Pshenichnov, based in the United States; Aileen Day, based in Southampton; John Boyle, Peter Corbett, Colin Batchelor, Jeff White, Nicholas Bailey and Val the plant, based at TGH

Page 44: Evolution of open chemical information

Thank you

Email: [email protected]

Slides: http://www.slideshare.net/valerytkachenko16