Post on 09-Feb-2017
Chem4Word: Semantic Chemical Authoring within Microsoft Word
Alex D. Wade, Director, Scholarly Communications, Microsoft Research Connections
Tony Hey, Corporate VP, Microsoft Research Connections
GEPS 2011
http://research.microsoft.com/connections/
Imagine…
• Live research reports that had multiple end-user ‘views’ and which could dynamically tailor their presentation to each user
• An authoring environment that absorbs and encapsulates research workflows and outputs from the lab experiments
• A report that can be dropped into an electronic lab workbench in order to reconstitute an entire experiment
• A researcher working with multiple reports on a Surface and having the ability to mash up data and workflows across experiments
• The ability to apply new analyses and visualizations and to perform new in silico experiments
Envisioning a New Era of Research Reporting
Dynamic Documents
Reputation & Influence
Reproducible Research
Interactive Data
Collaboration
Words & Pictures
• Papers/reports today describe chemical reactions/entities in a variety of ways:
  – common (or brand-name) labels
  – identifiers and shorthand notations
  – chemical formulae
  – two- (and three-) dimensional graphical images of molecular structure.
• Describing chemical data becomes an exercise in typesetting and/or graphics, and cross- and re-referencing existing chemical entities is labor intensive.
  – The resulting text is usually interpretable by humans, but chemical data are lost in the process, making it difficult to programmatically extract meaningful information from such reports.
• The goals of Chem4Word are to:
  – simplify the task of authoring a chemical document,
  – do so in a way that produces a semantically meaningful document, facilitating downstream tasks such as publishers’ workflows, entity extraction, and semantic applications.
Chemistry Add-in for Word, aka Chem4Word
• Chem4Word allows chemists to create, edit and manipulate chemistry in the Word environment, by
  – Providing a built-in dictionary of chemical structures
  – Enabling online lookup of further structures via web services (e.g. PubChem)
  – Facilitating linking/embedding chemical structures inside a Word document
  – Allowing modification of chemical structures & representations of those structures
• Authoring is backed by semantic data in Chemical Markup Language (CML), enabling:
  – novel functionality in data checking during the authoring process
  – chemistry-centric article reading support
  – data-mining applications.
• Open source project (Outercurve Foundation); Apache 2.0 license
• ~500K downloads to date
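The "data checking during the authoring process" point can be illustrated with a small sketch. This is not Chem4Word's own code: the sample molecule, the function name `find_dangling_bonds`, and the specific check are assumptions made for illustration. It relies only on the public CML namespace and the `atomArray`/`bondArray` layout, and flags bonds that refer to atoms the molecule never declares.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hypothetical sample: three heavy atoms and one deliberately broken bond (b3).
SAMPLE_CML = f"""<molecule xmlns="{CML_NS}" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="1"/>
    <bond id="b2" atomRefs2="a2 a3" order="1"/>
    <bond id="b3" atomRefs2="a3 a9" order="1"/>
  </bondArray>
</molecule>"""


def find_dangling_bonds(cml_text):
    """Return ids of bonds that reference an atom id not declared in the molecule."""
    root = ET.fromstring(cml_text)
    ns = {"cml": CML_NS}
    atom_ids = {atom.get("id") for atom in root.iterfind(".//cml:atom", ns)}
    dangling = []
    for bond in root.iterfind(".//cml:bond", ns):
        refs = bond.get("atomRefs2", "").split()
        # A bond needs exactly two references, both pointing at declared atoms.
        if len(refs) != 2 or any(ref not in atom_ids for ref in refs):
            dangling.append(bond.get("id"))
    return dangling


print(find_dangling_bonds(SAMPLE_CML))
```

A check like this is only possible because the structure is stored as data rather than as a picture; a real validator would also look at valence, charges, and so on.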
Word UI Extensibility
• Ribbon
• Task Pane
• Gallery
• Templates
• Recognizers
• Applications
FILE FORMATS: OFFICE OPEN XML DOCUMENTS
Thanks to: http://www.slideshare.net/HollowKnight/a-quick-tour-of-open-xml-format
Binary format
Office Open XML format
THEY LOOK IDENTICAL, BUT …
Binary format
Office Open XML format
Office Open XML is a ZIP file …
That contains XML parts
Images stored in native format
(JPEG, PNG, GIF, …)
Programmer View of Open XML Files
• ZIP Archive
• Document Parts
  – XML Parts
  – Binary Parts
  – Typed (RFC 2616)
• Relationships
  – Connections between parts
• Content Type Stream
  – A specially-named stream
  – Defines mappings from part names to content types
  – Not itself a part, not URI addressable
• Folder structure for convenience only
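The programmer's view above can be sketched with Python's standard `zipfile` module. The example builds a tiny package in memory (a hypothetical stand-in, not a valid Word document) and resolves each part's content type from the `[Content_Types].xml` stream, using `Override` entries first and `Default` extension entries otherwise. The function names are illustrative assumptions, and the lookup is simplified: it compares part names case-sensitively, which the packaging rules do not.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
RELS_TYPE = "application/vnd.openxmlformats-package.relationships+xml"
MAIN_DOC_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)


def build_demo_package():
    """Build a tiny, hypothetical package in memory (not a valid Word file)."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        f'<Default Extension="rels" ContentType="{RELS_TYPE}"/>'
        '<Default Extension="png" ContentType="image/png"/>'
        f'<Override PartName="/word/document.xml" ContentType="{MAIN_DOC_TYPE}"/>'
        "</Types>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", content_types)
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/media/image1.png", b"\x89PNG")
    return buffer.getvalue()


def part_content_types(package_bytes):
    """Map each part name to its content type via the content type stream."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        types = ET.fromstring(package.read("[Content_Types].xml"))
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in types.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName"): o.get("ContentType")
            for o in types.findall(f"{{{CT_NS}}}Override")
        }
        result = {}
        for name in package.namelist():
            if name == "[Content_Types].xml":
                continue  # the content type stream is not itself a part
            part_name = "/" + name
            extension = name.rpartition(".")[2].lower() if "." in name else ""
            result[part_name] = overrides.get(part_name, defaults.get(extension))
        return result


for part, content_type in part_content_types(build_demo_package()).items():
    print(part, "->", content_type)
```

Pointing `part_content_types` at the bytes of a real .docx file should list its parts in the same way, since the folder names inside the ZIP carry no meaning of their own.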
Multiple ‘views’ backed by a single CML data file
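As a rough illustration of several views over one data file, the sketch below derives two different text renderings, a name and a molecular formula, from the same CML fragment. The sample molecule and the function names `name_view` and `formula_view` are assumptions for illustration, not Chem4Word's API; the formula is written in Hill order (carbon, then hydrogen, then the remaining elements alphabetically).

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"
NS = {"cml": CML_NS}

# Hypothetical sample: methanol with explicit hydrogen atoms.
SAMPLE_CML = f"""<molecule xmlns="{CML_NS}" id="m1">
  <name>methanol</name>
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/>
  </atomArray>
</molecule>"""


def name_view(molecule):
    """View 1: the first name recorded for the molecule, or an empty string."""
    return molecule.findtext("cml:name", default="", namespaces=NS)


def formula_view(molecule):
    """View 2: a molecular formula in Hill order, computed from the atom list."""
    counts = Counter(
        atom.get("elementType") for atom in molecule.iterfind(".//cml:atom", NS)
    )
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


molecule = ET.fromstring(SAMPLE_CML)
print(name_view(molecule))
print(formula_view(molecule))
```

Both strings come from the same underlying data, so editing the CML once would update every view that is rendered from it.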
EXAMPLE OF GETTING CML DATA BACK OUT OF A DOCUMENT
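One way to get the chemistry back out can be sketched as follows, assuming the CML is stored as an XML part inside the Open XML package. Rather than guessing a part name, the sketch scans every `.xml` entry in the ZIP and keeps those whose root element is in the CML namespace. The demo package and the function names are hypothetical, and this is not the add-in's own extraction code.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def build_demo_docx():
    """Build a hypothetical, minimal package containing one CML part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<document/>")
        package.writestr(
            "customXml/item1.xml",
            f'<cml xmlns="{CML_NS}"><molecule id="m1"/></cml>',
        )
    return buffer.getvalue()


def extract_cml_parts(docx_bytes):
    """Return {entry name: raw bytes} for XML parts rooted in the CML namespace."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            if not name.lower().endswith(".xml"):
                continue
            data = package.read(name)
            try:
                root = ET.fromstring(data)
            except ET.ParseError:
                continue  # skip entries that are not well-formed XML
            if root.tag.startswith("{" + CML_NS + "}"):
                found[name] = data
    return found


for name, data in extract_cml_parts(build_demo_docx()).items():
    print(name, len(data), "bytes")
```

For a real document, the same function could be called with the contents of the .docx file read in binary mode; no copy of Word is needed to recover the data.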
Current publishing… is broken for data-rich science
With Chem4Word… the cycle is closed
Data publication difficult and unsupported
Insufficient data to fully support research
Data preparation integrated into user workflow
Open Standards promote Open Semantic Science
To conclude…
Important Details
• Project Site
  – http://research.microsoft.com/chem4word
• Binaries and source code
  – http://chem4word.codeplex.com
• Facebook Page
  – http://www.facebook.com/groups/186300551397797/
• Outercurve Foundation
  – http://www.outercurve.org
Contributors
University of Cambridge
• Peter Murray-Rust
• Jim Downing
• Joe Townsend
Microsoft Research
• Alex D. Wade
• Savas Parastatidis
• Oscar Naim
• Pablo Fernicola
• Murray Sargent
• Geraldine Wade
• Tola Chhoeun
• Anthony Hanses
• Jim McGill