ContentMining and Clinical Trials
![Page 1: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/1.jpg)
Content-Mining for Clinical Trials
Peter Murray-Rust
contentmine.org
Cochrane UK, Oxford, 2015-03-16
• OPEN Platform for Machines+humans to automatically “read” the trials literature
• Grow communities and give everyone the tools and know-how to mine trials
![Page 2: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/2.jpg)
• 09:30 - Introductions
• 10:00 - Overview of ContentMine
• 10:30 - Discussion: why might content mining clinical trials be useful?
• 11:00 - Tea/coffee break
• 11:15 - Discussion: current tools and what is needed
• 12:00 - Discussion: imagining the clinical trials mining pipeline
• 12:30 - Lunch
• 13:30 - Demo and introduction to software
• 14:30 - Technical session 1 (hands-on content mining)
• 15:30 - Tea/coffee break
• 15:45 - Technical session 2 (hands-on content mining)
• 17:00 - Event close
![Page 3: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/3.jpg)
Background for Today
• ContentMine aims to make large areas of scientific fact OPEN (100 million facts/year)
• We’re working with Wellcome Trust, Europe PubMed Central, etc.
• A politically “hot” area (Hargreaves legislation, EU activity)
• A week ago: Wellcome Trust workshop on TDM and Neuroscience; “rough consensus” on what was needed.
• In the last few days we’ve prototyped what we think is a good starting point…
• NOTE: The software is very “bleeding edge”! Please treat in a spirit of adventure!!
• Vision/enthusiasm from Amy Price, Anna Noel-Storr, Emily Sena (E’burgh) and yourselves!
![Page 4: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/4.jpg)
Questions we could tackle
• How do we find (mentions of) clinical trials?
• Is a document a (clinical) trial?
• What is the subject of the trial?
• What is the methodology used?
• Do the design and practice conform to CONSORT?
• What are the outcomes?
• Can we extract specific re-usable information?
• Who is involved? (researchers, sponsors, patients?)
• Has a proposed trial been completed and reported?
![Page 5: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/5.jpg)
Afternoon session
• Work in groups; mixture of skills and experience
• Take different sections of CONSORT
• Scrape articles from trialsjournal.com
• Explore word frequency – create your own lists of frequent words
• Design regexes to extract CONSORT 8a->11
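As a starting point for the regex exercise, here is a minimal Python sketch. It assumes the CONSORT 2010 numbering (8a sequence generation, 8b type of randomisation, 9 allocation concealment, 11a blinding). The trigger phrases are illustrative guesses, not a validated vocabulary, and the names `CONSORT_PATTERNS` and `tag_sentence` are hypothetical, not part of AMI.

```python
import re

# Hypothetical phrase lists: extend them from your own word-frequency results.
CONSORT_PATTERNS = {
    "8a_sequence_generation": re.compile(
        r"\b(computer[- ]generated|random number table|"
        r"randomi[sz]ation (list|sequence))\b", re.I),
    "8b_randomisation_type": re.compile(
        r"\b(block(ed)? randomi[sz]ation|permuted blocks?|"
        r"stratif(ied|ication))\b", re.I),
    "9_allocation_concealment": re.compile(
        r"\b(sealed(,? opaque)? envelopes?|allocation conceal(ment|ed)|"
        r"central(i[sz]ed)? randomi[sz]ation)\b", re.I),
    "11a_blinding": re.compile(
        r"\b((single|double|triple)[- ]blind(ed)?|open[- ]label|"
        r"mask(ed|ing))\b", re.I),
}


def tag_sentence(sentence):
    """Return the sorted CONSORT item labels whose pattern matches the sentence."""
    return sorted(label for label, pattern in CONSORT_PATTERNS.items()
                  if pattern.search(sentence))
```

Each group can take one CONSORT item, run `tag_sentence` over the Methods section of a scraped article, and refine the phrase list whenever a sentence is missed or wrongly tagged.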
![Page 7: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/7.jpg)
https://en.wikipedia.org/wiki/Irrigation#mediaviewer/File:Pump-enabled_Riverside_Irrigation_in_Comilla,_Bangladesh,_25_April_2014.jpg CC BY-SA 3.0
Daily Stream of 100,000 Open Facts
Twitter?
Indexed by CAT
![Page 8: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/8.jpg)
![Page 9: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/9.jpg)
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
![Page 10: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/10.jpg)
What is “Content”?
![Page 11: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/11.jpg)
Machine-Human symbioses
• Wikipedia
• OpenStreetMap
We aim to make it trivial for a human+machine to mine the scientific literature, by building communities.
![Page 12: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/12.jpg)
ContentMine Workshops and Hackdays
Open Science Brazil, 2014-08
Easily distributed software
Get started in 30 mins
Build application in a morning
Start simple: bagOfWords, Stemming, Regex, templates
![Page 13: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/13.jpg)
Oxford 2013
Berlin 2014
Delhi 2014
Jenny Molloy with mascot AMI
![Page 14: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/14.jpg)
Workshops (1-hour -> full day or more)
2014-May->Nov
• Budapest/Shuttleworth
• Leicester Univ
• Electronic Theses and Dissertations
• Austrian Science Fund AT
• OKFest DE
• Eur. Bioinformatics Institute
• Open Science Rio de Janeiro BR
• SciDataCon, Delhi IN
• Univ of Chicago US
• OpenCon 2014, Wash DC, US
• JISC, London
Upcoming
• LIBER
• Cochrane
• BL
• Wellcome Trust (April)
• WHO
Collaborators
• Wikimedia/Wikidata
• Mozilla
• Open Knowledge
• LIBER (European Research Libraries)
• British Library
• Wellcome Trust
• EBI (Eur. Bioinf. Inst.)
• JISC
• Open Access Button
• SPARC
• Creative Commons
• CORE
• Europe PubMed Central
![Page 15: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/15.jpg)
contentmine.org Infrastructure
• CRAWL the web for scientific documents (articles, grey literature, repositories)
• quickSCRAPE pages (text, graphics, images, data)
• NORMA-lize page to semantic form
• MINE pages with your methods and tools (AMI)
• CAT-alogue results in searchable index
• Automate daily process (CANARY)
…Open semantic science …
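The stages above can be pictured as a chain of small functions. The Python sketch below is a toy stand-in, not the quickscrape, Norma, AMI or CAT code. Every function name is hypothetical. It skips the crawl step by taking the pages as an in-memory dictionary, and it uses ClinicalTrials.gov identifiers ("NCT" followed by eight digits) as an assumed example of a minable fact.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def scrape(raw_html):
    """SCRAPE stand-in: pull the visible text out of one page."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.chunks)


def normalize(text):
    """NORMA-lize stand-in: collapse runs of whitespace."""
    return " ".join(text.split())


def mine(text, pattern):
    """MINE stand-in: return every match of a regex."""
    return re.findall(pattern, text)


def catalogue(index, doc_id, facts):
    """CAT-alogue stand-in: map each fact to the documents mentioning it."""
    for fact in facts:
        index.setdefault(fact, set()).add(doc_id)
    return index


def run_pipeline(documents, pattern):
    """Run scrape -> normalize -> mine -> catalogue over {doc_id: html}."""
    index = {}
    for doc_id, raw_html in documents.items():
        catalogue(index, doc_id, mine(normalize(scrape(raw_html)), pattern))
    return index
```

Calling `run_pipeline(pages, r"NCT\d{8}")` on a dictionary of scraped pages gives an index from trial identifier to the set of documents that mention it, which is one simple way to approach "has a proposed trial been reported?".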
![Page 16: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/16.jpg)
[Diagram: CANARY pipeline]
Stages: Crawl, Feed, quickscrape, Norma, AMI, Index & Transform (CAT-alogue index)
Inputs: URL, DOI, scientific literature, repositories
Formats: XML, DOC, CSV, bad HTML, OCR, diagrams, sHTML
Scrapers: XPath, per-journal
Taggers: per-journal, metadata
Plugins: regex, sequences, species, chemistry, phylogenetics, farming, bespoke
Result: Open NORMA-lized Scientific Literature + Facts
![Page 17: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/17.jpg)
https://commons.wikimedia.org/wiki/File:Flickr_-_DVIDSHUB_-_RSP_Warrior_Challenge_Prepares_Soldiers_Mentally,_Physically_%281%29.jpg
CRAWLing the Literature
NO Central Table of Contents
Massive technical, political, legal opposition
Little interest from Academia
Tedious
Few general tools
![Page 18: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/18.jpg)
The Right to Read is The Right To Mine
PMR in 2012: http://blog.okfn.org/2012/06/01/the-right-to-read-is-the-right-to-mine/
![Page 19: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/19.jpg)
SCRAPE
https://en.wikipedia.org/wiki/Gleaning#mediaviewer/File:Millet_Gleaners.jpg PublicDomain
HTML
XML quickscrape*
*Scrapers created by Richard Smith-Unna + Community
HTML, PDF, XML, PNG, SVG, CSV, DOC, LaTeX, CIF…
Non-standard per-publisher site
![Page 20: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/20.jpg)
https://en.wikipedia.org/wiki/W._Heath_Robinson#mediaviewer/File:Robinson%28WH%29-%28%27Uncle_Lubin%27%29.jpg PublicDomain
NORMA-lization of Scientific Literature
PDFs, Broken HTML, PNGs for Math, etc.
NORMA
Unicode, Diacritics, Well-formed, Sectioned, Tagged, SVG diagrams
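To illustrate the "Unicode" and "Diacritics" targets only, here is a small sketch using Python's standard `unicodedata` module. It is not Norma's implementation. The helper names `to_nfc` and `strip_diacritics` are hypothetical, and stripping accents is assumed to be useful only for building search keys, never for the stored text.

```python
import unicodedata


def to_nfc(text):
    """Compose characters so that visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text):
    """Return an accent-free search key (decompose, then drop combining marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Two publishers may encode the same author name or species epithet differently (precomposed letter versus base letter plus combining accent). Normalizing first means a later regex or index lookup sees one form.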
![Page 21: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/21.jpg)
AMI-plugins
• BagOfWords, Stemming and Regular Expressions
• Species
• Biological Sequences
• Chemical compounds & reactions
• Farming * (Rory Aaronson)
• Crystallography * (Saulius Grazulis, COD)
• Clinical Trials * (Amy Price)
• Phylogenetics * (Ross Mounce)
• Phytochemistry * (Chris Steinbeck, PMR)
* subcommunities
![Page 22: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/22.jpg)
Text-based plugins
• Bag of words (https://en.wikipedia.org/wiki/Bag-of-words_model)
• Tf–idf: term frequency, inverse document frequency (https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
• Templates and regexes (regular expressions).
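The first two ideas fit in a few lines of Python. This is a minimal sketch, not the AMI plugin. It assumes a crude tokenizer (lowercase ASCII letters only), the plain tf × log(N/df) weighting, and hypothetical function names.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def tf_idf(docs):
    """Return one {word: score} dict per document.

    score = (count / words in document) * log(N / documents containing word)
    """
    bags = [bag_of_words(doc) for doc in docs]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts once per word
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({word: (count / total) * math.log(n_docs / doc_freq[word])
                       for word, count in bag.items()})
    return scores
```

Words that occur in every document score zero, so the highest-scoring words in each article are the ones that distinguish it from the rest of the collection.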
![Page 23: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/23.jpg)
“Bag of Words”
Three fulltext articles from trialsjournal.com
![Page 24: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/24.jpg)
Facts Marked by “non-scientists” in ContentMine workshops
With Wikipedia everyone can be a scientist
![Page 25: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/25.jpg)
“nuggets” in a scientific paper
quantity
units
Value ranges
Humans aren’t designed to mine this … chemical
project places
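A machine can pick out the quantity, unit and value-range nuggets with a regular expression. The sketch below is an assumption-laden example, not an AMI plugin. The unit list is deliberately tiny, and `QUANTITY` and `extract_quantities` are hypothetical names.

```python
import re

# A number, an optional "- number" / "to number" range, then a unit.
# The unit list is a tiny illustrative sample.
QUANTITY = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|\u2013|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?P<unit>mg|g|kg|mL|L|°C|%)(?![A-Za-z])"
)


def extract_quantities(text):
    """Return a list of {'low', 'high', 'unit'} dicts; 'high' is None for single values."""
    results = []
    for match in QUANTITY.finditer(text):
        high = match.group("high")
        results.append({
            "low": float(match.group("low")),
            "high": float(high) if high is not None else None,
            "unit": match.group("unit"),
        })
    return results
```

For example, `extract_quantities("Patients received 50-100 mg daily at 37.5 °C")` yields a 50 to 100 mg range and a single 37.5 °C value.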
![Page 26: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/26.jpg)
Advanced Plugins
![Page 27: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/27.jpg)
http://chemicaltagger.ch.cam.ac.uk/
• Typical chemical synthesis
![Page 28: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/28.jpg)
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have mined 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B EUR.
![Page 29: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/29.jpg)
![Page 30: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/30.jpg)
![Page 31: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/31.jpg)
UNITS
TICKS
QUANTITY
SCALE
TITLES
DATA!! 2000+ points
VECTOR PDF
![Page 32: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/32.jpg)
Dumb PDF
CSV
Semantic Spectrum
2nd Derivative
Smoothing Gaussian Filter
Automatic extraction
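Once the points of a spectrum have been recovered as numbers (the CSV step), smoothing and differentiation are short calculations. The pure-Python sketch below assumes evenly spaced x values, and it renormalizes the Gaussian kernel at the edges. Neither choice is taken from the ContentMine code, and all function names are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Weights of a discrete Gaussian over [-radius, radius], summing to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma=1.0, radius=3):
    """Gaussian-weighted moving average; edge points use the weights that fit."""
    kernel = gaussian_kernel(sigma, radius)
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        norm = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(values):
                acc += weight * values[j]
                norm += weight
        smoothed.append(acc / norm)
    return smoothed


def second_derivative(values, step=1.0):
    """Central-difference second derivative for the interior points."""
    return [(values[i - 1] - 2 * values[i] + values[i + 1]) / (step * step)
            for i in range(1, len(values) - 1)]
```

Smoothing before differentiating matters because differences amplify noise. Minima in the second derivative of the smoothed curve can then help locate peaks in the original spectrum.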
![Page 33: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/33.jpg)
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-frame animation showing how the 12 reactions lead into each other.
CLICK HERE FOR ANIMATION
(may be browser dependent)
![Page 34: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/34.jpg)
https://blogs.ch.cam.ac.uk/pmr/2014/06/25/content-mining-we-can-now-mine-images-of-phylogenetic-trees-and-more/ for story of extraction
Thinning
Topology
Serialization
Newick
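The final step, serializing a recovered tree topology to Newick, can be sketched as below. The sketch assumes the topology is already available as nested `(name, children)` tuples, which is a hypothetical data structure rather than AMI's internal one, and it ignores branch lengths.

```python
def to_newick(node):
    """Serialize a (name, children) tuple tree to a Newick fragment."""
    name, children = node
    if not children:
        return name  # a leaf is written as its label
    inner = ",".join(to_newick(child) for child in children)
    return "(" + inner + ")" + name


def newick_string(root):
    """Full Newick string, terminated by a semicolon."""
    return to_newick(root) + ";"
```

For an unnamed root with leaf A and an unnamed clade containing B and C, `newick_string(("", [("A", []), ("", [("B", []), ("C", [])])]))` returns `(A,(B,C));`.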
![Page 35: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/35.jpg)
Phytochemistry extraction
O. dayi
“volatile composition of”
A. sieberi
A. judaica
Displayed by CAT (CottageLabs)
![Page 36: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/36.jpg)
contentmine.org proposed Services
• Workshops
• Repository indexing
• Funder Compliance
• Publication enhancement
• Extraction of scientific data
![Page 37: ContentMining and Clinical Trials](https://reader035.fdocuments.us/reader035/viewer/2022062820/58a54e451a28abef2c8b4b95/html5/thumbnails/37.jpg)
contentmine.org team