Workshop Mozfest15 - Content-Mining for Transparency of Drug Research

CONTENT-MINING FOR TRANSPARENCY OF DRUG RESEARCH @chris_kittel @stefankasberger

Transcript of Workshop Mozfest15 - Content-Mining for Transparency of Drug Research

Page 1: Workshop Mozfest15 - Content-Mining for Transparency of Drug Research

CONTENT-MINING FOR TRANSPARENCY OF DRUG RESEARCH

@chris_kittel @stefankasberger

Page 2:

1. open bit.ly/cm-mozfest15

2. open pad

3. copy files

Page 3:

1. Collaboration

2. Reproducibility

3. Big scale

4. Open Research Data

5. Do it together

Page 4:

Agenda:

1. Introduction (30min)

2. Hands-On (60min)

3. World Cafe (45min)

Page 5:

Introduction Round

Page 6:

ContentMine

Page 7:

THE SCALE OF THE TASK

• ~ 27,000 peer reviewed journals*

• > 5,000 publishers

• ~ 3,000 new papers per day

• “costing” 15 Billion USD to publish

• Representing 500 Billion USD of research

*Ulrich’s database: http://ulrichsweb.serialssolutions.com/login

Page 8:

The right to read is the right to mine.

Page 9:

Facts in context

daily IUCN endangered species news

en.wikipedia.org CC BY-SA

Page 10:

[Diagram: ContentMine pipeline. Up to 30,000 pages/day.]

Sources: EuPMC, arXiv, CORE, HAL, (university repositories); ToC services; publisher sites (PLoS ONE, BMC, PeerJ… Nature, IEEE, Elsevier…)

Input formats: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS, URLs, DOIs

getpapers: catalogue, query

quickscrape: daily crawl, scrapers

norma: normalizer, sectioner, semantic tagger (taggers); output is Semantic ScholarlyHTML with text, data, figures (abstract, methods, references, captioned figures, HTML tables)

ami: search, lookup, plugins; output is facts

Around the pipeline: content mining community; visualization and analysis
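The normalize-then-search idea behind the norma and ami labels above can be pictured with a toy sketch. This is not the code or the API of either tool: `normalize`, `find_facts`, `Fact`, the 30-character context window and the sample sentence are all hypothetical names and values chosen for illustration. It only shows how a dictionary term can be reported together with the text around it, i.e. as a fact in context.

```python
import re
from dataclasses import dataclass


@dataclass
class Fact:
    term: str  # dictionary term that matched
    pre: str   # text immediately before the match
    post: str  # text immediately after the match


def normalize(raw: str) -> str:
    """Collapse runs of whitespace so line breaks do not split the context."""
    return re.sub(r"\s+", " ", raw).strip()


def find_facts(text: str, terms: list[str], window: int = 30) -> list[Fact]:
    """Return each case-insensitive whole-word match of a term with its context."""
    facts = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            facts.append(Fact(
                term=term,
                pre=text[max(0, match.start() - window):match.start()],
                post=text[match.end():match.end() + window],
            ))
    return facts


if __name__ == "__main__":
    # Hypothetical sample text, not taken from any paper.
    sample = "Patients received   aspirin daily.\nA second group received ibuprofen."
    for fact in find_facts(normalize(sample), ["aspirin", "ibuprofen"]):
        print(f"{fact.pre}[{fact.term}]{fact.post}")
```

Keeping the surrounding text with each match is what lets a reader check a hit without reopening the paper; a real pipeline would also record which document and section the match came from.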

Page 11:

Supertree for 924 species

Tree

Page 12:

Page 13:

HACK WITH FACTS

Page 14:

What have you found?

Page 15:

WORLD CAFE

Page 16:

1. Get in groups (4-5 people)

2. 3 rounds (discuss and document)

3. Harvest in Circle

Page 17:

Questions:

1. What kind of questions could the data from the hacking session answer in terms of transparency and collaboration?

2. What are the opportunities you see in ContentMining on a massive scale? Think big!

3. What challenges do you see for ContentMining?

Page 18:

The right to read is the right to mine.

contentmine.org