Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents...

21
Scholia Finn ˚ Arup Nielsen Cognitive System, DTU Compute, Technical University of Denmark 28 October 2017

Transcript of Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents...

Page 1: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Finn Arup Nielsen

Cognitive System, DTU Compute, Technical University of Denmark

28 October 2017

Page 2: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Can we make a Wikidata presenter tool

targeted at bibliographic information?

Finn Arup Nielsen 1 28 October 2017

Page 3: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Presenting Wikidata: Reasonator

Magnus Manske’s Reasonator, https:

//tools.wmflabs.org/reasonator/

Extracts information from Wiki-

data and makes templated (“nat-

ural language”) text, maps, time-

lines, fetches relevant images, for-

mats other information nicely and

adds internal and external links.

Runs from Wikimedia Toolforge.

What about citation information,

aggregation of publication for an or-

ganization, etc.?

Finn Arup Nielsen 2 28 October 2017

Page 4: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Scholia

Web site with scholarly information extracted from Wiki-

data running from https://tools.wmflabs.org/scholia/.

Developed from GitHub under GPL https://github.com/

fnielsen/scholia with work/input from Egon Willigha-

gen, Daniel Mietchen, Jakob Voß, Magnus Manske, Andy

Mabbett.

Almost entirely built by using Wikidata Query Service:

We generate tables, bubble charts, time lines, graphs,

etc.

Finn Arup Nielsen 3 28 October 2017

Page 5: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

“Aspects”

Scholia presents the data in different “aspects”: author, work, organi-

zation (e.g., university, research group), venue (journal or conference),

series (e.g., conference proceedings series), publisher, sponsor, award,

topic.

Researcher can be viewed as an author or a topic. University could be an

organization or a publisher.

Finn Arup Nielsen 4 28 October 2017

Page 6: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Author aspect publications per year

Inspired by ShubhanshuMishra’s and Vetle I. Torvik’sLEGOLAS visualization.

Number of publications peryear from https://tools.

wmflabs.org/scholia/author/

Q20980928.

Color-coding based on author-role (first author, last au-thor, middle author, soloauthor)

Using default “BarChart” of Wikidata Query Service with complex SPARQLquery: https://query.wikidata.org/#%23defaultView...

Finn Arup Nielsen 5 28 October 2017

Page 7: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Work aspect citation graph

Citation panel on work

aspect for partial ci-

tation graph with WDQS’s

Graph output.

For A principal com-

ponent analysis of 39

scientific impact mea-

sures paper.

Network constructed with

the BlazeGraph’s RDF

GAS API for graph

queries.

Finn Arup Nielsen 6 28 October 2017

Page 8: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Publisher aspect

Overview of number ofpapers published andtheir citations acrossjournals published bythe publisher with Scat-

ter output from WDQS.

Here for BioMedCen-tral (which may be animprint):

https://tools.wmflabs.org/scholia/publisher/Q463494

Finn Arup Nielsen 7 28 October 2017

Page 9: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Organization aspect

Incomplete statistics on page production per year for DTU Cognitive

Systems. Yet another color-coded WDQS Bar chart.

Finn Arup Nielsen 8 28 October 2017

Page 10: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Award aspect

Turing award locations

The points are colored ac-

cording to the “layer” col-

umn in the SPARQL re-

sult.

https://tools.wmflabs.org/scholia/award/Q185667

Finn Arup Nielsen 9 28 October 2017

Page 11: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Multiple items

Aspects for multi-

ple items: two or

more works, two or

more authors, two

or more publishers.

Comparison of au-

thors, universities,

etc.

For instance, two works works/Q20900776,Q25938983

Finn Arup Nielsen 10 28 October 2017

Page 12: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Software aspect

There are several topic-related aspects: disease,protein, gene, chemical,biological pathway, soft-ware.

Software aspect: Herethe panel with Top-ics of works using thesoftware queried fromP2283 with SPM.

Based on work by KatherineThornton: Wikidata fordigital preservation: AWikidata Portal.

Finn Arup Nielsen 11 28 October 2017

Page 13: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Scholia tool: arxiv-to-quickstatement

Conversion of arXiv identifier to Magnus Manske’s quickstatements which

can setup up a new Wikidata item.

Finn Arup Nielsen 12 28 October 2017

Page 14: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Implementation of Scholia

Python (Python27 and Python35).

Uses the Flask, mostly selected be-

cause of its simplicity and nice tutorial.

JavaScript to query the Wikidata

API (https://www.wikidata.org/w/api.

php) for labels, inclusion of Wikipedia

extract and setup of tables: jQuery,

DataTables (elements have links to

Scholia rather than Wikidata).

Test with tox, flake8 and py.test.

Finn Arup Nielsen 13 28 October 2017

Page 15: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Contributing

For contribution, GitHub users

can fork the repo at GitHub

and make pull requests under

GPL.

Particularly Egon Willighagen

has made a number of pull

requests around proteins, bi-

ological pathways and chem-

icals, see, e.g., citric acid

where he has made SPARQL

queries on qualifiers and refer-

ences.

We got lots of issues.

Finn Arup Nielsen 14 28 October 2017

Page 16: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Deployment on Toolforge

The canonical version of

Scholia runs from Toolforge:

https://tools.wmflabs.org/

Toolforge (rebranded from

Wikimedia Tool Labs), — the

free cloud service provided by

the Wikimedia Foundation.

Currently running gridengine.

Considering Kubernetes to

get better stability in re-

sponse.

Finn Arup Nielsen 15 28 October 2017

Page 17: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Wikidata-based BIBTeX generation

A rough-in-the-edges implementation in Scholia can generate BIBTeX.bib files from .aux files

My .tex file:

\bibliographystyle{Nielsen2012Slides}

\bibliography{Nielsen2017ScholiaWikidataCon_slides}

Commands:

latex Nielsen2017ScholiaWikidataCon_slides.tex

python -m scholia.tex write-bib-from-aux \

Nielsen2017ScholiaWikidataCon_slides.aux

bibtex Nielsen2017ScholiaWikidataCon_slides

latex Nielsen2017ScholiaWikidataCon_slides.tex

latex Nielsen2017ScholiaWikidataCon_slides.tex

Finn Arup Nielsen 16 28 October 2017

Page 18: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Scholia issues :(

Wikidata far from complete. Particular non-Pubmed.

Citation data lacking, but some released with I4OC.

Some queries run into WDQS time-out.

Some queries generate too large results. This is a problem from graphs

such as citation networks (we put in SPARQL “LIMIT”s) and co-occurence

topics graphs (Egon Willighagen pointed to aluminum).

Finn Arup Nielsen 17 28 October 2017

Page 19: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Scholia issues :)

Wikidata act as a hub for different resources linking Google Scholar,

Twitter, Scopus, VIAF, ResearchGate, ...

Good author disambiguation possible, — even for authors that do not

have an account on the site.

Data description more detailed with many different properties: main

theme, genre, multiple affiliation with time points, sex of author, license,

sponsor, etc.

Linking to much more than science: Wikidata is becoming the “Internet

duct tape that can solve anything” (light-hearted comment by Andrew

Lih, somewhere on Facebook)

Finn Arup Nielsen 18 28 October 2017

Page 20: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

What’s next for Scholia and Wikicite?

Building scrapers. Could focus on non-PubMed and non-DOI publica-

tions.

Better integration between panels and aspects in Scholia (Javascript and

D3 work)

“Editable Scholia”: Edit Wikidata items from Scholia. (Magnus Manske

implements editing with his Listeria tool).

“Social Scholia”: User login, followers, followees, messages between

users, messages when new relevant data appears in Wikidata.

Specialized aspects: Neuroinformatics, Bioinformatics, . . . ?

Feeds (Egon Willighagen is working on this issue)

Finn Arup Nielsen 19 28 October 2017

Page 21: Scholia - Technical University of Denmark · 2017. 10. 26. · Scholia \Aspects" Scholia presents the data in di erent \aspects": author, work, organi-zation (e.g., university, research

Scholia

Thanks

Finn Arup Nielsen 20 28 October 2017