Between information retrieval services and bibliometrics research: new ways of semantic browsing and visual analytics
Rob Koopman, Shenghui Wang (OCLC Research); Andrea Scharnhorst (DANS-KNAW)
November 7, 2015, ASIS&T SIG/MET workshop
Content
- New approach to find structure in bibliographic information: ARIADNE (2 Method)
- Applications:
  - Data curation: author disambiguation (1 Motivation)
  - Illustration of topics: the case of digital humanities
  - Topical browsing: DEMO (3)
- Excursion into bibliometrics: the Berlin group challenge (4)
- Wrapping up (5)
Data curation – author disambiguation
Mapping topics, communities, research fronts, …
Bibliometrics
Documents are similar because they:
- Cite each other
- Are cited together
- Use the same references
- Use the same vocabulary
- Have the same authors
Information retrieval
Documents are similar because they:
- Use the same vocabulary
- …
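The bibliometric notions of similarity in the first list can be made concrete in a few lines. This is a minimal sketch on a hypothetical toy corpus (documents A to D, their references and their terms are invented for illustration). It computes direct citation, bibliographic coupling (shared references), co-citation (being cited together) and, for the information retrieval view, vocabulary overlap as a Jaccard index. It is not Ariadne's own similarity measure.

```python
from itertools import combinations

# Hypothetical toy corpus: each document has a set of references and a set of terms.
docs = {
    "A": {"refs": {"X", "Y"}, "terms": {"citation", "h-index"}},
    "B": {"refs": {"X", "Y", "Z"}, "terms": {"citation", "impact"}},
    "C": {"refs": {"A", "B"}, "terms": {"network"}},
    "D": {"refs": {"A", "B"}, "terms": {"h-index"}},
}


def cite_each_other(a: str, b: str) -> bool:
    """Direct citation: one document lists the other among its references."""
    return a in docs[b]["refs"] or b in docs[a]["refs"]


def coupling(a: str, b: str) -> int:
    """Bibliographic coupling: number of references the two documents share."""
    return len(docs[a]["refs"] & docs[b]["refs"])


def cocitation(a: str, b: str) -> int:
    """Co-citation: number of documents that cite both a and b."""
    return sum(1 for d in docs.values() if a in d["refs"] and b in d["refs"])


def vocabulary_jaccard(a: str, b: str) -> float:
    """Vocabulary overlap (IR view): Jaccard index of the two term sets."""
    ta, tb = docs[a]["terms"], docs[b]["terms"]
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


for a, b in combinations(sorted(docs), 2):
    print(a, b, cite_each_other(a, b), coupling(a, b), cocitation(a, b),
          round(vocabulary_jaccard(a, b), 2))
```

A and B, for example, share two references and are both cited by C and D, so they are similar in both the coupling and the co-citation sense, while their vocabularies overlap only partly.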
ARIADNE is about similarity of entities!
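Document/work, record and entity
[Figure: the fields of a record (authors, title, journal, …, references, subjects) are split into entities: author names (e.g. the variants "Glänzel, W." and "Glanzel, W."), topical terms (e.g. "bibliometrics", "citations", "Casimir effect"), references and journals. Formula on the slide: N = SUM(doc).]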
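One way to read "N = SUM(doc)", assumed here rather than taken from the slides, is that an entity's vector is the sum of the vectors of the documents it occurs in. Under that assumption, two spelling variants of the same author name that occur in documents with similar vocabulary end up with similar vectors, which is what makes entity similarity useful for author disambiguation. The records below are hypothetical, and plain term counts stand in for the projected vectors Ariadne actually works with.

```python
import math
from collections import Counter

# Hypothetical toy records: author-name strings and topical terms per document.
records = [
    {"authors": ["Glänzel, W."], "terms": ["bibliometrics", "citations", "h-index"]},
    {"authors": ["Glänzel, W."], "terms": ["bibliometrics", "citations"]},
    {"authors": ["Glanzel, W."], "terms": ["bibliometrics", "citations", "journals"]},
    {"authors": ["Example, A."], "terms": ["Casimir effect", "quantum"]},
]


def entity_vector(name: str) -> Counter:
    """Assumed reading of N = SUM(doc): add up the term counts of every document naming the entity."""
    vec = Counter()
    for rec in records:
        if name in rec["authors"]:
            vec.update(rec["terms"])
    return vec


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


g1, g2 = entity_vector("Glänzel, W."), entity_vector("Glanzel, W.")
print(round(cosine(g1, g2), 2))  # 0.77
print(cosine(g1, entity_vector("Example, A.")))  # 0.0
```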
A MARC record
[Figure: a MARC record with the fields title, authors, ISSN, Dewey and publisher]
Demo examples
• http://thoth.pica.nl/demo/relate WorldCat
• http://thoth.pica.nl/relate ArticleFirst
• http://thoth.pica.nl/astro/relate Astrophysics data (Berlin group)
Dataset
● WorldCat, 300+ million records
● Selected 13 million items (topical terms, authors, ISSNs, Dewey decimal codes, publishers, subject headings)
● Represented by 6 million topical terms
But a 13M × 6M matrix is too big to process directly.
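A back-of-the-envelope estimate shows the scale, assuming the matrix were stored densely as 4-byte (32-bit) floats; the storage format is an assumption, as the slides do not specify it.

```latex
\begin{aligned}
13\times10^{6} \times 6\times10^{6} &= 7.8\times10^{13}\ \text{cells}\\
7.8\times10^{13} \times 4\ \text{bytes} &= 3.12\times10^{14}\ \text{bytes} \approx 312\ \text{TB}
\end{aligned}
```

Even though a real co-occurrence matrix is sparse, this is the size that motivates the dimension reduction in Step 1.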
C: a co-occurrence matrix (13M items × 6M topical terms)
R: a random matrix with entries of ±1
C′ = C·R: approximation of C after random projection, the semantic matrix
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne’s thread: Interactive navigation in a world of networked information. In: CHI’15 Extended Abstracts (2015).
Step 1: Building the semantic matrix, with dimension reduction based on random projection
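Step 1 can be sketched in a few lines. The sketch assumes each row of C is stored sparsely as a term-to-count mapping and generates each term's row of R on the fly from a seeded random generator, so R never has to be stored. The target dimension D = 1024, the seed and all function names are hypothetical choices, not values from the slides. Because cosine similarity ignores scale, the usual 1/sqrt(D) normalisation of R is omitted.

```python
import math
import random

D = 1024  # hypothetical target dimension; the slides do not state the value used


def sign_vector(term: str, d: int = D, seed: int = 42) -> list[int]:
    """Row of R for one term: d entries of +1/-1, reproducible from the term string."""
    rng = random.Random(f"{seed}:{term}")
    return [rng.choice((-1, 1)) for _ in range(d)]


def project(row: dict[str, float], d: int = D) -> list[float]:
    """One row of C' = C R: the count-weighted sum of the sign vectors of the row's terms."""
    out = [0.0] * d
    for term, count in row.items():
        for k, sign in enumerate(sign_vector(term, d)):
            out[k] += count * sign
    return out


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def exact_cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rows of C, for comparison with the projected rows."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(y * y for y in b.values()))
    return dot / norm if norm else 0.0


row_a = {"bibliometrics": 2, "citations": 1}
row_b = {"bibliometrics": 1, "citations": 1, "journals": 1}
print(round(exact_cosine(row_a, row_b), 3))  # 0.775
print(round(cosine(project(row_a), project(row_b)), 3))  # close to 0.775; exact value depends on the seed
```

The point of the design is that similarities between rows of C′ approximate similarities between rows of C, while each row now has D entries instead of 6 million.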
Step 2: Interactive exploration
- Provide a simple search/text box
- Calculate the top 500 most related candidates
- Find mutually related items
- Convert distances to probabilities
- Project to 2D
- Enhance the interface with links to other spaces
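The first steps of the exploration pipeline can be sketched as follows. Everything here is an illustrative assumption rather than Ariadne's implementation: the function names and toy vectors are hypothetical, "mutually related" is interpreted as being among each other's m nearest neighbours within the candidate set, and distances are turned into probabilities with an SNE-style Gaussian kernel, one common choice before a 2D embedding. The projection to 2D itself is left out. The default k = 500 follows the slide.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_related(query, vectors, k=500):
    """Names of the k vectors most similar to the query, best first."""
    ranked = sorted(vectors, key=lambda name: cosine(query, vectors[name]), reverse=True)
    return ranked[:k]


def mutual_pairs(candidates, vectors, m=2):
    """Pairs (a, b) in which each is among the other's m nearest neighbours in the candidate set."""
    nearest = {}
    for a in candidates:
        others = [b for b in candidates if b != a]
        others.sort(key=lambda b: cosine(vectors[a], vectors[b]), reverse=True)
        nearest[a] = set(others[:m])
    return {tuple(sorted((a, b))) for a in candidates for b in nearest[a] if a in nearest[b]}


def to_probabilities(distances, sigma=1.0):
    """SNE-style conversion: Gaussian kernel on the distances, normalised to sum to 1."""
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy vectors standing in for rows of the semantic matrix.
vectors = {
    "h-index": [1.0, 0.9, 0.0],
    "citation impact": [0.9, 1.0, 0.1],
    "journal ranking": [0.8, 0.7, 0.3],
    "Casimir effect": [0.0, 0.1, 1.0],
}
query = [1.0, 1.0, 0.0]
candidates = top_related(query, vectors, k=3)
print(candidates)  # ['h-index', 'citation impact', 'journal ranking']
print(mutual_pairs(candidates, vectors, m=1))  # {('citation impact', 'h-index')}
distances = [1 - cosine(query, vectors[n]) for n in candidates]
print(to_probabilities(distances, sigma=0.05))
```

Restricting the neighbour search to the candidate set keeps the pairwise work at k × k per query, which is what makes this step feasible interactively.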
Exploration of a topic
http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster=
Illustration of context around a topic/field: journal view
[Figure: map with labelled regions: "EINS 1st PLENARY"; digital libraries; science, computer science, ontologies; many different humanities fields, prominently language and literary studies]
Visual exploration of any dataset: the astrophysics case
Wrapping up: future work
● Compare the algorithm to other existing algorithms (benchmarking)
● More metadata fields (publisher, subject, identifiers): ongoing
● Identify further problems to which Ariadne can be applied:
  ● Curation (e.g. author name disambiguation)
  ● Knowledge discovery (e.g. matching chemical molecules)
  ● Information science (population of libraries, subject areas, …)
● Feedback from users: prepare user scenarios for usability testing and set up an evaluation project (to be done)
● Improve visualisation
● More functionality (timeline, history)
● Extend the implementation to other databases
Thank you!
http://thoth.pica.nl/relate (ArticleFirst)
http://thoth.pica.nl/astro/relate (Astrophysics articles)
http://thoth.pica.nl/demo/relate (WorldCat)
References
Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: B. Begole, J. Kim, K. Inkpen, W. Woo (eds.) Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2015 Extended Abstracts), Seoul, Republic of Korea, April 18–23, 2015, pp. 1833–1838. ACM (2015). DOI 10.1145/2702613.2732781. URL http://doi.acm.org/10.1145/2702613.2732781 (preprint on arXiv.org)
Koopman, R., Wang, S., Scharnhorst, A.: Contextualization of Topics - Browsing through Terms, Authors, Journals and Cluster Allocations. In: A.A. Salah, Y. Tonta, A.A.A. Salah, C. Sugimoto, U. Al (eds.) Proceedings of ISSI 2015 Istanbul: 15th International Society for Scientometrics and Informetrics Conference, Istanbul, Turkey, 29 June to 4 July 2015, pp. 1042–1053. Boğaziçi University Printhouse, Istanbul (2015). URL http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html