Between information retrieval services and bibliometrics research: new ways of semantic browsing and visual analytics
Rob Koopman, Shenghui Wang (OCLC Research); Andrea Scharnhorst (DANS-KNAW)
November 7, 2015, ASIST, sigmetrics workshop

Page 1

Between information retrieval services and bibliometrics research

– new ways of semantic browsing and visual analytics

Rob Koopman, Shenghui Wang, OCLC Research
Andrea Scharnhorst, DANS-KNAW

November 7, 2015
ASIST, sigmetrics workshop

Page 2

Content

- New approach to find structure in bibliographic information – ARIADNE (2 Method)
- Applications:
  - Data curation – author disambiguation (1 Motivation)
  - Illustration of topics – the case of digital humanities
  - Topical browsing – DEMO (3)
  - Excursion into bibliometrics – the Berlin group challenge (4)
- Wrapping up (5)

Page 3

Data curation – author disambiguation

Page 4

Mapping topics, communities, research fronts, …

Bibliometrics

Documents are similar because they:
- Cite each other
- Are cited together
- Use the same references
- Use the same vocabulary
- Have the same authors

Information retrieval

Documents are similar because they:
- Use the same vocabulary
- …

ARIADNE is about similarity of entities!
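
Two of the document-level similarity notions listed above can be sketched in a few lines. This is only an illustration on invented toy records: the `docs` dictionary, the function names, and the choice of cosine similarity over raw word counts are all assumptions, not something taken from the talk.

```python
from collections import Counter
from math import sqrt

# Toy records, invented for illustration only (not data from the talk).
docs = {
    "A": {"refs": {"r1", "r2", "r3"}, "text": "citation analysis of journals"},
    "B": {"refs": {"r2", "r3", "r4"}, "text": "journals and citation impact"},
    "C": {"refs": {"r9"}, "text": "casimir effect in quantum fields"},
}


def bibliographic_coupling(a, b):
    """'Use the same references': number of references two documents share."""
    return len(docs[a]["refs"] & docs[b]["refs"])


def vocabulary_cosine(a, b):
    """'Use the same vocabulary': cosine similarity of raw word-count vectors."""
    ca = Counter(docs[a]["text"].split())
    cb = Counter(docs[b]["text"].split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


print(bibliographic_coupling("A", "B"), vocabulary_cosine("A", "B"))
print(bibliographic_coupling("A", "C"), vocabulary_cosine("A", "C"))
```

Both measures compare documents with documents. The entity view stated on the slide would instead compare authors, terms, journals and so on with each other.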

Page 5

Document/work, Record and Entity

[Figure: a record with the fields Authors, Title, Journal, …, Reference, Subject is broken down into entities of several types: author names (e.g. "Glänzel, W." and "Glanzel, W."), topical terms (e.g. "bibliometrics", "citations", …, "Casimir effect"), references and journals. Label in the figure: N = SUM(doc).]

Page 6

A MARC record

[Figure: a MARC record with these fields labelled: title, authors, issn, dewey, publisher.]
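
As a rough illustration of how the labelled fields of one record could be turned into typed entities, here is a minimal sketch. The record is a plain dictionary with placeholder values, not the output of a real MARC parser. The `type:value` naming and the use of lowercased title words as a stand-in for topical terms are assumptions made for this example.

```python
# Hypothetical, simplified record with placeholder values (not real MARC data).
record = {
    "title": "An example title about citation analysis",
    "authors": ["Glänzel, W."],
    "issn": "0000-0000",
    "dewey": "020",
    "publisher": "Example Press",
}


def record_to_entities(rec):
    """Return the typed entities that occur in one record."""
    entities = [f"author:{name}" for name in rec.get("authors", [])]
    for field in ("issn", "dewey", "publisher"):
        if rec.get(field):
            entities.append(f"{field}:{rec[field]}")
    # Crude stand-in for topical-term extraction: lowercased title words.
    entities += [f"term:{word}" for word in rec.get("title", "").lower().split()]
    return entities


for entity in record_to_entities(record):
    print(entity)
```

Entities that occur in the same record co-occur, and counting such co-occurrences over many records yields a co-occurrence matrix.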

Page 7

Demo examples

• http://thoth.pica.nl/demo/relate (WorldCat)

• http://thoth.pica.nl/relate (ArticleFirst)

• http://thoth.pica.nl/astro/relate (astrophysics data, Berlin group)

Page 8

Dataset

● WorldCat, 300+ million records
● Selected 13 million items (topical terms, authors, ISSNs, Dewey decimal codes, publishers, subject headings)
● Represented by 6 million topical terms

But a matrix of 13M x 6M is too big to process
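
A quick back-of-the-envelope calculation shows the scale of the problem. The two counts come from the slide. The 4 bytes per cell (dense float32 storage) is an assumption for illustration. A sparse representation would need far less space, but the number of columns would still be 6 million.

```python
n_items = 13_000_000   # selected items (from the slide)
n_terms = 6_000_000    # topical terms (from the slide)
BYTES_PER_CELL = 4     # assumption: dense float32 storage

cells = n_items * n_terms
dense_terabytes = cells * BYTES_PER_CELL / 1e12

print(f"{cells:.2e} cells, about {dense_terabytes:.0f} TB if stored densely")
# prints: 7.80e+13 cells, about 312 TB if stored densely
```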

Page 9

Step 1: Building the semantic matrix, with dimension reduction based on Random Projection

C: a co-occurrence matrix
R: a random matrix of +/-1
C': approximation of C after random projection (the semantic matrix)

Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts.
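
The idea of Step 1 can be sketched with NumPy on toy data. This is a generic random projection under stated assumptions, not the Ariadne implementation: the matrix sizes, the target dimension `k`, the Poisson toy counts, and the 1/sqrt(k) scaling are all choices made for this example. Only the roles of C, R and C' follow the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; the real matrix is far larger and k is a hypothetical choice.
n_items, n_terms, k = 200, 5000, 256

# C: stand-in co-occurrence counts (random toy data, not real co-occurrences).
C = rng.poisson(0.05, size=(n_items, n_terms)).astype(np.float32)

# R: random matrix with entries +1 or -1. The 1/sqrt(k) scaling is an
# assumption that keeps dot products comparable before and after projection.
R = rng.choice(np.array([-1.0, 1.0]), size=(n_terms, k)) / np.sqrt(k)

# C': one k-dimensional row per item instead of one n_terms-dimensional row.
C_reduced = C @ R


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


print(C.shape, "->", C_reduced.shape)
print("cosine before:", cosine(C[0], C[1]))
print("cosine after: ", cosine(C_reduced[0], C_reduced[1]))
```

The two cosine values should be close but not identical. The approximation error shrinks as `k` grows, so `k` trades accuracy against memory.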

Page 10

Step 2: Interactive exploration

- Provide a simple search/text box
- Calculate the top 500 most related candidates
- Find mutually related items
- Convert distances to probabilities
- Project to 2D
- Enhance interface with links to other spaces
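
The computational steps above can be sketched roughly as follows. The slide does not say which similarity measure, probability conversion or projection method is used, so everything here is an assumption: cosine similarity for relatedness, a softmax over negative distances for the probabilities, and PCA (via SVD) for the 2D layout. The function name `explore` and its parameters are hypothetical, and the "mutually related" step is represented only by the pairwise distance matrix among candidates.

```python
import numpy as np


def explore(query_idx, S, top_n=500, temperature=1.0):
    """Return candidate indices, pairwise probabilities and 2D coordinates."""
    # Normalise rows so that dot products are cosine similarities.
    unit = S / np.linalg.norm(S, axis=1, keepdims=True)
    sims = unit @ unit[query_idx]

    # Top-n most related candidates (the query itself ranks first).
    top = np.argsort(-sims)[:top_n]
    cand = unit[top]

    # Pairwise cosine distances among the candidates.
    dist = 1.0 - cand @ cand.T

    # Distances -> row-wise probabilities (softmax over negative distance).
    logits = -dist / temperature
    np.fill_diagonal(logits, -np.inf)            # no self-probability
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)

    # 2D layout: PCA of the candidate vectors, a stand-in projection method.
    centred = cand - cand.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    xy = U[:, :2] * s[:2]
    return top, P, xy


rng = np.random.default_rng(1)
S = rng.normal(size=(1000, 64))  # toy stand-in for the semantic matrix
top, P, xy = explore(0, S, top_n=500)
print(top[:5], P.shape, xy.shape)
```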

Page 11

Exploration of a topic

http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster=
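
The demo URL can be taken apart with the Python standard library to see which query parameters it carries. The slide does not explain what `fsize` and `ncluster` mean, so the sketch only shows their values as they appear in the URL; `ncluster` is empty there.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://thoth.pica.nl/relate?input=hirsch%20index&fsize=100&ncluster="

parts = urlsplit(url)
# keep_blank_values=True keeps parameters that have no value, such as ncluster.
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)
for name, values in params.items():
    print(name, "=", repr(values[0]))
```

The `input` parameter decodes to the search phrase "hirsch index".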

Page 12

Page 13

Page 14

Illustration of context around a topic/field – journal view

[Figure: journal view with annotated regions: "Digital libraries"; "Science, Computer Science, ontologies"; "Many different humanities fields, prominently language & literary studies".]

Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: CHI'15 Extended Abstracts. (2015)

Page 15

As visual exploration of any dataset – astrophysics case

Page 16

Wrapping up – future work

● Compare the algorithm to other existing algorithms – benchmarking
● More metadata fields (publisher, subject, identifiers) – ongoing
● Identify further problems to which Ariadne can be applied
  ● Curation (e.g. author name disambiguation)
  ● Knowledge discovery (e.g. matching chemical molecules)
  ● Information science – population of libraries, subject areas, …
● Feedback from users – prepare user scenarios for usability testing and set up an evaluation project – tbd
● Improve visualisation
● More functionality (timeline, history)
● Extend the implementation to other databases

Page 18

References

Koopman, R., Wang, S., Scharnhorst, A., Englebienne, G.: Ariadne's thread: Interactive navigation in a world of networked information. In: B. Begole, J. Kim, K. Inkpen, W. Woo (eds.) Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, CHI 2015 Extended Abstracts, Seoul, Republic of Korea, April 18-23, 2015, pp. 1833–1838. ACM (2015). DOI 10.1145/2702613.2732781. URL http://doi.acm.org/10.1145/2702613.2732781 (preprint on arXiv.org)

Koopman, R., Wang, S., Scharnhorst, A.: Contextualization of Topics - Browsing through Terms, Authors, Journals and Cluster Allocations. In: A.A. Salah, Y. Tonta, A.A.A. Salah, C. Sugimoto, U. Al (eds.) Proceedings of ISSI 2015 Istanbul. 15th International Society of Scientometrics and Informetrics Conference, Istanbul, Turkey, 29th June to 4th July 2015, pp. 1042–1053. Bogazici University Printhouse, Istanbul (2015). URL http://www.issi2015.org/en/Proceedings-of-ISSI-2015.html