ASIST 2013 Panel: Altmetrics at Mendeley
Transcript of ASIST 2013 Panel: Altmetrics at Mendeley
Altmetrics at Mendeley
William Gunn, Ph.D. Head of Academic Outreach
Mendeley
@mrgunn
Two audiences
• The information science community
– What we know & what we’re still trying to understand
– What we think important questions are
• The altmetrics community
– Where Mendeley is going
– What we think are the important things to address
What we think we know
• Where we are: discovery, but not assessment – we can describe, but not predict
• What it means to correlate with citations
• What we’re really measuring – attention
– who listens to you and whom you listen to
• minus the people who are listened to but don’t listen well
• Lit derived metrics are not enough!
Amgen: 47 of 53 “landmark” oncology publications could not be reproduced
Bayer: 43 of 67 oncology & cardiovascular projects were based on contradictory results
Dr. John Ioannidis: 432 publications purporting to show sex differences in hypertension, multiple sclerosis, or lung cancer; only one data set was reproducible
There is no gold standard
“We didn’t see that a target is more likely to be validated if it was reported in ten publications or in two publications.” (Nature Reviews Drug Discovery 10, 712, September 2011)
“Either the results were reproducible and showed transferability in other models, or even a 1:1 reproduction of published experimental procedures revealed inconsistencies between published and in-house data.” (Nature Reviews Drug Discovery 10, 712, September 2011)
Building a reproducibility dataset
• Mendeley and Science Exchange have started the Reproducibility Initiative
• $1.3M grant from LJAF (Laura and John Arnold Foundation) to the Initiative via the Center for Open Science
• 50 most highly cited & read papers from 2010, 2011, and 2012 will be replicated
• Figshare & PLOS to host data & replication reports
What we don’t know
• What we can predict
– need to understand intent, imported or derived reputation
• How to capture all mentions, even without direct identifiers
– what skew is there, and what does it mean
• How to adjust for regional or cultural differences
Cultural skew is important
South America is weak on North American social media, strong on Mendeley
How to understand sources of variability
• Collect the same set of metrics at different times, by different people, using different methods
• This will inform the standards process & assist IS people with capturing provenance, doing preservation, and giving advice
What are the important questions we aren’t asking yet?
• Let’s get past the “ranking people by their Twitter followers” stuff
• Tell us what we should be looking at and how you would like to be involved
What people want to know about Mendeley
• We realize what we do makes a big difference
– ResearchGate and Academia.edu began to do more once we showed the potential
– Researchers value our coverage and source neutrality
– Many consume our data, even when it’s crappy
Focusing on recommendations
• Mendeley Suggest
– personalized recommendations based on reading history
• related articles
– relatedness based on document similarity
• recommender frameworks
– implement recommendations as a service
• third-party recommender services
– serve niche audiences
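As a rough illustration of the “related articles” idea above (this is a hypothetical sketch, not Mendeley’s actual pipeline), document relatedness can be computed from TF-IDF vectors and cosine similarity. All names and the sample documents below are invented for the example:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency weighted by inverse document frequency
        vec = {t: (cnt / len(doc)) * math.log(n / df[t])
               for t, cnt in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "altmetrics measure attention to research articles".split(),
    "citation counts measure attention to research papers".split(),
    "topic models uncover latent semantics in text".split(),
]
vecs = tfidf_vectors(docs)
# The first two documents share vocabulary, so they score as more
# related to each other than either does to the third.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A production recommender would add stemming, stop-word removal, and reading-history signals on top of this kind of content similarity.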
Improving data quality
• Research Catalog v2
– better duplicate detection
– readership numbers stable
• only increase
– canonical docs
• API v2
– exposing more information
• annotations
• other events (what do you want to see?)
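One common approach to the duplicate-detection problem mentioned above (a sketch under assumed behavior, not the Research Catalog’s actual algorithm) is to normalize titles and compare them with a fuzzy string ratio; the threshold below is an illustrative value:

```python
import difflib
import re

def normalize(title):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

def is_duplicate(title_a, title_b, threshold=0.9):
    """Treat two catalog entries as duplicates when their normalized
    titles are nearly identical (threshold chosen for illustration)."""
    ratio = difflib.SequenceMatcher(
        None, normalize(title_a), normalize(title_b)).ratio()
    return ratio >= threshold

print(is_duplicate("Altmetrics: A Manifesto",
                   "altmetrics -- a manifesto"))
print(is_duplicate("Altmetrics: A Manifesto",
                   "Topic Models for Digital Libraries"))
```

Merging duplicates into canonical documents is what keeps readership counts stable, since every variant record rolls up to one entry.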
Stability and Security
• We are serious
– adapting to and promoting changes in practice
• investing in building relationships with developers
• platform, not a silo
TEAM Project: academic knowledge management solutions
• Algorithms to determine the content similarity of academic papers
• Performing text disambiguation and entity recognition to differentiate between and relate similar in-text entities and authors of research papers.
• Developing semantic technologies and semantic web languages with a focus on metadata integration/validation
• Investigating profiling and user-analysis technologies, e.g. based on search logs and document interaction
• We will also improve folksonomies and, through that, ontologies of text.
• Finally, tagging behaviour will be analysed to improve tag recommendations and strategies.
• http://team-project.tugraz.at/blog/
Code Project
Use case: mining research papers for facts to add to LOD repositories and lightweight ontologies.
• Crowd-sourcing-enabled semantic enrichment & integration techniques for integrating facts contained in unstructured information into the LOD cloud
• Federated, provenance-enabled querying methods for fact discovery in LOD repositories
• Web-based visual analysis interfaces to support human based analysis, integration and organisation of facts
• http://code-research.eu/
Semantics vs. Syntax
• Language expresses semantics via syntax
• Syntax is all a computer sees in a research article.
• How do we get to semantics?
• Topic Modeling!
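To make the idea concrete, here is a toy collapsed Gibbs sampler for LDA, the standard topic-modeling technique. This is a minimal stdlib-Python illustration of how topics emerge from word co-occurrence, not any production implementation; the corpus and all names are invented:

```python
import random
from collections import defaultdict

def toy_lda(docs, n_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustration only)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})          # vocabulary size
    # z[d][i]: topic currently assigned to word i of document d
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]           # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                            # words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the word, resample its topic, add it back
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

docs = [
    "gene protein cell gene protein".split(),
    "cell gene protein gene cell".split(),
    "graph node edge graph node".split(),
    "edge node graph edge edge".split(),
]
ndk, nkw = toy_lda(docs)   # doc-topic and topic-word counts
```

The sampler sees only syntax (word counts), yet the per-document topic mixtures it infers act as a crude proxy for semantics, which is the point of the slide above.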
[Chart: Distribution of Topics (percentages, 0–35%)]
[Chart: Subcategories of Comp. Sci. (0–20%): AI, HCI, Info Sci, Software Eng, Networks]