ASIST 2013 Panel: Altmetrics at Mendeley
Transcript of ASIST 2013 Panel: Altmetrics at Mendeley
Altmetrics at Mendeley
William Gunn, Ph.D. Head of Academic Outreach
Mendeley
@mrgunn
Two audiences
• The information science community
– What we know & what we’re still trying to understand
– What we think important questions are
• The altmetrics community
– Where Mendeley is going
– What we think are the important things to address
What we think we know
• Where we are: discovery, but not assessment – we can describe, but not predict
• What it means to correlate with citations
• What we’re really measuring – attention
– who listens to you and whom you listen to
• minus the people who are listened to but don’t listen well
• Lit derived metrics are not enough!
Amgen: 47 of 53 “landmark” oncology publications could not be reproduced
Bayer: 43 of 67 oncology & cardiovascular projects were based on contradictory results
Dr. John Ioannidis: 432 publications purporting to show sex differences in hypertension, multiple sclerosis, or lung cancer; only one data set was reproducible
There is no gold standard
“We didn’t see that a target is more likely to be validated if it was reported in ten publications or in two publications.” (Nature Reviews Drug Discovery 10, 712, September 2011)
“Either the results were reproducible and showed transferability in other models, or even a 1:1 reproduction of published experimental procedures revealed inconsistencies between published and in-house data.” (Nature Reviews Drug Discovery 10, 712, September 2011)
Building a reproducibility dataset
• Mendeley and Science Exchange have started the Reproducibility Initiative
• $1.3M grant from LJAF (Laura and John Arnold Foundation) to the Initiative via the Center for Open Science
• 50 most highly cited & read papers from 2010, 2011, and 2012 will be replicated
• Figshare & PLOS to host data & replication reports
What we don’t know
• What we can predict
– need to understand intent, imported or derived reputation
• How to capture all mentions, even without direct identifiers
– what skew is there, and what does it mean
• How to adjust for regional or cultural differences
Cultural skew is important
South America is weak on North American social media, strong on Mendeley
How to understand sources of variability
• Collect the same set of metrics at different times, by different people, using different methods
• This will inform the standards process & assist IS people with capturing provenance, doing preservation, and giving advice
What are the important questions we aren’t asking yet?
• Let’s get past the “ranking people by their Twitter followers” stuff
• Tell us what we should be looking at and how you would like to be involved
What people want to know about Mendeley
• We realize what we do makes a big difference
– ResearchGate and Academia.edu began to do more once we showed the potential
– Researchers value our coverage and source neutrality
– Many consume our data, even when it’s crappy
Focusing on recommendations
• Mendeley Suggest
– personalized recommendations based on reading history
• related articles
– relatedness based on document similarity
• recommender frameworks
– implement recommendations as a service
• third-party recommender services
– serve niche audiences
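As a rough illustration of the “related articles” idea above (this is a hypothetical sketch, not Mendeley’s actual pipeline), document relatedness can be computed from TF-IDF vectors and cosine similarity. All names and the sample documents below are invented for the example:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency weighted by inverse document frequency
        vec = {t: (cnt / len(doc)) * math.log(n / df[t])
               for t, cnt in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "altmetrics measure attention to research articles".split(),
    "citation counts measure attention to research papers".split(),
    "topic models uncover latent semantics in text".split(),
]
vecs = tfidf_vectors(docs)
# The first two documents share vocabulary, so they score as more
# related to each other than either does to the third.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A production recommender would add stemming, stop-word removal, and reading-history signals on top of this kind of content similarity.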
Improving data quality
• Research Catalog v2
– better duplicate detection
– readership numbers stable
• only increase
– canonical docs
• API v2
– exposing more information
• annotations
• other events (what do you want to see?)
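One common approach to the duplicate-detection problem mentioned above (a sketch under assumed behavior, not the Research Catalog’s actual algorithm) is to normalize titles and compare them with a fuzzy string ratio; the threshold below is an illustrative value:

```python
import difflib
import re

def normalize(title):
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", title.lower())).strip()

def is_duplicate(title_a, title_b, threshold=0.9):
    """Treat two catalog entries as duplicates when their normalized
    titles are nearly identical (threshold chosen for illustration)."""
    ratio = difflib.SequenceMatcher(
        None, normalize(title_a), normalize(title_b)).ratio()
    return ratio >= threshold

print(is_duplicate("Altmetrics: A Manifesto",
                   "altmetrics -- a manifesto"))
print(is_duplicate("Altmetrics: A Manifesto",
                   "Topic Models for Digital Libraries"))
```

Merging duplicates into canonical documents is what keeps readership counts stable, since every variant record rolls up to one entry.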
Stability and Security
• We are serious
– adapting to and promoting changes in practice
• investing in building relationships with developers
• platform, not a silo
TEAM Project: academic knowledge management solutions
• Algorithms to determine the content similarity of academic papers
• Performing text disambiguation and entity recognition to differentiate between and relate similar in-text entities and authors of research papers.
• Developing semantic technologies and semantic web languages with a focus on metadata integration/validation
• Investigating profiling and user-analysis technologies, e.g. based on search logs and document interaction
• We will also improve folksonomies and, through that, ontologies of text.
• Finally, tagging behaviour will be analysed to improve tag recommendations and strategies.
• http://team-project.tugraz.at/blog/
Code Project
Use case: mining research papers for facts to add to LOD repositories and lightweight ontologies.
• Crowd-sourcing-enabled semantic enrichment & integration techniques for integrating facts contained in unstructured information into the LOD cloud
• Federated, provenance-enabled querying methods for fact discovery in LOD repositories
• Web-based visual analysis interfaces to support human based analysis, integration and organisation of facts
• http://code-research.eu/
Semantics vs. Syntax
• Language expresses semantics via syntax
• Syntax is all a computer sees in a research article.
• How do we get to semantics?
• Topic Modeling!
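To make the idea concrete, here is a toy collapsed Gibbs sampler for LDA, the standard topic-modeling technique. This is a minimal stdlib-Python illustration of how topics emerge from word co-occurrence, not any production implementation; the corpus and all names are invented:

```python
import random
from collections import defaultdict

def toy_lda(docs, n_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustration only)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})          # vocabulary size
    # z[d][i]: topic currently assigned to word i of document d
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]           # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                            # words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the word, resample its topic, add it back
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

docs = [
    "gene protein cell gene protein".split(),
    "cell gene protein gene cell".split(),
    "graph node edge graph node".split(),
    "edge node graph edge edge".split(),
]
ndk, nkw = toy_lda(docs)   # doc-topic and topic-word counts
```

The sampler sees only syntax (word counts), yet the per-document topic mixtures it infers act as a crude proxy for semantics, which is the point of the slide above.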
[Chart: Distribution of Topics (percentages, 0–35%)]
[Chart: Subcategories of Comp. Sci. (0–20%): AI, HCI, Info Sci, Software Eng, Networks]