Searching Scholarly Literature:
A Google Scholar Perspective
Anurag Acharya
Overview
Goals & key ideas
Support for libraries
Coverage & usage
Reflections
Goal: Best possible scholarly search
Single place to find scholarly material
– All areas, all sources, all languages, all time
– Relevance-based ordering (“Google-like”)
Easy to use
– Common queries should just work
– Researchers, like everyone else, just want answers
Idea: Index all forms of articles
Preferred form: fulltext
– Go beyond author identified features
– Facilitate serendipity
Fulltext online for only a small fraction
– Influential/seminal papers still offline
Index whatever form is available
– Abstract or even just the citation
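The fallback described on this slide (prefer fulltext, otherwise index the abstract, otherwise just the citation) can be sketched as follows. This is only an illustration: the `Article` record and the `indexable_text` helper are hypothetical names, not part of any Google Scholar interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Article:
    """Hypothetical record: the citation is always known, richer forms may not be."""
    citation: str
    abstract: Optional[str] = None
    fulltext: Optional[str] = None


def indexable_text(article: Article) -> Tuple[str, str]:
    """Return (form, text) for the richest form of the article that is available."""
    if article.fulltext:
        return "fulltext", article.fulltext
    if article.abstract:
        return "abstract", article.abstract
    # Offline or otherwise unavailable papers are still indexed by citation alone.
    return "citation", article.citation
```

With this shape, an influential paper that exists only offline still gets an index entry built from its citation, so it can be found even when it cannot be read online.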
Idea: Be inclusive
Provide worldwide visibility to all research
– Should be able to find research done anywhere
– Who knows what triggers discovery
Our goal is to find all scholarly work
– Journals, conferences, preprints, reports
– All countries, all languages, all sources
Make decisions on a per-article basis
– Good work can come from anywhere!
Idea: Universal discovery
Free to all users everywhere
– Should be able to find relevant research no matter where you live
– Don’t know where the next magic will come from
Access will depend on a variety of factors
– Impact of discovery is larger than people think
Idea: Rank as researchers do
Ideal: The Stuff I Need To Know
Approximation: Relevant stuff that is likely to be good
How to estimate “likely to be good”?
– who wrote it, where it was published, how many people cite it, where citations are from
Plus usual information retrieval techniques
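One way to picture "relevant stuff that is likely to be good" is a score that multiplies a text-relevance value by a quality estimate built from the four signals listed above. The sketch below is an assumption-laden toy, not the actual ranking function: the field names, the 0 to 1 scales, and the weights are all invented for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Candidate:
    title: str
    relevance: float      # text-match score from a usual IR technique, 0..1 (assumed)
    author_score: float   # hypothetical prior for who wrote it, 0..1
    venue_score: float    # hypothetical prior for where it was published, 0..1
    # One entry per citing article: a 0..1 score for where the citation is from.
    citing_venue_scores: List[float] = field(default_factory=list)


def quality(c: Candidate) -> float:
    """Estimate 'likely to be good' from author, venue, and weighted citations."""
    weighted_citations = sum(c.citing_venue_scores)
    # log1p damps the citation signal so heavily cited papers do not swamp everything.
    return 1.0 + 0.5 * c.author_score + 0.5 * c.venue_score + math.log1p(weighted_citations)


def score(c: Candidate) -> float:
    """Relevance gates the result; quality reorders the relevant candidates."""
    return c.relevance * quality(c)


def rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=score, reverse=True)
```

Multiplying rather than adding means an irrelevant paper scores zero however well cited it is, which matches the slide's framing of quality as a refinement on top of relevance.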
Idea: Automate citation extraction
Necessary to be able to scale
Much variance in citation styles
– Widely different conventions
Citations error-prone
– Desire to compress (unusual abbreviations)
– Author sloppiness + error propagation
Need to normalize citations
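A minimal sketch of what normalizing a citation string might involve: strip accents, case, and punctuation, then expand a few abbreviations so that differently styled references to the same venue look alike. The abbreviation table is a tiny hypothetical sample; a production system would need far more than this, including field parsing and fuzzy matching to cope with author errors.

```python
import re
import unicodedata

# Hypothetical sample of venue abbreviations; real citation styles use many more.
ABBREVIATIONS = {
    "proc": "proceedings",
    "intl": "international",
    "conf": "conference",
    "trans": "transactions",
}


def normalize_citation(raw: str) -> str:
    """Reduce a citation string to a lowercase, accent-free, punctuation-free form."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Expand known abbreviations token by token.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())
```

Two differently punctuated or abbreviated spellings of the same venue then map to the same string, which is the precondition for counting citations automatically at scale.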
Idea: Rank works, not instances
Single work may have many forms/versions
– Preprint, report, conference paper, journal article
Each may be cited independently
– Need to collect citations for true import of work
Grouping versions facilitates ranking/presentation
– Collect citations for all versions – improve ranking
– Present a single work as a unit – easier to scan
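The grouping step can be sketched as: compute a key that is the same for every version of a work, bucket the versions by that key, and sum their citation counts. The key used here (normalized title plus first author) is an assumption chosen for brevity; matching real preprints to journal articles would need something more tolerant of title changes.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Version:
    title: str
    first_author: str
    form: str        # e.g. "preprint", "report", "conference paper", "journal article"
    citations: int   # citations that point at this particular version


def work_key(v: Version) -> Tuple[str, str]:
    """Hypothetical grouping key: normalized title plus first author."""
    title = re.sub(r"[^a-z0-9]+", " ", v.title.lower()).strip()
    return title, v.first_author.lower().strip()


def group_versions(versions: List[Version]) -> List[dict]:
    """Merge versions into works and order works by their combined citations."""
    groups: Dict[Tuple[str, str], List[Version]] = defaultdict(list)
    for v in versions:
        groups[work_key(v)].append(v)
    works = [
        {
            "title": vs[0].title,
            "forms": [v.form for v in vs],
            "citations": sum(v.citations for v in vs),
        }
        for vs in groups.values()
    ]
    return sorted(works, key=lambda w: w["citations"], reverse=True)
```

Each returned entry is one work with all of its forms attached, so a result list shows a single line per work and ranks it by the citations collected across every version.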
Idea: Links to offline content
Only a small fraction of articles online
Libraries hold huge repositories
– Books, journals, articles, and much more
Link to library resources
– Help users find the wealth in their libraries
Support for libraries
Library Links
– Links to resources in a given library
– For libraries that use link resolvers/OpenURLs
– About 325 participating libraries, growing rapidly
Library Search
– For libraries participating in OCLC’s Open WorldCat
– Find nearby libraries that have the book
– Looking to work with other union catalogs!
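For the Library Links item above, a link to a library's resolver is typically an OpenURL: the library's resolver address followed by key/value pairs describing the article. The sketch below builds such a link using keys from the OpenURL 1.0 (Z39.88-2004) journal format. The resolver address in the usage note is a made-up placeholder, and the function name is hypothetical; how Google Scholar constructs its own links is not described in these slides.

```python
from urllib.parse import urlencode


def openurl_for_article(resolver_base, atitle, jtitle, aulast, date):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),   # article title
        ("rft.jtitle", jtitle),   # journal title
        ("rft.aulast", aulast),   # first author's last name
        ("rft.date", date),       # publication date, e.g. a year
    ]
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{resolver_base}?{urlencode(params)}"
```

Calling `openurl_for_article("https://resolver.example.edu/openurl", "As We May Think", "The Atlantic Monthly", "Bush", "1945")` yields a link that the (hypothetical) resolver could turn into that library's holdings for the article.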
Library links – example
Library search – example
Library search – example
Google Scholar Coverage
Commercial publishers & scholarly societies
– Fulltext from all major publishers except Elsevier and ACS
– Includes popular papers from all publishers as citations/abstracts
Hosting services – many publishers, societies
– HighWire, Allen Press, MetaPress, Atypon, Ingenta, MUSE, others
Public A&Is – PubMed, ADS
– Fairly complete, no matter what you read in some reviews…
Open web and institutional repositories
– arXiv.org, RePEc, PubMed Central, others
Open access journals – all we can find (including SciELO)
Coverage by category
– eng: 14%
– soc: 13%
– bus: 5%
– med: 22%
– chm: 7%
– bio: 13%
– phy: 12%
– unclassified: 6%
– low confidence: 4%
– two categories: 4%
– three categories: 0%
Worldwide usage
Countries with the most queries:
– US, UK, Australia, Germany, Mexico, Brazil
– Canada, China, Netherlands, India, France
– Japan, Israel, Italy, Taiwan, Spain
– Switzerland, Colombia, Nigeria, Philippines
– S. Africa, S. Korea, Malaysia, Egypt, Turkey
Reflections
Audience will expand beyond scholars
– Especially for health/medical research, maybe others
– Educated laypeople, patients, care-givers
The service is useful today for many users
– US as well as internationally
– Much more still to do to reach goals
Finally…
Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.
– As We May Think (Vannevar Bush), July 1945
Hope: the loss of Mendel’s laws is never repeated