Digital Challenges – Bridging the gap between publication and data
Adam Farquhar
Head of Digital Library Technology, The British Library
IASSIST, Tampere, 27 May 2009
The British Library:
‘This is the life blood of research and innovation’
Grant-in-aid (GIA) funding 08/09: £94.8m operational, £12m capital
Other funding secured 07/08: c.£33m
Helping people advance knowledge to enrich lives
National library of the UK.
Serves researchers, business, libraries, education & the general public
Collection includes over 2m sound recordings, 5m reports, theses and conference papers, the world’s largest patents collection (c.50m)
The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours
2 main sites in London and Yorkshire. Circa 2,000 staff
Business and IP Centre: Providing inspiration, and enabling protection of creative capital and business development
Generates value for the UK economy each year of 4.4 times its public funding
Collection fills over 600km of shelving and grows at 11km per year
30 TB of digital material, growing rapidly
Science and Innovation Investment Framework 2004-2014, H.M. Treasury (2004), Information infrastructure, 2.23: “The growing UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation.”
Supporting research
Science, Technology & Medicine
Document Supply service provides 1.4m articles/year, primarily to scientists
Renewed engagement with researchers using digital content and online services
In-depth focus on biomedicine and energy/environment
Collection includes journals, patents, theses and more, and is updated by some 9,000 articles every day
Social Sciences
A significant international collection of books, journals, reports, theses, official publications and other materials
A unique collection of grey literature, of special interest to practitioners and theoreticians
Research collaboration with ESRC
Arts & Humanities
Greatest research collection of its kind in the world
World-class curatorial expertise by subject, medium and geographical area
BL has been developing world-leading e-innovations for the past decade (e.g. International Dunhuang Project) and building a significant corpus of digitised texts
Research collaboration with AHRC, British Academy and HEIs
Building the Digital Research Infrastructure
BL Digital Library System:
Large-scale, highly resilient digital store
Continuous validation & correction
Long-term digital storage for BL content & eLegal deposit/distribution
Long-term access (digital preservation)
Leading EU-funded digital preservation project ‘Planets’ (16 partners)
Developing cost models and case studies with UCL (‘Life’ projects)
Addressing root causes of digital obsolescence
Sites: Edinburgh (2009), Aberystwyth, Boston Spa, St. Pancras, Cambridge Univ., Oxford Univ.
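The slide lists “continuous validation & correction” without saying how it works. One common approach, assumed here rather than taken from the BL system, is to record a checksum for each item, compare every stored replica against it, and overwrite any copy that no longer matches with one that does. A minimal sketch in Python; the function names and site names are hypothetical:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest used here as the recorded fixity value."""
    return hashlib.sha256(data).hexdigest()


def validate_and_correct(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Check each replica against the recorded digest and repair damaged copies.

    Hypothetical helper: `replicas` maps a storage-site name to that site's copy.
    Returns the names of the sites whose copies were replaced.
    """
    intact = [site for site, data in replicas.items() if sha256_hex(data) == expected]
    if not intact:
        # Nothing left to copy from: the damage can be detected but not corrected.
        raise RuntimeError("no replica matches the recorded checksum")
    good_copy = replicas[intact[0]]
    repaired = []
    for site, data in list(replicas.items()):
        if sha256_hex(data) != expected:
            replicas[site] = good_copy  # overwrite the damaged copy
            repaired.append(site)
    return repaired


if __name__ == "__main__":
    item = b"digitised page image bytes"
    digest = sha256_hex(item)
    # Hypothetical sites; one copy has been silently corrupted.
    store = {"site-a": item, "site-b": b"corrupted bytes", "site-c": item}
    print(validate_and_correct(store, digest))  # ['site-b']
```

Keeping several geographically separate copies is what makes the “correction” half possible: a single copy can only be checked, not repaired.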
Digital Library
Live content streams: Sound Archives; Voluntary Digital Donations; Nineteenth Century Digitised Books; Born Digital Newspapers
Storage: >440,000 digital items; >30 terabytes of content
Coming soon: eJournals; Digitised Newspapers
Role of the British Library in Science, Technology and Medicine
Long history of collecting scientific and technical literature
Serves business & industry, researchers, academics and students
Dedicated reading rooms in London
The Library operates the world’s largest document delivery service: millions of items each year to customers all over the world, predominantly in the STM disciplines
Indexing the UK input into Medline/PubMed
Creation of AMED (Allied and Complementary Medicine), an A&I database of research articles on complementary medicine and allied health
Lead Partner in UK PubMed Central
WorldWideScience.org
Global science gateway based on the US Department of Energy’s Science.gov service
Multilateral partnership to enable federated searching of national and international scientific databases and portals.
Launched in 2008
A large number of countries already provide access to publicly funded research outputs; the latest addition is China
Chaired by the British Library
UK PubMed Central
Number of articles: 1.4 million
Over 2,500 manuscripts submitted by grant holders
Information held on 20,000 research grants awarded to 9,000 PIs by UKPMC Funders
Downloads have grown strongly, with over 300,000 in March 2009
UKPMC users are predominantly UK-based (70%), but the service is accessed across the world
Working with the bioscience community and Funders to develop the service based on UK research community needs
Launched in January 2007
Research Information Centre – the research lifecycle
Based on Microsoft’s SharePoint product
Developed with the Microsoft External Research Team (DOI:10.1109/ADVCOMP.2007.14)
Beta tested by 25 bioscience research teams (academia & commercial) in UK & US
Supports full research life-cycle
Accessible by web browser
Configured for biosciences but flexible
Designed for collaboration
Social Science Collection and Research
New team established in 2006
Priorities: define and develop the collection, improve accessibility, raise awareness, build networks, build capacity
Strong focus on researcher needs
Develop strategies for grey literature and data access
Build the collection of government publications:
Recent and historic print collections with LSE and Oxford Social Science Library, …
Digital and web collections with TNA and UK e-OP: ‘digital continuity’
Managing Access to Government Information Collaboratively (MAGIC) with LSE
©Clive Sherlock
Social Science Collection and Research
Research collaborations: Voices of the UK; Children’s play in the media age
Knowledge exchange, awareness and capacity building: Corporate and Social Responsibility seminars; Multi-modal PhD seminars; ESRC Festival of Social Science; ESRC Interns; postgraduate training days, thematic study days, ESDS seminars
Public events: Census 2011, to explain the role of quantitative and qualitative social surveys
©Clive Sherlock
Books and data – a parable
A scientist measured environmental conditions to determine their impact on leather bindings
When the project was complete, he printed the data, bound it, and submitted it to UK copyright libraries
Thirty years later, a scientist took it off the shelf, started to reuse the data, and began collecting new data
When his project was complete, he had 30,000 images and megabytes of data
Too big for any shelf
Not interesting for a data centre
Is the project web site enough?
Journals and data – a problem
In 2003, legal deposit legislation in the UK was extended to cover digital material, building on the 1911 Legal Deposit Act
Electronic journal articles are covered – they will be collected and archived for the long term
… but supplementary material is not covered. For now, it remains on the publishers’ web sites
Long-term access is critical
According to a Parse.Insight survey, 50% needed research data gathered by other researchers that was not available
Within High Energy Physics (HEP):
More than 90% think that data preservation is important or crucial
Benefits include: verifying scientific results independently (60%); combining past and future data (60%); re-analysing in the light of new theories and future results (75%)
45% say old data could have improved their scientific results; 40% say important HEP data have been lost in the past
Many are willing to share: 80% would provide the data behind tables and figures; 45% would provide “raw” data; but 50% believe the costs of repackaging data for sharing are high
Widening gap
A widening gap in the scientific record between published research and the data that underlies it:
Published work held by libraries
Datasets held by data centres
No effective way to link between datasets and articles
No widely used method to identify datasets
No widely used method to cite datasets
As a result, datasets are:
Difficult to discover
Difficult to access
Second-class citizens in the scientific record
Datasets in the scholarly record (OECD White Paper)
45% of journal publishers provide access to datasets associated with journal articles they publish (ALPSP)
But there are no rules about how to publish, present, cite, or otherwise catalogue datasets
Citation: Main mortality estimate: Estimated settler mortality. Settler mortality is calculated from the mortality rates of European-born soldiers, sailors, and bishops when stationed in colonies. It measures the effects of local diseases on people without inherited or acquired immunities. Source: Acemoglu et al. (2001), based on Curtin (1989) and other sources.
Citation: Tertiary school enrollment: School enrollment, tertiary (% of gross). Source: Barro and Lee (2000) and their databases
Datasets – first class citizens?
Datasets
Data is difficult to manage after project funding ceases
Informal networks provide the primary means of sharing
Only 21% use a national or international facility
Datasets are not included in impact analysis
Good luck finding it (your discipline may vary)!
Source: UKRDS study
Published articles
Libraries ensure long-term storage and management
Established funded services provide the primary means of access
Nearly all published articles are held in multiple national libraries
Articles and citations form the backbone of impact analysis
Catalogues and full-text search support discovery
Global responses to the challenge
Research council mandates: data management plans; data retention plans
Funded initiatives: Australian National Data Service; UK Research Data Service; UK Digital Curation Centre; US DataNet programme; JISC Data programme; EU Science Data Infrastructure; …
STM publishers: Brussels Declaration: “Raw research data should be made freely available to all researchers”
Persistent identification: a key component for many goals
Goals: make visible, find, access, track impact, verify, reuse, cite
Dataset citation using Digital Object Identifiers (DOIs)
The DOI system offers an easy way to connect the article with the underlying data
Several organisations have started to assign DOIs to datasets: IUCr, ICPSR and OECD through CrossRef; PANGAEA, Mare and others through TIB (German Science Library)
Dataset: G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China. PANGAEA. doi:10.1594/PANGAEA.587840
Article: G. Yancheva, N. R. Nowaczyk et al. (2007) Influence of the intertropical convergence zone on the East Asian monsoon. Nature 445, 74-77. doi:10.1038/nature05431
The article cites the dataset.
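The point of the example above is that, once a dataset has a DOI, its citation has the same shape as an article’s and the article can point to it. A minimal Python sketch of that idea; the `Citation` class and its fields are hypothetical, and the resolvable form uses the public doi.org proxy:

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    """A scholarly object identified by a DOI; datasets and articles share one shape."""
    kind: str       # "Dataset" or "Article"
    authors: str
    year: int
    title: str
    source: str     # data centre for a dataset, journal details for an article
    doi: str        # bare DOI name, without "doi:" or a resolver prefix
    cites: list[str] = field(default_factory=list)  # DOIs of objects this one cites

    def text(self) -> str:
        """Render as: authors (year) title. source. doi:..."""
        return f"{self.authors} ({self.year}) {self.title}. {self.source}. doi:{self.doi}"

    def link(self) -> str:
        """Resolvable form of the DOI through the public doi.org proxy."""
        return f"https://doi.org/{self.doi}"


dataset = Citation(
    "Dataset", "G. Yancheva, N. R. Nowaczyk et al.", 2007,
    "Rock magnetism and X-ray fluorescence spectrometry analyses on sediment cores "
    "of the Lake Huguang Maar, Southeast China",
    "PANGAEA", "10.1594/PANGAEA.587840",
)
article = Citation(
    "Article", "G. Yancheva, N. R. Nowaczyk et al.", 2007,
    "Influence of the intertropical convergence zone on the East Asian monsoon",
    "Nature 445, 74-77", "10.1038/nature05431",
    cites=[dataset.doi],  # the "cites" link from article to dataset
)

for c in (article, dataset):
    print(f"{c.kind}: {c.text()}  <{c.link()}>")
```

Storing the cited DOI rather than a URL keeps the link valid even if the dataset later moves to a different host.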
It looks so easy
Organisational challenges:
Data centres and funders have regional or disciplinary scope
Universities have teaching and research missions and competitive relationships
Publishers do not cover unpublished material
Consortia of the above require large and fragile coalitions
We need a consortium of national institutions with a long-term stewardship role
Social challenges:
Acceptance by key stakeholders, including funders, data centres, universities, researchers and publishers
Use by data creators and authors
Technical challenges:
Robust infrastructure
Identifying the right thing
Ensuring longevity
DataCite
Organisations with the national science library role are working together to establish a European and global infrastructure to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence
Publishing agents (data centres, research institutes) are responsible for: quality assurance; content storage and access; creating the identifier; creating and updating metadata
The DataCite registration agency: maintains the resolution infrastructure; maintains a searchable database of metadata; manages the identifiers over the long term; establishes and shares best practice
Memorandum of Understanding
Paris, March 2, 2009
Recognizing the importance of research datasets as the foundation of knowledge and sharing a common commitment to promote and establish persistent access to such datasets, we, the signed parties, hereby express our interest to work together to promote global access to research data.
Our long term vision is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.
Initial Signatories
Technische Informationsbibliothek (TIB), Germany
Library of the ETH Zürich, Switzerland
L’Institut de l’Information Scientifique et Technique (INIST), France
Library of TU Delft, The Netherlands
Technical Information Center of Denmark
The British Library
Key facts about DOI
Usage:
>35m DOIs have been assigned
>2m resolutions each month
Organizational:
Not-for-profit International DOI Foundation (IDF)
Provides social infrastructure
Includes registration agencies
Registration done in co-operation with a publication agent
Publication agents are responsible for the content
Technical:
A DOI name is a persistent identifier used to cite and link resources
Linked to an object – not to a location
The location may change, but the DOI remains the same
The DOI System holds metadata about objects, including their URL
Resolution redirects the user from a DOI name to the URL
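The technical points above (the name is bound to the object, metadata includes the current URL, resolution redirects to that URL) can be illustrated with a toy in-memory registry. This is a hypothetical Python sketch, not the Handle System or any registration agency’s API; the class name and example host names are invented for illustration:

```python
class ToyDoiRegistry:
    """Hypothetical in-memory model of DOI resolution: DOI name -> metadata incl. URL."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    @staticmethod
    def _key(doi: str) -> str:
        # DOI names are case-insensitive, so normalise before lookup.
        return doi.strip().upper()

    def register(self, doi: str, url: str, **metadata: str) -> None:
        prefix, slash, suffix = doi.strip().partition("/")
        if not (slash and prefix.startswith("10.") and suffix):
            raise ValueError(f"not a DOI name: {doi!r}")
        self._records[self._key(doi)] = {"url": url, **metadata}

    def update_url(self, doi: str, new_url: str) -> None:
        # The object moved: only the stored location changes, never the DOI.
        self._records[self._key(doi)]["url"] = new_url

    def resolve(self, doi: str) -> str:
        # A real resolver answers with an HTTP redirect to this URL.
        return self._records[self._key(doi)]["url"]


registry = ToyDoiRegistry()
registry.register("10.1594/PANGAEA.587840", "https://old-host.example/dataset/587840",
                  title="Lake Huguang Maar sediment cores")
registry.update_url("10.1594/PANGAEA.587840", "https://new-host.example/ds/587840")
print(registry.resolve("10.1594/pangaea.587840"))  # https://new-host.example/ds/587840
```

Every citation that carries the DOI keeps working after `update_url`, which is the property that lets datasets be cited the same way as articles.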
Strengths and weaknesses of DOI
DOIs have some strong advantages:
Accepted by researchers and scientists
Mature infrastructure
Put datasets on the same playing field as articles
But they are perceived as:
Expensive: the current IDF business model favours larger registration agencies
Publisher-oriented: the largest registration agency is the publisher-oriented CrossRef
DataCite Structure
Diagram: DataCite at the top; national institutions beneath it, each working with several data centres; links labelled “Carries” and “Works with” connect the structure to the International DOI Foundation and the Global Handle System.
Typical workflow (Data Centre)
Data Centre registers with DataCite
Data Centre ingests a dataset and assigns an identifier
Data Centre registers the dataset by submitting an XML file containing relevant bibliographic metadata and the URL for the dataset’s access page
Metadata drawn from ISO 690-2 for referencing electronic information: language, publisher, publishing date, publishing place, author, title, size, edition
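The registration step above can be sketched as building a small XML record that carries the identifier, the access-page URL and the ISO 690-2-derived fields. The element names, the choice of required fields and the landing-page URL below are assumptions for illustration only; this is not the DataCite metadata schema:

```python
import xml.etree.ElementTree as ET

# The eight fields listed on the slide. Element names and the required subset
# are assumptions for illustration, not a DataCite schema.
FIELDS = ("author", "title", "publisher", "publishing_date",
          "publishing_place", "language", "size", "edition")
REQUIRED = ("author", "title", "publisher", "publishing_date")


def registration_xml(doi: str, landing_url: str, metadata: dict[str, str]) -> str:
    """Build a hypothetical registration record: identifier, access-page URL, metadata."""
    missing = [name for name in REQUIRED if not metadata.get(name)]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    root = ET.Element("dataset", {"doi": doi})
    ET.SubElement(root, "url").text = landing_url
    for name in FIELDS:
        if metadata.get(name):
            ET.SubElement(root, name).text = metadata[name]
    return ET.tostring(root, encoding="unicode")


print(registration_xml(
    "10.1594/PANGAEA.587840",
    "https://datacentre.example/dataset/587840",  # hypothetical landing page
    {"author": "G. Yancheva, N. R. Nowaczyk et al.",
     "title": "Rock magnetism and X-ray fluorescence spectrometry analyses on "
              "sediment cores of the Lake Huguang Maar, Southeast China",
     "publisher": "PANGAEA", "publishing_date": "2007", "language": "en"},
))
```

Optional fields such as size or edition are simply omitted when the data centre has no value for them; `ElementTree` takes care of escaping characters like `&` in titles.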
Typical workflow (2)
Author:
Includes a citation using the DOI, just like an article
Reader:
Follows the resolvable link that includes the DOI (or searches for it), just like an article
Reaches a unique landing page at the Data Centre for the dataset
The landing page is open to every reader and includes the DOI and metadata to help the reader decide if the dataset will help
May need to take additional steps to access the dataset
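On the reader’s side, “following the resolvable link” starts from a DOI embedded in citation text. A minimal Python sketch of turning such a citation into a doi.org link, which leads to the data centre’s landing page rather than to the data itself. The regular expression is adapted from a pattern Crossref has recommended for modern DOIs and will not match every DOI ever issued; the function name is hypothetical:

```python
import re

# "10." + 4-9 digit registrant code + "/" + suffix characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def resolvable_links(citation_text: str) -> list[str]:
    """Find DOIs in free-text citations and turn each into a doi.org link."""
    links = []
    for match in DOI_PATTERN.finditer(citation_text):
        # The suffix class also matches sentence punctuation, so trim it off the end.
        doi = match.group(0).rstrip(".,;:")
        links.append(f"https://doi.org/{doi}")
    return links


reference = ("G. Yancheva, N. R. Nowaczyk et al. (2007) Rock magnetism and X-ray "
             "fluorescence spectrometry analyses on sediment cores of the Lake Huguang "
             "Maar, Southeast China. PANGAEA. doi:10.1594/PANGAEA.587840.")
print(resolvable_links(reference))  # ['https://doi.org/10.1594/PANGAEA.587840']
```

Trimming trailing punctuation is a heuristic: it handles a DOI at the end of a sentence, but would also cut a DOI whose real suffix ends in one of those characters.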
Thanks!
The British Library has a duty of care for the scientific record:
Renewed engagement in STM and Social Sciences
Actively partnering to achieve goals
There is a widening gap between published research and the data that underlies it
DataCite will support researchers by enabling them to locate, identify, and cite research datasets with confidence:
This is the start of a long and open dialogue
There are many open issues to address
We welcome your comments, questions, and ideas!
Email: adam.farquhar {@} bl.uk