![Page 1: Working together to archive the UK Webncdd.nl/site/wp-content/uploads/2014/12/NCDDWorkshop_HHY_Final… · Working together to archive the UK Web Helen Hockx-Yu Head of Web Archiving,](https://reader034.fdocuments.us/reader034/viewer/2022042622/5f8395b8fc786c7c3436df9a/html5/thumbnails/1.jpg)
Working together to archive
the UK Web
Helen Hockx-Yu
Head of Web Archiving, British Library
www.bl.uk 2
The UK Web Domain
4th largest TLD after .com, .de and .net
Over 10 million registered .uk domains
UK organisations also use non-.uk domain
names (e.g. .com or .org) – scale unknown
Non-print Legal Deposit (since April 2013) applies to
the open (freely available) web: .uk and other UK-published (non
.uk) websites, such as .com, .org…
also e-journals, e-books, news web pages and other digital
publications, either by harvesting or mutual agreement on other
delivery methods
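The scope rule on this slide can be sketched in code. This is only an illustration, not the Legal Deposit Libraries' actual crawler configuration: the function name `in_legal_deposit_scope` and the `UK_PUBLISHED_HOSTS` set are hypothetical. In practice, deciding whether a non-.uk site is UK-published needs evidence that a hostname check cannot supply.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained list of non-.uk hosts judged to be UK-published.
UK_PUBLISHED_HOSTS = {"example.com", "example.org"}


def in_legal_deposit_scope(url: str) -> bool:
    """Return True if the URL's host is under .uk or on the UK-published list."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if host.endswith(".uk"):
        return True
    # Match a listed host itself or any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in UK_PUBLISHED_HOSTS)


print(in_legal_deposit_scope("http://www.bl.uk/"))
print(in_legal_deposit_scope("https://example.net/"))
```

The second branch is where the "scale unknown" problem shows up: the .uk test is mechanical, while the list of non-.uk hosts has to be built up by some other means.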
Web Archiving at the British Library
Collect UK digital heritage and provide continued access to archived
web resources
Started web archiving in 2003: Open UK Web Archive
Selective, topical collections and key sites
Consortium sharing infrastructure and development effort;
agreement on who collects what
Curating collections with organisations and researchers
Archiving UK Web for non-print Legal Deposit since April 2013: Legal
Deposit UK Web Archive
Comprehensive national archive with on-site access only
Joint responsibility of six Legal Deposit Libraries (LDLs)
UK Legal Deposit libraries
The British Library
Bodleian Libraries of the
University of Oxford
Cambridge University Library
The National Library of
Scotland
The Library of Trinity College,
Dublin
The National Library of Wales
Non-print Legal Deposit Governance
Governance is important
Representation of stakeholders
Ensure accountability and effectiveness of implementation
Joint decision making
Collaboration
Key groups
Joint Committee for Legal Deposit – collaboration with publishers
e.g. legal deposit content on users’ devices
Legal Deposit Libraries Committee – e.g. notice and take-down policy
Legal Deposit Implementation Group – e.g. collect embedded content
(e.g. CSS, images) regardless of where it is hosted
Web Archiving Collection Prioritisation Group
Collecting strategy for websites
[Diagram labels: Domain Crawl, News, Key sites, Events, Special collection]
Domain crawl:
• Broad sweep of UK domain
• Once or twice a year
Events & key sites and news:
• Events of UK interest
• High value, high impact sites
• National & regional news
Special Collection:
• Focused, thematic collections
• Support priority subjects
The Digital Library System
4 Nodes (Complete Copies)
British Library, St. Pancras
British Library, Boston Spa
National Library of Wales
National Library of Scotland
Additional Access Points
Bodleian Library, Oxford
Cambridge University Library
Trinity College Library, Dublin
Beyond the LDLs
Curating (open access) collections
World War 1 Collection, including 1000+ Centenary Community Projects funded by the
Heritage Lottery Fund
The National Archives
UK Government Web Archive
The Digital Preservation Coalition
Web Archiving Task Force
Technology Watch report
The Web Observatory
Web archives as data on the web
The crowd: nomination form & Twitter
to encourage selection
International collaboration
International Internet Preservation Consortium (IIPC)
49 members worldwide
British Library plays an active role in the IIPC
A founding member
On the Steering Committee
Hosts the IIPC Programme and Communications Officer
Benefits of collaboration
Community of practice
Tools development, eg OpenWayback
Staff training and development
Collaborative collections, eg Olympic Games
Collaboration with researchers
Building collections
Researchers’ involvement in
scoping collections, selecting
and describing websites
Creation of specific, (narrow)
topical collections
Formulating research questions
Brain-storm sessions, workshops, discussion, surveys etc.
Lack of awareness & baseline knowledge
Challenging: you don’t know what you don’t know
Co-development of access services
This is changing how we collect and store data
JISC UK Web Domain dataset (1996-2013)
Collaboration between the Internet Archive (IA), the Joint Information Systems
Committee (JISC) and the British Library
Extracted copies of UK websites from the Internet Archive's collection
1st tranche: 1996 – 2010, 30TB, 2.5 billion URLs
2nd tranche: 2010 – April 2013, 27.5TB, 1.5 billion URLs (estimated)
Research agreement between JISC and IA, upholding IA’s Terms of Use
Access via IA’s Wayback Machine
Allows replication / extraction of derivative or secondary datasets
BL hosts the dataset on behalf of JISC
Data used by research projects
Institute of Historical Research project: Analytical Access to the Domain Dark
Archive (AADDA)
Oxford Internet Institute project: Big data for political science
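As a rough sense of scale, the two tranches listed on this slide can be added up. The sketch below uses only the slide's figures. It assumes decimal terabytes (1 TB = 10**12 bytes), which the slide does not specify, and the per-URL average is indicative only, since the second tranche's URL count is an estimate and the sizes presumably refer to the files as stored.

```python
# Figures as stated on the slide; the second tranche's URL count is an estimate.
TRANCHES = [
    {"years": "1996-2010", "size_tb": 30.0, "urls": 2.5e9},
    {"years": "2010-April 2013", "size_tb": 27.5, "urls": 1.5e9},
]

total_tb = sum(t["size_tb"] for t in TRANCHES)
total_urls = sum(t["urls"] for t in TRANCHES)

# Assumption: decimal terabytes (1 TB = 10**12 bytes).
avg_bytes_per_url = total_tb * 10**12 / total_urls

print(f"Total size: {total_tb} TB")
print(f"Total URLs: {total_urls / 1e9:.1f} billion")
print(f"Average per URL: {avg_bytes_per_url / 1000:.1f} kB")
```

On these figures the dataset comes to 57.5 TB and about 4 billion URLs, or roughly 14 kB per URL.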
Big UK Domain Data for Arts and Humanities
Funded by the UK Arts and Humanities Research Council as one of
the 21 “Big Data” projects
Collaboration between the Institute of Historical Research, Oxford
Internet Institute, British Library and Aarhus University
Develop theoretical and methodological framework for the study of
web archives
Build on AADDA: researchers and the BL co-produce access tools
A major study of the history of UK web space from 1996 to 2013 +
sub-projects covering a range of disciplines
Also an online training course and peer-reviewed journal articles.
Web archiving researcher bursaries
Query building
Corpus formation and
handling
Annotation and curation
In-corpus analysis
Whole-dataset analysis
Shine
What’s in it for us?
Helps researchers understand the value of web archives and explore new
ways of using these for scholarly research
Allows BL to obtain hands-on experience with indexing and processing
large scale web archive datasets
(Prototypes) analytics and visualisations can be applied to our own Legal
Deposit collection
Enables BL to participate in various UK, European and international
projects
Helps curators understand characteristics of large scale digital corpora
Improve the way we collect and store web archives
Evolution of the UK web (2004-2013)
Memento service
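The slide names the Memento service without further detail. Under the Memento protocol (RFC 7089), a client asks for the archived version of a resource closest to a given date by sending an `Accept-Datetime` request header. The sketch below only builds that header. The helper name `accept_datetime_header` and the date are illustrative, and no particular service endpoint is assumed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict[str, str]:
    """Build the Accept-Datetime request header defined by RFC 7089."""
    # The header value uses the HTTP-date format and is always expressed in GMT.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# Arbitrary example date, chosen only for illustration.
headers = accept_datetime_header(datetime(2013, 4, 1, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers with a request for the original URL to a Memento TimeGate, which responds by pointing to the capture nearest the requested date.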
The “access” paradoxes
Completeness versus openness of web archives
Some countries don’t have Legal Deposit
Legal Deposit national collections have restricted access
Document-centred versus data-driven
Pre-selected or defined collections are not relevant to all researchers;
it is difficult to find relevant content in large-scale web archives.
Arbitrary (national) boundaries are often irrelevant to the research question,
but most heritage institutions operate within certain geographical
areas
…
Web archives for reference AND for
analytics
Base-line knowledge self-explanatory
Focus on national events for curated
collections; provide means to assemble
research corpora
Link to what we do not have
Offer a bag of tools to support scholarly use
A way forward
Exploit open licences, changes to copyright law
Online access to selected websites, metadata and secondary datasets
The British Library Collection Development Policy for websites
Lobbying – review of Non-print Legal Deposit Regulations in 2018