netarkivet
RESAW seminar, Dec 2-3, 2013
Day 1
Who are we today
□ Birgit N. Henriksen, head of digital preservation, KB
□ Bjarne Andersen, head of digital preservation, SB
□ Eld Zierau, developer and researcher, KB
□ Ditte Laursen, curator and researcher, SB
□ Henrik Smith-Sivertsen, researcher, KB
Organization
□ a virtual center (SB/KB – IT development, IT operation, Collection department)
□ steering committee
□ daily manager
□ editorial advisory board
Collection policy
□ Legal deposit law 2005: “Materials made public via electronic communication network”
□ Danish materials
  ■ websites on the .dk TLD
  ■ websites aimed at a Danish audience / written in Danish
  ■ websites about Danish people (Hans Christian Andersen etc.)
  ■ more or less any site of interest to Denmark
Collection strategies
□ 4 strategies
  ■ 4 annual snapshots (KB)
    □ ensure the wide picture
  ■ Selective harvesting of 80 domains (SB)
    □ ensure coverage of frequently updated websites
  ■ Event harvesting of 2-3 national events per year (KB/SB)
    □ 2013: Teachers’ lockout, International Melodi Grandprix (Eurovision Song Contest), Danish local elections, Election of the pope (IIPC) …
  ■ Special harvests (KB/SB), e.g. WikiLeaks, kriseinfo.dk, nyalliance.dk …
Collection strategies
[Figure: the four strategies (snapshot, selective, event, special) plotted by coverage against time]
Access
□ The archive contains sensitive personal data; therefore the entire archive is considered sensitive
  ■ only researchers, including PhD students, can be granted access
    □ if the research involves sensitive personal data, the Data Protection Agency assesses the application
    □ if not, the library assesses the application
    □ the Copyright Act defines research as work at PhD level and above
    □ the Privacy Act defines research as something with a ‘scientific purpose’
□ Netarkivet is working on wider access
  ■ for students and for the general public
  ■ small corpus
Use of the archive
□ Only a handful of active researchers
  ■ no user-friendly way of accessing the archive
  ■ lack of knowledge about the archive
  ■ a new kind of data source
□ Research projects (examples)
  ■ dr.dk’s history 1996-2006
  ■ the history of internet newspapers
  ■ the mediation of art in the network society
  ■ the digital music revolution: the case of Sys Bjerre
  ■ Danish parliamentary elections 2007-2011 …
Technical setup
□ NetarchiveSuite (open source)
□ 44 servers, 260 running Java applications
□ Wayback Machine
□ batch jobs
□ full-text indexing experiments
□ ARC/WARC
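The slide lists batch jobs over ARC/WARC files without describing them. As a rough illustration of what such a job does, here is a minimal Python sketch that walks an uncompressed WARC stream and tallies records by their `WARC-Type` field. It is not NetarchiveSuite's own batch API (which is Java); the names `iter_warc_headers` and `count_record_types` are hypothetical, and the parser deliberately ignores folded header lines, digests and the older ARC format.

```python
import io
from collections import Counter
from typing import BinaryIO, Dict, Iterator


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the named fields of each record in an uncompressed WARC stream.

    Field names are lower-cased because WARC field names are case-insensitive.
    Sketch only: folded (multi-line) header values are not handled.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        fields: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # a blank line (or EOF) ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            fields[name.strip().lower()] = value.strip()
        # Skip the record block; Content-Length gives its size in bytes.
        stream.read(int(fields.get("content-length", "0")))
        yield fields


def count_record_types(stream: BinaryIO) -> Counter:
    """A toy 'batch job' (hypothetical): tally records by WARC-Type."""
    return Counter(f.get("warc-type", "unknown") for f in iter_warc_headers(stream))


if __name__ == "__main__":
    sample = (
        b"WARC/1.0\r\nWARC-Type: warcinfo\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
        b"WARC/1.0\r\nWARC-Type: response\r\nContent-Length: 3\r\n\r\nabc\r\n\r\n"
    )
    print(count_record_types(io.BytesIO(sample)))
```

For gzip-compressed `.warc.gz` files, `gzip.open(path, "rb")` should give a stream this sketch can read, since Python's `gzip` module decompresses concatenated per-record members in sequence; a production job would also need to handle ARC files and malformed records.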
Some numbers
□ Total: 414 TB, 13 billion objects
  ■ Snapshots: 353 TB
  ■ Selective: 47 TB
  ■ Events: 13 TB
□ One snapshot: approx. 30 TB (2006: 9 TB)
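A few ratios follow directly from these figures. The Python sketch below derives them; it assumes "TB" means decimal terabytes (10^12 bytes), which the slide does not state, and the names `summarize`, `BREAKDOWN_TB` etc. are illustrative only.

```python
# Figures copied from the "Some numbers" slide. Assumption: "TB" is read
# as decimal terabytes (10**12 bytes); the slide does not say.
TOTAL_TB = 414
TOTAL_OBJECTS = 13_000_000_000
BREAKDOWN_TB = {"snapshots": 353, "selective": 47, "events": 13}
SNAPSHOT_TB_2006 = 9
SNAPSHOT_TB_NOW = 30  # "approx. 30 TB" per snapshot


def summarize() -> dict:
    """Derive a few ratios from the slide's figures."""
    listed = sum(BREAKDOWN_TB.values())
    return {
        # Sum of the three listed categories (snapshots + selective + events).
        "listed_tb": listed,
        # Part of the stated total that the three categories do not account for.
        "unlisted_tb": TOTAL_TB - listed,
        # Each category's share of the stated total.
        "shares": {name: tb / TOTAL_TB for name, tb in BREAKDOWN_TB.items()},
        # Mean object size in bytes, under the decimal-TB assumption.
        "mean_object_bytes": TOTAL_TB * 10**12 / TOTAL_OBJECTS,
        # Growth of a single snapshot between 2006 and the time of the talk.
        "snapshot_growth": SNAPSHOT_TB_NOW / SNAPSHOT_TB_2006,
    }


if __name__ == "__main__":
    s = summarize()
    print(f"listed categories: {s['listed_tb']} TB of {TOTAL_TB} TB")
    for name, share in s["shares"].items():
        print(f"  {name}: {share:.1%}")
    print(f"mean object size: {s['mean_object_bytes'] / 1000:.1f} kB")
    print(f"snapshot growth since 2006: x{s['snapshot_growth']:.1f}")
```

The three listed categories sum to 413 TB, 1 TB short of the stated total; the slide does not say what accounts for the difference. Snapshots make up roughly 85% of the archive, the mean object is roughly 32 kB, and a single snapshot grew by a factor of about 3.3 since 2006.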
Current challenges
□ wider access
□ better access (free-text search)
□ inclusion of older net collections
□ collection of websites with restricted access
□ advanced websites, e.g. with sound, video or live interaction (chat, virtual worlds …)
□ electronic communication networks ≠ the web
□ long-term preservation
□ documentation
2013-2014
□ Tools
  ■ search: free-text indexes
  ■ harvesting: the use of Heritrix3 and the Live Archiving Proxy
□ Infrastructure
  ■ web archives as part of a research infrastructure
  ■ access to archived material using persistent identifiers
□ Archiving methods
  ■ capturing online games
  ■ automatic methods to locate relevant Danish web materials outside the Danish TLD .dk
Ongoing activities related to RESAW’s topics
□ API improvement / so-called service layer
□ corpus building
□ documentation
□ full-text search
□ statistics
□ legal aspects (e.g. broader access, data mining policy)
What will the RESAW project be in 10 years?
□ a very strong partner to IIPC
□ common infrastructure across borders (ERIC / ESFRI status)
□ coordinated European collection building