University Archives University Archives & Archive-It WebCom 2011-03-29.
University Archives
University Archives & Archive-It
WebCom 2011-03-29
The Duke University Archives is responsible for the collection and management of records of
enduring value created by the University's administrative offices and academic units.
The Archives also acquires records of student, faculty and staff organizations, selected
personal papers, and books, images, audio, and other documentation about Duke
University.
Archive-It Service Agreement
• $6,000 Subscription Fee
• 8,000,000 URLs
• 0.75 TB storage
• 1-2 Active Collections
• Maximum 200 Active Seeds
• Collection & Crawl Interface
• Search Portal
• All data will be copied to the Internet Archive’s Wayback Machine on contract termination
Front Page
Collection Page
Page Capture Index
http://wayback.archive-it.org/1858/*/http://news.duke.edu/
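The capture-index URL appears to follow the pattern <host>/<collection id>/*/<original URL>, where 1858 is presumably the Archive-It collection number and * asks for a list of every capture of the page. A minimal Python sketch for building such URLs, assuming the pattern holds for other collections and pages, and assuming (as in other Wayback-style replay URLs) that a single capture is addressed by replacing * with a 14-digit YYYYMMDDhhmmss timestamp. The function names are hypothetical.

```python
from datetime import datetime

# Host and collection id taken from the example URL on the slide.
WAYBACK_HOST = "http://wayback.archive-it.org"
DUKE_COLLECTION_ID = 1858


def capture_index_url(collection_id: int, original_url: str) -> str:
    """URL listing every capture of original_url ("*" in place of a timestamp)."""
    return f"{WAYBACK_HOST}/{collection_id}/*/{original_url}"


def capture_url(collection_id: int, original_url: str, when: datetime) -> str:
    """URL addressing a capture by a 14-digit YYYYMMDDhhmmss timestamp (assumed format)."""
    return f"{WAYBACK_HOST}/{collection_id}/{when:%Y%m%d%H%M%S}/{original_url}"


# prints http://wayback.archive-it.org/1858/*/http://news.duke.edu/
print(capture_index_url(DUKE_COLLECTION_ID, "http://news.duke.edu/"))
```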
Page View
Priorities this past year…
• Institutes & Student Groups
– Have a relatively short life
– The Archives rarely receives records transfers from these groups
• Units with existing relationships
• Opportunities as they arise
Crawl of duke.edu
• Started: March 4, 2011, 4:34:20 PM
• Completed: March 7, 2011, 5:46:52 PM
• Average Doc Rate: 13.66 URLs/sec
• Average KB Rate: 1,646 KB/s
• Total Documents: 3,594,845
• Total Data: 413.2 GB
• Duke Domains Found: 1,698
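The average rates can be cross-checked against the start and end times, and the totals can be set against the allowances in the service agreement. A minimal Python sketch, assuming that "GB" and "KB" are binary units (1 GB = 1,048,576 KB, 1 TB = 1,024 GB) and that every archived document counts once against the 8,000,000-URL allowance; neither assumption is stated on the slides.

```python
from datetime import datetime

# Figures copied from the crawl-report slide.
START = datetime(2011, 3, 4, 16, 34, 20)
END = datetime(2011, 3, 7, 17, 46, 52)
TOTAL_DOCUMENTS = 3_594_845
TOTAL_DATA_GB = 413.2

# Figures copied from the service-agreement slide.
URL_ALLOWANCE = 8_000_000
STORAGE_ALLOWANCE_GB = 0.75 * 1024  # assumption: 1 TB = 1,024 GB

KB_PER_GB = 1024 * 1024  # assumption: binary units


def crawl_summary() -> dict:
    """Recompute wall-clock averages and the share of each allowance used by this crawl."""
    seconds = (END - START).total_seconds()
    return {
        "elapsed_seconds": seconds,
        "docs_per_second": TOTAL_DOCUMENTS / seconds,
        "kb_per_second": TOTAL_DATA_GB * KB_PER_GB / seconds,
        "share_of_url_allowance": TOTAL_DOCUMENTS / URL_ALLOWANCE,
        "share_of_storage": TOTAL_DATA_GB / STORAGE_ALLOWANCE_GB,
    }


for name, value in crawl_summary().items():
    print(f"{name}: {value:,.3f}")
```

Over the 263,552-second (73 h 12 min 32 s) wall-clock span this gives about 13.64 documents per second and about 1,644 KB/s, close to but not exactly the reported 13.66 and 1,646; the crawler presumably averages over its own measured run time. Under the stated assumptions, one full crawl of duke.edu used roughly 45% of the URL allowance and about 54% of the storage.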
Issues
• Capturing the “Look & Feel” of a site
• Crawler Traps (e.g. calendars)
• Junk URLs (e.g. bad CMS link generation)
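Crawler traps and junk URLs share a symptom: the crawler keeps finding "new" URLs that add nothing, such as one calendar page for every day of every year, or links that nest the same path segment again and again. A minimal Python sketch of heuristics that could flag such URLs; the thresholds, the regular expression, and the function name looks_like_trap are hypothetical illustrations, not Archive-It's actual scoping rules.

```python
import re
from urllib.parse import urlsplit

# Hypothetical thresholds chosen for illustration only.
MAX_PATH_SEGMENTS = 15
MAX_SEGMENT_REPEATS = 2

# Hypothetical pattern for calendar-style URLs, e.g. /calendar/2011/03/29 or ?date=2011-03-29.
CALENDAR_PATTERN = re.compile(
    r"(calendar|events?)/\d{4}/\d{1,2}|[?&](date|month|year)=\d{4}",
    re.IGNORECASE,
)


def looks_like_trap(url: str) -> bool:
    """Return True if the URL matches one of the illustrative trap heuristics."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Very deep paths can be a sign of runaway link generation.
    if len(segments) > MAX_PATH_SEGMENTS:
        return True

    # The same segment repeated, e.g. /news/news/news/..., can indicate broken relative links.
    if any(segments.count(seg) > MAX_SEGMENT_REPEATS for seg in set(segments)):
        return True

    # Calendar pages: one URL per day, month, or year.
    target = parts.path + ("?" + parts.query if parts.query else "")
    return CALENDAR_PATTERN.search(target) is not None
```

Rules like these would normally be expressed in the crawl's scope settings or in robots.txt; the function only makes the patterns concrete.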
Robots Exclusions
We do want:
• Look & Feel
– JavaScript
– CSS
– Images
• Policy
• Publications
• Events
• RSS
We (usually) don’t want:
• Every day of every year
• Your taxonomies
• Administrative pages
• Maps/GIS
• “Personal” pages
User-agent: archive.org_bot
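The wish list above can be expressed as a robots.txt section addressed to the archive.org_bot user agent, leaving the JavaScript, CSS, and images needed for look and feel unblocked. A minimal sketch that checks a hypothetical robots.txt with Python's standard urllib.robotparser; the disallowed paths and the helper archive_bot_may_fetch are invented examples, and the crawler's own robots.txt parser may differ from Python's in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a campus site; all paths are illustrative.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /calendar/
Disallow: /taxonomy/
Disallow: /admin/
Disallow: /map/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def archive_bot_may_fetch(url: str) -> bool:
    """True if the rules above allow archive.org_bot to fetch the URL."""
    return parser.can_fetch("archive.org_bot", url)


for url in (
    "http://www.example.edu/calendar/2011/03/29",
    "http://www.example.edu/policy/handbook.html",
    "http://www.example.edu/css/site.css",
):
    print(url, archive_bot_may_fetch(url))
```

Because archive.org_bot has its own section, it ignores the generic `User-agent: *` rules; other crawlers are blocked only from /admin/.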
Contact me:
Seth Shaw
Electronic Records Archivist
Duke University Archives
684.6181