F1 hadar miller__israeli_internet_archive-nli
“ArchioNet” Israeli Internet Domain Archive
Agenda
• NLI Digital Library Infrastructure
• “ArchioNet” Project Scope
• Technical Issues
• The Project in Numbers
• Legislation
• What’s Next
NLI Digital Library Infrastructure
“ArchioNet” Project Scope
• Why do we need this project?
• What do we harvest?
  • Phase A: .IL websites
  • Phase B: Hebrew-character sites
• How do we enable access?
  • Phase A: “Wayback Machine” in NLI only, “ArchioNet” only.
  • Phase B: Over the web, cross-reference discovery.
• When did we start?
  • Phase A: 2 full crawls annually, started September 2013.
  • Phase B: 4 additional subject-based crawls annually.
• Where do we execute the harvest?
  • Phase A: NLI with Internet Archive.
  • Phase B: NLI infrastructure.
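The two harvest phases above could be expressed as a simple scope test. The following is only a minimal sketch: it assumes Phase A means hosts under the .il top-level domain and Phase B means pages whose text contains at least one character from the Unicode Hebrew block (U+0590 to U+05FF). The function names and both rules are illustrative assumptions, not the project's actual selection logic.

```python
from urllib.parse import urlsplit


def in_phase_a_scope(url: str) -> bool:
    """Phase A (assumed rule): the host is under the .il top-level domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host.endswith(".il")


def contains_hebrew(text: str) -> bool:
    """Phase B (assumed rule): the text has at least one Hebrew-block character."""
    return any("\u0590" <= ch <= "\u05ff" for ch in text)


def in_scope(url: str, page_text: str = "") -> str:
    """Return which phase would cover the page, or 'out of scope'."""
    if in_phase_a_scope(url):
        return "phase A"
    if contains_hebrew(page_text):
        return "phase B"
    return "out of scope"
```

A real scope rule would probably need a threshold (for example, a minimum share of Hebrew text) rather than a single character, so treat the Phase B test as a placeholder.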
Technical Issues
• Which crawler (and version) to use?
• Cataloguing and search tool
• What to harvest?
• Seeds are needed
• Depth of a site
• Robots.txt
• The deep web
• How to store and preserve a WARC file
• Virus detection
• System architecture
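Three of the issues listed above (seeds, site depth, and robots.txt) interact in how a crawl is planned. The sketch below illustrates that interaction with a breadth-first walk over a hypothetical in-memory link graph, so it runs without network access. The site, the robots.txt content, the user-agent name, and the function `plan_crawl` are all assumptions made for illustration; the slides do not say which crawler or settings the project uses.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site: page -> links found on that page.
LINKS = {
    "http://example.co.il/": [
        "http://example.co.il/a",
        "http://example.co.il/private/x",
    ],
    "http://example.co.il/a": ["http://example.co.il/a/b"],
    "http://example.co.il/a/b": ["http://example.co.il/a/b/c"],
}

# Hypothetical robots.txt content for the site above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def plan_crawl(seeds, max_depth, robots_txt, user_agent="demo-crawler"):
    """Breadth-first walk from the seeds, honouring robots.txt and a depth limit."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    visited = []
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # excluded by robots.txt
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


print(plan_crawl(["http://example.co.il/"], max_depth=2, robots_txt=ROBOTS_TXT))
```

With a depth limit of 2 the walk reaches the seed, `/a`, and `/a/b`, skips `/private/x` because of the Disallow rule, and never reaches `/a/b/c`. Whether an archive should honour robots.txt at all is a policy question the slides raise but do not answer.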
The Project in Numbers
• ~220K websites
• 0.5 GB per site
• ~100 TB per harvest
• Average page lifetime: ~100 days
• 2 full harvests annually
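The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers on the slide; the decimal unit conversion (1 TB = 1000 GB) and the yearly total are assumptions added here, and the yearly total further assumes both harvests are the same size with no deduplication or compression.

```python
SITES = 220_000          # ~220K websites (from the slide)
GB_PER_SITE = 0.5        # 0.5 GB per site (from the slide)
HARVESTS_PER_YEAR = 2    # 2 full harvests annually (from the slide)
GB_PER_TB = 1000         # assumption: decimal units


def harvest_size_tb(sites=SITES, gb_per_site=GB_PER_SITE):
    """Estimated size of one full harvest in terabytes."""
    return sites * gb_per_site / GB_PER_TB


def yearly_size_tb(harvests=HARVESTS_PER_YEAR):
    """Estimated raw volume per year, assuming equal-sized harvests."""
    return harvest_size_tb() * harvests


print(harvest_size_tb())  # 110.0, in line with the slide's "~100 TB per harvest"
print(yearly_size_tb())   # 220.0 under the assumptions stated above
```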
Legislation
• Can NLI harvest?
• Where is it accessible?
• Intellectual property
• What can/should we block?
Thank You