Lsr vpresntation
Uploaded by jarcherumd
Transcript of Lsr vpresntation
![Page 1: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/1.jpg)
Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources
Joanne Archer and John Schalow
University of Maryland Libraries
![Page 2: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/2.jpg)
Jargon
Crawler
Web Harvesting
Seed
Harvest
Crawl
![Page 3: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/3.jpg)
Wayback Machine
![Page 4: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/4.jpg)
Options for Web Harvesting
In-House Program
e.g. Pandora, Web Curator Tool
Pro: flexibility
Con: $$$
Off-the-Shelf Software
e.g. HTTrack, Adobe Web Capture
Pro: inexpensive
Con: not scalable
Third-Party Subscription
e.g. Web Archiving Service, Archive-It
Pro: ease of use
Con: $
![Page 5: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/5.jpg)
Key Questions for Harvesting Projects
uniqueness
ephemerality
research value
harvest frequency
scope
![Page 6: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/6.jpg)
Maryland’s Pilot Harvests (2008-2010)
Historic Preservation
Maryland State Documents
![Page 7: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/7.jpg)
Why harvest these areas?
• Collections are unique
• Builds on existing strengths in print collections
• Large amount of material migrating to the web
![Page 8: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/8.jpg)
Key Questions for Harvesting Projects
uniqueness
ephemerality
research value
harvest frequency
scope
![Page 9: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/9.jpg)
Harvesting
![Page 10: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/10.jpg)
Harvesting Challenges:
• JavaScript
• Streaming media
• Form- and database-driven content
• Password-protected sites
• robots.txt files
• Multiple hosts/subdomains
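To illustrate the robots.txt challenge: a site's robots.txt file can tell a crawler not to fetch certain paths, so a harvest may come back incomplete. The sketch below is not how any particular harvesting service works; it only shows the general idea using Python's standard `urllib.robotparser`. The robots.txt content, the `example-harvester` user agent, and the `www.example.org` URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Hypothetical candidate pages on a site being harvested.
candidates = [
    "http://www.example.org/index.html",
    "http://www.example.org/private/report.pdf",
]
permitted = allowed_urls(SAMPLE_ROBOTS_TXT, "example-harvester", candidates)
print(permitted)
```

Anything under `/private/` is excluded here, which is the kind of gap a curator would need to notice when reviewing a harvest.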
![Page 11: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/11.jpg)
Single host = www.preservemd.org
Multiple hosts = www.umd.edu
www.lib.umd.edu
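The single-host versus multiple-host distinction above can be sketched in code. Assuming a crawl is scoped by exact host name (an assumption for this example; real crawlers offer various scoping rules), pages on www.lib.umd.edu fall outside a crawl seeded only with www.umd.edu. The function name `in_scope` and the URL paths are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(url, seed_hosts):
    """Return True if the URL's host exactly matches one of the seed hosts."""
    host = (urlsplit(url).hostname or "").lower()
    return host in seed_hosts


# Hypothetical page URLs; only the host names come from the slide.
links = [
    "http://www.umd.edu/about.html",
    "http://www.lib.umd.edu/special/index.html",
]

# Seeding with one host misses content that lives on the second host.
single_seed = {"www.umd.edu"}
captured_single = [u for u in links if in_scope(u, single_seed)]

# Adding the second host as a seed brings that content into scope.
both_seeds = {"www.umd.edu", "www.lib.umd.edu"}
captured_both = [u for u in links if in_scope(u, both_seeds)]

print(captured_single)
print(captured_both)
```

A site such as www.preservemd.org that lives on one host needs only a single seed, while a multi-host site needs each host listed (or a broader scoping rule) to be captured fully.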
![Page 12: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/12.jpg)
End-User Access
![Page 13: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/13.jpg)
End-User Access
collection note
subject heading
general material designation
URLs
uniform title
![Page 14: Lsr vpresntation](https://reader035.fdocuments.us/reader035/viewer/2022062313/558921f9d8b42a8c508b46f4/html5/thumbnails/14.jpg)
Conclusions
Challenges
• Start-up costs
• What to collect
• Metadata creation
But we are well prepared to meet the challenges.