Separating the wheat from the chaff: Identifying key elements in the NLA .au domain harvest
Preservation for Ongoing Accessibility: research group
Professor Ross Harvey
Dr Bob Pymm
Dr Anne Lloyd
Geoff Fellows
Jake Wallis
Pandora - http://pandora.nla.gov.au
• NLA solution to website preservation
• Archive of over 1.7 terabytes of data
• selective - identifies specific sites for harvest and gains permission to archive
Internet Archive - http://www.archive.org/
• Automated
• Harvests ‘the web’
• issues?
– cost
– reliability of the crawl, e.g. deep web
.au Harvest by Internet Archive
• first run in 2005, producing 6.9 terabytes of data and 185 million unique files
• Issues?
– difficulties with certain file types
– password-protected sites
– difficulty in accessing the ‘deep’ web
.au Harvest
• September 2006 – more sophisticated crawl
• 19 terabytes of data, 596 million files
• predominant dataset for POA group
Research potential?
• digital preservation
• Australian digital culture
3 broad questions
• What are the contents of the harvests?
• How can access be provided to this content?
• What is the value of the domain harvests in relation to the NLA’s overall web preservation interests?
Blogs
• low skill threshold technology
• as barometer of engagement
• social space
• catalyst for online community
• a new and important collecting point for digital cultural heritage
Archiving and preserving blogs
• how to identify Australian-specific material?
• what to capture
– selection criteria?
– linked material?
• frequency of capture to ensure accurate representation
• provision of access to harvested blog content
Aspirations
• a conceptual framework for studies in digital anthropology
• a broadening of voices within the Australian public sphere
Questions/comments?