ARCHIVING PRESERVING WEB CONTENT and OCUL Archi… · ARCHIVE-IT: A WEB ARCHIVING SERVICE A...

Post on 14-Jun-2020


ARCHIVING & PRESERVING WEB CONTENT

THE INTERNET ARCHIVE

What? A non-profit digital library and archive

Where? San Francisco, CA

When? Who? Founded in 1996 by Brewster Kahle

How? Officially designated a library by the state of California in 2007

THE WAYBACK MACHINE

Online: https://archive.org/web/

The largest publicly available web archive in existence.

> 280 billion pages

> 100 million websites

> 150 languages

~ 1 billion URLs added per week
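
To make the idea of looking up an archived page concrete, here is a minimal Python sketch. It assumes the public Wayback availability endpoint (https://archive.org/wayback/available) and the usual web.archive.org/web/<timestamp>/<url> address pattern. The helper names and the sample response are illustrative assumptions, not part of the original slides, and the block makes no network call.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint for Wayback availability lookups.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the closest archived URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def wayback_url(url, timestamp):
    """Build a direct Wayback address for a page at (or near) a timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Illustrative response only; not a real capture record.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130601000000/http://archive-it.org/",
            "timestamp": "20130601000000",
            "status": "200",
        }
    }
})

print(availability_query("archive-it.org", "20130601"))
print(closest_snapshot(sample))
```

In practice the query URL would be fetched (for example with urllib.request.urlopen) and the response body passed to closest_snapshot.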

WEB ARCHIVING

What is a web archive?

A collection of archived URLs grouped by theme, event, subject area, or web address.

A web archive contains as much as possible from the original resources and documents their change over time. It is a priority to recreate the same experience a user would have had if they had visited the live site on the day it was archived.

THE LIFESPAN OF A WEBSITE

How long does a website last?

In general, a typical web page can be expected to last ~90-100 days before changing, moving, or disappearing completely.

> In 2013, our colleagues at Old Dominion University determined that over 10% of event-related content posted to social media platforms is lost after one year.

> In 2014, a study by UCLA determined that 7-in-10 scholarly articles that include citations with hyperlinks suffer from reference rot.

ARCHIVE-IT: A WEB ARCHIVING SERVICE

A web-based application launched in 2006 that allows users to create, manage, access and store collections of web-based digital content.

A fully hosted solution, including access and storage.

A suite of tools for selecting, scoping, and cataloging content.

Provides the ability to capture content using 10 different frequencies.

Archived web content includes: HTML, text, videos, audio, social media, PDF, images, password-protected content, static databases and newspapers.

Browse archived content 24 hours after a capture is complete; full text search is available within 7 days.

Private access options are available.

HOW IS ARCHIVE-IT DIFFERENT THAN THE GENERAL/GLOBAL WAYBACK?

Archive-It:

> Focused collections

> Control over scope and frequency

> Technical support

> All content and metadata indexed for search

> Archived data shipped/downloaded

> Private access options

> Available 24 hours after capture

> Subscription service

General/Global Wayback:

> One collection

> Snapshot

> Automated

> Search and cataloging not available

> Shipping/download not available

> Public access only

> Access varies

> Absolutely free

WHAT OUR PARTNERS ARE COLLECTING...

ARCHIVE-IT USE CASES

Create a thematic/topical web archive on a specific subject or event

> Often related to traditional collecting activity around the same topical focus

> Capture spontaneous events

> Document different perspectives and social commentaries

Fulfill a mandate to capture/preserve evolving web history

> Construct a historical record of an institution or individual’s web/social media presence

> Support an electronic records system to meet records retention requirements

> Collect publications/documents that are no longer in print form

Closure crawls

> Document a public institution’s presence on the web before it changes or closes

UNIVERSITY OF ALBERTA: ALBERTA FLOODS JUNE 2013

Use Case: Archive web content before, during, and after the 2013 Alberta floods

> Personal and institutional blogs

> News articles

> Institutional websites

WILFRID LAURIER UNIVERSITY

Use Cases:

> Document the university’s social media presence

> Archive the university’s web presence in order to meet required records retention mandates.

ACCESS TO COLLECTIONS

Partners:

> Can view through a private web application with login/password

General Public:

> Can view from Archive-It’s website: http://www.archive-it.org/

> Search Archive-It data and metadata from institutional domains

> Landing Pages: branded pages that link back to Archive-It hosted data

EXAMPLES OF ORGANIZATIONS’ LANDING PAGES

Library of Virginia

University of Texas at Austin

PRIVATE ACCESS OPTIONS

> Entire account

> Individual collections

> Specific URLs

> IP address

STORAGE AND PRESERVATION

Storage:

> 2 copies (primary & backup) of archived data are stored at San Francisco data centers.

> A third copy is transferred to the General Archive.

> A copy of archived data can be shipped on a hard drive

> Partners can always download their archived data from Internet Archive’s servers.

Preservation partnerships:

> 2008: LOCKSS

> 2013: DuraCloud

> 2017: Multiple in development...

DATA REPOSITORY

KEY ARCHIVE-IT FEATURES

> Different levels of access for account users

> Ten available capture frequencies (from twice daily to yearly)

> Browse collections by URL, search by full-text and metadata

> Detailed post crawl reports for analysis

> Quality Assurance (QA) tools

> Online Help Center and User Manual

> Web Archivists and technical support

> Hosting, access, and redundant storage

SUBSCRIPTION MODEL

> Annual, renewable subscription

> Subscription levels vary by the amount of data archived

> Factors include: type and number of sites, how large they are, and how frequently they are archived

> All subscriptions include hosting, access, and perpetual storage (primary and backup)
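
As a rough illustration of how the factors above combine, the following Python sketch estimates a yearly data volume from a list of sites. Every name and number in it is a hypothetical assumption for illustration. It is not Archive-It's pricing method, and it does not model any de-duplication the service may apply.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    """A site to archive (all fields are hypothetical planning inputs)."""
    name: str
    size_gb: float          # assumed average size of one capture, in GB
    captures_per_year: int  # assumed number of captures per year


def estimated_annual_gb(seeds):
    """Sum size x frequency over all sites for a rough yearly data volume."""
    return sum(s.size_gb * s.captures_per_year for s in seeds)


# Hypothetical example: a small blog captured monthly, a news site weekly.
seeds = [
    Seed("campus blog", size_gb=0.5, captures_per_year=12),
    Seed("local news site", size_gb=2.0, captures_per_year=52),
]

print(estimated_annual_gb(seeds))
```

An estimate like this is only a starting point for choosing a subscription level; actual captured volume depends on scoping rules and on how much each site changes between captures.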

TIME COMMITMENTS

[Chart: Staff dedicated to web archiving program. Source: NDSA, Web Archiving in the United States: A 2016 Survey. Segment values: 58%, 13%, 5%, 5%, 19%; segment labels not recoverable.]

THE WEB ARCHIVING LIFE CYCLE

http://www.archive-it.org/publications

COMPLIMENTARY TRIAL

Create a collection of up to 5 websites, archive content, and view the results!

ARCHIVE-IT WEB APPLICATION DEMO

LEARN MORE

Check out our blog: www.archive-it.org/blog

Follow us on Twitter: @archiveitorg

Like us on Facebook: https://www.facebook.com/ArchiveIt

Questions? ait@archive.org

THANK YOU!