ARCHIVING PRESERVING WEB CONTENT and OCUL Archi… · ARCHIVE-IT: A WEB ARCHIVING SERVICE A...

ARCHIVING & PRESERVING WEB CONTENT

THE INTERNET ARCHIVE

What? A non-profit digital library and archive

Where? San Francisco, CA

When? Who? Founded in 1996 by Brewster Kahle

How? Officially designated a library by the state of California in 2007

THE WAYBACK MACHINE

Online: https://archive.org/web/

The largest publicly available web archive in existence.

> 280 billion pages

> 100 million websites

> 150 languages

~ 1 billion URLs added per week
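As an aside on how archived pages in the Wayback Machine can be looked up programmatically, here is a minimal Python sketch built around the Internet Archive's public availability endpoint (`https://archive.org/wayback/available`). It only builds the query URL and parses a response; no network call is made. The helper names `availability_query` and `closest_snapshot` are hypothetical, and `SAMPLE` is a hand-written response that illustrates the expected JSON shape, not a live result.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the query URL asking for the capture of `url` closest to `timestamp`.

    `timestamp` is an optional string such as YYYYMMDD or YYYYMMDDhhmmss.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) for the closest capture, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Hand-written sample response (illustrative only, not fetched from the service).
SAMPLE = (
    '{"archived_snapshots": {"closest": {"available": true, '
    '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
    '"timestamp": "20130919044612", "status": "200"}}}'
)

print(availability_query("example.com", "20130101"))
print(closest_snapshot(SAMPLE))
```

To query the live service, the URL returned by `availability_query` could be fetched with any HTTP client and the response body passed to `closest_snapshot`.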

WEB ARCHIVING

What is a web archive? A collection of archived URLs grouped by theme, event, subject area, or web address.

A web archive contains as much as possible from the original resources and documents their change over time. It is a priority to recreate the same experience a user would have had if they had visited the live site on the day it was archived.

THE LIFESPAN OF A WEBSITE

How long does a website last?

In general, a typical web page can be expected to last ~90-100 days before changing, moving, or disappearing completely.

> In 2013, our colleagues at Old Dominion University determined that over 10% of event related content posted to social media platforms is lost after one year.

> In 2014, a study by UCLA determined that 7-in-10 scholarly articles that include citations with hyperlinks suffer from reference rot.

ARCHIVE-IT: A WEB ARCHIVING SERVICE

A web-based application launched in 2006 that allows users to create, manage, access and store collections of web-based digital content.

A fully hosted solution, including access and storage.

A suite of tools for selecting, scoping, and cataloging.

Provides the ability to capture content using 10 different frequencies.

Archived web content includes: html, text, videos, audio, social media, PDF, images, password protected content, static databases and newspapers.

Browse archived content 24 hours after a capture is complete; full text search is available within 7 days.

Private access options are available.

HOW IS ARCHIVE-IT DIFFERENT THAN THE GENERAL/GLOBAL WAYBACK?

Archive-It:

> Focused collections

> Control over scope and frequency

> Technical support

> All content and metadata indexed for search

> Archived data shipped/downloaded

> Private access options

> Available 24 hours after capture

> Subscription service

General/global Wayback:

> One collection

> Snapshot

> Automated

> Search and cataloging not available

> Shipping/download not available

> Public access only

> Access varies

> Absolutely free

WHAT OUR PARTNERS ARE COLLECTING...

ARCHIVE-IT USE CASES

Create a thematic/topical web archive on a specific subject or event

> Often related to traditional collecting activity around the same topical focus

> Capture spontaneous events

> Document different perspectives and social commentaries

Fulfill a mandate to capture/preserve evolving web history

> Construct a historical record of an institution or individual’s web/social media presence

> Support an electronic records system to meet records retention requirements

> Collect publications/documents that are no longer in print form

Closure crawls

> Document a public institution’s presence on the web before it changes or closes

UNIVERSITY OF ALBERTA: ALBERTA FLOODS JUNE 2013

Use Case: Archive web content before, during, and after the 2013 Alberta floods

> Personal and institutional blogs

> News articles

> Institutional websites

WILFRID LAURIER UNIVERSITY

Use Cases:

> Document the university’s social media presence

> Archive the university’s web presence in order to meet required records retention mandates.

ACCESS TO COLLECTIONS

Partners:

> Can view through a private web application with login/password

General Public:

> Can view from Archive-It’s website: http://www.archive-it.org/

> Search Archive-It data and metadata from institutional domains

> Landing Pages: branded pages that link back to Archive-It hosted data

EXAMPLES OF ORGANIZATIONS’ LANDING PAGES

Library of Virginia

University of Texas at Austin

PRIVATE ACCESS OPTIONS

> Entire account

> Individual collections

> Specific URLs

> IP address

STORAGE AND PRESERVATION

Storage:

> 2 copies (primary & backup) of archived data are stored at San Francisco data centers.

> A third copy is transferred to the General Archive.

> A copy of archived data can be shipped on a hard drive

> Partners can always download their archived data from Internet Archive’s servers.

Preservation partnerships:

> 2008: LOCKSS

> 2013: DuraCloud

> 2017: Multiple in development...

DATA REPOSITORY

KEY ARCHIVE-IT FEATURES

> Different levels of access for account users

> Ten available capture frequencies (from twice daily to yearly)

> Browse collections by URL, search by full-text and metadata

> Detailed post crawl reports for analysis

> Quality Assurance (QA) tools

> Online Help Center and User Manual

> Web Archivists and technical support

> Hosting, access, and redundant storage

SUBSCRIPTION MODEL

> Annual, renewable subscription

> Subscription levels vary by the amount of data archived

> Factors include: type and number of sites, how large they are, and how frequently they are archived

> All subscriptions include hosting, access, and perpetual storage (primary and backup)
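The factors listed above (number of sites, their size, and capture frequency) can be combined into a rough back-of-the-envelope data estimate. The Python sketch below is only an illustration under stated assumptions: the `Seed` class, the example sites, and all sizes are hypothetical, and the calculation is a naive upper bound that ignores deduplication and compression. It does not describe how Archive-It actually measures or prices data.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    """One site to be archived (hypothetical planning record)."""
    name: str
    size_gb: float          # assumed size of one full capture, in GB
    captures_per_year: int  # e.g. 1 for yearly, 12 for monthly, 52 for weekly


def estimated_annual_gb(seeds):
    """Naive upper bound: every capture stores the full site again."""
    return sum(s.size_gb * s.captures_per_year for s in seeds)


# Made-up example values for illustration only.
seeds = [
    Seed("university homepage", 2.0, 12),  # monthly
    Seed("news blog", 0.5, 52),            # weekly
    Seed("annual report site", 4.0, 1),    # yearly
]

print(estimated_annual_gb(seeds))  # 24.0 + 26.0 + 4.0 = 54.0
```

An estimate like this may help when comparing capture frequencies for a planned collection, since frequent captures of a small site can add up to as much data as rare captures of a large one.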

TIME COMMITMENTS

Staff dedicated to web archiving program (source: NDSA, Web Archiving in the United States: A 2016 Survey)

THE WEB ARCHIVING LIFE CYCLE

http://www.archive-it.org/publications

COMPLIMENTARY TRIAL

Create a collection of up to 5 websites, archive content, and view the results!

ARCHIVE-IT WEB APPLICATION DEMO

LEARN MORE

Check out our blog: www.archive-it.org/blog

Follow us on Twitter: @archiveitorg

Like us on Facebook: https://www.facebook.com/ArchiveIt

Questions? ait@archive.org

THANK YOU!
