
Page 1: UKOLN – a centre of expertise in digital information management Preserving Web Sites Brian Kelly UKOLN University of Bath Email: B.Kelly@ukoln.ac.uk.

UKOLN – a centre of expertise in digital information management

Preserving Web Sites

Brian Kelly

UKOLN

University of Bath

Email: B.Kelly@ukoln.ac.uk


Contents

• Why Is Web Site Preservation An Issue?

• The Nightmare Scenario

• Administrative Issues

• Technical Challenges

• What Is My Web Site?

• What Is My Preferred Future For My Web Site?

• Mothballing Procedures

• Lessons For Future Work

• Questions


Why Is Web Site Preservation An Issue?

Digital Resources Don't Rot

• Digital resources (images, video, software, Web sites, …) don't degrade due to environmental factors. This is a key difference from physical resources.

• Web sites are made from various digital resources: HTML pages, image files (GIF, JPEG, etc.), PDF resources and software (CGI scripts, JavaScript, etc.)

• These won't degrade so why is Web site preservation an issue?

• Isn't the fact that old Web sites won't disappear and may be embarrassing more of a challenge?


Digital Resources Do Rot!

In fact digital resources do 'rot':

• Operating systems are upgraded and existing applications cease to work

• Security holes are identified and there is a need to install patches

• Resources may be dependent on external resources (e.g. links, news feeds, …) which may disappear

• Resources may be hosted by external services and there is a need for ongoing funding for the hosting

• …


The Nightmare Scenario

To be avoided:

• The funding finishes

• Project staff leave, partnership dissolves

• Hosting agency upgrades operating system, breaking the scripts which access resources from the backend database

• User finds page with invitation to project launch and travels to meeting. Unfortunately the event took place in 2002.

• Invoice for domain name is not paid, as administrator has left.

• Web site domain taken over by porn company

• Prime Minister picks up pen bearing project URL and visits pornographic Web site


It Has Happened

Webtechs.com

• Software company which hosted early HTML validation service

• In 1998/99 confusion over payment of domain name

• March 1999 company receives many messages saying validation service is now a porn site

• Over 30,000 links to Web site!

• Sept 1999 porn company agrees to sell domain name back to Webtechs


The Embarrassment Still Exists

The hijacked Web site can still be accessed using the Internet Archive's Wayback Machine.

Note that the archived Web site contains JavaScript (and possibly ActiveX controls) which could delete data on the viewer's PC

See <http://www.exploit-lib.org/issue1/webtechs/>


Related Incidents

Grove

• Grove Art and Music set up Web site in mid 1990s

• Company decides cost of managing and maintaining Web site isn't justified

• Company relinquishes domain

• Domain bought by porn company

• Grove belatedly realise they have made a mistake

Lynx

• A Web site which provides advice on the Lynx text browser decides to get a 'better' domain name

• The old domain name is taken over by a porn company

• The author uses the Web site in his workshops!


A Possible Scenario For NOF-digi

A potential scenario:

• Cultural heritage Web site developed

• Heritage body has limited networking expertise

• Domain name lapsed due to lack of knowledge of terminology ("What's a DNS? Is this invoice legit?")

• Once virtual domain name lapsed, accesses go to service developer's Web site

• Service developer's Web site has links to Web sites they've built (including some of a dubious nature)

• Once address expires in DNS caches links go to a porn company

• Cultural heritage gateway points to a porn site!

Let's ensure that this doesn't happen


A Web site isn't just for Christmas, it's for life!

The lessons:

• You need to be aware that Web sites developed using short-term project funding need to be kept for a long period after funding finishes

• Porn domain name pirates are looking for Web sites whose domain name has expired

• Web sites which are well-linked and easily found using Google are particularly attractive to porn pirates

• Avoid this happening to your Web site

• Avoid your Web site linking to sites to which this has happened


Other Administrative Issues

Digital Signatures

• You buy a digital signature which identifies your Web site as belonging to a legitimate organisation

• The digital signature is used for (a) the encryption of credit card details and (b) use of an Intranet

• You fail to renew the signature / renewal not accepted as the consortium is not a legal entity

• Users see "Non-valid signature" message

http://www.bathimpact.com/io_article.php?section=On%20Campus&ref=48



What Is My Web Site?

What do we mean by my Web site?

What purposes could my Web site serve?

• The public Web site which users see

• Several Web services used by users (e.g. www.foo.org.uk, search.foo.org.uk, …)

• The Web site containing a public area and a private area for use by consortium members

• A public Web site and a private one

• Several public Web sites, one for each member of the consortium

See <http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-15/>



The Preferred Future For My Web Site

After the project funding finishes:

• The project money has helped pump-prime an activity which is core to my organisation's mission. The project Web site will be developed through my organisation's existing funding streams.

• We'd like to build on the work. We're looking for new funding streams.

• We've decided we don't want to engage in the e-world. We'd like someone to take the Web site off our hands (we don't want it to become a porn site!)

• We haven't given any thought to this. Anyway we've all left the project.


Technical Issues

Standards And Formats

• Has the Web site been designed using open standards, which should help future-proofing?

• Have proprietary formats been used (for which backwards compatibility may not be considered)?

Architecture & Implementation

• Has the technical architecture of the Web site been documented?

• Can I continue to use technical systems after funding has finished?


Content Issues

Accuracy:

• Is the content of my Web site accurate today – and tomorrow?

• Could the content of my Web site be misleading?

Usability:

• Are links working today – and tomorrow?

Legal:

• Is my Web site legal today (accessibility; copyright; defamation; IPR; …)?

• Will my Web site be legal tomorrow, if new legislation is enacted?


Mothballing Your Web Site (1)

Before funding finishes you should take steps for the mothballing of your Web site:

• Run a link check across the Web site. Fix broken internal links and as many external links as is reasonable. Document the link report.

• Run HTML (and CSS) validation checks across the Web site. Fix as many invalid pages as is reasonable. Document the findings.

• Run an accessibility check across the Web site. Fix as many inaccessible pages as is reasonable. Document the findings.

This should not be an onerous task if you have been following NOF-digi guidelines. Note that, with the findings documented, you can show that errors found later occurred after your funding finished.
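As an illustration of the first step of a link check, the sketch below collects the links on one page and separates internal from external ones, since the advice above treats the two differently. It is a minimal sketch using only the Python standard library, not the tool used by the author; the names `LinkCollector` and `classify_links` and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def classify_links(page_url, html):
    """Return (internal, external) lists of absolute URLs linked from a page."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    internal, external = [], []
    for target in collector.targets:
        absolute = urljoin(page_url, target)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        (internal if parsed.netloc == site_host else external).append(absolute)
    return internal, external


# Hypothetical sample page, using the www.foo.org.uk style of example host.
sample = (
    '<a href="about.html">About</a> '
    '<a href="http://www.example.org/">Partner</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
internal, external = classify_links("http://www.foo.org.uk/index.html", sample)
print(internal)
print(external)
```

A full link checker would go on to request each collected URL and record the ones that fail; that part is omitted here because it needs network access.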



Mothballing Your Web Site (2)

You should also address technical areas:

• Remove any backend scripts which are no longer needed (e.g. online booking forms for old events).

• Remember that scripts, etc. are liable to go wrong. Ensure that applications are configured to break gracefully and provide meaningful errors, such as:

"The config.ssi is missing. This should be reported to the systems administrator (email [email protected] or ring +44 020 123 123). Please provide the URL of the broken page and the project name."

rather than an unhelpful message such as:

"Apache error 6963"
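One way a backend script might "break gracefully" is sketched below: if its configuration file is missing, it returns a message telling the user what has happened and who to tell. This is a minimal sketch under stated assumptions; the function `load_config`, the project name and the contact address are hypothetical and not taken from any real system.

```python
from pathlib import Path


def load_config(path, project, contact):
    """Return (config_text, error_message); exactly one of them is None."""
    config_file = Path(path)
    if not config_file.is_file():
        # Build a message a user can act on, instead of a bare error code.
        message = (
            f"The configuration file {config_file.name} is missing. "
            f"Please report this to the systems administrator ({contact}), "
            f"giving the URL of the broken page and the project name ({project})."
        )
        return None, message
    return config_file.read_text(encoding="utf-8"), None


# Hypothetical values: the path, project name and contact are placeholders.
text, error = load_config(
    "no-such-dir/config.ssi", "Example Project", "webmaster@example.org"
)
print(error)
```

Using a generic contact address here also fits the advice to avoid an individual's details, which may not remain valid once the project has finished.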


[Slide illustration: watermark reading "This Web site is no longer maintained. See home page for details"]

Mothballing Your Web Site (3)

You should also address the content of your Web site:

• Clarify the status of the Web site on the home page.

• Ensure the tense of the content reflects the position, i.e. don't say "This project will …"

• Ensure that contact details will remain valid, i.e. provide generic email addresses, not an individual's

• Remember that many users will arrive deep in your Web site (e.g. using Google). If necessary use CSS to flag all pages with a watermark

See <http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-04/>
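Where a site has no shared template, flagging every page by hand can be slow. The sketch below shows one possible alternative to a pure CSS watermark: a short script that inserts a status banner after the opening body tag of each page, which a stylesheet could then style. It is an illustration only; the function `add_banner` and the class name `mothballed` are hypothetical.

```python
import re

BANNER = (
    '<div class="mothballed">This Web site is no longer maintained. '
    "See home page for details.</div>"
)


def add_banner(html, banner=BANNER):
    """Insert the banner directly after the opening <body> tag, once only."""
    if 'class="mothballed"' in html:
        return html  # page has already been flagged
    flagged, count = re.subn(
        r"(<body\b[^>]*>)",
        lambda match: match.group(1) + banner,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    return flagged if count else html  # leave pages without <body> untouched


page = '<html><body bgcolor="white"><h1>Project news</h1></body></html>'
flagged = add_banner(page)
print(flagged)
```

Running the function twice on the same page leaves it unchanged the second time, so the script can safely be re-run across the whole site.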



Mothballing Toolkit

UKOLN and AHDS are developing a QA methodology through the QA Focus work, which is funded by JISC

We have developed an automated self-assessment toolkit which, although aimed at JISC projects, is free to use by all

http://www.ukoln.ac.uk/qa-focus/toolkit/mothballing/



Testing Repurposing Of Your Web Site

You may find that:

• Your Web site is repurposed by third parties

• You wish to move your Web site to another location

In order to check that repurposing can happen without errors you should think about testing the process:

• If you have a PDA, use Avantgo.com (or a similar tool) to access your Web site on another device

• Use a Web site mirroring tool (e.g. HTTrack) to copy your Web site to your desktop PC

Such tools can:

• Show how your Web site will look on other devices

• Spot potential problems with mirroring your Web site

See <http://www.ukoln.ac.uk/qa-focus/documents/briefings/briefing-05/>
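One kind of problem a mirroring test may reveal is internal links written as absolute URLs. Some mirroring tools rewrite these, but in a plain copy they still point at the original host. The sketch below lists such links in a page. It is an assumption-laden illustration rather than a feature of any particular tool, and the function name and sample page are hypothetical.

```python
import re
from urllib.parse import urlparse

# Matches http(s) URLs inside quoted href or src attributes.
ATTRIBUTE_URL = re.compile(
    r'''(?:href|src)\s*=\s*["'](https?://[^"']+)["']''', re.IGNORECASE
)


def hard_coded_internal_links(html, site_host):
    """List absolute URLs in href/src attributes that name the site's own host."""
    return [
        url
        for url in ATTRIBUTE_URL.findall(html)
        if urlparse(url).netloc.lower() == site_host.lower()
    ]


# Hypothetical sample page.
page = (
    '<a href="http://www.foo.org.uk/reports/final.html">Final report</a> '
    '<a href="reports/interim.html">Interim report</a> '
    '<img src="http://images.example.org/logo.gif">'
)
problems = hard_coded_internal_links(page, "www.foo.org.uk")
print(problems)
```

Relative links such as `reports/interim.html` are not reported, because they continue to work wherever the copy is hosted.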


Lessons For The Future

How easy is it for you to implement mothballing techniques?

• You may find that deploying a watermark on every page of your Web site is time-consuming to implement

• Any difficulties encountered with your NOF-digi project should be noted, and lessons learnt should be applied for future development work

• Think about preservation from the original planning stage for a Web site


Case Study - Exploit Interactive (1)

Exploit Interactive:

• EU-funded ejournal available at <http://www.exploit-lib.org/>

• Funded from Jan 1999 – Dec 2000

• Web site is still hosted locally

Issues:

• Should we continue hosting domain after 3 years?

• What is the cost of this (domain name registration, disk storage, system maintenance)?


Case Study - Exploit Interactive (2)

Findings:

• Disk storage is 4 GB (large proportion is log files)

• A 30 GB disk drive costs ~ £40

• It was decided to run an annual link check of the Web site. Although there were broken links to external sites, the internal links all worked.

• It was estimated that it would take about 30 minutes / year to run a link check and document findings.

• A policy for the ongoing provision of the Web site was agreed

• See <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-17/>
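The figures above suggest that the storage cost is small. As a rough worked example, the sketch below apportions the drive price pro rata to the space the site uses. The pro-rata method is an assumption made here for illustration and is not stated in the case study.

```python
def storage_cost(used_gb, drive_gb, drive_price):
    """Pro-rata share of a drive's purchase price taken up by the site."""
    return drive_price * used_gb / drive_gb


# Figures from the findings: 4 GB used, a 30 GB drive at about 40 pounds.
cost = storage_cost(used_gb=4, drive_gb=30, drive_price=40.0)
print(f"Approximate storage cost: {cost:.2f} pounds")
```

On these figures the one-off storage cost comes to a little over five pounds, before any allowance for backup or system maintenance.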


Short-Medium Term Access Strategy

• We will seek to ensure the Web site continues for at least 10 years after the end of funding.

• We will seek to ensure that the Web site continues to function.

• We will not fix broken links to external resources.

• We will not fix non-compliant HTML resources.

We will use the following procedures:

• We will have internal administrative procedures to ensure that the domain name bill is paid.

• We will record disk space usage and provide an estimate of the cost of providing disk space.

• We will run a link checker annually and record the number of internal broken links. We will keep an audit trail to see if internal links start breaking.

• Any changes to the policy … need to be agreed by an appropriate management group.
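The audit trail described in the procedures could be as simple as one CSV row per annual check. The sketch below appends rows and reports the first check at which internal links were found broken. The function names, the column layout and the sample dates and figures are all hypothetical, chosen only to illustrate the idea.

```python
import csv
import io
from datetime import date


def append_audit_row(stream, check_date, broken_internal, disk_gb):
    """Append one annual check result to a CSV audit trail."""
    csv.writer(stream).writerow([check_date.isoformat(), broken_internal, disk_gb])


def links_started_breaking(stream):
    """Return the date of the first check that found broken internal links, or None."""
    stream.seek(0)
    for check_date, broken_internal, _disk_gb in csv.reader(stream):
        if int(broken_internal) > 0:
            return check_date
    return None


# io.StringIO stands in for a real file; the rows below are made-up sample data.
trail = io.StringIO()
append_audit_row(trail, date(2003, 6, 1), 0, 4.0)
append_audit_row(trail, date(2004, 6, 1), 0, 4.1)
append_audit_row(trail, date(2005, 6, 1), 3, 4.1)
first_failure = links_started_breaking(trail)
print(first_failure)
```

Recording disk usage in the same row keeps the two measurements named in the policy together, so that a single file documents each year's check.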


Conclusions

To conclude:

• Web sites can disappear

• They may reappear as porn sites!

• Organisations should ensure they have procedures to ensure this does not happen

• You should develop a medium term Web site preservation strategy

• You should test mirroring of your Web site

• You should seek to address such issues at the planning stage of your Web site


Questions?