Two little talks CrossRef Membership Meeting November, 2004.

Post on 29-Dec-2015


Two little talks

CrossRef Membership Meeting, November 2004

* Appropriate copy issue

* Some ruminations on digital preservation

Appropriate copy issue…

Talk One

A reminder

“Appropriate copy” problem is about which copy a user is directed to

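DOI resolution

(Diagram. Step 1: a search response in any old system shows a citation with a DOI. Step 2: the user clicks, and the DOI goes to the DOI resolver, which returns a URL. Step 3: the URL leads to the repository holding the cited article.)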

But – what if more than 1 copy exists?

• Elsevier journals, for example, are on-line at:
  – Elsevier ScienceDirect
  – OhioLink
  – University of Toronto

Which URL?

(Diagram: the DOI resolver receives a DOI but must pick one URL: sciencedirect.com? ohiolink.edu? utoronto.ca?)

The APPROPRIATE copy

When more than 1 copy exists, specific populations frequently have the right to access specific copies

DOI localization

• Architecture created by CrossRef, CNRI, some publishers, and a group of digital librarians

• Implemented in 2002

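Localization architecture

(Diagram. Step 1: a search response in any old system shows a citation with a DOI. Step 2: the user clicks, and the DOI goes to the DOI proxy, which asks: does the user have localization? Y: redirect resolution to the local link server for local decision making. N: the DOI resolver resolves the DOI as usual.)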
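The Y/N decision in the localization architecture can be sketched in a few lines of Python. This is only an illustration, not the actual proxy implementation: the function name `route_doi`, the link server address, and the `id=doi:...` query form are assumptions made for the example, and how the proxy learns that a user has localization is left out.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Public DOI proxy, used when the user has no localization.
DEFAULT_PROXY = "https://doi.org/"


def route_doi(doi: str, local_link_server: Optional[str] = None) -> str:
    """Return the URL that the user's click should be sent to.

    local_link_server is the (hypothetical) base URL of the user's
    local link server, or None if the user has no localization.
    """
    if local_link_server:
        # Y branch: redirect resolution so the decision is made locally.
        # The "id=doi:..." parameter is an assumed query form.
        return local_link_server + "?" + urlencode({"id": "doi:" + doi})
    # N branch: ordinary DOI resolution.
    return DEFAULT_PROXY + quote(doi, safe="/")
```

For example, `route_doi("10.1000/182")` goes to the public proxy, while `route_doi("10.1000/182", "https://linkserver.example.edu/resolve")` hands the same DOI to the local link server.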

Local link servers

• Directs user based on local business arrangements

• Can provide rich services
  – the right digital copy, a paper copy, other works by the author…

• Also provides a place in the architecture to insert proxies for off-campus users

• Now widely implemented and heavily used
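What a local link server decides can be sketched as a lookup from a user's population to the copy that population has the right to use. Everything in the tables below is made up for illustration (the DOI, the article paths, the population names, and the proxy prefix); only the three host names come from the earlier slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Copy:
    host: str  # site holding this copy
    url: str   # direct link to the article on that site


# Hypothetical: one DOI with three on-line copies.
COPIES: Dict[str, List[Copy]] = {
    "10.9999/example.1": [
        Copy("sciencedirect.com", "https://sciencedirect.com/article/1"),
        Copy("ohiolink.edu", "https://ohiolink.edu/article/1"),
        Copy("utoronto.ca", "https://utoronto.ca/article/1"),
    ]
}

# Hypothetical local business arrangements: hosts each population
# may use, in order of preference.
LICENSES: Dict[str, List[str]] = {
    "ohio-member-library": ["ohiolink.edu"],
    "toronto": ["utoronto.ca"],
}


def appropriate_copy(doi: str, population: str) -> Optional[Copy]:
    """Return the first copy this population has the right to access."""
    by_host = {copy.host: copy for copy in COPIES.get(doi, [])}
    for host in LICENSES.get(population, []):
        if host in by_host:
            return by_host[host]
    return None  # no licensed copy known; caller falls back to default


def link_for(doi: str, population: str,
             proxy_prefix: Optional[str] = None) -> Optional[str]:
    """Build the final link, optionally routed through an off-campus proxy."""
    copy = appropriate_copy(doi, population)
    if copy is None:
        return None
    if proxy_prefix:
        # Off-campus users: wrap the target URL in the proxy login link.
        return proxy_prefix + quote(copy.url, safe="")
    return copy.url
```

A real link server would also offer the other services listed above (a paper copy, other works by the author); the sketch covers only the choice of digital copy and the proxy hook.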

Local link serving is VERY popular

(Chart: SFX requests per month, 2004, months 1 to 10; vertical axis: requests, 0 to 60,000)

A new concern

(and CrossSearch…)

Google – what happened to the DOI?

Most journal article links look like this!

Viewed 40 CrossSearch results pages to find a DOI…

The problem…

I clicked this

and got…

But Harvard subscribes!

Frustration!

Just as we’ve gotten local linking to work with A&I services, journal references, and the DOI in general…

publishers are filling Google with direct links to their copies!!

Talk Two

Some ruminations on digital preservation

Role of publishers in digital preservation?

After years of talk, this remains murky, very murky…

but it is certain that “none” is not the answer!!

1. My most important point

Cost and effectiveness of preservation are determined at or near the point of creation

Think up front

* about format

* about metadata

* about quality

Format

Formats vary significantly in “preservability”

Format

• Some criteria (from Library of Congress)
  – disclosure (how well documented?)
  – adoption (how widely used?)
  – transparency (is compression used?)
  – self documenting (good!)
  – external dependencies (self sufficiency is good)
  – patents (could limit preservation actions)
  – encryption (what if decryption key is not available?)

Different formats for different purposes

* archival master

* production master

* use copy

Metadata

• The basis of decision-making for preservation
  – technical metadata
    • what format is this in?
    • what format options are used?
  – structural
    • if I change this, what else is affected?
  – administrative
    • who has the right to make decisions about this?

Metadata

  – relationships
    • are there other versions of this object?
      – how do these affect my preservation strategy?
  – provenance
    • where did this come from?
    • what changes has it already undergone?

Key difference between preservation repositories and content management systems
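The five kinds of metadata on these two slides can be pictured as one record per preserved object. The sketch below is a hypothetical data structure, not a PREMIS or OAIS schema; every class and field name is an assumption chosen to mirror the questions in the lists above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProvenanceEvent:
    date: str    # when the change happened
    action: str  # what was done to the object
    agent: str   # who or what did it


@dataclass
class PreservationRecord:
    identifier: str
    # technical: what format is this in, and with which options?
    format_name: str
    # administrative: who has the right to make decisions about this?
    rights_holder: str
    # provenance: where did this come from?
    source: str
    format_options: Dict[str, str] = field(default_factory=dict)
    # structural: if I change this, what else is affected?
    affects: List[str] = field(default_factory=list)
    # relationships: are there other versions of this object?
    other_versions: List[str] = field(default_factory=list)
    # provenance: what changes has it already undergone?
    history: List[ProvenanceEvent] = field(default_factory=list)

    def record_change(self, date: str, action: str, agent: str) -> None:
        """Append a provenance event instead of overwriting the past."""
        self.history.append(ProvenanceEvent(date, action, agent))
```

Keeping `history` append-only is the design point: a content management system usually needs only the current state, while a preservation repository needs to know how the object got there.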

Quality

If that archival version is bad when you put it on the shelf, it will still be bad 10 years later when you need it…

and it will be hard to go back to the creator at that point!

2. There is a LOT happening in the domain

…are you watching?

Preservation initiatives

• OAIS “Open Archival Information System” reference model
  – Formal, structured model for designing digital preservation archives
  – ISO standard
• PREMIS (PREservation Metadata: Implementation Strategies)
  – Define core metadata by end of year
  – Survey of current practices just published

Initiatives…

• Format registry
  – Definitive sources of description for technical formats
  – community effort to share effort of documenting digital formats
• RLG/NARA Digital Repository Certification Task Force
  – recommend structure and metrics of an international process for certifying preservation repositories

Initiatives…

• JHOVE (JSTOR/Harvard Object Validation Environment)
  – Open source tool to identify format of an object, generate technical metadata from an object, test to see if object is well-formed
• Library of Congress NDIIPP
  – Define a shared national program of digital preservation
  – Well funded: $100M from Congress, $75M matching contributions
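To make the three JHOVE tasks concrete (identify the format, generate technical metadata, test well-formedness), here is a toy Python sketch. It is not JHOVE and does not use its API: the signature table, the function names, and the very crude PDF check are all assumptions, and real validation is far more thorough.

```python
import hashlib
from typing import Dict, Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}


def identify_format(data: bytes) -> Optional[str]:
    """Guess the format from the first bytes; None if unrecognized."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def looks_well_formed_pdf(data: bytes) -> bool:
    """Crude check: PDF header at the start, %%EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def technical_metadata(data: bytes) -> Dict[str, object]:
    """Collect a minimal technical description of the object."""
    return {
        "format": identify_format(data),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

A truncated download illustrates why the well-formedness test matters: `identify_format` still reports PDF, but `looks_well_formed_pdf` returns False because the end marker is missing.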

NDIIPP national preservation grants

• Web archiving (California Digital Library)
• Geographic information
  • UC Santa Barbara
  • North Carolina State
• Digital television (Educational Broadcasting Corporation)
• Digital archives (Emory)
• Selection for preservation (U Illinois)
• Business history (U Maryland)
• Social science data sets (Inter-university Consortium for Political and Social Research)

Other NDIIPP grants

• Repository interoperation (Stanford, Johns Hopkins, Harvard, Old Dominion)

• Architecture and tools (Los Alamos National Laboratory)

• Research in digital preservation (together with National Science Foundation)

Major programs abroad

• National Library of Australia

• British Library
  – and a UK national Digital Preservation Coalition
• Koninklijke Bibliotheek (National Library of the Netherlands)
  – major digital preservation research program

3. Think of 50 years, not 5 years

The questions are different:

* discontinuous technological change

* loss of “common knowledge”

* very antique formats

Thus the need for deep documentation and metadata…

4. So many things to preserve

• GIS, survey and economic data, visual resources, research datasets, web stuff, institutional records, faculty papers, audio & video, visualizations, blogs, newsletters, etc.

• Setting priorities
  – fleeting things demand immediate attention
    • “the web”…
  – attend to your own house first
    • faculty output, library digitization, institutional records

A lot to do….

Where does the formal literature fit in setting priorities?

What will be the role of digital copyright deposit?

5. Paying for a common good

• Only one or a few institutions need to archive a given resource

• Two related questions
  – motivation: why would you not wait until the other fellow does it?
  – if I do it, can I get others to share the cost?

• Digital is different than paper
  – Costs of preservation more apparent
  – Possibility of remote access means you don’t have to do it locally

• Fundamental question, now topic of research
  – NSF digital preservation grant program
  – OCLC research paper:
    • Brian Lavoie, The Incentives to Preserve Digital Materials

6. LOCKSS is not preservation

• LOCKSS ignores most of the key issues
  – format
  – metadata
  – management
  – reformatting
  – repository…

• LOCKSS is great technology for distributed replication, but does not truly address preservation
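The distinction can be seen in a small sketch of what replication buys you. The Python below is not the LOCKSS protocol; it is an assumed, much simplified majority vote over checksums, with hypothetical function names, showing that replicas can detect and repair bit damage while knowing nothing about format or metadata.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: Dict[str, bytes]) -> List[str]:
    """Return names of replicas whose content disagrees with the majority."""
    if not replicas:
        return []
    digests = {name: digest(data) for name, data in replicas.items()}
    # On a tie, Counter keeps the digest that was seen first.
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority)


def repair(replicas: Dict[str, bytes]) -> Dict[str, bytes]:
    """Overwrite disagreeing replicas with a copy from the majority."""
    damaged = set(audit_replicas(replicas))
    if not damaged:
        return dict(replicas)
    good = next(data for name, data in replicas.items() if name not in damaged)
    return {name: (good if name in damaged else data)
            for name, data in replicas.items()}
```

Note what the sketch never asks: what format the bytes are in, who may decide to migrate them, or what they are related to. All replicas of an obsolete format agree perfectly.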

7. “Hand-off” is a critical component

• What happens if there is one archival copy, and the repository gives up responsibility?
  – priorities change, institutions come and go…

• Handing off responsibility is a repository’s final preservation action

• How does this relate to publishers?

Lastly…what about preserving e-journals?

• Well, we have the KB, maybe the JSTOR archive, and LOCKSS(?)…

• Some movement on national digital copyright deposit

• The library/publisher dialog of a few years ago needs to be re-invigorated!

• In the meantime, publishers are hopefully paying attention…