Europeana Newspapers LFT Infoday Thompson

Posted on 10-Jul-2015



Digitisation at the Wellcome Library: Lessons learned & shared.

Historical Newspapers in the Digital Age, Bolzano

October, 2014

Dave Thompson

Digital Curator, Wellcome Library

The Wellcome Library

• Part of Wellcome Collection, an astonishing public venue in London developed by the Wellcome Trust, where people can learn more about medicine through the ages & across cultures.

• More than 10,000 readers visit us each year, including historians, academics, students, health professionals & consumers, journalists, artists & members of the general public.

Digitisation in the Wellcome Library

• Strategic approach: conscious, planned decisions.

• Library transformation strategy, physical to digital.

• From ‘project’ to ‘production’.

• Digitisation as a sustainable end-to-end process.

Overview – four IT systems…

1. Workflow management system – ‘Goobi’ = PRODUCTION.

2. Digital object repository – ‘Preservica’ = STORAGE.

3. Front end – ‘the player’ = ACCESS.

4. Temporary & permanent storage for content = 70 TB.

Digitisation: Metadata import

MARC records are imported from Sierra into Goobi as MARC XML.
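To make the import step concrete, here is a minimal Python sketch that reads the control number (field 001) and title (245 $a) from a MARCXML record using only the standard library and the MARC 21 "slim" namespace. The helper name and the sample record are made up for illustration; this is not how Goobi or Sierra actually perform the import.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def summarise_marcxml(xml_text: str) -> dict:
    """Return the 001 control number and 245 $a title of the first record.

    Hypothetical helper: a sketch, not part of Goobi or Sierra.
    """
    root = ET.fromstring(xml_text)
    # Accept either a bare <record> or a <collection> that wraps records.
    if root.tag == f"{{{MARC_NS['marc']}}}record":
        record = root
    else:
        record = root.find("marc:record", MARC_NS)
    if record is None:
        raise ValueError("no MARC record found")

    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    title = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    return {
        "control_number": control.text.strip() if control is not None and control.text else None,
        "title": title.text.strip() if title is not None and title.text else None,
    }


# Made-up sample record for illustration.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">b0000001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
</record>"""

print(summarise_marcxml(SAMPLE))
# prints {'control_number': 'b0000001', 'title': 'Example title'}
```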

Digitisation: Image upload

Digitised images (digitised internally or externally) are imported into Goobi & normalised to JPEG2000.
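One way to decide which incoming files need normalising is to look at their leading "magic bytes". The sketch below, assuming only classic TIFF and JP2 input, routes JP2 files straight on, marks TIFF files for conversion and rejects anything else. The function name and return values are hypothetical, and the actual TIFF-to-JPEG2000 conversion is deliberately left out.

```python
from pathlib import Path

# File signatures ("magic bytes") for the two accepted image formats.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")         # little- and big-endian classic TIFF
JP2_MAGIC = b"\x00\x00\x00\x0cjP  \r\n\x87\n"  # 12-byte JPEG 2000 signature box


def route_image(path: Path) -> str:
    """Decide what a (hypothetical) import step should do with one file."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if head == JP2_MAGIC:
        return "accept"     # already JPEG 2000: pass on to validation
    if head[:4] in TIFF_MAGIC:
        return "normalise"  # TIFF: convert to JPEG 2000 (conversion not shown)
    return "reject"         # any other format falls outside the policy
```

Note that this sketch would reject BigTIFF files, whose header uses a different version number; a real import step would need an explicit decision about them.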

Digitisation: Upload, ftp, harvesting

Content uploaded by ftp can be automatically imported into Goobi & processed, or Internet Archive (IA) content can be automatically harvested.
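Automatic import from an ftp area needs a rule for when a delivery is complete. The sketch below assumes a hypothetical convention in which each delivery is its own folder and the uploader writes an empty `READY` file last, so half-transferred folders are skipped. Neither the convention nor the function comes from Goobi; they only illustrate the idea.

```python
from pathlib import Path

# Hypothetical convention: the uploader writes this empty file last,
# so folders that are still being transferred are not picked up.
READY_MARKER = "READY"


def ready_batches(upload_root: Path, already_imported: set) -> list:
    """Return names of complete batch folders that have not been imported yet."""
    ready = []
    for folder in sorted(p for p in upload_root.iterdir() if p.is_dir()):
        if folder.name in already_imported:
            continue  # imported on an earlier run
        if (folder / READY_MARKER).is_file():
            ready.append(folder.name)
    return ready
```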

Digitisation: METS/ALTO for access

Content is OCR’d & METS/ALTO files are created in Goobi, either manually or automatically.

Digitisation: Repository ingest

Goobi initiates automated ingest of images & metadata into Preservica.
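Automated ingest is easier to trust when a package carries a fixity manifest that the receiving side can re-check. The sketch below builds a SHA-256 manifest for a package folder and reports files that no longer match. It is not Preservica's ingest API, which is not described here; the function names are hypothetical and only illustrate the idea of checksums travelling with the content.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict:
    """Map each file's path (relative to the package) to its SHA-256."""
    return {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: Path, manifest: dict) -> list:
    """Return paths that are missing or whose checksum no longer matches."""
    problems = []
    for rel, expected in manifest.items():
        p = package_dir / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```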

Digitisation: Access

The player pulls images from Preservica using metadata in the METS/JSON file.
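The hand-off from METS to the player can be sketched as reading file locations out of a METS `fileSec` and emitting a small JSON document. In the sketch below, the `OBJECTS` file group name, the JSON shape and the sample paths are all assumptions made for illustration; the actual JSON consumed by the Wellcome player is not described in the slides.

```python
import json
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
XLINK_HREF = f"{{{NS['xlink']}}}href"


def mets_to_player_json(mets_xml: str, file_group: str = "OBJECTS") -> str:
    """List the files in one METS file group as a small JSON document.

    The file group name and the JSON shape are assumptions for illustration.
    """
    root = ET.fromstring(mets_xml)
    pages = []
    path = f"mets:fileSec/mets:fileGrp[@USE='{file_group}']/mets:file"
    for file_el in root.findall(path, NS):
        loc = file_el.find("mets:FLocat", NS)
        if loc is None:
            continue  # no location, nothing for the player to fetch
        pages.append({
            "id": file_el.get("ID"),
            "mimetype": file_el.get("MIMETYPE"),
            "href": loc.get(XLINK_HREF),
        })
    return json.dumps({"pages": pages}, indent=2)


# Made-up sample METS document.
METS_SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                     xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="OBJECTS">
      <mets:file ID="FILE_0001" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/0001.jp2"/>
      </mets:file>
      <mets:file ID="FILE_0002" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/0002.jp2"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(mets_to_player_json(METS_SAMPLE))
```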

Or from a different perspective…

[Workflow diagram. Content sources: in-house, institutions & contractors (TIFF or JP2, delivered on HD & via ftp), harvesting (auto harvesting of JP2 & DMD) and grey literature (PDF). Goobi (METS/OCR) normalises TIFF to JP2 and Jpylyzer validates JP2, by manual & automatic routes, before content passes to Preservica. Roles shown: Project Managers / Ingest Officer; Ingest Officer / Digital Curator; snagging.]

Lesson 1 - Digitisation as a social activity

1. Digitisation is not a technical problem; it’s a social activity between creator & user.

2. Internally: digitisation engages with all parts of the organisation & draws on many different skills.

3. Externally: digitisation engages with, and between, creators & users, moving data into the public realm & providing access.

http://www.emmanueladegbola.com/networking-leads/

Projects & workflows

1. Standardised processes to deal with differences in content & themes.

2. Use ‘projects’ & workflows to define activities & automated steps to handle material from transfer/acquisition to dissemination.

3. Projects & workflows allow us to manage our processes & to report activity.

http://www.amross.sd/
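The idea of a workflow as a fixed list of pre-defined steps, with a project reporting how many items sit at each step, can be sketched as follows. The step names, the `Item` record and the report format are hypothetical illustrations, not Goobi's data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    automatic: bool  # True if software runs the step, False if a person does


# Hypothetical workflow template: illustrative step names, not Goobi's own.
BOOK_WORKFLOW = (
    Step("Import MARC metadata", True),
    Step("Upload images", False),
    Step("Normalise to JPEG2000", True),
    Step("OCR & METS/ALTO", True),
    Step("Quality check", False),
    Step("Ingest to repository", True),
)


@dataclass
class Item:
    identifier: str
    step_index: int = 0  # index into the workflow; len(workflow) means finished


def project_report(items, workflow) -> dict:
    """Count how many items in a project are waiting at each step."""
    counts = Counter(
        workflow[item.step_index].name if item.step_index < len(workflow) else "Done"
        for item in items
    )
    report = {step.name: counts[step.name] for step in workflow}
    report["Done"] = counts["Done"]
    return report


items = [Item("i1"), Item("i2"), Item("i3", step_index=2), Item("i4", step_index=6)]
print(project_report(items, BOOK_WORKFLOW))
```

Because every template is built from the same `Step` type, a second workflow (for example one for video) could reuse the same reporting code unchanged.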

Standardised formats

1. Digitisation process built around a small number of formats.

2. Only accept, or create, TIFF or JPEG2000 images for digitisation; MPEG-2 for video.

3. Share our JPEG2000 profile with creators & validate images at the point of processing.

4. Standardised metadata formats for discovery (MARC) & retrieval (ALTO/JSON).

http://blog.absolutvision.com/en/jpeg2000-format/
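Validation at the point of processing is done with a dedicated tool such as Jpylyzer, which checks far more than shown here. As a much cruder illustration of what "is this really a JP2?" means, the sketch below walks the top-level boxes of a file and checks for the signature box, an `ftyp` box with brand `jp2 `, a header box and a codestream box. It does not check the codestream or any JPEG2000 profile settings, and the function names are hypothetical.

```python
import struct


def read_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level JP2 box."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated box header")
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type
            if pos + 16 > len(data):
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif length == 0:  # box runs to the end of the file
            length = len(data) - pos
        if length < header or pos + length > len(data):
            raise ValueError(f"bad length for box {box_type!r}")
        yield box_type, data[pos + header:pos + length]
        pos += length


def looks_like_jp2(data: bytes) -> bool:
    """Crude structural check of the JP2 box layout (not a full validator)."""
    try:
        boxes = list(read_boxes(data))
    except ValueError:
        return False
    types = [t for t, _ in boxes]
    if len(boxes) < 2 or types[0] != b"jP  " or boxes[0][1] != b"\r\n\x87\n":
        return False
    if types[1] != b"ftyp" or boxes[1][1][:4] != b"jp2 ":
        return False
    return b"jp2h" in types and b"jp2c" in types
```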

Lesson 2 – It’s a strategic issue

1. Given the scale & complexity, clear strategic direction is essential.

2. Digitisation has to support an institution’s users & their information needs.

3. Digitisation has to be a strategic decision supporting an institution’s purpose.

4. Digitisation doesn’t change the mission of an organisation.

Industrialisation of processes

1. Digitisation built around a small number of formats; workflows built around a small number of pre-defined steps.

2. Common workflow activities mean less system development; we can build our own processes.

3. Easier for people to learn: less training, more certainty & reliability.

4. Industrialisation supports processes that are sustainable.

http://www.howtobeadad.com/2013/14723/unicorn-poop-how-i-fell-in-love-with-the-daughter-i-never-had

Lesson 3 – Sustainability or bust

1. Digitisation has to be a sustainable process.

2. Processes have to scale with our ambition.

3. Design, re-design & review processes constantly & integrate them with existing services.

4. Digitisation as evolution: learn from what has been done, apply it & move forward.

http://planetivy.com/gaming/25273/natural-selection-2-gaming-evolution-in-action/

Automation is key

1. Automation is essential to scalability & efficiency.

2. Within digitisation, some activities are very susceptible to automation. Automate them.

3. Automation standardises processes, which is good for life-cycle management of data.

4. Automated processes maximise the investment in digitisation & support scalability.

http://www.technibble.com/automating-computer-business-for-profit/
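A small sketch of why automation helps life-cycle management: if every automatic step runs through the same runner, each outcome is logged on the item in the same way, and the first failure stops the run so a person can pick the item up. The runner, the example steps and the event format are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# (name, function that updates the item or raises if it cannot proceed)
Step = Tuple[str, Callable[[dict], None]]


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def run_automatic_steps(item: dict, steps: List[Step]) -> bool:
    """Run steps in order, logging each outcome on the item; stop at the first failure."""
    events = item.setdefault("events", [])
    for name, fn in steps:
        try:
            fn(item)
        except Exception as exc:  # any failure hands the item over to a person
            events.append({"step": name, "outcome": f"failed: {exc}", "at": _now()})
            return False
        events.append({"step": name, "outcome": "ok", "at": _now()})
    return True


# Hypothetical example steps.
def check_has_images(item: dict) -> None:
    if not item.get("images"):
        raise ValueError("no images")


def mark_ready_for_ingest(item: dict) -> None:
    item["status"] = "ready for ingest"


PIPELINE = [("check images", check_has_images), ("mark ready", mark_ready_for_ingest)]
```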

Automated harvesting of IA content

Goobi has a ‘repository’ of IA identifiers for searching/harvesting.

Goobi harvests data from the Internet Archive website.

Content is processed automatically, including creation of METS & ALTO.

Content is stored in Preservica. DDS creates JSON for the player & pre-caches some content.

Content is available in the player.
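The harvesting step can be sketched against the Internet Archive's public URL patterns: `https://archive.org/metadata/<identifier>` returns an item's metadata as JSON (including a `files` list), and files are downloaded from `https://archive.org/download/<identifier>/<filename>`. The sketch only plans the requests and makes no network calls. The function names are hypothetical, and the `_jp2.zip` and `_marc.xml` suffixes are an assumption about how IA names page-image and MARC files, not a description of what Goobi harvests.

```python
from urllib.parse import quote

# Public Internet Archive endpoints: item metadata (JSON) and file downloads.
IA_METADATA_URL = "https://archive.org/metadata/{identifier}"
IA_DOWNLOAD_URL = "https://archive.org/download/{identifier}/{filename}"


def plan_harvest(known_identifiers, already_harvested) -> list:
    """Metadata URLs for identifiers in the local list that are not yet harvested."""
    return [
        (ident, IA_METADATA_URL.format(identifier=quote(ident)))
        for ident in known_identifiers
        if ident not in already_harvested
    ]


def files_to_fetch(identifier: str, metadata: dict,
                   suffixes=("_jp2.zip", "_marc.xml")) -> list:
    """Download URLs for files in an item's metadata whose names end with `suffixes`.

    The suffixes are an assumption about IA file naming.
    """
    return [
        IA_DOWNLOAD_URL.format(identifier=quote(identifier), filename=quote(f["name"]))
        for f in metadata.get("files", [])
        if f.get("name", "").endswith(tuple(suffixes))
    ]
```

In practice the metadata JSON would be fetched (for example with `urllib.request`) before calling `files_to_fetch`.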

Lesson 4 – Nothing without imagination

1. The power of digitisation can only be revealed if we can imagine the uses to which the data can be put.

2. Digitisation is not an exercise in technology for its own sake.

3. There is nothing that cannot be achieved, but it takes more than kit, tools, computers & software.

4. Digitisation is about engaging with creators & consumers, with the data & with the future.

Digitisation is not a separate activity

• Starts with alignment with the institutional mission.

• Builds on strategic vision.

• Digitisation as a strategic activity, planned & supported.

• Integrate all institutional systems: bibliographic, IT & human.

http://ocdindia.com/

Lesson 5 – The complete package

1. Digitisation is much more than sticking stuff under a camera or on a scanner.

2. Digitisation has to be developed as a whole & complete end-to-end process.

http://veritusgroup.com/how-to-create-a-dynamic-strategy-for-every-single-donor-a-step-by-step-process/

So, lessons learned

• Digitisation is a social activity.

• Digitisation is a planned, strategic activity.

• Digitisation has to be a sustainable & scalable activity.

• Automation is key.

• Nothing without imagination.

• Digitisation has to be a complete package.

In the end we built something beautiful

Questions now, questions later…?

Dave Thompson

Digital Curator

Wellcome Library

d.thompson@wellcome.ac.uk @D_N_T