Building Digital Collections: Managing and Sharing

Post on 29-Nov-2014



Workshop presented at the Wisconsin Conference for Local History and Historic Preservation, Wisconsin Rapids, October 11, 2013. Presenters: Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society and Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS.


Local History and Historic Preservation Conference

Building Digital Collections
Part 2: Managing and Sharing

Supported by WHRAB

TODAY’S AGENDA

• Introductions
• Tell us about yourself
• Creating an inventory
• Starting your inventory
• Selecting content to preserve
• Managing your collections
• Organizing collections
• Management tools
• Storage options
• Access considerations
• Why provide access?
• Software options
• Promoting your collections
• Wrap-up and final thoughts

Waterford Public Library/University of Wisconsin Digital Collections

Introductions

• We are…
  • Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society
  • Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS
• You are…
  • What organization do you represent?
  • What digital projects are you currently working on or thinking about?

Eager Free Public Library/University of Wisconsin Digital Collections

LOC and DPOE

The Library of Congress started the Digital Preservation Outreach and Education (DPOE) program in order to foster national outreach and education to encourage individuals and organizations to actively preserve their digital content.

http://www.digitalpreservation.gov/education/

Digital Preservation

Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
(Working group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007)

What is Digital content?

• Digital content is any content that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software.
  • Digital materials created from analogue sources
  • Born-digital content

• Digital materials you currently have or create – or expect to have – that you want to preserve.

Well-managed Collections

• Sample characteristics of well-managed collections:
  • Basic information about each collection
  • Minimal metadata for objects (you define)
  • Common file formats
  • Controlled and known storage of content
  • Multiple copies in at least 2 locations

INVENTORY

Philharmonic Chorus Members (Image ID: WHi-92113)

Why Do We Identify Content?

• Not all digital content can or should be preserved

• Good digital preservation requires an explicit commitment of resources, which - for most organizations - means planning ahead

• An explicit inventory is the best way to identify content

Food for the Boys in France (Image ID: WHi-35438)

First Steps

• Identifying content is a first step to planning for current and future preservation needs

• Ask: what content do I have, will I have, might I have, must I have?

An inventory is the best way to identify what content you have now – and raise awareness in your institution.

Goals

• Identify potential digital content you may need to preserve
• Treat the inventory as a management tool that grows as your preservation program grows
• Use it as a planning tool – e.g., to plan for staffing, training, and annual growth
• Use it as a basis for acquiring content and for defining submission agreements and plans

Does your institution have an inventory of your digital content?

Inventory Considerations

• Inventory content is more important than the technology
• Inventory results should be:
  • Documented: an inventory should actually exist
  • Usable: use a simple format to sort, list, etc.
  • Available: accessible to others
  • Scalable: should be able to add content/fields over time
  • Current: update periodically and date it

Inventory Tips

• Don’t let implementing the software become the focus
• Use software you know and have available
• Test it out with a number of people and collections
• Stick with a single format; don't change once you've decided on it
• Be consistent, comprehensive, and concise

How Much Detail to Include

• Inventories can range from general to detailed
• Determine the appropriate level of detail for you
• Factors in determining level of detail:
  • Extent of content to be inventoried
  • Nature & location of content
  • Resources available to complete inventory
  • Timeframe & deadlines for completion

What Do You Have?

• Identify collections of digital materials. (Don’t work at the item level…)
• Provide a brief title and description
• Estimated growth over time ***

Who is involved?

• Who is currently managing the collection/digital content?
• Who knows the most about it?
• Creator (Internal or External) – who created the digital content?

Digital Management

Collections Management

Creator

THESE MAY BE DIFFERENT PEOPLE

What does it consist of?

• Medium (6 CDs, 1 hard drive, 115 floppy disks)
• Extent = Format + Amount (600 .pdfs, 30 .doc)
• File Size – (MB, GB, TB)

http://www.csgnetwork.com/memconv.html
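If you would rather not use an online converter, the file size figure for an inventory can be worked out with a few lines of Python. This is a minimal sketch, not part of the workshop materials; the function names are made up for illustration, and it assumes 1 KB = 1024 bytes.

```python
import os

UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_size(num_bytes):
    """Format a byte count with the largest fitting unit (1 KB = 1024 bytes)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            break
        size /= 1024
    if unit == "bytes":
        return f"{int(size)} bytes"
    return f"{size:.1f} {unit}"


def folder_size(root):
    """Add up the sizes of all files below root, in bytes."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Calling `human_size(folder_size(some_folder))` on a collection folder gives a figure you can paste into the File Size column.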

Date Considerations

Inventories should note:
• Date of inventory and updates to it
• Dates associated with the content (1860-1865)
• Date of files – created or modified (2009)
• Date received – if relevant / possible (2011)

Shawano Probate Cases 1860-1865
Digitized by USG in 2009
Received by WHS in 2011

Content Location

Locations of content are important:
• List primary locations (network drive location, hard drive on Bob’s shelf)
• List locations of all backups/copies (CDs in the storage room, weekly backup tapes)

Remember to change locations as content moves
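The inventory fields discussed so far (what, who, medium, extent, dates, locations) fit comfortably in a spreadsheet. For anyone who prefers a script, here is a minimal sketch that writes the same fields to a CSV file. The column names are our own shorthand, not a standard, and apart from the Shawano dates taken from the slide, the sample values in the usage note are hypothetical.

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class InventoryEntry:
    """One row per collection (not per item)."""
    title: str
    description: str
    managed_by: str
    creator: str
    medium: str
    extent: str
    file_size: str
    content_dates: str
    file_dates: str
    date_received: str
    primary_location: str
    backup_locations: str
    inventory_date: str


def write_inventory(entries, out):
    """Write entries as CSV to an open text file (or any file-like object)."""
    names = [f.name for f in fields(InventoryEntry)]
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
```

A row for the probate example might use `content_dates="1860-1865"`, `file_dates="2009"` and `date_received="2011"`, with placeholder values such as `primary_location="network drive"` for the rest. Open the output with `open("inventory.csv", "w", newline="")` so that the file loads cleanly in spreadsheet software.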

Selection Process

Why select content to preserve?

Log jam on the St. Croix River, 1886 (Wisconsin Historical Society WHi-2364)

Why select content to preserve?

• Cost: storage may be cheap, management is not…especially over time

• Discovery and dissemination services: scale, scope, performance, sustainability

• Quality of content may be variable
• Content meets organization’s mission

Selection Criteria

Ask yourself which materials are…
• most significant to your organization?
• most unique?
• highest value?
• most extensive?
• most requested/used?
• easiest?
• oldest?
• newest?
• at risk?

Neville Public Museum of Brown County

Show Stoppers

Stop if or when the answer is NO
• Content
  • Does the content have long term value?
  • Does it fit your scope and mission?
• Technical
  • Is it feasible for you to preserve the content?
• Access
  • Is it possible to make the content available?
  • Are you the only holder of this content?
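The show-stopper questions work as a simple checklist: go through them in order and stop at the first NO. As an illustration only, that logic can be sketched in Python as follows. The question wording comes from the slide; the function name is hypothetical, and treating an unanswered question as NO is our assumption.

```python
SHOW_STOPPERS = [
    ("Content", "Does the content have long term value?"),
    ("Content", "Does it fit your scope and mission?"),
    ("Technical", "Is it feasible for you to preserve the content?"),
    ("Access", "Is it possible to make the content available?"),
    ("Access", "Are you the only holder of this content?"),
]


def first_show_stopper(answers):
    """Return the first (category, question) answered NO, or None if all are YES.

    answers maps question text to True (yes) or False (no).
    Assumption: a question with no recorded answer counts as NO.
    """
    for category, question in SHOW_STOPPERS:
        if not answers.get(question, False):
            return category, question
    return None
```

A collection that returns `None` has passed every question and can move on to the more detailed inventory step.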

Add to your inventory

Supplement your inventory with more detailed information about the material you plan to preserve over the long term.
• Access
  • How will the public access the content?
  • Is access restricted? How? For how long?
• Rights
  • Who owns the rights to preserve and disseminate?
• Use
  • What’s the lifespan of the content?
  • Will its value/use change over time?

Add to your inventory

• Data criticality
  • Is it only in digital form? Do we hold the only copy?
• Business/mission criticality
  • If we lose it, what’s the damage to our reputation? How will it impact our function or services?

Charlie Chaplin and Jackie Coogan in The Kid (Image ID: WHi-68423)

Selection Exercise

Postal workers sorting mail, 1955 (Wisconsin Historical Society WHi-36392)

Next Steps

Memorial Union Steps

Analyze the Results

When the inventory is complete, ask yourselves what digital content
• do we have that we didn’t know about?
• should we be keeping that we aren’t now?
• will we create or likely acquire in the future?
• are we required to keep?
• do we need to review?

"Deering Ideal" Stripper Harvester Catalog Cover (Image ID: WHi-27577)

ORGANIZE YOUR FILES

• Centralize your files
• Minimize your layers
• Leave breadcrumbs (AKA “READ ME”)
• Determine what you don’t know

IH General Office Mail Room (Image ID: WHi-12016)

WHAT NOT TO KEEP?

• Backups/copies/drafts
• Supplementary files that provide no additional long-term value
• Corrupted files
• Same item – different file formats
• Items that don’t fit your organization’s purpose

Boy on Curb near Trash Pile (Image ID: WHi-57208)

Goals/Outcomes

• Expanded inventory of content to preserve …and what you can delete (gray areas identified)
• Well-defined and documented selection criteria, policies and procedures
• Better understanding of content for future planning and growth

Greater knowledge = greater control!

Tools

Guitar Maker's Shop (Image ID: WHi-27234)

Remove Empty Directories

The application recursively searches for and deletes empty directories below a given start folder and shows the result in a well-arranged tree.

http://sourceforge.net/projects/rem-empty-dir/files/latest/download?source=files
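To show what such a tool does under the hood, here is a minimal Python sketch of the same idea. It is not the linked application's code, and the function name is hypothetical. It only reports empty folders and leaves deletion to you, which is the safer default when you are still getting to know a collection.

```python
import os


def find_empty_dirs(root):
    """List folders below root that hold no files, directly or in subfolders."""
    empty = set()
    # Walk bottom-up so subfolders are classified before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        subdirs_empty = all(os.path.join(dirpath, d) in empty for d in dirnames)
        if not filenames and subdirs_empty and dirpath != root:
            empty.add(dirpath)
    return sorted(empty)
```

A folder that contains only other empty folders is itself reported as empty; the start folder is never reported.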

Remove Duplicate Files

• Auslogics Duplicate File Finder http://www.auslogics.com/en/software/duplicate-file-finder/

• Similar Images http://similarimages.en.softonic.com/

• VisiPics http://www.visipics.info/index.php?title=Main_Page
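For byte-for-byte duplicates, the underlying technique is straightforward: group files by size, then compare a hash of the contents of any files that share a size. The sketch below illustrates that approach in Python under our own assumptions (SHA-1 as the hash, hypothetical function names). It is not how any of the tools listed above are implemented, and unlike the image tools it will not catch near-duplicates such as the same photo saved at two resolutions.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file's contents in chunks so large files don't fill memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of files below root with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # only hash files that share a size
            for path in paths:
                by_digest[file_digest(path)].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Review each group by hand before deleting anything; the script only tells you which files are identical, not which copy is the master.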

Auslogics Duplicate File Finder

Select Search Criteria

Select More Search Criteria

Select Delete Criteria

Image Viewer

IrfanView http://www.irfanview.com/

• Tool with many different capabilities for image manipulation/editing
• For photos, we can easily view an entire folder’s worth of images at one time

Checksums

• Checksums (AKA “Hash Sums”) are created by programs running an algorithm against the contents of a file. (There are many free utilities that will perform this function for you.)

• The resulting checksum is a short sequence of letters and/or numbers that, for practical purposes, uniquely identifies that file. (Think “electronic fingerprint”.)

Unix cksum utility

Why is this a good thing?

• Checksums help maintain the INTEGRITY of your collections because they will tell you when things change over time.

• If two files are exactly the same, the checksums of those files will also be exactly the same (generally speaking)

• If a file becomes corrupted, degraded or is changed in some way, the next time you run the utility on it, the checksum will change
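To make the "fingerprint" idea concrete, here is a minimal sketch that computes a checksum with Python's standard `hashlib` module. The function name `checksum` is our own; MD5 and SHA-1 are used because they are the algorithms named in this workshop, not because they are the only options.

```python
import hashlib


def checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex checksum of a file's contents.

    algorithm can be any name hashlib supports, e.g. "md5" or "sha1".
    The file is read in chunks so large masters don't need to fit in memory.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Running `checksum` twice on an unchanged file returns the same string; changing even one byte of the file produces a different one.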

MD5summer

• MD5summer http://www.md5summer.org/download.html

• This tool will give you a couple of options for the hashing algorithm: MD5 or SHA-1

• Other tools will give you other options…

How Does it work?

• Open MD5summer
• Select your root folder
• Select “Create Sums”

Create List of files to sum

• Select the files to be added
• Click “Add” or “Add recursively”
• Click “OK”

MD5 sums will start generating

Save the File

Verify Hash Values

• Copy files to another directory (think “backup”)
• Open MD5summer
• Select the files in the new location
• Click “Verify Sums”

Open the Md5sum file

• Find your MD5 file
• Click “Open”

MD5sums will be compared

YEAH!

IF THE FILES ARE DIFFERENT…

Uh-Oh!

Things to remember

Things that will NOT affect checksums
• Moving items from one place to another
• Changing the file name

Run on the master files when a collection is completed

Set up a schedule to run “verify checks” periodically
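The create-then-verify routine can also be scripted, which makes a periodic schedule easier to keep. The following is a minimal sketch of that workflow in Python, assuming MD5 and an in-memory dictionary as the list of sums. The function names are hypothetical, and this is not MD5summer's own file format or code.

```python
import hashlib
import os


def md5_of(path):
    """Return the MD5 checksum of a file's contents as a hex string."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def create_sums(root):
    """Map each file's path (relative to root) to its checksum."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of(full)
    return sums


def verify_sums(root, sums):
    """Check files under root against saved sums; return {path: problem}."""
    problems = {}
    for rel_path, expected in sums.items():
        full = os.path.join(root, rel_path)
        if not os.path.exists(full):
            problems[rel_path] = "missing"
        elif md5_of(full) != expected:
            problems[rel_path] = "changed"
    return problems
```

Run `create_sums` on the master files once, then `verify_sums` against the masters or a backup copy on your schedule. An empty result is the "YEAH!" case. Note that because this sketch keys the sums by relative path, a renamed file shows up as "missing" even though its checksum is unchanged.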

St. Mary of the Lake Parish School First Day (Image ID: WHi-98433)

STORAGE

Key Decision Points

• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?

Post Office (Image ID: WHi-9135)

Factors to consider

• Immediate costs
  • Quantity (size and number of files)
  • Number of copies
  • Media (life span, availability, $$)
• Other resources
  • Expertise (skills required to manage)
  • Services (local vs. hosted)
  • Partners (achieving geographic distribution)
• Institutional constraints

How Many and Where?

• Multiple
  • Minimum: two (2) copies in two locations
  • Optimum: six (6) copies
• Geographically distributed
  • If possible, don’t keep all of your copies onsite

LOCAL STORAGE OPTIONS

• Local network
• RAID device
• External hard drive
• Archival quality (gold) CDs or DVDs

Take into account potential future storage needs.

Villa Terrace Decorative Arts Museum

Cloud storage options

Commercial options:
• Google Drive
  • Up to 5GB free (approx. 140 high-resolution TIFF files)
  • 25GB = $2.50/month
• Amazon Simple Storage Service (S3)
  • $.095 per GB/month

Institutional options:
• DuraCloud
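The figures above are enough for a rough back-of-the-envelope estimate. The sketch below uses only the numbers quoted on this slide (5GB for about 140 TIFFs, $.095 per GB per month). Those are 2013 prices and will have changed; the function names are hypothetical, and the calculation assumes a single flat rate with no transfer or request fees.

```python
FREE_TIER_GB = 5                # Google Drive free tier quoted on the slide
TIFFS_IN_FREE_TIER = 140        # approx. number of high-resolution TIFFs in 5GB
S3_PRICE_PER_GB_MONTH = 0.095   # slide's 2013 figure, not current pricing


def approx_tiff_size_mb():
    """Average TIFF size implied by the slide's '5GB is about 140 files'."""
    return FREE_TIER_GB * 1024 / TIFFS_IN_FREE_TIER


def gb_for_tiffs(count):
    """Estimate storage in GB for a number of TIFFs of that average size."""
    return count * FREE_TIER_GB / TIFFS_IN_FREE_TIER


def monthly_cost(total_gb, copies=1, price=S3_PRICE_PER_GB_MONTH):
    """Flat-rate monthly cost for total_gb stored 'copies' times."""
    return total_gb * copies * price
```

By this estimate each TIFF is roughly 36.6 MB, so 1,000 such files need about 35.7 GB, or around $3.39 per month for one copy at the quoted rate. Multiply by the number of copies you plan to keep.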

*Public Records Board Guidance on the Use of Contractors for Records Management Services

*Use of Contractors for Records Management Services

Access Considerations

Historical Society library stacks, 1896 (Wisconsin Historical Society WHi-23281)

why are you providing access to content?

• User demand
• Institutional visibility
• Legal mandates or grant requirements
• Generate revenue
• Contribute to our collective knowledge

South Wood County Historical Museum

What makes a good online collection?

• Publicly accessible.
• Searchable - Includes keywords and other descriptive information (metadata) so users can find what they’re looking for.
• Organized and consistent.
• Based on existing international/national/statewide standards and best practices.
• Uses software that is sustainable (will be around for a long time) and interoperable (can be migrated or shared).
• Respects intellectual property rights.

What are we aiming for?

Content should be delivered to users over time:
• Easily – using current and known technologies
• Coherently – well-documented and presented
• Completely – intact and well-formed
• Correctly – accurately representing content
• Reliably – using well-managed technologies
• Consistently – in accordance with policies
• Fairly – with equity and precedent

Some software options

• CONTENTdm
• ResCarta Web
• PastPerfect Online
• Omeka

Beloit College

CONTENTdm

• Hosted by Milwaukee Public Library through Recollection Wisconsin
• Produced and distributed by OCLC
• Costs:
  • $200 one-time setup fee
  • Annual hosting fees starting at $75

http://content.mpl.org/ashland


ResCarta Web

• Free and open source
• Host it yourself; or hosting available through Northern Micrographics (fee-based)
• ResCarta Foundation – based in La Crosse

http://www.ecpubliclibrary.info/research/general/history.html


PastPerfect Online

• PastPerfect add-on
• Requires PastPerfect MultiMedia Upgrade
• Hosted by PastPerfect
• Costs:
  • $285 set-up
  • $440 annually (price breaks for AASLH members)

http://oshkosh.pastperfect-online.com


Omeka

• Free and open source
• Host it yourself; or subscribe to hosted version, omeka.net
• Developed by the Center for History and New Media, George Mason University

http://uwoshkosh.omeka.net


Promotion

Wisconsin Tourism Sign, Rhinelander, 1930-1942 (Wisconsin Historical Society WHi-37927)

Potential audiences

• Local residents
• Students and teachers
• Genealogists
• Specialists (e.g. Civil War re-enactors, railroad buffs)
• Academic researchers
• Curious Wisconsinites
• Everyone!

College of Menominee Nation

Stakeholders and partners

• Board
• Staff and/or volunteers
• Local experts
• Community members
• Chamber of Commerce
• Local government
• Students
• Other organizations in your community/county/region
• Who else?

McMillan Memorial Library, Wisconsin Rapids

Encouraging use of your collections

• Organizations are moving away from “if you build it, they will come” approach – Google is not enough
• Participatory archives concept – shared authority, community engagement
• Bring your content to your audience – find them where they already are
• Let them look behind the curtain and see projects in progress, warts and all

Milwaukee Public Library

PROMOTION – BRAINSTORMING

• What are some ways you’ve had success promoting your digital collections?
• What are cool ideas you’ve seen that you’d like to try?

Marketing ideas

• Add introduction/background information on your own website
  • http://www.newberlinhistoricalsociety.org
• Highlight an item of the day/week/month
  • https://www.facebook.com/lacrosse.history
• Host an opening event
  • Whitefish Bay Public Library
  • College of Menominee Nation
• Host a slide show or exhibition
  • South Wood County Historical Museum
  • Mineral Point Historical Society

Rock County Historical Society

Marketing ideas

• Send someone with a laptop to popular local spots/events to demonstrate digital collections:
  • Ask, “Where do people go first to look for this kind of information?” and then, market there!
• Upload a few digitized images to Flickr with descriptions that point back to your related digital and physical collections.
• Contribute to relevant pages on Wikipedia and include references pointing to specific digital materials.
• Request that the Chamber of Commerce and other relevant local organizations link to the new digital collections from their websites.
• Send a press release to local media

EVALUATING IMPACT

Understanding current users…
• Online survey instrument
• Web analytics
• Email subscriber lists
• Visitor forms

Understanding future users…
• Special interest groups (AASLH, SAA, etc.)
• Listservs
• Workshops and conference sessions

WRAPPING UP – FINAL THOUGHTS

Commencement, 1978 (UW-Madison Archives)

Next steps/To do list

• Create and maintain an inventory
• Develop your selection criteria
• Play with the tools
• Develop a storage management policy
  • E.g., number of copies, locations
• Monitor copies of content for errors/changes
• Evaluate technology to determine your preferred access platform
• Develop a marketing plan
• Determine how you will evaluate the success of your marketing plan

Thank you!

• Sarah Grimm, Wisconsin Historical Society
  sarah.grimm@wisconsinhistory.org
  608-261-1008

• Emily Pfotenhauer, WiLS
  emily@wils.org
  608-616-9756

• Slides and handouts available at http://recollectionwisconsin.org/localhistory2013

South Wood County Historical Museum