SSHELCO 2016 metadata workshop

Posted on 15-Feb-2017



Plays Well with Others: Getting Your Digital Collections Metadata Ready for the World

2016 SSHELCO Annual

March 17, 2016


Linda Ballinger, Penn State
Doreva Belfiore, Temple University
Bill Fee, State Library of Pennsylvania
Leanne Finnigan, Temple University
Kristen Yarmey, University of Scranton
Elise Warshavsky, Temple University

The Pennsylvania Digital Collections Project (PDCP) Metadata team!

On the agenda:
• PDCP/DPLA Overview
• Meet the aggregator
• Why metadata matters
• Field by Field metadata madness!
  ○ Derived Fields
  ○ Required Fields
  ○ Highly Recommended Fields
  ○ Recommended Fields
  ○ Optional Fields

Before we start:
• Q&A throughout
• Fun breaks at panelists’ discretion!

Slides and guidelines will be available.

Most of all: Don’t panic. We’re all in this together.

PDCP Project Overview

Toward a PA DPLA Hub

August 2014: meeting at the Free Library of Philadelphia

Initiated by Joe Lucia and Stacey Aldrich, former PA State Librarian

Including representatives from a number of institutions across the state

Founders

Why get involved?
• DPLA as a major discoverability conduit:
  ○ Worldwide exposure for PA content
• DPLA as a means of working efficiently:
  ○ Collaboration at the cross-institutional level
  ○ Taking advantage of economies of scale
  ○ DPLA portal / API vs. customized silos

Digital Public Library of America

http://dp.la

DPLA Hub and Spoke Model
Content Hubs:
• Single institutions with 200K+ objects, e.g., NARA, HathiTrust, NYPL
Service Hubs:
• Content aggregation for many institutions
• State/regional level; ideally a 1:1 ratio
• e.g., Digital Commonwealth (MA), Mountain West Digital Library, Empire State Digital Network (NY)

Digitization and Repository Support Activities

Digitization:
• For organizations that have not started digitizing materials, or have not done much
• Potential for remote, local and mobile digitization options (a.k.a. “scannebagos”)
• Provided by the State Library of Pennsylvania

Content Hosting:
• For organizations that already have digital files but no current digital repository capabilities
• Provided by POWER Library (HSLC)
• Free for Pennsylvania institutions

SUCCESS! PDCP announced as DPLA Pennsylvania Service Hub, August 28, 2015!

Estimated Timeline:
• September 2015 - Orientation
• October-November 2015 - Signing legal agreements, metadata revision
• October-December 2015 - Metadata normalization, harvesting tests and QA
• Early 2016 - Planned live ingest of records into the DPLA!

PA-DPLA Aggregator
• Proof-of-concept prototype
• Development: Pennsylvania State University / Temple University partnership
• Dec. 2014 - Mar. 2015
• Hydra (Fedora) - Open Source Platform
• Harvesting & exposing metadata via OAI-PMH: https://github.com/tulibraries/dplah
• Released production version: Summer 2015
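Harvesting over OAI-PMH comes down to repeatedly requesting ListRecords and following the resumptionToken until the repository stops returning one. The sketch below illustrates that loop with Python's standard library, assuming oai_dc metadata. It is not the dplah aggregator's code; the function names and the `fetch` callable (which takes a URL and returns the response text) are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and simple Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request; resumptionToken must be sent on its own."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        header = rec.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier")
        fields = {}
        metadata = rec.find(OAI + "metadata")  # absent for deleted records
        if metadata is not None:
            for el in metadata.iter():
                if el.tag.startswith(DC) and el.text and el.text.strip():
                    fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        records.append({"identifier": identifier, "fields": fields})
    token_el = root.find(".//" + OAI + "resumptionToken")
    # An empty resumptionToken element marks the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url, fetch):
    """Yield every record, following resumption tokens page by page."""
    url = list_records_url(base_url)
    while url:
        records, token = parse_list_records(fetch(url))
        yield from records
        url = list_records_url(base_url, token) if token else None
```

Keeping `fetch` as a parameter lets the parsing logic be tested against saved responses; in practice it could wrap `urllib.request.urlopen`.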

PA-DPLA Aggregator

OAI-PMH metadata, human readable, with faceted browsing and searching

http://libcollab.temple.edu/aggregator


PA-DPLA Aggregator

Kitchen Memories, Scranton Public Library, http://content.lackawannadigitalarchives.org/cdm/ref/collection/cookbooks/id/1657

http://libcollab.temple.edu/aggregator

Testing Data: http://libcollab.temple.edu/aggregator-testing/

Prototype Harvested Content: “Lowest hanging fruit”
○ OAI-PMH harvestable data
  ■ 29 institutions, 147K+ harvested records
○ Primarily targeting collections from PDCP Steering and Planning Committee institutions
○ Keep numbers manageable for testing purposes
○ Scalable to full production mode for the future

Why do we need good metadata?

http://iknowwhereyourcatlives.com/

http://dp.la/map?utf8=%E2%9C%93&q=textiles

http://dp.la/apps?page=2

Collection Policies

CC0 Metadata

Contributing institutions are required to share their metadata and thumbnails under a CC0 license (full access - no rights reserved).

The digital objects themselves retain any original specified rights.

Collecting Scope
The following types of collections are NOT currently accepted by the DPLA:
• Scholarly materials: ETDs, journal articles
• Finding aids: EADs, collection guides
• Aggregate description: objects described at the folder, series, or collection level instead of the item level
• Items that don’t resolve to a publicly accessible URL
• Individual page-level objects instead of compound ones

Restricted Items
If your institution needs to restrict any digital objects at the item level, for copyright or other reasons, enter the string pdcp_noharvest in fields that map to either of these Dublin Core values:
• dcterms:accessRights
• dc:rights
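In effect, the flag works as a filter applied before records are passed on to DPLA. The sketch below shows that logic under stated assumptions: records are already parsed into dictionaries mapping Dublin Core element names (`rights`, `accessRights`) to lists of values, and a plain substring match is enough. How the PDCP aggregator actually applies the flag may differ; `is_restricted` and `harvestable` are hypothetical names.

```python
NOHARVEST_FLAG = "pdcp_noharvest"
# Local element names assumed for dc:rights and dcterms:accessRights.
RESTRICTION_FIELDS = ("rights", "accessRights")


def is_restricted(fields):
    """True if the flag appears in any value of a restriction field."""
    return any(
        NOHARVEST_FLAG in value
        for name in RESTRICTION_FIELDS
        for value in fields.get(name, [])
    )


def harvestable(records):
    """Keep only records that are not flagged for exclusion."""
    return [record for record in records if not is_restricted(record)]
```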

Restricted Area for Humans Only, Ronyasoft, http://www.ronyasoft.com/products/poster-forge/templates/funny-signs/restricted-area-sign-template/images/restricted-area-sign-template.jpg

Derived Fields

Derived Fields
Derived fields are metadata fields that the PDCP aggregator creates automatically from the OAI-PMH feed.

Happy Face, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p15037coll3/id/6541

Derived = “Don’t worry, be happy!”

Thumbnail
Thumbnails are the small preview versions of your digital object that are shown both in your repository and in the DPLA.

They are important because they confirm for viewers whether or not they have found what they are looking for.

Thumbnail
Thumbnails can be derived by our aggregator from these common repository systems:
• CONTENTdm
• Bepress
• Omeka
• VUDL
… and more to be added

Thumbnail

Portrait of Zapata, Kutztown University, http://digital.klnpa.org/cdm/ref/collection/asaro/id/21

Thumbnail
For other systems, we need a consistent path where the thumbnail is housed, e.g.:

http://www.server.org/repo/thumbs/$identifier/
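The `$identifier` placeholder in that pattern happens to match Python's `string.Template` syntax, so a per-collection path pattern can be filled in as below. This is an illustration only: the server URL is the slide's placeholder, `thumbnail_url` is a hypothetical helper, and percent-encoding the identifier is an assumption about what a real server would need.

```python
from string import Template
from urllib.parse import quote

# Placeholder pattern from the slide; a real collection would supply its own.
THUMBNAIL_PATTERN = Template("http://www.server.org/repo/thumbs/$identifier/")


def thumbnail_url(identifier, pattern=THUMBNAIL_PATTERN):
    """Fill in $identifier, percent-encoding characters unsafe in a path segment."""
    # substitute() raises KeyError if the pattern uses a placeholder we do not
    # supply, so a misconfigured pattern fails loudly instead of yielding a bad link.
    return pattern.substitute(identifier=quote(identifier, safe=""))
```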

Collection
The collection name is set up by the team before harvesting. It generally matches the digital collection name found online.

Contributing Institution
The contributing institution name refers to YOUR ORGANIZATION and is set up by the team before harvesting.

Are YOU in This?, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p16002coll9/id/2952

Contributing Institution
The Contributing Institution name can also be pulled automatically from the following DC fields:
• Contributor
• Creator
• Publisher
• Source
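The slide lists the fields but not how the aggregator chooses among them when more than one is filled in. The sketch below assumes the name the team configured before harvesting wins, and otherwise takes the first non-empty value in the order listed; that priority order, like the function name `contributing_institution`, is an assumption for illustration.

```python
# Fallback order is assumed to follow the slide's list.
INSTITUTION_FIELDS = ("contributor", "creator", "publisher", "source")


def contributing_institution(fields, configured_name=None):
    """Prefer the name set up before harvesting; otherwise fall back to DC fields."""
    if configured_name:
        return configured_name
    for name in INSTITUTION_FIELDS:
        for value in fields.get(name, []):
            if value.strip():
                return value.strip()
    return None  # nothing usable: needs manual setup
```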

Intermediate Provider
If your data is hosted by an aggregator or common repository, we list that entity as an Intermediate Provider, e.g.:
• Keystone Library Network (KLN)
• Lackawanna Valley Digital Archives (LVDA)
• POWER Library (via HSLC)

Resource Location
The Resource Location is a link back to the original collection URL for a digital object.

Example: http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36

Resource Location


Early Library Staff, Scranton Public Library, http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36

Resource Location
Required by DPLA so it can link users back to your original record.

Can be derived from the OAI-PMH data feed for typical systems:
• CONTENTdm
• Bepress
• Omeka
• VUDL
Can be custom mapped if needed for other systems, e.g.: http://www.server.org/repo/$identifier/
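For feeds where the item's web address arrives as one of several `dc:identifier` values (alongside call numbers, OCLC numbers and the like), deriving the Resource Location can be as simple as picking out the value that parses as an http(s) URL. The sketch below assumes the first such value is the right one; real feeds vary, which is why custom mapping exists. `resource_location` is a hypothetical helper.

```python
from urllib.parse import urlparse


def resource_location(identifiers):
    """Return the first dc:identifier value that is an http(s) URL, or None."""
    for value in identifiers:
        candidate = value.strip()
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return candidate
    return None
```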

Required Fields

Title
• Other than the thumbnail, the title is often the first piece of information a user sees in a results list
• Should be the name by which an object is known, not a file name
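Titles that are really file names are easy to catch mechanically before harvest. The heuristic below is an assumption, not a PDCP rule: it flags a title made of a single token ending in a common media file extension, and the extension list is illustrative. `title_looks_like_filename` is a hypothetical name.

```python
import re

# Illustrative list of extensions; extend it for your own repository.
FILENAME_LIKE = re.compile(
    r"[\w\-]+\.(?:jpe?g|tiff?|png|gif|jp2|pdf|mp3|mp4|wav)",
    re.IGNORECASE,
)


def title_looks_like_filename(title):
    """True if the whole title looks like 'IMG_0042.jpg' rather than a name."""
    return FILENAME_LIKE.fullmatch(title.strip()) is not None
```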

Language
Required if appropriate

Three-letter ISO 639-2 language codes are preferred. The aggregator normalizes these codes to full language names for display. Examples:
eng ---> English
ita ---> Italian
spa ---> Spanish
lat ---> Latin
san ---> Sanskrit
vie ---> Vietnamese
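The normalization described here amounts to a lookup from code to display name. The sketch below hard-codes only the six example codes from the slide; a real table would cover the full ISO 639-2 code list. Passing unrecognized values through unchanged so someone can review them is a design assumption, as is the name `language_label`.

```python
# Only the six example codes from the slide; a full table would load the
# complete ISO 639-2 list.
ISO_639_2_NAMES = {
    "eng": "English",
    "ita": "Italian",
    "spa": "Spanish",
    "lat": "Latin",
    "san": "Sanskrit",
    "vie": "Vietnamese",
}


def language_label(value):
    """Return the display name for a code, or the original value if unknown."""
    cleaned = value.strip()
    return ISO_639_2_NAMES.get(cleaned.lower(), cleaned)
```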

Language

Rights
Contains information about rights associated with the resource, e.g.:
• “In the public domain and may be used without copyright restriction.”
• “Content is under copyright of the University of Scranton.”
• http://creativecommons.org/licenses/by-sa/3.0/

REMINDER: DPLA will only accept objects that are available and viewable to the general public.

For restricted items: pdcp_noharvest

Rights
‘Getting it Right on Rights’
• Working group (DPLA, Europeana, etc.)
• Released a white paper in May 2015 and opened it up for comments
• Standardized rights statements: coming soon!

Rights
The aggregator can add rights statements at the collection level.

Type
The nature or genre of the resource
• DCMI Type Vocabulary recommended
• Assign ‘Text’ type to images of texts
• Think of the user

Type
Types used by DPLA:

text, image, sound, moving image, physical object

The aggregator can map your local types to these at the collection/seed level
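A collection-level type mapping can be pictured as a lookup table from local terms to the five DPLA types. The local terms below (DCMI names such as StillImage, plus a few free-text examples) are assumptions chosen for illustration, not the aggregator's actual configuration; `dpla_type` is hypothetical.

```python
DPLA_TYPES = {"text", "image", "sound", "moving image", "physical object"}

# Hypothetical local-to-DPLA mapping, keyed on lowercased local values.
LOCAL_TYPE_MAP = {
    "stillimage": "image",  # DCMI Type Vocabulary term
    "movingimage": "moving image",
    "physicalobject": "physical object",
    "photograph": "image",  # example free-text local term
    "audio recording": "sound",
    "book": "text",
}


def dpla_type(local_value):
    """Map a local type to a DPLA type; None means a mapping decision is needed."""
    key = local_value.strip().lower()
    if key in DPLA_TYPES:
        return key
    return LOCAL_TYPE_MAP.get(key)
```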

Type

Highly Recommended Fields

Date Created
Date of creation of the original resource.
(Not date digitized.)
(Not date range or time period.)

Date Created
Map to dcterms:created (preferred) or dc:date

Date Created
Use ISO 8601 (W3CDTF) format, which looks like:
• YYYY-MM-DD
• YYYY-MM
• YYYY
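A quick way to check that a value follows these three patterns, and names a real calendar date, is sketched below. It covers only the date-only forms shown on the slide (W3CDTF also allows times), and `is_w3cdtf_date` is a hypothetical helper.

```python
import re
from datetime import date

# YYYY, YYYY-MM or YYYY-MM-DD, ASCII digits only.
W3CDTF_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_w3cdtf_date(value):
    """True if value uses one of the three forms and is a real calendar date."""
    match = W3CDTF_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month/day default to 1 so partial dates can still be range-checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```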

http://xkcd.com/1179/

Date Created
Watch out for:
• Extra spaces or symbols: _1943
• Missing digits: 1943-1-5
• Placeholders and qualifying terms: Unknown, n.d., ca. 1950, 1950s
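The first two problems can usually be fixed mechanically, while placeholders and qualifying terms need a person to decide what to do. The sketch below follows that split: it strips stray spaces and underscores, zero-pads one-digit months and days, and flags the rest rather than guessing. The placeholder list and the `tidy_date` name are assumptions, and the function does not check that the result is a real date.

```python
import re

PLACEHOLDERS = {"unknown", "n.d.", "nd", "undated"}  # assumed list; extend locally
QUALIFIED = re.compile(r"^(?:ca\.?|circa)\s|[0-9]0s$", re.IGNORECASE)
UNPADDED = re.compile(r"([0-9]{4})-([0-9]{1,2})(?:-([0-9]{1,2}))?")


def tidy_date(raw):
    """Return (value, problem); problem is None when no human review is needed."""
    value = raw.strip()
    if value.lower() in PLACEHOLDERS:
        return value, "placeholder"
    if QUALIFIED.search(value):
        return value, "qualifying term"
    value = value.strip(" _")  # stray spaces or underscores, e.g. "_1943"
    match = UNPADDED.fullmatch(value)
    if match:
        year, month, day = match.groups()
        value = f"{year}-{int(month):02d}" + (f"-{int(day):02d}" if day else "")
    return value, None
```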

Place
Spatial characteristics of the resource. Geographic location relevant to the original item.

Place
Map to dcterms:spatial (preferred) or dc:coverage

Place
Where is this thing?

Place
Multiple choice:
• Philadelphia
• Philadelphia; Pennsylvania
• Philadelphia (Pa.)
• Philadelphia, Pennsylvania, United States
• Seventh and Sansom Streets (Philadelphia, Pa.)
• Franklin Institute (Philadelphia, Pa.)

Facade of the original Franklin Institute building, before the Institute moved to the Parkway in 1934.

Place
Use LCNAF (preferred) or TGN, FAST, ...

Addresses, lat/long, or other location markers can also be mapped to dcterms:spatial.

Place
Examples:
• Pittsburgh (Pa.)
• Allegheny County (Pa.)
• Harrison (Allegheny County, Pa. : Township)

Place
Watch out for: Place ≠ Time Period

Subject
The topic of the resource.

Subject
Map to dc:subject

Subject
Many variations on a theme:
• Newspapers
• Student newspaper
• New Holland (Pa.) Newspapers
• Scranton (Pa.) -- Newspapers
• West Chester University Student Newspapers
• College student newspapers and periodicals -- Pennsylvania -- Scranton
• University of Scranton -- Students -- Newspapers
• Lock Haven University of Pennsylvania Student Newspaper Archive

Subject
Use LC Authorities (preferred) or DDC, MeSH, UDC, AAT, TGN…

For LCSH, separate subdivisions with a space, double hyphen, and space: term -- term
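Inconsistent spacing around the double hyphen is easy to normalize in bulk. The heuristic below treats any run of two or more hyphens, with or without surrounding spaces, as a subdivision separator and rewrites it as " -- "; single hyphens, as in date ranges, are left alone. It is an assumption-laden sketch rather than an LCSH validator, and `normalize_lcsh` is a hypothetical name.

```python
import re

# Two or more hyphens with optional surrounding whitespace = subdivision separator.
SEPARATOR = re.compile(r"\s*-{2,}\s*")


def normalize_lcsh(heading):
    """Rewrite subdivision separators as ' -- ' and drop empty pieces."""
    parts = [part.strip() for part in SEPARATOR.split(heading.strip())]
    return " -- ".join(part for part in parts if part)
```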

Subject
Examples:
• Harlem Renaissance -- Maps
• Coal miners -- Pennsylvania -- Social conditions
• Cassatt, Mary, 1844-1926

Subject
Watch out for:
• Quotation marks: "D.O.R.A at Westminster"
• Separate terms with semicolons, not commas: , Holt, Colbin32 Carat Club,anniversary ,charitable organizations,social services
• Odd symbols or characters: 2nd &amp
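Two of these problems can be handled in a cleanup pass: decoding HTML entities such as &amp; and stripping stray quotation marks, then splitting on semicolons. Entities are decoded first so the semicolon inside "&amp;" is not mistaken for a separator. Commas cannot be treated as separators automatically, because headings like "Cassatt, Mary, 1844-1926" contain them, so comma-separated values still need review. The sketch assumes that approach; `clean_subjects` is hypothetical.

```python
import html

# Straight and curly double quotation marks.
QUOTES = '"\u201c\u201d'


def clean_subjects(raw):
    """Decode entities, split one subject field on semicolons, and tidy each term."""
    terms = []
    for part in html.unescape(raw).split(";"):
        term = part.strip().strip(QUOTES).strip()
        if term:
            terms.append(term)
    return terms
```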

Subject
Watch out for: Standardization

Subject
Watch out for: Local terms with limited global meaning, e.g., KSOMWML20

Recommended Fields

Creator - Old way (DON’T DO THIS)

Creator - Proper way

Description
• The 500 field (MARC general note) of the Dublin Core world

Format

From New York Heritage: http://cdm16694.contentdm.oclc.org/cdm/ref/collection/p15109coll6/id/2083

From African Americans Seen Through the Eyes of the Newsreel Cameraman: http://collections.contentdm.oclc.org/cdm/singleitem/collection/p9002coll1/id/277/rec/1

Identifiers
• OCLC # 12177842
• Call # PT 1.1
• FRBR linking code
• OCLC number (of object)
• CONTENTdm number
• CONTENTdm file name
• Identifier
• Generated identifier
• HPHWPZ201404000165 (unique, assigned)

Publisher

Publisher: State Library of Pennsylvania (NO!)

Publisher: Mount Pleasant, Pa.: Mount Pleasant Press, 1906

Optional Fields

Alternate Title

Contributor

NO!

Wrap-Up

What’s next?
• See PDCP Metadata Guidelines (still in draft)
• Let us know if you have feedback!
• We plan to finalize v.1 in April
• Living document

Would your institution like to contribute to the DPLA?
Email: dplainpa@gmail.com
Institutions will be forwarded to different organizations based upon needs and readiness for data harvest (harvesting and metadata support, repository support, digitization support).

Checklist to Contribute Data
• Permission letter agreeing to share metadata and thumbnails with DPLA under a CC0 license
• Data available on a publicly accessible website
• Ability to share metadata via OAI-PMH or CSV file
• Staff available to work with PDCP about metadata issues

Stay in touch:
• Come to the PA Backwards session tomorrow morning (9am)!
• Email the PDCP team: dplainpa@gmail.com
• Twitter: Follow us at @pdcp_pa
• PADIGITAL Listserv - general information about statewide digital initiatives: PADIGITAL@listserv.albright.org
  Send a message to listserv@albright.org with the text “subscribe padigital” in the body

Resources and Support
Documentation: https://drive.google.com/folderview?id=0B-icpMLW3cRXQmVQMnJoMkZJUDg&usp=sharing

Onboarding: Leanne Finnigan (leanne.finnigan@temple.edu) or Elise Warshavsky (elise@temple.edu)

Online office hours

Webinar workshops

METADATA

Original: The Doctor Is In, Peanuts Worldwide, LLC.

0 ¢

Thank you!

http://xkcd.com/1543/

PDCP Metadata Team
Linda Ballinger
Doreva Belfiore
Bill Fee
Leanne Finnigan
Kristen Yarmey