Roundtable Luncheon: OCLC Research Library Partnership

45th ARLIS/NA Conference • 8 February 2017

Roundtable LuncheonOCLC Research Library Partnership

Dennis MassieProgram Officer, OCLC Research

MEETING PHOTOS

Sandra Brooke of Princeton University

Deborah Kempe of the Frick Art Refence Library

Amy Lucker of NYU’s Institute of Fine Arts

ARLIS OCLC RLP Roundtable Attendees from the front of the room

ARLIS OCLC RLP Roundtable Attendees from the back of the room

WELCOME

– Mangia, mangia!– Welcome – Art Discovery Group Catalogue update– OCLC RLP Update– Web Archiving Metadata Working Group – Round Robin Round-up – Nawlins style

Today’s agenda

ART DISCOVERY GROUP CATALOGUE -- UPDATE

LibrarianMarquand Library of Art & Archeology

Sandra L. Brooke

http://artdiscovery.net Art Discovery Group Catalogue Update ARLIS/NA 2017

http://artdiscovery.net/

New and pending members– Pinacoteca di Sao Paolo, Brazil– Polimoda International Institute of Fashion Design

and Marketing, Florence– Danish National Art Library – RKD-Netherlands Institute for Art History, The

Hague– Royal Museum of Fine Arts, Antwerp– Hendrik Conscience Heritage Library, Antwerp– Bibliothèque du Musée des Arts Décoratifs, Paris– Polish Academy of Sciences Art Institute in

Warsaw, Poland– Warburg Institute, London

Features (SB)

Post-ADGC conference survey results: feedback

• …an extraordinarily important project in which all relevant art libraries must be represented

• …It has succeeded in being a focal point for bringing together art librarians together to discuss our profession and the future of art bibliography in general.

• …a great and outstanding project but still not completely mature in its development.

• It feels like something that only the 'big boys', ie. large academic, research and/or national museums can contribute to because of the lack of resources in many smaller, European libraries.

Post-ADGC conference survey results: needs improvement

• …too much noise in the result lists.

• …more important databases should be added (BHA, RILA, ASCO, etc.)

• …the website should be updated more frequently.

• …it should be more advertised

• …have the interface and webpages available in other languages

• ...need more non-european and non-american libraries, whose holdings would clearly enrich the catalogue. This might help to raise the attraction of the catalogue, by not only acquiring new data but also new users.

Next steps and future plans

• Develop and conduct usability study and user survey to share with community

• Add non-European and non-American art libraries

• Add more unique content, such as BHA, ASCO, Arkyves

• Work with OCLC EMEA on improvements to interface

Questions?

ADGC committee members in New Orleans this week:

Sandra Brooke ([email protected])Deborah Kempe ([email protected])

Kathleen Salomon ([email protected])

OCLC RESEARCH LIBRARY PARTNERSHIP UPDATE

OCLC Membership and Research Division

• Engage members as advocates who participate effectively in the cooperative

• Activate communities of practice that help libraries plan with confidence, position with effect and make an impact

• Generate knowledge, evidence and models that will improve libraries

News

• Launched 2015• 9,000+ members• 1,100 attendees

• 13 events • US, Canada, Puerto Rico and the UK

2016

VIAFVirtual International

Authority File

Data ScienceThe Web is the native environment of information seekers. OCLC Research recognizes that to be integrated into the Web, traditional library data must be transformed in various ways. We are analyzing the data in WorldCat and other sources to derive new meaning, insights, and services for use by libraries and others on the Web.

oc.lc/oclc-wikilib

http://oc.lc/oclc-wikilib

http://oc.lc/oclc-wikilib

• Build awareness – wherefore Wikipedia?• Online training program for 500 US public library staff• Guidance from a Wikipedian-in-Residence• Learning applied in libraries and community programs• Case studies and resources published

December 2016 – May 2018

Wikipedia project

www.oclc.org/research

http://www.oclc.org/research

• Are you subscribed to OCLCRLP-ANNOUNCE-L?

• Have you picked an OCLC Research brain lately?–Email [email protected]

Two takeaways – Partner staff

mailto:[email protected]

WEB ARCHIVING METADATA WORKING GROUP

Chief, Collection Management & AccessFrick Art Reference Collection

Deborah KempePlace Speaker

Photo Here

Web Archiving Metadata Guidelines Working Group

Report at OCLCRLP Update, ARLIS/NA Annual Conference, New Orleans, February 8, 2017

Deborah Kempe, WAM WG member

THE IMPERATIVE

“It is far easier to find an example of a film from 1924 than a website from 1994.”

--M.S. Ankerson. “Writing web histories with an eye on the analog past”

2012

“In the past few years, we have noticed a significant uptick in the use of web archives in mainstream media.”

--Web Sciences and Digital Libraries Research Group11 September 2016

“Web archiving operates at the frontier of capturing and preserving our cultural and historical record.”

--The British Library web archive blog14 September 2016

Archived websites often are not easily discoverable via search engines or library and archives catalogs and finding aid systems, which inhibits use.

Absence of community best practices for descriptive metadata was the most widely-shared web archiving challenging identified in two surveys:

– OCLC Research Library Partnership (2015)– Rutgers/Weber study of users of archived website (2016)

THE OBJECTIVE

Develop best practices for web archiving metadata that are community-neutral and output-neutral.

Publish a set of data elements with the scope of each defined (i.e., a data dictionary).

OCLC RESEARCH LIBRARY PARTNERSHIP WEB ARCHIVING METADATA WORKING GROUP

Working Group charge

The OCLC Research Library Partnership Web Archiving Metadata Working Group will evaluate existing and emerging approaches to descriptive metadata for archived websites and will recommend best practices to meet user needs and to ensure discoverability and consistency.

Working Group Members:25 individuals from diverse community of libraries and archives

Planned outputsA report on user needs and behaviors will inform community-wide understanding of documented needs and behaviors as evidence to underlie the metadata best practices.

A report evaluating selected open-source web archiving tools will describe metadata-related functionalities.

Best practices guidelines for descriptive metadata will address aspects of bibliographic and archival approaches.

LITERATURE REVIEWS

MethodologyTwo literature reviews: user needs and metadataSelectively gathered published literature, conference reports and notes, social media, etc.Abstracted each item

Synthesized our findings

Why study users’ needs?A necessary prelude to development of metadata best practices

We want to recommend an approach based on real user needs & behaviors

Users don’t necessarily utilize the usual discovery tools to locate archived websites

Lack of awareness that libraries harvest and archive web content

We focused on three questionsWho uses web archives?How and why do they use them?What can we do to support their needs?

Types of users identified in readings

Academic researchers (social sciences, history)

Legal researchers

Digital humanists and data analysts

Web and computer scientists

Note: We did not find sources that address other types of users.

Types of use

Reading specific web pages/sites

Data and text mining

Technology development

What we learned about user needs

Formatting and organization of data is an issue

Lack of discovery tools make access challenging

“Provenance” information is a critical missing piece

Libraries and archives need to actively engage in

outreach to users

What we learned about metadata issuesConfirmed that no appropriate set of best practices

existsData formats used include MARC, Dublin Core, MODS,

EAD Finding aidsMetadata elements used vary widely, and the same

element may have different meaningsLevel of description varies as well: Single site, collection

of sites, seed URLs …Creating metadata at scale is … impossible?

CAN WE AUTOMATE METADATA CREATION?

The community longs for the magic bullet: automatic generation of metadata from the tools used to crawl websites.

So we evaluated open-source toolsArchive-ItHeritrixHTTrackMementoNetArchiveSuiteNutchWax

Site StorySocial Feed ManagerWeb Archive DiscoveryWaybackwebrecorder.ioWebCurator

Evaluation criteriaBasic purpose of the toolWhich objects/files can it take in and generate?Which metadata profiles does it record in?Which descriptive elements are automatically generated?Which descriptive elements can be exported?What relation does it have to other tools?

Evaluation grid for webrecorder

Etc.

What we learnedWebsites generally have poor metadata (e.g., title is “home

page”)Tools have few or no metadata-related features Extractable data usually limited to title, crawl date ….Most tools capture only the site (usually in WARC format), so

metadata must be created manually/externallyA few tools enable manual input of metadata within WARC fileErgo: no magic bullet (big sigh)

DEVELOPING METADATA BEST PRACTICES

Why study existing practices?A necessary prelude to development of metadata best practices

We want to evaluate them in light of user needs

We want to borrow their best features

We want to understand how existing rules and guidelines are reflected in current practice

MethodologyAnalyze descriptive metadata standards & local guidelines

Evaluate existing records “in the wild”

Differentiate bibliographic & archival approaches

Identify issues specific to web archiving

Incorporate findings from literature reviews

Descriptive standards under review

Describing Archives: A Content Standard (SAA)

Integrating Resources: A Cataloging Manual (Program for Cooperative Cataloging, based on RDA)

Dublin Core

Local guidelines under reviewArchive-ItColumbiaGovernment Printing OfficeHarvardLibrary of CongressNew York Art Resources Consortium (NYARC)University of MichiganUniversity of Texas

Most common elements in local guidelines

Collection title*Creator/contributorDate of captureDate of content*DescriptionGenre

LanguagePublisherRights/Access conditionsSubject*TitleURL

* Only three elements that appear in all standards and local guidelines.Some RDA elements that are not often used: Extent, Place of Publication, Publisher, Source of Description, Statement of Responsibility.

Analysis of data elementsDefinitions?

Most frequently used?

Core?

Same concept, different elements?

Same element, different concepts?

Analysis of existing recordsData sources

– WorldCat– ArchiveGrid– Archive-It

Types of record– Bibliographic: MARC, Dublin Core, MODS …– Archival: MARC, finding aids

Bibliographic vs. archival descriptionBibliographic: description of a single site

– Based on RDA or Dublin Core– Includes standard bibliographic data elements– Usually does not provide context

Archival: description of a collection of sites– Based on DACS or Dublin Core– Includes narrative description, e.g. of creator, content ...– Includes context of creation and use

DILEMMAS SPECIFIC TO WEB CONTENT

Is the website creator/owner the … publisher? author? subject? all three?

Should the title be … transcribed verbatim from the head of the site? Edited to clarify the nature/scope of the site? Should it begin with "Website of the …"

Which dates are both important and feasible? Beginning/end of the site's existence? Date(s) of capture by the repository? Date of the content? Copyright?

How should extent be expressed? “1 archived

website”? "1 online resource"? "6.25 Gb"? "circa 300 websites"?

Is the institution that harvests and hosts the site the … repository? creator? publisher? selector?Does provenance refer to …the site owner? the repository that harvests and hosts the site? ways in which the site evolved?Does appraisal mean …the reason the site warrants being archived? a collection of sites named by the repository? the parts of the site that were harvested?Is it important to be clear that the resource is a website? If so, how to do so? In the extent? title? description?Which URLs should be included? Seed? access? landing page?

A sample finding aid notePhysical Characteristics and Technical Requirements note:

"The web collection documents the publicly available content of the web page, it does not archive material that is password protected or blocked due to robot txt exclusions.

Although [institution] attempts to archive the entirety of a website, certain file types will not be captured dependent on how they are embedded in the site.

This can include videos (Youtube, Vimeo, or otherwise), pdfs (including Scribd or another pdf reader), rss feeds/plug-ins (including twitter), commenting platforms (disqus, facebook), presi, images, or anything that is not native to the site."

Source: New York University

MARC21 record typesWhen coded in the MARC 21 format, should a website be considered a …

– Continuing resource? – Integrating resource? – Electronic resource? – Textual publication? – Mixed material? – Manuscript?

FORTHCOMING REPORTS

Estimated publication dates

● Tools evaluation (February/March 2017)○ With evaluation grids

● User needs (February/March 2017)○ With annotated bibliography

● Best practices guidelines (April/May 2017)○ With annotated bibliography and local guidelines evaluation grid

DISCUSS!! http://www.oclc.org/research/themes/research-collections/wam.html

http://www.oclc.org/research/themes/research-collections/wam.html

ROUND ROBIN ROUND-UP 2017

Amy LuckerPlace Speaker

Photo Here HeadInstitute of Fine Arts Library

2017 OCLC Round Robin Round-Up

Who responded?

8 total; 7 museums and the Getty

Major changes this past year?

Staff changes in 8 of 8

More digital stuff in 6 of 8

Facilities:3 minor1 major1 being planned (it’s going to be big!)

What about that funding?

2 3 Grants/gifts: 5

Stay tuned for 2018

Thanks for attending!

Dennis MassieProgram Officer

[email protected]

©2017. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: “This work uses content from ‘OCLC Research Library Partnership Roundtable Luncheon’ © OCLC, by Dennis Massie, Sandra Brooke, Deborah Kempe and Amy Lucker, used under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/.”

http://creativecommons.org/licenses/by/4.0/

Roundtable Luncheon: OCLC Research Library Partnership

Education

Transcript of Roundtable Luncheon: OCLC Research Library Partnership