Post on 19-Dec-2015

A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers

Adam Chandler
Cornell University Library

Cornell University Library, Metadata Working Group Forum
16 October 2009

OpenURL model

OpenURL model (cont.)

incoming OpenURL




in our knowledge base?

title: Library hi tech
issn: 0737-8831
start date: 19970101
end date:

link-to syntax for Emerald


OpenURL is pervasive

Cornell link resolver alone, July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests.

402,000 * 123 (ARL libraries) ≈ 49 million per year

Cornell’s top 10 OpenURL sources

1. Web of Knowledge
2. WorldCat Local
3. Google Scholar
4. Webfeat (our “Find Articles” service)
5. EBSCOHost
6. OCLC FirstSearch
7. SilverPlatter
8. Weill Cornell Medical Center
9. SciFinder Scholar
10. PubMed

… but quality of experience is difficult to benchmark

• Wrong start or end date in the local library's holdings knowledge base (see NISO KBART)

• Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)

• Wrong link-to syntax in link resolver

• Fragile handling of incoming links by content providers


• Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)

• Subscription errors (especially with the start of a new calendar year)

• Syntactically incorrect or missing metadata from the OpenURL origin

Literature review

I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.

Wakimoto, Walker, and Dabbour (2006)

Main finding: Users just expect full-text. When they do not get it they are disappointed.

Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

"Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)

Blake and Knudson (2002)

• “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

• “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”


• “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”

Mellon-funded planning grant for L'Année philologique

1. Canonical Citation Linking: http://cwkb.org
In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library

2. OpenURL Quality
Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?

Constant: Core elements used by content providers in their link-to targets

title - 64%
spage - 64%
volume - 61%
issue - 60%
date - 48%
aulast - 47%
issn - 35%
atitle - 35%
DOI - 14%
ISBN - 5%

Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.

Variable: Frequency of element string patterns for all sources
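The Perl fragments in this deck read element values from %elementhash, but the slides do not show how that hash is filled. The following is a minimal sketch of one way to do it, assuming each logged OpenURL carries its elements as key=value pairs in the query string. The helper name parse_openurl and the resolver URL are hypothetical. Values are deliberately left percent-encoded, on the assumption that the patterns were applied to raw log values (the "_other" examples in the slides appear in encoded form).

```perl
use strict;
use warnings;

# Hypothetical helper: split the query part of an OpenURL into a hash of
# element => raw (still percent-encoded) value.
sub parse_openurl {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)$/s;        # everything after the first "?"
    $query = $url unless defined $query;     # also accept a bare query string
    my %elementhash;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value && length $value;   # skip empty elements
        $elementhash{$key} = $value;         # a repeated key keeps its last value
    }
    return %elementhash;
}

# Illustrative OpenURL; the host name is made up.
my %elementhash = parse_openurl(
    'http://resolver.example.edu/openurl?sid=csa:commabs-set-c'
    . '&issn=0737-8831&date=1997&aulast=O%27Shea&spage='
);
print "$_=$elementhash{$_}\n" for sort keys %elementhash;
```

Empty elements such as the trailing spage= are dropped here, so they would not be counted as present; whether the original script counted them is not stated in the slides.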


First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith"

aulast

    if ($e =~ /aulast/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^[A-Za-z]+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_simple"}++;
        }
        elsif ($elementhash{$e} =~ /^[A-Za-z]+, .+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_comma"}++;
        }
        elsif ($elementhash{$e} =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_simpleplusinitial"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"aulast_other"}++;
        }
    }

aulast_other examples

• Ryan S Miller
• Louise D Bryant
• DAVID J MCKENZIE
• %C4%90okovi%C4%87
• Indu B Ahluwalia
• Carreras-Sangr%c3%a0
• Bautista-Casta%C3%B1o
• O%27Shea
• Melissa Ventura Marra
• Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
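Several of the aulast_other examples are ordinary family names that only look irregular because they are percent-encoded UTF-8. The sketch below is an illustration, not part of the original script. It decodes a value before inspecting it, which separates names like these from genuinely malformed values such as full names or multi-author strings. The helper name decode_value and the "plain family name" test are assumptions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: undo percent-encoding, then read the bytes as UTF-8.
sub decode_value {
    my ($raw) = @_;
    (my $bytes = $raw) =~ tr/+/ /;                    # "+" stands for a space in query strings
    $bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX -> one byte
    return decode('UTF-8', $bytes);
}

binmode STDOUT, ':encoding(UTF-8)';
for my $raw ('O%27Shea', 'Bautista-Casta%C3%B1o', '%C4%90okovi%C4%87',
             'Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia') {
    my $name = decode_value($raw);
    # Illustrative rule: letters, apostrophes, hyphens and spaces only.
    my $kind = $name =~ /^\p{L}[\p{L}'\- ]*$/ ? 'plain name' : 'needs review';
    print "$raw -> $name ($kind)\n";
}
```

Decoding first is a design choice: it lets the letter test use Unicode properties (\p{L}) rather than [A-Za-z], so accented names are not penalized.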


First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.


spage

    if ($e =~ /spage/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^\d+$/) {
            $patterns{$neworigin}{$newsid}{"spage_number"}++;
        }
        elsif ($elementhash{$e} =~ /^\d+-\d+$/) {
            $patterns{$neworigin}{$newsid}{"spage_number_number"}++;
        }
        elsif ($elementhash{$e} =~ /[A-Za-z].+\d/) {
            $patterns{$neworigin}{$newsid}{"spage_string_w_number"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"spage_other"}++;
        }
    }

spage_other examples

• 1033 (6 pages)
• 85(19)
• 575 (11 pages)
• 283...290
• PHYS
• GLRM
• 58,+VI


The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.

date

    if ($e =~ /date/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^\d{4}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{2}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{2}-\d{2}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dd-dd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{4}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dddd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{8}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddddddd"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"date_dateother"}++;
        }
    }

date_other examples

• 1956 July
• %7E1994
• June 5%2C 2002
• JUN 30 05
• 2006%282007%29
• 1922,+April+25th
• %5B%5B1943-06-19%5D%5D


International Standard Serials Number (ISSN). The issn may contain a hyphen, e.g. "1041-5653"


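issn

    if ($e =~ /issn/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        # Both patterns are anchored with $ so that values with trailing
        # text, such as "0065-2598%28print%29", are counted as issn_other.
        # The final character may be a digit or the check character X.
        if ($elementhash{$e} =~ /^\d{4}-\d{3}[\dXx]$/) {
            $patterns{$neworigin}{$newsid}{"issn_number_number"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{7}[\dXx]$/) {
            $patterns{$neworigin}{$newsid}{"issn_number"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"issn_other"}++;
        }
    }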

issn_other examples

• 0065-2598%28print%29
• 0018-5345+%28ISSN+print%29
• ISSN ISBN 0-9525091-5-6.
• 0021-8375%28print%29%7C1439-0361%28electronic%29
• 1471-2164+%28ISSN+online%29
• 0191-8699%3B0191-8699
• 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
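The aulast, spage, date and issn classifiers share one shape: test a value against an ordered list of patterns and count the first that matches, falling back to "other". A possible table-driven restatement follows, offered as a sketch rather than the script used for the study. The names %RULES and classify are hypothetical, and the ISSN patterns are anchored at the end of the value (an assumption made so that the issn_other examples are classified as "other"). Every fallback label uses the uniform form <element>_other, which differs from the date_dateother key in the slide code.

```perl
use strict;
use warnings;

# Ordered [label, pattern] pairs per element; the first match wins.
my %RULES = (
    aulast => [
        [ simple            => qr/^[A-Za-z]+$/ ],
        [ comma             => qr/^[A-Za-z]+, .+$/ ],
        [ simpleplusinitial => qr/^[A-Z][a-z]+( [A-Z]\.)+$/ ],
    ],
    spage => [
        [ number          => qr/^\d+$/ ],
        [ number_number   => qr/^\d+-\d+$/ ],
        [ string_w_number => qr/[A-Za-z].+\d/ ],
    ],
    date => [
        [ 'dddd'       => qr/^\d{4}$/ ],
        [ 'dddd-dd'    => qr/^\d{4}-\d{2}$/ ],
        [ 'dddd-dd-dd' => qr/^\d{4}-\d{2}-\d{2}$/ ],
        [ 'dddd-dddd'  => qr/^\d{4}-\d{4}$/ ],
        [ 'dddddddd'   => qr/^\d{8}$/ ],
    ],
    issn => [
        [ number_number => qr/^\d{4}-\d{3}[\dXx]$/ ],   # anchored: an assumption
        [ number        => qr/^\d{7}[\dXx]$/ ],
    ],
);

# Return the metric name for one element value, e.g. "date_dddd".
sub classify {
    my ($element, $value) = @_;
    for my $rule (@{ $RULES{$element} || [] }) {
        my ($label, $pattern) = @$rule;
        return "${element}_$label" if $value =~ $pattern;
    }
    return "${element}_other";
}

# Sample values taken from the slides plus a few well-formed ones.
for my $case ([aulast => 'Smith'], [aulast => 'Ryan S Miller'],
              [spage => '283...290'], [date => '2008-07'],
              [date => '1956 July'], [issn => '0737-8831'],
              [issn => '0065-2598%28print%29']) {
    my ($element, $value) = @$case;
    printf "%-6s %-22s %s\n", $element, $value, classify($element, $value);
}
```

Keeping the patterns in a table makes it possible to add an element or pattern without touching the counting logic, and the same table can be published alongside the metrics so that each label has a visible definition.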

How often out of 402,000 Cornell OpenURLs?

metric          frequency in July-Sep 2008 sample
aulast_other    5476
spage_other     772
date_other      591
issn_other      200

flat file output

logsource year quarter origin sid metric count
cornell 2009 Q1 csa csa:commabs-set-c atitle 154
cornell 2009 Q1 csa csa:commabs-set-c atitle_colon 101
cornell 2009 Q1 csa csa:commabs-set-c atitle_other 53
cornell 2009 Q1 csa csa:commabs-set-c aulast 159
cornell 2009 Q1 csa csa:commabs-set-c aulast_other 4
cornell 2009 Q1 csa csa:commabs-set-c aulast_simple 155
cornell 2009 Q1 csa csa:commabs-set-c date 159
cornell 2009 Q1 csa csa:commabs-set-c date_dddd 110
cornell 2009 Q1 csa csa:commabs-set-c date_dddd-dd 49
cornell 2009 Q1 csa csa:commabs-set-c isbn 6
cornell 2009 Q1 csa csa:commabs-set-c isbn_10 6
cornell 2009 Q1 csa csa:commabs-set-c issn 135
cornell 2009 Q1 csa csa:commabs-set-c issn_number-number 135
cornell 2009 Q1 csa csa:commabs-set-c issue 136
cornell 2009 Q1 csa csa:commabs-set-c issue_number 132
cornell 2009 Q1 csa csa:commabs-set-c issue_number_dash_number 2
cornell 2009 Q1 csa csa:commabs-set-c issue_other 2
cornell 2009 Q1 csa csa:commabs-set-c spage 153
cornell 2009 Q1 csa csa:commabs-set-c spage_number 153
cornell 2009 Q1 csa csa:commabs-set-c title 160
cornell 2009 Q1 csa csa:commabs-set-c total 160
cornell 2009 Q1 csa csa:commabs-set-c volume 139
cornell 2009 Q1 csa csa:commabs-set-c volume_number 139
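One way to make the flat file comparable across sources is the share of each element's values that land in its "_other" bucket. The sketch below assumes whitespace-separated columns in the order given by the header row. The helper name other_share is hypothetical, and the ratio is an illustration, not a metric defined in the talk.

```perl
use strict;
use warnings;

# Hypothetical helper: from flat-file lines, return a hash keyed by
# "origin sid element" holding the fraction of values counted as "<element>_other".
sub other_share {
    my (@lines) = @_;
    my (%total, %other);
    for my $line (@lines) {
        my ($logsource, $year, $quarter, $origin, $sid, $metric, $count)
            = split ' ', $line;
        next unless defined $count && $count =~ /^\d+$/;   # skips the header row
        next if $metric eq 'total';                        # request count, not an element
        my $source = "$origin $sid";
        if ($metric =~ /^(.+)_other$/) {
            $other{"$source $1"} = $count;
        }
        elsif ($metric !~ /_/) {
            $total{"$source $metric"} = $count;            # bare element name, e.g. "aulast"
        }
    }
    my %share;
    for my $key (keys %total) {
        next unless $total{$key};                          # avoid division by zero
        $share{$key} = (exists $other{$key} ? $other{$key} : 0) / $total{$key};
    }
    return %share;
}

# A few rows copied from the flat file output in the slides.
my @sample = (
    'logsource year quarter origin sid metric count',
    'cornell 2009 Q1 csa csa:commabs-set-c aulast 159',
    'cornell 2009 Q1 csa csa:commabs-set-c aulast_other 4',
    'cornell 2009 Q1 csa csa:commabs-set-c aulast_simple 155',
    'cornell 2009 Q1 csa csa:commabs-set-c issue 136',
    'cornell 2009 Q1 csa csa:commabs-set-c issue_other 2',
    'cornell 2009 Q1 csa csa:commabs-set-c spage 153',
    'cornell 2009 Q1 csa csa:commabs-set-c total 160',
);
my %share = other_share(@sample);
printf "%-40s %.1f%%\n", $_, 100 * $share{$_} for sort keys %share;
```

Elements with no "_other" row, such as spage in this sample, are reported with a share of zero. This sketch does not combine rows from different years or quarters; a fuller version would add those columns to the key.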



Next steps

• create a NISO structure to wrap around the metrics: “NISO OpenURL Quality Index”

• add non-Cornell data from libraries and link resolver vendors (model is agnostic to source)

• confirm and publicize key elements used by target syntaxes

• can the quality of the global OpenURL network be modeled mathematically?

How to stay in the loop


Adam Chandler
Database Management and Electronic Resources Research Librarian
Central Library Operations
Cornell University Library
tel: 607-255-5760
email: alc28@cornell.edu