AN EXPLORATORY STUDY OF THE DESCRIPTION FIELD
IN THE DIGITAL PUBLIC LIBRARY OF AMERICA
Hannah Tarver
Oksana L. Zavalina
Mark Phillips
October 14, 2016
Outline of presentation
• Introduction and background
• Methodology of the study
• Some findings
• Discussion
• Conclusions and future research
INTRODUCTION AND BACKGROUND
Description metadata
• Repeatable metadata elements with free-text data values
• One or more elements in different metadata schemes
  • Dublin Core: Description (abstract, table of contents)
  • Metadata Object Description Schema: Abstract, Note, Table of Contents
  • Encoded Archival Description: Scope and Content
  • Machine Readable Cataloging: 5XX notes (a total of 53 fields)
Description metadata: data values
• Best practice recommendations suggest including information about the information object:
  • an abstract, table of contents, reference to a graphical representation of content, or a free-text account of the content (DC)
  • subject, significance, and function (CCO and CDWA)
  • provenance and history, language (OSU Libraries)
  • etc.
Description metadata application
Inconsistent levels of application
• in 40%-99% of metadata records:
  • 50.9% in OAIster (Ward, 2003)
  • 40-75% across university digital repositories (Kurtz, 2010)
  • 99% in digital video collections (Weagley, Gelches, & Park, 2010)
  • 100% in digital image collections (Park, 2006)
• by 72%-89% of data providers in aggregations:
  • 72% in OAIster (Ward, 2003)
  • 89% in IMLS DCC aggregation (Jackson et al., 2008)
DPLA
• Digital Public Library of America (launched in 2013)
• One of the largest and most rapidly growing digital repositories
• Distributed network model:
  • Content hubs
  • Service hubs
DPLA & Description
Description defined in DPLA Metadata Application Profile as “Includes but is not limited to: an abstract, a table of contents, or a free-text account of described resource”
• Discrepancy in Description documentation
  • recommended field in Intro to DPLA Data Model: version 4 (2015)
  • optional in complete DPLA Metadata Application Profile: version 4 (2015)
• Metadata normalized at harvesting into DPLA
  • Various native metadata fields mapped to dcterms:description
Problem statement
Lack of systematic empirical studies of digital library metadata with the focus on free-text Description metadata
• in very large aggregations (e.g., HathiTrust, Internet Archive, DPLA, etc.)
METHODS
Problem statement & Research Questions
Lack of systematic empirical studies of digital library metadata with the focus on free-text Description metadata
• in very large aggregations (e.g., HathiTrust, Internet Archive, DPLA, etc.)
• What is the overall usage of Description field by hubs in DPLA?
• How can length of data values provide insight into Description metadata practices among DPLA hubs?
Data collection & processing
Big Data approach: DPLA Bulk Download http://dp.la/info/developers/download
• over 11.5 million metadata records in a single compressed JSON file
• each record parsed, all instances of Description field extracted
• Solr full-text indexer for Description fields: StatsComponent http://wiki.apache.org/solr/StatsComponent
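The parse-and-extract step above can be sketched roughly as follows. This is an illustration only, not the authors' code. It assumes the compressed file holds one JSON record per line, that Description values sit under `sourceResource.description`, and that the hub label sits under `provider.name`; these key names and all function names are assumptions made for the example.

```python
import gzip
import json
from typing import Iterator, List


def iter_records(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed file.

    Assumption: one JSON object per line. A single large JSON array
    would need a streaming parser instead.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def extract_descriptions(record: dict) -> List[str]:
    """Return every Description instance in a record as a list of strings.

    Assumption: the value may be missing, a single string, or a list.
    """
    source = record.get("sourceResource") or {}
    value = source.get("description")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


def hub_name(record: dict) -> str:
    """Return the hub label, or a placeholder when none is recorded."""
    provider = record.get("provider") or {}
    return provider.get("name", "undefined_provider")
```

Normalizing every record to a list of strings keeps the later per-instance counts simple, because a record with no Description contributes an empty list rather than a special case.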
Data analysis
• Level of application of Description field by DPLA hub
  • field instances per record
• Length of data value in Description field, as range, mean, and standard deviation of:
  • number of characters
  • number of words
  • average word length
  • proportion of data value that consists of letters, punctuation, or integers
• Content analysis of data values from Description field instances (n=200)
The findings presented here
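The length measures listed above could be computed along these lines. This is a minimal sketch under stated assumptions: words are whitespace-delimited tokens, punctuation means ASCII punctuation only, and the standard deviation is the population form because the whole dataset is analyzed rather than a sample. The function names are hypothetical.

```python
import statistics
import string
from typing import Dict, List


def value_metrics(value: str) -> Dict[str, float]:
    """Compute length measures for one Description data value."""
    words = value.split()  # assumption: whitespace-delimited tokens
    n_chars = len(value)

    def share(count: int) -> float:
        # Proportion of all characters; 0.0 for an empty value.
        return count / n_chars if n_chars else 0.0

    return {
        "characters": n_chars,
        "words": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "prop_letters": share(sum(ch.isalpha() for ch in value)),
        "prop_punctuation": share(sum(ch in string.punctuation for ch in value)),
        "prop_digits": share(sum(ch.isdigit() for ch in value)),
    }


def summarize(numbers: List[float]) -> Dict[str, float]:
    """Range (min and max), mean, and population standard deviation."""
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": statistics.mean(numbers),
        "stdev": statistics.pstdev(numbers),
    }
```

For example, `summarize([value_metrics(v)["characters"] for v in values])` would give the character-length summary for one hub's values.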
FINDINGS: LEVEL OF APPLICATION
% of records with 1+ Description instance
Only 5 out of 29 hubs include Description in all records; 22 hubs include it in 50%+ of records
MAX number of Description instances per record
[Bar chart: maximum number of Description instances per record for each hub; axis from 0 to 180. Hubs shown: artstor, bhl, cdl, david_rumsey, digital-commonwealth, digitalnc, esdn, georgia, getty, gpo, harvard, hathitrust, indiana, internet_archive, kdl, mdl, missouri-hub, mwdl, nara, nypl, scdl, smithsonian, the_portal_to_texas_history, tn, uiuc, undefined_provider, usc, virginia, washington]
7 hubs with over 20 instances: 4 with over 50
Average no. of Description instances per record
Above 2.00 for 8 of 29 hubs: 5 of these hubs have unusually high max number of instances
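The three level-of-application measures (share of records with at least one Description, maximum instances per record, and average instances per record) can be derived from one pass over per-record counts. The sketch below is illustrative only. It assumes the average is taken over all of a hub's records, including those with no Description, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def application_by_hub(counts: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Aggregate (hub, number_of_description_instances) pairs, one pair per record."""
    totals = defaultdict(lambda: {"records": 0, "with_description": 0, "instances": 0, "max": 0})
    for hub, n in counts:
        t = totals[hub]
        t["records"] += 1
        if n > 0:
            t["with_description"] += 1
        t["instances"] += n
        t["max"] = max(t["max"], n)
    return {
        hub: {
            "pct_with_description": 100.0 * t["with_description"] / t["records"],
            "max_instances": t["max"],
            # Assumption: average over all records, not only those with a Description.
            "avg_instances": t["instances"] / t["records"],
        }
        for hub, t in totals.items()
    }
```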
% of Description instances with unique values
[Bar chart: percentage of Description instances with unique values for each hub; axis from 0.00% to 90.00%]
Under 50% of values are unique for all but 6 hubs
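One way to compute a uniqueness percentage for a hub is sketched below. The slide does not define "unique", so this example assumes it means a value that occurs exactly once among the hub's Description instances, compared as exact strings with no normalization. A different reading (distinct values divided by total instances) would give different numbers. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def pct_unique(values: Iterable[str]) -> float:
    """Percentage of instances whose value occurs exactly once."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return 100.0 * singletons / total
```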
FINDINGS: LENGTH OF DATA VALUES
Ave. length of data values (no. of characters)
200 characters or less for most hubs
Very high variability for 2 hubs
MAX length of data values (no. of characters)
More findings: content analysis of data values

| Data value in Description field | Category |
| --- | --- |
| 1 glass negative: b&w; 8 x 10 in.; sulfiding. | Physical object description |
| This material has been provided by The Royal College of Surgeons of England. The original may be consulted at The Royal College of Surgeons of England | Rights or usage statement |
| This image shows a section of Thorn Cemetery including gravestones. | Object content description |
| Microform. | Object type or format |
| Title supplied by cataloger. | Note or metadata source |
| This series contains transcripts of proceedings, depositions, and oral examinations prepared exclusively for or in the District Court. The depositions and oral examinations were taken out of court and are primarily interviews with School Board representatives and employees concerning the development, implementation, and review of desegregation plans. | Collection-level content description |
| P950. | Identifier or call number |

Main source of non-unique data values (see "% of Description instances with unique values")
DISCUSSION, CONCLUSIONS & FUTURE RESEARCH
Variability issues
• Not all hubs (and/or partner institutions within service hubs) may:
  • consider free-text Description fields (e.g., notes of various kinds) to be equally important, or
  • enforce the usage of Description fields
• Wide variability of the number of instances of Description and the length of data values
  • Higher (but not outlying) lengths could indicate more rigorous standards of Description in hubs
  • Outlier records with Description field values of 20,000+ characters should be reviewed as to their appropriateness to local descriptive metadata input rules
Unexpected: departure from best practice guidelines
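Flagging the very long values for review could look like the sketch below. It takes the 20,000-character figure from the slide as a default threshold; the record identifiers, the input shape, and the function name are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def flag_long_values(values: Iterable[Tuple[str, str]], threshold: int = 20000) -> List[Tuple[str, int]]:
    """Return (record_id, length) for values at or above the threshold, longest first."""
    flagged = [(record_id, len(text)) for record_id, text in values if len(text) >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

The output is a review list only; whether a flagged value is appropriate depends on the contributing institution's local input rules.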
Mapping issues
The variety of information types observed in Description data values might be due to 1 or more factors:
• differing perceptions of Description semantics among DPLA hubs and contributors
• absence of a better match than Description to map the information from native metadata (richer than Dublin Core) in DPLA
• insufficient consistency of contributed records for more accurate mapping
Other possible problems
• Context
  • Shorter values might be due to local practices where description of item is in the context of the description of collection
  • Context lost in aggregation
• Quality
  • Outliers with shortest data value lengths might indicate lack of relevant information about an item
  • Outliers with longest data value lengths might be due to full text harvested with the metadata record
Conclusions
• Simple statistical analyses can provide better understanding of metadata usage in aggregation
  • Large sets of empirical data (big data approach, eliminates sampling error)
  • Diversity allows for better understanding of contributing institutions’ varying practices
• Recommendation for aggregators to include in metadata application profiles a separate Note property for mapping of information that does not fit Description
• Additional research is needed
Some ideas on the next 2 slides
Future research
Use of language in Description data values
• number of words
• average word length
• proportion of data values that consist of letters, punctuation, or integers
• proportion of words from list of frequently-used English words (e.g., 1K, 5K etc., standard English dictionary)
Data collected but not analyzed yet
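The word-list measure in the last bullet might be computed as in this sketch. It is a guess at one workable approach, not a method described in the slides: tokens are runs of ASCII letters and apostrophes, lowercased, and `frequent_words` stands in for whichever 1K or 5K list is chosen. Both the tokenization and the names are assumptions.

```python
import re
from typing import Set


def frequent_word_share(value: str, frequent_words: Set[str]) -> float:
    """Share of a value's word tokens that appear in a frequent-word list."""
    tokens = re.findall(r"[a-z']+", value.lower())  # assumption: ASCII letters only
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in frequent_words)
    return hits / len(tokens)
```

A low share could point to values made up of identifiers, names, or non-English text, which is why the measure may help separate prose descriptions from other content.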
More future research
1. Categorizing information in Description
   • automatically identifying some of this information to map data values more accurately or mark them for review for quality control.
2. Comparing perceived importance with actual application of Description metadata in DPLA hubs’ native metadata
3. Research into:
   • How different institutions perceive item-level (and collection-level) metadata in native systems and as part of an aggregation
   • End-user perception of usefulness of descriptive information in helping find items
   ✓ could help to refine guidelines on Description field
Questions? Comments? Ideas?
Works cited
• Baca, M. and P. Harpring (Eds.). (2009) Categories for the Description of Works of Art (CDWA), Getty Research Institute, Santa Monica.
• Baca, M., et al. (2006) Cataloging Cultural Objects: A Guide to Describing Cultural Works and their Images, American Library Association, Chicago.
• Digital Public Library of America (2015a, March 5). An introduction to the DPLA metadata model. Retrieved from http://dp.la/info/wp-content/uploads/2015/03/Intro_to_DPLA_metadata_model.pdf
• Digital Public Library of America (2015b, March 5). Metadata application profile: Version 4.0. Retrieved from http://dp.la/info/wp-content/uploads/2015/03/MAPv4.pdf
• Encoded Archival Description. (2002). Retrieved from http://www.loc.gov/ead/.
• Encoded Archival Description: EAD3. (2015). Retrieved from http://www2.archivists.org/sites/all/files/TagLibrary-VersionEAD3.pdf.
• Jackson, A. S., M. Han, K. Groetsch, M. Mustafoff and T. W. Cole. (2008). Dublin Core metadata harvested through OAI-PMH. Journal of Library Metadata, 8(1), 5-21.
• Hillmann, D. (2005). Using Dublin Core. Retrieved from http://dublincore.org/documents/usageguide/
• Kurtz, M. (2010). Dublin Core, DSpace, and a brief analysis of three university repositories. Information Technology & Libraries, 29(1), 40-46. Retrieved from http://ejournals.bc.edu/ojs/index.php/ital/article/view/3157/2771
• OSU Knowledge Bank Metadata Application Profile for Digital Video. (2011). Retrieved from https://library.osu.edu/documents/knowledge-bank/KnowledgeBankMetadataApplicationProfile2011.pdf
• Park, J. (2006). Semantic interoperability and metadata quality: An analysis of metadata item records of digital image collections. Knowledge Organization, 33(1), 20-34.
• Ward, J. (2003). A quantitative analysis of unqualified Dublin Core metadata element set usage within data providers registered with the Open Archives Initiative. Proceedings of the 2003 Joint Conference on Digital Libraries, pp. 315-317.
• Weagley, J., E. Gelches, & J. Park. (2010). Interoperability and metadata quality in digital video repositories: a study of Dublin Core. Journal of Library Metadata, 10(1), 37-57. DOI: 10.1080/19386380903546984.