Post on 09-May-2015
Metadata Quality Issues in Learning Object Repositories
PhD Candidate: Nikos Palavitsinis
PhD Supervisors: Ass. Prof. Salvador Sanchez-Alonso, Dr. Nikos Manouselis
Structure
• Introduction
• Digital Repositories & Federations
• Metadata & Education
• Quality & Metadata
• Metadata Quality Assessment Certification Process
• PhD Work/Research
• Timetable
• Next Steps
Introduction
Problem
• Generic Problem: Low quality metadata in digital repositories that affects resource discovery
• Specific Problem: How might we insert quality assurance mechanisms in the digital repository lifecycle, to enhance metadata quality?
Background
• Relevant studies that look into quality issues:
  – Study based on the Open Language Archives Community (Hughes, 2004)
  – Studies based on the National Science Digital Library (Zeng et al., 2005; Bui & Park, 2006)
  – Studies based on ARIADNE Federation repositories (Najjar et al., 2004; Ochoa et al., 2011)
Aim of Digital Repositories
• Databases used for storing and/or enabling the interoperability of Learning Objects (McGreal, 2007)
• Enable the efficient search & discovery of objects (Richards et al., 2002)
• How can digital repositories fulfill their goals if the quality of the metadata provided is poor?
  – Is it that poor?
ARIADNE case
21 elements <50%
21 elements >50%
ARIADNE case
14 elements <50%
12 elements >50%
Metadata
• Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource
• …vital component of the learning object economy (Currier et al., 2004)
Metadata in Education
• In the field of Technology-Enhanced Learning, the need for describing resources with information that extends the scope of regular metadata has been identified early (Recker & Wiley, 2001)
• Most commonly used metadata schemas in education are IEEE LOM & Dublin Core
• For users of Educational Repositories, problems in metadata result in poor recall of resources and inconsistent search results (Currier et al., 2004)
Quality
• Level of excellence; A property or attribute that differentiates a thing or person
• Quality is the suitability of procedures, processes and systems in relation to the strategic objectives
• Metadata are of high importance to the success of Learning Object Repositories (LORs) (Heery & Anderson, 2005; Guy et al., 2004; Robertson, 2005)
Quality in Metadata
• Poor quality metadata can mean that a resource is essentially invisible within a repository or archive and remains unused (Barton et al., 2003)
• Different settings and purposes require different approaches to what represents quality in metadata (Robertson, 2005)
  – Quality cannot be discussed in a vacuum (Bruce & Hillmann, 2004)
Metadata Creators
• In some cases, subject matter experts have proven to be better at metadata creation than information specialists (Greenberg et al., 2001; Park, 2009)
• Neither resource creators nor the information specialists handle pedagogic aspects of metadata well (Barton et al., 2003)
• Importance of having only trained professionals providing metadata (Holden, 2003)
Metadata experts VS Domain experts
“I have studied information management”
“I know how to create & manage data sources”
“I have been involved in EU projects for digital libraries”
“I have a PhD in education”
“I know how to create educational resources”
“I have worked with teachers for over 20 years”
“I think I can use the expertise of both…”
Metadata Creation
• Metadata today is likely to be created by people without metadata training, working largely in isolation and without adequate documentation
• Metadata records are also created automatically, often with poorly documented methodology and little or no indication of provenance
• Unsurprisingly, the metadata resulting from these processes varies strikingly in quality and often does not play well together (Hillmann et al., 2004)
Metadata Quality Metrics (1/2)
• Completeness
  – Number of element values provided by the annotator, compared to the total possible number of values
• Accuracy
  – Metadata descriptions correspond to the actual resource they describe
• Consistency
  – Degree of conformance of the metadata provided to the rules of the metadata application profile used
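As a rough illustration of how completeness and consistency could be checked automatically for a single record, here is a minimal sketch. The application profile, its element names and its vocabularies are hypothetical, and completeness is simplified to "share of profile elements with at least one value" rather than a count of all possible values.

```python
from typing import Dict, List, Optional, Set

# Hypothetical application profile: element name -> allowed values.
# None means free text. Names and vocabularies are illustrative only.
PROFILE: Dict[str, Optional[Set[str]]] = {
    "General.Title": None,
    "General.Language": {"en", "el", "es"},
    "Rights.Cost": {"yes", "no"},
    "Educational.Context": {"school", "higher education", "training", "other"},
}


def completeness(record: Dict[str, List[str]]) -> float:
    """Share of profile elements that carry at least one non-empty value."""
    filled = sum(
        1 for name in PROFILE if any(v.strip() for v in record.get(name, []))
    )
    return filled / len(PROFILE)


def consistency_violations(record: Dict[str, List[str]]) -> List[str]:
    """Values that fall outside the profile's controlled vocabularies."""
    problems = []
    for name, allowed in PROFILE.items():
        if allowed is None:
            continue  # free-text element, nothing to validate
        for value in record.get(name, []):
            if value not in allowed:
                problems.append(f"{name}: {value!r}")
    return problems


record = {
    "General.Title": ["Composting basics"],
    "General.Language": ["english"],  # not in the controlled vocabulary
    "Rights.Cost": ["no"],
}
print(completeness(record))            # 0.75
print(consistency_violations(record))  # ["General.Language: 'english'"]
```

Accuracy is left out on purpose: judging whether a description matches the resource needs a human reviewer.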
Metadata Quality Metrics (2/2)
• Objectiveness
  – Degree to which the metadata provided describe the resource in an unbiased way
• Appropriateness
  – Fitness for use of the metadata provided, considered in terms of the envisaged services of the environment/tool deployed
• Correctness
  – Usage of the language in the metadata, syntactically and/or grammatically
Back to the problem
• How might we insert quality assurance mechanisms in the digital repository lifecycle, to enhance metadata quality?
• Solution that capitalizes more on the human factor but also on automated methods of examining metadata quality
Proposed Method
Metadata Quality Assessment Certification Process
Structure
Phases: Different “periods” in the repository lifecycle
Steps: Specific metadata processes taking place in each phase
Quality Assurance Methods: “Control points” inserted in the repository lifecycle, to enhance metadata quality
Quality Tools/Instruments: Tools that are used to deploy the Quality Assurance Methods
Actors: People that are involved in the repository lifecycle with various roles
Outcomes: Results of each Quality Assurance Method used in different Steps
Metadata Design Phase
• Description
  – Metadata specification / application profiling of an existing metadata schema that will be used in a specific context
• Quality Assurance Methods
  – Metadata Understanding Session
  – Preliminary Metadata Hands-on Annotation
• Actors
  – Subject-matter experts & metadata experts
• Outcomes
  – Initial input for metadata specification
  – Paper-based metadata records
Testing Phase
• Description
  – The envisaged system/tool is implemented & the users are working with the first implementation of the metadata standard
• Quality Assurance Methods
  – Test implementation of the tool
  – Hands-on annotation experiment
  – Metadata Quality Review of test sample of resources
• Actors
  – Subject-matter experts & metadata experts
• Outcomes
  – Good & Bad Metadata Practices Guide
  – Feedback for the development of the system/tool
Calibration Phase
• Description
  – The envisaged system/tool is deployed in a controlled environment and the subject matter experts continuously upload resources on it
• Quality Assurance Methods
  – Metadata Quality Peer Review Exercise
• Actors
  – Subject-matter experts & metadata experts
• Outcomes
  – Good & Bad Metadata Practices Guide updated
  – Recommendations for metadata improvement
  – Peer Review results related to the quality of metadata for the resources examined
Building Critical Mass Phase
• Description
  – Tools have reached a high-maturity phase and the metadata application profile has been finalized. Repository accepts a large number of resources
• Quality Assurance Methods
  – Analysis of Usage Data coming from the tool(s)
  – Metadata Quality Certification Mark
• Actors
  – Metadata experts
• Outcomes
  – Minor changes to application profile
  – Recommendations for metadata improvement
Regular Operation Phase
• Description
  – Metadata used in the tool(s) are finalized and content providers are uploading resources regularly. This period lasts for as long as the deployed services are online
• Quality Assurance Methods
  – Regular Analysis of Usage Data coming from the tool(s)
  – Online Peer Review Mechanism
  – Quality Prizes/Awards for selected resources
• Actors
  – Metadata experts & Content users/consumers
• Outcomes
  – Recommendations for metadata improvement
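The five phases above can be written down as a small data model, which may help when planning who is needed at which point of the lifecycle. This is only an illustrative sketch: the `Phase` class and the `phases_involving` helper are assumptions made for this example, while the phase names, methods, actors and outcomes are taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    name: str
    qa_methods: Tuple[str, ...]
    actors: Tuple[str, ...]
    outcomes: Tuple[str, ...]


EXPERTS = ("subject-matter experts", "metadata experts")

PROCESS: Tuple[Phase, ...] = (
    Phase("Metadata Design",
          ("Metadata Understanding Session",
           "Preliminary Metadata Hands-on Annotation"),
          EXPERTS,
          ("Initial input for metadata specification",
           "Paper-based metadata records")),
    Phase("Testing",
          ("Test implementation of the tool",
           "Hands-on annotation experiment",
           "Metadata Quality Review of test sample of resources"),
          EXPERTS,
          ("Good & Bad Metadata Practices Guide",
           "Feedback for the development of the system/tool")),
    Phase("Calibration",
          ("Metadata Quality Peer Review Exercise",),
          EXPERTS,
          ("Good & Bad Metadata Practices Guide updated",
           "Recommendations for metadata improvement",
           "Peer Review results")),
    Phase("Building Critical Mass",
          ("Analysis of Usage Data",
           "Metadata Quality Certification Mark"),
          ("metadata experts",),
          ("Minor changes to application profile",
           "Recommendations for metadata improvement")),
    Phase("Regular Operation",
          ("Regular Analysis of Usage Data",
           "Online Peer Review Mechanism",
           "Quality Prizes/Awards for selected resources"),
          ("metadata experts", "content users/consumers"),
          ("Recommendations for metadata improvement",)),
)


def phases_involving(actor: str) -> List[str]:
    """Names of the phases in which the given actor takes part."""
    return [p.name for p in PROCESS if actor in p.actors]


print(phases_involving("subject-matter experts"))
```

Under this encoding, metadata experts take part in every phase, while subject-matter experts are only needed up to and including Calibration.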
Case Study
Case Study
• Metadata Quality Assessment Certification Process applied in the Organic.Edunet Federation of Learning Repositories
• Each respective Phase is presented focusing on its application in the Organic.Edunet case
Metadata Design Phase
• Metadata Understanding Session
  – Form that assesses each element's ease of understanding, usefulness and appropriateness for the application domain
  – Also asks whether each element should be mandatory, recommended or optional
Duration: 2 hours
Annotated Objects: 0
Actors involved: 20 metadata & subject-matter experts
Metadata Design Phase
• Preliminary Hands-on Annotation
  – Subject matter experts annotate a sample of their resources using the suggested metadata application profile
  – Session organized with the participation of all content providers with supervised annotation of resources
Results
Question | Totally Disagree | Disagree | Neutral | Agree | Totally Agree
Is the element easy for you to understand? | 0% | 4% | 21% | 42% | 33%
Is this element useful for describing Organic.Edunet content resources? | 0% | 12% | 33% | 41% | 14%
Is the selection of the element’s possible values clear and appropriate? | 0% | 4% | 37% | 50% | 9%

Question | Best rated | Rating
Is the element easy for you to understand? | General.Keyword, Technical.Format, Technical.Size | 9.2 / 10
Is this element useful for describing Organic.Edunet content resources? | General.Identifier, General.Description, Technical.Format | 8.8 / 10
Is the selection of the element’s possible values clear and appropriate? | General.Description, Rights.Cost, Format.Size | 8.1 / 10
Results
Question | Worst rated | Rating
Is the element easy for you to understand? | Classification.Taxon, Relation.Resource, Educational.Semantic Density | 3.1 to 4.8 / 10
Is this element useful for describing Organic.Edunet content resources? | Classification.Taxon, Annotation.Entity, Annotation.Date | 2.3 to 3.1 / 10
Is the selection of the element’s possible values clear and appropriate? | Classification.Taxon, Classification.Purpose, General.Identifier | 2.9 to 4 / 10

Question | Mandatory (Before) | Mandatory (After) | Recommended (Before) | Recommended (After) | Optional (Before) | Optional (After)
Should this element be mandatory, recommended or optional? | 19 | 25 | 26 | 21 | 12 | 11
Percentage change in the number of mandatory / recommended / optional elements | +31% | | -19% | | -8.3% |
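The percentage changes in the last row follow from the before/after element counts. A short sketch of that arithmetic (the function name is an assumption made for this example) gives +31.6%, -19.2% and -8.3%, so the first two figures on the slide appear to be truncated rather than rounded.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Element counts before and after the Metadata Understanding Session.
counts = {
    "mandatory": (19, 25),
    "recommended": (26, 21),
    "optional": (12, 11),
}

for obligation, (before, after) in counts.items():
    print(f"{obligation}: {percent_change(before, after):+.1f}%")
```

The total number of elements stays at 57 in both cases, so the session redistributed elements between obligation levels rather than adding or dropping any.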
Testing Phase
• Hands-on annotation experiment
  – Core metadata quality criteria
  – Related more to information management practices and less to the content itself
  – Issues that are not connected to the domain of use for the resources
Duration: 1 week
Annotated Objects: 500 objects (5%)
Actors involved: 4 metadata experts
Resources Reviewed: 15 per metadata expert (60)
Results
Title: “Please use a more comprehensive title. For example the CRC acronym can be refined as Cooperative Research Centre, just to provide the user with a way to understand what this learning resource is about.”
Keyword: “More keywords needed. Just one keyword is not enough, and even so, the keyword text here is misleading. These keywords should be provided separately as ‘turkey’ and ‘poultry’ along with some others, and not as one ‘turkey poultry’.”
Typical Age Range: “…why is it that simple pictures of pigs in the snow with no scientific details on them cannot be used for children that are less than 10 years old? Couldn’t these pictures be used in the context of a primary class?”
Context: “Since the age range is from 15 years old to undefined, it only makes sense that the Educational context cannot be limited to higher education but should also consider high school. Be very careful because in this sense, these two elements should not conflict.”
Calibration Phase
• Metadata Quality Peer Review Exercise
  – Peer reviewing metadata records using a pre-defined quality grid assessing metadata quality metrics
    • Completeness, accuracy, correctness of language, etc., based on Bruce & Hillmann’s model
Duration: 3 weeks
Annotated Objects: 1,000 objects (10%)
Actors involved: 20 subject matter experts
Resources Reviewed: 105 resources (5 per expert)
Results
Criteria:
1. In which degree is this metadata record completed?
2. Overall accuracy of the metadata provided
3. Values provided consistent to metadata standard
4. Describe the resource in an objective way?
5. Values provided, appropriate for the use in the Portal?
6. Degree of correctness of the language used
7. Overall score for the metadata of this resource

Score | 1 | 2 | 3 | 4 | 5 | 6 | 7
5 | 42 | 54 | 53 | 72 | 43 | 72 | 42
4 | 47 | 34 | 29 | 22 | 35 | 22 | 39
3 | 5 | 10 | 16 | 6 | 19 | 9 | 20
2 | 9 | 3 | 1 | 2 | 5 | 0 | 0
1 | 1 | 1 | 0 | 0 | 2 | 1 | 1
no | 1 | 3 | 6 | 3 | 1 | 1 | 3
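One way to summarize the peer review grid is a mean score per criterion. The sketch below copies the counts from the results grid and, as an assumption of this example, leaves the "no" (no answer) row out of the average; the function name is also illustrative.

```python
from typing import Dict, List

# COUNTS[score][i] = number of reviews giving `score` to criterion i + 1.
COUNTS: Dict[int, List[int]] = {
    5: [42, 54, 53, 72, 43, 72, 42],
    4: [47, 34, 29, 22, 35, 22, 39],
    3: [5, 10, 16, 6, 19, 9, 20],
    2: [9, 3, 1, 2, 5, 0, 0],
    1: [1, 1, 0, 0, 2, 1, 1],
}
NO_ANSWER: List[int] = [1, 3, 6, 3, 1, 1, 3]


def mean_scores(counts: Dict[int, List[int]]) -> List[float]:
    """Weighted mean score per criterion, ignoring unanswered reviews."""
    n_criteria = len(next(iter(counts.values())))
    means = []
    for i in range(n_criteria):
        answered = sum(row[i] for row in counts.values())
        weighted = sum(score * row[i] for score, row in counts.items())
        means.append(weighted / answered)
    return means


for criterion, mean in enumerate(mean_scores(COUNTS), start=1):
    print(f"Criterion {criterion}: {mean:.2f}")
```

Each column, including the unanswered reviews, adds up to the 105 resources reviewed, which is a quick sanity check on the grid.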
Building Critical Mass Phase
• Analysis of Usage Data coming from tool(s)
  – Expecting to verify findings from the experiment in the “Metadata Design” Phase
    • Necessary elements being used more
    • Elements with values easy to understand being used correctly, etc.
• Beginning of the intensive content population
Duration: 1 week
Annotated Objects: 6,600 objects (60%)
Actors involved: 2 metadata experts
Resources Analyzed: 6,600
Building Critical Mass Phase
• “1” shows that an element is completed whereas “0” shows the opposite
• In the case of elements with multiplicity >1, values can be “2”, “3”, etc.
  – Interesting to look at the case of keywords, classification terms and/or educational elements
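A usage-data extract of this kind can be aggregated into "records filled" percentages per element. The following is a minimal sketch on made-up data: the four records, their values and both function names are hypothetical, and only the 0/1/2/3 encoding comes from the slide.

```python
from typing import Dict, List

# Hypothetical extract: one dict per record, one value count per element.
LOG: List[Dict[str, int]] = [
    {"General.Title": 1, "General.Keyword": 3, "Rights.Cost": 0},
    {"General.Title": 1, "General.Keyword": 0, "Rights.Cost": 1},
    {"General.Title": 1, "General.Keyword": 2, "Rights.Cost": 0},
    {"General.Title": 0, "General.Keyword": 0, "Rights.Cost": 0},
]


def fill_rates(rows: List[Dict[str, int]]) -> Dict[str, float]:
    """Percentage of records with at least one value, per element."""
    elements = rows[0].keys()
    return {
        e: 100 * sum(1 for r in rows if r.get(e, 0) > 0) / len(rows)
        for e in elements
    }


def mean_values_when_filled(rows: List[Dict[str, int]], element: str) -> float:
    """Average number of values among records that filled the element."""
    filled = [r[element] for r in rows if r.get(element, 0) > 0]
    return sum(filled) / len(filled) if filled else 0.0


print(fill_rates(LOG))
print(mean_values_when_filled(LOG, "General.Keyword"))  # 2.5
```

The second function covers the multiplicity case: for elements such as keywords it shows how many values annotators supply once they fill the element at all.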
Results
No | Element name | Records filled | %
1 | General Title – 1.2 | 6639 | 99.8%
2 | General Description – 1.4 | 6307 | 94.8%
3 | General Language – 1.3 | 6248 | 93.9%
4 | Rights Cost Copyright & Other Restrictions – 6.2 | 1066 | 16.0%
5 | Rights Cost – 6.1 | 1043 | 15.7%
6 | Educational Learning Resource Type – 5.2 | 895 | 13.5%
7 | Educational Intended End User Role – 5.5 | 853 | 12.8%
8 | General.Keyword – 1.5 | 850 | 12.8%
9 | Classification.Taxon Path.TaxonID – 9.2.2.1 | 785 | 11.8%
10 | Lifecycle.Contribute.Role – 2.3.1 | 763 | 11.5%
Results
[Bar chart, percentage of records filled per element: 1.2 General Title 99.8%, 1.4 General Description 94.8%, 1.3 General Language 93.9%, 6.2 Rights Cost Copyright And Other Restrictions 16.0%, 6.1 Rights Cost 15.7%]
[Bar chart, percentage of records filled per element, labels in chart order: 5.2 Educational Learning Res..., 5.5 Educational Intended End..., 1.5 General Keyword, 9.2.2.1 Classification TaxonPa..., 2.3.3 LifeCycle Contribute Date, 5.6 Educational Context, 2.3.1 LifeCycle Contribute Role, 2.3.2 LifeCycle Contribute Entity, 1.7 General Structure, 6.3 Rights Description, 5.7 Educational Typical Age ...; bar values in extracted order: 13.5%, 12.8%, 12.8%, 11.8%, 10.3%, 10.2%, 8.7%, 8.7%, 7.9%, 7.7%, 3.8%]
Compare & Contrast
Element name | Records filled | %
Rights Cost – 6.1 | 1043 | 15.7%
Educational Learning Resource Type – 5.2 | 895 | 13.5%
Educational Intended End User Role – 5.5 | 853 | 12.8%
General.Keyword – 1.5 | 850 | 12.8%
Classification.Taxon Path.TaxonID – 9.2.2.1 | 785 | 11.8%
Lifecycle.Contribute.Role – 2.3.1 | 763 | 11.5%

Question | Best rated | Rating
Is the element easy for you to understand? | General.Keyword, Technical.Format, Technical.Size | 9.2 / 10
Is the selection of the element’s possible values clear and appropriate? | General.Description, Rights.Cost, Format.Size | 8.1 / 10
Building Critical Mass Phase
• Metadata Quality Certification Mark
  – Introduced the concept of a “Quality Seal” for each metadata record that a content provider uploads to the Organic.Edunet Federation
  – In meta.metadata element
Regular Operation Phase
• Regular Analysis of Usage Data coming from the tool(s)
  – Any improvement to the quality of the metadata?
  – Measuring completeness only
  – Analysis conducted in October 2010
Duration: 1 week
Annotated Objects: 11,000 objects (100%)
Actors involved: 2 metadata experts
Resources Analyzed: 11,000
Results
MANDATORY ELEMENTS | Critical Mass: Records | Critical Mass: % | Regular Operation: Records | Regular Operation: % | Diff.
1.2 General Title | 6,639 | 99.8% | 10,741 | 98.7% | -1.1%
1.3 General Language | 6,248 | 93.9% | 10,188 | 93.6% | -0.3%
1.4 General Description | 6,307 | 94.8% | 10,745 | 98.6% | 3.8%
6.1 Rights Cost | 1,043 | 15.7% | 8,681 | 79.7% | 64.0%
6.2 Rights Cost Copyright & Other Restrictions | 1,066 | 16.0% | 10,720 | 98.4% | 82.4%
Results
RECOMMENDED ELEMENTS | Critical Mass: Records | Critical Mass: % | Regular Operation: Records | Regular Operation: % | Diff.
1.5 General Keyword | 850 | 12.8% | 9,314 | 90.9% | 78.1%
1.7 General Structure | 523 | 7.9% | 8,722 | 80.1% | 72.2%
2.3.1 LifeCycle Contribute Role | 763 | 11.5% | 8,167 | 75% | 63.5%
2.3.2 LifeCycle Contribute Entity | 578 | 8.7% | 8,244 | 75.8% | 67.1%
2.3.3 LifeCycle Contribute Date | 687 | 10.3% | 6,842 | 62.8% | 52.5%
5.5 Educational Intended End User Role | 853 | 12.8% | 8,589 | 78.9% | 66.1%
5.6 Educational Context | 678 | 10.2% | 6,278 | 57.6% | 47.4%
5.7 Educational Typical Age Range | 252 | 3.8% | 6,700 | 61.5% | 57.7%
6.3 Rights Description | 511 | 7.7% | 9,865 | 90.6% | 82.9%
Results
OPTIONAL ELEMENTS | Critical Mass: Records | Critical Mass: % | Regular Operation: Records | Regular Operation: % | Diff.
1.6 General Coverage | 10 | 0.2% | 8,730 | 80.1% | 79.9%
2.2 LifeCycle Status | 22 | 0.3% | 4,284 | 39.3% | 39%
5.1 Educational Interactivity Type | 22 | 0.3% | 3,907 | 35.9% | 35.6%
5.3 Educational Interactivity Level | 22 | 0.3% | 3,931 | 36.1% | 35.8%
5.4 Educational Semantic Density | 14 | 0.2% | 3,931 | 36.1% | 35.9%
5.8 Educational Difficulty | 9 | 0.1% | 3,947 | 36.2% | 36.1%
5.10 Educational Description | 102 | 1.5% | 1,603 | 14.7% | 13.2%
5.11 Educational Language | 22 | 0.3% | 5,577 | 51.2% | 50.9%
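The "Diff." columns are percentage-point differences between the two phases. As a hedged sketch of how the three obligation levels could be compared, the code below copies the (Critical Mass %, Regular Operation %) pairs from the mandatory, recommended and optional tables and averages the gain per group; the averaging itself is not part of the original analysis, and the function name is an assumption.

```python
from typing import Dict, Tuple

Pairs = Dict[str, Tuple[float, float]]  # element -> (before %, after %)

MANDATORY: Pairs = {
    "1.2": (99.8, 98.7), "1.3": (93.9, 93.6), "1.4": (94.8, 98.6),
    "6.1": (15.7, 79.7), "6.2": (16.0, 98.4),
}
RECOMMENDED: Pairs = {
    "1.5": (12.8, 90.9), "1.7": (7.9, 80.1), "2.3.1": (11.5, 75.0),
    "2.3.2": (8.7, 75.8), "2.3.3": (10.3, 62.8), "5.5": (12.8, 78.9),
    "5.6": (10.2, 57.6), "5.7": (3.8, 61.5), "6.3": (7.7, 90.6),
}
OPTIONAL: Pairs = {
    "1.6": (0.2, 80.1), "2.2": (0.3, 39.3), "5.1": (0.3, 35.9),
    "5.3": (0.3, 36.1), "5.4": (0.2, 36.1), "5.8": (0.1, 36.2),
    "5.10": (1.5, 14.7), "5.11": (0.3, 51.2),
}


def mean_gain(pairs: Pairs) -> float:
    """Average percentage-point gain in completeness across elements."""
    gains = [after - before for before, after in pairs.values()]
    return sum(gains) / len(gains)


for label, group in [("mandatory", MANDATORY),
                     ("recommended", RECOMMENDED),
                     ("optional", OPTIONAL)]:
    print(f"{label}: {mean_gain(group):+.1f} points")
```

On these figures the recommended elements gain the most on average (about 65 points), partly because three of the five mandatory elements were already above 90% in the Critical Mass phase.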
Regular Operation Phase
• Online Peer Review Mechanism
  – Deployed on the Organic.Edunet Federation Portal
  – Collecting ratings on metadata quality for all resources available
Overview
Experiment | No of participants / records | Phase | Date
Application Profile Questionnaire & Hands-on annotation | 20 | Metadata Design | 1/2009
Metadata Record review from metadata experts | 4 / 60 (records) | Testing | 4/2009
Metadata Record review from subject matter experts | 20 / 105 (records) | Calibration | 6/2009
Log files analysis from Annotation Tool | 6,600 (records) | Building Critical Mass | 9/2009
Log files analysis from Annotation Tool | 11,000 (records) | Regular Operation | 10/2010
PhD Progress
Progress VS Publications (1/2)
Experiment | Phase | Date | Published
Application Profile Questionnaire & Hands-on annotation | Metadata Design | 1/2009 | JIAC 2009
  Palavitsinis et al.: Interoperable metadata for a federation of learning repositories on organic agriculture and agroecology
Metadata Record review from metadata experts | Testing | 4/2009 | MTSR 2009
  Palavitsinis et al.: Evaluation of a Metadata Application Profile for Learning Resources on Organic Agriculture
Metadata Record review from subject matter experts | Calibration | 6/2009 | ED-MEDIA 2011
  Palavitsinis et al.: Metadata quality in learning repositories: Issues and considerations
Progress VS Publications (2/2)
Experiment | Phase | Date | Published
Log files analysis from Annotation Tool | Building Critical Mass | 9/2009 | ICSD 2009
  Palavitsinis et al.: Evaluating Metadata Application Profiles based on Usage Data
Log files analysis from Annotation Tool | Regular Operation | 10/2010 | ED-MEDIA 2011
  Palavitsinis et al.: Metadata quality in learning repositories: Issues and considerations
Early Publications
• Knowledge Organization Systems
  – Online study of Knowledge Organization Systems on agricultural and environmental sciences
    • Palavitsinis & Manouselis, ITEE 2009
• Metadata Lifecycle
  – “Towards a Digital Curation Framework for Learning Repositories: Issues & Considerations”
    • Palavitsinis et al., SE@M 2010
Real Users
• Organized a series of workshops involving users annotating resources
  – Organic.Edunet Summer School 2009
  – Joint Technology Enhanced Learning Summer School 2010
  – American Farm School & Ellinogermaniki Agogi workshops
  – HSci Conference in Crete
• Working with users (i.e. subject-matter experts, educators and metadata experts)
Stakeholder Consultation
• e-Conference: held during October 2010 (6/10-30/10)
• Experts on Quality for e-learning
• Two phases – four topics
• Provided input for a separate PhD chapter
Topics
• Each main topic had 4 refining questions
• Each main topic had 1 or 2 moderators
• The e-Conference had 2 administrators
• 1 keynote was recorded from Mrs. Amee Evans Godwin of the Institute for the Study of Knowledge Management in Education (ISKME)

Phase | Topics
I (6-30/10) | Learning resources creation: What constitutes a quality learning resource?
I (6-30/10) | Providing quality metadata: Is the gain worth the effort?
II (14-30/10) | Populating a repository with resources and metadata: The quality versus quantity dilemma
II (14-30/10) | Managing a portal with thousands of resources and users: Are communities “attracted” to quality, like bees to honey?
What’s next
Next Experiments
• Pilot Experiment in Agricultural Learning Resources’ Repository completed
  – Organic.Edunet (Confolio)
• Validation Experiment in Scientific/Scholarly Content Repository ongoing
  – VOA3R case (in Calibration Phase)
• Validation Experiment in Cultural Content Repository ongoing
  – Natural Europe case (in Testing Phase)
Timeline
[Gantt chart. Milestones: 5/09, 5/10, 10/10, 2/11, 12/11, 6/12, 9/12. Tasks: Introductory Research, Literature Review (A), Literature Review (B), Adapted MeQuACeP, Pilot Experiment, Validation Experiments, Writing]
Next Steps
• 11/2011 – Journal paper on Metadata Quality Assessment Certification Process ready
• 4/2012 – Journal paper on MeQuACeP applied in other contexts pending
• 6-9/2012 – Writing of thesis
Metadata Quality Issues in Learning Object Repositories
Thank you for your attention!