Using Web Data Provenance for Quality Assessment
Olaf Hartig*, Jun Zhao˚
*Humboldt-Universität zu Berlin, ˚University of Oxford
Olaf Hartig - Using Web Data Provenance for Quality Assessment 2
Information Quality (IQ)
● Common definition: fitness for use of information
● Multidimensional concept
● IQ criteria not independent of each other
● Relevancy of criteria determined by task and preferences
Category*          Criteria / Dimensions
Intrinsic          Accuracy, Believability, Objectivity, ...
Contextual         Completeness, Relevance, Timeliness, ...
Representational   Conciseness, Understandability, ...
Accessibility      Availability, Security, ...
*Classification by Wang and Strong, 1996
IQ Assessment
● Assigning numerical values (IQ scores) to IQ criteria
● It is difficult!
  ● Precision vs. Practicality
Semi-automatic methods:
  ● Rating-based
  ● Reputation-based
Manual methods:
  ● Questionnaires
Automated IQ Assessment
● Literature only outlines ideas for automatic methods
● Content analysis
  ● Comparison (e.g. outlier detection)
  ● Application of information retrieval methods
  ● Analysis of results from data cleansing
  ● Sampling techniques
● Context analysis
  ● Analysis of metadata
  ● Utilization of domain knowledge
Our Goal:
Methods to automatically assess IQ criteria of Web data
Primary means:
Provenance of assessed data
Outline
1. Web Data Provenance
2. General Assessment Approach
3. Development of Assessment Methods
Existing Provenance Research
● Main research areas: (scientific) workflows, DBMSs
● General focus: data creation
Provenance of Web Data
Web data provenance comprises two dimensions: data creation and data access
Model of Web Data Provenance
● Provenance graph describes provenance of a data item
  ● Nodes: provenance elements – pieces of provenance info
  ● Edges: relate provenance elements to each other
  ● Subgraphs for related data items possible
● Provenance model defines:
  ● Types of provenance elements: Actors, Executions, Artifacts
  ● Relationships
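The graph structure described above can be sketched as a small data structure. This is only an illustration: the class and method names (`ProvenanceGraph`, `relate`, `related`) are hypothetical and do not come from the slides; only the three element kinds and the node/edge reading are taken from them.

```python
from dataclasses import dataclass, field
from enum import Enum


class ElementKind(Enum):
    """The three kinds of provenance elements named on the slide."""
    ACTOR = "actor"
    EXECUTION = "execution"
    ARTIFACT = "artifact"


@dataclass(frozen=True)
class ProvenanceElement:
    name: str           # hypothetical identifier, e.g. "sensor1"
    kind: ElementKind


@dataclass
class ProvenanceGraph:
    """Provenance of one data item: elements as nodes, labeled edges between them."""
    data_item: str
    elements: dict = field(default_factory=dict)   # name -> ProvenanceElement
    edges: list = field(default_factory=list)      # (source, label, target) triples

    def add_element(self, name, kind):
        self.elements[name] = ProvenanceElement(name, kind)

    def relate(self, source, label, target):
        # Both endpoints must already be nodes of the graph.
        if source not in self.elements or target not in self.elements:
            raise KeyError("unknown provenance element")
        self.edges.append((source, label, target))

    def related(self, source, label):
        """Names of all elements reached from `source` over edges with `label`."""
        return [t for s, lbl, t in self.edges if s == source and lbl == label]
```

Storing edges as labeled triples keeps the sketch independent of any particular provenance vocabulary; a real system would more likely use RDF.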
Data Access Dimension
[Figure: provenance model of the data access dimension. Element types: Data Item, Document, Data Access, Resource, Data Providing Service (non-human), Data Publisher (human), Service Provider, Data Accessor (non-human), Execution Time, Relation to the provided Information. Relationship labels: retrieved by, contains, uses, controls, performs, accessed.]
Data Access Dimension cont.
[Figure: integrity verification in the data access dimension. Element types: (Verified) Artifact, Integrity Verification, Signature Verification, Signer, Signature Method, Verification Result, Relation to the signed Data. Constraint shown: {incomplete}.]
Data Creation Dimension
[Figure: provenance model of the data creation dimension. Element types: Data Item, (Encompassing) Data Item, Source Data, Data Creation, Execution Time, Creation Guidelines, Provenance Information, Relation to the created Data, Data Creator (human or non-human), Data Creating Service (e.g. software agent), Data Creating Entity (e.g. person, group, organization), Data Creating Device (e.g. sensor). Constraint shown: {complete, disjoint}. Relationship labels: part of, responsible for.]
2. General Assessment Approach
A General Approach
● Blueprint for actual assessment methods that
  ● Address a specific scenario
  ● Focus on a specific IQ criterion
● Provenance elements have an influence on IQ
● Impact values represent these influences
● Knowing these influences informs the assessment
● Calculation of the IQ score with an assessment function that combines all impact values
General Assessment Procedure
Step 1 – Generate a provenance graph for the data item
Step 2 – Annotate the provenance graph with impact values
Step 3 – Execute the assessment function
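The three steps can be read as a pipeline in which each step is supplied by the concrete assessment method. The sketch below is one possible reading, not the authors' implementation; the function name `assess`, the parameter names, and the dictionary-based types are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Optional

Graph = Dict[str, Dict[str, Any]]          # element name -> provenance attributes
Annotations = Dict[str, Dict[str, Any]]    # element name -> impact values


def assess(
    data_item: str,
    generate_graph: Callable[[str], Graph],
    annotate: Callable[[Graph], Annotations],
    assessment_function: Callable[[Graph, Annotations], Optional[float]],
) -> Optional[float]:
    """Run the general procedure; the three callables define the actual method."""
    graph = generate_graph(data_item)                  # Step 1
    impact_values = annotate(graph)                    # Step 2
    return assessment_function(graph, impact_values)   # Step 3
```

Passing the steps in as functions mirrors the idea of a blueprint: the procedure stays fixed while the scenario and the IQ criterion decide what each step does.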
3. Development of Assessment Methods
Designing Assessment Methods
● Developing the general approach into an actual method
● Fundamental design question:
For which IQ criterion do we want to apply the method?
● Timeliness: degree to which the data item is up-to-date with respect to the task at hand
● Representation* as an absolute measure in [0,1]
  ● 1 – meeting the strictest timeliness standards
  ● 0 – unacceptable
*Following Ballou et al., 1998
Step 1 – Generate the Provenance Graph
● Two complementary options:
  ● Recording
  ● Analyzing metadata
Where and how do we get provenance information?
What types of provenance elements are necessary?
What level of detail (i.e. granularity) is necessary?
Example:
● Sensors (e.g. sensor1) take a measurement (e.g. msr) every hour
● All measurements are stored in a Web-accessible storage device (store)
● Our system (sys) accesses them for further processing
● sys assesses the timeliness of each measurement
[Figure: provenance graph for msr]
Nodes: msr (Data Item), doc (Document), aExc (Data Access), sys (Data Accessor), cExc (Data Creation), store (Data Providing Service), sensor1 (Data Creator)
Edge labels: contained by, retrieved by, accessed, created by, performed by (twice)
Execution Times: 10:13, 10:00
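The example graph can be written down as plain data. The element names and types are those on the slide. The edge endpoints and the assignment of the two execution times are assumptions inferred from the edge labels and element types, since the slide layout did not survive; the helper names `follow` and `creator_of` are hypothetical.

```python
from datetime import time

# Elements of the example graph and their types, as listed on the slide.
element_types = {
    "msr": "Data Item",
    "doc": "Document",
    "aExc": "Data Access",
    "sys": "Data Accessor",
    "cExc": "Data Creation",
    "store": "Data Providing Service",
    "sensor1": "Data Creator",
}

# ASSUMPTION: which elements each edge connects is inferred, not given.
edges = [
    ("msr", "contained by", "doc"),
    ("doc", "retrieved by", "aExc"),
    ("aExc", "performed by", "sys"),
    ("aExc", "accessed", "store"),
    ("msr", "created by", "cExc"),
    ("cExc", "performed by", "sensor1"),
]

# ASSUMPTION: 10:00 belongs to the data creation, 10:13 to the data access.
execution_time = {"cExc": time(10, 0), "aExc": time(10, 13)}


def follow(source, label):
    """Return the targets of all edges leaving `source` with the given label."""
    return [t for s, lbl, t in edges if s == source and lbl == label]


def creator_of(item):
    """Walk 'created by' and then 'performed by' to find who created `item`."""
    return [actor
            for execution in follow(item, "created by")
            for actor in follow(execution, "performed by")]
```

With this encoding, `creator_of("msr")` walks from the data item over its creation to the sensor that performed it.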
Step 2 – Annotation with Impact Values
● Systematically analyze each type of provenance element
● Impact values are not necessarily numerical
  ● Depends on the assessment function in step 3
How might each provenance element influence the IQ criterion?
What kind of impact values are necessary?
How do we determine impact values?
How do we represent the influences by impact values?
Determining Impact Values
● From the provenance information
● From user input
  ● Configuration options
  ● Rating-based, Reputation-based
● By content analysis
  ● Comparison (e.g. outlier detection)
  ● Adoption of information retrieval methods
  ● Adoption of data cleansing techniques
● By context analysis
  ● Further metadata
  ● Domain knowledge
How might each provenance element influence the IQ criterion?
Data Creation Dimension:
Prov. Element Type    Impact Values
Data Creation         creation time, weights
Creation Guidelines   -
(Source) Data Item    expiry time
Data Creator          -
[Figure: provenance graph for msr, annotated with impact values: creation time 10:00, expiry time 11:00]
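A minimal sketch of the annotation step for the sensor example follows. It assumes that the creation time is copied from the Execution Time of the Data Creation element and that a measurement expires one measurement interval (one hour) after its creation, which reproduces the 10:00 and 11:00 values shown. The function name `annotate`, the dictionary layout, and the calendar date are hypothetical; the weights listed as a further impact value are left out.

```python
from datetime import datetime, timedelta

# Provenance information for the example. The calendar date is a placeholder;
# the slides only give times of day.
graph = {
    "cExc": {"type": "Data Creation",
             "execution_time": datetime(2000, 1, 1, 10, 0)},
    "msr": {"type": "Data Item"},
}


def annotate(graph, validity=timedelta(hours=1)):
    """Attach timeliness-related impact values to a provenance graph.

    ASSUMPTION: the graph holds at most one Data Creation element, and a data
    item expires `validity` after that creation (hourly measurements).
    """
    impact = {}
    creation_time = None
    for name, element in graph.items():
        if element["type"] == "Data Creation" and "execution_time" in element:
            creation_time = element["execution_time"]
            impact[name] = {"creation_time": creation_time}
    for name, element in graph.items():
        # Without a known creation time no expiry time can be derived.
        if element["type"] == "Data Item" and creation_time is not None:
            impact[name] = {"expiry_time": creation_time + validity}
    return impact
```

Elements without usable provenance simply receive no annotation, so the result may be incomplete.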
Step 3 – Assessment Function
● Develop the function together with the impact values
● Take incompleteness into consideration
  ● Provenance graphs could be fragmentary
  ● Annotations could be missing
What does the assessment function look like?
How do we represent the IQ criterion by an IQ score?
t(msr) = 1 – (10:15 – 10:00) / (11:00 – 10:00) = 1 – 0.25h / 1h = 0.75
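The calculation above can be reproduced with a short function. The general form used here, 1 – (now – creation time) / (expiry time – creation time), is read off the numbers in the example. Three things are assumptions of this sketch rather than statements from the slides: that 10:15 is the time of assessment, that the score is clamped to [0,1], and that missing annotations yield no score (`None`). The function name `timeliness` and the calendar date are hypothetical.

```python
from datetime import datetime
from typing import Optional


def timeliness(now: datetime,
               creation_time: Optional[datetime],
               expiry_time: Optional[datetime]) -> Optional[float]:
    """Timeliness score in [0, 1], or None if it cannot be computed."""
    # ASSUMPTION: a missing annotation means the score is unknown.
    if creation_time is None or expiry_time is None:
        return None
    lifetime = (expiry_time - creation_time).total_seconds()
    if lifetime <= 0:
        return None  # expiry not after creation: no meaningful score
    age = (now - creation_time).total_seconds()
    score = 1.0 - age / lifetime
    # ASSUMPTION: clamp so expired items score 0 and the range stays [0, 1].
    return max(0.0, min(1.0, score))


# The example from the slide, with a placeholder date.
t_msr = timeliness(datetime(2000, 1, 1, 10, 15),
                   datetime(2000, 1, 1, 10, 0),
                   datetime(2000, 1, 1, 11, 0))
print(t_msr)  # 0.75
```

Returning `None` instead of a default score keeps "unknown" distinguishable from "unacceptable" (0).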
Conclusion
● Web Data Provenance (data creation + data access)
● General approach for provenance-based IQ assessment
  ● Impact values: influence of provenance elements on IQ
● Design decisions for actual assessment methods
● Application to timeliness (more in the paper)
● Future work:
  ● How do we deal with incompleteness?
  ● Application of the approach to other IQ criteria
These slides have been created by Olaf Hartig
http://olafhartig.de
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)
Attribution:
● http://www.flickr.com/photos/rrrrred/3809362767/
● http://www.hasslefreeclipart.com