Uncertainty in Data Integration Ai Jing 2007-11-10.


Outline

Data Integration with Uncertainty
Overview of Workshop on Management of Uncertain Data
Uncertainty in Deep Web


Data Integration with Uncertainty

Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions

Traditional Data Integration Systems

Q:
SELECT P.title AS title, P.year AS year, A.name AS author
FROM Author AS A, Paper AS P, AuthoredBy AS AB
WHERE A.aid = AB.aid AND P.pid = AB.pid

Uncertainty Can Occur at Three Levels in Data Integration Applications

III. Query Level

II. Mapping Level

I. Data Level

Focus of the paper: Probabilistic schema mappings

Example Probabilistic Mappings

T(name, email, mailing-addr, home-addr, office-addr)
S(pname, email-addr, current-addr, permanent-addr)

Figure: three candidate mappings between S and T (the attribute correspondences drawn on the slide are not recoverable from the transcript).

Top-k Query Answering w.r.t. Probabilistic Mappings

Mediated Schema

Q: SELECT mailing-addr FROM T

0.5 0.4 0.1

Q1: SELECT current-addr FROM S

Q2: SELECT permanent-addr FROM S

Q3: SELECT email-addr FROM S
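The rewriting on this slide can be sketched in Python. This is a minimal illustration, not the paper's implementation. It assumes that the probabilities 0.5, 0.4 and 0.1 belong to Q1, Q2 and Q3 in that order (the transcript does not state the pairing), and the rows of S are invented. Each rewritten query is run on S, and every answer accumulates the probability of each rewriting that returns it.

```python
from collections import defaultdict

# Hypothetical rows of the source table S (invented for illustration).
S = [
    {"pname": "Alice", "email-addr": "alice@example.com",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@example.com",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# Assumed pairing: the source attribute that T.mailing-addr is rewritten to,
# together with the probability of that rewriting (Q1, Q2, Q3).
REWRITINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def by_table_answers(rows, rewritings):
    """Sum, for each answer value, the probabilities of the rewritings returning it."""
    scores = defaultdict(float)
    for attr, prob in rewritings:
        # Set semantics: a value counts once per rewriting.
        for value in {row[attr] for row in rows}:
            scores[value] += prob
    return dict(scores)


def top_k(rows, rewritings, k):
    """Return the k answers with the highest accumulated probability."""
    scores = by_table_answers(rows, rewritings)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    for value, score in top_k(S, REWRITINGS, 2):
        print(value, round(score, 2))
```

With these invented rows, "Sunnyvale" is returned by both Q1 and Q2, so it ranks above "Mountain View", which only Q1 returns.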

Definition of probabilistic mappings

Schema Mapping

Probabilistic Mapping

S=(pname, email-addr, home-addr, office-addr)

T=(name, mailing-addr)

One-to-one schema matching
Have exact knowledge of the mapping

S=(pname, email-addr, home-addr, office-addr)

T=(name, mailing-addr)

1.0 0.1 0.5 0.4
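A p-mapping can be pictured as a list of candidate one-to-one mappings, each with a probability, where the probabilities sum to 1. The sketch below checks those conditions. The reading of the four numbers is an assumption: pname to name is taken to hold in every mapping (1.0), and the three alternatives for mailing-addr are given 0.1, 0.5 and 0.4 in the order the source attributes are listed. All function names are hypothetical.

```python
import math

# Assumed reading of the slide: three candidate mappings from S to T.
P_MAPPING = [
    ({"pname": "name", "email-addr": "mailing-addr"}, 0.1),
    ({"pname": "name", "home-addr": "mailing-addr"}, 0.5),
    ({"pname": "name", "office-addr": "mailing-addr"}, 0.4),
]


def validate_p_mapping(p_mapping):
    """Check one-to-one mappings, distinctness, and probabilities summing to 1."""
    seen = set()
    for mapping, prob in p_mapping:
        targets = list(mapping.values())
        if len(targets) != len(set(targets)):
            raise ValueError("mapping is not one-to-one: %r" % (mapping,))
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probability out of range: %r" % (prob,))
        key = frozenset(mapping.items())
        if key in seen:
            raise ValueError("duplicate mapping: %r" % (mapping,))
        seen.add(key)
    total = sum(prob for _, prob in p_mapping)
    if not math.isclose(total, 1.0):
        raise ValueError("probabilities sum to %r, not 1" % (total,))
    return True


def correspondence_marginals(p_mapping):
    """Probability of each attribute correspondence, summed over the mappings containing it."""
    marginals = {}
    for mapping, prob in p_mapping:
        for pair in mapping.items():
            marginals[pair] = marginals.get(pair, 0.0) + prob
    return marginals


if __name__ == "__main__":
    validate_p_mapping(P_MAPPING)
    for pair, prob in correspondence_marginals(P_MAPPING).items():
        print(pair, round(prob, 2))
```

Under this reading, the per-correspondence totals reproduce the four numbers on the slide: 1.0 for pname and 0.1, 0.5, 0.4 for the three candidates for mailing-addr.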

By-Table Semantics

By-Tuple Semantics

Pr(<m1,m3>)=0.05
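Under by-tuple semantics each source tuple may use a different mapping, so a possible world is a sequence of mappings, one per tuple. The value Pr(<m1,m3>)=0.05 is consistent with multiplying the per-mapping probabilities. The sketch below assumes Pr(m1)=0.5, Pr(m2)=0.4 and Pr(m3)=0.1, taken from the earlier top-k slide; which mapping each probability belongs to is an assumption.

```python
from itertools import product
from math import isclose, prod

# Assumed mapping probabilities (values from the earlier slide, pairing assumed).
MAPPING_PROBS = {"m1": 0.5, "m2": 0.4, "m3": 0.1}


def sequence_probability(seq, probs=MAPPING_PROBS):
    """Probability of a mapping sequence: the product of its members' probabilities."""
    return prod(probs[m] for m in seq)


def all_sequences(n_tuples, probs=MAPPING_PROBS):
    """Every mapping sequence for n_tuples source tuples, with its probability."""
    return {seq: sequence_probability(seq, probs)
            for seq in product(probs, repeat=n_tuples)}


if __name__ == "__main__":
    print(sequence_probability(("m1", "m3")))
    worlds = all_sequences(2)
    print(len(worlds), isclose(sum(worlds.values()), 1.0))
```

The probabilities of all sequences of a given length sum to 1, which is what makes the sequences a probability distribution over possible worlds.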

By-Table Query Answering

By-Tuple Query Answering

Complexity of query answering

More on By-Tuple Query Answering

The high complexity comes from computing probabilities
The number of mapping sequences is exponential in the size of the input data
n tuples, m mappings: m^n mapping sequences
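The blow-up can be seen in a brute-force sketch that enumerates every mapping sequence for the query SELECT mailing-addr FROM T. It is an illustration only: the rows are invented, and each mapping is reduced to the single source attribute it assigns to mailing-addr, which is an assumption made to keep the example short.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical source rows (invented for illustration).
ROWS = [
    {"current-addr": "Mountain View", "permanent-addr": "Sunnyvale",
     "email-addr": "alice@example.com"},
    {"current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale",
     "email-addr": "bob@example.com"},
]

# Assumed mappings: (source attribute mapped to mailing-addr, probability).
MAPPINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def by_tuple_answers(rows, mappings):
    """Enumerate all m^n mapping sequences and sum the probability per answer."""
    answers = defaultdict(float)
    n_sequences = 0
    for seq in product(mappings, repeat=len(rows)):
        n_sequences += 1
        p = prod(prob for _, prob in seq)
        # Answer of the query in this possible world (set semantics).
        world_answer = {row[attr] for row, (attr, _) in zip(rows, seq)}
        for value in world_answer:
            answers[value] += p
    return dict(answers), n_sequences


if __name__ == "__main__":
    answers, n_sequences = by_tuple_answers(ROWS, MAPPINGS)
    print(n_sequences)  # m^n sequences were enumerated
    for value, p in sorted(answers.items()):
        print(value, round(p, 2))
```

With 2 rows and 3 mappings the loop visits 3^2 = 9 sequences. Every additional row multiplies that count by 3, which is why enumeration does not scale.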

There are two subsets of queries that can be answered in PTIME by query rewriting
SELECT mailing-addr FROM T
SELECT mailing-addr FROM T, V WHERE T.mailing-addr = V.hightech
In general, query answering cannot be done by query rewriting
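For the first query, SELECT mailing-addr FROM T, one way to see why enumeration is avoidable is that each source tuple chooses its mapping independently, so the probability that a value is missing from the answer is a product over tuples. The sketch below uses that observation. It is not claimed to be the rewriting used in the paper, and the rows, the mapping list, and the function name are all assumptions.

```python
from collections import defaultdict

# Hypothetical source rows (invented for illustration).
ROWS = [
    {"current-addr": "Mountain View", "permanent-addr": "Sunnyvale",
     "email-addr": "alice@example.com"},
    {"current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale",
     "email-addr": "bob@example.com"},
]

# Assumed mappings: (source attribute mapped to mailing-addr, probability).
MAPPINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def projection_probabilities(rows, mappings):
    """By-tuple answer probabilities in O(n * m) time, without enumerating sequences."""
    # miss[v] is the probability that no source tuple produces v.
    miss = defaultdict(lambda: 1.0)
    for row in rows:
        # hit[v] is the probability that this tuple produces v.
        hit = defaultdict(float)
        for attr, prob in mappings:
            hit[row[attr]] += prob
        for value, p in hit.items():
            miss[value] *= 1.0 - p
    return {value: 1.0 - q for value, q in miss.items()}


if __name__ == "__main__":
    for value, p in sorted(projection_probabilities(ROWS, MAPPINGS).items()):
        print(value, round(p, 2))
```

The cost is linear in the number of tuples times the number of mappings, in contrast to the m^n sequences a direct enumeration would visit.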

One of Dt

Extensions to More Expressive Mappings

The complexity results for query answering carry over to three extensions to more expressive mappings

Complex mappings

GLAV mappings

Conditional mappings

Contributions

Definition of probabilistic mappings

Semantics: by-table vs. by-tuple

Complexity of query answering

Outline

Overview of MUD 2007

Theory
A New Language and Architecture to Obtain Fuzzy Global Dependencies
About the Processing of Division Queries Addressed to Possibilistic Databases
Making Aggregation Work in Uncertain and Probabilistic Databases
Materialized Views in Probabilistic Databases

Application
Flexible matching of Ear Biometrics
Consistent Joins Under Primary Key Constraints

A New Language and Architecture to Obtain Fuzzy Global Dependencies

SQL does not satisfy the minimum requirements to be a true data mining (DM) language

A New Language: dmFSQL (data mining Fuzzy Structured Query Language)

Fuzzy Database Data mining

About the Processing of Division Queries Addressed to Possibilistic Databases

They devised a data model which is a strong representation system for operations in possibilistic databases

A possibilistic database D can be interpreted as a weighted disjunctive set of regular databases

Division Queries

Making Aggregation Work in Uncertain and Probabilistic Databases

Trio is a prototype database management system for storing and querying data with uncertainty and lineage

Trio’s query language: TriQL

Trio data model and query semantics

Aggregation function in the Trio system for uncertain and probabilistic data

Materialized Views in Probabilistic Databases

Materialized views over probabilistic databases may not define a unique probability distribution

View representation

Answer queries on large probabilistic data sets more efficiently with materialized views

Flexible matching of Ear Biometrics

Research area: image recognition (or identification)

Scenario: identifying found bodies in a large-scale disaster

Challenge: fast and cheap identification when no DNA databases or fingerprint databases are at hand

Consistent Joins Under Primary Key Constraints

Inconsistent database, primary key

Will the natural join of the repaired relations always be nonempty, no matter which tuples are selected?

game theory, winning strategy
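The question on this slide can be made concrete with a brute-force check. A repair keeps exactly one tuple from each group of tuples sharing a primary-key value, and the check asks whether the natural join is nonempty for every combination of repairs. This is a naive sketch restricted to two relations, with invented data and hypothetical function names. It enumerates all repairs and is not the game-theoretic method the slide refers to.

```python
from itertools import product


def repairs(relation, key):
    """All repairs: pick exactly one tuple per primary-key value."""
    groups = {}
    for t in relation:
        groups.setdefault(tuple(t[a] for a in key), []).append(t)
    return [list(choice) for choice in product(*groups.values())]


def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                out.append({**t, **u})
    return out


def join_always_nonempty(r, r_key, s, s_key):
    """True if the join is nonempty for every pair of repairs."""
    return all(natural_join(rr, sr)
               for rr in repairs(r, r_key)
               for sr in repairs(s, s_key))


# Hypothetical data: emp is the key of R, so the two "ann" tuples conflict.
R = [{"emp": "ann", "dept": "sales"}, {"emp": "ann", "dept": "hr"}]
S_FULL = [{"dept": "sales", "city": "Paris"}, {"dept": "hr", "city": "Rome"}]
S_PARTIAL = [{"dept": "sales", "city": "Paris"}]

if __name__ == "__main__":
    print(join_always_nonempty(R, ["emp"], S_FULL, ["dept"]))
    print(join_always_nonempty(R, ["emp"], S_PARTIAL, ["dept"]))
```

With S_FULL either choice for "ann" finds a join partner, so the join is nonempty in every repair. With S_PARTIAL the repair that keeps the "hr" tuple produces an empty join.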

Outline

Uncertainty in Deep Web

No “perfect” data
Noise
Dirty
Redundancy
...

No “perfect” solution
Web data extraction
Interface integration
...

Uncertainty in Deep Web Data Integration(1)

Figure: architecture of a Deep Web data integration system (diagram layout not recoverable from the transcript).
Modules: Interface Integration Module, Query Process Module, Result Process Module
Components: WDB Discovery, Interface Schema Extraction, WDB Clustering, Interface Integration, Integrated Interface, WDB Selection, Query Translation, Query Submission, Results Extraction, Results Annotation, Data Merging
Data sources: Deep Web, Web DBs, RDB

• Robust
• Evaluable

Uncertainty in Deep Web Data Integration(2)

Figure: architecture of a Deep Web data integration system (diagram layout not recoverable from the transcript).
Modules: Interface Integration Module, Query Process Module, Result Process Module
Components: WDB Discovery, Interface Schema Extraction, WDB Clustering, Interface Integration, Integrated Interface, WDB Selection, Query Translation, Query Submission, Results Extraction, Results Annotation, Data Merging
Data sources: Deep Web, Web DBs, RDB

• Tuning
• Feedback
• Evaluable

Uncertainty in Jobtong(1)

Data level

Query level

How can we give every result a probability to show its importance?

The automatic maintenance of configuration files

<record>
  <xpath>/html/body//table/tr[@class='nob']</xpath>
  <combination>2</combination>
  <items>
    <item>
      <name>title</name>
      <xpath>td[2]/a/span</xpath>
    </item>
    <item>
      <name>company</name>
      <xpath>td[3]/a/span</xpath>
    </item>
  </items>
</record>

<record>
  <xpath>/html/body//table/tr[@class='list2' or @class='list3']</xpath>
  <combination>2</combination>
  <items>
    <item>
      <name>title</name>
      <xpath>td[2]/a</xpath>
    </item>
    <item>
      <name>company</name>
      <xpath>td[3]/a</xpath>
    </item>
  </items>
</record>
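Maintaining such files automatically starts with detecting which extraction rules differ between two versions. The sketch below parses the two records shown above with Python's standard library and reports the differences. Treating the first record as an earlier version and the second as a later version of the same configuration is an assumption, as are the function names. The XPath strings are only compared as text, never evaluated.

```python
import xml.etree.ElementTree as ET

OLD = """<record><xpath>/html/body//table/tr[@class='nob']</xpath>
<combination>2</combination><items>
<item><name>title</name><xpath>td[2]/a/span</xpath></item>
<item><name>company</name><xpath>td[3]/a/span</xpath></item>
</items></record>"""

NEW = """<record><xpath>/html/body//table/tr[@class='list2' or @class='list3']</xpath>
<combination>2</combination><items>
<item><name>title</name><xpath>td[2]/a</xpath></item>
<item><name>company</name><xpath>td[3]/a</xpath></item>
</items></record>"""


def parse_record(xml_text):
    """Turn one <record> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "xpath": root.findtext("xpath"),
        "combination": int(root.findtext("combination")),
        "items": {item.findtext("name"): item.findtext("xpath")
                  for item in root.iter("item")},
    }


def changed_fields(old, new):
    """Map each changed rule to its (old, new) XPath pair."""
    changes = {}
    if old["xpath"] != new["xpath"]:
        changes["record"] = (old["xpath"], new["xpath"])
    for name in sorted(set(old["items"]) | set(new["items"])):
        before, after = old["items"].get(name), new["items"].get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    diff = changed_fields(parse_record(OLD), parse_record(NEW))
    for field, (before, after) in diff.items():
        print(field, ":", before, "->", after)
```

For these two records the record-level XPath and both item XPaths differ, while the combination value stays at 2.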

Thank you!
