[ICIST 2013] A Probabilistic Relational Data Model for Uncertain Information
A Probabilistic Relational Data Model for Uncertain Information
Nguyen Hoa and Tran Duc Hieu
IEEE 2013 the 3rd International Conference on Information Science and Technology (ICIST 2013)
March 23-25, Yangzhou, Jiangsu, China & March 27-28, Phuket, Thailand
Reporter: Tran Duc Hieu
Department for Computational and Knowledge Engineering
Institute of Applied Mechanics and Informatics
Vietnam Academy of Science and Technology
Contents
1. Introduction
2. Uncertain Attribute Values
3. Probabilistic Relational Data Base Model
4. Selection Operation
5. PRDB Management System
6. Conclusions and Future Works
March 28th, 2013 Nguyen Hoa & Tran Duc Hieu 2
Introduction
• Motivation
  • Traditional relational databases (RDB) are limited in representing and handling uncertain and imprecise information
  • Uncertain or imprecise information is important and common in daily life
Introduction
• Objectives
  • Build a new Probabilistic Relational Data Base (PRDB) model to represent and handle uncertain information in the real world
  • Build an initial PRDB-SQLite Management System to demonstrate that PRDB can be applied and processed in practice
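Some Probabilistic Combination Strategies
• Let $\mathrm{prob}(e_1) = [L_1, U_1]$ and $\mathrm{prob}(e_2) = [L_2, U_2]$. Then $\mathrm{prob}(e_1 \wedge e_2)$, $\mathrm{prob}(e_1 \vee e_2)$, and $\mathrm{prob}(e_1 \wedge \neg e_2)$ are calculated as follows:

| Strategy | Operators |
|---|---|
| Independence | $([L_1, U_1] \otimes_{in} [L_2, U_2]) \equiv [L_1 \cdot L_2,\ U_1 \cdot U_2]$ |
| | $([L_1, U_1] \oplus_{in} [L_2, U_2]) \equiv [L_1 + L_2 - (L_1 \cdot L_2),\ U_1 + U_2 - (U_1 \cdot U_2)]$ |
| | $([L_1, U_1] \ominus_{in} [L_2, U_2]) \equiv [L_1 \cdot (1 - U_2),\ U_1 \cdot (1 - L_2)]$ |
| Mutual Exclusion | $([L_1, U_1] \otimes_{me} [L_2, U_2]) \equiv [0, 0]$ |
| | $([L_1, U_1] \oplus_{me} [L_2, U_2]) \equiv [\min(1, L_1 + L_2),\ \min(1, U_1 + U_2)]$ |
| | $([L_1, U_1] \ominus_{me} [L_2, U_2]) \equiv [L_1,\ \min(U_1, 1 - L_2)]$ |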
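The six interval operators on this slide can be sketched in a few lines of Python. This is only an illustration: the function names (`conj_in`, `disj_me`, and so on) and the tuple representation of an interval are assumptions made here, not part of the presented system.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound L, upper bound U)


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """Disjunction under independence: [L1+L2-L1*L2, U1+U2-U1*U2]."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


def diff_in(a: Interval, b: Interval) -> Interval:
    """Difference under independence: [L1*(1-U2), U1*(1-L2)]."""
    return (a[0] * (1.0 - b[1]), a[1] * (1.0 - b[0]))


def conj_me(a: Interval, b: Interval) -> Interval:
    """Conjunction under mutual exclusion: always [0, 0]."""
    return (0.0, 0.0)


def disj_me(a: Interval, b: Interval) -> Interval:
    """Disjunction under mutual exclusion: [min(1, L1+L2), min(1, U1+U2)]."""
    return (min(1.0, a[0] + b[0]), min(1.0, a[1] + b[1]))


def diff_me(a: Interval, b: Interval) -> Interval:
    """Difference under mutual exclusion: [L1, min(U1, 1-L2)]."""
    return (a[0], min(a[1], 1.0 - b[0]))


both = conj_in((0.5, 0.5), (0.5, 1.0))    # (0.25, 0.5)
either = disj_in((0.5, 0.5), (0.5, 0.5))  # (0.75, 0.75)
```

Keeping every strategy as a plain function of two intervals makes it easy to pass the chosen strategy as a parameter to other operations.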
Uncertain Attribute Values
• In PRDB the value of each attribute $A$ is a probabilistic triple $\langle V, \alpha, \beta \rangle$
  • $V$: a set of values of the attribute $A$ ($V = \{v_1, v_2, \ldots, v_k\}$)
  • $\alpha, \beta$: lower-bound and upper-bound probability distributions on $V$
  • The attribute $A$ takes a value $v \in V$ with a probability in $[\alpha(v), \beta(v)]$
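A probabilistic triple can be sketched as a small Python class. The class name `ProbTriple`, its field names, and the helper `scaled_uniform` are hypothetical. The helper also assumes that `u` in the example relations denotes the uniform distribution on $V$, so that `0.8u` and `1.2u` are scaled copies of it; the slides do not state this explicitly.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass
class ProbTriple:
    """Probabilistic triple <V, alpha, beta>; V is the key set of both maps."""

    alpha: Dict[Hashable, float]  # lower-bound probability of each value
    beta: Dict[Hashable, float]   # upper-bound probability of each value

    def __post_init__(self) -> None:
        if not self.alpha or self.alpha.keys() != self.beta.keys():
            raise ValueError("alpha and beta must share one non-empty value set V")
        for v in self.alpha:
            if not 0.0 <= self.alpha[v] <= self.beta[v] <= 1.0:
                raise ValueError(f"need 0 <= alpha(v) <= beta(v) <= 1 for {v!r}")

    def interval(self, v: Hashable) -> Tuple[float, float]:
        """Return [alpha(v), beta(v)], or [0, 0] when v is not in V."""
        return (self.alpha.get(v, 0.0), self.beta.get(v, 0.0))


def scaled_uniform(values: Iterable[Hashable], lo: float = 1.0, hi: float = 1.0) -> ProbTriple:
    """Build <V, lo*u, hi*u> with u assumed uniform on V (upper bound capped at 1)."""
    vals = list(values)
    u = 1.0 / len(vals)
    return ProbTriple({v: lo * u for v in vals}, {v: min(1.0, hi * u) for v in vals})


# Under the stated assumption, <{lung cancer, tuberculosis}, 0.8u, 1.2u>
# gives each disease a probability interval of roughly [0.4, 0.6].
disease = scaled_uniform(["lung cancer", "tuberculosis"], lo=0.8, hi=1.2)
```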
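PRDB Model
• The PRDB model extends the RDB model by integrating uncertain attribute values
• Each tuple of a relation is a list of probabilistic triples
$t = (\langle V_1, \alpha_1, \beta_1 \rangle, \langle V_2, \alpha_2, \beta_2 \rangle, \ldots, \langle V_k, \alpha_k, \beta_k \rangle)$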
Probabilistic Relations
A probabilistic relation $r$ over a probabilistic relational schema $R(A_1, A_2, \ldots, A_k)$ is
$r = \{ t \mid t = (\langle V_1, \alpha_1, \beta_1 \rangle, \langle V_2, \alpha_2, \beta_2 \rangle, \ldots, \langle V_k, \alpha_k, \beta_k \rangle) \}$
Example

| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION |
|---|---|---|---|
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ |
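Probabilistic Functional Dependencies
The probabilistic measure for equal attribute values
$$
\mathrm{prob}(t_1.A = t_2.A) =
\begin{cases}
\left[\sum_{v \in W} \alpha(v),\ \min\left(1, \sum_{v \in W} \beta(v)\right)\right], & \text{if } W \neq \emptyset \\
[0, 0], & \text{if } W = \emptyset
\end{cases}
$$
where $t_1.A = \langle V_1, \alpha_1, \beta_1 \rangle$, $t_2.A = \langle V_2, \alpha_2, \beta_2 \rangle$, and
$[\alpha(v), \beta(v)] = [\alpha_1(v_1), \beta_1(v_1)] \otimes [\alpha_2(v_2), \beta_2(v_2)]$,
$\forall v \in W = \{(v_1, v_2) \in V_1 \times V_2 \mid v_1 = v_2\}$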
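The measure above can be sketched in Python as follows. The representation of an attribute value as a dictionary from each value to its $[\alpha, \beta]$ interval, the function name `prob_equal`, and the default choice of the independence conjunction are all assumptions made for this illustration.

```python
from typing import Callable, Dict, Hashable, Tuple

Interval = Tuple[float, float]
Triple = Dict[Hashable, Interval]  # value v -> (alpha(v), beta(v))


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence: [L1*L2, U1*U2]."""
    return (a[0] * b[0], a[1] * b[1])


def prob_equal(a1: Triple, a2: Triple,
               conj: Callable[[Interval, Interval], Interval] = conj_in) -> Interval:
    """Probability interval that two uncertain attribute values are equal."""
    # W pairs up values with v1 == v2, i.e. the values common to V1 and V2.
    w = set(a1) & set(a2)
    if not w:
        return (0.0, 0.0)
    lower = sum(conj(a1[v], a2[v])[0] for v in w)
    upper = sum(conj(a1[v], a2[v])[1] for v in w)
    return (lower, min(1.0, upper))


# DISEASE values, assuming u is the uniform distribution on V.
t1_disease = {"lung cancer": (0.4, 0.6), "tuberculosis": (0.4, 0.6)}
t2_disease = {"hepatitis": (0.5, 0.5), "cirrhosis": (0.5, 0.5)}
t3_disease = {"hepatitis": (1.0, 1.0)}

same_23 = prob_equal(t2_disease, t3_disease)  # (0.5, 0.5)
same_13 = prob_equal(t1_disease, t3_disease)  # (0.0, 0.0), W is empty
```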
Probabilistic Functional Dependencies
The probabilistic functional dependency in PRDB is extended from the functional dependency in RDB
$\forall t_1, t_2 \in r,\ \mathrm{prob}(t_1[X] = t_2[X]) \leq \mathrm{prob}(t_1[Y] = t_2[Y])$
Selection Expressions
• $x.A\ \theta\ v$: $x \in X$, $A$ is an attribute in $R$, $\theta$ is a binary relation from $\{=, \neq, \leq, <, >, \geq\}$, and $v$ is a value
• $x.A_1 =_{\otimes} x.A_2$: $\otimes$ is a probabilistic conjunction strategy for combining the probabilities of $x.A_1 = v_1$ and $x.A_2 = v_2$ such that $v_1 = v_2$
• $E_1 \otimes E_2$: $E_1$ and $E_2$ are selection expressions
• $E_1 \oplus E_2$: $\oplus$ is a probabilistic disjunction strategy
Selection Expressions
Example Relation DIAGNOSE
Selection expression
(x.DURATION 40) (x.COST 60)
| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST |
|---|---|---|---|---|
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩ |
Selection Conditions
• $(E)[L, U]$: $E$ is a selection expression and $[L, U]$ is a probabilistic interval
• $\neg\phi$, $(\phi \wedge \psi)$, $(\phi \vee \psi)$: $\phi$ and $\psi$ are selection conditions
Example
((x.DURATION 40) (x.COST 60))[0.4, 0.6])
Probabilistic Interpretation of Selection Expressions
$\mathrm{prob}_t(E)$ is the probabilistic interval with which a tuple $t$ satisfies the selection expression $E$
Probabilistic Interpretation of Selection Expressions
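$\mathrm{prob}_t(x.A\ \theta\ d) = \left[\sum_{v \in W} \alpha(v),\ \min\left(1, \sum_{v \in W} \beta(v)\right)\right]$
$\mathrm{prob}_t(x.A_1 =_{\otimes} x.A_2) = \left[\sum_{v \in W} \alpha(v),\ \min\left(1, \sum_{v \in W} \beta(v)\right)\right]$
$\mathrm{prob}_t(E_1 \otimes E_2) = \mathrm{prob}_t(E_1) \otimes \mathrm{prob}_t(E_2)$
$\mathrm{prob}_t(E_1 \oplus E_2) = \mathrm{prob}_t(E_1) \oplus \mathrm{prob}_t(E_2)$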
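A minimal Python sketch of the atomic and compound cases follows. It assumes that, for an atomic expression, $W$ is the set of values $v \in V$ that satisfy $v\ \theta\ d$, which the slide does not spell out. The names `prob_atomic` and `OPS`, the ASCII spellings of the comparison operators, and the example expression are hypothetical and are not taken from the slides.

```python
import operator
from typing import Dict, Hashable, Tuple

Interval = Tuple[float, float]
Triple = Dict[Hashable, Interval]  # value v -> (alpha(v), beta(v))

# ASCII stand-ins for the six binary relations theta.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def prob_atomic(triple: Triple, theta: str, d) -> Interval:
    """prob_t(x.A theta d): sum the bounds over W = {v in V | v theta d}."""
    compare = OPS[theta]
    w = [v for v in triple if compare(v, d)]
    lower = sum((triple[v][0] for v in w), 0.0)
    upper = sum((triple[v][1] for v in w), 0.0)
    return (lower, min(1.0, upper))


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence."""
    return (a[0] * b[0], a[1] * b[1])


def disj_in(a: Interval, b: Interval) -> Interval:
    """Disjunction under independence."""
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])


# One tuple of a DIAGNOSE-like relation, assuming u is uniform on V.
disease = {"hepatitis": (0.5, 0.5), "cirrhosis": (0.5, 0.5)}
cost = {60: (0.5, 0.5), 70: (0.5, 0.5)}

e1 = prob_atomic(disease, "=", "hepatitis")  # (0.5, 0.5)
e2 = prob_atomic(cost, "=", 60)              # (0.5, 0.5)
both = conj_in(e1, e2)                       # (0.25, 0.25)
either = disj_in(e1, e2)                     # (0.75, 0.75)
```

A compound expression is evaluated bottom-up: each atomic expression yields an interval, and the chosen strategy combines the intervals.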
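Satisfaction of Selection Conditions
• $\mathrm{prob}_t \models (E)[L, U]$ if and only if $\mathrm{prob}_t(E) \subseteq [L, U]$
• $\mathrm{prob}_t \models \neg\phi$ if and only if $\mathrm{prob}_t \models \phi$ does not hold
• $\mathrm{prob}_t \models (\phi \wedge \psi)$ if and only if $\mathrm{prob}_t \models \phi$ and $\mathrm{prob}_t \models \psi$
• $\mathrm{prob}_t \models (\phi \vee \psi)$ if and only if $\mathrm{prob}_t \models \phi$ or $\mathrm{prob}_t \models \psi$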
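The four satisfaction rules map naturally onto a recursive check. The sketch below assumes that the test in the first rule is interval containment, that is, $\mathrm{prob}_t(E)$ lies within $[L, U]$. The classes `Atom`, `Not`, `And`, and `Or`, and the use of plain strings to name selection expressions, are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Interval = Tuple[float, float]


@dataclass
class Atom:
    """Selection condition (E)[L, U]; expr names a selection expression E."""
    expr: str
    low: float
    high: float


@dataclass
class Not:
    cond: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Or:
    left: object
    right: object


def satisfies(prob_t: Callable[[str], Interval], cond: object) -> bool:
    """Return True when the interpretation prob_t satisfies the condition."""
    if isinstance(cond, Atom):
        lo, hi = prob_t(cond.expr)
        return cond.low <= lo and hi <= cond.high  # prob_t(E) within [L, U]
    if isinstance(cond, Not):
        return not satisfies(prob_t, cond.cond)
    if isinstance(cond, And):
        return satisfies(prob_t, cond.left) and satisfies(prob_t, cond.right)
    if isinstance(cond, Or):
        return satisfies(prob_t, cond.left) or satisfies(prob_t, cond.right)
    raise TypeError(f"unknown selection condition: {cond!r}")


# Hypothetical interpretation: intervals already computed for two expressions.
intervals = {"E1": (0.5, 0.5), "E2": (1.0, 1.0)}
prob_t = intervals.__getitem__

phi = Atom("E1", 0.4, 0.6)  # satisfied: [0.5, 0.5] lies within [0.4, 0.6]
psi = Atom("E2", 0.4, 0.6)  # not satisfied: [1.0, 1.0] lies outside
```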
Selection Operation
The selection on a relation $r$ with respect to a selection condition $\phi$:
$\sigma_{\phi}(r) = \{ t \in r \mid \mathrm{prob}_{R,r,t} \models \phi \}$
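The selection operation then reduces to a filter over the tuples of a relation. The sketch below makes several simplifying assumptions: `u` is taken to be the uniform distribution, `PATIENT_ID` is stored as a plain string because its value is certain, only equality tests are supported, and the selection condition is a hypothetical one chosen for illustration rather than the one on the slides.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Interval = Tuple[float, float]
Triple = Dict[Hashable, Interval]  # value v -> (alpha(v), beta(v))


def uniform(values: List[Hashable]) -> Triple:
    """<V, u, u> with u assumed to be the uniform distribution on V."""
    p = 1.0 / len(values)
    return {v: (p, p) for v in values}


def prob_eq_value(triple: Triple, d: Hashable) -> Interval:
    """prob_t(x.A = d): the interval of d, or [0, 0] if d is not in V."""
    lo, hi = triple.get(d, (0.0, 0.0))
    return (lo, min(1.0, hi))


def conj_in(a: Interval, b: Interval) -> Interval:
    """Conjunction under independence."""
    return (a[0] * b[0], a[1] * b[1])


def within(interval: Interval, low: float, high: float) -> bool:
    """Satisfaction of (E)[L, U], read as interval containment."""
    return low <= interval[0] and interval[1] <= high


def select(relation: List[dict], condition: Callable[[dict], bool]) -> List[dict]:
    """Keep the tuples t of r whose interpretation satisfies the condition."""
    return [t for t in relation if condition(t)]


DIAGNOSE = [
    {"PATIENT_ID": "PT0421",
     "DISEASE": {"lung cancer": (0.4, 0.6), "tuberculosis": (0.4, 0.6)},
     "COST": uniform([300, 350])},
    {"PATIENT_ID": "PT3829",
     "DISEASE": uniform(["hepatitis", "cirrhosis"]),
     "COST": uniform([60, 70])},
    {"PATIENT_ID": "PT2938",
     "DISEASE": uniform(["hepatitis"]),
     "COST": uniform([60])},
]


def condition(t: dict) -> bool:
    """Hypothetical condition: (DISEASE = hepatitis AND COST = 60)[0.2, 0.3]."""
    p = conj_in(prob_eq_value(t["DISEASE"], "hepatitis"),
                prob_eq_value(t["COST"], 60))
    return within(p, 0.2, 0.3)


selected_ids = [t["PATIENT_ID"] for t in select(DIAGNOSE, condition)]
```

With this condition only PT3829 is selected: its interval is [0.25, 0.25], while PT0421 yields [0, 0] and PT2938 yields [1, 1].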
Selection Operation
Example Relation DIAGNOSE
Selection operation on DIAGNOSE with the selection condition
(x.DISEASE = hepatitis in x.COST 70)[0.4, 0.6])
is t =
| PATIENT_ID | PHYSICIAN_ID | DISEASE | DURATION | COST |
|---|---|---|---|---|
| ⟨{PT0421}, u, u⟩ | ⟨{DT005}, u, u⟩ | ⟨{lung cancer, tuberculosis}, 0.8u, 1.2u⟩ | ⟨{400, 500}, u, u⟩ | ⟨{300, 350}, u, u⟩ |
| ⟨{PT3829}, u, u⟩ | ⟨{DT093}, u, u⟩ | ⟨{hepatitis, cirrhosis}, u, u⟩ | ⟨{30, 40}, u, u⟩ | ⟨{60, 70}, u, u⟩ |
| ⟨{PT2938}, u, u⟩ | ⟨{DT102}, u, u⟩ | ⟨{hepatitis}, u, u⟩ | ⟨{30}, u, u⟩ | ⟨{60}, u, u⟩ |
PRDB-SQLite Architecture
PRDB-SQLite Schema
PRDB-SQLite Relation
PRDB-SQLite Query
Conclusions & Future Works
• Conclusions
  • Built a new PRDB model that extends the RDB model
  • Uncertain values in the PRDB model are represented by probabilistic triples
  • Defined the notions of schema, relation, functional dependency, and selection on PRDB
  • Implemented a simple visual management system for PRDB
Conclusions & Future Works
• Future Works
  • Build all other relational algebra operations on PRDB
  • Build a complete database management system for PRDB
  • Integrate fuzzy set values into attribute values to build a fuzzy and probabilistic relational data base
Any questions for us?
March 27-28th 2013 in Kathu, Phuket, Thailand Sea Pearl Villa Resort