KDD 2010 - Direct mining of discriminative patterns for classifying uncertain data
Direct Mining of Discriminative Patterns for Classifying Uncertain Data
Chuancong Gao, Jianyong Wang
Department of Computer Science and Technology, Tsinghua University, Beijing, China
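Uncertain Dataset

| Evaluation | Price | Looking | Tech. Spec. | Quality |
|---|---|---|---|---|
| Unacceptable | + | - | / | {-: 0.8, /: 0.1, +: 0.1} |
| Acceptable | / | - | / | {-: 0.1, /: 0.8, +: 0.1} |
| Good | - | + | / | {-: 0.1, /: 0.8, +: 0.1} |
| Very Good | / | + | + | {-: 0.1, /: 0.1, +: 0.8} |

Legend: +: Good, /: Medium, -: Bad. In the uncertain attribute Quality, 0.8 is the probability assigned to the value the attribute had in the original certain dataset.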
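The example table can be encoded in code as follows. This is a minimal sketch, not the authors' data format: the dictionary layout and the helper `contains_probability` are hypothetical, and the helper assumes that uncertain attributes are independent of one another.

```python
# Hypothetical encoding of the example table: a certain attribute is stored as
# a plain value, an uncertain attribute as a dict mapping value -> probability.
dataset = [
    {"Evaluation": "Unacceptable", "Price": "+", "Looking": "-",
     "Tech. Spec.": "/", "Quality": {"-": 0.8, "/": 0.1, "+": 0.1}},
    {"Evaluation": "Acceptable", "Price": "/", "Looking": "-",
     "Tech. Spec.": "/", "Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
    {"Evaluation": "Good", "Price": "-", "Looking": "+",
     "Tech. Spec.": "/", "Quality": {"-": 0.1, "/": 0.8, "+": 0.1}},
    {"Evaluation": "Very Good", "Price": "/", "Looking": "+",
     "Tech. Spec.": "+", "Quality": {"-": 0.1, "/": 0.1, "+": 0.8}},
]


def contains_probability(instance, pattern):
    """Probability that `instance` contains every (attribute, value) in `pattern`.

    Assumption: uncertain attributes are mutually independent, so the
    per-attribute probabilities can be multiplied.
    """
    prob = 1.0
    for attr, value in pattern.items():
        observed = instance[attr]
        if isinstance(observed, dict):      # uncertain attribute
            prob *= observed.get(value, 0.0)
        elif observed != value:             # certain attribute, mismatch
            return 0.0
    return prob


# The first instance has Price "+" for certain and Quality "-" with probability 0.8.
p = contains_probability(dataset[0], {"Price": "+", "Quality": "-"})
```

Under this encoding, a pattern that touches only certain attributes is contained with probability 0 or 1, while a pattern that touches Quality is contained with a fractional probability.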
Our Solution
• For each training instance, find a set of the most discriminative patterns, ensuring that the probability of the instance being covered exceeds a threshold.
• For each selected pattern, generate a binary feature for each instance indicating whether the instance contains the pattern.
• Train an SVM classifier on the generated features.
• Classify all the testing instances.
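The feature-generation step above can be sketched as follows. This is an illustrative simplification: instances and patterns are modeled as plain sets of (attribute, value) items, the function name `pattern_features` is hypothetical, and how containment is decided for uncertain attributes is left out because the text does not specify it.

```python
def pattern_features(instances, patterns):
    """Return one 0/1 feature row per instance, with one column per pattern.

    A feature is 1 when the pattern is a subset of the instance's items.
    """
    return [[1 if pattern <= instance else 0 for pattern in patterns]
            for instance in instances]


# Toy instances built from the certain attributes Price and Looking.
instances = [
    frozenset({("Price", "+"), ("Looking", "-")}),
    frozenset({("Price", "/"), ("Looking", "-")}),
    frozenset({("Price", "-"), ("Looking", "+")}),
]

# Two hypothetical selected patterns.
patterns = [
    frozenset({("Looking", "-")}),
    frozenset({("Price", "+"), ("Looking", "-")}),
]

features = pattern_features(instances, patterns)
```

The resulting matrix, together with the class labels, is the kind of input an SVM implementation expects for training.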
Measure a Pattern’s Discriminative Power
Via the confidence value on each class label:
• For patterns involving only certain attributes: calculate each confidence value directly.
• For patterns involving at least one uncertain attribute: calculate the expected value of each confidence.
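For the certain-attribute case, the confidence of a pattern on a class is the fraction of covered transactions that carry that class label. A minimal sketch, assuming transactions are (item set, class label) pairs and that a pattern covering no transaction gets confidence 0 (a convention chosen here, not taken from the source):

```python
def confidence(transactions, pattern, label):
    """Confidence of `pattern` on class `label`: sup^c / sup.

    `transactions` is a list of (items, class_label) pairs.
    Returns 0.0 when no transaction contains the pattern (assumed convention).
    """
    covered = [cls for items, cls in transactions if pattern <= items]
    if not covered:
        return 0.0
    return sum(1 for cls in covered if cls == label) / len(covered)


# The certain attributes Price and Looking from the example dataset.
transactions = [
    (frozenset({("Price", "+"), ("Looking", "-")}), "Unacceptable"),
    (frozenset({("Price", "/"), ("Looking", "-")}), "Acceptable"),
    (frozenset({("Price", "-"), ("Looking", "+")}), "Good"),
    (frozenset({("Price", "/"), ("Looking", "+")}), "Very Good"),
]

pattern = frozenset({("Looking", "-")})
conf_unacceptable = confidence(transactions, pattern, "Unacceptable")  # 1 of 2 covered
```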
Definition of Expected Confidence
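Given a set of transactions $T$ and the set of possible worlds $W$ w.r.t. $T$, the expected confidence of an itemset $x$ on class $c$ is:

$$E(conf_x^c) = \sum_{w_i \in W} conf_{x,w_i}^c \times P(w_i) = \sum_{w_i \in W} \frac{sup_{x,w_i}^c}{sup_{x,w_i}} \times P(w_i)$$

where $P(w_i)$ is the probability of world $w_i$, $conf_{x,w_i}^c$ is the confidence of $x$ on class $c$ in world $w_i$, and $sup_{x,w_i}$ ($sup_{x,w_i}^c$) is the support of $x$ (on class $c$) in world $w_i$.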
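The definition can be checked on small inputs by enumerating possible worlds directly. The sketch below makes several simplifying assumptions that are not stated in the source: each transaction is reduced to the probability that it contains the itemset, transactions are independent, and worlds in which the itemset has zero support contribute 0. The function name is hypothetical, and the enumeration is exponential in the number of transactions, so it is only useful for illustration.

```python
from itertools import product


def expected_confidence_bruteforce(transactions, label):
    """Expected confidence by summing conf * P(world) over all possible worlds.

    `transactions` is a list of (p, class_label) pairs, where p is the
    probability that the transaction contains the itemset.
    """
    total = 0.0
    for world in product([False, True], repeat=len(transactions)):
        prob = 1.0   # probability of this world
        sup = 0      # support of the itemset in this world
        sup_c = 0    # support of the itemset on class `label` in this world
        for present, (p, cls) in zip(world, transactions):
            prob *= p if present else 1.0 - p
            if present:
                sup += 1
                if cls == label:
                    sup_c += 1
        if sup > 0:  # zero-support worlds contribute nothing (assumed convention)
            total += prob * sup_c / sup
    return total


# Two transactions: one of class "a" containing x with probability 0.8,
# one of class "b" containing x with probability 0.1.
value = expected_confidence_bruteforce([(0.8, "a"), (0.1, "b")], "a")
```

For this input the four worlds contribute 0 (neither transaction contains x), 0.72 (only the first), 0 (only the second), and 0.08 × 1/2 = 0.04 (both), giving 0.76 in total.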
Efficient Computation of Expected Confidence
[Figure: dynamic-programming grid with #Transaction $n$ (0 to $|T|$) on one axis and Support $i$ (0 to $|T|$) on the other; cells are labeled in the form $E_{i,|T|}(conf_x^c)$. Legend: Skipped, Computation in One Step, Start of Next Step. A stop condition compares a bound on $conf_x^c$ with the current maximum confidence.]
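One way to avoid enumerating all possible worlds is a dynamic program over transactions and support values. The sketch below is a standard recurrence written for illustration and is not necessarily the authors' exact formulation; it omits the bound-based stop condition. It assumes that each transaction contains the itemset independently with a known probability and that zero-support worlds contribute 0. The function name is hypothetical.

```python
def expected_confidence_dp(transactions, label):
    """Expected confidence in O(|T|^2) time.

    `transactions` is a list of (p, class_label) pairs, where p is the
    probability that the transaction contains the itemset.

    After processing n transactions:
      prob[i]      = P(support == i)
      exp_sup_c[i] = sum over worlds with support i of P(world) * sup^c
    The result is the sum over i >= 1 of exp_sup_c[i] / i.
    """
    n = len(transactions)
    prob = [1.0] + [0.0] * n
    exp_sup_c = [0.0] * (n + 1)
    for p, cls in transactions:
        hit = 1.0 if cls == label else 0.0
        # Go from high to low support so index i - 1 still holds the values
        # from before this transaction was processed.
        for i in range(n, 0, -1):
            exp_sup_c[i] = ((1.0 - p) * exp_sup_c[i]
                            + p * (exp_sup_c[i - 1] + hit * prob[i - 1]))
            prob[i] = (1.0 - p) * prob[i] + p * prob[i - 1]
        exp_sup_c[0] = (1.0 - p) * exp_sup_c[0]
        prob[0] = (1.0 - p) * prob[0]
    return sum(exp_sup_c[i] / i for i in range(1, n + 1))


# Same toy input as a hand calculation over four possible worlds: 0.72 + 0.04.
value = expected_confidence_dp([(0.8, "a"), (0.1, "b")], "a")
```

Each transaction either contains the itemset (moving probability mass from support i - 1 to i and, if it has the target class, adding one to the class support) or does not (leaving the support unchanged), which is what the two terms of each update express.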
Accuracy Evaluation
Avg. Accuracy on 30 UCI Datasets

| Uncertain Degree | # Uncertain Attr. | Ours | DTU [1] | uRule [2] |
|---|---|---|---|---|
| 10% | 1 | 79.0138% | 74.8738% | 75.2111% |
| 10% | 2 | 78.6970% | 73.1629% | 73.4107% |
| 10% | 4 | 77.9657% | 72.2670% | 69.4649% |
| 20% | 1 | 78.9537% | 74.6577% | 74.6287% |
| 20% | 2 | 78.6073% | 72.5642% | 72.5460% |
| 20% | 4 | 77.8352% | 69.9157% | 68.2066% |
References
[1] B. Qin, Y. Xia, and F. Li. DTU: A decision tree for uncertain data. PAKDD'09.
[2] B. Qin, Y. Xia, S. Prabhakar, and Y.-C. Tu. A rule-based classification algorithm for uncertain data. ICDE'09 MOUND Workshop.