Artificial Intelligence
Learning: decision lists, evaluation, Naive Bayesian networks
Peter Antal, antal@mit.bme.hu
A.I., September 26, 2016
Algorithms for concept learning
◦ Best hypothesis vs. version space
PAC-learning for decision lists
The evaluation of performance
From predictions to optimal decisions
Learning Naive Bayesian networks
Each model specifies true/false for each proposition symbol.
E.g. P1,2 = false, P2,2 = true, P3,1 = false.
With these three symbols there are 8 possible models, which can be enumerated automatically.
Rules for evaluating truth with respect to a model m:
¬S is true iff S is false
S1 ∧ S2 is true iff S1 is true and S2 is true
S1 ∨ S2 is true iff S1 is true or S2 is true
S1 ⇒ S2 is true iff S1 is false or S2 is true (i.e., it is false iff S1 is true and S2 is false)
S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true
A simple recursive process evaluates an arbitrary sentence, e.g.,
¬P1,2 ∧ (P2,2 ∨ P3,1) = true ∧ (true ∨ false) = true ∧ true = true
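The recursive evaluation above can be sketched in a few lines. This is a minimal interpreter over a made-up nested-tuple syntax (the operator names and symbol names are this example's own convention, not from the slides):

```python
# Recursive truth evaluation of a propositional sentence in a model.
# Sentences: a symbol name, or ('not', s), ('and', s1, s2), ('or', s1, s2),
# ('implies', s1, s2), ('iff', s1, s2).
def truth_value(sentence, model):
    if isinstance(sentence, str):          # proposition symbol: look it up
        return model[sentence]
    op, *args = sentence
    if op == 'not':
        return not truth_value(args[0], model)
    if op == 'and':
        return truth_value(args[0], model) and truth_value(args[1], model)
    if op == 'or':
        return truth_value(args[0], model) or truth_value(args[1], model)
    if op == 'implies':                    # false iff premise true, conclusion false
        return (not truth_value(args[0], model)) or truth_value(args[1], model)
    if op == 'iff':
        return truth_value(args[0], model) == truth_value(args[1], model)
    raise ValueError(f"unknown operator: {op}")

model = {'P12': False, 'P22': True, 'P31': False}
s = ('and', ('not', 'P12'), ('or', 'P22', 'P31'))   # ¬P1,2 ∧ (P2,2 ∨ P3,1)
print(truth_value(s, model))  # True
```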
Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ⊨ β and β ⊨ α.
B1,1 ⇔ (P1,2 ∨ P2,1)
1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α):
(B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1)
2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1)
3. Move ¬ inwards using de Morgan's rules and double-negation:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1)
4. Apply the distributivity law (∨ over ∧) and flatten:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1)
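The conversion can be sanity-checked by brute force: since only three symbols occur, enumerating all 2³ models and comparing truth values confirms that step 4 is equivalent to the original biconditional (a quick check, not part of the slides):

```python
from itertools import product

# Original sentence: B1,1 <=> (P1,2 v P2,1)
def original(b11, p12, p21):
    return b11 == (p12 or p21)

# The CNF produced by steps 1-4 above
def cnf(b11, p12, p21):
    return ((not b11 or p12 or p21) and
            (not p12 or b11) and
            (not p21 or b11))

# Equivalent iff they agree in every model
assert all(original(*m) == cnf(*m) for m in product([False, True], repeat=3))
print("equivalent in all 8 models")
```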
Goal: select a logical function f: {0,1}^n → {0,1} from a function class C
that is consistent with the data D_N = {(x1, y1), ..., (xN, yN)}, i.e. f(xi) = yi for i = 1..N.
Predicted \ Ref.   Ref.=0               Ref.=1
0                  True negative (TN)   False negative (FN)
1                  False positive (FP)  True positive (TP)

Learning method:
True negative / true positive: do nothing
False negative: generalize
False positive: specialize
False negative: generalization
◦ Replace A ∧ B with A
◦ Replace A with A ∨ B
False positive: specialization
◦ Replace A with A ∧ B
◦ Replace A ∨ B with A
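The conjunctive pair of rules (drop a conjunct / add a conjunct) is easy to mechanize. Below is an illustrative sketch only, with hypotheses represented as frozensets of literals, e.g. {'A', 'not B'} standing for A ∧ ¬B; the literal names are made up, and the disjunctive rules are omitted:

```python
# Rewrite operators on purely conjunctive hypotheses.
def generalize(h):
    # Dropping any conjunct weakens the hypothesis: A ∧ B -> A
    return [h - {lit} for lit in h]

def specialize(h, literals):
    # Adding an unused conjunct strengthens it: A -> A ∧ B
    return [h | {lit} for lit in literals if lit not in h]

print(generalize(frozenset({'A', 'B'})))          # the two one-literal neighbours
print(specialize(frozenset({'A'}), ['B', 'C']))   # the two two-literal neighbours
```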
Bound the set of consistent hypotheses with two limiting sets:
◦ S: the set of most specific consistent hypotheses
◦ G: the set of most general consistent hypotheses
Learning from (xi, yi): update Si and Gi
◦ For each hypothesis in Si:
FP: delete it
FN: generalize it to all neighbours
◦ For each hypothesis in Gi:
FP: specialize it to all neighbours
FN: delete it
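As a partial sketch of these updates, here is only the S-side (Find-S style) for a common simplified hypothesis language: attribute tuples in which '?' matches any value. The data and attribute values are invented for the example; the symmetric G-side update (specialize on false positives) is omitted:

```python
# S-boundary maintenance for '?'-pattern hypotheses (single-hypothesis S).
def generalize_s(h, x):
    if h is None:                     # most specific start: no positive seen yet
        return tuple(x)
    # Minimal generalization: relax exactly the disagreeing attributes.
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

s = None
data = [(('sunny', 'warm'), True),
        (('rainy', 'cold'), False),   # negatives would update G instead
        (('sunny', 'cold'), True)]
for x, y in data:
    if y:                             # false negative w.r.t. S: generalize
        s = generalize_s(s, x)
print(s)  # ('sunny', '?')
```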
[Figure: hypotheses ordered along a specific-to-general axis]
One possible representation for hypotheses
E.g., here is the “true” tree for deciding whether to wait:
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes, there are 2^64 = 18,446,744,073,709,551,616 trees.
How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?
Each attribute can be in (positive), in (negative), or out ⇒ 3^n distinct conjunctive hypotheses.
A more expressive hypothesis space
◦ increases the chance that the target function can be expressed
◦ increases the number of hypotheses consistent with the training set
⇒ may give worse predictions
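The two counts are quick to verify numerically, which also makes the gap in expressiveness concrete:

```python
# With n Boolean attributes: 2^(2^n) distinct Boolean functions (one per
# truth table with 2^n rows), but only 3^n purely conjunctive hypotheses
# (each attribute is positive, negated, or left out).
def num_boolean_functions(n):
    return 2 ** (2 ** n)

def num_conjunctions(n):
    return 3 ** n

print(num_boolean_functions(6))  # 18446744073709551616
print(num_conjunctions(6))       # 729
```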
Sequential k tests using n attributes: k-DL(n)
Number of tests: Conj(n, k) = Σ_{i=0..k} C(2n, i) = O(n^k)
Number of test sequences: Conj(n, k)!
Number of decision lists: |k-DL(n)| ≤ 3^{Conj(n, k)} · Conj(n, k)!
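These counts are direct to compute: Conj(n, k) sums binomial coefficients over the 2n literals, and the decision-list bound multiplies the 3^{Conj(n,k)} output assignments by the Conj(n,k)! orderings. A small sketch:

```python
from math import comb, factorial

# Conj(n, k): conjunctions of at most k of the 2n literals over n attributes.
def conj(n, k):
    return sum(comb(2 * n, i) for i in range(k + 1))

# Upper bound on |k-DL(n)| from the slide.
def kdl_upper_bound(n, k):
    c = conj(n, k)
    return 3 ** c * factorial(c)

print(conj(10, 2))           # 1 + 20 + 190 = 211
print(kdl_upper_bound(2, 1)) # 3^5 * 5! = 29160
```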
Number of decision lists: |k-DL(n)| = 2^{O(n^k log₂(n^k))}
PAC sample complexity: m ≥ (1/ε) · (ln(1/δ) + O(n^k log₂(n^k)))
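To get a feel for the bound, one can plug in numbers, taking the O(·) capacity term as n^k · log₂(n^k) with the hidden constant set to 1. This is an illustrative choice, not a tight bound:

```python
from math import ceil, log, log2

# PAC sample-size sketch: m >= (1/eps) * (ln(1/delta) + capacity),
# with capacity approximated as n^k * log2(n^k) (constant factor assumed 1).
def sample_bound(n, k, eps, delta):
    capacity = n ** k * log2(n ** k)
    return ceil((log(1 / delta) + capacity) / eps)

print(sample_bound(n=10, k=2, eps=0.1, delta=0.05))
```

The capacity term dominates ln(1/δ) here, so the sample size is driven almost entirely by the size of the hypothesis class.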
Sensitivity: p(Prediction=TRUE|Ref=TRUE)
Specificity: p(Prediction=FALSE|Ref=FALSE)
PPV: p(Ref=TRUE|Prediction=TRUE)
NPV: p(Ref=FALSE|Prediction=FALSE)
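All four measures follow directly from the confusion-matrix counts; the example counts below are made up:

```python
# Cost-free performance measures from TP, FP, TN, FN counts.
def metrics(tp, fp, tn, fn):
    return {
        'sensitivity': tp / (tp + fn),  # p(Prediction=TRUE | Ref=TRUE)
        'specificity': tn / (tn + fp),  # p(Prediction=FALSE | Ref=FALSE)
        'PPV':         tp / (tp + fp),  # p(Ref=TRUE | Prediction=TRUE)
        'NPV':         tn / (tn + fn),  # p(Ref=FALSE | Prediction=FALSE)
    }

m = metrics(tp=40, fp=10, tn=45, fn=5)
print(m['PPV'], m['NPV'])  # 0.8 0.9
```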
[Figure: decision tree for P(D | Bleeding, Onset, Regularity, Mutation). The root tests Bleeding {absent, weak, strong}; the absent branch tests Onset {early, late} and then Mutation {h.wild, mutated}; the weak branch tests Regularity {regular, irregular} and then Mutation; leaves hold P(D|a,e), P(D|a,l,h.w.), P(D|a,l,m), P(D|w,r), P(D|w,i,h.w.), P(D|w,i,m), P(D|Bleeding=strong).]
Decision tree: each internal node represents a (univariate) test; the leaves contain the conditional probabilities given the values along the path.
Decision graph: if conditions are equivalent, the corresponding subtrees can be merged.
E.g. if (Bleeding=absent, Onset=late) ~ (Bleeding=weak, Regularity=irregular)
[Figure: score distributions for Healthy vs. Disease present, separated by a decision threshold t]
[Figure: decision alternatives a0, a1 with outcomes o0, o1]
Reported \ Ref.   Ref.=0   Ref.=1
0                 C0|0     C0|1
1                 C1|0     C1|1
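The cost table turns predictions into decisions: report the class that minimizes expected cost under the posterior. A minimal sketch, with made-up cost numbers (C[reported][ref] follows the table above):

```python
# Expected-cost decision: given p1 = P(Ref=1 | evidence) and costs C,
# pick the report with the lower expected cost.
def best_report(p1, C):
    expected = {a: (1 - p1) * C[a][0] + p1 * C[a][1] for a in (0, 1)}
    return min(expected, key=expected.get)

C = {0: {0: 0.0, 1: 100.0},  # missing a real case costs 100
     1: {0: 1.0, 1: 0.0}}    # a false alarm costs 1
print(best_report(0.02, C))  # asymmetric costs: alarming pays off already at p1 = 0.02
```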
Variables (nodes):
Flu: present/absent
FeverAbove38C: present/absent
Coughing: present/absent
Structure: Flu → Fever, Flu → Coughing
Model:
P(Flu=present) = 0.001
P(Flu=absent) = 1 - P(Flu=present)
P(Fever=present | Flu=present) = 0.6
P(Fever=absent | Flu=present) = 1 - 0.6
P(Fever=present | Flu=absent) = 0.01
P(Fever=absent | Flu=absent) = 1 - 0.01
P(Coughing=present | Flu=present) = 0.3
P(Coughing=absent | Flu=present) = 1 - 0.3
P(Coughing=present | Flu=absent) = 0.02
P(Coughing=absent | Flu=absent) = 1 - 0.02
Assumptions:
1. Two types of nodes: a cause and its effects.
2. Effects are conditionally independent of each other given their cause.
Decomposition of the joint:
P(Y, X1, ..., Xn) = P(Y) ∏i P(Xi | Y, X1, ..., Xi−1)   // by the chain rule
                  = P(Y) ∏i P(Xi | Y)                   // by the naive BN assumption
2n + 1 parameters!
Diagnostic inference:
P(Y | xi1, ..., xik) = P(Y) ∏j P(xij | Y) / P(xi1, ..., xik)
If Y is binary, then the odds
P(Y=1 | xi1, ..., xik) / P(Y=0 | xi1, ..., xik) = P(Y=1)/P(Y=0) · ∏j P(xij | Y=1) / P(xij | Y=0)
In the Flu → Fever, Coughing network, e.g.:
P(Flu=present | Fever=absent, Coughing=present) ∝
P(Flu=present) · P(Fever=absent | Flu=present) · P(Coughing=present | Flu=present)
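Plugging in the slide's numbers, the diagnostic inference above can be carried out through the factored joint (normalizing over both values of Flu):

```python
# Naive Bayes diagnostic inference with the slide's parameters.
P_FLU = 0.001
P_FEVER = {True: 0.6, False: 0.01}   # P(Fever=present | Flu)
P_COUGH = {True: 0.3, False: 0.02}   # P(Coughing=present | Flu)

def posterior_flu(fever, coughing):
    def joint(flu):
        prior = P_FLU if flu else 1 - P_FLU
        f = P_FEVER[flu] if fever else 1 - P_FEVER[flu]
        c = P_COUGH[flu] if coughing else 1 - P_COUGH[flu]
        return prior * f * c
    present = joint(True)
    return present / (present + joint(False))

print(posterior_flu(fever=False, coughing=True))  # ~0.006: still unlikely
```

Even with a positive symptom, the tiny prior keeps the posterior small, which is the usual base-rate effect in diagnostic inference.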
Naive concept learning
Learning decision lists
Decision trees and graphs
Optimal decisions
Error types in classification
Cost-free performance measures
Naive Bayesian network classifiers