Comparing machine learning methods for a remote monitoring system
Ronit Zrahia
Final Project
Tel-Aviv University
Overview
The remote monitoring system
The project database
Machine learning methods:
Discovery of Association Rules
Inductive Logic Programming
Decision Trees
Applying the methods to the project database and comparing the results
Remote Monitoring System - Description
The Support Center has ongoing information on the customer's equipment
The Support Center can, in some situations, know that the customer is going to be in trouble
The Support Center initiates a call to the customer
A specialist connects to the site remotely and tries to eliminate the problem before it has an impact
Remote Monitoring System - Description
[Architecture diagram: Products (AIX/NT) at the customer site connect over TCP/IP (FTP) to a Gateway; the Gateway connects through a modem, over TCP/IP (Mail/FTP), to the Support Server (AIX/NT/95).]
Remote Monitoring System - Technique
One of the machines on site, the Gateway, is able to initiate a PPP connection to the support server or to ISP
All the Products on site have a TCP/IP connection to the Gateway
Background tasks on each Product collect relevant information
The data collected from all Products is transferred to the Gateway via ftp
The Gateway automatically dials the support server or ISP, and sends the data to the subsidiary
The received data is then imported into the database
Project Database
12 columns, 300 records
Each record includes failure information of one product at a specific customer site
The columns are: record no., date, IP address, operating system, customer ID, product, release, product ID, category of application, application, severity, type of service contract
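Purely for illustration, one such record could be represented as a Python dictionary shaped after the 12 columns above; every value below is a placeholder, not a row from the actual database.

```python
# Illustrative shape of one failure record (placeholder values only).
record = {
    "record_no": 1,
    "date": "2001-01-15",
    "ip_address": "10.0.0.1",
    "operating_system": "AIX",
    "customer_id": "C042",
    "product": 3,
    "release": "4.2",
    "product_id": "P-1234",
    "application_category": "storage",
    "application": "backup",
    "severity": 2,
    "service_contract_type": "standard",
}
```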
Project Goals
Discover valuable information from database
Improve the company's product marketing and customer support
Learn different learning methods, and apply them to the project database
Compare the different methods, based on the results
Discovery of Association Rules - Goals
Finding relations between products bought by the customers, which impacts product marketing
Finding relations between failures in a specific product, which impacts customer support (failures can be predicted and handled before they have an impact)
Discovery of Association Rules - Definition
A technique developed specifically for data mining
Given: a dataset of customer transactions; a transaction is a collection of items
Find: correlations between items, expressed as rules
Example: supermarket baskets
Determining Interesting Association Rules
Rules have confidence and support
IF x and y THEN z with confidence c: if x and y are in the basket, then so is z in c% of cases
IF x and y THEN z with support s: the rule holds in s% of all transactions
Discovery of Association Rules - Example
Input parameters: confidence = 50%; support = 50%
If A then C: c = 66.6%, s = 50%
If C then A: c = 100%, s = 50%

Transaction   Items
12345         A, B, C
12346         A, C
12347         A, D
12348         B, E, F
Itemsets are Basis of Algorithm
Rule A => C: s = s(A, C) = 50%; c = s(A, C) / s(A) = 66.6%

Transaction   Items
12345         A, B, C
12346         A, C
12347         A, D
12348         B, E, F

Itemset   Support
A         75%
B         50%
C         50%
A, C      50%
Algorithm Outline
Find all large itemsets: sets of items with at least minimum support (Apriori algorithm)
Generate rules from large itemsets: for ABCD and AB in the large itemsets, the rule AB => CD holds if the ratio s(ABCD) / s(AB) is large enough
This ratio is the confidence of the rule
Pseudo Algorithm
(1) L1 = {frequent 1-itemsets}
(2) for (k = 2; Lk-1 is not empty; k++) do begin
(3)     Ck = apriori_gen(Lk-1)
(4)     for all transactions t in D do
(5)         Ct = subset(Ck, t); for all candidates c in Ct do c.count++
(6)     Lk = {c in Ck | c.count >= minsup}
(7) end
(8) Answer = union over k of Lk
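To make the outline concrete, here is a minimal Python sketch of the same idea (an illustration, not the project's implementation): it mines frequent itemsets level by level and then derives rules whose confidence passes a threshold, using the four example transactions above.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return a dict {itemset (frozenset): support} of all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # L1: frequent 1-itemsets
    singletons = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in singletons if support(s) >= minsup}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Candidate generation (join step of apriori_gen): unions of pairs of
        # frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if support(c) >= minsup}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

def rules(frequent, minconf):
    """Generate rules X => Y from the frequent itemsets with confidence >= minconf."""
    result = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]   # s(X u Y) / s(X)
                if conf >= minconf:
                    result.append((set(lhs), set(itemset - lhs), conf, sup))
    return result

# The four transactions from the example slide, with minsup = minconf = 50%:
T = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
for lhs, rhs, conf, sup in rules(apriori(T, 0.5), 0.5):
    print(lhs, "=>", rhs, "confidence=%.2f" % conf, "support=%.2f" % sup)
# Prints the two rules from the slide: {A} => {C} (c=0.67) and {C} => {A} (c=1.0)
```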
Relations Between Products
Item Set (L)    Association Rule    Confidence (CF)
1 - 3           1 => 3              18 / 24 = 0.75
1 - 3           3 => 1              18 / 24 = 0.75
1 - 9           1 => 9              21 / 24 = 0.875
1 - 9           9 => 1              21 / 21 = 1
2 - 3           2 => 3              19 / 19 = 1
2 - 3           3 => 2              19 / 24 = 0.79
2 - 6           2 => 6              17 / 19 = 0.89
2 - 6           6 => 2              17 / 20 = 0.85
3 - 6           3 => 6              20 / 24 = 0.83
3 - 6           6 => 3              20 / 20 = 1
2 - 3 - 6       2 => 3, 6           17 / 19 = 0.89
2 - 3 - 6       3, 6 => 2           17 / 20 = 0.85
2 - 3 - 6       3 => 2, 6           17 / 24 = 0.71
2 - 3 - 6       2, 6 => 3           17 / 17 = 1
2 - 3 - 6       6 => 2, 3           17 / 20 = 0.85
2 - 3 - 6       2, 3 => 6           17 / 19 = 0.89
Relations Between Failures
Item Set (L)    Association Rule    Confidence (CF)
4 - 6           4 => 6              14 / 16 = 0.875
4 - 6           6 => 4              14 / 15 = 0.93
5 - 10          5 => 10             15 / 18 = 0.83
5 - 10          10 => 5             15 / 15 = 1
Inductive Logic Programming - Goals
Finding the preferred customers, based on:
The number of products bought by the customer
The failure types (i.e., severity levels) that occurred in the products
Inductive Logic Programming - Definition
Inductive construction of first-order clausal theories from examples and background knowledge
The aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power
Example: IF Outlook=Sunny AND Humidity=High THEN PlayTennis=No
Horn clause induction
Given:
P: ground facts to be entailed (positive examples)
N: ground facts not to be entailed (negative examples)
B: a set of predicate definitions (background theory)
L: the hypothesis language
Find a predicate definition (hypothesis) H in L such that
1. for every p in P: B ∧ H ⊨ p (completeness)
2. for every n in N: B ∧ H ⊭ n (consistency)
Inductive Logic Programming - Example
Learning about the relationships between people in a family circle
B (background theory):
father(henry, jane)
mother(jane, john)
mother(jane, alice)
grandfather(X, Y) <- father(X, Z), parent(Z, Y)

E+ (positive examples):
grandfather(henry, john)
grandfather(henry, alice)

E- (negative examples):
grandfather(john, henry)
grandfather(alice, john)

H (hypothesis):
parent(X, Y) <- mother(X, Y)
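As a sanity check (a small sketch, not part of the original project), the background theory, examples, and hypothesis above can be written as Python sets and tested for completeness and consistency:

```python
# Ground facts from the background theory B
father = {("henry", "jane")}
mother = {("jane", "john"), ("jane", "alice")}

# Hypothesis H: parent(X, Y) <- mother(X, Y)
parent = set(mother)

# Background rule: grandfather(X, Y) <- father(X, Z), parent(Z, Y)
grandfather = {(x, y) for (x, z1) in father for (z2, y) in parent if z1 == z2}

positives = {("henry", "john"), ("henry", "alice")}   # E+
negatives = {("john", "henry"), ("alice", "john")}    # E-

print("complete:  ", positives <= grandfather)        # every positive is entailed
print("consistent:", not (negatives & grandfather))   # no negative is entailed
```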
Algorithm Outline
A space of candidate solutions and an acceptance criterion characterizing solutions to an ILP problem
The search space is typically structured by means of the dual notions of generalization (induction) and specialization (deduction)
A deductive inference rule maps a conjunction of clauses G onto a conjunction of clauses S such that G is more general than S
An inductive inference rule maps a conjunction of clauses S onto a conjunction of clauses G such that G is more general than S
Pruning principle:
When B ∧ H does not entail a positive example, specializations of H can be pruned from the search
When B ∧ H entails a negative example, generalizations of H can be pruned from the search
Pseudo Algorithm
Initialize QH := {H1, ..., Hn}
repeat
    Delete H from QH
    Choose the inference rules r1, ..., rk in R to be applied to H
    Apply the rules r1, ..., rk to H to yield H1, ..., Hn
    Add H1, ..., Hn to QH
    Prune QH
until the stop-criterion(QH) is satisfied
The preferred customers
[Pie chart: 17% of the customers are preferred customers; 83% are others.]
Learned rule:
If (Total_Products_Types(Customer) > 5) and (All_Severity(Customer) < 3) then Preferred_Customer
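A small sketch of how this rule could be applied to the failure records (field names follow the database columns listed earlier; the helper itself is illustrative, not the project's code):

```python
def is_preferred(records, customer_id):
    """records: list of dicts with 'customer_id', 'product' and 'severity' keys."""
    rows = [r for r in records if r["customer_id"] == customer_id]
    product_types = {r["product"] for r in rows}              # Total_Products_Types
    return len(product_types) > 5 and all(r["severity"] < 3 for r in rows)  # All_Severity < 3
```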
Decision Trees - Goals
Finding the preferred customers
Finding relations between products which are bought by the customers
Finding relations between failures in a specific product
Comparing the Decision Tree results to the previous algorithms' results
Decision Trees - Definition
Decision tree representation:
Each internal node tests an attribute
Each branch corresponds to an attribute value
Each leaf node assigns a classification
Occam's razor: prefer the shortest hypothesis that fits the data
Examples: equipment or medical diagnosis, credit risk analysis
Algorithm outline
A <- the "best" decision attribute for the next node
Assign A as the decision attribute for the node
For each value of A, create a new descendant of the node
Sort training examples to leaf nodes
If training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
Pseudo algorithm
ID3(Examples, Target_attribute, Attributes)
Create a Root node for the tree
If all Examples are in the same class C, return the single-node tree Root with label = C
If Attributes is empty, return the single-node tree Root with label = most common value of Target_attribute in Examples
Otherwise Begin
    A <- the attribute from Attributes that best classifies Examples (i.e. the attribute with the highest information gain)
    The decision attribute for Root <- A
    For each possible value vi of A:
        Add a new tree branch below Root, corresponding to the test A = vi
        Let Examples_vi be the subset of Examples that have value vi for A
        If Examples_vi is empty
            Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
            Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
End
Return Root
Information Measure
Entropy measures the impurity of the sample of training examples S
p_i is the probability of making a particular decision; there are c possible decisions
The entropy is the amount of information needed to identify the class of an object in S
It is maximized when all p_i are equal, and minimized (0) when all but one p_i are 0 (the remaining one is 1)

Entropy(S) = - Σ_{i=1..c} p_i log2(p_i)
Information Measure
Estimate the gain in information from a particular partitioning of the dataset
Gain(S, A) = expected reduction in entropy due to sorting on A
The information that is gained by partitioning S is then:
The gain criterion can then be used to select the partition which maximizes information gain
Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
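Combining the pseudocode with the entropy and gain measures, a minimal Python sketch of ID3 could look as follows (assuming examples are dictionaries keyed by attribute name; this is an illustration, not the project's implementation):

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy(S) = -sum_i p_i log2 p_i over the values of the target attribute."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain(rows, attr, target):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, target, attributes):
    """Return a nested dict {attribute: {value: subtree or class label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                        # all examples in one class
        return labels[0]
    if not attributes:                               # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree
```

Called on the PlayTennis table in the following slides as id3(examples, "PlayTennis", ["Outlook", "Temperature", "Humidity", "Wind"]), this sketch reproduces the Outlook / Humidity / Wind tree derived there.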
Decision Tree - Example
Day Outlook Temperature Humidity Wind PlayTennis
D1 sunny hot high weak No
D2 sunny hot high strong No
D3 overcast hot high weak Yes
D4 rain mild high weak Yes
D5 rain cool normal weak Yes
D6 rain cool normal strong No
D7 overcast cool normal strong Yes
D8 sunny mild high weak No
D9 sunny cool normal weak Yes
D10 rain mild normal weak Yes
D11 sunny mild normal strong Yes
D12 overcast mild high strong Yes
D13 overcast hot normal weak Yes
D14 rain mild high strong No
Decision Tree - Example (Continued)
Splitting on Humidity: S: [9+,5-], E = 0.940; High: [3+,4-], E = 0.985; Normal: [6+,1-], E = 0.592
Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151
Splitting on Wind: S: [9+,5-], E = 0.940; Weak: [6+,2-], E = 0.811; Strong: [3+,3-], E = 1.00
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.00) = 0.048
Which attribute is the best classifier?
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
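For comparison, Gain(S, Outlook) can be worked out the same way from the table above (Sunny: [2+,3-], E = 0.971; Overcast: [4+,0-], E = 0.0; Rain: [3+,2-], E = 0.971):
Gain(S, Outlook) = 0.940 - (5/14)(0.971) - (4/14)(0.0) - (5/14)(0.971) ≈ 0.246
Outlook has the highest gain, so it is selected as the root attribute.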
Decision Tree - Example (Continued)
Outlook is placed at the root: S = {D1, D2, ..., D14} [9+,5-]
Sunny: {D1, D2, D8, D9, D11} [2+,3-] -> ?
Overcast: {D3, D7, D12, D13} [4+,0-] -> Yes
Rain: {D4, D5, D6, D10, D14} [3+,2-] -> ?
For the Sunny branch, Ssunny = {D1, D2, D8, D9, D11}:
Gain(Ssunny, Humidity) = 0.970 - (3/5)(0.0) - (2/5)(0.0) = 0.970
Gain(Ssunny, Temperature) = 0.970 - (2/5)(0.0) - (2/5)(1.0) - (1/5)(0.0) = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)(1.0) - (3/5)(0.918) = 0.019
Decision Tree - Example (Continued)
Final tree: Outlook = Sunny -> test Humidity (High -> No, Normal -> Yes); Outlook = Overcast -> Yes; Outlook = Rain -> test Wind (Strong -> No, Weak -> Yes)
Overfitting
A tree that fits the training data too closely may not be generally applicable; this is called overfitting
How can we avoid overfitting?
Stop growing when a data split is not statistically significant
Grow the full tree, then post-prune
The post-pruning approach is more common
How to select the "best" tree:
Measure performance over the training data
Measure performance over a separate validation data set
Reduced-Error Pruning
Split the data into training and validation sets
Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
2. Greedily remove the one that most improves validation set accuracy
Produces the smallest version of the most accurate subtree
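A compact Python sketch of the idea (a simplified bottom-up variant, assuming the tree is stored as nested dicts where each internal node keeps the majority class of its training examples under a 'majority' key; illustrative only):

```python
def predict(node, example):
    """Follow the tree until a leaf ({'label': c}) is reached."""
    while "attr" in node:
        node = node["branches"].get(example[node["attr"]], {"label": node["majority"]})
    return node["label"]

def accuracy(tree, rows, target):
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)

def reduced_error_prune(node, root, validation, target):
    """Bottom-up: try replacing each internal node with a majority-class leaf,
    keeping the change only if validation accuracy does not drop."""
    if "attr" not in node:                       # already a leaf
        return
    for child in node["branches"].values():
        reduced_error_prune(child, root, validation, target)
    before = accuracy(root, validation, target)
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]            # tentatively prune this node
    if accuracy(root, validation, target) < before:
        node.clear()
        node.update(saved)                       # pruning hurt: restore the node
```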
The Preferred Customer
[Decision tree, target attribute TypeOfServiceContract: the root tests NoOfProducts (< 2.5 / >= 2.5) and one branch tests MaxSev (< 4.5 / >= 4.5); leaf class counts are NO: 7 / YES: 0, NO: 0 / YES: 3, and NO: 3 / YES: 8.]
Relations Between Products
[Decision tree, target attribute Product3: internal nodes test Product2, Product9, and Product6 (values 0 / 1); leaf class counts are NO: 0 / YES: 1, NO: 4 / YES: 0, NO: 0 / YES: 15, and NO: 0 / YES: 1.]