
Transcript of COMP 2208 Dr. Long Tran-Thanh University of Southampton Decision Trees.

Page 1

COMP 2208

Dr. Long [email protected]

University of Southampton

Decision Trees

Page 2

Classification

[Diagram: the agent loop – Perception brings inputs from the Environment; classification categorizes the inputs and updates the belief model, which in turn updates the decision-making policy; Decision making drives the agent's Behaviour back into the Environment.]

Page 3

Recognizing the type of situation you are in right now is a basic agent task:

Classification

• Robotics: mistaking a human body for a part of a car on the assembly line would be disastrous

• Military: friend or foe?

• Electronic card usage: was it fraud or not?

Page 4

Last lecture: neural networks

Why more classification methods?

• Very powerful in theory

• Promising direction: deep learning

• Still difficult to fully control the technology

• In many cases: other techniques are more efficient

Occam’s razor: the simpler the model, the better the performance – go for something more complicated only if it’s really necessary

In many real-world problems, data cleaning is the most important step – after that, a simple classification method would do the job

Page 5

Classification

Classification Algorithm

• Bottom up: inspiration from biology – e.g., neural networks

• Top down: inspiration from higher abstraction levels

Page 6

Prof or hobo 1?

http://individual.utoronto.ca/somody/quiz.html

Page 7

Prof or hobo 2?

http://individual.utoronto.ca/somody/quiz.html

Page 8

Prof or hobo 3?

http://individual.utoronto.ca/somody/quiz.html

Page 9

Prof or hobo answers

http://individual.utoronto.ca/somody/quiz.html

Hobo, Hobo, Professor

Page 10

Back to classification

Classification Algorithm

Different ways to go:

[Cartoon thought bubbles: "Honey? Fired? Evil plan?"]

Page 11

Back to classification

Classification Algorithm

Some classification algorithms:

Logistic regression

Support vector machines (SVMs)

Decision trees + its family

• Easy to understand
• (Relatively) easy to implement
• Very efficient in many cases

Page 12

Decision making process

[Flowchart: a chain of decisions – at each step, ask "Did it go well?" and branch on Yes/No.]

Page 13

Back to the “Prof or hobo” quiz

What are the clues that allow you to distinguish a prof from a hobo?

• Clothes people are wearing

• Their eyes

• The beard

• …

Main idea: check out some properties in some order

Page 14

Classification with decision trees

• A decision tree takes a series of inputs defining a situation, and outputs a binary decision/classification.

• A decision tree spells out an order for checking the properties (attributes) of the situation until we have enough information to decide what's going on.

• We use the observable attributes to predict the outcome (or some important hidden or unknown quantity).

Question: what is the optimal (efficient) order of the attributes?

Page 15

The importance of the ordering

• Think about the “20 questions” game: inefficient questions will lead to low performance

• Think about binary search: the optimal strategy always halves the interval (see the sketch after this list)

• Decision trees are very simple to produce if we already know the underlying rules.

• But what if we don’t have the rules, just past examples (experience)?
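As a concrete reminder of the halving idea, here is a standard binary search sketch in Python (not from the slides):

```python
def binary_search(sorted_xs, target):
    """Return the index of target in sorted_xs, or -1 if absent.
    Each comparison halves the remaining interval."""
    lo, hi = 0, len(sorted_xs) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_xs[mid] == target:
            return mid
        elif sorted_xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```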

Page 16

Our objective

Often we don't know in advance how to classify things, and we want our agent to learn from examples.

Page 17

Which attribute to start with? The order of attributes is still very important.

Idea: choose the next attribute whose value can reduce the uncertainty about the outcome of the classification the most

What does it mean when we say that something reduces the uncertainty in our knowledge?

Reducing uncertainty (in knowledge) = increasing (known) information

So we should choose the attribute that provides the highest information gain

Page 18

Entropy

How do we measure information gain (and how do we define it)?

Answer: borrow similar concepts from information & coding theory

Entropy (Shannon, 1948):

• A measure of the amount of disorder or uncertainty in a system.

• A tidy room has low entropy: you can be reasonably certain your keys are on the hook you made for them.

• A messy room has high entropy: things are all over the place and your keys could be absolutely anywhere.

Page 19

Entropy

Classification: input X → output Y

Uncertainty about the outcome is measured by the entropy (Shannon, 1948):

H(Y) = - Σ_y P(Y=y) * log2 P(Y=y)   (in bits)

where P(Y=y) is how often Y = y, and -log2 P(Y=y) is the measure of information (surprise) when Y = y.

Page 20

Entropy example

Weather:

              Good   OK     Terrible
Birmingham    0.33   0.33   0.33
Southampton   0.3    0.6    0.1
Glasgow       0      0      1

Page 21

Entropy example

Birmingham   P(x)    log2 P(x)   -P(x) log2 P(x)
Good         0.33    -1.58       0.53
OK           0.33    -1.58       0.53
Terrible     0.33    -1.58       0.53

Sum = 1.58 (bits)

Page 22

Entropy example

Southampton   P(x)   log2 P(x)   -P(x) log2 P(x)
Good          0.3    -1.74       0.52
OK            0.6    -0.74       0.44
Terrible      0.1    -3.32       0.33

Sum = 1.29 (bits)

Page 23

Entropy example

Glasgow    P(x)   log2 P(x)   -P(x) log2 P(x)
Good       0      -infinity   0
OK         0      -infinity   0
Terrible   1      0           0

Sum = 0 (bits)

When we are certain, the entropy is 0
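As a quick check of these three examples, here is a minimal Python sketch of the entropy formula (the helper name entropy is mine, not from the lecture):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p), with the convention 0 * log(0) = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

weather = {
    "Birmingham":  [1/3, 1/3, 1/3],   # Good, OK, Terrible
    "Southampton": [0.3, 0.6, 0.1],
    "Glasgow":     [0.0, 0.0, 1.0],
}
for city, dist in weather.items():
    print(f"{city}: {entropy(dist):.2f} bits")
# Birmingham: 1.58 bits, Southampton: 1.30 bits, Glasgow: 0.00 bits
# (the slides' 1.29 for Southampton comes from rounding each term before summing)
```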

Page 24

Conditional entropy

Classification: input X → output Y

Entropy measures the uncertainty of a given state of the system. How do we measure the change?

Conditional entropy:

H(Y | X) = Σ_x P(X=x) * H(Y | X=x) = - Σ_x Σ_y P(X=x, Y=y) * log2 P(Y=y | X=x)

where P(X=x, Y=y) is the joint probability and P(Y=y | X=x) is the conditional probability.

• How much uncertainty would remain about the outcome Y if we knew (for instance) the outcome of attribute X?
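A minimal Python sketch of this definition, estimating H(Y | X) from observed (x, y) pairs (the helper name conditional_entropy is mine, not from the lecture):

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(pairs):
    """H(Y | X) = sum over x of P(X=x) * H(Y | X=x), estimated from (x, y) samples."""
    n = len(pairs)
    ys_by_x = defaultdict(list)
    for x, y in pairs:
        ys_by_x[x].append(y)
    h = 0.0
    for ys in ys_by_x.values():
        p_x = len(ys) / n                   # P(X = x)
        for count in Counter(ys).values():
            p_y_x = count / len(ys)         # P(Y = y | X = x)
            h -= p_x * p_y_x * math.log2(p_y_x)
    return h
```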

Page 25

Information gain

Information gain:

G(Y, X) = H(Y) – H(Y | X)

where H(Y) is the current level of uncertainty (entropy) and H(Y | X) is the possible new level of uncertainty (conditional entropy).

• The difference represents how much the uncertainty would decrease.
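Building on the entropy and conditional_entropy sketches above, information gain then takes only a few more lines (again a sketch, not the lecture's code):

```python
from collections import Counter

def information_gain(pairs):
    """G(Y, X) = H(Y) - H(Y | X), estimated from (x, y) samples.
    Uses entropy() and conditional_entropy() from the sketches above."""
    ys = [y for _, y in pairs]
    h_y = entropy([c / len(ys) for c in Counter(ys).values()])
    return h_y - conditional_entropy(pairs)
```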

Page 26

Building a decision tree

Recursive algorithm:

• Split the tree on the attribute with the highest information gain. Then repeat.

Stopping conditions:

• Don't split if all matching records have the same output value (no point, we know what happens!).

• Don't split if all matching records have the same attribute values (no point, we can't distinguish them).
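Putting the pieces together, a minimal ID3-style sketch of this recursive algorithm, assuming the hypothetical information_gain helper above and rows represented as Python dicts (all names here are mine, not from the lecture):

```python
from collections import Counter

def build_tree(rows, attributes, target):
    """Recursively build a decision tree.
    rows: list of dicts, attributes: candidate attribute names, target: output key."""
    outcomes = [r[target] for r in rows]
    # Stop: all matching records have the same output value.
    if len(set(outcomes)) == 1:
        return outcomes[0]
    # Stop: no attribute distinguishes the records – return the majority outcome.
    usable = [a for a in attributes if len({r[a] for r in rows}) > 1]
    if not usable:
        return Counter(outcomes).most_common(1)[0][0]
    # Split on the attribute with the highest information gain, then recurse.
    best = max(usable, key=lambda a: information_gain([(r[a], r[target]) for r in rows]))
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, [a for a in usable if a != best], target)
    return (best, branches)
```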

Page 27

Example: Predicting the importance of emails

Objective: predict whether the user will read the email

Page 28

Example: Predicting the importance of emails

18 emails: 9 read, 9 skipped

“Thread” attribute:

              Reads      Skips      Row total
new_thread    7 (70%)    3 (30%)    10
follow_up     2 (25%)    6 (75%)    8

What is the information gain if we choose “Thread”?

Calculation steps:

• Calculate H(Read)
• Calculate H(Read | Thread)
• Calculate G(Read, Thread) = H(Read) – H(Read | Thread)

Page 29

Example: Predicting the importance of emails

Calculating H(Read):

• 18 emails: 9 read, 9 skipped

• P(Read = True) = P(Read = False) = 0.5

• H(Read) = -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 (bit)

Page 30

Example: Predicting the importance of emails

Calculating H(Read | Thread) – the specific conditional entropy

Calculation steps:

• Calculate H(Read | Thread = new)
• Calculate H(Read | Thread = follow_up)
• Calculate H(Read | Thread) = P(new)*H(Read | Thread = new) + P(follow_up)*H(Read | Thread = follow_up)

Page 31

Example: Predicting the importance of emails

              Reads      Skips      Row total
new_thread    7 (70%)    3 (30%)    10
follow_up     2 (25%)    6 (75%)    8

• P(Read = True | new)= 0.7; P(Read = False | new) = 0.3

• H(Read | new) = 0.88

• P(Read = True | follow_up) = 0.25; P(Read = False | follow_up) = 0.75

• H(Read | follow_up) = 0.81

• H(Read | Thread) = 10/18 * 0.88 + 8/18 * 0.81 = 0.85

Page 32

Example: Predicting the importance of emails

Calculating G(Read, Thread):

• G(Read, Thread) = H(Read) – H(Read | Thread)

• G(Read, Thread) = 1 – 0.85 = 0.15
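These numbers can be checked with the entropy sketch from earlier (a verification of the slides' arithmetic, rounded to two decimals):

```python
# Email example: Thread attribute, 18 emails (9 read, 9 skipped).
p_new, p_followup = 10/18, 8/18
h_read     = entropy([9/18, 9/18])        # H(Read) = 1.00 bit
h_new      = entropy([0.7, 0.3])          # H(Read | new)       ≈ 0.88
h_followup = entropy([0.25, 0.75])        # H(Read | follow_up) ≈ 0.81
h_cond = p_new * h_new + p_followup * h_followup   # H(Read | Thread) ≈ 0.85
print(round(h_read - h_cond, 2))          # G(Read, Thread) ≈ 0.15
```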

Page 33

Example: Predicting the importance of emails

Page 34

Advantages of decision trees

• Decision trees are able to generate understandable (i.e., human-readable) rules.

• Once learned, decision trees perform classification very efficiently.

• Decision trees are able to handle continuous as well as categorical variables: for a continuous variable, you choose a split threshold based on information gain (see the sketch below).
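As an illustration, a minimal sketch of threshold selection for a single continuous variable, reusing the hypothetical information_gain helper from earlier (the name best_threshold is mine, not from the lecture):

```python
def best_threshold(values, labels):
    """Try midpoints between consecutive sorted unique values; keep the
    binary split (value <= threshold) with the highest information gain."""
    candidates = sorted(set(values))
    best_t, best_g = None, float("-inf")
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        g = information_gain([(v <= t, y) for v, y in zip(values, labels)])
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g   # (None, -inf) if there is only one distinct value
```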