CSCI3230 Entropy, Information & Decision Tree

Liu Pengfei

Week 12, Fall 2015


Which feature to choose?


Information Gain

Suppose we have a dataset of people. Which split is more informative?

[Figure: two candidate splits of the dataset. One splits over whether the account exceeds 50K (branches: over 50K / less than or equal to 50K); the other splits over whether the applicant is employed (branches: employed / unemployed).]


Information Gain

Uncertainty/Entropy (informal): measures the level of uncertainty in a group of examples.


Uncertainty

[Figure: three groups of examples, ranging from a very uncertain group, to a less uncertain group, to a group with minimum uncertainty.]


Entropy: a common way to measure uncertainty

• Entropy = −Σi pi log2(pi), where pi is the probability of class i, i.e. the proportion of class i in the set.
• Entropy comes from information theory. The higher the entropy, the less the information content (though some say higher entropy means more information).
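As a minimal sketch, the formula can be computed directly in Python (the function name and the sample distributions below are illustrative, not from the slides):

```python
import math

def entropy(proportions):
    # -sum(p_i * log2(p_i)); terms with p_i = 0 contribute 0 by convention.
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))     # 1.0   -> a very uncertain group
print(entropy([0.75, 0.25]))   # ~0.811
print(entropy([1.0]))          # 0.0   -> minimum uncertainty
```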


Information Gain

We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between the classes to be learned. Information gain tells us how important a given attribute of the feature vectors is. We will use it to decide the ordering of attributes in the nodes of a decision tree.
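The slides do not write the formula out, but the standard definition is Gain(S, A) = E(S) − Σv (|Sv|/|S|)·E(Sv), where Sv is the subset of S whose attribute A has value v. A sketch in Python, assuming the attribute values and class labels are given as parallel lists (the names are mine):

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain = E(labels) - weighted entropy after splitting on the attribute.
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# The attribute with the largest information_gain goes at the root node.
```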


Example of Entropy

Whether students like the movie Gladiator:

Gender  Major    Like
Male    Math     Yes
Female  History  No
Male    CS       Yes
Female  Math     No
Female  Math     No
Male    CS       Yes
Male    History  No
Female  Math     Yes

E(Like) = −(4/8) log2(4/8) − (4/8) log2(4/8) = 1


Example of Entropy (continued)

Splitting on Major, using the same dataset:

P(Like=Yes | Major=Math) = 0.5, so E(Like | Major=Math) = 1
P(Like=Yes | Major=History) = 0, so E(Like | Major=History) = 0
P(Like=Yes | Major=CS) = 1, so E(Like | Major=CS) = 0


Example of Entropy (continued)

Splitting on Gender, again using the same dataset:

P(Like=Yes | Gender=male) = 0.75
P(Like=Yes | Gender=female) = 0.25

E(Like | Gender=male) = −(3/4) log2(3/4) − (1/4) log2(1/4) ≈ 0.811
E(Like | Gender=female) = −(1/4) log2(1/4) − (3/4) log2(3/4) ≈ 0.811
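Completing the comparison (this arithmetic follows from the two slides but is not shown on them): the whole dataset has E(Like) = 1. Splitting on Major gives weighted entropy (4/8)(1) + (2/8)(0) + (2/8)(0) = 0.5, so Gain = 1 − 0.5 = 0.5. Splitting on Gender gives weighted entropy (4/8)(0.811) + (4/8)(0.811) ≈ 0.811, so Gain ≈ 1 − 0.811 = 0.189. Major is therefore the more informative split and would be placed higher in the decision tree.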


CSCI3230 Introduction to Neural Network I

Liu Pengfei

Week 12, Fall 2015


The Angelina Effect

Angelina Jolie is a carrier of a mutation in the BRCA1 gene. Her risk of breast cancer was put at more than 80 percent, and her risk of ovarian cancer at 50 percent. Her aunt died from breast cancer and her mother from ovarian cancer. She decided to go for surgery and announced her decision to have both breasts removed.

Can we use a drug for the treatment?


Neural Network Project

Topic: Protein-Ligand Binding Affinity Prediction
Goal: Given the structural properties of a drug, you are helping a chemist to develop a classifier which can predict how strongly a drug (ligand) will bind to a target (protein).

The due date will be announced later. Do it alone, or form a group of at most 2 students (i.e. groups of 3 or more students are not allowed).

You can use any one of the following languages: C, C++, Java, SWI-Prolog, CLISP, Python, Ruby or Perl. However, you cannot use data mining or machine learning packages.

Start the project as early as possible!


How to register your group?



What to include in your zip file?

Your zip file should contain the following:
• preprocess.c – your source code (if you use C)
• preprocessor.sh – a script file to compile your source code
• trainer.c – your source code (if you use C)
• trainer.sh – a script file to compile your source code
• best.nn – your neural network

File names are case sensitive!


Grading

Please note that the neural network interfaces/formats are different from previous years.

We adopt a policy of zero tolerance on plagiarism. Plagiarism will be SERIOUSLY punished.

To make the project evaluation easier, we will evaluate your work based on the F-measure.


Model Evaluation

Cost-sensitive measures:

Precision (p) = a / (a + c) = TP / (TP + FP)
Recall (r) = a / (a + b) = TP / (TP + FN)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c) = 2TP / (2TP + FN + FP)

Confusion matrix:

                          Predicted Class
                          Class = Yes   Class = No
Actual    Class = Yes     a (TP)        b (FN)
Class     Class = No      c (FP)        d (TN)
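As a sketch, the three measures can be computed directly from the confusion-matrix counts (the example counts below are made up for illustration):

```python
def evaluate(tp, fn, fp):
    # Precision, recall and F-measure from confusion-matrix counts.
    p = tp / (tp + fp)          # precision
    r = tp / (tp + fn)          # recall
    f = 2 * r * p / (r + p)     # F-measure: harmonic mean of p and r
    return p, r, f

print(evaluate(tp=8, fn=2, fp=2))   # (0.8, 0.8, 0.8)
```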


Introduction to Neural Network


Biological Neuron

A neuron is an electrically excitable cell that processes and transmits information through electrical and chemical signals. A chemical signal occurs via a synapse, a specialized connection with other cells.


Artificial Neuron

An artificial neuron is a logic computing unit.

In this simple case, we use a step function as the activation function: only 0 and 1 are possible outputs.

Mechanism:
Input: in = Σi wi·xi (the weighted sum of the inputs)
Output: y = g(in), where g is a step function with threshold θ: g(u) = 1 if u ≥ θ, and 0 otherwise.


Example

If w1 = 0.5, w2 = -0.75, w3 = 0.8, the inputs are 5, 3.2 and 0.1, and a step function g with threshold θ = 0.2 is used as the activation function, what is the output?

Summed input = 5(0.5) + 3.2(-0.75) + 0.1(0.8) = 0.18
Output = g(Summed input)
Since 0.18 < 0.2, Output = 0.
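A minimal sketch of this neuron in Python, reproducing the slide's numbers (the function names are mine):

```python
def step(u, theta):
    # Step activation: fire (1) once the summed input reaches the threshold.
    return 1 if u >= theta else 0

def neuron(weights, inputs, theta):
    summed = sum(w * x for w, x in zip(weights, inputs))
    return step(summed, theta)

# Summed input = 0.18 < 0.2, so the neuron outputs 0.
print(neuron([0.5, -0.75, 0.8], [5, 3.2, 0.1], theta=0.2))   # 0
```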


Artificial Neuron

An artificial neuron is a logic computing unit.

In this case, we use the sigmoid function as the activation function: real values between 0 and 1 are possible outputs.

Mechanism:
Input: in = Σi wi·xi
Output: y = g(in), where g is the sigmoid function g(z) = 1 / (1 + e^(−z)).


Example

If w1 = 0.5, w2 = -0.75, w3 = 0.8, the inputs are again 5, 3.2 and 0.1, and the sigmoid function g is used as the activation function, what is the output?

Summed input = 5(0.5) + 3.2(-0.75) + 0.1(0.8) = 0.18
Output = g(Summed input) = 1 / (1 + e^(−0.18)) = 0.54488
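The same computation with the sigmoid activation, as a sketch:

```python
import math

def sigmoid(z):
    # Logistic activation: maps any real input into (0, 1).
    return 1 / (1 + math.exp(-z))

summed = 5 * 0.5 + 3.2 * -0.75 + 0.1 * 0.8   # 0.18, as on the slide
print(sigmoid(summed))                        # ~0.54488
```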


Logic Gate Simulation

A unit with inputs in {0, 1} fires when Sum(input) > t:

Gate  I1  I2  Total Input  t    Input > t?
AND   0   0   0            1.5  0
AND   0   1   1            1.5  0
AND   1   0   1            1.5  0
AND   1   1   2            1.5  1


Logic Gate Simulation

Gate  I1  I2  Total Input  t    Input > t?
OR    0   0   0            0.5  0
OR    0   1   1            0.5  1
OR    1   0   1            0.5  1
OR    1   1   2            0.5  1


Logic Gate Simulation

For NOT, the single input has weight -1:

Gate  I1  I2   Total Input  t     Input > t?
NOT   0   N/A  0            -0.5  1
NOT   1   N/A  -1           -0.5  0
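All three gates follow the same Sum(input) > t rule; a sketch in Python with the thresholds from the tables (the helper names are mine):

```python
def fires(weights, inputs, t):
    # The unit fires (1) when the weighted input sum exceeds t.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > t else 0

def AND(i1, i2): return fires([1, 1], [i1, i2], t=1.5)
def OR(i1, i2):  return fires([1, 1], [i1, i2], t=0.5)
def NOT(i1):     return fires([-1], [i1], t=-0.5)

for i1 in (0, 1):
    for i2 in (0, 1):
        print(i1, i2, AND(i1, i2), OR(i1, i2))   # matches the truth tables
print(NOT(0), NOT(1))                            # 1 0
```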


Logic Gate Simulation

For the previous cases, the task can be viewed as a classification problem: separate class 0 and class 1. The neuron simply finds a line to separate the two classes.

AND: I1 + I2 − 1.5 = 0
OR: I1 + I2 − 0.5 = 0
XOR: ?

XOR truth table:

I1  I2  Output
0   0   0
0   1   1
1   0   1
1   1   0


Linear Separability

Two classes ('+' and '−') are linearly separable in two dimensions if we can find weights w1, w2 and a threshold k such that every point x in class '+' satisfies w1·x1 + w2·x2 > k, and every point x in class '−' satisfies w1·x1 + w2·x2 < k.

Two classes that cannot be split this way are linearly inseparable in two dimensions: an example that would need two straight lines to separate its classes is not linearly separable.

http://en.wikipedia.org/wiki/Linear_separability


Artificial Neural Networks

Important concepts:
• What is a perceptron?
• What is a single-layer perceptron?
• What is a multi-layer perceptron?
• What is the feed-forward property?
• What is the general learning principle?


Technical Terms

Perceptron = Neuron
Single-layer perceptron = Single-layer neural network
Multi-layer perceptron = Multi-layer neural network

The presence of one or more hidden layers is what distinguishes a multi-layer perceptron from a single-layer perceptron.

The simplest kind of neural network is a single-layer perceptron network, which consists of a single layer of output nodes; the inputs are fed directly to the outputs via a series of weights.


Multi-layer Perceptron

Multi-layer: input layer, hidden layer(s), output layer.
Feed-forward: links go in one direction only.

An MLP consists of multiple layers of nodes in a directed graph. Except for the input nodes, each node is a neuron. The multilayer perceptron consists of three or more layers (an input and an output layer with one or more hidden layers).

http://en.wikipedia.org/wiki/Multilayer_perceptron


Feed Forward Property

Given inputs to the input layer (L0):
1. Outputs of neurons in L1 can be calculated.
2. Outputs of neurons in L2 can be calculated, and so on.
3. Finally, outputs of neurons in the output layer can be calculated.
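A sketch of the feed-forward pass for a small network, assuming sigmoid neurons and arbitrary illustrative weights (none of these numbers come from the slides):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(weights, inputs):
    # One layer: each row of weights feeds one neuron.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def feed_forward(layers, inputs):
    # Propagate the inputs layer by layer: L0 -> L1 -> L2 -> ... -> output.
    for weights in layers:
        inputs = layer(weights, inputs)
    return inputs

network = [
    [[0.5, -0.3], [0.8, 0.2]],   # hidden layer L1: two neurons
    [[1.0, -1.0]],               # output layer: one neuron
]
print(feed_forward(network, [1.0, 0.0]))
```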


General Learning Principle

1. For supervised learning, we provide the model a set of inputs and targets.
2. The model returns the outputs.
3. Reduce the difference between the outputs and targets by updating the weights.
4. Repeat steps 1-3 until some stopping criterion is met.

[Figure: inputs feed a neuron through weights w1, w2, w3; the output is compared against the target.]
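One concrete instance of this loop, for a single sigmoid neuron trained by gradient descent (the slides state the principle only; this particular update rule is my own illustrative choice):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_step(weights, inputs, target, lr=0.1):
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))   # step 2
    delta = (target - out) * out * (1 - out)   # step 3: error times g'(in)
    return [w + lr * delta * x for w, x in zip(weights, inputs)]

weights = [0.5, -0.75, 0.8]
for _ in range(100):                           # step 4: repeat
    weights = train_step(weights, [5, 3.2, 0.1], target=1)
print(weights)
```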


Summary

1. We have learnt the similarity between biological neurons and artificial neurons.
2. We have learnt the underlying mechanism of artificial neurons.
3. We have learnt how artificial neurons compute logic (AND, OR, NOT).
4. We have learnt the meaning of perceptron, single-layer perceptron and multi-layer perceptron.
5. We have learnt how information is propagated between neurons in a multi-layer perceptron (the feed-forward property).
6. We have learnt the general learning principle of artificial neural networks.


Announcements

Written assignment 3 will be released this week.

The neural network project specification will be released this week.

Since there will be no tutorial next Thursday due to the congregation ceremony, please attend one of the other two tutorial sessions on Wednesday.