Perceptron C++


Perceptron Learning & Decision Tree Learning

Artificial Intelligence CSCI 6751.81, FALL 2001

Instructor

Professor Francis Sand

By

Kanoksri Sarinnapakorn (Due date: December 15, 2001)


A.I. Project Fall 2001

1) Write a program to implement the perceptron learning rule.

2) Write a program to implement the decision tree learning rule.

3) Solve Exercise 19.4. The numbers were generated from a kind of majority or voting function where each input had a different number of votes: 10 for I1, 4 for I2-I4, 2 for I5, 1 for I6.

4) Generate more examples from this function and test the performance of your trained neural net on these examples.

5) Draw learning curves for the perceptron and decision tree learners based on further made-up examples generated from the same function.

Students will be expected to deliver the following items at the completion of the project.

Item 1 Code listing.

Item 2 Documentation of the programs, including explanation of the program logic and the inputs and expected outputs.

Item 3 Printout of the results obtained from running the executable program with the data supplied and the test data.

Item 4 The learning curves and a brief discussion of what you derive from them.


Table of Contents

1. Perceptron Learning Program (PCTLearn)
   1.1 Program logic
   1.2 Input and output format
   1.3 Source code listing
2. Decision Tree Learning Program (DTLearn)
   2.1 Program logic
   2.2 Input and output format
   2.3 Source code listing
3. Exercise 19.4
   3.1 The result from perceptron learning
   3.2 The result from decision tree learning
4. Testing Performance
   4.1 Performance of perceptron learning
   4.2 Performance of decision tree learning
5. Learning Curves
6. Discussion
7. References


1. Perceptron Learning Program (PCTLearn)

The program (PCTLearn.cpp) is written in C++ and compiled with the Microsoft Visual C++ Version 6 compiler.

1.1 Program Logic

1. Set the threshold = 1.0, learning rate (ALPHA) = 0.1, number of attributes (n) = 6, and maximum number of trials = 100.

2. Initialize the weight of each attribute by using a random number function (value between 0.0 and 1.0).

3. Open the input file and read all inputs into a training_data array.

4. Train the perceptron (a worked numeric example of one weight update follows this list):

• For each training example:

a) Calculate the output unit O from

   O = Step( Σ (j = 0 to n) Wj * Ij )

   where Ij and Wj are the input and the weight of attribute j (j = 1 to n), W0 = -threshold, and I0 = 1.0.

   The step function means that if Σ (j = 0 to n) Wj * Ij >= 0, then O = 1; otherwise O = 0.

b) Calculate the error between the target (or correct output) and calculated output.

Err = T – O

where T = target and O = calculated output

c) If Err does not equal 0, adjust the weight of each attribute and start the training process again from the first example.

Wj <- Wj + ALPHA * Ij * Err,   j = 1, ..., n

d) If Err equals 0, use the same weights for the next example.

e) Repeat steps a to d until all the examples are correctly predicted (success) or the maximum number of trials is reached (failure: could not learn).

5. Print the final weights.

6. Calculate the percentage of correct predictions on the 1000 test examples.
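For illustration only (the starting weights below are made up, not taken from an actual run): suppose the current weights are W1..W6 = 0.5, 0.1, 0.2, 0.1, 0.3, 0.1 with threshold 1.0, and the next example is number 8 of train14.dat, with inputs (I1..I6) = (0, 1, 1, 1, 0, 1) and target T = 1. The weighted sum is -1.0 + 0.1 + 0.2 + 0.1 + 0.1 = -0.5 < 0, so O = 0 and Err = T - O = 1. Every weight whose input is 1 is therefore increased by ALPHA * Ij * Err = 0.1, giving the new weights 0.5, 0.2, 0.3, 0.2, 0.3, 0.2, and training restarts from the first example.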


1.2 Input and Output Format

Input file
1. Column 1: the example number
2. Column 2: the target value
3. Columns 3-8: the attributes (I1 - I6)

Example of input file, train14.dat (the first line is the title line):

EX T I1 I2 I3 I4 I5 I6
 1 1 1 0 1 0 0 0
 2 1 1 0 1 1 0 0
 3 1 1 0 1 0 1 0
 4 1 1 1 0 0 1 1
 5 1 1 1 1 1 0 0
 6 1 1 0 0 0 1 1
 7 0 1 0 0 0 1 0
 8 1 0 1 1 1 0 1
 9 0 0 1 1 0 1 1
10 0 0 0 0 1 1 0
11 0 0 1 0 1 0 1
12 0 0 0 0 1 0 1
13 0 0 1 1 0 1 1
14 0 0 1 1 1 0 0

Output

Perceptron Learning

Number of training examples: 14
Number of changing weights: 16
Number of trials: 9

Final weights

W1        W2        W3        W4        W5        W6
0.767931  0.192413  0.254704  0.478561  0.121235  0.111350

Correct rate for 1000 examples is 98.80 %
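Based on the command-line handling in main() (section 1.3), the training file can also be supplied as the first argument, for example PCTLearn train50.dat (the file name train50.dat here is only an illustration); with no argument the program reads the default train14.dat. The test file name testdata.dat is fixed in cal_correct_rate().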



1.3 Source Code Listing

// ---------------------------------------------------------------------------
// PCTLearn.cpp: Perceptron Learning
// ---------------------------------------------------------------------------
// Class: Artificial Intelligence (CSCI6751) Fall 2001
//        Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author: Kanoksri Sarinnapakorn
// Date: 12/5/2001
// Revisions: None.
// ---------------------------------------------------------------------------

#include <stdio.h>
#include <string.h>
#include <iostream.h>
#include <fstream.h>
#include <iomanip.h>
#include <time.h>
#include <stdlib.h>

const int MAX_TRIALS = 100;        // Maximum trials
const int MAX_EXAMPLES = 1000;     // Maximum number of the examples
const int NUM_INPUTS = 6;          // Number of inputs I1...I6
const float ALPHA = float(0.1);    // Learning rate
const float THRESHOLD = 1.0;

int training_data[MAX_EXAMPLES][NUM_INPUTS+1];   // add 1 for the target value

void init_weights(float weights[]);
void train(int num_examples, float final_weights[]);
void get_input(ifstream &fin, int &num_examples);
void adjust_weight(float weights[], int input[], int err);
void print_weights(float weights[]);
void cal_correct_rate(float final_weights[]);
int perceptron_output(float weights[], int input[]);

int main(int argc, char *argv[])
{
    float weights[NUM_INPUTS];
    int num_examples;

    char *filename = "train14.dat";
    ifstream fin;

    if (argc < 2)
        fin.open(filename);
    else
        fin.open(argv[1]);

    if (fin.fail())
        cout << "Could not open the file" << endl;
    else
    {
        init_weights(weights);
        get_input(fin, num_examples);
        fin.close();


        cout << setw(42) << "Perceptron Learning" << endl << endl;
        cout << "Number of training examples: " << num_examples << endl;

        train(num_examples, weights);
        print_weights(weights);
        cal_correct_rate(weights);
    }
    return(0);
}

void get_input(ifstream &fin, int &num_examples)
{
    const int TITLE_LENGHT = 100;
    const int NAME_LENGHT = 5;

    char ex_number[NAME_LENGHT];    /* example number */
    char title[TITLE_LENGHT];
    int j;                          /* counter */
    int tmp_target;

    num_examples = 0;
    fin.getline(title, TITLE_LENGHT, '\n');    /* read title line */

    fin >> ex_number;
    while (!fin.eof())
    {
        fin >> tmp_target;
        for (j = 0; j < NUM_INPUTS; j++)
            fin >> training_data[num_examples][j];
        training_data[num_examples][NUM_INPUTS] = tmp_target;
        num_examples++;
        fin >> ex_number;
    }
}

void train(int num_examples, float weights[])
{
    int i, n_ex = 0, n_trial = 0;    /* counters */

    int input[NUM_INPUTS];
    int T, O, Err;                   // T = target, O = output from perceptron, Err = T - O
    int trials = 1, changes = 0;

    while (n_ex < num_examples && trials < MAX_TRIALS)
    {
        for (i = 0; i < NUM_INPUTS; i++)
            input[i] = training_data[n_ex][i];

        T = training_data[n_ex][NUM_INPUTS];
        O = perceptron_output(weights, input);
        Err = T - O;

        if (Err != 0)
        {
            adjust_weight(weights, input, Err);
            n_ex = 0;    /* start at the beginning again */


            ++changes;    /* Number of changing weights */
        }
        else
            n_ex++;

        if (++n_trial > num_examples)
        {
            n_trial = 0;
            ++trials;
        }
    }
    cout << "Number of changing weights: " << changes << endl;
    cout << "Number of trials: " << trials << endl;
}

void init_weights(float weights[])
{
    int i;
    srand((unsigned) time(NULL));

    for (i = 0; i < NUM_INPUTS; i++)
        weights[i] = float(rand()) / float(RAND_MAX);    /* 0.0 - 1.0 */
}

void adjust_weight(float weights[], int input[], int err)
{
    int i;
    for (i = 0; i < NUM_INPUTS; i++)
        weights[i] = weights[i] + ALPHA * err * input[i];
}

int perceptron_output(float weights[], int input[])
{
    int i, output;
    float sum;

    sum = -THRESHOLD;
    for (i = 0; i < NUM_INPUTS; i++)
        sum += float(input[i]) * weights[i];
    if (sum >= 0.0)
        output = 1;
    else
        output = 0;
    return (output);
}

void print_weights(float weights[])
{
    int i;
    cout << endl << setw(40) << "Final weights " << endl
         << endl << setw(5);
    for (i = 0; i < NUM_INPUTS; i++)
        cout << "W" << i+1 << setw(11);
    cout << endl;

    cout.setf(ios::showpoint);
    cout.setf(ios::fixed);


    cout << setprecision(6);

    for (i = 0; i < NUM_INPUTS; i++)
        cout << weights[i] << setw(12);

    cout << endl << endl;
}

void cal_correct_rate(float weights[])
{
    const int TITLE_LEN = 100;
    const int NAME_LEN = 5;

    char ex_number[NAME_LEN];    /* example number */
    char title[TITLE_LEN];

    int num_examples;
    int target, data, correct_no, output, i;
    float sum;

    char *filename = "testdata.dat";
    ifstream fin(filename);

    if (fin.fail())
        cout << "Could not open the testdata.dat" << endl;
    else
    {
        correct_no = 0;
        num_examples = 0;
        fin.getline(title, TITLE_LEN, '\n');    /* read title line */
        fin >> ex_number;

        while (!fin.eof())
        {
            fin >> target;
            sum = -THRESHOLD;
            for (i = 0; i < NUM_INPUTS; i++)
            {
                fin >> data;
                sum += float(data) * weights[i];
            }
            if (sum >= 0.0)
                output = 1;
            else
                output = 0;

            if (output == target)
                correct_no++;

            num_examples++;
            fin >> ex_number;
        }
        cout << setprecision(2);

        cout << "Correct rate for " << num_examples << " examples is "
             << (correct_no * 100.0) / float(num_examples) << " %" << endl;
    }
}


2. Decision Tree Learning Program (DTLearn)

The program (DTLearn) is written in C++ and compiled with the Microsoft Visual C++ version 6 compiler. It consists of five .cpp files and five header files.

1. DTLearn.cpp and DTLearn.h - The main program and decision class

2. Dimension.cpp and Dimension.h - The dimension class stores an attribute and its possible values

3. Example.cpp and Example.h - The example class

4. Node.cpp and Node.h - The node class for creating the tree

5. Utility.cpp and Utility.h - The utility functions

DTLearn is adapted from a decision tree learning program written by Jon Monsarrat that runs on the Unix operating system. The original source code can be downloaded from http://yoda.cis.temple.edu:8080/books/dean/05.Learning/C%2b%2bCode/Decision/.

2.1 Program Logic

1. Read information about classes and attributes from the definition file.

2. Read each example (target value and attribute values) from the input file and keep it in the example class (using a linked-list data structure).

3. Build a tree. Start with the whole set of examples at the root node and split the root node using the following algorithm:

3.1 If the number of examples equals zero, return (skip this node).

3.2 Print the distribution.

3.3 If all examples have the same class, then return the class (declare the node a terminal node).

3.4 Call the splitter function to find the best attribute on which to split the training set at this node.

Choose the best attribute, which is the attribute with the largest gain. For any attribute A, the gain is defined as follows:

   Gain(A) = I( p/(p+n), n/(p+n) ) - Remainder(A)

   Remainder(A) = Σ (i = 1 to v) [ (pi + ni)/(p + n) ] * I( pi/(pi+ni), ni/(pi+ni) )

   I( p/(p+n), n/(p+n) ) = - [p/(p+n)] log2[p/(p+n)] - [n/(p+n)] log2[n/(p+n)]

where p and n are the numbers of positive and negative examples at the node, attribute A has v distinct values, and the i-th subset (the examples having the i-th value of A) has pi positive examples and ni negative examples. (An illustrative computation of the gain appears after this list.)

4. Create a subtree for each non-terminal node using the same algorithm as in step 3. Repeat this process until no non-terminal nodes are left.

5. Return the tree.
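As an illustration of the gain computation in step 3.4, the short sketch below computes Gain(A) for a two-class problem from the per-value counts pi and ni. It is not part of the submitted programs (DTLearn instead minimizes an equivalent disorder metric in Node::evaluate() and Node::disorder()), and unlike the listings in section 2.3 it is written for a standard C++ compiler; the counts in main() are the root-node counts obtained by splitting the 14 training examples on I1.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // I(p/(p+n), n/(p+n)): entropy of a node with p positive and n negative examples.
    static double info(double p, double n)
    {
        double total = p + n, sum = 0.0;
        if (p > 0) sum -= (p / total) * std::log(p / total) / std::log(2.0);
        if (n > 0) sum -= (n / total) * std::log(n / total) / std::log(2.0);
        return sum;
    }

    // Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A), where value i of attribute A
    // covers pos[i] positive and neg[i] negative examples.
    static double gain(const std::vector<double> &pos, const std::vector<double> &neg)
    {
        double p = 0.0, n = 0.0;
        for (size_t i = 0; i < pos.size(); i++) { p += pos[i]; n += neg[i]; }

        double remainder = 0.0;
        for (size_t i = 0; i < pos.size(); i++)
            remainder += ((pos[i] + neg[i]) / (p + n)) * info(pos[i], neg[i]);

        return info(p, n) - remainder;
    }

    int main()
    {
        // Root node of train14.dat: 7 positive and 7 negative examples.
        // Splitting on I1 gives the subsets {I1 = 1: 6+, 1-} and {I1 = 0: 1+, 6-}.
        std::vector<double> pos, neg;
        pos.push_back(6); neg.push_back(1);    // I1 = 1
        pos.push_back(1); neg.push_back(6);    // I1 = 0
        std::printf("Gain(I1) = %.3f bits\n", gain(pos, neg));    // roughly 0.41 bits
        return 0;
    }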


2.2 Input and Output Format

Input

There are two files for the input, the definition file and the data file.

Definition file (Filedef.dat)

Input format is laid out in this order:

1. The word "classes", followed by the number of classes and their values

2. The word "dimensions", followed by the number of dimensions

3. Each dimension in order, which includes

a. the name of the attribute

b. the number of different values of the attribute

c. all possible values of the attribute

Example of definition file:

classes 2 1 0
dimensions 6
I1 2 1 0
I2 2 1 0
I3 2 1 0
I4 2 1 0
I5 2 1 0
I6 2 1 0

Data file (e.g., train14.dat)
1. Column 1: the example number
2. Column 2: the target value
3. Columns 3-8: the attributes (for 6 attributes)

EX T I1 I2 I3 I4 I5 I6
 1 1 1 0 1 0 0 0
 2 1 1 0 1 1 0 0
 3 1 1 0 1 0 1 0
 4 1 1 1 0 0 1 1
 5 1 1 1 1 1 0 0
 6 1 1 0 0 0 1 1
 7 0 1 0 0 0 1 0
 8 1 0 1 1 1 0 1
 9 0 0 1 1 0 1 1
10 0 0 0 0 1 1 0
11 0 0 1 0 1 0 1
12 0 0 0 0 1 0 1
13 0 0 1 1 0 1 1
14 0 0 1 1 1 0 0
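Based on the argument handling in main() (section 2.3), the data file can likewise be supplied as the first command-line argument, for example DTLearn train100.dat (the file name train100.dat is only an illustration); when no argument is given the program reads train14.dat, and the definition file name filedef.dat is fixed in the code.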


Output format

There are two parts in the output. The first part is the class distribution and the attribute chosen at each node while the decision tree is being built. The second part is the resulting decision tree.

Example of output

The class distribution part (abbreviated here; the full trace for this training set appears in section 3.2):

Class distribution is 1: 7 0: 7
Dimension chosen: I1
Class distribution is 1: 6 0: 1
...
Class distribution is 1: 0 0: 2
Terminal node
Class distribution is 1: 0 0: 3
Terminal node

The decision tree part:

Split on I1:
---1:
 Split on I3:
 ---1:
   [1]
 ---0:
  Split on I6:
  ---1:
    [1]
  ---0:
    [0]
---0:
 Split on I3:
 ---1:
  Split on I4:
  ---1:
   Split on I6:
   ---1:
     [1]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

From the decision tree information, we can draw the decision tree like this:

[Figure: the decision tree drawn from the printout above, with attribute I1 at the root.]


2.3 Source Code Listing

// ---------------------------------------------------------------------------
// DTLearn.cpp: The main program which learns and creates the decision tree
// ---------------------------------------------------------------------------
// Class: Artificial Intelligence (CSCI6751) Fall 2001
//        Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author: Kanoksri Sarinnapakorn
// Date: 11/9/2001
// ---------------------------------------------------------------------------

#include <stdlib.h>#include <iostream.h>#include <fstream.h>

#include "DTLearn.h"#include "Node.h"#include "Example.h"#include "Dimension.h"

int Decision::num_classes; // Number of classes for this runchar Decision::classes[MAX_CLASSES][MAX_STRING]; // Names for all classesint Decision::num_dimensions; // Number of dimensions for this runDimension **Decision::dimensions; // Array of dimensions

int main(int argc, char *argv[]){

ifstream fdef("filedef.dat");    /* fdef (definition file) */
ifstream fin;                    /* fin (data file) */
char *filename = "train14.dat";

if(argc < 2)fin.open(filename);

elsefin.open(argv[1]);

if(fin.fail())cerr << "Could not open file " << filename << endl;

else{

Decision decision(fdef,fin);decision.build_tree();cout << decision << endl;

}

fdef.close();fin.close();

return 0;}

// Construct a decision given the filename for the info fileDecision::Decision(ifstream &fdef, ifstream &fin){

char keyword[MAX_STRING]; // Some scratch space for reading in wordschar title_line[100];


int i;

fdef >> keyword; // Should be "classes"fdef >> num_classes; // Read in the number of classes

for(i = 0; i < num_classes; i++) // Read in each class namefdef >> classes[i];

fdef >> keyword; // Should be "dimensions"fdef >> num_dimensions; // Read in the number of dimensions for this run

dimensions = new Dimension *[num_dimensions];for(i = 0; i < num_dimensions; i++){

dimensions[i] = new Dimension();fdef >> *dimensions[i]; // Read in each dimension

}

root = new Node; // Create the top of the decision treefin.getline(title_line,100,'\n'); // get title of variables

// Construct a linked list of all examples.// Read in all the examples and give them to the root nodewhile(!fin.eof()){

fin >> keyword; /* examples number */if(keyword[0] && !fin.eof() && !fin.fail()) // End of file?{

Example *example = new Example(keyword, fin);root->add_example(example);

}}

}

// Destroy the decisionDecision::~Decision(){

for(int i = 0; i < num_dimensions; i++)delete dimensions[i];

delete [] dimensions;delete root; //Deleting the root

}

// Given a name, look up which attribute this is, and return the indexint Decision::lookup_attribute(int i, char temp[]){

return dimensions[i]->lookup(temp);} // Given a name, look up which class this is, and return the indexint Decision::lookup_class(char temp[]){

int result = 0;for(int i = 0; i < num_classes && result == 0; i++)

if(strcmp(classes[i], temp) == 0)result = i;

return result;}


// Run the build_tree() algorithm to create the decision treevoid Decision::build_tree(){

root->build_tree();}

// Print out the treeostream& operator<<(ostream &o, Decision &d){

d.root->print(0);return o;

}

// ---------------------------------------------------------------------------
// DTLearn.h: Header file for DTLearn.cpp
// ---------------------------------------------------------------------------
// Class: Artificial Intelligence (CSCI6751) Fall 2001
//        Fairleigh Dickinson University
// Instructor: Prof. Francis Sand
// ---------------------------------------------------------------------------
// Author: Kanoksri Sarinnapakorn
// Date: 11/9/2001
// Revisions: None.
// ---------------------------------------------------------------------------

#include <fstream.h>#include <iostream.h>#include "Utility.h"

const int MAX_CLASSES = 20; // The maximum number of classes allowed

class Dimension;class Node;class Decision {friend ostream& operator<<(ostream &o, Decision &d); // Print out the treepublic:

static int num_classes; // Number of classes for this runstatic char classes[MAX_CLASSES][MAX_STRING]; // Names for all classesstatic int lookup_class(char temp[]); // Get the class index given the name

static int num_dimensions; // Number of dimensions for this runstatic Dimension **dimensions; // Array of dimensionsstatic int lookup_attribute(int x, char temp[]); // Get the dimension index

Node *root; // The top of the decision tree

Decision (ifstream &fdef, ifstream &fin); // Construct a Decision~Decision(); // Destroy the decision tree

void build_tree(); // Build a decision tree};


// -------------------------------------------------------------------------------// Dimension.cpp: The dimension class stores an attribute and its possible values// -------------------------------------------------------------------------------

#include "Utility.h"#include "Dimension.h"#include "Example.h"#include "Node.h"

// Build a dimensionDimension::Dimension(){}

// Destroy a dimensionDimension::~Dimension(){}

// Given a name, lookup the index of the corresponding valueint Dimension::lookup(char temp[]){

int result = 0;

for(int i = 0; i < num_values && result == 0; i++)if(strcmp(temp, values[i]) == 0)

result = i;

return result;}

// Return the name of the attribute for this dimensionchar *Dimension::get_attribute(){

return attribute;}

// Read in a dimension from a fileistream& operator>>(istream &in, Dimension &d){

in >> d.attribute;

in >> d.num_values;for(int i = 0; i < d.num_values; i++)

in >> d.values[i];

return in;}


// -------------------------------------------------------------------------------// Dimension.H//// Each example that we want to place in the decision tree has a value for// each attribute. An attribute together with all the possible values for// the attribute we call a dimension. The dimension class stores an// attribute and its possible values, which is handy for various reasons.// -------------------------------------------------------------------------------

#include <iostream.h>
#include "Utility.h"

const int MAX_VALUES = 40; // Same as MAX_CHILDREN for a node.

class Node;class Example;class Dimension {friend istream& operator>>(istream &i, Dimension &d);

char attribute[MAX_STRING];

public:int num_values; // How many values this dimension haschar values[MAX_VALUES][MAX_STRING]; // The names of all values

int lookup(char temp[]); // Lookup the index of a given valuechar *get_attribute(); // Return the name of the attributeDimension(); // Construct a Dimension object~Dimension(); // Destroy this dimension

};

// -------------------------------------------------------------------------------// Example.cpp: An example is an assignment of values to each possible attribute.// An example has a name, and also a classification value.// --------------------------------------------------------------------------------

#include "Dtlearn.h"#include "Example.h"#include "Dimension.h"

// Construct an example, given the name of the example and an input stream.// This input stream is the input from the examples file.Example::Example(char *_name, istream& in){

strncpy(name, _name, MAX_STRING);char class_value[MAX_STRING];in >> class_value;my_class = Decision::lookup_class(class_value);

values = new int[Decision::num_dimensions];for(int i = 0; i < Decision::num_dimensions; i++){

in >> class_value;values[i] = Decision::lookup_attribute(i, class_value);

}}


// Destroy this example. That means destroying the dynamically allocated memoryExample::~Example(){

delete [] values;}

// Get the index of the class this example goes withint Example::get_class(){

return my_class;}

// Get the value for a certain dimension.int Example::get_value(int dim){

return values[dim];}

// Print out the exampleostream& operator<<(ostream &o, Example &e){

o << " [" << Decision::classes[e.my_class] << "] ";/* display each examples use this routine *///o << e.name << " [" << Decision::classes[e.my_class] << "] ";//for(int i = 0; i < Decision::num_dimensions; i++)//{// o << "(";// o << Decision::dimensions[i]->get_attribute() << ", ";// o << Decision::dimensions[i]->values[e.values[i]] << ") ";//}return o;

}

// -------------------------------------------------------------------------------// Example.H// An example is an assignment of values to each possible attribute.// An example has a name, and also a classification value.// -------------------------------------------------------------------------------#include <iostream.h>#include "Utility.h"

class Example {friend ostream& operator<<(ostream &o, Example &e); // Print out the example

char name[MAX_STRING]; // The name of this exampleint *values; // Array of values. We use numbers to index

// into the symbols in the dimensions.

int my_class; // Which of the valid classifications this is.

public:Example(char *_name, istream& in); // Construct given a name and in file~Example(); // Destroy this exampleint get_class(); // Get the class index of this exampleint get_value(int dim); // Get the value for a certain dimension.

Example *next; // The next example in a linked list.};


// ----------------------------------------------------------------------------// Node.cpp: Node class// ----------------------------------------------------------------------------

#include <iostream.h>#include <math.h>#include "Example.h"#include "Node.h"#include "DTLearn.h"#include "Dimension.h"

// Construct a node.Node::Node(){

num_children = 0;num_examples = 0;examples = NULL;

}

// Destroy a node. This means deleting all the dynamically allocated memoryNode::~Node(){

// Delete the linked list of examplesExample *example = examples;Example *temp;while(example){

temp = example->next;delete example;example = temp;

}

// Delete all the children nodesfor(int i = 0; i < num_children; i++)

delete children[i];}

// Add an example to this node. A specific example can only inhabit// one node at any one time.void Node::add_example(Example *example){

example->next = examples;examples = example;

num_examples++;}

// Build the decision subtree starting at this node. If this node is the root// node, then this means building the entire tree! First we see if// the examples at this node have more than one class. If not, then there// is no splitting required. If so, then we check all the dimensions// using a special metric in evaluate() and choose one dimension to split// on. This split generates new children nodes, and we run the build_tree()// algorithm on all the children, creating decision subtrees.


void Node::build_tree(){

if(num_examples != 0) // Skip this node if no examples{

print_distribution(); // Print the class distribution of this node

// We do not want to split this node if the examples all have the// same class. Further differentiation is not required, because all// the "answers" are the same.int first_class = examples->get_class();int different_classes = 0; // Assume not different classesfor(Example *example = examples->next; example; example = example->next)

if(example->get_class() != first_class)different_classes = 1;

if(different_classes) // if different classes{// Find the attribute that maximizes a particular evaluation function// implemented by evaluate(). Which dimension is best to split on?

attribute = splitter();

cout << "Dimension chosen: "<< Decision::dimensions[attribute]->get_attribute() << endl;

partition();// Actually break up the node. Allocate all examples to kids

for(int i = 0; i < num_children; i++)// Now do it on all the subtrees

children[i]->build_tree();}else

cout << "Terminal node" << endl;}

}

// This node has many examples. We build children from this node by creating// new nodes, one each for a value of the attribute specified in "attribute"void Node::partition(){

for(int val = 0; val < Decision::dimensions[attribute]->num_values; val++)children[num_children++] = new Node;

Example *temp;Example *example = examples;while(example){

temp = example->next;children[example->get_value(attribute)]->add_example(example);example = temp;

}

examples = NULL;}


// Print out the class distribution of this nodevoid Node::print_distribution(){

int i;cout << "Class distribution is ";int class_totals[MAX_CLASSES]; // How many are in each class?for(i = 0; i < Decision::num_classes; i++)

class_totals[i] = 0;

for(Example *example = examples; example; example = example->next)class_totals[example->get_class()]++;

for(i = 0; i < Decision::num_classes; i++)cout << Decision::classes[i] << ": " << class_totals[i] << " ";

cout << endl;}

// There are many dimensions that we could split on at this node.// Decide which one is most appropriate. Returns the number index.int Node::splitter(){

// Go through all the dimensions and find the one with the lowest// disorder value. We get this from the evaluation function.double min_value = evaluate(0); // The lowest evaluation seen so farint min_dim = 0; // Dimension of lowest evaluationdouble value;

for(int dim = 1; dim < Decision::num_dimensions; dim++){

value = evaluate(dim);if(value < min_value){

min_value = value;min_dim = dim;

}}

return min_dim;}

// Assuming that we split this node along the given dimension, there// will be a number of subtrees. Each of these subtrees will have a certain// disorder. Add up the total disorder of the subtrees and return it.double Node::evaluate(int dim){

// Find the numerical distribution of examples along this dimensionint dim_totals[MAX_VALUES];for(int value = 0; value < Decision::dimensions[dim]->num_values; value++)

dim_totals[value] = 0;

for(Example *example = examples; example; example = example->next)dim_totals[example->get_value(dim)]++;

// Tally up the overall totaldouble total = 0;for(value = 0; value < Decision::dimensions[dim]->num_values; value++)


total += dim_totals[value];

double sum = 0;for(value = 0; value < Decision::dimensions[dim]->num_values; value++)

sum += disorder(dim, value) * dim_totals[value] / total;

return sum;}

// Given a dimension to split on, and a particular class, how disordered// would this subtree be, starting from this node?double Node::disorder(int dim, int value){

double result;

int class_totals[MAX_CLASSES]; // How many are in each class?double total = 0, sum = 0;

for(int i = 0; i < Decision::num_classes; i++)class_totals[i] = 0;

for(Example *example = examples; example; example = example->next)if(example->get_value(dim) == value)

class_totals[example->get_class()]++;

for(i = 0; i < Decision::num_classes; i++)total += class_totals[i]; // Should be same as num_examples.

if(total < 0.1) // basically total == 0result = 0;

else{

for(i = 0; i < Decision::num_classes; i++)if(class_totals[i] > 0)

sum += (class_totals[i] / total) * logbase2(class_totals[i]);

result = logbase2(total) - sum;}return result;

}

// Print this node with "depth" amount of indentation
void Node::print(int depth)
{
    int i, j;
    if(num_children == 0)
    {
        Example *example = examples;
        for(i = 0; i < depth; i++)
            cout << " ";
        cout << *example << endl;
    }
    else
    {
        for(i = 0; i < depth; i++)
            cout << " ";
        cout << "Split on " << Decision::dimensions[attribute]->get_attribute()
             << ":" << endl;

        for(j = 0; j < num_children; j++)
        {
            for(i = 0; i < depth; i++)
                cout << " ";
            cout << "---"
                 << Decision::dimensions[attribute]->values[j] << ": " << endl;
            children[j]->print(depth+1);
        }
    }
}

// ---------------------------------------------------------------------------// Node.h: Node class.// ---------------------------------------------------------------------------

const int MAX_CHILDREN = 40; // Cannot be more than MAX_VALUES for a dimension

class Example;class Dimension;class Node {

Example *examples; // Start of a linked list of examples.int num_examples; // The number of examples

int attribute;

int num_children;Node *children[MAX_CHILDREN]; // The children under this node

int splitter(); // Which dimension will node be split on?double evaluate(int dim); // Metric for deciding splitting dimensiondouble disorder(int dim, int value); // Another metric, used by evaluatevoid print_distribution(); // Print out the class distribution herevoid partition(); // Break up the node given the split

public:Node(); // Construct a node~Node(); // Destroy a nodevoid add_example(Example *example); // Add an example to this node's setvoid build_tree(); // Build the decision subtree from this

nodedouble disorder(int value); // Calculate a metric for disordervoid print(int depth); // Print the node with "depth" indentation

};


// ----------------------------------------------------------------------------// Utility.cpp: This is a collection of utility functions in DTLearn.// ----------------------------------------------------------------------------

#include <math.h>#include "Utility.h"

// Compares two strings. If they are equal, return zero.int strcmp(const char *a, const char *b){

int result = 0;

while(*b){

if(*a > *b)result = 1;

else if(*a < *b)result = -1;

a++;b++;

}

if(*a)result = 1;

return result;}

// Copy one string to another, but NOT more than n characters.// Assume that "a" points to allocated memory which can be written into.void strncpy(char *a, const char *b, int n){

for(int i=0; *b && i<n; i++)*a++ = *b++;

*a = '\0';}

double logbase2(double n){

return (log(n)/log(2.0));}

// ----------------------------------------------------------------------------// Utility.H// This is a general string class. We include it for a platform-independent// string comparison function.// ----------------------------------------------------------------------------

#ifndef _STRING_H_#define _STRING_H_

#ifndef NULL#define NULL 0#endif

const int MAX_STRING = 20;    // The maximum length of any string.
extern int strcmp(const char *a, const char *b);
extern void strncpy(char *a, const char *b, int n);
extern double logbase2(double n);

#endif


3. Exercise 19.4

Each example has six inputs and one target output:

I1  1 1 1 1 1 1 1 0 0 0 0 0 0 0
I2  0 0 0 1 1 0 0 1 1 0 1 0 1 1
I3  1 1 1 0 1 0 0 1 1 0 0 0 1 1
I4  0 1 0 0 1 0 0 1 0 1 1 1 0 1
I5  0 0 1 1 0 1 1 0 1 1 0 0 1 0
I6  0 0 0 1 0 1 0 1 1 0 1 1 1 0
T   1 1 1 1 1 1 0 1 0 0 0 0 0 0

a) Run the perceptron learning rule on this example set, and show the resulting set of weights.

The result from perceptron learning:

Perceptron Learning

Number of training examples: 14
Number of changing weights: 16
Number of trials: 9

Final weights

W1        W2        W3        W4        W5        W6
0.767931  0.192413  0.254704  0.478561  0.121235  0.111350

Correct rate for 1000 examples is 98.80 %

b) Run the decision tree learning rule, and show the resulting decision tree.

The result from decision tree learning:

Class distribution is 1: 7 0: 7
Dimension chosen: I1
Class distribution is 1: 6 0: 1
Dimension chosen: I3
Class distribution is 1: 4 0: 0
Terminal node
Class distribution is 1: 2 0: 1
Dimension chosen: I6
Class distribution is 1: 2 0: 0
Terminal node
Class distribution is 1: 0 0: 1
Terminal node
Class distribution is 1: 1 0: 6
Dimension chosen: I3
Class distribution is 1: 1 0: 3
Dimension chosen: I4
Class distribution is 1: 1 0: 1
Dimension chosen: I6
Class distribution is 1: 1 0: 0
Terminal node
Class distribution is 1: 0 0: 1
Terminal node
Class distribution is 1: 0 0: 2
Terminal node
Class distribution is 1: 0 0: 3
Terminal node

Split on I1:
---1:
 Split on I3:
 ---1:
   [1]
 ---0:
  Split on I6:
  ---1:
    [1]
  ---0:
    [0]
---0:
 Split on I3:
 ---1:
  Split on I4:
  ---1:
   Split on I6:
   ---1:
     [1]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

[Figure: the resulting decision tree drawn from the printout above, with attribute I1 at the root.]


4. Testing Performance

More examples are generated from the majority (voting) function used for the data in Exercise 19.4 (votes: 10 for I1, 4 for I2-I4, 2 for I5, 1 for I6). There are eleven training sets, each with a different number of examples: 14, 50, 100, 150, ..., 500. The test data set has 1000 examples. A sketch of such a generating function is given below.
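As a minimal sketch of how such examples can be generated (illustrative only: the rule below, which labels an example 1 when its inputs gather more than half of the 10 + 4 + 4 + 4 + 2 + 1 = 25 available votes, reproduces the 14 examples of Exercise 19.4, but the exact generator used to produce the report's training and test sets is not shown here):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    const int VOTES[6] = {10, 4, 4, 4, 2, 1};    // votes for I1..I6, 25 in total

    // Weighted-majority target: 1 if the inputs collect more than half of the votes.
    int target(const int input[6])
    {
        int sum = 0;
        for (int j = 0; j < 6; j++)
            sum += VOTES[j] * input[j];
        return (sum > 12) ? 1 : 0;               // half of 25 is 12.5, so "> 12"
    }

    int main()
    {
        srand((unsigned) time(NULL));

        // Print 10 random examples in the same layout as train14.dat: EX T I1..I6.
        printf("EX T I1 I2 I3 I4 I5 I6\n");
        for (int ex = 1; ex <= 10; ex++) {
            int input[6];
            for (int j = 0; j < 6; j++)
                input[j] = rand() % 2;           // each attribute is 0 or 1
            printf("%d %d", ex, target(input));
            for (int j = 0; j < 6; j++)
                printf(" %d", input[j]);
            printf("\n");
        }
        return 0;
    }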

4.1 Performance of Perceptron Learning

The PCTLearn program (section 1.3) is modified for testing performance. Output from one run on each training set size:

Size  W1        W2        W3        W4         W5        W6        Correct rate
 14   0.941713  0.250551  0.240110  0.449168  -0.089596  0.196139  96.20%
 50   0.841713  0.350551  0.240110  0.349168   0.110404  0.196139  96.50%
100   0.841713  0.250551  0.340110  0.449168   0.010404  0.196139  95.70%
150   0.841713  0.250551  0.340110  0.449168   0.110404  0.196139  92.60%
200   0.841804  0.278533  0.285293  0.283505   0.225202  0.115986  96.30%
250   0.941804  0.278533  0.285293  0.283505   0.325202  0.015986  96.30%
300   0.941804  0.278533  0.185293  0.283505   0.325202  0.115986  92.80%
350   0.741804  0.278533  0.285293  0.283505   0.125202  0.115986  95.90%
400   0.841804  0.278533  0.285293  0.283505   0.225202  0.015986  96.30%
450   0.941804  0.278533  0.285293  0.283505   0.325202  0.015986  96.30%
500   0.841804  0.178533  0.385293  0.483505   0.025202  0.115986  94.90%

Each training set size was then run 30 times, and the average and standard deviation of the correct rate were computed:

Training set size    Average correct rate (%)    Standard deviation of correct rate (%)
 14                  92.20                       0.47
 50                  95.77                       1.78
100                  99.30                       1.01
150                  96.82                       2.64
200                  97.64                       2.94
250                  99.29                       1.83
300                  98.43                       2.65
350                  98.82                       2.40
400                  99.38                       1.40
450                  99.16                       1.71
500                  94.19                       4.89


4.2 Performance of Decision Tree Learning

The decision tree from training set size 14:

Split on I1:
---1:
 Split on I3:
 ---1:
   [1]
 ---0:
  Split on I6:
  ---1:
    [1]
  ---0:
    [0]
---0:
 Split on I3:
 ---1:
  Split on I4:
  ---1:
   Split on I6:
   ---1:
     [1]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

Correct rate 84.70%.

The decision tree from training set size 50:

Split on I1:
---1:
 Split on I4:
 ---1:
   [1]
 ---0:
  Split on I2:
  ---1:
    [1]
  ---0:
   Split on I6:
   ---1:
     [1]
   ---0:
    Split on I3:
    ---1:
      [1]
    ---0:
      [0]
---0:
 Split on I3:
 ---1:
  Split on I2:
  ---1:
   Split on I4:
   ---1:
     [1]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

Correct rate 95.70%.

[Figure: the decision tree for training set size 50, drawn from the printout above.]


The decision tree from training set size 100:

Split on I1:
---1:
 Split on I4:
 ---1:
   [1]
 ---0:
  Split on I3:
  ---1:
    [1]
  ---0:
   Split on I6:
   ---1:
     [1]
   ---0:
    Split on I2:
    ---1:
      [1]
    ---0:
      [0]
---0:
 Split on I3:
 ---1:
  Split on I5:
  ---1:
   Split on I2:
   ---1:
    Split on I4:
    ---1:
      [1]
    ---0:
      [0]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

Correct rate 92.20%.

The decision tree from training set size 150:

Split on I1:
---1:
 Split on I4:
 ---1:
   [1]
 ---0:
  Split on I3:
  ---1:
    [1]
  ---0:
   Split on I2:
   ---1:
     [1]
   ---0:
    Split on I5:
    ---1:
     Split on I6:
     ---1:
       [1]
     ---0:
       [0]
    ---0:
      [0]
---0:
 Split on I5:
 ---1:
  Split on I4:
  ---1:
   Split on I2:
   ---1:
    Split on I3:
    ---1:
      [1]
    ---0:
      [0]
   ---0:
     [0]
  ---0:
    [0]
 ---0:
   [0]

Correct rate 96.30%.

[Figures: the decision trees for training set sizes 100 and 150, drawn from the printouts above.]


[Figures: the decision trees for training set sizes 200 and 250.]

The decision tree from training set size 200: correct rate 100.00%. The decision tree from training set size 250: correct rate 100.00%. (Both trees are shown in the figures above.)

The decision trees for the 300, 350, 400, and 450 training sets have almost the same pattern as those for the 200, 250, and 500 training sets.

The decision tree from training set size 500 (figure below): correct rate 100.00%.

[Figure: the decision tree for training set size 500.]


5. Learning Curves

Training set size    Perceptron correct rate (%)    Decision tree correct rate (%)
 14                  92.20                           84.70
 50                  95.77                           95.70
100                  99.30                           92.20
150                  96.82                           96.30
200                  97.64                          100.00
250                  99.29                          100.00
300                  98.43                          100.00
350                  98.82                          100.00
400                  99.38                          100.00
450                  99.16                          100.00
500                  94.19                          100.00

[Figure: Learning Curve. Percent correct on the test set (0-100%) plotted against training set size (0-500), with one curve for the perceptron and one for the decision tree.]


6. Discussion

Both perceptron learning and decision tree learning are simple forms of supervised learning.

Perceptron Learning

The perceptron is a single-layer neural network whose weights and biases can be trained to produce the correct target when presented with the corresponding input vector. The perceptron generated great interest because of its ability to generalize from its training vectors and to work with randomly distributed connections.

Perceptron networks have several limitations. First, the output of a perceptron can take on only one of two values (true or false). Second, a perceptron can only classify linearly separable sets of vectors: if a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the perceptron will find a solution. If the vectors are not linearly separable, perceptron learning will never reach a point where all vectors are classified properly (the two-input XOR function is a classic example of a function that is not linearly separable).

Decision Tree Learning

Decision tree learning is an efficient method for learning deterministic Boolean functions, and it can also be used with functions that have a larger range of outputs. The result of decision tree learning is a decision tree that provides a simple representation of propositional knowledge. It helps us see the important attributes and understand the problem better, and the resulting tree can easily be used for decision making and for classification of objects.

Performance Comparison

Comparing the learning curves from perceptron learning and decision tree learning, we see that both can predict with a good accuracy of 80% or above. When the training set size is below 200, perceptron learning has a slightly higher percentage of correct prediction. As the training set size gets bigger, decision tree learning performs better and outperforms perceptron learning. In the study here, decision tree learning gives a 100% correct prediction once the training set size is 200 or more.

In terms of interpretation, a linear classifier obtained from perceptron learning is harder to understand than a decision tree with a single attribute split at each node. Moreover, the attribute weights in the linear function can change from one run to another even with the same data, although the accuracy may not be affected much. In decision tree learning, it is possible that not all attributes are used in splitting, so decision tree learning can screen out some unimportant attributes and produce a parsimonious classifier. (The attributes that are not selected may indeed be unimportant, or they may be highly correlated with other attributes present in the tree.)


7. References

[1] Chakrabarty, N., Example of neural networks, [Online document], http://www.thechak.net/~neil/neilchak.htm

[2] Fullér, R., The perceptron learning rule, [Online document], ftp://ftp.abo.fi/pub/iamsr/nfs8.pdf

[3] Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach, Prentice Hall, Upper Saddle River, New Jersey.

[4] Thornton, C., Decision tree learning program, [Online document],

http://www.cogs.susx.ac.uk/users/christ/crs/sai/lec15.html

[5] Weisman, O. and Pollack, Z., The perceptron, [Online document],

http://www.cs.bgu.ac.il/~omri/Perceptron/

[6] Decision tree learning, [Online document],

http://www.cs.cornell.edu/home/cardie/tutorial/dtrees.pdf

[7] Example of decision tree programs, [Online document], http://yoda.cis.temple.edu:8080/books/dean/05.Learning/C%2b%2bCode/Decision/.