Word


Page 1: Word

Carter

SECTION 1: INTRODUCTION

1.1 OVERVIEW

The process of data collection, no matter the objective, can provide a person or

entity with a large amount of generic information on a particular subject. However, to

learn from and to interpret the data requires much more than simply gathering it. For

example, a business may build meaningful relationships with its customers by learning

from previous interactions with them, observing their needs, and remembering what their

preferences are, in order to determine how to serve them better in the future. In order for

this type of learning to take place, data must first be collected and organized in a useful

and consistent way. This procedure is known as data warehousing. Data warehousing

allows a user to remember what has been noticed in the data. Afterwards, the data must

then be analyzed, interpreted, and transformed into useful information. This stage is

where data mining comes into play. Data mining is the exploration and analysis, by

automatic or semiautomatic means, of large quantities of data in order to discover

meaningful patterns and rules (Berry, Linoff, pg. 5). Data mining can be applied in a

wide variety of areas, from sports to law enforcement to education.

In this project, I use data mining techniques to predict the current contraceptive

method choice (no use, long-term methods, or short-term methods) of Indonesian women

based on their demographic and socio-economic characteristics. The algorithms that are

implemented here are Naive Bayesian Classification, One-Rule Classification and

Decision Tree. This project presents a Web-based client/server application. The project

makes use of the three-tier client/server architecture, with the Web browser as the client


front-end, the Common Gateway Interface (CGI), Perl, Visual Basic, and Active Server

Pages (ASP) as the middle-tier software, and Microsoft Access 2000 and a comma-

separated value (CSV) text file for the database back-end. The database administrator

has the capability to add, delete, edit, and search for records. The administrator can also

change the administrator password and add users who have permission to gain access to

the website. Users have privileges to add records and to search for records. A logging

system is also implemented, which keeps track of the time, date, host server, browser,

and operating system of users that access the database. The log is accessible by both the

administrator and the users.

1.2 BACKGROUND INFORMATION

According to the Central Bureau of Statistics, the nation of Indonesia is the

fourth-most populous country in the world, with an estimated total population of 207

million in 2000 (United Nations Population Fund). Indonesia has a growth rate of 1.5

percent a year, and although the population growth rate is at a moderate level, the country

has a significant momentum of growth. The government of Indonesia is concerned about

the uneven distribution of the population and the scale of the population growth. This is

especially true when considering overcrowding in urban, densely populated areas, such

as Java and Bali. Other areas of concern are the relatively high infant and under-five

mortality rates (52 and 71 per 1,000, respectively) and the persistently high maternal

mortality ratio (estimated at 370 per 100,000 births).

Indonesia has been recognized for the success of its family planning efforts.

However, according to the United Nations Population Fund, progress in the

contraceptive prevalence rate (CPR) seems to have stalled at about 57 percent. Also, the


burden of use of contraceptives appears to be unevenly shouldered by women, as the

male-based CPR is less than 2 percent. And even though the “unmet need” for

contraceptives of currently married women has been estimated at the relatively small 9.2

percent, this number is probably considerably higher when unmarried men and women

are taken into account. In order to meet this need, it is paramount that the quality and

scope of contraceptive services and information be expanded. A critical challenge for

Indonesia remains the access to affordable contraceptives by all its citizens, especially the

poor.

1.3 ABOUT THE DATASET

This dataset is a subset of the 1987 National Indonesia Contraceptive

Prevalence Survey. It was created and donated by Tjen-Sien Lim on June 7, 1997. The

contents were downloaded from the UCI Machine Learning Repository. The samples

contained in the survey are of married women who were either not pregnant or did not

know if they were pregnant at the time of the interview. The problem faced is predicting

the contraceptive method choice of the woman based on her demographic and socio-

economic characteristics. Predicting the contraceptive method choice of Indonesian

women can help the government decide how and where to target and provide

information on contraceptive choices for its female population. The three choices are no

use, long-term methods, or short-term methods. The number of instances is 1473, and the

number of attributes is 11, including the primary key (ID) and the classifying attribute

(cmchoice).


1.4 ATTRIBUTE INFORMATION

No.  Attribute   Description                    Type / Values
1.   ID          ID number                      (primary key attribute)
2.   wife_age    Wife’s age                     (numerical)
3.   wife_ed     Wife’s education               (categorical) 1=low level, 2, 3, 4=high level
4.   hus_ed      Husband’s education            (categorical) 1=low level, 2, 3, 4=high level
5.   no_child    Number of children ever born   (numerical)
6.   wife_rel    Wife’s religion                (binary) 0=Non-Islam, 1=Islam
7.   wife_work   Wife now working?              (binary) 0=Yes, 1=No
8.   hus_oc      Husband’s occupation           (categorical) 1=low level, 2, 3, 4=high level
9.   st_live     Standard-of-living index       (categorical) 1=low, 2, 3, 4=high
10.  media       Media exposure                 (binary) 0=Good, 1=Not good
11.  cmchoice    Contraceptive method used      (class) 1=No-use, 2=Long-term, 3=Short-term

Figure 1: Attribute Information

There are no missing values in the dataset.
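As an illustration of the record layout in Figure 1, the following minimal Python sketch parses comma-separated rows into typed records. The file layout (no header row, columns in the order of Figure 1), the name read_records, and the sample row are assumptions made for this example; they are not taken from the project's own code.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Record:
    # Field names and order follow Figure 1.
    ID: int
    wife_age: int
    wife_ed: int
    hus_ed: int
    no_child: int
    wife_rel: int
    wife_work: int
    hus_oc: int
    st_live: int
    media: int
    cmchoice: int


FIELD_NAMES = [f.name for f in fields(Record)]


def read_records(text):
    """Parse CSV text (assumed: no header row) into Record objects."""
    rows = csv.reader(io.StringIO(text))
    return [Record(*map(int, row)) for row in rows if row]


# Hypothetical sample row, invented for illustration only.
sample = "1,24,2,3,3,1,1,2,3,0,1\n"
records = read_records(sample)
```

Because the dataset has no missing values, every field can be converted to an integer without special handling.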


SECTION 2: TECHNICAL DESCRIPTION

2.1 THE THREE-TIER ARCHITECTURE

The most commonly used application development architecture, and the one

supported by most application servers, is a component-based, three-tier model (Directions

on Microsoft). Components increase code reuse and simplify

development. By using components, a developer can package the compiled (binary) code

in such a way that another developer is able to easily and efficiently discover the

functions provided by the component (usually by using a programming language

application such as Visual Basic) and invoke those functions. This is accomplished while

keeping the internal workings of the component hidden.

The three-tier architecture increases scalability and reliability by separating the

three major logical functions of an application (user interaction, business logic, data

storage) from one another. Many Web services must provide functionality that displays

the graphical user interface (GUI), performs the main logic of the program, and then

stores and retrieves data. And although a developer may write a single module that will

interconnect the three functions of user interaction, logic, and data storage, such an

approach would require a great deal of work in maintenance and in deployment.

Therefore, developers attempt to divide the application’s functionality into tiers, or

layers. Years ago, as business applications moved from minicomputer or mainframe

systems to the PC, developers adopted a two-tier strategy, which is also known as the

client-server model. In this model, the data storage (typically provided by a server

running a database management system such as SQL Server or DB2) is separated from

the rest of the application (typically running on desktop PCs). This resulted in many


developer tools being created around the client-server model. However, the client-server

model had its drawbacks, which included the following, as described in (Directions on

Microsoft):

Difficult to evolve. Because the client piece of a client-server system included

both the GUI and the business logic, developers updating the GUI could

inadvertently change the business logic as well.

Difficult to deploy. A client application had to be deployed on the desktop PC of

each user who wanted to access the application, potentially requiring thousands of

deployments.

Difficult to scale. Each running client connected directly to the database, thereby

consuming server resources and often limiting the number of simultaneous users

that could access an application.

On the other hand, the three-tier model introduces an intermediate business-logic tier

between the GUI and the data storage, which provides these advantages over the client-

server model:

Increased scalability. Logic components can be pooled and shared across multiple

running clients.

Easier to maintain. Since the GUI code is separate from the business logic, the GUI

can be changed and enhanced without accidentally altering core business rules. In

addition, when the business logic must be changed, only a relatively small number of

middle-tier servers need to be updated instead of a larger number of desktop PCs.

Shared business logic and support for multiple interfaces. The same business logic

can be used from a Web-based interface and a thick-client interface.
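The separation of the three tiers can be sketched in a few lines of Python. This is only an illustrative sketch: the class names DataTier and LogicTier, the render function, and the in-memory dictionary standing in for the database are all hypothetical and do not describe this project's Perl and ASP code.

```python
class DataTier:
    """Data storage tier: an in-memory dict stands in for the database."""

    def __init__(self):
        self._rows = {}

    def save(self, key, row):
        self._rows[key] = row

    def load(self, key):
        return self._rows.get(key)


class LogicTier:
    """Business-logic tier: validates requests before they reach storage."""

    def __init__(self, data):
        self._data = data

    def add_record(self, key, row):
        if not row:
            raise ValueError("empty record")
        self._data.save(key, row)

    def find_record(self, key):
        return self._data.load(key)


def render(row):
    """Presentation tier: formatting only, no business rules."""
    if row is None:
        return "not found"
    return ", ".join(f"{name}={value}" for name, value in row.items())


logic = LogicTier(DataTier())
logic.add_record(1, {"wife_age": 24})
page = render(logic.find_record(1))
```

Because render never touches storage and DataTier never formats output, either one can be replaced (for example, a Web interface instead of a thick client) without changing LogicTier.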


Figure 2 illustrates the setup of a typical three-tier architectural model:

Figure 2 (Delphi 2)


2.2 WEB BROWSER / HTML

HTML, the HyperText Markup Language, is the standard authoring language for

publishing on the World Wide Web. Having gone through several stages of evolution,

today’s HTML has a wide range of features reflecting the needs of a very diverse and

international community wishing to make information available on the Web (HTML

Activity Statement). HTML defines the layout and structure of a Web document by

using a series of tags and attributes.

In this project, I use HTML for the structure of the Web pages within my project

site. A Web browser is a software application used to locate and display HTML pages.

The Microsoft Internet Explorer Web browser serves as the client in this application.

2.3 CGI

The Common Gateway Interface (CGI) is a standard for interfacing external

applications with information servers, such as HTTP or Web servers (CGI: Common

Gateway Interface). A CGI program is executed in real-time, which means that it can

output dynamic data to a Web page. In contrast, a plain HTML document

contains static information: it exists in a constant state, and

the information it displays does not change. Because a CGI program is

executable, it allows visitors to a Web page to run a program on the server where the CGI

document is hosted. For this and other reasons, authors of CGI scripts must take some

security measures when it comes to the execution of the scripts. CGI programs must

reside in a special directory, so that the Web server knows to execute the program instead

of merely displaying it to the browser. Typically, this directory is under direct control of


the webmaster, which prevents the average user from creating CGI programs. The most

common practice is to place CGI programs in a directory entitled ‘/cgi-bin’.
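To make the idea of dynamic output concrete, here is a minimal sketch of a CGI-style program written in Python (the project's own scripts are in Perl). Under CGI, the Web server passes the query string in the QUERY_STRING environment variable and expects a header block, a blank line, and then the document on standard output. The function name respond and the parameter name "name" are hypothetical.

```python
import html
import os
from urllib.parse import parse_qs


def respond(query_string):
    """Build a complete CGI response: header, blank line, HTML body."""
    params = parse_qs(query_string)
    # "name" is a made-up parameter used only for this illustration.
    name = params.get("name", ["visitor"])[0]
    # Escape user input so it cannot inject markup into the page.
    body = f"<html><body><p>Hello, {html.escape(name)}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets QUERY_STRING before running the program.
    print(respond(os.environ.get("QUERY_STRING", "")), end="")
```

The output changes with each request, which is what distinguishes it from a static HTML file. Escaping the input is one of the security measures a CGI author has to take.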

2.4 PERL

A CGI program can be written in any language that allows it to be executed on

the server, and Perl is the language of choice for many developers. Perl stands for

Practical Extraction and Report Language. Perl is available for most

operating systems, including virtually all Unix-like platforms (Perl). The language is

optimized for scanning arbitrary text files, extracting information from those text files,

and printing reports based on that information. Perl can handle many system

management tasks, and the language’s designers intended it to be practical, easy to use,

and efficient. Perl combines many features of C, sed, awk, and sh, as well as csh,

Pascal, and BASIC-PLUS. Expression syntax in Perl corresponds closely to C

expression syntax. Perl, unlike most Unix utilities, does not arbitrarily limit the size of

the user’s data, as long as the required memory is available. As an example, Perl can

parse a whole file as a single string. Recursion in Perl is of unlimited depth. The tables

used by hashes, commonly referred to as associative arrays, grow as necessary to

prevent diminished performance. One of Perl’s most useful capabilities is that it can

use sophisticated pattern matching techniques to scan large amounts of data quickly.

And although optimized for scanning text, Perl can also deal with binary data.

In this project, I use Perl to implement CGI scripts for performing the database

manipulation operations, such as insert, delete, edit, and search. Perl and CGI serve as

a part of the middle tier of this application.


2.5 ASP / VBScript

Active Server Pages (ASP) are components that allow Web developers to create

server-side scripted templates. In turn, these templates generate dynamic, interactive web

server applications. By embedding special program code in standard HTML pages,

a developer can interact with page objects such as ActiveX or Java components, access data in

a database, or create other types of dynamic output. The HTML output by an Active

Server Page is browser independent, which means that it can be read equally well

by Microsoft Internet Explorer, Netscape Navigator, or most other browsers (ASP-help.com).

In this project, I use ASP technology to allow the implementation of the user login

feature, as well as the add user function, which is done using Visual Basic script, or

VBScript. ASP / VBScript serve as a part of the middle tier of this application.

2.6 B-Course

B-Course is a Web-based data analysis tool for Naive Bayesian modeling.

Specifically, B-Course is used for dependence and classification modeling. B-Course can

be freely used for educational and research purposes as an analysis tool where

dependence or classification modeling based on data is needed. The software provides

two courses of modeling: dependency modeling and classification.

2.7 VISUAL BASIC DATA MINING.NET

Visual Basic Data Mining.Net is a Web portal that provides data mining

algorithm and application documentation, as well as various source codes in .Net and

Visual Basic. These features of the site demonstrate how the .NET Framework and/or

Visual Basic can be used to either learn how data mining algorithms and applications

function or to build data mining applications. Visual Basic Data Mining.Net also offers


a data mining community and provides functionality of data mining algorithms and

applications. The site provides a wizard-based interface for implementing the

algorithms. Visual Basic Data Mining.Net can be found online at:

http://www.visual-basic-data-mining.net.

2.8 SEE5

See5 analyzes data to produce decision trees and/or rulesets that relate a case’s

class to the values of its attributes (See5). In See5, an application consists of a

collection of text files. These files define classes and attributes, describe the cases to

be analyzed, provide new cases to test the classifiers produced by See5, and specify

misclassification costs or penalties. A See5 application consists of two mandatory

files, which are a .names file and a .data file. The .names file defines the classes and

attributes associated with the data. The .data file contains the actual cases to be

analyzed by See5 in the process of producing a classifier.
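The sketch below generates text in the general shape of a .names file for the attributes in Figure 1: the first entry names the target attribute, and each following entry gives an attribute name and either the keyword continuous, the keyword label, or a list of discrete values. The exact contents of the project's own .names file are not given in this report, so the type assigned to each attribute here is an assumption, and the helper names names_file and data_line are hypothetical.

```python
# Assumed attribute types, based on Figure 1 (not copied from the project).
ATTRIBUTES = [
    ("ID", "label"),
    ("wife_age", "continuous"),
    ("wife_ed", "1,2,3,4"),
    ("hus_ed", "1,2,3,4"),
    ("no_child", "continuous"),
    ("wife_rel", "0,1"),
    ("wife_work", "0,1"),
    ("hus_oc", "1,2,3,4"),
    ("st_live", "1,2,3,4"),
    ("media", "0,1"),
    ("cmchoice", "1,2,3"),
]


def names_file(target, attributes):
    """Return .names-style text: target first, then one entry per attribute."""
    lines = [f"{target}."]
    lines += [f"{name}: {kind}." for name, kind in attributes]
    return "\n".join(lines) + "\n"


def data_line(values):
    """Return one .data case: attribute values separated by commas."""
    return ",".join(str(v) for v in values)


names_text = names_file("cmchoice", ATTRIBUTES)
```

Each case in the .data file then lists its values in the same order as the attributes in the .names file.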


SECTION 3: DATA MINING ALGORITHMS

3.1 NAIVE BAYESIAN CLASSIFICATION

Bayes' Theorem shows how to calculate the probability of one event given that

some other event is known to have occurred. Expressed algebraically, the theorem

states:

P(A|B) = P(A) * P(B|A) / P(B)

or, the probability that A takes place given that B has occurred (P(A|B)) equals the

probability that A occurs (P(A)) times the probability that B occurs if A has happened

(P(B|A)), divided by the probability of B occurring (P(B)). Naive Bayesian classifiers

make the assumption that an attribute’s effect on a given class is independent of values of

any other attribute, and this assumption is known as class conditional independence. It is

made to simplify the computation and in this sense considered to be “naive” (Naive

Bayes – Introduction).
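The formula and the independence assumption can be illustrated with a short Python sketch. Under class conditional independence, the score of a class is its prior probability multiplied by the probability of each observed attribute value given that class. All probabilities below are made-up numbers chosen for illustration; they are not estimates from the contraceptive method choice data, and the function names are hypothetical.

```python
def bayes(p_a, p_b_given_a, p_b):
    """P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b


def naive_bayes_posteriors(priors, likelihoods, observed):
    """Score each class as P(class) * product of P(value | class), then normalize."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for attribute, value in observed.items():
            score *= likelihoods[cls][attribute][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Made-up probabilities for two classes and one binary attribute.
priors = {"no use": 0.5, "short-term": 0.5}
likelihoods = {
    "no use": {"media": {0: 0.8, 1: 0.2}},
    "short-term": {"media": {0: 0.9, 1: 0.1}},
}
posteriors = naive_bayes_posteriors(priors, likelihoods, {"media": 1})
```

Normalizing by the sum of the scores plays the role of dividing by P(B), so the posteriors add up to 1.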

The independence assumption that underlies the Naive Bayesian classification

technique is a strong one and therefore may not be realistic. However, a

Naive Bayesian classifier can still yield excellent predictions. One example of this

may occur when a feature selection process on the data is completed prior to

classification. This ensures that only one feature of any highly correlated pair is kept

and used in the classification process. When dealing with gene expression data, feature

selection must be performed prior to classification due to the extremely high

dimensionality of the feature space (Wallach, 2003).


A Bayesian network consists of nodes and arcs that can connect pairs of nodes

(P.Myllymäki, et. al). For each variable, exactly one node exists. A major

restriction for the Bayesian network is that arcs are not allowed to form loops. If

the arcs can be followed such that some node is visited twice, the model is not a

Bayesian network. Figure 3 is an example of a network that is NOT a Bayesian

network:

Figure 3 (P.Myllymäki, et.al.)
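The no-loops restriction can be checked mechanically. The following sketch is a generic depth-first search, not B-Course's own code. It represents each arc as a (parent, child) pair and reports whether following the arcs can lead back to a node already on the current path. The function name and the example arcs are illustrative.

```python
def has_directed_cycle(arcs):
    """Return True if the (parent, child) arcs can be followed in a loop."""
    children = {}
    for parent, child in arcs:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    state = {}  # node -> "active" while on the current path, "done" afterwards

    def visit(node):
        if state.get(node) == "active":
            return True  # reached a node that is already on the current path
        if state.get(node) == "done":
            return False
        state[node] = "active"
        if any(visit(child) for child in children[node]):
            return True
        state[node] = "done"
        return False

    return any(visit(node) for node in children)


looped = has_directed_cycle([("A", "B"), ("B", "C"), ("C", "A")])
valid = has_directed_cycle([("A", "C"), ("B", "C"), ("C", "D")])
```

A set of arcs for which the function returns True cannot be a Bayesian network.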

Presented next is a dependency model for a Bayesian network. This example

model is given in (P.Myllymäki, et. al):

A and B are dependent on each other if we know something about C or D (or both).

A and C are dependent on each other no matter what we know and what we don't know about B or D (or both).

B and C are dependent on each other no matter what we know and what we don't know about A or D (or both).

C and D are dependent on each other no matter what we know and what we don't know about A or B (or both).

There are no other dependencies that do not follow from those listed above.

Figure 4 shows the Bayesian network for these dependencies:

Figure 4 (P.Myllymäki, et.al.)


A and B are considered dependent, when given a (possibly empty) set S that contains

some other variables of the network, if one can freely travel the arcs from A to B. If the

arcs cannot be freely traveled from A to B, A and B are not dependent given S. The

ability to travel an arc is generally independent of the direction of the arc. If S is an

empty set, one may travel the arcs forward and backward, provided that the same node is

never visited twice and that an arc traveled forward is never immediately followed by

some other arc traveled backward.

In this project, I use B-Course to perform Naive Bayesian dependency modeling

and Naive Bayesian Classification on the contraceptive method choice database.

3.2 ONE-RULE CLASSIFICATION

The one-rule algorithm creates one data mining rule for the dataset based on one

attribute (one column in a database table). After comparing the error rates from all the

attributes, it then chooses the rule that gives the lowest classification error. The rule will

assign to one category or class each distinct value of one chosen attribute. This rule can

be defined in pseudocode as (Tagbo):

For each attribute in the data set
    For each distinct value of the attribute
        Find the most frequent classification
        Assign the classification to the value
        Calculate the error rate for the value
    Calculate the total error rate for the attribute
Choose the attribute with the lowest error rate
Create one rule for the chosen attribute
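The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the Visual Basic Data Mining.Net implementation: it treats every attribute as categorical (numeric attributes would first need to be grouped into ranges), and the function name one_rule and the four sample rows are made up for the example.

```python
from collections import Counter, defaultdict


def one_rule(rows, attributes, class_name):
    """Return (attribute, error_rate, rule) for the lowest-error attribute."""
    best = None
    for attribute in attributes:
        # Count class frequencies for each distinct value of the attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[class_name]] += 1

        rule = {}
        errors = 0
        for value, class_counts in counts.items():
            top_class, top_count = class_counts.most_common(1)[0]
            rule[value] = top_class  # assign the most frequent class
            errors += sum(class_counts.values()) - top_count

        error_rate = errors / len(rows)  # total error rate for the attribute
        if best is None or error_rate < best[1]:
            best = (attribute, error_rate, rule)
    return best


# Made-up rows for illustration only.
rows = [
    {"media": 0, "wife_work": 0, "cmchoice": 1},
    {"media": 0, "wife_work": 1, "cmchoice": 1},
    {"media": 1, "wife_work": 0, "cmchoice": 3},
    {"media": 1, "wife_work": 1, "cmchoice": 3},
]
best_attribute, best_error, best_rule = one_rule(rows, ["media", "wife_work"], "cmchoice")
```

In the sample rows, media separates the two classes perfectly, so it is chosen with an error rate of 0, while wife_work misclassifies half of the rows.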


The goal of the one-rule data mining algorithm in this implementation is to map

each distinct value of the attributes wife_age, wife_ed, hus_ed, no_child, wife_rel, wife_work,

hus_oc, st_live, and media of the contraceptive method choice database to one of the classes no use,

long-term methods, or short-term methods. Afterwards, the attribute with the lowest error rate is

chosen as the best rule. In this project, I use Visual Basic Data Mining.Net to process

the results of the One-Rule Classification algorithm on the contraceptive method choice

database.

3.3 DECISION TREE

A visual aid for data mining is the decision tree. A decision tree is in essence a

flow chart of questions or data points. These questions or data points eventually lead to a

decision. Decision tree algorithms begin by finding the test that does the best job of

splitting the data among the preferred categories. At each successive level of the tree,

subsets created by the previous split are themselves split, making a path down the tree.

Each of the paths through the tree represents a rule. However, some rules are more useful

than others. In some cases, the predictive power of the entire tree can be

improved by pruning back the weaker branches. At each node of the tree, three things can

be measured: the number of records entering the node, the percentage of records

classified correctly at the node, and the way the records would be classified if it were a

leaf node. The tree continues to grow until it is no longer possible to locate more useful

ways to split the incoming records. Decision trees create a set of bins or boxes where the

data miner may place records.
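One common way to decide which test splits the data best is information gain: the reduction in class entropy achieved by splitting on an attribute. This report does not state which criterion See5 uses, so the sketch below shows information gain as an assumed, widely used example rather than See5's actual method. The sample rows are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, class_name):
    """Entropy before the split minus the weighted entropy after it."""
    labels = [row[class_name] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[class_name])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up rows for illustration only.
rows = [
    {"media": 0, "wife_work": 0, "cmchoice": 1},
    {"media": 0, "wife_work": 1, "cmchoice": 1},
    {"media": 1, "wife_work": 0, "cmchoice": 3},
    {"media": 1, "wife_work": 1, "cmchoice": 3},
]
gain_media = information_gain(rows, "media", "cmchoice")
gain_work = information_gain(rows, "wife_work", "cmchoice")
```

A tree builder of this kind would split first on the attribute with the highest gain (here media) and then repeat the calculation within each resulting subset.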


Figure 5 shows a partial binary tree for the classification of musical instruments. The

gap in the center of the row of bins corresponds to the root node of the tree. All stringed

instruments then fall to the left of the gap, and all other instruments fall to the right.

Figure 5 (Berry, Linoff, pg. 245)

In this project, I use See5 to construct decision trees and process those results for the

contraceptive method choice database.


SECTION 4: SYSTEM DESIGN

4.1 SYSTEM LAYOUT

Figure 6: Project System Flow (flow diagram; nodes: User Login, Administrator Login, Query / Manipulation, Search, Add Records, Delete, Edit, Add Users, Change Admin Password, Logs, Data Mining [Naive Bayes, Decision Tree, One Rule])

4.2 WEBSITE PRESENTATION

Figure 7: The contraceptive method choice database homepage


Figure 8: Administrator Login Page

To protect the data, only the privileged database administrator can log in to the

database to perform three of the database manipulation functions: adding users,

deleting records, and editing records. The administrator can also change the

admin password.


Figure 9: Administrator Options

After the administrator successfully logs in, administrator options are presented. These

options include: search records, change password, add records, add users, delete records,

and edit records. NOTE: Clicking the “Delete Record” button next to an entry will

delete that entry from the database.


Figure 10: Password Change Success Page


Figure 11: Add User Page

Figure 12: Add User Success Page


Figure 13: Edit Record Page


Figure 14: Edit Record Success Page


Figure 15: User Login Page

Users have privileges to add records and to search for records.

Figure 16: Bad User Login Page


Figure 17: User Request Page

Figure 18: User Request Success Page


Figure 19: Email Message

This is the email that the system automatically sends to the database administrator when a

user requests a login name and password.


Figure 20: Search Page

Figure 21: Search Page Results

Both the database administrator and users have access to the search function.


Figure 22: Add Record Page

Figure 23: Add Record Success Page

Both the database administrator and users have access to the add records function.


Figure 24: Access Log Detail

Both the database administrator and users have access to the access log feature. A count

is kept for the different types of browsers and operating systems used. The log detail

contains the time, date, host server, browser, and operating system of the computer that

accesses the system.
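The log detail and the browser and operating system counts can be modeled with a small data structure. The sketch below is illustrative only: the class name AccessEntry, its field names, and the sample entries are hypothetical and are not taken from the project's logging code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AccessEntry:
    when: datetime          # date and time of the access
    host: str               # host server of the visitor
    browser: str
    operating_system: str


def summarize(entries):
    """Count accesses per browser and per operating system."""
    browsers = Counter(entry.browser for entry in entries)
    systems = Counter(entry.operating_system for entry in entries)
    return browsers, systems


# Made-up log entries for illustration only.
log = [
    AccessEntry(datetime(2004, 1, 5, 9, 30), "host-a", "Internet Explorer", "Windows"),
    AccessEntry(datetime(2004, 1, 5, 10, 0), "host-b", "Netscape Navigator", "Unix"),
    AccessEntry(datetime(2004, 1, 6, 14, 15), "host-a", "Internet Explorer", "Windows"),
]
browser_counts, system_counts = summarize(log)
```

Keeping each access as one entry makes both the detail view and the running counts derivable from the same data.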


SECTION 5: DISCUSSION

5.1 NAIVE BAYESIAN RESULTS

B-Course was used to construct Bayesian dependency models for the

contraceptive method choice database. All variables, excluding the primary key

ID, were used in constructing the model. When the software is invoked, B-Course

searches for the most probable model for the data and returns these intermediate

results. B-Course can then continue using a search strategy of selecting models

that resemble the current best model, instead of picking models randomly from a

set. As B-Course continues, it collects a set of relatively good models and then

attempts to combine the best parts of these models so that the resulting combined

model is better than any of the original models.

After evaluating 8539 candidate models, B-Course returned the following

Bayesian network as the best model:

Figure 25: Bayesian Network (P.Myllymäki, et.al.)


B-Course was started again, evaluating 444681 more candidate models, for a grand

total of 453220 models evaluated. After searching these candidate models, B-

Course located a new Bayesian network that represents the same model as the

previous network:

Figure 26: New Bayesian Network (P.Myllymäki, et.al.)

B-Course also provides for Naive Bayesian classification. In classification

modeling, one attribute of the data is chosen as the class variable, and the other attributes

become predictor variables. The ultimate goal is to find the model that, given the values

of predictor variables, deduces the value of the class variable. Classification modeling

can also help to test whether some classes are similar or not. For example, if a model can

correctly tell the classes apart, then there must be some difference in those particular

classes. More analysis can measure how significant the differences in classes are.


B-Course merges many quantitative models to build one single classification model.

After running B-Course, 301 candidate models were evaluated. The estimated

classification accuracy of the best model found was 48.74%. On the average the correct

class received 36.56% probability. Figure 27 displays the variables B-Course found as

the best subset for predicting the class variable:

Figure 27: Classification model (P.Myllymäki, et.al.)

Figure 28: Class arc weights (P.Myllymäki, et.al.)


It was estimated that if the selected models were used, then 48.74% of future

classifications would be done correctly. B-Course built 1473 models, each of which was

constructed using all but one of the data items in the dataset. Next, each model was used to classify the

one data item not used in its construction. Out of 1473 models, 718 succeeded in

classifying the one unseen data item correctly.

A confusion matrix displays how many members of each class were predicted

to be members of each class, including the wrong ones. Figure 29 shows a confusion matrix for the Naive

Bayesian classifier, where the entries on the diagonal denote the numbers of correct

classifications.

                    Predicted
Actual        Long-term   No-use   Short-term
Long-term           102       60          171
No-use               79      319          231
Short-term           66      147          297

Figure 29: Confusion Matrix (P.Myllymäki, et.al.)
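The accuracy figure can be recomputed from the confusion matrix: the correct classifications are the diagonal entries, and accuracy is their sum divided by the total count. The sketch below uses the counts exactly as printed in Figure 29; the function names are illustrative.

```python
CLASSES = ["Long-term", "No-use", "Short-term"]

# Rows are actual classes, columns are predicted classes (Figure 29).
MATRIX = [
    [102, 60, 171],
    [79, 319, 231],
    [66, 147, 297],
]


def correct_count(matrix):
    """Sum of the diagonal: cases whose predicted class equals the actual class."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def accuracy(matrix):
    """Share of all cases that were classified correctly."""
    return correct_count(matrix) / sum(sum(row) for row in matrix)


def per_class_hit_rate(matrix):
    """For each actual class, the share of its cases predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


hits = correct_count(MATRIX)
rates = dict(zip(CLASSES, per_class_hit_rate(MATRIX)))
```

The diagonal sums to 718, matching the number of correct classifications reported above. The counts as printed add up to 1472 rather than 1473, so the ratio computed here differs very slightly from the reported 48.74%, which corresponds to 718 out of 1473.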

5.2 ONE-RULE RESULTS

Using Visual Basic Data Mining.Net software, I applied the one-rule

classification algorithm to the contraceptive method choice database. The steps used in

producing the one-rule results are as follows:

Step 1: Decide which of the attributes will be used to create the best one-rule for the dataset. Attribute ID is not chosen because it is the primary key for the database. Attribute cmchoice is not selected because it is the class attribute, containing the categories needed for classification. The remaining 9 attributes are chosen.

Step 2: List the distinct values of each attribute. These values can be seen in Figure 1.


Step 3: Find the most frequent classification for every distinct value of an attribute using the contraceptive method choice class values (no use, long-term methods, short-term methods). For example, according to the output, when no_child = 8, there were 9 cases of category no use, 7 cases of category long-term methods, and 8 cases of category short-term methods. Therefore, the most frequent classification is category no use, and a rule is made classifying 8 children as category no use (8 children -> No Use). The error rate for 8 children is the total number of times it appears in the dataset (24) minus the number of instances of its most frequent class (9), divided by the total (24). So the error rate in this case is 15 / 24 = 0.625.

Step 4: Repeat Step 3 for each distinct value of each attribute.

Step 5: Choose the attribute with the lowest error rate.

Step 6: Create a one-rule classification based on this attribute.
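The arithmetic in Step 3 can be reproduced with a few lines of Python, using the class counts quoted for no_child = 8. The function name value_error_rate is illustrative.

```python
from collections import Counter

# Class counts for no_child = 8, as quoted in Step 3.
counts = Counter({"no use": 9, "long-term": 7, "short-term": 8})


def value_error_rate(class_counts):
    """Return the most frequent class and the error rate of predicting it."""
    total = sum(class_counts.values())
    top_class, top_count = class_counts.most_common(1)[0]
    return top_class, (total - top_count) / total


predicted, error = value_error_rate(counts)
```

Here predicted is "no use" and error is 15 / 24, that is, 0.625.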

Figure 30 displays a portion of the one-rule classification output. As shown, the attribute

with the lowest error rate, which was selected as the best rule, is no_child.


Attribute  IsNumeric  BestRule  Value  L.P.B.  U.P.B.  Class  Frequency  Total
wife_work  False      False
wife_rel   False      False
wife_ed    False      False
st_liv     False      False
media      False      False
hus_oc     False      False
hus_ed     False      False
wife_age   True       False
no_child   True       True      0      0       0       1      62         62
no_child   True       True      0      0       0       1      33         34
no_child   True       True      0      0       1       1      94         95
no_child   True       True      0      1       1       2      31         31
no_child   True       True      0      1       1       3      61         61
no_child   True       True      0      1       1       1      49         49
no_child   True       True      0      1       1       2      15         15
no_child   True       True      0      1       1.5     3      26         26
no_child   True       True      0      1.5     2       1      83         83
no_child   True       True      0      2       2       2      39         39
no_child   True       True      0      2       2       3      77         77
no_child   True       True      0      2       2       1      31         31
no_child   True       True      0      2       2       2      17         17
no_child   True       True      0      2       2.5     3      29         29
no_child   True       True      0      2.5     3       1      46         46
no_child   True       True      0      3       3       2      44         44
no_child   True       True      0      3       3       3      90         90
no_child   True       True      0      3       3       1      24         24
no_child   True       True      0      3       3       2      26         26
no_child   True       True      0      3       3.5     3      29         29
no_child   True       True      0      3.5     4       1      37         37

Figure 30: One Rule Output (Tagbo)


5.3 DECISION TREE RESULTS

I performed decision tree analysis on the contraceptive method choice dataset using See5. There are 1473 instances in the dataset, with 10 attributes plus the unique identifier ID. However, this version of See5 allowed a maximum of 400 cases to be used at a time. The class attribute, cmchoice, takes one of three values (1 = no use; 2 = long-term; 3 = short-term). The numbers between 0 and 1 shown in the output represent the probability that a case meeting the given criteria belongs to the specific class (no use, long-term, short-term). The 400 cases were selected so that each contraceptive method choice class is represented by a roughly equal number of cases. Thus, for the cmc.data file, the breakdown by ID is as follows:

No use (1): ID# 1-133

Long-term (2): ID# 416-549

Short-term (3): ID# 643-776

Figure 31 is a partial screenshot of the ruleset produced by mining the attributes. A 95% confidence interval was used for all mining runs.


Figure 31: Ruleset (Quinlan)


Figure 32: Decision Tree Output (Quinlan)

See5 creates a decision tree of the results. To paraphrase, the tree can be translated in this manner:


if no_child is less than or equal to 0, then no use
else if no_child > 0
    if wife_ed = 1
        if wife_age > 36, then no use
        else if wife_age <= 36
            if st_live = 1, then no use
            if st_live = 2, then long-term
            if st_live = 3, then short-term
            if st_live = 4, then long-term
    if wife_ed = 2 ……………. (etc…)
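The branches quoted above can also be written as a small Python function. This is a sketch for illustration only: the function name `predict_partial_tree` is hypothetical, and because the source elides every branch with wife_ed of 2 or more, the function returns None for those cases rather than guessing.

```python
def predict_partial_tree(no_child, wife_ed, wife_age, st_live):
    """Predict the class using only the branches quoted in the text.

    Returns "no use", "long-term", or "short-term", or None when the
    case falls into a branch that the text does not show.
    """
    if no_child <= 0:
        return "no use"
    if wife_ed == 1:
        if wife_age > 36:
            return "no use"
        # wife_age <= 36: the tree branches on the standard of living index.
        by_st_live = {
            1: "no use",
            2: "long-term",
            3: "short-term",
            4: "long-term",
        }
        return by_st_live.get(st_live)
    # Branches for wife_ed = 2 and above are elided in the source.
    return None
```

For instance, `predict_partial_tree(2, 1, 30, 3)` follows the path no_child > 0, wife_ed = 1, wife_age <= 36, st_live = 3 and returns "short-term".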

From the decision tree, conclusions can be drawn about which contraceptive method an Indonesian woman is most likely to choose. For example, a woman with no children is predicted to choose no use. A wife with at least one child, a low educational level (wife_ed = 1), and an age above 36 is also predicted to choose no use. A wife with at least one child, a low educational level, an age of 36 or less, and a standard of living index of 2 is predicted to choose long-term methods. A wife with those same characteristics, but with a standard of living index of 3, is predicted to choose short-term methods. The remaining branches of the decision tree yield further predictions of the same kind.

In many cases, it is reasonable for classification decisions to change gradually as attribute values change. For example, a tree may test a threshold of 0.5: values less than or equal to 0.5 lead to one classification, say long-term methods, and values greater than 0.5 lead to another, say short-term methods. If the former holds, we go no further and predict long-term methods. By default, a threshold such as this is sharp, so a case with a hypothetical value of 0.49 is treated quite differently from one with a value of 0.51.

See5 contains an option to invoke fuzzy thresholds in place of sharp thresholds like the one described in the previous paragraph. A fuzzy set is a set whose elements are usually neither totally in the set nor totally out of the set (Meadow et al., p. 217). When this option is invoked, each threshold is broken into three ranges, denoted by a lower bound lb, an upper bound ub, and a central value t. If the attribute value in question is below lb or above ub, classification is made using the single branch corresponding to the '<=' or '>' result, respectively. If the value falls between lb and ub, both branches of the tree are investigated and the results are combined probabilistically. The values of lb and ub are determined by See5 based on a study of the apparent sensitivity of classification to small changes in the threshold. Figure 33 shows a screenshot of the classifier construction options, and Figure 34 displays part of the decision tree with fuzzy thresholds:

Figure 33: Classifier Construction Options (Quinlan)


Figure 34: Decision Tree Output with Fuzzy Thresholds (Quinlan)

Of note is how the upper and lower bounds of the thresholds are specified. In the non-fuzzy example, when no_child > 0 and wife_ed = 1, wife_age has a single threshold of 36: if wife_age is greater than 36, no use is returned; if wife_age is less than or equal to 36, the tree branches to the st_live attribute to determine the appropriate class. In the fuzzy example, however, there is no single cut-off. With no_child >= 1 and wife_ed = 1, no use is returned when wife_age >= 38; when wife_age <= 35, the tree branches to the st_live attribute to determine which class is predicted. The fuzzy thresholds option constructs an interval around the threshold, and within this interval both branches of the tree are explored and their results combined to give a predicted class. When 35 < wife_age < 38, the prediction is therefore less clear-cut, since it draws on both branches. A wife_age value of 36.5 is chosen as the fuzzy threshold (the central value t).
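To make the idea of combining both branches concrete, the sketch below weights the two branches according to where wife_age falls between the bounds. The bounds 35, 36.5, and 38 come from the fuzzy tree discussed above. Everything else is an assumption made for illustration: the text does not say how See5 weights the branches, so a piecewise-linear weighting is used here, and the class probabilities assigned to each branch are hypothetical. The function names `left_weight` and `combine` are likewise invented for this example.

```python
LB, T, UB = 35.0, 36.5, 38.0  # lower bound, central value, upper bound for wife_age


def left_weight(value, lb, t, ub):
    """Weight given to the '<=' branch (assumed piecewise-linear, lb < t < ub).

    1.0 at or below lb, 0.5 at the central value t, 0.0 at or above ub.
    """
    if value <= lb:
        return 1.0
    if value >= ub:
        return 0.0
    if value <= t:
        return 1.0 - 0.5 * (value - lb) / (t - lb)
    return 0.5 - 0.5 * (value - t) / (ub - t)


def combine(value, lb, t, ub, left_probs, right_probs):
    """Blend the class probabilities of the two branches by the branch weight."""
    w = left_weight(value, lb, t, ub)
    classes = set(left_probs) | set(right_probs)
    return {
        c: w * left_probs.get(c, 0.0) + (1.0 - w) * right_probs.get(c, 0.0)
        for c in classes
    }


# Hypothetical class probabilities for the two branches below wife_age.
younger_branch = {"long-term": 0.6, "short-term": 0.4}  # the '<=' side
older_branch = {"no use": 1.0}                          # the '>' side

blended = combine(36.5, LB, T, UB, younger_branch, older_branch)
predicted = max(blended, key=blended.get)
```

Outside the interval the weight is 0 or 1, so the result reduces to the single-branch behaviour of a sharp threshold; inside it, a case contributes to both branches in proportion to its distance from the bounds.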


5.4 CONCLUSION

All three data mining algorithms were successful at predicting the contraceptive method choice of an Indonesian woman based on her demographic and socio-economic characteristics. B-Course created Bayesian dependency networks for the attributes of the dataset. The estimated classification accuracy of the best model found was 48.74%. Because this accuracy is below 50%, the Naive Bayesian algorithm may not be the best model for this dataset. It is possible that creating more candidate models would increase the accuracy. One-Rule classification determined that the no_child attribute, the number of children born to an Indonesian woman, was the best rule for predicting the contraceptive method choice. The decision tree algorithm determined that the best predictor of the contraceptive method choice was the rule no_child <= 0, which predicts the no use category (95.7%). The regular decision tree had an error rate of 25.0%, while the decision tree with fuzzy thresholds had an error rate of 25.5%, so there was no significant difference between the two methods.


WORKS CITED

ASPhelp.com. “What are Active Server Pages?”. Retrieved March 8, 2003, from the World Wide Web. http://www.asp-help.com/getstarted/gs_aboutasp.asp

Berry, Michael, and Gordon Linoff. Data Mining Techniques for Marketing, Sales, and Customer Support. New York: John Wiley and Sons. 1997.

CGI: Common Gateway Interface. Retrieved March 8, 2003, from the World Wide Web. http://hoohoo.ncsa.uiuc.edu/cgi/intro.html.

Delphi 2 – Developing for Multi-Tier Distributed Computing Architectures. Retrieved March 9, 2003, from the World Wide Web. http://community.borland.com/article/0,1410,10343,00.html#three.

Directions on Microsoft. “What is an Application Server?”. Retrieved March 9, 2003, from the World Wide Web. http://www.directionsonmicrosoft.com/sample/DOMIS/research/2002/12dec/1202wiaas.htm

HTML Activity Statement. Retrieved March 8, 2003, from the World Wide Web. http://www.w3.org/MarkUp/Activity.

Lewis, David. “Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval”. Proceedings of ECML-98, 10th European Conference on Machine Learning. Florham Park, NJ: AT&T Labs Research, 1998.

Meadow, Charles, B.R. Boyce, and D.H. Kraft. Text Information Retrieval Systems, 2nd Edition. San Diego: Academic Press. 2000.

“Naïve Bayes – Introduction”. Retrieved February 5, 2003, from the World Wide Web. http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm.

O’Reilly and Associates. “Perl”. Retrieved March 8, 2003, from the World Wide Web. http://www.perldoc.com/perl5.6/pod/perl.html.

Myllymäki, P., T. Silander, H. Tirri, and P. Uronen. “B-Course: A Web-Based Tool for Bayesian and Causal Data Analysis”. International Journal on Artificial Intelligence Tools, Vol. 11, No. 3 (2002): 369-387.

Quinlan, Ross. “RuleQuest Research Data Mining Tools”. Retrieved March 18, 2003, from the World Wide Web. http://www.rulequest.com/.

Tagbo, Kingsley. “Visual Basic Data Mining.Net”. http://www.visual-basic-data-mining.net. 2002.


United Nations Population Fund - Indonesia. Retrieved March 16, 2003, from the World Wide Web. http://www.un.or.id/unfpa/idpop.html.

Wallach, Hannah. “Supervised Learning Methods”. Retrieved March 14, 2003, from the World Wide Web. http://www.srcf.ucam.org/~hmw26/coursework/dme/node14.html
