
Data Mining: Classification

Classification

• What is Classification?
– Classifying tuples in a database
– In the training set E
  • each tuple consists of the same set of attributes as the tuples in the large database W
  • additionally, each tuple has a known class identity
– Derive the classification mechanism from the training set E, and then use this mechanism to classify general data (in W)

Learning Phase

• Learning
– Training data are analyzed by a classification algorithm
– The class label attribute is credit_rating
– The classifier is represented in the form of classification rules

Testing Phase

• Testing (Classification)
– Test data are used to estimate the accuracy of the classification rules
– If the accuracy is considered acceptable, the rules can be applied to classify new data tuples
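The testing phase can be sketched in a few lines of Python. Everything in this example is hypothetical: the attribute names, the single classification rule, and the four test tuples are made up for illustration and are not taken from the original slides. It only shows how accuracy is estimated by comparing predicted labels with the known labels of held-out test tuples.

```python
# Hypothetical example: the rule, attribute names, and tuples are illustrative only.

def classify(tup):
    """Toy rule-based classifier: IF income = high THEN excellent ELSE fair."""
    return "excellent" if tup["income"] == "high" else "fair"


def accuracy(classifier, test_set):
    """Fraction of test tuples whose predicted label matches the known label."""
    correct = sum(1 for tup, label in test_set if classifier(tup) == label)
    return correct / len(test_set)


# Each test tuple carries its known class label (credit_rating).
test_set = [
    ({"age": "31..40", "income": "high"}, "excellent"),
    ({"age": "<=30", "income": "low"}, "fair"),
    ({"age": ">40", "income": "medium"}, "fair"),
    ({"age": "<=30", "income": "high"}, "fair"),  # the rule misclassifies this one
]

acc = accuracy(classify, test_set)
print(f"estimated accuracy: {acc:.2f}")
```

With these made-up tuples the rule gets three of four right, so the estimate is 0.75. Whether that is "acceptable" is a threshold the analyst chooses.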

Classification by Decision Tree

A top-down decision tree generation algorithm: ID3 and its extended version C4.5 (Quinlan '93). Reference: J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.

Decision Tree Generation

• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
• Attribute Selection
– Favoring the partitioning which makes the majority of examples belong to a single class
• Tree Pruning (Overfitting Problem)
– Aiming at removing tree branches that may lead to errors when classifying test data
  • Training data may contain noise, …

No.  Eye    Hair   Height  Oriental
1    Black  Black  Short   Yes
2    Black  White  Tall    Yes
3    Black  White  Short   Yes
4    Black  Black  Tall    Yes
5    Brown  Black  Tall    Yes
6    Brown  White  Short   Yes
7    Blue   Gold   Tall    No
8    Blue   Gold   Short   No
9    Blue   White  Tall    No
10   Blue   Black  Short   No
11   Brown  Gold   Short   No

Another Example

• After the analysis, can you classify the following patterns?
– (Black, Gold, Tall)
– (Blue, White, Short)

• Example distributions

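Eye \ Hair, Height   Black Short  Black Tall  White Short  White Tall  Gold Short  Gold Tall
Black                +            +           +            +           .           ?
Brown                .            +           +            .           -           .
Blue                 -            .           ?            -           -           -

(+ = Oriental, - = not Oriental, ? = pattern to classify, . = no training example)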

Decision Tree

Decision Tree Generation

• Attribute Selection (Split Criterion)
– Information Gain (ID3/C4.5/See5)
– Gini Index (CART/IBM Intelligent Miner)
– Inference Power
• These measures are also called goodness functions and are used to select the attribute to split on at a tree node during the tree generation phase
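As an illustration of a goodness function at work, the sketch below computes information gain for each attribute of the eleven-tuple Eye/Hair/Height training table from this deck. It uses the standard entropy-based definition of information gain; the function and variable names are chosen here for the example and are not from the slides.

```python
import math
from collections import Counter

ATTRIBUTES = ("Eye", "Hair", "Height")

# (Eye, Hair, Height, Oriental) tuples copied from the training table.
TRAINING = [
    ("Black", "Black", "Short", "Yes"),
    ("Black", "White", "Tall", "Yes"),
    ("Black", "White", "Short", "Yes"),
    ("Black", "Black", "Tall", "Yes"),
    ("Brown", "Black", "Tall", "Yes"),
    ("Brown", "White", "Short", "Yes"),
    ("Blue", "Gold", "Tall", "No"),
    ("Blue", "Gold", "Short", "No"),
    ("Blue", "White", "Tall", "No"),
    ("Blue", "Black", "Short", "No"),
    ("Brown", "Gold", "Short", "No"),
]


def entropy(rows):
    """Shannon entropy (in bits) of the class labels in rows."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr_index):
    """Entropy of rows minus the size-weighted entropy of its partitions."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder


gains = {name: information_gain(TRAINING, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, g in gains.items():
    print(f"{name}: {g:.3f}")
print("split on:", best)
```

On this table the gains come out at roughly 0.74 for Eye, 0.40 for Hair, and 0.01 for Height, so Eye would be chosen at the root: Black eyes are always "Yes" and Blue eyes always "No", leaving only the Brown branch mixed.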

Decision Tree Generation

• Branching Scheme
– Determining the tree branch to which a sample belongs
– Binary vs. K-ary Splitting
• When to stop the further splitting of a node
– Impurity Measure
• Labeling Rule
– A node is labeled as the class to which most samples at the node belong
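The three design decisions above can be sketched as small helper functions. This is a minimal illustration under stated assumptions: the slides do not say which impurity measure is meant, so misclassification rate is used here as one simple choice, and the helper names and the three sample rows are hypothetical.

```python
from collections import Counter


def kary_split(rows, attr):
    """K-ary branching: one branch per distinct value of the attribute."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attr], []).append(row)
    return branches


def binary_split(rows, test):
    """Binary branching: rows passing the test go left, the rest go right."""
    left = [r for r in rows if test(r)]
    right = [r for r in rows if not test(r)]
    return left, right


def impurity(labels):
    """Misclassification rate: share of samples outside the majority class."""
    return 1 - Counter(labels).most_common(1)[0][1] / len(labels)


def should_stop(labels, max_impurity=0.0):
    """Stop splitting once the node's impurity is at or below the threshold."""
    return impurity(labels) <= max_impurity


def majority_label(labels):
    """Labeling rule: the class to which most samples at the node belong."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical sample rows, for illustration only.
rows = [
    {"Hair": "Black", "cls": "Yes"},
    {"Hair": "White", "cls": "Yes"},
    {"Hair": "Gold", "cls": "No"},
]

branches = kary_split(rows, "Hair")
left, right = binary_split(rows, lambda r: r["Hair"] == "Gold")
right_labels = [r["cls"] for r in right]
all_labels = [r["cls"] for r in rows]
print(len(branches), len(left), len(right))
print(should_stop(right_labels), majority_label(right_labels))
```

A k-ary split on Hair yields three branches here, while the binary test "Hair = Gold" yields two. The right-hand node is pure, so splitting stops there and the node takes its majority label.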

Decision Tree Generation Algorithm: ID3

(7.1) Entropy
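The equation labeled (7.1) did not survive in this transcript. For reference, the standard definitions of entropy and information gain associated with ID3 are given below. The symbols (S for the set of examples at a node, p_i for the relative frequency of class i, A for a candidate attribute, S_v for the subset of S where A takes value v) are chosen here and may differ from the original slide.

```latex
\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i

\mathrm{Gain}(S, A) = \mathrm{Entropy}(S)
  - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)
```

Entropy is 0 for a node whose examples all belong to one class and reaches its maximum, \log_2 n, when the n classes are equally frequent.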

ID: Iterative Dichotomiser

Decision Tree Algorithm: ID3
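The algorithm listing for this slide is missing from the transcript. As a rough sketch of the ID3 idea (recursive top-down partitioning, choosing the attribute with the highest information gain at each node), the code below builds a tree from the Eye/Hair/Height training table and classifies the two query patterns. The tree representation, function names, and the "unknown" default are assumptions made for this example, and pruning is not included.

```python
import math
from collections import Counter

ATTRS = ("Eye", "Hair", "Height")

# (Eye, Hair, Height, Oriental) tuples copied from the training table.
DATA = [
    ("Black", "Black", "Short", "Yes"), ("Black", "White", "Tall", "Yes"),
    ("Black", "White", "Short", "Yes"), ("Black", "Black", "Tall", "Yes"),
    ("Brown", "Black", "Tall", "Yes"), ("Brown", "White", "Short", "Yes"),
    ("Blue", "Gold", "Tall", "No"), ("Blue", "Gold", "Short", "No"),
    ("Blue", "White", "Tall", "No"), ("Blue", "Black", "Short", "No"),
    ("Brown", "Gold", "Short", "No"),
]


def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def partition(rows, idx):
    """Group rows by their value for the attribute at position idx."""
    parts = {}
    for r in rows:
        parts.setdefault(r[idx], []).append(r)
    return parts


def gain(rows, idx):
    parts = partition(rows, idx)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(rows) - remainder


def id3(rows, attr_idxs):
    """Return a class label (leaf) or an (attribute index, branches) pair."""
    labels = Counter(r[-1] for r in rows)
    # Stop when the node is pure or no attributes remain; label by majority.
    if len(labels) == 1 or not attr_idxs:
        return labels.most_common(1)[0][0]
    best = max(attr_idxs, key=lambda i: gain(rows, i))
    rest = [i for i in attr_idxs if i != best]
    return (best, {v: id3(p, rest) for v, p in partition(rows, best).items()})


def classify(tree, tup, default="unknown"):
    """Walk from the root to a leaf; unseen attribute values get the default."""
    while isinstance(tree, tuple):
        idx, branches = tree
        if tup[idx] not in branches:
            return default
        tree = branches[tup[idx]]
    return tree


tree = id3(DATA, [0, 1, 2])
print("root attribute:", ATTRS[tree[0]])
print(classify(tree, ("Black", "Gold", "Tall")))
print(classify(tree, ("Blue", "White", "Short")))
```

On this table the sketch splits on Eye at the root and on Hair under the Brown branch. It therefore classifies (Black, Gold, Tall) as "Yes" and (Blue, White, Short) as "No".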

Another Example


Gini Index

• If a data set T contains examples from n classes, the gini index, gini(T), is defined as

$$\mathit{gini}(T) = 1 - \sum_{j=1}^{n} p_j^2$$

where p_j is the relative frequency of class j in T.

• If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data, gini_split(T), is defined as

$$\mathit{gini}_{split}(T) = \frac{N_1}{N}\,\mathit{gini}(T_1) + \frac{N_2}{N}\,\mathit{gini}(T_2)$$

where N = N1 + N2 is the size of T.
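A short worked example may help here. The sketch below evaluates the gini index and the gini index of a binary split, using class counts from the Eye/Hair/Height training table (6 "Yes", 5 "No") and the candidate split "Eye = Blue" versus the rest. The function names are chosen for this example.

```python
from collections import Counter


def gini(labels):
    """gini(T) = 1 - sum of squared relative class frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(t1, t2):
    """Size-weighted gini index of a binary split into t1 and t2."""
    n = len(t1) + len(t2)
    return len(t1) / n * gini(t1) + len(t2) / n * gini(t2)


# Class labels of the whole training table: 6 Yes, 5 No.
t = ["Yes"] * 6 + ["No"] * 5
# Split on Eye = Blue: the four Blue tuples are all No,
# the remaining seven tuples are 6 Yes and 1 No.
t1 = ["No"] * 4
t2 = ["Yes"] * 6 + ["No"] * 1

print(f"gini(T)       = {gini(t):.4f}")
print(f"gini_split(T) = {gini_split(t1, t2):.4f}")
```

Here gini(T) = 60/121, about 0.496, and the split lowers it to 84/539, about 0.156. A lower split value indicates purer subsets, so a candidate split with a smaller gini_split is preferred.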

Inference Power of an Attribute

• A feature that is useful in inferring the group identity of a data tuple is said to have a good inference power to that group identity.

• In Table 1, given attributes (features) “Gender”, “Beverage”, “State”, try to find their inference power to “Group id”

Generating Classification Rules
