Ch0 Content

SAK 5609 DATA MINING
Prof. Madya Dr. Md. Nasir bin Sulaiman
[email protected]
03-89466507, 012-6323430

Synopsis
Credit: 3 (3+0)
Contact hours: 3 x 1 hour per week
Semester: I

The course emphasizes the concepts of data mining. It covers the principles of data mining, data mining functions, data mining processes, data mining techniques (such as k-nearest neighbour and clustering algorithms, rule induction, decision tree algorithms, association rule mining, neural networks and genetic algorithms) and data mining examples. Industrial and scientific applications will be given.

Assessment & References

Assessment:
– Exercises (10%)
– Project I (15%) + presentation I (5%), Week 7
– Project II (15%) + presentation II (5%), Week 14
– Mid-exam (20%, 1 hour), Week 6
– Final exam (30%, 1.5 hours), Weeks 15–17

References:
– Jiawei Han & Micheline Kamber (2001), “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
– Michael J. A. Berry & Gordon S. Linoff (2004), “Data Mining Techniques” (2nd edition), Wiley.
– Other related articles.

Course contents: Chapter 1 Introduction
– Motivation
– Origin of data mining
– What data mining is and is not
– The KDD process
– Types of data
– Data mining tasks
  • Association rule mining, sequential rules, clustering, classification, anomaly detection

Course contents: Chapter 2 Data issues
– What is a data set?
– Types of attributes
– Transformations for different attribute types
– Types of data
  • Structured data, record data, data matrix, document data, transaction data, graph data, ordered data
– Data quality
  • Noise and outliers, missing values, inconsistent/duplicate data
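
The data-quality issues listed for Chapter 2 can be illustrated with a small Python sketch. It assumes each record is a plain dictionary in which `None` marks a missing value; the function name `quality_report` and the sample records are hypothetical. It only detects exact duplicates, so inconsistent near-duplicates would need extra matching rules.

```python
def quality_report(records):
    """Count missing values per attribute and collect exact duplicate records.

    Hypothetical helper: assumes records are dicts and None means "missing".
    """
    missing = {}
    seen = set()
    duplicates = []
    for rec in records:
        # Missing values: count None entries per attribute name.
        for attr, value in rec.items():
            if value is None:
                missing[attr] = missing.get(attr, 0) + 1
        # Exact duplicates: same attribute/value pairs as an earlier record.
        key = tuple(sorted(rec.items()))
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
    return missing, duplicates


# Hypothetical sample data for illustration only.
records = [
    {"id": 1, "age": 25, "income": 3000},
    {"id": 2, "age": None, "income": 4200},
    {"id": 1, "age": 25, "income": 3000},
    {"id": 3, "age": 41, "income": None},
]
missing, dups = quality_report(records)
```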

Course contents: Chapter 3 Data preprocessing
– Why data preprocessing?
– Why is data preprocessing important?
– Major tasks in data preprocessing
  • Data cleaning
  • Data integration
  • Data transformation
  • Data reduction
  • Data discretization
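
Two of the preprocessing tasks above, data transformation and data discretization, are commonly illustrated with min-max normalization and equal-width binning. The sketch below assumes numeric lists as input; the function names are hypothetical and these are only two of many possible transformation and discretization schemes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: every value maps to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


ages = [20, 30, 40, 60]
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 2)
```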

Course contents: Chapter 4 Association rule mining
– Introduction
– The model
– Goal and key features
– Mining algorithms
– Problems with the association rule model
– Issues with association rules
– Other major work on association rules
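
The association rule model rests on two measures, support and confidence. The following sketch computes both for a rule X → Y over a list of transactions; the function names and the market-basket data are hypothetical, and it shows only the model itself, not a mining algorithm such as Apriori.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(X u Y) / support(X)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
s = support({"milk", "diapers"}, transactions)
c = confidence({"milk", "diapers"}, {"beer"}, transactions)
```

Here {milk, diapers} appears in 3 of 5 transactions and {milk, diapers, beer} in 2 of 5, so the rule {milk, diapers} → {beer} has confidence 2/3.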

Course contents: Chapter 5 Classification
– Overview
– An example application
– Definition
– Classification model
– General approach
– Classification: a two-step process
– Classification techniques
– Evaluating classification methods
– Decision-tree-based classification, rule-based classifiers, nearest-neighbour classifiers, etc.
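
Nearest-neighbour classification can be sketched in a few lines: sort the training examples by Euclidean distance to the query and take a majority vote among the k closest. The function name `knn_predict` and the two-class toy data are hypothetical; real use would also need feature scaling and a tie-breaking policy.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote of its k nearest training examples.

    train: list of (feature_vector, label) pairs (hypothetical format).
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
train = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B"),
]
```

An odd k with two classes avoids vote ties; `math.dist` requires Python 3.8 or later.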

Course contents: Chapter 6 Clustering
– Introduction
– What cluster analysis is and is not
– Examples of clustering applications
– Concepts of clustering
– Types of data in cluster analysis
– Types of clustering: hierarchical, partitional
– Major clustering techniques
– Types of clusters
– Clustering algorithms
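
As an example of partitional clustering, the sketch below implements plain k-means (Lloyd's iteration): assign each point to its nearest centroid, recompute centroids as cluster means, and stop when centroids no longer change. The function name, the seeded random initialization and the sample points are assumptions for illustration; results of k-means in general depend on the initial centroids.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, 2)
```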

Course contents: Chapter 7 Anomaly detection
– Applications
– Causes of anomalies
– Approaches to anomaly detection
  • Statistical
  • Proximity-based outlier detection
  • Density-based outlier detection
  • Clustering-based techniques
– Issues in dealing with anomalies
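
A simple statistical approach flags values whose z-score, the distance from the mean in standard deviations, exceeds a threshold. This sketch assumes roughly normally distributed one-dimensional data; the function name and the threshold of 3 are common illustrative choices, not a prescribed method.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return values whose |z-score| exceeds threshold (hypothetical helper)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical readings: mostly 9-11 with one extreme value.
values = [9, 10, 11] * 6 + [50]
outliers = z_score_outliers(values)
```

One of the issues in dealing with anomalies shows up here: the outlier itself inflates the mean and standard deviation, so in very small samples an extreme value can fall below the threshold (masking).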

Course contents: Chapter 8 Visualization
– What is visualization?
– Motivation for visualization
– General categories of visualization
– Representation
– Arrangement
– Selection
– Do's and don'ts
– Visualization techniques

Course contents: Chapter 9 Text mining, web mining
– Introduction
– Text processing
– Relevance judgement
– Web search
– Search engines
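
Text processing and relevance ranking for search can be illustrated with a TF-IDF score: each query term contributes its frequency in a document times log(N / df), where df is the number of documents containing the term. The tokenizer, the function name `tf_idf_rank` and the sample documents are hypothetical, and this is only one of several TF-IDF weighting variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_rank(query, documents):
    """Return document indices ordered by TF-IDF score for the query."""
    tokenized = [tokenize(d) for d in documents]
    n = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if df[term]:
                score += tf[term] * math.log(n / df[term])
        scores.append((score, i))
    # Highest score first; the stable sort keeps ties in original order.
    return [i for _, i in sorted(scores, key=lambda s: -s[0])]


# Hypothetical mini-collection.
docs = [
    "data mining finds patterns in large data sets",
    "web search engines rank web pages",
    "text mining extracts information from text",
]
ranking = tf_idf_rank("text mining", docs)
```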