Ch0 Content
SAK 5609 DATA MINING
Assoc. Prof. Dr. Md. Nasir bin Sulaiman
Synopsis
Credit: 3(3+0)
Contact hours: 3 x 1 hour per week
Semester: I
Emphasis is on the concepts of data mining. The course covers the principles of data mining, data mining functions, data mining processes, data mining techniques (such as K-nearest neighbour and clustering algorithms, rule induction, decision tree algorithms, association rule mining, neural networks and genetic algorithms), and data mining examples. Industrial and scientific applications will be given.
Assessment & References
Assessment:
– Exercises (10%)
– Project I (15%) + presentation I (5%), Week 7
– Project II (15%) + presentation II (5%), Week 14
– Mid-exam (20%, 1 hour), Week 6
– Final exam (30%, 1.5 hours), Weeks 15-17
References:
– Jiawei Han & Micheline Kamber (2001), “Data Mining: Concepts and Techniques”, Morgan Kaufmann.
– Michael J. A. Berry & Gordon S. Linoff (2004), “Data Mining Techniques (2nd edition)”, Wiley.
– Other related articles
Course Contents
Chapter 1 Introduction
– Motivation
– Origin of data mining
– What it is/isn’t
– The KDD process
– Types of data
– Data mining tasks
  • Association rule mining, sequential rules, clustering, classification, anomaly detection
Chapter 2 Data issues
– What is a data set?
– Types of attributes
– Transformation for different types
– Types of data
  • Structured data, record data, data matrix, document data, transaction data, graph data, ordered data
– Data quality
  • Noise and outliers, missing values, inconsistent/duplicate data
Chapter 3 Data preprocessing
– Why Data Preprocessing?
– Why Is Data Preprocessing Important?
– Major Tasks in Data Preprocessing
  • Data cleaning
  • Data integration
  • Data transformation
  • Data reduction
  • Data discretization
Chapter 4 Association rule mining
– Introduction
– The Model
– Goal and Key Features
– Mining Algorithms
– Problems with the Association Rule Model
– Issues of association rules
– Other Main Works on Association Rules
Chapter 5 Classification
– Overview
– An example application
– Definition
– Classification Model
– General Approach
– Classification: A Two-Step Process
– Classification Techniques
– Evaluating classification methods
– Decision tree based classification, rule based classifiers, nearest neighbor classifiers, etc.
Chapter 6 Clustering
– Introduction
– What is/is not cluster analysis?
– Examples of clustering applications
– Concepts of clustering
– Types of data in clustering analysis
– Types of clustering: hierarchical, partitional
– Major Clustering Techniques
– Types of clusters
– Clustering algorithms
Chapter 7 Anomaly Detection
– Applications
– Causes of anomalies
– Approaches to anomaly detection
  • Statistical
  • Proximity-based outlier detection
  • Density-based outlier detection
  • Clustering-based techniques
– Issues dealing with anomalies
Chapter 8 Visualization
– What is visualization?
– Motivation for visualization
– General categories of visualization
– Representation
– Arrangement
– Selection
– Do’s and don’ts
– Visualization techniques
Chapter 9 Text mining, web mining
– Introduction
– Text processing
– Relevance judgement
– Web Search
– Search engines