Lecture 31
Data Warehousing
Lecture-31: Supervised vs. Unsupervised Learning
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]
Data Structures in Data Mining
• Data matrix
  – Table or database
  – n records and m attributes
  – n >> m

    C1,1  C1,2  C1,3  …  C1,m
    C2,1  C2,2  C2,3  …  C2,m
    C3,1  C3,2  C3,3  …  C3,m
     .     .     .        .
     .     .     .        .
    Cn,1  Cn,2  Cn,3  …  Cn,m

• Similarity matrix
  – Symmetric square matrix
  – n x n or m x m

    1     S1,2  S1,3  …  S1,n
    S2,1  1     S2,3  …  S2,n
    S3,1  S3,2  1     …  S3,n
     .     .     .        .
     .     .     .        .
    Sn,1  Sn,2  Sn,3  …  1
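As a rough illustration of how the two structures relate, the sketch below derives a similarity matrix from a small data matrix. The slide does not name a similarity measure, so the function 1 / (1 + Euclidean distance) is an assumption chosen only because it gives 1 on the diagonal and a symmetric result; the data values and the names `data`, `similarity` and `similarity_matrix` are hypothetical.

```python
import math

# Hypothetical data matrix: n = 4 records (rows), m = 2 attributes (columns).
data = [
    [1.0, 2.0],
    [2.0, 2.0],
    [8.0, 9.0],
    [9.0, 8.0],
]

def similarity(a, b):
    """Assumed measure: 1 / (1 + Euclidean distance); equals 1 when a == b."""
    return 1.0 / (1.0 + math.dist(a, b))

def similarity_matrix(rows):
    """Compare every row with every other row, giving a square matrix."""
    return [[similarity(r, s) for s in rows] for r in rows]

def transpose(rows):
    """Swap rows and columns so that attributes can be compared instead."""
    return [list(col) for col in zip(*rows)]

S = similarity_matrix(data)                   # n x n: record vs. record
S_attr = similarity_matrix(transpose(data))   # m x m: attribute vs. attribute

for row in S:
    print("  ".join(f"{value:.2f}" for value in row))
```

Comparing rows gives the n x n case and comparing columns gives the m x m case, which is why the slide lists both sizes.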
Main types of DATA MINING
• Supervised: type and number of classes are known in advance
  – Bayesian Modeling
  – Decision Trees
  – Neural Networks
  – Etc.
• Unsupervised: type and number of classes are NOT known in advance
  – One-way Clustering
  – Two-way Clustering
Clustering: Min-Max Distance
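[Figure: scatter plot with axes Age and Salary (tick marks at 20, 40, 60), showing clusters and one point marked as an outlier]
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized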
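The min-max idea can be checked numerically. The sketch below is only an illustration under assumptions: the points are made-up (age, salary in thousands) pairs, intra-cluster distance is taken as the mean pairwise Euclidean distance within a cluster, and inter-cluster distance as the distance between centroids. The slide does not fix either definition.

```python
import math
from itertools import combinations

# Hypothetical (age, salary in thousands) points, already grouped into clusters.
clusters = {
    "young": [(22, 20), (25, 24), (27, 22)],
    "senior": [(52, 60), (55, 64), (58, 61)],
}

def mean_intra_distance(points):
    """Mean pairwise distance inside one cluster (assumed definition)."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def centroid(points):
    """Attribute-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

intra = {name: mean_intra_distance(pts) for name, pts in clusters.items()}
inter = math.dist(centroid(clusters["young"]), centroid(clusters["senior"]))

print("intra-cluster:", {name: round(d, 2) for name, d in intra.items()})
print("inter-cluster:", round(inter, 2))
```

A good clustering in the sense of the slide is one where the inter-cluster figure is large relative to every intra-cluster figure.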
Classification Process (1): Model Construction
Training Data (observations, measurements, etc.)

NAME   Time  Items  Gender
Moin   10    2      M
Munir  16    3      M
Meher  15    1      F
Javed  5     1      M
Mahin  20    1      F
Akram  20    4      M

Classification Algorithms

Classifier (Model)
IF time/items >= 6 THEN gender = 'F'

Relationship between shopping time and items bought
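To see that the rule is consistent with the training data, it can be applied to each training record. This is a minimal sketch: the function name `classify` is hypothetical, and returning 'M' when the condition fails is an assumption, since the slide states only the 'F' branch.

```python
# Training records from the slide: (name, shopping time, items bought, gender).
training_data = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]

def classify(time, items):
    """Rule from the slide; the 'M' fallback is an assumption."""
    return "F" if time / items >= 6 else "M"

correct = 0
for name, time, items, gender in training_data:
    predicted = classify(time, items)
    correct += predicted == gender
    print(f"{name:6} time/items = {time / items:5.2f}  predicted {predicted}  actual {gender}")

training_accuracy = correct / len(training_data)
print("training accuracy:", training_accuracy)
```

Every training record has a time/items ratio either at or above 15 (both 'F' records) or below 6 (all 'M' records), so the threshold of 6 separates them.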
Classification Process (2): Use the Model in Prediction
Testing Data

NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M

Unseen Data
(Firdous, Time = 15, Items = 1)

Classifier

Gender?
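The prediction step can be sketched the same way: score the model on the labeled testing data, then apply it to the unseen record. As before, the name `classify` is hypothetical and the 'M' fallback is an assumption, because the slide's rule states only when gender is 'F'.

```python
# Testing records from the slide: (name, shopping time, items bought, gender).
testing_data = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]

def classify(time, items):
    """Rule from the slide; the 'M' fallback is an assumption."""
    return "F" if time / items >= 6 else "M"

# Estimate accuracy on records whose true label is known.
hits = [classify(time, items) == gender for _, time, items, gender in testing_data]
test_accuracy = sum(hits) / len(hits)
print("test accuracy:", round(test_accuracy, 2))

# Apply the model to the unseen record (Firdous, Time = 15, Items = 1).
firdous_gender = classify(15, 1)
print("Firdous ->", firdous_gender)
```

Under this rule Tahir (ratio 20) is misclassified as 'F', so the model is right on two of the three testing records, and the unseen record with ratio 15 is predicted as 'F'.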
The K-Means Clustering: Example
[Figure: four scatter-plot panels labeled A, B (top row) and D, C (bottom row); both axes of each panel run from 0 to 10]
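The plotted points are not recoverable from the transcript, so the sketch below runs the standard K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) on a made-up data set in the same 0 to 10 range. The points, the starting centroids, and the function name `kmeans` are all assumptions for illustration and are not the values shown in the slide.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate assignment and update until centroids stop moving."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]          # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical points on a 0..10 grid, with deliberately poor starting centroids.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
final_centroids, final_clusters = kmeans(points, [(1, 1), (2, 1)])

print("centroids:", final_centroids)
print("cluster sizes:", [len(c) for c in final_clusters])
```

Both starting centroids sit in the lower-left group, yet after a few iterations one of them migrates to the upper-right group, which is the kind of step-by-step movement a multi-panel K-means figure usually depicts.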