Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Virtual University of Pakistan...

Data Warehousing Lecture-31 Supervised vs. Unsupervised Learning Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www.nu.edu.pk/cairindex.asp National University of Computers & Emerging Sciences, Islamabad Email: [email protected]

Data Warehousing

Lecture-31
Supervised vs. Unsupervised Learning

Virtual University of Pakistan

Ahsan Abdullah
Assoc. Prof. & Head

Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp

National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]

Data Structures in Data Mining

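• Data matrix
– Table or database
– n records and m attributes
– n ≫ m: records far outnumber attributes

$$
\begin{bmatrix}
C_{1,1} & C_{1,2} & C_{1,3} & \cdots & C_{1,m} \\
C_{2,1} & C_{2,2} & C_{2,3} & \cdots & C_{2,m} \\
C_{3,1} & C_{3,2} & C_{3,3} & \cdots & C_{3,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
C_{n,1} & C_{n,2} & C_{n,3} & \cdots & C_{n,m}
\end{bmatrix}
$$

• Similarity matrix
– Symmetric square matrix ($S_{i,j} = S_{j,i}$), with 1 on the diagonal since every item is fully similar to itself
– n x n (record-to-record similarity) or m x m (attribute-to-attribute similarity)

$$
\begin{bmatrix}
1 & S_{1,2} & S_{1,3} & \cdots & S_{1,n} \\
S_{2,1} & 1 & S_{2,3} & \cdots & S_{2,n} \\
S_{3,1} & S_{3,2} & 1 & \cdots & S_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_{n,1} & S_{n,2} & S_{n,3} & \cdots & 1
\end{bmatrix}
$$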
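To make the two structures concrete, here is a minimal Python sketch that derives similarity matrices from a small data matrix. The slide does not name a similarity measure, so cosine similarity is an assumption here; the data values and the helper names `cosine_similarity` and `similarity_matrix` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (assumed measure; 0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_matrix(rows: Matrix) -> Matrix:
    """Symmetric square matrix of pairwise similarities, with 1 on the diagonal."""
    n = len(rows)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
        for j in range(i + 1, n):
            # Compute each pair once and mirror it, since S[i][j] == S[j][i]
            sim[i][j] = sim[j][i] = cosine_similarity(rows[i], rows[j])
    return sim

# Hypothetical data matrix: n = 4 records, m = 2 attributes
data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]

record_sim = similarity_matrix(data)                               # n x n
attribute_sim = similarity_matrix([list(c) for c in zip(*data)])   # m x m (columns as vectors)

for row in record_sim:
    print([round(s, 2) for s in row])
```

Records 0 and 3 point in the same direction, so their cosine similarity is 1.0 even though their values differ; a different measure (for example one derived from Euclidean distance) would give different numbers.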

Main Types of Data Mining

Supervised: type and number of classes are known in advance
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.

Unsupervised: type and number of classes are NOT known in advance
• One-way Clustering
• Two-way Clustering

Clustering: Min-Max Distance

[Figure: records plotted by Age and Salary, forming clusters, with one point marked as an outlier.]

• Intra-cluster distances are minimized
• Inter-cluster distances are maximized
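The two criteria on this slide can be measured directly. The Python sketch below scores two hypothetical groupings of the same (age, salary) points: a good grouping should give a smaller average intra-cluster distance and a larger inter-cluster distance. Using Euclidean distance, and measuring inter-cluster distance between centroids, are assumptions; the slide does not fix either choice.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]

def mean_intra_distance(clusters: List[List[Point]]) -> float:
    """Average Euclidean distance over all pairs of points that share a cluster."""
    dists = [math.dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dists) / len(dists)  # assumes at least one cluster has 2+ points

def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a cluster."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def min_inter_distance(clusters: List[List[Point]]) -> float:
    """Smallest Euclidean distance between any two cluster centroids."""
    cents = [centroid(c) for c in clusters]
    return min(math.dist(a, b) for a, b in combinations(cents, 2))

# Hypothetical (age, salary in thousands) records, grouped two different ways
good = [[(22, 20), (25, 22), (24, 21)], [(45, 60), (48, 62), (50, 58)]]
poor = [[(22, 20), (48, 62), (24, 21)], [(45, 60), (25, 22), (50, 58)]]

for name, grouping in (("good", good), ("poor", poor)):
    print(name, round(mean_intra_distance(grouping), 2), round(min_inter_distance(grouping), 2))
```

The features are not rescaled here, so salary differences dominate the distances; in practice attributes on different scales are usually normalized before clustering.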

How Does Clustering Work?

One-way clustering example

[Figure: INPUT and clustered OUTPUT panels.]

Black spots are noise; white spots are missing data.
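One-way clustering rearranges the data along one dimension only: here the rows (records) are reordered so that records with similar patterns sit next to each other, while the column order stays fixed. The Python sketch below uses a greedy nearest-neighbour ordering on Hamming distance over a hypothetical 0/1 matrix; this is a simple stand-in chosen for illustration, not the algorithm used to produce the slide's output, and `hamming` and `one_way_order` are hypothetical names.

```python
from typing import List

Row = List[int]

def hamming(a: Row, b: Row) -> int:
    """Number of positions where two equal-length rows differ."""
    return sum(x != y for x, y in zip(a, b))

def one_way_order(matrix: List[Row]) -> List[int]:
    """Greedy nearest-neighbour ordering of ROWS only; columns are left unchanged.

    Start from row 0, then repeatedly append the unplaced row closest
    (in Hamming distance) to the last row placed, so similar rows end up adjacent.
    """
    order = [0] if matrix else []
    remaining = list(range(1, len(matrix)))
    while remaining:
        last = matrix[order[-1]]
        nxt = min(remaining, key=lambda i: hamming(last, matrix[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical binary data matrix (rows = records, columns = attributes)
data = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

order = one_way_order(data)
for i in order:
    print(i, data[i])
```

Two-way clustering would additionally reorder the columns, so that blocks of similar values appear along both dimensions.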

Data Mining Agriculture Data

[Figure: INPUT agriculture data and the clustered OUTPUT, with the clusters marked.]

Classification

[Figure: Unseen Data is passed to a Classifier (model), which answers "Which class?"]

How Does Classification Work?

[Figure: Inputs go into the classifier, which produces an Output together with a Confidence Level.]
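One common way for a classifier to attach a confidence level to its output is to report how strongly the evidence agrees. The Python sketch below uses k-nearest-neighbour voting, which is an assumption (the slide does not name a method): the predicted class is the majority label among the k closest training records, and the confidence is the fraction of those neighbours that voted for it. It reuses the (Time, Items) shopping records from the Model Construction slide; `knn_predict` and k = 3 are hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Tuple

# (Time, Items) -> Gender, taken from the training table in this lecture
TRAINING: List[Tuple[Tuple[float, float], str]] = [
    ((10, 2), "M"), ((16, 3), "M"), ((15, 1), "F"),
    ((5, 1), "M"), ((20, 1), "F"), ((20, 4), "M"),
]

def knn_predict(x: Tuple[float, float], training=TRAINING, k: int = 3) -> Tuple[str, float]:
    """Return (predicted class, confidence level) by majority vote of the k nearest records."""
    nearest = sorted(training, key=lambda rec: math.dist(x, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k

label, confidence = knn_predict((15, 1))
print(label, round(confidence, 2))  # F 0.67
```

For (Time = 15, Items = 1) the three nearest records are Meher (F), Munir (M) and Mahin (F), so the output is F with confidence 2/3. The raw features are not scaled, so Time influences the distance far more than Items.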

Classification Process (1): Model Construction

Training Data (observations, measurements, etc.): relationship between shopping time and items bought

NAME    Time  Items  Gender
Moin    10    2      M
Munir   16    3      M
Meher   15    1      F
Javed   5     1      M
Mahin   20    1      F
Akram   20    4      M

Training Data → Classification Algorithms → Classifier (Model)

Classifier (Model): IF time/items >= 6 THEN gender = ‘F’
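The learned rule is simple enough to check by hand, and the Python sketch below does exactly that: it encodes the training table and the slide's rule, then counts how many training records the rule labels correctly. The slide states only the 'F' branch, so labelling every other record 'M' is an assumption; the names `classify` and `accuracy` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, str]  # (name, time, items, gender)

TRAINING: List[Record] = [
    ("Moin", 10, 2, "M"), ("Munir", 16, 3, "M"), ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"), ("Mahin", 20, 1, "F"), ("Akram", 20, 4, "M"),
]

def classify(time: float, items: int) -> str:
    """Slide rule: IF time/items >= 6 THEN 'F'; the 'M' default is an assumption (items > 0)."""
    return "F" if time / items >= 6 else "M"

def accuracy(records: List[Record]) -> float:
    """Fraction of records whose actual gender matches the rule's prediction."""
    correct = sum(classify(t, i) == g for _, t, i, g in records)
    return correct / len(records)

print(accuracy(TRAINING))  # 1.0: the rule fits all six training records
```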

Classification Process (2): Use the Model in Prediction

Testing Data

NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M

Unseen Data: (Firdous, Time = 15, Items = 1)

Testing Data and Unseen Data → Classifier → Gender?
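Applying the same rule to the testing data shows why a separate test set matters. In the Python sketch below, which repeats the rule (with the same assumed 'M' default) so it runs on its own, Tahir (20/1 = 20) is labelled 'F' although the table records 'M', so the rule is right on only two of the three test records; the unseen record for Firdous (15/1 = 15) is classified as 'F'.

```python
from typing import List, Tuple

Record = Tuple[str, int, int, str]  # (name, time, items, gender)

TESTING: List[Record] = [("Tahir", 20, 1, "M"), ("Younas", 11, 2, "M"), ("Yasin", 3, 1, "M")]

def classify(time: float, items: int) -> str:
    """Rule from model construction; 'M' for the ELSE branch is an assumption."""
    return "F" if time / items >= 6 else "M"

predictions = [classify(t, i) for _, t, i, _ in TESTING]
correct = sum(p == g for p, (_, _, _, g) in zip(predictions, TESTING))
print(predictions)                                 # ['F', 'M', 'M']
print(f"test accuracy: {correct}/{len(TESTING)}")  # test accuracy: 2/3

# Unseen record: (Firdous, Time = 15, Items = 1)
print("Firdous ->", classify(15, 1))               # Firdous -> F
```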

Clustering vs. Cluster Detection

Clustering vs. Cluster Detection Example

[Figure: two panels, A and B.]

K-Means Clustering

K-Means Clustering: Example

[Figure: four scatter plots with both axes running from 0 to 10, arranged as panels A and B (top row) and D and C (bottom row).]
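The example panels can be reproduced with a short implementation of the standard K-means loop (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when no assignment changes. The Python sketch below runs it on hypothetical points on the same 0 to 10 grid with k = 2; passing the initial centroids in explicitly (here, deliberately poor ones) is an assumption made so the run is reproducible, and `kmeans` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd's K-means from the given initial centroids. Returns (centroids, labels)."""
    centroids = list(centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Hypothetical points on a 0..10 grid; the first two points serve as initial centroids
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9), (2, 2)]
centroids, labels = kmeans(points, [(1, 1), (2, 1)])
print(labels)                                               # [0, 0, 0, 1, 1, 1, 0]
print([tuple(round(v, 2) for v in c) for c in centroids])   # [(1.5, 1.5), (8.33, 8.33)]
```

With this start, the first pass lumps (2, 2) and the far group together, and the centroids then shift until the two natural groups separate. The result depends on the initial centroids, which is why K-means is often rerun from several starting points.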

K-Means Clustering: Comment