Web Page Clustering Using a Fuzzy Logic Based Representation and Self-Organizing Maps

Post on 24-May-2015

http://nlp.uned.es/~alpgarcia/pub_index.php

Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps

Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez

NLP & IR Group, UNED

December 12, 2008

Web Page Clustering Using a Fuzzy Logic Based Representation and Self-organizing Maps

Objectives Our Approach Experiment Description Results Conclusion

Table of Contents

1 Objectives

2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)

3 Experiment Description

4 Results

5 Conclusion

Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez, NLP & IR Group, UNED, slide 2

Objectives

Group HTML documents by content similarity.

Use Self-Organizing Maps (SOM) to organize, visualize, and navigate through the collection.

Define a term weighting function that takes advantage of HTML tags,

combining, by means of fuzzy logic, heuristic criteria based on the inherent semantics of certain HTML tags and on word positions in the document.

Hypothesis

An improvement in document representation will lead to an increase in map quality.

Table of Contents

1 Objectives

2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)

   1 Fuzzy Logic
   2 EFCC
   3 Linguistic Variables
   4 Knowledge Base

3 Experiment Description

4 Results

5 Conclusion

Fuzzy logic

Capturing human expert knowledge.

Close to natural language.

Knowledge base: defined by a set of IF-THEN rules.

Linguistic variables

Defined using natural-language words and fuzzy sets. These sets describe the degree of membership of an object in a particular class.
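To make the notion of a linguistic variable concrete, here is a minimal Python sketch of one built from triangular fuzzy sets. The variable name `frequency`, its labels `low`/`medium`/`high`, and the breakpoints are hypothetical choices for illustration; the slides do not give the membership functions actually used in EFCC.

```python
# Minimal sketch of a linguistic variable built from triangular fuzzy sets.
# Hypothetical: the variable "frequency", its labels and breakpoints are
# illustrative choices, not the membership functions used by EFCC.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Triangle:
    """Triangular fuzzy set: zero outside (a, c), peak of 1.0 at b."""
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        """Degree in [0, 1] to which x belongs to this set."""
        if x == self.b:
            return 1.0
        if x <= self.a or x >= self.c:
            return 0.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.c - x) / (self.c - self.b)


# Hypothetical linguistic variable over a normalized [0, 1] scale.
FREQUENCY: Dict[str, Triangle] = {
    "low": Triangle(-0.5, 0.0, 0.5),
    "medium": Triangle(0.0, 0.5, 1.0),
    "high": Triangle(0.5, 1.0, 1.5),
}


def fuzzify(x: float, variable: Dict[str, Triangle]) -> Dict[str, float]:
    """Membership degree of a crisp value in every label of the variable."""
    return {label: fs.membership(x) for label, fs in variable.items()}


print(fuzzify(0.3, FREQUENCY))
```

A crisp value of 0.3 belongs partly to `low` (0.4) and partly to `medium` (0.6), which is exactly the graded class membership that fuzzy sets allow.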

Extended Fuzzy Combination of Criteria

[Figures from slides 8 to 18 not preserved in the transcript.]
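The objectives state that EFCC combines criteria derived from the semantics of HTML tags and from word positions. As a hedged illustration, the sketch below collects such raw per-term criteria with Python's standard `html.parser`. The criterion names (`frequency`, `title`, `emphasis`, `position`), the set of emphasis tags, the tokenizer, and the position measure are all assumptions, not the authors' definitions.

```python
# Sketch: collect raw per-term criteria from an HTML page.
# Hypothetical: the criterion names, EMPHASIS_TAGS, the tokenizer and the
# position measure are illustrative assumptions, not the EFCC definitions.
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Tuple

EMPHASIS_TAGS = {"b", "strong", "em", "i", "u", "h1", "h2", "h3"}
TOKEN_RE = re.compile(r"[a-z]+")


class CriteriaParser(HTMLParser):
    """Records each visible token with flags for <title> and emphasis context."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: Counter = Counter()  # how many of each tag are open
        self.tokens: List[Tuple[str, bool, bool]] = []

    def handle_starttag(self, tag, attrs):
        self.open_tags[tag] += 1

    def handle_endtag(self, tag):
        if self.open_tags[tag] > 0:
            self.open_tags[tag] -= 1

    def handle_data(self, data):
        if self.open_tags["script"] or self.open_tags["style"]:
            return  # not visible text
        in_title = self.open_tags["title"] > 0
        emphasized = any(self.open_tags[t] > 0 for t in EMPHASIS_TAGS)
        for term in TOKEN_RE.findall(data.lower()):
            self.tokens.append((term, in_title, emphasized))


def term_criteria(html: str) -> Dict[str, Dict[str, float]]:
    """Per term: body frequency, title hits, emphasis hits, first position."""
    parser = CriteriaParser()
    parser.feed(html)
    parser.close()
    crit: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"frequency": 0, "title": 0, "emphasis": 0, "position": 1.0})
    for term, in_title, _ in parser.tokens:
        if in_title:
            crit[term]["title"] += 1
    body = [(t, emph) for t, in_title, emph in parser.tokens if not in_title]
    n = max(len(body), 1)
    for i, (term, emph) in enumerate(body):
        entry = crit[term]
        entry["frequency"] += 1
        entry["emphasis"] += int(emph)
        # Relative offset of the first body occurrence (0.0 = start of text).
        entry["position"] = min(entry["position"], i / n)
    return dict(crit)
```

In a fuzzy scheme these raw counts would then be normalized and fuzzified before any rules are applied.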

Linguistic Variables

[Figures from slides 20 to 25 not preserved in the transcript.]

Knowledge Base

[Figures from slides 27 to 30 not preserved in the transcript.]
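A knowledge base of IF-THEN rules can turn fuzzified criteria into a single term weight. The following is a minimal Mamdani-style sketch (min for AND, max for aggregation, centroid defuzzification). This is a standard fuzzy inference scheme that may differ from the one used in EFCC, and every label, breakpoint, and rule in it is hypothetical.

```python
# Mamdani-style IF-THEN inference producing a term "importance" in [0, 1].
# Hypothetical: all labels, breakpoints and rules are illustrative only.
from typing import Callable, Dict, List, Tuple


def tri(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function with peak at b."""
    def mu(x: float) -> float:
        if x == b:
            return 1.0
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu


# Input linguistic variables on a normalized [0, 1] scale.
INPUTS: Dict[str, Dict[str, Callable[[float], float]]] = {
    "frequency": {"low": tri(-1.0, 0.0, 1.0), "high": tri(0.0, 1.0, 2.0)},
    "title": {"no": tri(-1.0, 0.0, 1.0), "yes": tri(0.0, 1.0, 2.0)},
    "emphasis": {"low": tri(-1.0, 0.0, 1.0), "high": tri(0.0, 1.0, 2.0)},
}
# Output linguistic variable "importance".
OUTPUT = {"low": tri(-0.5, 0.0, 0.5), "medium": tri(0.0, 0.5, 1.0),
          "high": tri(0.5, 1.0, 1.5)}

# Each rule: IF every (variable is label) THEN importance is <consequent>.
Rule = Tuple[Dict[str, str], str]
RULES: List[Rule] = [
    ({"title": "yes", "frequency": "high"}, "high"),
    ({"title": "yes", "frequency": "low"}, "medium"),
    ({"title": "no", "emphasis": "high"}, "medium"),
    ({"title": "no", "frequency": "high", "emphasis": "low"}, "medium"),
    ({"title": "no", "frequency": "low", "emphasis": "low"}, "low"),
]


def infer(values: Dict[str, float], steps: int = 101) -> float:
    """Fire rules with min, aggregate with max, defuzzify by centroid."""
    strength = {label: 0.0 for label in OUTPUT}
    for antecedents, consequent in RULES:
        fire = min(INPUTS[var][label](values[var])
                   for var, label in antecedents.items())
        strength[consequent] = max(strength[consequent], fire)
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        # Clip each output set at its rule strength, then take the max.
        mu = max(min(strength[lab], OUTPUT[lab](y)) for lab in OUTPUT)
        num += y * mu
        den += mu
    return num / den if den else 0.0
```

A term that appears in the title and often in the body gets a weight near the top of the scale, while a rare, unemphasized term gets a weight near the bottom.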

Table of Contents

1 Objectives

2 Our Approach: Extended Fuzzy Combination of Criteria (EFCC)

3 Experiment Description

   1 Dimensionality Reduction
   2 Document Map
   3 Evaluation Methods

4 Results

5 Conclusion

Dimensionality Reduction

Input vector dimensions ranging from 100 to 5000.

Stopwords, punctuation marks, and suffixes were removed, as were words occurring fewer than 50 times in the whole corpus.

Two well-known methods:

Document frequency reduction.
Random projection method.

Three proposed rank-based methods:

Most Valued Terms.
Fixed reduction method.
More Frequent Terms until n level (MFTn).
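The corpus filter and the two well-known reductions can be sketched in a few lines of Python. This assumes each document is already a list of stemmed tokens with stopwords and punctuation removed, and it uses a Gaussian random matrix, one common variant of random projection. The three proposed rank-based methods are not sketched here.

```python
# Sketch of the corpus filter plus the two well-known reductions.
# Assumptions: documents are lists of stemmed tokens with stopwords and
# punctuation already removed; random projection uses a Gaussian matrix.
import math
import random
from collections import Counter
from typing import List, Sequence


def frequent_vocabulary(docs: Sequence[List[str]], min_count: int = 50) -> List[str]:
    """Drop terms occurring fewer than min_count times in the whole corpus."""
    counts = Counter(t for doc in docs for t in doc)
    return sorted(t for t, c in counts.items() if c >= min_count)


def df_reduction(docs: Sequence[List[str]], vocab: Sequence[str], k: int) -> List[str]:
    """Document frequency reduction: keep the k terms present in most documents."""
    vocab_set = set(vocab)
    df = Counter(t for doc in docs for t in set(doc) if t in vocab_set)
    # Sort by descending document frequency, alphabetically on ties.
    return sorted(vocab_set, key=lambda t: (-df[t], t))[:k]


def random_projection(vectors: Sequence[Sequence[float]], k: int,
                      seed: int = 0) -> List[List[float]]:
    """Map d-dimensional vectors to k dimensions via a random Gaussian matrix."""
    d = len(vectors[0])
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # keeps expected squared norms unchanged
    r = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(v[i] * r[i][j] for i in range(d)) for j in range(k)]
            for v in vectors]
```

In the experiments the target dimension ranged from 100 to 5000, so `k` would take values in that range.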

Document Map Construction

Benchmark dataset for clustering: Banksearch [1]

10,000 documents

10 classes

The SOM size was set equal to the number of classes in the input documents (a 5x2 map, i.e. 10 units), so that the resulting clusters could be compared directly with the classes.

[1] M. P. Sinka and D. W. Corne. A large benchmark dataset for web document clustering. Soft Computing Systems: Design, Management, and Applications, 2002.
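For readers unfamiliar with SOMs, the sketch below trains a small map on a 5x2 grid with the classic online update rule. Initialization from random samples, the Gaussian neighborhood, the linear decay schedules, and all parameter values are assumptions; the slides do not report the training setup.

```python
# Minimal online SOM on a rows x cols grid (5x2 by default, as above).
# Assumptions: sample-based initialization, Gaussian neighborhood, and
# linearly decaying learning rate and radius; values are illustrative.
import math
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(units: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(units)), key=lambda u: sq_dist(units[u], x))


def train_som(data: Sequence[Sequence[float]], rows: int = 5, cols: int = 2,
              epochs: int = 20, lr0: float = 0.5, seed: int = 0) -> List[Vector]:
    """Online SOM training; unit u sits at grid cell coords[u]."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    units = [list(rng.choice(data)) for _ in coords]  # init from samples
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linear decay
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
            br, bc = coords[best_matching_unit(units, x)]
            for u, (r, c) in enumerate(coords):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                units[u] = [w + lr * h * (xi - w) for w, xi in zip(units[u], x)]
            step += 1
    return units
```

After training, each document vector is assigned to `best_matching_unit(units, vector)`; that document-to-unit mapping is what the evaluation methods operate on.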

Evaluation Methods

Weighted average of the per-class F-measure.

After the collection is mapped onto the trained map, each unit (neuron) is labeled with the class that has the greatest number of documents mapped onto it.

All document vectors in a neuron whose class differs from the neuron's label are counted as errors.
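The labeling and scoring steps above might look like the following sketch. It assumes each document is predicted as the label of its unit and that the weighted F-measure is the per-class F1 weighted by class size; majority-vote ties go to the class whose name sorts first. The function names are hypothetical.

```python
# Sketch of majority-class unit labeling and a class-size-weighted F-measure.
# Assumptions: documents inherit their unit's label; per-class F1 is
# weighted by class size; ties go to the alphabetically first class.
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def label_units(bmus: Sequence[int], classes: Sequence[Hashable]) -> Dict[int, Hashable]:
    """Majority class among the documents mapped onto each unit."""
    per_unit: Dict[int, Counter] = defaultdict(Counter)
    for unit, cls in zip(bmus, classes):
        per_unit[unit][cls] += 1
    return {u: min(cnt, key=lambda c: (-cnt[c], str(c)))
            for u, cnt in per_unit.items()}


def evaluate(bmus: Sequence[int], classes: Sequence[Hashable]) -> Tuple[int, float]:
    """Return (error count, class-size-weighted F-measure)."""
    labels = label_units(bmus, classes)
    predicted = [labels[u] for u in bmus]
    errors = sum(p != c for p, c in zip(predicted, classes))
    support = Counter(classes)
    pred_count = Counter(predicted)
    hits = Counter(c for p, c in zip(predicted, classes) if p == c)
    total = len(classes)
    weighted_f = 0.0
    for cls, n in support.items():
        tp = hits[cls]
        precision = tp / pred_count[cls] if pred_count[cls] else 0.0
        recall = tp / n
        f = 2 * precision * recall / (precision + recall) if tp else 0.0
        weighted_f += (n / total) * f
    return errors, weighted_f
```

Here `bmus[i]` is the index of the unit that document `i` was mapped onto and `classes[i]` is its true class.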

Best reduction for each term weighting function

MFTn reduction provides stability

EFCC+MFTn obtains its best results with the smallest number of features

[Result figures from slides 38 to 40 not preserved in the transcript.]

Conclusion

An unsupervised document representation method, based on fuzzy logic, for clustering HTML documents by means of self-organizing maps.

MFTn is the most stable reduction method in all cases.

The EFCC representation obtains better results using a smaller vocabulary.

Fewer features are needed to represent the input documents and the SOM unit vectors, which lowers the computational cost.

Thank You!

Related Work

| Work | VSM | Topic | Document Information Type | Weighting Function | Modifies SOM |
|---|---|---|---|---|---|
| Self organization of a Massive Document Collection [2] | Yes | Yes | Text | Shannon's Entropy | No |
| Document Clustering using Phrases [3] | Yes | No | Text | Binary, TF, TF-IDF | No |
| Document Clustering using WordNet [4] | Yes | Yes | Text | ESVM, HSVM, HyM | No |
| Conceptional SOM [5] | Yes | No | Text | TF | Yes |

[2] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela. Self organization of a massive document collection. IEEE Trans. on Neural Networks, 2000.

[3] J. Bakus, M. Hussin, and M. Kamel. A SOM-based document clustering using phrases. In ICONIP, 2002.

[4] C. Hung and S. Wermter. Neural network based document clustering using WordNet ontologies. Int. J. Hybrid Intell. Syst., 2004.

[5] Y. Liu, X. Wang, and C. Wu. ConSOM: A conceptional SOM model for text clustering. In Neurocomputing, 2008.
