EFFECTIVE TERM BASED TEXT CLUSTERING ALGORITHMS

    NIBAS P.P

    EPAHECS033

    Government Engineering College

    Sreekrishnapuram

    Palakkad

    November 25, 2010


    CONTENTS

    INTRODUCTION

    REQUIREMENT OF INFORMATION RETRIEVAL

    DOCUMENT PREPROCESSING

    TEXT CLUSTERING ATTRIBUTES SELECTION

    PROBLEM DEFINITION

    FTC (Frequent Term-based Clustering)

    CLUSTERING ALGORITHMS

    APPLICATION

    CONCLUSION

    REFERENCE


    INTRODUCTION

    In every industry, almost all paper documents have electronic copies. This is because the electronic format provides:

    a) safer storage

    b) smaller size

    c) quick access to documents

    Text clustering methods can be used to group large sets of text documents.

    Document clustering is the automatic organization of documents into clusters or groups. Grouping is based on the principle of maximizing intra-cluster similarity and minimizing inter-cluster similarity.


    REQUIREMENT OF INFORMATION RETRIEVAL

    To improve the results of information retrieval, document clustering should meet the following requirements:

    The document model preserves the sequential relationship between words in the document.

    Associating a meaningful label with each final cluster is essential.

    Overlapping between documents should be allowed.

    The high dimensionality of text documents should be reduced.


    DOCUMENT PREPROCESSING

    All text clustering methods require several preprocessing steps.

    Non-textual information such as HTML tags and punctuation is removed from the documents.

    The content of a document is represented mostly by nouns.


    Based on this, the following assumptions were made to reduce document dimensionality:

    Elimination of words with fewer than 3 characters.

    Elimination of general words.

    Elimination of adverbs and adjectives.

    Elimination of verbs.

    To generate frequent terms, each document is split into records:

    For a small document, each line is treated as a record.

    For a large document, each paragraph is treated as a record.
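The preprocessing rules above can be sketched as follows. This is a minimal illustration under assumptions: the stop-word set `GENERAL_WORDS` and the function names are hypothetical, and the elimination of adverbs, adjectives and verbs is left out because it would need a part-of-speech tagger that the slides do not name.

```python
import re

# Hypothetical list of "general words"; the slides do not give the actual list.
GENERAL_WORDS = {"the", "and", "for", "are", "this", "that", "with", "from"}

def split_records(text, by_paragraph=False):
    """Split a document into records: paragraphs (large document) or lines (small one)."""
    pattern = r"\n\s*\n" if by_paragraph else r"\n"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

def preprocess(record):
    """Strip HTML tags and punctuation, then drop short words and general words."""
    no_tags = re.sub(r"<[^>]+>", " ", record)
    words = re.findall(r"[a-z]+", no_tags.lower())
    return [w for w in words if len(w) >= 3 and w not in GENERAL_WORDS]

records = split_records("<p>The cat is on the mat.</p>\nDogs and cats are pets.")
terms = [preprocess(r) for r in records]
print(terms)  # [['cat', 'mat'], ['dogs', 'cats', 'pets']]
```

Each inner list is the reduced term list of one record; these lists are the input to frequent term generation.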


    TEXT CLUSTERING ATTRIBUTES SELECTION

    Text clustering is performed in two stages:

    Frequent term set generation.

    Grouping of frequent term documents.

    Frequent term set generation is characterised by the attribute minimum support threshold.

    Grouping of frequent term documents is characterised by the attribute matching threshold.


    Minimum Support Threshold

    The document database is reduced based on a value called the minimum support threshold.

    If the minimum support threshold is low, the dimension reduction is small. To get a larger reduction in size, the minimum support value should be high.

    Matching Threshold

    The grouping of documents is carried out by finding the match of frequent terms between documents, which is measured against a value called the matching threshold.

    Matching is the ratio of the number of terms common to the documents to the total number of terms.

    For a low matching threshold, more documents are grouped together; for a high matching threshold, fewer are.
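The effect of the minimum support threshold can be illustrated with a small sketch. It assumes that the support of a term is the fraction of records containing it, which the slides do not state explicitly; `frequent_terms` and the sample records are hypothetical.

```python
from collections import Counter

def frequent_terms(records, min_support):
    """Return the terms found in at least `min_support` (a fraction) of the records."""
    # set(record) counts each term at most once per record.
    counts = Counter(term for record in records for term in set(record))
    n = len(records)
    return {term for term, count in counts.items() if count / n >= min_support}

records = [
    ["text", "cluster"],
    ["text", "term"],
    ["text", "cluster", "web"],
    ["search", "web"],
]
low = frequent_terms(records, 0.25)   # low threshold: little reduction
high = frequent_terms(records, 0.5)   # high threshold: stronger reduction
print(sorted(low))   # ['cluster', 'search', 'term', 'text', 'web']
print(sorted(high))  # ['cluster', 'text', 'web']
```

Raising the threshold from 0.25 to 0.5 drops the two terms that occur in only one record, which is the size reduction described above.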


    PROBLEM DEFINITION

    Let D = {d1, d2, d3, . . . , dn} be the set of text documents.

    Let T be the set of all terms occurring in the documents of D.

    Let d1 = {t11, t12, . . . , t1m} and d2 = {t21, t22, . . . , t2m} be the sets of frequent terms in documents d1 and d2.

    Let F = {f1, f2, . . . , fk} be the set of all frequent term sets in D with respect to min-support, where min-support is a real number.

    The cover of each element fi of F can be regarded as a cluster.


    Let the clustering of D into m sets be defined as R = {C1, C2, C3, . . . , Cm} such that each cluster Ci contains at least one document: Ci ≠ NULL, i = 1, . . . , m.
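A small sketch of the cover idea follows. It assumes that the cover of a frequent term set is the set of documents containing all of its terms; the function `cover` and the sample data are hypothetical.

```python
def cover(term_set, documents):
    """Indices of the documents that contain every term of `term_set`."""
    return {i for i, doc in enumerate(documents) if term_set <= doc}

documents = [
    {"text", "cluster", "term"},
    {"text", "web"},
    {"cluster", "term", "web"},
]
frequent_term_sets = [frozenset({"text"}), frozenset({"cluster", "term"})]

# Each cover is a candidate cluster; every cluster must be non-empty.
clusters = {f: cover(f, documents) for f in frequent_term_sets}
for f, members in clusters.items():
    print(sorted(f), sorted(members))
```

Document 0 falls into both covers, so candidate clusters built this way may overlap.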


    FTC (Frequent Term-based Clustering)

    Text clustering faces problems such as:

    Very high dimensionality of the data.

    Understandability of the clustering descriptions.

    To address these, a frequent term based approach to clustering has been introduced.

    Frequent Term-based Clustering (FTC) is a text clustering technique which uses frequent term sets and dramatically decreases the dimensionality of the document vector space.


    CLUSTERING ALGORITHMS

    Algorithms for effective text clustering are:

    1. Min-match cluster algorithm

    2. Max-match cluster algorithm

    3. Min-Max match cluster algorithm


    Min-match Cluster Algorithm

    Let A and B be two frequent term sets of documents d1 and d2, represented as vectors.

    Matching is denoted as min(Vm) and defined as the ratio of the number of common elements of vectors A and B to the number of elements in the smaller of the two sets.

    Example
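A minimal sketch of the min-match measure, expressed as a percentage as in the algorithm. The function name `min_match` and the two term sets are hypothetical, and both sets are assumed to be non-empty.

```python
def min_match(a, b):
    """Common terms divided by the size of the smaller term set, in percent."""
    set_a, set_b = set(a), set(b)
    common = len(set_a & set_b)
    return common * 100 / min(len(set_a), len(set_b))

a = ["cluster", "document", "term", "text"]
b = ["cluster", "text", "web"]
print(round(min_match(a, b), 2))  # 66.67: 2 common terms, smaller set has 3
```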


    Algorithm

    D: document database
    FTL: frequent term list
    CL: cluster list
    FT: frequent terms

    Min-Cluster(CL, FTL, D)

    1. For each FT i in FTL do
    2.   t1 = ith index frequent terms
    3.   Initialise high percent matching = -1 and cluster index = -1
    4.   For each FT j in FTL do
    5.     if (i ≠ j) then t2 = jth index frequent terms
    6.       if (t1.length < t2.length) then total terms = t1.length
    7.       Else total terms = t2.length End if
    8.       match = Calculate matching terms between vector i and j using Binary Search
    9.       matching percent = match * 100 / total terms
    10.      if (matching percent > matching threshold) and (high percent matching < matching percent) then high percent matching = matching percent and cluster index = j
    11.      End if
    12.    End if
    13.  Next loop (j)
    14.  if (cluster index ≠ -1) then
    15.    Add frequent term list(cluster index) to frequent term list(i)
    16.    Add Cluster list(cluster index) to Cluster list(i)
    17.    Remove Cluster list(cluster index) from Cluster list
    18.    Remove frequent term list(cluster index) from frequent term list
    19.  End if
    20. Next loop (i)


    In this algorithm, step 2 selects a vector as the comparable vector.

    Steps 5 to 7 find the shorter of the two input vectors selected in steps 2 and 5 and assign its length as the minimum vector count.

    In step 8, the matching terms between the two vectors are counted using binary search.

    In step 9, the matching percentage between the vectors is calculated using the minimum vector count.

    In step 10, if the matching percentage exceeds both the matching threshold and the highest match found so far, vector j becomes the new highest match vector.

    Steps 5 to 12 are repeated until the comparable vector has been compared with all the remaining vectors.


    In steps 15 and 16, if a highest match vector was found, then:

    a) Its frequent terms are added to the terms of the comparable vector selected in step 2 (step 15).

    b) The highest match cluster is added to the comparable cluster (step 16).

    In steps 17 and 18, the merged entry is removed:

    The highest match cluster is removed from the cluster list (step 17).

    The highest match cluster terms are removed from the frequent term list (step 18).
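The steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, term lists are assumed non-empty, the inputs are copied instead of modified in place, and the index adjustment after a removal is one possible way to keep the outer loop consistent, which the pseudocode does not spell out.

```python
from bisect import bisect_left

def count_common(sorted_a, sorted_b):
    """Count the terms of sorted_a that also occur in sorted_b via binary search (step 8)."""
    common = 0
    for term in sorted_a:
        pos = bisect_left(sorted_b, term)
        if pos < len(sorted_b) and sorted_b[pos] == term:
            common += 1
    return common

def min_cluster(cluster_list, term_list, matching_threshold):
    """One pass of the min-match merge loop; returns new cluster and term lists."""
    clusters = [list(c) for c in cluster_list]
    terms = [sorted(set(t)) for t in term_list]  # sorted, as binary search requires
    i = 0
    while i < len(terms):                                  # step 1
        best_percent, best_index = -1, -1                  # step 3
        for j in range(len(terms)):                        # step 4
            if i == j:                                     # step 5
                continue
            total = min(len(terms[i]), len(terms[j]))      # steps 6-7
            match = count_common(terms[i], terms[j])       # step 8
            percent = match * 100 / total                  # step 9
            if percent > matching_threshold and percent > best_percent:  # step 10
                best_percent, best_index = percent, j
        if best_index != -1:                               # step 14
            terms[i] = sorted(set(terms[i]) | set(terms[best_index]))    # step 15
            clusters[i].extend(clusters[best_index])       # step 16
            del clusters[best_index]                       # step 17
            del terms[best_index]                          # step 18
            if best_index < i:
                i -= 1  # the comparable entry shifted one slot to the left
        i += 1
    return clusters, terms

term_list = [
    ["cluster", "text", "term"],
    ["web", "search"],
    ["text", "cluster"],
    ["search", "engine", "web"],
]
cluster_list = [["d1"], ["d2"], ["d3"], ["d4"]]
merged_clusters, merged_terms = min_cluster(cluster_list, term_list, 50)
print(merged_clusters)  # [['d1', 'd3'], ['d2', 'd4']]
```

Swapping the `min(...)` in the line marked steps 6-7 for another denominator gives the other two variants, since the rest of the loop is shared.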


    Max-match cluster algorithm

    Let A and B be two frequent term sets of documents d1 and d2, represented as vectors.

    Matching is denoted as max(Vm) and defined as the ratio of the number of common elements of vectors A and B to the number of elements in the larger of the two sets.

    Example
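A minimal sketch of the max-match measure as a percentage. The function name `max_match` and the two term sets are hypothetical, and both sets are assumed to be non-empty.

```python
def max_match(a, b):
    """Common terms divided by the size of the larger term set, in percent."""
    set_a, set_b = set(a), set(b)
    common = len(set_a & set_b)
    return common * 100 / max(len(set_a), len(set_b))

a = ["cluster", "document", "term", "text"]
b = ["cluster", "text", "web"]
print(max_match(a, b))  # 50.0: 2 common terms, larger set has 4
```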


    Algorithm

    D: document database
    FTL: frequent term list
    CL: cluster list
    FT: frequent terms

    Max-Cluster(CL, FTL, D)

    1. For each FT i in FTL do
    2.   t1 = ith index frequent terms
    3.   Initialise high percent matching = -1 and cluster index = -1
    4.   For each FT j in FTL do
    5.     if (i ≠ j) then t2 = jth index frequent terms
    6.       if (t1.length > t2.length) then total terms = t1.length
    7.       Else total terms = t2.length End if
    8.       match = Calculate matching terms between vector i and j using Binary Search
    9.       matching percent = match * 100 / total terms
    10.      if (matching percent > matching threshold) and (high percent matching < matching percent) then high percent matching = matching percent and cluster index = j
    11.      End if
    12.    End if
    13.  Next loop (j)
    14.  if (cluster index ≠ -1) then
    15.    Add frequent term list(cluster index) to frequent term list(i)
    16.    Add Cluster list(cluster index) to Cluster list(i)
    17.    Remove Cluster list(cluster index) from Cluster list
    18.    Remove frequent term list(cluster index) from frequent term list
    19.  End if
    20. Next loop (i)


    The only difference here is that the maximum vector count of the two input vectors is used.

    The rest of the steps are the same as in the previous algorithm.


    Min-Max match cluster algorithm

    The matching is denoted by min-max(Vm) and is defined as the ratio of twice the number of matching terms to the total number of elements of the two sets.

    Example
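A minimal sketch of the min-max match measure as a percentage. It reads the denominator as the combined size of the two sets, |A| + |B|, following the definition above; that reading is an assumption, chosen because with a duplicate-free union two identical sets would score 200%. The function name `min_max_match` and the term sets are hypothetical.

```python
def min_max_match(a, b):
    """Twice the number of common terms over the combined size of both sets, in percent."""
    set_a, set_b = set(a), set(b)
    common = len(set_a & set_b)
    return common * 2 * 100 / (len(set_a) + len(set_b))

a = ["cluster", "document", "term", "text"]
b = ["cluster", "text", "web"]
print(round(min_max_match(a, b), 2))  # 57.14: 2 * 2 common terms over 4 + 3 elements
```

With this reading the value always lies between the max-match and min-match values for the same pair of sets.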


    Algorithm

    D: document database
    FTL: frequent term list (a set of frequent term sets)
    CL: cluster list (a set of sets of input file names)
    FT: frequent terms
    t1, t2: frequent term sets

    Min-MaxCluster(CL, FTL, D)

    1. For each FT i in FTL do
    2.   t1 = ith index frequent terms
    3.   Initialise high percent matching = -1 and cluster index = -1
    4.   For each FT j in FTL do
    5.     if (i ≠ j) then t2 = jth index frequent terms
    6.       t3 = ith FTL UNION jth FTL
    7.       total terms = t3.length
    8.       match = Calculate matching terms between vector i and j using Binary Search
    9.       matching percent = match * 2 * 100 / total terms
    10.      if (matching percent > matching threshold) and (high percent matching < matching percent) then high percent matching = matching percent and cluster index = j
    11.      End if
    12.    End if
    13.  Next loop (j)
    14.  if (cluster index ≠ -1) then
    15.    Add frequent term list(cluster index) to frequent term list(i)
    16.    Add Cluster list(cluster index) to Cluster list(i)
    17.    Remove Cluster list(cluster index) from Cluster list
    18.    Remove frequent term list(cluster index) from frequent term list
    19.  End if
    20. Next loop (i)


    The first difference here is that the total number of items present in both sets is considered.

    The other main difference is that the numerator is multiplied by the number of vectors, which is 2.


    APPLICATION

    Document clustering has wide application in areas such as:

    Web mining: the process of discovering patterns from the web.

    Search engines: systems designed to search for information on the World Wide Web.

    Information retrieval: the science of searching for documents, for information within documents, and for metadata about documents, as well as searching relational databases and the World Wide Web.


    CONCLUSION

    For effective text clustering, three new clustering algorithms were proposed.

    All three algorithms were compared with the standard FTC algorithm to show their competency.

    The three proposed algorithms achieve better cluster quality than the FTC algorithm.


    References

    1 Ponmuthuramalingam P. et al., Effective Term Based Text Clustering Algorithms, (IJCSE) Vol. 02, No. 05, 2010, 1665-1673.

    2 Beil F., Ester M. and Xu X., Frequent Term-based Text Clustering, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, 436-442.

    3 Dubes R.C. and Jain A.K., Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, N.J., U.S.A., 1988.

    4 Fung B.C.M., Wang K. and Ester M., Hierarchical Document Clustering using Frequent Item sets, Proceedings of SIAM International Conference on Data Mining, 2003, 180-304.


    THANK YOU.
