TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents


Page 1: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Haimonti Dutta (1), Xianshu Zhu (2), Tushar Muhale (2), Hillol Kargupta (2), Kirk Borne (3), Codrina Lauth (4), Florian Holz (5), and Gerherd Heyer (5)

1 Columbia University
2 University of Maryland, Baltimore County
3 George Mason University
4 Fraunhofer Institute for Intelligent Analysis and Information Systems
5 University of Leipzig

Page 2: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Outline
• Introduction and Motivation
• Related Work
• TagLearner
• Distributed Classifier-learning Algorithm
• Experiments
• Conclusion and Future Work

Page 3: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Introduction
• Large online document repositories:
  – Online newspapers, digital libraries, etc.
  – Growing in size
• Text categorization on the repositories:
  – No automated text classification mechanism
  – Performed by authorities, such as librarians

Impractical as the repositories grow

Page 4: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Introduction (cont.)
• Collaborative tagging
  – Del.icio.us, Flickr, Google Image Labeler
  – Recruits web users to add tags to a resource
  – Helps harness people's collective knowledge
• Pros and cons
  – Improves web search results and helps with classification
  – Not supported by most online text repositories
  – Lack of control
    • Absence of standard keywords
    • Tagging errors due to misspellings
    • Harder to manage due to increased content diversity

Page 5: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Motivation
• Provide an automated classification service
  – Utilize the collaborative effort of users
• Collaborative tagging in a peer-to-peer network
  – Without the repositories' support

P2P classifier learning system

Page 6: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Related Work
• Collaborative tagging:
  – Recommendation systems (Tso-Sutter et al.)
  – Web search (Yahia et al.)
  – Classification accuracy (Brooks et al.)
• Distributed linear programming:
  – Distributed Simplex Algorithm (Dutta et al.)

Page 7: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner: A P2P Classifier Learning System

Page 8: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner: A P2P Classifier Learning System

Page 9: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner: A P2P Classifier Learning System

Page 10: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner: A P2P Classifier Learning System

Page 11: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner

Service provider: provides the P2P classifier learning service
- Register the service by creating a tagging group
- Maintain a tagging group for this service
  - Predefined labels used for tagging
  - Features for classification
  - Group members
  - Learnt classifier model
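The state a service provider keeps for one tagging group can be pictured as a small record. The sketch below is only an illustration of the four items listed on the slide; the class name `TaggingGroup`, its field names, and the `join`/`leave`/`is_valid_tag` helpers are hypothetical and are not taken from the TagLearner implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class TaggingGroup:
    """Hypothetical record a service provider might keep per tagging group."""
    service_name: str
    labels: List[str]                                  # predefined labels used for tagging
    features: List[str]                                # features used for classification
    members: Set[str] = field(default_factory=set)     # identifiers of group members
    model_weights: Optional[List[float]] = None        # learnt classifier model, if any

    def join(self, peer_id: str) -> None:
        """Add a peer to the tagging group."""
        self.members.add(peer_id)

    def leave(self, peer_id: str) -> None:
        """Remove a peer; does nothing if the peer is not a member."""
        self.members.discard(peer_id)

    def is_valid_tag(self, label: str) -> bool:
        """Only the predefined labels may be used as tags."""
        return label in self.labels


# Example values are made up for illustration.
group = TaggingGroup(
    service_name="nsf-abstracts",
    labels=["Earth Sciences", "Mathematical Sciences"],
    features=["term_1", "term_2"],
)
group.join("peer-A")
group.join("peer-B")
group.leave("peer-B")
```

Restricting tags to a predefined label list is one way to address the "absence of standard keywords" problem raised earlier in the talk.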

Page 12: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

TagLearner

Client-side browser plugin
• Interface:
  - Join or leave the tagging group
  - Tag the web documents
• Distributed classifier learning algorithm

Page 13: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents
Page 14: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Classifier Design by Linear Programming

• The classification problem can be framed as a linear programming problem
  – $x_k$: feature vector of the k-th instance
  – $W$: weight vector
• We want to find a $W$ such that: [equation in $d$, $W$, and $x_k$ not recoverable from the transcript]
• $W$ can be found by minimizing the error

[Figure: instances $x_k$ of Class 1 and Class 2]
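Because the slide's own inequality is missing from the transcript, the block below shows only one textbook way to pose two-class linear classification as a linear program. It is not necessarily the formulation the authors use. The class labels $y_k \in \{+1, -1\}$, the slack variables $\xi_k$, and the reading of $d$ as a positive margin are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{W,\,\xi}\quad & \sum_{k=1}^{P} \xi_k \\
\text{subject to}\quad & y_k\, W^{T} x_k + \xi_k \ge d, \qquad k = 1, \dots, P, \\
& \xi_k \ge 0, \qquad k = 1, \dots, P.
\end{aligned}
```

Under these assumptions, every instance that already satisfies $y_k W^{T} x_k \ge d$ contributes zero error, and the objective sums the remaining violations. Both the objective and the constraints are linear in $W$ and $\xi$, which is what makes a simplex-type solver applicable.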

Page 15: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

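Classifier Design by Linear Programming

• Maximize: [objective $z$, not recoverable from the transcript]
• Subject to: [constraints involving $W$, $E$, $S$, and $D$, not recoverable from the transcript]
• where
  – $(x_1, x_2, \ldots, x_P)^T$ stacks the feature vectors
  – $E = (e_1, e_2, \ldots, e_P)$
  – $D = (d_1, d_2, \ldots, d_P)^T$
  – $S = (s_1, s_2, \ldots, s_P)$

Use the Simplex Method to solve it!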

Page 16: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Distributed Linear Programming
• Distributed data
  – Each user only has a collection of constraints
• Objective function: $z - 7w_1 - 16w_2 - 21.5w_3 = 0$
• Constraints:
  – $2w_1 + w_2 + 7w_3 \le 0.5$
  – $w_1 + 3w_2 + 3w_3 \le 0.5$
  – $w_1 + 4w_2 + 2w_3 \le 0.5$
  – $w_1 + w_2 + 3w_3 \le 0.5$
  – $2w_1 + 7w_2 + 6.5w_3 \le 0.5$

Simplex Tableau

Z    W1    W2    W3      value
1    -7    -16   -21.5   0
0    2     1     7       0.5
0    1     3     3       0.5
0    1     4     2       0.5
0    1     1     3       0.5
0    2     7     6.5     0.5
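The tableau can be rebuilt mechanically from the objective and the constraints. The following is a minimal sketch, assuming the constraints are of the "less than or equal to 0.5" form and leaving out the slack columns, as the slide does; the function name `build_tableau` is a made-up helper.

```python
# Objective z = 7*w1 + 16*w2 + 21.5*w3 and the five constraints from the slide.
objective = [7, 16, 21.5]
constraints = [
    ([2, 1, 7], 0.5),
    ([1, 3, 3], 0.5),
    ([1, 4, 2], 0.5),
    ([1, 1, 3], 0.5),
    ([2, 7, 6.5], 0.5),
]


def build_tableau(objective, constraints):
    """Return rows of the form [Z, W1, ..., Wn, value]; slack columns omitted."""
    # Objective row encodes z - c.w = 0, hence the negated coefficients.
    rows = [[1] + [-c for c in objective] + [0]]
    for coeffs, rhs in constraints:
        rows.append([0] + list(coeffs) + [rhs])
    return rows


tableau = build_tableau(objective, constraints)
for row in tableau:
    print(row)
```

In the distributed setting described here, each user would hold only some of the constraint rows, while the objective row is common to everyone.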

Page 17: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Distributed Simplex Algorithm

Each user has different constraints but wants to optimize the same objective function.

[Figure: User A, User B, User C, User D]

Page 18: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Distributed Simplex Algorithm

[Figure: User A, User B, User C, User D]

Page 19: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Distributed Simplex Algorithm

Ratio test on the W3 column (value divided by the W3 coefficient), one ratio per constraint row:

0.5/7 = 1/14
0.5/3 = 1/6
0.5/2 = 1/4
0.5/3 = 1/6
0.5/6.5 = 1/13

[Figure: User A, User B, User C, User D]
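These ratios are the standard simplex minimum-ratio test: W3 has the most negative objective coefficient (-21.5), so each constraint row divides its value by its W3 coefficient, and the row with the smallest ratio becomes the pivot row. A small check with exact fractions, using only the numbers from the tableau:

```python
from fractions import Fraction

# "value" column and W3 column of the five constraint rows in the tableau.
values = [Fraction(1, 2)] * 5
w3_column = [Fraction(7), Fraction(3), Fraction(2), Fraction(3), Fraction(13, 2)]

# Ratio test: only rows with a positive coefficient are eligible.
ratios = [
    (value / coeff, row)
    for row, (value, coeff) in enumerate(zip(values, w3_column))
    if coeff > 0
]
best_ratio, pivot_row = min(ratios)

for ratio, row in ratios:
    print(f"row {row}: {ratio}")
print("pivot row:", pivot_row, "ratio:", best_ratio)
```

The smallest ratio is 1/14, so the first constraint row (2, 1, 7 | 0.5) is the pivot row for this step.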

Page 20: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Distributed Simplex Algorithm

0.5/7 = 1/14
0.5/3 = 1/6
0.5/2 = 1/4
0.5/3 = 1/6
0.5/6.5 = 1/13

[Figure: User A, User B, User C, User D]
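The idea that every user holds different constraints but shares one objective can be simulated in a single process. The sketch below is not the authors' protocol, whose messaging details are not in this transcript. It assumes that each peer reports only its local minimum ratio, that the global minimum selects the pivot row, and that the owner shares the normalized pivot row with everyone. The assignment of constraints to Users A to D and all function names are hypothetical.

```python
from fractions import Fraction as F


def local_min_ratio(rows, col):
    """A peer's local ratio test: smallest value/coefficient over its own rows."""
    ratios = [(row[-1] / row[col], i) for i, row in enumerate(rows) if row[col] > 0]
    return min(ratios) if ratios else None


def distributed_simplex(objective, peers):
    """Maximize objective . w subject to every peer's '<=' constraints and w >= 0."""
    n_vars = len(objective)
    n_slack = sum(len(cons) for cons in peers.values())
    # Shared objective row in tableau form: z - c.w = 0.
    z_row = [-F(c) for c in objective] + [F(0)] * (n_slack + 1)
    rows, basis, k = {}, {}, 0
    for name, cons in peers.items():
        rows[name] = []
        for coeffs, rhs in cons:
            row = [F(c) for c in coeffs] + [F(0)] * n_slack + [F(rhs)]
            row[n_vars + k] = F(1)  # slack variable of this constraint
            basis[(name, len(rows[name]))] = n_vars + k
            rows[name].append(row)
            k += 1
    while True:
        # Entering column: most negative coefficient in the objective row.
        col = min(range(n_vars + n_slack), key=lambda j: z_row[j])
        if z_row[col] >= 0:
            break  # no improving column left, current solution is optimal
        # Each peer reports its local minimum ratio; the global minimum wins.
        reports = []
        for name in rows:
            best = local_min_ratio(rows[name], col)
            if best is not None:
                reports.append((best[0], name, best[1]))
        if not reports:
            raise ValueError("LP is unbounded")
        _, owner, idx = min(reports)
        # The owner normalizes its pivot row and shares it with all peers.
        pivot_value = rows[owner][idx][col]
        pivot = [v / pivot_value for v in rows[owner][idx]]
        rows[owner][idx] = pivot
        for name in rows:
            for i, row in enumerate(rows[name]):
                if (name, i) != (owner, idx):
                    factor = row[col]
                    rows[name][i] = [a - factor * b for a, b in zip(row, pivot)]
        factor = z_row[col]
        z_row = [a - factor * b for a, b in zip(z_row, pivot)]
        basis[(owner, idx)] = col
    w = [F(0)] * n_vars
    for (name, i), var in basis.items():
        if var < n_vars:
            w[var] = rows[name][i][-1]
    return w, z_row[-1]


HALF = F(1, 2)
objective = [F(7), F(16), F(43, 2)]
# Hypothetical split of the five constraints across the four users.
peers = {
    "A": [([2, 1, 7], HALF), ([1, 3, 3], HALF)],
    "B": [([1, 4, 2], HALF)],
    "C": [([1, 1, 3], HALF)],
    "D": [([2, 7, F(13, 2)], HALF)],
}
w, z = distributed_simplex(objective, peers)
print("w =", w, "z =", z)
```

For this data the sketch ends at w = (1/4, 0, 0) with z = 7/4. Exact fractions are used so that ties in the ratio test are detected reliably. A real peer-to-peer version would replace the loops over `rows` with network messages.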

Page 21: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Experimental Results
• Distributed Data Mining Toolkit (DDMT)

• “NSF Research Awards Abstracts 1990-2003” data set from the UCI Machine Learning Repository

• We only consider abstracts belonging to Earth and Mathematical sciences

• Features used for classification do not rely on collaboratively generated annotations.

Page 22: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Experiments (cont.)

Figure 1. Communication cost versus the number of nodes in the network

Page 23: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Experiments (cont.)

Page 24: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Conclusion and Future Work
• Conclusion:
  – P2P classifier learning system prototype
  – Scalable distributed classification algorithm based on linear programming
• Future work:
  – Extension of the classification algorithm to multi-class classification problems
  – Improve classification accuracy

Page 25: TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents

Thank you!

Questions?