DCA Mini Project Presentation
8/14/2019 DCA Mini Project Presentation
http://slidepdf.com/reader/full/dca-mini-project-presentation 1/11
A Review on
RESULT SET CATEGORIZATION
Categorizing the Document Search Result Set
BY BAKTAVATCHALAM (08MW03)
PSG COLLEGE OF TECHNOLOGY
INTRODUCTION
– Text categorization is the classification of an information resource by its topic (politics, sport, etc.), selected from a predetermined set.
– Here we search for a given keyword set in a given set of websites and categorize the websites. Given a keyword set, the system determines the documents that are most relevant to that keyword set and also retrieves the category to which they belong.
– Here we do search for all kinds of Textual
PSG College Of Distributed Component Lab
INTRODUCTION…
– Each key is associated with a threshold value used for ranking the result set. Each category is associated with a corresponding key set and the weights of those keys. Ranking is based on the document's key set weights and the occurrence counts of those keys. The categories and their related categories are maintained separately to refine the result set.
– We perform two independent operations. First, we generate the categories and their related categories. Second, we give keywords to the search engine to find documents and their corresponding categories.
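The ranking described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `KEY_WEIGHTS`, `KEY_THRESHOLDS`, `score_document`, and `rank_documents` are hypothetical, the sample weights are invented, and treating the threshold as a minimum occurrence count is an assumption, since the slides do not say how the threshold is applied.

```python
import re
from collections import Counter

# Hypothetical sample data: key weights (0 to 100) and per-key thresholds.
KEY_WEIGHTS = {"election": 90, "vote": 60, "goal": 80}
# Assumption: a key only contributes once it occurs at least this many times.
KEY_THRESHOLDS = {"election": 1, "vote": 2, "goal": 1}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_document(text, keys):
    """Sum weight * occurrence count over the keys that meet their threshold."""
    counts = Counter(tokenize(text))
    score = 0
    for key in keys:
        occurrences = counts[key]
        if occurrences >= KEY_THRESHOLDS.get(key, 1):
            score += KEY_WEIGHTS.get(key, 0) * occurrences
    return score


def rank_documents(documents, keys):
    """Return (name, score) pairs with a positive score, highest score first."""
    scored = [(name, score_document(text, keys)) for name, text in documents.items()]
    relevant = [pair for pair in scored if pair[1] > 0]
    # Ties are broken by document name so the order is deterministic.
    return sorted(relevant, key=lambda pair: (-pair[1], pair[0]))
```

With this sketch, a document that repeats a heavily weighted key ranks above one that mentions a lightly weighted key once, and documents with no qualifying key are left out of the result set.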
EXISTING SYSTEM
DRAWBACKS
– The result set is not sorted by relevance; instead it is sorted by filename, date, size, and so on. The user therefore does not get accurate results, and because each search goes through the entire given file set, the response time is very high.
– Categorized results are not available, so the user cannot tell which category a file belongs to when many files have the same name. The user also cannot tell which files are related to one another.
DESIGN
[Design diagram. Labels: Key Set; Websites (Documents & Pages); Document Finder; Categorizer; Search Keyword; Documents; (Key Set, Category); Categories & Related Categories; Documents + Categories]
IMPLEMENTATION
• Server Module
This module contains the following sub-modules: Load Details, Categorizing, Searching.
• Load Details
In this module we load categories and their related categories, documents and their categories, and categories and their keys with weights. Each weight is a value from 0 to 100.
• Categorizing
In this module we categorize the given document using the key set parsed from that document and the corresponding weights for the available categories.
• Searching
In this module we search for documents and their categories using the given key set.
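A minimal sketch of how the three server sub-modules might fit together is shown below. The table names (`CATEGORY_KEYS`, `RELATED_CATEGORIES`, `DOCUMENT_CATEGORIES`), the function names, and all sample data are hypothetical. Picking the category with the highest weighted score is an assumption about how the categorizer decides, not something the slides confirm.

```python
# "Load Details": hypothetical in-memory tables standing in for the loaded data.
# Categories and their keys with weights (0 to 100).
CATEGORY_KEYS = {
    "politics": {"election": 90, "minister": 70},
    "sport": {"goal": 80, "match": 50},
}
# Categories and their related categories.
RELATED_CATEGORIES = {"politics": ["economy"], "sport": ["health"]}
# Documents and their categories.
DOCUMENT_CATEGORIES = {"budget.html": "politics", "final.html": "sport"}


def categorize(document_keys):
    """Pick the category whose weighted keys best match the document.

    document_keys maps each key parsed from the document to its occurrence
    count. Returns None when no category key occurs in the document.
    """
    best_category, best_score = None, 0
    for category, weights in CATEGORY_KEYS.items():
        score = sum(weights[key] * count
                    for key, count in document_keys.items() if key in weights)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def search(keys):
    """Return (document, category, related categories) for matching categories."""
    matched = {category for category, weights in CATEGORY_KEYS.items()
               if any(key in weights for key in keys)}
    return [(document, category, RELATED_CATEGORIES.get(category, []))
            for document, category in sorted(DOCUMENT_CATEGORIES.items())
            if category in matched]
```

Because the categories and their related categories live in separate tables, a search result can list related categories alongside each document without rescanning the document itself.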
IMPLEMENTATION
• Parser Module
This module contains the following sub-modules: Load Module, URL Content Grabber Module.
• Load Module
In this module we load keywords from the server and then retrieve a URL to begin searching.
• URL Content Grabber Module
Whenever a URL arrives from the server, the parser connects to that URL, retrieves its contents to begin searching, and then collects the key sets from that site.
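The content grabber step could look something like the sketch below, which uses only the Python standard library. The names `TextExtractor`, `count_keys`, and `grab` are hypothetical, and the original project may have been written in another language and may have parsed pages differently. The sketch assumes that "collecting key sets" means counting how often each keyword appears in the visible page text.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def count_keys(html, keys):
    """Count occurrences of each keyword in the visible text of an HTML page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    words = Counter(re.findall(r"[a-z0-9]+", text))
    return {key: words[key.lower()] for key in keys}


def grab(url, keys, timeout=10):
    """Connect to the URL, download the page, and count the keywords in it."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return count_keys(html, keys)
```

Keeping the download (`grab`) separate from the counting (`count_keys`) means the counting logic can be checked on a literal HTML string without any network access.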
EVALUATION RESULTS
• Parameters
– Input Keys: 25
– Input Files: 15 (average size 15 KB)
• Existing System
– Time: 5 secs
– Accuracy: 89% {No categorization}
• Our System
– Time: 5 secs
– Accuracy: 92% {with Category Listing}
CONCLUSION
Thus the user can search for a set of keywords in a list of websites and view the count of each keyword for a particular website. This kind of searching is very useful for crawling websites from the perspective of specific content.
THANK YOU