
International Journal of Application or Innovation in Engineering & Management (IJAIEM), Web Site: www.ijaiem.org
Volume 4, Issue 6, June 2015, ISSN 2319-4847, Page 45

Research on Neural Network Based MultiAgent Semantic Web Content Mining

Ms. Ashwini H. Bhuskat, Ms. Nupoor M. Yawale, Ms. Pranita P. Deshmukh, Ms. Rutuja A. Gulhane, Ms. Meghana A. Deshmukh
PRMIT&R, Badnera (INDIA)

ABSTRACT
With the rapid growth of online information, there is a strong demand for retrieving information from the Web in a way that helps users discover useful knowledge in Web documents. Agents are often developed not in isolation but as part of a multi-agent system. A single agent cannot retrieve information from multiple search tools, and it needs more time to retrieve information than a multi-agent system does. A multi-agent neural network system is an effective solution for large-scale Web mining. This work proposes a multi-agent, neural-network-based framework for mining the contents of the semantic web, which provides query-relevant knowledge using the STC (Suffix Tree Clustering) technique. Clustering presents the user with query-relevant clusters of web content, which better satisfy user requirements and make better use of the user's browsing time. A fuzzy neural network is used to classify the relevance of search results across the agents: it enables the agents to match query terms, form clusters, and return search results with high accuracy in a reasonably short time.
Keywords: Multi-agent Systems, Hierarchical Clustering, Neural Network, Web Mining, Content Mining.

1. INTRODUCTION
Due to the rapid growth of online information, there is a strong demand for Web text mining, which helps people discover useful knowledge in Web documents. A number of search tools are available for retrieving information; these information agents, called search tools, build database indices for web information. However, such approaches often cause unacceptable access delays under highly competitive access situations, and the delays grow as the amount of information stored in the database indices increases. Meta search services are used to overcome this problem.

Figure 1 Meta Search Service

A meta search service retrieves information from multiple search tools. This technique, however, has its own difficulty, the multiple-source problem: when several search tools exist, to which source should a given query be submitted? A neural network is used to answer this question; its internal mechanism learns which search tools return the information associated with the query. For large-scale information, a single-agent approach is impractical, which motivates the multi-agent approach. In such an IR system, a number of agents collaboratively retrieve the desired information from distributed web search tools, so a multi-agent neural network system is a solution for large-scale web text mining. A multi-agent system is one in which a number of agents cooperate and interact with each other in a complex, distributed environment. Web mining [11] can be defined as mining of the World Wide Web (WWW) to find useful knowledge about user behavior, content, and structure of the web. Web content mining focuses on extracting knowledge from web contents or their descriptions. It involves techniques for summarizing, classifying, and clustering web contents, and it can reveal useful and interesting patterns about user needs and contribution behavior. In this paper, a neural network is applied in a multi-agent setting for semantic web content mining. In the present era of the WWW, users are more interested in obtaining useful, relevant, and knowledge-oriented content. The paradigm is shifting from a demand for information to a demand for knowledge. Web content mining applied to semantic web contents can lead to the discovery of knowledge that can be provided to end users to better serve their requirements.

2. RELATED WORK
Shu Bo and Kak Subhash [29] propose the meta search engine Anvish. Determining the relevance of web pages to a query term is basic to the working of any search engine. Anvish uses a learning neural network algorithm to classify and organize the search results it collects [29]; it handles a query term in a reasonably short time and returns the search results with high accuracy. Anvish was further improved by expanding its binary network into a continuous network, so that Web pages can be ranked quantitatively. Y.S. Choi and S.I. Yoo [7] proposed a multi-agent learning approach to web information retrieval using neural networks. This approach locates the search tools that will give the desired information, retrieves the relevant information, and also captures the user's interests in order to retrieve such information. The learning and generalization mechanism of an artificial neural network can be effectively utilized as the internal knowledge mechanism of each IR agent in multi-agent Web IR; in this approach, the BPN (Back Propagation Network) algorithm is used for the neural network associative memory. Because of the large amount of information on the web, a single agent fails to retrieve information from multiple search tools. Unacceptable access delays occur under highly competitive access situations such as those faced by search tools, and they worsen as the amount of information stored in the database increases. A neural network [16] can be used for a single agent, but it has a convergence problem: as the number of search tools increases, the single agent has difficulty training its neural network. This problem is called 'bounded rationality' of the single-agent approach.
Because the single-agent approach is impractical or inefficient for multiple search tools, a hierarchically organized multi-agent approach was proposed. In this system, information agents interact with each other and learn collaboratively about their environment from the user's feedback, so that they can retrieve the desired information effectively from distributed Web search tools. Each information agent can dynamically join or leave the collaborative organization, and the information sources are subject to asynchronous changes in their themes, contents, and structures. With the rapid growth of online information, there is a strong demand for Web text mining that helps people discover useful knowledge in Web documents. For this purpose, a back propagation neural network (BPNN)-based Web text mining system for decision support is used [22][24], in which the BPNN acts as an intelligent learning agent that learns about Web documents. To access information from a large number of web documents, a multi-agent-based neural network system for Web text mining is used. A. Singh [30][31] uses ontologies to retrieve information from the web; here, the ontology plays an important role in enhancing the efficiency of existing agent-based focused crawlers [31]. Agent-based focused crawlers (ABFC) [6][11] are intelligent miners that selectively seek the web pages most relevant to the input information. An ABFC follows a context-based approach that analyzes the content of a web page, thereby reducing redundant information and deducing the relevant information from the page. An ontology is defined as a well-organized knowledge scheme that represents high-level background knowledge with concepts and relations [31][32].
Ontology-based crawling [27] replaces simple keyword-based crawling, because it introduces the semantics/context in which a keyword is searched in order to improve crawl efficiency. That work proposes an Ontology Driven Agent Based Focused Crawler that attempts to improve the efficiency of existing ABFCs by introducing the semantics in which a keyword is searched. The tremendous growth of web sites and of text and multimedia content on the WWW (World Wide Web) has led to demand for a strategy that can extract knowledge from the vast data scattered over different servers and make useful predictions about otherwise uncertain user behavior. Web mining [10][28] can be defined as mining of the WWW to find useful knowledge about user behavior, content, and structure of the web. Web mining [20] differs from data mining in that it works on web contents, which are unstructured files or server logs, rather than on the well-structured databases used in data mining. The next generation of the WWW will be knowledge-oriented, and web mining is a promising way to satisfy its users [19][30][42]. Scatter/Gather [9] is the first query result visualization algorithm in the Information Retrieval community to use clustering. Before this work, document clustering was investigated mainly as a method for improving document search and retrieval, but it was not widely used because it was slow and did little to improve near-neighbor search. Instead of attempting to reduce the number of documents returned, Cutting et al. introduced document clustering as a document browsing method. They state that the Scatter/Gather system is particularly helpful in situations in which it is difficult or undesirable to specify a query formally:
1. When the user is not looking for anything specific and just wants to discover the general information content of the corpus (to gain an overview);


2. When it is difficult to formulate the query precisely (to help the user formulate a search request).
Two near-linear-time clustering algorithms were presented: Buckshot and Fractionation. However, that work is based on general document collections, not on dynamically generated search results. Zamir and Etzioni [40] followed this paradigm and proposed the notion of search results clustering (also called ephemeral clustering). In their Grouper system, STC (Suffix Tree Clustering) treats a document as a string instead of a set of words. It clusters the document "snippets" returned by a search engine according to the common phrases they contain, thus using information about the proximity and order of keywords in addition to their frequencies. STC is organized into two phases: discovering base clusters using a suffix tree, and merging base clusters. In the first phase, the retrieved document snippets are inserted into a suffix tree, where each node represents a group of documents and a phrase common to all of them. Each phrase shared by two or more documents is assigned the score s(B) = |B| * f(|P|), where |B| is the number of documents in base cluster B, |P| is the length of phrase P, and f penalizes single-word terms. Only base clusters whose score is higher than an arbitrarily chosen minimal base cluster score are retained. In the second phase, the N top-ranking base clusters are merged using a version of the AHC (agglomerative hierarchical clustering) algorithm, with a binary single-link merge criterion and a predetermined minimal similarity between base clusters as the halting criterion [13]. The two distinguishing features of STC are its linear time complexity and its clustering of documents by shared phrases instead of word frequency. These features gave ephemeral clustering what [17] calls "a substantial momentum". The Carrot system built by Weiss and Stefanowski extended STC to the Polish language by using different stemming techniques.
They investigated the influence of two primary STC parameters, the merge threshold and the minimum base cluster score, on the number and quality of the clusters produced by STC. SnakeT [2, 23] constructs two knowledge bases offline and takes advantage of them. The first, called the Anchor Text and Link Database, is used to enrich the snippets returned by a search engine. The second, called the Semantic Knowledge Base, helps rank the "approximate sentences" as they are generated. SnakeT differs from Grouper and other approaches, which treat sentences as runs of contiguous terms, in that it extracts sentences involving non-contiguous terms. It first extracts "approximate" sentences from the enriched snippet collection and then uses the knowledge base to rank them; sentences scoring above a threshold form a set of meaningful labels, called primary labels, which are used to form and name the clusters. To generate the labels of nodes in the higher levels of the hierarchy, SnakeT uses k-approximate sentences that have a good rank and occur in at least c% of the documents contained in the cluster. This set of secondary labels "provide a description for the cluster at a coarser level and thus is more useful for hierarchical formation and labeling". In addition to these academic tools, there has also been a surge of commercial interest in novel IR tools that support users in searching tasks [11].
The following existing industrial systems implement clustering techniques in their (meta) search engines: Vivisimo, Grokker, Clusty and iBoogie provide cluster hierarchies in addition to the flat ranked list of search results; Kartoo uses a network visualization interface; Mooter also uses a network visualization tool, followed by a hierarchical cluster presentation when a node in the network is clicked; Copernic and Dogpile concentrate more on supporting users in query formulation (providing revised or refined query suggestions). Among the various clustering search engines, Vivisimo.com deserves special mention. This commercial meta search engine organizes search results into hierarchical, very well described thematic groups and can be considered a benchmark and the state of the art in current research [17]. However, very little information about this software is available, as it is not publicly accessible. Much academic research addresses the search results clustering problem, but the performance it attains is far from that of Vivisimo; only SnakeT claims efficiency and efficacy close to it [12]. The following figures show snapshots of some available clustering search engines.

Figure 2 Home Page of Grouper Clustering Engine


Figure 3 Results of Grouper Clustering Engine for the word ‘clinton’

Figure 4 Home Page and Results of Vivisimo Clustering Engine for word ‘canada’


Figure 5 Home Page and Results of Clusty Clustering Engine for word ‘data mining’

Figure 6 Home Page and Results of iBoogie Clustering Engine for word ‘data mining’

Figure 7 Home Page and Results of HOB Search Clustering Engine for word ‘paris’


Moreover, the authors of [21] reformulate the search result clustering problem as a salient phrase ranking problem, thereby converting an unsupervised clustering problem into a supervised learning problem. Although a supervised learning method requires additional training data, it significantly improves the performance of search result grouping and enables that performance to be evaluated accurately. Given a query and the ranked list of search results, their method first parses the whole list of titles and snippets, extracts all possible phrases (n-grams) from the contents, and calculates several properties of each phrase, such as phrase frequency, document frequency, and phrase length. A regression model learned from previous training data is then applied to combine these properties into a single salience score. The phrases are ranked by salience score, and the top-ranked phrases are taken as salient phrases. The salient phrases are in fact the names of candidate clusters, which are further merged according to their corresponding documents. The authors argue that their method is more suitable for Web search results clustering because it emphasizes efficiently identifying relevant clusters for Web users. It generates shorter (and thus hopefully more readable) cluster names, which enable users to quickly identify the topic of a given cluster. Furthermore, the clusters are ranked by their salience scores, so the clusters users are more likely to need are ranked higher. Amin Milani Fard and Reza Ghaemi [3] propose a neuro-fuzzy architecture for Web content taxonomy using a hybrid of Adaptive Resonance Theory (ART) neural networks and fuzzy logic. Their search engine, Kavosh, is equipped with unsupervised neural networks for dynamic data clustering, and the work implements a text mining method. Unsupervised self-organizing map (SOM) based neural networks are mainly used for dataset clustering.
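The salient-phrase ranking of [21] can be illustrated with a small sketch. It is not the authors' implementation: the values in `WEIGHTS` are hypothetical placeholders for the regression model that [21] learns from training data, only three of the phrase properties named above are used, the stop list is assumed, and the example snippets are the seven documents listed in Section 3.2.

```python
import re
from collections import Counter

# Hypothetical weights standing in for a regression model learned offline.
WEIGHTS = {"phrase_freq": 0.5, "doc_freq": 1.0, "length": 0.3, "bias": 0.0}
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed stop list

def ngrams(words, max_n=3):
    """Yield every contiguous n-gram (as a tuple) up to max_n words long."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])

def salient_phrases(snippets, top_k=3):
    """Rank candidate phrases by a linear salience score over simple phrase properties."""
    phrase_freq, doc_freq = Counter(), Counter()
    for snippet in snippets:
        grams = list(ngrams(re.findall(r"[a-z]+", snippet.lower())))
        phrase_freq.update(grams)
        doc_freq.update(set(grams))
    scored = []
    for gram, pf in phrase_freq.items():
        # A candidate cluster needs at least two documents and must not
        # start or end with a stop word.
        if doc_freq[gram] < 2 or gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
            continue
        score = (WEIGHTS["phrase_freq"] * pf + WEIGHTS["doc_freq"] * doc_freq[gram]
                 + WEIGHTS["length"] * len(gram) + WEIGHTS["bias"])
        scored.append((" ".join(gram), score))
    # Highest salience first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]

SNIPPETS = [
    "Large-scale singular value computations",
    "Software for the sparse singular value decomposition",
    "Introduction to modern information retrieval",
    "Linear algebra for intelligent information retrieval",
    "Matrix computations",
    "Singular value cryptogram analysis",
    "Automatic information organization",
]
print(salient_phrases(SNIPPETS))
```

With these placeholder weights the two-word phrase "singular value" outranks its individual words because the length term rewards it; a trained model would learn such trade-offs from data instead of fixing them by hand.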

3. PROPOSED WORK
3.1 Snippet Fetching and Clustering
3.1.1 Preprocessing
In web search results clustering, the web snippets serve as the input data for the grouping algorithm. Because snippets are small and are automatically generated summaries of the original documents, proper data preprocessing is very important. Although LSI (Latent Semantic Indexing) can deal with noisy data, this ability is severely limited when only extremely small pieces of documents are available. Without sufficient preprocessing, most of the abstract concepts discovered by LSI would relate to meaningless terms, which would make them useless as cluster labels. The primary aim of the preprocessing phase is therefore to remove from the input documents all characters and terms that could degrade the quality of group descriptions.

Figure 8 Pre-processing of snippets (Text Filtering, Stemming, Stop words Removal)

In STC, the preprocessing phase has three steps: text filtering, stemming, and stop-word marking.
3.1.2 Text filtering
In the text filtering step, all terms that are useless or would introduce noise into cluster labels are removed from the input documents. Such terms include:
• HTML tags (e.g. <table>) and entities (e.g. &amp;)
• non-letter characters such as "$", "%" or "#" (except white space and sentence markers such as '.', '?' or '!')
Note that stop words are not removed from the input documents at this stage. Additionally, words that appear in snippet titles are marked in order to increase their weight in later phases of clustering.
3.1.3 Stemming
Stemming algorithms transform the words in a text into their grammatical root form and are mainly used to improve the efficiency of an Information Retrieval system. To stem a word is to reduce it to a more general form, possibly its root; for example, stemming the term interesting may produce the term interest. Although the stem of a word might not be its linguistic root, all words that share a stem should share a root. The effect of stemming on searches of English document collections has been tested extensively, and several algorithms based on different techniques exist.
3.1.4 Elimination of Stop Words
After stemming, unwanted words must be dealt with. There are 400 to 500 stop words, such as "of", "and", and "the", that provide no useful information about a document's topic; stop-word removal is the process of removing these words. Stop words account for about 20% of all words in a typical document [4]. These techniques greatly reduce the size of a search engine's index; stemming alone can reduce the size of an index by nearly 40%. To compare one web page with another, all unnecessary content must be removed and the text put into an array.
Although stop words carry no descriptive value on their own, they may help to understand or disambiguate the meaning of a phrase (compare "Chamber Commerce" and "Chamber of Commerce"). That is why we retain them in the input documents and only add appropriate markers. This enables later phases of the algorithm to, for example, filter out phrases ending with a stop word or prevent LSI from indexing stop words at all. Tables 1 and 2 show the pre-processing steps for the document "Software for the sparse singular value decomposition".
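The three preprocessing steps of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the stop list is a tiny hand-picked sample, `SUFFIX_RULES` is a hypothetical toy stemmer (not the Porter algorithm; a real system would use an established stemmer), chosen so that the example reproduces Table 2 and the interesting/interest example, and title-word marking is omitted. Stop words are marked rather than deleted, and only non-stop stems are returned as label terms.

```python
import re

# Assumed, deliberately tiny stop list; real lists hold several hundred words.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hypothetical toy suffix rules (suffix, replacement), tried in order; not Porter.
SUFFIX_RULES = [("ition", "e"), ("ing", ""), ("ies", "y"), ("s", "")]

def filter_text(snippet: str) -> str:
    """Text filtering: drop HTML tags, entities and non-letter characters."""
    text = re.sub(r"<[^>]+>", " ", snippet)       # HTML tags such as <table>
    text = re.sub(r"&#?\w+;", " ", text)          # entities such as &amp;
    text = re.sub(r"[^A-Za-z\s.?!]", " ", text)   # keep letters, spaces and . ? !
    return re.sub(r"\s+", " ", text).strip()

def stem(word: str) -> str:
    """Stemming: strip the first matching suffix, keeping at least three letters."""
    for suffix, replacement in SUFFIX_RULES:
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words such as "class" alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word

def preprocess(snippet: str) -> list[tuple[str, str, bool]]:
    """Return (token, stem, is_stop_word) triples; stop words are marked, not deleted."""
    tokens = re.findall(r"[a-z]+", filter_text(snippet).lower())
    return [(tok, stem(tok), tok in STOP_WORDS) for tok in tokens]

def label_terms(snippet: str) -> list[str]:
    """Stems of the non-stop words, i.e. the terms that may appear in cluster labels."""
    return [s for _, s, is_stop in preprocess(snippet) if not is_stop]

print(label_terms("Software for the sparse singular value decomposition"))
```

Keeping the stop-word flag on each token, instead of dropping the token, preserves phrases such as "Chamber of Commerce" for the phrase-based clustering step while still excluding stop words from labels.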

Table 1: Preprocessing Steps

Word in Document | Pre-processing step
Software | Word
for, the | Stop words (removed)
Sparse | Word
singular | Word
Value | Word
Decomposition | Stemming (decompose)

Table 2: Result of Pre-processing Steps

1. Software
2. Sparse
3. Singular
4. Value
5. decompose

3.2 Finding Similarity Between Input Query and Fetched Snippets
Finding the similarity between the input query and the fetched snippets is the second sub-module of this system. Here, the fuzzy neural network is used to check the relevance of the query terms to the documents. To illustrate how the term-document frequency matrix is created, consider the following seven documents as input snippets. The frequent terms and phrases are generated by the preprocessing and frequent phrase extraction component; all other components of the clustering engine then operate on them to cluster the documents into groups.
Documents:
D1: Large-scale singular value computations
D2: Software for the sparse singular value decomposition
D3: Introduction to modern information retrieval
D4: Linear algebra for intelligent information retrieval
D5: Matrix computations
D6: Singular value cryptogram analysis
D7: Automatic information organization
Terms:
T1: Information
T2: Singular
T3: Value
T4: Computations
T5: Retrieval
Phrases:
P1: Singular value
P2: Information retrieval

Figure 9 Steps for query matching
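A minimal sketch of how the term-document frequency matrix for documents D1 to D7 might be built follows. Assumptions: a term or phrase counts as "frequent" when it occurs in at least two documents (this happens to reproduce T1 to T5 and P1 to P2 above), the matrix holds raw counts, and cosine similarity is used as a simple stand-in for query matching. The paper's system uses a fuzzy neural network for the relevance check, so the cosine ranking here is only an illustrative baseline.

```python
import math
import re
from collections import Counter

# The seven example snippets from Section 3.2.
DOCS = {
    "D1": "Large-scale singular value computations",
    "D2": "Software for the sparse singular value decomposition",
    "D3": "Introduction to modern information retrieval",
    "D4": "Linear algebra for intelligent information retrieval",
    "D5": "Matrix computations",
    "D6": "Singular value cryptogram analysis",
    "D7": "Automatic information organization",
}
STOP_WORDS = {"for", "the", "to"}  # assumption: just the stop words that occur here

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def frequent(items_per_doc, min_docs=2):
    """Items occurring in at least min_docs documents (document frequency threshold)."""
    df = Counter()
    for items in items_per_doc.values():
        df.update(set(items))
    return sorted(item for item, n in df.items() if n >= min_docs)

doc_words = {d: words(t) for d, t in DOCS.items()}
TERMS = frequent({d: [w for w in ws if w not in STOP_WORDS] for d, ws in doc_words.items()})
PHRASES = frequent({
    d: [f"{a} {b}" for a, b in zip(ws, ws[1:]) if a not in STOP_WORDS and b not in STOP_WORDS]
    for d, ws in doc_words.items()
})

def term_vector(tokens):
    """Raw term frequencies over TERMS: one column of the term-document matrix."""
    counts = Counter(tokens)
    return [counts[t] for t in TERMS]

MATRIX = {d: term_vector(ws) for d, ws in doc_words.items()}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query):
    """Documents with non-zero cosine similarity to the query, best first."""
    q = term_vector(words(query))
    scores = [(d, cosine(q, v)) for d, v in MATRIX.items()]
    return sorted([(d, s) for d, s in scores if s > 0], key=lambda x: (-x[1], x[0]))

print(TERMS)
print(PHRASES)
print(rank("singular value"))
```

For the query "singular value", D2 and D6 match the query vector exactly, while D1 ranks lower because it also contains "computations".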


3.2.1 Neural Network for Clustering
Unsupervised self-organizing map (SOM) based neural networks [3] are mainly used for dataset clustering. The structure of a SOM network changes during learning based on the observed data; however, these methods are usually not efficient for dynamic group clustering, such as clustering web documents. The Adaptive Resonance Theory (ART) neural network [9], by contrast, is an unsupervised incremental clustering neural network designed specifically to meet this demand. The proposed system uses a fuzzy approach [21] to evaluate and assign a score to a web page. The inputs to the fuzzy inference system are the normalized term frequency in the document, the position of the word, and the number of web links to the page. The final page score is calculated as follows:

Page_score = 2*FS + PS + LS

where FS is the frequency_score, PS the position_score, and LS the link_score. The fuzzy rule base and the experimental triangular membership functions are shown in Table 3 and Figure 10. The linguistic variables are:
FK: frequency_of_keyword
FS: frequency_score
PK: position_of_keyword
NL: number_of_links
PS: position_score
LS: link_score

Table 3: Fuzzy Rule Base

IF | THEN
FK is High | FS is High
FK is Medium | FS is Medium
FK is Low | FS is Low
PK is Close_to_top | PS is High
PK is In_the_middle | PS is Medium
PK is Far_from_top | PS is Low
NL is Many | LS is High
NL is Medium | LS is Medium
NL is Few | LS is Low

Figure 10 (a) Frequency_score MF

Figure 10 (b) Position_of_keyword MF

Figure 10 (c) Number_of_links MF Figure 10: Membership Functions
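The rule base of Table 3 and the page-score formula can be sketched as a small fuzzy inference system. The curves of Figure 10 are not reproduced here, so the membership functions below are assumptions: evenly spaced triangles on inputs normalized to [0, 1], with keyword position measured as the normalized distance from the top of the page (0 = top). The consequents are treated as crisp output levels combined by a weighted average, a zero-order Sugeno simplification rather than the paper's own defuzzification.

```python
# Assumed triangular membership functions on inputs normalised to [0, 1].
LOW, MEDIUM, HIGH = (0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)

# Assumed crisp output levels for the consequents Low / Medium / High.
OUT = {"Low": 0.0, "Medium": 0.5, "High": 1.0}

def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def infer(x, rules):
    """Fire each (fuzzy set, consequent) rule and return the weighted-average output."""
    x = min(max(x, 0.0), 1.0)
    fired = [(tri(x, *fuzzy_set), OUT[level]) for fuzzy_set, level in rules]
    total = sum(w for w, _ in fired)
    return sum(w * out for w, out in fired) / total if total else 0.0

# Table 3: FK -> FS, PK -> PS, NL -> LS.
FREQUENCY_RULES = [(HIGH, "High"), (MEDIUM, "Medium"), (LOW, "Low")]
# Close_to_top, In_the_middle, Far_from_top (position 0 = top of page).
POSITION_RULES = [(LOW, "High"), (MEDIUM, "Medium"), (HIGH, "Low")]
# Many, Medium, Few links.
LINK_RULES = [(HIGH, "High"), (MEDIUM, "Medium"), (LOW, "Low")]

def page_score(keyword_freq, keyword_pos, num_links):
    """Page_score = 2*FS + PS + LS, each sub-score produced by its rule group."""
    fs = infer(keyword_freq, FREQUENCY_RULES)
    ps = infer(keyword_pos, POSITION_RULES)
    ls = infer(num_links, LINK_RULES)
    return 2 * fs + ps + ls

print(page_score(0.8, 0.1, 0.3))
```

With these evenly spaced assumed triangles each sub-score simply tracks its input (or one minus it, for position), so the score is close to linear; the actual shapes in Figure 10 would bend these mappings. The factor 2 on FS gives keyword frequency twice the weight of either position or links.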


3.2.2 System Model
The Web clustering engine performs post-retrieval clustering of the search results retrieved for a broad topic. Its output supports fast subtopic retrieval, quick exploration of unknown topics, and easy identification of relevant search results within the broad topic. Figure 11 shows the system model.
3.2.2.1 Search Keyword
The system provides an interface to accept the search keyword from the user.
3.2.2.2 Search Result Acquisition (Input Snippets)
This component accepts the search keyword as input. It allows the number of search results to be extracted from various search engines to be configured, and it extracts and stores the search results, each of which includes the URL pointing to the document, its title, and its snippet. As discussed earlier, this work uses already extracted snippets from the AMBIENT and ODP239 datasets.
3.2.2.3 Multi-Agent
In this system, the term agent refers to a search tool. The snippets are stored in two datasets, and each dataset is divided among three agents to form the multi-agent system. Every agent accepts the query term and checks the relevance of the term within its own portion of the data by using the neuro-fuzzy membership functions described above; the fuzzy rule base is applied to the documents to rank them for the term. Because the query term is matched in all agents simultaneously, results are obtained in less time than with a single agent.
3.2.2.4 Clustering Engine
The clustering engine preprocesses the input snippets and converts the preprocessed search results into a format suitable for the clustering algorithm. It extracts features and provides them as input to the clustering algorithm, which builds the clusters and identifies the label that best describes each cluster.
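The two phases of STC used by the clustering engine can be sketched as follows. This is a simplified illustration, not the system's implementation: phase 1 uses a dictionary of bounded-length phrases in place of a real suffix tree (so it is not linear time), and the length weight `f`, the minimum base cluster score, the stop list, and the 0.5 overlap threshold for merging are assumed values. The first phrase in each merged group serves as its cluster label.

```python
import re
from itertools import combinations

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # assumed list

def f(length):
    """Assumed phrase-length weight: penalise single words, linear for 2-6 words, flat beyond."""
    return 0.5 if length == 1 else float(min(length, 6))

def base_clusters(docs, max_len=4, min_docs=2, min_score=1.0):
    """Phase 1: map each shared phrase to its documents and score it as |B| * f(|P|)."""
    # Simplification: bounded-length phrases in a dict stand in for the suffix tree.
    phrase_docs = {}
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z]+", text.lower())
        for i in range(len(words)):
            for n in range(1, min(max_len, len(words) - i) + 1):
                phrase_docs.setdefault(tuple(words[i:i + n]), set()).add(doc_id)
    clusters = []
    for phrase, ids in phrase_docs.items():
        if len(ids) < min_docs or phrase[0] in STOP_WORDS or phrase[-1] in STOP_WORDS:
            continue
        score = len(ids) * f(len(phrase))
        if score >= min_score:
            clusters.append((score, phrase, ids))
    clusters.sort(key=lambda c: (-c[0], c[1]))
    return clusters

def merge(clusters, top_n=10, threshold=0.5):
    """Phase 2: single-link merge of base clusters that overlap by more than threshold both ways."""
    top = clusters[:top_n]
    parent = list(range(len(top)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(top)), 2):
        docs_a, docs_b = top[a][2], top[b][2]
        overlap = len(docs_a & docs_b)
        if overlap / len(docs_a) > threshold and overlap / len(docs_b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i, (_, phrase, ids) in enumerate(top):
        labels, members = groups.setdefault(find(i), ([], set()))
        labels.append(" ".join(phrase))
        members |= ids
    return [(labels, sorted(members)) for labels, members in groups.values()]

DOCS = {
    "D1": "Large-scale singular value computations",
    "D2": "Software for the sparse singular value decomposition",
    "D3": "Introduction to modern information retrieval",
    "D4": "Linear algebra for intelligent information retrieval",
    "D5": "Matrix computations",
    "D6": "Singular value cryptogram analysis",
    "D7": "Automatic information organization",
}
for labels, members in merge(base_clusters(DOCS)):
    print(labels[0], members)
```

On the seven example documents, "singular value" and "information retrieval" become the labels of the two main clusters, and "computations" stays separate because it shares only one document with the "singular value" cluster.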

Figure 11: System Model

4. SYSTEM IMPLEMENTATION
In this system, a search engine is developed to retrieve information from the datasets discussed above. Semantic content mining is performed using a clustering technique, suffix tree clustering (STC). When a query is entered in the search box, pre-processing takes place first; this pre-processing is the document cleaning phase of STC. The query is then passed to the datasets to retrieve related information. The system compares the time required to retrieve snippets with a single agent and with multiple agents. A single agent retrieves data from one searching tool, whereas multiple agents retrieve information simultaneously from several search tools. To form the multiple agents, the datasets are divided into three parts. When the system uses multiple agents, the time needed to obtain results is greatly reduced compared with a single agent.

4.1 Search
The Search module is the first module of the system. The user first enters a query in the search box and then chooses single-agent or multi-agent mode from a drop-down list. If the user selects multi-agent mode, results are obtained in less time than in single-agent mode. A drop-down list is also provided for downloads; it gives the number of results present in the datasets. In Figure 12, the user enters the query Art, requests 100 results, and selects the multi-agent option.

Figure 12: Search

4.2 Multi-Agent
After the search button is pressed, the query is processed to obtain results from the datasets. When the user selects multi-agent mode, the system searches multiple agents simultaneously and returns results in much less time. In Figure 13, the user searches for the query Art in multi-agent mode, and the results are returned within 16 ms. This is considerably less than the time required by a single agent.
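The simultaneous search over three agents can be illustrated with a small sketch. It assumes that each agent is simply a function over its share of an in-memory list of snippets. `partition`, `agent_search`, and the plain term-matching rule are hypothetical. In the described system, each agent would apply the fuzzy relevance check instead of exact term matching.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

def partition(data: Sequence[str], n_agents: int = 3) -> List[List[str]]:
    """Split the dataset into n_agents contiguous shares of near-equal size."""
    size, extra = divmod(len(data), n_agents)
    shares, start = [], 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)  # first `extra` shares get one more item
        shares.append(list(data[start:end]))
        start = end
    return shares

def agent_search(share: List[str], query: str) -> List[str]:
    """One agent: return the snippets in its share that contain any query term."""
    terms = set(query.lower().split())
    return [s for s in share if terms & set(s.lower().split())]

def single_agent_search(data: Sequence[str], query: str) -> List[str]:
    """Baseline: a single agent scans the whole dataset."""
    return agent_search(list(data), query)

def multi_agent_search(data: Sequence[str], query: str, n_agents: int = 3) -> List[str]:
    """Query all agents concurrently and merge their partial results in share order."""
    shares = partition(data, n_agents)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        partials = list(pool.map(agent_search, shares, [query] * n_agents))
    return [hit for part in partials for hit in part]
```

Both modes return the same snippets, so only the elapsed time differs. Threads suit agents that spend their time waiting on remote search tools. For CPU-bound matching in CPython, however, the global interpreter lock limits the speedup from threads, and `ProcessPoolExecutor` would be the usual alternative.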

Figure 13: Multi-Agent

4.3 Single Agent
The single-agent option is provided only for comparison with the multi-agent mode. As shown in Figure 14, when the user selects single-agent mode, retrieving the 100 results of the Art query takes 750 ms. Multi-agent mode returns the results for the same query in 16 ms.
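The two timings reported for the 100-result Art query give the following observed speedup of multi-agent mode over single-agent mode for this measurement:

```latex
S = \frac{T_{\text{single}}}{T_{\text{multi}}} = \frac{750\,\text{ms}}{16\,\text{ms}} \approx 46.9
```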

Figure 14: Single Agent

4.4 Clusters
When the user enters a query and presses the search button, the system checks the query term against the datasets for relevancy, using a threshold value. If the relevancy condition is met using the fuzzy neural network, the system produces clusters for that query term. Figure 15 shows the clusters produced for the Art query, together with the fetched snippets related to Art. Each cluster contains results of similar meaning. For example, if the user selects the animation cluster, only results about animation are shown. This type of mining is called semantic content mining, i.e. meaningful search. Figure 15 also shows the clusters for a particular snippet. For example, when the cursor is on the first link, the system shows the clusters in which that link is present. The system therefore allows a document to appear in more than one cluster.
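This section does not give the membership functions or rules used for the relevancy check. The following is therefore only a minimal sketch of how a fuzzy rule base could score a snippet against the query and then filter by a threshold. It uses zero-order Sugeno inference, a swapped-in technique chosen for brevity. The triangular fuzzy sets, the rule consequents, the `saturation` and `threshold` values, and all function names are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over a normalised term-frequency input in [0, 1].
FUZZY_SETS: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Hypothetical rule base: IF frequency is <set> THEN relevance is <value>.
RULES: Dict[str, float] = {"low": 0.0, "medium": 0.3, "high": 1.0}

def relevance(snippet: str, query: str, saturation: int = 3) -> float:
    """Zero-order Sugeno inference: membership-weighted average of rule outputs."""
    terms = set(query.lower().split())
    hits = sum(1 for tok in snippet.lower().split() if tok in terms)
    x = min(1.0, hits / saturation)  # normalised term frequency
    weights = {name: triangular(x, *abc) for name, abc in FUZZY_SETS.items()}
    total = sum(weights.values())
    if total == 0.0:
        return 0.0
    return sum(weights[n] * RULES[n] for n in RULES) / total

def rank(snippets: List[str], query: str, threshold: float = 0.1) -> List[Tuple[float, str]]:
    """Keep snippets whose relevance reaches the threshold, most relevant first."""
    scored = [(relevance(s, query), s) for s in snippets]
    return sorted((p for p in scored if p[0] >= threshold), reverse=True)
```

With these settings, `rank(["art art art gallery", "modern art museum", "still life painting"], "art")` keeps the first two snippets, in that order, and drops the third, which scores 0.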

Figure 15: Clusters

Figure 16: Snippets belonging to multiple clusters.
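Overlapping clusters like those in Figures 15 and 16 can be illustrated with a compact sketch of the two STC phases described by Zamir and Etzioni [40]. First, phrases shared by several snippets form base clusters. Second, base clusters whose document sets overlap by more than half in both directions are merged. For brevity, the suffix tree is replaced by direct enumeration of word phrases, which finds the same shared phrases less efficiently. The phrase-length weighting in `score`, the `max_len` limit, and all function names are hypothetical choices, not taken from this paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]

def base_clusters(docs: List[List[str]], max_len: int = 4, min_docs: int = 2) -> Dict[Phrase, Set[int]]:
    """Map each phrase (a prefix of some word suffix) to the snippets containing it."""
    phrase_docs: Dict[Phrase, Set[int]] = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            for end in range(start + 1, min(start + max_len, len(words)) + 1):
                phrase_docs[tuple(words[start:end])].add(doc_id)
    # A base cluster must be shared by at least min_docs snippets.
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}

def score(phrase: Phrase, docs: Set[int]) -> float:
    """Hypothetical weighting: penalise one-word phrases, cap long ones at 6."""
    weight = 0.5 if len(phrase) == 1 else min(len(phrase), 6)
    return len(docs) * weight

def merge(bases: Dict[Phrase, Set[int]], overlap: float = 0.5) -> List[Tuple[str, Set[int]]]:
    """Merge base clusters whose document sets overlap by more than `overlap` both ways."""
    phrases = list(bases)
    parent = list(range(len(phrases)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        a, b = bases[phrases[i]], bases[phrases[j]]
        shared = len(a & b)
        if shared / len(a) > overlap and shared / len(b) > overlap:
            parent[find(i)] = find(j)

    groups: Dict[int, List[Phrase]] = defaultdict(list)
    for i, p in enumerate(phrases):
        groups[find(i)].append(p)

    clusters = []
    for members in groups.values():
        label = max(members, key=lambda p: (score(p, bases[p]), p))  # best phrase names the cluster
        docs = set().union(*(bases[p] for p in members))
        clusters.append((" ".join(label), docs))
    return clusters

def clusters_of(clusters: List[Tuple[str, Set[int]]], doc_id: int) -> List[str]:
    """Labels of every cluster a snippet belongs to (a snippet may be in several)."""
    return [label for label, docs in clusters if doc_id in docs]
```

Consider four cleaned snippets: `["modern", "art", "museum"]`, `["modern", "art", "gallery"]`, `["art", "animation", "studio"]` and `["computer", "animation", "studio"]`. For these, `merge(base_clusters(docs))` yields clusters labelled "modern art" and "animation studio". `clusters_of` then reports that the third snippet (`doc_id` 2) belongs to both, which is the overlap shown in Figure 16.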

5. COMPARISON
In the traditional approach, a search engine returns a long list of snippets for a given query. Users generally look through only two or three pages and rarely reach the last pages. If the desired result is on the 10th page, the user is therefore unlikely to find it. In contrast, this system provides clusters of snippets with similar meaning, so that accurate results can be retrieved. Existing approaches generally use a single agent to retrieve information. This system instead uses a multi-agent approach, so that it can retrieve information from a number of search tools simultaneously and obtain results in minimum time. Figure 17 compares the existing system and the proposed system.

Figure 17: Comparison between traditional system and proposed system.

6. CONCLUSION AND FUTURE SCOPE
6.1 Conclusion
Organizing web search results speeds up the user's browsing process, since searching for relevant web pages in an unstructured list of results is time consuming. This work has proposed a multi-agent based solution for mining semantic web contents using a neural network, with the aim of providing context-based, knowledge-oriented results to the user. The system implements a multi-agent neural network based framework for mining the contents of the semantic web, which provides query-relevant knowledge using the STC (Suffix Tree Clustering) technique. A fuzzy neural network is used to classify the relevancy of search results. Using the fuzzy neural network, the agents match the query term within the documents to create clusters, which return search results with high accuracy in a reasonably short time. Instead of a long list of snippets, the system presents snippets organized into clusters within a reasonably short time.

6.2 Future Scope
The proposed work implements neural network based multi-agent semantic web content mining. It can be extended in the future as follows. The proposed work uses only text mining; in the future it can be extended to multimedia mining. The system also uses static datasets; it will be extended to dynamic datasets, i.e. datasets that are updated as values are added. The amalgamation of web mining techniques with agent technology will lead to improved performance, reduced network traffic, and better results.

REFERENCES

[1]. A. Spink, D. Wolfram, B.J. Jansen, T. Saracevic, Searching the Web: The public and their queries. Journal of the American Society for Information Science and Technology 52 (3), 2001, pp. 226-234.

[2]. Adam Schenker, Mark Last, and Abraham Kandel. Design and Implementation of a Web Mining System for Organizing Search Engine Results. Proceedings of the CAiSE'01 Workshop Data Integration over the Web (DIWeb'01), pp. 62-75, Interlaken, Switzerland, 2001.

[3]. Amin Milani Fard, Reza Ghaemi, Mohammad-R. Akbarzadeh-T., Hoda Akbari, Kavosh: An Intelligent Neuro-Fuzzy Search Engine. Published in proceedings of Seventh International Conference on Intelligent Systems Design and Applications.

[4]. April Kontostathis and William Pottenger. A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences. http://www.cse.lehigh.edu/techreports/2002/LU-CSE-02-006.pdf.

[5]. B. Yuwono and D. L. Lee. Search and ranking algorithms for locating resources in World Wide Web. Proceedings of the International Conference on Data Engineering (ICDE), pp. 164-171, New Orleans, USA, 1996.

[6]. C. Dimou, A. Batzios, A.L. Symeonidis and P.A. Mitkas, ‘A Multi-agent framework for spiders traversing the semantic web’, Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence.

[7]. Choi, Y.S. and Yoo, S.I. Multi-agent Learning Approach to WWW Information Retrieval using Neural Network, in Proceedings of 1999 ACM International Conference on Intelligent User Interfaces.

[8]. Claudio Carpineto, Stanislaw Osinski, Giovanni Romano and Dawid Weiss, A Survey of Web Clustering Engines, ACM Computing Surveys, Volume 41, No. 3, Article 17 (July 2009), pages 17:1-17:38.

[9]. Cutting, D.R., Karger, D.R., Pedersen, J.O., Tukey, J.W. 1992. Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. ACM SIGIR Conference on Research and Development in Information Retrieval, pp.318-329

[10]. Eirinaki M. & Vazirgiannis M., ‘Web Mining for Web Personalization’. Published in ACM Transactions on Internet Technology, Vol.3 , No. 1, February 2003, pp. 1-27.

[11]. F. Buccafurri, G. Lax, D. Rosaci and D. Ursino, ‘Dealing with semantic heterogeneity for improving web usage’, Data Knowl. Eng. 58 (3) (2006), pp. 436–465.

[12]. Ferragina, P. and Gulli, A. (2004) The Anatomy of a Hierarchical Clustering Engine for Web-page, News and Book Snippets. Technical report, RR04-04 Informatica, Pisa.

[13]. Ferragina, P. and Gulli, A. (2005) A personalized Search Engine Based On Web-Snippet Hierarchical Clustering. In 14th International World Wide Web Conference.

[14]. Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, Jinwen Ma. Learning to Cluster Web Search Results. SIGIR'04, July 25-29, 2004, Sheffield, South Yorkshire, UK.

[15]. J. Han and M. Kamber. Data Mining - Concepts and Techniques. Academic Press, 2001.

[16]. James A. Freeman, David M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, in Computation and Neural Systems Series, Christof Koch, California Institute of Technology, 1991.

[17]. Jiang, Z. H., Joshi, A., Krishnapuram, R. and Yi, L. Y. (2002). Retriever: improving web search engine results using clustering. In Managing Business and Electronic Commerce.

[18]. Jon Kleinberg. Authoritative sources in a hyperlinked environment. ACM-SIAM Symposium on Discrete Algorithms, pp. 668-677, San Francisco, USA, 1998.

[19]. Karayannidis N. & Sellis T., ‘Hierarchical Clustering for OLAP: The CUBE File Approach’. Published in The VLDB Journal — The International Journal on Very Large Data Bases, Vol. 17, Issue 4, July 2008.

[20]. Kosala R. & Blockeel H., ‘Web Mining Research: A Survey’. Published in ACM SIGKDD, Vol. 2, Issue 1,July 2000

[21]. Krishna Bharat and Monika R. Henzinger. Algorithms for Topic Distillation in a Hyperlinked Environment. Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval, 1998.

[22]. Lean Yu, Shouyang Wang and Lai Kin Keung, A Multi-Agent Neural Network System for Web Text Mining, Information Science Reference, (2008) 162-183

[23]. M. A. Hearst and J. O. Pedersen. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. Proceedings of the Nineteenth Annual International ACM SIGIR Conference, Zurich, June 1996.

[24]. Nikola K. Kasabov, Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering, The MIT Press Cambridge, Massachusetts London, England,1998.

[25]. Rasmussen, E. 1992. Clustering Algorithms. Information Retrieval, W.B.Frakes R. Baeza-Yates, Prentice Hall PTR, New Jersey

[26]. S. Osinski and D. Weiss. A concept-driven algorithms for clustering search results. IEEE Intelligent Systems, 20 (3), (2005), pages 48-54.

[27]. S. Ganesh, M. Jayaraj, V. Kalyan & G. Aghila, ‘Ontology based Web Crawler’, Published in proceedings of the International Conference on Information Technology: Coding & Computing, 2004

[28]. Sharma K., Shrivastava G. & Kumar V., ‘Web Mining: Today and Tomorrow’. In Proceedings of the IEEE 3rd International Conference on Electronics Computer Technology, 2011.

[29]. Shu Bo and Kak Subhash, A neural network-based intelligent metasearch engine, Information Sciences 120 (1999) 1-11

[30]. Singh A, Agent Based Framework for Semantic Web Content Mining, International Journal of Advancements in Technology 3(April 12)

[31]. Singh A., Juneja D. and Sharma A.K., ‘Design of Ontology-Driven Agent based Focused Crawlers’. In proceedings of 3rd International Conference on Intelligent Systems & Networks (IISN-2009), Organized by Institute of Science and Technology, Klawad, 14-16 Feb 2009, pp. 178-181. Available online in ECONOMICS OF NETWORKS ABSTRACTS, Volume 2, No. 8: Jan 25, 2010.

[32]. Singh A., Juneja D., Sharma A.K., ‘General Design Structure of Ontological Databases in Semantic Web’. Published in International Journal of Engineering, Science & Technology, Vol. 2, Issue 5, pp. 1227-1232, 2010

[33]. Steinbach, M., G. Karypis, G., Kumar, V. 2000. A Comparison of Document Clustering Techniques. KDD Workshop on Text Mining

[34]. Udi Manber and Gene Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the first annual ACM-SIAM Symposium on Discrete Algorithms, pages 319-327, 1990.

[35]. Voorhees, E. M. 1986. Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Information Processing & Management, 22:465-476

[36]. Wang, Y. and Kitsuregawa, M. (2001) Link based clustering of web search results. In proceedings of the 2nd International Conference on Web-Age Information Management (WAIM2001), Xi'an, P.R. China, Springer-Verlag LNCS, July, 2001.

[37]. Willett, P. 1988. Recent Trends in Hierarchic Document Clustering: a critical review. Information Processing & Management, 24(5):577-597

[38]. Y. S. Choi, I. Yoo and Jaeho Lee, Neural Network Based Multi-agent Information Retrieval System, in Proceedings of 2001 ACM International Conference on Intelligent User Interfaces.

[39]. Yuvarani Meiyappan, N. Ch. S. Narayana Iyengar and A. Kannan, SRCluster: Web Clustering Engine based on Wikipedia, International Journal of Advanced Science and Technology Vol. 39, February 2012, pp. 1-18

[40]. Zamir O. and Etzioni, O., (1998) Web document clustering: a feasibility demonstration. SIGIR 98, Melbourne, Australia.

[41]. Zamir O. Clustering web documents: a phrase-based method for grouping search engine results, Doctoral dissertation, University of Washington and O. M. Oren Zamir, Oren Etzioni and R. Karp. Fast and intuitive clustering of web documents. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pages 287-290, 1999

[42]. Zhan L. & Zhijing L., ‘Web Mining based on Multi-Agents’. Published in proceedings of Fifth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA’03),

[43]. Zhang, D. and Dong, Y. S. (2001). Semantic, Hierarchical, Online Clustering of Web Search Results. In ACM 3rd Workshop on Web Information and Data.

[44]. Zhao Y and Karypis G, Criterion functions for document clustering: experiments and Analysis, Technical Report, Department of Computer Science, University of Minnesota, (2010).

AUTHOR

A.H. Bhuskat received her M.E. degree in Computer Science & Engineering from P.R.M.C.E.A.M, Badnera, Amravati, India. She is currently working as an Assistant Professor at Prof. Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.

N.M. Yawale received her M.E. degree in Computer Science & Engineering from P.R. Patil College of Engineering, Amravati, India. She is currently working as an Assistant Professor at Prof. Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.

P.P. Deshmukh received her M.E. degree in Computer Science & Engineering from P.R.M.C.E.A.M, Badnera, Amravati, India. She is currently working as an Assistant Professor at Prof. Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.

R.A. Gulhane received her M.E. degree in Computer Science & Engineering from P.R.M.C.E.A.M, Badnera, Amravati, India. She is currently working as an Assistant Professor at Prof. Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.

M.A. Deshmukh received her M.E. degree in Computer Science & Engineering from P.R.M.I.T & R, Badnera, Amravati, India. She is currently working as an Assistant Professor at Prof. Ram Meghe Institute of Technology & Research, Badnera, Maharashtra, India.