
An Architecture for Personal Semantic Web Information Retrieval System – Integrating Web services and Web contents

Haibo Yu
Graduate School of Information Science and Electrical Engineering
Kyushu University
6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
[email protected]

Tsunenori Mine and Makoto Amamiya
Faculty of Information Science and Electrical Engineering
Kyushu University
6-1 Kasuga-Koen, Kasuga, Fukuoka 816-8580, Japan
{mine, amamiya}@al.is.kyushu-u.ac.jp

Abstract

The semantic Web and Web services technologies have provided both new possibilities and challenges to automatic information processing. There is much research on applying these new technologies to current personal Web information retrieval systems, but none of it addresses the semantic issues from the viewpoint of the whole life cycle and the architecture. Web services provide a new way of accessing Web resources, but until now they have been managed separately from conventional Web content resources. In this paper, we point out new system requirements and propose a conceptual architecture for a personal semantic Web information retrieval system. It incorporates semantic Web, Web services and multi-agent technologies to enable not only precise location of Web resources but also the automatic or semi-automatic integration of hybrid Web contents and Web service resources.

1. Introduction

1.1 Motivation

With ever-increasing information overload, Web information retrieval systems are facing new challenges in helping people not only to locate relevant information precisely but also to access and aggregate a variety of information from different resources automatically.

The semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [5]. It provides new possibilities for automatic Web information processing.

Currently, many studies such as [25] [13] [27] try to apply semantic Web technologies to Web information retrieval systems, but they all address only problems concerning certain phases or certain aspects of the total complex of issues involved. No research addresses the semantic issues from the viewpoint of the whole life cycle of information retrieval and the architecture.

However, for the reasons given below, we argue that it is important to clarify the requirements of a Web information retrieval system architecture in order to apply semantic Web technology to it.

First, we need to ensure that the semantics are not lost during the whole life cycle of information retrieval, including publishing, querying, accessing, processing, storing and reusing. For example, current semantic Web portals such as SEAL [27] manage their semantic data internally for navigation and semantic searching, but when they publish data to the user, they transform the semantic data into HTML in order to present human-understandable information that can be accessed through a browser. At this point the semantics are lost, and the user cannot use them for further semantic processing. So the interfaces involved in the whole life cycle of information retrieval tasks need to be reconsidered.

Second, efficient searching for high-quality results is based on pertinent matching between well-defined resources and user queries, where the matching reflects user preferences. In current Web usage, when users search for specific information with search engines, the quality of the search results improves significantly if they are familiar with the indexing mechanism and use advanced functionalities to select and combine keywords well. In the same way, in a semantic Web information retrieval system, we also need to help users to submit pertinent queries and to incorporate their preferences efficiently, based on the mechanisms through which a provider categorizes and publishes its semantic data and Web services. So the description of Web site capability and the way of submitting queries that incorporate user preferences should be considered consistently from an architectural point of view.

Proceedings of the IEEE International Conference on Web Services (ICWS’05) 0-7695-2409-5/05 $20.00 IEEE

Web service mechanisms provide a good solution for application interoperability between heterogeneous environments. Though they have mainly been used for business processes until now, WSRP [15] has been approved as an OASIS [4] standard for integrating remote portlets, and we can predict that Web services will soon be used by Web portals for information gathering, display and delivery [14]. Web services will provide a new way of accessing Web information and play a vital role in Web information retrieval activities. However, conventional “Web contents” resources target human consumption, whereas the new “Web services” resources target machine consumption. Thus they have been managed separately for publishing, discovering, accessing, and processing until now. On the other hand, in the semantic Web, contents are given well-defined meaning, and they are becoming data that can be understood and processed by machines as well. As both Web contents and Web services will be consumed by machines, it becomes both possible and necessary to manage them together in a personal Web information retrieval system.

In this paper, we propose a conceptual architecture for a personal semantic Web information retrieval system. It incorporates semantic Web, Web services and multi-agent technologies to enable not only precise location of Web resources but also the automatic or semi-automatic integration of hybrid semantic information from Web content and Web service resources.

1.2 Approach

The conceptual architecture of our semantic Web information retrieval system is based on the following three main ideas.

First, “all participants contribute to the semantic description consistently.” A Web information retrieval system involves three main kinds of participants: the “consumer,” which searches for Web resources, the “provider,” which holds certain resources, and the “mediator,” which enables communication between the consumer and the provider. In order to guarantee semantic interoperability during the whole life cycle of information retrieval, all participants need to contribute consistently to the semantic description. Providers need to describe their capabilities precisely, and users need to describe their requirements pertinently as well. The mediator needs to interpret the semantic dimension correctly and to ensure that semantics are not lost during processing.

Second, “integrating Web contents with Web services.” As we mentioned earlier, Web services will provide a new way of retrieving Web information. In fact, Web users do not care how the system discovers, accesses and retrieves information, or from what kind of resources; they care only about final results that can be used directly and efficiently. So the particular characteristics and the concrete realization details of both Web services and Web contents need to be hidden from users as much as possible. Therefore an integrated or unified management of Web contents and Web services needs to be carried out at different levels, including the description of capabilities and requirements, querying, discovery, selection and aggregation.

Third, “providing a gateway to all the information that the user is interested in.” Since the user needs to access and process a variety of internal and external information, a gateway to all relevant information is necessary. Although Web portals try to provide such gateways, they are centralized resources that use a fixed organizational schema and target uniform access by large numbers of people [25]. However, “no one size can fit all,” and even a portal with a wealth of resources cannot satisfy all the requirements of a user. As a user is only interested in certain parts of the resources provided by a portal, personalization functionality and the integration of different Web portals are strongly required.

Currently, several Web portals such as “My Yahoo” [3] and “My AOL” [2] provide personalization, letting users aggregate desired channels, such as news, weather, or sports, and view personalized contents. However, their customization functions are limited because they lack semantics and are separated from the user’s local information. Relevant Web information needs to be stored, modified, searched, and even published in the same way as existing local information provided by the user, and the integration of Web information with local user information is also necessary. We argue that a personalized “Myportal” [28] can satisfy all the Web usage requirements of a user. “Myportal” differs from current personalized portals in that it is located on the user’s own desktop or local server, is owned by the user her/himself, manages all information based on semantic Web technologies, enables integration of existing local user information with Web information, and provides full personalization and flexible customization functions for all user-relevant information.

The rest of the paper is organized as follows: Section 2 outlines our conceptual architecture, and the components and communication interfaces of a personal semantic Web information retrieval system. Section 3 describes the integration of Web services and Web contents. Section 4 explains the process flow of the information retrieval system. Related work is discussed in section 5, and concluding remarks are given in section 6.


Figure 1. A Conceptual Architecture (diagram: “MyPortal,” containing the UIA, CSA, User Preference, Inference Engine, KB, SE and Database, connected to Web Site 1 through Web Site n, each with its own PSA and WSCD (GID, WCD, WSD))

2 A Conceptual Architecture

Our conceptual architecture for a personal semantic Web information retrieval system is illustrated in figure 1.

Because a P2P architecture provides a robust system that adapts to open and dynamic Web environments, we choose a P2P network architecture to connect consumers and providers.

Each provider describes its capabilities in what we call a WSCD (Web site capability description) and is assigned a PSA (provider search agent). Each consumer describes the user’s requirements, including preferences. It is assigned a consumer search agent (CSA) and also has a user interface agent (UIA) that provides an intelligent unified interface to the user. The CSA and PSA function as mediators between a consumer and a provider by communicating with each other to fulfill the searching and accessing task. The consumer is constructed as a “Myportal” providing a gateway to all relevant information.

2.1 Web site capability description (WSCD)

Resource location is based on matching between user requirements and Web site capabilities, so a capability description of Web sites is necessary. We describe the layered capabilities of a Web site as shown in figure 2.

First, we semantically describe the general capabilities of the Web site, and we call this a “general information description (GID).” We argue that some explicit general information about a Web site is strongly required in order to locate Web resources precisely based on user preferences. Therefore a brief general information description of the Web site is defined at the top level. The GID gives an explicit overview of the Web site’s capabilities, and can be used as the initial filter for judging congruence with user preferences. The GID includes descriptions of “Category,” “Topic,” “Type,” “Language,” “Scale,” “Audience,” “HomePageLink,” “Location,” “ServiceLink,” “Security,” and “Functionalities”.

Figure 2. Web Site Capability Description (layers: the WSCD (Web Site Capability Description) contains the GID (General Information Description), the WCD (Web Content Description) and the WSD (Web Service Description); the WSD contains the SWSD (Semantic Web Service Description) and the CWSD (Concrete Web Service Description))

Second, we give the Web content capability description (WCD) and Web service capability description (WSD). There are links from the GID to the WCD and WSD to facilitate the further matching and use of Web contents and Web services. In order to describe the capabilities semantically and to support the concrete realization of services, we express the service capability description in two layers: the “semantic Web service description (SWSD)” and the “concrete Web service description (CWSD).” This hierarchical capability-describing mechanism enables semantic capability description and matchmaking at different levels. Currently, there are only a few draft standards available for describing semantic Web services, such as OWL-S [9] and WSMO [11], and none of them has been adopted by any standards body at the present time. As OWL-S is the first well-researched Web service ontology, and currently has numerous users in industry and academia, we use OWL-S for the semantic Web service description and WSDL [12] for the concrete Web service description.

The Web content description (WCD) is the metadata of Web contents. It is composed of knowledge bases of all domains involved. The domain ontologies are described in OWL [20] and the metadata is described in RDF [16].

The WSCD is placed in the root directory of the Web site as an RDF file, and the WCD and WSD can be reached through links from it.

For the details of our Web site capability description mechanism, see [29].
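The layering of the WSCD can be pictured as a small data model. The following is only an illustrative sketch in Python: the paper stores the WSCD as an RDF file, and the class names, field names and example URLs here are assumptions introduced for illustration, not part of the described mechanism.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GID:
    """General information description (top layer of the WSCD)."""
    category: str
    topic: str
    language: str
    home_page_link: str
    # Other GID items named in the text (Type, Scale, Audience, Location,
    # ServiceLink, Security, Functionalities) go into this free-form map.
    extra: dict = field(default_factory=dict)


@dataclass
class WSD:
    """Two-layer Web service description."""
    swsd_link: str  # semantic layer, e.g. an OWL-S document
    cwsd_link: str  # concrete layer, e.g. a WSDL document


@dataclass
class WSCD:
    """Web site capability description: the GID plus links to WCD and WSD."""
    gid: GID
    wcd_link: Optional[str] = None  # RDF metadata of the Web contents
    wsds: List[WSD] = field(default_factory=list)

    def offers_services(self) -> bool:
        # A provider may publish Web contents only, or contents and services.
        return bool(self.wsds)


# Hypothetical provider that publishes Web contents only.
site = WSCD(
    gid=GID("Travel", "Hotels", "en", "http://example.org/"),
    wcd_link="http://example.org/wcd.rdf",
)
```

Keeping the GID at the top mirrors its role as the initial filter: a consumer can inspect it before following the links to the WCD or WSD.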

2.2 “Myportal”

“Myportal” is a “one stop” that links the user to all the information s/he needs. It resides on the user’s own desktop or local server, and is designed to satisfy the user’s personal information requirements and to be controlled freely by the user her/himself. The information can be shared with others who have proper authority. The structure of “Myportal” is shown in figure 3.

Figure 3. Structure of “Myportal” (diagram: the core holds the Knowledge Warehouse (KW) with the WSCD, user preferences and domain knowledge, together with knowledge management, Web services management, the query engine and the inference engine; on the consumer side, information is collected and aggregated manually (browsing and searching) or automatically / semi-automatically (Web services, crawler) through the UIA and CSA; on the provider side, information is accessed by humans (the user, community members, public users) and by machines (Web services, crawlers and other applications))

“Myportal” is composed of three main functional components: core component, consumer component and provider component.

The core component provides basic support for semantic technologies and information management. It consists of the “Knowledge Warehouse (KW),” “Knowledge Management,” the “Query Engine (QE)” and the “Inference Engine (IE).” As a consumer, “Myportal” brings together a variety of necessary information from different resources automatically or semi-automatically for the user. It is assigned a CSA to fulfill the information retrieval tasks through communication with provider agents. As a provider, the contents and services of “Myportal” can be consumed by humans as well as machines. The human can be the user or other permitted persons, and the machine can be local or remote. Interfaces for browsing, searching and using Web contents and services need to be provided. “Myportal” is described in more detail in [28].

2.3 Mediator

In our architecture, we use a multi-agent system called KODAMA [30], developed at Kyushu University, as our mediators. KODAMA is a high-quality, large-scale multi-agent system that can operate in open environments. It is a global distributed computing architecture based on agent-oriented programming and has been demonstrated to be suitable for network-aware applications.

The agents in our system consist of the UIA, CSAs and PSAs.

The UIA receives requirements from the user, fills in missing or inherent information based on user preferences, breaks down and transforms the requirements into formal queries, and sends them to the CSA. The CSA receives formal queries from the UIA, communicates with relevant agents, selects and invokes Web services, integrates the information and sends the results back to the UIA.

The PSA receives queries from a CSA and returns matching results to the CSA based on different preferences and requirements.

2.4 Communication interfaces

In order to fulfill the information retrieval task, the interfaces between providers and consumers, including the query language and the protocol for communicating queries, need to be defined. As semantic Web information uses RDF to represent data, a standard interface for querying and accessing RDF data is ideal for interoperability between heterogeneous environments. Many query languages for RDF data have been created, but they lack both a common syntax and a common semantics. The W3C RDF Data Access Working Group (DAWG) has published working drafts of the RDF query language SPARQL [23] and the SPARQL protocol [8], which are expected to become standards in this field. The SPARQL query language expresses queries over RDF graphs, and the SPARQL protocol for RDF defines a protocol for communicating those queries to an RDF data service. By combining the two, applications can access and combine semantic Web information across the Web. Although our architecture is designed to work with any reasonable communication interface, we are currently planning to use the SPARQL query language and the SPARQL protocol as our communication interfaces between providers and consumers.
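To make the division of labor between query language and protocol concrete, the sketch below builds a SPARQL SELECT query and encodes it as an HTTP GET request URL, using the `query` parameter defined by the SPARQL protocol. It is an illustration only: the `ex:` vocabulary, its property names and the endpoint URL are hypothetical and are not defined in the paper.

```python
from urllib.parse import urlencode

# Hypothetical namespace for WSCD terms; the paper does not define one.
EX = "http://example.org/wscd#"


def build_query(category: str) -> str:
    """Return a SPARQL SELECT query for sites in the given GID category.

    The category is inserted verbatim, so this sketch assumes it contains
    no quotation marks.
    """
    return (
        f"PREFIX ex: <{EX}>\n"
        "SELECT ?site ?home\n"
        "WHERE {\n"
        f'  ?site ex:category "{category}" .\n'
        "  ?site ex:homePageLink ?home .\n"
        "}"
    )


def build_request_url(endpoint: str, sparql: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": sparql})


url = build_request_url("http://example.org/sparql", build_query("Travel"))
```

A CSA could send such a request to each PSA's RDF data service, while the query text itself stays independent of the transport.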

2.5 The description of user requirements

The user requirements are reflected by his/her preferences, profile and constraints along with a query. We provide a user interface which enables the input of all this information. Input templates, default settings, and recommendation lists are also provided. Missing or inherent information is inferred from the user profile and preferences, and the requirements are broken down and transformed into formal queries. A formal query is composed of three types of element fields: user preferences (UPs), content query (CQ) and Web service query (SQ). The responses combine Web content and Web service information. Even if the user does not explicitly describe requirements on Web services for each query, a search for Web services potentially relevant to him/her is carried out automatically unless s/he explicitly refuses such searching.
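The formal query with its three element fields could be represented as follows. This is a minimal sketch under stated assumptions: the class and function names are invented for illustration, preferences are modeled as a plain dictionary, and reusing the content query text as the default service query is our assumption about how the automatic service search might be triggered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormalQuery:
    user_preferences: dict        # UPs
    content_query: str            # CQ
    service_query: Optional[str]  # SQ; None only if the user refuses it


def to_formal_query(text, stated_prefs, profile,
                    refuse_services=False, service_query=None):
    """Fill in missing preferences from the profile and build the query."""
    # Explicitly stated preferences override profile defaults.
    prefs = {**profile, **stated_prefs}
    if refuse_services:
        sq = None
    else:
        # Assumption: with no explicit SQ, search services with the CQ text.
        sq = service_query if service_query is not None else text
    return FormalQuery(prefs, text, sq)


query = to_formal_query(
    "hotels in Kyoto",
    stated_prefs={"language": "ja"},
    profile={"language": "en", "category": "Travel"},
)
```

Here the profile supplies the missing category while the stated language wins over the profile default, which is the role the text assigns to the UIA.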

2.6 Ontology considerations

The description of Web site capabilities and the management of data in “Myportal” must be based on formally defined vocabularies in order to make them machine-understandable and processable. An ontology is used to formally define terms and the relationships between them. Currently, the style of ontology for the future semantic Web is still under discussion. Possible styles are one huge common ontology, or numerous small ontologies that are mapped to each other by mediators. Our analysis showed that a wide and shallow ontology is necessary for categorization, and that narrow and deep ontologies are also needed for the user’s specific interests such as a research topic, business or hobby. Though it is not yet a reality, at the current stage we assume that the user and the providers use the same ontologies for the domains they are involved in.

The Web site capability ontology should include the following component ontologies.

1) The general information description ontology: This ontology component formally defines the terms used for the Web site general information description, such as “type” and “location,” the relationships between them, and the restrictions on them.

2) The domain specific ontology: A domain specific ontology should be constructed in order to realize interoperability between all the applications and users of that domain. A system can define its own ontology or reuse existing ones for the domains in which it is involved.

3) The Web service ontology: This ontology component defines all the terms, relationships, and restrictions concerning Web services. Here we use the OWL-S [9] Web service ontology.

3 Integration of Web services and Web contents

Conventional Web contents target human consumption and are published in standard languages such as HTML, which can be accessed through a client browser. The standard HTTP protocol is used for communication between a Web server and a Web client. Web services, on the other hand, target machine consumption. They are applications that can be realized on heterogeneous systems, published in a standard language such as WSDL [12] and accessed by applications through a standard protocol such as SOAP [17]. Due to their different usages by different consumers, Web contents and Web services have been managed separately until now.

However, in the semantic Web, information is marked up with metadata and can be manipulated by autonomous agents on behalf of their users. So Web contents are in the process of becoming data with well-defined meaning that can also be consumed by machines. Since they target the same consumer, Web services and Web contents have the necessary common ground to be managed together in a personal Web information retrieval system.

On the other hand, users also require the aggregation of different Web services and the integration of both Web services and Web contents in a personal Web information retrieval system. For example, many Web portals and search engines support searching functions (services), but users can only use these functions one at a time through a browser interface, and none of the search results can currently be aggregated. In particular, when we use semantic search functions of this kind, the semantic search results are transformed into HTML for human consumption and the semantic metadata described in RDF is detached. Therefore it is necessary to deliver the semantic data through Web services and to aggregate the semantic data from different Web services.

Our Web information retrieval system realizes unified management and integration of Web services and Web contents at different levels, including description, discovery, selection, and the aggregation of invocation results, as described in the following.

3.1 Descriptions of capabilities and requirements

On the provider side, as described in section 2, we manage the GID, WCD and WSD together as the WSCD.

The WSD can be reached through the GID and is described in two layers: SWSD and CWSD. With unified management, Web services and Web contents can share the same general information, such as a category and the domain ontology. The hierarchical capability-describing mechanism enables semantic capability description and matchmaking at different levels. We use WSDL [12] and OWL-S for the CWSD and SWSD respectively.

The WCD is the metadata of Web contents. It is composed of knowledge bases of all domains involved. The domain ontologies are described in OWL [20] and the metadata is described in RDF [16].

On the consumer side, we provide a template-style input interface, enabling users to input or select their preferences as well as query items from recommendation lists. The formal query is composed of three types of element fields: user preferences (UP), content query (CQ) and Web service query (SQ). The responses combine Web content and Web service information.


3.2 Discovery

Web service discovery is based on one of three models: matchmaking, broker and P2P [7]. Searching for Web contents can likewise be centralized or decentralized. Our Web information retrieval system is based on a P2P architecture, and the matching is realized by the PSAs on the provider side.

OWL-S is an ontology for Web services, but before we use the ontology of a specific Web service, we need to position the service within the broad array of services that exist in the world. OWL-S 1.1 [9] provides an example of profile-based class hierarchies [10] for categorizing Web services. We noticed that almost all Web service providers offer Web services for machine consumption and, at the same time, consistent Web information or browser-based services for human consumption. Thus the information for human and for machine consumption is generally consistent and falls into the same category. Therefore we think it is reasonable first to use the category information inside the GID to find potential Web sites that possibly contain relevant Web services, and then to do the detailed matching of existing services based on their OWL-S descriptions.

Information discovery is based on matching between user requirements and provider capabilities. We do matching at three levels. First, we match the Web site general description (GID) against user preferences to see whether they match at the overview level. Second, we match Web contents, and finally we match Web services. Each level of matching yields a matching score, and these scores are used for the final judgment of the relevance of Web contents and Web services.

There is existing research on semantic Web services, such as [18] and [19]. We make use of these research and development results for our semantic Web service matchmaking and processes.
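The three-level matching on the provider side can be sketched as follows. The paper does not specify how a matching score is computed, so the term-overlap measure, the use of plain term sets in place of RDF and OWL-S descriptions, and the summation into a total score are all assumptions made purely for illustration.

```python
def overlap(wanted: set, offered: set) -> float:
    """Fraction of wanted terms that the provider offers (assumed measure)."""
    if not wanted:
        return 1.0  # nothing requested at this level, so nothing can fail
    return len(wanted & offered) / len(wanted)


def match_provider(query: dict, wscd: dict) -> dict:
    """Match a formal query against one provider's WSCD at three levels."""
    score1 = overlap(query["UP"], wscd["GID"])  # preferences vs. GID
    score2 = overlap(query["CQ"], wscd["WCD"])  # content query vs. WCD
    score3 = overlap(query["SQ"], wscd["WSD"])  # service query vs. WSD
    return {
        "score1": score1,
        "score2": score2,
        "score3": score3,
        "total": score1 + score2 + score3,  # assumed combination rule
    }


# Hypothetical query and provider description, reduced to term sets.
query = {"UP": {"travel", "english"}, "CQ": {"hotel", "kyoto"}, "SQ": {"booking"}}
wscd = {"GID": {"travel", "english", "portal"}, "WCD": {"hotel", "tokyo"}, "WSD": set()}
result = match_provider(query, wscd)
```

In this example the provider matches fully at the overview level, half of the content query, and offers no matching service, which shows why the per-level scores are kept alongside the total.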

3.3 Selection

As described immediately above, user requirements are matched against provider capabilities at three levels, and each level yields a matching score. The PSAs send their matching scores back to the CSA, and the CSA judges and selects the most relevant Web services and Web contents based on an overall consideration of those scores.
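One possible reading of this overall consideration is a weighted ranking of the three scores returned by each PSA. The sketch below is an assumption, not the selection rule of the paper: the weights, the `top_k` cut-off and the reply format are all hypothetical.

```python
def select(replies, weights=(0.5, 0.25, 0.25), top_k=2):
    """Rank PSA replies by a weighted sum of their three matching scores."""
    def total(reply):
        return sum(w * s for w, s in zip(weights, reply["scores"]))

    ranked = sorted(replies, key=total, reverse=True)
    return [reply["provider"] for reply in ranked[:top_k]]


# Hypothetical replies: (GID score, WCD score, WSD score) per provider.
replies = [
    {"provider": "A", "scores": (1.0, 0.5, 0.0)},
    {"provider": "B", "scores": (0.5, 1.0, 1.0)},
    {"provider": "C", "scores": (0.0, 0.5, 0.5)},
]
chosen = select(replies)
```

With these weights provider B ranks first (0.75), followed by A (0.625) and C (0.25); changing the weights is one way user preferences could influence the selection.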

3.4 Aggregation

After selecting the most relevant Web services, the CSA invokes those services. If the input information is not sufficient for triggering an invocation, the CSA requests the user to provide the necessary information through the UIA. The results of the different Web service invocations, as well as the Web content results, are aggregated by the CSA into a refined final result based on user preferences and sent to the user through the UIA. This result can be evaluated, modified and stored in the user’s “Myportal” knowledge warehouse for future reuse. The integration of different Web service invocation results and Web contents is based on their common RDF data model.
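Because both kinds of results share the RDF data model, aggregation can be pictured as merging sets of triples. The sketch below models a graph as a Python set of (subject, predicate, object) tuples instead of using an RDF library; the identifiers and values are made up for illustration.

```python
def aggregate(*graphs):
    """Merge several RDF-like graphs; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged.update(graph)
    return merged


def describe(merged, subject):
    """Collect everything known about one subject, grouped by predicate."""
    description = {}
    for s, p, o in merged:
        if s == subject:
            description.setdefault(p, set()).add(o)
    return description


# Hypothetical triples from a Web content source and a Web service result.
from_contents = [
    ("ex:hotel1", "ex:name", "Sakura Inn"),
    ("ex:hotel1", "ex:city", "Kyoto"),
]
from_service = [
    ("ex:hotel1", "ex:city", "Kyoto"),
    ("ex:hotel1", "ex:price", "9000"),
]
merged = aggregate(from_contents, from_service)
hotel = describe(merged, "ex:hotel1")
```

The duplicate city triple appears only once after merging, and the price obtained from the service is attached to the same resource as the name obtained from the content metadata.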

4 Process Flow

The overall process flow of the Web information retrieval system is illustrated in figure 4.

Figure 4. Process Flow (flowchart: the provider supplies the capability description (WSCD); the consumer supplies the profile and preferences and the “Myportal” Knowledge Warehouse (KW). Steps: the user describes the requirements; the UIA completes missing information and transforms it into a formal query; the SE searches inside the “Myportal” KW; if no relevant information is found, the CSA sends requests to the PSAs; each PSA matches the GID with the preferences (Score1), the WCD with the CQ (Score2) and the WSD with the SQ (Score3), and sends the matching result to the CSA if the total score exceeds a threshold; the list of relevant information may contain Web sites, Web sites and contents, Web services, or Web sites, contents and services; selection, invocation (with possible user intervention), invocation results and integration follow; the user evaluates, modifies and stores the result; the UIA modifies the preferences)

Although we will not repeat the tasks of each information retrieval phase described in the last section, we emphasize the following aspects.

First, the search for relevant information is carried out inside the “Myportal” knowledge warehouse first, and only when no satisfactory information can be found there does the search continue with the other providers. Because users tend to use a certain amount of information from the Web repeatedly and frequently but seldom or never use other information, it is essential to store frequently used information locally for the user, so that external access happens only when a request cannot be satisfied locally. Because the information that interests the user is limited and external access is reduced, the total retrieval time is expected to be significantly lower than for a search of the vast open Web.
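The local-first strategy can be sketched as follows. The two search functions are hypothetical stand-ins for the “Myportal” search and for the CSA/PSA round trip; the paper does not define their interfaces, and the sample data is invented.

```python
def retrieve(query, local_search, external_search):
    """Search the local "Myportal" store first; go external only on a miss.

    Both search callables are hypothetical stand-ins that take a query and
    return a list of results (empty when nothing relevant is found).
    """
    local_hits = local_search(query)
    if local_hits:
        return local_hits, "local"
    return external_search(query), "external"


# Toy knowledge warehouse and a record of external calls (illustrative only).
warehouse = {"semantic web": ["cached article"]}
external_calls = []


def local_search(q):
    return warehouse.get(q, [])


def external_search(q):
    external_calls.append(q)
    return ["remote result for " + q]


hit = retrieve("semantic web", local_search, external_search)
miss = retrieve("owl-s", local_search, external_search)
```

Here only the second query reaches the external providers, which is where the expected saving in retrieval time comes from.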

Second, the list of relevant information sent back from PSAs differs depending on the user preferences and Web site capabilities. A user may use the Web in different ways: locating only certain kinds of Web sites; locating certain kinds of Web sites and their Web contents; locating only Web services; or locating all relevant resources, including Web sites, Web contents and Web services. A provider may have only Web contents, or both Web contents and Web services. Therefore the PSA may send back different lists of relevant information resources, as shown in figure 4.
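One way to picture how the returned list varies is as the intersection of what the user asks for and what the provider offers. The sketch below assumes this intersection rule and uses invented names (`Usage`, `WANTED`, `result_kinds`); the paper describes the four usage modes and the two provider types but not a concrete algorithm.

```python
from enum import Enum


class Usage(Enum):
    SITES = "sites"
    SITES_AND_CONTENTS = "sites+contents"
    SERVICES = "services"
    ALL = "all"


# Resource kinds each usage mode asks for (assumed mapping).
WANTED = {
    Usage.SITES: {"site"},
    Usage.SITES_AND_CONTENTS: {"site", "content"},
    Usage.SERVICES: {"service"},
    Usage.ALL: {"site", "content", "service"},
}


def result_kinds(usage, provider_has_services):
    """Kinds of resources a PSA could return for one request."""
    offered = {"site", "content"}
    if provider_has_services:
        offered.add("service")
    return WANTED[usage] & offered


contents_only = result_kinds(Usage.ALL, provider_has_services=False)
full_provider = result_kinds(Usage.ALL, provider_has_services=True)
```

A services-only request sent to a contents-only provider yields an empty set under this rule, so that PSA would have nothing to report.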

Third, during invocation, if the input information in the query is insufficient, the PSA will ask the user to supply the missing information through the UIA. User intervention may therefore occur during invocation.
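The intervention step amounts to comparing the inputs a service requires with those already present in the query and asking the user for the rest. In this sketch the `ask_user` callback stands in for the UIA and the input names are invented; how the required inputs are read from the service description is not shown.

```python
def missing_inputs(required, provided):
    """Names of required service inputs that the query does not supply."""
    return [name for name in required if name not in provided]


def complete_inputs(required, provided, ask_user):
    """Fill the gaps by asking the user (via a UIA stand-in) once per input."""
    completed = dict(provided)
    for name in missing_inputs(required, provided):
        completed[name] = ask_user(name)
    return completed


# Hypothetical service inputs and a canned reply in place of a real dialog.
required = ["city", "date"]
provided = {"city": "Fukuoka"}
asked = []


def ask_user(name):
    asked.append(name)
    return "user value for " + name


inputs = complete_inputs(required, provided, ask_user)
```

When nothing is missing the callback is never called, so invocation proceeds without user intervention.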

Fourth, the integrated information can be evaluated by the user, and the evaluation results are used to refine future searches. The information can also be modified and stored in the “Myportal” knowledge warehouse for future reuse. The user preferences are automatically refined based on the searching and evaluation results.

5 Related work

In this section, we discuss related work that is directly or indirectly relevant to our research.

Pinto et al. [22] presented an architecture for an infrastructure that provides interoperability using trusted portals, and implemented such an infrastructure based on Thematic Portals. The searching portals use semantic access points based on metadata for more precise searching of the resources associated with the potential sources of information. The proposed architecture supports specific and cross-domain searching but, as far as we understand, provides semantic representation only for the capabilities of Web contents, not for their services. Our semantic Web site capability description, together with the corresponding description of user requirements and preferences, provides interoperability for both Web contents and Web services.

RSS [26] and Atom [21] are lightweight, multipurpose, extensible metadata description and syndication formats. Both are XML-based, and RSS 1.0 conforms to the RDF specification. A brief description of Web site capability can be summarized with them, and the summary can be used for online publication, retrieval and further transmission or aggregation. The FOAF vocabulary [6] provides a collection of basic terms that can be used in machine-readable Web homepages for people, groups, companies and so on. The initial focus of FOAF was on the description of people, but it is now being extended to express other kinds of things. RSS, Atom and the FOAF vocabulary all focus on describing certain kinds of Web contents, such as news, Weblogs or people; they do not include Web services as we propose. Our Web site capability description describes not only Web contents but also Web services, so the resources of the portal can not only be located but also used as a computational part of the information retrieval system. RSS, Atom and FOAF can be used for the Web contents capability description, which is a part of our Web site capability description.

There are Web portals based on Semantic Web technology, such as KA2 [1] and SEAL [27], which support a semantic portal solution including ontology-based contents construction and maintenance, but they target uniform access by large numbers of people for human navigation and searching. SEAL provides an interface for software agents, but only for a crawler. As far as we know, none of them currently supports Web services for information aggregation and publishing. Our “Myportal” is a personalized gateway to all user-relevant information; it not only aggregates Web information but also shares its information through Web services.

Haystack’s per-user information environment [25] emphasizes the relationship between a particular individual and his or her corpus. It automatically captures and modifies its data and its retrieval process based on user behavior in order to adapt the system to the user and so realize personalization. This user information system has not been constructed from the Web portal point of view and does not emphasize support for machine interoperability between users, which would enable Web service functionalities and user information sharing. The semantic Web browser [24] can search and present possible Web services for the user, but it does not aggregate the invocation results of different Web services and Web contents as we propose. We draw on their ideas of personalization in information retrieval and filtering, but construct our user information system as a fully personalized Web portal, which supports Web services and can be accessed by others to form a basic unit of a P2P information retrieval system.

OWL-S [9] is an ontology of services which provides a mechanism for semantically expressing the capability of Web services. In our approach, we use OWL-S to describe Web portal service capabilities, and add another “General Information Description” layer above it to enable the unified management of Web services and Web contents. This will help in the precise location of Web portals as well as the efficient discovery and invocation of Web services.

6 Conclusion

In this paper, we addressed the main aspects of a semantic Web information retrieval system architecture, aiming to meet the requirements of next-generation semantic Web users. We proposed a mechanism for semantically describing the capabilities of Web sites, enabling automatic discovery of Web sites and Web contents as well as Web services. Our “Myportal” aims at constructing a fully personalized local Web portal for the user, which is adapted to user preferences and satisfies all the requirements of a user’s Web usage. The user Web portal can be used as a basic unit of a P2P information retrieval system.

In the future, we will realize a prototype of a multi-agent based P2P personal Web information retrieval system and evaluate the effectiveness of our proposed architecture based on it. Currently, we assume that all the portals, users and agents in a community agree on a common ontology and use it to represent the semantics of Web portal capabilities and Web services, but such agreement is not easy to reach in practice. We need to give further consideration to these ontology-mapping issues in the future.

References

[1] KA2 Portal. http://ka2portal.aifb.uni-karlsruhe.de/.

[2] My AOL. http://my.aol.com.

[3] My Yahoo. http://my.yahoo.com/.

[4] OASIS: Organization for the Advancement of Structured Information Standards. http://www.oasis-open.org/home/index.php.

[5] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May, 2001.

[6] D. Brickley and L. Miller. FOAF Vocabulary Specification. Sept., 2004.

[7] M. Burstein and C. Bussler. A Semantic Web Services Architecture, Version 1.0, January, 2005. http://www.daml.org/services/swsa/note/swsa-note v3.html.

[8] K. G. Clark. SPARQL Protocol for RDF, January 14, 2005. http://www.w3.org/TR/rdf-sparql-protocol/.

[9] D. Martin et al. OWL-S 1.1 Release, November, 2004. http://www.daml.org/services/owl-s/1.1/.

[10] D. Martin et al. Profile-based Class Hierarchies – Explanatory remarks for ProfileHierarchy.owl, OWL-S 1.1, November, 2004. http://www.daml.org/services/owl-s/1.1/ProfileHierarchy.html.

[11] D. Roman et al. D2v1.1. Web Service Modeling Ontology (WSMO), Feb. 10, 2005. http://www.wsmo.org/TR/d2/v1.1/20050210/.

[12] E. Christensen et al. Web Services Description Language (WSDL) 1.1, March 15, 2001. http://www.w3.org/TR/wsdl.

[13] R. Guha, R. McCool, and E. Miller. Semantic Search. In Proceedings of WWW2003, pages 700–709, 2003.

[14] S. Han. Commercial Portal Products. In DERI Research Report, 2003-12-31.

[15] A. Kropp, C. Leue, and R. Thompson. Web Services for Remote Portlets Specification. August, 2003.

[16] F. Manola and E. Miller. RDF Primer, February 10, 2004. http://www.w3.org/TR/rdf-primer/.

[17] M. Gudgin et al. SOAP Version 1.2 Part 1: Messaging Framework, June 24, 2003. http://www.w3.org/TR/soap12-part1/.

[18] M. Paolucci, K. Sycara, and T. Kawamura. Delivering Semantic Web Services. In Proceedings of the Twelfth World Wide Web Conference, WWW2003, pages 111–118, May 2003.

[19] M. Paolucci, K. Sycara, T. Nishimura, and N. Srinivasan. Using DAML-S for P2P Discovery. In Proceedings of the First International Conference on Web Services, ICWS 2003, pages 203–207, June 2003.

[20] D. L. McGuinness and F. van Harmelen. OWL Web Ontology Language Overview, February 10, 2004. http://www.w3.org/TR/2004/REC-owl-features-20040210/.

[21] M. Nottingham. The Atom Syndication Format 0.3 (pre-draft), December, 2003. http://www.mnot.net/drafts/draft-nottingham-atom-format-02.html.

[22] F. Pinto, C. Baptista, and N. Ryan. Using Semantic Searching for Web Portal Interoperability. In International Workshop on Information Integration on the Web - Technologies and Applications, Rio de Janeiro, Brazil, April 9-11, 2001.

[23] E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, April 19, 2005. http://www.w3.org/TR/rdf-sparql-query/.

[24] D. Quan and D. R. Karger. How to Make a Semantic Web Browser. In Proceedings of WWW2004, pages 255–265, 2004.

[25] D. Quan, D. Huynh, and D. R. Karger. Haystack: A Platform for Authoring End User Semantic Web Applications. In Proceedings of ISWC2003, pages 738–753, 2003.

[26] RSS-DEV Working Group. RDF Site Summary (RSS) 1.0, 2000-12-06. http://web.resource.org/rss/1.0/.

[27] N. Stojanovic, A. Maedche, S. Staab, R. Studer, and Y. Sure. SEAL – a framework for developing SEmantic PortALs. In Proceedings of the International Conference on Knowledge Capture, pages 155–162, 2001.

[28] H. Yu, T. Mine, and M. Amamiya. Towards a Semantic MyPortal. In The 3rd International Semantic Web Conference (ISWC 2004) Poster Abstracts, pages 95–96, 2004.

[29] H. Yu, T. Mine, and M. Amamiya. Towards Automatic Discovery of Web Portals -Semantic Description of Web Portal Capabilities-. In Semantic Web Services and Web Process Composition: First International Workshop, SWSWPC 2004, LNCS 3387/2005, pages 124–136, 2005.

[30] G. Zhong, S. Amamiya, K. Takahashi, T. Mine, and M. Amamiya. The Design and Implementation of KODAMA System. IEICE Transactions on Information and Systems, E85-D(4):637–646, April, 2002.

Proceedings of the IEEE International Conference on Web Services (ICWS’05) 0-7695-2409-5/05 $20.00 IEEE