A Model for Personal Assistance in Complex Information Spaces

Javier Pereira Departamento de Informática de Gestión, Universidad de Talca, Avda. Lircay s/n, Talca, Chile. Email: [email protected]

Kurt Englmeier German Institute for Economic Research (DIW), Koenigin-Luise-Str. 5, 14195 Berlin, Germany. Email: [email protected]

Claudio Rojas Escuela de Ingeniería Comercial, Universidad de Talca, Avda. Lircay s/n, Talca, Chile.

Qualitatively enhanced access to data collections requires a more meaningful interface, one that endows users with personal agents that adapt easily to their individual retrieval preferences. Tracking the individual needs of users leads to personal digital assistance, which appears in the system's interaction mode as the user's personal software agent. The main objective of personal assistance as envisaged here is to enable the system to give users recommendations derived from best practice in information search processes. Best practice reflects knowledge about how users perform successful retrieval activities and attempts to define the best ways of searching large document collections. Best practices are developed on an individual as well as on a community basis. The model presented here was developed for the information system IRAIA, a portal for economic information from the huge data collections of Economic Research Institutes (ERI) and National Statistical Institutions (NSI). The personal agent draws on IRAIA's model of context-oriented retrieval and observes, records and analyses the users' search and navigation by applying concepts of the respective taxonomies.

The artefact of personal assistance

Personal assistance as discussed here is positioned within the framework of adaptable Web systems. These systems collect data for a user model from various sources, which include implicit observation of user interaction as well as explicit user response (Brusilovsky & Maybury, 2002). Instead of focusing on a user model, we concentrate on a model of usage. The model entails an adaptation effect that tailors retrieval interaction to different types of usage. Whereas a purely adaptive system is based solely on a user or usage model, an adaptable system requires the user to specify major parts of its behaviour. In the context of our application area we concentrate on adaptable navigation support. Building on a number of successful approaches in adaptive hypermedia and Web systems (see Billsus et al., 2002; Young et al., 1999, for instance), our design of an adaptable system is intertwined with a strong linguistic layer that significantly raises the expressiveness of the interaction mode.

Personal agency endows an information system with an easy and highly user-oriented interaction mode. This is achieved by installing a personal agent that helps the user during navigation. The agent starts by observing the user's selections of entries from the concept hierarchies, i.e. by capturing query profiles. The actual profile is then compared with the stored ones, and recommendations are derived if the agent recognises a certain similarity between profiles. A recommendation refers to further selections of concepts suitable in the actual retrieval situation. Our design of a personal agent results from combining the approaches for agents as personalised companions (André & Rist, 2002; Billsus & Pazzani, 1999), agents for automatic text analysis (Wermter, 2000; for an overview see Mladenic, 1999) and agents for developing interaction strategies (Durfee, 2001; Englmeier, 2001).

Getting orientation in complex information spaces

Information arises from data when they are suitably combined, arranged, and presented. Only a suitable combination of related time series and texts can convey the information that is contained in these separate and otherwise imperceptible components. Metadata expressing these content links are of outstanding importance when it comes to disseminating information that has to be composed from distributed and heterogeneous data. This holds for most large-scale and complex data collections in general, and for those of ERIs and NSIs in particular. Our model for the construction of context-aware information spaces is derived from related models of retrieval environments for interacting with large data collections (Agosti et al., 1992; Krause, 1996).

ASIST 2002 Contributed Paper 292

This meta-information expressing content links helps to construct a retrieval environment in which users explore data collections within a semantic coordinate system derived from taxonomies of the respective information domain. Such taxonomies exist for a variety of application areas. They are a solid basis for a controlled and structured vocabulary and are therefore most appropriate for the content semantics defining a domain-related context.

A semantic coordinate system endows users with a concise as well as comprehensive vocabulary. Hierarchically arranged and grouped along major content facets, this vocabulary acts as a stable coordinate system that is easy to comprehend and memorise¹. Users are thus much better able to locate themselves effortlessly. Successful searching and navigating now means guided travelling from information to information just by changing the semantic coordinates, i.e. by pointing to relevant concepts. This structure in turn makes it possible to pinpoint the semantic location of any kind of information. It also supports the correct identification of retrieval strategies, which extends content searching and navigating towards context-assisted retrieval.

For IRAIA we produced a powerful taxonomy that merges two of the most important structures in this field: Eurostat's NACE² nomenclature and the industry systematic of the ifo Institute for Economic Research. The unified taxonomy creates a semantic coordinate system that enables exact and automatic positioning of coherent documents even if they are of different types. It also provides users with the necessary orientation while exploring the information space. Like the passive vocabulary of a language, it helps users to identify the topics of their information problem. In IRAIA the user just points to relevant concepts displayed on the screen in order to initiate retrieval or to steer navigation. Just pointing to relevant terms is substantially easier than actively searching for suitable concepts.

Task model of the personal agent

At each decision point, which is again represented by a query profile, the user can ask the agent to show recommendable concepts. These concepts are part of a profile in an archived navigation sequence, namely the profile that follows the one corresponding to the decision point. This means there is a sequence of profiles having significant concepts in common with the actual navigation and, additionally, at least one profile that reaches beyond the decision point. Again, the prerequisite for being a candidate for recommendation is that such a profile has a navigation history that overlaps significantly with the actual sequence in the recent past of the decision point. In the figure below this profile contains concepts depicted by green circles. An archived profile is suitable for recommendation if it contains similar query profiles up to the corresponding decision point and if there is a query profile that reaches beyond this scope. At this point, it is not important whether the profiles belong to the user's personal history lists or to the common collection of best practices.

¹ The coordinate system itself can be presented simultaneously in different languages. This ensures that the domain model mentioned later features multilinguality.

² Nomenclature des Activités dans la Communauté européenne: systematic of the economic activities of the European Union.

[Figure 1 labels: actual profile; decision point (k)]

Figure 1. Schema of a recommendation situation. The profile above is one of the archived profiles. Due to concept similarities up to the decision point (k) it is suitable for recommendation. The lower profile reflects the actual navigation of the user. Similar concepts are marked by red circles. The order of the nodes within a query profile is not important.

Human-agent interaction

A profile can be regarded as similar to another one if it covers at least 80% of its concepts. Since coverage varies among the profiles of an observed track, the same threshold is applied to measure the overall similarity between two tracks. Among all tracks with similarity values above this threshold, the one with the highest value is selected for recommendation.

The work of the agent now is

1. to identify significantly similar queries,
2. to decide if there is a significantly common navigation history,
3. to present recommendable concepts,
4. to show the user the way of its decision,
5. to adapt its way of deciding by asking the user whether the recommendation was helpful, potentially useful or not useful, and
6. to accept new decision rules.

It is important to note that the archived sequences may reflect the user's individual navigation history or best practices from the user community. The tasks and decisions outlined above are the same in both cases, even if different kinds of agents are in charge of them.
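The similarity rule and the first three agent tasks can be sketched as follows. This is a minimal illustration, not IRAIA's implementation: profiles are modelled as sets of concept labels, archived tracks are assumed to be aligned with the actual navigation from their first profile, and the aggregation of per-profile coverage into a track similarity (here the mean) is an assumption, since the text only states that the same 80% threshold applies. All function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Set

Profile = FrozenSet[str]          # one query profile: a set of selected concepts
Track = Sequence[Profile]         # a navigation track: a sequence of profiles

THRESHOLD = 0.8                   # the 80% coverage threshold from the text


def coverage(candidate: Profile, reference: Profile) -> float:
    """Share of the reference profile's concepts also found in the candidate."""
    if not reference:
        return 0.0
    return len(candidate & reference) / len(reference)


def track_similarity(archived: Track, actual: Track) -> float:
    """Mean coverage over aligned profiles (the aggregation is an assumption)."""
    n = min(len(archived), len(actual))
    if n == 0:
        return 0.0
    pairs = zip(archived[-n:], actual[-n:])
    return sum(coverage(a, o) for a, o in pairs) / n


def recommend(actual: Track, archive: List[Track]) -> Optional[Set[str]]:
    """Concepts of the profile that follows the best matching archived history."""
    best_score, best_next = 0.0, None
    k = len(actual)               # the decision point is the latest profile
    for track in archive:
        if len(track) <= k:       # no profile beyond the decision point
            continue
        score = track_similarity(track[:k], actual)
        if score >= THRESHOLD and score > best_score:
            best_score, best_next = score, track[k]
    return set(best_next) if best_next is not None else None
```

`recommend` returns `None` when no archived track passes the threshold or none reaches beyond the decision point, which corresponds to the agent having nothing to suggest.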

Figure 2. The user can ask the personal agent to look for recommendations in its local archive of personal navigation data or to ask a remote agent for a suggestion derived from best practices.

During navigation, recommendable concepts are shown by green crosses:

The following dialog (see Fig. 5) enables the user to set the policy for storing and passing on actual navigation information as well as for putting different emphasis on selected concepts. The first group of parameters refers to the permissions given to the agent for storing or passing on actual navigation information. The user has three options for handling his or her query profiles: they can be stored in the personal archive of navigation history, passed to a remote agent site, or simply discarded. Passing profiles to remote agents amounts to sharing navigation information with other users. If the submitted profile reflects best practice, it is used for community-wide recommendation purposes. With the second group of parameters the user decides whether some concepts should get more or less emphasis in the decision process or should be excluded entirely. Relevant concepts are simply put into the respective window by drag and drop.
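The two groups of parameters could be captured in a small settings object. The sketch below is only an assumption about how such a policy might be represented: the class names, the enum members, the numeric emphasis factors and the treatment of the three handling options as mutually exclusive are all hypothetical and do not come from IRAIA.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class ProfileHandling(Enum):
    STORE_LOCALLY = "store"      # keep in the personal navigation archive
    PASS_TO_REMOTE = "pass"      # share with a remote agent site
    DISCARD = "discard"          # neither stored nor shared


@dataclass
class AgentPolicy:
    handling: ProfileHandling = ProfileHandling.STORE_LOCALLY
    emphasis: Dict[str, float] = field(default_factory=dict)  # concept -> factor
    excluded: Set[str] = field(default_factory=set)

    def weight(self, concept: str) -> float:
        """Weight of a concept in the decision process (1.0 means neutral)."""
        if concept in self.excluded:
            return 0.0
        return self.emphasis.get(concept, 1.0)

    def may_share(self) -> bool:
        """True only if the user allowed passing profiles to a remote agent."""
        return self.handling is ProfileHandling.PASS_TO_REMOTE
```

A concept dropped into the "more emphasis" window would get a factor above 1.0, one in the "less emphasis" window a factor below 1.0, and an excluded concept a weight of zero.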

Figure 3. Nodes are marked according to the concepts of the recommendation profile.

Here, the user can also inform the agent on the usefulness of its recommendations. Whenever a user right-clicks a recommended term a small popup-menu appears where the user can tell the agent about the usefulness of its suggestions. This influences the agent's decision model.

Figure 5. The user can adjust the agent's decision process if she wants it to put more/less emphasis on selected concepts and to disregard some others.

Fig. 4. By right-clicking a recommended node the user can tell the system about the usefulness of its suggestions.

The user can also modify some parameters of the agent's decision model.

The model itself may be less interesting to the user than control over which navigation activities are captured locally and which are passed to a remote analysis instance. In each navigation window the user can define these agent-related parameters.


A decision model for the personal agent

Architecture of the agent system

The multi-agent environment in IRAIA consists of agents that interact in order to produce suggestions for users navigating information spaces that are represented by theme-specific taxonomies. In the case of the portal application IRAIA, these taxonomies deal with economic information. The content-related interaction among the agents uses the vocabulary defined by the terms of the concept hierarchies that are derived from the taxonomies. The background of the agents' interaction can be characterised by the following processes (see Fig. 1):

The IRAIA project has established a retrieval procedure in which a user constructs a query exclusively in the form of selected nodes of domain-dependent concept hierarchies (Englmeier et al., 2001). The results obtained by the user are documents (texts or time series). The annotation process determines suitable index terms from the hierarchies in order to position each document adequately within the semantic space spanned by the same hierarchies, which serve as the coordinate system of this information space. In a retrieval session, when a user sends a query, her Personal Agent (PA) intercepts it in order to maintain a stock of query profiles (much like a history list), i.e. sequences of concepts representing her navigation³ through an information space. After asking the user for permission, the PA sends the actual query profile to a remote Best Practices Agent (BPA), which searches its own stock for frequently visited navigation tracks close to this query profile⁴. The BPA sends a set of similar navigation tracks as recommendations to the PA. However, because the PA knows its user best, some BPA recommendations may be inadequate or uninteresting for the user. The PA must therefore apply a decision model in order to provide helpful suggestions or, where appropriate, to show its user which navigation tracks are not interesting. Both PA and BPA do the same: they collect query profiles and determine which ones appear quite frequently. Those seem to have a certain importance for the users. The PA does this on an individual basis, whereas the BPA evaluates the profiles for the whole community of portal users.
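Since both agents maintain the same kind of stock, one structure could serve the PA and the BPA alike. The following sketch assumes that a profile is identified by its set of concepts and that "quite frequently" means reaching a configurable minimum count; the class name `ProfileStock` and the parameter `min_count` are illustrative and not taken from IRAIA.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

Profile = FrozenSet[str]


class ProfileStock:
    """Stock of observed query profiles with occurrence counts."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, concepts: Iterable[str]) -> None:
        """Store one observed query profile (concept order is irrelevant)."""
        self._counts[frozenset(concepts)] += 1

    def frequent(self, min_count: int = 2) -> List[Tuple[Profile, int]]:
        """Profiles seen at least min_count times, most frequent first."""
        return [(p, n) for p, n in self._counts.most_common() if n >= min_count]
```

A PA would hold one such stock per user, while a BPA would feed a single stock with the profiles that users agreed to share.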

The evaluation criteria of recommendations

Let us define $O = \{o_1, o_2, \ldots, o_n\}$ as the user's (observed) navigation track and $T_i = \{t_{i1}, t_{i2}, \ldots, t_{im_i}, \ldots, t_{in_i}\}$ as a recommendation from the BPA or from the user's private history list (managed by the personal agent). Remember that track elements correspond to profiles containing nodes of the concept hierarchies. At each navigation step (query) the user selects a number of concepts that constitute a profile within her query.

The recommendation profile plays an important role when, after a number of navigation steps, the user needs support in deciding which concepts to choose next. At this decision point $k$ the user asks for recommendations that should be based on representative and helpful navigation examples from her own or the community's past (i.e. on best practices)⁵.

³ Navigation in this context means recursively defining a query, reading a certain document and redefining the query. The concepts of the redefined query are based on the annotated concepts of the regarded document.

⁴ In this case a profile reflecting best practice is tantamount to a profile observed very often among all users.

⁵ The minimum number of similar profiles up to the decision point is three. This value, representing a sufficiently common navigation history among the considered parts of both tracks, should be regarded as an initial one that has to be confirmed in follow-up experiments.

⁶ All the values mentioned here are just initial ones. They all have to be confirmed by later tests.

We assume that a (community or individual) best practice $T_i$ is adequate for a user's retrieval if its first part (up to the point $m_i$ that corresponds to the decision point $k$) contains as many profiles of $O$ as possible and, at the same time, only a few profiles that are not contained in $O$. Additionally, if there is a profile in the best practice beyond $t_{im_i}$, we may consider it as interesting to the user. Thus a recommendation for further navigation comprises all the nodes of the first profile of $T_i$ beyond the point $t_{im_i}$. In general, it can be assumed that the closer the profile comes to the decision point, the more useful it is for recommendation.

We assume that an information problem triggering and steering the user's navigation can be divided into a small number of sub-problems that recur frequently across a number of more complex problems. Even if two users have the same information problem, they satisfy their need in different ways, resulting in different sequences of profiles. Certain clusters among these sequences, however, are very similar. We therefore apply our analysis to these "atomic" problems. At the same time it can be assumed that the user needs decision support that addresses the most recent navigation history, which usually coincides more with such an "atomic" problem than with a larger sequence of past navigation steps.

Let $O_c = \{o_{k-L}, \ldots, o_{k-1}, o_k\} \subseteq O$ be a sequence of profiles in the actual navigation and $T_{ic} = \{t_{i,m_i-L}, \ldots, t_{i,m_i-1}, t_{im_i}\} \subseteq T_i$ the part to be compared from an archived navigation (of best practices or personal history). Then the agent resorts to a certain archived track if it complies with the following conditions⁶.

1. Sufficient similarity between pairs of profiles.

Let us assume that $I_{O_c}$ and $I_{T_{ic}}$ are the indexes of concepts in $O_c$ and $T_{ic}$ respectively, $w_i$ the weight of a concept $i$, $C$ the set of indexes of $O_c$ also contained in $T_{ic}$, and $D$ the set of indexes of $O_c$ not contained in $T_{ic}$. Then, we define the rate of common concepts as

$$c = \frac{\sum_{i \in C} w_i}{\sum_{i \in I_{O_c}} w_i}.$$


In the same way we define the rate of distinct concepts as

$$d = \frac{\sum_{i \in D} w_i}{\sum_{i \in I_{O_c}} w_i}.$$

Thus, we could define the following similarity function

$$\varphi = c\,(1 - d).$$

If $O_c$ and $T_{ic}$ contain exactly the same concepts, $\varphi = 1$ holds, whereas $\varphi = 0$ holds if they are completely different. If, according to our model, we put $|O_c| = |T_{ic}|$ and $w_i = 1/|I_{O_c}|$ for all $i \in I_{O_c}$, and additionally half of the concepts are common, then we have $\varphi = 1/4$. In other words, the existence of distinct concepts imposes a penalty on the aggregated similarity function. The more important the distinct concepts are, the bigger this penalty.

2. Sufficiently similar history among both sequences of profiles $O_c$ and $T_{ic}$.

$$\forall o_u \in O_c,\ \exists\, t_{iv} \in T_{ic} \mid o_u \approx t_{iv}, \quad \text{with } k - L \le u \le k \text{ and } m_i - L \le v \le m_i.$$

That is, for every profile of $O_c$ at least one profile in $T_{ic}$ is similar to it. For the time being we put $L \ge 2$.

The ordering of concepts within a profile $o_u$ or $t_{iv}$, as well as of profiles within $O_c$ or $T_{ic}$, is irrelevant.

3. Availability of recommendation profiles: $o_k \approx t_{im_i}$ and $\exists\, t_{i,m_i+1}$⁷.

⁷ This means that the decision point may lie anywhere in $O$ as long as there is a corresponding point $m_i$ in $T_i$ that has a profile in $m_i + 1$.

Furthermore, let $l_i$ be the shortest path from $o_k \in O_c$ to $t_{ij} \in T_{ic}$ ($t_{ij} \notin O_c$). Then the total distance to non-common concepts is expressed by $\delta = \min\{l_i\}$. Thus we have three criteria used to evaluate the recommendation quality:

$$g_1(T_{ic}) = \varphi, \quad g_2(T_{ic}) = d, \quad g_3(T_{ic}) = \delta.$$

Note that the similarity and dissimilarity functions make it possible to establish which practices are recommendable:

- $g_1(T_{ic}) \ge \xi_1$ and $g_2(T_{ic}) < \xi_2$: we have a recommendable practice.
- $g_1(T_{ic}) \ge \xi_1$ and $g_2(T_{ic}) \ge \xi_2$: $T_i$ is not a recommendable practice.
- $g_1(T_{ic}) < \xi_1$ and $g_2(T_{ic}) < \xi_2$: $T_i$ is not a recommendable practice.
- $g_1(T_{ic}) < \xi_1$ and $g_2(T_{ic}) \ge \xi_2$: $T_i$ is not a recommendable practice.

Here, $\xi_1 \in [0.5, 1)$ and $\xi_2 \in (0, 0.5]$ are thresholds representing the expected similarity and dissimilarity bounds, respectively, for the recommended practices (or archived sequences).

Elaborating suggestions to user

Evaluating and selecting recommendations for the user means assigning the best practices selected by the BPA to one of three classes: interesting ($C_3$), potentially interesting ($C_2$) or non-interesting ($C_1$) suggestions. A category is delimited by one (or two) bounder profiles, i.e. vectors in which each component indicates the evaluation on a criterion for a fictitious track (see Fig. 6). Let $b_h$ ($h = 1, 2$) be a category bounder profile; it is then perfectly defined, in terms of criteria, by the vector

$$(g_1(b_h),\ g_2(b_h),\ g_3(b_h)).$$

Fig. 6. Criteria, categories and bounders of category profiles.


Thus, let $s \in \mathcal{S}$ be a suggested best practice and $S$ a binary preference relation, called an outranking relation, such that the expression $s\,S\,b_h$ means that $s$ is "at least as good as" $b_h$, i.e. $(g_1(s), g_2(s), g_3(s))$ is globally evaluated, in terms of preferences, as similar to or to the right of $(g_1(b_h), g_2(b_h), g_3(b_h))$. For instance, we have $s\,S\,b_1$, but not($s\,S\,b_2$). At this point, we propose to use the pessimistic ELECTRE TRI method (Mousseau et al., 2001) to assign best practices. This method is based on a pairwise comparison between suggested best practices and the category bounder profiles. Therefore, in the pessimistic assignment procedure $s \in \mathcal{S}$ is assigned to $C_{h+1}$ if $b_h$ ($h = 2, 1$) is the first profile for which $s\,S\,b_h$ holds. Actually, in ELECTRE TRI, one must set an index $\sigma(s, b_h)$ reflecting the credibility of the statement $s\,S\,b_h$. A cutting level $\lambda$ is also set to fix the minimal credibility that permits the statement to be validated. The following relations describe the global comparison between a suggested best practice and a bounder profile:

- $\sigma(s, b_h) \ge \lambda$ and $\sigma(b_h, s) \ge \lambda$ $\Rightarrow$ $s\,S\,b_h$ and $b_h\,S\,s$; $s$ is assigned to $C_{h+1}$.
- $\sigma(s, b_h) \ge \lambda$ and $\sigma(b_h, s) < \lambda$ $\Rightarrow$ $s\,S\,b_h$ and not($b_h\,S\,s$); $s$ is assigned to $C_{h+1}$.
- $\sigma(s, b_h) < \lambda$ and $\sigma(b_h, s) \ge \lambda$ $\Rightarrow$ not($s\,S\,b_h$) and $b_h\,S\,s$; $s$ is not assigned to $C_{h+1}$.
- $\sigma(s, b_h) < \lambda$ and $\sigma(b_h, s) < \lambda$ $\Rightarrow$ not($s\,S\,b_h$) and not($b_h\,S\,s$); $s$ is not assigned to $C_{h+1}$.
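The criteria and the assignment rule can be put together in a few lines. The sketch below is illustrative only and rests on several assumptions that the text does not settle: the rates of common and distinct concepts are normalised by the total weight of the concepts in $O_c$, the similarity function is taken as $\varphi = c(1 - d)$, concepts without an explicit weight count as 1.0, and the credibility index $\sigma$ is passed in as a ready-made function rather than computed as in ELECTRE TRI. All names (`rates`, `similarity`, `is_recommendable`, `pessimistic_assign`) are hypothetical.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

Criteria = Tuple[float, float, float]   # (g1, g2, g3) of a track or bounder


def rates(observed: Set[str], archived: Set[str],
          w: Dict[str, float]) -> Tuple[float, float]:
    """Weighted rates of common (c) and distinct (d) concepts of `observed`."""
    total = sum(w.get(i, 1.0) for i in observed)   # missing weight -> 1.0
    if total == 0:
        return 0.0, 0.0
    common = sum(w.get(i, 1.0) for i in observed & archived)
    distinct = sum(w.get(i, 1.0) for i in observed - archived)
    return common / total, distinct / total


def similarity(c: float, d: float) -> float:
    """Assumed similarity function: distinct concepts act as a penalty."""
    return c * (1.0 - d)


def is_recommendable(g1: float, g2: float, xi1: float = 0.5,
                     xi2: float = 0.5) -> bool:
    """Recommendable only if g1 >= xi1 (similar) and g2 < xi2 (not dissimilar)."""
    return g1 >= xi1 and g2 < xi2


def pessimistic_assign(s: Criteria, bounders: Sequence[Criteria],
                       sigma: Callable[[Criteria, Criteria], float],
                       lam: float) -> int:
    """Category index h+1 of the first b_h, taken from the top, with s S b_h."""
    for h in range(len(bounders), 0, -1):        # h = 2, 1 for two bounders
        if sigma(s, bounders[h - 1]) >= lam:     # statement "s S b_h" validated
            return h + 1
    return 1                                     # s outranks no bounder
```

With uniform weights and half of the concepts in common, `rates` yields c = d = 0.5 and `similarity` returns 0.25, which matches the value stated for this case.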

Acknowledgments

Research outlined in this paper is part of the projects eStage and IRAIA, which were supported by the European Commission under the Fifth Framework Programme (IST-2000-28314 and IST-1999-10602). However, the views expressed herein are ours and do not necessarily correspond to those of the eStage or the IRAIA consortium.

References

Agosti, M., Gradenigo, G., Marchetti, P. (1992). A hypertext environment for interacting with large databases. Information Processing and Management, 28, 371-387.

André, E., Rist, T. (2002). From adaptive hypertext to personalized web companions. Communications of the ACM, 45/5, 43-46.

Billsus, D., Brunk, C., Evans, C., Gladish, B., Pazzani, M. (2002). Adaptive interfaces for ubiquitous web access. Communications of the ACM, 45/5, 34-38.

Billsus, D., Pazzani, M. (1999). A personal news agent that talks, learns and explains. In: Etzioni, O. et al. (eds), Proceedings of the third annual conference on autonomous agents. Seattle (USA). 268-275.

Brusilovsky, P., Maybury, M. (2002). From adaptive hypermedia to the adaptive web. Communications of the ACM, 45/5, 31-33.

Durfee, E. (2001). Scaling up agent coordination strategies. Computer, 34/7, 39-45.

Englmeier, K., Mothe, J., Pauer, B. (2001). Users bootstrap searching the Web through interactive agents supporting best practice sharing. In: M.J. Smith, G. Salvendy, D. Harris, R.J. Koubek. Usability Evaluation and Interface Design: Cognitive Engineering, Intelligent Agents and Virtual Reality. Mahwah, USA. 923-927.

Krause, J. (1996). Informationserschließung und -bereitstellung zwischen Deregulation, Kommerzialisierung und weltweiter Vernetzung ("Schalenmodell"). IZ-Arbeitsbericht Nr. 6. 19.

Mladenic, D. (1999). Text-learning and related intelligent agents: a survey. IEEE Intelligent Systems, 14/4, 44-54.

Mousseau, V., Figueira, J., Naux, J. (2001). Using assignment examples to infer weights for ELECTRE TRI method: some experimental results. European Journal of Operational Research, 130, 263-275.

Young, Y. et al. (1999). Learning Approaches for Detecting and Tracking News Events. IEEE Intelligent Systems, 14/4, 32-43.

Wermter, S. (2000). Neural network agents for learning semantic text classification. Information Retrieval, 3/2, 87-103.
