Information Retrieval in Virtual Universities

36 Journal of Distance Education Technologies, 4(3), 36-47, July-September 2006

Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without writtenpermission of Idea Group Inc. is prohibited.

ABSTRACT

Information retrieval in the context of virtual universities deals with the representation,organization, and access to learning objects. The representation and organization of learningobjects should provide the learner with an easy access to the learning objects. In this article, wegive an overview of the ONES system, and analyze the relevance of two information retrievalmodels for virtual universities. We argue that keywords based search (i.e., the Boolean model),though well suited for Web searches, is overly coarse for virtual universities. Instead, the vectormodel, on which our implemented search engine is also based on, seems to be more appropriateas it provides similarity measure (i.e., the learning object having the best match is presentedfirst). We also compare the performance of four algorithms for computing the similarities(matching).

Keywords: algorithms; case study; distance learning; information retrieval; Web-basededucation

Information Retrievalin Virtual UniversitiesJuha Puustjärvi, Helsinki University of Technology, Finland

Päivi Pöyry, Helsinki University of Technology, Finland

INTRODUCTIONToday people in all professions are faced

with increasing demands. Technology devel-ops in an ever-increasing speed, and the rolesof people in work, society, and industry are shift-ing constantly. Keeping up with the pace ofchange requires continuous education and learn-ing. Traditional campus-universities are tryingto answer to this need of lifelong learning bybuilding virtual universities, whilst facing com-petition from the commercial continuing edu-cation providers in the form of e-learning.

E-learning can be defined as informationtechnology enabled and supported form of dis-tance learning, in which the traditional restric-tions of classroom learning have disappeared.The main tool of e-learning is a personal com-puter, and the Internet serves as the principalcommunication and distribution channel. Thelearners can participate in online Web-basedcourses and interact with both the peers, in-structors, and the learning materials.

E-learning sets new requirements for uni-versities: they have to build global learning in-

IDEA GROUP PUBLISHING

This paper appears in the publication, International Journal Distance Education Technologies, Volume 4, Issue 3edited by Shi-Kuo Chang and Timothy Shih © 2006, Idea Group Inc.

701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USATel: 717/533-8845; Fax 717/533-8661; URL-http://www.idea-group.com

ITJ3292

Journal of Distance Education Technologies, 4(3), 36-47, July-September 2006 37


frastructures, course material has to be in digi-tal form, course material has to be distributed,and learners must have access to various vir-tual universities.

As single virtual universities are inde-pendently created, they may provide very het-erogeneous functionalities and user interfaces.Ideally, the learner should be able to access allthe virtual universities in a similar way (i.e., theheterogeneity of various virtual universitiesshould not burden the learner). How this goalcan be achieved is the main topic of the ONES-project. Consequently, the main functions ofthe ONES system are to hide the distribution ofe-learning portals, and to hide the semantic het-erogeneity (i.e., problems arising from usingsame words in different meaning and vice versa).

In order to achieve these goals, the sys-tem will deploy many new technologies suchas “one-stop portals,” Web services, serviceoriented architecture, RDF-based annotation,ontology editors, and distance measures insearching learning objects.

In this article, we will restrict ourselveson the role of searches in the ONES-system. Inparticular, we will analyze the applicability ofdifferent information retrieval technologies. Ourmain argument is that the technology based onthe Boolean model (Yan & Garcia-Molina, 1994),though well suited for searches in the Web, isnot suitable for the emerging virtual universi-ties. Instead, for virtual universities we have todevelop methods, which allow learners to bemore concerned with retrieving informationabout a subject than with retrieving data, whichsatisfy a given query. For example, a learnermay be interested in courses dealing with ob-ject-oriented programming rather than in thecourses where the term “java” or “C++” isstated.

When searching for information about asubject (e.g., object oriented programming) thesearch engine must somehow interpret themetadata of the learning objects and rank themaccording to a degree of relevance to thelearner’s query. The primary goal is to retrieveall the learning objects, which are relevant to alearner’s query while retrieving as few non-rel-evant objects as possible. Unfortunately, char-acterization of the learner’s information need is

not a simple task. Furthermore, the difficulty isnot only in expressing the information need butalso in knowing how the learning objects shouldbe characterized with the help of the metadatadescriptions.

The rest of this article is organized asfollows. First, in the second section we give anoverview of the architecture of the ONES-sys-tem. In the third section we characterize virtualuniversities. In particular, we will give an over-view of the e-learning environment, and specifywhat the notion of resource-based learning in-corporates. Then, in the fourth section, the roleof metadata and ontologies in virtual universi-ties is illustrated. In addition, the usability ofthe Boolean and the vector model in a virtualuniversity is analyzed. Especially, two interpre-tations of a hierarchical ontology in the contextof the vector model, called weighted leaves andmultilevel weighting, are introduced. Then, inthe fifth section, the performance of four match-ing algorithms based on weighted leaves andmultilevel weighting principles is compared.Finally, the sixth section concludes the articleby summarizing the feasibility of the proposedideas.

THE ARCHITECTUREOF THE ONES SYSTEM

The name ONES stands for One Stop e-learning Portal. As this name suggests, a sa-lient feature of the system is the aggregation ofdistance learning information from differentlearning sources in one portal. The idea of theone-stop portals originated from one-stopshops, and later on it is also adopted in e-gov-ernment applications. All one-stop applicationshave the same goal: hide the heterogeneity anddistribution of local systems. So, from user’spoint of view one-stop portal behaves like acentralized system.

The four main components of the ONES-system are (see Figure 1):

• Aggregation portal (mediator),• Wrappers,• E-learning portals, and• Course providers’ tools.



The aggregation portal supports thelearners in searching the courses that match totheir specific needs. It differs from traditionaldatabase interfaces in a way that in addition tothe traditional database queries it supports fuzzyqueries. Fuzzy queries are similarity based,which means that if the similarity between thecourses’ profiles and the learner’s query exceedsa certain threshold, they are said to match. Aproblem is that the current database manage-ment systems do not support fuzzy queries andtherefore the ONES-system has to supportthem.

From technological point of view, theaggregation portal is a mediator (Garcia-Molina, Ullman, & Widom, 2000). It supports avirtual view that integrates several learningsources in much the same way as data ware-houses do. However, since the mediator doesnot store any data, the mechanisms of media-tors and warehouses are rather different. Sincethe mediator has no data of its own, it must getthe relevant data from its sources and use thatdata to form the answer to the learner’s query.As the data sources (e-learning portals) are in-dependently created it is obvious that they pro-vide heterogeneous interfaces (e.g., they may

provide different kind of functionalities or thesame functionalities are provided by differentoperations).

In order to hide this heterogeneity thereis a wrapper (Garcia-Molina et al., 2000) be-tween the mediator and each e-learning portal.So a wrapper is a software module that extractsdata from local e-learning portals. This impliesthat the wrapper must be able to accept a vari-ety of queries from the mediator and translateany of them to the terms of local eLearning por-tal. The wrapper must also communicate theresult to the mediator. An important point isthat each wrapper provides equal functionalityfor the mediator. Ideally, each wrapper providesan interface for requesting the metadata oflearning objects (i.e., descriptive informationof courses, course packages and programs of-fered by educational institutions, e.g., univer-sities).

From a technological point of view, eache-learning portal is a Web service (Vasudevan,2001). Web services are self-describing modu-lar applications that can be published, located,and invoked across the Web. Once a service isdeployed, other applications (e.g., an aggrega-tion portal) can invoke the deployed service. In

Le a rn e r Le a rn e r

A g g reg a t io n po rta l(a me d ia to r )

W ra p p e r W ra p p e r

e Lea rn in gp o rta l

e Lea rn in gp o rta l

Co u rs ep ro v id e r’sto o l

Co u rs ep ro v id e r’sto o l

Co u rs ep ro v id e r

Co u rs ep ro v id e r

Figure 1. ONES-architecture



general, a Web service can be anything from asimple request to complicated business pro-cess.

A course provider can enter data about acourse through the course provider’s tool. Themain function of this tool is to provide an inter-face, which facilitates the creation of themetadata attached to learning objects. Basically,this tool is analogous to the tools that supportthe content providers of electronic newspapers(Yli-Koivisto & Puustjärvi, 2002) in creatingmetadata items to news articles. The tool mayeven generate suggestions of the suitablemetadata items, after which the author can makethe necessary modifications and enter this in-formation to the system.

CHARACTERISTICS OFVIRTUAL UNIVERSITIES

E-Learning EnvironmentE-learning can be defined as information

technology enabled and supported form of dis-tance learning, in which the traditional restric-tions of classroom learning have disappeared(Liu, Chan, Hung, & Lee, 2002). The main toolof e-learning is a personal computer, and theInternet servers as the principal communica-tion and distribution channel. The learners canparticipate in online Web-based courses andinteract with both the peers and instructors andwith the learning materials. The teacher-centeredness of traditional learning does nothold for e-learning, where the learning processhas become more and more learner centered.The learning process and the resources may becustomized according to the individual needsof the learner. At the same time, the role of theteacher becomes that of a facilitator or of a men-tor guiding and supporting the individual pro-cess of learning (Liu et al., 2002).

Typical e-learning environments, such asWebCT and Virtual-U, offer the basic elementsfor delivering e-learning courses: course con-tent delivery tools, synchronous and asynchro-nous discussion forums and conferencing sys-tems, possibilities for quizzes and polling,workspaces for sharing resources, white boards,

possibilities for evaluation and grading, log-books, possibilities for submitting assignments,and so forth (Liu et al., 2002).

Studying in Virtual UniversitiesIn the recent years, the idea of a virtual

university has been becoming more and morepopular in many countries all over the world.The enormous development in the field of in-formation and communication technologies hasenabled the rise of e-learning and virtual learn-ing environments. As a result, the traditionaluniversities have faced a new challenge emerg-ing from the commercial sector of education.There is a growing need for new kind of learn-ing and teaching as the technology advancesrapidly and the skills and competencies requiredin the working life become more demanding andincreasingly dynamic.

Virtual university has been defined as aspace where the students are provided withhigher education courses with the help of thenewest information and communication tech-nology (Niemi, 2002). The degree of utilizingtechnology in organizing the studies may varyfrom pure technology-based studies to face-to-face or mixed studies that are supported bylearning technologies.

The main channel of communication anddelivery of teaching is the Internet (Niemi, 2002;Ryan, Scott, Freeman, & Patel, 2000). Thus, avirtual university can be seen as closely re-lated to e-learning that provides learning op-portunities via the Internet. The difference be-tween these two concepts is the level of stud-ies offered; virtual university is aimed to offerhigher education studies while e-learning canbe used for all educational levels.

A virtual university may be an institutionthat uses the information and communicationtechnologies for its core activities such as pro-viding learning opportunities, administration,materials development and distribution, deliv-ering teaching and tuition, and providing coun-seling, advising and examinations. On the otherhand, a virtual university may also be a virtualorganization created through partnerships be-tween traditional universities and other educa-tional institutes. In addition, the traditional cam-



pus universities may be regarded as virtual uni-versities if they offer learning opportunities viathe Internet or combine traditional ways oflearning with e-learning (Ryan et al., 2000).

Virtual universities are expected to offeropportunities for life-long learning for audi-ences otherwise excluded from university stud-ies. The emerging virtual university can be seenvery beneficial especially for the industry, whentechnology-supported learning can be broughtto the workplaces and integrated more closelyto work. Moreover, virtual university can en-hance organizational learning and bring com-petitive advantage by continuously develop-ing the skills and knowledge of the employees(Teare, Davies, & Sandelands, 1999).

Resource-Based LearningThe Internet is able to store and transmit

vast amounts of information in different formsand formats. Therefore the Internet is an idealsupport for resource-based learning (RBL) thatis one of the corner stones of learning andteaching in the virtual university. RBL has beendefined a student-centered way of learning thatexploits various specially designed learningmaterials, interactive media and technologies.RBL can be realized as self-study or as interac-tive group learning both in distance and in theface-to-face mode (Ryan et al., 2000).

The Internet can be used to enable andsupport RBL in several ways (Ryan et al., 2000):

• Courses can be delivered via the Internet.• Resources can be identified and used.• Internet serves as a communication and

conferencing channel.• Learning activities and assessment can be

done in the Net.• Collaborative work is enabled.• Student management and support is enabled.

In the next section, we focus on the start-ing point of RBL, namely on searching learningresources.

INFORMATIONRETRIEVAL MODELS

Information retrieval in the context of vir-tual universities deals with the representation,organization, and access to learning objects.The representation and organization of learn-ing objects should provide the learner with aneasy access to the learning objects. The sys-tem retrieves all the learning objects, which arerelevant to learner while retrieving as few non-relevant learning objects as possible

In this section, we will analyze the use-fulness of different information retrieval mod-els (Baeza-Yates & Ribeiro-Neto, 1999) for a vir-tual university. The used model determines theway the metadata of the learning objects aregiven as well as the way the learner’s queries(information needs) are presented. Before ana-lyzing the information retrieval models we char-acterize the role of metadata and ontologies invirtual universities.

Metadata and OntologiesIn order to transfer data seamlessly and

efficiently in the virtual university, there has tobe a standard way for both people and comput-ers to communicate all necessary knowledge,with both people and computer systems(Stojanovic, Staab, & Studer, 2001). One pos-sible solution is to use metadata and an ontol-ogy attached to it for describing the learningobjects.

The term metadata has variable interpre-tations depending upon the circumstances inwhich it is used. For example, in the context ofdocuments the common forms of metadata in-clude the author(s), the source of publication,the length of document, and so forth. This kindof metadata in commonly called descriptivemetadata. For example, the metadata elementsof the Dublin Core (Pöyry, Pelto-Aho, &Puustjärvi, 2002) represent descriptivemetadata.

Educational metadata is needed for im-proving the retrieval of learning objects, forsupporting the management of collections oflearning objects, and for supporting the deci-sion process of the learners looking for educa-



tional resources. LOM seems to be the mostpowerful and most widely used metadata stan-dard for educational information systems(Holzinger, Kleinberger, & Müller, 2001;Lamminaho, 2000). More generally, educationalmetadata can be used by educational institutesand professionals as well as by learners in or-der to describe (e.g., the content, structures,and relationships of the learning objects and tosearch for educational objects) (Lamminaho,2000; Stojanovic et al., 2001).

Educational metadata may describe anyclass of educational objects, such as studycourses. The pedagogical features of thecourse, the contents, special target groups, andthe technical requirements of the study coursecan be described with the help of a metadataschema (Lamminaho, 2000). More generally,educational metadata can be used to describe,for example, the content, structures, and rela-tionships of the learning objects (Stojanovic etal. 2001). Educational metadata can be utilizedby educational and pedagogical professionals,by the institutions offering education, and bythe students searching for education. Well-de-signed and sufficient metadata aid the decisionmaking process of the students and help theeducational institutions to provide suitable in-formation about their educational supply(Lamminaho, 2000). Educational metadata isvery much semantic metadata, but a thoroughmetadata schema must include also at leaststructural metadata in order to be able to de-scribe the learning objects efficiently.

The idea of using standardized metadataschemas is being able to develop universallyapplicable tools dealing with the metadata de-scriptions of the learning objects. In order tocreate metadata records containing the resourcedescriptions specific tools are needed for cre-ating the metadata according to the standards(Kassanke, El-Saddik, & Steinacker, 2001).Metadata is also useful when guiding non-ex-perienced users through a large collection oflearning resources (Strijker, 2001). Moreover,metadata is seen as value-added informationthat is used to arrange, describe, track or other-wise enhance the access to the object content.At the moment metadata becoming increasingly

important when digital government and e-com-merce are emerging. Metadata enables in-creased accessibility, expanded use of objects,multi-versioning, and system improvement. Thegranularity of metadata, which refers to the levelof details in the description, is an importantquestion when developing a metadata set(Gilliland-Swetland, 2000).

A salient feature of descriptive metadatais that it is external to the meaning of the docu-ment, (i.e., it describes the creation of the docu-ment rather than the content of the document).The metadata describing the content of thedocument is commonly called semanticmetadata. For example, the keywords attachedto many scientific articles represent semanticmetadata (Jokela, 2001).

An ontology provides a general vocabu-lary of a certain domain (Fridman &McGuinness, 2001), and it can be defined as“an explicit specification of aconceptualisation” (Gruber, 1993). In essence,an ontology gives the semantics to themetadata. Ontologies are formal, explicit, andshared specifications of someconceptualizations. Formal means that the on-tology should be machine readable, and explicitrefers to having defined the types of conceptsand the constraints on their use are explicitlydefined. Shared refers to the fact that an ontol-ogy must reach a consensus (Fensel, 2001).Ontologies together with metadata enhanceefficient access to information by offering pos-sibilities to organize and categorize the contentof the information system in question. In thiscontext an ontology is defined as a means toformalize and to specify a common terminologyfor a defined area of interest (Turpeinen, 2000).

In order to standardize semantic metadataspecific ontologies are introduced in many dis-ciplines. Typically, such ontologies are hierar-chical taxonomies of terms describing certaintopics. For example, the ACM Computing Clas-sification System is a hierarchy (a tree) in whichthe nodes represent the classes of the tax-onomy. In Figure 2, a subset of that hierarchyis represented.



The Boolean ModelApplying the Boolean model in searches

requires that each learning object is augmentedby a set of metadata items such as keywords orclassification identifiers (e.g., the searches inthe CUBER system (Pöyry et al., 2002; Pöyry &Puustjärvi, 2003) are based on the Booleanmodel). A learner can then query learning ob-jects by Boolean expressions comprising ofoperands and operations. The operands are theused keywords and the operators are typically“and,” “or,” and “not.” For example, by usingACM Computing Classification system (Fig-ure 2) the keywords attached to a learning ob-ject might be D, H.1, and H.2.2 (correspondingthe keywords Software, Models and Principles,and Physical Design). Now, if a learner pre-sents the query “D and (B or H.1)” (i.e., learn-ing objects having the keyword “Software” andat least one of the keywords “Hardware” and“Models and Principles”), then the previouslearning object will match that query.

The Boolean model is intuitive and clear.Moreover, it can be efficiently implementedeven in the case of huge amount of objects. Forexample, many Web search engines are basedon this model. However, using that model in avirtual university gives rise to following draw-backs:

• First, the model is based on a binary deci-sion criterion, meaning that each learningobject is predicted to be relevant or non-relevant. In reality, it is obvious that the re-sulting learning objects fit more or less tothe query (i.e., some kind of grading shouldbe possible).

• Second, expressing the requirements oflearning objects by a Boolean expressionmay be difficult.

• Third, a typical problem concerning searchengines based on the Boolean model is thateither the result of the query includes toomany or too few learning objects.

In the next section, we consider a moreadvanced model, which avoids many of thedrawbacks just described.

The Vector ModelThe vector model differs from the Bool-

ean model in that weights can be assigned toeach metadata item of a document as well as tothe keywords of the query. The idea behindthis model is that we can more accurately specifythe queries and the contents of the documents(e.g., learning objects).

Assuming that the standard metadataitems (e.g., the classes in Figure 2) specify avector space (i.e., each item (keyword) in the

S u b je c t

H . In fo r ma t io nS y s t e ms

D . S o ftw a re B . H a rd w a re

H . 1. M o d e ls a n dP rin c ip le s

H . 2. D a ta b a s eM a n a g e me n t

H . 2. 2. P h y s ic a lD e s ig n

H . 2. 1. Lo g ic a lD e s ig n

H . 2. 3. La n g u a g e s H . 2. 4. S y s t e ms

Figure 2. A subset of the ACM Computing Classification System



hierarchy represents a dimension in the vectorspace), we can represent each document andquery as a vector in that vector space. Then wecan process the query by computing the dis-tance of the query vector and the documentvectors. This kind of computing requires thatthe sum of the weights of each document andquery equals to a predefined constant. For con-venience, the used constant is usually one.

As the result of the query the documentsare sorted in the order determined by the simi-larity (i.e., the document having the best matchwith the query is presented first). The numberof the documents in the result should be re-stricted by requiring a certain degree of similar-ity.

Using the vector model in a virtual uni-versity requires that the course provider as-sign the metadata items and their weights intoeach learning object. The metadata items to beused are selected from the used domain ontol-ogy. Depending on the used course provider’sinterface this can be done in various ways. Forexample, as in our prototype system, there maybe an ontology structure on which the courseprovider inserts the weights. In Figure 3, theontology structure of the Figure 2 is augmentedby setting weights on the nodes “B.H.2,” and“H.2.2.” Note that the node having no weightmeans that its weight is actually zero. Hence,

the profile of the learning object can be pre-sented by a vector in 9-dimensional vectorspace as follows: [0 x D, 0 x H, 0.3 x B, 0 x H.1, 0.6x H.2, 0 x H.2.1, 0.1 x H.2.2, 0 x H.2.3, 0 x H.2.4].That is, the profile is a point in an orthogonal 9-dimensional vector space.

The gain of attaching metadata descrip-tion for learning objects is that we can use math-ematical distance measures in computing learn-ers’ queries. Further, computing the distancerequires that the descriptions (vectors) be speci-fied in an orthogonal vector space. In otherwords, the nodes in the hierarchy that are usedin profile vectors must be independent. In prac-tice this means that we have to follow one orthe other of the following interpretations:

• Multilevel weighting interpretation: Theleaves and the nodes of the ontology hier-archy represent independent concepts.

• Weighted leaves interpretation: The parentnode represents the union of its siblings. Inother words, each sibling represents a sub-set of its parent. Yet the siblings representindependent concepts.

The intuition behind multilevel weight-ing is that we can express the level of a leaningobject (as well of a query) by altering the weightson a node and its siblings. To illustrate this let

S u b je c t

H . In fo r ma t io nS y s te ms

D . S o ftw a re

H . 1 . M o d e ls a n dP rin c ip le s

H . 2 . D a ta b a s eM a n a g e me n t 0 .6

H . 2. 2. P h y s ic a lD e s ig n 0 .1

H . 2. 1. Lo g ic a lD e s ig n

H . 2 . 3. La n g u a g e s H . 2 . 4. S y s te ms

B. H a rd w a re0 .3

Figure 3. A metadata specification of a learning object



us consider the weighting of the course “Physi-cal design in database management systems.”Now, it is obvious that the weights should begiven on the node H.2 (Database management)and its siblings H.2.2 (Physical design) andH.2.4 (Systems). Assuming that approximatelyhalf of the course deals with databases in gen-eral and the other part deals with physical de-sign and database management systems, thengiving weight 0.4 to H.2 (Database manage-ment), 0.3 to H.2.2 (Physical design) and weight0.3 to H.2.4 (Systems) could be an appropriateassignment. On the other hand, if the course isvery specific then the weight of H.2 could bezero.

If we follow the weighted leaves inter-pretation, then in determining the profile of alearning object weights are set only on theleave nodes of the hierarchy. Consequently, theprofiles of the learning objects are specified byvectors in an orthogonal vector space, whichis determined by the leave nodes of the hierar-chy. To illustrate this approach let us considerthe weighting of the course “Physical designin database management systems.” In this case,all the weights are given on the nodes H.2.2(Physical design) and H.2.4 (Systems) indepen-dently of the level of the course.

PROCESSING LEARNER’SQUERIES

The learner presents queries in the sameway as the content provider determines theweights of the learning object; both these arepresented by vectors. Hence the query presentsan ideal profile of the learning objects that sat-isfy the learner’s requirements. For example,assuming that the multilevel weighting inter-pretation of the ontology is used, and a learnerwants to find basic courses concerning data-base management. In this case the learner willset rather heavy weight on H.2 (databaseManagement) and lighter weights on H.2.1(Logical Design), H.2.2 (Physical Design) andH.2.3 (Languages). In contrast, if a student islooking more advanced courses on databasemanagement then the student will give a lighterweight on H.2 and heavier weights on its sib-lings.

As the learners interact with the systemby submitting queries it is reasonable to re-quire that the response times should be only afew seconds. We investigated the effects ofdifferent matching algorithms and the amountof stored learning objects on response times.The test environment was equipped withPentium II processor and 192 MB memory. Thecomputers were running the Sun Solaris 5.8operating system. We implemented and testedfour matching algorithms (i.e., algorithms) whichcompute the distance measures of learning ob-jects and learners’ queries. We next give a shortdescription of the algorithms.

The Cosine matching algorithm (Baeza-Yates et al., 1999) calculates the cosine mea-sure between the query (a vector) and the docu-ments profiles. As a matter of fact the algorithmdoes not compute distance measures but ratherapproximates distance measures by computingthe angles of the query vector and the vectorsrepresenting documents, such as the learningobjects.

The Euclidean matching algorithm(Friedman, Bentley, & Finkel, 1977) calculatesthe Euclidean distance from the query profileto all learning objects’ profiles. The Manhat-tan distance algorithm (Bentley, Weide, & Yao,1980) calculates a so called “city block-dis-tance.” The name comes from the fact that thismeasure in two dimensions tells how manyblocks in a city one would have to walk be-tween two points.

Our developed Fuzzy matching algo-rithm attempts to achieve more efficient match-ing procedure than the “exact” matching algo-rithms. The improved efficiency is achieved byperforming the actual matching on a pre-selectedsubset of all learning objects. The predefinedsubset of the documents’ profiles is determinedby choosing the three biggest weights fromthe query and then computing the subset basedon these weights. Then only the profiles, theweights of which are within a specified toler-ance interval are selected for the final queryprocessing. Therefore the result set is not guar-anteed to contain all the profiles that are clos-est to the matching profile. However, the close-



ness values of the profiles in the actual resultset are exact, since they are calculated usingthe Euclidean measure.

The computing time for matching of eachalgorithm is presented in Table 1. The test wasperformed for different amount (1000, 5000 and10 000) of learning objects. Basically, the differ-ences of Euclidean, Cosine and Manhattan al-gorithms were rather small (less than 10%).Fuzzy matching algorithm required least com-puting time (about 20% less than others). How-ever, the test proves that all the algorithms arequick enough in the test environment as theresponse times are less than 1.2 seconds. If thenumber of the learning objects or the dimen-sions of the vector space (i.e., the used at-tributes in the profile) increases, then it obvi-ous that the Fuzzy Matching algorithm will bemore superior to the other algorithms. In ourtest environment the vector space comprisedof 15 dimensions (i.e., each profile could haveat most 15 attributes). In practice, the numberof attributes cannot increase significantly asotherwise the determining the weights for learn-ing objects would overly burden the coarse cre-ators. In addition, as the system is developedfor universities it is not obvious that number oflearning objects can be very huge (e.g., over10,000).

CONCLUSIONVirtual university has been defined as a

space where the students are provided withhigher education courses with the help of thenewest information and communication tech-nology (Niemi, 2002). The degree of utilizing

technology in organizing the studies may varyfrom pure technology-based studies to face-to-face or mixed studies that are supported bylearning technologies.

A virtual university may be an institutionthat uses the information and communicationtechnologies for its core activities such as pro-viding learning opportunities, administration,materials development and distribution, deliv-ering teaching and tuition, and providing coun-seling, advising and examinations. On the otherhand, a virtual university may also be a virtualorganization created through partnerships be-tween traditional universities and other educa-tional institutes. In addition, the traditional cam-pus universities may be regarded as virtual uni-versities if they offer learning opportunities viathe Internet or combine traditional ways oflearning with e-learning (Ryan et al., 2000).

E-learning sets new requirements for uni-versities: they have to build global learning in-frastructures, course material has to be offeredalso in digital form, course material have to bedistributed via the Internet and learners musthave access to various virtual universities. Aproblem is that the current virtual universityportals provide heterogeneous functionalities,which in turn hampers the learner in accessingvarious virtual universities.

The main goal of the ONES-project is toinvestigate the ways of integrating various vir-tual universities in a way that such an aggre-gated virtual university would be as easily ac-cessible for a learner as a single virtual univer-sity. Achieving such a goal requires mutualunderstanding of the used technology andstandardized descriptions of the learning ob-

M an ha tt a n

C o sine

E uc lid ea n

F u zzy

0 .8 0 0 .9 3 1 .0 7

0 .8 3 0 .9 6 1 .1 6

0 .8 3 0 .9 8 1 .2 3

0 .6 1 0 .7 3 0 .8 9

1 0 0 0 5 0 0 0 1 0 0 0 0

Table 1. Matching times for the algorithms



jects. Furthermore, searching from various vir-tual universities requires mutual understand-ing of the information retrieval model to be used.

We argue that keywords-based search(i.e., the Boolean model), though well suited forgeneral Web searches, is unsuitable for the vir-tual universities’ purposes. Instead, the vectormodel (on which our implemented search en-gine is also based on) seems to be more appro-priate as it provides a similarity measure (i.e.,the learning object having the best match ispresented first. We also introduced two inter-pretations for the hierarchical ontologies, whichallow increasing the power of the used metadatadescriptions. And finally, we also compare theperformance of four algorithms for computingthe similarities of the profiles. It turned out thatour developed Fuzzy Matching algorithm re-quires less computing time as the other “exactmatching” algorithms represented in the litera-ture.

REFERENCESBaeza-Yates, R., & Ribeiro-Neto, B. (1999). Mod-

ern information retrieval. New York:Addison Wesley.

Bentley, J., Weide, B., & Yao, A. (1980). Optimalexpected-time algorithms for closest pointproblem. ACM Transactions on Mathemati-cal Software, 6(4), 563-580.

Fensel, D. (2001). Ontologies: Silver bullet forknowledge management and electroniccommerce. Berlin: Springer Verlag.

Fridman, N. N., & McGuinness, D. L. (2001,March). Ontology development 101: Aguide to creating your first ontology(Stanford Knowledge Systems LaboratoryTechnical Report KSL-01-05, Stanford Medi-cal Informatics Technical Report SMI-2001-0880).

Friedman, J., Bentley, J., & Finkel, R. (1977). Analgorithm for finding best matches in loga-rithmic expected time. ACM Transactions onMathematical Software, 3(3), 209-226.

Garcia-Molina, H., Ullman, J., & Widom, J. (2000).Database system implementation. New Jer-sey: Prentice Hall.

Gilliland-Swetland, A. J. (2000). Introduction tometadata, setting the stage. Retrieved De-cember 20, 2004, from http://www.getty.edu/research/institute/standards/intrometadata/

Gruber, T. R. (1993, March). Toward principlesfor the design of ontologies used for knowl-edge sharing. In Padua Workshop on For-mal Ontology (p. 23).

Holzinger, A., Kleinberger, T., & Müller, P. (2001).Multimedia learning systems based on IEEELearning Object Metadata (LOM). In Pro-ceedings of ED-MEDIA 2001, Tampere, Fin-land.

Jokela, S. (2001). Metadata enhanced contentmanagement in media companies. In ActaPolytecnica Scandinavica. Mathematicsand computing series no. 114. Doctoral the-sis, Helsinki University of Technology.

Kassanke, S., El-Saddik, A., & Steinacker, A.(2001). Learning objects metadata and toolsin the area of operations research. In Pro-ceedings of ED-MEDIA 2001, Tampere, Fin-land.

Lamminaho, V. (2000). Metadata specification:Forms, menus for description of courses andall other objects. CUBER project, Deliver-able D3.1.

Liu, J., Chan, S., Hung, A., & Lee, R. (2002).Facilitators and inhibitors of e-learning.In L. C. Jain, R. J. Howlett, N. S. Ichalkaranje,& G. Tonfoni (Eds.), Virtual environmentsfor teaching and learning, Series on inno-vative intelligence (Vol. 1, pp. 75-109). WorldScientific.

Niemi, H. (2002). Empowering learners in thevirtual university. In H. Niemi, & P. Ruohotie(Eds.), Theoretical understandings forlearning in the virtual university. Univer-sity of Tampere, Research Center for Voca-tional Education and Training.

Pöyry, P., Pelto-Aho, K., & Puustjärvi, J. (2002).The role of meta data in the CUBER system.In Proceedings of the Annual Conferenceof the SAICSIT 2002 (pp. 172-178).

Pöyry, P., & Puustjärvi, J. (2003). CUBER: Apersonalised curriculum builder. In Proceed-ings of the 3rd IEEE International Confer-



ence on Advanced Learning Technologies,Athens, Greece (pp. 326-327).

Ryan, S., Scott, B., Freeman, H., & Patel, D.(2000). The virtual university. The Internetand resource-based learning. London:Kogan Page.

Stojanovic, L., Staab, S., & Studer, R. (2001). E-learning based on the Semantic Web. In Pro-ceedings of WebNet2001 — World Confer-ence on the WWW and Internet, Orlando,FL.

Strijker, A. (2001). Using metadata for re-usingmaterial and providing user support tools.In Proceedings of ED-MEDIA 2001,Tampere, Finland.

Teare, R., Davies, D., & Sandelands, E. (1999).The virtual university — An action para-digm and process for workplace learning.Cassell.

Turpeinen, M. (2000). Customizing news con-tent for individuals and communities. In ActaPolytechnica Scandinavica. Mathematicsand computing series no. 103. Doctoral the-sis, Helsinki University of Technology.

Vasudevan, V. (2001). A Web service primer.Retrieved December 20, 2004, from http://www.xml/lpt/a/2001/04/04/Webservices/indeax.html

Yan, T., & Garcia-Molina, H. (1994). Index struc-tures for selective dissemination of informa-tion under the Boolean Model. ACM Trans-actions on Database Systems, 19(2), 332-364.

Yli-Koivisto, J., & Puustjärvi, J. (2002). CoMet:An electronic newspaper prototype. Work-shop on XML in Digital Media. In Proceed-ings of the 8th International Conferenceon Distributed Multimedia Systems(DMS’2002) (pp. 703-707).

J. Puustjärvi obtained his BSc and MSc in computer science in 1985 and 1990, respectively,and his PhD in computer science in 1999, all from the University of Helsinki, Finland. Currentlyhe is a professor of information society technologies at the Technical University of Lappeenranta.He is also a docent of e-business technologies at the Technical University of Helsinki, and adocent of computer science at the University of Helsinki. His research interests include e-learning, e-business, knowledge management, Semantic Web and databases.

P. Pöyry obtained her BA (Educ) and MA (Educ) in 2001 and 2002 from the University ofHelsinki, Finland. Ms. Pöyry obtained her LicSc (Tech) in 2004 from the Helsinki University ofTechnology, where she is a doctoral student. Currently she works as a researcher and preparesher PhD thesis at the Helsinki University of Technology in the Software Business and EngineeringInstitute as a member of the Information Ergonomics Research Group. Her research interestsinclude e-learning, knowledge management, CSCW, and usability research.

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Information Retrieval in Virtual Universities

Documents

Transcript of Information Retrieval in Virtual Universities