Compass

Semantic search

www.ovitas.no

12.10.2006 TMRA '06 2

Basics

Knowledge model based information retrieval

Fulltext search enhanced with Topic Maps = Semantic search

Search driven navigation

Search technologies

[Figure: search technologies plotted by level of precision ("intelligence") against data volume (domain size). Labels: full-text search, conceptual search, semantic search, Compass.]

Given...

a web site with a lot of text,

which is unstructured (no markup, no tags),

a controlled domain (we know what the discourse domain is), and

an inadequate search engine...

We would like to...

get relevant hits within a meaningful context,

spare the work of structuring our data,

add semantics to the content by defining a knowledge model.

Compass-bowl:

Take a fulltext search engine.

Take a Topic Maps engine.

Add a hint of semantics.

Define the correct processes for orchestrating the components.

Mix them thoroughly.

Serve to public!

Full text search engine

Apache Lucene (open source)

Possible to index most file formats: html, asp, php, jsp, pdf, rtf, txt, doc, ppt, xls, pst…

The index is independent of the model: no need to re-index when changes are made to the model

Small index size: typically less than 10% of the size of the data

Fast index lookup: less than 20 ms for index size >20000

The knowledge model

Based on the ISO International Standard for Topic Maps

Semantic model of the discourse domain

Concept words = topic names/synonyms

Semantic relationships through associations

Compass Weight (CW) defines “closeness” between topics; it is a property on association types

Example

Ovitas -- hasEmployee (association type, CW=0.7) --> Christopher

Ovitas -- hasProduct (association type, CW=0.8) --> Compass
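The example can be written down as a small data structure. This is only a minimal sketch of the idea, not the TMCore data model: the class names, the `find` and `neighbours` helpers, and the synonym "Chris Searle" attached to Christopher are assumptions made for illustration. The CW value sits on the association type, as the knowledge model slide describes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssociationType:
    name: str
    compass_weight: float  # CW: "closeness" between the topics it links


@dataclass
class Topic:
    name: str
    synonyms: set = field(default_factory=set)


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    # Each association is (source topic name, association type, target topic name).
    associations: list = field(default_factory=list)

    def add_topic(self, name, synonyms=()):
        self.topics[name] = Topic(name, set(synonyms))

    def associate(self, source, assoc_type, target):
        self.associations.append((source, assoc_type, target))

    def find(self, term):
        """Return the topic whose name or synonym equals term, ignoring case."""
        needle = term.lower()
        for topic in self.topics.values():
            names = {topic.name.lower()} | {s.lower() for s in topic.synonyms}
            if needle in names:
                return topic
        return None

    def neighbours(self, name):
        """Yield (related topic name, CW), following associations both ways."""
        for source, assoc_type, target in self.associations:
            if source == name:
                yield target, assoc_type.compass_weight
            elif target == name:
                yield source, assoc_type.compass_weight


has_employee = AssociationType("hasEmployee", 0.7)
has_product = AssociationType("hasProduct", 0.8)

tm = TopicMap()
tm.add_topic("Ovitas")
tm.add_topic("Christopher", synonyms={"Chris Searle"})  # synonym is assumed
tm.add_topic("Compass")
tm.associate("Ovitas", has_employee, "Christopher")
tm.associate("Ovitas", has_product, "Compass")
```

Calling `tm.find("chris searle")` resolves the synonym to the Christopher topic, and `tm.neighbours("Ovitas")` yields both related topics together with their CW values.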

Compass orchestrator

Guides the processes of the search:

1. Search for the term in the topic map

2. Expand the map for relevant/related topics

3. Send all these terms off to a fulltext search

4. Calculate relevance (based on the combination of CW and Lucene weights) and prepare the result list as an XML instance

5. Render XML as wished
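The five steps can be sketched end to end in a few lines. Everything below is an illustration under stated assumptions: the topic map and documents are made-up data, closeness along a path is taken as the product of CW values, the full-text score is a simple term frequency standing in for Lucene's weights, and the final relevance is the product of the two. The slides do not say how Compass actually combines them, and the XML element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic map: topic -> [(related topic, CW)].
TOPIC_MAP = {
    "Ovitas": [("Christopher", 0.7), ("Compass", 0.8)],
    "Christopher": [("Ovitas", 0.7)],
    "Compass": [("Ovitas", 0.8)],
}

# Hypothetical documents standing in for the indexed web site.
DOCUMENTS = {
    "http://example.org/about": "Ovitas builds Compass. Compass is a semantic search product.",
    "http://example.org/team": "Christopher works on topic maps.",
}


def expand(term, max_hops):
    """Steps 1-2: find the term, then collect related topics with a closeness.

    Closeness along a path is the product of the CW values (an assumption).
    """
    if term not in TOPIC_MAP:
        return {}
    closeness = {term: 1.0}
    frontier = [term]
    for _ in range(max_hops):
        next_frontier = []
        for topic in frontier:
            for related, cw in TOPIC_MAP.get(topic, []):
                score = closeness[topic] * cw
                if score > closeness.get(related, 0.0):
                    closeness[related] = score
                    next_frontier.append(related)
        frontier = next_frontier
    return closeness


def fulltext_score(term, text):
    """Step 3 stand-in: share of words equal to term. Not Lucene's scoring."""
    words = [w.strip(".,").lower() for w in text.split()]
    return words.count(term.lower()) / len(words) if words else 0.0


def search(term, max_hops=1, threshold=0.0):
    """Steps 3-4: query every expanded term, combine weights, build the XML."""
    root = ET.Element("results", query=term)
    ranked_topics = sorted(expand(term, max_hops).items(), key=lambda kv: -kv[1])
    for topic, closeness in ranked_topics:
        hits = []
        for url, text in DOCUMENTS.items():
            relevance = closeness * fulltext_score(topic, text)
            if relevance > threshold:
                hits.append((relevance, url))
        if hits:
            group = ET.SubElement(root, "topic", name=topic)
            for relevance, url in sorted(hits, reverse=True):
                ET.SubElement(group, "hit", url=url, relevance=f"{relevance:.3f}")
    return root


# Step 5: the XML instance can be rendered however the front end wishes.
xml_text = ET.tostring(search("Ovitas"), encoding="unicode")
print(xml_text)
```

Hits come back grouped by the topic that produced them, so a document that never mentions the search term can still appear under a related topic.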

[Screenshots of the search interface, with callouts:]

Topic Map expansion

Search term

Hits in the fulltext grouped by the related topics

Relevant documents ranked by the weighting result

Search term in the topic map, but not in the text

Relevant information about ”Chris Searle”

Synonym search

Creating/maintaining the model

An MS Excel plug-in serves as the topic map editor

Can be put under version control

Import the model into the topic map engine: one click only

For complex topic maps a custom user interface can be used to enter instance data

Navigation

Navigation through the associations between topics

Navigation by search

User configurations

What pages to index

What topic map to use

The number of hops to perform

The threshold for relevance
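The four settings could be captured in a small configuration object. This is a hedged sketch only: the field names, the defaults, the file name `ovitas.xtm`, and the assumption that relevance lies between 0 and 1 are all illustrative and are not taken from Compass itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompassConfig:
    pages_to_index: tuple            # URLs of the pages to index
    topic_map: str                   # which topic map to use
    max_hops: int = 1                # how far to expand from the search term
    relevance_threshold: float = 0.0 # hits scoring below this are dropped

    def __post_init__(self):
        if self.max_hops < 0:
            raise ValueError("max_hops must be non-negative")
        # Assumes relevance is normalised to the range 0..1.
        if not 0.0 <= self.relevance_threshold <= 1.0:
            raise ValueError("relevance_threshold must be between 0 and 1")

    def keep(self, relevance):
        """Return True if a hit with this relevance passes the threshold."""
        return relevance >= self.relevance_threshold


config = CompassConfig(
    pages_to_index=("http://example.org/",),  # hypothetical site
    topic_map="ovitas.xtm",                   # hypothetical file name
    max_hops=2,
    relevance_threshold=0.3,
)
```

A larger `max_hops` pulls in more distantly related topics, while a higher `relevance_threshold` trims the weakest hits from the result list.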

Content lifecycle management

Easy to integrate with content repositories

A content management or publishing system can send a request to the indexer to re-index a particular resource

Incremental indexing: add, update or delete documents

HTTP is used as the basic mechanism to address content
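Incremental indexing keyed by a resource's HTTP address might look like the sketch below. It is a toy in-memory inverted index written for illustration, not Lucene and not the Compass indexer; the class and method names are assumptions. The point it shows is that one resource can be added, updated, or deleted without rebuilding the rest of the index.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index addressed by URL (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> URLs containing it
        self._terms_by_url = {}            # URL -> terms currently indexed

    def delete(self, url):
        """Remove one resource; other resources are left untouched."""
        for term in self._terms_by_url.pop(url, set()):
            self._postings[term].discard(url)
            if not self._postings[term]:
                del self._postings[term]

    def add_or_update(self, url, text):
        """Index a resource, replacing any earlier version of the same URL."""
        self.delete(url)
        terms = {w.strip(".,;:!?").lower() for w in text.split()}
        terms.discard("")
        self._terms_by_url[url] = terms
        for term in terms:
            self._postings[term].add(url)

    def lookup(self, term):
        """Return the set of URLs whose current text contains term."""
        return set(self._postings.get(term.lower(), set()))


index = IncrementalIndex()
index.add_or_update("http://example.org/a", "Compass is a semantic search product.")
index.add_or_update("http://example.org/b", "Topic Maps add semantics.")
# A publishing system re-sends a changed resource: only that entry is rebuilt.
index.add_or_update("http://example.org/a", "Compass combines Lucene and Topic Maps.")
```

After the update, `index.lookup("semantic")` is empty because the old text of the first resource is gone, while `index.lookup("topic")` returns both URLs.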

Architecture

SOA (service oriented architecture), no dependency on platform or components

Web service interface (HTTP REST)

.NET platform

Integrated components:

TMCore Topic Maps engine by NetworkedPlanet

Apache Lucene: full text engine

Architecture diagram

[Diagram components: TM Core, Full Text, Excel Editor, Compass Service, TMNav, TM editor person, User, Publishing System, Services]

Compass - Summary

Semantic search based on Topic Maps

Search in any document formats

Organize information in a topic-oriented manner

Link to relevant information without touching the data content

Conceptual navigation by Topic Maps

Tools for maintaining/evolving the classification

Fast and easy implementation