
05/06/2007 CRIS 2008

Personalizing Information Retrieval in CRISs with Fuzzy Sets and Rough Sets

Germán Hurtado Martín (1,2)

Chris Cornelis (2)

Helga Naessens (1)

(1) University College Ghent, (2) Ghent University (Belgium)

Overview

- Problems in CRISs
- Fuzzy sets and Rough sets
- PAS project

Problems in CRISs

Fuzzy

Rough

Term = Term

Fuzzy sets and rough sets

Traditional approach: crisp sets

Young people = {x ∈ People | 0 < age(x) < 27}

Fuzzy sets and rough sets

Fuzzy approach: fuzzy sets

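$$
\mathrm{Young}(x) =
\begin{cases}
1 & \text{if } \mathrm{age}(x) \le 20 \\
(30 - \mathrm{age}(x)) / 10 & \text{otherwise} \\
0 & \text{if } \mathrm{age}(x) \ge 30
\end{cases}
$$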
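The membership function for "young" can be sketched in a few lines of Python. The function name `young` and the sample ages are choices made here for illustration only; the breakpoints 20 and 30 come from the slide.

```python
def young(age: float) -> float:
    """Degree (between 0 and 1) to which a person of the given age is young."""
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    # Linear descent from degree 1 at age 20 to degree 0 at age 30.
    return (30 - age) / 10


# Sample ages chosen for illustration.
for a in (18, 25, 27, 35):
    print(a, young(a))
```

Under the crisp definition a 27-year-old is simply not young; with the fuzzy set the same person still counts as young, but only to a partial degree.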

Fuzzy sets and rough sets

Rough approach: rough sets

Upper approximation (R↑A)

A = {Numerical Analysis}
B = {Compilers}

R↑A = {Num. Analysis, Ex. Sciences, Statistics, ... , Coding Theory}
R↑B = {Compilers, Programming, GCC, YACC}
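A minimal sketch of the crisp upper approximation, assuming R is a reflexive and symmetric "is related to" relation stored as a dictionary. The slides do not list R itself; the pairs in `RELATED` are hypothetical and chosen only so that the result for B agrees with the slide. The names `RELATED` and `upper_approximation` are likewise illustrative.

```python
# Hypothetical relation: each term maps to the set of terms related to it.
# These pairs are NOT taken from the slides; they are picked so that the
# result for B = {Compilers} matches the example.
RELATED = {
    "Compilers": {"Compilers", "Programming", "GCC", "YACC"},
    "Programming": {"Programming", "Compilers"},
    "GCC": {"GCC", "Compilers"},
    "YACC": {"YACC", "Compilers"},
}


def upper_approximation(related, terms):
    """Return every term related to at least one term in `terms`."""
    result = set()
    for x in terms:
        # A term missing from the relation is assumed related only to itself.
        result |= related.get(x, {x})
    return result


print(sorted(upper_approximation(RELATED, {"Compilers"})))
```

The upper approximation never shrinks the original set: it adds everything that is possibly relevant given R.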

Fuzzy rough sets

Fuzzy approach on rough sets

Fuzzy set A

Fuzzy relation R, with degrees R(x,y)

Upper approximation:

$$
(R{\uparrow}A)(y) = \sup_{x \in X} \min\bigl(R(x,y),\, A(x)\bigr)
$$

Fuzzy rough sets: application

Query expansion

Allows more results by using R↑A

R            Programming  Hardware  C++  Java  Laptop  Algorithm
Programming  1.0          -         0.8  0.8   -       0.6
Hardware     -            1.0       -    -     0.4     -
C++          0.8          -         1.0  0.7   -       0.2
Java         0.8          -         0.7  1.0   -       0.2
Laptop       -            0.4       -    -     1.0     -
Algorithm    0.6          -         0.2  0.2   -       1.0

(- marks a cell that is empty on the slide)

- Query: “Programming”
- Expanded query: {(“Programming”,1.0), (“C++”,0.8), (“Java”,0.8), (“Algorithm”,0.6)}
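The expansion of the query “Programming” can be reproduced with a short Python sketch. It takes the upper approximation as the supremum over x of min(R(x,y), A(x)), which for a finite term set is a maximum. Empty cells of the table are assumed to be 0 and are therefore left out of the dictionary; the names `R` and `expand_query` are illustrative.

```python
# Fuzzy relation R from the table. Cells that are empty on the slide are
# assumed to be 0 and are simply omitted here.
R = {
    "Programming": {"Programming": 1.0, "C++": 0.8, "Java": 0.8, "Algorithm": 0.6},
    "Hardware": {"Hardware": 1.0, "Laptop": 0.4},
    "C++": {"Programming": 0.8, "C++": 1.0, "Java": 0.7, "Algorithm": 0.2},
    "Java": {"Programming": 0.8, "C++": 0.7, "Java": 1.0, "Algorithm": 0.2},
    "Laptop": {"Hardware": 0.4, "Laptop": 1.0},
    "Algorithm": {"Programming": 0.6, "C++": 0.2, "Java": 0.2, "Algorithm": 1.0},
}


def expand_query(relation, query):
    """Fuzzy upper approximation: for each y, max over x of min(R(x, y), A(x))."""
    expanded = {}
    for x, a_x in query.items():
        for y, r_xy in relation.get(x, {}).items():
            degree = min(r_xy, a_x)
            # Keep the largest degree found so far for term y.
            if degree > expanded.get(y, 0.0):
                expanded[y] = degree
    return expanded


print(expand_query(R, {"Programming": 1.0}))
```

Terms that end up with degree 0, here “Hardware” and “Laptop”, do not appear in the result, which agrees with the expanded query on the slide.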

PAS-project

What is the PAS-project?

- Personal Alert System (HoGent)
- Goal: to draw the researcher’s attention to funding possibilities that match his/her profile
- Information: about researchers, projects, funding possibilities (grants etc.) → matching/collaboration
- Automation and intelligence

PAS – How does it work?

-Name

-Staff number

-Department(s)

-Group

-Date of creation of the profile

-Last update of the profile

-Percentage research time

-Skills description

-Diplomas

-Publications

-IWETO-keywords

-Free keywords

[Diagram labels: User, Fill in, IWETO Thesaurus, HoGent Thesaurus]

PAS – How does it work?

-Reference

-Title

-Content

-Attachment(s)

-Level

-Duration

-Institution

-Deadline

-Address

-Contact person

-IWETO-keywords

-Free keywords

[Diagram labels: Messages, IWETO Thesaurus, HoGent Thesaurus]

PAS – How does it work?

The IWETO-classification has 641 research fields:

5 at the 1st level, 31 at the 2nd level, 605 at the 3rd level

[Figure: the three levels (1, 2, 3) of the IWETO classification]

PAS – How does it work?

By adding “free keywords” we can refine the classification

[Figure: classification levels 1, 2, 3 with weights 0.6, 0.7, 0.8]

PAS – How does it work?

Query: A = {k3}

Expanded query: R↑A = {(k1,0.8), (k3,1.0), …}

M1 → R2
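One way to read this step is that the expanded query of a message is compared with the keywords in each researcher's profile. The sketch below is an assumption made for illustration, not the PAS algorithm: the profile contents, the threshold of 0.5, and the names `match_degree` and `recipients` are all hypothetical. Only the two expanded-query entries come from the slide.

```python
# Expanded query of message M1 as given on the slide; the entries elided
# there with an ellipsis are not filled in.
expanded_m1 = {"k1": 0.8, "k3": 1.0}

# Hypothetical researcher profiles: the slides do not list their keywords.
profiles = {
    "R1": {"k5"},
    "R2": {"k1"},
}


def match_degree(expanded_query, profile_keywords):
    """Highest expanded-query degree among the profile's keywords (0 if none)."""
    return max((expanded_query.get(k, 0.0) for k in profile_keywords), default=0.0)


def recipients(expanded_query, profiles, threshold=0.5):
    """Researchers whose match degree reaches the assumed threshold."""
    return [
        name
        for name, keywords in profiles.items()
        if match_degree(expanded_query, keywords) >= threshold
    ]


print(recipients(expanded_m1, profiles))
```

With these made-up profiles, R2 does not hold the original keyword k3 but is still reached through the related keyword k1, which is the effect query expansion is meant to have.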

PAS – How does it work?

[Figure: weights 0.6, 0.7, 0.8, 0.7]

PAS – Current implementation

- Prototype that will be used as skeleton for the final system
- Basic algorithm using weights and their products, and basic fuzzy rough query expansion [1]
- Basic profiles and messages
- Manual processing of feedback and manual data extraction from text files

[1] P. Srinivasan, M. E. Ruiz, D. H. Kraft, J. Chen: Vocabulary mining for information retrieval: rough sets and fuzzy sets, Information Processing and Management, 37(1) (2001) 15-38

PAS – Future work

- Richer representation of profiles and messages
- Automation of the feedback mechanism
- Dealing with imprecision and words from different thesauri
- Dealing with ambiguity and incomplete profiles
- Tracking research activities for collaboration
- Automatic extraction of information from text files
- Search engine

Thank you