A Vector Space Model for Automatic Indexing

Post on 06-Jan-2016



A Vector Space Model for Automatic Indexing

G. Salton, A. Wong and C. S. Yang

Enhanced Vector Space Models for Content-based Recommender Systems

Cataldo Musto

Presenter: Sawood Alam <salam@cs.odu.edu>

A Vector Space Model for Automatic Indexing

G. Salton, A. Wong and C. S. Yang

Cornell University

Introduction

• In document retrieval, the best indexing space is one where each entity lies as far away from the others as possible

• The density of the object space becomes a measure of the quality of the indexing system

• Retrieval performance correlates inversely with space density
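The notion of space density can be made concrete with a small sketch. It assumes density is measured as the average pairwise cosine similarity between document vectors; that choice, the function names, and the toy vectors are illustrative assumptions, not taken from the slides.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def space_density(docs):
    """Average pairwise cosine similarity; a higher value means a more crowded space."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy collections (hypothetical data): no shared terms vs. many shared terms.
spread_out = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
crowded = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]

print(space_density(spread_out))
print(space_density(crowded))
```

Under this measure the spread-out collection has the lower density, which is the situation the bullets above associate with better retrieval performance.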

Document Space

• Di = (di1, di2, di3, …, dit), where dij is the weight of the j-th term in document Di and t is the number of index terms
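A minimal sketch of how such a document vector might be built follows. It assumes whitespace tokenization and raw term frequency as the weight of term j in document i; both are simplifying assumptions, and the helper names are hypothetical.

```python
from collections import Counter

def build_vocabulary(texts):
    """Sorted list of distinct terms; position j in the list identifies term j."""
    return sorted({term for text in texts for term in text.lower().split()})

def to_vector(text, vocabulary):
    """Return a document vector whose j-th entry is the count of term j in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Two toy documents (hypothetical data).
texts = ["vector space model", "vector space retrieval retrieval"]
vocab = build_vocabulary(texts)
vectors = [to_vector(text, vocab) for text in texts]

print(vocab)
print(vectors)
```

Every document is mapped onto the same vocabulary, so all vectors have the same length and can be compared component by component.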

Document Space (cont.)

Document Space (cont.)

Indexing Performance vs. Space Density

Cluster Density vs. Indexing Performance

Discrimination Value Model
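As a rough illustration of the idea behind this model, the sketch below scores a term by how much the space density changes when that term is removed from every document vector. It assumes density is the average pairwise cosine similarity; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def density(docs):
    """Average pairwise cosine similarity of the document vectors."""
    pairs = list(combinations(docs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def discrimination_value(docs, k):
    """Density with term k removed minus density with it present.

    Positive: removing term k crowds the space, so the term helps separate documents.
    Negative: removing term k spreads the space out, so the term is a poor discriminator.
    """
    without_k = [[w for j, w in enumerate(doc) if j != k] for doc in docs]
    return density(without_k) - density(docs)

# Term 0 has the same weight in every document; terms 1 and 2 vary (hypothetical data).
docs = [[1, 2, 0], [1, 0, 2], [1, 1, 1]]
for k in range(3):
    print(k, discrimination_value(docs, k))
```

In this toy collection the uniformly distributed term 0 gets a negative value, while terms 1 and 2, whose weights differ across documents, get positive values.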

Discrimination Value Model (cont.)

Discrimination Value Model Summary

Average Recall vs. Precision

Summary Recall vs. Precision

Enhanced Vector Space Models for Content-based Recommender Systems

Cataldo Musto

Dept. of Computer Science, University of Bari, Italy

cataldomusto@di.uniba.it

Introduction

• Using Vector Space Models (VSM) in Information Retrieval is an established practice

• Investigate the impact of vector space models in Information Filtering
  – Recommender systems

Problems of VSM

• High dimensionality
  – Becoming more serious as emerging social apps and micro-blogging generate large amounts of web content and new vocabulary

• Inability to manage document semantics
  – The order in which terms occur in the document is not taken into account

Components

• Context vector for each term
  – Values in {-1, 0, 1}

• Vector Space representation of a term (t)

• Vector Space representation of a document (d)

• Vector Space representation of a user profile (pu)
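The components above can be sketched in the style of Random Indexing. The slides do not state how the representations are composed, so the composition rules here are assumptions: a term vector is the sum of the context vectors of the terms it co-occurs with, a document vector is the sum of its term vectors, and a profile is the sum of the vectors of liked documents. `DIM`, `NONZERO`, and all function names are hypothetical.

```python
import random

DIM = 20      # reduced dimensionality (hypothetical value)
NONZERO = 4   # non-zero entries per context vector (hypothetical value)

def context_vector(rng):
    """Sparse random vector with values in {-1, 0, 1}; the non-zeros alternate +1 and -1."""
    vec = [0] * DIM
    for n, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1 if n % 2 == 0 else -1
    return vec

def vector_sum(vectors):
    """Element-wise sum of a list of DIM-dimensional vectors."""
    total = [0] * DIM
    for vec in vectors:
        total = [a + b for a, b in zip(total, vec)]
    return total

def term_vectors(docs, context):
    """t: sum of the context vectors of the other terms in each document where t occurs."""
    return {
        t: vector_sum([context[o] for d in docs if t in d for o in d if o != t])
        for t in context
    }

def document_vector(doc, terms):
    """d: sum of the term vectors of the terms in the document."""
    return vector_sum([terms[t] for t in doc])

def profile_vector(liked_docs, terms):
    """p_u: sum of the vectors of the documents the user liked."""
    return vector_sum([document_vector(d, terms) for d in liked_docs])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
docs = [["vector", "space", "model"], ["space", "density"], ["user", "profile", "model"]]
context = {t: context_vector(rng) for t in sorted({t for d in docs for t in d})}
terms = term_vectors(docs, context)
p_u = profile_vector([docs[0], docs[2]], terms)
print(p_u)
```

Because every representation has the fixed length `DIM` regardless of vocabulary size, this kind of construction addresses the high-dimensionality problem listed earlier.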

Indexing Technique

• Random Indexing-based model

• Weighted Random Indexing-based model

• Semantic Vector-based model

• Weighted Semantic Vector-based model
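The difference between an unweighted and a weighted profile can be illustrated as follows. This is only a sketch under stated assumptions: a document counts as liked when its rating reaches a threshold, and the weighted variant scales each liked document by its rating divided by the maximum rating. The threshold, the rating scale, the function names, and the data are hypothetical, and the Semantic Vector-based variants are not covered.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_profile(doc_vectors, ratings, threshold=3, max_rating=5, weighted=False):
    """Sum the vectors of liked documents, optionally scaled by rating / max_rating."""
    dim = len(next(iter(doc_vectors.values())))
    profile = [0.0] * dim
    for doc_id, rating in ratings.items():
        if rating < threshold:
            continue  # documents below the threshold do not contribute
        weight = rating / max_rating if weighted else 1.0
        profile = [p + weight * x for p, x in zip(profile, doc_vectors[doc_id])]
    return profile

# Toy document vectors and ratings on a 1-5 scale (hypothetical data).
doc_vectors = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
ratings = {"a": 5, "b": 3, "c": 1}

plain = build_profile(doc_vectors, ratings)
weighted = build_profile(doc_vectors, ratings, weighted=True)

# Rank two unseen items by cosine similarity to each profile.
candidates = {"d": [1, 0], "e": [0, 1]}
for name, profile in (("plain", plain), ("weighted", weighted)):
    ranking = sorted(candidates, key=lambda c: cosine(profile, candidates[c]), reverse=True)
    print(name, profile, ranking)
```

With the plain profile both candidates are equally similar, while the weighted profile leans toward the candidate that resembles the more highly rated document.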

Experimental Evaluation

Conclusions

• The first prototype, with a naive weighting scheme, is comparable to other content-based filtering techniques such as a Bayesian classifier

• More complex weighting schemes are expected to perform better

• User profiles based on Linked Data, rather than keyword-based profiles, may be studied