Techniques of information retrieval

Tariq Hassan & Sabahat

Road Map:
• What is IR?
• Why & how it works?
• Evaluation Techniques
• Global & Local Methods
1. Relevance Feedback
2. Probabilistic Relevance Feedback
3. Indirect Relevance Feedback
4. Rocchio Algorithm
5. Linear Classifiers
6. Naïve Bayes Text Classification

Question & Discussion

What is IR? Why & How?

• Retrieving the information needed to satisfy the user.

• Why? Due to different formats of data.
• How?
– Stop list
– Stemming
– Inverse document frequency
– Word counts
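The four steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the stop list, the suffix rules in `stem`, the sample documents, and the formula idf(t) = log(N / df(t)) are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stop list and suffix rules, chosen only for this illustration.
STOP_LIST = {"the", "a", "of", "is", "and", "in", "to"}
SUFFIXES = ("ing", "es", "s")


def stem(word):
    """Crude suffix stripping; a real system would use a proper stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop-list words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_LIST]


def word_counts(text):
    """Count how often each stemmed term occurs in one document."""
    return Counter(preprocess(text))


def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)); df(t) = number of documents containing t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(preprocess(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


docs = [
    "The retrieval of documents",
    "Indexing documents and searching",
    "Searching the web",
]
counts = word_counts(docs[0])
idf = inverse_document_frequency(docs)
print(counts)
print(idf)
```

Terms that occur in few documents (such as "web" here) receive a higher IDF than terms that occur in many (such as "search").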

What is IR? Why & How?

IR is generally used in 3 scenarios:
1. Web search
2. Personal IR (text classification)
3. Enterprise level

Evaluation Techniques

• Why?
• How? Relevant & non-relevant documents

Precision and Recall Methods

$P = \dfrac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})}$

$R = \dfrac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})}$
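As a worked example of the two measures, the following sketch computes precision and recall from a set of retrieved document IDs and a set of relevant document IDs. The function name and the document IDs are hypothetical, introduced only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here 2 of the 4 retrieved documents are relevant, and 2 of the 5 relevant documents were retrieved.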

Methods:
1. Global Methods: reformulate queries.
2. Local Methods: adjust the query relative to the initial results returned for it.

Local Methods

1. Relevance Feedback

2. Probabilistic Relevance Feedback

3. Indirect Feedback

1. Relevance Feedback
Feedback given by the user about the relevance of the documents in the initial set of results.

2. Probabilistic Relevance Feedback
PRF is implemented by building a classifier.

3. Indirect Relevance Feedback
Works without user intervention:
– By using user actions.
– By using user histories or logs.

Conclusion: Relevance Feedback

Assumption: The user has initial knowledge.

Issues:
• Misspelling
• Cross-language retrieval
• Vocabulary mismatch

Rocchio Algorithm
• Incorporates the relevance feedback mechanism into the vector space model.
• Also uses the cosine similarity function.
• Euclidean mechanism
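A minimal sketch of the Rocchio query update and of cosine similarity follows. It uses the standard textbook form, in which the modified query is alpha times the original query, plus beta times the centroid of the relevant documents, minus gamma times the centroid of the non-relevant documents. The weights (1.0, 0.75, 0.15), the clipping of negative term weights to zero, and the toy vectors are assumptions for illustration and do not come from the slides.

```python
import math


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant documents and away from non-relevant ones."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, non_c)]
    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, w) for w in updated]


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy three-term vector space; the vectors are made up for this example.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
non_relevant = [[0.0, 0.0, 1.0]]
new_query = rocchio(query, relevant, non_relevant)
print(new_query)
print(cosine_similarity(query, relevant[0]),
      cosine_similarity(new_query, relevant[0]))
```

After the update, the query is closer (by cosine similarity) to the documents the user marked as relevant.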

Example

Outcome
• Relevance feedback plays an important role in understanding user requirements.
• The Rocchio algorithm is not the best-performing method, but it is a practical option because of its simplicity and good results.
• It is of significant importance for content-based systems.

Classification Problems
• Given:
– A document d
– A fixed set of categories: Sports, Informatics, Literature, Medical, Entertainment
– A training set of documents, each labeled with its class
• Determine:
– A learning method or algorithm that will enable us to learn a classifier
– For a test document dT, its category

Classification Techniques

• Manual (a.k.a. Knowledge Engineering)

– typically, rule-based expert systems

• Machine Learning

– Naïve Bayes (Probabilistic)

– Decision Trees (Decision Structures)

– Support Vector Machines (Linear Classification)

Document Representation

• Binary Representation
• Frequency Representation
• TF*IDF Representation
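The three representations can be compared on a toy collection. In this sketch the tokenized documents are made up, and the TF*IDF weight is assumed to be the raw term frequency multiplied by log(N / df); other weighting variants exist.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["football", "match", "football"],
    ["match", "report"],
    ["medical", "report", "report"],
]
vocabulary = sorted({term for doc in docs for term in doc})


def frequency_vector(doc):
    """Frequency representation: how often each vocabulary term occurs."""
    counts = Counter(doc)
    return [counts[t] for t in vocabulary]


def binary_vector(doc):
    """Binary representation: 1 if the term occurs in the document, else 0."""
    return [1 if c > 0 else 0 for c in frequency_vector(doc)]


def idf(term):
    """Assumed variant: idf(t) = log(N / df(t))."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tfidf_vector(doc):
    """TF*IDF representation: term frequency weighted by idf."""
    return [tf * idf(t) for tf, t in zip(frequency_vector(doc), vocabulary)]


print(vocabulary)
print(binary_vector(docs[0]))
print(frequency_vector(docs[0]))
print(tfidf_vector(docs[0]))
```

The binary vector records only presence, the frequency vector keeps the counts, and the TF*IDF vector additionally down-weights terms that appear in many documents.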

Naïve Bayes document classification example

• Probabilistic
– Prior vs. posterior
• Bernoulli Model
– Feature vector with binary elements
• Multinomial Model
– Integers representing frequency of words

Classify the document
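The following is a minimal sketch of a multinomial Naïve Bayes classifier, not the example shown on the original slide. The training documents, the function names, the use of add-one (Laplace) smoothing, and the choice to ignore words unseen in training are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labeled_docs):
    """Estimate class priors and per-class term counts from (tokens, label) pairs."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocabulary.update(tokens)
    total = sum(class_docs.values())
    priors = {c: n / total for c, n in class_docs.items()}
    return priors, term_counts, vocabulary


def classify(tokens, priors, term_counts, vocabulary):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(prior)
        for t in tokens:
            if t not in vocabulary:
                continue  # assumption: words unseen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            likelihood = (term_counts[label][t] + 1) / (total_terms + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set using two of the categories named earlier.
training = [
    (["goal", "match", "team"], "Sports"),
    (["team", "win", "match"], "Sports"),
    (["patient", "doctor", "treatment"], "Medical"),
]
model = train_multinomial_nb(training)
label = classify(["match", "goal", "win"], *model)
print(label)
```

The prior is the share of training documents in each class; the posterior score combines it with the smoothed word likelihoods of the test document.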

Naïve Bayes classification

• Very fast learning and testing
– Why?
• Low storage requirements
• Very good in domains with many equally important features
• More robust to irrelevant features than many learning methods

Linear Classification

• Documents as labeled vectors
• Documents in the same class form a contiguous region of space
• Documents from different classes don’t overlap (much)
• Learning a classifier: build surfaces to delineate classes in the space
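One simple way to learn such a separating surface is the perceptron rule. The slides do not name the perceptron; it is used here only as an illustrative linear classifier, and the two-dimensional document vectors and labels are made up.

```python
def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def train_perceptron(samples, epochs=100):
    """Perceptron rule: nudge the hyperplane toward each misclassified vector."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training vector is on the correct side
            break
    return w, b


# Hypothetical labeled document vectors (+1 and -1 are the two classes).
samples = [
    ((2.0, 2.0), 1), ((3.0, 1.0), 1), ((2.5, 3.0), 1),
    ((0.0, 0.0), -1), ((-1.0, 0.5), -1), ((0.5, -1.0), -1),
]
w, b = train_perceptron(samples)
print(w, b)
```

Because the two toy classes occupy separate regions of the space, the loop stops as soon as a hyperplane separates them.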

Support Vector Machines

• Find a linear hyperplane (decision boundary) that will separate the data

• One Possible Solution

• Another possible solution

• Other possible solutions

• Which one is better? B1 or B2?
• How do you define better?
• Find the hyperplane that maximizes the margin

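[Figure: decision boundary B1, hyperplanes b21 and b22, the margin, and the support vectors]

Margin hyperplanes:

$$\vec{w}\cdot\vec{x} + b = +1, \qquad \vec{w}\cdot\vec{x} + b = -1$$

Decision function:

$$f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w}\cdot\vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w}\cdot\vec{x} + b \le -1 \end{cases}$$

Margin:

$$\text{Margin} = \frac{2}{\|\vec{w}\|}$$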
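To make the "which boundary is better?" question concrete, the sketch below evaluates the decision function and compares two candidate hyperplanes by margin width, using the standard result that the distance between the hyperplanes w·x + b = +1 and w·x + b = -1 is 2/||w||. The sample points and the two candidate boundaries (named B1 and B2 after the slide labels) are made-up values, and no SVM training is performed here.

```python
import math


def decision(w, b, x):
    """+1 if w.x + b >= 1, -1 if w.x + b <= -1, 0 if x lies inside the margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 1:
        return 1
    if score <= -1:
        return -1
    return 0


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def separates(w, b, samples):
    """True if every labeled point is on the correct side, outside the margin."""
    return all(decision(w, b, x) == y for x, y in samples)


# Hypothetical labeled points and two hypothetical separating hyperplanes.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
B1 = ((0.5, 0.5), -1.0)  # boundary x1 + x2 = 2
B2 = ((1.0, 0.0), -1.0)  # boundary x1 = 1

for name, (w, b) in (("B1", B1), ("B2", B2)):
    print(name, separates(w, b, samples), margin_width(w))
```

Both candidates separate the toy data, but B1 has the wider margin, so under the maximum-margin criterion it is the better boundary.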

Questions & Discussion

Bottom Line
• Which classifier do I use for a given document classification problem?
Answer: It depends.
– How much training data is available?
– How simple/complex is the problem?
– How noisy is the data?
– How stable is the problem over time?

For an unstable problem, it's better to use a simple and robust classifier.
