Information Retrieval Lebanese University Faculty of Economics and Business Administration – 1 st...
![Page 1: Information Retrieval Lebanese University Faculty of Economics and Business Administration – 1 st Branch Class: M1 Instructor: Dr. Lina A. Nimri 1.](https://reader033.fdocuments.us/reader033/viewer/2022050723/5697bfd61a28abf838cadcdc/html5/thumbnails/1.jpg)
Information Retrieval
Lebanese University
Faculty of Economics and Business Administration – 1st Branch
Class: M1
Instructor: Dr. Lina A. Nimri
Course Textbook
Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Addison-Wesley and ACM Press, 1999, ISBN: 0-201-39829-X
Introduction
Modern Information Retrieval, Chapter 1 (Ricardo Baeza-Yates, Berthier Ribeiro-Neto)
Introduction
• Examples of information needs in the context of the World Wide Web:
– “Find all documents containing information on computer courses which (1) are offered by universities in South England, and (2) are accredited by the BCS/IEE bodies. To be relevant, the document must include information on admission requirements, and an e-mail address and phone number for contact purposes.”
– “Find all documents containing information on college tennis teams which (1) are maintained by a USA university and (2) participate in the NCAA tournament.”
Information Retrieval
[Diagram labels: User Information Need; Query; Retrieval System (Search Engine); Documents; Set of retrieved documents; Useful or relevant information to the user]
• Primary goal of an IR system: “Retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.”
• Representation, storage, organisation, and access to information items
• (Usually) keyword-based representation
Data Retrieval
• Determining which documents contain the keywords in the user query is not always enough to satisfy the user's information need.
• Data retrieval retrieves objects which satisfy clearly defined conditions, such as regular expressions or relational algebra expressions.
• A data retrieval system deals with data that has a well-defined structure and semantics.
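As a rough illustration of data retrieval, the sketch below selects the records that satisfy a regular expression exactly: a record either meets the condition or it does not, and nothing is ranked. The records, the pattern and the `data_retrieval` helper are invented for this example.

```python
import re

# Hypothetical records with a well-defined structure (illustration only).
RECORDS = [
    {"id": 1, "code": "CS101", "title": "Introduction to Programming"},
    {"id": 2, "code": "EC210", "title": "Microeconomics"},
    {"id": 3, "code": "CS305", "title": "Databases"},
]


def data_retrieval(records, field, pattern):
    """Return every record whose field fully matches the regular expression."""
    compiled = re.compile(pattern)
    return [r for r in records if compiled.fullmatch(r[field])]


# A clearly defined condition: course codes made of "CS" and three digits.
matches = data_retrieval(RECORDS, "code", r"CS\d{3}")
print([r["id"] for r in matches])  # [1, 3]
```

A single wrong record in the answer counts as an error here, whereas an information retrieval system is expected to tolerate some imprecision.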
Information Retrieval System
• Retrieves information about a subject
• Deals with natural language text, which is not well structured and could be semantically ambiguous
• It must interpret the contents of documents and rank them according to their degree of relevance to the user need.
Area of interest
• Digital Libraries
• Information experts
• World Wide Web: a very difficult task
– The hyperspace is vast
– The absence of a well-defined data model (format or representation form)
Effective retrieval
• The effective retrieval of relevant information is directly affected by:
– The user task
– The logical view of the document (the document’s representation) adopted by the retrieval system
User tasks
• Pull technology: the user requests information in an interactive manner
• 3 retrieval tasks
– Browsing (hypertext)
– Retrieval (classical IR systems)
– Browsing and retrieval (modern digital libraries and web systems)
• Push technology
– automatic and permanent pushing of information to the user
– software agents
– example: news service
– filtering (retrieval task) of relevant information for later inspection by the user
Pulling
• The user can browse the documents when their main objectives are not clear at the beginning and may change during the interaction with the system.
• The combination of retrieval and browsing is not yet a well-established approach.
[Diagram labels: Retrieval; Browsing; Database]
Documents
• Unit of retrieval
• A passage of free text
– composed of text: strings of characters from an alphabet
– composed of natural language: newspaper article, journal paper, dictionary definition, email messages
– size of documents is arbitrary: newspaper article vs. journal paper vs. email
What is a document?
Representation of documents
• Documents are represented through a set of index terms, keywords or term descriptors
– extracted directly from the text
– specified by human subjects (information science)
• metadata
• Most concise representation
• Poor quality of retrieval
• Full text representation
– Most complete representation
– High computational cost
• Large collections
– Reduce the set of representative keywords: elimination of stop words, stemming, identification of noun phrases, further compression
Document term descriptors to access texts
Generation of descriptors for text
• By hand
• By analysing the text
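Generating descriptors "by analysing the text" can be sketched as follows: the text is lowercased and split into tokens, stop words are dropped, and a crude suffix-stripping step stands in for stemming. The stop list, the suffix rules and the function names are assumptions made for this illustration; this is not a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Naive suffixes, tried in this order (an assumption for illustration only).
SUFFIXES = ("ing", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Reduce full text to a sorted list of distinct index terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return sorted({crude_stem(t) for t in kept})


print(index_terms("The indexing of documents and the ranking of the documents"))
# ['document', 'index', 'rank']
```

Ten words of full text shrink to three index terms, which shows the trade-off named on the slide: a more concise representation at the price of lost detail.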
Logical View of the documents
[Diagram labels: Docs; structure; accents, spacing; stopwords; noun groups; stemming; manual indexing; structure; full text; index terms]
The retrieval functions
[Diagram labels: Information need; Query; Formulation; Documents; Document representation; Indexing; Retrieved documents; Retrieval functions; Relevance feedback]
Queries
Information need:
• Simple queries
– composed of two or three, perhaps even dozens, of keywords
– e.g., as in web retrieval
• Boolean queries
– “neural networks AND speech recognition”
• Context queries
– Proximity search, phrase queries
User term descriptors characterising the user need
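The query types above can be sketched over a toy collection: a phrase query checks that the words occur contiguously, a Boolean AND requires every phrase to be present, and a proximity search allows a bounded distance between two words. The documents and the helper names (`contains_phrase`, `boolean_and`, `within`) are hypothetical, chosen only for this example.

```python
# Hypothetical three-document collection (illustration only).
DOCS = {
    1: "neural networks for speech recognition",
    2: "speech synthesis with neural networks",
    3: "recognition of handwritten digits",
}


def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur contiguously in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def boolean_and(docs, phrases):
    """Ids of documents that contain every phrase (AND of phrase queries)."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(contains_phrase(tokens(text), tokens(p)) for p in phrases)
    )


def within(docs, a, b, k):
    """Proximity search: ids of documents where a and b are at most k positions apart."""
    hits = []
    for doc_id, text in docs.items():
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        if any(abs(i - j) <= k for i in pos_a for j in pos_b):
            hits.append(doc_id)
    return sorted(hits)


print(boolean_and(DOCS, ["neural networks", "speech recognition"]))  # [1]
print(within(DOCS, "speech", "neural", 3))  # [1, 2]
```

Scanning every document per query is only practical for a toy collection; real systems answer such queries from an index.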
Best-Match retrieval
• Compare the terms in a document and the query
• Compute the similarity between each document in the collection and the query, based on the terms they have in common
• Sort the documents in order of decreasing similarity with the query
• The output is a ranked list displayed to the user; the top documents are the most relevant as judged by the system
Document term descriptors to access texts
User term descriptors characterising the user need
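The best-match steps can be sketched in a few lines. Here the similarity is simply the number of distinct terms shared by document and query, which is one possible choice and not necessarily the measure used in the course; the collection and the function names are likewise assumptions for illustration.

```python
# Hypothetical collection: each document is reduced to a set of index terms.
DOCS = {
    "d1": {"information", "retrieval", "system"},
    "d2": {"database", "system", "design"},
    "d3": {"information", "retrieval", "evaluation", "ranking"},
}


def similarity(doc_terms, query_terms):
    """Number of distinct terms shared by document and query
    (an illustrative measure; cosine similarity is a common alternative)."""
    return len(doc_terms & query_terms)


def best_match(docs, query_terms):
    """Rank documents by decreasing similarity, dropping those with no shared term."""
    scored = [(doc_id, similarity(terms, query_terms))
              for doc_id, terms in docs.items()]
    matching = [(d, s) for d, s in scored if s > 0]
    # Decreasing score first; ties are broken by document id.
    return sorted(matching, key=lambda pair: (-pair[1], pair[0]))


print(best_match(DOCS, {"information", "retrieval", "ranking"}))
# [('d3', 3), ('d1', 2)]
```

Unlike a Boolean query, a document that matches only part of the query still appears in the list, just lower down.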
Conceptual view of text retrieval system
[Diagram labels: Queries; Documents; Similarity Computation; Retrieved Documents]
Expanded view of text retrieval system
[Diagram labels: Queries; Documents; Indexing; Indexed Documents; Similarity Computation; Retrieved Documents; Ranked Documents]
Process of retrieving info
[Diagram labels: User Interface; Text Operations; Query Operations; Indexing; Similarity Computation (Searching); Ranking; Document Repository Manager; Index; User need; Logical view; Inverted file; Query; Retrieved docs; Text; User feedback; Ranked docs; Text repository]
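The indexing and searching stages named in the diagram can be sketched with a minimal inverted file: a mapping from each term to the documents that contain it, which the search step consults instead of scanning the text repository. The collection, the whitespace tokenisation and the ranking by number of matching terms are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical text repository (illustration only).
DOCS = {
    1: "information retrieval systems",
    2: "retrieval of text",
    3: "text indexing systems",
}


def build_inverted_file(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Ids of documents containing at least one query term,
    ordered by how many query terms they match (ties broken by id)."""
    counts = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, []):
            counts[doc_id] += 1
    return sorted(counts, key=lambda d: (-counts[d], d))


index = build_inverted_file(DOCS)
print(index["retrieval"])               # [1, 2]
print(search(index, "text retrieval"))  # [2, 1, 3]
```

The index is built once and reused for every query, which is what makes searching large collections feasible.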
Key Topics
• Indexing text documents
• Retrieving text documents
• Evaluation
• Query reformulations
Search Engines = IR + Link Structure + Name Interpretation
Information Retrieval vs Information Extraction
• Information Retrieval
– Given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant documents [recall].
• Information Extraction
– Extract from the text what the document means.
• IR systems can FIND documents but need not “understand” them
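Precision and recall, mentioned in brackets above, can be computed as follows: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The document ids are made up, and returning 0.0 for an empty set is a convention assumed for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved ids and relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Convention assumed here: an empty denominator gives 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, three relevant overall, two in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(p)            # 0.5
print(round(r, 3))  # 0.667
```

Retrieving more documents tends to raise recall and lower precision, which is why the two are reported together.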