Book: Bayesian Networks : A practical guide to applications Paper-authors: Luis M. de Campos, Juan...
An information retrieval system for parliamentary documents
Book: Bayesian Networks: A practical guide to applications
Paper authors: Luis M. de Campos, Juan M. Fernandez-Luna, Juan F. Huete, Carlos Martine, Alfonso E. Romero
Chapter: 12
Presented by Quratulain
CSE 655 Probabilistic Reasoning
Faculty of Computer Science, Institute of Business Administration
Outline
Introduction
Overview of information retrieval systems
Bayesian network and information retrieval
Theoretical foundations
Building the information retrieval system
Conclusion
10 oct, 2009
Introduction/Motivation
To fulfil the objectives of democracy, all parliamentary activities need to be made public.
Previously, information was sent in printed form to all official organizations and libraries.
Currently, electronic documents are published on the web, which is faster, cheaper and easier.
The official bulletin and the transcripts of all speeches in the different sessions are published, after editing, on the website in PDF.
The documents are accessible using database-like queries.
Problems
To access information, the user must know:
  the session number
  the date of the legislature
This makes the information difficult to access.
Goal
A website with a real, content-based search engine.
A natural language query is used to access the information.
The system returns the relevant documents.
The output is a set of document components of varying granularity (from a complete document to a single paragraph), sorted by degree of relevance.
** This will avoid manual search **
Overview of information retrieval
Information retrieval is concerned with the representation, storage, organization and accessing of information items.
Information retrieval systems work as follows:
Given a set of documents, pre-processing:
  removes words that are not useful in search (stopwords)
  converts each word to its stem (reducing the vocabulary)
  associates each word with a weight expressing its importance (in a document or in the collection of documents)
The natural language query is indexed in the same way, and its representation is matched against the stored documents using any IR model.
Finally, a set of document identifiers is presented to the user, sorted according to their degree of relevance.
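The pre-processing and ranking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list, the suffix-stripping `crude_stem` function and the tf-idf weighting are stand-ins chosen for brevity, not the components used by the system described in these slides.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "on"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop stopwords, then stem."""
    tokens = [t.strip(".,;:!?()").lower() for t in text.split()]
    return [crude_stem(t) for t in tokens if t and t not in STOPWORDS]


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each term by term frequency times log inverse document frequency."""
    counts = {doc_id: Counter(preprocess(text)) for doc_id, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n_docs = len(docs)
    return {
        doc_id: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in c.items()}
        for doc_id, c in counts.items()
    }


def rank(query: str, docs: dict[str, str]) -> list[str]:
    """Return document identifiers sorted by decreasing score for the query."""
    weights = tf_idf(docs)
    q_terms = preprocess(query)  # the query is indexed like the documents
    scores = {d: sum(w.get(t, 0.0) for t in q_terms) for d, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "The budget debate in the parliament",
    "d2": "Questions on health policy",
    "d3": "Parliament votes on the health budget",
}
print(rank("health debate", docs))
```

Summing term weights is the simplest possible matching rule; any IR model (vector space, probabilistic, BN-based) could replace the body of `rank`.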
Overview of information retrieval
Standard IR treats documents as atomic entities.
XML allows structured documents with semantics.
Structured IR views documents as aggregates of interrelated structural elements, which are indexed.
Structured IR models exploit both the content and the structure of documents to estimate the relevance of document components to a query.
Bayesian Networks and information retrieval
Bayesian networks were first applied to IR in the early 1990s by Croft and Turtle.
BN-based IR models compute the probability of relevance given a document and a query.
Two important BN models within IR:
  Belief network model
  Bayesian network retrieval model
Common features:
  Each index term and each document is represented as a node in the network.
  Links connect each document node with all the term nodes of the terms that index it.
The models differ in:
  The direction of the arcs.
  Additional arcs (relationships between documents and terms).
BN-based retrieval model
[Figure: two-layer network linking term nodes (T1 to T7) and document nodes (D1 to D3)]
Drawbacks of Bayesian networks
1. The time and space required to assess the distributions and store them (the size of the conditional probability table of a node is exponential in its number of parents).
2. The efficiency of carrying out inference, because general inference in BNs is an NP-hard problem.
Therefore, the direct approach, in which the evidence contained in a query is propagated through the whole network, is infeasible.
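To make the first drawback concrete: a binary node with k binary parents needs one conditional probability per parent configuration, that is 2^k values. The short sketch below simply evaluates that count; the parent counts used are illustrative, not figures from the chapter.

```python
def cpt_rows(num_parents: int) -> int:
    """Number of parent configurations for a binary node with binary parents."""
    return 2 ** num_parents


# A document node may have one parent per index term, so k grows quickly.
for k in (5, 20, 50):
    print(f"{k} parents -> {cpt_rows(k):,} parent configurations")
```

With only 50 parents the table already has more than 10^15 rows, which is why explicit tables are ruled out for document nodes.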
Theoretical foundations
Set of documents D = {D1, D2, ..., DM}
Set of terms used to index these documents
Each document Di is organized hierarchically, representing the structural associations of the elements in Di, which are called structural units.
These associations form a tree for each document, for example for a scientific article.
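A document tree of this kind can be represented with a small recursive data structure. The sketch below is a minimal illustration: the `Unit` class, the helper functions and the particular arrangement of the example article are assumptions made here, not code or data from PAIRS.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """A structural unit; units without children are the leaves of the tree."""
    name: str
    children: list["Unit"] = field(default_factory=list)


def leaf_units(unit: Unit, prefix: str = "") -> list[str]:
    """Paths of all leaf units (units that contain no other unit)."""
    path = f"{prefix}/{unit.name}" if prefix else unit.name
    if not unit.children:
        return [path]
    paths: list[str] = []
    for child in unit.children:
        paths.extend(leaf_units(child, path))
    return paths


def inner_units(unit: Unit, prefix: str = "") -> list[str]:
    """Paths of all units that contain at least one other unit."""
    if not unit.children:
        return []
    path = f"{prefix}/{unit.name}" if prefix else unit.name
    paths = [path]
    for child in unit.children:
        paths.extend(inner_units(child, path))
    return paths


# Hypothetical scientific article, loosely following the example named above.
article = Unit("Document 1", [
    Unit("Title"),
    Unit("Author"),
    Unit("Abstract"),
    Unit("Section 1", [
        Unit("Subsec 1", [Unit("Parag 1"), Unit("Parag 2")]),
        Unit("Subsec 2", [Unit("Parag 1")]),
    ]),
    Unit("Bibliography", [Unit("Ref 1"), Unit("Ref 2")]),
])

print(leaf_units(article))
print(inner_units(article))
```

Using full paths rather than bare names keeps units such as the two "Parag 1" paragraphs distinguishable.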
The structure of a scientific article
[Figure: tree for Document 1. Node labels: Title, Author, Abstract, Section 1, Section 2, Bibliography; Subsec 1, Subsec 2; Title, Parag 1, Parag 2; Ref 1, Ref 2; Index Terms]
BN model for document
The BN model of a document contains three kinds of nodes:
  Term set, T = {T1, T2, ..., Tl}
  Basic structural units, Ub = {B1, B2, ..., Bm}
  Complex structural units, Uc = {S1, S2, ..., Sn}
Set of all structural units, U = Ub ∪ Uc
Each node T, B or S has an associated binary random variable taking values in {t-, t+}, {b-, b+} or {s-, s+} respectively, where (-) means not relevant and (+) means relevant.
BN model for document
[Figure: layered network with term nodes T1 to T16, basic structural units B1 to B7 (layer Ub) and complex structural units S1 to S4 (layer Uc)]
Pa(S1) ∩ Pa(S2) = ∅ for all S1 ≠ S2 in Uc
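BN for document
Conditional probabilities to be assessed:
  P(t+) for each term node
  P(b+ | pa(B)) for each basic structural unit
  P(s+ | pa(S)) for each complex structural unit
Because nodes can have a large number of parents, an efficient inference procedure is needed.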
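The slides do not say which inference procedure is used. One common way to avoid exponential tables is an additive canonical model, in which P(x+ | pa(X)) is the sum of the weights of the parents that are relevant. Under that assumption the posterior of a unit is a weighted sum of the posteriors of its parents (by linearity of expectation), so relevance can be propagated in one pass from terms to basic units to complex units. The sketch below assumes this model; all weights and probabilities are invented for illustration.

```python
def unit_relevance(weights: dict[str, float], parent_probs: dict[str, float]) -> float:
    """P(x+ | query) under an assumed additive model: sum of w(parent) * P(parent+ | query).

    The weights of a node are assumed non-negative and to sum to at most 1,
    so the result stays within [0, 1].
    """
    return sum(w * parent_probs[p] for p, w in weights.items())


# Hypothetical evidence: T1 and T3 appear in the query (probability 1.0);
# T2 does not, so it keeps an invented prior of 0.1.
term_probs = {"T1": 1.0, "T2": 0.1, "T3": 1.0}

# Basic structural units depend on terms (invented weights).
b1 = unit_relevance({"T1": 0.5, "T2": 0.5}, term_probs)
b2 = unit_relevance({"T2": 0.4, "T3": 0.6}, term_probs)

# A complex structural unit depends on the units it contains.
s1 = unit_relevance({"B1": 0.5, "B2": 0.5}, {"B1": b1, "B2": b2})

print(round(b1, 3), round(b2, 3), round(s1, 3))
```

The cost is linear in the number of arcs, in contrast with the 2^k rows an explicit table would need for a node with k parents.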
Influence Diagram Model
Once the BN has been constructed, it is transformed into an influence diagram by adding decision and utility nodes.
Chance nodes: the nodes of the previous BN
Decision nodes:
Utility nodes:
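In general, an influence diagram is solved by choosing, for each decision, the action with the highest expected utility. The sketch below illustrates that rule for a single structural unit. It assumes one binary decision per unit (retrieve it or skip it) and a utility table with invented values; the slides do not specify either.

```python
# Hypothetical utilities for (action, unit is relevant); the values are invented.
UTILITY = {
    ("retrieve", True): 1.0,    # showing a relevant unit
    ("retrieve", False): -0.5,  # showing an irrelevant unit
    ("skip", True): -1.0,       # missing a relevant unit
    ("skip", False): 0.0,       # correctly omitting an irrelevant unit
}


def expected_utility(action: str, p_relevant: float) -> float:
    """Average the utility of an action over the two relevance states."""
    return (UTILITY[(action, True)] * p_relevant
            + UTILITY[(action, False)] * (1.0 - p_relevant))


def best_action(p_relevant: float) -> str:
    """Pick the action with the highest expected utility."""
    return max(("retrieve", "skip"), key=lambda a: expected_utility(a, p_relevant))


print(best_action(0.6), best_action(0.1))
```

Sorting units by the expected utility of retrieving them would give a ranked list of document components.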
Building the information retrieval system (PAIRS)
PAIRS is a software package (it stores documents in a relational database).
Written in C++.
Specifically developed to store and retrieve documents generated by the Parliament of Andalusia.
Based on a probabilistic model.
[Figure: General scheme of PAIRS. Box labels: PDF document collection, XML document collection, Indexing System, Indexed Document Collection, Query, Indexed Query, Search Engine, Retrieved Document Components]
Conclusion
This paper presents a retrieval system, based on a probabilistic model, for parliamentary information.
The system has proven efficient in terms of indexing and retrieval time.
Bayesian network technologies can now be employed in problem domains whose dimensionality would previously have ruled out their use.
The system is not a finished product; several improvements are still possible.