INEX: Evaluating content-oriented XML retrieval
Mounia Lalmas
Queen Mary University of London
http://qmir.dcs.qmul.ac.uk
Outline
Content-oriented XML retrieval
Evaluating XML retrieval: INEX
XML Retrieval
Traditional IR is about finding relevant documents to a user's information need, e.g. an entire book.
XML retrieval allows users to retrieve document components that are more focussed to their information needs, e.g. a chapter of a book instead of an entire book.
The structure of documents is exploited to identify which document components to retrieve.
Structured Documents
Linear order of words, sentences, paragraphs …
Hierarchy or logical structure of a book’s chapters, sections …
Links (hyperlink), cross-references, citations …
Temporal and spatial relationships in multimedia documents
(figure: the logical structure of a book: chapters, sections, paragraphs; a Web page as another example of a structured document)
Structured document retrieval on the Web is an important topic of today's research.
Structured Documents
Explicit structure formalised through document representation standards (mark-up languages):
Layout: LaTeX (publishing), HTML (Web publishing)
Structure: SGML, XML (Web publishing, engineering), MPEG-7 (broadcasting)
Content/Semantic: RDF, DAML+OIL, OWL (semantic web)
Layout: <b><font size=+2>SDR</font></b><img src="qmir.jpg" border=0>
Structure: <section> <subsection> <paragraph> … </paragraph> <paragraph> … </paragraph> </subsection> </section>
Content/Semantic: <Book rdf:about="book"> <rdf:author="…"/> <rdf:title="…"/> </Book>
XML: eXtensible Mark-up Language
Meta-language (user-defined tags) currently being adopted as the document format language by W3C
Used to describe content and structure (and not layout)
Grammar described in a DTD (used for validation)
<lecture> <title> Structured Document Retrieval </title> <author> <fnm> Smith </fnm> <snm> John </snm> </author> <chapter> <title> Introduction into XML retrieval </title> <paragraph> … </paragraph> … </chapter> … </lecture>
<!ELEMENT lecture (title, author+, chapter+)> <!ELEMENT author (fnm*, snm)> <!ELEMENT fnm (#PCDATA)> …
XML: eXtensible Mark-up Language
Use of XPath notation to refer to the XML structure:
chapter/title: a title that is a direct sub-component of a chapter
//title: any title
chapter//title: a title that is a direct or indirect sub-component of a chapter
chapter/paragraph[2]: the second paragraph directly contained in a chapter
chapter/*: all direct sub-components of a chapter
<lecture> <title> Structured Document Retrieval </title> <author> <fnm> Smith </fnm> <snm> John </snm> </author> <chapter> <title> Introduction into SDR </title> <paragraph> …. </paragraph> … </chapter> …</lecture>
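The XPath patterns above can be tried directly against the sample document, using Python's standard library (`xml.etree.ElementTree` supports a limited XPath subset; element names and text below are illustrative):

```python
# Demonstrates the XPath patterns from the slide on a small lecture document.
import xml.etree.ElementTree as ET

doc = """
<lecture>
  <title>Structured Document Retrieval</title>
  <author><fnm>Smith</fnm><snm>John</snm></author>
  <chapter>
    <title>Introduction into SDR</title>
    <paragraph>First paragraph</paragraph>
    <paragraph>Second paragraph</paragraph>
  </chapter>
</lecture>
"""

root = ET.fromstring(doc)

# chapter/title: a title that is a direct sub-component of a chapter
direct = [t.text for t in root.findall("chapter/title")]

# .//title: any title at any depth (ElementTree needs the leading '.')
any_title = [t.text for t in root.findall(".//title")]

# chapter/paragraph[2]: the second paragraph directly inside a chapter
second = root.find("chapter/paragraph[2]")

# chapter/*: all direct sub-components of a chapter
children = [c.tag for c in root.findall("chapter/*")]
```

Note that ElementTree's subset covers child steps, `//` descent (written `.//`), position predicates and `*`, but not the `about()` content predicates used in CAS queries later in these slides.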
Querying XML documents
Content-only (CO) queries:
'open standards for digital video in distance learning'
Content-and-structure (CAS) queries:
//article[about(., 'formal methods verify correctness aviation systems')]/body//section[about(., 'case study application model checking theorem proving')]
Structure-only (SA) queries:
/article//*section/paragraph[2]
Content-oriented XML retrieval
Return document components of varying granularity (e.g. a book, a chapter, a section, a paragraph, a table, a figure, etc.), relevant to the user's information need both with regards to content and structure.
Content-oriented XML retrieval
Retrieve the best components according to content and structure criteria:
INEX: the most specific component that satisfies the query, while being exhaustive to the query
Shakespeare study: best entry points, which are components from which many relevant components can be reached through browsing
Challenges
(figure: an Article with a Title and two Sections, annotated with term weights for "XML", "retrieval" and "authoring" at both element and article level)
no fixed retrieval unit + nested elements + element types
how to obtain document and collection statistics?
which component is a good retrieval unit?
which components contribute best to the content of the Article? how to estimate? how to aggregate?
Approaches …
vector space model, probabilistic model, Bayesian network, language model, extending DB models, Boolean model, natural language processing, cognitive model, ontology, parameter estimation, tuning, smoothing, fusion, phrase, term statistics, collection statistics, component statistics, proximity search, logistic regression, belief model, relevance feedback
Vector space model
Separate indices are built for each element type: an article index, abstract index, section index, sub-section index and paragraph index.
An RSV is computed per index and normalised; the normalised RSVs are then merged into a single ranking.
tf and idf are computed as for fixed and non-nested retrieval units.
(IBM Haifa, INEX 2003)
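A minimal sketch of this per-index idea: score each element type against its own tf-idf index, normalise within each index, then merge. The function names and the normalisation choice (dividing by the top RSV per index) are illustrative assumptions, not the exact IBM Haifa implementation:

```python
# Sketch of multi-index vector space retrieval with normalised RSV merging.
import math
from collections import Counter

def tfidf_rsv(query_terms, element_terms, df, n_elements):
    """Simple tf-idf retrieval status value for one element."""
    tf = Counter(element_terms)
    return sum(tf[t] * math.log(n_elements / df[t])
               for t in query_terms if df.get(t))

def merge_indices(query_terms, indices):
    """indices: {index_name: [(element_id, terms), ...]}, one df table per index."""
    results = []
    for name, elements in indices.items():
        n = len(elements)
        df = Counter(t for _, terms in elements for t in set(terms))
        rsvs = [(eid, tfidf_rsv(query_terms, terms, df, n))
                for eid, terms in elements]
        top = max(r for _, r in rsvs) or 1.0
        # normalise within each index so scores are comparable across indices
        results += [(eid, r / top) for eid, r in rsvs]
    return sorted(results, key=lambda x: -x[1])
```

Per-index normalisation is the key step: raw RSVs from a paragraph index and an article index are not on the same scale, so they cannot be merged directly.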
Language model
Element score: a mixture of an element language model and a collection language model, combined with a smoothing parameter λ.
Elements are ranked by a score combining element size, element score and article score.
Query expansion with blind feedback; elements with fewer than 20 terms are ignored.
A high value of λ leads to an increase in the size of retrieved elements.
Results with λ = 0.9, 0.5 and 0.2 are similar.
(University of Amsterdam, INEX 2003)
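The smoothed element language model can be sketched as a linear (Jelinek-Mercer style) mixture, P(t|e) = λ·P(t|element) + (1−λ)·P(t|collection), with the query scored as a sum of log probabilities. Parameter values and names are illustrative, not Amsterdam's exact system:

```python
# Sketch of a smoothed element language model score.
import math
from collections import Counter

def element_score(query, element_terms, collection_terms, lam=0.5):
    """Log-likelihood of the query under the smoothed element model.
    Assumes every query term occurs at least once in the collection."""
    el, col = Counter(element_terms), Counter(collection_terms)
    el_len, col_len = len(element_terms), len(collection_terms)
    score = 0.0
    for t in query:
        p_el = el[t] / el_len          # element language model (MLE)
        p_col = col[t] / col_len       # collection language model (MLE)
        score += math.log(lam * p_el + (1 - lam) * p_col)
    return score
```

The collection model keeps the score finite for query terms missing from an element, which is why a high λ (little smoothing) favours elements, often larger ones, that contain all query terms themselves.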
Evaluation of XML retrieval: INEX
Evaluating the effectiveness of content-oriented XML retrieval approaches
Collaborative effort: participants contribute to the development of the collection (queries, relevance assessments)
Similar methodology as for TREC, but adapted to XML retrieval
40+ participants worldwide
Workshop in Schloss Dagstuhl in December (20+ institutions)
INEX Test Collection
Documents (~500MB): 12,107 articles in XML format from the IEEE Computer Society; 8 million elements
INEX 2002: 30 CO and 30 CAS queries; inex2002 metric
INEX 2003: 36 CO and 30 CAS queries; CAS queries are defined according to an enhanced subset of XPath; inex2002 and inex2003 metrics
INEX 2004 is just starting
Tasks
CO: aim is to decrease user effort by pointing the user to the most specific relevant portions of documents.
SCAS: retrieve relevant nodes that match the structure specified in the query.
VCAS: retrieve relevant nodes that may not be the same as the target elements, but are structurally similar.
Relevance in XML
An element is relevant if it "has significant and demonstrable bearing on the matter at hand"
Common assumptions in IR: objectivity, topicality, binary nature, independence
(figure: an article containing sections and numbered paragraphs, illustrating nested assessment units)
Relevance in INEX
Exhaustivity: how exhaustively a document component discusses the query: 0, 1, 2, 3
Specificity: how focused the component is on the query: 0, 1, 2, 3
Relevance: (3,3), (2,3), (1,1), (0,0), …
If all sections are relevant, the article is very relevant and the article is better than the sections; if only one section is relevant, the article is less relevant and the section is better than the article; …
Relevance assessment task
Completeness: for each assessed element, its parent and children elements are also assessed
Consistency: the parent of a relevant element must also be relevant, although to a different extent; exhaustivity increases going up the tree, specificity decreases going up the tree
Use of an online interface
Assessing a query takes a week! Average of 2 topics per participant
Interface
(screenshot of the online assessment interface: current assessments, navigation, groups)
Assessments
With respect to the elements to assess:
26% of assessments were on elements in the pool (66% in INEX 2002); 68% were highly specific elements not in the pool
7% of elements were automatically assessed
INEX 2002: 23 inconsistent assessments per query for one rule
Metrics
Need to consider:
Two dimensions of relevance
Independence assumption does not hold
No predefined retrieval unit
Overlap
Linear vs. clustered ranking
INEX 2002 metric
Quantization:
strict:

$$f_{strict}(exh, spec) = \begin{cases} 1 & \text{if } exh = 3 \text{ and } spec = 3 \\ 0 & \text{otherwise} \end{cases}$$

generalized:

$$f_{gen}(exh, spec) = \begin{cases} 1.00 & \text{if } (exh, spec) = (3,3) \\ 0.75 & \text{if } (exh, spec) \in \{(2,3), (3,2), (3,1)\} \\ 0.50 & \text{if } (exh, spec) \in \{(1,3), (2,2), (2,1)\} \\ 0.25 & \text{if } (exh, spec) \in \{(1,1), (1,2)\} \\ 0.00 & \text{if } (exh, spec) = (0,0) \end{cases}$$
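The two quantisation functions are simple lookups and can be written out directly (a literal transcription of the case distinctions above):

```python
# INEX 2002 quantisation functions mapping (exhaustivity, specificity)
# pairs to a single relevance value.
def f_strict(exh, spec):
    """Strict quantisation: only fully exhaustive and fully specific counts."""
    return 1.0 if (exh, spec) == (3, 3) else 0.0

def f_gen(exh, spec):
    """Generalised quantisation: graded credit for partial relevance."""
    if (exh, spec) == (3, 3):
        return 1.00
    if (exh, spec) in {(2, 3), (3, 2), (3, 1)}:
        return 0.75
    if (exh, spec) in {(1, 3), (2, 2), (2, 1)}:
        return 0.50
    if (exh, spec) in {(1, 1), (1, 2)}:
        return 0.25
    return 0.00
```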
INEX 2002 metric
Precision as defined by Raghavan '89 (based on ESL):

$$P(rel \mid retr)(x) = \frac{x \cdot n}{x \cdot n + j + \frac{s \cdot i}{r + 1}}$$

where n is estimated as

$$n = \sum_{c \in C} f(assess(c))$$
Overlap problem
INEX 2003 metric
Ideal concept space (Wong & Yao '95):

$$spec = \frac{|t \cap c|}{|c|} \qquad exh = \frac{|t \cap c|}{|t|}$$
INEX 2003 metric
Quantization:
strict:

$$exh_{strict}(exh) = \begin{cases} 1 & \text{if } exh = 3 \\ 0 & \text{otherwise} \end{cases} \qquad spec_{strict}(spec) = \begin{cases} 1 & \text{if } spec = 3 \\ 0 & \text{otherwise} \end{cases}$$

generalised:

$$exh_{gen}(exh) = exh / 3 \qquad spec_{gen}(spec) = spec / 3$$
INEX 2003 metric
Ignoring overlap:

$$recall_s = \frac{\sum_{i=1}^{k} |t \cap c_i^U|}{\sum_{i=1}^{N} |t \cap c_i^U|} = \frac{\sum_{i=1}^{k} exh(c_i^U)}{\sum_{i=1}^{N} exh(c_i^U)}$$

$$precision_s = \frac{\sum_{i=1}^{k} \frac{|t \cap c_i^U|}{|c_i^U|} \cdot |c_i^T|}{\sum_{i=1}^{k} |c_i^T|} = \frac{\sum_{i=1}^{k} spec(c_i^U) \cdot |c_i^T|}{\sum_{i=1}^{k} |c_i^T|}$$
INEX 2003 metric
Considering overlap:

$$recall_o = \frac{\sum_{i=1}^{k} exh(c_i^U) \cdot \frac{|c_i^T \setminus \bigcup_{j=1}^{i-1} c_j^T|}{|c_i^T|}}{\sum_{i=1}^{N} exh(c_i^U)}$$

$$precision_o = \frac{\sum_{i=1}^{k} spec(c_i^U) \cdot |c_i^T \setminus \bigcup_{j=1}^{i-1} c_j^T|}{\sum_{i=1}^{k} |c_i^T \setminus \bigcup_{j=1}^{i-1} c_j^T|}$$
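The overlap-aware recall and precision can be sketched in a few lines, modelling each retrieved element's text region (its c^T) as a set of positions, paired with quantised exh and spec values; only the part of a region not already covered by earlier-ranked elements is scored. The data structures are a toy illustration, not the official inex_eval implementation:

```python
# Sketch of the overlap-aware INEX 2003 recall and precision at cutoff k.
# ranked: list of (region, exh, spec) where region is a set of text
# positions and exh/spec are quantised values in [0, 1].
def recall_o(ranked, all_exh_sum, k):
    """all_exh_sum: sum of exh over all N relevant elements (the denominator)."""
    seen, total = set(), 0.0
    for region, exh, spec in ranked[:k]:
        novel = region - seen               # part not covered by earlier ranks
        total += exh * len(novel) / len(region)
        seen |= region
    return total / all_exh_sum

def precision_o(ranked, k):
    seen, num, den = set(), 0.0, 0.0
    for region, exh, spec in ranked[:k]:
        novel = region - seen
        num += spec * len(novel)            # only novel text earns spec credit
        den += len(novel)
        seen |= region
    return num / den if den else 0.0
```

Ranking a section before the article that contains it thus credits the article only for the half of its text the section did not already cover, which is exactly the overlap penalty described below.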
INEX 2003 metric
Penalises overlap by only scoring novel information in overlapping results
Assumes a uniform distribution of relevant information
Issue of stability
Size is considered directly in precision (is it intuitive that large is good or not?)
Recall is defined using exh only; precision is defined using spec only
Alternative metrics
User-effort oriented measures:
Expected Relevant Ratio
Tolerance to Irrelevance
Discounted Cumulated Gain
Lessons learnt
Good definition of relevance
Expressing CAS queries was not easy
Relevance assessment process must be "improved"
Further development on metrics needed
User studies required
Conclusion
XML retrieval is not just about the effective retrieval of XML documents, but also about how to evaluate effectiveness
INEX 2004 tracks: relevance feedback, interactive, heterogeneous collection, natural language query
http://inex.is.informatik.uni-duisburg.de:2004/