An Architecture for Emergent Semantics
Sven Herschel, Ralf Heese, and Jens Bleiholder
Humboldt-Universität zu Berlin/Hasso-Plattner-Institut
S. Herschel, R. Heese, and J. Bleiholder: Emergent Semantics 2
Ideas of Emergent Semantics
Improve document representation
• by aggregating many users’ opinions
• Adding keywords implicitly while querying the corpus
Living document representation instead of query reformulation
• Entirely new keywords
• Immediate change of the document representation and of the corpus index
[Figure: information retrieval today: user query, IR query engine, corpus/document representation]
Outline
Basement (Background)
Construction (Architecture of Emergent Semantics)
Assessment (Evaluation)
Roof and Windows (Conclusion and Future Work)
Basement (Background)
Information Retrieval
• Content-oriented search on a set of documents
• Find a document representation that retrieves documents effectively and efficiently according to the user’s query
Today's approaches
• Capture the semantics of a document by analyzing syntactic information
• No new words in the document representation, so synonyms cannot be added
• Query refinement
Semiotic
• Syntax: relation between signs and signs
• Semantics: relation between signs and the represented object
• Pragmatics: relation between signs and the user interpretation
Example: the sign "t r e e" can stand for "A tall perennial woody plant …" or "A figure that branches from a single root …" (http://www.wordreference.com/definition/tree)
[Figure labels: current IR approaches; emergent semantics]
Construction (Architecture of Emergent Semantics)
Components of Emergent Semantics
[Figure: components of the architecture: Interpreter, Query Engine, Retrieval Engine, Ranking Function, Annotation Filter, and Quality Measure, operating on the corpus/document representation (terms t1, t2, …, tn) and a knowledge store; the user query (?) and the result (!) pass through steps 1 to 4]
Bootstrapping
Index the document corpus, e.g., with TF/IDF or Latent Semantic Indexing
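A minimal sketch of the bootstrapping step, assuming the simple weighting that this deck states on its example slide (weight = term freq ∙ #doc / doc freq). The function name `tfidf_matrix` and the three-document toy corpus are hypothetical and only serve as illustration; a production system would more likely rely on a library such as Apache Lucene.

```python
from collections import Counter


def tfidf_matrix(docs):
    """Return {doc_id: {term: weight}} for docs given as {doc_id: [tokens]}.

    Assumed weighting: weight = term_freq * number_of_docs / doc_freq.
    """
    n_docs = len(docs)
    term_freq = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for counts in term_freq.values():
        doc_freq.update(counts.keys())

    return {
        doc_id: {t: tf * n_docs / doc_freq[t] for t, tf in counts.items()}
        for doc_id, counts in term_freq.items()
    }


# Hypothetical toy corpus, already tokenized.
corpus = {
    "d1": "sql is a database language".split(),
    "d2": "a relational database stores relations".split(),
    "d3": "trees are woody plants".split(),
}
weights = tfidf_matrix(corpus)
```

Terms that occur in few documents receive a higher weight than terms shared by many documents, which is the property the later feedback steps build on.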
Receiving a Query
Step 1 (Interpreter): Reformulate the query, e.g., by query expansion or by replacing terms
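Query expansion, one of the reformulation options named above, could look roughly like this. The synonym table and the function name `expand_query` are assumptions made for this sketch; the slides do not specify where the interpreter's knowledge comes from.

```python
def expand_query(terms, synonyms):
    """Return the query terms followed by any known related terms (no duplicates)."""
    expanded = list(terms)
    for term in terms:
        for alternative in synonyms.get(term, []):
            if alternative not in expanded:
                expanded.append(alternative)
    return expanded


# Hypothetical knowledge: terms related to "rdbms".
SYNONYMS = {"rdbms": ["relational", "database"]}

query = expand_query(["rdbms", "sql", "language"], SYNONYMS)
```

The original terms keep their position at the front of the list, so a later ranking step could still treat them differently from the added terms.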
Query Evaluation
Step 2 (Query Engine):
• Select documents according to the query, e.g., via an inverted index of all terms
• Rank the list of matching documents, e.g., with the vector space model
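The two sub-steps can be sketched as follows: an inverted index selects candidate documents, and cosine similarity in the vector space model ranks them. The weights for d1 and d2 are taken from the example table later in this deck; the function names and the binary query vector are assumptions of this sketch, not the authors' implementation.

```python
import math
from collections import defaultdict

# TF/IDF weights of d1 and d2 from the deck's example table (zero entries omitted).
DOC_WEIGHTS = {
    "d1": {"database": 2.76, "SQL": 19.83, "relational": 3.22},
    "d2": {"database": 3.68, "SQL": 0.94, "language": 2.76, "relational": 3.68},
}


def build_inverted_index(doc_weights):
    """Map each term to the set of documents in which it has a non-zero weight."""
    index = defaultdict(set)
    for doc_id, weights in doc_weights.items():
        for term, weight in weights.items():
            if weight > 0:
                index[term].add(doc_id)
    return index


def cosine(query_terms, weights):
    """Cosine similarity between a binary query vector and a document vector."""
    dot = sum(weights.get(t, 0.0) for t in query_terms)
    norm_doc = math.sqrt(sum(w * w for w in weights.values()))
    norm_query = math.sqrt(len(query_terms))
    if norm_doc == 0 or norm_query == 0:
        return 0.0
    return dot / (norm_doc * norm_query)


def search(query_terms, doc_weights, index):
    """Select candidates via the inverted index, then rank them by cosine similarity."""
    query_terms = set(query_terms)
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    return sorted(candidates,
                  key=lambda d: cosine(query_terms, doc_weights[d]),
                  reverse=True)


index = build_inverted_index(DOC_WEIGHTS)
ranking = search({"RDBMS", "SQL", "language"}, DOC_WEIGHTS, index)
```

A query term that does not occur in the index, such as RDBMS here, contributes nothing to selection or ranking, which is the gap the feedback step is meant to close.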
Query Result
Step 3: The user determines the set of relevant documents by evaluating the document surrogates.
Feedback
Step 4 (Annotation Filter): The user retrieves the relevant documents. Add the original query to the document representation.
Idea: If a document is found by the query terms and the document is marked as relevant, then all query terms are related to the document.
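The feedback idea can be sketched as a small update routine: the query terms are appended only to documents that were both returned for the query and marked as relevant. The function name `feed_back` and the token lists are hypothetical; the query, the result list, and the relevant set follow the example used in this deck.

```python
def feed_back(doc_terms, query_terms, result_ids, relevant_ids):
    """Append the query terms to every returned document that is marked relevant."""
    for doc_id in result_ids:
        if doc_id in relevant_ids:
            doc_terms[doc_id].extend(query_terms)
    return doc_terms


# Hypothetical document representations as plain term lists.
doc_terms = {
    "d1": ["database", "SQL", "relational"],
    "d2": ["database", "SQL", "language", "relational"],
    "d5": ["language"],
    "d10": ["SQL"],
}

feed_back(
    doc_terms,
    query_terms=["RDBMS", "SQL", "language"],
    result_ids=["d1", "d5", "d2", "d10"],   # ranked result of the query
    relevant_ids={"d1", "d2"},              # documents the user marked relevant
)
```

Appending terms rather than storing a set is a deliberate choice in this sketch: repeated feedback raises the term frequency, so the term weights can grow with the number of similar queries.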
Emergent Semantics Architecture
• Pragmatics: What do I mean by my query?
• Semantics: How do most users formulate this query?
• Syntax: How is the corpus queried?
Example – Querying the document corpus
TF/IDF matrix of the document corpus
• RDBMS does not occur in the document corpus
• TF/IDF: weight = (term freq ∙ #doc) / doc freq

      database   SQL     language   relational
d1    2.76       19.83   0          3.22
d2    3.68       0.94    2.76       3.68
d3    …          …       …          …

Query: Q = {RDBMS, SQL, language}
Ranked result: D_Query = (d1, d5, d2, d10)
Relevant documents: D_relevant = {d1, d2}
Example – Adding the query terms
• Adding {RDBMS, SQL, language} to the document representation
• Recalculation of the TF/IDF matrix necessary (for the keywords language, SQL, and RDBMS)

Before:
      database   SQL     language   relational
d1    2.76       19.83   0          3.22
d2    3.68       0.94    2.76       3.68
d3    …          …       …          …

After:
      database   SQL     language   relational   RDBMS
d1    2.76       19.99   0.02       3.21         0.29
d2    3.68       1.13    1.32       3.67         0.33
d3    …          …       …          …            0
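The effect of the recalculation can be illustrated with a small before/after comparison. This is a sketch on a hypothetical three-document corpus that uses the deck's simple formula (weight = term freq ∙ #doc / doc freq); it does not reproduce the numbers in the tables above, which come from the authors' corpus.

```python
from collections import Counter


def tfidf(docs):
    """Weights per document, assuming weight = term_freq * n_docs / doc_freq."""
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in docs.values() for t in set(tokens))
    return {
        doc_id: {t: tf * n_docs / doc_freq[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }


# Hypothetical document representations.
docs = {
    "d1": ["database", "SQL", "SQL", "relational"],
    "d2": ["database", "SQL", "language", "relational"],
    "d3": ["language", "tree"],
}
before = tfidf(docs)

# Feedback: the query terms are added to the relevant documents d1 and d2.
for doc_id in ("d1", "d2"):
    docs[doc_id] = docs[doc_id] + ["RDBMS", "SQL", "language"]
after = tfidf(docs)
```

In this toy run, RDBMS becomes a new column with non-zero weight for d1 and d2 only, and the weight of "language" in the untouched document d3 drops because the term now occurs in more documents. This is why the whole matrix, not just the annotated rows, has to be recalculated.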
Living Document Representation
Document representations change over time (living document representation)
• Many similar queries: the weights of the query terms increase
• Unrelated query terms: the document representation changes only slightly
New keywords / semantic concepts in document representation
Assessment (Evaluation)
Experiment I - Setup
CACM corpus
• 3200 documents + 32 queries + gold standard
• Title and abstract tokenized and indexed using Apache Lucene
Retrieval and Ranking
• Vector space model with TF/IDF weights
Feedback
• Attach the tokenized query to all relevant document representations
Exploit corpus correlations
Split the set of queries into halves
• Run first half and feed back all query terms
• Run second half
Small overlap between queries
Small overlap between result sets
[Figure: experiment flow with the steps "Run query set 1", "Run query set 2", "Add query terms to relevant document representation", "Add query terms again", "Measure again (1st EmSem run)", …, and the annotation "Identical to TF/IDF without EmSem"]
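The split-half procedure can be sketched as a small harness. Everything here is an assumption made for illustration: the toy term-overlap `search`, the use of precision as the measure, and the two-document corpus. The slides state only that a gold standard was used, with Lucene and a vector space model for retrieval.

```python
import copy


def search(doc_terms, query):
    """Toy retrieval: rank documents by the number of matching term occurrences."""
    scores = {d: sum(terms.count(t) for t in query) for d, terms in doc_terms.items()}
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: scores[d], reverse=True)


def precision(result, relevant):
    """Fraction of returned documents that are relevant (0.0 for an empty result)."""
    if not result:
        return 0.0
    return sum(d in relevant for d in result) / len(result)


def run_split_experiment(doc_terms, queries, gold):
    """Feed back the first half of the queries, then measure the second half."""
    doc_terms = copy.deepcopy(doc_terms)  # leave the caller's corpus untouched
    ids = sorted(queries)
    half = len(ids) // 2
    for qid in ids[:half]:
        for doc_id in search(doc_terms, queries[qid]):
            if doc_id in gold[qid]:
                doc_terms[doc_id].extend(queries[qid])
    return {qid: precision(search(doc_terms, queries[qid]), gold[qid])
            for qid in ids[half:]}


# Hypothetical corpus, queries, and gold standard.
DOCS = {"d1": ["relational", "database"], "d2": ["tree", "plant"]}
QUERIES = {"q1": ["rdbms", "database"], "q2": ["rdbms"]}
GOLD = {"q1": {"d1"}, "q2": {"d1"}}

baseline = {"q2": precision(search(DOCS, QUERIES["q2"]), GOLD["q2"])}
with_feedback = run_split_experiment(DOCS, QUERIES, GOLD)
```

In this toy setup the second-half query only finds its relevant document because the first-half query introduced the term "rdbms" into the representation of d1.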
Feeding back all query terms
Run all queries and feed back all query terms
Experiment II - Setup
First phase
• Presented a wide variety of images to users
• Which keywords would you use to find the image with a search engine?
Second phase
• Rate the adequacy of the annotations
Results
Term                                Phase 1 (% users)   Phase 2 (% users)
Weihnachtsmann (Father Christmas)   26.5%               100.0%
Brille (glasses)                    7.8%                51.8%
Nikolaus (Saint Nicholas)           7.8%                91.6%
Weihnachten (Christmas)             6.5%                61.5%
Santa Claus                         6.0%                75.0%
Conclusions from our Experiments
Document representations become more precise over time.
A small number of terms describe an image sufficiently.
A large number of user queries can be satisfied by indexing a small number of terms.
Roof and Windows (Conclusion)
Roof and Windows
Architecture for emergent semantics
Users’ individual pragmatics aggregated into representation of documents
Living document representation
Outlook
Applying EmSem to distributed IR
• Reducing the size of document representations
• Less network traffic
Thank you!