An Architecture for Emergent Semantics

Sven Herschel, Ralf Heese, and Jens Bleiholder

Humboldt-Universität zu Berlin/Hasso-Plattner-Institut


Ideas of Emergent Semantics

Improve document representation

• by aggregating many users’ opinions

• Adding keywords implicitly while querying the corpus

Living document representation instead of query reformulation

• Entirely new keywords

• Immediate change of the document representation and of the corpus index

[Figure: Information retrieval today. Labels: user query, IR query engine, corpus/doc repr.]


Outline

Basement

(Background)

Construction

(Architecture of Emergent Semantics)

Assessment

(Evaluation)

Roof and Windows

(Conclusion and Future Work)


Basement (Background)


Information Retrieval

Information Retrieval

• Content-oriented search on a set of documents

• Find a document representation to retrieve documents effectively and efficiently according to the user’s query

Today's approaches

• Capture the semantics of a document by analyzing syntactic information

• No new words in document representation

Synonyms cannot be added

• Query refinement



Semiotics

Example sign: "tree" (definitions from http://www.wordreference.com/definition/tree)

• A tall perennial woody plant …

• A figure that branches from a single root …

Syntax: signs ↔ signs

Semantics: signs ↔ represented object

Pragmatics: signs ↔ user interpretation

[Figure: the diagram marks which of these relations belong to current IR approaches and which to emergent semantics.]


Construction (Architecture of Emergent Semantics)


Components of Emergent Semantics

[Figure: architecture diagram. Components: Interpreter, Query Engine, Retrieval Engine, Ranking Function, Annotation Filter, Quality Measure. Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge. The user's query (?) and the result (!) pass through the numbered steps 1 to 4.]


Bootstrapping

Index the document corpus, e.g., TF/IDF, Latent Semantic Indexing

[Figure: the corpus/doc repr. store (terms t1, t2, …, tn) and the knowledge store.]
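The bootstrapping step can be pictured as building an initial term-frequency representation for every document. The following is a minimal sketch, assuming a simple regex tokenizer; the function names `tokenize` and `bootstrap` and the two-document corpus are hypothetical and not taken from the slides.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bootstrap(corpus: dict[str, str]) -> dict[str, Counter]:
    """Map each document id to its initial term-frequency representation."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}


# Hypothetical two-document corpus, for illustration only.
corpus = {
    "d1": "SQL is a database language. SQL queries a relational database.",
    "d2": "A relational database stores relations.",
}
doc_repr = bootstrap(corpus)
```

At this point a term such as "rdbms" has a count of zero in every document, because the representation contains only words that occur in the text itself.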


Receiving a Query

Reformulate the query, e.g., query expansion, replacing terms

[Figure: step 1, the user's query (?) reaches the Interpreter. Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge.]
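Query expansion in the Interpreter could look roughly like this. It is only a sketch: the `knowledge` table that maps a term to related terms is a hypothetical stand-in for the knowledge store in the diagram, and `expand_query` is an illustrative name.

```python
def expand_query(query: set[str], knowledge: dict[str, set[str]]) -> set[str]:
    """Return the query terms plus every related term found in the knowledge table."""
    expanded = set(query)
    for term in query:
        expanded |= knowledge.get(term, set())
    return expanded


# Hypothetical knowledge entry: "rdbms" is related to "relational" and "database".
knowledge = {"rdbms": {"relational", "database"}}
expanded = expand_query({"rdbms", "sql", "language"}, knowledge)
```

Terms without an entry in the table pass through unchanged, so the expansion never drops anything the user typed.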


Query Evaluation

Select documents according to the query, e.g., inverted index of all terms

Rank the list of matching documents, e.g., vector space model

[Figure: steps 1 and 2, the query (?) passes from the Interpreter to the Query Engine with its Retrieval Engine and Ranking Function. Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge.]
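The two steps, selection through an inverted index and ranking in the vector space model, can be sketched as follows. This assumes cosine similarity as the ranking measure and a binary query vector, neither of which the slides specify; the function names are illustrative. The weights for d1 and d2 are the rows of the example TF/IDF matrix in this deck.

```python
import math
from collections import defaultdict

Weights = dict[str, dict[str, float]]


def build_inverted_index(weights: Weights) -> dict[str, set[str]]:
    """Map each term to the documents in which its weight is non-zero."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, vector in weights.items():
        for term, weight in vector.items():
            if weight > 0:
                index[term].add(doc_id)
    return index


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate(query: set[str], weights: Weights, index: dict[str, set[str]]) -> list[str]:
    """Select documents matching any query term, then rank them by cosine similarity."""
    candidates = set().union(*(index.get(t, set()) for t in query))
    query_vector = {t: 1.0 for t in query}  # assumption: binary query weights
    return sorted(candidates, key=lambda d: cosine(query_vector, weights[d]), reverse=True)


weights = {
    "d1": {"database": 2.76, "sql": 19.83, "language": 0.0, "relational": 3.22},
    "d2": {"database": 3.68, "sql": 0.94, "language": 2.76, "relational": 3.68},
}
index = build_inverted_index(weights)
ranking = evaluate({"rdbms", "sql", "language"}, weights, index)
```

With these two rows, d1 ranks ahead of d2 because its SQL weight dominates; the term "rdbms" selects nothing, since it does not occur in the index.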


Query Result

The user determines the set of relevant documents by evaluating the document surrogates.

[Figure: steps 1 to 3, the result (!) of the Query Engine (Retrieval Engine, Ranking Function) is returned to the user. Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge.]


Feedback

The user retrieves the relevant documents. Add the original query to the document representation.

Idea: If a document is found by the query terms and is marked as relevant, then all query terms are related to the document.

[Figure: steps 1 to 4, the complete loop with Interpreter, Query Engine (Retrieval Engine, Ranking Function), Annotation Filter, and Quality Measure. Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge.]
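The feedback step can be sketched as a small update on term counts. This assumes that document representations are stored as term-frequency counters and that each fed-back query term counts once; the slides do not prescribe either, and `feed_back` is a hypothetical name.

```python
from collections import Counter


def feed_back(doc_repr: dict[str, Counter], query: list[str], relevant: set[str]) -> None:
    """Add every query term once to the representation of each relevant document."""
    for doc_id in relevant:
        doc_repr[doc_id].update(query)


# Hypothetical representations; "rdbms" occurs in none of them before feedback.
doc_repr = {
    "d1": Counter({"sql": 4, "database": 1}),
    "d2": Counter({"database": 2, "language": 1}),
    "d5": Counter({"sql": 1}),
}
feed_back(doc_repr, ["rdbms", "sql", "language"], relevant={"d1", "d2"})
```

Only the documents marked as relevant change; d5 was retrieved but not marked, so its representation stays as it was.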


Emergent Semantics Architecture

Pragmatics: What do I mean by my query?

Semantics: How do most users formulate this query?

Syntax: How is the corpus queried?

[Figure: complete architecture with Interpreter, Query Engine (Retrieval Engine, Ranking Function), Annotation Filter, and Quality Measure, steps 1 to 4, the user's query (?) and result (!). Stores: corpus/doc repr. (terms t1, t2, …, tn) and knowledge.]


Example – Querying the document corpus

TF/IDF matrix of the document corpus

• RDBMS does not occur in the document corpus

Query: Q = {RDBMS, SQL, language}

Ranked result: D_Query = (d1, d5, d2, d10), D_relevant = {d1, d2}

TF/IDF: weight = (term freq ∙ #doc) / doc freq

      database   SQL     language   relational
d1    2.76       19.83   0          3.22
d2    3.68       0.94    2.76       3.68
d3    …          …       …          …

[Figure: the Query Engine evaluates the query (?) against the doc repr. and returns the result (!).]
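The weight formula on this slide can be written down directly. The sketch below follows it literally, without the logarithm that many TF/IDF variants apply to the document-frequency factor; the function name `tfidf` and the four-document toy corpus are hypothetical and do not reproduce the matrix values of the slide.

```python
from collections import Counter


def tfidf(doc_repr: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """weight = (term freq * number of documents) / document frequency."""
    n_docs = len(doc_repr)
    doc_freq: Counter = Counter()
    for counts in doc_repr.values():
        # Count each term once per document in which it occurs.
        doc_freq.update(t for t, c in counts.items() if c > 0)
    return {
        doc_id: {t: c * n_docs / doc_freq[t] for t, c in counts.items() if c > 0}
        for doc_id, counts in doc_repr.items()
    }


# Hypothetical term counts for four documents.
toy = {
    "d1": Counter(sql=3, database=1),
    "d2": Counter(database=2),
    "d3": Counter(language=1),
    "d4": Counter(database=1),
}
weights = tfidf(toy)
```

In this toy corpus "sql" occurs in one of four documents, so its three occurrences in d1 give a weight of 12, while the more common "database" gets 4/3 for a single occurrence.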


Example – Adding the query terms

Adding {RDBMS, SQL, language} to document representation

Recalculation of the TF/IDF matrix necessary

TF/IDF matrix before adding the query terms:

      database   SQL     language   relational
d1    2.76       19.83   0          3.22
d2    3.68       0.94    2.76       3.68
d3    …          …       …          …

TF/IDF matrix after adding the query terms (recalculated for the keywords language, SQL, and RDBMS):

      database   SQL     language   relational   RDBMS
d1    2.76       19.99   0.02       3.21         0.29
d2    3.68       1.13    1.32       3.67         0.33
d3    …          …       …          …            0

[Figure: the Annotation Filter adds the query terms to the document representation.]


Living Document Representation

Document representations change over time (living document representation)

• Many similar queries: weights of the query terms increase

• Unrelated query terms: document representation changes only slightly

New keywords / semantic concepts in document representation

[Figure labels: document representations, query]
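How a representation "lives" can be illustrated with a small simulation. As a simplification, the sketch tracks a term's share of all term occurrences in one document rather than a full TF/IDF weight; the counts, the repeated query, and the helper `share` are hypothetical.

```python
from collections import Counter


def share(counts: Counter, term: str) -> float:
    """Fraction of all term occurrences in the document that belong to `term`."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


d1 = Counter({"sql": 6, "database": 3, "relational": 1})

history = []
for _ in range(5):
    d1.update(["rdbms", "sql"])  # the same query is fed back five times
    history.append(share(d1, "rdbms"))

d1.update(["holiday"])  # a single unrelated query term
```

The share of "rdbms" grows with every similar query, whereas the one-off term "holiday" ends up with a much smaller share and barely shifts the representation.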


Assessment (Evaluation)


Experiment I - Setup

CACM corpus

• 3200 documents + 32 queries + gold standard

• Title and abstract tokenized and indexed using Apache Lucene

Retrieval and Ranking

• Vector space model with TF/IDF weights

Feedback

• Attach the tokenized query to all relevant document representations



Exploit corpus correlations

Split the set of queries into halves

• Run first half and feed back all query terms

• Run second half

Small overlap between queries

Small overlap between result sets

[Figure: experiment flow. Labels: "Run query set 1", "Run query set 2", "Identical to TF/IDF without EmSem", "Add query terms to relevant document representation", "Measure again (1st EmSem run)", "Add query terms again".]
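The split-half protocol can be sketched in a few lines. The scoring function below is a toy (it sums raw term counts) and stands in for the vector space model used in the experiment; the documents, queries, and gold standard are hypothetical, as are the names `search` and `run_split_experiment`.

```python
from collections import Counter

Query = tuple[str, ...]


def search(doc_repr: dict[str, Counter], query: Query) -> list[str]:
    """Toy ranking: order documents by the summed counts of the query terms."""
    scores = {d: sum(counts[t] for t in query) for d, counts in doc_repr.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [d for d, s in ranked if s > 0]


def run_split_experiment(
    doc_repr: dict[str, Counter], queries: list[Query], gold: dict[Query, set[str]]
) -> dict[Query, list[str]]:
    """Feed back the first half of the queries, then run the second half."""
    half = len(queries) // 2
    for query in queries[:half]:
        for doc_id in gold[query]:
            doc_repr[doc_id].update(query)  # feed back all query terms
    return {query: search(doc_repr, query) for query in queries[half:]}


doc_repr = {"d1": Counter(sql=2, database=1), "d2": Counter(index=1, btree=1)}
queries = [("rdbms", "sql"), ("rdbms", "tables")]
gold = {queries[0]: {"d1"}, queries[1]: {"d1"}}

baseline = search(doc_repr, queries[1])  # before any feedback
after = run_split_experiment(doc_repr, queries, gold)
```

In this toy setup the second query finds nothing at first. After the first query has been fed back, it finds d1 through the shared term "rdbms", which is the kind of overlap between queries and result sets that the experiment relies on.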


Feeding back all query terms


Run all queries and feed back all query terms


Experiment II - Setup

First phase

• Presented a wide variety of images to users

• Which keywords would you use to find the image with a search engine?

Second phase

• Rate the adequacy of the annotations



Results

[Figure: chart of % users per term.]

Term                                  Phase 1 (% users)   Phase 2 (% users)
Weihnachtsmann (Father Christmas)     26.5%               100.0%
Brille (glasses)                      7.8%                51.8%
Nikolaus (St. Nicholas)               7.8%                91.6%
Weihnachten (Christmas)               6.5%                61.5%
Santa Claus                           6.0%                75.0%
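A percentage of this kind can be computed by counting, for each keyword, how many users named it. The sketch assumes that "% users" means the share of participants who entered a term at least once; the four answers are made up for illustration and are not the experiment's data.

```python
from collections import Counter


def keyword_shares(answers: list[set[str]]) -> dict[str, float]:
    """Percentage of users who named each keyword at least once."""
    counts = Counter(term for answer in answers for term in answer)
    return {term: 100.0 * n / len(answers) for term, n in counts.items()}


# Hypothetical keyword sets from four users for one image.
answers = [
    {"Weihnachtsmann", "Brille"},
    {"Weihnachtsmann"},
    {"Nikolaus"},
    {"Weihnachtsmann", "Weihnachten"},
]
shares = keyword_shares(answers)
```

Because each answer is a set, a user who types the same keyword twice is still counted once.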


Conclusions from our Experiments

Document representations become more precise over time.

A small number of terms describe an image sufficiently.

A large number of user queries can be satisfied by indexing a small number of terms.



Roof and Windows (Conclusion)


Roof and Windows

Architecture for emergent semantics

Users’ individual pragmatics aggregated into representation of documents

Living document representation

Outlook

Applying EmSem to distributed IR

• Reducing the size of document representations

• Less network traffic



Thank you!