Dipl.-Inf. Jörg Waitelonis
Hasso-Plattner-Institut for IT-Systems Engineering
University of Potsdam


Semantic Analysis and Search

Semantic Search Engine

Media Analysis
‣Structural Video Analysis
‣Intelligent Character Recognition
‣Face Detection & Clustering
‣Audio Mining
‣Visual Concept Detection

Semantic Analysis
‣Named Entity Recognition
‣Context Analysis
‣Semantic Annotation

Conceptual Workflow

Graphical User Interface
‣Faceted Search
‣Exploratory Search
‣Fine-grained User Annotation

Distribution / Production
‣Media Asset Management

Digitization | Metadata | Rights

Why Semantics, of All Things?

Resolving ambiguities by considering context

Natural language is incredibly expressive AND ambiguous.

„Armstrong betrat als erster Mensch den Mond.“ ("Armstrong was the first human to set foot on the Moon.")

‣Context in the text

‣e.g., from ASR or OCR

‣Context in the image

‣e.g., from Visual Concept Detection

Context is what matters.

Named Entity Recognition

„Armstrong betrat als erster Mensch den Mond.“

Armstrong Mensch Mond

Armstrong: George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Armstrong, Ontario; Armstrong Automobile; Joe Armstrong; Armstrong County, Texas; Armstrong Gun; Craig Armstrong; Armstrong (lunar crater); Louis Armstrong; Armstrong Tunnel; Louis Armstrong International Airport; Armstrong's Theorem; Sir Thomas Armstrong; Ian Armstrong

Mensch: Human; Bill Mensch; Bob Mensch; David Mensch; Homer Mensch; Louise Mensch; Halber Mensch; Mensch ärgere Dich nicht; Mensch Computer; Peter van Mensch; Daniel Mensch; Mensch (album)

Mond: Der Mond (opera); Mond Nickel Company; Brunner Mond; Bernard Mond; Peter Mond; Julian Mond; Ludwig Mond; Violet Mond; MOND Technologies; Robert Mond; Henry Mond; Alfred Mond; Chava Mond

Named Entity Recognition

Wikipedia Infoboxes

The Semantic Wikipedia

Named Entity Recognition

Web of Data

[Figure: graph of Web of Data entities around Neil Armstrong. Recoverable labels: Neil Armstrong, Entities, Science, Occupation; edge labels: is a, has a.]

Named Entity Recognition

„Armstrong betrat als erster Mensch den Mond.“

Armstrong Mensch Mond

Time-Dependent Semantic Data


Video Analysis / Metadata Extraction

e.g., bibliographical data, geographical data, encyclopedic data, ...

Entity Mapping

Entity Recognition

Context Definition

RDF graph to find relations between entities co-occurring in a text, maintaining the hypothesis that disambiguation of co-occurring elements in a text can be obtained by finding connected elements in an RDF graph [7]. To account for the particular mix of non-textual data, static metadata, and user-generated metadata in audio-visual content, our novel approach combines the use of semantic technologies and Linked Data with linguistic methods.


According to a study of the structure and characteristics of folksonomy tags [8], an average of 83% of user-generated tags are single terms, and an average of 82% of the reviewed tags are nouns. Based on these results, we ignore tagging practices such as camel case ("barackObama") and treat tags as subjects or categories describing a resource. Because a tag can also be part of a group of nouns representing an entity or a name ("flying machine", "albert einstein"), the tags, which are stored as single words without any given order, have to be combined into term groups of two or more terms to find all appropriate entities. Hence, every tag or group of tags within a given context may represent a distinct entity. The term combination process and the subsequent mapping of terms and term groups to entities are described in Sect. III-B.

To disambiguate ambiguous terms we combine two methods: a co-occurrence analysis of the context terms in Wikipedia articles, and an analysis of the page link graph of the Wikipedia articles of the entity candidates. The scores of both analysis steps are combined into a total score.
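The excerpt does not say how the two scores are merged. As a minimal sketch, assuming both scores are normalized to [0, 1] and merged by a weighted sum, candidate ranking could look like this. The function names, the weight, and the example scores are hypothetical and only illustrate the idea.

```python
def total_score(cooccurrence_score, link_graph_score, weight=0.5):
    """Weighted sum of the two scores (an assumed combination rule)."""
    return weight * cooccurrence_score + (1.0 - weight) * link_graph_score


def best_candidate(candidates, weight=0.5):
    """candidates maps an entity name to (cooccurrence_score, link_graph_score)."""
    return max(
        candidates,
        key=lambda name: total_score(*candidates[name], weight=weight),
    )


# Hypothetical scores for two candidate entities of the tag "hubble".
candidates = {
    "Hubble_Space_Telescope": (0.8, 0.9),
    "Edwin_Hubble": (0.4, 0.3),
}
print(best_candidate(candidates))
```

A weighted sum keeps both signals visible in the ranking; a different rule (for example a product) would penalize candidates that score poorly on either analysis.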

A. Context Definition

Metadata exists in a certain context and has to be interpreted according to this context. For tags of audio-visual content we identified two dimensions:

• temporal dimension
• user-centered dimension

In the temporal dimension a context can be defined as the entire video, a segment, or a single timestamp in the video. The user-centered dimension classifies a context by how many users created the metadata in question: only the tags of a certain user, or all tags regardless of user. Fig. 1 shows the combinations of the two context dimensions for metadata in audio-visual content and the interpretation regarding the significance of a context.
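The two dimensions can be read as two filters over the tag set. The following is a minimal sketch, assuming each tag carries its text, its author, and a timestamp in seconds; the `Tag` type, the `select_context` helper, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tag:
    text: str
    user: str
    timestamp: float  # seconds from the start of the video (assumed unit)


def select_context(tags: List[Tag],
                   start: Optional[float] = None,
                   end: Optional[float] = None,
                   user: Optional[str] = None) -> List[str]:
    """Return the tag texts of one context.

    Temporal dimension: no bounds = entire video, start < end = segment,
    start == end = single timestamp.
    User-centered dimension: user=None = all users, otherwise one user.
    """
    selected = []
    for tag in tags:
        if start is not None and tag.timestamp < start:
            continue
        if end is not None and tag.timestamp > end:
            continue
        if user is not None and tag.user != user:
            continue
        selected.append(tag.text)
    return selected


# Hypothetical tags for illustration only.
tags = [
    Tag("hubble", "user_a", 120.0),
    Tag("spitzer", "user_a", 120.0),
    Tag("water", "user_b", 120.0),
    Tag("carbon", "user_a", 300.0),
]

print(select_context(tags, start=120.0, end=120.0, user="user_a"))
print(select_context(tags))
```

The first call selects the narrowest context (one user, one timestamp); the second selects the broadest (all users, entire video).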

Audio-visual content also provides the opportunity to supply spatial information. Thus, tags in the same region of a video frame are considered related to each other. In the current approach we did not consider this context dimension.

To describe our approach we use a sample context from our test set (see Sect. IV). This sample context is composed of tags by only one user at a certain timestamp in the video. The video containing this sample context is a presentation by Dr. Garik Israelian at the TED conference entitled "How spectroscopy could reveal alien life". Our sample context consists of the tags "hubble", "spitzer", "carbon", "dioxide", "methan", "co2", and "water".

Figure 1. Dimensions of context definition in audio-visual content

B. Preprocessing

Term Combination: Our combination algorithm takes all tags of a specified spatio-temporal context (at a certain timestamp or in a certain segment of a video, of a single URL or image) and generates every possible combination of at most three terms of the context in every possible order. In that way we make sure to recover groups of single terms that belong together. We chose to generate combinations of up to three words to make sure to also hit named entities consisting of more than two words, such as "public key cryptography" or "alberto santos dumont". About 90% of the DBpedia [9] labels consist of at most three words, but fewer than 5% consist of four words. Given these numbers and performance considerations, we decided to limit the number of terms to be combined to three. In the remainder of this paper, "terms" refers to single terms as well as generated term groups. The number c of combinations is calculated by

$$c = \sum_{k=1}^{j} \frac{n!}{(n-k)!},$$

where n is the number of tags in the context and j is the maximum number of terms in a combination.

For our sample context containing 7 tags and at most 3 terms in a combination (j = 3), 7 + 42 + 210 = 259 combinations are generated.
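The term combination step can be sketched with ordered selections of one to three tags. This is a minimal illustration, not the authors' implementation; the function names are hypothetical, and the sample tags are the ones listed in the excerpt.

```python
from itertools import permutations
from math import factorial


def term_combinations(tags, max_terms=3):
    """Return every ordered group of 1..max_terms distinct tags, joined by spaces."""
    groups = []
    for k in range(1, max_terms + 1):
        for perm in permutations(tags, k):
            groups.append(" ".join(perm))
    return groups


def combination_count(n, j):
    """Number of ordered groups: sum over k = 1..j of n! / (n - k)!."""
    return sum(factorial(n) // factorial(n - k) for k in range(1, min(j, n) + 1))


sample_context = ["hubble", "spitzer", "carbon", "dioxide", "methan", "co2", "water"]
groups = term_combinations(sample_context)

print(len(groups), combination_count(len(sample_context), 3))
print("carbon dioxide" in groups)
```

Because every order is generated, both "carbon dioxide" and "dioxide carbon" appear among the groups; only groups that match an entity label survive the later mapping step.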

Term Mapping: The terms then have to be mapped to semantic entities. For our approach we use entities of the Linked Open Data Cloud [10], in particular of DBpedia, version 3.5.1.

DBpedia provides labels for the identification of distinct entities in 92 languages. We use English, German, and Finnish labels, because we noticed that neither the English nor the German labels contain important acronyms, whereas the Finnish language version does. As tagging users prefer to keep tags simple and short [2], resources dealing with the "Domain Name System" would rather be tagged with "DNS" than with "Domain Name System".
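A simple way to picture the mapping is a case-insensitive lookup in a label index that merges several language versions. The sketch below assumes the labels are available as (URI, language, label) tuples; the helper names and the handful of label entries are hand-written for illustration and were not extracted from DBpedia 3.5.1.

```python
from collections import defaultdict


def build_label_index(labels):
    """Map each case-folded label to the set of URIs that carry it."""
    index = defaultdict(set)
    for uri, _language, label in labels:
        index[label.casefold()].add(uri)
    return index


def map_terms(terms, index):
    """Return {term: sorted candidate URIs} for every term with a matching label."""
    return {
        term: sorted(index[term.casefold()])
        for term in terms
        if term.casefold() in index
    }


# Illustrative label entries (assumed, not taken from a DBpedia dump).
labels = [
    ("dbpedia:Domain_Name_System", "en", "Domain Name System"),
    ("dbpedia:Domain_Name_System", "fi", "DNS"),
    ("dbpedia:Water", "en", "Water"),
    ("dbpedia:Water", "de", "Wasser"),
]

index = build_label_index(labels)
print(map_terms(["dns", "water", "unknownterm"], index))
```

In this toy index the acronym "dns" resolves only because of the Finnish label, which mirrors the reason given above for adding a third language.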

After simple string matching of the terms of the context to DBpedia URIs, the URIs are revised for redirects and


User-centered Dimension

Temporal Dimension

Spatial Dimension

‣different metadata sources differ in reliability

‣authoritative metadata (structured / unstructured)

‣analytical metadata (time- / location-based)

‣non-authoritative user-generated metadata (global and time- or location-based)

Entity-Based Annotation

‣spatial and temporal annotation with semantic entities

Entity-Based Search

Faceted Search

Link  And  Brush


•A simple example:

I am looking for the book „Wem die Stunde schlägt“ ("For Whom the Bell Tolls") by Ernest Hemingway in its first German edition...

Not All Searching Is the Same


Wem die Stunde schlägt. - Ernest H E M I N G W A Y. (Stockholm usw., Bermann-Fischer Verlag, 1941) 560 S. 8“II 1, 2506, 34548

Not All Searching Is the Same

•...but what if you do not know exactly what you are looking for?

I enjoyed the book „Wem die Stunde schlägt“ ("For Whom the Bell Tolls") by Ernest Hemingway and do not quite know what to read next....

Not All Searching Is the Same

• What if the user does not know which search term to use?

• What if the user is looking for more complex answers?

• What if the user does not know the subject area they want to learn about (well)?

• What if the user wants to know which documents on a particular topic exist in a repository overall?

• ...'browsing' instead of 'searching'
• ...finding something 'by chance'
• ...serendipity
• ...getting an overview

Exploratory Search

How should the semantic network around dbpedia:For_Whom_the_Bell_Tolls be searched?
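One simple answer to this question is a bounded breadth-first walk over the triples around the start resource. The sketch below is only an illustration: the `explore` function is hypothetical, and the three triples are hand-written examples (including the assumed property names `dbpedia-owl:author` and `dbpedia-owl:influenced`), not data retrieved from DBpedia.

```python
from collections import deque


def explore(triples, start, max_hops=2):
    """Breadth-first walk over an undirected view of the triples.

    Returns {resource: number of hops from start} for every resource
    reachable within max_hops, excluding the start resource itself.
    """
    neighbours = {}
    for subject, _predicate, obj in triples:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    del distance[start]
    return distance


# Hand-written illustrative triples (assumptions, not a DBpedia extract).
triples = [
    ("dbpedia:For_Whom_the_Bell_Tolls", "dbpedia-owl:author", "dbpedia:Ernest_Hemingway"),
    ("dbpedia:Ernest_Hemingway", "dbpedia-owl:influenced", "dbpedia:Jack_Kerouac"),
    ("dbpedia:Jack_Kerouac", "dbpedia-owl:notableWork", "dbpedia:On_the_Road"),
]

print(explore(triples, "dbpedia:For_Whom_the_Bell_Tolls", max_hops=2))
```

Raising `max_hops` widens the neighbourhood quickly, so a practical exploratory search would also need to rank or filter the paths it follows.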

Exploratory Search

dbpedia:Jack_Kerouac dbpedia:Raymond_Carver dbpedia:Jerome_D._Salinger

dbpedia-owl:notableWork dbpedia-owl:notableWork dbpedia-owl:notableWork

Exploratory Search



‣Mediaglobe enables semantic, entity-based search

‣Mediaglobe thereby outperforms traditional keyword-based search engines in precision and recall

‣Mediaglobe's semantic annotations enable:

‣novel recommender systems

‣e.g., as an extension of the search options

‣or as a basis for other content-sensitive services

‣interoperability with other systems through standards

‣new design possibilities for innovative user interfaces


Thank you!
