www.petamedia.eu
IRP of Special Interest Group 2 - Leader: TU Berlin
Tools for Tag Generation
Introduction
The aim of this integrative research project (IRP) is the generation of tags and metadata using signal processing and/or users' annotations. This IRP deals with algorithms for key frame extraction and video shot clustering that enable users to tag videos more easily.
[Workflow diagram: Database → Annotation → Shot/subshot boundary detection → Visual quality → Key framing → Text detection / feature extraction → Video clustering; partners involved: TUB, TUD, EPFL, QMUL]
Integration activities
The IRP's main topic is the integration of different expertise in the areas of image search engines, audio/video signal processing, machine learning and text detection/recognition.
Preparations
The first activity was to set up a database of videos from the unstructured channel "Travel" on YouTube.com using the "NUE YouTube Downloader". This tool is also useful for other IRPs, e.g. "Social Media Acquisition".
Tool used for setting up the database
This common database, consisting of 100 videos and affiliated metadata (keywords, comments, user information etc.), was then annotated for shot boundaries.
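The shot-boundary annotation step above can also be approximated automatically. A minimal sketch, assuming frames arrive as flat lists of grayscale pixel values and comparing normalized intensity histograms of consecutive frames (all names and the threshold are illustrative, not the project's actual detector):

```python
def histogram(frame, bins=8, levels=256):
    """Build a normalized intensity histogram for a frame (flat list of pixels)."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // levels] += 1
    total = len(frame)
    return [c / total for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Flag a shot boundary wherever the L1 distance between consecutive
    frame histograms exceeds the threshold."""
    hists = [histogram(f) for f in frames]
    boundaries = []
    for i in range(1, len(hists)):
        dist = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if dist > threshold:
            boundaries.append(i)  # new shot starts at frame i
    return boundaries

# Two dark frames followed by two bright frames -> one cut at index 2.
print(shot_boundaries([[10] * 100, [10] * 100, [240] * 100, [240] * 100]))  # [2]
```

Real detectors also handle gradual transitions (fades, dissolves), which a single-step threshold like this misses.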
Key Frame Extraction
Temporal video segmentation divides the video stream into a set of segments, from each of which one representative frame is extracted based on attention features. Key frame extraction methods are a simple yet effective form of summarizing a long video sequence and can be used for applications that only work on images, such as search engines (CBIR) or image clustering algorithms. These key frames can also be used for automatic or manual tagging, because they facilitate users' annotation.
Extracted key frames of a video sequence
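One common selection rule picks, per segment, the frame closest to the segment's mean feature vector. A sketch under that assumption (simple histogram features and L1 distance stand in for the attention features mentioned above):

```python
def key_frame(segment_hists):
    """Return the index of the frame whose feature vector is closest
    (L1 distance) to the segment's mean vector."""
    n = len(segment_hists)
    dims = len(segment_hists[0])
    mean = [sum(h[d] for h in segment_hists) / n for d in range(dims)]

    def dist(h):
        return sum(abs(a - b) for a, b in zip(h, mean))

    return min(range(n), key=lambda i: dist(segment_hists[i]))

# The middle frame sits closest to the segment average.
print(key_frame([[1.0, 0.0], [0.5, 0.5], [0.4, 0.6]]))  # 1
```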
Tag Generation
A topic of this IRP is the generation of tags. The following aspects have been considered:
• Quality Tags: Key frames are used to produce quality tags by no-reference video quality assessment.
Tags derived by no-reference video quality assessment (e.g. "good quality")
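A no-reference quality tag could, for instance, be derived from a crude sharpness proxy. This sketch uses the mean absolute horizontal gradient; the metric and threshold are illustrative stand-ins, not the project's actual assessment method:

```python
def sharpness(image):
    """Mean absolute horizontal gradient over a 2-D grayscale image,
    used as a crude no-reference sharpness proxy (blurry images score low)."""
    total, count = 0, 0
    for row in image:
        for x in range(1, len(row)):
            total += abs(row[x] - row[x - 1])
            count += 1
    return total / count

def quality_tag(image, threshold=10):
    """Turn the sharpness score into a coarse quality tag
    (the threshold is an arbitrary illustrative value)."""
    return "good quality" if sharpness(image) >= threshold else "low quality"

print(quality_tag([[0, 255, 0, 255]]))   # high-contrast edges -> "good quality"
print(quality_tag([[100, 102, 104]]))    # smooth gradient -> "low quality"
```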
• Tags derived from Text:
Recognizing text within video sequences is also a possibility for generating tags.
Another possibility to produce tags is to identify persons and locations by analyzing the sentence structure of affiliated descriptions and user comments.
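A toy version of this idea, assuming a simple capitalization heuristic rather than full sentence-structure analysis: runs of capitalized words that do not start a sentence become candidate person/location tags.

```python
import re

def candidate_tags(text):
    """Collect runs of mid-sentence capitalized words as candidate
    person/location tags -- a rough proper-noun heuristic."""
    tags = set()
    for sentence in re.split(r"[.!?]+\s*", text):
        words = [w.strip(",;:()") for w in sentence.split()]
        run = []
        for i, w in enumerate(words):
            if i > 0 and w[:1].isupper():
                run.append(w)          # extend the current proper-noun run
            else:
                if run:
                    tags.add(" ".join(run))
                run = []
        if run:
            tags.add(" ".join(run))
    return tags

print(candidate_tags("We visited the Taj Mahal in Agra."))
# {'Taj Mahal', 'Agra'}
```

Real systems would use a named-entity recognizer instead; this heuristic misfires on mid-sentence non-names and misses sentence-initial ones.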
• Tags generated by concept detectors (e.g. indoor/outdoor, face)
Clustering
A fundamental step in this video summarization is to create a similarity matrix and organize key frames into a tree structure using the ant-tree clustering method.
Tree structuring of video frames by ant-tree clustering
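The ant-tree method itself uses ant-inspired attach/drop rules; a heavily simplified stand-in, which just attaches each key frame to its most similar node already in the tree (falling back to the root below a similarity threshold), could be sketched as:

```python
def similarity(a, b):
    """Similarity in [0, 1] from the L1 distance between normalized
    feature vectors (each vector sums to 1, so the distance is at most 2)."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / 2

def build_tree(features, threshold=0.7):
    """Attach each key frame to its most similar existing tree node if
    the similarity passes the threshold, else to the root.  A simplified
    stand-in for ant-tree clustering; frame 0 serves as the root."""
    parent = {0: None}  # node index -> parent index
    for i in range(1, len(features)):
        best = max(parent, key=lambda j: similarity(features[i], features[j]))
        parent[i] = best if similarity(features[i], features[best]) >= threshold else 0
    return parent

# Frame 1 is near frame 0; frame 2 resembles nothing and falls back to the root.
print(build_tree([[1, 0], [0.9, 0.1], [0, 1]]))  # {0: None, 1: 0, 2: 0}
```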
Low-level features and tags of key frames are clustered to find related video content, and visualized using the FastMap algorithm applied to distance matrices. The propagation of widely shared tags within compact clusters is to be studied.
Clustered key frames to perform similarity search
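Propagating widely shared tags within a compact cluster could be sketched as a majority vote; the cluster format (frame id mapped to an existing tag or None) is an assumption for illustration:

```python
from collections import Counter

def propagate_tags(clusters):
    """Within each cluster, assign the most common existing tag to
    untagged key frames.  Clusters are dicts of {frame_id: tag or None}."""
    result = {}
    for cluster in clusters:
        counts = Counter(t for t in cluster.values() if t)
        majority = counts.most_common(1)[0][0] if counts else None
        result.update({f: t or majority for f, t in cluster.items()})
    return result

clusters = [{"a": "beach", "b": "beach", "c": None},
            {"d": "temple", "e": None}]
print(propagate_tags(clusters))
# {'a': 'beach', 'b': 'beach', 'c': 'beach', 'd': 'temple', 'e': 'temple'}
```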
Future Work
• Automatic ROI Image Tagging
A solution for automatic image tagging can be achieved by object duplicate detection in static images or key frames. The goal of detection is to propagate tags from annotated objects in a training set.
Tag propagation by object duplicate detection (from a user-annotated object, e.g. "Taj Mahal", to automatic image tagging)
• Subject Classification
Automatic subject tagging of video involves the assignment of a subject label to a video object or to a time point within a video object. The subject label reflects the semantic theme treated by the video; it reflects what the video is about rather than what is depicted in the visual channel.
• Semantic Key Frame Extraction
Semantic key frame extraction is the task of selecting one or more key frames to represent the intellectual content of a video or a given segment of a video stream.
• Visual Reranking to improve Video Retrieval
Low-level visual features will be exploited for improving semantic-theme-based retrieval of videos indexed using speech recognition transcripts of their spoken content.
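One simple reranking scheme blends the transcript-based retrieval score with visual similarity to the top text hit and re-sorts; the weighting, score ranges, and input format here are illustrative assumptions, not the planned method:

```python
def rerank(text_scores, visual_sims, alpha=0.7):
    """Blend speech-transcript retrieval scores with low-level visual
    similarity, then re-sort.  `text_scores` maps video id -> text score,
    `visual_sims` maps video id -> visual similarity to the top text hit."""
    blended = {v: alpha * s + (1 - alpha) * visual_sims.get(v, 0.0)
               for v, s in text_scores.items()}
    return sorted(blended, key=blended.get, reverse=True)

# v3's weak transcript match is rescued by strong visual similarity.
print(rerank({"v1": 0.9, "v2": 0.5, "v3": 0.45},
             {"v1": 1.0, "v2": 0.0, "v3": 0.9}))  # ['v1', 'v3', 'v2']
```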
Contact
Coordination: Pascal Kelm
Web: www.petamedia.eu
Email: {[email protected]}