multimedia information

7/29/2019 multimedia information

1/33

Dr. Qi Tian

Department of Computer ScienceThe University of Texas at San [email protected]

October 31 , 2007

Multimedia Information Retrieval


2/33

Qi Tian Univ. of Texas at San Antonio

Overview

Problems

Our WorkTrends and Directions

Outline


3/33


Motivation

With the explosive growth of digitalmedia data, there is a huge demand for

new tools and systems that enablesaverage users to more efficiently andmore effectively search, access, process,manage, author and share these digitalmedia contents.




4/33


Overview

Multimedia Information RetrievalMultimedia Information Retrieval

Tex t -based I n fo r m a t i on Re t r i eva l Too many images to annotate High cost of human interpretation

Subjectivity of visual content, e.g., A picture is worth a thousand words

Con t en t -b ased Ret r ieva l automatically retrieves images, video, and

audio based on the visual and audio content

His to ry Conference on Database Applications of Pictorial Applications in

1 9 7 9 NSF workshop in 1 9 9 2 More active field since 1 9 9 7 when Internet and web browsing

became popular


5/33


Overview


A VERY DI VERSI FI ED FI ELD!

Dat a Typ es Text, hypertext, image, audio, graphics, animation,

paintings, video/movie, rich text, spread sheet, slides,

combinations of these and user interaction

Research Pr ob l em s Systems, content, services, user, evaluation,

implementation, social/business, applications

Methodo log ies

Database, information retrieval, signal and imageprocessing, graphics, vision, human-computerinteraction, machine learning, statistical modeling,data mining, pattern analysis, data fusion, socialsciences, and domain knowledge for applications


6/33


Overview


Overview

Content-based Image Retrieval

Content-based Video Retrieval

Content-based Audio Retrieval


7/33


Approaches

Hierarchical LevelsHierarchical Levels

High-level

Bridge the semantic gap, integration of context andcontent, hybrid (text and content) approaches

Mid-level

Active Learning, Boosting, Incremental Learning

Low-level

Feature Extraction and Representation, DimensionReduction and Selection.


8/33


Overview


metadata

User

Interface

Similarity

ranking

memory

FeatureweightingVisual

C++

Feature

Extraction

C/C++

Image DB

Color

Texture

structureOff-line

Automatic

Color

9Color histogram9Color moments9Color correlogram

Texture9Tamura texture

9Co-occurrencematrices9Gabor features

9Wavelet moments

Shape9Fourier descriptor

Structure9Edge-based features

Cont en t -based I m age Re t r ieva l


9/33

QBIC by IBM AlmadenMARS by UIUC

WISE by StanfordWISE by Stanford

WebSEEK by ColumbiaWebSEEK by Columbia


Approaches

Many CBIR systems have been built..Many CBIR systems have been built..

The g row ing l i st : ADL, AltaVista Photofinder, Amore, ASSERT, BDLP, Blobworld, CANDID, C-bird,Chabot, CBVQ, DrawSearch, Excalibur Visual RetrievalWare, FIDS, FIR, FOCUS, ImageFinder,

ImageMiner, ImageRETRO, ImageRover, ImageScape, Jacob, LCPD, MARS, MetaSEEk, MIR, NETRA,Photobook, Picasso, PicHunter, PicToSeek, QBIC, Quicklook2, SIMBA, SQUID, Surfimage, SYNAPSE,

TODAI, VIR Image Engine, VisualSEEk, VP Image Retrieval System, WebSEEk, WebSeer, WISE


10/33


Overview


Con t en t -b ased V ideo Ret r ieva l

Traditional Video Retrieval

Query-by-textual keyword

Automatic Visual Concept Detectione.g., indoor/outdoor, Sky, Car, Building, US-flag

Example concepts:Airplane, Building, Car, Crowd, Desert, Explosion,

Outdoor, People, Vehicle, Violence

Video Retrieval SceneHow to recognize a scene? Context

Use Proto-Concepts to describe context Machine learning to link context to concepts

Water

Sk y Sk ySky

Water


11/33


Overview


Con t en t -b ased Aud io Re t r ieva l

to search sounds by their features in the waveform, statistics,

or transform domains

Speech, Music, Environment Audio, Silence

App l ica t ionsEntertainmentFilm making - searching sound effects

TV/radio studio - editing programsKaraoke, music stores, or online shopping- query by humming the melody

Audio/video archive management

Segmenting and indexing of raw recordingsSearching and browsing audio/video clips

SurveillanceMonitoring criminal or emergent eventsFilm rating

glass breaking,

explosion,

cry, shot,

Alarming!!!

Short time energy

Zero-crossing rate

Pitch period

MFCC

Spectrogram

LPC


12/33

MIRMIR

Xiong, Zhou, Tian, Rui and Huang, Semantic Retrieval of Video, IEEE SP Mag., March 2006

Top 10 Problems in MIRTop 10 Problems in MIR

Bridge the Semantic Gap high level concept (sites, objects, events) and low-level visual/audio

features (color, texture, shape and structure, layout; motion; audio - pitch,

energy, etc.).How to Best Combine Human Intelligence and Machine Intelligence. Keep human in the loop, e.g. Relevance Feedback

New Query Paradigms Query by keywords, similarity, sketching an object, sketching a trajectory,

painting a rough image, etc. Can we think of useful new paradigms?Multimedia Data Mining Searching for interesting/unusual patterns and correlations in multimedia

has many important applications, including Web Search Engines anddealing with intelligence data.

Work to date on Data Mining has been mainly in Text data.

How to Use Unlabeled Data Active learning, e.g., in Relevance Feedback Label propagation, e.g., image/video annotation


13/33

MIRMIR


Top 10 Problems (continued)Top 10 Problems (continued)

Using Virtual Reality Visualization To Help Can we use 3D audio/visual visualization techniques to help a user

to navigate through the data space to browse and to retrieve? e.g., 3D MARS

Incremental Learning Change the parameters of the retrieval algorithms incrementally,

not needing to start from scratch every time we have new data.

Structuring Very Large Databases Researchers in audio/visual scene analysis and those in

Databases and Information Retrieval should really collaborateCLOSELY to find good ways of structuring very large multimedia

databases for efficient retrieval and search.

Performance Evaluation e.g., TRECVID for video retrieval, how about image retrieval?

What Are the Killer Applications of Multimedia Retrieval? e.g., medical multimedia document management


14/33


Approaches

Our Recent WorkOur Recent Work

UserFeatureExtraction

DataModeling

SimilarityEstimation

Dimension reduction

Statistical estimation

Dimension reduction

Statistical estimation

Classification

Ranking/indexing

Classification

Ranking/indexing

Training

Image DatabaseImage Database

Discr im inan t Ana lysi s

Useful in other applicationsUseful in other applications


15/33

ApproachesApproaches


Our Recent Work in CBIROur Recent Work in CBIR

Semantic Subspace Projection9 Bridge the semantic gap

Hybrid Discriminant Analysis9 Learn high dimensional data with small samples

Adaptive Discriminant Projection9 Adaptively learn a projection from data distribution

Distance Measure for Similarity Estimation9 Investigate the relations between probabilisticdistributions, distance metric, and mean estimation


16/33


Our Recent Work in CBIROur Recent Work in CBIR

Related Publications

Journalso IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)

o IEEE Transactions on Circuits and Systems for Video Technology (CSVT)

o IEEE Multimedia (MM)

o International Journal on Computer Vision (IJCV)

o Pattern Recognition (PR)

o ACM Transactions on Knowledge Discovery from Data (TKDD)

o IEEE Transactions on Computational Biology and Bioinformatics

Conferenceso ACM Multimedia

o IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)

o International Conference on Pattern Recognition (ICPR)

o Others including ICME, ICIP, ICASSP, MIR, CIVR


17/33


Web Image Search and Mining

Image Annotation

Affective Video Retrieval

Information Fusion in MIR

Integration of Context and Content for MultimediaManagement

Multimodal Emotion Recognition

Current Directions


18/33

Trends and DirectionsTrends and Directions


Web SearchWeb Search 1.0 Traditional Text Retrieval

Web Search 2.0 Page-level Relevance Ranking

Web Search 3.0 Object-level Structured Search

Object Level Vertical Search (MSRA Libra:http://libra.msra.cn/

Live Product Search (http://products.live.com)


19/33



Image AnnotationPhoto sharing through the Internet has become a commonpractice.

flickr.com: 19.5 million photos (30% growth/month), 2005Photo.net and airliners.net: millions of images

Most image search engines relies on textual descriptions ofthe images, e.g., Google, Yahoo, MSN

In general, people do not spend time labeling orannotating their personal photos


20/33


Image Annotation

Image Annotation System

A statistical model that can relate words to image features.

Sketch an image, extract feature vectors.

Descriptive words--top words ranked according to likelihood.

Current work

Li, Wang (alipr.com2006), Blei, Jordan (2003); Vasconcelos (UCSD-SML2007), Zhang et al. (MSRA 2005), Li et al. (MSRA 2007)

Promising Direction:

Web Image Annotation an integration of IR and Content Analysis

Can computer do this? Building

Sky

Lake Landscape

Tree



21/33



Affective:

a feeling or emotion as distinguished from cognition, thought, oraction

Real Multimedia RetrievalSearch for a subset of nicest holiday pictures to show them to friends

Selecting the most appropriate background music for the given situation

Search for the most impressive video clips

Search for the most appealing photographs of one and the same content

Search for all film comedies I like most

Alternative approach

Search for

9 Mood

9 Matches to users profile

Like/dislike, interest/no interest




22/33


Underlying IdeaVideo

Temporal content flow

Continuous transitions from

one affective state to another

Temporal measurement of Arousal

Valence

Combining the two curvesinto the Affect Curve

Arousal

t

Valence

t

Arousal

Valence

Affect curve



23/33


Affective Media Content Characterization

Media Feature extraction Arousal = f1 (feature values)

Valence = f2 (feature values)

Mapping AV values onto

the 2D affect space

Affective content characterizationRomantic

feel-good

Hilarious

fun

Suspense

horror

A bit somber,

medium excitement

Hanjalic & Xu



24/33


Hanjalic & Xu



25/33



Fusion: A merging of diverse, distinct, or separate elements into a

unified whole (Merriam-Webster dictionary).

Feature Extraction Module:

9 Multiple features ->vectors

9 Concatenated vector

9 Feature Fusion: more discriminating hyperspace can be found in the new

vector

Matching Module:9 One type of classifiers for multiple features or

9 Multiple types of classifiers for one feature or

9 Both

9 The output score can be combinedDecision Module:

9 The output decision of each classifier can be combined



26/33



Two Forms

Multi-modality9 e.g. Video clip->visual information, audio information, textural

information

9 Multi-modality fusion occurs at feature extraction module.

9 Single source information may be represented by multiple features,

e.g. Color image->color, texture, shape

Multi-Classifiers (Ensemble of Classifiers)

9 A set of classifiers are trained to solve the same problem

9 Applied on single or multiple source of information

9 A single type of base classifiers or

9 Different types of classifiers (Bayesian, K-NN, SVM)



27/33




Fusion Schemes

The prediction of multiple classifiers need to be integrated into onefused decision by Fusion Scheme.9The output of different classifiers need to be normalized.

Rule-based9Decision is made by a simple operation on the output of all classifiers9 e.g. Max, Min, Sum, Mean (Matching Module)9 e.g. AND, OR (Decision Module)

Learning-based9The output of all classifiers is fed into a learning processto obtain the

final decision9 e.g. Decision Tree, Neutral Network

No one is guaranteed to be the best empirically or theoretically.

Applications9Multimodal Biometrics Fusion (face, fingerprint, iris, palmprint, voice,

hand geometry)9Audio/visual fusion for multimodal emotion recognition


28/33


Integration of Context and Content forMultimedia Management


from Carlson & Hatfield, 1992


29/33


Integration of Context and Content forMultimedia Management


An increasing number of active research in this direction

Crucial to human-human communication and humanunderstanding of multimedia.9 without context it is difficult for a human to recognize various

objects,

Enable that the (semi)automatic content analysis and indexingmethods become more powerful in managing multimedia data

Contextual information9

e.g., cell ID for the mobile phone location, GPS integrated in adigital camera, camera parameters, time information, andidentity of the producer


30/33


Integration of Context and Content forMultimedia Management (T-MM Special Issue)


Topics of interest include:

Contextual metadata extraction

Models for temporal context, spatial context, imaging context(e.g., camerametadata), social context, and so on

Web context for online multimedia annotation, browsing, sharing and reuse

Context tagging systems, e.g., geotagging, voice annotation Context-aware inference algorithms

Context-aware multi-modal fusion systems (text, document, image, video,metadata, etc.)

Models for combining contextual and content information Context-aware interfaces and collaboration Novel methods to support and enhance social interaction, including

innovative ideas like social, affective computing, and experience capture. Applications such as using context and similarity for face and locationidentification

Context-aware mobile media technology and applications Using context to browse and navigate large media collections


31/33

Projects in High ImpactFeatures

Web-based

Large Scale User participated

A combination of internet and multimedia

Examples: Photo Search Home photo management, mobile photo search, face search

Millions Books Project

Tremendous Space for Search and Multimedia Data Mining

Human-centered Multimedia Search Search for UCC data (user preference, profile, and opinion), Social search (public relations, names, personalization)


32/33


Acknowledgement

Current Collaborators

Academia

University of Illinois

University of Amsterdam

Chinese Academy of Science

University of Science and Technology of

China

Industry

Institute of Infocomm Research, Singapore

HP Labs, Palo Alto, CA

Microsoft Research (MSR) Redmond

Microsoft Research Asia (MSRA)NEC Labs America

Kodak Research Lab

IBM T.J. Watson Research Ctr

Funding Agencies

Army Research Office (ARO)

Department of Homeland Security (DHS)

San Antonio Life Science Institute(SALSI)

Center for Infrastructure Assurance andSecurity (CIAS)

Students

Jerry Yu (2007, Kodak Research, NJ)

Yijuan Lu (2008 expected)

Yuning Xu


33/33


Q & AQ & A

Thank y ou !

Qu est ion s ?

multimedia information

Documents

Transcript of multimedia information