multimedia information

download multimedia information

of 33

Transcript of multimedia information

  • 7/29/2019 multimedia information

    1/33

    Dr. Qi Tian

    Department of Computer ScienceThe University of Texas at San [email protected]

    October 31 , 2007

    Multimedia Information Retrieval

  • 7/29/2019 multimedia information

    2/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Problems

    Our WorkTrends and Directions

    Outline

  • 7/29/2019 multimedia information

    3/33

    Qi Tian Univ. of Texas at San Antonio

    Motivation

    With the explosive growth of digitalmedia data, there is a huge demand for

    new tools and systems that enablesaverage users to more efficiently andmore effectively search, access, process,manage, author and share these digitalmedia contents.

    Multimedia Information Retrieval

    Multimedia Information Retrieval

  • 7/29/2019 multimedia information

    4/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    Tex t -based I n fo r m a t i on Re t r i eva l Too many images to annotate High cost of human interpretation

    Subjectivity of visual content, e.g., A picture is worth a thousand words

    Con t en t -b ased Ret r ieva l automatically retrieves images, video, and

    audio based on the visual and audio content

    His to ry Conference on Database Applications of Pictorial Applications in

    1 9 7 9 NSF workshop in 1 9 9 2 More active field since 1 9 9 7 when Internet and web browsing

    became popular

  • 7/29/2019 multimedia information

    5/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    A VERY DI VERSI FI ED FI ELD!

    Dat a Typ es Text, hypertext, image, audio, graphics, animation,

    paintings, video/movie, rich text, spread sheet, slides,

    combinations of these and user interaction

    Research Pr ob l em s Systems, content, services, user, evaluation,

    implementation, social/business, applications

    Methodo log ies

    Database, information retrieval, signal and imageprocessing, graphics, vision, human-computerinteraction, machine learning, statistical modeling,data mining, pattern analysis, data fusion, socialsciences, and domain knowledge for applications

  • 7/29/2019 multimedia information

    6/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    Overview

    Content-based Image Retrieval

    Content-based Video Retrieval

    Content-based Audio Retrieval

  • 7/29/2019 multimedia information

    7/33

    Qi Tian Univ. of Texas at San Antonio

    Approaches

    Hierarchical LevelsHierarchical Levels

    High-level

    Bridge the semantic gap, integration of context andcontent, hybrid (text and content) approaches

    Mid-level

    Active Learning, Boosting, Incremental Learning

    Low-level

    Feature Extraction and Representation, DimensionReduction and Selection.

  • 7/29/2019 multimedia information

    8/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    metadata

    User

    Interface

    Similarity

    ranking

    memory

    FeatureweightingVisual

    C++

    Feature

    Extraction

    C/C++

    Image DB

    Color

    Texture

    structureOff-line

    Automatic

    Color

    9Color histogram9Color moments9Color correlogram

    Texture9Tamura texture

    9Co-occurrencematrices9Gabor features

    9Wavelet moments

    Shape9Fourier descriptor

    Structure9Edge-based features

    Cont en t -based I m age Re t r ieva l

  • 7/29/2019 multimedia information

    9/33

    QBIC by IBM AlmadenMARS by UIUC

    WISE by StanfordWISE by Stanford

    WebSEEK by ColumbiaWebSEEK by Columbia

    Qi Tian Univ. of Texas at San Antonio

    Approaches

    Many CBIR systems have been built..Many CBIR systems have been built..

    The g row ing l i st : ADL, AltaVista Photofinder, Amore, ASSERT, BDLP, Blobworld, CANDID, C-bird,Chabot, CBVQ, DrawSearch, Excalibur Visual RetrievalWare, FIDS, FIR, FOCUS, ImageFinder,

    ImageMiner, ImageRETRO, ImageRover, ImageScape, Jacob, LCPD, MARS, MetaSEEk, MIR, NETRA,Photobook, Picasso, PicHunter, PicToSeek, QBIC, Quicklook2, SIMBA, SQUID, Surfimage, SYNAPSE,

    TODAI, VIR Image Engine, VisualSEEk, VP Image Retrieval System, WebSEEk, WebSeer, WISE

  • 7/29/2019 multimedia information

    10/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    Con t en t -b ased V ideo Ret r ieva l

    Traditional Video Retrieval

    Query-by-textual keyword

    Automatic Visual Concept Detectione.g., indoor/outdoor, Sky, Car, Building, US-flag

    Example concepts:Airplane, Building, Car, Crowd, Desert, Explosion,

    Outdoor, People, Vehicle, Violence

    Video Retrieval SceneHow to recognize a scene? Context

    Use Proto-Concepts to describe context Machine learning to link context to concepts

    Water

    Sk y Sk ySky

    Water

  • 7/29/2019 multimedia information

    11/33

    Qi Tian Univ. of Texas at San Antonio

    Overview

    Multimedia Information RetrievalMultimedia Information Retrieval

    Con t en t -b ased Aud io Re t r ieva l

    to search sounds by their features in the waveform, statistics,

    or transform domains

    Speech, Music, Environment Audio, Silence

    App l ica t ionsEntertainmentFilm making - searching sound effects

    TV/radio studio - editing programsKaraoke, music stores, or online shopping- query by humming the melody

    Audio/video archive management

    Segmenting and indexing of raw recordingsSearching and browsing audio/video clips

    SurveillanceMonitoring criminal or emergent eventsFilm rating

    glass breaking,

    explosion,

    cry, shot,

    Alarming!!!

    Short time energy

    Zero-crossing rate

    Pitch period

    MFCC

    Spectrogram

    LPC

  • 7/29/2019 multimedia information

    12/33

    MIRMIR

    Xiong, Zhou, Tian, Rui and Huang, Semantic Retrieval of Video, IEEE SP Mag., March 2006

    Top 10 Problems in MIRTop 10 Problems in MIR

    Bridge the Semantic Gap high level concept (sites, objects, events) and low-level visual/audio

    features (color, texture, shape and structure, layout; motion; audio - pitch,

    energy, etc.).How to Best Combine Human Intelligence and Machine Intelligence. Keep human in the loop, e.g. Relevance Feedback

    New Query Paradigms Query by keywords, similarity, sketching an object, sketching a trajectory,

    painting a rough image, etc. Can we think of useful new paradigms?Multimedia Data Mining Searching for interesting/unusual patterns and correlations in multimedia

    has many important applications, including Web Search Engines anddealing with intelligence data.

    Work to date on Data Mining has been mainly in Text data.

    How to Use Unlabeled Data Active learning, e.g., in Relevance Feedback Label propagation, e.g., image/video annotation

  • 7/29/2019 multimedia information

    13/33

    MIRMIR

    Qi Tian Univ. of Texas at San Antonio

    Top 10 Problems (continued)Top 10 Problems (continued)

    Using Virtual Reality Visualization To Help Can we use 3D audio/visual visualization techniques to help a user

    to navigate through the data space to browse and to retrieve? e.g., 3D MARS

    Incremental Learning Change the parameters of the retrieval algorithms incrementally,

    not needing to start from scratch every time we have new data.

    Structuring Very Large Databases Researchers in audio/visual scene analysis and those in

    Databases and Information Retrieval should really collaborateCLOSELY to find good ways of structuring very large multimedia

    databases for efficient retrieval and search.

    Performance Evaluation e.g., TRECVID for video retrieval, how about image retrieval?

    What Are the Killer Applications of Multimedia Retrieval? e.g., medical multimedia document management

  • 7/29/2019 multimedia information

    14/33

    Qi Tian Univ. of Texas at San Antonio

    Approaches

    Our Recent WorkOur Recent Work

    UserFeatureExtraction

    DataModeling

    SimilarityEstimation

    Dimension reduction

    Statistical estimation

    Dimension reduction

    Statistical estimation

    Classification

    Ranking/indexing

    Classification

    Ranking/indexing

    Training

    Image DatabaseImage Database

    Discr im inan t Ana lysi s

    Useful in other applicationsUseful in other applications

  • 7/29/2019 multimedia information

    15/33

    ApproachesApproaches

    Qi Tian Univ. of Texas at San Antonio

    Our Recent Work in CBIROur Recent Work in CBIR

    Semantic Subspace Projection9 Bridge the semantic gap

    Hybrid Discriminant Analysis9 Learn high dimensional data with small samples

    Adaptive Discriminant Projection9 Adaptively learn a projection from data distribution

    Distance Measure for Similarity Estimation9 Investigate the relations between probabilisticdistributions, distance metric, and mean estimation

  • 7/29/2019 multimedia information

    16/33

    Qi Tian Univ. of Texas at San Antonio

    Our Recent Work in CBIROur Recent Work in CBIR

    Related Publications

    Journalso IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)

    o IEEE Transactions on Circuits and Systems for Video Technology (CSVT)

    o IEEE Multimedia (MM)

    o International Journal on Computer Vision (IJCV)

    o Pattern Recognition (PR)

    o ACM Transactions on Knowledge Discovery from Data (TKDD)

    o IEEE Transactions on Computational Biology and Bioinformatics

    Conferenceso ACM Multimedia

    o IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)

    o International Conference on Pattern Recognition (ICPR)

    o Others including ICME, ICIP, ICASSP, MIR, CIVR

  • 7/29/2019 multimedia information

    17/33

    Qi Tian Univ. of Texas at San Antonio

    Web Image Search and Mining

    Image Annotation

    Affective Video Retrieval

    Information Fusion in MIR

    Integration of Context and Content for MultimediaManagement

    Multimodal Emotion Recognition

    Current Directions

  • 7/29/2019 multimedia information

    18/33

    Trends and DirectionsTrends and Directions

    Qi Tian Univ. of Texas at San Antonio

    Web SearchWeb Search 1.0 Traditional Text Retrieval

    Web Search 2.0 Page-level Relevance Ranking

    Web Search 3.0 Object-level Structured Search

    Object Level Vertical Search (MSRA Libra:http://libra.msra.cn/

    Live Product Search (http://products.live.com)

  • 7/29/2019 multimedia information

    19/33

    Trends and DirectionsTrends and Directions

    Qi Tian Univ. of Texas at San Antonio

    Image AnnotationPhoto sharing through the Internet has become a commonpractice.

    flickr.com: 19.5 million photos (30% growth/month), 2005Photo.net and airliners.net: millions of images

    Most image search engines relies on textual descriptions ofthe images, e.g., Google, Yahoo, MSN

    In general, people do not spend time labeling orannotating their personal photos

  • 7/29/2019 multimedia information

    20/33

    Qi Tian Univ. of Texas at San Antonio

    Image Annotation

    Image Annotation System

    A statistical model that can relate words to image features.

    Sketch an image, extract feature vectors.

    Descriptive words--top words ranked according to likelihood.

    Current work

    Li, Wang (alipr.com2006), Blei, Jordan (2003); Vasconcelos (UCSD-SML2007), Zhang et al. (MSRA 2005), Li et al. (MSRA 2007)

    Promising Direction:

    Web Image Annotation an integration of IR and Content Analysis

    Can computer do this? Building

    Sky

    Lake Landscape

    Tree

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    21/33

    Qi Tian Univ. of Texas at San Antonio

    Affective Video Retrieval

    Affective:

    a feeling or emotion as distinguished from cognition, thought, oraction

    Real Multimedia RetrievalSearch for a subset of nicest holiday pictures to show them to friends

    Selecting the most appropriate background music for the given situation

    Search for the most impressive video clips

    Search for the most appealing photographs of one and the same content

    Search for all film comedies I like most

    Alternative approach

    Search for

    9 Mood

    9 Matches to users profile

    Like/dislike, interest/no interest

    Affective Video Retrieval

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    22/33

    Qi Tian Univ. of Texas at San Antonio

    Underlying IdeaVideo

    Temporal content flow

    Continuous transitions from

    one affective state to another

    Temporal measurement of Arousal

    Valence

    Combining the two curvesinto the Affect Curve

    Arousal

    t

    Valence

    t

    Arousal

    Valence

    Affect curve

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    23/33

    Qi Tian Univ. of Texas at San Antonio

    Affective Media Content Characterization

    Media Feature extraction Arousal = f1 (feature values)

    Valence = f2 (feature values)

    Mapping AV values onto

    the 2D affect space

    Affective content characterizationRomantic

    feel-good

    Hilarious

    fun

    Suspense

    horror

    A bit somber,

    medium excitement

    Hanjalic & Xu

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    24/33

    Qi Tian Univ. of Texas at San Antonio

    Hanjalic & Xu

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    25/33

    Qi Tian Univ. of Texas at San Antonio

    Information Fusion in MIR

    Fusion: A merging of diverse, distinct, or separate elements into a

    unified whole (Merriam-Webster dictionary).

    Feature Extraction Module:

    9 Multiple features ->vectors

    9 Concatenated vector

    9 Feature Fusion: more discriminating hyperspace can be found in the new

    vector

    Matching Module:9 One type of classifiers for multiple features or

    9 Multiple types of classifiers for one feature or

    9 Both

    9 The output score can be combinedDecision Module:

    9 The output decision of each classifier can be combined

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    26/33

    Qi Tian Univ. of Texas at San Antonio

    Information Fusion in MIR

    Two Forms

    Multi-modality9 e.g. Video clip->visual information, audio information, textural

    information

    9 Multi-modality fusion occurs at feature extraction module.

    9 Single source information may be represented by multiple features,

    e.g. Color image->color, texture, shape

    Multi-Classifiers (Ensemble of Classifiers)

    9 A set of classifiers are trained to solve the same problem

    9 Applied on single or multiple source of information

    9 A single type of base classifiers or

    9 Different types of classifiers (Bayesian, K-NN, SVM)

    Trends and DirectionsTrends and Directions

  • 7/29/2019 multimedia information

    27/33

    Qi Tian Univ. of Texas at San Antonio

    Information Fusion in MIR

    Trends and DirectionsTrends and Directions

    Fusion Schemes

    The prediction of multiple classifiers need to be integrated into onefused decision by Fusion Scheme.9The output of different classifiers need to be normalized.

    Rule-based9Decision is made by a simple operation on the output of all classifiers9 e.g. Max, Min, Sum, Mean (Matching Module)9 e.g. AND, OR (Decision Module)

    Learning-based9The output of all classifiers is fed into a learning processto obtain the

    final decision9 e.g. Decision Tree, Neutral Network

    No one is guaranteed to be the best empirically or theoretically.

    Applications9Multimodal Biometrics Fusion (face, fingerprint, iris, palmprint, voice,

    hand geometry)9Audio/visual fusion for multimodal emotion recognition

  • 7/29/2019 multimedia information

    28/33

    Qi Tian Univ. of Texas at San Antonio

    Integration of Context and Content forMultimedia Management

    Trends and DirectionsTrends and Directions

    from Carlson & Hatfield, 1992

  • 7/29/2019 multimedia information

    29/33

    Qi Tian Univ. of Texas at San Antonio

    Integration of Context and Content forMultimedia Management

    Trends and DirectionsTrends and Directions

    An increasing number of active research in this direction

    Crucial to human-human communication and humanunderstanding of multimedia.9 without context it is difficult for a human to recognize various

    objects,

    Enable that the (semi)automatic content analysis and indexingmethods become more powerful in managing multimedia data

    Contextual information9

    e.g., cell ID for the mobile phone location, GPS integrated in adigital camera, camera parameters, time information, andidentity of the producer

  • 7/29/2019 multimedia information

    30/33

    Qi Tian Univ. of Texas at San Antonio

    Integration of Context and Content forMultimedia Management (T-MM Special Issue)

    Trends and DirectionsTrends and Directions

    Topics of interest include:

    Contextual metadata extraction

    Models for temporal context, spatial context, imaging context(e.g., camerametadata), social context, and so on

    Web context for online multimedia annotation, browsing, sharing and reuse

    Context tagging systems, e.g., geotagging, voice annotation Context-aware inference algorithms

    Context-aware multi-modal fusion systems (text, document, image, video,metadata, etc.)

    Models for combining contextual and content information Context-aware interfaces and collaboration Novel methods to support and enhance social interaction, including

    innovative ideas like social, affective computing, and experience capture. Applications such as using context and similarity for face and locationidentification

    Context-aware mobile media technology and applications Using context to browse and navigate large media collections

  • 7/29/2019 multimedia information

    31/33

    Projects in High ImpactFeatures

    Web-based

    Large Scale User participated

    A combination of internet and multimedia

    Examples: Photo Search Home photo management, mobile photo search, face search

    Millions Books Project

    Tremendous Space for Search and Multimedia Data Mining

    Human-centered Multimedia Search Search for UCC data (user preference, profile, and opinion), Social search (public relations, names, personalization)

  • 7/29/2019 multimedia information

    32/33

    Qi Tian Univ. of Texas at San Antonio

    Acknowledgement

    Current Collaborators

    Academia

    University of Illinois

    University of Amsterdam

    Chinese Academy of Science

    University of Science and Technology of

    China

    Industry

    Institute of Infocomm Research, Singapore

    HP Labs, Palo Alto, CA

    Microsoft Research (MSR) Redmond

    Microsoft Research Asia (MSRA)NEC Labs America

    Kodak Research Lab

    IBM T.J. Watson Research Ctr

    Funding Agencies

    Army Research Office (ARO)

    Department of Homeland Security (DHS)

    San Antonio Life Science Institute(SALSI)

    Center for Infrastructure Assurance andSecurity (CIAS)

    Students

    Jerry Yu (2007, Kodak Research, NJ)

    Yijuan Lu (2008 expected)

    Yuning Xu

  • 7/29/2019 multimedia information

    33/33

    Qi Tian Univ. of Texas at San Antonio

    Q & AQ & A

    Thank y ou !

    Qu est ion s ?