multimedia information
-
Upload
muhamad-jafar-sidik -
Category
Documents
-
view
222 -
download
0
Transcript of multimedia information
-
7/29/2019 multimedia information
1/33
Dr. Qi Tian
Department of Computer ScienceThe University of Texas at San [email protected]
October 31 , 2007
Multimedia Information Retrieval
-
7/29/2019 multimedia information
2/33
Qi Tian Univ. of Texas at San Antonio
Overview
Problems
Our WorkTrends and Directions
Outline
-
7/29/2019 multimedia information
3/33
Qi Tian Univ. of Texas at San Antonio
Motivation
With the explosive growth of digitalmedia data, there is a huge demand for
new tools and systems that enablesaverage users to more efficiently andmore effectively search, access, process,manage, author and share these digitalmedia contents.
Multimedia Information Retrieval
Multimedia Information Retrieval
-
7/29/2019 multimedia information
4/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Tex t -based I n fo r m a t i on Re t r i eva l Too many images to annotate High cost of human interpretation
Subjectivity of visual content, e.g., A picture is worth a thousand words
Con t en t -b ased Ret r ieva l automatically retrieves images, video, and
audio based on the visual and audio content
His to ry Conference on Database Applications of Pictorial Applications in
1 9 7 9 NSF workshop in 1 9 9 2 More active field since 1 9 9 7 when Internet and web browsing
became popular
-
7/29/2019 multimedia information
5/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
A VERY DI VERSI FI ED FI ELD!
Dat a Typ es Text, hypertext, image, audio, graphics, animation,
paintings, video/movie, rich text, spread sheet, slides,
combinations of these and user interaction
Research Pr ob l em s Systems, content, services, user, evaluation,
implementation, social/business, applications
Methodo log ies
Database, information retrieval, signal and imageprocessing, graphics, vision, human-computerinteraction, machine learning, statistical modeling,data mining, pattern analysis, data fusion, socialsciences, and domain knowledge for applications
-
7/29/2019 multimedia information
6/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Overview
Content-based Image Retrieval
Content-based Video Retrieval
Content-based Audio Retrieval
-
7/29/2019 multimedia information
7/33
Qi Tian Univ. of Texas at San Antonio
Approaches
Hierarchical LevelsHierarchical Levels
High-level
Bridge the semantic gap, integration of context andcontent, hybrid (text and content) approaches
Mid-level
Active Learning, Boosting, Incremental Learning
Low-level
Feature Extraction and Representation, DimensionReduction and Selection.
-
7/29/2019 multimedia information
8/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
metadata
User
Interface
Similarity
ranking
memory
FeatureweightingVisual
C++
Feature
Extraction
C/C++
Image DB
Color
Texture
structureOff-line
Automatic
Color
9Color histogram9Color moments9Color correlogram
Texture9Tamura texture
9Co-occurrencematrices9Gabor features
9Wavelet moments
Shape9Fourier descriptor
Structure9Edge-based features
Cont en t -based I m age Re t r ieva l
-
7/29/2019 multimedia information
9/33
QBIC by IBM AlmadenMARS by UIUC
WISE by StanfordWISE by Stanford
WebSEEK by ColumbiaWebSEEK by Columbia
Qi Tian Univ. of Texas at San Antonio
Approaches
Many CBIR systems have been built..Many CBIR systems have been built..
The g row ing l i st : ADL, AltaVista Photofinder, Amore, ASSERT, BDLP, Blobworld, CANDID, C-bird,Chabot, CBVQ, DrawSearch, Excalibur Visual RetrievalWare, FIDS, FIR, FOCUS, ImageFinder,
ImageMiner, ImageRETRO, ImageRover, ImageScape, Jacob, LCPD, MARS, MetaSEEk, MIR, NETRA,Photobook, Picasso, PicHunter, PicToSeek, QBIC, Quicklook2, SIMBA, SQUID, Surfimage, SYNAPSE,
TODAI, VIR Image Engine, VisualSEEk, VP Image Retrieval System, WebSEEk, WebSeer, WISE
-
7/29/2019 multimedia information
10/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Con t en t -b ased V ideo Ret r ieva l
Traditional Video Retrieval
Query-by-textual keyword
Automatic Visual Concept Detectione.g., indoor/outdoor, Sky, Car, Building, US-flag
Example concepts:Airplane, Building, Car, Crowd, Desert, Explosion,
Outdoor, People, Vehicle, Violence
Video Retrieval SceneHow to recognize a scene? Context
Use Proto-Concepts to describe context Machine learning to link context to concepts
Water
Sk y Sk ySky
Water
-
7/29/2019 multimedia information
11/33
Qi Tian Univ. of Texas at San Antonio
Overview
Multimedia Information RetrievalMultimedia Information Retrieval
Con t en t -b ased Aud io Re t r ieva l
to search sounds by their features in the waveform, statistics,
or transform domains
Speech, Music, Environment Audio, Silence
App l ica t ionsEntertainmentFilm making - searching sound effects
TV/radio studio - editing programsKaraoke, music stores, or online shopping- query by humming the melody
Audio/video archive management
Segmenting and indexing of raw recordingsSearching and browsing audio/video clips
SurveillanceMonitoring criminal or emergent eventsFilm rating
glass breaking,
explosion,
cry, shot,
Alarming!!!
Short time energy
Zero-crossing rate
Pitch period
MFCC
Spectrogram
LPC
-
7/29/2019 multimedia information
12/33
MIRMIR
Xiong, Zhou, Tian, Rui and Huang, Semantic Retrieval of Video, IEEE SP Mag., March 2006
Top 10 Problems in MIRTop 10 Problems in MIR
Bridge the Semantic Gap high level concept (sites, objects, events) and low-level visual/audio
features (color, texture, shape and structure, layout; motion; audio - pitch,
energy, etc.).How to Best Combine Human Intelligence and Machine Intelligence. Keep human in the loop, e.g. Relevance Feedback
New Query Paradigms Query by keywords, similarity, sketching an object, sketching a trajectory,
painting a rough image, etc. Can we think of useful new paradigms?Multimedia Data Mining Searching for interesting/unusual patterns and correlations in multimedia
has many important applications, including Web Search Engines anddealing with intelligence data.
Work to date on Data Mining has been mainly in Text data.
How to Use Unlabeled Data Active learning, e.g., in Relevance Feedback Label propagation, e.g., image/video annotation
-
7/29/2019 multimedia information
13/33
MIRMIR
Qi Tian Univ. of Texas at San Antonio
Top 10 Problems (continued)Top 10 Problems (continued)
Using Virtual Reality Visualization To Help Can we use 3D audio/visual visualization techniques to help a user
to navigate through the data space to browse and to retrieve? e.g., 3D MARS
Incremental Learning Change the parameters of the retrieval algorithms incrementally,
not needing to start from scratch every time we have new data.
Structuring Very Large Databases Researchers in audio/visual scene analysis and those in
Databases and Information Retrieval should really collaborateCLOSELY to find good ways of structuring very large multimedia
databases for efficient retrieval and search.
Performance Evaluation e.g., TRECVID for video retrieval, how about image retrieval?
What Are the Killer Applications of Multimedia Retrieval? e.g., medical multimedia document management
-
7/29/2019 multimedia information
14/33
Qi Tian Univ. of Texas at San Antonio
Approaches
Our Recent WorkOur Recent Work
UserFeatureExtraction
DataModeling
SimilarityEstimation
Dimension reduction
Statistical estimation
Dimension reduction
Statistical estimation
Classification
Ranking/indexing
Classification
Ranking/indexing
Training
Image DatabaseImage Database
Discr im inan t Ana lysi s
Useful in other applicationsUseful in other applications
-
7/29/2019 multimedia information
15/33
ApproachesApproaches
Qi Tian Univ. of Texas at San Antonio
Our Recent Work in CBIROur Recent Work in CBIR
Semantic Subspace Projection9 Bridge the semantic gap
Hybrid Discriminant Analysis9 Learn high dimensional data with small samples
Adaptive Discriminant Projection9 Adaptively learn a projection from data distribution
Distance Measure for Similarity Estimation9 Investigate the relations between probabilisticdistributions, distance metric, and mean estimation
-
7/29/2019 multimedia information
16/33
Qi Tian Univ. of Texas at San Antonio
Our Recent Work in CBIROur Recent Work in CBIR
Related Publications
Journalso IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
o IEEE Transactions on Circuits and Systems for Video Technology (CSVT)
o IEEE Multimedia (MM)
o International Journal on Computer Vision (IJCV)
o Pattern Recognition (PR)
o ACM Transactions on Knowledge Discovery from Data (TKDD)
o IEEE Transactions on Computational Biology and Bioinformatics
Conferenceso ACM Multimedia
o IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
o International Conference on Pattern Recognition (ICPR)
o Others including ICME, ICIP, ICASSP, MIR, CIVR
-
7/29/2019 multimedia information
17/33
Qi Tian Univ. of Texas at San Antonio
Web Image Search and Mining
Image Annotation
Affective Video Retrieval
Information Fusion in MIR
Integration of Context and Content for MultimediaManagement
Multimodal Emotion Recognition
Current Directions
-
7/29/2019 multimedia information
18/33
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Web SearchWeb Search 1.0 Traditional Text Retrieval
Web Search 2.0 Page-level Relevance Ranking
Web Search 3.0 Object-level Structured Search
Object Level Vertical Search (MSRA Libra:http://libra.msra.cn/
Live Product Search (http://products.live.com)
-
7/29/2019 multimedia information
19/33
Trends and DirectionsTrends and Directions
Qi Tian Univ. of Texas at San Antonio
Image AnnotationPhoto sharing through the Internet has become a commonpractice.
flickr.com: 19.5 million photos (30% growth/month), 2005Photo.net and airliners.net: millions of images
Most image search engines relies on textual descriptions ofthe images, e.g., Google, Yahoo, MSN
In general, people do not spend time labeling orannotating their personal photos
-
7/29/2019 multimedia information
20/33
Qi Tian Univ. of Texas at San Antonio
Image Annotation
Image Annotation System
A statistical model that can relate words to image features.
Sketch an image, extract feature vectors.
Descriptive words--top words ranked according to likelihood.
Current work
Li, Wang (alipr.com2006), Blei, Jordan (2003); Vasconcelos (UCSD-SML2007), Zhang et al. (MSRA 2005), Li et al. (MSRA 2007)
Promising Direction:
Web Image Annotation an integration of IR and Content Analysis
Can computer do this? Building
Sky
Lake Landscape
Tree
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
21/33
Qi Tian Univ. of Texas at San Antonio
Affective Video Retrieval
Affective:
a feeling or emotion as distinguished from cognition, thought, oraction
Real Multimedia RetrievalSearch for a subset of nicest holiday pictures to show them to friends
Selecting the most appropriate background music for the given situation
Search for the most impressive video clips
Search for the most appealing photographs of one and the same content
Search for all film comedies I like most
Alternative approach
Search for
9 Mood
9 Matches to users profile
Like/dislike, interest/no interest
Affective Video Retrieval
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
22/33
Qi Tian Univ. of Texas at San Antonio
Underlying IdeaVideo
Temporal content flow
Continuous transitions from
one affective state to another
Temporal measurement of Arousal
Valence
Combining the two curvesinto the Affect Curve
Arousal
t
Valence
t
Arousal
Valence
Affect curve
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
23/33
Qi Tian Univ. of Texas at San Antonio
Affective Media Content Characterization
Media Feature extraction Arousal = f1 (feature values)
Valence = f2 (feature values)
Mapping AV values onto
the 2D affect space
Affective content characterizationRomantic
feel-good
Hilarious
fun
Suspense
horror
A bit somber,
medium excitement
Hanjalic & Xu
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
24/33
Qi Tian Univ. of Texas at San Antonio
Hanjalic & Xu
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
25/33
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIR
Fusion: A merging of diverse, distinct, or separate elements into a
unified whole (Merriam-Webster dictionary).
Feature Extraction Module:
9 Multiple features ->vectors
9 Concatenated vector
9 Feature Fusion: more discriminating hyperspace can be found in the new
vector
Matching Module:9 One type of classifiers for multiple features or
9 Multiple types of classifiers for one feature or
9 Both
9 The output score can be combinedDecision Module:
9 The output decision of each classifier can be combined
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
26/33
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIR
Two Forms
Multi-modality9 e.g. Video clip->visual information, audio information, textural
information
9 Multi-modality fusion occurs at feature extraction module.
9 Single source information may be represented by multiple features,
e.g. Color image->color, texture, shape
Multi-Classifiers (Ensemble of Classifiers)
9 A set of classifiers are trained to solve the same problem
9 Applied on single or multiple source of information
9 A single type of base classifiers or
9 Different types of classifiers (Bayesian, K-NN, SVM)
Trends and DirectionsTrends and Directions
-
7/29/2019 multimedia information
27/33
Qi Tian Univ. of Texas at San Antonio
Information Fusion in MIR
Trends and DirectionsTrends and Directions
Fusion Schemes
The prediction of multiple classifiers need to be integrated into onefused decision by Fusion Scheme.9The output of different classifiers need to be normalized.
Rule-based9Decision is made by a simple operation on the output of all classifiers9 e.g. Max, Min, Sum, Mean (Matching Module)9 e.g. AND, OR (Decision Module)
Learning-based9The output of all classifiers is fed into a learning processto obtain the
final decision9 e.g. Decision Tree, Neutral Network
No one is guaranteed to be the best empirically or theoretically.
Applications9Multimodal Biometrics Fusion (face, fingerprint, iris, palmprint, voice,
hand geometry)9Audio/visual fusion for multimodal emotion recognition
-
7/29/2019 multimedia information
28/33
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content forMultimedia Management
Trends and DirectionsTrends and Directions
from Carlson & Hatfield, 1992
-
7/29/2019 multimedia information
29/33
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content forMultimedia Management
Trends and DirectionsTrends and Directions
An increasing number of active research in this direction
Crucial to human-human communication and humanunderstanding of multimedia.9 without context it is difficult for a human to recognize various
objects,
Enable that the (semi)automatic content analysis and indexingmethods become more powerful in managing multimedia data
Contextual information9
e.g., cell ID for the mobile phone location, GPS integrated in adigital camera, camera parameters, time information, andidentity of the producer
-
7/29/2019 multimedia information
30/33
Qi Tian Univ. of Texas at San Antonio
Integration of Context and Content forMultimedia Management (T-MM Special Issue)
Trends and DirectionsTrends and Directions
Topics of interest include:
Contextual metadata extraction
Models for temporal context, spatial context, imaging context(e.g., camerametadata), social context, and so on
Web context for online multimedia annotation, browsing, sharing and reuse
Context tagging systems, e.g., geotagging, voice annotation Context-aware inference algorithms
Context-aware multi-modal fusion systems (text, document, image, video,metadata, etc.)
Models for combining contextual and content information Context-aware interfaces and collaboration Novel methods to support and enhance social interaction, including
innovative ideas like social, affective computing, and experience capture. Applications such as using context and similarity for face and locationidentification
Context-aware mobile media technology and applications Using context to browse and navigate large media collections
-
7/29/2019 multimedia information
31/33
Projects in High ImpactFeatures
Web-based
Large Scale User participated
A combination of internet and multimedia
Examples: Photo Search Home photo management, mobile photo search, face search
Millions Books Project
Tremendous Space for Search and Multimedia Data Mining
Human-centered Multimedia Search Search for UCC data (user preference, profile, and opinion), Social search (public relations, names, personalization)
-
7/29/2019 multimedia information
32/33
Qi Tian Univ. of Texas at San Antonio
Acknowledgement
Current Collaborators
Academia
University of Illinois
University of Amsterdam
Chinese Academy of Science
University of Science and Technology of
China
Industry
Institute of Infocomm Research, Singapore
HP Labs, Palo Alto, CA
Microsoft Research (MSR) Redmond
Microsoft Research Asia (MSRA)NEC Labs America
Kodak Research Lab
IBM T.J. Watson Research Ctr
Funding Agencies
Army Research Office (ARO)
Department of Homeland Security (DHS)
San Antonio Life Science Institute(SALSI)
Center for Infrastructure Assurance andSecurity (CIAS)
Students
Jerry Yu (2007, Kodak Research, NJ)
Yijuan Lu (2008 expected)
Yuning Xu
-
7/29/2019 multimedia information
33/33
Qi Tian Univ. of Texas at San Antonio
Q & AQ & A
Thank y ou !
Qu est ion s ?