Video Retrieval
Topics
• Shot detection algorithm
• Video indexing
  • Key frame-based video indexing
  • Adaptive video indexing technique
• Automatic relevance feedback network for video retrieval
• Experiment on iARM: video search engine
Video Data
• Video is a continuous medium, but database storage and manipulation (such as random access) require the ability to handle portions of a video object.
• Video segmentation: cutting a long video into portions (shot, scene, and clip).
• A shot is the low-level syntactic building block of a video sequence.
• A scene is a logical grouping of shots into a semantic unit.
• A clip is not clearly defined; it can last from a few seconds to several hours.
Video Segmentation
Organization of Video Data
Shot Boundaries
• Shot boundary detection can be easy or difficult, depending on the type of transition:
  • cut: a hard boundary, a complete change of shot between consecutive frames
  • fade: fade-out or fade-in, a gradual fade to/from a completely black (or white) frame
  • dissolve: simultaneous fade-out and fade-in
  • others
• Each of these post-production techniques makes the detection of shot boundaries more difficult.
Transitions illustrated: cut, fade in, fade out, dissolve
Frame-to-frame comparison
Distances between consecutive frames: $d(h_1, h_2)$, $d(h_2, h_3)$, $d(h_3, h_4)$
$$d(h_m, h_n) = \sum_{i=1}^{N} |h_{mi} - h_{ni}|$$
where $h_j = [h_{j1}, h_{j2}, \ldots, h_{jN}]$ is the color histogram of the $j$-th frame.
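As a rough illustration of frame-to-frame comparison, the sketch below computes a bin-wise absolute difference between consecutive histograms and flags a cut wherever it exceeds a threshold. The function names, the fixed `threshold`, and the toy three-bin histograms are assumptions made for this example; the slides do not state the decision rule.

```python
from typing import List, Sequence


def histogram_distance(h_m: Sequence[float], h_n: Sequence[float]) -> float:
    """Sum of absolute bin differences between two colour histograms."""
    if len(h_m) != len(h_n):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h_m, h_n))


def detect_cuts(histograms: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return each frame index j whose distance to frame j-1 exceeds threshold."""
    return [
        j
        for j in range(1, len(histograms))
        if histogram_distance(histograms[j - 1], histograms[j]) > threshold
    ]


# Toy data (hypothetical): two frames of one shot, then two frames of another.
frames = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
cuts = detect_cuts(frames, threshold=1.0)  # threshold value is an assumption
```

A single fixed threshold like this tends to suit hard cuts only. Fades and dissolves spread the change over many frames, which is why the slides describe them as harder to detect.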
Shot Boundary Detection
Key-Frame for Shot Representation
Content Representation
• The content of a video shot is described by a low-level feature (e.g., a color histogram) of the corresponding key-frame.
• The m-th video shot is indexed by $h_m = [h_{m1}, h_{m2}, \ldots, h_{mN}]$
Video shot → key-frame → content descriptor $h_m$
Querying Video Database
Database:
Video shot 1: $h_1 = [h_{11}, h_{12}, \ldots, h_{1N}]$
Video shot 2: $h_2 = [h_{21}, h_{22}, \ldots, h_{2N}]$
Video shot 3: $h_3 = [h_{31}, h_{32}, \ldots, h_{3N}]$
…
Video shot J: $h_J = [h_{J1}, h_{J2}, \ldots, h_{JN}]$
Query shot: $h_q$, matched against the database descriptors
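The matching step can be sketched as a nearest-neighbour ranking of key-frame descriptors against the query descriptor. This is only an illustration: the distance (sum of absolute bin differences), the `rank_shots` helper, the shot identifiers, and the two-bin histograms are all assumptions for the example, not details given in the slides.

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute bin differences (assumed matching measure)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def rank_shots(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (shot id, distance) pairs, closest first."""
    scored = [(shot_id, l1_distance(query, h)) for shot_id, h in database.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


# Hypothetical database of key-frame histograms, one per shot.
database = {
    "shot_1": [1.0, 0.0],
    "shot_2": [0.5, 0.5],
    "shot_3": [0.0, 1.0],
}
ranked = rank_shots([1.0, 0.0], database, top_k=2)
```

Because each shot is reduced to one key-frame descriptor, this ranking sees only spatial content.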
GUI for Key Frame-Based Video Retrieval
Interface elements shown: query shot, play shot
Problems
• Compared to an image, video data contains both spatial and temporal information.
• The key frame-based video indexing (KFVI) method can deal with spatial content but does not take temporal information into account.
• Furthermore, KFVI is not well adapted to representing video at the scene and story levels.
Adaptive Video Indexing (AVI) Technique
• A better technique for capturing temporal content as well as spatial content, for effective video indexing
• AVI provides access to the video database at three levels:
  • shot
  • group of shots
  • story
Database Organization based on AVI
Multiple-level access to video database
• Video shot database: $VD_{Shot} = \{(D_{I_i}, I_i) \mid I_i \in F(I_{Shot})\}$
• Group of shots: $VD_{Group} = \{(D_{I_i}, I_i) \mid I_i \in F(I_{Group})\}$
• Story: $VD_{Story} = \{(D_{I_i}, I_i) \mid I_i \in F(I_{Story})\}$
where $D_{I_i}$ is the descriptor of the video interval $I_i$.
Fundamentals of AVI
• A video sequence is a collection of visual templates (i.e., image frames).
• Similar videos contain similar visual templates.
Visual templates: V1, V2, V3, V4, V5, V6, V7, V8, V9
Descriptor of shot 1: [0 0 2 0 3 0 0 2 0 ...]
Descriptor of shot 2: [0 0 0 0 3 2 0 0 0 ...]
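To see how two such template-count descriptors might be compared, the sketch below computes a cosine similarity between them. It assumes the cosine measure named in the results table is the one applied to these descriptors, and it uses only the nine entries shown on the slide (the elided "..." entries are left out).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two descriptors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# First nine entries of the two example descriptors shown on the slide.
shot_1 = [0, 0, 2, 0, 3, 0, 0, 2, 0]
shot_2 = [0, 0, 0, 0, 3, 2, 0, 0, 0]
similarity = cosine_similarity(shot_1, shot_2)
```

The two shots share only template V5, so the similarity is well below 1 even though both use that template three times.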
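Template Generation
• Given a set of initial visual templates, $C = \{g_r \mid r = 1, \ldots, R\}$, and training vectors $\{x_j\}_{j=1}^{J}$, $J \gg R$
• The templates are optimized through the following steps:
  • Randomly choose the input vector $x_j$
  • If $g_{r^*}$ is the closest node to $x_j$, such that $\|x_j - g_{r^*}\| \le \|x_j - g_r\|$, $r = 1, \ldots, R$, $r \ne r^*$
  • Then, $g_{r^*}(n+1) = g_{r^*}(n) + \alpha(n)\,(x_j - g_{r^*}(n))$, where $\alpha(n)$ is the step size at iteration $n$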
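The template optimization steps can be sketched as a simple competitive-learning loop: pick a training vector at random, find the closest template, and move that template toward the vector. The linearly decaying step size, the `steps` count, the fixed random seed, and the two-dimensional toy data are assumptions for illustration; the slides do not specify them.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def closest_template(x: Sequence[float], templates: Sequence[Sequence[float]]) -> int:
    """Index r* of the template nearest to x."""
    return min(range(len(templates)), key=lambda r: squared_distance(x, templates[r]))


def train_templates(
    templates: Sequence[Sequence[float]],
    training_vectors: Sequence[Sequence[float]],
    steps: int,
    initial_rate: float = 0.5,
    seed: int = 0,
) -> List[List[float]]:
    """Move the winning template toward each randomly drawn training vector."""
    rng = random.Random(seed)
    g = [list(t) for t in templates]
    for n in range(steps):
        x = rng.choice(training_vectors)
        r_star = closest_template(x, g)
        rate = initial_rate * (1.0 - n / steps)  # assumed linear decay schedule
        g[r_star] = [gi + rate * (xi - gi) for gi, xi in zip(g[r_star], x)]
    return g


# Toy training set (hypothetical): two clusters, one near 0.15 and one near 0.85.
data = [[0.1, 0.1], [0.2, 0.2], [0.8, 0.8], [0.9, 0.9]]
trained = train_templates([[0.0, 0.0], [1.0, 1.0]], data, steps=200)
```

Only the winning template is updated at each step, so each template drifts toward the cluster of vectors it already represents best.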
Template-frequency modeling (TFM)
• Let $D_I = \{(x_1, f_1), \ldots, (x_m, f_m), \ldots, (x_M, f_M)\}$ be a set of descriptors for the video interval $I$, where $x_m \in \mathbb{R}^p$ is the histogram corresponding to the video frame $f_m$
• Each $x_m$ is mapped to a Voronoi space of $C \subset \mathbb{R}^p$ through
$$x_m \rightarrow l^{(x_m)} = \{l_{r^*}, l_{r^*,1}, \ldots, l_{r^*,n}\}$$
where
$$l_{r^*} = \arg\min_r \left(\|x_m - g_r\|\right)$$
and $l_{r^*,n}$ is the label of the $n$-th cell neighboring the best-match cell, $g_{r^*}$
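A minimal sketch of the frame-to-label mapping follows. It assigns each frame vector the label of its best-matching template plus a few neighbouring labels, then counts how often each template is hit across a video interval. Treating the "neighbouring cells" as the next-closest templates to the frame is a simplifying assumption of this example, as are the function names and the one-dimensional toy data.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def map_frame_to_labels(
    x: Sequence[float],
    templates: Sequence[Sequence[float]],
    n_neighbors: int,
) -> List[int]:
    """Best-match label first, then n_neighbors next-closest labels (assumed)."""
    order = sorted(range(len(templates)), key=lambda r: squared_distance(x, templates[r]))
    return order[: 1 + n_neighbors]


def template_frequencies(
    frames: Sequence[Sequence[float]],
    templates: Sequence[Sequence[float]],
    n_neighbors: int,
) -> List[int]:
    """Count how many times each template label occurs over all frames."""
    freq = [0] * len(templates)
    for x in frames:
        for label in map_frame_to_labels(x, templates, n_neighbors):
            freq[label] += 1
    return freq


# Hypothetical one-dimensional templates and frame features.
templates = [[0.0], [1.0], [2.0], [5.0]]
frames = [[0.1], [0.2], [1.9]]
freq = template_frequencies(frames, templates, n_neighbors=1)
```

Including neighbouring labels softens the assignment, so two frames that fall just either side of a cell boundary still share labels.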
TFM (cont.)
• The resulting $l^{(x_m)}$, $m = 1, \ldots, M$, of all frames from the mapping of the entire video interval $I_j$ are used as a representation of the video through a weighting scheme:
$$v_j = (w_{j1}, \ldots, w_{jr}, \ldots, w_{jR})$$
$$w_{jr} = \frac{\mathrm{freq}_{jr}}{\max_r \mathrm{freq}_{jr}} \times \log\frac{N}{n_r}$$
where $\mathrm{freq}_{jr}$ is the number of times the template $g_r$ is mentioned in the content of the video $I_j$, $N$ denotes the total number of videos in the system, and $n_r$ denotes the number of videos in which the index template $g_r$ appears.
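The weighting scheme resembles term-frequency/inverse-document-frequency weighting, with templates playing the role of terms. The sketch below is one possible reading of it: each frequency is divided by the largest frequency in the same video and multiplied by log(N / n_r). The natural logarithm, the helper name `tfm_weights`, and the two toy frequency vectors (taken from the nine visible entries of the slide's example descriptors) are assumptions.

```python
import math
from typing import List, Sequence


def tfm_weights(freq_per_video: Sequence[Sequence[int]]) -> List[List[float]]:
    """Weight vector v_j for every video, from per-template frequencies."""
    n_videos = len(freq_per_video)
    n_templates = len(freq_per_video[0])
    # n_r: number of videos in which template r appears at least once.
    n_r = [sum(1 for f in freq_per_video if f[r] > 0) for r in range(n_templates)]
    weights = []
    for freq in freq_per_video:
        max_freq = max(freq)
        row = []
        for r in range(n_templates):
            if freq[r] == 0 or max_freq == 0:
                row.append(0.0)
            else:
                row.append(freq[r] / max_freq * math.log(n_videos / n_r[r]))
        weights.append(row)
    return weights


freqs = [
    [0, 0, 2, 0, 3, 0, 0, 2, 0],
    [0, 0, 0, 0, 3, 2, 0, 0, 0],
]
weights = tfm_weights(freqs)
```

With only two videos, a template that appears in both gets weight zero, since it does nothing to tell the videos apart.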
Test Data
Video sequences            # Sequences   # Cuts   # Frames   Length (min:sec)
Commercial                 20            844      98,733     54:52
Movie clip                 2
Headline and story news    46
Description of sequences in the database: CNN broadcast news (at 352 resolution and 30 frames/sec.)
Retrieval Results
A comparison of retrieval performance at the shot level: (a) obtained by KFVI; (b) obtained by AVI
Performance Comparison
Precision results averaged over 25 queries, comparing adaptive video indexing (AVI) with key-frame based video indexing (KFVI), using a video database containing 844 video shots
Query-by-Video-Clip
Precision and recall rates obtained by retrieval of:
(a) video groups, employing two links: shot-to-group (STG) and group-to-group (GTG)
(b) video stories, employing two links: shot-to-story (STS) and group-to-story (GTS)
Query-by-Video-Clip (cont.)
(a) Query clip, <1.8 sec>
(b) Rank 1, <1.8 sec>
(c) Rank 2, <2.4 sec>
(d) Rank 3, <1.9 sec>
(e) Rank 4, <2.7 sec>
(f) Rank 5, <3.3 sec>
Relevance Feedback for Video Retrieval: A client-server architecture
Search Engine with Relevance Feedback
Problem with Relevance Feedback (RF)
• The user has to play each retrieved video in a feedback cycle
• Compared to images, video files are usually very large:
  • time consuming
  • high bandwidth required in the RF training process
Automatic and Semi-Automatic RFs
Search Engine with Automatic Relevance Feedback Network
Automatic Relevance Feedback Network (ARFN)
• Goal: implement an adaptive system to improve retrieval accuracy
• Strategy: incorporate a self-learning neural network into the relevance feedback module in order to avoid user interaction during the retrieval process
ARFN Architecture
Number of nodes in the second layer = number of visual templates
Number of nodes in the third layer = number of videos in the database
Signal Propagation
(a) Forward propagation; (b) backward propagation; (c) new video template nodes in (b) introduce a new video node. This process results in the activation of new video nodes by expanding the original query templates, analogous to the traditional relevance feedback technique.
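Signal Propagation (cont.)
• The activation level at the video template nodes, $a_r^{(t)}$, can be calculated according to two criteria:
  • Positive feedback:
$$a_r^{(t)} = \sum_{j \in Pos} a_j^{(v)} \bar{w}_{jr}, \qquad \bar{w}_{jr} = \frac{w_{jr}}{\sqrt{\sum_{r=1}^{R} w_{jr}^2}}$$
  • Positive and negative feedback:
$$a_r^{(t)} = f(L_r), \qquad L_r = w_{qr} + \sum_{j \in Pos} a_j^{(v)} w_{jr} - \sum_{j \in Neg} a_j^{(v)} w_{jr}$$
where $a_j^{(v)}$ is the activation of the $j$-th video node, Pos is the set of positive video nodes, Neg is the set of negative video nodes, and $f(\cdot)$ is the node activation function.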
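The two criteria can be sketched in a few lines. For positive feedback, each positive video node passes its activation to the template nodes through weights normalised to unit length. For positive and negative feedback, the sketch returns only the net input built from the query weights plus positive contributions minus negative ones, because the activation function applied to it is not legible in the slides. All function names, the dictionary layout, and the numbers are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence


def normalise(weights: Sequence[float]) -> List[float]:
    """Scale a weight vector to unit Euclidean length (zeros stay zeros)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else [0.0] * len(weights)


def template_activation_positive(
    video_weights: Dict[int, Sequence[float]],
    video_activation: Dict[int, float],
    positive: Iterable[int],
) -> List[float]:
    """Positive feedback: sum of a_j * normalised w_jr over positive videos."""
    n_templates = len(next(iter(video_weights.values())))
    activation = [0.0] * n_templates
    for j in positive:
        w_bar = normalise(video_weights[j])
        for r in range(n_templates):
            activation[r] += video_activation[j] * w_bar[r]
    return activation


def net_input_pos_neg(
    query_weights: Sequence[float],
    video_weights: Dict[int, Sequence[float]],
    video_activation: Dict[int, float],
    positive: Iterable[int],
    negative: Iterable[int],
) -> List[float]:
    """Net input: query weight + positive contributions - negative contributions."""
    net = list(query_weights)
    for j in positive:
        for r, w in enumerate(video_weights[j]):
            net[r] += video_activation[j] * w
    for j in negative:
        for r, w in enumerate(video_weights[j]):
            net[r] -= video_activation[j] * w
    return net


# Hypothetical network state: two video nodes, two template nodes.
video_weights = {0: [3.0, 4.0], 1: [0.0, 2.0]}
video_activation = {0: 1.0, 1: 0.5}
pos_only = template_activation_positive(video_weights, video_activation, positive=[0])
net = net_input_pos_neg([1.0, 1.0], video_weights, video_activation, [0], [1])
```

Templates that appear in positive videos gain activation and those tied to negative videos lose it, which is how the query's template set gets expanded without user input.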
Results
# Returned   Cosine measure   ARFN, 1 iter.   ARFN, 3 iter.   ARFN, 20 iter.
1            100              0.0             0.0             0.0
2            100              0.0             0.0             0.0
3            98.67            +1.33           +1.33           -1.33
4            97.00            +1.00           +2.00           -2.00
5            96.00            +0.80           +1.60           -1.60
6            94.67            +0.67           +2.00           -1.33
7            90.29            +2.29           +3.43           +1.14
8            89.00            +3.00           +2.50           +1.50
9            86.67            +2.67           +3.11           +2.67
10           82.80            +5.60           +5.60           +4.80
11           80.36            +6.18           +6.18           +5.09
12           77.67            +7.33           +7.67           +7.00
13           74.77            +8.62           +10.15          +8.31
14           72.00            +9.43           +11.14          +9.72
15           69.33            +9.87           +11.20          +10.13
16           67.75            +9.50           +11.00          +10.00
Average Precision Rate, APR (%), obtained by retrieving 25 video shot queries. ARFN results are quoted relative to the APR observed with simple retrieval.
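To make the APR figures concrete, here is a minimal sketch of precision over the top k returned shots, averaged across queries and expressed as a percentage. It assumes precision means relevant returned shots divided by k; the relevance judgments below are made-up toy data, not the experiment's.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the first k returned shots that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r) / k


def average_precision_rate(all_relevance: Sequence[Sequence[bool]], k: int) -> float:
    """Precision at k averaged over all queries, as a percentage."""
    total = sum(precision_at_k(r, k) for r in all_relevance)
    return 100.0 * total / len(all_relevance)


# Toy relevance judgments for two queries, in ranked order (hypothetical).
judgments = [
    [True, True, False],
    [True, False, False],
]
apr_at_1 = average_precision_rate(judgments, k=1)
apr_at_2 = average_precision_rate(judgments, k=2)
```

Under this reading, a relative entry such as +1.33 would be added to the cosine-measure APR of the same row to give the ARFN figure.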
Experiment: Video Search Engine
Goals
• Set up a video search engine at the shot level, using JSP and a J2EE server
• Implement video indexing using AVI and compare it with KFVI
• Implement a simple user-controlled interactive retrieval method within the search engine
GUI in iARM search engine
Interface elements shown: query shot, selected method
Step I
• Copy all the files in the folder “Experiments” to drive C:
Folder contents shown: feature database, key-frame database, JSP file and Java Beans, video shot database
Step II: load feature vectors to database
>> java COM.cloudscape.tools.cview
Open the video feature database “C:\Experiments\database\video
Step III: deploy application
deploytool
New Application
Add Web components to the Application:
• index.jsp
• autoFeedback.class
• CompType.class
• MyDateJose.class
• MyLocalRbf.class
• userData.class
Deploy the Application
Open the search engine: “http://localhost:8000/iARM/index.jsp”