Document Preprocessing and Indexing SI650: Information Retrieval
Video Indexing and Retrieval
-
Upload
hadley-franco -
Category
Documents
-
view
58 -
download
0
description
Transcript of Video Indexing and Retrieval
00-04-25 Kyongil Yoon 1/39
Video Indexing and Video Indexing and RetrievalRetrieval
CMSC828KCMSC828K
Kyongil YoonKyongil Yoon
00-04-25 Kyongil Yoon 2/39
ContentsContents
Part 1 : SurveyPart 1 : Survey
““Multimedia Database Management Systems”Multimedia Database Management Systems”Guojun LuGuojun Lu
Chapter 7. Video Indexing and RetrievalChapter 7. Video Indexing and Retrieval
Part 2 : ExamplePart 2 : Example
Automatic Video Indexing via Object Motion AnalysisAutomatic Video Indexing via Object Motion AnalysisJonathan D. CourtneyJonathan D. CourtneyTexas InstrumentsTexas Instruments
00-04-25 Kyongil Yoon 3/39
IntroductionIntroduction
VideoVideo– A combination of text, audio, and images with a time dimensionA combination of text, audio, and images with a time dimension
Indexing and retrieval methodsIndexing and retrieval methods– Metadata-based methodMetadata-based method
– Text-based methodText-based method
– Audio-based methodAudio-based method
– Content-based methodContent-based method
• Video : A collection of independent images or framesVideo : A collection of independent images or frames
• Video : A sequence of groups of similar frames (shot-based)Video : A sequence of groups of similar frames (shot-based)
– Integrated approachIntegrated approach
00-04-25 Kyongil Yoon 4/39
Shot-Based Video …Shot-Based Video …
Video shot : logical unit or segmentVideo shot : logical unit or segment– Same sceneSame scene– Single camera motionSingle camera motion– A distinct event or an actionA distinct event or an action– A single indexable eventA single indexable event
QueryQuery– Which video?Which video?– What part of video?What part of video?
StepsSteps– Segment the video into shotsSegment the video into shots– Index each shotsIndex each shots– Apply a similarity measurement between queries and video shotsApply a similarity measurement between queries and video shots
Retrieve shots with high similaritiesRetrieve shots with high similarities
00-04-25 Kyongil Yoon 5/39
Shot Detections (Segmentation)Shot Detections (Segmentation)
SegmentationSegmentation– A process of dividing a video sequence into shotsA process of dividing a video sequence into shots
Key issue Key issue – Establishing suitable difference metricsEstablishing suitable difference metrics
– Techniques for applying themTechniques for applying them
TransitionTransition– Camera breakCamera break
– Dissolve, wipe, fade-in, fade-outDissolve, wipe, fade-in, fade-out
00-04-25 Kyongil Yoon 6/39
Basic Video Segment TechniquesBasic Video Segment Techniques
Sum of pixel-to-pixel differencesSum of pixel-to-pixel differences Color histogram differenceColor histogram difference
– To be tolerant with object motionTo be tolerant with object motion
– SDSDii = = jj||HHii(j)-H(j)-Hi+1i+1(j)(j)|| wherewhere i : frame number, j : gray leveli : frame number, j : gray level
Modification of color histogramModification of color histogram– SDSDii = = jj((((HHii(j)-H(j)-Hi+1i+1(j)(j)))22 / / HHi+1i+1(j)(j))) 22 test test
Selection of appropriate threshold - CriticalSelection of appropriate threshold - Critical– e.g.) The mean of the frame-to-frame differencee.g.) The mean of the frame-to-frame difference
+ small tolerance value + small tolerance value
00-04-25 Kyongil Yoon 7/39
Detecting Gradual ChangeDetecting Gradual Change
Fade-in, fade-out, dissolve, wipe, …Fade-in, fade-out, dissolve, wipe, … Twin-comparison techniqueTwin-comparison technique
– TTbb : Normal camera breaks : Normal camera breaksTTss : Potential frames of gradual change : Potential frames of gradual change
– If If TTbb < diff < diff shot boundaryshot boundaryTTss < diff < T < diff < Tbb accumulate differencesaccumulate differences diff < T diff < Tss nothingnothing
– If the accumulated value is greater than TIf the accumulated value is greater than Tbb, a gradual change is , a gradual change is detected.detected.
Detection techniques based on wavelet transformationDetection techniques based on wavelet transformation
Very hard to detect!Very hard to detect!
00-04-25 Kyongil Yoon 8/39
False Shot DetectionFalse Shot Detection
Camera panning, tilting, and zoomingCamera panning, tilting, and zooming– Motion analysis techniquesMotion analysis techniques– Camera movementsCamera movements
• Optical flow computed by block matching methodOptical flow computed by block matching method Illumination changeIllumination change
– Normalization of color images before carrying out shot detectionNormalization of color images before carrying out shot detection
1.1. RRii’ = R’ = Rii / Sqrt( / Sqrt( NN R Rii2 2 ), G), Gii’ = …, B’ = …, Bii’ = …’ = …
2.2. ChromaticityChromaticity
1)1) rrii’ = R’ = Rii’ / (R’ / (Rii’ + G’ + Gii’ + B’ + Bii’)’)
2)2) ggii’ = R’ = Rii’ / (R’ / (Rii’ + G’ + Gii’ + B’ + Bii’)’)
3.3. A combined histogram for r and g : CHI (Chromaticity histogram image)A combined histogram for r and g : CHI (Chromaticity histogram image)4.4. Reduce it to 16x16Reduce it to 16x165.5. 2D DCT2D DCT6.6. Pick only 36 significant DCT valuesPick only 36 significant DCT values7.7. Distances are calculated based on these valuesDistances are calculated based on these values
00-04-25 Kyongil Yoon 9/39
Other Shot DetectionOther Shot Detection
Motion removalMotion removal– Ideally, frame-to-frame distance should beIdeally, frame-to-frame distance should be
• Close to zero with very little variation within a shotClose to zero with very little variation within a shot
• Significantly larger than within-values between shotsSignificantly larger than within-values between shots
– However, within a shotHowever, within a shot
• Object motion, camera motion, other changesObject motion, camera motion, other changes
• Filter to remove the effects of camera/object motionFilter to remove the effects of camera/object motion
Based on edge detectionBased on edge detection Advanced camerasAdvanced cameras
– Recording extra information such as position, time, orientation, …Recording extra information such as position, time, orientation, …
00-04-25 Kyongil Yoon 10/39
Segmentation of Compressed VideoSegmentation of Compressed Video
Based on MPEG compressed videoBased on MPEG compressed video– DCT coefficientsDCT coefficients
– Motion informationMotion information
• E.g. # of bidirectional coded macro blocks in B frame, it is very likely E.g. # of bidirectional coded macro blocks in B frame, it is very likely shot boundary occurs around the B frameshot boundary occurs around the B frame
Based on VQ compressed videoBased on VQ compressed video
00-04-25 Kyongil Yoon 11/39
Video Indexing and RetrievalVideo Indexing and Retrieval
Shot detection is preprocessing for indexingShot detection is preprocessing for indexing
R (representative) framesR (representative) frames– One or more key frames for each shotOne or more key frames for each shot
– Retrieval is based on these framesRetrieval is based on these frames
Other informationOther information– Motion, objects, metadata, annotationMotion, objects, metadata, annotation
00-04-25 Kyongil Yoon 12/39
Based on R framesBased on R frames
An r frame captures the main content of the shotAn r frame captures the main content of the shot Image retrieval : color, shape, texture, …Image retrieval : color, shape, texture, … Choosing r framesChoosing r frames
– How many?How many?1.1. One per shotOne per shot2.2. The number of r frames according to their lengthThe number of r frames according to their length3.3. One per subshot/sceneOne per subshot/scene
– How to select?How to select?1.1. First frame of segmentFirst frame of segment2.2. An average frameAn average frame3.3. The frame whose histogram is closest to the average histogramThe frame whose histogram is closest to the average histogram4.4. Large background + all foregrounds superimposedLarge background + all foregrounds superimposed
– First frame + frame with large distanceFirst frame + frame with large distance
00-04-25 Kyongil Yoon 13/39
R frame base ignores temporal or motion informationR frame base ignores temporal or motion information Motion information is derived from optical flow or Motion information is derived from optical flow or
motion vectorsmotion vectors Parameters for indexingParameters for indexing
– Content : Content : talking head vs car crashtalking head vs car crash
– Uniformity : Uniformity : smoothness as a function of timesmoothness as a function of time
– Panning : Panning : horizontal camera movementhorizontal camera movement
– Tilting : Tilting : vertical camera movementvertical camera movement
Based on Motion InformationBased on Motion Information
Camera motionCamera motion–Pan, tilt, zoom, swing, (horizontal/vertical) shiftPan, tilt, zoom, swing, (horizontal/vertical) shift
00-04-25 Kyongil Yoon 14/39
Based on ObjectsBased on Objects
Content based representationContent based representation If one could find a way to distinguish individual If one could find a way to distinguish individual
objects throughout he sequence, …objects throughout he sequence, … In a still image, object segmentation is difficultIn a still image, object segmentation is difficult
In a video sequence, we can group pixels that move In a video sequence, we can group pixels that move together into an object.together into an object.
MPEG-4 object-based codingMPEG-4 object-based coding– How to representHow to represent
– NOT how to segment and detectNOT how to segment and detect
00-04-25 Kyongil Yoon 15/39
Based on OthersBased on Others
MetadataMetadata– DVD-SI : DVD service informationDVD-SI : DVD service information
Title, video type, directorsTitle, video type, directors
AnnotationAnnotation1.1. ManuallyManually
2.2. Associated transcripts or subtitlesAssociated transcripts or subtitles
3.3. Speech recognition on sound trackSpeech recognition on sound track
Integrated methodIntegrated method
00-04-25 Kyongil Yoon 16/39
Effective Video Representation and Effective Video Representation and AbstractionAbstraction
Useful to have effective representation and Useful to have effective representation and abstraction toolabstraction tool
How to show contents in a limited spaceHow to show contents in a limited space ApplicationsApplications
– Video browsingVideo browsing
– Presentation of video resultsPresentation of video results
– Reduce network bandwidth requirements and delayReduce network bandwidth requirements and delay
Then how?Then how?
00-04-25 Kyongil Yoon 17/39
Representation and AbstractionRepresentation and Abstraction
Topical or subject classificationTopical or subject classification– News : (local, international, finance, sport, weather)News : (local, international, finance, sport, weather)
Motion icon (Motion icon (miconmicon) or video icon) or video icon– Easy shot boundary representationEasy shot boundary representation
– Operations : browsing, slicing, extraction a subiconOperations : browsing, slicing, extraction a subicon
Video streamerVideo streamer ClipmapClipmap
– A window containing a collection of 3D miconsA window containing a collection of 3D micons
Hierarchical video browserHierarchical video browser
00-04-25 Kyongil Yoon 18/39
Representation and AbstractionRepresentation and Abstraction
StoryboardStoryboard– A collection of representative framesA collection of representative frames
MosaickingMosaicking– An algorithm to combine information from a number of framesAn algorithm to combine information from a number of frames
Scene transition graphScene transition graph– Node : image which represents one or more video shotsNode : image which represents one or more video shots
– Edge : the content and temporal flow of videoEdge : the content and temporal flow of video
Video skimmingVideo skimming– High-level video characterization, compaction, and abstractionHigh-level video characterization, compaction, and abstraction
00-04-25 Kyongil Yoon 19/39
Automatic Video Indexing via Object Motion AnalysisAutomatic Video Indexing via Object Motion AnalysisAs an Object Tracking ExampleAs an Object Tracking Example
Video indexingVideo indexing– The process of identifying important frames or objects in the video The process of identifying important frames or objects in the video
data for efficient playbackdata for efficient playback
– Scene cut detection, camera motion, object motionScene cut detection, camera motion, object motion
– Hierarchical segmentationHierarchical segmentation
Three stepsThree steps– Motion segmentation, object tracking, motion AnalysisMotion segmentation, object tracking, motion Analysis
EventsEvents– Appearance/DisappearanceAppearance/Disappearance
– Deposit/RemovalDeposit/Removal
– Entrance/ExitEntrance/Exit
– Motion/RestMotion/Rest
00-04-25 Kyongil Yoon 20/39
Motion SegmentationMotion Segmentation
Segmented Image CSegmented Image Cnn
– CCnn = ccomps( T = ccomps( Thh••k )k )
Th : binary image resulting from thresholding |ITh : binary image resulting from thresholding |Inn-I-I00||
TThh••k : morphological close operation on Tk : morphological close operation on Thh
– Reference frame IReference frame I00
Strong assumptions may fail whenStrong assumptions may fail when– Sudden lightingSudden lighting
– Gradual lightingGradual lighting
– Change of viewpointChange of viewpoint
– Objects in reference frameObjects in reference frame
00-04-25 Kyongil Yoon 21/39
Imperfectness of SegmentationImperfectness of Segmentation
All the possible problemsAll the possible problems– True objects will disappear temporarilyTrue objects will disappear temporarily
– False objectsFalse objects
– Separate objects will temporarily join togetherSeparate objects will temporarily join together
– Single objects will split into multiple regionsSingle objects will split into multiple regions
00-04-25 Kyongil Yoon 22/39
Object TrackingObject Tracking
TerminologyTerminology– SequenceSequence - ordered set of N frames - ordered set of N frames
SS = = {{FF00, F, F11, … F, … FN-1N-1}} : : FFii is i-th frameis i-th frame
– ClipClip CC = ( = (S, f, s, lS, f, s, l) : ) : FFff,F,Fll - first and last valid frame, - first and last valid frame, FFss - start frame - start frame
– FrameFrame FF : image : image II annotated with a timestamp annotated with a timestamp tt, , FFnn = ( = (IInn, t, tnn))
– ImageImage II : : r r x x cc array of pixel array of pixel
– TimestampTimestamp records the date and the time records the date and the time
V-objectV-object– Extracted by motion segmentation comparing a frame to a reference Extracted by motion segmentation comparing a frame to a reference
frameframe
– Label, centroid, bounding box, shape maskLabel, centroid, bounding box, shape mask
– VVnn = { = { VVnnpp; p = 1, … P; p = 1, … P } }
00-04-25 Kyongil Yoon 23/39
Object TrackingObject Tracking Tracking procedureTracking procedure
– Iterate (forward) step 1-3 for frames 0, 1, …, N-2Iterate (forward) step 1-3 for frames 0, 1, …, N-21. For each V-object, predict its position in next frame1. For each V-object, predict its position in next frame
6. 6. Determine secondary links from forward stepDetermine secondary links from forward step
7. Determine secondary links from backward step7. Determine secondary links from backward step
np = n
p + np(tn+1-tn)n
p = np + n
p(tn+1-tn)
2. For each V-object, determine the V-object in the next frame with centroid 2. For each V-object, determine the V-object in the next frame with centroid nearest to the predictionnearest to the prediction
3. For every pair, estimate forward velocity3. For every pair, estimate forward velocity
4. Do 1-3 in backward4. Do 1-3 in backward
- For all frames- For all frames
5. Determine primary links5. Determine primary linksfor mutual nearest neighborfor mutual nearest neighbor
00-04-25 Kyongil Yoon 24/39
Object TrackingObject Tracking
Following graph is producedFollowing graph is produced– Node - V-objectsNode - V-objects
– Primary links (mutually nearest)Primary links (mutually nearest)
– Secondary links (others)Secondary links (others)
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
1-dim rep.positions
00-04-25 Kyongil Yoon 25/39
Motion AnalysisMotion AnalysisV-object GroupingV-object Grouping
Group V-objects with difference levelsGroup V-objects with difference levels– Stem, Stem, MM
– Branch, Branch, BB
– Trail, Trail, LL
– Track, Track, KK
– Trace, Trace, EE
E E K K L L B B MM Each level implies a feature of the blobEach level implies a feature of the blob
00-04-25 Kyongil Yoon 26/39
V-object Grouping - StemV-object Grouping - Stem Maximal size path of two or more V-objects with no secondary linksMaximal size path of two or more V-objects with no secondary links MM = { = {VVii : : i = 1, 2i = 1, 2, … , … NNM}}
– outdegree(outdegree(VVii) = 1 for 1 ) = 1 for 1 ii < < MM
– indegree(indegree(VVii) = 1 for 1<) = 1 for 1<i i MM
– either either 11 = = 22 = … = = … = NNMM
or or 11 22 … … NNMM
Stationary/movingStationary/moving
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
AA
AD
D
B B B B/EE/F F
C C C C C H H H H
GG
II
KK
JJ
V-objects with constant/no movement without any change
V-objects with constant/no movement without any change
00-04-25 Kyongil Yoon 27/39
V-object Grouping - BranchV-object Grouping - Branch Maximal size path containing no secondary links and composed Maximal size path containing no secondary links and composed
with only one pathwith only one path BB = { = {VVii : : i = 1, 2, …, Ni = 1, 2, …, NBB}}
– outdegree(outdegree(VVii) = 1 for 1 ) = 1 for 1 ii < < BB
– indegree(indegree(VVii) = 1 for 1<) = 1 for 1<i i BB
Stationary(one stem) / moving(otherwise)Stationary(one stem) / moving(otherwise)
L M M M MM M
LL
OO
N N N N N Q Q Q Q T
PP
S
RR
T
S
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
V-objects without changeV-objects without change
00-04-25 Kyongil Yoon 28/39
V-object Grouping - TrailV-object Grouping - Trail
LL Maximal-size path without secondary linksMaximal-size path without secondary links Stationary/moving/unknown Stationary/moving/unknown
U V V V VV V
UU
UU
W W W W W W W W W W Y
XX
Y Z
YY
Y
Y
Z
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
V-objects with main movementV-objects with main movement
00-04-25 Kyongil Yoon 29/39
V-object Grouping - TrackV-object Grouping - Track KK = { = {LL11, G, G11, … ,L, … ,LNNK-1K-1
, G, GNNK-1K-1, L, LNNKK
}}– LLii : trail: trail– GGi i : connecting dipath with constant velocity: connecting dipath with constant velocity
through through H = { H = {VVllii, G, Gii, V, V11
i+1i+1}} wherewhere VVll
i i is the last object of is the last object of LLii and and VV11i+1 i+1 is the first object ofis the first object of L Li+1i+1
Stationary/moving/unknownStationary/moving/unknown
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
V-objects with movementguessing occluded part
V-objects with movementguessing occluded part
00-04-25 Kyongil Yoon 30/39
V-object Grouping - TraceV-object Grouping - Trace
EE Maximal size connected digraph of V-objectsMaximal size connected digraph of V-objects
1 2 2 2 22 2
11
11
1 1 1 1 1 1 1 1 1 1 1
11
1 1
11
1
1
1
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
Group of V-objects overlappedGroup of V-objects overlapped
00-04-25 Kyongil Yoon 31/39
EventsEvents
AppearanceAppearance - - an object emerges in the scenean object emerges in the scene Disappearance Disappearance - - an object disappears from the scenean object disappears from the scene Entrance Entrance - - moving object enters the scenemoving object enters the scene Exit Exit - - moving objects exits from the scenemoving objects exits from the scene Deposit Deposit - - an inanimate object is added to the scenean inanimate object is added to the scene Removal Removal - - an inanimate object is removed from the scenean inanimate object is removed from the scene Motion Motion - - an object at rest begins to movean object at rest begins to move Rest Rest - - a moving object comes to a stopa moving object comes to a stop (Depositor) (Depositor) - - a moving object adds an inanimate object to a moving object adds an inanimate object to
the scene the scene (Remover) (Remover) - - a moving object removes an inanimate object a moving object removes an inanimate object
from the scene from the scene
00-04-25 Kyongil Yoon 32/39
Annotating V-objectsAnnotating V-objects
Appearance 1. Head of track2. Indegree(V) > 0
1. Head of track2. Indegree(V) = 0
1. Head of track2. Indegree(V) = 1
1. Head of track2. Indegree(V) = 0
Adjacent to V-object with deposit tag
Adjacent from V-object with removal tag
1. Tail of stationary stem2. Head of moving stem1. Tail of moving stem2. Head of stationary stem
DisappearanceEntrance
Exit
Deposit
Removal
(Depositor)
(Remover)
Motion
Rest
1. Tail of track2. Outdegree(V) > 01. Head of track2. Indegree(V) = 01. Tail of track2. Outdegree(V) = 0
1. Tail of track2. Outdegree(V) = 0
1. Tail of track2. Outdegree(V) = 0
1. Tail of track2. Outdegree(V) = 1
Moving Stationary Unknown
V-object motion state
00-04-25 Kyongil Yoon 33/39
Example of AnnotationExample of Annotation
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14
Entrance Entrance
EntranceExit
Exit
ExitDepositor/Deposit Removal/Remover
Motion Rest Appearance Disappearance
00-04-25 Kyongil Yoon 34/39
QueryQuery
Y = (Y = (CC, , TT, , VV, , RR, , EE))– CC : a video clip : a video clip
– T = (tT = (tii, t, tjj)) : a time interval within the clip : a time interval within the clip
– VV : V-object in the clip : V-object in the clip
– RR : a spatial region in the field of view : a spatial region in the field of view
– E E : an object motion event: an object motion event
Processing a queryProcessing a query– Keeps truncating domain with query parametersKeeps truncating domain with query parameters
00-04-25 Kyongil Yoon 35/39
Experimental ResultExperimental Result
3 videos, 900 frames, 18 objects, 44 events3 videos, 900 frames, 18 objects, 44 events
1 false negative, 10 false positive1 false negative, 10 false positive ConservativeConservative
Video 1Video 1 Video 2Video 2 Video 3Video 3
Inventory or Security Inventory or Security monitoringmonitoring300 frs, 10fr/sec300 frs, 10fr/sec5 objects, 10 events5 objects, 10 eventsentrance/exit, entrance/exit, deposit/removaldeposit/removal
retail customer retail customer monitoringmonitoring285 frames, 10 fr/sec285 frames, 10 fr/sec4 objects, 14 events4 objects, 14 eventsall eight eventsall eight events3 foreground objects in 3 foreground objects in ref. frameref. frameMost complicatedMost complicated
parking lot traffic parking lot traffic monitoringmonitoring315 frames, 3fr/sec315 frames, 3fr/sec9 objects, 20 events9 objects, 20 eventsmost noisymost noisy
00-04-25 Kyongil Yoon 36/39
Errors come fromErrors come from
Noise in the sequenceNoise in the sequence Assumption of constant trajectories of occluded Assumption of constant trajectories of occluded
objectsobjects No means to track objects through occlusion by fixed No means to track objects through occlusion by fixed
scene objectsscene objects
00-04-25 Kyongil Yoon 37/39
MosaickingMosaicking
00-04-25 Kyongil Yoon 38/39
Story board, Video MultiplexingStory board, Video Multiplexing
Show 20 minutes of video in 6 secondsShow 20 minutes of video in 6 seconds Loop all shots as thumbnails at same timeLoop all shots as thumbnails at same time Let the user focus on the interesting shotsLet the user focus on the interesting shots
00-04-25 Kyongil Yoon 39/39
MiconMicon