CVHCI-Lecture-Shot boundary TV genre v3
Transcript of CVHCI-Lecture-Shot boundary TV genre v3
![Page 1: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/1.jpg)
Content-based Image and Video Retrieval
Vorlesung, SS 2009
Shot Boundary Detection & TV Genre ClassificationTV Genre Classification
Hazım Kemal Ekenel, [email protected]
Rainer Stiefelhagen, [email protected]
CV-HCI Research Group: http://isl.ira.uka.de/cvhci
18.05.2009
![Page 2: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/2.jpg)
Outline
Shot Boundary Detection
Definition
Types of shot boundary
Detection methods Detection methods
TV Genre Classification
Features
Sample systems
2
![Page 3: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/3.jpg)
Shot Boundary Detection
3
![Page 4: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/4.jpg)
Video
4
![Page 5: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/5.jpg)
Hugely important technology for archiving, content analysis, the Internet etc.
Need for tools to support automatic browsing and retrieval of large amounts of broadcast video.
Some Jargon…..
Digital Video Processing
Some Jargon….. A Frame is 1/25 (for PAL) of a second of video.
A Shot is a sequence of frames captured by a single camera in a single continuous action.
A Shot Boundary is the transition between two shots. Can be abrupt (cut) or gradual (fade, dissolve, wipe, morph).
A Scene is a logical grouping of shots into a semantic unit.
![Page 6: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/6.jpg)
Scene Scene
Video Sequence
…….
Shots and Scenes
Shot Shot Shot …….
Shot Boundaries…….F F F F
![Page 7: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/7.jpg)
Types of Transitions
Identity class: Neither of the two shots involved are modified, and no additional edit frames are added. Hard cuts.
Spatial class: Some spatial transformations are applied to the two shots involved. Wipe, page turn, slide, and iris effects.slide, and iris effects.
Chromatic class: Some color space transformations are applied to the two shots involved. Fade and dissolve effects.
Spatio-Chromatic class: Some spatial as well as some color space transformations are applied to the two shots involved. Morphing effects.
7
![Page 8: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/8.jpg)
Types of Transitions
Cut Fade Out/In Dissolve
8
Wipe
![Page 9: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/9.jpg)
Why do we need Shot Boundary Detection ?
Shots are basic units of a video. They are required for further video analysis, such as Person tracking, identification, High-level feature detection …
They provide cue about high-level semantics
In video production each transition type is chosen carefully to support the content and context.
For example, dissolves occur much more often in feature films and documentaries than in news, sports and shows. The opposite is true for wipes.
9
![Page 10: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/10.jpg)
Hard Cuts
The most common transition type.
Direct concatenation of two shots, &
t : Time stamp of the first frame after the hard cutthardcut : Time stamp of the first frame after the hard cut
u-1(t): The unit step function
Produces a temporal visual discontinuity.
How to measure the discontinuity?
10
![Page 11: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/11.jpg)
Features to Measure Visual Discontinuity
Pixel differences
Statistical differences
Histograms Histograms
Compression differences
Edge differences
Motion vectors
11
![Page 12: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/12.jpg)
Pixel differences
Two common approaches:
(1) Calculate pixel-to-pixel difference & Compare the sum with a threshold
(2) Count the number of pixels that change in value more than some threshold & Compare the total more than some threshold & Compare the total number against a second threshold
Sensitive to camera & object motion! Use an average filtering Motion compensation
12
![Page 13: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/13.jpg)
Camera Motion & Object Motion
13
![Page 14: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/14.jpg)
Absolute Pixel Differences with & w/o Motion Compensation
Frame 66 Frame 69
14
Absolute difference w/o motion compensation Absolute difference with motion compensation
![Page 15: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/15.jpg)
Motion Estimation
Adjacent frames are similar and changes are due to object or camera motion
15
![Page 16: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/16.jpg)
Optical Flow
16
Assumptions:• color constancy : a point in “t-1” looks the same in “t”
– For grayscale images, this is brightness constancy
• small motion : points do not move very far
Frame t-1 Frame t
![Page 17: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/17.jpg)
Optical Flow Constraint Equation
),( yx
),( tvytux δδ ++
ttime tttime δ+),( yx
Optical Flow: Velocities ),( vuDisplacement:
),(),( tvtuyx δδδδ =
• Assume brightness of patch remains same in both images:
• Assume small motion (Taylor expansion of left-hand-side upto first order):
),,(),,( tyxItttvytuxI =+++ δδδ
),,(),,( tyxIt
It
y
Iy
x
IxtyxI =
∂∂+
∂∂+
∂∂+ δδδ
![Page 18: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/18.jpg)
Optical Flow Constraint Equation
0=∂∂+
∂∂+
∂∂
t
It
y
Iy
x
Ix δδδ
0=∂∂+
∂∂+
∂∂
t
I
y
I
dt
dy
x
I
dt
dx
Divide by and take the limit tδ 0→tδu
0=∂
+∂
+∂ tydtxdt
0=++ tyx IvIuIConstraint Equation
v
),( vuNOTE: must lie on a straight line
We can compute using gradient operators! tyx III ,,
![Page 19: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/19.jpg)
A sample optical flow output
Image I Image I -Rotated
19
Absolute difference w/o motion compensation
Absolute difference with motion compensation
Image I Image I -Rotated
Illustration of optical flow
![Page 20: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/20.jpg)
Motion Estimation Methods
Feature/Region Matching: Motion is estimated by correlating/matching features (e.g., edges) or regional intensities (e.g., block of pixels) from one frame to another.
Block Matching Block Matching
Phase Correlation
Gradient-based Methods: Motion is estimated by using spatial and temporal changes (gradients) of the image intensity distribution and the displacement vector field.
Lucas-Kanade
20
![Page 21: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/21.jpg)
Statistical differences
Divide image into regions
Compute statistical measures from these regions (e.g., mean, standard deviation …)
Compare the obtained statistical measures
21
![Page 22: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/22.jpg)
Histogram comparison
The most common method used to detect shot boundaries.
Provides good trade-off between accuracy and speed
The simplest histogram method computes gray level The simplest histogram method computes gray level or color histograms of the two images. If the bin-wise difference between the two histograms is above a threshold, a shot boundary is assumed.
Several extensions available: Using regions, region weighting, different distance metrics …
22
![Page 23: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/23.jpg)
Compression differences
Use differences in the discrete cosine transform (DCT) coefficients of JPEG compressed frames as the measure of frame similarity.
Avoid the need to decompress the frames
23
![Page 24: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/24.jpg)
Edges/Contours
The edges of the objects in the last frame before the hard cut usually cannot be found in the first frame after the hard cut,
The edges of the objects in the first frame after the hard cut in turn cannot be found in the last frame before the hard cut.
Use Edge Change Ratio (ECR) to detect hard cuts!
24Hard cut
![Page 25: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/25.jpg)
Edge Change Ratio (ECR)
),max( 11 −−= noutnn
innn pXpXECR
:np
:innX
The number of edge pixels in frame n
The number of entering edge pixels in frame n
25
:1outnX − The number of exiting edge pixels in frame n-1
To make the measure more robust to object motion:
Edge pixels in one image which have edge pixels nearby in the other image (e.g. within 6 pixels’) are not regarded as entering or exiting edge pixels.
![Page 26: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/26.jpg)
26
Edge Change Ratio (ECR)
Compare Compare
![Page 27: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/27.jpg)
Motion
Use motion vectors to determine discontinuity.
27
Image I Image I -Rotated Illustration of optical flow
![Page 28: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/28.jpg)
Fade Detection
A fade sequence S(x,y,t) of duration T: scaling the pixel intensities/colors of a video sequence S1(x,y,t)by a temporally monotone scaling function f(t)
Fade in: f(0) = 0 and f(T) = 1 Fade out: f(0) = 1 and f(T) = 0 Often f(t) is linear
Fade in: f(t) = t/T, Fade out: f(t) = (T-t)/T
28
![Page 29: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/29.jpg)
Fade Detection –Standard deviation of pixel intensities
Var(S(x,y,t)) = Var(f(t) * S1(x,y,t))
= f2(t) * Var(S1(x,y,t))
= f2(t) * Var(S1(x,y))
σ(S(x,y,t)) = f(t) * σ(S (x,y))σ(S(x,y,t)) = f(t) * σ(S1(x,y))
Method:
Detect the monochrome frames
Search in both directions for a linear increase in the pixels’ intensity/color standard deviation
29
![Page 30: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/30.jpg)
Dissolve Detection
A dissolve sequence D(x,y,t) of duration T: mixture of two video sequences S1(x,y,t) and S2(x,y,t), where the first sequence is fading out while the second is fading in
f1(t) = (T – t) / T = 1 - f2(t)
f2(t) = t / T
Method
Train support vector machines
30
![Page 31: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/31.jpg)
Fade out/in vs. Dissolve
Fade out/in (FOI) Dissolve
31
![Page 32: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/32.jpg)
Shot Boundary Detection@ TRECVID Evaluations
A video retrieval evaluation campaign from the National Institute of Standards and Technology (NIST), US.
Promote progress in content-based analysis, detection, retrieval in large amount of digital videodetection, retrieval in large amount of digital video Combine multiple errorful sources of evidence Achieve greater effectiveness, speed, and usability
Confront systems with unfiltered data and realistic tasks
Measure systems against human abilities
Content-based image and video retrieval 32
![Page 33: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/33.jpg)
Evaluated each year from 2001 – 2007
57 different research groups worldwide
Shot Boundary Detection@ TRECVID Evaluations
33
![Page 34: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/34.jpg)
Cut vs. Gradual Transition Performance
Cuts Gradual Transitions
Content-based image and video retrieval 34
![Page 35: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/35.jpg)
A short break
35
![Page 36: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/36.jpg)
TV Genre Classification
Multimedia content annotation
Key issue in current convergence of audiovisual entertainment and information media
Good information and communication technologies availableavailable
but multimedia classification not mature enough
Lack of good automatic algorithms
Main challange: combine and map low-level descriptors and high-level concepts
36
![Page 37: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/37.jpg)
Sample Genres
37
![Page 38: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/38.jpg)
Subgenres
38
![Page 39: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/39.jpg)
Sample Feature -Scene LengthNews Cast Sports - Tennis
39
Commercials Cartoon
![Page 40: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/40.jpg)
Sample Feature -Audio Statistics: Wave Forms
News Cast Sports - Race
Sports - Tennis Commercials
40
Cartoon
![Page 41: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/41.jpg)
Sample Feature -Audio Statistics: Frequency Spectrum
News Cast Sports - RaceA
mpl
itude
Am
plitu
de
41
Sports - Tennis Commercials
Am
plitu
de
Am
plitu
de
![Page 42: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/42.jpg)
A sample system
TV Genre Classification Using Multimodal Information and
42
Multimodal Information and Multilayer Perceptrons
Credit: Tomas Semela
Montagnuolo, M., Messina, A: TV Genre Classififcation Using Multimodal Information and Multilayer Perceptrons , AI*IA, LNAI 4733, pp. 730-741, 2007
![Page 43: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/43.jpg)
Modality information in broadcast domain concerns
Physical properties perceived by users like colours, shapes and motion
Structural-syntactic information, e.g. relationships
Feature Sets
Structural-syntactic information, e.g. relationships between frames, shots and scenes
Cognitive information related to high-level semantic concepts like faces
Aural analysis of noise and speech
resulting in a feature vector
43
),,,( ACSVPV c=
![Page 44: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/44.jpg)
Low-level visual feature vector component
Color represented by
hue (H)
saturation (S)
value (V)
Feature Sets
44
Luminance (Y) represented by a grey scale [16, 233]
Textures described through contrast (C) and directionality (D) Tamura’s features
Temporal activity information (T) based of displaced frame difference (DFD)
65- bin histrogram for each feature
Last bin collects undefined values
![Page 45: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/45.jpg)
Feature Sets
Computed on a frame by frame basis
Accumulated over the number of frames
Each histogram modeled by a 10-component Gaussian mixture model
45
Each component being a Gaussian distribution with three parameters
weight , mean and standard deviation
2
2
2
2
)(
, 2
1)( i
i
ii
x
i
ex σµ
σµ πσϕ
−−
=
![Page 46: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/46.jpg)
Feature Sets
Gaussian mixture model
example of 4 component gaussian
∑=
10
1, 2
ii
ii
wσµ
ϕ
example of 4 component gaussian
mixture
different means and standard deviation
46
resulting into a 210 – dimensional feature vector
),,,,,,( TDCYVSHVc =
![Page 47: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/47.jpg)
Structural feature vector component
Extracted using a shot detection module
S1 captures information about the rhythm of the video:
is the frame rate (i.e. 25 fps), ∑∆=
sN
isS11 rF
47
total number of shots
• shot length, measured as the number of frames
• within the shot.
∑=
∆=i
isr
sNF
S1
1sN
is∆thi
![Page 48: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/48.jpg)
Structural feature vector component
S2 describes shot lengths distributed along the video
represented by a 65-bin histogram
64 bins for shot lengths [0,30s]
bin for shots longer than 30sth65
48
bin for shots longer than 30s
histogram normalized by so the area sums to one
resulting into a 66-dimensional feature vector
sN
),( 21 SSS =
65
![Page 49: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/49.jpg)
Cognitive feature vector component
Built by applying face detection Leads to three features
total number of faces
total number of framesp
f
D
NC =1
fNPD
describes how faces are distributed along the video
expressed by a 11-bin histogram
bin contains the number of frames with i faces,
bin containts the number of frames with 10 or more faces
49
pD PD
)( 2C
th11
thi
![Page 50: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/50.jpg)
Cognitive feature vector component
describes how faces are positioned along the video
9-bin histogram where the bin represents the
positions in the frame
Positions are top-left, top-right, bottom-left, bottom-
)( 3Cthi
thi
right, left, right, top, bottom and center
All histograms normalized by so their area sums to one
resulting into a 21-dimensional feature vector
50
fN
),,( 321 CCCC =
![Page 51: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/51.jpg)
Aural feature vector component
Derived by audio analysis of the TV programme
Audio signal segmented into seven classes:speech, silence, noise, music, pure speaker, speaker plus noise, speaker plus music
duration values, normalized by total duration of the 1A duration values, normalized by total duration of the video for the seven classes
the avarage speech rate, computed from speech content transcriptions using a speech-to-text engine
resulting into a 8-dimensional feature vector
51
1A
2A
),( 21 AAA =
![Page 52: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/52.jpg)
Genre Classification
is the TV programme to be classified
the set of available genres
Feature vector of is derived like described in previous slides
Each feature vector of is input of an Neural Network
p,...,, 21 ωωωω N=Ωp
Each feature vector of is input of an Neural Network
52
p
![Page 53: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/53.jpg)
Genre Classification
• Each Neural Network has an output vector
• can be interpreted as the membership value of p to genre i, according to the pattern vector part n
4,...,1,,..., ),(),(),(1
==Φ nnpN
npnp
ωφφ
),( npiφ
• Outputs combined into a resulting vector where:
• The genre j is selected corresponding to the maximum element of
53
,..., )()()(1
pN
pp
ωφφ=Φ
∑=
=Φ4
1
),()(
4
1
n
npi
pi φ
)( pφ
![Page 54: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/54.jpg)
Experimental Results - Dataset
About 110 hours of complete TV programs
Genres: cartoons, football, talk show, weather forecast, news, music videos, commercials
Each TV program manually annotated
Dataset split into K = 6 disjoint subsets of equal size
K-fold cross validation is used
54
![Page 55: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/55.jpg)
Sample Clips from the Data Set
Cartoon Commercial Football
55
Music News Talk show Weather forecast
![Page 56: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/56.jpg)
Experimental Results - Settings
All networks with one hidden layer, seven output neurons with sigmoid activation functions in the range of [0,1]
All hidden neurons have symmetric sigmoid activiation functions in the range of [-1,1]
Aural network has 8 input neurons and 32 hidden neurons
Cognitive network has 21 input neurons and 32 hidden neurons
Structural network has 65 input neurons and 8 hidden neurons
Visual network has 210 input neurons and 16 hidden neurons
56
![Page 57: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/57.jpg)
Obtained accuarcy with an avaraged value of 92 %
In some cases even greater than 95 %
Some news - talk shows and commercials - music clips confused with each other
Music genre shows the most scattered results due to
Experimental Results
structural, visual and cognitive inhomogenity
57
![Page 58: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/58.jpg)
Experimental Results – Comparison
58
![Page 59: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/59.jpg)
Experimental Results - Comparison
59
![Page 60: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/60.jpg)
References
Rainer Lienhart. Reliable Transition Detection In Videos: A Survey and Practitioner's Guide. International Journal of Image and Graphics (IJIG), Vol. 1, No. 3, pp. 469-486, 2001.
M. Montagnuolo, A. Messina: TV Genre Classififcation Using Multimodal Information and Multilayer Perceptrons , AIIA, LNAI 4733, pp. 730-741, 2007
60
![Page 61: CVHCI-Lecture-Shot boundary TV genre v3](https://reader031.fdocuments.us/reader031/viewer/2022011903/61d6ae867d75fb5c4f481986/html5/thumbnails/61.jpg)
Questions?Questions?
61