Study: Video stylization for digital ambient displays of home movies

Post on 18-May-2015


A reading note on "Video stylization for digital ambient displays of home movies". I like this paper and tried to understand it. Enjoy!


Abstract

“Digital Ambient Display”: video cartoons of home movies

Video segmentation based on multi-label graph cut

Temporally coherent region maps for video (tracking regions)

Enhanced cartoon painting

System

Algorithm

Outline

Introduction

System Overview

Video Stylization: multi-label graph cut, region propagation, refining region labels, smoothing and filtering, stroke placement and shading

Video Sequencing: stochastic composition, rendering transitions

Results and Discussion

Conclusion

Video stylization: temporally coherent video regions

Video sequencing: home movie videos

Introduction

Digital Ambient Display

A genre of content consumption experience which we call ambient experience

Displaying still images in an ambient way: digital picture frame

Displaying video content in an ambient way: Digital Ambient Displays (DADs)

Digital Ambient Displays (DADs)

“Mid-level” scene abstraction of video: color region segmentation

Temporally coherent video regions: region propagation, multi-label graph cut

Home movie videos: video selection, composition and transition

Related work

Stochastic selection of video clips

Stochastic transitions between video frames [Schodl et al. 00]: single video, based on visual similarity

Composition of photos for abstract [Collomosse and Hall 03]

Video artwork [Slatter et al. 10; Bizzocchi 08]

Little work on the use of artistic video stylization in ambient displays

Related work

Image segmentation: mid-level models of scene structure [Wang et al. 04; Collomosse 04] used to render in artistic styles

Mean-shift based stylization [Wang et al. 04]: small and short-lived segments

Spatio-temporal volumes from video [Collomosse et al. 05]: 3D volumes over (x, y, t)

Video abstraction using a bilateral filter [Winnemoller et al. 06]

Lack of temporal coherence

Our approach to DADs

A novel video segmentation: segmentation is guided by motion flow from the regions of past frames

Video selection and composition: video selection, composition and transition are guided by similarity

Video Segmentation

[Figure: region labels 1 to 5 shown across successive frames along the time axis t]

Labeling regions and tracking them over time

System overview

Video Stylization

Multi-label graph cut, region propagation, refining region labels, smoothing and filtering, stroke placement and shading

Temporally coherent video regions

Video Segmentation

A novel coherent video segmentation: multi-label graph cut on successive video frames

[Figure: multi-label graph cut on the current frame fn. Labels are propagated by motion from the previous frame fn-1. The color distribution of each region is modeled by a Gaussian Mixture Model (GMM) built from past frames (fn-1, fn-2, fn-3).]

Video Segmentation

Assign the region labels existing in frame It-1 to each pixel p in frame It

Find the best mapping l : P → L

where L = { l(1), …, l(p), …, l(|P|) } and P is an 8-connected lattice of pixels

The labeling minimizes a global energy function that encourages:

spatial homogeneity of contrast within each frame

temporal consistency of color distribution between frames

Minimize global energy E

U : temporal consistency of color distribution between frames

V : spatial homogeneity of contrast within each frame

where

1) L is the label set of the previous frame

2) P is the set of connected pixels belonging to the labels

3) Θ is the color history model

Minimize energy of V

V : spatial homogeneity of contrast within each frame

Penalize pairs of 8-connected neighboring pixels that carry different labels but have very similar colors.
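A minimal sketch of such a pairwise term, assuming a contrast-sensitive penalty of the form gamma * exp(-beta * difference^2) for each 8-connected pair with different labels. The exponential form, the names `gamma` and `beta`, and the use of grey values instead of color vectors are assumptions for illustration, not taken from the paper.

```python
import math

# "Forward" offsets of the 8-connected neighbourhood, so each pair is visited once.
FORWARD_NEIGHBOURS = [(0, 1), (1, -1), (1, 0), (1, 1)]

def pairwise_energy(image, labels, gamma=1.0, beta=0.01):
    """Sum a contrast-sensitive penalty over 8-connected pairs with different labels.

    image  : 2D list of grey values (a stand-in for colour vectors)
    labels : 2D list of integer region labels, same shape as image
    gamma, beta : hypothetical weights
    """
    h, w = len(image), len(image[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in FORWARD_NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[y][x] != labels[ny][nx]:
                    diff = image[y][x] - image[ny][nx]
                    # Similar colours (small diff) give a penalty close to gamma.
                    energy += gamma * math.exp(-beta * diff * diff)
    return energy

image = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
labels_on_edge = [[1, 1, 2, 2], [1, 1, 2, 2]]   # label boundary follows the colour edge
labels_off_edge = [[1, 2, 2, 2], [1, 2, 2, 2]]  # label boundary cuts a uniform area
```

With these toy inputs, a label boundary that cuts through the uniform area costs far more than one that follows the color edge, which is the behavior the slide describes.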



Minimize energy of U

U : temporal consistency of color distribution between frames

Frame:        f(n-k)   ...   f(n-4)   f(n-3)   f(n-2)   f(n-1)   f(n)

Label/color:  L1/255   …     L4/245   L4/248   L1/250   L1/255   ?/255

Color histogram at pixel p: label/color at each frame

[Figure: color distributions for different label assignments at pixel p: the distribution with label L1 and the distribution with label L4, each compared against the observed color 255]

Minimize energy of U

[Figure: color distributions of different pixels: the distribution at pixel n with label l(pn) and the distribution at pixel m with label l(pm)]

[Equation: U is built from the log of the GMM probability Pg of each pixel's color given its label]

U : temporal consistency of color distribution between frames

Minimize energy of U

N(μ, Σ) : normal distribution

Σ N : mixture of Gaussians (GMM)

Θ : parameters of all GMMs, Θ = {ωik, μik, Σik; i = 1, …, L; k = 1, …, Ki}
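A small sketch of how a GMM-based cost could favor one label over another for a single pixel. It assumes the per-pixel cost is the negative log-likelihood of the pixel color under the label's mixture, and it uses 1D grey values with made-up mixture parameters. The numbers in `theta` are hypothetical and only loosely echo the L1/L4 example above.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1D normal distribution N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_likelihood(x, components):
    """components: list of (weight, mean, variance) tuples for one label."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def unary_cost(x, components, eps=1e-12):
    """Negative log-likelihood; eps guards against log(0)."""
    return -math.log(gmm_likelihood(x, components) + eps)

# Hypothetical colour history models for two labels.
theta = {
    "L1": [(0.7, 252.0, 9.0), (0.3, 240.0, 25.0)],
    "L4": [(1.0, 246.0, 4.0)],
}

observed = 255.0
costs = {label: unary_cost(observed, comps) for label, comps in theta.items()}
best_label = min(costs, key=costs.get)
```

Under these assumed parameters the observed color 255 is much more likely under the L1 model, so L1 has the lower cost.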


Multi-label Graph Cut

Minimizing E is an NP-hard problem

Multi-label graph cut: iterate α-expansion over each label until E can no longer decrease [Boykov and Kolmogorov 2004].

Graph cut

Min-cut/max-flow

Maximum flow ≡ minimum cut
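The equivalence can be checked on a toy graph. The sketch below is a plain Edmonds-Karp implementation (breadth-first augmenting paths), not the specialized algorithm of Boykov and Kolmogorov. The graph, with two "pixel" nodes `p` and `q` between a source `s` and a sink `t`, and its capacities are made up for illustration.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts capacity graph.

    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    # Residual graph: copy the capacities and add missing reverse edges with capacity 0.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable_from_source():
        """Breadth-first search along edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the source side of a min cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

capacity = {
    "s": {"p": 4, "q": 1},
    "p": {"t": 1, "q": 2},
    "q": {"t": 5, "p": 2},
}
flow, source_side = max_flow_min_cut(capacity, "s", "t")
cut_value = sum(c for u in source_side for v, c in capacity.get(u, {}).items()
                if v not in source_side)
```

In a binary labeling, nodes left on the source side take one label and the rest take the other. Here `flow` and `cut_value` agree, as the max-flow/min-cut theorem requires.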

Hong Chen, “Introduction to Min-Cut/Max-Flow Algorithms”

Multi-Label Graph Cut

Hong Chen, “Introduction to Min-Cut/Max-Flow Algorithms”

Multi-label graph cut on Binary Label

[PAMI04] Boykov and Kolmogorov, “An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision”

Multi-label graph cut on Binary Label

Min-cut problem / max-flow problem for each pixel

Multi-label graph cut on Binary Label

Min-cut problem on the boundary

Max-Flow problem

Region propagation

Estimate the motion of It-1 using a RANSAC search based on SIFT features [Lowe 04]: rigid motion + deformation gives the warped frame I’t-1

Propagate labels per pixel from I’t-1 to It. What if the motion estimation is incorrect?

Use thinned skeleton to mitigate imprecise motion estimation

Region/Skeleton

Regions ≡ skeletons

Erroneous motion estimation?

Region propagation with motion: pruning the skeleton gives robust regions

Skeletons make propagation robust to motion estimation errors: use only the skeleton points whose distance to the region boundary exceeds a pre-set confidence
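The pruning rule can be sketched as follows, assuming the skeleton points of a region are already available (computing the skeleton itself is not shown). The brute-force distance search and the function names are illustrative choices, not the paper's implementation.

```python
import math

def distance_to_boundary(mask, y, x):
    """Euclidean distance from (y, x) to the nearest pixel outside the region.

    mask is a 2D list of 0/1 values; positions beyond the image border count as outside.
    Brute force over all candidate positions, for illustration only.
    """
    h, w = len(mask), len(mask[0])
    best = math.inf
    for yy in range(-1, h + 1):
        for xx in range(-1, w + 1):
            inside = 0 <= yy < h and 0 <= xx < w and mask[yy][xx]
            if not inside:
                best = min(best, math.hypot(yy - y, xx - x))
    return best

def prune_skeleton(mask, skeleton_points, confidence):
    """Keep only skeleton points that lie deeper than `confidence` inside the region."""
    return [(y, x) for y, x in skeleton_points
            if distance_to_boundary(mask, y, x) > confidence]

# Toy region: a 5x7 rectangle, with its middle row as a stand-in skeleton.
mask = [[1] * 7 for _ in range(5)]
skeleton = [(2, x) for x in range(7)]
kept = prune_skeleton(mask, skeleton, confidence=2)
```

Only the points far from the region boundary survive, so a small error in the estimated motion is less likely to carry them into a neighboring region.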

Region propagation

[Figure: It-1, labeled It-1, skeleton, warped I’t-1, It]

Region labels are warped according to the per-pixel motion estimate

Regions are replaced by their skeletons to be robust to motion estimation errors

GMM

Build a GMM color model for each region li

Sample the historical colors of the labelled pixels over recent frames

How to sample historical colors? By contribution weight

More recent colors contribute more
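One way to realize recency-weighted sampling is sketched below. The exponential decay and the `decay` value are assumptions, since the slides do not state the weighting function. Each frame in the history is assumed to contain at least one color sample for the label.

```python
import random

def recency_weights(num_frames, decay=0.5):
    """Weights for frames ordered oldest to newest; the newest frame gets weight 1."""
    return [decay ** (num_frames - 1 - i) for i in range(num_frames)]

def sample_history(colors_per_frame, num_samples, decay=0.5, seed=0):
    """Draw colour samples for one label, favouring recent frames.

    colors_per_frame: list (oldest first) of non-empty lists of colours.
    """
    rng = random.Random(seed)
    weights = recency_weights(len(colors_per_frame), decay)
    frame_indices = range(len(colors_per_frame))
    samples = []
    for _ in range(num_samples):
        frame = rng.choices(frame_indices, weights=weights)[0]
        samples.append(rng.choice(colors_per_frame[frame]))
    return samples

history = [[245], [250], [255]]          # oldest frame first
samples = sample_history(history, 1000)
```

The sampled colors could then be used to fit the GMM of the region, so that the model follows gradual color changes while still remembering older appearance.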

Refining region labels

What if new objects appear in It?

Refining region labels

Keep two color models for each label l in frame It

(1) Historical color model

(2) An updated color model

If |Mh – Mu| > threshold, new objects are deemed present
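A rough sketch of the test, assuming each model is summarized by its mean RGB color and that |Mh – Mu| is the Euclidean distance between the two means. The slides do not say how the two models are compared, so both choices, and the threshold value used here, are assumptions.

```python
import math

def mean_color(pixels):
    """Mean of a non-empty list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def new_object_present(historical_pixels, updated_pixels, threshold):
    """True when the updated model has drifted too far from the historical one."""
    m_h = mean_color(historical_pixels)
    m_u = mean_color(updated_pixels)
    return math.dist(m_h, m_u) > threshold

historical = [(200, 180, 160)] * 4
unchanged = [(202, 179, 161)] * 4
changed = [(200, 180, 160), (40, 60, 200), (40, 60, 200), (40, 60, 200)]
```

A region whose colors barely move keeps its label, while a region that suddenly contains very different colors is flagged as containing a new object.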

Smoothing and filtering

Spatio-temporal smoothing: Gaussian filter of size 3x3x3 (x-y-t)

Filtering: remove false segmentations and short-lived objects

Filtering - remove short-lived object

K disconnected objects (e.g. c1, c2, c3…, cK) with the same label

dl,k : duration of the kth object with label l

τr : threshold (in this paper, 6 frames)

Dl : duration of label l

If the duration of any of these disconnected video objects within this time window is shorter than the threshold, that video object is removed
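The rule can be sketched in a few lines, assuming each disconnected object is identified by a (label, k) pair and that its duration is the number of frames in the window in which it is visible. The data layout is hypothetical; the 6-frame threshold is the one quoted above.

```python
def object_durations(frames):
    """frames: list of sets of (label, k) object ids visible in each frame of the window."""
    durations = {}
    for visible in frames:
        for obj in visible:
            durations[obj] = durations.get(obj, 0) + 1
    return durations

def remove_short_lived(durations, min_frames=6):
    """Drop every object whose duration is shorter than the threshold."""
    return {obj: d for obj, d in durations.items() if d >= min_frames}

# Object (1, 0) is visible in all 8 frames, object (1, 1) only in 3 of them.
window = [{(1, 0), (1, 1)}] * 3 + [{(1, 0)}] * 5
durations = object_durations(window)
kept = remove_short_lived(durations)
```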

Stroke Placement and Shading

β-spline strokes

Face detection

Painterly rendering: “Painterly Rendering with Curved Brush Strokes of Multiple Sizes” [Hertzmann 1998]

In this paper, an orientation field is interpolated from the shape of the region

Result of painterly rendering

Video Sequencing

Stochastic composition, rendering transitions

Home movie videos

Stochastic composition

[Figure: graph of video clips V1, V2, V3, V4, ..., Vi]

Video sequencing depends on

1. ds(Va, Vb) : semantic distance between the tags of videos A and B

2. dv(i, j) : the similarity of videos
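A possible way to turn the two measures into a stochastic choice of the next clip is sketched below. It assumes both measures are expressed as distances, mixes them with a weight `alpha`, and converts the result into probabilities with exp(-distance / sigma). The mixing rule, `alpha`, `sigma`, the Jaccard tag distance and the toy data are all assumptions, not the paper's formulation.

```python
import math
import random

def transition_probabilities(current, clips, d_s, d_v, alpha=0.5, sigma=1.0):
    """Probability of moving from `current` to each other clip; closer clips are likelier."""
    scores = {}
    for clip in clips:
        if clip == current:
            continue
        distance = alpha * d_s(current, clip) + (1.0 - alpha) * d_v(current, clip)
        scores[clip] = math.exp(-distance / sigma)
    total = sum(scores.values())
    return {clip: s / total for clip, s in scores.items()}

def next_clip(current, clips, d_s, d_v, rng):
    """Draw the next clip at random according to the transition probabilities."""
    probs = transition_probabilities(current, clips, d_s, d_v)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names])[0]

# Hypothetical tags and visual distances for three clips.
tags = {"V1": {"beach", "family"}, "V2": {"beach", "kite"}, "V3": {"drama", "stage"}}
visual = {frozenset(("V1", "V2")): 0.2, frozenset(("V1", "V3")): 0.9,
          frozenset(("V2", "V3")): 0.7}

def tag_distance(a, b):
    """Jaccard distance between the tag sets of two clips."""
    return 1.0 - len(tags[a] & tags[b]) / len(tags[a] | tags[b])

def visual_distance(a, b):
    return visual[frozenset((a, b))]

probs = transition_probabilities("V1", list(tags), tag_distance, visual_distance)
chosen = next_clip("V1", list(tags), tag_distance, visual_distance, random.Random(0))
```

Because the choice is random rather than greedy, repeated playback does not always follow the same route through the collection.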

Rendering Transitions

Smoothing video transitions

[Figure: pairs of similar regions matched between clips]

Region morphing

Rendering Transitions

C(·) indicates mean color similarity;

A(·) indicates relative area;

S(·) indicates shape similarity in terms of region compactness

Smoothing video transition, weights: 0.5, 0.4, 0.1
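A hedged sketch of a weighted region similarity, assuming the three weights apply to C, A and S in that order and that the score is a weighted sum. The individual similarity functions (normalized color distance, area ratio, ratio of isoperimetric compactness values) are illustrative stand-ins, not the paper's definitions.

```python
import math

def compactness(area, perimeter):
    """Isoperimetric quotient 4*pi*A / P^2: 1.0 for a disc, smaller for elongated shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)

def ratio_similarity(a, b):
    """Similarity in (0, 1] of two positive quantities."""
    return min(a, b) / max(a, b)

def color_similarity(c1, c2):
    """1 minus the Euclidean distance between two mean RGB colours, normalised to [0, 1]."""
    return 1.0 - math.dist(c1, c2) / math.dist((0, 0, 0), (255, 255, 255))

def region_similarity(r1, r2, weights=(0.5, 0.4, 0.1)):
    """Weighted sum of colour (C), area (A) and shape (S) similarity."""
    w_c, w_a, w_s = weights
    c = color_similarity(r1["color"], r2["color"])
    a = ratio_similarity(r1["area"], r2["area"])
    s = ratio_similarity(compactness(r1["area"], r1["perimeter"]),
                         compactness(r2["area"], r2["perimeter"]))
    return w_c * c + w_a * a + w_s * s

# Hypothetical regions: areas and perimeters in pixels, mean colour in RGB.
sky = {"color": (120, 170, 230), "area": 5000.0, "perimeter": 300.0}
sea = {"color": (60, 110, 200), "area": 4500.0, "perimeter": 320.0}
sand = {"color": (220, 200, 150), "area": 1500.0, "perimeter": 400.0}
```

Regions in the outgoing and incoming clips could then be paired by highest similarity before morphing one into the other.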

Results

23 videos, manually tagged collection

Comparison - BOY

•BOY sequence

•This paper

•‘synergistic’ mean-shift + edge (Comaniciu, 2002)

•spatio-temporal method (Paris, 2008).

Result - BEAR

fine scale features (e.g. the bear’s eyes and nose) are retained

Result - KITE

background detail may (optionally) be abstracted by modifying the initial frame segmentation to merge unwanted detailed regions

Result - DRAMA

correct handling of regions that disappear and appear within sequences

Conclusion

Digital Ambient Display (DAD): selects, stylizes and transitions between clips automatically

A novel algorithm for coherent video segmentation based on multi-label graph cut

Parse scene structures to enable shading and painterly effects

Create interesting transition effects between clips using region correspondence

Future work

Backward propagation of region labels to improve coherence of segmentations

Improve painterly rendering by distinguishing region motion caused by occlusion from region motion caused by object deformation

Graph optimization algorithm similar to [Kovar et al. 02] to plan routes through a subset of clips, e.g. to encompass a theme such as “family vacations” rather than traversing the whole database

Automatic meta-data annotation on user video collection, e.g. photo categorization [Ruiz et al. 03]

END