
THE UNIVERSITY OF BRITISH COLUMBIA

Random Forests-Based 2D-to-3D Video Conversion

Presenter: Mahsa Pourazad

M. Pourazad, P. Nasiopoulos, and A. Bashashati


Outline

Introduction to 3D TV & 3D content
Motivation for 2D-to-3D video conversion
Proposed 2D-to-3D video conversion scheme
Conclusions


Introduction to 3D TV & 3D Content:

[Diagram: two ways to obtain 3D content — a stereoscopic dual camera captures stereo video directly; a 3D depth-range camera captures 2D video plus a depth map, from which an image-based rendering technique produces stereo video.]


Motivation for 2D-to-3D Video Conversion:

Industry is investing in 3D TV and broadcasting, and Hollywood is already investing in 3D technology. Are we ready for this? No! One of the main issues is the lack of 3D content.

Converting existing 2D content to 3D makes it possible to resell existing content (movies, TV series, etc.).

How It Works: 3D Perception

The human visual system perceives depth from monocular cues such as sharpness, motion, occlusion, texture, perspective, and more.


2D-to-3D Video Conversion:

[Diagram: 2D video → monocular depth cues (motion parallax, sharpness, occlusion, and more) → depth map.]

Proper integration of multiple monocular depth cues yields a more accurate depth-map estimate, imitating how the human brain combines them.


Motion-based 2D-to-3D video conversion*:

[Pipeline diagram: 2D video → motion-vector (MV) extraction → motion correction (camera-motion correction and object-based motion estimation/correction) → non-linear transforming model → estimated depth map.]

Motion parallax: near objects move faster across the retina than farther objects do, so the estimated depth is proportional to the motion magnitude:

$D(x, y) \sim C \cdot \mathrm{abs}(MV(x, y))$

Main issue: estimating depth information for static objects. (A toy sketch of the parallax-to-depth mapping follows below.)

*Pourazad, M.T., Nasiopoulos, P., and Ward, R.K., "An H.264-based scheme for 2D to 3D video conversion," IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 742-748, 2009.
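As a rough illustration of this parallax-to-depth mapping (not the full H.264-based pipeline; the motion-correction stages and the non-linear transform are omitted), a minimal sketch:

```python
import numpy as np

def depth_from_motion(mv_x, mv_y, c=1.0):
    """Toy motion-parallax depth map: D(x, y) ~ C * abs(MV(x, y)).

    mv_x, mv_y: 2D arrays of per-block motion-vector components.
    Returns an 8-bit depth map (larger value = closer object).
    """
    magnitude = np.hypot(mv_x, mv_y)                # abs(MV(x, y))
    depth = c * magnitude                           # D(x, y) ~ C * abs(MV)
    depth = 255.0 * depth / max(depth.max(), 1e-9)  # normalize to 0-255
    return depth.astype(np.uint8)
```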


Our Suggested Scheme (integrating multiple monocular depth cues):

Features representing monocular depth cues are extracted from each 4x4 block of the 2D video.

Motion parallax: a block-matching technique is implemented between consecutive frames to estimate each block's motion (see the sketch below).
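A minimal full-search block-matching sketch, assuming 4x4 blocks and a small search window (the matching parameters are illustrative; the talk does not specify them):

```python
import numpy as np

def block_motion(prev, curr, block=4, search=8):
    """Full-search block matching between two grayscale frames.

    Returns per-block motion-vector arrays (mv_y, mv_x), chosen by
    minimizing the sum of absolute differences (SAD).
    """
    h, w = curr.shape
    mv_y = np.zeros((h // block, w // block), dtype=np.int32)
    mv_x = np.zeros_like(mv_y)
    for i in range(h // block):
        for j in range(w // block):
            y0, x0 = i * block, j * block
            cur = curr[y0:y0 + block, x0:x0 + block].astype(np.int32)
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue  # candidate block falls outside the frame
                    ref = prev[y1:y1 + block, x1:x1 + block].astype(np.int32)
                    sad = np.abs(cur - ref).sum()
                    if best is None or sad < best:
                        best, mv_y[i, j], mv_x[i, j] = sad, dy, dx
    return mv_y, mv_x
```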


Texture variation: the surface texture ("face-texture") of a textured material is more apparent when the object is closer.


Texture Variation Depth Cue:

Law's texture energy masks are applied to the luma information (Y) of each 4x4 block. The nine 3x3 masks are the outer products of the 1D kernels L3 = [1 2 1] (level), E3 = [-1 0 1] (edge), and S3 = [-1 2 -1] (spot):

L3L3         L3E3         L3S3
 1  2  1     -1  0  1     -1  2 -1
 2  4  2     -2  0  2     -2  4 -2
 1  2  1     -1  0  1     -1  2 -1

E3L3         E3E3         E3S3
-1 -2 -1      1  0 -1      1 -2  1
 0  0  0      0  0  0      0  0  0
 1  2  1     -1  0  1     -1  2 -1

S3L3         S3E3         S3S3
-1 -2 -1      1  0 -1      1 -2  1
 2  4  2     -2  0  2     -2  4 -2
-1 -2 -1      1  0 -1      1 -2  1

For each mask F_i and each 4x4 block n, the texture energy is

$E_i^k(n) = \sum_{(x,y) \in \mathrm{Block}_n} \big(I(x, y) * F_i(x, y)\big)^k, \quad k \in \{1, 2\}$

where I is the luma information of the block and F_i is the i-th Law's mask. With nine masks and two exponents k, the resulting feature set has 18 components representing the texture-variation depth cue for each 4x4 block (see the sketch below).
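A small sketch of how these 18 texture features could be computed per 4x4 block, following my reading of the energy formula above (SciPy's 2D convolution):

```python
import numpy as np
from scipy.signal import convolve2d

# 1D Law's kernels; the nine 2D masks are their outer products.
L3 = np.array([1.0, 2.0, 1.0])
E3 = np.array([-1.0, 0.0, 1.0])
S3 = np.array([-1.0, 2.0, -1.0])
MASKS = [np.outer(a, b) for a in (L3, E3, S3) for b in (L3, E3, S3)]

def texture_features(block_luma):
    """18 Law's texture-energy features for one 4x4 luma block: for
    each of the 9 masks, the summed filter response (k = 1) and the
    summed squared response (k = 2) over the block."""
    feats = []
    for F in MASKS:
        resp = convolve2d(block_luma.astype(np.float64), F, mode="same")
        feats.append(resp.sum())         # k = 1
        feats.append((resp ** 2).sum())  # k = 2
    return np.array(feats)               # shape (18,)
```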


Haze: distant objects visually appear less distinct and more bluish than nearby objects due to atmospheric haze.


Haze Depth Cue:

Haze is reflected in the low-frequency information of the chroma channels (U and V), so the L3L3 Law's texture energy mask (a local-averaging mask) is applied to the chroma information of each 4x4 block:

L3L3
 1  2  1
 2  4  2
 1  2  1

$E_i^k(n) = \sum_{(x,y) \in \mathrm{Block}_n} \big(C_i(x, y) * F(x, y)\big)^k, \quad k \in \{1, 2\}$

where C_i is the chroma information of the block (i indexes U and V) and F is the L3L3 mask. With two chroma channels and two exponents, the feature set has 4 components representing the haze depth cue for each 4x4 block (sketch below).
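Analogous to the texture features, a self-contained sketch of the four haze features (my reading of the formula; only the L3L3 mask is used):

```python
import numpy as np
from scipy.signal import convolve2d

L3L3 = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0])  # local-averaging mask

def haze_features(block_u, block_v):
    """4 haze features for one 4x4 block: L3L3 response energy
    (k = 1 and k = 2) on each chroma plane (U and V)."""
    feats = []
    for chroma in (block_u, block_v):
        resp = convolve2d(np.asarray(chroma, float), L3L3, mode="same")
        feats.append(resp.sum())         # k = 1
        feats.append((resp ** 2).sum())  # k = 2
    return np.array(feats)               # shape (4,)
```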


Perspective: the more the lines in a scene converge, the farther away they appear to be.

The Radon transform is applied to the luma information of each block at angles θ ∈ {0, 30, 60, 90, 120, 150} degrees, and the amplitude and phase of the most dominant edge are selected, giving a feature set with 2 components (see the sketch below).
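One plausible reading of this step with scikit-image's Radon transform; the precise definition of "amplitude and phase of the most dominant edge" is my assumption:

```python
import numpy as np
from skimage.transform import radon

THETAS = np.array([0, 30, 60, 90, 120, 150], dtype=np.float64)

def perspective_features(block_luma):
    """2 perspective features for a block: the strength (amplitude) of
    the strongest Radon projection bin and its angle (phase)."""
    sinogram = radon(block_luma.astype(np.float64), theta=THETAS,
                     circle=False)
    peak = np.unravel_index(np.argmax(np.abs(sinogram)), sinogram.shape)
    amplitude = sinogram[peak]
    phase = THETAS[peak[1]]  # angle of the dominant edge, in degrees
    return np.array([amplitude, phase])
```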


Vertical coordinate: in general, objects closer to the bottom border of the image are closer to the viewer.

The feature set includes the vertical spatial coordinate of each 4x4 block, expressed as a percentage of the frame's height.


Sharpness: closer objects appear sharper.

The sharpness of each 4x4 block is measured with the diagonal Laplacian method* (a sketch follows below).

*A. Thelen, S. Frey, S. Hirsch, and P. Hering, "Improvements in shape-from-focus for holographic reconstructions with regard to focus operators, neighborhood-size, and height value interpolation," IEEE Trans. on Image Processing, vol. 18, no. 1, pp. 151-157, 2009.
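A sketch of a diagonal-Laplacian-style focus measure in the spirit of Thelen et al.: absolute second differences along x, y, and both diagonals, summed over the block. The 1/sqrt(2) diagonal weighting is my assumption, not a detail from the talk:

```python
import numpy as np

def diagonal_laplacian_sharpness(block_luma):
    """Sharpness of a block as the sum of absolute second differences
    along x, y, and the two diagonals (diagonals scaled by 1/sqrt(2)
    for their longer sample spacing)."""
    I = block_luma.astype(np.float64)
    c = I[1:-1, 1:-1]                                 # interior pixels
    d_x = np.abs(I[1:-1, :-2] + I[1:-1, 2:] - 2 * c)  # horizontal
    d_y = np.abs(I[:-2, 1:-1] + I[2:, 1:-1] - 2 * c)  # vertical
    d_d1 = np.abs(I[:-2, :-2] + I[2:, 2:] - 2 * c) / np.sqrt(2)
    d_d2 = np.abs(I[:-2, 2:] + I[2:, :-2] - 2 * c) / np.sqrt(2)
    return float((d_x + d_y + d_d1 + d_d2).sum())
```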


Occlusion: an object that overlaps or partly obscures our view of another object is closer.

All feature sets are extracted for each 4x4 patch at three image-resolution levels (1, 1/2, and 1/4). This captures occlusion and makes the features globally accountable (see the sketch after this paragraph).

[Figure: the same frame shown at resolution levels 1, 1/2, and 1/4.]
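A sketch of the multi-resolution extraction, assuming a hypothetical extract_block_features helper that bundles the per-block cues above; pairing each full-resolution block with its co-located coarser-level blocks is my assumption about how the levels are combined:

```python
import numpy as np
from skimage.transform import rescale

def multiscale_features(frame_luma, extract_block_features, block=4):
    """For each 4x4 block at full resolution, concatenate its features
    with those of the co-located blocks at levels 1/2 and 1/4.
    extract_block_features is a hypothetical per-block extractor."""
    grids = {}
    for scale in (1.0, 0.5, 0.25):
        img = rescale(frame_luma, scale, anti_aliasing=True)
        by, bx = img.shape[0] // block, img.shape[1] // block
        grids[scale] = np.array([
            [extract_block_features(img[i * block:(i + 1) * block,
                                        j * block:(j + 1) * block])
             for j in range(bx)] for i in range(by)])
    by, bx = grids[1.0].shape[:2]
    return np.array([
        np.concatenate([
            grids[1.0][i, j],
            grids[0.5][min(i // 2, grids[0.5].shape[0] - 1),
                       min(j // 2, grids[0.5].shape[1] - 1)],
            grids[0.25][min(i // 4, grids[0.25].shape[0] - 1),
                        min(j // 4, grids[0.25].shape[1] - 1)]])
        for i in range(by) for j in range(bx)])
```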


Random Forests (RF) Machine Learning: Depth-Map Model Estimation

The feature sets above are assembled into an 81-dimensional feature vector for each 4x4 block of the 2D video.

RF is a classification and regression technique consisting of a collection of individual decision trees (DTs)*, each trained on randomly selected input feature vectors. It suits applications where individual DTs do not perform well on unseen test data, but the combined contribution of the DTs does.

Training set — input: feature vectors of 4x4 blocks of key frames whose pixels mostly belong to a common object; output: the corresponding known depth values. Test set: feature vectors of 4x4 blocks of an unseen video. (A training sketch follows below.)

*L. Breiman and A. Cutler, "Random Forests," Machine Learning, vol. 45, pp. 5-32, 2001.
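A minimal sketch of this regression stage with scikit-learn; the hyperparameters and the stand-in data are illustrative, not from the talk:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in data: in the real scheme, X_train holds 81-dimensional
# feature vectors of 4x4 blocks from training key frames and y_train
# holds their known depth values.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((5000, 81)), rng.random(5000)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Predict per-block depth for an unseen video's feature vectors.
X_test = rng.random((200, 81))
block_depths = rf.predict(X_test)

# Per-feature importances can be grouped by depth cue to obtain an
# average depth-cue importance analysis like the one in the results.
cue_importances = rf.feature_importances_
```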


Depth-Map Model Estimation and Refinement:

The 81-dimensional feature vectors of the 4x4 blocks are fed to the trained Random Forests model to estimate the depth map. The estimated depth map is then refined using mean-shift image segmentation* of the 2D video to obtain object-based depth information (a sketch follows below).

*D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach toward Feature Space Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002.
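One plausible refinement step, using OpenCV's mean-shift filtering as the segmenter and per-segment depth averaging; the talk does not spell out how segment and block depths are fused, so treat this fusion as an assumption:

```python
import cv2
import numpy as np

def refine_depth(frame_bgr, block_depth, block=4):
    """Turn per-block RF depth into object-based depth by averaging
    the depth estimates inside each mean-shift segment.
    frame_bgr: 8-bit BGR frame; block_depth: (H/block, W/block) array."""
    # Mean-shift filtering flattens color regions; coarse quantization
    # of the result gives rough segment labels.
    shifted = cv2.pyrMeanShiftFiltering(frame_bgr, sp=16, sr=32)
    rows = (shifted // 24).reshape(-1, 3)
    _, seg = np.unique(rows, axis=0, return_inverse=True)
    seg = seg.reshape(frame_bgr.shape[:2])

    # Upsample per-block depth to pixel resolution and crop to match.
    depth = np.kron(block_depth, np.ones((block, block)))
    h = min(seg.shape[0], depth.shape[0])
    w = min(seg.shape[1], depth.shape[1])
    seg, depth = seg[:h, :w], depth[:h, :w]

    # Replace each segment's depth with its mean depth.
    out = np.zeros_like(depth, dtype=np.float64)
    for s in np.unique(seg):
        mask = seg == s
        out[mask] = depth[mask].mean()
    return out
```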


Experiments:

Training sequences:

Video Sequence | Frame Size | Stream Type | Test View | Source
Orbi           | 720x576    | 2D+Depth    | NA        | Heinrich Hertz Institute (HHI)
Book Arrival   | 1024x768   | Multiview   | View 8    | Heinrich Hertz Institute (HHI)
Breakdancer    | 1024x768   | Multiview   | View 3    | Microsoft Research (MSR)
Rena           | 640x480    | Multiview   | View 45   | Nagoya University
Akko & Kayo    | 640x480    | Multiview   | View 28   | Nagoya University
Pantomime      | 1280x960   | Multiview   | View 37   | Nagoya University
Champagne      | 1280x960   | Multiview   | View 39   | Nagoya University

Test sequences:

Video Sequence | Frame Size | Stream Type | Test View | Source
Interview      | 720x576    | 2D+Depth    | NA        | Heinrich Hertz Institute (HHI)
Ballet         | 1024x768   | Multiview   | View 3    | Microsoft Research (MSR)


Results:

[Two bar charts: average depth-cue importance (%) per depth cue — texture, motion, haze, vertical coordinate, edge, and sharpness.]

[Image comparison: 2D video frame, the available (ground-truth) depth map, the depth map from the existing motion-based technique, and the depth map from our proposed technique.]

Subjective test (ITU-R BT.500-11): 18 people graded the stereo videos from 1 to 10.

Sequence  | Original | Existing Method | Our Method
Interview | 7.5      | 7.1             | 6.5
Ballet    | 7        | 6.8             | 6.3


Conclusions:

A new and efficient 2D-to-3D video conversion method was presented.

The method uses Random Forest regression to estimate the depth-map model from multiple monocular depth cues.

Performance evaluations show that our approach outperforms a state-of-the-art existing motion-based method.

The subjective visual quality of the created 3D streams was also confirmed by viewing the resulting streams on a stereoscopic display.

Our method is real-time and can be implemented at the receiver side without placing any burden on the network.