Overview on Scalable Video Coding - II
Chuan-Yu Cho
Outline
HHI and MS SVC Codecs
Scalable Video Model 3.0
SVC: 68th meeting
14 full proposals in two categories
  Category 1: MCTF/2D Wavelet
  Category 2: AVC based (incl. AVC/MCTF)
Core Experiments (CE)
  Reference model (software) = the best performing scheme in each category
  Category 1 (MCTF/2DWT): Microsoft Research Asia
  Category 2 (AVC/MCTF): HHI
  The results of the Core Experiments will be used to decide which complete codec architecture is adopted into the Working Draft (WD).
From: NCTU 69th MPEG meeting report
HHI and MS SVC Codecs
Chuan-Yu Cho
Motion Compensation Temporal Filtering (MCTF)
[Figure: MCTF temporal decomposition. The original pictures L0 are decomposed over three temporal levels into low-pass pictures L1, L2, L3 and high-pass pictures H1, H2, H3.]
Motion Compensation Temporal Filtering (MCTF)
[Figure: wavelet tree with reduction of spatial size throughout the temporal levels. The video sequence is split into subband H at the 1st temporal level, LH at the 2nd temporal level, and LLH and LLL at the 3rd temporal level.]
Motion Adaptive Filtering
Motion Compensation Temporal Filtering (MCTF)
Adaptive vs Non-adaptive MCTF
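The three-level temporal decomposition in the MCTF figures (H, LH, LLH, LLL) can be illustrated with a minimal sketch. It assumes the simplest possible case: an unnormalized Haar filter in lifting form and zero motion, so each pixel is filtered with the co-located pixel of the neighbouring frame. The function names (`haar_temporal_level`, `mctf_decompose`, `mctf_reconstruct`) are hypothetical and are not taken from either codec proposal.

```python
from typing import List, Tuple

Frame = List[float]  # one frame, flattened to a list of pixel values


def haar_temporal_level(frames: List[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Split consecutive frame pairs into low-pass and high-pass frames.

    Assumes an even number of frames; a trailing unpaired frame is ignored.
    """
    low, high = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        h = [o - e for e, o in zip(even, odd)]      # prediction step
        l = [e + 0.5 * d for e, d in zip(even, h)]  # update step
        high.append(h)
        low.append(l)
    return low, high


def mctf_decompose(frames: List[Frame], levels: int):
    """Apply the Haar step repeatedly to the low-pass frames.

    Returns the final low-pass frames and one list of high-pass frames per level.
    """
    subbands = []
    current = frames
    for _ in range(levels):
        current, high = haar_temporal_level(current)
        subbands.append(high)
    return current, subbands


def mctf_reconstruct(low: List[Frame], subbands: List[List[Frame]]) -> List[Frame]:
    """Invert mctf_decompose by undoing the update and prediction steps."""
    current = low
    for high in reversed(subbands):
        frames = []
        for l, h in zip(current, high):
            even = [a - 0.5 * d for a, d in zip(l, h)]  # undo update
            odd = [e + d for e, d in zip(even, h)]      # undo prediction
            frames.extend([even, odd])
        current = frames
    return current
```

With 8 input frames and 3 levels, `subbands[0]` plays the role of H, `subbands[1]` of LH, `subbands[2]` of LLH, and the returned low-pass frame of LLL. Dropping the finest subbands before reconstruction halves the frame rate per dropped level, which is the basis of temporal scalability.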
3D Sub-band Video Coding using Barbell lifting
Status: Proposal
Source: MSRA Asia IM group, MS CMPT group
Authors: Jizheng Xu, Ruiqin Xiong, Bo Feng, Gary Sullivan, Ming-Chieh Lee, Feng Wu, Shipeng Li
3-D Wavelet/Subband (Interframe Wavelet) Method
[Figure: temporal wavelet decomposition of the video sequence into subbands H (1st temporal level), LH (2nd temporal level), LLH and LLL (3rd temporal level). Labels: Temporal Scalability, Spatial Scalability.]
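Barbell lifting
[Figure: Barbell functions. Frames S0, S1, S2 along the time axis; Barbell functions f_0 and f_2 map S0 and S2 to the values \hat{s}_0 and \hat{s}_2, which are combined with pixel s_1 of S1 using the weight a.]
$\hat{s}_0 = f_0(S_0)$, $\hat{s}_2 = f_2(S_2)$
$t = a\,\hat{s}_0 + s_1 + a\,\hat{s}_2$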
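Lifting scheme
[Figure: two-stage lifting over frames X0 to X4. Prediction stage: the high-pass frames H0 and H1 are formed from the odd frames and the Barbell values $\hat{x}_i = f_i(X_i)$ of the neighbouring even frames, each weighted by -a. Update stage: the low-pass frames L0, L1, L2 are formed from the even frames and the Barbell values $\hat{h}_i = g_i(H_i)$ of the neighbouring high-pass frames, weighted by b and 2b.]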
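The prediction and update stages can be sketched for a single pixel position traced over time. This is an illustration under stated assumptions, not the proposal's implementation: the Barbell functions are taken to be the identity (each one returns the co-located pixel, i.e. zero motion), the weights default to a = 1/2 and b = 1/4 (the usual 5/3 lifting values, which the slides do not state), and the weight 2b is applied where a low-pass sample has only one high-pass neighbour. The names `lift_forward`, `lift_inverse` and `_update` are hypothetical.

```python
from typing import List, Tuple


def _update(high: List[float], k: int, b: float) -> float:
    """Update term for low-pass sample k; 2b is used at the two ends."""
    if k == 0:
        return 2 * b * high[0]
    if k == len(high):
        return 2 * b * high[-1]
    return b * (high[k - 1] + high[k])


def lift_forward(x: List[float], a: float = 0.5,
                 b: float = 0.25) -> Tuple[List[float], List[float]]:
    """Forward lifting over an odd number (at least 3) of samples X0..X2n."""
    if len(x) < 3 or len(x) % 2 == 0:
        raise ValueError("expected an odd number of samples, at least 3")
    even, odd = x[0::2], x[1::2]
    # Prediction stage: H_k = X_{2k+1} - a * (X_{2k} + X_{2k+2})
    high = [odd[k] - a * (even[k] + even[k + 1]) for k in range(len(odd))]
    # Update stage: L_k = X_{2k} + b * (H_{k-1} + H_k)
    low = [even[k] + _update(high, k, b) for k in range(len(even))]
    return low, high


def lift_inverse(low: List[float], high: List[float], a: float = 0.5,
                 b: float = 0.25) -> List[float]:
    """Undo the update stage, then the prediction stage."""
    even = [low[k] - _update(high, k, b) for k in range(len(low))]
    odd = [high[k] + a * (even[k] + even[k + 1]) for k in range(len(high))]
    x = []
    for k in range(len(high)):
        x.extend([even[k], odd[k]])
    x.append(even[-1])
    return x
```

Because each stage only adds a function of the other polyphase component, the inverse simply subtracts the same terms in reverse order. That property holds for any choice of Barbell functions, which is why motion alignment can be placed inside the lifting steps without losing perfect reconstruction.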
L1: QCIF, 7.5 Hz, 48 kbit/s
L2: QCIF, 15 Hz, 64 kbit/s
L3: QCIF, 15 Hz, 128 kbit/s
L4: QCIF, 15 Hz, 256 kbit/s
L5: QCIF, 15 Hz, 512 kbit/s
Scalable Extension of H.264/AVC
Status: Proposal => SVM 3.0
Source: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (HHI)
Authors: Heiko Schwarz, Detlev Marpe, and Thomas Wiegand
HHI Joint Scalability
[Figure: layered coding structure combining spatial, temporal and SNR scalability. The QCIF layers (I B P B P B pictures) are spatially upsampled for the CIF layers; each layer carries its own temporal subband pictures (L, H). Labels: Spatial upsampling, {MP}0, {MP}1,2.]
Layer 0: QCIF, 7.5 Hz, 64 kbit/s
Layer 1: QCIF, 15 Hz, 128 kbit/s
Layer 2: CIF, 15 Hz, 256 kbit/s
Layer 3: CIF, 15 Hz, 512 kbit/s
Layer 4: CIF, 30 Hz, 1024 kbit/s
Layer 5: CIF, 30 Hz, 2048 kbit/s
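To make the layer table concrete, the following sketch picks the highest layer a receiver can use given its display width, frame rate and bandwidth limits. It assumes that each listed bit rate is the total rate needed to decode up to and including that layer, which the slide does not state explicitly. The `Layer` class and `select_layer` function are hypothetical names; the picture sizes are the standard QCIF (176x144) and CIF (352x288) dimensions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    index: int
    width: int          # luma width in pixels
    height: int         # luma height in pixels
    frame_rate: float   # Hz
    kbps: int           # kbit/s, assumed cumulative up to this layer


# Layer table copied from the slide; QCIF = 176x144, CIF = 352x288.
LAYERS = [
    Layer(0, 176, 144, 7.5, 64),
    Layer(1, 176, 144, 15.0, 128),
    Layer(2, 352, 288, 15.0, 256),
    Layer(3, 352, 288, 15.0, 512),
    Layer(4, 352, 288, 30.0, 1024),
    Layer(5, 352, 288, 30.0, 2048),
]


def select_layer(max_width: int, max_frame_rate: float,
                 max_kbps: int) -> Optional[Layer]:
    """Return the highest layer that fits all three limits, or None."""
    fitting = [
        layer for layer in LAYERS
        if layer.width <= max_width
        and layer.frame_rate <= max_frame_rate
        and layer.kbps <= max_kbps
    ]
    return max(fitting, key=lambda layer: layer.index) if fitting else None
```

For example, a CIF-capable client limited to 15 Hz and 600 kbit/s would be served Layer 3, while a QCIF-only client would stop at Layer 1 regardless of bandwidth.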
HHI Temporal Scalability
[Figure: temporal decomposition of the pictures L0 into low-pass pictures L1, L2, L3 and high-pass pictures H1, H2, H3, with the labels {MP}1, {MP}2, {MP}3.]
Temporal Base Layer (Layer 0)
Temporal Enhancement Layer (Layer 1)
Temporal Enhancement Layer (Layer 2)
Temporal Enhancement Layer (Layer 3)
[Figure: spatial scalability between the Spatial Base Layer (Layer 0) and the Spatial Enhancement Layer (Layer 1). Picture rows: L0, L0*, L1, and H1/L1. Labels: reconstructed sequence, reconstructed and upsampled sequence, temporal subband pictures, Spatial upsampling, Base Layer Prediction, Reconstruction.]
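The base layer prediction path in the figure (reconstruct the base layer, upsample it, predict the enhancement layer from it) can be sketched as follows. This is a simplified illustration only: it upsamples by pixel replication, whereas the slides do not say which interpolation filter the codec uses, and it works on whole pictures rather than macroblocks. The function names are hypothetical.

```python
from typing import List

Picture = List[List[int]]  # rows of luma samples


def upsample_2x(picture: Picture) -> Picture:
    """Double width and height by pixel replication (assumed filter)."""
    out = []
    for row in picture:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def base_layer_residual(enhancement: Picture, base_rec: Picture) -> Picture:
    """Residual left after predicting the enhancement picture from the
    upsampled base layer reconstruction."""
    pred = upsample_2x(base_rec)
    return [[e - p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(enhancement, pred)]


def reconstruct(residual: Picture, base_rec: Picture) -> Picture:
    """Decoder side: add the residual back onto the same prediction."""
    pred = upsample_2x(base_rec)
    return [[r + p for r, p in zip(r_row, p_row)]
            for r_row, p_row in zip(residual, pred)]
```

Only the residual needs to be coded in the enhancement layer; when the base layer already predicts the picture well, the residual is small and cheap to code.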
HHI SNR Scalability
HHI Joint Scalability
[Figure: the joint scalability structure repeated, highlighting Layer 0 & 1 with I, P and B pictures. Annotations: 32 kbit/s, 48 kbit/s.]
HHI Joint Scalability
[Figure: the joint scalability structure repeated, highlighting Layers 1 and 2. Annotations: P, B (Up sample).]
HHI Joint Scalability
[Figure: the joint scalability structure repeated, highlighting Layers 1 and 2. Annotations: Intra prediction for intra MB, Inter Prediction, H, L, 96 kbit/s.]
HHI Joint Scalability
[Figure: the joint scalability structure repeated, highlighting Layers 3 and 4. Annotations: SNR Scalability (QP), Temporal Scalability (Motion Compensation), H00, L21, L22.]
HHI Joint Scalability
[Figure: the joint scalability structure repeated, highlighting Layers 3 and 4. Annotations: SNR Scalability (QP), Temporal Scalability (Motion Compensation), H00, H21, H22.]
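The "SNR Scalability (QP)" annotation suggests that quality layers differ in their quantization parameter. The sketch below shows one common way to realise this, requantizing the base layer's reconstruction error with a finer step; the slides do not spell out the exact scheme, so treat it as an assumption. The step size uses the approximate H.264/AVC relation in which the step is 0.625 at QP 0 and doubles every 6 QP values. The function names are hypothetical.

```python
from typing import Tuple


def qstep(qp: int) -> float:
    """Approximate H.264/AVC quantizer step size: doubles every 6 QP."""
    return 0.625 * 2 ** (qp / 6)


def encode_layers(coeff: float, qp_base: int, qp_enh: int) -> Tuple[int, int]:
    """Quantize a transform coefficient coarsely, then refine the error."""
    base_level = round(coeff / qstep(qp_base))
    base_rec = base_level * qstep(qp_base)
    # Enhancement layer: quantize what the base layer missed, using a
    # smaller QP (finer step).
    enh_level = round((coeff - base_rec) / qstep(qp_enh))
    return base_level, enh_level


def decode(base_level: int, enh_level: int, qp_base: int, qp_enh: int,
           use_enhancement: bool) -> float:
    """Reconstruct from the base layer alone or from both layers."""
    rec = base_level * qstep(qp_base)
    if use_enhancement:
        rec += enh_level * qstep(qp_enh)
    return rec
```

With qp_base = 36 (step 40) and qp_enh = 30 (step 20), a coefficient of 133 is reconstructed as 120 from the base layer alone and as 140 when the enhancement layer is also decoded, so the error shrinks from 13 to 7.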
Scalable Video Model 3.0
Thomas Wiegand
Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute (HHI)
To be published in VCIP ‘05
Outline
Introduction
MCTF using the Lifting Representation
Motion model of H.264/MPEG-4 AVC
Prediction and Update operators
Derivation of Update Operators
Temporal Coding Structure
Temporal Decomposition Structure
Impacts on H.264/MPEG-4 AVC
Simulation Results for the MCTF Extension
SNR-Scalable Extension
Adjustment of the Quantization Parameter
Impacts on H.264/MPEG-4 AVC
Results for Layered SNR-Scalability
Spatial Scalability
Spatial Prediction of Data
Results for Layered Combined Scalability
Fine Granular Scalability
Simulation Results: City
Simulation Results: Crew
Summary
References
R. Schäfer, H. Schwarz, D. Marpe, T. Schierl, and T. Wiegand, "MCTF and Scalability Extension of H.264/AVC and its Application to Video Transmission, Storage, and Surveillance," Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute (HHI). To be published at VCIP '05.