TRECVID 2005 – P. KRÄMER and J.BENOIS-PINEAU

Camera Motion Identification in the Rough Indexing Paradigm

Petra KRÄMER and Jenny BENOIS-PINEAU

LaBRI – University Bordeaux I, France

{petra.kraemer,jenny.benois}@labri.fr

Camera Motion Identification in the Rough Indexing Paradigm – p.1/21

Introduction

Task: Given the shot boundary reference, identify the shots in which a certain camera motion (pan, tilt, zoom) is present.

Rough Indexing Paradigm: Work at a lower spatial and temporal resolution, i.e. on P-Frames.

Aim: Reuse low-level motion descriptors from the compressed stream.

Main challenge in TRECVID 2005: Jittery camera motion due to hand-held cameras.

Overview

Processing chain (input: P-Frames, output: motion feature):

1 Global Motion Estimation –> $\theta_j$

2 Significance Value Computation –> $s_j$

3 Motion Segmentation –> $\bar{s}_m$

4 Thresholding –> $\zeta_m$

5 Classification –> motion feature

The index j refers to frames, m to segments of homogeneous motion.

Global Motion Estimation

Robust global motion estimator for P-Frames [DBP01]:

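Estimation of the affine 2D motion model:

$$\begin{pmatrix} dx_i \\ dy_i \end{pmatrix} = \begin{pmatrix} a_1 \\ a_4 \end{pmatrix} + \begin{pmatrix} a_2 & a_3 \\ a_5 & a_6 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \end{pmatrix}$$

Based on the weighted least squares method:

$$\theta = (H^T W H)^{-1} H^T W Z$$

$\theta = (a_1, a_2, a_3, a_4, a_5, a_6)^T$
$Z$: MPEG motion compensation vectors
$H$: macroblock centers
$W$: weights defined by the derivative of the Tukey function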
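
The weighted least squares step can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the estimator of [DBP01]: it assumes one weight per macroblock shared by both vector components, so that the normal equations split into two 3x3 systems with the same matrix. The function names `solve3` and `fit_affine_wls` are hypothetical.

```python
def solve3(A, b):
    """Solve the 3x3 system A x = b by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented copy, A is not modified
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_affine_wls(centers, vectors, weights):
    """Weighted least squares estimate of theta = (a1, ..., a6).

    centers: (x_i, y_i) macroblock centers
    vectors: (dx_i, dy_i) motion compensation vectors
    weights: one non-negative weight per macroblock (assumption: shared by dx and dy)
    """
    A = [[0.0] * 3 for _ in range(3)]  # H^T W H, identical for both components
    bx = [0.0] * 3                     # H^T W Z, x component
    by = [0.0] * 3                     # H^T W Z, y component
    for (x, y), (dx, dy), w in zip(centers, vectors, weights):
        h = (1.0, x, y)                # one row of H for this macroblock
        for r in range(3):
            bx[r] += w * h[r] * dx
            by[r] += w * h[r] * dy
            for c in range(3):
                A[r][c] += w * h[r] * h[c]
    a1, a2, a3 = solve3(A, bx)         # dx = a1 + a2 x + a3 y
    a4, a5, a6 = solve3(A, by)         # dy = a4 + a5 x + a6 y
    return (a1, a2, a3, a4, a5, a6)
```

In a robust estimator the fit would be repeated, recomputing the weights from the residuals after each pass.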

Global Motion Estimation

The derivative of the Tukey function:

$$\psi(r, \lambda_r) = \begin{cases} r\,(r^2 - \lambda_r^2)^2 & \text{if } |r| < \lambda_r \\ 0 & \text{otherwise} \end{cases}$$

The weights are [OB95]:

$$w_i = \frac{\psi(r_i)}{r_i}$$

$\lambda_r$: threshold
$r_i = z_i - \hat{z}_i$: residuals
$z_i$: i-th MPEG motion vector
$\hat{z}_i$: estimation of $z_i$

[Figure: plot of the function $\psi$.]
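
A minimal sketch of the weight computation, assuming that the residual of a motion vector is measured by its Euclidean magnitude (the slides do not say how the two components are combined). Dividing the Tukey derivative by r gives (r^2 - lambda^2)^2 inside the threshold, so the code uses that form directly and avoids a division by zero at r = 0. The names `tukey_weight` and `residual_weights` are hypothetical.

```python
import math


def tukey_weight(r, lam):
    """w = psi(r) / r, with psi(r) = r (r^2 - lam^2)^2 for |r| < lam and 0 otherwise."""
    if abs(r) >= lam:
        return 0.0                      # outlier: no influence on the estimate
    return (r * r - lam * lam) ** 2     # psi(r) / r simplified analytically


def residual_weights(vectors, estimates, lam):
    """One weight per macroblock from observed and estimated motion vectors."""
    weights = []
    for (zx, zy), (ex, ey) in zip(vectors, estimates):
        r = math.hypot(zx - ex, zy - ey)  # assumption: residual magnitude
        weights.append(tukey_weight(r, lam))
    return weights
```

Macroblocks with a positive weight form the dominant estimation support; those with weight zero are treated as outliers.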

Global Motion Estimation

[Figure with three panels: a) "Motion Compensation Vectors (29087)", the P-Frame motion vectors; b) "Estimated Vectors (29087)", the estimated vectors; c) macroblocks, marked either as outliers or as belonging to the dominant estimation support D ($w_i > 0$).]

Global Motion Estimation

Problem:

The global motion parameters are noisy due to jitter motions.

The global motion parameters have different meanings.

Solution:

Significance test of the motion parameters:

Thresholding of likelihood values

Significance Value Computation

Based on [BGG99]:

Change to another basis of elementary motion-subfields:

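$\phi = (\mathrm{pan}, \mathrm{tilt}, \mathrm{zoom}, \mathrm{rot}, \mathrm{hyp1}, \mathrm{hyp2})$ with

$$\mathrm{zoom} = \tfrac{1}{2}(a_2 + a_6) \qquad \mathrm{rot} = \tfrac{1}{2}(a_5 - a_3)$$

$$\mathrm{hyp1} = \tfrac{1}{2}(a_2 - a_6) \qquad \mathrm{hyp2} = \tfrac{1}{2}(a_3 + a_5)$$

Consider two hypotheses $H_0$ and $H_1$:

$H_0$: the considered component of $\phi$ is significant, with $\phi_0$ as the corresponding motion model.

$H_1$: the considered component of $\phi$ is not significant (= 0), with $\phi_1$ as the corresponding motion model.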
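
The change of basis can be written as a small function. The four formulas for zoom, rot, hyp1 and hyp2 are taken from the slide; the mapping pan = a1 and tilt = a4 (the translation terms of the affine model) is an assumption, since the slide does not spell it out. The function name `to_elementary_basis` is hypothetical.

```python
def to_elementary_basis(theta):
    """Map theta = (a1, ..., a6) to phi = (pan, tilt, zoom, rot, hyp1, hyp2)."""
    a1, a2, a3, a4, a5, a6 = theta
    return {
        "pan": a1,                  # assumption: horizontal translation term
        "tilt": a4,                 # assumption: vertical translation term
        "zoom": 0.5 * (a2 + a6),    # divergence
        "rot": 0.5 * (a5 - a3),     # rotation
        "hyp1": 0.5 * (a2 - a6),    # first hyperbolic term
        "hyp2": 0.5 * (a3 + a5),    # second hyperbolic term
    }
```

For example, a pure zoom with a2 = a6 yields a non-zero zoom component while rot, hyp1 and hyp2 vanish.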

Significance Value Computation

Likelihood function associated to each hypothesis:

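$$f(\phi_l) = \prod_{i \in D} \left( \frac{1}{2\pi\sqrt{\det(\Sigma_l)}} \exp\left( -\frac{1}{2}\, r_i^T \Sigma_l^{-1} r_i \right) \right) = \frac{1}{(2\pi\sigma_{x,l}\sigma_{y,l})^{\|D\|}} \exp(-\|D\|), \qquad l = 0, 1$$

Assumption:

$$\Sigma_l = \begin{pmatrix} \sigma_{x,l}^2 & 0 \\ 0 & \sigma_{y,l}^2 \end{pmatrix}$$

$\Sigma$: covariance matrix
$\sigma_x^2$, $\sigma_y^2$: variances for x and y
$D$: dominant estimation support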
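
The closed form of the likelihood can be checked in three steps. This is a sketch which assumes that the variances are the maximum likelihood estimates computed on D, something the slide does not state explicitly; $r_{x,i}$ and $r_{y,i}$ denote the two components of the residual $r_i$.

With a diagonal covariance matrix:

$$r_i^T \Sigma_l^{-1} r_i = \frac{r_{x,i}^2}{\sigma_{x,l}^2} + \frac{r_{y,i}^2}{\sigma_{y,l}^2}, \qquad \sqrt{\det(\Sigma_l)} = \sigma_{x,l}\,\sigma_{y,l}$$

With the assumed estimates $\sigma_{x,l}^2 = \frac{1}{\|D\|} \sum_{i \in D} r_{x,i}^2$ and $\sigma_{y,l}^2 = \frac{1}{\|D\|} \sum_{i \in D} r_{y,i}^2$:

$$\sum_{i \in D} r_i^T \Sigma_l^{-1} r_i = \|D\| + \|D\| = 2\,\|D\|$$

Hence:

$$f(\phi_l) = \frac{1}{(2\pi\sigma_{x,l}\sigma_{y,l})^{\|D\|}} \exp\left( -\frac{1}{2} \cdot 2\,\|D\| \right) = \frac{1}{(2\pi\sigma_{x,l}\sigma_{y,l})^{\|D\|}} \exp(-\|D\|)$$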

Significance Value Computation

The significance value s is:

$$s = \ln\left( \frac{f(\phi_1)}{f(\phi_0)} \right) = \|D\| \left( \ln(\sigma_{x,0}\sigma_{y,0}) - \ln(\sigma_{x,1}\sigma_{y,1}) \right) \overset{*}{=} \|D\| \left( \ln(\sigma_0^2) - \ln(\sigma_1^2) \right)$$

* assuming that $\sigma_x = \sigma_y$

Aim: Use s to test the significance.

Idea: If a motion feature (pan, zoom, tilt) is present in a shot, its corresponding motion parameter is significant during a sufficient number of frames.
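
As an illustration, the significance value can be computed from the residuals of the two motion models on the dominant support. The sketch assumes a single pooled variance per model (the case sigma_x = sigma_y) estimated as the mean squared residual component; the function name `significance` and the floor `eps`, which guards against a zero variance, are hypothetical.

```python
import math


def significance(residuals_h0, residuals_h1, eps=1e-12):
    """s = ||D|| (ln sigma_0^2 - ln sigma_1^2) for residual lists of equal length.

    residuals_h0: (rx, ry) residuals under the full model phi_0
    residuals_h1: (rx, ry) residuals under the constrained model phi_1
    """
    n = len(residuals_h0)  # ||D||, size of the dominant estimation support

    def pooled_variance(residuals):
        # Assumption: sigma_x = sigma_y, so both components share one variance.
        total = sum(rx * rx + ry * ry for rx, ry in residuals)
        return max(total / (2 * n), eps)

    return n * (math.log(pooled_variance(residuals_h0))
                - math.log(pooled_variance(residuals_h1)))
```

When setting the component to zero makes the fit much worse, the variance under the constrained model grows and s becomes strongly negative, which is the situation in which the component counts as significant.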

Significance Value Computation

Problem:

The significance values can be noisy due to jitter motions.

The motion models θ can be inaccurate.

Solution:

Smooth the significance value over time and decide on the temporal mean value.

–> Segment shots into subshots of homogeneous motion

Introduce confidence measures in order to reject frames with an inaccurate motion model.

Significance Value Computation

Two reasons for inaccurate motion models:

Failure of the MPEG encoder –> confidence measure $c_D \approx \|D\|$

Failure of the global motion estimation algorithm –> confidence measure $c_\sigma \approx \sigma_0^2$

Reject the frame if $c_D < \lambda_D$ or $c_\sigma > \lambda_\sigma$

$\lambda_D$: threshold
$\lambda_\sigma$: threshold
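
The rejection rule amounts to a simple predicate. In this sketch the confidence measures are taken to be exactly the support size and the residual variance of the full model, whereas the slide only gives them approximately; the function name `reject_frame` is hypothetical and any threshold values passed to it are illustrative.

```python
def reject_frame(support_size, sigma0_sq, lambda_d, lambda_sigma):
    """Return True if the motion model of a frame should not be trusted.

    support_size: number of macroblocks in the dominant support (used as c_D)
    sigma0_sq:    residual variance under the full model (used as c_sigma)
    """
    too_few_inliers = support_size < lambda_d    # MPEG encoder failure
    poor_fit = sigma0_sq > lambda_sigma          # estimation failure
    return too_few_inliers or poor_fit
```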

Motion Segmentation

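Hinkley test to detect changes in the temporal mean value $\bar{s}(t)$:

Downward jump:

$$U_k = \sum_{t=0}^{k} \left( s_t - \bar{s} + \frac{\delta_{min}}{2} \right) \quad (k \geq 0)$$

$$M_k = \max_{0 \leq i \leq k} U_i; \quad \text{detection if } M_k - U_k > \lambda_H$$

Upward jump:

$$V_k = \sum_{t=0}^{k} \left( s_t - \bar{s} - \frac{\delta_{min}}{2} \right) \quad (k \geq 0)$$

$$N_k = \min_{0 \leq i \leq k} V_i; \quad \text{detection if } V_k - N_k > \lambda_H$$

$\bar{s}$: temporal mean value
$\delta_{min}$: minimal jump magnitude
$\lambda_H$: predefined threshold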
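
A minimal sketch of the two-sided Hinkley test on a sequence of significance values. It assumes that the temporal mean is a running mean updated with every sample and that all statistics are reset after a detection so that a new segment starts; neither detail is specified on the slide. The function name `hinkley_jumps` is hypothetical.

```python
def hinkley_jumps(values, delta_min, lambda_h):
    """Return (index, direction) pairs for detected jumps of the mean."""
    detections = []
    count, mean = 0, 0.0
    u = v = 0.0                                     # cumulative sums U_k and V_k
    u_max, v_min = float("-inf"), float("inf")      # M_k and N_k
    for k, s in enumerate(values):
        count += 1
        mean += (s - mean) / count                  # assumption: online mean
        u += s - mean + delta_min / 2.0
        v += s - mean - delta_min / 2.0
        u_max = max(u_max, u)
        v_min = min(v_min, v)
        direction = None
        if u_max - u > lambda_h:
            direction = "down"
        elif v - v_min > lambda_h:
            direction = "up"
        if direction is not None:
            detections.append((k, direction))
            # Assumption: restart all statistics for the next segment.
            count, mean = 0, 0.0
            u = v = 0.0
            u_max, v_min = float("-inf"), float("inf")
    return detections
```

The detection indices split a shot into segments of homogeneous motion; larger values of the threshold trade detection delay for fewer false alarms.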

Motion Segmentation

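Principle of the Hinkley test:

[Figure: $s$ and $\bar{s}$ over time, together with the downward-jump statistic $M_k - U_k$ and the upward-jump statistic $V_k - N_k$.]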

Thresholding

Selection of the hypothesis:

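$$\bar{s}(t) = \frac{1}{T - t_0} \sum_{t=t_0}^{T} s(t) \;\underset{H_1}{\overset{H_0}{\lessgtr}}\; \lambda_s$$

And relative thresholding to determine the dominant motion:

$$\zeta(t) = \begin{cases} \bar{s}(t) & \text{if } \bar{s}(t) < \alpha \cdot \min\{\bar{s}_{pan}, \bar{s}_{tilt}, \bar{s}_{zoom}, \bar{s}_{rot}, \bar{s}_{hyp1}, \bar{s}_{hyp2}\} \\ 0 & \text{otherwise} \end{cases}$$

$T - t_0$: segment of homogeneous motion
$\lambda_s$: threshold
$\alpha$: constant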
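
The two thresholding steps can be sketched for one segment as follows. The sketch assumes that a component keeps its mean value only if it passes both tests (mean below the absolute threshold, so that H0 is selected, and mean below alpha times the smallest mean), and it uses the plain arithmetic mean over the frames of the segment. The function name `threshold_segment` and the example values are hypothetical.

```python
def threshold_segment(values_by_feature, lambda_s, alpha):
    """Return the thresholded mean significance value zeta for each component.

    values_by_feature: dict mapping a component name to its per-frame
                       significance values inside one homogeneous segment
    """
    means = {name: sum(vals) / len(vals) for name, vals in values_by_feature.items()}
    limit = alpha * min(means.values())        # relative threshold
    zeta = {}
    for name, mean in means.items():
        significant = mean < lambda_s          # hypothesis H0 selected
        dominant = mean < limit                # close enough to the strongest component
        zeta[name] = mean if (significant and dominant) else 0.0
    return zeta


segment = {
    "pan": [-100.0, -120.0], "tilt": [-5.0, -3.0], "zoom": [-60.0, -60.0],
    "rot": [0.0, 0.0], "hyp1": [1.0, 1.0], "hyp2": [0.0, 0.0],
}
zeta = threshold_segment(segment, lambda_s=-20.0, alpha=0.5)
```

Because significant components have strongly negative values, a larger alpha keeps only components close to the most significant one.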

Classification

The following classification scheme is applied to the thresholded mean significance values $\zeta = (\zeta_{pan}, \zeta_{tilt}, \zeta_{zoom}, \zeta_{rot}, \zeta_{hyp1}, \zeta_{hyp2})$:

     $\zeta$                                                     motion feature
1    (0, 0, 0, 0, 0, 0)                                          static camera / no significant motion
2    ($\zeta_{pan}$, 0, 0, 0, 0, 0)                              pan
3    (0, $\zeta_{tilt}$, 0, 0, 0, 0)                             tilt
4    ($\zeta_{pan}$, $\zeta_{tilt}$, $\zeta_{zoom}$, 0, 0, 0)    zoom
5    others                                                      complex camera motion
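
The classification scheme translates into a short decision function. One point is an interpretation rather than something the slide states: in row 4 the pan and tilt entries are read as "may be zero or non-zero", so any vector with a non-zero zoom entry and zero rot, hyp1 and hyp2 entries is labelled zoom. The function name `classify` is hypothetical.

```python
def classify(zeta):
    """Label a segment from zeta = (pan, tilt, zoom, rot, hyp1, hyp2)."""
    pan, tilt, zoom, rot, hyp1, hyp2 = (value != 0 for value in zeta)
    if rot or hyp1 or hyp2:
        return "complex"     # row 5: anything outside rows 1 to 4
    if zoom:
        return "zoom"        # row 4 (assumption: pan and tilt are optional)
    if pan and tilt:
        return "complex"     # row 5: pan and tilt together without zoom
    if pan:
        return "pan"         # row 2
    if tilt:
        return "tilt"        # row 3
    return "static"          # row 1: static camera / no significant motion
```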

Classification

Postprocessing:

Merge neighboring segments with the same motion feature

Reject segments with a duration shorter than $t_{min}$ frames

If a motion feature is still present afterwards, the shot is identified as containing that motion feature.
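
A possible sketch of the postprocessing, assuming that segments are given as (start, end, label) tuples in temporal order with an exclusive end frame, and that only directly adjacent segments are merged. The function names `postprocess` and `shot_features` are hypothetical.

```python
def postprocess(segments, t_min):
    """Merge adjacent segments with equal labels, then drop short ones."""
    merged = []
    for start, end, label in segments:
        if merged and merged[-1][2] == label and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, label)   # extend previous segment
        else:
            merged.append((start, end, label))
    return [seg for seg in merged if seg[1] - seg[0] >= t_min]


def shot_features(segments, t_min, wanted=("pan", "tilt", "zoom")):
    """Motion features that survive postprocessing and are therefore reported."""
    return {label for _, _, label in postprocess(segments, t_min) if label in wanted}
```

Merging before filtering matters: two short pan segments that touch each other can together exceed the minimum duration.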

Results

Results for the shot “shot106_136”:

[Figure with three panels plotted against the frame number (29060 to 29200): a) global motion parameters $\theta$ ($a_1$ to $a_6$); b) significance values $s$ (pan, tilt, zoom, rot, hyp1, hyp2); c) online mean values $\bar{s}$ (pan, tilt, zoom, rot, hyp1, hyp2). Each panel marks the segments pan, static, tilt and zoom as well as two rejected intervals; the threshold $\lambda_s$ is indicated.]

Results

Precision and recall for all submissions:

[Figure: scatter plot of recall (vertical axis) against precision (horizontal axis) for all submissions; the LaBRI run is labelled RI.]

RI –> LaBRI: Precision 0.912, Recall 0.737

Conclusion and Perspectives

Conclusion:

A method based on global motion estimation and a significance test is proposed.

The proposed method can handle moving objects and jitter motions.

No decoding of the compressed stream.

Performance 3-4 times faster than real time.

Since no ground truth was available, it was difficult to determine the best parameter set.

Future work:

Focus mainly on the correction of motion models when the encoder block-matching algorithm fails.

References

[BGG99] P. Bouthemy, M. Gelgon, and F. Ganansia. A unified approach to shot change detection and camera motion characterization. IEEE Trans. on Circuits and Systems for Video Technology, 9(7):1030–1044, October 1999.

[DBP01] M. Durik and J. Benois-Pineau. Robust motion characterisation for video indexing based on MPEG2 optical flow. In International Workshop on Content-Based Multimedia Indexing, CBMI'01, pages 57–64, 2001.

[OB95] J.M. Odobez and P. Bouthemy. Robust multiresolution estimation of parametric motion models. Journal of Visual Communication and Image Representation, 6(4):348–365, 1995.
