Impact of the Content on Subjective Evaluation of Audiovisual Quality:

What dimensions influence our perception?

J.Lassalle, L.Gros (Orange Labs) , G.Coppin (Télécom Bretagne) & T. Morineau (UBS)

Audiovisual (AV) perceived quality is the result of the interaction between sound and image qualities:

What are the impacts of audio (A) and/or video (V) degradations on the perceived AV quality (AVQ), and consequently on the overall QoE?

Several studies have shown that perceived quality depends on the test material

Consequently, methods for evaluating AV quality should take the influence of content into account (the influence of audio, of video, and of the relationship between audio and video) to avoid uncontrolled effects

Today, only the ITU-T P.911 method is dedicated to AV quality evaluation in a passive viewing context:

It suggests a classification of test sequences, but only on the basis of the characteristics of the audio and video content considered separately

Thus, it does not provide recommendations on the characterization of the AV event (i.e. considering the semantic link between audio and video)

Current Context

VQEG, June 2012

Corpus:

Dance, Theatre, Opera, Sport, Documentary

Expert characterization: extraction of 9 low-level descriptors

Gwinner & Lalaurette, 2004 (MPEG7); Amiar, 1995

5 Semantic Descriptors: audio-visual relationship/diegesis (sound in, sound off, off-screen sound), sound expression (speech, music, sound effects), number of characters (few, some, many), content dynamic (low, moderate, high) and dominant modality (A, V, AV)

4 Technical Descriptors: brightness (low, moderate, high), color temperature (hot, moderate, cool), dynamic camera (low, moderate, high), and level of details (low, moderate, high).

=> The entire corpus has been characterized by considering these descriptors
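
A characterization of this kind can be made machine-checkable by recording each sequence's tags against the modes listed above. The Python sketch below is illustrative only: the names `DESCRIPTOR_MODES` and `validate_characterization`, the key spellings, and the example values are assumptions, not part of the study's tooling.

```python
# Allowed modes for the 9 low-level descriptors, as listed on the slide.
# Key names are hypothetical; only the modes come from the slide.
DESCRIPTOR_MODES = {
    # Semantic descriptors
    "diegesis": {"sound-in", "sound-off", "off-screen"},
    "sound_type": {"speech", "music", "sound effects"},
    "nb_characters": {"few", "some", "many"},
    "content_dynamic": {"low", "moderate", "high"},
    "dominant_modality": {"A", "V", "AV"},
    # Technical descriptors
    "brightness": {"low", "moderate", "high"},
    "color_temperature": {"hot", "moderate", "cool"},
    "camera_dynamic": {"low", "moderate", "high"},
    "details": {"low", "moderate", "high"},
}


def validate_characterization(tags):
    """Return a list of problems found in one sequence's descriptor tags."""
    problems = []
    for name, modes in DESCRIPTOR_MODES.items():
        if name not in tags:
            problems.append(f"missing descriptor: {name}")
        elif tags[name] not in modes:
            problems.append(f"invalid mode for {name}: {tags[name]!r}")
    for name in tags:
        if name not in DESCRIPTOR_MODES:
            problems.append(f"unknown descriptor: {name}")
    return problems


# Hypothetical characterization of one sequence (values invented).
example = {
    "diegesis": "sound-in",
    "sound_type": "music",
    "nb_characters": "many",
    "content_dynamic": "high",
    "dominant_modality": "AV",
    "brightness": "low",
    "color_temperature": "hot",
    "camera_dynamic": "moderate",
    "details": "moderate",
}
print(validate_characterization(example))
```

An empty list means every descriptor is present and carries one of its allowed modes.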

Experiment 1: Expert Characterization

Descriptor | Modes | Dimension | Level | Used by expert
--- | --- | --- | --- | ---
Brightness | low, moderate, high | Tech. | Low | X
Color temperature | hot, moderate, cool | Tech. | Low | X
Details | low, moderate, high | Tech. | Low | X
Camera dynamic | low, moderate, high | Tech. | Low | X
AV diegesis | sound-in, sound-off, off-screen | Sem. | Low | X
Sound type | speech, music, sound effects | Sem. | Low | X
Content dynamic | low, moderate, high | Sem. | Low | X
Dominant modality | A, V or AV | Sem. | Low | X
Nb of characters | few, some, many | Sem. | Low | X
Comprehension | low, moderate, high | Sem. | High |
Quantity of information | low, moderate, high | Sem. | High |
Interest | low, moderate, high | Hed. | High |
Valence | 9-point scale | Hed. | High |
Arousal | 9-point scale | Hed. | High |

Experiment 1: Description Material Used

28 non-expert participants

Corpus:

20 sequences (8-10 s), extracted from the 5 contents characterized by the expert

Task:

Quality evaluation (P.911 9-point scale):

• AVQ, VQ and AQ

Evaluation of 4 of the expert's low-level descriptors (the other criteria are considered invariant in nature):

• dominant modality, color, brightness and content dynamic

Evaluation of the five additional high-level descriptors:

• 3 hedonic descriptors: interest, valence and arousal (Self-Assessment Manikin scales)

• 2 semantic descriptors: comprehension and quantity of information
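
Ratings collected on the 9-point scale are typically summarized per sequence as a mean opinion score (MOS). The sketch below shows one common way to do this, with a confidence interval based on a normal approximation. The function name `mos_with_ci` and the ratings are hypothetical, and the slides do not state how the intervals, if any, were computed.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    for r in ratings:
        if not 1 <= r <= 9:
            raise ValueError(f"rating {r} is outside the 9-point scale")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        # A standard deviation needs at least two ratings.
        return mos, float("nan")
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical AVQ ratings from six participants for one sequence.
avq_ratings = [7, 8, 6, 7, 9, 7]
mos, ci = mos_with_ci(avq_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The same function can be applied separately to the AVQ, VQ and AQ ratings of each sequence.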

Experiment 1: Protocol

1. Verify the relevance of descriptors

All semantic, technical and hedonic descriptors, as well as the quality scores, depend significantly on the sequence and, more generally, on the content

2. Obtain a corpus of AV sequences representing these different descriptors
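
The slides do not say which statistical test established the dependence on the sequence. As a stand-in, the sketch below computes a one-way ANOVA F statistic across sequences. This is an assumption: it ignores the repeated-measures structure (the same participants rate every sequence), the name `one_way_f` is hypothetical, and the ratings are invented.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic of a one-way ANOVA; each group holds one sequence's ratings."""
    k = len(groups)                      # number of sequences
    n = sum(len(g) for g in groups)      # total number of ratings
    grand = mean(x for g in groups for x in g)
    # Variability of the sequence means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variability of the ratings around their own sequence mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ratings for three sequences, three participants each.
ratings_by_sequence = [[7, 8, 7], [4, 5, 3], [6, 6, 7]]
f_value = one_way_f(ratings_by_sequence)
print(f"F({len(ratings_by_sequence) - 1}, 6) = {f_value:.2f}")
```

Turning the F value into a significance decision requires an F-distribution table or a statistics library, which is left out here.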

Experiment 1: Results

[Figure: Impact of the “Sequence” condition on MOS for AV, A and V qualities. X-axis: sequences 1 to 20; Y-axis: MOS, 1 to 9.]

Experiment 1: Results

35 non-expert participants

Corpus:

200 sequences: 20 sequences (from Experiment 1) × 10 degradation conditions

• AV asynchrony (1500 ms audio delay)

• A bitrate variation (at 64 kbps / 8 kHz)

• V bitrate variations (between 93 and 1600 kbps)

• V frame freezes

• A packet loss (10%)

• A and V degradations combined
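
The corpus size follows from crossing every sequence with every condition. The sketch below builds that matrix with the ten condition labels that appear on the result plots (Ref., AbV, ApL, VbV, VfrZ, AsynC and four combinations). The sequence identifiers `S1` to `S20` follow the plot labels, while the per-participant shuffling and the name `presentation_order` are assumptions: the slides do not describe how presentation order was handled.

```python
import random
from itertools import product

SEQUENCES = [f"S{i}" for i in range(1, 21)]  # 20 source sequences
# Condition labels as printed on the result plots.
CONDITIONS = [
    "Ref.", "AbV", "ApL", "VbV", "VfrZ", "AsynC",
    "VbV_ApL", "VbV_AbV", "VfrZ_ApL", "VfrZ_AbV",
]

# Full factorial design: every sequence under every condition.
stimuli = list(product(SEQUENCES, CONDITIONS))


def presentation_order(participant_id, items):
    """Return a reproducible shuffled copy of the stimuli (assumed, not from the slides)."""
    rng = random.Random(participant_id)  # seeded so each order can be regenerated
    order = list(items)
    rng.shuffle(order)
    return order


order = presentation_order(1, stimuli)
print(len(stimuli), "stimuli; first three for participant 1:", order[:3])
```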

Task:

AV quality assessment (as recommended by P.911 – only overall AV quality)

SEOVQ software was used to perform the test and collect the judgments of participants

Experiment 2: Content and Degradations, which interactions?


1. Study the potential impacts of low-level descriptors on the overall perceived AV quality, in interaction with various degradations

Under the asynchrony degradation, discomfort is greater for “verbal” sequences (speech) than for “nonverbal” sequences (sound effects / music)

[Figure: Effect of Sound Type. MOSav (1 to 9) per degradation condition (Ref., AbV, ApL, VbV, VfrZ, AsynC, VbV_ApL, VbV_AbV, VfrZ_ApL, VfrZ_AbV) for four sequences: Speech (S9), Sound effects (S10), Music (S13) and Speech (S14).]

S14 is tagged “speech” and “sound-off”: a diegesis effect

Experiment 2: Results

2. Obtain a catalog of interactions between degradation and some semantic and/or technical descriptors

Effect of dynamic, modality and interest on AVQ scores

[Figure: MOSav as a function of content dynamic (Low, Moderate, High) and of dominant modality (V, A, AV). Y-axis: MOSav, 3.9 to 4.7.]


[Figure: MOSav as a function of interest (Low, Moderate, High). Y-axis: MOSav, 3.9 to 4.7.]

Experiment 2: Results

2. Obtain a catalog of interactions between degradation and some semantic and/or technical descriptors

Interactions between dynamic, modality and interest and the factor “Degradation”

Illustration of the interaction between modality and “Degradation”

[Figure: MOSav (1 to 9) per degradation condition (Ref., AbV, ApL, VbV, VfrZ, AsynC, VbV_ApL, VbV_AbV, VfrZ_ApL, VfrZ_AbV) for each dominant modality (A, V, AV).]
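
An interaction between a descriptor and the “Degradation” factor can be inspected by tabulating the mean score in each (modality, degradation) cell and comparing each cell with its own reference. The sketch below shows the mechanics only: the names `cell_means` and `drop_from_reference` are hypothetical, and the ratings are invented and do not reproduce the study's results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (dominant modality, degradation, AV quality score).
ratings = [
    ("A", "Ref.", 8), ("A", "Ref.", 7),
    ("A", "ApL", 3), ("A", "ApL", 4),
    ("A", "VbV", 6), ("A", "VbV", 7),
    ("V", "Ref.", 8), ("V", "Ref.", 8),
    ("V", "ApL", 6), ("V", "ApL", 5),
    ("V", "VbV", 3), ("V", "VbV", 4),
]


def cell_means(rows):
    """Mean score for each (modality, degradation) cell."""
    cells = defaultdict(list)
    for modality, degradation, score in rows:
        cells[(modality, degradation)].append(score)
    return {key: mean(scores) for key, scores in cells.items()}


def drop_from_reference(means, reference="Ref."):
    """Score loss of each degraded cell relative to the same modality's reference."""
    return {
        (modality, degradation): means[(modality, reference)] - value
        for (modality, degradation), value in means.items()
        if degradation != reference
    }


means = cell_means(ratings)
drops = drop_from_reference(means)
for key in sorted(drops):
    print(key, drops[key])
```

If the loss caused by a given degradation differs markedly between modalities, the two factors interact; whether such a difference is significant is a separate statistical question.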

Experiment 2: Results

It would be relevant to consider:

1. a complete content characterization that takes dominant modality, dynamic, sound type (with a sound-effects class) and diegesis (sound-in, sound-off, off-screen) into account

2. a multi-criteria evaluation in addition to the overall AVQ evaluation, with:

• a separate assessment of audio and video quality (as recommended in P.920)

• a specific question on asynchrony, so that participants can express their discomfort with this kind of degradation

Conclusions and standardization perspectives

Thank you for your attention

Test material dependency on perceived quality

D. S. Hands, “A basic multimedia quality model,” IEEE Trans. Multimedia, vol. 6(6), pp. 806-816, December 2004

N. F. Dixon and L. Spitz, “The detection of auditory visual desynchrony,” Perception, vol. 9, pp. 719-721, 1980

M. P. Hollier, A. N. Rimell, D. S. Hands, and R. M. Voelcker, “Multi-modal perception,” BT Technol. J., vol. 17, pp. 35-46, January 1999

A/V interaction

J. G. Beerends and F. E. De Caluwe, “The influence of video quality on perceived audio quality and vice versa,” J. Audio Eng. Soc., vol. 47(5), pp. 355-362, May 1999

ITU-T Contribution COM 12-19-E, “Relation between audio, video and audiovisual quality,” KPN, The Netherlands, December 1997

References

Category | Description
--- | ---
A | One person, mainly head and shoulders, limited detail and motion
B | One person with graphics and/or more detail
C | More than one person
D | Graphics with pointing
E | High object and/or camera motion beyond the range usually found in video teleconferencing

Table A.1/P.911 – Video content categories

Current Context

Category | Description
--- | ---
I | Speech/one speaker
II | Speech/multiple speakers
III | Speech + background music
IV | Music/single instrument
V | Music/multiple instruments

Table A.2/P.911 – Audio content categories

Current Context
