
Presentation of published work at the AIMS 2014 congress in Madrid, session 2.A: Image, Speech and Signal Processing. Tuesday, 18 November 2014, 15:30.


Insertion of Impairments in Test Video Sequences for Quality Assessment Based on Psychovisual Characteristics

Juan Pedro López Velasco, Juan Antonio Rodrigo, David Jiménez and José Manuel Menéndez

Universidad Politécnica de Madrid

Madrid, 18th November 2014

Index

• Introduction: problem description
• Artificially impaired video sequence generation:
  – Impairment and artifact insertion process
  – Creation of masks based on ROIs
• Results and examples of mask application
• Example of future work for a psychovisual model
• Conclusions

Problem description (I)

• Assessing video quality is still a complex task.
• Video quality assessment needs to correspond to human perception.
• Visual attention is focused on concrete areas of an image, as demonstrated with fixation maps.

[Figure: original image, fixation map, and image with visual attention weights]

• Most pixel-based metrics do not present enough correlation between objective and subjective results; algorithms need to correspond to human perception when analyzing quality in a video sequence.
• For example, the four frames on the next slide have the same MSE.

Problem description (II)

[Figure: four frames with the same MSE — high blocking, high blurring (defocus), salt-and-pepper noise, and JPEG encoding artifacts]

Problem description (III)

• Video quality metrics should correlate with visual attention and psychovisual models adapted to concrete artifacts and their visualization.

Problem description (IV)

• But…
  – How do we evaluate concrete artifacts and the effects of hiding or highlighting them?
• Answer:
  – With databases created to analyze these concrete areas, adapted to concrete artifacts.
  – Subjective assessment reveals the relative importance of each artifact/ROI combination.

Problem description (V)

Scheme of artificially impaired video sequence generation:

[Diagram: the original video sequence is distorted to produce an impaired video sequence; a feature mask and its inverse select where the distortion is applied, producing the artificially impaired sequence. Two sequences are generated for each distortion: one case and the opposite one, as seen in the next example.]
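The composition step in this scheme amounts to blending the distorted sequence into the original only where the mask (or its inverse) is active. A minimal per-frame sketch, assuming NumPy arrays and a binary mask; function and parameter names are illustrative, not from the paper:

```python
import numpy as np

def compose_impaired_frame(original, distorted, mask):
    """Blend a distorted frame into the original only inside the mask.

    original, distorted: HxWx3 uint8 frames of identical size.
    mask: HxW binary array (1 = region of interest, 0 = elsewhere).
    Returns the artificially impaired frame; passing (1 - mask) instead
    produces the opposite case described on the slide.
    """
    mask3 = np.repeat(mask[:, :, np.newaxis], 3, axis=2).astype(bool)
    return np.where(mask3, distorted, original)
```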

Example of artificially impaired sequences

• Impaired area (with blocking artifact) located in the human faces ROI.

Impairment and artifacts insertion process

[Diagram: the original video sequence passes through a feature or artifact distortion stage — blocking, blurring or ringing — producing the impaired video sequence.]

Artifacts simulation

• Blocking is simulated with an 8x8 mosaic filter.
• Blurring is simulated with a Gaussian low-pass filter.
• Ringing is simulated with a JPEG codification filter.
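A minimal sketch of the three simulations using OpenCV, assuming BGR uint8 frames; the kernel size, sigma and JPEG quality are illustrative values, not the ones used in the study:

```python
import cv2

def simulate_blocking(frame, block=8):
    # 8x8 mosaic: shrink by the block size, then enlarge with nearest-neighbour
    # so each block becomes a flat 8x8 tile.
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (w // block, h // block), interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

def simulate_blurring(frame, ksize=9, sigma=3.0):
    # Gaussian low-pass filter (defocus-like blur).
    return cv2.GaussianBlur(frame, (ksize, ksize), sigma)

def simulate_ringing(frame, quality=10):
    # Heavy JPEG compression introduces ringing around sharp edges.
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```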

Creation of masks based on ROIs (I)

• Types of regions of interest for masks: motion, spatial detail, faces, position and color.

[Diagram: feature detection on the original video sequence produces a feature mask and its inverse feature mask.]

Motion mask

• For motion detection, temporal information in consecutive frames is scrutinized.
• Temporal information is analyzed:

  Pix(x, y) ∈ Mask_i   if   F_i(x, y) − F_{i−1}(x, y) ≠ 0

[Figure: original frame and motion mask based on temporal information (TI)]
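A minimal sketch of such a motion mask from two consecutive frames, assuming OpenCV/NumPy; the noise threshold is an assumption, since the slide's condition only requires a non-zero difference:

```python
import cv2
import numpy as np

def motion_mask(frame_prev, frame_cur, threshold=10):
    """Binary mask of pixels whose temporal difference is significant."""
    g_prev = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(frame_cur, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g_cur, g_prev)
    # The condition F_i - F_{i-1} != 0 is relaxed to a small threshold
    # to discard sensor noise.
    return (diff > threshold).astype(np.uint8)
```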

Spatial detail mask

• Textures, edges and objects in motion are what hides or highlights certain impairments, in cases such as blocking or blurring artifacts.
• The Canny algorithm is used to create binary masks that separate homogeneous areas from high-frequency areas.

[Figure: original frame and spatial detail mask based on the Canny algorithm]
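A minimal sketch of a Canny-based spatial detail mask, assuming OpenCV; the Canny thresholds and the dilation used to thicken the one-pixel edges into regions are illustrative choices, not values from the paper:

```python
import cv2
import numpy as np

def spatial_detail_mask(frame, low=100, high=200, dilate_iter=2):
    """Separate high-frequency areas (textures/edges) from homogeneous ones."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)
    # Dilate the thin Canny edges so they cover usable regions.
    kernel = np.ones((5, 5), np.uint8)
    return (cv2.dilate(edges, kernel, iterations=dilate_iter) > 0).astype(np.uint8)
```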

Pixel position masks (I)

• The image is divided into nine sections, as indicated in research by Nojiri et al.

[Figure: Nojiri's section distribution]

• The objective is to analyze the influence of pixel position depending on the area in which it is located.
• Three types of masks are created depending on the regions where pixels are located: in a corner, in a lateral area or in the central area.

Pixel position masks (II)

[Figure: corner mask, lateral mask and central mask]
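A minimal sketch of deriving the three position masks from a 3x3 split of the frame; equal thirds are assumed here, whereas the section sizes in Nojiri's actual layout may differ:

```python
import numpy as np

def position_masks(height, width):
    """Return (corner, lateral, central) binary masks from a 3x3 split."""
    rows = np.arange(height) * 3 // height          # section index 0..2 per row
    cols = np.arange(width) * 3 // width            # section index 0..2 per column
    r, c = np.meshgrid(rows, cols, indexing="ij")
    central = ((r == 1) & (c == 1)).astype(np.uint8)   # middle section
    corner = ((r != 1) & (c != 1)).astype(np.uint8)    # four corner sections
    lateral = (1 - central - corner).astype(np.uint8)  # remaining edge sections
    return corner, lateral, central
```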

Facial mask

• The Haar algorithm included in OpenCV, based on a boosted cascade of simple features, is used for face detection.

[Figure: face detection and resulting face mask]
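A minimal sketch of building a face mask from the Haar cascade detector bundled with OpenCV; the detector parameters are illustrative:

```python
import cv2
import numpy as np

# Frontal-face Haar cascade shipped with OpenCV (cv2.data path resolution
# is assumed to be available in the installed build).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_mask(frame):
    """Binary mask covering the rectangles returned by the Haar detector."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    mask = np.zeros(gray.shape, np.uint8)
    for (x, y, w, h) in faces:
        mask[y:y + h, x:x + w] = 1
    return mask
```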

Color masks

• The range of colors should be analyzed to determine the weight of this factor in visual algorithms.
• The mask contains the pixels corresponding to the determined color, plus those whose similarity to it falls within a threshold.
• Three ranges of colors define the masks: red, blue and green.

[Figure: original frame and shades-of-red mask]
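A minimal sketch of a shades-of-red mask in HSV space, assuming OpenCV; the hue bands and saturation/value limits stand in for the similarity threshold mentioned above and are not the paper's values:

```python
import cv2
import numpy as np

def red_mask(frame):
    """Binary mask of pixels whose hue is close to red.

    Hue wraps around 0 in OpenCV's 0-179 range, so two bands are combined.
    """
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    lower = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 70, 50), (179, 255, 255))
    return ((lower | upper) > 0).astype(np.uint8)
```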

Results

• Results based on subjective tests are analyzed to demonstrate the validity of the test sequences.
• Examples for different effects: "News Report" (faces), "Barrier" (motion), "Crowd" (pixel position).
• Three FR metrics (PSNR, Blur and MSE) are analyzed in parallel with the MOS results, on a scale from 5 (excellent) to 1 (poor).
• The tables below show examples where the FR metrics obtain poor correlation with subjective results.
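For reference, PSNR and MSE can be computed per frame as below (a minimal sketch; the Blur metric and the exact scaling behind the MSE values reported in the tables are not reproduced here):

```python
import numpy as np

def mse(reference, impaired):
    """Mean squared error between two frames (float64 to avoid overflow)."""
    ref = reference.astype(np.float64)
    imp = impaired.astype(np.float64)
    return np.mean((ref - imp) ** 2)

def psnr(reference, impaired, peak=255.0):
    """PSNR in dB for 8-bit content; infinite when the frames are identical."""
    err = mse(reference, impaired)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)
```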

"News Report" — H.264 encodings and impairment located in the Faces ROI:

                 H.264                  Impairment in Faces ROI
Metric    75 Mbps    500 Kbps           D.         Inv.
PSNR       47.93      37.58             46.82      34.52
Blur        0.44       3.63              0.38       5.17
MSE         0.67       1.93              0.10       2.30
MOS         4.81       1.54              1.33       3.78

(D. = distortion applied inside the ROI; Inv. = distortion applied through the inverse mask, outside the ROI.)

"Barrier" — H.264 encodings and impairment located in the Motion ROI:

                 H.264                  Impairment in Motion ROI
Metric    75 Mbps    500 Kbps           D.         Inv.
PSNR       49.82      33.19             39.85      34.24
Blur        0.27       8.36              1.97       6.24
MSE         0.51       3.34              0.359      2.98
MOS         4.77       1.33              3.11       3.89

"Crowd" — H.264 encodings and impairment located in the Position ROIs:

                 H.264              Center            Lateral           Corner
Metric    75 Mbps   500 Kbps      D.      Inv.      D.      Inv.      D.      Inv.
PSNR       34.33     25.34       30.74   26.82     33.87   26.00     35.95   25.88
Blur        3.44     22.55        6.27   15.33      2.60   19.44      0.95   22.47
MSE         3.55      8.76        2.30    6.21      1.21    7.30      0.64    7.87
MOS         4.68      1.22        1.44    2.44      3.78    1.33      4.11    1.22

Example 1: Faces

• When the distortion is located in the areas corresponding to human faces, the subjective MOS values are lower (1.33) than when it is located in the rest of the picture and the faces appear sharp (3.78). This effect is the complete opposite of the behavior of PSNR (46.82 vs. 34.52) or MSE (0.10 vs. 2.30).

"News Report" — H.264 encodings and impairment located in the Faces ROI:

                 H.264                  Impairment in Faces ROI
Metric    75 Mbps    500 Kbps           D.         Inv.
PSNR       47.93      37.58             46.82      34.52
Blur        0.44       3.63              0.38       5.17
MSE         0.67       1.93              0.10       2.30
MOS         4.81       1.54              1.33       3.78

Example 2: Motion

• A similar situation occurs when analyzing motion in the "Barrier" sequence: distortion inside the motion ROI obtains a higher PSNR (39.85 vs. 34.24) yet a lower MOS (3.11 vs. 3.89) than distortion in the inverse region.

"Barrier" — H.264 encodings and impairment located in the Motion ROI:

                 H.264                  Impairment in Motion ROI
Metric    75 Mbps    500 Kbps           D.         Inv.
PSNR       49.82      33.19             39.85      34.24
Blur        0.27       8.36              1.97       6.24
MSE         0.51       3.34              0.359      2.98
MOS         4.77       1.33              3.11       3.89

Example 3: Pixel position

• Distortions located in a corner, a lateral area or the central area are compared in the "Crowd" sequence.
• For observers, a high distortion located in a corner is insignificant. On the other hand, when the impairment is located in the central area, opinion scores decrease to 1.44.
• PSNR and MSE reflect a distortion related to the size of the impaired area, while the influence on the human eye is related to the position of that impaired area.

"Crowd" — H.264 encodings and impairment located in the Position ROIs:

                 H.264              Center            Lateral           Corner
Metric    75 Mbps   500 Kbps      D.      Inv.      D.      Inv.      D.      Inv.
PSNR       34.33     25.34       30.74   26.82     33.87   26.00     35.95   25.88
Blur        3.44     22.55        6.27   15.33      2.60   19.44      0.95   22.47
MSE         3.55      8.76        2.30    6.21      1.21    7.30      0.64    7.87
MOS         4.68      1.22        1.44    2.44      3.78    1.33      4.11    1.22

Example of future work for a psychovisual model (I)

[Figure: original frame from the sequence "News Report"]

Example of future work for a psychovisual model (II)

[Figure: motion mask, spatial details mask, pixel position mask and faces mask for the same frame]

Example of future work for a psychovisual model (III)

[Figure: psychovisual model obtained as the combination of the four masks]
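One possible way to turn the four masks into a single psychovisual weight map is a weighted sum, with weights estimated from the subjective tests; the sketch below is only illustrative, and the actual combination used in the future model is not specified in the slides:

```python
import numpy as np

def psychovisual_weight_map(masks, weights):
    """Weighted combination of binary ROI masks into a per-pixel weight map.

    masks: list of HxW binary masks (motion, spatial detail, position, faces).
    weights: relative importance of each ROI; placeholder values, not results
    from the paper.
    """
    combined = np.zeros(masks[0].shape, dtype=np.float64)
    for mask, weight in zip(masks, weights):
        combined += weight * mask
    # Normalise so the map can weight per-pixel errors inside an FR metric.
    peak = combined.max()
    return combined / peak if peak > 0 else combined
```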

Conclusions

• Current algorithms are not adapted to the subjective response of the human eye.
• Subjective tests revealed the importance of some concrete regions.
• Psychovisual models adapted to visual attention obtain a better correlation when weighting pixels than when treating them all equally.
• The versatility of the process allows analyzing new artifacts apart from the ones included in the paper.

Thanks for your attention!!