Tutorial for APSIPA 2013:

Perceptual Quality Evaluation for Image and Video: from Modules to Systems

Weisi Lin
Email: wslin@ntu.edu.sg
School of Computer Engineering
Nanyang Technological University, Singapore

Question 1: Why are pictures important in our life & work?

• Physiology/psychology: ~50% of the cerebral cortex is for vision; vision is our major channel for experiencing the world

• Visual representations: the most efficient way to represent information, even when people speak different languages

• Increasing availability: digital cameras; anytime, everywhere, anyhow…

• Large interest & commercial value: movies, television, Internet/social/mobile media, gaming, content search, advertisements, surveillance, politics, scientific research, medical applications, military, …

Question 2: Why is picture appreciation by machines important?

• Quality assurance (as a standalone module)
– product inspection
– test equipment
– on-line, in-service monitoring
– visual/multimedia algorithm/system benchmarking

• Technology development & optimization (embedded in a system)
– VCD/DVD/HDTV/3DTV, multimedia, mulsemedia (multiple sensorial media)
– computer graphics/animation
– computational photography
– visual/multimedia transmission
– …

Traditional Visual Signal Quality Measures (still widely used now)

• MSE (Mean Square Error)
• SNR (Signal-to-Noise Ratio)
• PSNR (Peak SNR)
• QoS (Quality of Service)
• …
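As a concrete reference point for the traditional measures listed above, a minimal sketch of MSE and PSNR on 8-bit grayscale pixel sequences (pure Python; the function names are illustrative, not from the talk):

```python
import math

def mse(ref, test):
    """Mean square error between two equal-length pixel sequences."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical signals."""
    e = mse(ref, test)
    return float("inf") if e == 0 else 10 * math.log10(peak ** 2 / e)
```

These are purely signal-level measures: they treat every pixel difference identically, which is exactly the limitation the following slides illustrate.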

Problems with the existing metrics

Example: (a) original image; (b) Gaussian noise added; (c) brightened; (d) JPG-compressed; (e) further distorted. The distorted versions have MSE = 324 relative to the original, yet very different perceived quality.

Image Quality Assessment (more examples)

All images have nearly the same MSE

Three design perspectives for visual systems (diagram):
– Device-centric: power, memory, display, …
– Network-centric: bit rate, error rate, packet loss, delay, …
– Perception-centric: effective, efficient, useful, enjoyable, natural

Gap in most current systems:
Target: human consumption, appreciation & interaction
Technical design: non-perceptual criteria

Human-centric visual evaluation

• the majority of visual content we handle is for human consumption

• human perception is effective and efficient, so machines that emulate its functioning have technical advantages

• an increasing need: harmonious human-machine interaction

A part of a bigger “picture”

Multimedia & Mulsemedia (multiple sensorial media):
– Audition (hearing)
– Vision (seeing)
– Touch (taction)
– Olfaction (smelling)
– Gustation (tasting)

Special Issue on “Multiple Sensorial (MulSeMedia) Multi-modal Media: Advances and Applications”, Transactions on Multimedia Computing, Communications and Applications

Paper submission: 18/11/2013

Guest Editors:
George Ghinea (Brunel University, UK)
Stephen Gulliver (Univ. of Reading, UK)
Christian Timmerer (Alpen-Adria-Universität, Klagenfurt, Austria)
Weisi Lin (Nanyang Technological University, Singapore)

Possibilities of Perceptual Evaluation

• Subjective viewing tests
– ITU-R BT.500 standard
– MOS (mean opinion score)
– Shortcomings:
  • expensive, time-consuming
  • not suitable for automatic, in-loop/service, on-line real-time processing, e.g., encoding, transmission, relaying, etc.
  • not always reliable, depending on viewers' physical conditions, emotional states, personal experience, display context

Solution: an objective (by-machine) measure to emulate MOS!

Machine-based picture processing, ordered by difficulty (diagram):
– Artificial vision: which is where, and what is done and how?
– Representation: compression (“zipping”) & restoration
– Pixel manipulations: cropping, editing (addition/subtraction/size change), object boundary detection, edge enhancement, histogram equalization, …

Picture quality evaluation

• 1st-party evaluation
– by the photographer or image maker

• 2nd-party evaluation
– by the subject of an image

• 3rd-party evaluation
– by neither the photographer nor the subject
– the general and most meaningful situation

(Diagram: from Modules to Systems)

Outline of the rest of this talk

1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion

Which square is brighter, A or B?

Adelson’s “Checker-shadow illusion”
http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html

Color Contrast: background color can affect visual perception [http://www.psy.ritsumei.ac.jp/~akitaoka/shikisai2005.html]

It appears that a = d or b = c in color, but actually b = d!! (a, b, c, d label the four color patches shown)

Shepard's paradox (for sound): the finishing pitch is the same as the starting pitch (synthesized by Jean-Claude Risset)

Spectrum of Shepard's ascending paradox; the same spectrum looped 5 times

Our ear cannot perceive where the sample starts and finishes: the pitch appears to increase even though the sample loops back to the beginning and starts over; it is impossible to tell where the sample begins and ends.

Useful HVS Properties

o Sensitivity to structural changes
o Masking effects
o Saturation effect
o Role of visual attention
o Worst-case effect

Useful HVS Properties (cont.)
• Sensitivity to structural changes: features like edges and contours play a key role in visual quality assessment

Original image | Noisy image | Blurred image
Lower visual quality due to edge damage

Useful HVS Properties (cont.)
• Masking effects: the visual impact of the same distortion can differ depending on the signal content

(a) (b): the distortion in (a) is more annoying to the eye; texture masking reduces its visibility in (b)

Useful HVS Properties (cont.)
• Texture masking: the effect of distortion is reduced by texture

(a) Original image; (b) image with lowest distortion; (c), (d) image with highest distortion

Useful HVS Properties (cont.)
• Saturation effect: sensitivity to perceived distortion decreases at high distortion levels

(a) Original image; (b) blurred; (c) higher level of blurring than the image on the left. The perceived distortion level in the two blurred images is nevertheless nearly the same.

Useful HVS Properties (cont.)
• Role of visual attention: distortion in regions attracting human attention is more annoying than distortion in non-attentional regions

(a) Original image; (b) attentional region (face) distorted; (c) non-attentional region distorted

Observe that the distortion in image (b) is more annoying than in image (c). This is because the ‘face’ is an attentional region as compared with the ‘table’.

Useful HVS Properties (cont.)
• Worst-case effect: human eyes tend to focus more on the distorted portions (in images and videos)

• Explains the quality fluctuation effect: bad frames have a higher impact on perceived quality

• An intuitive example: a small dot (or scratch) on a mirror or windscreen attracts more eye attention!
– The area covered by the dot (scratch) can be a thousand times smaller than the whole object (mirror, windscreen)

• Human eyes “penalize” more for distorted or bad-quality regions/frames/areas.

Masking in stereo viewing (Wu, et al. ’13)

left view | right view | right view

Coding artifacts in the right view: completely masked in 3-D viewing, yet with severe loss of depth perception.

H.264/AVC (High Profile): right view at 10 Mbps, 2 Mbps and 512 kbps, with a cropped portion of each shown.

Perceptual Characteristics & Phenomena Gallery (for visual and audio aspects):

http://fyp-demo-gallery.appspot.com/index.html


Outline of the rest of this talk

1. Relevant Human Visual Perception
2. Basic Computational Modules
• Signal decomposition
• Just-noticeable difference (JND)
• Visual attention (VA)
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion

Temporal Decomposition

• Physiological evidence
– two main visual pathways
– visual cortex

• Signal decomposition
– implemented as FIR/IIR filters
• sustained (low-pass) channel
• transient (band-pass) channel
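A toy sketch of the sustained/transient split described above, applied to one pixel's intensity over time; a moving-average low-pass filter and its complement stand in for the actual FIR/IIR filters used in the literature, and the tap count is an illustrative choice:

```python
def sustained(frames, taps=3):
    """Sustained (low-pass) channel: causal moving average over `taps` frames."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - taps + 1): t + 1]
        out.append(sum(window) / len(window))
    return out

def transient(frames, taps=3):
    """Transient (band-pass) channel: the signal minus its sustained part."""
    low = sustained(frames, taps)
    return [f - l for f, l in zip(frames, low)]
```

A static scene produces zero transient response, while sudden intensity changes show up almost entirely in the transient channel, mirroring the two-pathway behavior.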

Spatial Decomposition

– Filters: Gabor, cortex, wavelets, Gaussian/steerable pyramids
– Stimuli: orientations, frequencies

Simoncelli et al. ’92

Just-noticeable Difference (JND)

• JND: the visibility threshold below which changes cannot be detected by the typical HVS

Noise injection example: original image; white-noise-injected image (29.00 dB); image with noise shaping guided by JND (29.05 dB). (Wu, et al. ’13)

Factors for JND: visual sensitivity varies with spatial frequency (the spatial Contrast Sensitivity Function)

Factors for JND: Contrast Masking

(a) A weak visual stimulus: can be seen alone
(b) A masking signal
(c) Combining (a) and (b): the stimulus cannot be seen
(d) Combining (a) and (b) with increased contrast in (a): the stimulus can be seen

Wu, et al. ’13

Fusion of different factors
• Multiplication (per block k, subband b, frame j)
• Exponentiation: not very often used
• Addition (per pixel location n)
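The multiplicative and additive fusion rules above can be sketched as follows; the additive form follows the spirit of nonlinear-additivity masking models that deduct part of the overlap between two factors. All numeric values (the gains and the overlap coefficient C) are illustrative assumptions, not values from the talk:

```python
def fuse_multiplicative(base, gains):
    """JND = base threshold x product of per-factor elevation gains (each >= 1)."""
    jnd = base
    for g in gains:
        jnd *= g
    return jnd

def fuse_additive(t_lum, t_texture, c_overlap=0.3):
    """JND = sum of two factor thresholds minus a fraction of their overlap."""
    return t_lum + t_texture - c_overlap * min(t_lum, t_texture)
```

The overlap deduction keeps the additive estimate from double-counting masking that both factors already explain.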

The key in accurate JND determination

• To distinguish smooth and edge regions (Yang, et al. ’05)
• To distinguish texture and edge (Liu, et al. ’10)
• To further distinguish texture into ordered (structural) and disordered regions (Wu, et al. ’13)

Visual Attention (VA)

• Selectivity
– selective awareness of the sensory environment
– selective responsiveness to visual stimuli

• Two types
– bottom-up: external stimuli
– top-down: task/experience related

Auto-generation of VA map

Features: motion, face-eye, skin color, contrast, texture

Lu, et al. ’05, IEEE T-IP

Itti’s Bottom-up Visual Attention (VA) Model

Improved framework for video VA determination (Fang, et al. ’13)

Adaptive uncertainty evaluation: decide which (spatial or temporal) saliency contributes more to the final saliency

Alternative approach to detect bottom-up VA

For an image I(x, y): Fourier transform, then VA from the spectral residual (Hou & Zhang ’07)
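Hou & Zhang's spectral-residual approach works on 2-D image FFTs; the following 1-D, pure-Python toy keeps the same steps visible (log-amplitude spectrum, local average, residual, back-transform of exp(residual + i·phase)). The smoothing width is an illustrative choice:

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 for the inverse kernel)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def spectral_residual_saliency(signal, smooth=3):
    spec = dft(signal)                                    # forward transform
    log_amp = [math.log(abs(c) + 1e-9) for c in spec]     # log-amplitude spectrum
    phase = [cmath.phase(c) for c in spec]
    # local average of the log-amplitude spectrum
    avg = []
    for k in range(len(log_amp)):
        win = log_amp[max(0, k - smooth): k + smooth + 1]
        avg.append(sum(win) / len(win))
    residual = [l - a for l, a in zip(log_amp, avg)]      # spectral residual
    # back-transform exp(residual + i*phase); saliency = squared magnitude
    back = dft([cmath.exp(r + 1j * p) for r, p in zip(residual, phase)], sign=1)
    n = len(signal)
    return [abs(b / n) ** 2 for b in back]
```

In the full method the resulting map is additionally smoothed with a Gaussian before use as a saliency map.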

More on VA modeling…

• Influence from audio/speech
• Integration of “aural attention”
– Ma, et al. ’05
– You, et al. ’07
• VA detection model in the compressed domain (Fang, et al. ’13)
• Data-driven approaches (emerging)

Verification with eye tracking (various eye trackers)

Outline of the rest of this talk

1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion

Visual Quality Gauge: where a traditional metric fails

MAE = (1/(XY)) Σ_{x=1..X} Σ_{y=1..Y} |Δ(x, y)|
MSE = (1/(XY)) Σ_{x=1..X} Σ_{y=1..Y} Δ²(x, y)
PSNR = 10 lg(A²/MSE), where A is the peak signal amplitude

Major reasons for failure:
(1) Not every change in an image is noticeable;
(2) Not every pixel/region in an image receives the same attention level;
(3) Not every change leads to distortion (otherwise, many edge-sharpening and post-processing algorithms would not have been developed);
(4) Not every change yields the same extent of perceptual effect for the same magnitude of change (due to spatial/temporal/chrominance masking).

Classification for PVQMs

According to methodology:
– Vision-based metrics
– Signal-driven metrics (more often used now)
– Learning-based metrics (emerging)

According to reference requirement (3 possibilities: FR, RR and NR):
– Full-reference (FR) metrics
– Reduced-reference (RR) metrics
– No-reference (NR) metrics

PVQM: (reference image +) distorted image → quality score

Objective Image Quality Assessment

– Vision-based models: the early approach; tries to model the HVS, generally based on data from psychophysical experiments
– Signal-driven models: based on extraction and analysis of image features; the focus is on content and distortion analysis rather than fundamental vision modeling
– Learning-based models: emerging; new metric development via fusion of multiple existing metrics

Common Operations: Feature Extraction

Image data: raw, less organized, with hidden underlying structure
→ analysis / feature extraction (may use domain-specific prior knowledge), using mathematical/engineering tools (Fourier transform, wavelets, KPCA, …)
→ transformed/processed data: more organized, easier interpretation, reduced dimensions

Vision-based Models
• attractive in principle: incorporate relevant HVS properties pertaining to visual quality
• major limitations:
– limited understanding of the HVS and its intricate mechanisms
– metrics can be complex and computationally expensive (e.g., when accounting for masking effects, as discussed earlier)

Signal-driven Models (FR, RR or NR)

Feature extraction (reference image + distorted image) → feature pooling (cognitive mapping) → quality score

May also incorporate appropriate HVS properties, like JND, VA, various masking effects, and so on.

Recently, more research effort has gone into signal-driven models.

Widely-acknowledged visual metric: SSIM (Structural SIMilarity)

For any two image blocks x and y, SSIM combines:
– luminance similarity
– contrast similarity (e.g., blurring)
– structural similarity (edge loss or false edges)

Evaluated for overlapped or un-overlapped blocks.
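A sketch of the SSIM block comparison above, with the usual stabilizing constants C1 = (K1·L)² and C2 = (K2·L)² for 8-bit images; the sliding-window and Gaussian-weighting details used in practice are omitted:

```python
def ssim_block(x, y, peak=255, k1=0.01, k2=0.03):
    """SSIM between two equal-size blocks given as flat pixel lists."""
    n = len(x)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = sum(x) / n, sum(y) / n                       # block means
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)          # sample variances
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # luminance, contrast and structure terms combined in one expression
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical blocks score 1; a uniform brightness shift lowers only the luminance term, which is why SSIM penalizes it far less than MSE does.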

Noticeable Contrast Changes

c(x, y) = 0, if |I′(x, y) − I(x, y)| ≤ jnd(x, y)
c(x, y) = (I′(x, y) − I(x, y)) / jnd(x, y), otherwise

where I is the original image, I′ the processed one, and I(x, y) is calculated in an image neighborhood.
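The rule above can be sketched directly; the per-pixel jnd values would come from a JND model such as those discussed in Section 2:

```python
def contrast_change(i_ref, i_proc, jnd):
    """Noticeable contrast change at one location: 0 inside the JND band,
    otherwise the signed difference expressed in JND units."""
    d = i_proc - i_ref
    return 0.0 if abs(d) <= jnd else d / jnd
```

Changes within one JND are treated as invisible; beyond that, the change is normalized by the local threshold, so a given pixel difference counts for less where masking is strong (large jnd).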

A Visual Quality Metric (for video with both distortion & enhancement)

• Discrimination of c(x, y):
– c+_ne: c increase at non-edge pixels: degradation
– c−_ne: c decrease at non-edge pixels: degradation
– c+_e: c increase at edges: enhancement
– c−_e: c decrease at edge contrast: the worst degradation

D = α1·c+_ne + α2·c−_ne + α3·c+_e + α4·c−_e, where α4 > max(α1, α2) > α3 > 0 (Lin, et al. ’05)

• D reduces to the mean absolute error (MAE) measure if:
– JND is constant
– different contrast changes are not differentiated

An emerging class of metrics: machine learning-based approaches

To tackle problems of feature pooling in the spatial or spatiotemporal domain.

• Currently-employed techniques:
– simple summation
– Minkowski combination, linear (i.e., weighted) combination
– visual attention-based weightings

• Problem: these impose constraints on the relationship between features and quality score
– a simple summation or averaging of features implicitly constrains the relationship to be linear
– Minkowski summation for spatial pooling of the features/errors implicitly assumes that errors at different locations are statistically independent
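The pooling constraints discussed above are easy to see in code: simple averaging is the p = 1 case of Minkowski pooling, and larger exponents weight large errors more heavily, moving the pooled score toward the worst-case effect:

```python
def minkowski_pool(errors, p):
    """Minkowski pooling of per-location errors with exponent p:
    p = 1 is plain averaging; p -> infinity keeps only the maximum error."""
    n = len(errors)
    return (sum(abs(e) ** p for e in errors) / n) ** (1 / p)
```

Whatever p is chosen, the pooled value is still a fixed function of the error magnitudes, which is exactly the restriction that learning-based pooling is meant to lift.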

Machine learning for feature pooling

• use of machine learning is more general, systematic and reasonable
• more databases are available for training
• effective feature extraction: still a key
• Support Vector Regression: encouraging results

Metric Benchmarking
• oft-used full-reference image quality metrics:
– SSIM, VIF, IFC, VSNR, FSIM, PSNR, …
• public image quality databases:
– LIVE, TID, A57, WIQ, CSIQ, IVC, Toyama, …

Description of image databases (table omitted). Notation: n0: number of original images; n: number of test images; R: image resolution (note: LIVE has many images of size 768x512, but also other sizes like 480x720, 632x505, 634x505, 618x453 and 610x488); S: type of subjective quality score.

Video quality databases: Comparison of Video Quality Databases

Database | Year | SRC (# of reference videos) | HRC (# of test conditions) | Total # of test videos | Subjective Testing Method | Subjective Score
VQEG FR-TV-I [23] | 2000 | 20 | 16 | 320 | DSCQS | DMOS (0 ~ 100)
IRCCyN/IVC 1080i [24] | 2008 | 24 | 7 | 192 | ACR | MOS (1 ~ 5)
IRCCyN/IVC SD RoI [25] | 2009 | 6 | 14 | 84 | ACR | MOS (1 ~ 5)
EPFL-PoliMI [26] | 2009 | 16 | 9 | 156 | ACR | MOS (0 ~ 5)
LIVE [27] | 2009 | 10 | 15 | 150 | ACR | DMOS (0 ~ 100)
LIVE Wireless [28] | 2009 | 10 | 16 | 160 | SSCQE | DMOS (0 ~ 100)
MMSP 3D Video [29] | 2010 | 6 | 5 | 30 | SSCQE | MOS (0 ~ 100)
MMSP SVD [30] | 2010 | 3 | 24 | 72 | PC | MOS (0 ~ 100)
VQEG HDTV [31] | 2010 | 45 | 15 | 675 | ACR | MOS (0 ~ 5), DMOS (1 ~ 5)

Also: retargeting databases (Ma, et al. ’12); 3D video quality databases (Shao, et al. ’12)


Pearson and Spearman coefficients for FR signal-driven image quality metrics on the public image databases (Lin & Kuo ’10)

Pearson coefficient for 5 distortion types in the TID image database (mean intensity shift, contrast change, image denoising, non-eccentricity pattern noise, and local block-wise distortions of different intensity):

• PSNR is not capable of predicting quality for this sub-dataset at all (CP < 0.3)
• all PVQMs have lower CP here, although they do much better than PSNR

Lin & Kuo ’10

Performance Comparison with Learning-based Metrics

Metrics compared: PSNR, SSIM, Ref [73], MSVD, VIF, IFC, VSNR, Qvector, Qfull.

(a) CP (Pearson correlation) comparison on different image databases (LIVE, CSIQ, IVC, Toyama, A57, TID, WIQ); (b) RMSE (root MSE) for the CSIQ, IVC, A57 & TID databases; (c) RMSE for the LIVE and WIQ databases. (Narwaria & Lin ’10)

Outline of the rest of this talk

1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
• uses of modules or metrics
5. Summary & Further Discussion

Use of JND: control of quantization in compression

q = 2 × JND ⇒ maximum error ≤ JND
(Hontsch & Karam ’02; Zhang, et al. ’05; Wu, et al. ’06)

Perceptually lossless coding (Wu, et al. ’06): 1.370 bpp, 45.0303 dB

Rate-perceptual-distortion (RpD) optimization (Wu, et al. ’06)
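The q = 2 × JND rule above can be sketched as a uniform quantizer: with step size twice the JND and rounding to the nearest reconstruction level, the quantization error stays within one JND, so the loss remains below the visibility threshold:

```python
def quantize(value, jnd):
    """Uniform quantizer with step q = 2 * JND; rounding to the nearest
    reconstruction level bounds the error by the JND itself."""
    q = 2 * jnd                     # quantizer step
    return round(value / q) * q     # nearest reconstruction level
```

This is the mechanism behind "perceptually lossless" coding: each coefficient may be wrong by up to one JND, but no error is individually visible.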

Perceptual Motion Estimation and Residue Filtering (Yang, et al. ’05)

Motion search: pruned when the difference < JND
Residues: discarded when they are < JND

VA-modulated JND for Video Coding (Yang, et al. ’05)

Pre-processing for coding

• Much work so far: to optimize coders
• New thinking: to optimize the signal for compression

Compressibility ∝ signal variance

One-dimensional illustration of preprocessing using JNDs

Most Eye-pleasing Edge Sharpness
• edge sharpening: optimal edge contrast ≈ 2.6 JND
• a less ad hoc approach

(Plot: perceived quality versus extent of sharpening r for test images church, face, lena and car; the average behavior peaks at the most eye-pleasing sharpness and is right-shifted if c+_ne also increases.)

Lin, et al. ’06

Other uses of perceptual models
• Image/video post-processing (Wu, et al. ’13)
• Watermarking (Zhang, et al. ’11)
• Prioritized transmission (Wu, et al. ’13)
– discard less important packets
– apply stronger error protection to more important packets
– retransmit only important packets that have been lost
• Content retrieval

Image re-targeting with VA models

Original images | Seam Carving | Ren’s | Wolf’s | VA-guided

Fang, et al. ’13

Rapid target location via VA model (Imamoglu, et al. ’12)

Applications to computer graphics

• computer graphics: an actively developing area
– part of multimedia
– computational complexity
– mobile/cloud graphics

“The goal of computer graphics isn’t to control light, but to control our perception of it. Light is merely a carrier of the information we gather by perception.” (Tumblin and Ferwerda, 2001)

Perceptually-guided graphics rendering

(Images: 32 samples/pixel versus 64 samples/pixel)

Possibilities:
• two consecutive intermediate images are compared to see which regions need more samples (Bolin and Meyer, 1998)
• computation stops when the difference < JND (Ramasubramanian, et al. ’99)

Example (Lu, et al. ’13): 530 samples per pixel, 218.5 sec, versus 390 spp, 177.5 sec, with fewer samples in areas attracting less attention.

Deployment in commercial products: Sarnoff’s JNDmetrix™ (Tektronix PQA200/500)

Luminance and chrominance fields each undergo: pyramid decomposition (Levels 0-3 in one path, Levels 0-6 in the other) → temporal filtering → spatial filtering → contrast computation → contrast gain control → luminance/chrominance JND maps; the luminance path also feeds into the chrominance processing.

Lubin ’95, Sarnoff ’97

Industrial Deployment: Visual Quality Monitoring System

• in-service testing for mobile devices
– PDAs
– handphones
• in conjunction with a channel simulator

Outline of the rest of this talk

1. Relevant Human Visual Perception
2. Basic Computational Modules
3. Perceptual Visual Quality Metrics (PVQMs)
4. Demonstrated Systems of Applications
5. Summary & Further Discussion

Summary of this talk

– Filling the gap in current technology:
• user-oriented
• perceptually-inspired
• human-friendly machines
– New dimension of improvement in many visual processing tasks
• room for further improvement with existing technology is diminishing
– Differentiating factor for commercial products

Possible research ahead:
– Model advancement
o temporal and color models
o alternative transforms: SVD (singular value decomposition), NMF (non-negative matrix factorization), over-complete transforms
o multiple strategies or Multi-Metric Fusion approaches
o modified PSNR or SSIM
o new forms of signals: HDTV, 3D/stereo/free-view TV, olfactory sensations, mobile/IP TV
o no-reference models
– Modeling for audio, speech, olfaction, …

Possible research ahead (cont’d):
– Joint modeling (multi-modality)
o audio/speech, text, tactile, olfaction, and so on
o toward truly Multimedia or MulSeMedia!
– Learning-based methodology
o cloud media, big data & data-driven approaches
o deep learning, transfer learning & incremental learning
o effective data collection (labeled and unlabeled data)
– Less investigated scenarios
o image retrieval
o robot navigation
– Perceptual computer graphics
o high dynamic range (HDR) imaging & tone mapping
o mobile graphics
o post-processing

Different views on the role of visual attention (VA)

• no doubt: VA is important to HVS perception
• however, it has been argued that VA consideration is not always beneficial (at least for simple weighting): Ninassi, et al. ’07
• distortion may change the subjects' eye fixation and duration: Vu, et al. ’08
• visual quality may be influenced not only by attentional regions, but also by non-attentional ones: You, et al. ’10
• still an open issue for research

Issues Related to Viewing Distance (L)
• limited research on the influence of L
• VSNR: L = 3.5 times the image height, claimed to be reasonable for typical viewing conditions
• Multi-scale approach:
– SSIM: downsampling both reference and test images into different resolutions; however, multi-scale SSIM does not always yield better results than its single-scale version
– IFC and VIF: steerable pyramid transform

Issues Related to Viewing Distance (L), cont’d

• Multi-scale approach:
o just a way to compromise among the effects of different L settings
o it is still a problem how to pool the calculated errors from different scales and decouple the overlapping among different scales
• a challenge for future research: accounting for viewing conditions (display resolution, ambient illumination and viewing distance)

References for this talk

Surveys:
• W. Lin, C.-C. Jay Kuo, “Perceptual Visual Quality Metrics: A Survey”, J. of Visual Communication and Image Representation, 22(4):297-312, 2011.
• H. R. Wu, A. Reibman, W. Lin, F. Pereira, S. S. Hemami, “Perceptual Visual Signal Compression and Transmission”, Proc. of the IEEE, September 2013.
• T.-J. Liu, Y.-C. Lin, W. Lin, C.-C. Jay Kuo, “Visual Quality Assessment: Recent Developments, Coding Applications and Future Trends”, APSIPA Trans. on Signal and Information Processing, in press.
• L. Ma, C. Deng, K. N. Ngan, and W. Lin, “Recent Advances and Challenges of Visual Signal Quality Assessment”, China Communications, in press.

Authored book:
• L. M. Zhang and W. Lin, Modeling Selective Visual Attention: Techniques and Applications, John Wiley & Sons, 2013.

References for this talk (cont’d)

Book chapters:
• W. Lin, “Computational Models for Just-noticeable Difference”, Chapter 9 in Digital Video Image Quality and Perceptual Coding, eds. H. R. Wu and K. R. Rao, CRC Press, 2006.
• W. Lin, “Gauging Image and Video Quality in Industrial Applications”, Chapter 6 in Advances of Computational Intelligence in Industrial Systems, eds. Y. Liu, et al., Springer-Verlag, Heidelberg, 2008.
• M. Paul, W. Lin, “Computer Vision Aided Video Coding”, in Advanced Video Communications Over Wireless Networks, C. Zhu and Y. Li (eds.), CRC Press, 2012.

Some special issues:
• W. Lin, T. Ebrahimi, P. C. Loizou, S. Möller, A. R. Reibman, “Introduction to the Special Issue on New Subjective and Objective Methodologies for Audio and Visual Signal Processing”, IEEE Journal of Selected Topics in Signal Processing, 6(6):614-615, 2012.
• W. Zeng and W. Lin, “QoE Modeling and Applications for Multimedia Systems”, ZTE Communications, Vol. 11(1), 2013.
• T. Dagiuklas, W. Lin and A. Ksentini, “QoE Aware Optimization in Mobile Networks”, IEEE COMSOC MMTC E-Letter, Vol. 8, No. 2, March 2013.
