
Transcript of Mirri w4a2012

Page 1: Mirri w4a2012

Silvia Mirri

Ludovico A. Muratori

Paola Salomoni

Matteo Battistelli

Department of Computer Science

University of Bologna

Getting one voice: tuning up experts' assessment in measuring accessibility

Page 2: Mirri w4a2012


Summary


Introduction

Automatic and manual accessibility evaluations

Our proposed metric

Conclusions and future work

Page 3: Mirri w4a2012


Introduction


Web accessibility evaluations: automatic tools + human assessment

Metrics quantify the accessibility level or the barriers, providing a numerical synthesis:

• automatic tools return binary values

• human assessments are subjective and can take values from a continuous range

Page 4: Mirri w4a2012


Our main goal


Providing a metric to measure how far a Web page is from its accessible version, taking into account:

• integration of human assessments with automatic evaluations on the same target

• many human assessments

Page 5: Mirri w4a2012


Steps


1. Mixing the manual evaluations together with the automatic ones

2. Combining the assessments coming from different human evaluators

• Values are distributed into a given range

• The more experts' assessments contribute to computing a value, the more stable and reliable that value is

Page 6: Mirri w4a2012


Automatic and manual evaluations: an example


Combination between the IMG element and its ALT attribute:

1. If the ALT attribute is omitted, the automatic check outputs 1

2. If the ALT attribute is present, the automatic check outputs 0

Manual evaluation might state that:

• there is no lack of information once the image is hidden (this can happen in case 1, if the image is purely decorative)

• there is a lack of information once the image is hidden
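A minimal sketch of the binary automatic check just described, assuming a plain HTML parse; the class and variable names are illustrative, not from the paper:

from html.parser import HTMLParser

class ImgAltCheck(HTMLParser):
    """Binary check: 1 if an IMG omits its ALT attribute, else 0."""
    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # 1 = ALT omitted (potential barrier), 0 = ALT present
            self.results.append(0 if "alt" in dict(attrs) else 1)

checker = ImgAltCheck()
checker.feed('<img src="logo.png"><img src="deco.png" alt="">')
print(checker.results)  # [1, 0]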

Page 7: Mirri w4a2012


Our metric


• A first version of our metric (Barriers Impact Factor) is computed on the basis of a barrier-error association table

• This table reports the list of assistive technologies/disabilities affected by each error:

  • screen reader/blindness

  • screen magnifier/low vision

  • color blindness

  • input device independence/movement impairments

  • deafness

  • cognitive disabilities

  • photosensitive epilepsy
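As an illustration, such an association table could be represented as a mapping from each automatically detectable error to the assistive technologies/disabilities it affects; the error names and pairings below are hypothetical examples, not the authors' actual table:

# Hypothetical sketch of a barrier-error association table
# (example entries only, not the table from the paper).
BARRIER_ERROR_TABLE = {
    "img-alt-missing": ["screen reader/blindness"],
    "insufficient-contrast": ["screen magnifier/low vision",
                              "color blindness"],
    "no-keyboard-access": ["input device independence/"
                           "movement impairments"],
}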

Page 8: Mirri w4a2012


Our metric


• Comparing automatic checks with WCAG 2.0 success criteria, we identified relationships between them

• Each barrier is related to one success criterion and to one level of conformance (A, AA or AAA)

• Manual evaluations take values on the [0, 1] real interval:

  • 1 means that an accessibility error occurs

  • 0 means the absence of that accessibility error

A check fails when a certain error occurs or when a manual control is necessary
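A sketch of how a single evaluated check could be recorded under the conventions above ([0, 1] manual values, binary automatic values); the type and field names are illustrative, not from the paper:

from dataclasses import dataclass

@dataclass
class CheckResult:
    barrier: str      # e.g. the related WCAG 2.0 success criterion
    level: str        # conformance level: "A", "AA" or "AAA"
    automatic: int    # binary: 1 = error detected, 0 = no error
    manual: float     # expert judgement in [0, 1]

    def fails(self) -> bool:
        # A check fails when an error occurs automatically or the
        # manual control reports the error to some degree.
        return self.automatic == 1 or self.manual > 0.0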

Page 9: Mirri w4a2012


Our metric

[Figure: the formula of the metric; not captured in this transcript.]

Page 10: Mirri w4a2012


Weighting automatic and manual checks


1. m(i) = a(i): the formula is a plain average of automatically and manually detected errors

2. m(i) > a(i): a failure in the manual assessment is considered more significant than an automatic one

3. m(i) < a(i): a failure in the automatic assessment is considered more significant than a manual one

[Figure: two quadrant diagrams with the MANUAL assessment in [0, 1] on one axis and the binary AUTOMATIC result (0 or 1) on the other; quadrants I-IV mark the possible combinations of manual and automatic outcomes, arranged differently in the two diagrams.]
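The exact formula appears on the slide-9 figure, which did not survive transcription; the weighted-average form below is reconstructed from the worked example on slide 12 (m = 2, a = 1, manual average 0.8, automatic 0 yields 0.53), so it may differ in detail from the authors' definition:

def combine(manual: float, automatic: int, m: float, a: float) -> float:
    """Weighted average of the manual (in [0,1]) and automatic (0/1)
    results for one barrier, with weights m and a respectively."""
    return (m * manual + a * automatic) / (m + a)

print(round(combine(0.8, 0, m=2, a=1), 2))  # 0.53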

Page 11: Mirri w4a2012


Some considerations


• The more human operators provide evaluations of an accessibility barrier, the more reliable the computed accessibility level is

• The behavior is similar to that of online rating systems

• A new user's rating can be influenced by evaluations already expressed by other users

• Variance must be considered so as to reinforce the computed accessibility level
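An illustrative sketch, not from the paper, of why more assessments stabilize the value: the standard error of the expert mean shrinks as the number of evaluators n grows (using the expert values from the next slide):

from statistics import pstdev

ratings = [0.7, 1.0, 0.8, 1.0, 0.5]       # slide 12's expert values
n = len(ratings)
standard_error = pstdev(ratings) / n ** 0.5
print(round(standard_error, 3))            # 0.085; shrinks as n grows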

Page 12: Mirri w4a2012


A first assessment


Page content under test: an IMG element with ALT="Image" (no link, no title).

MANUAL EVALUATIONS
Expert A: 0.7
Expert B: 1
Expert C: 0.8
Expert D: 1
Expert E: 0.5

AUTOMATIC EVALUATION
0 (no known errors, 1 alert: placeholder detected)

Weights: m = 2, a = 1
Average = 0.8
Variance = 0.036
CBIF = 0.53
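The slide's figures can be reproduced in a few lines (population variance; CBIF via the weighted-average form reconstructed on slide 10):

from statistics import mean, pvariance

manual = [0.7, 1.0, 0.8, 1.0, 0.5]   # experts A-E
automatic = 0                        # the automatic result (1 alert only)
m, a = 2, 1                          # weights from the example

avg = mean(manual)
var = pvariance(manual)
cbif = (m * avg + a * automatic) / (m + a)
print(round(avg, 2), round(var, 3), round(cbif, 2))  # 0.8 0.036 0.53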

Page 13: Mirri w4a2012


Conclusions


• We have defined an accessibility metric with the aim of evaluating barriers as a whole, combining results provided by automatic tools with manual evaluations done by experts

• The metric has been preliminarily tested by measuring accessibility barriers in several local public administration Web sites

• Five experts are manually evaluating barriers related to WCAG 2.0 success criterion 1.1.1 (using an automatic monitoring system to verify the page content and to collect data from the manual evaluations)

Page 14: Mirri w4a2012


Future Work


• Propose and discuss weights for the whole WCAG 2.0 set of barriers

• Investigate how the number of experts involved in the evaluation, together with their rating variance, could influence the reliability of the computed values

Page 15: Mirri w4a2012


Contacts

Thank you for your attention!

For further information:

[email protected]