SSB BART Group
Silicon Valley
(415) 975-8000 [email protected]
IT Accessibility Problem - Solved™
SSB BART Group
Washington DC
(703) 637-8955 [email protected]
Images, Alternative Text,
and Artificial Intelligence
Silicon Valley (415) 975-8000 www.ssbbartgroup.com Washington DC (703) 637-8955
Agenda
About Us
About Me
The Project
What’s Next
http://amp.ssbbartgroup.com/public/research/Automatic_Image_Classification_090707.doc
http://amp.ssbbartgroup.com/public/research/SSB_BART_Group_Image_Alt_CSUN_2008.ppt
Corporate Overview
History
Founded in 1997 by engineers with
disabilities
750 commercial and government
customers
1,500 enterprise projects successfully
completed
Pioneers of commercial accessibility
validation tools
Approach
Data driven and scalable
Violation profiling across 5.5M human validated accessibility issues
Scalable Solutions
One to one million developers
One to one thousand production systems
Fifty percent staffing mix of
individuals with disabilities
Appropriately mixed automated,
human and code level validation
Supported Platforms
Web
HTML
XML
JavaScript
CSS
AJAX
Adobe Flash and Flex
Adobe Acrobat Documents
Streaming Audio and Video
Compiled Software
JFC and SWT Java Applications
.Net Applications
MFC Windows Native Applications
Macintosh Applications
BMC Remedy Applications
Standalone Systems
Telecommunications Hardware
IVR Systems
Agent Systems
Digital Imaging
Industry Solutions
Public Sector
Federal Solutions
United States
European Union
Education
K-12
Universities
State and Local
Government System Integrators
Healthcare
Primary Care Providers
Insurance
Information Technology
Manufacturers
Software
Hardware
Web Based Service Providers
Mass Transit
Financial Services
Consumer Banking
Insurance
Legal
Web Based Service Providers
Accessibility Management Platform
AMP – SSB’s web based platform for managing all aspects of
Accessibility process
Benefits
Single point for tracking compliance over time
Scalable solutions from one to one million developers
across multiple domestic markets
Support for all aspects of a successful accessibility initiative
Requirements            Implementation          Certification
Baseline Audit          Development Audit       Maintenance Audit
Standards Development   Standards Maintenance   VPAT Creation
eLearning               Developer Support       Certification
InFocus™ Suite
About Me
General Story
Founder and Managing Director of SSB
BART Group
Also Known As President and CEO
Professional web site developer for 13
years
Started in 1994 at the dawn of the
Web
BS Computer Science Leland Stanford
Junior University (AKA Stanford)
Odds on Brad Pitt to
play me in the movie
Accessibility Work
Involved in Web Accessibility activities,
validation and education since 1999
Architected and developed first commercial
accessibility testing and fixing tool
InSight and InFocus 1.x -> 4.x
Initial release in mid-200
Next release in a few months
Architected and developed Accessibility
Management Platform (AMP)
Current Version – 2008 R1
Personal work with fifty enterprise class
software vendors
Project Overview
Project Description
Create a decision tree to classify images into one of eight types
Image types are organized by alternative text requirements
Upon classification, alternative text validity can then be tested via straightforward heuristics
Project Utility
Alternative text provides a textual description of an image
Alternative text validity
Ensures access to content for people with disabilities
Allows pages to be adapted effectively - low resolution, alternative browsers
Increases search engine relevance for pages
Bottom Line – Good alternative text is good for society and good for profits
Automated Testing Tools
A brief note on automated testing tools
First generation of automated testing tools, where we
are now, can test about 25% of requirements accurately
Another 25% with so-so accuracy
And the rest need to be checked manually
We think the next generation of tools can double this
efficacy through better AI, more complex page models
and better leveraging of human judgment…
…but ultimately tools can only facilitate the process of
human review; they cannot replace it
Image Types
Layout Element – The image is used solely to lay out elements on the page
Decorative Picture – The image is a picture that is used solely for the purpose of making the page more visually appealing and it provides no information
Text – The image is used to stylize text on the page but is not used as an active element on the page
Picture – The image is a picture that contains information important to the use of the page
Hidden Link – The image provides a “hidden” link on a page for search engine optimization or screen reader users
Linked Text – The image is used to stylize text and provide a link to another page
Skip Link - The image is the root of an inner-document link that provides a means of skipping past page content that is not relevant
Linked Picture – The image is a picture that provides a link to another page
Variables
Width – The width of the image
Height – The height of the image
Edge Count – The number of vertical and horizontal edges in the image
Size – The rectangular size of the image, or width times height
File Size – The size of the file in bytes
Link – Whether or not the image is a link
Inner-document Link – Whether or not the image is a link within the current document
Color Depth – The number of unique colors that the image has
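As an illustration only, the variables above could be held in a small record like the following Python sketch. The class and field names are hypothetical and are not taken from the original project; the only derived value is Size, computed as width times height as the slide defines it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFeatures:
    """Hypothetical record of the first-order variables for one image."""
    width: int                    # pixels
    height: int                   # pixels
    edge_count: int               # vertical and horizontal edges found
    file_size: int                # bytes
    is_link: bool                 # image is a link
    is_inner_document_link: bool  # link target is within the same document
    color_depth: int              # number of unique colors

    @property
    def size(self) -> int:
        """Rectangular size of the image: width times height."""
        return self.width * self.height
```

For example, `ImageFeatures(88, 31, 40, 2048, True, False, 16).size` is 88 times 31.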
Project Functionality
Challenge
No database of relevant image classifications exists
Subject Matter Experts (SMEs) use experience to
determine form of alternative text
Without a good data set the decision tree isn’t going to
decide much
Solution
Build a spider to crawl sites and gather sample data
Classify the images using a basic interface
Store the image classification and additional variables in a
database
Build a decision tree from the database rather than a live site
Repeat using updated tree
Result
Created an image database of 1000 images with about an
hour of actual data entry
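The storage step could look roughly like the sketch below, which uses Python's built-in sqlite3 module. The table name, column names, and function names are assumptions made for illustration; the slides do not say which database or schema the project used.

```python
import sqlite3

# The eight image types named on the Image Types slide.
IMAGE_TYPES = [
    "Layout Element", "Decorative Picture", "Text", "Picture",
    "Hidden Link", "Linked Text", "Skip Link", "Linked Picture",
]


def create_sample_db(path=":memory:"):
    """Open a database and create the (hypothetical) sample table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_sample (
               id INTEGER PRIMARY KEY,
               url TEXT NOT NULL,
               image_type TEXT NOT NULL,
               width INTEGER, height INTEGER, edge_count INTEGER,
               file_size INTEGER, is_link INTEGER, is_inner_link INTEGER,
               color_depth INTEGER)"""
    )
    return conn


def store_sample(conn, url, image_type, width, height, edge_count,
                 file_size, is_link, is_inner_link, color_depth):
    """Store one human-classified image together with its variables."""
    if image_type not in IMAGE_TYPES:
        raise ValueError(f"unknown image type: {image_type!r}")
    conn.execute(
        "INSERT INTO image_sample (url, image_type, width, height, "
        "edge_count, file_size, is_link, is_inner_link, color_depth) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (url, image_type, width, height, edge_count, file_size,
         int(is_link), int(is_inner_link), color_depth),
    )
    conn.commit()
```

Keeping the classification and the variables in one row means the tree can later be built entirely from the database, without revisiting the crawled site.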
Project Functionality
Challenge
Build the decision tree
…which became build the decision tree before the end of time
…which became build the decision tree once and store it for
later use
Discussion
Building the tree is fairly straightforward and involves splitting
on variables and analyzing remaining sets
Implementation uses the decision tree learning algorithm from Russell and Norvig
More on the tricky parts later
The “catch” - a lot of the queries involve eliminating groups of
images
SQL doesn’t have good concepts for handling unordered
sets of keys so you enumerate out elements for queries…
Project Functionality
Discussion (Continued)
This results in lots of nasty queries and a fair amount of
time to build the tree
This more or less grows exponentially as you add variables
and quanta
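One plausible reading of "enumerate out elements" is a query that lists every excluded key explicitly, as in this Python and sqlite3 sketch. The table, columns, and function names are hypothetical, and the original queries may well have looked different.

```python
import sqlite3


def create_sample_table():
    """Create a minimal, hypothetical sample table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_sample (id INTEGER PRIMARY KEY, image_type TEXT)"
    )
    return conn


def count_types_excluding(conn, excluded_ids):
    """Count images per type, leaving out an explicit list of image ids.

    Every excluded id becomes its own placeholder, so the statement
    grows with the size of the eliminated set.
    """
    excluded_ids = list(excluded_ids)
    where = ""
    if excluded_ids:
        placeholders = ", ".join("?" for _ in excluded_ids)
        where = f"WHERE id NOT IN ({placeholders})"
    sql = (
        f"SELECT image_type, COUNT(*) FROM image_sample {where} "
        "GROUP BY image_type ORDER BY image_type"
    )
    return conn.execute(sql, excluded_ids).fetchall()
```

Because the statement text depends on how many ids are excluded, each node of the tree tends to issue a different, longer query, which fits the build times described here.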
Solution
Build the tree once and persist to disk
Limit quanta for variables and require minimum information
gain
Result
Creation of the tree takes about forty minutes
Reading in the tree takes about forty milliseconds
Resolving against the tree takes about forty nanoseconds
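The "build once and persist" idea can be sketched in Python with the standard pickle module. This is an assumption-laden illustration: the slides do not state the serialization format, and the nested-dictionary tree shape and function names here are hypothetical.

```python
import pickle
from pathlib import Path


def save_tree(tree, path):
    """Write an already-built tree to disk so it never has to be rebuilt."""
    with Path(path).open("wb") as handle:
        pickle.dump(tree, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_tree(path):
    """Read a previously saved tree. Only load files you created yourself."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

The expensive step (building) then runs once, while every later run pays only the cost of reading the file.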
Project Functionality
Challenge
Test the decision tree for accuracy
Avoid peeking at the data set
Solution
Always test on new data [Tank!]
Don’t store the test set so we avoid any temptation to peek
Name                                                          Accuracy
Hi5 – www.hi5.com                                             94.7%
Hillary Clinton for President - http://www.hillaryclinton.com/  98.6%
Department of Defense - http://www.defenselink.mil/           86.84%
Engadget – www.engadget.com                                   91.45%
Gamespot.com – www.gamespot.com                               91.57%
Average                                                       92.63%
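The average in the table is the plain, unweighted mean of the five per-site accuracies, which the short Python check below reproduces. The dictionary keys are shortened labels chosen for illustration.

```python
# Per-site accuracy figures, in percent, copied from the table.
accuracies = {
    "Hi5": 94.7,
    "Hillary Clinton for President": 98.6,
    "Department of Defense": 86.84,
    "Engadget": 91.45,
    "Gamespot.com": 91.57,
}

# Unweighted mean: each site counts equally regardless of image count.
average = sum(accuracies.values()) / len(accuracies)
print(f"{average:.2f}%")
```

Note that this weights every site equally; the slides do not give the number of images tested per site, so a per-image average cannot be derived from them.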
The Tricky Parts
Information Gain
Successful classification provides 2.391 bits of
information
Which means, what, exactly?
Technically – You have enough information
to answer 2.391 yes/no questions
Practically – You can order nodes to split on
by information gain
At each split choose node that provides highest
information gain
Note - The amount of information provided
by an attribute will change as you move
through the tree
Solution
Calculate information gain for each split
This is where the nasty set queries occur
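The information gain calculation can be sketched in Python as follows. With eight image types the class entropy is at most log2(8) = 3 bits, so a figure such as 2.391 bits would correspond to a somewhat uneven mix of types. The row format (dictionaries keyed by variable name) and the function names are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, label="image_type"):
    """Bits gained about `label` by splitting `rows` on `attribute`."""
    before = entropy([row[label] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    # Entropy remaining after the split, weighted by group size.
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after
```

The gain has to be recomputed at every node because the rows reaching that node, and therefore the usefulness of each attribute, differ from node to node.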
Overfitting
Observe
Permutations of Variable Quanta -
460,800
Sample Data Size – 1000
460,800 >> 1000
Thus the risk of overfitting is significant
Solution
Require that we gain at least .05 bits to split
– otherwise just return the modal value for
the remaining set
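A minimal Python sketch of tree building with this minimum-gain rule is shown below. It is a simplified version of the general decision tree learning approach, not the project's code; the nested-dictionary tree shape, the row format, and all function names are assumptions. Only the 0.05-bit threshold and the modal-value fallback come from the slide.

```python
from collections import Counter
from math import log2

MIN_GAIN_BITS = 0.05  # threshold quoted on the slide


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def split_gain(rows, attribute, label):
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    remaining = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - remaining


def build_tree(rows, attributes, label="image_type", min_gain=MIN_GAIN_BITS):
    """Return a leaf (a label) or a dict node that splits on one attribute."""
    labels = [row[label] for row in rows]
    modal = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return modal
    best = max(attributes, key=lambda a: split_gain(rows, a, label))
    if split_gain(rows, best, label) < min_gain:
        return modal  # not enough information gained: stop splitting
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, label, min_gain)
    return {"attribute": best, "default": modal, "branches": branches}


def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node's mode."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree
```

Stopping early at the modal value keeps the tree from carving out branches that are supported by only a handful of the 1000 samples.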
The Tricky Parts
Variable Quantification
Strategy
Make everything an integer
Define ranges for all variables
Initially picked quanta divisions
based on guesses
These turned out to be wildly inaccurate
Solution
Picked variables based on image type
grouping and average
SQL AVG and COUNT make this easy
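One possible reading of this strategy is to compute the average of a variable per image type and place the quanta boundaries midway between neighboring averages. The Python sketch below implements that reading; the midpoint rule and the function names are assumptions, since the slides do not spell out how the boundaries were derived.

```python
from bisect import bisect_right


def boundaries_from_group_averages(samples):
    """Derive ascending boundaries from (image_type, value) pairs.

    Mirrors a GROUP BY image_type with AVG(value): one average per type,
    then a boundary halfway between each pair of adjacent averages.
    """
    totals = {}
    for image_type, value in samples:
        running_sum, count = totals.get(image_type, (0, 0))
        totals[image_type] = (running_sum + value, count + 1)
    means = sorted({s / n for s, n in totals.values()})
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


def quantize(value, boundaries):
    """Map a raw value to an integer quantum (bucket index)."""
    return bisect_right(boundaries, value)
```

Every variable then reaches the tree as a small integer, which keeps the number of distinct split values, and so the number of queries, bounded.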
Edge Detection
Used Sobel Edge detection and Java
convolution application for images
Count the number of edges in the
image
Lots of images have edges
Solution
Count vertical and horizontal edges
Turns out to be a great proxy for
text in the image
Accuracy goes from 78.23% to 92.63%
with this type of edge detection
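The project used Java for the convolution; the sketch below shows the same idea in plain Python on a grayscale image given as a list of rows. How the original counted edges is not stated, so this version simply counts pixels whose Sobel response is strong in one direction only. The threshold value and function names are assumptions.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(pixels, row, col, kernel):
    """Weighted sum of the 3x3 neighborhood centered on (row, col)."""
    return sum(
        kernel[i][j] * pixels[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def count_axis_aligned_edge_pixels(pixels, threshold=128):
    """Return (vertical, horizontal) counts of edge pixels.

    A pixel counts as a vertical edge when only the horizontal gradient
    exceeds the threshold, and as a horizontal edge in the opposite case.
    Border pixels are skipped because they lack a full neighborhood.
    """
    vertical = horizontal = 0
    for r in range(1, len(pixels) - 1):
        for c in range(1, len(pixels[0]) - 1):
            gx = abs(apply_kernel(pixels, r, c, SOBEL_X))
            gy = abs(apply_kernel(pixels, r, c, SOBEL_Y))
            if gx > threshold and gy <= threshold:
                vertical += 1
            elif gy > threshold and gx <= threshold:
                horizontal += 1
    return vertical, horizontal
```

Rendered text is dominated by straight vertical and horizontal strokes, which is a likely reason these counts separate text images from photographs so well.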
Future Features
Second Order Variables
First order variables are primary data from images
Second order variables are derived from one or more primary variables
Specifically
edge_count, color_depth have much more relevance as ratios to size
height is more relevant as a ratio to width
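The ratios described above could be derived as in this Python sketch. The function name and the output key names are hypothetical; only the choice of ratios (edge count and color depth relative to size, height relative to width) comes from the slide.

```python
def second_order_variables(width, height, edge_count, color_depth):
    """Derive ratio variables from first-order image variables."""
    size = width * height
    if size == 0:
        raise ValueError("image has zero area; ratios are undefined")
    return {
        "edge_density": edge_count / size,    # edges per pixel
        "color_density": color_depth / size,  # unique colors per pixel
        "aspect_ratio": height / width,       # height relative to width
    }
```

Ratios let a small button and a large banner with the same kind of content produce similar values, which raw counts cannot do.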
Classification Tightening
Current classifications have some overlap which could be refined out
Certain classifications evolved over the course of the project and the data set
should be updated to reflect the final classification
Future Features
Safe Failure
Better to require alternative text when not necessary
than to not require text when necessary…
…or is it??
Celebrity Endorsement
If K-Fed uses it, wouldn’t you?
InFocus 5.0
For More Information
Silicon Valley
Phone (415) 975-8000
E-mail [email protected]
Fax (415) 624-2708
300 Brannan Street
Suite 608
San Francisco, CA 94107-1876
Washington DC
Phone (703) 637-8955
E-mail [email protected]
Fax (703) 734-8381
1489 Chain Bridge Road
Suite 204
McLean, VA 22101