Visual Captcha with handwritten Image analysis
-
8/7/2019 Visual Captcha with handwritten Image analysis
Visual CAPTCHA with Handwritten Image Analysis
Amalia Rusu and Venu Govindaraju
CEDAR, University at Buffalo
-
Background on CAPTCHA
CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
CAPTCHAs should be automatically generated and graded
Tests should be taken quickly and easily by human users
Tests should accept virtually all human users and reject software agents
Tests should resist automatic attack for many years, despite technology advances and prior knowledge of the algorithms
Exploits the difference in abilities between humans and machines (e.g., text, speech, or facial feature recognition)
A new formulation of Alan Turing's test: Can machines think?
-
Securing Cyberspace Using CAPTCHA
Automatic Authentication Session for Web Services:
Initialization
Handwritten CAPTCHA Challenge
User Response
Verification
[Figure: the user and the authentication server communicate over the Internet; the server sends a challenge, the user returns a response, and the server performs user authentication]
The user initiates the dialog and has to be authenticated by the server.
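The four steps of the session can be sketched on the server side as a store that issues a challenge id, keeps the truth word private, and checks exactly one response per challenge. This is a minimal illustration under stated assumptions: the class name CaptchaSessionStore, the time-to-live, the single-attempt rule, and the case-insensitive comparison are all hypothetical choices, not part of the described system, and rendering and sending the H-CAPTCHA image are left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class CaptchaSessionStore:
    """Hypothetical server-side store pairing each issued challenge with its truth word."""

    def __init__(self, ttl_seconds: float = 120.0) -> None:
        self.ttl_seconds = ttl_seconds
        # session id -> (truth word, expiry time on the monotonic clock)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, truth_word: str, now: Optional[float] = None) -> str:
        """Initialization + challenge: remember the answer, give the client an opaque id."""
        now = time.monotonic() if now is None else now
        session_id = secrets.token_urlsafe(16)
        self._pending[session_id] = (truth_word, now + self.ttl_seconds)
        return session_id

    def verify(self, session_id: str, response: str, now: Optional[float] = None) -> bool:
        """Verification: one attempt per challenge, case-insensitive match, expiry enforced."""
        now = time.monotonic() if now is None else now
        entry = self._pending.pop(session_id, None)  # pop, so a challenge cannot be replayed
        if entry is None:
            return False
        truth_word, expires_at = entry
        return now <= expires_at and response.strip().lower() == truth_word.lower()
```

Only the opaque id and the distorted image would leave the server; the truth word never does.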
-
Objective
Develop CAPTCHAs based on the ability gap between humans and machines in handwriting recognition, using Gestalt laws of perception

State-of-the-art in HR [Xue, Govindaraju 2002]
Speed and accuracy of a handwriting recognizer (HR). Feature extraction time is excluded. Testing platform: UltraSPARC.

Lexicon   Lexicon Driven Model              Grapheme Model
size      time (s)  Top 1 (%)  Top 2 (%)    time (s)  Top 1 (%)  Top 2 (%)
10        0.027     96.53      98.73        0.021     96.56      98.77
100       0.044     89.22      94.13        0.031     89.12      94.06
1000      0.144     75.38      86.29        0.089     75.38      86.29
20000     1.827     58.14      66.56        0.994     58.14      66.49
-
H-CAPTCHA Motivation
Machine recognition of handwriting is more difficult than recognition of printed text
Handwriting recognition is a task that humans perform easily and reliably
Several CAPTCHAs based on machine-printed text have already been broken:
Greg Mori and Jitendra Malik of UC Berkeley have written a program that solves EZ-Gimpy with 83% accuracy
Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that achieves a 93% correct recognition rate against EZ-Gimpy
Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that achieves 78% accuracy against Gimpy-R
CAPTCHAs based on speech or visual features are impractical
H-CAPTCHAs have so far been unexplored by the research community
-
H-CAPTCHA Challenges
Generation of random and infinitely many distinct handwritten CAPTCHAs
Quantifying and exploiting the weaknesses of state-of-the-art handwriting recognizers and OCR systems
Controlling distortion so that the images remain human readable (conforming to Gestalt laws) but not machine readable
-
Generation of random and infinitely many distinct handwritten text images
Use handwritten word images that current recognizers cannot read
Handwritten US city name images available from postal applications
Collect new handwritten word samples
Create real (or nonsense) handwritten words and sentences by gluing isolated upper- and lower-case handwritten characters or word images
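Gluing isolated characters into a word image can be sketched as a horizontal concatenation. This is a minimal illustration, assuming binary images stored as lists of rows (1 = ink, 0 = background) and bottom alignment as a stand-in for a real baseline; the function name glue_characters and the gap parameter are hypothetical, and glyphs with descenders would need proper baseline information.

```python
from typing import List

Image = List[List[int]]  # binary image: rows of 0 (background) / 1 (ink)


def glue_characters(chars: List[Image], gap: int = 1) -> Image:
    """Concatenate character images left to right, aligned on their bottom rows."""
    height = max(len(c) for c in chars)
    word: Image = [[] for _ in range(height)]
    for n, char in enumerate(chars):
        width = len(char[0])
        pad = height - len(char)  # blank rows on top so every glyph ends on the same row
        padded = [[0] * width for _ in range(pad)] + [list(row) for row in char]
        for r in range(height):
            if n > 0:
                word[r].extend([0] * gap)  # blank columns between neighbouring glyphs
            word[r].extend(padded[r])
    return word
```

Picking the character images (and the word, real or nonsense) at random is what makes the supply of distinct samples effectively unbounded.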
-
Generation of random and infinitely many distinct handwritten text images
Use a handwriting distorter for generating human-like samples
Models that change the trajectory/shape of the letters in a controlled fashion (e.g., Hollerbach's oscillation model)
[Figure: Original handwritten image (a). Synthetic images (b, c, d, e, f).]
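Hollerbach's oscillation model describes pen motion as coupled horizontal and vertical oscillations plus a constant rightward drift. The sketch below uses a simplified form with constant parameters, x(t) = a*sin(wx*t + phx) + c*t and y(t) = b*sin(wy*t + phy); the function names, parameter names, and the uniform relative jitter used to derive synthetic variants are assumptions for illustration, not the authors' distorter.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def oscillation_trajectory(a: float, b: float, wx: float, wy: float,
                           phx: float, phy: float, c: float,
                           n: int = 200, t_end: float = 2 * math.pi) -> List[Point]:
    """Sample the simplified oscillation model at n evenly spaced times in [0, t_end]."""
    points = []
    for k in range(n):
        t = t_end * k / (n - 1)
        x = a * math.sin(wx * t + phx) + c * t  # horizontal oscillation plus drift
        y = b * math.sin(wy * t + phy)          # vertical oscillation
        points.append((x, y))
    return points


def synthetic_variant(params: Dict[str, float], rng: random.Random,
                      spread: float = 0.1, n: int = 200) -> List[Point]:
    """Hypothetical distorter: scale every parameter by a factor in [1 - spread, 1 + spread]."""
    jittered = {k: v * rng.uniform(1 - spread, 1 + spread) for k, v in params.items()}
    return oscillation_trajectory(n=n, **jittered)
```

Keeping spread small changes letter shapes in a controlled way, which matches the goal of producing human-like rather than arbitrary distortions.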
-
Exploit the Sources of Errors for State-of-the-art Handwriting Recognizers
Word Model Recognizer (WMR) [Kim, Govindaraju 1997]
lexicon-driven approach
chain-code-based image processing
pre-processing
segmentation
feature extraction
dynamic matching
Accuscript [Xue, Govindaraju 2002]
grapheme-based recognizer
extracts high-level structural features from characters, such as loops, turns, junctions, and arcs, without prior segmentation
uses a stochastic finite-state automaton model based on the extracted features
uses static lexicons in the recognition process
[Figure: Grapheme Based Model; structural features (junctions, loops, turns, ends) marked on a word image]
[Figure: Lexicon Driven Model; a segmentation lattice over points 1 to 9 scoring the candidate characters w, o, r, d on groups of consecutive segments]
Find the best way of accounting for the characters w, o, r, d by consuming all segments 1 to 8 in the process
Distance between the first character of the lexicon entry ("w") and the image between:
- segments 1 and 4: 5.0
- segments 1 and 3: 7.2
- segments 1 and 2: 7.6
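The lexicon-driven matching above is a dynamic program over segmentation points: each character consumes a group of consecutive segments, and all segments must be consumed. Below is a minimal sketch assuming the character-to-span distances are already computed. The distances for w are the ones on the slide; those for o, r, d, the limit of four segments per character (max_span), and the name match_word are hypothetical. It is not WMR's actual implementation.

```python
import math
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start point, end point) of a group of consecutive segments


def match_word(word: str, n_points: int, dist: Dict[str, Dict[Span, float]],
               max_span: int = 4) -> Tuple[float, List[Span]]:
    """Minimum total distance for matching `word` to segmentation points 1..n_points."""
    inf = math.inf
    # best[k][j]: cheapest way to account for the first k characters, ending at point j
    best = [[inf] * (n_points + 1) for _ in range(len(word) + 1)]
    back = [[0] * (n_points + 1) for _ in range(len(word) + 1)]
    best[0][1] = 0.0  # nothing matched yet, positioned at the first point
    for k, ch in enumerate(word, start=1):
        table = dist.get(ch, {})
        for j in range(2, n_points + 1):
            for i in range(max(1, j - max_span), j):
                d = table.get((i, j))
                if d is None or best[k - 1][i] == inf:
                    continue  # character not scored on this span, or prefix unreachable
                if best[k - 1][i] + d < best[k][j]:
                    best[k][j] = best[k - 1][i] + d
                    back[k][j] = i
    total = best[len(word)][n_points]  # every segment must be consumed
    if total == inf:
        return inf, []
    spans, j = [], n_points
    for k in range(len(word), 0, -1):  # follow back-pointers to recover the spans
        spans.append((back[k][j], j))
        j = back[k][j]
    return total, spans[::-1]


distances = {
    "w": {(1, 2): 7.6, (1, 3): 7.2, (1, 4): 5.0},  # values from the slide
    "o": {(3, 5): 7.8, (4, 5): 6.6, (4, 6): 6.0},  # hypothetical
    "r": {(5, 7): 7.5, (6, 7): 6.4, (6, 8): 8.6},  # hypothetical
    "d": {(7, 9): 4.4, (8, 9): 4.9},               # hypothetical
}
total, spans = match_word("word", 9, distances)
```

For these distances the best path uses the spans (1, 4), (4, 6), (6, 7), (7, 9) with a total distance of about 21.8. A lexicon-driven recognizer runs such a match for every lexicon entry and ranks the entries by total distance, so the work grows with lexicon size.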
-
Sources of Errors for State-of-the-art Handwriting Recognizers
Image quality
Background noise, printing surface, writing styles
Image features
Variable stroke width, slope, rotation, stretching, compression
Segmentation errors
Over-segmentation, merging, fragmentation, ligatures, scrawls
Recognition errors
Confusion with similar lexicon entries, large lexicons
-
Gestalt Laws
Gestalt psychology is based on the observation that we often experience things that are not a part of our simple sensations
What we see is an effect of the whole event, not contained in the sum of its parts (holistic approach)
Organizing principles: Gestalt laws
By no means restricted to perception (e.g., memory)
-
Gestalt Laws
1. Law of closure
2. Law of similarity
OXXXXXX
XOXXXXX
XXOXXXX
XXXOXXX
XXXXOXX
XXXXXOX
XXXXXXO
3. Law of proximity
**************
**************
**************
4. Law of symmetry
[ ][ ][ ]
-
Gestalt Laws
5. Law of continuity
a) Ambiguous segmentation
b) Segmentation based on good continuity, following the path of minimal curvature change
c) Perceptually implausible segmentation
6. Law of familiarity
a) Ambiguous segmentation
b) Perceptual segmentation
c) Segmentation based on good continuity proves to be erroneous
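The good-continuity rule (follow the path of minimal curvature change) can be sketched as choosing, at a stroke junction, the outgoing direction that turns least from the incoming one. The representation of stroke directions as 2-D vectors and the function names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def turning_angle(incoming: Vec, outgoing: Vec) -> float:
    """Absolute change of direction, in radians (0 to pi), from incoming to outgoing."""
    a1 = math.atan2(incoming[1], incoming[0])
    a2 = math.atan2(outgoing[1], outgoing[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)  # take the shorter way around the circle


def continue_stroke(incoming: Vec, candidates: List[Vec]) -> int:
    """Index of the outgoing branch with minimal curvature change."""
    return min(range(len(candidates)), key=lambda i: turning_angle(incoming, candidates[i]))
```

As the familiarity example shows, this purely geometric rule can still yield an erroneous segmentation when a familiar letter shape calls for a sharper turn.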
-
Gestalt Laws
7. Figure and ground
8. Memory
-
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by circles, rectangles, and lines with random angles
Keep occlusions small enough that they do not hide letters completely
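Occlusion by circles can be sketched as erasing random discs while rejecting any disc that would hide too large a share of the ink, a simple proxy for "do not hide letters completely". This is a minimal sketch: it assumes occluders erase ink to background, and the name occlude_with_circles, the max_cover threshold, and the retry limit are hypothetical. Rectangles and lines with random angles would follow the same accept/reject pattern with a different pixel mask.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # 1 = ink, 0 = background


def occlude_with_circles(img: Image, n: int, max_radius: int, max_cover: float = 0.2,
                         rng: Optional[random.Random] = None, tries: int = 100) -> Image:
    """Erase up to n random discs, skipping any disc that would hide too much ink."""
    rng = rng or random.Random()
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    placed = 0
    for _ in range(tries):
        if placed == n:
            break
        cy, cx, r = rng.randrange(h), rng.randrange(w), rng.randint(1, max_radius)
        disc = [(y, x)
                for y in range(max(0, cy - r), min(h, cy + r + 1))
                for x in range(max(0, cx - r), min(w, cx + r + 1))
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r]
        ink = sum(map(sum, out))
        hidden = sum(out[y][x] for y, x in disc)
        if ink and hidden / ink > max_cover:
            continue  # this disc would take too large a bite out of the text
        for y, x in disc:
            out[y][x] = 0
        placed += 1
    return out
```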
-
Control Occlusions
Gestalt laws: closure, proximity, familiarity
Add occlusions by waves running from left to right across the entire image, with various amplitudes/wavelengths, or rotate them by an angle
Choose areas with more foreground pixels, in the bottom part of the text image (not too low, not too high)
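Wave occlusion can be sketched as picking the row with the most ink inside a band of the lower part of the image and drawing a sine wave across the full width from there. The band limits (55% to 85% of the height), the default thickness, and the value parameter are assumptions: value 0 erases ink (an occluding wave), value 1 draws in the foreground colour, loosely mirroring the "Occlusion by waves" and "Black Waves" rows of the results table. Rotating the wave by an angle is omitted.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background


def wave_occlusion(img: Image, amplitude: float, wavelength: float, thickness: int = 2,
                   band: Tuple[float, float] = (0.55, 0.85), value: int = 0) -> Image:
    """Draw a left-to-right sine wave centred on the inkiest row of the lower band."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    lo = min(h - 1, int(band[0] * h))
    hi = max(lo + 1, min(h, int(band[1] * h)))
    base = max(range(lo, hi), key=lambda r: sum(out[r]))  # first row with the most ink
    for x in range(w):
        yc = base + round(amplitude * math.sin(2 * math.pi * x / wavelength))
        for y in range(yc - thickness // 2, yc - thickness // 2 + thickness):
            if 0 <= y < h:
                out[y][x] = value
    return out
```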
-
Control Extra Strokes
Gestalt laws: continuity, figure and ground, familiarity
Add occlusions using the same color as the foreground (black) pixels: arcs or lines of various thickness
Curved strokes could be confused with part of a character
Use asymmetric strokes so that the pattern cannot be learned
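Extra strokes in the foreground colour can be sketched as arcs whose radius drifts from one value to another along the sweep, which breaks the mirror symmetry of a plain circular arc; a random centre, start angle, sweep, radius, and thickness make each stroke different. All names and parameter ranges below are hypothetical, and straight lines are left out.

```python
import math
import random
from typing import List, Optional

Image = List[List[int]]  # 1 = ink, 0 = background


def draw_arc(img: Image, cx: int, cy: int, radius: float, start_deg: float, end_deg: float,
             end_radius: Optional[float] = None, thickness: int = 1) -> Image:
    """Draw an arc in the foreground value 1; the radius drifts linearly to end_radius."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    r1 = radius if end_radius is None else end_radius
    steps = max(8, int(abs(end_deg - start_deg)))  # roughly one sample per degree
    lo = -(thickness // 2)
    for k in range(steps + 1):
        f = k / steps
        a = math.radians(start_deg + (end_deg - start_deg) * f)
        r = radius + (r1 - radius) * f  # unequal radii make the stroke asymmetric
        x = round(cx + r * math.cos(a))
        y = round(cy - r * math.sin(a))  # image rows grow downward
        for dy in range(lo, lo + thickness):
            for dx in range(lo, lo + thickness):
                if 0 <= y + dy < h and 0 <= x + dx < w:
                    out[y + dy][x + dx] = 1
    return out


def random_extra_stroke(img: Image, rng: random.Random, max_thickness: int = 3) -> Image:
    """Hypothetical generator: one arc with random geometry and thickness."""
    h, w = len(img), len(img[0])
    start = rng.uniform(0, 360)
    r0 = rng.uniform(2, max(2, h / 2))
    return draw_arc(img, rng.randrange(w), rng.randrange(h), r0, start,
                    start + rng.uniform(30, 150), end_radius=r0 * rng.uniform(0.5, 1.5),
                    thickness=rng.randint(1, max_thickness))
```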
-
Control Letter/Word Orientation
Gestalt laws: memory, internal metrics, familiarity of letters
Change the orientation of the entire word, or of only a few letters
Use variable rotation, stretching, compressing
[Figure examples: flip-flop, vertical mirror, horizontal mirror]
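The slide names flip-flop, vertical mirror, and horizontal mirror without defining them, so the sketch below names its functions by effect instead of guessing which label maps to which, and does not implement flip-flop. It shows whole-word mirroring, mirroring of a single letter given its column range (letter boundaries are assumed known), and nearest-neighbour horizontal stretching or compressing; rotation is omitted. All function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = background


def mirror_left_right(img: Image) -> Image:
    """Reflect the image about its vertical axis (reverse every row)."""
    return [row[::-1] for row in img]


def mirror_top_bottom(img: Image) -> Image:
    """Reflect the image about its horizontal axis (reverse the row order)."""
    return [row[:] for row in img[::-1]]


def mirror_columns(img: Image, start: int, end: int) -> Image:
    """Mirror only columns [start, end), e.g. a single letter, leaving the rest intact."""
    return [row[:start] + row[start:end][::-1] + row[end:] for row in img]


def stretch_horizontal(img: Image, factor: float) -> Image:
    """Nearest-neighbour horizontal scaling: factor > 1 stretches, factor < 1 compresses."""
    width = len(img[0])
    new_width = max(1, round(width * factor))
    return [[row[min(width - 1, int(x / factor))] for x in range(new_width)] for row in img]
```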
-
General H-CAPTCHA Generation Algorithm
Input.
Original (randomly selected) handwritten image: an existing US city name image, a synthetic word image of 5 to 8 characters, or a meaningful sentence
Lexicon containing the image's truth word
Output.
H-CAPTCHA image
Method.
Randomly choose a number of transformations
Randomly select that many transformations
If more than one transformation is chosen, assign each one an a priori order established from experimental results
Sort the chosen transformations by their a priori order and apply them in sequence, so that the effect is cumulative
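The method maps onto a small driver function. A minimal sketch, assuming each transformation is a function of (image, rng) and that its a priori order is supplied as an integer with lower values applied first; the names generate_hcaptcha and max_count are hypothetical. The truth word is not needed to build the image, but the server keeps it to grade the answer.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Transform = Callable[[Image, random.Random], Image]


def generate_hcaptcha(image: Image, transforms: Dict[str, Tuple[int, Transform]],
                      rng: random.Random, max_count: int = 3) -> Tuple[Image, List[str]]:
    """transforms maps a name to (a priori order, function); lower order is applied first."""
    count = rng.randint(1, min(max_count, len(transforms)))  # how many transformations
    chosen = rng.sample(sorted(transforms), count)           # which ones, at random
    chosen.sort(key=lambda name: transforms[name][0])        # sort by a priori order
    out = image
    for name in chosen:
        out = transforms[name][1](out, rng)                  # cumulative application
    return out, chosen
```

Returning the applied names alongside the image makes it easy to log which combination produced each challenge.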
-
Testing Results on Machines
The accuracy of HR on images deformed using the Gestalt laws approach. The number of tested images is 4,127 for each type of transformation. HR running time increases from a few seconds per image for a 4,000-word lexicon to several minutes per image for a 40,000-word lexicon.

HW recognizer                      WMR                 Accuscript
Lexicon size                       4,000     40,000    4,000     40,000
Occlusion by circles               35.93%    20.28%    32.34%    17.37%
Vertical overlap                   27.88%    14.36%    12.64%     3.94%
Horizontal overlap (small)         24.35%    10.70%     2.93%     0.60%
Black waves                        16.36%     5.33%     1.57%     0.38%
Occlusion by waves                 15.43%     7.00%    10.56%     4.28%
Horizontal overlap (large)         12.93%     3.56%     2.42%     0.36%
Overlap of different words          3.80%     0.48%     4.43%     0.92%
Flip-flop                           0.46%     0.14%     0.70%     0.19%
General image transformations       9.28%     N/A       4.41%     N/A
-
H-CAPTCHA Evaluation
No risk of image repetition
Image generation is completely automated: words, images, and distortions are chosen at random
The transformed images cannot easily be normalized or rendered noise-free by present computer programs, even though the original images must be assumed to be public knowledge
Deformed images do not pose problems to humans: human subjects succeeded on our test images
Tested against the state of the art: Word Model Recognizer, Accuscript
The CAPTCHAs remained unbroken by state-of-the-art recognizers
-
Future Work
Develop general methods to attack H-CAPTCHAs (e.g., pre- and post-processing techniques)
Research lexicon-free approaches to handwriting recognition
Quantify the gap between humans and machines in reading handwriting, by category (of distortions and Gestalt laws)
Parameterize the difficulty levels of Gestalt based H-CAPTCHAs
-
Thank You
Questions?