Text Based CAPTCHA Recognition
by The stars
Index
1. Introduction
2. Types of CAPTCHA
3. Circumventing the CAPTCHA
4. Project Flow
5. Implementation Flow
6. Hardware requirements
7. Software requirements
8. References
Introduction
The Completely Automated Public Turing test to tell Computers and Humans Apart
(CAPTCHA) was designed to prevent bots or programs
from automating work meant to be done by a human.
Like every other attempt at authentication, CAPTCHAs have been continuously
cracked since their inception.
Types of CAPTCHAs
• Visual CAPTCHA
– Image based
– Text based
• Audio CAPTCHA
Some examples of text based CAPTCHAs
Circumventing the CAPTCHA
There are three primary methods of defeating CAPTCHAs:
– exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA
– improving character recognition software
– using cheap human labor to process the tests
Why bother to break CAPTCHAs?
• It is a practical use of AI, and a harder problem than regular text recognition.
• Understand the inherent flaws of CAPTCHA design.
"Any program that passes tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem" - Ahn, Blum and Langford (Telling Humans and Computers apart Automatically - CMU Tech report)
The Questions
• Can machine learning techniques be used to recognize text based CAPTCHAs at speeds comparable to a human?
• If so, how effective are these text based CAPTCHAs in authenticating humans?
CAPTCHAs are cracked about 22% of the time.
Even at that success rate a CAPTCHA can be considered cracked, since the computer
can attempt many hundreds of CAPTCHAs before a human can solve even one.
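As a rough illustration of why a 22% success rate is enough, the sketch below assumes that every attempt succeeds independently with the same probability. That independence is an assumption made here for the arithmetic, not a claim from the slides, and the function names are illustrative.

```python
def expected_attempts(p):
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1.0 / p


def prob_at_least_one_success(p, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.22  # per-attempt success rate quoted above
    print("expected attempts per solve:", round(expected_attempts(p), 2))
    print("chance of a solve within 10 attempts:", round(prob_at_least_one_success(p, 10), 3))
```

Under this assumption a bot needs fewer than five attempts on average per solved CAPTCHA, which is cheap when each attempt takes a fraction of a second.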
Project Flow
INPUT CAPTCHA IMAGE
PRE-PROCESSING
SEGMENTATION
FEATURE EXTRACTION
CHARACTER RECOGNITION
Implementation Flow
Step 1: Input CAPTCHA Image
• Using JCaptcha
• Using images available at www.captcha.net
• Captchaservice.org
Step 2: Preprocessing
• Binarization: converting the colour image to black and white
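A minimal sketch of binarization, assuming the image is already available as rows of (R, G, B) tuples and that a fixed global threshold of 128 is good enough. Both the data layout and the threshold are assumptions for illustration; the slides do not say which thresholding method the project used.

```python
def to_gray(pixel):
    """Convert an (R, G, B) pixel to a 0-255 luminance value (ITU-R 601 weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def binarize(rgb_rows, threshold=128):
    """Return a grid of 1 (ink) and 0 (background); dark pixels count as ink."""
    return [[1 if to_gray(px) < threshold else 0 for px in row]
            for row in rgb_rows]


if __name__ == "__main__":
    image = [
        [(0, 0, 0), (255, 255, 255)],
        [(200, 200, 200), (10, 20, 30)],
    ]
    for row in binarize(image):
        print(row)
```

With the Python Imaging Library the same step is usually written with `Image.convert("L")` followed by `Image.point(...)`; the pure-Python version above only makes the arithmetic visible.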
Step 3: Segmentation
• Whitespace Segmentation
• Overlapping Segmentation
• Pruning
• Snake Segmentation
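Of the methods listed, whitespace segmentation is the simplest to sketch: characters are split wherever a full column of the binarized image contains no ink. The version below assumes a grid of 1 (ink) and 0 (background) values; the function names are illustrative. It cannot separate overlapping or touching characters, which is what the other listed methods are for.

```python
def whitespace_segments(grid):
    """Return (start, end) column ranges, end exclusive, of runs that contain ink."""
    if not grid:
        return []
    width = len(grid[0])
    # A column "has ink" if any row has a 1 in that column.
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, x))    # blank column ends the character
            start = None
    if start is not None:                  # character touches the right edge
        segments.append((start, width))
    return segments


def crop(grid, start, end):
    """Cut the columns start..end-1 out of every row."""
    return [row[start:end] for row in grid]


if __name__ == "__main__":
    rows = ["0110010", "0100011", "0110010"]
    grid = [[int(c) for c in r] for r in rows]
    for start, end in whitespace_segments(grid):
        print((start, end), crop(grid, start, end))
```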
Step 4: Feature Extraction and Character Recognition
• Every character has a unique set of features:
– number of holes
– height of the character
– number of black-white transitions
– nature of vertical stroke
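Three of these features can be computed directly from a segmented, binarized glyph. The sketch below assumes a grid of 1 (ink) and 0 (background) values. The exact definitions are assumptions made for illustration: a hole is a background region fully enclosed by ink, and a transition is an ink pixel followed by a background pixel along a row.

```python
def count_holes(glyph):
    """Count background regions that are fully enclosed by ink."""
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so all outside background is one region.
    padded = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in glyph] + [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def flood(sy, sx):
        """Mark every background pixel 4-connected to (sy, sx)."""
        stack = [(sy, sx)]
        seen[sy][sx] = True
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < h + 2 and 0 <= nx < w + 2
                        and not seen[ny][nx] and padded[ny][nx] == 0):
                    seen[ny][nx] = True
                    stack.append((ny, nx))

    flood(0, 0)  # consume the outside background first
    holes = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if padded[y][x] == 0 and not seen[y][x]:
                holes += 1
                flood(y, x)
    return holes


def glyph_height(glyph):
    """Number of rows between the first and last row containing ink."""
    rows = [y for y, row in enumerate(glyph) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def black_white_transitions(glyph):
    """Count ink-to-background changes along each row, summed over rows."""
    return sum(1 for row in glyph
               for a, b in zip(row, row[1:]) if a == 1 and b == 0)


if __name__ == "__main__":
    letter_o = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    print(count_holes(letter_o), glyph_height(letter_o),
          black_white_transitions(letter_o))
```

The resulting numbers form the feature vector that is passed to the character recognizer.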
Learning Features
Support Vector Machines (SVM):
• accuracy drops after 3-4 characters
• requires extensive training
• faster
Neural Networks:
• more accurate for larger texts
• doesn't require extensive training
• slower
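To show what learning from features looks like in code, here is a single perceptron, the simplest neural-network building block. It is a stand-in chosen for brevity, not the SVM or neural network compared above, and the (holes, transitions) training values are made-up illustrative data.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights and bias with the classic perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            if error:  # update only on a misclassified sample
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


if __name__ == "__main__":
    # Hypothetical feature vectors: (number of holes, black-white transitions).
    samples = [(1, 1), (1, 2), (0, 2), (0, 3)]
    labels = [1, 1, 0, 0]  # 1 = glyph with a hole, 0 = glyph without
    weights, bias = train_perceptron(samples, labels)
    print([predict(weights, bias, x) for x in samples])
```

A real recognizer needs one output per character class and far more training data; the sketch only shows the train-then-predict loop that both SVMs and neural networks share.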
Challenges
• Overlapping characters
• Rotated characters
Hardware and Software Requirements
Hardware Requirements:
• Intel® Core™ i3 Processor, 3.06 GHz clock
• 2GB RAM
Software Requirements:
• Python 2.5
• Python Imaging Library (PIL)
References
• K. Chellapilla and P. Simard, “Using Machine Learning to Break Visual Human Interaction Proofs (HIPs),” Advances in Neural Information Processing Systems 17, Neural Information Processing Systems (NIPS), MIT Press, 2004.
• Greg Mori and Jitendra Malik, “Recognising Objects in Adversarial Clutter: Breaking a Visual CAPTCHA,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03), Vol. 1, June 2003, pp. 134-141.
• H. Zhang, Wang Peng-po and Han Hong-wei, “A CAPTCHA recognition algorithm based on holistic verification,” 2011 International Conference on Instrumentation, Measurement, Computer, Communication and Control, IEEE, 2011. DOI 10.1109/IMCCC.2011.136.