Text Based CAPTCHA Recognition

Post on 27-Oct-2014

by The stars

Index

1. Introduction
2. Types of CAPTCHA
3. Circumventing the CAPTCHA
4. Project Flow
5. Implementation Flow
6. Hardware requirements
7. Software requirements
8. References

Introduction

The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) was designed to prevent bots or programs from automating work meant to be done by a human.

Like every other attempt at authentication, CAPTCHAs have been cracked continuously since their inception.

Types of CAPTCHAs

• Visual CAPTCHA
  – Image based
  – Text based

• Audio CAPTCHA

Some examples of text based CAPTCHAs

Circumventing the CAPTCHA

There are three primary methods of defeating CAPTCHAs:
– exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA
– improving character recognition software
– using cheap human labor to process the tests

Why bother to break CAPTCHAs?

• It is a practical use of AI, and a harder problem than regular text recognition.

• Understand the inherent flaws of CAPTCHA design.

"Any program that passes tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem" - von Ahn, Blum and Langford (Telling Humans and Computers Apart Automatically, CMU Tech Report)

The Questions

• Can machine learning techniques be used to recognize text based CAPTCHAs at speeds comparable to a human?

• If so, how effective are these text based CAPTCHAs in authenticating humans?

CAPTCHAs are cracked about 22% of the time. A scheme with that success rate can be considered cracked, since a computer can attempt many hundreds of CAPTCHAs before a human can solve even one.
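The arithmetic behind this claim can be sketched as follows. This is only an illustration: it assumes every attempt succeeds independently with probability 0.22, which the slides do not state, and the function names are hypothetical.

```python
# Sketch only: assumes each attempt is an independent trial (our assumption).
SUCCESS_RATE = 0.22  # per-attempt success rate quoted on the slide


def prob_at_least_one(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def expected_solved(p, attempts):
    """Expected number of CAPTCHAs solved in `attempts` independent tries."""
    return p * attempts


if __name__ == "__main__":
    for n in (1, 5, 10, 100):
        print(n, prob_at_least_one(SUCCESS_RATE, n), expected_solved(SUCCESS_RATE, n))
```

Under this assumption, a bot that can retry cheaply is close to certain to get through after a handful of attempts, which is why a 22% per-attempt rate already counts as a break.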

Project Flow

INPUT CAPTCHA IMAGE

PRE-PROCESSING

SEGMENTATION

FEATURE EXTRACTION

CHARACTER RECOGNITION

Implementation Flow

Step 1: Input CAPTCHA Image

• Using JCaptcha
• Using images available at www.captcha.net
• Captchaservice.org

Step 2: Preprocessing

• Colour to binary conversion (binarization)
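A minimal sketch of this step in plain Python, assuming the image is already loaded as rows of (r, g, b) tuples. The fixed threshold of 128, the luma weights, and the function names are illustrative choices, not taken from the slides.

```python
def to_gray(rgb):
    """Convert one (r, g, b) pixel to a 0-255 gray level (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def binarize(pixels, threshold=128):
    """Map a colour image to 0/1 values, where 1 marks dark (ink) pixels.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    The fixed threshold is an assumption; noisy CAPTCHAs may need tuning.
    """
    return [[1 if to_gray(px) < threshold else 0 for px in row] for row in pixels]


if __name__ == "__main__":
    image = [
        [(255, 255, 255), (0, 0, 0)],
        [(200, 200, 200), (30, 30, 30)],
    ]
    print(binarize(image))
```

With the Python Imaging Library, the same idea is usually expressed by converting the image to grayscale mode and then thresholding it; the list-based version above is only meant to make the logic visible.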

Step 3: Segmentation

• Whitespace Segmentation
• Overlapping Segmentation
• Pruning
• Snake Segmentation
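Of the techniques listed, whitespace segmentation is the simplest to illustrate. The sketch below assumes a binarized image stored as rows of 0/1 values (1 = ink) and splits it wherever a column contains no ink. The function names are hypothetical, and the overlapping, pruning and snake variants are not covered.

```python
def whitespace_segments(binary):
    """Return (start, end) column ranges, end exclusive, separated by empty columns."""
    if not binary:
        return []
    width = len(binary[0])
    # A column "has ink" if any row has a 1 at that x position.
    column_has_ink = [any(row[x] for row in binary) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(column_has_ink):
        if ink and start is None:
            start = x                      # a character begins here
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touches the right edge
    return segments


def crop_columns(binary, start, end):
    """Cut columns start..end-1 out of every row."""
    return [row[start:end] for row in binary]


if __name__ == "__main__":
    img = [
        [0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 1],
    ]
    for s, e in whitespace_segments(img):
        print((s, e), crop_columns(img, s, e))
```

This approach fails as soon as two characters touch or overlap, which is the case the other segmentation methods in the list are meant to handle.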

Step 4: Feature Extraction and Character Recognition

• Every character has a unique set of features:
  – number of holes
  – height of the character
  – number of black-white transitions
  – nature of vertical stroke
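Three of these features can be computed directly from a segmented glyph, as sketched below. The glyph is assumed to be rows of 0/1 values (1 = ink). The definitions used here (holes as enclosed background regions under 4-connectivity, transitions counted along rows) are one plausible reading of the slide, not necessarily the authors' own, and the vertical-stroke feature is left out.

```python
from collections import deque


def char_height(glyph):
    """Number of rows between the first and last row containing ink."""
    rows = [y for y, row in enumerate(glyph) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def transitions(glyph):
    """Count changes between horizontally adjacent pixels, summed over rows."""
    return sum(1 for row in glyph for a, b in zip(row, row[1:]) if a != b)


def count_holes(glyph):
    """Count background regions that are not connected to the outside."""
    h = len(glyph)
    w = len(glyph[0]) if h else 0
    # Pad with a background border so the outside forms a single region.
    padded = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in glyph] + [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def flood(sy, sx):
        queue = deque([(sy, sx)])
        seen[sy][sx] = True
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < h + 2 and 0 <= nx < w + 2
                        and not seen[ny][nx] and padded[ny][nx] == 0):
                    seen[ny][nx] = True
                    queue.append((ny, nx))

    flood(0, 0)  # mark the outside background first
    holes = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if padded[y][x] == 0 and not seen[y][x]:
                holes += 1       # unvisited background is an enclosed region
                flood(y, x)
    return holes


def features(glyph):
    """Feature tuple: (holes, height, transitions)."""
    return (count_holes(glyph), char_height(glyph), transitions(glyph))
```

A feature tuple like this is what would be handed to the classifier in the recognition step; an "O"-like glyph yields one hole, an "8"-like glyph two, and an "L"-like glyph none.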

Learning Features

Support Vector Machines (SVM)          Neural Networks
accuracy drops after 3-4 characters    more accurate for larger texts
requires extensive training            doesn't require extensive training
faster                                 slower

Challenges

• Overlapping characters

• Rotated characters

Hardware and Software Requirements

Hardware Requirements:
• Intel® Core™ i3 Processor (3.06 GHz clock)
• 2GB RAM

Software Requirements:
• Python 2.5
• Python Imaging Library (PIL)

References

• K. Chellapilla and P. Simard, “Using Machine Learning to Break Visual Human Interaction Proofs (HIPs),” Advances in Neural Information Processing Systems 17 (NIPS), MIT Press, 2004.

• Greg Mori and Jitendra Malik, “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03), Vol. 1, June 2003, pp. 134-141.

• H. Zhang, Wang Peng-po and Han Hong-wei, “A CAPTCHA recognition algorithm based on holistic verification,” 2011 International Conference on Instrumentation, Measurement, Computer, Communication and Control, IEEE, 2011. DOI 10.1109/IMCCC.2011.136.