Text Based CAPTCHA Recognition


Transcript of Text Based CAPTCHA Recognition

Page 1: Text Based CAPTCHA Recognition

Text based Recognition

by The stars

Page 2: Text Based CAPTCHA Recognition

Index

1. Introduction
2. Types of CAPTCHA
3. Circumventing the CAPTCHA
4. Project Flow
5. Implementation Flow
6. Hardware requirements
7. Software requirements
8. References

Page 3: Text Based CAPTCHA Recognition

Introduction

The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) was designed to prevent bots or programs from automating work meant to be done by a human.

Like every other attempt at authentication, CAPTCHAs have been cracked continuously since their inception.

Page 4: Text Based CAPTCHA Recognition

Types of CAPTCHAs

• Visual CAPTCHA
  – Image based
  – Text based

Page 5: Text Based CAPTCHA Recognition

• Audio CAPTCHA

Page 6: Text Based CAPTCHA Recognition

Some examples of text based CAPTCHAs

Page 7: Text Based CAPTCHA Recognition

Circumventing the CAPTCHA

There are three primary methods of defeating CAPTCHAs:
– exploiting bugs in the implementation that allow the attacker to completely bypass the CAPTCHA
– improving character recognition software
– using cheap human labor to process the tests

Page 8: Text Based CAPTCHA Recognition

Why bother to break CAPTCHAs?

• It is a practical use of AI, and a harder problem than regular text recognition.

• It helps to understand the inherent flaws of CAPTCHA design.

"Any program that passes tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem" - von Ahn, Blum and Langford (Telling Humans and Computers Apart Automatically, CMU Tech Report)

Page 9: Text Based CAPTCHA Recognition

The Questions

• Can machine learning techniques be used to recognize text based CAPTCHAs at speeds comparable to a human?

• If so, how effective are these text based CAPTCHAs in authenticating humans?

Page 10: Text Based CAPTCHA Recognition
Page 11: Text Based CAPTCHA Recognition

CAPTCHAs are cracked about 22% of the time. Even at this success rate the CAPTCHA can be considered cracked, since the computer can attempt many hundreds of CAPTCHAs before a human can solve even one.
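
The arithmetic behind this claim can be sketched in a few lines of Python. The 22% figure is the one quoted on this slide. Treating each attempt as independent is an assumption made here for illustration, and the function names are hypothetical.

```python
def expected_attempts(p):
    """Mean number of independent attempts needed until the first success."""
    return 1.0 / p


def prob_at_least_one(p, n):
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


SUCCESS_RATE = 0.22  # per-CAPTCHA success rate quoted on the slide

# On average, how many CAPTCHAs must the program try before one is solved?
print(round(expected_attempts(SUCCESS_RATE), 2))

# How likely is at least one success within 10 automated attempts?
print(round(prob_at_least_one(SUCCESS_RATE, 10), 3))
```

Under these assumptions a bot needs only a handful of tries on average, which is why a modest per-image success rate is already enough to defeat the test.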

Page 12: Text Based CAPTCHA Recognition

Project Flow

Page 13: Text Based CAPTCHA Recognition

INPUT CAPTCHA IMAGE

PRE-PROCESSING

SEGMENTATION

FEATURE EXTRACTION

CHARACTER RECOGNITION

Page 14: Text Based CAPTCHA Recognition

Implementation Flow

Page 15: Text Based CAPTCHA Recognition

Step 1: Input CAPTCHA Image

• Using JCaptcha
• Using images available at www.captcha.net
• Captchaservice.org

Page 16: Text Based CAPTCHA Recognition

Step 2: Preprocessing

• Convert the colour image to a binary (black-and-white) image (binarization)
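
A minimal sketch of this step, assuming the image is already available as rows of (r, g, b) tuples. The slide does not say which binarization method is used, so the integer luminance weights and the fixed threshold of 128 are assumptions, and the function names are hypothetical. The code uses only the standard library.

```python
def to_grayscale(rgb_pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_pixels]


def binarize(gray_pixels, threshold=128):
    """Map each grey value to 1 (ink) if darker than threshold, else 0 (background)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray_pixels]


# A tiny 2x3 colour image: light pixels are background, dark pixels are ink.
image = [
    [(255, 255, 255), (20, 20, 60), (250, 240, 245)],
    [(200, 210, 220), (10, 10, 10), (90, 30, 30)],
]

binary = binarize(to_grayscale(image))
for row in binary:
    print(row)
```

A fixed threshold is the simplest choice. CAPTCHAs with coloured backgrounds or noise lines may need a threshold picked per image.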

Page 17: Text Based CAPTCHA Recognition

Step 3: Segmentation

• Whitespace Segmentation
• Overlapping Segmentation
• Pruning
• Snake Segmentation
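
Of the techniques listed, whitespace segmentation is the simplest to illustrate. The sketch below is one possible reading of it, not necessarily the project's implementation: it assumes a binarized image stored as rows of 0 (background) and 1 (ink), and it cuts wherever a column contains no ink. The function names are hypothetical.

```python
def whitespace_segments(binary):
    """Return (start, end) column ranges, end exclusive, separated by blank columns."""
    width = len(binary[0]) if binary else 0
    # A column counts as inked if any row has a 1 in it.
    inked = [any(row[x] for row in binary) for x in range(width)]

    segments = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                     # a character begins here
        elif not has_ink and start is not None:
            segments.append((start, x))   # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))   # character touching the right edge
    return segments


def crop_columns(binary, start, end):
    """Cut one character out of the image by column range."""
    return [row[start:end] for row in binary]


image = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]

segments = whitespace_segments(image)
print(segments)
characters = [crop_columns(image, s, e) for s, e in segments]
```

This only works when at least one blank column separates neighbouring characters. Touching or overlapping characters produce a single merged segment and need one of the other techniques in the list.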

Page 18: Text Based CAPTCHA Recognition
Page 19: Text Based CAPTCHA Recognition

Step 4: Feature Extraction and Character Recognition

• Every character has a unique set of features:
  – number of holes
  – height of the character
  – number of black-white transitions
  – nature of vertical stroke
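
Three of these features can be computed directly from a segmented binary character. The sketch below is an illustration under assumptions: the glyph is a non-empty list of rows of 0 (background) and 1 (ink), holes are counted with a 4-connected flood fill, and transitions are counted along rows. The slide does not specify any of these details, the function names are hypothetical, and the vertical-stroke feature is not covered.

```python
def char_height(glyph):
    """Rows spanned from the topmost to the bottommost inked row, inclusive."""
    inked_rows = [y for y, row in enumerate(glyph) if any(row)]
    return inked_rows[-1] - inked_rows[0] + 1 if inked_rows else 0


def transitions(glyph):
    """Count changes between ink and background along each row, summed over rows."""
    return sum(1 for row in glyph for a, b in zip(row, row[1:]) if a != b)


def count_holes(glyph):
    """Count background regions fully enclosed by ink (4-connected flood fill)."""
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so all outside background forms one region.
    grid = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in glyph] + [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def fill(y0, x0):
        stack = [(y0, x0)]
        seen[y0][x0] = True
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if (0 <= ny < h + 2 and 0 <= nx < w + 2
                        and grid[ny][nx] == 0 and not seen[ny][nx]):
                    seen[ny][nx] = True
                    stack.append((ny, nx))

    fill(0, 0)  # mark the outside background first
    holes = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if grid[y][x] == 0 and not seen[y][x]:
                holes += 1      # unreached background must be enclosed by ink
                fill(y, x)
    return holes


def extract_features(glyph):
    """Feature vector: (holes, height, black-white transitions)."""
    return (count_holes(glyph), char_height(glyph), transitions(glyph))


letter_o = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
]
digit_8 = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]

print(extract_features(letter_o))
print(extract_features(digit_8))
```

The resulting tuples are the kind of input a classifier would be trained on in the recognition step.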

Page 20: Text Based CAPTCHA Recognition

Learning Features

• Support Vector Machines (SVM)
  – accuracy drops after 3-4 characters
  – requires extensive training
  – faster

• Neural Networks
  – more accurate for larger texts
  – doesn't require extensive training
  – slower

Page 21: Text Based CAPTCHA Recognition

Challenges

• Overlapping characters

• Rotated characters

Page 22: Text Based CAPTCHA Recognition

Hardware and Software Requirements

Hardware Requirements:
• Intel® Core™ i3 Processor
• 2GB RAM
• 3.06 GHz clock

Software Requirements:
• Python 2.5
• Python Imaging Library (PIL)

Page 23: Text Based CAPTCHA Recognition

References

• K. Chellapilla and P. Simard, “Using Machine Learning to Break Visual Human Interaction Proofs (HIPs),” Advances in Neural Information Processing Systems 17, Neural Information Processing Systems (NIPS), MIT Press, 2004.

• Greg Mori and Jitendra Malik, “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR'03), Vol. 1, June 2003, pp. 134-141.

• H. Zhang, Wang Peng-po and Han Hong-wei, “A CAPTCHA recognition algorithm based on holistic verification,” 2011 International Conference on Instrumentation, Measurement, Computer, Communication and Control, IEEE, 2011. DOI 10.1109/IMCCC.2011.136

Page 24: Text Based CAPTCHA Recognition