Attacks Against Captcha Systems - DefCamp 2012

Attacking CAPTCHAs explained

Ioan – Carol Plangu

What's a CAPTCHA

Completely Automated Public Turing test to tell Computers and Humans Apart

Three attack methods

Implementation attack

Automated recognition

Manual labor

The implementation attack

Scenario 1

the image session id can be reused


(Diagram labels: id, captcha form, restricted page)
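A minimal sketch of Scenario 1, assuming a toy in-memory server. The class `VulnerableCaptchaServer` and its methods are hypothetical names made up for illustration; they do not come from the talk or from any real framework. The bug is that the id is never invalidated after a successful check, so one solved image unlocks the restricted page any number of times.

```python
import secrets


class VulnerableCaptchaServer:
    """Toy server (hypothetical): a solved captcha id stays valid forever."""

    def __init__(self):
        self.solutions = {}  # captcha id -> expected answer

    def new_captcha(self):
        captcha_id = secrets.token_hex(8)
        answer = "".join(secrets.choice("0123456789") for _ in range(6))
        self.solutions[captcha_id] = answer
        return captcha_id

    def restricted_page(self, captcha_id, answer):
        # Bug: the id is not removed after a successful check,
        # so the same (id, answer) pair can be replayed.
        return self.solutions.get(captcha_id) == answer


server = VulnerableCaptchaServer()
captcha_id = server.new_captcha()
answer = server.solutions[captcha_id]  # stands in for solving the image once

# Replay the same id and answer many times.
hits = sum(server.restricted_page(captcha_id, answer) for _ in range(1000))
print(hits)  # 1000
```

A fix would be to delete the id from `solutions` on the first check, whether it succeeds or not.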


Scenario 2

the number of captcha tests is limited


We just need to solve each test once and store the answers in a hash table
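A minimal sketch of the hash table idea, assuming the server always sends byte-identical image files for the same test. The helper names `remember` and `lookup` are hypothetical. If the images are re-encoded on every request, the key would have to be something more robust than a plain file hash.

```python
import hashlib

solved = {}  # image fingerprint -> answer typed in once by a human


def fingerprint(image_bytes):
    """Key for the table: a hash of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def remember(image_bytes, answer):
    solved[fingerprint(image_bytes)] = answer


def lookup(image_bytes):
    """Return the stored answer, or None if this test was never seen."""
    return solved.get(fingerprint(image_bytes))


remember(b"fake image bytes 1", "483920")
print(lookup(b"fake image bytes 1"))
print(lookup(b"never seen before"))
```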


Scenario 3

hash of solution sent to client


rainbow tables :)
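A minimal sketch of Scenario 3, assuming the page leaks an unsalted MD5 of a short numeric solution (the hash function, the alphabet and the length are all assumptions). For such a small solution space a plain precomputed lookup table is enough; this is a simpler stand-in for a real rainbow table, which trades lookup time for memory on larger spaces.

```python
import hashlib
from itertools import product


def build_table(length=4, alphabet="0123456789"):
    """Map md5(solution) -> solution for every possible solution."""
    table = {}
    for chars in product(alphabet, repeat=length):
        solution = "".join(chars)
        table[hashlib.md5(solution.encode()).hexdigest()] = solution
    return table


table = build_table()

# Stands in for the hash found in the form sent to the client.
leaked_hash = hashlib.md5(b"0420").hexdigest()
print(table[leaked_hash])  # 0420
```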

Manual labor

There are two options:

Pay a bunch of monkeys

XXX: "Complete this captcha form to continue"

Or not...

Automated recognition

We're going to actually reproduce a human response for the given question

Can you understand my voice?

The sound sample is usually generated

It's hard to add noise to the generated speech without also making it hard for a human to understand

But can you read?

Sort of.....

The most common approach

Greedy optimization – reverse engineer everything

Character segmentation, then OCR

Possible security measures


Funky background image


It can usually be removed with basic preprocessing

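A minimal sketch of what "basic preprocessing" could look like, assuming a grayscale image given as rows of 0-255 values with dark text on a lighter background. The threshold value and the speck-removal rule are illustrative choices, not the ones used in the demo.

```python
def binarize(gray, threshold=128):
    """Turn grayscale rows into 1 (ink) / 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(image):
    """Drop ink pixels that have no ink neighbour up, down, left or right."""
    h, w = len(image), len(image[0])

    def ink(r, c):
        return 0 <= r < h and 0 <= c < w and image[r][c] == 1

    return [
        [1 if image[r][c] and (ink(r - 1, c) or ink(r + 1, c)
                               or ink(r, c - 1) or ink(r, c + 1)) else 0
         for c in range(w)]
        for r in range(h)
    ]


gray = [
    [200, 30, 210],
    [190, 25, 220],
    [40, 230, 200],  # the dark pixel on the left is an isolated speck
]
clean = despeckle(binarize(gray))
print(clean)  # [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
```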

Text distortions



Modern OCR techniques can beat them


Anti-segmentation measures

Beating segmentation


If a character can be identified from its vertical signature alone, character segmentation becomes trivial

A Low-cost Attack on a Microsoft CAPTCHA - Jeff Yan, Ahmad Salah El Ahmad, School of Computing Science, Newcastle University, UK
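A minimal sketch of segmentation from a vertical signature, reading "vertical signature" as the count of ink pixels in each column (that reading is an assumption). Characters are cut wherever the count drops to zero, so it only works when the characters do not touch.

```python
def vertical_projection(image):
    """Number of ink pixels (1s) in each column."""
    return [sum(column) for column in zip(*image)]


def segment(image):
    """Return (start, end) column spans, end exclusive, one per character."""
    spans, start = [], None
    for x, count in enumerate(vertical_projection(image)):
        if count and start is None:
            start = x                    # a character begins
        elif not count and start is not None:
            spans.append((start, x))     # an empty column ends it
            start = None
    if start is not None:                # character touching the right edge
        spans.append((start, len(image[0])))
    return spans


image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
print(vertical_projection(image))  # [2, 1, 0, 1, 2]
print(segment(image))              # [(0, 2), (3, 5)]
```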


We can otherwise ignore it!


The following slides describe an experiment with this approach

A Monte-Carlo experiment

Note: for testing performance, the variance of the characters has been kept to a minimum

f(x) → y

x: a binary image, one of 2^3000 possible values

y: one of 10^6 possible values (a 6 digit string)

Training:

Select one character image at random

Select N black spots

Sort the points for uniqueness

Subtract the first point from all others for position independence

Assign it a 'weight' for each character using the following formula:

matched characters count / sample size

Assign it a 'score' (indicates classification quality):

selected digit weight / (1 + other digit weights)
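A minimal sketch of one training step as read from the slide. Several details are assumptions: images are rows of 0/1 with 1 meaning black, a feature "matches" an image if all of its points land on black pixels at some position, and "other digit weights" is taken as the sum of the weights of the other characters. All function names are hypothetical.

```python
import random


def sample_feature(image, n, rng):
    """Pick n black points, sort them, make them relative to the first."""
    black = [(r, c) for r, row in enumerate(image)
             for c, v in enumerate(row) if v]
    points = sorted(rng.sample(black, n))  # sorted: equal sets compare equal
    r0, c0 = points[0]
    return tuple((r - r0, c - c0) for r, c in points)


def matches(image, feature):
    """True if the feature fits somewhere in the image."""
    h, w = len(image), len(image[0])
    return any(
        all(0 <= r0 + dr < h and 0 <= c0 + dc < w and image[r0 + dr][c0 + dc]
            for dr, dc in feature)
        for r0 in range(h) for c0 in range(w)
    )


def weigh(feature, samples):
    """Per character: matched characters count / sample size."""
    return {label: sum(matches(img, feature) for img in imgs) / len(imgs)
            for label, imgs in samples.items()}


def score(weights, label):
    """selected digit weight / (1 + other digit weights)."""
    others = sum(w for other, w in weights.items() if other != label)
    return weights[label] / (1 + others)


one = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]   # toy glyph: vertical bar
dash = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]  # toy glyph: horizontal bar
feature = sample_feature(one, 2, random.Random(0))
weights = weigh(feature, {"1": [one], "-": [dash]})
print(weights, score(weights, "1"))
```

A feature drawn from the vertical bar can never fit the horizontal bar, so it gets weight 1.0 for "1", weight 0.0 for "-", and the best possible score of 1.0.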

Recognition:

Make a score map for all points

Select the most appropriate character for each column

Process the resulting string into a 6 digit string
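A minimal sketch of the recognition steps, under stated assumptions: each trained feature is a hypothetical `(offsets, label, score)` triple, a feature that fits at a point adds its score to that point's column, and the per-column string is reduced by merging runs of the same character and dropping blanks. The real demo may post-process differently, and this sketch does not enforce the 6 digit length.

```python
from itertools import groupby


def score_map(image, features):
    """For every column, total score per label of the features that fit."""
    h, w = len(image), len(image[0])
    columns = [{} for _ in range(w)]
    for r0 in range(h):
        for c0 in range(w):
            for offsets, label, score in features:
                if all(0 <= r0 + dr < h and 0 <= c0 + dc < w
                       and image[r0 + dr][c0 + dc] for dr, dc in offsets):
                    columns[c0][label] = columns[c0].get(label, 0.0) + score
    return columns


def best_per_column(columns, blank="-"):
    """Most appropriate character for each column, blank if none scored."""
    return "".join(max(col, key=col.get) if col else blank for col in columns)


def collapse(column_string, blank="-"):
    """Merge runs of the same character and drop the blanks."""
    return "".join(ch for ch, _ in groupby(column_string) if ch != blank)


features = [
    (((0, 0), (1, 0), (2, 0)), "1", 1.0),  # vertical bar stands for '1'
    (((0, 0), (0, 1), (0, 2)), "7", 1.0),  # horizontal bar stands for '7'
]
image = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 0],
]
per_column = best_per_column(score_map(image, features))
print(per_column)            # -1--7--
print(collapse(per_column))  # 17
```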

An equivalent model

input layer

linear hidden layer (feature layer)

threshold layers

softmax layer
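A minimal sketch of how the layers listed above could fit together. This is an interpretation, not the demo code: each point-set feature becomes one row of 0/1 weights with no bias, the threshold fires only when every selected pixel is black, and the class weights feed a softmax. All values below are toy numbers.

```python
import math


def linear(x, weights):
    """Matrix-vector product, no bias term."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def threshold(h, limits):
    """1.0 where the unit reaches its limit, else 0.0."""
    return [1.0 if v >= t else 0.0 for v, t in zip(h, limits)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(x, feature_weights, limits, class_weights):
    active = threshold(linear(x, feature_weights), limits)
    return softmax(linear(active, class_weights))


x = [1, 1, 0, 0]                       # flattened binary image
feature_weights = [[1, 1, 0, 0],       # feature 0 looks at pixels 0 and 1
                   [0, 0, 1, 1]]       # feature 1 looks at pixels 2 and 3
limits = [2, 2]                        # all selected pixels must be black
class_weights = [[2.0, 0.0],           # class 0 listens to feature 0
                 [0.0, 2.0]]           # class 1 listens to feature 1
probs = forward(x, feature_weights, limits, class_weights)
print(probs)
```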

OCR

Without zero penalty == no biases for the first layer (avoids the 2*binary - 1 effect)

Hacking the OCR:

To negate the effect of the biases, for each image we add random noise in the white areas

This greatly improves recognition on noisy images
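A minimal sketch of the noise trick, assuming images are rows of 0/1 with 1 meaning black. The probability `p` and the function name are illustrative; the demo may use a different noise model.

```python
import random


def add_white_noise(image, p, rng):
    """Turn each white pixel black with probability p; black pixels stay."""
    return [[1 if px or rng.random() < p else 0 for px in row]
            for row in image]


image = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
noisy = add_white_noise(image, 0.3, random.Random(42))
for row in noisy:
    print(row)
```

The character strokes are never erased, only the background changes, so the training labels stay valid.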

A more powerful model

input layer

Hacked OCR layer

Score map

output layer

Questions?

The demo source is hosted at https://github.com/theshark08/howtobreakacaptcha01