Attacks Against Captcha Systems - DefCamp 2012
Attacking CAPTCHAs explained
Ioan – Carol Plangu
What's a CAPTCHA
Completely
Automated
Public
Turing test to tell
Computers and
Humans
Apart
Three attack methods
Implementation attack
Automated recognition
Manual labor
The implementation attack
Scenario 1
the image session id can be reused
id
Captcha form
Restricted page
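Scenario 1 can be sketched as a toy server. This is only an illustration of the flaw, not code from the talk: the class names, restricted_page and replay are hypothetical, and the "human solve" is simulated by reading the stored answer.

```python
import secrets


class ReusableCaptchaServer:
    """Toy server with the flaw: a solved captcha id stays valid forever."""

    def __init__(self):
        self.solutions = {}  # captcha id -> expected answer

    def new_captcha(self):
        captcha_id = secrets.token_hex(8)
        self.solutions[captcha_id] = f"{secrets.randbelow(10**6):06d}"
        return captcha_id

    def restricted_page(self, captcha_id, answer):
        # Flaw: the entry is never deleted, so one solved pair can be replayed.
        return self.solutions.get(captcha_id) == answer


class OneShotCaptchaServer(ReusableCaptchaServer):
    def restricted_page(self, captcha_id, answer):
        # pop() invalidates the id on first use, whether the answer is right or wrong.
        expected = self.solutions.pop(captcha_id, None)
        return expected is not None and expected == answer


def replay(server, attempts=5):
    """Solve one captcha, then submit the same (id, answer) pair repeatedly."""
    captcha_id = server.new_captcha()
    answer = server.solutions[captcha_id]  # stands in for a single human solve
    return [server.restricted_page(captcha_id, answer) for _ in range(attempts)]
```

Against the flawed server every replayed request passes; against the one-shot variant only the first one does.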
The implementation attack
Scenario 2
the number of captcha tests is limited
We just need to solve them all once and store the answers in a hash table
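A minimal sketch of the hash table idea, assuming the server hands out byte-identical images from a fixed pool. The names image_key, build_table, answer and the toy POOL are made up for illustration; solve_by_hand stands in for whoever solves each image once.

```python
import hashlib


def image_key(image_bytes):
    """Identify an image by the hash of its raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_table(pool, solve_by_hand):
    """Solve every image in the pool once and remember the answer."""
    return {image_key(img): solve_by_hand(img) for img in pool}


def answer(table, image_bytes):
    """Look up a previously solved image; None if it was never seen."""
    return table.get(image_key(image_bytes))


# Toy pool of 50 "images" with known solutions (hypothetical data).
POOL = {b"image-bytes-%d" % i: f"{i:06d}" for i in range(50)}
table = build_table(POOL, POOL.__getitem__)
```

If the server re-encodes or perturbs each image, an exact byte hash no longer matches and this sketch would need a fuzzier key.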
The implementation attack
Scenario 3
hash of solution sent to client
rainbow tables :)
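For a small solution space a plain precomputed lookup table is enough, so the sketch below uses that instead of a true rainbow table (which trades time for memory). It assumes an unsalted SHA-256 of a 4-digit solution; the slides name neither the hash function nor the solution length, and build_lookup is a hypothetical helper.

```python
import hashlib
from itertools import product


def digest(text):
    """Unsalted hash of a candidate solution (algorithm is an assumption)."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def build_lookup(alphabet="0123456789", length=4):
    """Precompute hash -> solution for every possible solution."""
    table = {}
    for chars in product(alphabet, repeat=length):
        solution = "".join(chars)
        table[digest(solution)] = solution
    return table


lookup = build_lookup()
leaked = digest("4071")     # what the flawed form would send to the client
recovered = lookup[leaked]  # the attacker reads the solution straight back
```

A per-session secret salt, or keeping the solution on the server only, would defeat this lookup.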
Manual labor
There are two options:
Pay a bunch of monkeys
XXX: "Complete this captcha form to continue"
Or not...
Automated recognition
We're going to actually reproduce a human response for the given question
Can you understand my voice?
The sound sample is usually generated
It's hard to add noise to the generated speech without making it hard for the human as well
But can you read?
Sort of.....
The most common approach
Greedy optimization – reverse engineer everything
Character segmentation, then OCR
Possible security measures
Funky background image: usually can be removed with basic preprocessing
Text distortions: modern OCR techniques can beat them
Anti-segmentation measures
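As an illustration of "basic preprocessing", a fixed-threshold binarization is often enough when the text is clearly darker than the background. This is a sketch under that assumption; the function name, the threshold of 128 and the toy image are not from the talk.

```python
def binarize(gray, threshold=128):
    """Return 1 for ink (dark pixel) and 0 for background (light pixel)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Toy grayscale image: dark strokes (20-40) on a light, patterned background (170-230).
gray = [
    [200, 30, 210, 180, 25, 220],
    [190, 35, 175, 230, 40, 185],
    [225, 20, 30, 25, 30, 170],
    [180, 25, 215, 190, 35, 205],
]
clean = binarize(gray)
```

Backgrounds that share the text's brightness need more than this, for example colour filtering or removing thin lines.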
Beating segmentation
If characters can be told apart using only the vertical signature (the column-wise pixel profile), character segmentation becomes trivial
A Low-cost Attack on a Microsoft CAPTCHA - Jeff Yan, Ahmad Salah El Ahmad, School of Computing Science, Newcastle University, UK
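One reading of the vertical signature is the count of ink pixels per column; cutting wherever that count drops to zero then separates the characters. The sketch below assumes that reading and a binarized image (1 = ink); it is not the algorithm from the cited paper, and the function names are hypothetical.

```python
def vertical_signature(binary):
    """Count ink pixels in every column."""
    return [sum(column) for column in zip(*binary)]


def segment_columns(binary, min_ink=1):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    spans, start = [], None
    for x, ink in enumerate(vertical_signature(binary)):
        if ink >= min_ink and start is None:
            start = x                 # a character begins
        elif ink < min_ink and start is not None:
            spans.append((start, x))  # a character ends at an empty column
            start = None
    if start is not None:
        spans.append((start, len(binary[0])))
    return spans


image = [
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]
```

Touching or overlapping characters leave no empty column, which is exactly what anti-segmentation measures exploit.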
Beating segmentation
We can otherwise ignore it!
The following slides describe an experiment with this approach
A Monte-Carlo experiment
Note: for testing performance, the variance of the characters has been kept to a minimum
f(x) → y
x: a binary image of 3000 pixels (a number between 0 and 2^3000)
y: a 6-digit string (one of 10^6 values)
Training:
Select one character image at random
Select N black spots
Sort the points for uniqueness
Subtract the first point from all others for position independence
Assign it a 'weight' for each character using the following formula:
matched characters count / sample size
Assign it a 'score' (indicates classification quality):
selected digit weight / (1 + other digit weights)
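The training steps above can be sketched as follows. This is a toy reading of the slide, not the demo source: images are sets of (x, y) black pixels, a feature "matches" an image if it fits at some anchor pixel, and glyph, make_feature, matches, weights, score and SAMPLES are hypothetical names.

```python
import random


def glyph(rows):
    """Turn rows of '#' and '.' into a set of black pixel coordinates."""
    return frozenset((x, y) for y, row in enumerate(rows)
                     for x, c in enumerate(row) if c == "#")


def make_feature(image, n, rng):
    """Pick n black pixels, sort them, and express them relative to the first."""
    points = sorted(rng.sample(sorted(image), n))
    x0, y0 = points[0]
    return tuple((x - x0, y - y0) for x, y in points)


def matches(feature, image):
    """True if the feature fits somewhere in the image (at any anchor pixel)."""
    return any(all((ax + dx, ay + dy) in image for dx, dy in feature)
               for ax, ay in image)


def weights(feature, samples):
    """Per character: matched characters count / sample size."""
    return {ch: sum(matches(feature, img) for img in imgs) / len(imgs)
            for ch, imgs in samples.items()}


def score(feature_weights, selected):
    """selected digit weight / (1 + other digit weights)."""
    others = sum(w for ch, w in feature_weights.items() if ch != selected)
    return feature_weights[selected] / (1 + others)


SAMPLES = {
    "1": [glyph([".#.", ".#.", ".#."])],
    "7": [glyph(["###", "..#", "..#"])],
}
rng = random.Random(0)
feature = make_feature(SAMPLES["7"][0], 3, rng)
w = weights(feature, SAMPLES)
```

In this toy set a horizontal three-pixel feature only fits "7" and scores 1.0, while a vertical one fits both digits and scores 0.5, so the score rewards features that discriminate.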
Recognition:
Make a score map for all points
Select the most appropriate character for each column
Process the resulting string into a 6 digit string
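A sketch of the recognition steps, under stated assumptions: each trained feature is a tuple (offsets, character, score), the score map accumulates scores at the column of the anchor pixel, and the final string is obtained by merging runs and dropping blank columns. The post-processing rule and all names here are guesses for illustration; the toy image holds two digits rather than six.

```python
from collections import defaultdict
from itertools import groupby


def score_map(image, features):
    """Map column -> character -> summed score of features anchored there."""
    columns = defaultdict(lambda: defaultdict(float))
    for ax, ay in image:
        for offsets, char, score in features:
            if all((ax + dx, ay + dy) in image for dx, dy in offsets):
                columns[ax][char] += score
    return columns


def read_columns(image, features, width):
    """Best character per column, '_' where no feature matched."""
    columns = score_map(image, features)
    return "".join(
        max(columns[x], key=columns[x].get) if x in columns else "_"
        for x in range(width)
    )


def collapse(column_string):
    """Merge runs of the same character and drop the blanks."""
    return "".join(ch for ch, _ in groupby(column_string) if ch != "_")


FEATURES = [
    (((0, 0), (1, 0), (2, 0)), "7", 1.0),  # horizontal bar
    (((0, 0), (0, 1), (0, 2)), "1", 1.0),  # vertical bar
]
ROWS = ["###..#..",
        "..#..#..",
        ".#...#.."]
IMAGE = {(x, y) for y, row in enumerate(ROWS) for x, c in enumerate(row) if c == "#"}
per_column = read_columns(IMAGE, FEATURES, width=8)
```

Note that no segmentation step appears anywhere: the features vote wherever they fit, which is the point of ignoring segmentation.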
An equivalent model
input layer
linear hidden layer(feature layer)
threshold layers
softmax layer
OCR without zero penalty == no biases for the first layer
(avoids the 2*binary - 1 effect)
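One way to read the layer stack is sketched below: the bias-free linear layer counts how many of a feature's pixels are black (inputs stay in {0, 1}, so white pixels contribute nothing), the threshold fires when all of them are, and a softmax turns score-weighted votes into class probabilities. The masks, class_scores and function names are hypothetical, and the sketch ignores position shifting.

```python
import math


def linear(pixels, feature_masks):
    """Bias-free linear layer: count the black pixels under each feature mask."""
    return [sum(p * m for p, m in zip(pixels, mask)) for mask in feature_masks]


def threshold(activations, feature_masks):
    """A feature fires only when every pixel in its mask is black."""
    return [1.0 if a >= sum(mask) else 0.0
            for a, mask in zip(activations, feature_masks)]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pixels, feature_masks, class_scores):
    """class_scores[c][f] is the score of feature f for class c."""
    fired = threshold(linear(pixels, feature_masks), feature_masks)
    logits = [sum(f * s for f, s in zip(fired, row)) for row in class_scores]
    return softmax(logits)


masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
class_scores = [[1.0, 0.0], [0.0, 1.0]]
probs = classify([1, 1, 0, 1], masks, class_scores)
```

Here only the first feature fires, so the first class gets the higher probability; the extra black pixel under the second mask carries no penalty.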
Hacking the OCR:
To negate the effect of the biases, for each image we add random noise in the white areas
This greatly improves recognition in noisy images
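Adding noise only to the white areas can be sketched like this, assuming a binarized image where 1 is ink and 0 is white. The function name, the density parameter and its default are assumptions; the slides do not say how much noise was added.

```python
import random


def add_white_noise(binary, density=0.1, rng=None):
    """Turn a random fraction of white (0) pixels black (1); ink is left untouched."""
    rng = rng or random.Random()
    return [[1 if px == 1 or rng.random() < density else 0 for px in row]
            for row in binary]


clean = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
noisy = add_white_noise(clean, density=0.5, rng=random.Random(1))
```

Because ink pixels never change, every feature that matched the clean image still matches the noisy one.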
A more powerful model
input layer
Hacked OCR layer
Score map
output layer
Questions?
The demo source is hosted at https://github.com/theshark08/howtobreakacaptcha01