Collective Intelligence
CAPTCHAs
Eran Hershko
1) Introduction to CAPTCHA.
2) reCAPTCHA (and Collective Intelligence).
3) How To Break Two CAPTCHAs:
EZ-GIMPY & GIMPY.
4) Summary & Future Work.
Outline
CAPTCHA
* E-mail websites: to stop spam.
* Blogs & forums: to stop automatic posting.
* Websites that sell tickets: to prevent scalpers from buying
a lot of tickets.
* …
Who Uses CAPTCHAs & Why?
A CAPTCHA is a test that can be automatically generated,
which most humans can pass,
but most computers can’t.
Completely
Automated
Public
Turing test
to tell
Computers
and
Humans
Apart
CAPTCHA
EZ-GIMPY Code
The CAPTCHA Paradox:
A CAPTCHA is a program that can generate and
grade tests that it itself can’t pass!
CAPTCHA
The CAPTCHA achieves two opposite goals:
1) If the CAPTCHA is not broken, there is a way to
differentiate humans from computers.
2) If the CAPTCHA is broken, a useful computer
vision problem is solved.
Part I
Part II
EZ-GIMPY
GIMPY
reCAPTCHA
ESP-PIX
SQUIGL-PIX
The Evolution Of CAPTCHA
reCAPTCHA
reCAPTCHA uses Collective Intelligence
in order to contribute to humanity!
Collective Intelligence is a shared or group
intelligence that emerges from the
collaboration of many individuals.
Who uses reCAPTCHA?
* reCAPTCHA is used by more than 40,000 websites!
* Google purchased reCAPTCHA in 2009.
How Does It Work?
Come, come with me, and we
will make short work;
For, by your leaves, you shall
not stay alone
Till holy Church incorporate two
in one.
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Ohurch incorporate two in
one.
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Chulch incorporate two in
one.
Optical Character Recognition (OCR) I
Optical Character Recognition (OCR) II
Dictionary
Ohurch
Chulch
Romeo & Juliet
“suspicious” word
control word
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Church incorporate two in
one.
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Chulch incorporate two in
one.
Church
Chulch
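The flow above (two OCR programs plus a dictionary) can be sketched as follows. This is a minimal illustration, not reCAPTCHA's actual code: the function name find_suspicious_words, the punctuation handling, and the assumption that both OCR outputs split into the same number of words are all hypothetical.

```python
def find_suspicious_words(ocr_a, ocr_b, dictionary):
    """Flag words on which the two OCR outputs disagree or that are not in the dictionary.

    Assumes (for illustration only) that both outputs contain the same number of words.
    """
    suspicious = []
    for index, (word_a, word_b) in enumerate(zip(ocr_a.split(), ocr_b.split())):
        key_a = word_a.strip(".,;:!?").lower()
        key_b = word_b.strip(".,;:!?").lower()
        if key_a != key_b or key_a not in dictionary:
            suspicious.append((index, word_a, word_b))
    return suspicious


ocr_a = "Till holy Ohurch incorporate two in one."
ocr_b = "Till holy Chulch incorporate two in one."
dictionary = {"till", "holy", "church", "incorporate", "two", "in", "one"}

suspicious = find_suspicious_words(ocr_a, ocr_b, dictionary)
print(suspicious)  # [(2, 'Ohurch', 'Chulch')]
```

Only the word both OCR readings got wrong is flagged; every other word is read identically and found in the dictionary.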
How Does It Work?
fiery Church
fiery Bhurch
How Does It Work?
Human answer = 1 point, OCR guess = 1/2 point
A suspicious word is considered correct
once one answer has collected at least 2.5 points
fiery Church Church chief Church overlooks
Inquiry Church
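The point system can be sketched as a small vote counter. This is an illustrative sketch only: the name resolve_word is hypothetical, and it assumes that an answer is accepted as soon as its total reaches 2.5 points.

```python
from collections import defaultdict

HUMAN_POINTS = 1.0   # each human answer
OCR_POINTS = 0.5     # each OCR guess
THRESHOLD = 2.5      # assumption: reaching 2.5 points is enough


def resolve_word(ocr_guesses, human_answers):
    """Return the first answer that reaches the threshold, or None if none does."""
    scores = defaultdict(float)
    for guess in ocr_guesses:
        scores[guess] += OCR_POINTS
    for answer in human_answers:
        scores[answer] += HUMAN_POINTS
        if scores[answer] >= THRESHOLD:
            return answer
    return None


# One OCR program already read "Church": two matching humans are enough (0.5 + 1 + 1).
accepted = resolve_word(["Church", "Chulch"], ["Church", "Church"])

# Both OCR programs were wrong: two humans only reach 2.0 points.
pending = resolve_word(["Ohurch", "Chulch"], ["Church", "Church"])
```

Under these assumptions a word that no OCR program read correctly needs a third matching human answer before it is accepted.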
How Does It Work?
fiery Church Church chief Church overlooks
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Ohurch incorporate two in
one.
Come, come with me, and we will
make short work;
For, by your leaves, you shall not
stay alone
Till holy Chulch incorporate two in
one.
OCR OCR
The Suspicious word becomes a Control word!
How Does It Work?
The word is Unreadable!
Reject
Reject Reject
Reject
Reject
Reject
Some Statistics
How many humans are required for a word to be considered
correct?
[Chart values, in slide order: 67.87%, 17.86%, 7.10%, 3.11%, 4.06%; the last category is labeled …*]
* Including words which are considered unreadable.
Why is reCAPTCHA more secure
than CAPTCHA?
1) reCAPTCHA uses only words that OCR already failed to decipher.
2) CAPTCHAs generate their own artificially distorted characters.
A smart learning algorithm can recognize them.
3) reCAPTCHA has two natural distortions and one artificial:
a. The fading of the text over time (natural).
b. The noise introduced by the scanning process (natural).
c. The added distortion (artificial).
Algorithms that recognized CAPTCHAs with more than 90% success
were completely unable to recognize reCAPTCHA!
Several Results
Word-level accuracy on a sample of 50 scanned articles:
reCAPTCHA = 99.1%
OCR = 83.5%
Several Results
* After one year of running:
More than 1.2 billion reCAPTCHAs were solved!
More than 440 million suspicious words were correctly
deciphered!
reCAPTCHA has successfully achieved its goal
in efficiently harnessing Collective Intelligence!
Breaking
EZ-GIMPY
CAPTCHA
EZ-GIMPY: How is it done?
1) Choosing a word out of a 561-word dictionary.
2) Distorting and blurring its characters.
3) Adding a cluttered and confusing background.
The Algorithm
This algorithm treats every letter individually.
[Scale: requires low computational power ... requires high computational power]
The algorithm’s steps:
Steps A & B: Finding individual letters in
the image and extracting candidate words.
Step C: Choosing the most likely word.
Step A
Producing a training set:
1) Extracting a letter from an EZ-GIMPY image.
2) Running Canny edge detection.
3) Sampling 100 points from the letter’s interior and exterior edges.
4) Extracting the 2600 (26 letters * 100 points) shape contexts.
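A shape context describes, for one sample point, where all the other sample points lie relative to it, as a log-polar histogram. The sketch below is a minimal NumPy version, assuming the edge points have already been sampled; the function name shape_contexts, the 5 radial by 12 angular bins, the radius range, and the mean-distance normalization are assumptions chosen for illustration, not values taken from the slides.

```python
import numpy as np


def shape_contexts(points, n_radial=5, n_angular=12):
    """Return one log-polar histogram (flattened) per sample point."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # diff[i, j] is the vector from point i to point j.
    diff = points[None, :, :] - points[:, None, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Normalize by the mean pairwise distance for scale invariance.
    off_diag = ~np.eye(n, dtype=bool)
    dist = dist / dist[off_diag].mean()

    # Log-spaced radial bin edges (assumed range 0.125 to 2.0).
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_radial + 1)
    r_bin = np.digitize(dist, r_edges) - 1
    a_bin = np.minimum((angle / (2 * np.pi) * n_angular).astype(int), n_angular - 1)

    # Count every neighbor that falls inside the radial range.
    hist = np.zeros((n, n_radial, n_angular))
    valid = off_diag & (r_bin >= 0) & (r_bin < n_radial)
    i_idx, j_idx = np.nonzero(valid)
    np.add.at(hist, (i_idx, r_bin[i_idx, j_idx], a_bin[i_idx, j_idx]), 1)
    return hist.reshape(n, -1)


# Toy "letter": 100 points sampled on a circle (roughly the outline of an "o").
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
descriptors = shape_contexts(circle)
print(descriptors.shape)  # (100, 60)
```

With 100 sampled points per letter, each letter contributes 100 such histograms to the training set.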
Step A
Finding letters in the image:
1) Randomly choosing several sample points from the image.
2) Generating a shape context for each point.
3) Finding the letters from the training set with the closest shape contexts.
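Finding the closest training letters amounts to a nearest-neighbor search over histograms. The sketch below assumes the chi-squared distance, a common choice for comparing shape context histograms; the slides do not name the distance actually used, and the names chi2_distance and closest_letters are hypothetical.

```python
import numpy as np


def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms (assumed metric)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def closest_letters(query, training_set, k=3):
    """training_set: list of (letter, histogram). Returns the k nearest (letter, distance)."""
    scored = sorted((chi2_distance(query, hist), letter) for letter, hist in training_set)
    return [(letter, dist) for dist, letter in scored[:k]]


# Toy 4-bin histograms standing in for real shape contexts.
training = [
    ("a", np.array([4.0, 0.0, 0.0, 0.0])),
    ("b", np.array([0.0, 4.0, 0.0, 0.0])),
    ("c", np.array([2.0, 2.0, 0.0, 0.0])),
]
query = np.array([3.0, 1.0, 0.0, 0.0])
nearest = closest_letters(query, training, k=2)
```

Each sample point in the image thus yields a short list of letter hypotheses rather than a single hard decision.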
Step B
Finding Sequences of letters that form candidate words:
For every letter, trying to construct a possible word.
There are several constraints: the letters must run from left to right,
must be neither too far from each other nor too close, and the
candidate words must be from the dictionary.
profit roll
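The constraints of Step B can be sketched as a search over letter detections that is pruned by dictionary prefixes. This is an illustrative sketch only: the function name candidate_words, the representation of a detection as a (letter, x position) pair, and the gap limits min_gap and max_gap are assumptions.

```python
def candidate_words(detections, dictionary, min_gap=5, max_gap=30):
    """detections: list of (letter, x_position) hypotheses found in the image."""
    prefixes = {word[:i] for word in dictionary for i in range(1, len(word) + 1)}
    detections = sorted(detections, key=lambda d: d[1])  # left to right
    found = set()

    def extend(prefix, last_x, start):
        if prefix in dictionary:
            found.add(prefix)
        for i in range(start, len(detections)):
            letter, x = detections[i]
            # Letters must be neither too close to nor too far from the previous one.
            if last_x is not None and not (min_gap <= x - last_x <= max_gap):
                continue
            # Only follow letter sequences that can still become a dictionary word.
            if prefix + letter in prefixes:
                extend(prefix + letter, x, i + 1)

    extend("", None, 0)
    return found


detections = [("p", 0), ("r", 12), ("o", 24), ("f", 36),
              ("i", 48), ("t", 60), ("l", 38), ("l", 50)]
words = candidate_words(detections, {"profit", "roll", "port"})
print(words)
```

Here "profit" and "roll" are both supported by the detections, while "port" is rejected because its letters do not appear in left-to-right order.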
Step C
Choosing the most likely word:
1) For each letter, building generalized shape contexts
(which allow for many possible deformations in the letters).
2) Giving a score to each letter according to the distance.
3) The answer to EZ-GIMPY is the word with the highest score.
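Step C can be sketched as picking the candidate whose letters match best. The scoring rule below (negative mean of the per-letter distances, so that a smaller distance gives a higher score) is an assumption made for illustration; the slides do not specify how letter scores are combined, and the distances shown are made-up toy values.

```python
def word_score(letter_distances):
    """Higher is better: the negative mean per-letter distance (assumed rule)."""
    return -sum(letter_distances) / len(letter_distances)


def best_word(candidates):
    """candidates: dict mapping each candidate word to its per-letter distances."""
    return max(candidates, key=lambda word: word_score(candidates[word]))


# Toy distances, not measured values.
candidates = {
    "profit": [0.2, 0.3, 0.1, 0.4, 0.2, 0.3],
    "roll": [0.5, 0.6, 0.4, 0.7],
}
answer = best_word(candidates)
```

Averaging keeps long and short words comparable; summing instead would favor short words.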
Results
* This algorithm has a success rate of 83%.
collar canvas jewel
smile spade soap
line here till
Breaking
GIMPY
CAPTCHA
GIMPY: How is it done?
1) Choosing words out of a 411-word dictionary.
2) Distorting and blurring the characters.
3) Placing the words randomly in the image in 5 pairs (one word on top of the other).
4) Adding a cluttered and confusing background.
* The user must recognize 3 words correctly.
The Algorithm
This algorithm treats every word as a whole and not as individual letters.
[Scale: requires low computational power ... requires high computational power]
The algorithm’s steps:
Steps A & B: Finding candidate words in
the image.
Step C: Choosing the most likely words.
Step A
Finding candidate words in the image:
1) Finding the suspicious places which contain pairs of words.
2) For every pair, conducting edge detection and finding the first two
letters and the last two letters, by using shape contexts.
3) Producing a list of the possible candidate words from the dictionary.
The result is a list of approximately 4 candidate words.
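Once the first two and last two letters have been hypothesized, the dictionary lookup in step 3 reduces to a simple filter. The sketch below is illustrative: the function name words_matching_ends and the toy dictionary are assumptions, and in the real algorithm the letter-pair hypotheses would come from shape context matching.

```python
def words_matching_ends(dictionary, start_pairs, end_pairs):
    """Keep words whose first two and last two letters match some hypothesis."""
    return sorted(
        word for word in dictionary
        if word[:2] in start_pairs and word[-2:] in end_pairs
    )


toy_dictionary = ["round", "sound", "row", "cow", "ground", "rapid"]
matches = words_matching_ends(toy_dictionary, {"ro", "so"}, {"nd", "ow"})
print(matches)  # ['round', 'row', 'sound']
```

Because only word ends are needed, the overlapped middle of each word does not have to be read letter by letter.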
Step B
Removing layers of words:
1) Removing the edges of the candidate word from the image and
repeating step A (trying to find candidate words).
2) Each pair of words in the image has approximately 16 pairs
of candidate words.
r o u n d
Step C
Giving final score:
1) For each pair, producing a synthetic image of the two words overlaid
with their estimated locations.
2) Computing the shape contexts of the synthetic image.
3) Every suspicious word in a pair of the original image gets a score
according to the distance of its shape contexts from the shape contexts
of the synthetic word.
4) The three words with the highest scores are chosen as the answer to
the GIMPY CAPTCHA.
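The final selection in step 4 can be sketched with the standard library. The scores below are made-up toy values where higher means a better match to the synthetic image; the function name top_three is hypothetical.

```python
import heapq


def top_three(scores):
    """scores: dict mapping each candidate word to its score (higher is better)."""
    return heapq.nlargest(3, scores, key=scores.get)


# Toy scores, not measured values.
scores = {"round": 0.9, "cow": 0.4, "row": 0.7, "church": 0.8, "tongue": 0.6}
answer_words = top_three(scores)
print(answer_words)  # ['round', 'church', 'row']
```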
r o u n d c o w
r o w r o u n d
Results
* This algorithm has a success rate of 33% in guessing the correct three
words of GIMPY.
true, with, sponge
narrow, bulb, right
carriage, potato, clock door, farm, important
church, tongue, bad sudden, oven, apple
* Applying this algorithm to EZ-GIMPY results in a success rate of
92% (the previous algorithm gave only 83%).
Summary
1) EZ-GIMPY is successfully broken (92% success).
There is still work to be done on GIMPY
as a Computer Vision challenge.
2) The reason for reCAPTCHA’s success:
Solving a reCAPTCHA is an action that people have to do anyway.
They feel better when it’s for an important cause.
3) The new CAPTCHAs will set new challenges in the Computer
Vision field.
Future Work
The “Evil” Side:
Breaking reCAPTCHA and the new image-based CAPTCHAs with
a reasonable rate of success.
The “Good” Side:
Finding new forms of image-based problems that humans can easily
solve but computers and computer vision algorithms can’t.
The constant battle between “Good” and “Evil”
Questions?
1) ‘reCAPTCHA: Human-Based Character Recognition via
Web Security Measures’- Luis von Ahn et al.
2) ‘Recognizing Objects in Adversarial Clutter: Breaking a
Visual CAPTCHA’- G Mori et al.
3) ‘Shape Matching and Object Recognition Using Shape
Context’- Serge Belongie et al.
4) ‘Telling Humans And Computers Apart Automatically’-
Luis von Ahn et al.
5) ‘Breaking reCAPTCHA: A Holistic Approach via Shape
Recognition’-Paul Baecher et al.
6) http://www.google.com/recaptcha
References
Related Work From 2011
New CAPTCHA
Which Uses Empathy