Captchas
Uploaded by shashwat-shriparv
Transcript of Captchas
CAPTCHA
• CAPTCHA is an acronym for “Completely
Automated Public Turing test to tell
Computers and Humans Apart”.
• A CAPTCHA is a challenge-response test used
in computing to determine whether the user is human.
• The term was coined in 2000 by Luis von Ahn, Manuel
Blum, Nicholas Hopper and John Langford of
Carnegie Mellon University, who developed the
first tests to bear the name.
CAPTCHA
• A common type of CAPTCHA requires the user to type the letters or digits shown in a distorted image, sometimes with an obscuring pattern added on top of the sequence.
• The user has to type this string to submit a form. This is a simple problem for humans but a very hard one for computers, which have to rely on character recognition: the displayed string is distorted in a way that makes it very hard for a program to decode.
• Early CAPTCHAs, such as the distorted images generated by the EZ-Gimpy program, were used on Yahoo!.
CAPTCHA
A program that can generate and grade
tests that:
1. Most humans can pass
2. Current computer programs cannot pass
Contd…
• The concept of a CAPTCHA is motivated by real-world problems faced by internet companies such as Yahoo! and AltaVista.
• These companies offer free email accounts, intended for use by humans.
• However, they found that many online vendors were using "bots", computer programs that would sign up for thousands of email accounts, from which they could send out masses of junk email.
Origin
• The first discussion of automated tests which
distinguish humans from computers for the
purpose of controlling access to web services
appears in a 1996 manuscript of Moni Naor from
the Weizmann Institute of Science, entitled
"Verification of a human in the loop, or
Identification via the Turing Test".
• Primitive CAPTCHAs seem to have been
developed later, in 1997, at AltaVista by Andrei
Broder and his colleagues to prevent bots from
adding URLs to their search engine.
Contd…
• In order to make the images resistant to
OCR (Optical Character Recognition), the
team simulated situations that scanner
manuals claimed resulted in bad OCR.
• In 2000, von Ahn and Blum developed
and publicized the notion of a CAPTCHA,
which included any program that can
distinguish humans from computers.
Characteristics
• A CAPTCHA system is an automated means of generating new challenges which current computers are unable to solve accurately, but most humans can solve.
• CAPTCHAs are by definition fully automated, requiring little human maintenance or intervention in administering the test.
• This has obvious benefits in cost and reliability. By definition, the algorithm used to create the CAPTCHA must be made public, though it may be covered by a patent.
Accessibility
• Because CAPTCHAs rely on perception, users unable to perceive a CAPTCHA due to a disability (such as blindness) will be unable to perform the task protected by a CAPTCHA. In certain cases, failing to provide a universally accessible means of bypassing the CAPTCHA could make site owners a target of litigation.
• In order to combat this problem, many implementations of CAPTCHAs permit users to opt for an audio CAPTCHA in addition to a text-based one.
Contd…
• The combination of an audio and a
visual CAPTCHA cannot satisfy all users
(for example, those with deafblindness).
The choice of adding a CAPTCHA to an
application is a balance between ease of
use for legitimate users and creating
enough of a challenge for abusers that
abusing the application is not worthwhile.
Contd…
• The inconvenience caused by a
CAPTCHA is sometimes higher for users
with disabilities. For some applications, the
potential for abuse is so high that the
application author feels that a CAPTCHA
is necessary. For other applications, the
need for accessibility outweighs the abuse
that a CAPTCHA would prevent.
EZ-Gimpy
• EZ-Gimpy and Gimpy, the CAPTCHAs that we have broken, are examples of word-based CAPTCHAs.
• In EZ-Gimpy, the CAPTCHA used by Yahoo!, the user is presented with an image of a single word.
• This image has been distorted, and a cluttered, textured background has been added.
• The distortion and clutter are sufficient to confuse current OCR (optical character recognition) software.
Contd…
• However, using our computer vision techniques
we are able to correctly identify the word 92% of
the time.
• Gimpy is a more difficult variant of a word-based
CAPTCHA. Ten words are presented in
distortion and clutter similar to EZ-Gimpy.
• The words are also overlapped, providing a
CAPTCHA test that can be challenging for
humans in some cases.
Contd…
• Another visual test displays two series of
blocks, the Left and the Right.
• Blocks in the Left series differ from those
in the Right, and the user must find the
characteristic that sets them apart.
• Then the user is presented with a single
block and is asked to determine whether
this block belongs to the Right series or to
the Left.
Sound-Based CAPTCHAs: Eco
• A sound-based CAPTCHA picks a word or a sequence of numbers at random, renders the word or the numbers into a sound clip and distorts the sound clip.
• It then presents the distorted sound clip to the user and asks them to enter the contents of the sound clip.
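The pick, render, distort and present flow described above can be sketched in a few lines. This is only an illustration under loud assumptions: the function name `make_audio_challenge`, the sample rate, and the use of one plain tone per digit are all hypothetical stand-ins. A real sound-based CAPTCHA would render recorded or synthesized speech, which a tone does not replace.

```python
import io
import math
import random
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (assumed value)


def make_audio_challenge(n_digits=3, noise=0.3, rng=None):
    """Return (answer, wav_bytes) for a random digit sequence."""
    rng = rng or random.SystemRandom()
    # 1. Pick a sequence of numbers at random.
    answer = "".join(str(rng.randrange(10)) for _ in range(n_digits))
    samples = []
    for digit in answer:
        # 2. Render: a placeholder tone stands in for the spoken digit.
        freq = 300 + 60 * int(digit)
        for i in range(SAMPLE_RATE // 4):  # 0.25 s per digit
            clean = math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            # 3. Distort: mix the clean signal with random noise.
            distorted = (1 - noise) * clean + noise * rng.uniform(-1, 1)
            samples.append(int(32767 * distorted))
    # 4. Package the clip as 16-bit mono WAV data to present to the user.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return answer, buffer.getvalue()
```

The server would keep `answer` and send only the WAV bytes to the user, then compare what the user types against the stored answer.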
Character Recognition
• A number of research projects have attempted (often with success) to beat visual CAPTCHAs by creating programs that contain the following functionality:
1. Extraction of the image from the web page.
2. Removal of background clutter, for example with color filters and detection of thin lines.
3. Segmentation, i.e. splitting the image into segments containing a single letter.
4. Identifying the letter for each segment.
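Step 2 in the list above can be illustrated with a toy filter. The sketch below is not the method used by any of the research projects mentioned. It assumes the image is a grid of grayscale values (0 is black, 255 is white), and both the function name `remove_clutter` and the threshold are hypothetical. A brightness threshold plays the role of the color filter, and a pixel is treated as part of a thin line when it has no dark neighbour either horizontally or vertically.

```python
def remove_clutter(gray, threshold=128):
    """Return a grid of booleans: True where ink survives the clean-up."""
    height, width = len(gray), len(gray[0])
    # Color filter: keep only pixels darker than the threshold.
    dark = [[gray[y][x] < threshold for x in range(width)]
            for y in range(height)]

    def is_dark(y, x):
        return 0 <= y < height and 0 <= x < width and dark[y][x]

    cleaned = []
    for y in range(height):
        row = []
        for x in range(width):
            # Thin-line detection: a stroke that is only one pixel thick
            # lacks a dark neighbour in one of the two directions.
            horizontal = is_dark(y, x - 1) or is_dark(y, x + 1)
            vertical = is_dark(y - 1, x) or is_dark(y + 1, x)
            row.append(dark[y][x] and horizontal and vertical)
        cleaned.append(row)
    return cleaned
```

On a grid containing a light-gray texture, a one-pixel-thick black line and a thicker black glyph stroke, only the thick stroke survives. Clutter drawn as thick as the letters passes straight through, which is why step 3 is the hard part.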
Contd…
• Steps 1, 2, and 4 are easy tasks for computers.
The only part where humans still outperform
computers is segmentation.
• If the background clutter consists of shapes
similar to letter shapes, and the letters are
connected by this clutter, the segmentation
becomes nearly impossible with current
software. Hence, an effective CAPTCHA should
focus on the segmentation.
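A naive baseline shows why connecting clutter is so effective. The sketch below assumes a cleaned-up image given as rows of booleans (True means ink) and splits it wherever a column is empty. The names `segment_columns` and `parse` are hypothetical, and real attacks use far more elaborate segmentation than this.

```python
def segment_columns(binary):
    """Return (start, end) column spans, one per run of inked columns.

    `end` is exclusive. Letters are assumed to be separated by at
    least one completely empty column.
    """
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a new segment begins
        elif not ink and start is not None:
            spans.append((start, x))   # an empty column ends the segment
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def parse(rows):
    """Helper for the demo: '#' is ink, anything else is background."""
    return [[char == "#" for char in row] for row in rows]


separate = parse(["##..##", "#...#.", "##..##"])
connected = parse(["##..##", "######", "##..##"])
print(segment_columns(separate))   # [(0, 2), (4, 6)]
print(segment_columns(connected))  # [(0, 6)]
```

With a clear gap the two shapes are found separately. A single line of clutter joining them collapses the result into one segment, so the recognizer in step 4 never receives a single letter.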
Image-recognition CAPTCHAs
• Some researchers promote image-recognition CAPTCHAs as a possible alternative to text-based CAPTCHAs. To date, no major website has made use of an image-based CAPTCHA, so the technology is best described as being at the stage of theoretical research. Image-recognition CAPTCHAs face many potential problems which have not been fully studied.
• It is difficult for a small site to acquire a large dictionary of images which an attacker does not have access to. Without a means of automatically acquiring new labelled images, an image-based challenge does not meet the definition of a CAPTCHA.
Principles
The principles behind CAPTCHA are as follows:
• The user is presented with a garbled image on which some text is displayed. This image is generated by the server using random text.
• The user must type the letters shown in the image into a text field on the form being protected.
• When the form is submitted, the server checks if the text entered by the user matches the initial generated text. If it does, the transaction continues. Otherwise, an error message is displayed and the user has to enter a new code.
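The server side of these three steps might look roughly like the sketch below. It leaves out the drawing of the garbled image and covers only generating the random text and checking the submitted answer. The names `new_challenge` and `check_answer`, the alphabet, and the in-memory dictionary are assumptions made for illustration. A real site would keep the expected text in its own session storage.

```python
import secrets

# Assumed alphabet: look-alike characters such as 0/O and 1/I are left out.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

# Pending challenges, keyed by an id that travels with the form.
# (In-memory only; a real server would use session storage with expiry.)
_pending = {}


def new_challenge(length=6):
    """Generate random text and return (challenge_id, text).

    The text is what the server would draw into the garbled image.
    """
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    _pending[challenge_id] = text
    return challenge_id, text


def check_answer(challenge_id, answer):
    """Return True if the answer matches the generated text.

    Each challenge is removed on first use, so after a wrong answer
    the user has to be given a new code.
    """
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    return secrets.compare_digest(expected.encode(),
                                  answer.strip().upper().encode())
```

Removing the challenge on every check is a deliberate choice: it matches the rule that a failed attempt requires a new code, and it stops a bot from guessing repeatedly against one image.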
CAPTCHA would look like…
• The captcha would look like this:
• On the main registration form a regular CAPTCHA is presented just like before. Users who can see the image may use this test. A link informs users that there is an alternative test.
• Clicking the link leads to the audio-based test form. This form provides access to an audio file and three input fields. The audio file contains three numbers that the user has to enter into the fields.
Applications
• Online polls
• Protecting Website Registration
• Preventing Comment Spam in Blogs
• Search Engine Bots
• Worms and Spam
• Preventing Dictionary Attacks
Applications
• Online polls
In November 1999, http://slashdot.com released an online poll asking which was the best graduate school in computer science. As is the case with most online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots by using programs that voted for CMU thousands of times.
Contd…
CMU's score started growing rapidly. The
next day, students at MIT wrote their own
voting program and the poll became a
contest between voting "bots". MIT
finished with 21,156 votes, Carnegie
Mellon with 21,032 and every other school
with fewer than 1,000.
Applications
• Protecting Website Registration
Several companies offer free email services. Until a few years ago, most of these services suffered from a specific type of attack: "bots" that would sign up for thousands of email accounts every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans obtain free accounts.
Applications
• Preventing Comment spam in Blogs
Most bloggers are familiar with programs that
submit bogus comments, usually for the purpose
of raising the search engine rank of some
website. This is called comment spam. By using a
CAPTCHA, only humans can enter comments on
a blog. There is no need to make users sign up
before they enter a comment, and no legitimate
comments are ever lost!
Applications
• Search Engine Bots
It is sometimes desirable to keep
webpages unindexed to prevent others
from finding them easily. There is an HTML
tag that asks search engine bots not to
read a webpage, but it is only a request;
a CAPTCHA is needed to guarantee that
bots stay out.
Applications
• Worms and Spam
CAPTCHA tests also offer a plausible
solution against email worms and spam:
only accept an email message if you know
there is a human behind the other
computer.
Applications
• Preventing Dictionary attacks
CAPTCHAs can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins.
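As a rough sketch of this idea, the login handler below counts failures per account and refuses to check any further passwords until a CAPTCHA has been solved. The threshold of three attempts, the function name `attempt_login` and the `check_password` callback are all assumptions for illustration. How the CAPTCHA itself is shown and graded is left out.

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed threshold before a CAPTCHA is demanded

# Number of consecutive failed logins per username (in-memory only).
failed_logins = defaultdict(int)


def attempt_login(username, password, check_password, captcha_passed=False):
    """Return 'ok', 'denied' or 'captcha_required'.

    `check_password(username, password)` is a caller-supplied function
    that returns True for valid credentials.
    """
    if failed_logins[username] >= MAX_FREE_ATTEMPTS and not captcha_passed:
        # The password is not even checked, so a bot learns nothing
        # from further guesses until a CAPTCHA is solved.
        return "captcha_required"
    if check_password(username, password):
        failed_logins[username] = 0
        return "ok"
    failed_logins[username] += 1
    return "denied"
```

After three wrong guesses every further attempt, even one with the right password, returns `captcha_required` until the caller passes `captcha_passed=True`. A successful login resets the counter.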