On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study. Niharika Sachdeva*, Nitesh Saxena, Ponnurangam Kumaraguru*. University of Alabama, Birmingham; *IIIT-Delhi. Information Security Conference, 2013 (Nov 13 – 15)

description

Telephony systems are imperative for information exchange, offering low-cost services and direct reach to millions of customers. They have not only benefited users but have also provided a convenient medium for spammers. Voice spam is often encountered on telephony, for example as an automated telemarketing call asking the recipient to call a number to win millions of dollars. A large percentage of voice spam is generated by automated systems, which introduces the classical challenge of distinguishing machines from humans on telephony. CAPTCHA is a conventional solution for distinguishing humans from machines, and audio-based CAPTCHAs have been proposed as a solution to curb voice spam. In this paper, we conduct a field study with 90 participants to answer two primary research questions: how much inconvenience does a CAPTCHA cause to users, and how do different features of the CAPTCHA, e.g., duration and size, influence its usability on telephony? Our results suggest that currently proposed CAPTCHAs are far from usable. We provide guidelines that may help improve existing CAPTCHAs for use in telephony systems.

Transcript of On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Page 1: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Niharika Sachdeva*, Nitesh Saxena, Ponnurangam Kumaraguru*

University of Alabama, Birmingham; *IIIT-Delhi

Information Security Conference, 2013 (Nov 13 – 15)

Page 2: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Overview

• Motivation
• Research question
• Study design
• Experimental setup
• Participants
• Results
• Guidelines

Page 3: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study


Some Attacks

Page 4: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

CAPTCHA

• Completely Automated Public Turing Test to tell Computers and Humans Apart

Google ReCAPTCHA

Yahoo Math Function

Page 5: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Is it Really Useful ??

• Frustrating
• Lack of incentive
• Hard to recognize
• Difficult to solve
• Native language

E. Bursztein, S. Bethard, C. Fabry, J. Mitchell, and D. Jurafsky. How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation. SP '10, pages 399–413.

J. Yan, A. Ahmad. Usability of CAPTCHAs Or usability issues in CAPTCHA design. In Symposium On Usable Privacy and Security, pages 44–52, 2008.

Page 6: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

But CAPTCHA continues to Rule

CAPTCHA  for  RoboCalls  



Page 8: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Audio CAPTCHA a solution?

• Yahoo
• Google
• Patent CAPTCHA

Page 9: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Research Question

• Quantify the amount of inconvenience CAPTCHA causes to users.

• How do different features of CAPTCHA, e.g. duration, size and character set, influence the users' performance?

 - H1: Users' responses are close to the expected / correct answers even though the overall CAPTCHA solving accuracy is low.

 - H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.

 - H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.

Page 10: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Study Design

Latin Square

Polakis, G. Kontaxis, and S. Ioannidis. CAPTCHuring Automated (Smart) Phone Attacks. In SYSSEC, 2011.
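The slide names a Latin Square for the study design but does not show the ordering that was used. As a minimal sketch, assuming the Latin Square served to counterbalance the order in which the nine CAPTCHA schemes were played, a cyclic square can be generated as follows. The function names and the participant-to-row assignment are illustrative assumptions, not the authors' procedure.

```python
# Sketch only: a cyclic Latin square for counterbalancing presentation order.
# The scheme names come from the CAPTCHA Features slide; everything else
# (function names, how participants map to rows) is a hypothetical illustration.

def cyclic_latin_square(items):
    """Return an n x n square in which each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


SCHEMES = [
    "Google", "Ebay", "Yahoo", "Recaptcha", "Slashdot",
    "CD", "Math-function", "RPC", "C+CD",
]

ORDERS = cyclic_latin_square(SCHEMES)


def order_for_participant(participant_index):
    """Assign presentation orders to participants by cycling through the rows."""
    return ORDERS[participant_index % len(ORDERS)]


if __name__ == "__main__":
    for index in range(3):
        print(index, order_for_participant(index))
```

A cyclic square balances the position of each scheme across participants but does not balance which scheme precedes which; a balanced (Williams) design would be needed for that.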

 

Page 11: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

CAPTCHA Features

Category        Char. set   Word   Repeat   Duration (s)   Noise   Voice   Beep   Min length   Max length
Google          0-9         No     Yes      34.4           Yes     M       Yes    5            15
Ebay            0-9         No     No       3.7            Yes     V       No     6            6
Yahoo           0-9         No     No       18.0           Yes     Child   No     6            8
Recaptcha       a-z         Yes    No       10.6           Yes     F       No     6            6
Slashdot        a-z         Yes    No       2.9            No      M       No     1            1
CD              1-5         No     No       14             Yes     M       No     1            1
Math-function   0-9         No     No       6.0            No      M       No     4            3
RPC             0-9         No     No       20.0           No      M       No     3            2
C+CD            0-9         No     No       14.0           No      M       No     4            3

M = Male; F = Female; V = Various voices

Page 12: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Deployment

Architecture diagram components:

• Source (legitimate or malicious)
• PSTN
• Cellular network
• IP phone
• VoIP
• Linksys Gateway SPA 3102
• Linux server acting as CAPTCHA Shield (with FreeSWITCH)
• IVRS playing CAPTCHA
• Java application
• Database
• File system

Page 13: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Participants

• 90 participants
• Five cities
 - Delhi
 - Mumbai
 - Chennai
 - Noida
 - Vellore
• Real world deployment

Page 14: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Results: Accuracy

CAPTCHA         Category         Accuracy (%)   Skip (%)
CD              Telephony        18.71          35.67
Math-Function   Telephony        17.47          26.51
RPC             Telephony        15.47          40.33
C+CD            Telephony        4.57           40.10
Ebay            Web (Number)     8.75           13.13
Google          Web (Number)     0.00           43.83
Yahoo           Web (Number)     7.74           20.24
ReCaptcha       Web (Alphabet)   0.00           46.07
Slashdot        Web (Alphabet)   13.73          30.06

Page 15: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Results: Time taken

CAPTCHA         Category         Time (s)
CD              Telephony        96.11
Math-Function   Telephony        90.23
RPC             Telephony        147.44
C+CD            Telephony        109.59
Ebay            Web (Number)     80.25
Google          Web (Number)     123.49
Yahoo           Web (Number)     95.88
ReCaptcha       Web (Alphabet)   120.64
Slashdot        Web (Alphabet)   122.57

Page 16: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Results: H1

• H1: Users' responses are close to the expected / correct answers even though the overall CAPTCHA solving accuracy is low.

[Figure: number of CAPTCHAs (vertical axis, 0 to 50) against edit distance (horizontal axis, 1 to 7) for Yahoo!, eBay, Google, Slashdot, RPC, Math-Function, CD, and C+CD.]
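H1 is assessed through the edit distance between what a user keyed in and the expected answer. The slides do not say which edit-distance variant was used; the sketch below assumes the standard Levenshtein distance (insertions, deletions, and substitutions each cost 1). The function name and the sample digit strings are hypothetical.

```python
# Sketch only: Levenshtein distance between an expected CAPTCHA answer and a
# user's DTMF response. The choice of Levenshtein is an assumption; the sample
# strings below are made up and are not study data.

def levenshtein(expected: str, response: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] holds the distance between expected[:i-1] and response[:j].
    previous = list(range(len(response) + 1))
    for i, expected_char in enumerate(expected, start=1):
        current = [i]
        for j, response_char in enumerate(response, start=1):
            substitution_cost = 0 if expected_char == response_char else 1
            current.append(min(
                previous[j] + 1,                      # delete from expected
                current[j - 1] + 1,                   # insert into expected
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # A response that drops one digit is "close" even though it is scored wrong.
    print(levenshtein("482915", "48915"))
```

Under this measure a response counted as incorrect can still sit at distance 1 or 2 from the expected answer, which is the kind of closeness H1 refers to.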

Page 17: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Results: H2

• H2: Accuracy of answering the CAPTCHA correctly on telephony decreases as the number of key presses required increases.

Math function, but we noticed a negative relationship with correlation coefficient r = −0.47 for web-based captcha. Finally, we found a significant difference (t-test, t-value = 5.30, p-value < 0.001) between Expected Key Press (Average DTMF) and accuracy, showing that these two were independent of each other. The results mentioned above do not support our hypothesis H2.

Table 3: Average DTMF expected for each captcha (Avg. DTMF), accuracy, time, and average DTMF input by users (Avg. User DTMF).

Scheme          Category    Avg. DTMF   Accuracy (%)   Time (s)   Avg. User DTMF
CD              Telephony   1.00        18.71          96.11      1.76
Math-function   Telephony   2.05        17.47          90.23      2.71
RPC             Telephony   3.00        15.47          147.44     3.92
C + CD          Telephony   2.06        4.57           109.59     2.65
Ebay            Web         6.00        8.75           80.25      3.85
Google          Web         6.36        0.00           123.49     4.68
Yahoo           Web         7.09        7.74           95.88      4.99
Slashdot        Web         15.34       13.73          120.64     6.02
ReCaptcha       Web         64.93       0.00           122.57     10.97

H3 – Time vs. number of key presses: Table 3 shows that users spent varying amounts of time in submitting a comparable number of DTMF responses. For example, the average time spent for Google was 123.49 seconds (min: 17.15 and max: 341.21), whereas for Yahoo it was 95.88 seconds (min: 25.88 and max: 278.00), although both had the same average DTMF (5) to input. There was a significant difference between the time taken to solve Google vs. Yahoo! (t-test, t-value = -12.39, p-value < 0.01). Further, we found a correlation (r = 0.85) between time spent and DTMF input for Math-function captcha, suggesting that the increase in time was proportionate to DTMF input. However, this correlation dropped to r = 0.56 for web-based captcha, implying the absence of any strong relationship between time and DTMF input. The results from our study suggest the lack of any strong relationship between the time spent by participants in solving a captcha and the number of DTMF inputs from them. We found that the correlation between the time spent to answer the captcha and the DTMF response from users was 0.36 across all the captchas used in our study. We found a significant difference (t-test, t-value = 4.33, p-value = 0.00045) between the number of key presses (Average User DTMF) and accuracy, suggesting that these two were independent of each other. We further tested whether the duration for which a captcha is played influences accuracy, but found that exposing users to a captcha for longer did not improve solving accuracy. Figure 5 shows that the average playtime of the numeric web-based captchas (eBay, Yahoo, Google) varied from as low as 3.7 to 34.4 seconds, although all of these required a similar number of DTMFs to be recognized. The Google captcha repeated the challenge in each attempt without users asking for it explicitly; despite this, the correct response rate was 0% for Google and 8.75% for eBay.
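The excerpt on this slide quotes several correlation coefficients without naming the method. As a minimal sketch, assuming these are Pearson's r, the coefficient can be computed as below. The inputs are the scheme-level averages from Table 3; the reported values were presumably computed on per-response data that the slides do not include, so this sketch is not expected to reproduce r = 0.85, 0.56, or 0.36. The function name is hypothetical.

```python
# Sketch only: Pearson's r on the scheme-level averages from Table 3.
# Assumption: the paper's coefficients are Pearson correlations. Running this
# on nine averaged rows will not reproduce the per-response figures quoted above.
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# (scheme, Avg. User DTMF, Time in seconds), copied from Table 3.
TABLE_3 = [
    ("CD", 1.76, 96.11),
    ("Math-function", 2.71, 90.23),
    ("RPC", 3.92, 147.44),
    ("C + CD", 2.65, 109.59),
    ("Ebay", 3.85, 80.25),
    ("Google", 4.68, 123.49),
    ("Yahoo", 4.99, 95.88),
    ("Slashdot", 6.02, 120.64),
    ("ReCaptcha", 10.97, 122.57),
]

user_dtmf = [row[1] for row in TABLE_3]
time_seconds = [row[2] for row in TABLE_3]

if __name__ == "__main__":
    print(f"r(time, user DTMF) over scheme averages: {pearson_r(user_dtmf, time_seconds):.2f}")
```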

Page 18: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Results: H3

• H3: Users will take more time responding to a CAPTCHA that requires more key presses than to one requiring fewer key presses.

[Figure: key presses (#), accuracy (%), and average play time (sec) for Ebay, Google, Yahoo, Slashdot, and Recaptcha; vertical axis 0 to 80.]

Page 19: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

User Experience

[Figure 6 plot: responses from Strongly disagree to Strongly agree (0% to 100%) for Complexity, Frequently use, Confidence, and Technical help.]

Figure 6: Users reported the system to be complex and not usable, suggesting the need for technical help for using captcha over telephony.

types. Users took the least time, 80.25 seconds, to solve the eBay captcha. There was no significant difference in the time taken by users to solve web-based captcha compared with telephony-based captcha. We found that the most time-consuming captcha on telephony was the RPC captcha, with an average solving time of 147.44 seconds. Among the web-based captcha, Google, ReCaptcha and Slashdot were the most time-consuming, with means greater than 120 seconds (min: 120.64 seconds and max: 123.49 seconds). On analyzing the existing studies on the web, we found that Google and ReCaptcha were the most time-consuming there, with a mean value greater than 25 seconds. However, users took on average 12 seconds to solve the Slashdot captcha on the web [8], in comparison to 122.57 seconds on telephony. The increase in solving time indicates that the inconvenience caused to users by solving captcha on telephony is greater than that on the web.

6.1.3 User Experience

As the ultimate assessment metric to understand the inconvenience caused, we study the feedback provided by the participants for different captcha types. About 50% of the participants found the system (on which users called to answer captcha in our study) extremely complex to use, and only 17% felt confident using this system (as shown in Figure 6). Users’ feedback suggests that they did not like alphabet captcha at all. When asked which captcha they would prefer (numeric, alphabetic, or math function), only 14.44% of the users preferred the alphabet audio challenge. Most of the users (52.22%) preferred numeric captcha (Contextual Degradation, Random menu captcha, and numeric web-based captcha), and 33.33% favored captcha involving math functions, i.e. math-function and math-function with contextual degradation. We also analyzed whether age has any effect on the captcha preference of the participants. Participants in the age group of 36 to 50 did not wish to use the alphabet captcha at all, whereas numeric captcha was appreciated the most among all age groups.

Next, we wanted to analyze whether participants felt that using speakerphones or headphones would help them solve the captcha better. We found that 30% of the users disagreed and 8.89% strongly disagreed, feeling that speakerphones would be of no use. A participant mentioned, “the voice recording was not clear therefore any audio accessory would not have helped,” disagreeing with the use of speakerphones. Some participants supported the use of speakerphones, with 24.44% agreeing and 8.89% strongly agreeing. We also asked the participants if they felt the use of headphones would help them solve the captcha better. The results show that 15.56% of the users strongly agreed and 70% agreed that using headphones would help them respond better to the challenge. The users also complained about audio captcha not being clear and being less audible.

[Figure 7 plot: participants (%) by age group (18-24, 25-35, 36-50, 51-65) for Numeric, Math function, and Alphabets.]

Figure 7: Participants in various age groups found the numeric captcha to be most usable, whereas alphabet captcha was least preferred by the participants.

In order to understand the user perspective about the mode of input, we asked the users if they would prefer to respond verbally to the captcha challenge; 36.67% of the users agreed with the statement and 28.89% strongly agreed. This suggests that entering responses using a keypad is difficult and causes trouble for respondents; this might be one of the primary reasons for errors in the captcha responses leading to low solving accuracy in the study [25]. Further, we suggest the need for further research to investigate what makes an audio captcha easier to answer: verbal response or keypad touch.

System Usability Scale (SUS).

We calculated the SUS score for the telephony captcha as 38.42, which is extremely low (footnote 9). We further analyzed the raw comments from the users. The users mentioned the following problems with audio captcha on telephony: accent, clarity of the voice, and noise. Participants (35.71%) complained about the voice not being clear; a participant commented, “It sounded like ghost voices. I was not able to understand almost any utterance,” with another user feeling “The voice is not clear and the words are not distinctly spoken.” Around 8.92% explicitly complained about the accent in the voice captcha, stating, “at least the accent should be better”; another participant mentioned, “with Accent better, the system is good to use.” Another issue mentioned was noise. Participants (17.85%) complained about the noise in the audio being very disturbing. A participant commented, “too many disturbances and the accent was bad and 80% of time I couldn’t understand,” and another

9 An average usable system has a SUS score of 68. http://www.measuringusability.com/sus.php
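The excerpt reports only the aggregate SUS score of 38.42; individual questionnaire responses are not included. For reference, the sketch below applies the standard SUS scoring rule: ten items on a 1 to 5 scale, odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5. The function name and the sample responses are made up for illustration.

```python
# Sketch only: standard System Usability Scale scoring for one respondent.
# The sample questionnaire below is invented; it is not data from the study.

def sus_score(responses):
    """Score ten SUS item responses (each 1..5) on the 0..100 SUS scale."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly ten items")
    if any(response not in (1, 2, 3, 4, 5) for response in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items are positively worded
        else:
            total += 5 - response   # even items are negatively worded
    return total * 2.5


if __name__ == "__main__":
    hypothetical_respondent = [2, 4, 2, 4, 3, 4, 2, 4, 2, 3]
    print(sus_score(hypothetical_respondent))
```

A study-level score such as the one quoted above is normally the mean of the per-respondent scores.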

• User friendliness

Page 20: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study


User Experience

• User preferred scheme

Page 21: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Guidelines

• One-time instruction
• Loss / error tolerant
• Feedback
• Verbal responses

Page 22: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

Thank you!! Questions

Page 23: On the Viability of CAPTCHAs for Use in Telephony Systems: A Usability Field Study

For any further information, please write to

[email protected]  precog.iiitd.edu.in