The art of breaking and designing captchas - RSA …

Post on 26-Jun-2018


The art of breaking and designing captchas

Elie Bursztein, Stanford University

Session ID: HT2-402
Session Classification: Intermediate


Elie Bursztein (@elie), http://elie.im

World's Most Popular Captchas

Captcha Design Goal

[Diagram: the sweet spot lies between what humans can solve and what AI can solve]

Focus of this talk

How to break and design CAPTCHAs

Based on the analysis of 21 of the most popular schemes

Outline

How to break text captchas
How to break audio captchas
How to make captchas easier for humans
What's next?

Evaluation metrics

Accuracy
Learnability
Solving time

How to Break Text-Captchas

Think Lego

How to break a captcha: example

Pre-processing: captcha binarization
Pre-processing: background removal
Pre-processing: line detection
Pre-processing: line removal
Segmentation: clustering algorithm
Segmentation: cluster separation
Post-segmentation: inverting rotation
Recognition: 3173
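
The segmentation step can be illustrated with a minimal sketch. The slides do not say which clustering algorithm is used, so this example swaps in a plain connected-components flood fill as a stand-in; the function name `segment` and the 0/1 matrix input are assumptions made for illustration.

```python
from collections import deque


def segment(matrix):
    """Split a binarized captcha (1 = ink, 0 = background) into connected
    components, returned left to right as lists of (row, col) pixels.

    Assumption: connected components stand in for the clustering step."""
    rows, cols = len(matrix), len(matrix[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit the 8 neighbours (the centre pixel is already seen).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and matrix[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    # Order segments by leftmost column so they read like the captcha.
    components.sort(key=lambda px: min(x for _, x in px))
    return components
```

Characters that touch each other end up in one component, which is presumably why a separate cluster separation step follows in the list above.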

Breaker 5-Stage Pipeline (example: Slashdot captcha)

Preprocessing
Segmentation
Post-segmentation
Recognition: f a e t e s t
Post-recognition: f a s t e s t
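
The post-recognition stage in the example turns the raw output "faetest" into "fastest". A minimal sketch of one way to do that is to snap the recognized string to the closest entry of a word list. The word list, the function name `post_recognition` and the 0.6 cutoff are assumptions, and the slides do not say which correction method the breaker actually uses.

```python
import difflib

# Hypothetical dictionary; a real breaker would load a full word list.
WORDS = ["fastest", "faster", "forest", "slowest"]


def post_recognition(recognized, words=WORDS, cutoff=0.6):
    """Return the dictionary word closest to the recognized string,
    or the string unchanged when nothing is similar enough."""
    matches = difflib.get_close_matches(recognized, words, n=1, cutoff=cutoff)
    return matches[0] if matches else recognized


print(post_recognition("faetest"))
```

This only helps for schemes whose answers are dictionary words; random character strings gain nothing from it.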

From the image to the matrix representation

[Figure: matrix rows L1 to L6 concatenated into one vector]

From the matrix representation to the vector representation
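
As a minimal sketch of these two conversions, assuming a grayscale image given as rows of 0-255 integers and a fixed threshold of 128 (both assumptions, as are the function names):

```python
def binarize(gray, threshold=128):
    """Image to matrix: dark pixels become 1 (ink), light pixels become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def to_vector(matrix):
    """Matrix to vector: concatenate the rows into one flat list."""
    return [value for row in matrix for value in row]


gray = [[0, 255],
        [200, 30]]
matrix = binarize(gray)      # [[1, 0], [0, 1]]
vector = to_vector(matrix)   # [1, 0, 0, 1]
```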

[Figure: distances from the segment's vector to six known vectors labeled A, B and C (42, 40, 32, 70, 12, 18); the closest is a C at distance 12]

From the vector representation to the segment value (classification)
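
Classification by distance to known vectors can be sketched as a nearest-neighbour lookup. The slides do not name the distance function, so squared Euclidean distance is an assumption here, as are the names `classify` and `KNOWN` and the toy vectors.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(vector, known):
    """Return the label of the known vector closest to `vector`.

    `known` is a list of (label, vector) pairs from labeled segments."""
    label, _ = min(known, key=lambda pair: squared_distance(vector, pair[1]))
    return label


# Hypothetical training data: one labeled vector per character.
KNOWN = [("A", [1, 1, 0, 0]),
         ("B", [0, 0, 1, 1]),
         ("C", [1, 0, 0, 1])]

print(classify([1, 1, 1, 0], KNOWN))
```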

Breaker efficiency

Solver accuracy = Coverage * Precision^length

Coverage: segmentation rate
Precision: recognition rate
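
A small worked example of the formula, using made-up numbers rather than measurements from the talk: with 90% coverage, 95% per-character precision and a 6-character captcha, the solver gets about 66% of captchas right.

```python
def solver_accuracy(coverage, precision, length):
    """Solver accuracy = coverage * precision ** length.

    coverage and precision are fractions in [0, 1]; length is the
    number of characters in the captcha."""
    return coverage * precision ** length


# Hypothetical values, for illustration only.
print(round(solver_accuracy(0.90, 0.95, 6), 3))
```

Because precision is raised to the captcha length, a small drop in per-character recognition costs far more than the same drop in coverage.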

Anti-recognition techniques

Blurring

Distortion

Rotation

Fonts

Charsets 0123456789

SVM learning rate

KNN learning rate

Anti-recognition taxonomy

Background Confusion

Lines

Collapsing

Breaking World of Warcraft

Breaking Captcha.net

Breaking Wikipedia

Breaking Digg

Breaking Slashdot

Breaking eBay

Failing to break eBay

Breaking Baidu

Scheme        Segmentation rate   Solving rate
Authorize     84%                 66%
Baidu         98%                 5%
Blizzard      75%                 70%
Captcha.net   96%                 73%
CNN           50%                 16%
Digg          86%                 20%
eBay          95%                 43%
Google        0%                  0%
MegaUpload    n/a                 93%
NIH           87%                 72%
Recaptcha     0%                  0%
Reddit        71%                 42%
Skyrock       30%                 2%
Slashdot      52%                 35%
Wikipedia     57%                 25%

Learning rate for real schemes

Building a breaker: guidelines

Immediate visual feedback
Visual debugging
Algorithm independence
Exposing algorithm parameters

Decaptcha main interface

Apply design principles

Core design principles
  Randomize length
  Randomize character size
  Wave the captcha

Use anti-recognition as a means of strengthening captcha security
  Don't use a complex charset
    Bad for humans (see our research on this)
    Useless for security

Use collapsing or lines
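
The first two core principles, randomized length and randomized character size, can be sketched as follows. The charset, the length and size ranges, and the function name `random_captcha_spec` are illustrative assumptions; rendering, waving and collapsing are left out.

```python
import secrets

# A simple charset, in line with the advice against complex charsets.
CHARSET = "abcdefghijklmnopqrstuvwxyz"


def random_captcha_spec(min_len=5, max_len=8, min_size=24, max_size=40):
    """Return a list of (character, font_size) pairs with a random
    length and an independent random size for every character."""
    rng = secrets.SystemRandom()  # OS-backed randomness, not a seeded PRNG
    length = rng.randint(min_len, max_len)
    return [(rng.choice(CHARSET), rng.randint(min_size, max_size))
            for _ in range(length)]


spec = random_captcha_spec()
answer = "".join(ch for ch, _ in spec)
```

A renderer would then draw each character at its own size, so neither the number of characters nor their width can be assumed by a segmentation step.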

Designing Better Captchas

Think Lego again

Decompose into features
Analyze
  each feature in isolation
  feature interactions

Real vs Generated

Evaluation system

Experiment details

Some of the features tested

Angle of rotation

Collapsing

Character size

Resolution invariant

2D interactions

Length vs Angle interaction

Perception Does Not Match Number

How to Break Audio-Captcha

Audio Captchas

Creating Audio Captcha

[Diagram: Noises + Voices -> CaptchaMaker -> Super secure captcha]

Noise intensity (RMS/SNR)

[Figure: example audio captchas from Microsoft, Digg and Authorize]
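
Noise intensity can be quantified with the two measures named in the slide title. A minimal sketch, assuming the voice and the noise are available as separate lists of samples (which is true when generating a captcha, not when attacking one); the function names are illustrative.

```python
import math


def rms(samples):
    """Root mean square amplitude of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes."""
    return 20 * math.log10(rms(signal) / rms(noise))


voice = [10, -10, 10, -10]   # toy samples, not real audio
noise = [1, -1, 1, -1]
print(snr_db(voice, noise))
```

A lower SNR means louder noise relative to the voice, which makes the captcha harder for both solvers and human listeners.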

Sound representation

WAV DFT

Cep

TFR

TCR

TDC
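
Assuming DFT here stands for the discrete Fourier transform, the step from raw WAV samples to a frequency-domain representation can be sketched with a naive O(n^2) implementation. This is for illustration only; a real solver would use an FFT library, and the function names are assumptions.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def magnitude_spectrum(samples):
    """Magnitude of each frequency bin."""
    return [abs(c) for c in dft(samples)]


# Toy input: a cosine that completes 2 cycles over 8 samples.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = magnitude_spectrum(tone)  # energy concentrates in bins 2 and 6
```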

Solving an audio captcha

Dealing with random noise

Statistical learning
Supervised learning
RLS (regularized least squares) classifier
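
A regularized least squares classifier fits weights w that minimize the squared error plus a penalty lam * |w|^2, which gives w = (X^T X + lam I)^-1 X^T y. The sketch below is a minimal two-class version in pure Python. The feature vectors, the value of `lam` and the function names are assumptions, and a digit recognizer would need one such classifier per class (one-vs-all), which is not shown.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rls_fit(xs, ys, lam=0.1):
    """Return weights minimizing sum((w.x - y)^2) + lam * |w|^2."""
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(gram, rhs)


def rls_predict(w, x):
    """Classify by the sign of the linear score: +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy data: two features plus a constant bias term, labels +1 / -1.
XS = [[2.0, 0.0, 1.0], [3.0, 1.0, 1.0], [-2.0, 0.0, 1.0], [-3.0, -1.0, 1.0]]
YS = [1, 1, -1, -1]
W = rls_fit(XS, YS)
```

The penalty term keeps the system well conditioned even when features are noisy or correlated, which fits the random-noise setting described above.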

Semantic noise

Results

Scheme      Length   Coverage (%)   Digit (%)   Captcha (%)
Authorize   5        100            97          89.2
Digg        5        100            76          41.4
eBay        6        85.6           92.5        82.9
Microsoft   10       80.6           89.6        48.9
Recaptcha   8        99.9           40.5        1.5
Yahoo       7        99.1           74.7        45.4

Recaptcha semantic noise

Confusion matrices

How many captchas do you need?

Apply

Within 3 months
  Make sure you have a strong captcha scheme
  Ensure that your site is accessible

Within 6 months
  Log your captcha failure rate and monitor it
  Have a backup captcha scheme in case your scheme is broken
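
Logging and monitoring the failure rate can be as simple as a sliding window over recent attempts. The sketch below is one possible shape, not a recommendation from the talk: the class name, window size, baseline and tolerance are all hypothetical and would need tuning on real traffic.

```python
from collections import deque


class FailureRateMonitor:
    """Track captcha outcomes over a sliding window of recent attempts."""

    def __init__(self, window=1000, baseline=0.25, tolerance=0.10):
        self.outcomes = deque(maxlen=window)  # True = solved, False = failed
        self.baseline = baseline              # expected failure rate (assumed)
        self.tolerance = tolerance            # allowed drift before flagging

    def record(self, solved):
        """Log one attempt; the oldest attempt drops out when full."""
        self.outcomes.append(bool(solved))

    def failure_rate(self):
        """Fraction of failed attempts in the window, or None if empty."""
        if not self.outcomes:
            return None
        return self.outcomes.count(False) / len(self.outcomes)

    def deviates(self):
        """True when the window is full and the failure rate has moved
        away from the baseline by more than the tolerance."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return abs(self.failure_rate() - self.baseline) > self.tolerance
```

A shift in either direction is worth a look: a jump may mean the captcha has become too hard for people, while a sharp change under heavy traffic may point to automated solving.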

Thank you!

Questions?

Follow me!

Twitter: @elie

Captcha research: http://elie.im/captcha