AN IMPROVED AUDIO
Jenn Tam
jdtam@cs.cmu.edu
Computer Science Dept.
Carnegie Mellon University
SOAPS 2008, Pittsburgh, PA
WHAT ARE CAPTCHAS?
CAPTCHAs are tests generated by computers and generally passable by humans but not current computer programs.
THE PROBLEM WITH CURRENT AUDIO CAPTCHAS
In some cases the human passing rate is only 70%!
To make the CAPTCHAs secure, noise was injected into the audio files, making them harder for both computers and humans to pass.
ARE CURRENT AUDIO CAPTCHAS SECURE?
A CAPTCHA is considered broken once a program can pass it 5% of the time.
Since the current audio CAPTCHAs use a limited vocabulary, it was possible for us to collect enough data to train a system that could pass the current audio CAPTCHAs more than 45% of the time.
HOW DID WE TEST THE CURRENT AUDIO CAPTCHAs?
Selected three different types of audio CAPTCHAs: Google, reCAPTCHA, and Digg
Collected 1000 CAPTCHAs per type of audio CAPTCHA to use for training and testing
Created an ASR system using machine learning techniques
THE ALGORITHM
Given the .wav file of an audio CAPTCHA:
Segmentation: select the portions of the audio which most likely are digits/letters
Recognition:
  Extract features from the segment
  Classify the segment as digit/letter or noise and output the label
Stop once a maximum number of segments have been classified
ALGORITHM DETAILS - SEGMENTATION
CAPTCHAs were manually labeled and segmented. We created training segments using this information.
For testing, we chose the highest energy peaks in the test CAPTCHA and selected fixed size segments roughly centered at the peaks.
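The test-time segmentation described above can be sketched roughly as follows. This is only an illustration, not the authors' code: the frame length, the segment length, the use of non-overlapping frames, and the names `frame_energies` and `segment_by_energy` are all assumptions made for the example.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_by_energy(samples, frame_len, segment_len, max_segments):
    """Return (start, end) sample ranges roughly centered on the highest-energy frames."""
    energies = frame_energies(samples, frame_len)
    segments = []
    for _ in range(max_segments):
        if not energies or max(energies) <= 0.0:
            break  # only silence is left
        peak_frame = max(range(len(energies)), key=energies.__getitem__)
        center = peak_frame * frame_len + frame_len // 2
        start = max(0, center - segment_len // 2)
        end = min(len(samples), start + segment_len)
        segments.append((start, end))
        # Suppress every frame touched by this segment so the next peak lies elsewhere.
        for f in range(len(energies)):
            frame_start = f * frame_len
            if frame_start < end and frame_start + frame_len > start:
                energies[f] = 0.0
    return sorted(segments)


# Toy signal (made up): silence with two bursts of different loudness.
signal = [0.0] * 100
signal[20:30] = [1.0] * 10
signal[70:80] = [0.5] * 10
print(segment_by_energy(signal, frame_len=10, segment_len=20, max_segments=3))
# prints [(15, 35), (65, 85)]
```

Only two segments come back even though three were requested, because nothing but silence remains after the two bursts are covered.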
ALGORITHM DETAILS - FEATURES
We used three popular techniques for extracting features from speech to derive 5 sets of features from the audio.
Mel-frequency cepstral coefficients (MFCC)
Perceptual linear prediction (PLP)
Relative spectral transform with PLP (RASTA-PLP)
ALGORITHM DETAILS - AdaBoost
Used decision stumps for weak classifiers
For each type of audio CAPTCHA we created enough classifiers to label a segment as a digit, letter, or noise.
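As a rough illustration of boosting decision stumps, the sketch below trains a single binary AdaBoost classifier (labels +1 and -1, for example "digit" versus "not digit"). How the binary classifiers were combined into digit/letter/noise labels is not stated on the slide, so that part is left out. The function names, the toy data, and the exhaustive threshold search are assumptions for the example.

```python
import math


def train_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost(X, y, rounds):
    """Train `rounds` weighted stumps; every label in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy 1-D data (made up): no single stump separates it, but three boosted stumps do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy set the best single stump still misclassifies two of the six points, while the weighted vote of three stumps reproduces all six labels.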
ALGORITHM DETAILS - SVM
Created a single multiclass classifier using all the training segments (from 900 CAPTCHAs)
ALGORITHM DETAILS - k-NN
Created 5 classifiers, one for each of the feature sets
Used Euclidean distance as our distance metric
Cross-validation gives us k=1
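A nearest-neighbor classifier with Euclidean distance is short enough to sketch directly. The two-dimensional feature vectors below are made up for illustration (real MFCC or PLP vectors would be much longer), and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=1):
    """train is a list of (feature_vector, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training segments, one per class.
train = [
    ([0.0, 0.0], "noise"),
    ([1.0, 1.0], "digit"),
    ([5.0, 5.0], "letter"),
]
print(knn_predict(train, [0.9, 1.2]))  # prints digit
```

With k=1, as cross-validation selected, the vote reduces to returning the label of the single closest training segment.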
THE ALGORITHM
Input: Audio CAPTCHA as an audio file
Segmentation:
  Find the highest energy peak, and extract a fixed size segment centered at that peak
Recognition:
  Extract features from segment
  Give segment to classifier and obtain label
Stop extracting segments once all segments have been labeled or a max solution size is reached.
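The loop above might look roughly like the sketch below. It is a simplification under stated assumptions: the peak is the single loudest sample rather than a windowed energy peak, segments labeled as noise do not count toward the solution size, and `extract_features` and `classify` are placeholder callables standing in for the real feature extractor and trained classifier.

```python
def solve_captcha(samples, segment_len, max_solution_size, extract_features, classify):
    """Label segments around successive energy peaks and return the labels in time order."""
    remaining = list(samples)  # working copy; processed regions are zeroed out
    labeled = []               # (segment_start, label) pairs
    while len(labeled) < max_solution_size:
        peak = max(range(len(remaining)), key=lambda i: remaining[i] ** 2, default=None)
        if peak is None or remaining[peak] == 0:
            break  # every region has been processed
        start = max(0, peak - segment_len // 2)
        end = min(len(remaining), start + segment_len)
        label = classify(extract_features(samples[start:end]))
        if label != "noise":
            labeled.append((start, label))
        remaining[start:end] = [0.0] * (end - start)
    return "".join(label for _, label in sorted(labeled))


# Toy stand-ins (assumptions): the "feature" is the peak amplitude of the segment,
# and the "classifier" maps amplitude ranges to labels.
def peak_amplitude(segment):
    return max(abs(x) for x in segment)


def toy_classifier(feature):
    if feature >= 0.8:
        return "7"
    if feature >= 0.4:
        return "3"
    return "noise"


audio = [0.0] * 100
audio[10:15] = [0.9] * 5   # loud burst, labeled "7" by the toy classifier
audio[40:45] = [0.5] * 5   # quieter burst, labeled "3"
audio[70:72] = [0.1] * 2   # faint blip, labeled "noise"
print(solve_captcha(audio, 10, 5, peak_amplitude, toy_classifier))  # prints 73
```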
ANALYSIS OF CURRENT AUDIO CAPTCHAs
Using three machine learning techniques to perform ASR on the CAPTCHAs:
AdaBoost
Support Vector Machines (SVM)
k-Nearest Neighbor (k-NN)
[Figure: bar chart of Exact Match Rate (%), axis from 0 to 80, for Google, reCAPTCHA, and Digg, with one bar each for AdaBoost, SVM, and k-NN]
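For reference, an exact match rate can be computed as the fraction of CAPTCHAs whose predicted solution equals the true solution character for character. The sketch below also applies the 5% criterion for a broken CAPTCHA stated earlier in the talk. The predictions and answers are invented placeholders, not results from this work.

```python
def exact_match_rate(predictions, answers):
    """Fraction of CAPTCHAs whose predicted solution matches the answer exactly."""
    matches = sum(1 for p, a in zip(predictions, answers) if p == a)
    return matches / len(answers)


def is_broken(rate, threshold=0.05):
    """A CAPTCHA counts as broken once a program passes it at least 5% of the time."""
    return rate >= threshold


# Invented placeholder data: four test CAPTCHAs, one predicted incorrectly.
predictions = ["4127", "903", "55810", "6624"]
answers = ["4127", "908", "55810", "6624"]
rate = exact_match_rate(predictions, answers)
print(rate, is_broken(rate))  # prints 0.75 True
```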
THE GOAL
Make a secure audio CAPTCHA which will be easier for a human to pass and harder for a computer to pass.
Equate solving a CAPTCHA with doing some useful work. In other words, create an audio reCAPTCHA.
WHAT IS reCAPTCHA?
reCAPTCHA helps digitize text on which OCR fails by using the text as its CAPTCHA.
Since millions of people solve CAPTCHAs each day, millions of words get digitized each day!
THE AUDIO RECAPTCHA
Takes advantage of the human ability to understand words through context.
Will help transcribe digital audio on which ASR systems fail.
The audio being used was originally recorded with the intention that it should be easily understood by humans.
HOW WILL IT WORK?
Start with a database of phrases with known transcriptions.
Give the user adjacent phrases to transcribe as the CAPTCHA.
Check the user's solution against the database to determine the result of the test.
Store the rest of the solution as a transcription.
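One possible shape for this check is sketched below. Several details are assumptions, not part of the talk: that the user's answer arrives already split into the part covering the known phrase and the part covering the unknown phrase, that comparison ignores case and punctuation, and that the unknown part is stored only when the known part is correct. The names `normalize`, `grade_solution`, and `clip-2` are hypothetical.

```python
def normalize(text):
    """Lowercase and drop punctuation, returning the list of words."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def grade_solution(known_answer, user_known_part, user_unknown_part, store, unknown_id):
    """Pass the test if the known phrase matches; on a pass, keep the rest as a candidate transcription."""
    passed = normalize(user_known_part) == normalize(known_answer)
    if passed:
        candidate = " ".join(normalize(user_unknown_part))
        store.setdefault(unknown_id, []).append(candidate)
    return passed


transcripts = {}
ok = grade_solution(
    "That was the shot that killed Harry Lime.",  # phrase with a known transcription
    "that was the shot that killed harry lime",   # user's answer for the known phrase
    "He died in a sewer beneath Vienna",          # user's answer for the adjacent phrase
    transcripts,
    "clip-2",
)
print(ok, transcripts)
```

Collecting candidates per clip in a list leaves room to compare several users' answers before accepting a transcription, although the talk does not say how agreement is decided.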
That was the shot that killed Harry Lime. He died in a
Harry Lime he died in a sewer beneath Vienna
Harry Lime. He died in a
Segment #1 Segment #2 Segment #3
ANALYSIS OF SECURITY
Speaker-independent recognition is difficult.
Open vocabularies make it even more difficult for ASR systems.
AM broadcasts and .mp3 compression cause the loss of important data needed for automatic analysis.
CONCLUSION
CAPTCHAs need to be more accessible, yet remain secure and not too difficult for humans.
Deploy audio reCAPTCHA through reCAPTCHA site.
Help make knowledge captured in audio available in text form
ACKNOWLEDGEMENTS
Dr. Luis von Ahn, CMU
Dr. Manuel Blum, CMU
Dr. Roni Rosenfeld, CMU
David Huggins-Daines, CMU
Jiri Simsa, CMU
Sean Hyde, CMU