Hearing without Listening
Bhiksha Raj, Pittsburgh, PA 15213, 21 April 2011
Collaborators: Manas Pathak (CMU), Shantanu Rane (MERL), Paris Smaragdis (UIUC), Madhusudana Shashanka (UTC), Jose Portelo (INESC)

Transcript of the presentation (64 slides)

Page 1: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Bhiksha Raj

Pittsburgh, PA 15213

21 April 2011

Hearing without Listening
Collaborators: Manas Pathak (CMU), Shantanu Rane (MERL), Paris Smaragdis (UIUC), Madhusudana Shashanka (UTC), Jose Portelo (INESC)

Page 2: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

About Me
• Bhiksha Raj
• Assoc. Prof. in LTI
• Areas:
  o Computational Audition
  o Speech recognition
• Was at Mitsubishi Electric Research prior to CMU
  o Led the research effort on speech processing

2

Page 3: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

A Voice Problem

• Call centers record customer responses
  o Thousands of hours of customer calls
  o Must be mined for improved response to market conditions and improved customer response
• Audio records carry confidential information
  o Social security numbers
  o Names, addresses
  o Cannot be revealed!!

3

Call center handles calls relating to various company services and products

Page 4: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Mining Audio
• Can be dealt with as a problem of keyword spotting
  o Must find words/phrases such as “problem”, “disk controller”
  o The data holder is willing to provide a list of terms to search for
• Must not discover anything that was not specified!
  o E.g. social security numbers, names, addresses, other identifying information
• Trivial solution: ship the keyword spotter to the data holder
  o Unsatisfactory
  o Does not prevent the operator of the spotter from having access to private information

• Work on encrypted data?

4

Page 5: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

5

A more important problem
• ABC News, Oct 2008
• “Inside Account of U.S. Eavesdropping on Americans”: Despite pledges by President George W. Bush and American intelligence officials to the contrary, hundreds of US citizens overseas have been eavesdropped on as they called friends and family back home...

Page 6: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The Problem
• Security:
  o The NSA must monitor calls for public safety
     The caller may be a known miscreant
     The call may relate to planning subversive activity
• Privacy:
  o Terrible violation of individual privacy
• The gist of the problem:
  o The NSA is possibly looking for key words or phrases
     Did we hear “bomb the pentagon”?
  o Or whether some key people are calling in
     Was that Ayman al Zawahiri’s voice?
  o But it must have access to all the audio to do so
• Similar to the mining problem

Page 7: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Abstracting the problem

• Data holder willing to provide encrypted data
  o A locked box
• Mining entity willing to provide encrypted keywords
  o Sealed packages
• Must find if keywords occur in data
  o Find if contents of sealed packages are also present in the locked box
     Without unsealing packages or opening the box!
• Data are spoken audio

Page 8: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Privacy Preserving Voice Processing
• These problems are examples of the need for privacy-preserving voice processing algorithms
• The majority of the world’s communication has historically been done by voice
• Voice is considered a very private mode of communication
  o Eavesdropping is a social no-no
  o Many places that allow recording of images in public disallow recording of voice!
• Yet there is little current research on how to maintain the privacy of voice...

8

Page 9: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The History of Privacy Preserving Technologies for Voice

9

Page 10: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The History of Counter Espionage Techniques against Private Voice Communication

10

Page 11: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Voice Database Privacy: Not Covered
• Can “link” uploaders of video to Flickr, YouTube etc. by matching the audio recordings
  o “Matching the uploaders of videos across accounts”, Lei, Choi, Janin, Friedland, ICSI Berkeley
• More generally, can identify common “anonymous” speakers across databases
  o And, through their links to other identified subjects, can possibly identify them as well...
• Topic for another day

11

Page 12: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Automatic Speech Processing Technologies
• Lexical content comprehension
  o Recognition: determining the sequence of words that was spoken
  o Keyterm spotting: detecting if specified terms have occurred in a recording
• Biometrics and non-lexical content recognition
  o Identification: identifying the speaker
  o Verification/Authentication: confirming that a speaker is who he/she claims to be

12

Page 13: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Audio Problems involve Pattern Matching
• Biometrics:
  o Speaker verification / authentication: binary classification
     – Does the recording belong to speaker X or not?
  o Speaker identification: multi-class classification
     – Who (if anyone) from a given known set is the speaker?
• Recognition:
  o Key-term spotting: binary classification
     – Is the current segment of audio an instance of the keyword?
  o Continuous speech recognition: multi- or infinite-class classification

13

Page 14: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Simplistic View of Voice Privacy: Learning what is not intended

• The user is willing to permit the system to have access to the intended result, but not to anything else
• A recognition/spotting system is allowed access to the “text” content of the message
  o But the system should not be able to identify the speaker, his/her mood, etc.
• A biometric system is allowed access to the identity of the speaker
  o But it should not be able to learn the text content as well
     E.g. discover a passphrase

14

Page 15: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Signal Processing Solutions to the simple problems

• Voice is produced by a source-filter mechanism
  o Lungs are the source
  o Vocal tract is the filter
• Mathematical models allow us to separate the two
• The source contains much of the information about the speaker
• The filter contains information about the sounds

15

Page 16: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Eliminating Speaker Identity

• Modifying the pitch
  o Lowering it makes the voice somewhat unrecognizable
  o Of course, one can undo this
• Voice morphing
  o Map the entire speech (source and filter) to a neutral third voice
  o Even so, the cadence gives identity away

16

Page 17: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Keep the voice, corrupt the content

• Much tougher
  o The “excitation” also actually carries spoken content

• Cannot destroy content without corrupting the excitation

17

Page 18: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Signal Processing is not the Solution
• Often there is no “acceptable to expose” data
  o Voice recognition: medical transcription (HIPAA rules), court records, police records
  o Keyword spotting: signal processing is inapplicable; we cannot modify only the undesired portions of the data
  o Biometrics:
     Identification: the data holder may not know who is being identified (the NSA case)
     Verification/Authentication – many problems

18

Page 19: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

What follows
• A related problem:
  o Song matching
• Two voice matching problems
  o Keyword spotting
  o Voice authentication
     A cryptographic approach
     An alternative approach
• Issues and questions
• Caveat: there will be squiggles (cannot be avoided)
  o Will be kept to a minimum

19

Page 20: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

A related problem

• Alice has just found a short piece of music on the web

o Possibly from an illegal site!

• She likes it. She would like to find out the name of the song

20

Page 21: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Alice and her song

• Bob has a large, organized catalogue of songs
• Simple solution:
  o Alice sends her song snippet to Bob
  o Bob matches it against his catalogue
  o Returns the ID of the song that has the best match to the snippet

21

Page 22: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

What Bob does

• Bob uses a simple correlation-based procedure
• Receives snippet W = w[0]..w[N]
• For each song S:
  o At each lag t, finds Correlation(W, S, t) = Σ_i w[i]·s[t+i]
  o Score(S) = maximum correlation: max_t Correlation(W, S, t)
• Returns the song with the largest correlation.

[Figure: the snippet W slid along catalogue songs S1, S2, S3, …, SM]
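As a concrete (plaintext, pre-encryption) illustration of Bob's matching loop, here is a minimal Python sketch; the catalogue contents, snippet length, and the best_match helper are hypothetical, and a real matcher would work on spectral features rather than raw samples.

    import numpy as np

    def best_match(snippet, catalogue):
        """Return the catalogue entry with the largest cross-correlation with the snippet at any lag."""
        best_name, best_score = None, -np.inf
        for name, song in catalogue.items():
            # Correlation(W, S, t) = sum_i w[i] * s[t+i], evaluated at every valid lag t
            scores = np.correlate(song, snippet, mode="valid")
            if scores.max() > best_score:
                best_name, best_score = name, float(scores.max())
        return best_name, best_score

    # Toy catalogue of random "songs" and a snippet cut from one of them
    rng = np.random.default_rng(0)
    catalogue = {f"song{i}": rng.standard_normal(1000) for i in range(4)}
    snippet = catalogue["song2"][300:380]
    print(best_match(snippet, catalogue))   # expect ("song2", ...)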

Pages 23–27 of the original deck repeat slide 22, animating the sliding correlation (Corr(W, S1, 0), Corr(W, S1, 1), Corr(W, S1, 2), …) as the snippet W moves along each song S1 … SM.

Page 28: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Relationship to Speech Recognition

• Words are like Alice’s snippets.

• Word spotting: Find locations in a recording where patterns that match the word occur

  o Computing the “match” score is analogous to finding the correlation with an example (or the best of many examples) of the word

• Continuous speech recognition: Repeated word spotting

28

Page 29: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Alice has a problem

• Her snippet may have been illegally downloaded
• She may go to jail if Bob sees it
  o Bob may be the DRM police...

29

Page 30: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Solving Alice’s Problem

• Alice could encrypt her snippet and send it to Bob
• Bob could work entirely on the encrypted data
  o And obtain an encrypted result to return to Alice!
• A job for SFE (secure function evaluation)
  o Except we won’t use a circuit

30

Page 31: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Correlation is the root of the problem
• The basic operation performed is a correlation
  o Corr(W, S, t) = Σ_i w[i]·s[t+i]
• This is actually a dot product:
  o W = [w[0] w[1] .. w[N]]^T
  o S_t = [s[t] s[t+1] .. s[t+N]]^T
  o Corr(W, S, t) = W · S_t
• Bob can compute Encrypt[Corr(W, S, t)] if Alice sends him Encrypt[W]
• Assumption: all data are integers
  o Encryption is performed over finite fields large enough to hold all the numbers

31

Page 32: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Paillier Encryption
• Alice uses Paillier encryption
  o Random primes P, Q
  o N = PQ
  o g = N+1
  o L = (P−1)(Q−1)
  o M = L^{-1} mod N
  o Public key = (N, g), private key = (L, M)
  o Encrypt message m from (0…N−1):
     Enc[m] = g^m · r^N mod N^2   (r a random number)
     Dec[u] = { [ (u^L mod N^2) − 1 ] / N } · M mod N
• Important property (ignoring the random factors r^N):
  o Enc[m] ~ g^m
  o Enc[m] · Enc[v] ~ g^m · g^v = g^(m+v) ~ Enc[m+v]
  o (Enc[m])^v ~ Enc[v·m]

32
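A toy Python sketch of the scheme above, with deliberately tiny primes (real deployments use primes of 1024 bits or more); the helper names are ours, not part of any library. The two checks at the end exercise exactly the homomorphic properties listed on the slide.

    import random
    from math import gcd

    def keygen(p=1009, q=1013):
        """Toy Paillier keypair (g = N+1 variant, as on the slide)."""
        n = p * q
        g = n + 1
        lam = (p - 1) * (q - 1)            # "L" on the slide
        mu = pow(lam, -1, n)               # "M" = L^{-1} mod N (requires Python 3.8+)
        return (n, g), (lam, mu)

    def encrypt(pub, m):
        n, g = pub
        r = random.randrange(1, n)
        while gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)   # Enc[m] = g^m r^N mod N^2

    def decrypt(pub, priv, c):
        n, _ = pub
        lam, mu = priv
        u = pow(c, lam, n * n)
        return (((u - 1) // n) * mu) % n   # Dec[u] = { [(u^L mod N^2) - 1] / N } * M mod N

    pub, priv = keygen()
    n2 = pub[0] ** 2
    a, b = 17, 25
    assert decrypt(pub, priv, (encrypt(pub, a) * encrypt(pub, b)) % n2) == a + b   # Enc[a]*Enc[b] -> a+b
    assert decrypt(pub, priv, pow(encrypt(pub, a), b, n2)) == a * b                # Enc[a]^b  -> a*b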

Page 33: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Solving Alice’s Problem
• Alice generates public and private keys
• She encrypts her snippet using her public key and sends it to Bob:
  o Alice → Bob : Enc[W] = Enc[w[0]], Enc[w[1]], … Enc[w[N]]
• To compute the dot product with any window s[t]..s[t+N]:
  o Bob computes Π_i Enc[w[i]]^{s[t+i]} = Enc[ Σ_i w[i]·s[t+i] ] = Enc[Corr(W, S, t)]
• Bob can compute the encrypted correlations from Alice’s encrypted input without needing even the public key
• The above technique is the Secure Inner Product (SIP) protocol

33
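Bob's side of SIP could then look like the following sketch (it reuses keygen/encrypt/decrypt from the toy Paillier block above); note that here Bob does use the public modulus to keep the numbers small, although, as the slide observes, he could in principle work over the integers without it.

    # Reuses keygen/encrypt/decrypt from the toy Paillier sketch above.

    def secure_inner_product(pub, enc_w, s):
        """Bob's side of SIP: from Alice's Enc[w] and his own plaintext window s, produce Enc[w . s]."""
        n, _ = pub
        nsq = n * n
        acc = encrypt(pub, 0)                     # Enc[0]: identity for homomorphic addition
        for ew, si in zip(enc_w, s):
            acc = (acc * pow(ew, si, nsq)) % nsq  # Enc[w_i]^{s_i} = Enc[w_i * s_i]; the product sums them
        return acc

    pub, priv = keygen()
    w = [3, 1, 4, 1, 5]                           # Alice's snippet samples (toy integers)
    s = [2, 7, 1, 8, 2]                           # one window of Bob's song
    enc_w = [encrypt(pub, wi) for wi in w]        # Alice -> Bob
    assert decrypt(pub, priv, secure_inner_product(pub, enc_w, s)) == sum(a * b for a, b in zip(w, s))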

Page 34: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Solving Alice’s Problem - 2
• Given the encrypted snippet Enc[W] from Alice
• Bob computes Enc[Corr(S, W, t)] for all S and all t
  o i.e. he computes the correlation of Alice’s snippet with all of his songs at all lags
  o The correlations are encrypted, however
• Bob only has to find which Corr(S, W, t) is the highest
  o And send the metadata for the corresponding S to Alice
• Except Bob cannot find argmax_S max_t Corr(S, W, t)
  o He only has encrypted correlation values

34

Page 35: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Bob Solves the Problem
• Bob ships the encrypted correlations back to Alice
  o She can decrypt them all, find the maximum value, and send the index back to Bob
  o Bob retrieves the song corresponding to the index
  o Problem – Bob effectively sends his catalogue over to Alice
• Problems for Bob
  o Alice may be a competitor pretending to be a customer
     She is phishing to find out what Bob has in his catalogue
  o She can find out what songs Bob has by comparing the correlations against those from a catalogue of her own
     Even if Bob sends the correlations in permuted order

35

Page 36: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The SMAX protocol
• Alice sends her public key to Bob
• Bob adds a random number to each correlation
  o Enc[Corr_noisy(W, S, t)] = Enc[Corr(W, S, t)] · Enc[r] = Enc[Corr(W, S, t) + r]
  o A different random number corrupts each correlation
• Bob permutes the noisy correlations with a permutation π and sends them to Alice
  o Bob → Alice : π( Enc[Corr_noisy(W, S, t)] ) for all S, t
• Alice can decrypt them to get π( Corr_noisy(W, S, t) )
• Bob retains π(−r), i.e. the permuted set of random numbers

36

Page 37: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The SMAX protocol
• Bob has π(−r); Alice has π( Corr(W, S, t) + r )
• Bob generates a public/private key pair, encrypts π(−r), and sends it along with his public key to Alice
  o Bob → Alice : Enc_Bob[ π(−r) ]
• Alice encrypts the noisy correlations she has under Bob’s key and “adds” them to what she got from Bob:
  o Enc_Bob[ π(Corr(W, S, t) + r) ] · Enc_Bob[ π(−r) ] = Enc_Bob[ π(Corr(W, S, t)) ]
• Alice now has encrypted and permuted correlations.

Page 38: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The SMAX protocol
• Alice now has encrypted and permuted correlations: Enc_Bob[ π(Corr(W, S, t)) ]
• Alice adds a random number q:
  o Enc_Bob[ π(Corr(W, S, t)) ] · Enc_Bob[q] = Enc_Bob[ π(Corr(W, S, t)) + q ]
  o The same number q corrupts everything
• Alice permutes everything with her own permutation σ and ships it all back to Bob:
  o Alice → Bob : σ( Enc_Bob[ π(Corr(W, S, t)) + q ] )

38

Page 39: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

The SMAX protocol
• Bob has : σ( Enc_Bob[ π(Corr(W, S, t)) + q ] )
• Bob decrypts them and finds the index of the largest value, I_max
• Bob un-permutes I_max and sends the result to Alice:
  o Bob → Alice : π^{-1}( I_max )
• Alice un-permutes it with her own permutation to get the true index of the song in Bob’s catalogue:
  o σ^{-1}( π^{-1}( I_max ) ) = Idx
• The above technique is insecure as described; in reality, the process is more involved
• Alice finally has the index in Bob’s catalogue for her song
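A single-process simulation of the blind-and-permute flow on the last four slides, again reusing the toy Paillier helpers; the key sizes, noise ranges, and the final index bookkeeping (done in one place here, but split between Alice and Bob in the real protocol) are ours, and, as the slide notes, the real protocol is more involved.

    import random
    # Reuses keygen/encrypt/decrypt from the toy Paillier sketch above.

    def smax_demo(corrs):
        """Blind-and-permute walk-through: only the index of the largest correlation is recovered."""
        pub_a, priv_a = keygen(1009, 1013)        # Alice's keypair
        pub_b, priv_b = keygen(1019, 1021)        # Bob's keypair
        na, nb = pub_a[0], pub_b[0]

        # Correlations arrive at Bob encrypted under Alice's key (output of SIP)
        enc_corrs = [encrypt(pub_a, c) for c in corrs]

        # Bob: blind each score with its own random r, then permute with pi
        rs = [random.randrange(0, 100) for _ in corrs]
        pi = list(range(len(corrs))); random.shuffle(pi)
        blinded = [(enc_corrs[i] * encrypt(pub_a, rs[i])) % (na * na) for i in pi]

        # Alice: decrypts, but sees only blinded values in Bob's order
        noisy = [decrypt(pub_a, priv_a, c) for c in blinded]

        # Bob -> Alice: Enc_Bob[-r] in the same permuted order ((nb - r) stands in for -r mod nb)
        enc_neg_r = [encrypt(pub_b, (nb - rs[i]) % nb) for i in pi]

        # Alice: cancel the noise under Bob's key, add one shared offset q, permute with sigma
        q = random.randrange(0, 100)
        cleaned = [(encrypt(pub_b, v + q) * e) % (nb * nb) for v, e in zip(noisy, enc_neg_r)]
        sigma = list(range(len(corrs))); random.shuffle(sigma)
        to_bob = [cleaned[j] for j in sigma]

        # Bob: decrypts doubly-permuted, q-shifted scores; q does not change the argmax
        vals = [decrypt(pub_b, priv_b, c) for c in to_bob]
        i_max = max(range(len(vals)), key=vals.__getitem__)

        # Undoing both permutations (done jointly in the real protocol) recovers the true index
        return pi[sigma[i_max]]

    print(smax_demo([12, 47, 8, 30]))   # expect 1, the position of the largest correlation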

Page 40: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Retrieving the Song
• Alice cannot simply send the index of the best song to Bob
  o It will tell Bob which song it is
     The song may not be available for public download, i.e. Alice’s snippet is illegal

40


Page 42: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Retrieving Alice’s song: OT
• Alice sends Enc[Idx] to Bob (along with her public key)
  o Alice → Bob : Enc[Idx]
• For each song S_i, Bob computes
  o Enc[M(S_i)] · ( Enc[−i] · Enc[Idx] )^{r_i} = Enc[ M(S_i) + r_i·(Idx − i) ]
  o Bob sends this to Alice for every song
     Bob → Alice : Enc[ M(S_i) + r_i·(Idx − i) ]
  o M(S_i) is the metadata for S_i
• Alice only decrypts the Idx-th entry to get M(S_Idx)
  o Decrypting the rest is pointless
     Decrypting any other entry will only give M(S_i) + r_i·(Idx − i), which is blinded by the random r_i
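A sketch of this retrieval step with the toy Paillier helpers; the metadata values, the wanted index, and the blinding range are illustrative stand-ins.

    import random
    # Reuses keygen/encrypt/decrypt from the toy Paillier sketch above.

    pub, priv = keygen()
    n = pub[0]; nsq = n * n

    metadata = [101, 202, 303, 404]     # numeric stand-ins for M(S_i)
    idx = 2                             # the index Alice wants, kept private from Bob
    enc_idx = encrypt(pub, idx)         # Alice -> Bob : Enc[Idx]

    # Bob: for each song i, compute Enc[M(S_i) + r_i * (Idx - i)]
    replies = []
    for i, m in enumerate(metadata):
        r = random.randrange(1, 1000)
        enc_idx_minus_i = (enc_idx * encrypt(pub, (n - i) % n)) % nsq   # Enc[Idx - i]
        replies.append((encrypt(pub, m) * pow(enc_idx_minus_i, r, nsq)) % nsq)

    # Alice: only the idx-th entry decrypts to clean metadata; the others are blinded by r_i
    assert decrypt(pub, priv, replies[idx]) == metadata[idx]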

Page 43: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Unrealistic solution
• The proposed method is computationally unrealistic
  o Public-key encryption is very slow
  o Repeated shipping of the entire catalogue
     Can be simplified to a single shipping of the catalogue; still inefficient
• It illustrates a few primitives
  o SIP: Secure Inner Product; computing dot products of private data
  o SMAX: finding the index of the largest entry without revealing the entries
     Also possible – SVAL: finding the largest entry (and nothing else)
  o OT (oblivious transfer) as a mechanism for retrieving information
• We have ignored a few issues...

43

Page 44: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Security Picture
• Assumption: honest but curious
• What do they gain by being malicious?
  o Bob: manipulating numbers
     Can prevent Alice from learning what she wants
     Bob learns nothing of Alice’s snippet
  o Alice: can only prevent herself from learning the identity of her song
• Key assumption:
  o It is sufficient if malicious activity does not provide immediate benefit to the agent

44

Page 45: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Pattern matching for voice: Basics

• The basic approach includes
  o A feature computation module
     Derives features from the audio
  o A pattern matching module
     Stores “models” for the various classes
• Assumption: feature computation is performed by the entity holding the voice data

45

[Diagram: Feature Computation → Pattern Matching]

Page 46: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Pattern matching in Audio: The basic tools

• Gaussian mixtures
  o Class distributions are generally modelled as Gaussian mixtures
  o “Score” computation is equivalent to computing log likelihoods rather than correlations
• Hidden Markov models
  o Time series models are usually hidden Markov models
  o With Gaussian mixture distributions as state output densities
     Conceptually not substantially more complex than Gaussian mixtures

46

Page 47: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Computing a log Gaussian

• The Gaussian likelihood for a vector X, given a Gaussian with mean μ and covariance Σ, is given by

     G(X; μ, Σ) = (1 / sqrt( (2π)^D |Σ| )) · exp( −0.5 (X−μ)^T Σ^{-1} (X−μ) )

• The log Gaussian is given by

     log G(X; μ, Σ) = −0.5 (X−μ)^T Σ^{-1} (X−μ) + C

• With a little arithmetic manipulation, we can write this as a quadratic form in the extended vector [X^T 1]^T:

     log G(X; μ, Σ) = −0.5 [X^T 1] W [X^T 1]^T + D

  where W is a matrix built from Σ^{-1} and μ, and C, D are constants

47

Page 48: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Computing a log Gaussian

• Which can be further modified to

     log G(X; μ, Σ) = W̃ · X̃

• W̃ is a vector containing all the components of W
• X̃ is a vector containing a 1, X itself, and all pairwise products X_i X_j

• In other words, the log Gaussian is an inner product
  o If Bob has the parameters W̃ and Alice has the data X̃, they can compute the log Gaussian without exposing the data, using SIP
  o By extension, log Mixture Gaussians can be computed privately

48
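A numeric check of this rewrite: the log Gaussian packed into a single parameter vector that is dotted with an extended feature vector. The packing order (a 1, then x, then all pairwise products) is one of several equivalent conventions and is chosen here only for illustration.

    import numpy as np

    def gaussian_as_inner_product(mu, sigma):
        """Pack log N(x; mu, Sigma) into one parameter vector W~ so that log-likelihood = W~ . X~."""
        prec = np.linalg.inv(sigma)
        d = len(mu)
        const = -0.5 * (d * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)) + mu @ prec @ mu)
        w_lin = prec @ mu                  # coefficients of the x terms
        w_quad = -0.5 * prec               # coefficients of the x_i * x_j terms
        return np.concatenate(([const], w_lin, w_quad.ravel()))

    def extend(x):
        """Alice's side: the feature vector expanded with a 1 and all pairwise products x_i * x_j."""
        return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

    rng = np.random.default_rng(0)
    mu = rng.standard_normal(3)
    a = rng.standard_normal((3, 3)); sigma = a @ a.T + 3 * np.eye(3)
    x = rng.standard_normal(3)
    direct = -0.5 * ((x - mu) @ np.linalg.inv(sigma) @ (x - mu)
                     + 3 * np.log(2 * np.pi) + np.log(np.linalg.det(sigma)))
    assert np.isclose(gaussian_as_inner_product(mu, sigma) @ extend(x), direct)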

Page 49: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Keyword Spotting
• Very similar to song matching
• The “snippet” is now the keyword
• A “song” is now a recording
• Instead of finding a single best match, we must find every recording in which a “correlation” score exceeds a threshold
• Complexity is comparable to song retrieval
  o Actually worse
     We will use HMMs instead of sample sequences, and correlations are replaced by best-state-sequence scores (see the sketch below)

49
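A minimal sketch of the best-state-sequence score that replaces the raw correlation: a log-domain Viterbi pass over a keyword HMM, followed by thresholding. The model shapes, helper names, and random "frame likelihoods" are illustrative assumptions; a real spotter would also normalise against a background/filler model.

    import numpy as np

    def viterbi_score(log_obs, log_trans, log_init):
        """Best state-sequence log score of one window under an HMM.
        log_obs[t, j] = log P(frame t | state j); log_trans[i, j] = log P(state j | state i)."""
        delta = log_init + log_obs[0]
        for t in range(1, len(log_obs)):
            delta = np.max(delta[:, None] + log_trans, axis=0) + log_obs[t]
        return delta.max()

    def spot_keyword(window_scores, threshold):
        """Flag every window whose keyword-HMM score exceeds the threshold."""
        return [t for t, s in enumerate(window_scores) if s > threshold]

    # Toy 3-state left-to-right keyword model applied to random per-state frame log-likelihoods
    rng = np.random.default_rng(0)
    log_trans = np.log(np.array([[0.6, 0.4, 0.0],
                                 [0.0, 0.6, 0.4],
                                 [0.0, 0.0, 1.0]]) + 1e-12)
    log_init = np.log(np.array([1.0, 1e-12, 1e-12]))
    windows = [rng.standard_normal((20, 3)) for _ in range(5)]
    scores = [viterbi_score(w, log_trans, log_init) for w in windows]
    print(spot_keyword(scores, threshold=float(np.median(scores))))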

Page 50: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

A Biometric Problem: Authentication

• Passwords and text-based forms of authentication are physically insecure
  o Passwords may be stolen, guessed, etc.
• Biometric authentication is based on “non-stealable” signatures
  o Fingerprints
  o Iris
  o Voice
     We will ignore the problem of voice mimicry...

50

Page 51: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Voice Authentication

• The user identifies himself
  o May also authenticate himself/herself via text-based authentication
• The system issues a challenge
  o The user is often asked to say a chosen sentence
  o Alternately, the user may say anything
• The system compares the voiceprint to a stored model for the user
  o And authenticates the user if the print matches.

51

Page 52: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Authentication
• Most common approach
  o Gaussian mixture distribution for the user
     Learned from enrollment data recorded by the user
  o Gaussian mixture distribution for the “background”
     Learned from a large number of speakers
  o Compare the log likelihood under the “speaker” Gaussian mixture with that under the “background” Gaussian mixture to authenticate (see the sketch below)
• Privately done:
  o Compute all log likelihoods on encrypted data Enc[X]
  o Engage in a protocol to determine which is higher
     To authenticate the user without “seeing” the user’s data

52
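The plaintext version of this speaker-vs-background test might look like the following sketch (diagonal-covariance mixtures, assumed parameter shapes, toy data); in the private variant each log-likelihood would instead be computed over encrypted data via SIP, and the final comparison via a secure protocol.

    import numpy as np

    def gmm_loglik(frames, weights, means, covs):
        """Average per-frame log-likelihood under a diagonal-covariance Gaussian mixture."""
        ll = []
        for x in frames:
            comp = (np.log(weights)
                    - 0.5 * np.sum(np.log(2 * np.pi * covs) + (x - means) ** 2 / covs, axis=1))
            ll.append(np.logaddexp.reduce(comp))
        return float(np.mean(ll))

    def authenticate(frames, speaker_gmm, background_gmm, threshold=0.0):
        """Accept if the speaker model beats the background (UBM) model by more than the threshold."""
        score = gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *background_gmm)
        return score > threshold

    # Toy models and test frames drawn near one speaker component
    rng = np.random.default_rng(1)
    K, D = 4, 12
    spk = (np.full(K, 1 / K), rng.standard_normal((K, D)), np.ones((K, D)))
    ubm = (np.full(K, 1 / K), 2 * rng.standard_normal((K, D)), 4 * np.ones((K, D)))
    frames = spk[1][0] + 0.5 * rng.standard_normal((50, D))
    print(authenticate(frames, spk, ubm))   # expect True for this toy setup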

Page 53: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Private Voice Authentication
• Protecting the user’s voice is perhaps less critical
  o It actually opens up security issues not present in the “open” mechanism
     OT can be manipulated and must be protected from a malicious user
  o May be useful for surveillance-type scenarios
     The NSA problem
• Bigger problem
  o The “authenticating” site may be phishing
     Using the learned speaker models to synthesize fake recordings of the speaker
     To spoof the speaker at other voice authentication sites

53

Page 54: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Private Voice Authentication
• Assumption:
  o The user is using a client on a computer or a smartphone
     Which can be used for encryption and decryption
• Protocol for a modified learning procedure:
  o Learn the user model from encrypted recordings
     The system finally obtains a set of encrypted models
     It cannot decrypt the models
     – So it is unable to phish
• Assumption 2:
  o Motivated user: will not provide fake data during enrollment

54

Page 55: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Authenticating with encrypted models
• The authentication procedure changes
  o The system can no longer compute dot products on encrypted data
     Since the models are themselves encrypted
• The user must perform some of the operations
  o These must be carried out without the system observing partial results
• Malicious user
  o An imposter who has stolen Alice’s smartphone
     Has her encryption keys!
     Can manipulate the data sent by Alice during the shared computation

55

Page 56: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Malicious User
• The user intends to break into the system
  o He is happy if the probability of false acceptance increases even just slightly
     It improves his chance of breaking in
• System:
  o Is satisfied if manipulating partial results of the computation results in a false acceptance rate no greater than that obtained with random data

56

Page 57: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Protecting from the Malicious User
• Modify the likelihood computation
• Instead of computing
  o L(X, W_speaker), and
  o L(X, W_background)
  compute
  o J(X, g(W_speaker, W_background))
  o J(X, h(W_speaker, W_background))
• The system “unravels” these to obtain Enc[L(X, W_speaker)] and Enc[L(X, W_background)]
  o P( L(X, W_speaker) > L(X, W_background) ) is comparable to that for random X if the user manipulates intermediate results
• The system finally engages in a millionaire protocol designed for malicious users to decide whether the user is authentic

57

Page 58: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Speaker Authentication
• We are ignoring many issues w.r.t. the malicious user
• Computation overhead is large
• Signal processing techniques: simple, but insecure
• Cryptographic methods: secure, but complex
  o Is there a golden mean?

58

Page 59: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

An Efficient Mechanism
• Find a method to convert a voice recording to a sequence of symbols
• The sequence must be such that:
  o The sequence s(X) for any recording X must belong to a small set S of sequences with high probability if the speaker is the target speaker
  o The sequence must not belong to S with high probability if the speaker is not the target speaker
• This reduces to the conventional password approach
  o S is the set of passwords for the speaker
     Any of which can authenticate him

Page 60: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Converting speaker’s audio to symbol sequence

• Is as secure as a conventional password system
  o The speaker’s client converts voice to a symbol sequence
  o The symbol sequence is shipped to the system as a password
  o The speaker’s “client” maintains no record of the valid set of passwords
     Cannot be hacked
  o The system only stores encrypted passwords
     Cannot voice-phish
• Can improve false alarm / false rejection rates by using multiple mappings...

60

Page 61: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Converting the Speaker’s voice to Symbol Sequences
• Technique 1:
  o A generic HMM that represents all speech
  o The most likely state sequences are the “password” set for the speaker
• Learn the model parameters to be optimal for the speaker

61

Page 62: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Converting the Speaker’s voice to Symbol Sequences
• Technique 2:
  o Locality sensitive hashing (LSH)
• The hashes are the “passwords”
  o The hashing must be learned to have the desired password properties (a minimal sketch follows)

62
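A minimal sketch of Technique 2 using plain random-projection (sign) hashing; on the slide the hash functions would be learned to give the desired password properties rather than drawn at random, and the feature vector here is a stand-in for a real speaker representation.

    import numpy as np

    def lsh_keys(feature_vec, planes_list):
        """Map a feature vector to one short binary 'password' per table of random hyperplanes."""
        keys = []
        for planes in planes_list:                     # planes: (n_bits, dim)
            bits = (planes @ feature_vec > 0).astype(int)
            keys.append("".join(map(str, bits)))
        return keys

    rng = np.random.default_rng(7)
    dim, n_bits, n_tables = 256, 16, 8
    planes_list = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]

    enrol = rng.standard_normal(dim)                      # enrollment representation
    test_same = enrol + 0.05 * rng.standard_normal(dim)   # same speaker, slight variation
    test_diff = rng.standard_normal(dim)                  # different speaker

    same_hits = sum(a == b for a, b in zip(lsh_keys(enrol, planes_list), lsh_keys(test_same, planes_list)))
    diff_hits = sum(a == b for a, b in zip(lsh_keys(enrol, planes_list), lsh_keys(test_diff, planes_list)))
    print(same_hits, diff_hits)   # expect same_hits >> diff_hits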

Page 63: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Other voice problems
• Problems currently being addressed:
  o Keyword detection (for mining)
  o Authentication
  o Speaker identification
• Speech recognition is a much tougher problem
  o Infinite-class classification
  o Requires privacy-preserving graph search

63

Page 64: Bhiksha Raj Pittsburgh, PA 15213 21 April 2011

Work in progress
• Efficient / practical authentication schemes
• Efficient / practical computational schemes for voice processing
  o On the side: investigations into privacy-preserving classification schemes
  o Differential privacy / combining DP and SMC
• Better signal processing schemes
• Database privacy: issues and solutions...

• QUESTIONS?