Privacy preserving similarity detection for data analysis

Iraklis Leontiadis (1), Melek Önen (1), Refik Molva (1), M.J. Chorley (2), G.B. Colombo (2)

CSAR 2013

(1) Eurecom - France, (2) Cardiff - UK


Privacy vs Utility

[Figure: data sets A1, A2, A3, …, An and B1, B2, B3, …, Bn, with the labels "Personality test", "Similarity" and "Clustering".]

Naïve solutions

• Encrypt data with standard crypto – Renders operations infeasible.

• Data separation – Vertical separation is not always applicable.

• Anonymizing techniques – Don't protect individuals' data.


Our Approach

• Combine crypto with data processing

User     Data                    Data analysis
Alice    $A'_1, \dots, A'_n$     $F(A'_1, \dots, A'_n)$
Bob      $B'_1, \dots, B'_n$     $F(B'_1, \dots, B'_n)$

$F(A_1, \dots, A_n) = F(A'_1, \dots, A'_n)$

[Figure: transformed data A'1, A'2, A'3, …, A'n and B'1, B'2, B'3, …, B'n.]

Outline

• Our solution
  – Cosine similarity
  – Privacy with Geometrical Transformations

• Security Analysis

• Performance Evaluation
  – Hierarchical clustering
  – Results

• Looking Ahead


Cosine similarity

[Figure: two vectors A and B separated by the angle θ°. A dictionary of words w1, w2, w3, w4, …, wn maps the items F1 and F2 to the binary vectors A and B.]

F1: "Next CSAR workshop will be held in Karlsruhe"

F2: "Next CSAR workshop will be held in London"
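The cosine-similarity slide can be illustrated with a minimal sketch. It assumes a binary bag-of-words encoding over a dictionary built from the two example sentences; the helper names `bag_of_words` and `cosine_similarity` and the way the dictionary is built are illustrative assumptions, not taken from the slides.

```python
import math


def bag_of_words(sentence, dictionary):
    """Binary vector: 1 if the dictionary word occurs in the sentence, else 0."""
    words = set(sentence.lower().split())
    return [1 if word in words else 0 for word in dictionary]


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


f1 = "Next CSAR workshop will be held in Karlsruhe"
f2 = "Next CSAR workshop will be held in London"

# Hypothetical dictionary: every distinct word of the two sentences.
dictionary = sorted(set(f1.lower().split()) | set(f2.lower().split()))

a = bag_of_words(f1, dictionary)
b = bag_of_words(f2, dictionary)
similarity = cosine_similarity(a, b)
print(similarity)
```

With this encoding the two sentences share 7 of their 8 words, so the similarity is 7/8 up to floating-point rounding.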


Random Scaling

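• Data encoded as unique vectors in $\mathbb{R}^n$

• $\varphi_r : \mathbb{R}^n \to \mathbb{R}^n$ s.t.:

$\cos(a, b) = \cos(\varphi_{r_1}(a), \varphi_{r_2}(b))$

• Random scaling

– $r \leftarrow \mathbb{R}^n$

– $S(r, A) = r \cdot A = \begin{pmatrix} r & \cdots & \\ \vdots & r & \vdots \\ & \cdots & r \end{pmatrix} \cdot A$

[Figure: two diagrams, each marked with the angle θ°.]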
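A small numerical check of the scaling property, under stated assumptions: each vector is multiplied by its own random positive scalar (a negative factor would flip the sign of the cosine), and the vectors and the sampling range are made up for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def scale(r, v):
    """S(r, v): multiply every coordinate of v by the same scalar r."""
    return [r * x for x in v]


a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

# Assumption: independent positive scaling factors, one per user.
r1 = random.uniform(0.1, 10.0)
r2 = random.uniform(0.1, 10.0)

before = cosine_similarity(a, b)
after = cosine_similarity(scale(r1, a), scale(r2, b))
print(before, after)
```

The two printed values agree up to rounding because the factors r1 and r2 cancel between the dot product and the norms.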


Vector Rotation

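• Rotation by a common angle λ°

– $R_{\lambda^\circ}(a) = a \cdot \begin{pmatrix} \cos(\lambda^\circ) & \cdots & \sin(\lambda^\circ) \\ \vdots & & \vdots \\ -\sin(\lambda^\circ) & \cdots & \cos(\lambda^\circ) \end{pmatrix}$

• $\varphi_r(a) = R_{\lambda^\circ}(S_r(a))$

[Figure: vectors F1 and F2 and their rotated images F1' and F2', with the angle θ° marked.]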
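The combined transformation can be sketched in two dimensions. This is an illustration under assumptions, not the authors' implementation: the slide's n-dimensional rotation matrix is not fully recoverable, so the sketch uses the 2-D matrix [[cos λ, sin λ], [-sin λ, cos λ]] applied to row vectors, and the names `rotate`, `scale` and `phi` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def scale(r, v):
    return [r * x for x in v]


def rotate(v, angle_deg):
    """Row vector v times [[cos, sin], [-sin, cos]] for a 2-D vector."""
    lam = math.radians(angle_deg)
    c, s = math.cos(lam), math.sin(lam)
    x, y = v
    return [x * c - y * s, x * s + y * c]


def phi(r, angle_deg, v):
    """Scale by a per-user factor r, then rotate by the common angle."""
    return rotate(scale(r, v), angle_deg)


a = [3.0, 1.0]
b = [1.0, 2.0]
common_angle = 37.0  # shared by all users (arbitrary value for the demo)

a_t = phi(2.5, common_angle, a)  # user 1 picks r1 = 2.5
b_t = phi(0.4, common_angle, b)  # user 2 picks r2 = 0.4

before = cosine_similarity(a, b)
after = cosine_similarity(a_t, b_t)
print(before, after)
```

The angle between the two vectors survives because both are rotated by the same angle, while the individual coordinates change.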


Our solution

Dimension reduction: $A \rightarrow A_1, A_2, A_3$

Random Scaling:
$S(r_1, A_1) = r_1 \cdot A_1$
$S(r_2, A_2) = r_2 \cdot A_2$
$S(r_3, A_3) = r_3 \cdot A_3$

Rotation:
$R_{\lambda^\circ}(r_1 \cdot A_1) = R_{\lambda^\circ} \cdot r_1 \cdot A_1$
$R_{\lambda^\circ}(r_2 \cdot A_2) = R_{\lambda^\circ} \cdot r_2 \cdot A_2$
$R_{\lambda^\circ}(r_3 \cdot A_3) = R_{\lambda^\circ} \cdot r_3 \cdot A_3$

Security analysis

$V'_1 = R_{\lambda^\circ}\big(S(r_1, (d_1, d_2)),\ S(r_2, (d_3, d_4)),\ S(r_3, (d_1, d_5))\big)$

• Internal:

– Rotation angle is known.

• External:

– Rotation angle remains unknown.


Security analysis cont’d

Per-user equivalent coefficients are exposed as auxiliary information.

[Figure: sub-vectors scaled by r1, r2, r3 and multiplied by the rotation matrix $\begin{pmatrix} \cos(\lambda^\circ) & \cdots & \sin(\lambda^\circ) \\ \vdots & & \vdots \\ -\sin(\lambda^\circ) & \cdots & \cos(\lambda^\circ) \end{pmatrix}$, shown with a "?" label.]

Evaluation

• 173 users willing to run the 4sqPersonality test

• 5-factor personality test
  – Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.

Clustering approach

• Hierarchical Agglomerative Clustering (HAC)
  – Input: n points and an n × n similarity matrix
  – Output: a single cluster containing all n points

    C = MakeSingletonClusters();
    for i = 0 to n:
        Find "closest" clusters c1, c2;
        Merge(c1, c2);
        RecomputeDistances(C);
        if #C = 1: exit();

Agglomerative: O(n^3)    Divisive: O(2^n)

Cosine Similarity
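The HAC pseudocode can be turned into a small runnable sketch. It is a naive illustration, not the implementation used in the evaluation: the function name `hac`, the `linkage` parameter and the 4-point similarity matrix are all made up for the example. Passing `max` gives single-link behaviour and `min` gives complete-link behaviour.

```python
def hac(similarity, linkage=max):
    """Naive agglomerative clustering over a similarity matrix.

    Returns the merge history as (cluster_1, cluster_2, similarity) tuples.
    """
    clusters = [[i] for i in range(len(similarity))]  # singleton clusters
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the "closest" (most similar) pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = linkage(
                    similarity[p][q] for p in clusters[i] for q in clusters[j]
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), sim))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical similarity matrix for 4 points.
S = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

merges = hac(S)
for left, right, sim in merges:
    print(left, right, sim)
```

Recomputing every cluster-to-cluster similarity from scratch in each round keeps the code short; a practical implementation would cache and update these values instead.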

Presenter notes:

Single-linkage vs complete linkage: In single-link (single-linkage) clustering, the similarity of two clusters is the similarity of their most similar members. In complete-link (complete-linkage) clustering, the similarity of two clusters is the similarity of their most dissimilar members.

Agglomerative vs divisive: O(n^3) vs O(2^n)

Why not K-means?
- K-means is extremely sensitive to cluster center initialization
- It is difficult to predict the k-value
- Hierarchical clustering can give different partitionings depending on the level of resolution we are looking at
- Flat clustering needs the number of clusters to be specified
- Hierarchical clustering doesn't need the number of clusters to be specified
- No clear consensus on which of the two produces better clustering
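The two linkage rules from the notes can be compared on a toy example. The matrix and the function names `single_link` and `complete_link` are hypothetical and serve only to show how the two definitions differ for the same pair of clusters.

```python
def single_link(similarity, c1, c2):
    """Similarity of the most similar pair of members."""
    return max(similarity[p][q] for p in c1 for q in c2)


def complete_link(similarity, c1, c2):
    """Similarity of the most dissimilar pair of members."""
    return min(similarity[p][q] for p in c1 for q in c2)


# Hypothetical similarity matrix for 4 points.
S = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

left, right = [0, 1], [2, 3]
single = single_link(S, left, right)      # best cross-cluster pair: (1, 2)
complete = complete_link(S, left, right)  # worst cross-cluster pair: (0, 3)
print(single, complete)
```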

Results

Presenter notes:

Equivalent clusters between encrypted and unencrypted data.
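One way to see why the clusters can coincide is to compare the pairwise similarity matrices before and after the transformation. The sketch below uses synthetic 2-D vectors rather than the 173-user data set, a per-user positive scaling factor and a common 2-D rotation; all names and value ranges are assumptions made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def transform(v, r, angle_deg):
    """Scale a 2-D vector by r, then rotate it by angle_deg."""
    lam = math.radians(angle_deg)
    c, s = math.cos(lam), math.sin(lam)
    x, y = r * v[0], r * v[1]
    return [x * c - y * s, x * s + y * c]


def similarity_matrix(vectors):
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


rng = random.Random(0)  # fixed seed so the demo is repeatable
users = [[rng.uniform(1, 5), rng.uniform(1, 5)] for _ in range(6)]
angle = rng.uniform(0, 360)  # common rotation angle
hidden = [transform(v, rng.uniform(0.5, 5.0), angle) for v in users]

plain = similarity_matrix(users)
masked = similarity_matrix(hidden)

# Largest entry-wise difference between the two matrices.
max_gap = max(
    abs(p - m) for p_row, m_row in zip(plain, masked) for p, m in zip(p_row, m_row)
)
print(max_gap)
```

Since hierarchical clustering only takes the similarity matrix as input, two matrices that agree up to rounding lead to the same merges, barring floating-point ties.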

Recap

1. Pairwise cosine similarity for multidimensional vectors.

2. Geometrical transformations compatible with cosine similarity.


Looking Ahead

• Other privacy preserving similarity detection algorithms.

• Privacy preserving data analysis algorithms:
  – MAX, MIN

Thank you! Iraklis Leontiadis
