A Study on Privacy Level in Publishing Data of Smart Tap Network

Post on 08-Jul-2015


Using entropy to quantify privacy level when publishing smart grid data.


A Study on Privacy Level in Publishing Data of Smart Tap Network

The University of Tokyo Esaki Laboratory

Tran Quoc Hoan 2014.03.18@Niigata

1

Outline

1. Background & Purpose

2. Related works

3. Proposal

4. Methodology

5. Result & Discussion

6. Conclusion

2

Background & Purpose

• Background

1. Smart tap & Big data

2. Privacy Preserving Data Publishing (PPDP)

3. Difficulty in anonymising time series data

• Research purpose

• Using entropy to quantify the risk of publishing smart tap’s data

[Figure: PPDP flow: data collection from Alice, Bob and Peter into the original dataset; the data processor anonymises the data; data publishing to the data recipient]

3

Related works

1. Smart Metering & Privacy (Quinn, 2009)

2. Time series chaos analysis in physiology

• Approximate Entropy (Pincus, 1992)

• Bias effect (Ex. random noise)

• Sample Entropy (Richman, 2001)

• Avoids the bias effect

• Differs from the original entropy definition

4


Proposal(1): Privacy Level

• “Privacy level” = quantity of human activity information in power consumption data (%)

[Figure: three panels of power value vs. time points: Refrigerator (regularity), White-noise (irregularity), Laptop (???), shown with a "Privacy level" axis]

• Evaluation of regularity (entropy)

5


Proposal(2): Entropy rate

• Entropy Rate = Entropy(data)/Entropy(white-noise)
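The ratio above can be sketched in a few lines. This is a minimal illustration, assuming both entropies come from the same estimator and parameters; the function name and the clipping to [0, 1] are assumptions made here, not something the slides specify.

```python
def entropy_rate(data_entropy: float, white_noise_entropy: float) -> float:
    """Entropy(data) / Entropy(white-noise), clipped to the range [0, 1].

    The clipping is an assumption for illustration: the slides draw the
    entropy rate on a 0..1 axis but do not say how out-of-range values
    are handled.
    """
    if white_noise_entropy <= 0:
        raise ValueError("white-noise entropy must be positive")
    rate = data_entropy / white_noise_entropy
    return min(max(rate, 0.0), 1.0)


# Illustrative numbers only (not measurements from the slides).
print(entropy_rate(1.0, 2.0))
```

Normalising by the white-noise entropy puts devices with different value ranges on a common scale, which is what allows one pair of thresholds to be applied to all of them.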

[Figure: entropy rate axis from 0 to 1 with thresholds LRate and HRate and two regions labelled "Publish Safe"; panels of power value vs. time points for Refrigerator (regularity), White-noise (irregularity), and Laptop (???)]

Privacy Level = EnRate

6

Proposed Methodology

1. Decide parameters for entropy calculation

• Time lag, m, r

2. Calculate entropy value, entropy rate

3. Decide LRate, HRate and privacy level

• Using Approximate Entropy (ApEn) & Sample Entropy (SaEn)

7

Parameters for entropy calculation

[Figure: example time series, axis ticks 0 to 60]

Ex. lag = 1, m=3

• Time series x[1], x[2], …, x[N]

• pattern i: (x[i], x[i+lag], …, x[i+(m-1)lag])

• m: number of data points in pattern

• lag: sampling interval in pattern

• dis(i,j) = max(|x[i+p·lag] - x[j+p·lag]|, p = 0, …, m-1)

• r: dis(i,j) ≤ r → pattern i ~ pattern j
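The pattern and distance definitions above translate directly into code. The sketch below is illustrative only: the function names are chosen here, and indices are 0-based, whereas the slides count from 1.

```python
def pattern(x, i, m, lag):
    """Pattern i: (x[i], x[i+lag], ..., x[i+(m-1)*lag]), with 0-based i."""
    return [x[i + p * lag] for p in range(m)]


def dis(x, i, j, m, lag):
    """Maximum absolute difference between corresponding points of patterns i and j."""
    pi = pattern(x, i, m, lag)
    pj = pattern(x, j, m, lag)
    return max(abs(a - b) for a, b in zip(pi, pj))


def similar(x, i, j, m, lag, r):
    """Pattern i ~ pattern j when dis(i, j) <= r."""
    return dis(x, i, j, m, lag) <= r


# Made-up series for illustration.
x = [1.0, 2.0, 3.0, 1.1, 2.1, 2.9, 5.0]
print(pattern(x, 0, m=3, lag=1))
print(similar(x, 0, 3, m=3, lag=1, r=0.2))
```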

[Figure labels: patterns i, j, k]

8

Entropy Calculation

• A(i): number of patterns k similar to pattern i (k != i)

• B(i): number of patterns (k+lag) similar to pattern (i+lag)

Bias when A(i)=B(i)=0 (random noise)
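The formulas on this slide did not survive extraction, so the sketch below falls back on the standard definitions of Approximate Entropy (Pincus) and Sample Entropy (Richman and Moorman), extended with a `lag` parameter. The helper names, and the way the counts are organised, are assumptions made for illustration and may not match the slide's A(i)/B(i) notation exactly.

```python
import math
import random


def _templates(x, m, lag, count):
    """First `count` patterns of length m sampled every `lag` points."""
    return [[x[i + p * lag] for p in range(m)] for i in range(count)]


def _match_counts(templates, r, include_self):
    """For each template, count the templates within max-distance r."""
    counts = []
    for i, a in enumerate(templates):
        c = 0
        for j, b in enumerate(templates):
            if i == j and not include_self:
                continue
            if max(abs(u - v) for u, v in zip(a, b)) <= r:
                c += 1
        counts.append(c)
    return counts


def approximate_entropy(x, m, r, lag=1):
    """ApEn = Phi(m) - Phi(m+1); self-matches are counted (source of bias)."""
    def phi(k):
        count = len(x) - (k - 1) * lag
        c = _match_counts(_templates(x, k, lag, count), r, include_self=True)
        return sum(math.log(ci / count) for ci in c) / count
    return phi(m) - phi(m + 1)


def sample_entropy(x, m, r, lag=1):
    """SampEn = -ln(A / B); self-matches are excluded."""
    count = len(x) - m * lag
    b = sum(_match_counts(_templates(x, m, lag, count), r, include_self=False))
    a = sum(_match_counts(_templates(x, m + 1, lag, count), r, include_self=False))
    if a == 0 or b == 0:
        raise ValueError("SampEn undefined: no template matches at this m, r")
    return -math.log(a / b)


# Synthetic data for illustration: a perfectly regular series and seeded noise.
regular = [0.0, 1.0] * 50
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(sample_entropy(regular, m=2, r=0.2), sample_entropy(noise, m=2, r=0.2))
```

Counting self-matches keeps every logarithm in ApEn finite, but it is also what biases the estimate when a pattern has no other matches. SampEn drops self-matches and is therefore undefined when no matches exist at all, which the sketch reports as an error.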

[Figure: example time series (x-axis: time points, 0 to 60) for lag = 1, m=3, marking patterns i, j, k and the shifted patterns i+lag, j+lag, k+lag]

9

Setting time lag

Refrigerator: first ACF zero-crossing lag = 7; ApEn = 1.223, SaEn = 0.944

Laptop: first ACF zero-crossing lag = 198; ApEn = 1.299, SaEn = 1.457
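Finding the first zero-crossing of the autocorrelation function (ACF) can be sketched as follows. The function names, the biased sample ACF estimator, and the "first lag where ACF ≤ 0" convention are assumptions for illustration; the slides do not state which estimator was used.

```python
import math


def autocorrelation(x, lag):
    """Biased sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def first_acf_zero_crossing(x, max_lag=None):
    """Smallest lag >= 1 at which the sample ACF is <= 0, or None if there is none."""
    if max_lag is None:
        max_lag = len(x) - 1
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) <= 0:
            return lag
    return None


# Synthetic sine wave with period 40; its ACF crosses zero near a quarter period.
wave = [math.sin(2 * math.pi * t / 40) for t in range(400)]
print(first_acf_zero_crossing(wave))
```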

10

Setting m, r

Choose m, r such that the 95% confidence interval of the estimate ≤ 10% of SaEn

White-noise entropy

Choose m, r that maximise ApEn

Bias region

std: standard deviation

m = 2, 3; r = 0.1 → 0.4 (× std)

11

Evaluation

1. Learning data set (for setting m, r)

• Tracebase (tracebase.org) (138 devices)

• m=2,3; r=0.1→0.4

2. Evaluation data set

• IREF Building 2F-5F (136 devices, 5 weeks)

12

Result (1)

[Figure: "IREF 136 devs EnRate (5 weeks)"; x-axis: ApEnRate (0 to 1), y-axis: SaEnRate (0 to 1)]

m=2, r=0.2*standard deviation

13

Result (2)

[Figure: "IREF Laptop EnRate (11 devs, 5 weeks)"; x-axis: ApEnRate (0 to 0.5), y-axis: SaEnRate (0 to 0.5); LRate and HRate thresholds are marked, with a region labelled "Warning"]

LRate = Mean - Standard Deviation

HRate = Mean + Standard Deviation
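The threshold rule LRate = Mean - Standard Deviation, HRate = Mean + Standard Deviation can be sketched as below. The function names and the choice of population standard deviation are assumptions. The slides do not spell out how the three regions map to the "Publish Safe" and "Warning" labels, so the sketch returns neutral region names.

```python
import statistics


def rate_thresholds(rates):
    """Return (LRate, HRate) = (mean - std, mean + std) of the entropy rates."""
    mean = statistics.mean(rates)
    std = statistics.pstdev(rates)  # population std: an assumption, not from the slides
    return mean - std, mean + std


def region(rate, l_rate, h_rate):
    """Place one entropy rate relative to the two thresholds."""
    if rate < l_rate:
        return "below LRate"
    if rate > h_rate:
        return "above HRate"
    return "between LRate and HRate"


# Made-up entropy rates for illustration (not values from the IREF data).
rates = [0.1, 0.2, 0.3]
l_rate, h_rate = rate_thresholds(rates)
print(l_rate, h_rate, region(0.2, l_rate, h_rate))
```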

14

Discussion

1. Entropy is sensitive to data sets that include outliers

2. Relation between entropy and privacy of data

3. Future work

• Calculate entropy with meaningful patterns

• Using entropy for other knowledge (device classification, abnormal pattern detection,…)

• Privacy Preserving Protocol

15

Conclusion

1. Quantified the human activity information included in smart-taps’ data

2. Applied entropy measures from physiology (ApEn, SaEn) to power consumption data

3. Defined entropy rate to determine privacy level of published power consumption data

16

A Study on Privacy Level in Publishing Data of Smart Tap Network

Esaki Laboratory zoro@hongo.wide.ad.jp

Thank you for listening!

17

Backup slides

18

Demand and Supply

1. Demand Oriented Approach of Power Grid

• Supply matches volatile demand

• Supply side is volatile as well

2. Bi-directional communication (Internet of Things)

• Anticipate future supply/demand

• Shape demand, supply-oriented

• Personal data is needed for effective demand side management

19

Risk of Privacy Abuse

20

Inference forward channel

Inference backward channel

By consumption patterns

• Appliance detection

• Use mode detection

• Behavior deduction

By demand response data

• Incentive sensitivity

• Customer preference

Household Managements Data collectors

Ex. Behavior Patterns:

• Washing (10h-12h)

• TV (19h-23h)

• Out (12h-18h)

The Concept of EU for Privacy

21

[Figure labels: Discriminator; Machine learning; Pseudonym; Consumption Data; non-identifying information; identifying information; Pseudonymization; Template Data]

Source: “Privacy in the Smart Energy Grid”, Lecture at NII 2014-03-13, Prof. Gunter Muller

Service Feedback Loop

22

Household

Service Provider Billing

Aggregation Compliance Verification

Data collectors

• Bill

• Consumption Target

Consumption trace

(My research) Privacy level = (??)%

Query

Privacy Preserving Protocol

$$$

Future work

Encryption


Privacy Preserving Query Scenario

23

Q1. How many people have above-average energy consumption between 19h and 20h?

Q2. How many people, excluding Tanaka, have above-average energy consumption between 19h and 20h?

Non-private: Q1: 125, Q2: 124

Attacker Detection

Privacy preserving: Q1: 125, Q2: 127

Service Provider

Data Collectors
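The scenario above is a differencing attack: subtracting the exact answers to Q1 and Q2 reveals whether Tanaka is above average. The slides do not say which mechanism produces the privacy-preserving answers. The sketch below uses Laplace noise on counts, a well-known technique from differential privacy, purely as one possible illustration. All names and values other than "Tanaka" are hypothetical.

```python
import random


def exact_count(consumption, threshold, exclude=None):
    """Number of people whose consumption is over the threshold."""
    return sum(
        1 for name, value in consumption.items()
        if name != exclude and value > threshold
    )


def noisy_count(consumption, threshold, exclude=None, scale=2.0, rng=random):
    """Exact count plus Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return exact_count(consumption, threshold, exclude) + noise


# Hypothetical consumption between 19h and 20h (made-up values).
consumption = {"Tanaka": 3.0, "Sato": 1.0, "Suzuki": 2.5}
average = sum(consumption.values()) / len(consumption)

q1 = exact_count(consumption, average)
q2 = exact_count(consumption, average, exclude="Tanaka")
print("exact difference:", q1 - q2)  # a difference of 1 exposes Tanaka

rng = random.Random(1)
print("noisy difference:",
      noisy_count(consumption, average, rng=rng)
      - noisy_count(consumption, average, exclude="Tanaka", rng=rng))
```

With noise added, the difference between the two answers no longer reliably indicates whether the excluded person was counted, at the cost of less accurate aggregate answers.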

Evaluation System

Time series segmentation

Real Event Mapping

Quantify Privacy Level

24

Linkage Attack

in: 9h-10h, 13h-13h30 out: 10h-13h, 13h30-

in: 12h-14h, 16h-18h out: 18h-

peak: 16h-18h

Categorization

Alice Bob Peter

Third party information

3 people in the room: Alice, Bob, Peter Peter has printer, Alice has monitor, Bob has PC

Published Data

Identify

25

Regularity in Time Series

• Linear methods can’t solve the problem => Nonlinear Analysis

[Figure: refrigerator data and its surrogate, with ACF and periodogram (x-axis: time points)]

26

Entropy (1)

• Display time series data in phase-space

y(m,t) = [x(t), x(t+lag), …, x(t+(m-1)lag)]

• Approximate Entropy (ApEn) and Sample Entropy (SaEn): evaluate trajectory matching conditional probability
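For reference, the standard definitions from the literature (Pincus for ApEn, Richman and Moorman for SampEn) are given here for lag = 1. The symbols C, Φ, A, B and d follow the literature rather than the slides, which do not show these formulas in the extracted text.

```latex
\begin{aligned}
C_i^m(r) &= \frac{\#\{\, j : d\big(y(m,i),\, y(m,j)\big) \le r \,\}}{N-m+1},
\qquad
\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r), \\
\mathrm{ApEn}(m, r, N) &= \Phi^m(r) - \Phi^{m+1}(r), \\
\mathrm{SampEn}(m, r, N) &= -\ln \frac{A}{B}.
\end{aligned}
```

Here d is the maximum absolute difference between corresponding components, B is the number of pairs of distinct vectors of length m within distance r, and A is the same count for length m+1. ApEn includes the self-match j = i in C, whereas SampEn excludes self-matches.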

[Figure: two phase-space plots: m=2, lag=7 with axes x(t), x(t+7); m=3, lag=7 with axes x(t), x(t+7), x(t+14)]

27

Setting time lag

• Time lag = First zero-crossing of ACF

Dev      Lag  ApEn   SaEn
Unknown  500  0.348  0.473
Refri    7    1.223  0.944
Laptop   198  1.299  1.457
Noise    2    3.025  3.247

28

Setting m, r (1)

29

• K_A, K_B: number of overlapping template-matching patterns (pattern lengths m, m+1)

• 95% confidence interval of SaEn

Setting m, r (2)

m=2, r=(0.1~0.4) std

Training for parameters: Tracebase dataset

30

Experiment

Data set            Tracebase              IREF
Smart tap type      Plugwise               Plugwise
Number of devices   138                    136
Time range          Variation              5 weeks
Sampling interval   1 s                    2 mins
Usage               Training for m, r      Evaluation
Result              m=2,3 r=0.1~0.4 std    ***

31

Result (1)

IREF: 136 devices, 5 weeks

32

Other knowledge from entropy rate (?)

Device classification & abnormal detection

Tracebase data set

33