Statistical database security Special purpose: used only for statistical computations. General...

Statistical database security

• Special purpose: used only for statistical computations.

• General purpose: used with normal queries (and updates) as well as statistical ones.

• Main problem: achieving a compromise between the privacy needs of individuals and the right of organizations to know and process information, i.e. preventing statistical inference.

Statistical database security

• Issues:

– Characteristics of the SDB (statistical database) to be protected: is the database on-line (queries are executed immediately) or off-line (queries are executed later)? Is the SDB static (no updates) or dynamic?

– Additional knowledge of users: the more a user already knows, the easier it is for that user to perform inference.

– Types of attacks: the developer needs to anticipate the types of inference attacks that potential snoopers will use.

Inference protection techniques

• Conceptual techniques: definition of populations, partitioning.

• Restriction-based techniques: restrict the type of queries that may be asked or the kind of result that may be obtained.

• Perturbation-based techniques: distort the data so that statistical results remain approximately correct, while individual values inferred from them are likely to be incorrect.

Inference protection techniques

• Conceptual techniques:

– The lattice model: a lattice can be built for combinations of conditions on attributes. The n-respondent k%-dominance criterion says: a statistic is sensitive if n or fewer records contribute more than k% of its total value.

– Conceptual partitioning: populations are defined at a semantic level (e.g. the male employees of a department).
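
To make the dominance criterion concrete, here is a minimal Python sketch. It assumes the statistic is a total (such as a SUM) to which every record contributes a non-negative value; the function name `is_sensitive` and the sample revenues are hypothetical. Because the values are non-negative, checking the n largest contributions is enough to cover every group of n or fewer records.

```python
def is_sensitive(values, n, k):
    """n-respondent k%-dominance: True if n or fewer records contribute more than k% of the total.

    `values` are the non-negative contributions of the records in one cell.
    """
    # With non-negative values, the n largest contributions are the worst case
    # among all groups of n or fewer records.
    top_n = sum(sorted(values, reverse=True)[:n])
    return top_n > k / 100 * sum(values)


# Hypothetical cells of company revenues.
dominated = [900, 40, 30, 20, 10]  # one company holds 90% of the total
balanced = [220, 210, 200, 190, 180]
print(is_sensitive(dominated, n=1, k=80))  # True
print(is_sensitive(balanced, n=1, k=80))   # False
```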

Inference protection techniques

• Restriction-based techniques:

– Query-set size control: a statistical query q(C) is permitted only if its query set X(C) satisfies

$$k \le |X(C)| \le N - k$$

(N is the number of records in the SDB and $k \ge 0$ is a fixed parameter.)

This prevents simple attacks based on very small or very large query sets.

It does not prevent more sophisticated attacks using trackers, general trackers, double trackers and union trackers.
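
The sketch below illustrates both the control and its weakness against an individual tracker. Everything in it is hypothetical: a toy in-memory table, SUM queries over salary, k = 2, and Python predicates standing in for characteristic formulas. If C = C1 and C2 identifies a single person, the direct query is refused, but T = C1 and not C2 yields a query set large enough to pass, and q(C) = q(C1) - q(T).

```python
from typing import Callable, NamedTuple, Optional


class Employee(NamedTuple):
    name: str
    dept: str
    sex: str
    salary: int


# Hypothetical toy SDB with N = 8 records.
SDB = [
    Employee("Alice", "CS", "F", 95),
    Employee("Bob", "CS", "M", 70),
    Employee("Carl", "CS", "M", 72),
    Employee("Dave", "CS", "M", 68),
    Employee("Erin", "EE", "F", 80),
    Employee("Fay", "EE", "F", 78),
    Employee("Gus", "EE", "M", 75),
    Employee("Hal", "EE", "M", 77),
]
K = 2  # assumed size-control parameter k


def query_sum(cond: Callable[[Employee], bool]) -> Optional[int]:
    """SUM(salary) over X(cond), released only if k <= |X(cond)| <= N - k."""
    query_set = [r for r in SDB if cond(r)]
    if not K <= len(query_set) <= len(SDB) - K:
        return None  # refused by query-set size control
    return sum(r.salary for r in query_set)


def c1(r: Employee) -> bool:
    return r.dept == "CS"


def c2(r: Employee) -> bool:
    return r.sex == "F"


# C = C1 and C2 identifies one person, so the direct query is refused.
direct = query_sum(lambda r: c1(r) and c2(r))

# Individual tracker T = C1 and not C2: both queries below pass size control,
# and q(C) = q(C1) - q(T).
whole = query_sum(c1)
tracker = query_sum(lambda r: c1(r) and not c2(r))
inferred = whole - tracker
print(direct, inferred)  # None 95
```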

Inference protection techniques

• Restriction-based techniques:

– Expanded query-set size control: given a query q(A=a and B=b and … and C=c) with m terms, there are $2^m$ implied query sets, one for each way of negating some of the terms:

q(<Xa> A=a and <Xb> B=b and … and <Xc> C=c), where each $X_i$ is either empty or “not”.

The query q is only allowed if all $2^m$ implied query sets fall in the allowable range $[k, N - k]$.

This technique becomes very expensive for large values of m, because the number of implied query sets grows exponentially with m.
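
A direct, unoptimized sketch of expanded control, assuming conjunctive queries given as a list of Python predicates (a hypothetical representation, as are the function names). Enumerating every pattern of negations produces 2**m query sets, which is where the exponential cost comes from. In the example, the query dept = CS and sex = M has 3 records and would pass plain size control with k = 2 and N = 8, but it is refused because the implied set dept = CS and not sex = M contains only one record.

```python
from itertools import product


def implied_query_sets(records, conditions):
    """Yield the 2**m query sets obtained by negating each subset of the m conjuncts."""
    for negate in product((False, True), repeat=len(conditions)):
        # A record belongs to the set if each condition holds (or fails, when negated).
        yield [r for r in records
               if all(cond(r) != neg for cond, neg in zip(conditions, negate))]


def expanded_size_control(records, conditions, k):
    """Permit the query only if every implied query set has size in [k, N - k]."""
    n = len(records)
    return all(k <= len(s) <= n - k for s in implied_query_sets(records, conditions))


# Hypothetical (dept, sex) records, N = 8.
records = [("CS", "F"), ("CS", "M"), ("CS", "M"), ("CS", "M"),
           ("EE", "F"), ("EE", "F"), ("EE", "M"), ("EE", "M")]
cs_and_male = [lambda r: r[0] == "CS", lambda r: r[1] == "M"]

print(expanded_size_control(records, cs_and_male, k=2))  # False
print(expanded_size_control(records, [lambda r: r[0] == "EE"], k=2))  # True
```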

Inference protection techniques

• Restriction-based techniques:

– Query-set overlap control: limit the number of records that a new query set may share with the query sets of previously released queries. Query q(C) is permitted only if

$$|X(C) \cap X(D)| \le r, \qquad r > 0$$

for the query set X(D) of every earlier released query q(D); that is, the query set of q(C) has at most r records in common with each of them.
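
A minimal stateful sketch of overlap control follows. It assumes query sets are given as sets of record identifiers and that refused queries are not added to the history; the class name `OverlapControl` is hypothetical, and whether the history is kept per user or shared by a group of users is a design choice not fixed here.

```python
class OverlapControl:
    """Query-set overlap control: permit q(C) only if |X(C) & X(D)| <= r for every released q(D)."""

    def __init__(self, r: int):
        if r <= 0:
            raise ValueError("r must be positive")
        self.r = r
        self.released = []  # query sets (frozensets of record ids) of answered queries

    def permit(self, query_set) -> bool:
        """Check a new query set; if permitted, remember it for later checks."""
        candidate = frozenset(query_set)
        if any(len(candidate & earlier) > self.r for earlier in self.released):
            return False  # refused queries are not added to the history
        self.released.append(candidate)
        return True


ctrl = OverlapControl(r=2)
print(ctrl.permit({1, 2, 3, 4, 5}))  # True: nothing released yet
print(ctrl.permit({1, 2, 6, 7}))     # True: shares only {1, 2}
print(ctrl.permit({1, 2, 3, 8}))     # False: shares {1, 2, 3} with the first query
```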

Inference protection techniques

• Restriction-based techniques:

– Audit-based controls: while query-set overlap control may not be very effective at preventing inference, attempts at inference can be detected by examining the audit trail of successive queries (by the same user or by a group of users).

– Techniques based on number of attributes: the DBA determines that statistical queries involving more than d attributes are not permissible.
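
One concrete way to audit SUM queries, in the spirit of the Audit Expert proposed by Chin and Özsoyoğlu, is to treat each answered query set as a 0/1 row vector: the value of record i becomes derivable exactly when the unit vector e_i lies in the span of those rows. The sketch below is an illustration under assumptions, not a production auditor: the class `SumAudit`, the record numbering 0..N-1 and the refusal policy are hypothetical, and it also applies the limit of d attributes per query from the second bullet.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a 0/1 (or rational) matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


class SumAudit:
    """Audit trail of SUM queries for one user or group, over records 0..N-1."""

    def __init__(self, n_records, max_attributes):
        self.n = n_records
        self.d = max_attributes
        self.history = []  # characteristic 0/1 vectors of answered query sets

    def _discloses_a_record(self, rows):
        # x_i can be derived from the answered SUMs exactly when the unit vector
        # e_i lies in the row space, i.e. appending it does not raise the rank.
        base = rank(rows)
        return any(
            rank(rows + [[int(j == i) for j in range(self.n)]]) == base
            for i in range(self.n)
        )

    def permit(self, query_set, attributes):
        if len(attributes) > self.d:
            return False  # restriction on the number of attributes (d)
        row = [int(i in query_set) for i in range(self.n)]
        if self._discloses_a_record(self.history + [row]):
            return False  # answering would let the audited party derive one x_i
        self.history.append(row)
        return True


audit = SumAudit(n_records=4, max_attributes=2)
print(audit.permit({0, 1, 2, 3}, ["dept"]))          # True
print(audit.permit({0, 1}, ["dept", "sex"]))         # True
print(audit.permit({1, 2, 3}, ["dept"]))             # False: x0 = q(all) - q({1, 2, 3})
print(audit.permit({2, 3}, ["dept", "sex", "age"]))  # False: more than d = 2 attributes
```

The third query is refused even though its query set has three records, because together with the first answer it would give x0 directly.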

Inference protection techniques

• Restriction-based techniques:

– Partitioning: the population is divided into small disjoint subgroups (groups containing a single record are not allowed). Queries are only answered over such groups, so arbitrary query sets cannot be formed.

– Cell suppression: similar to partitioning, but every “cell” that is sensitive under the n-respondent k%-dominance rule is suppressed and cannot be examined.
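
A minimal sketch of cell suppression for a published table of totals, using hypothetical firm data keyed by (region, industry). Each cell is a disjoint group, as in partitioning; a cell is withheld if it contains a single record or is sensitive under the n-respondent k%-dominance rule. Practical systems usually also suppress complementary cells so that a withheld value cannot be recovered from row or column totals; that step is not shown.

```python
from collections import defaultdict


def dominance_sensitive(values, n, k):
    """n-respondent k%-dominance rule for one cell of non-negative values."""
    return sum(sorted(values, reverse=True)[:n]) > k / 100 * sum(values)


def published_table(records, n, k):
    """Cell suppression: SUM per (region, industry) cell, or None if suppressed."""
    cells = defaultdict(list)  # disjoint groups, as in partitioning
    for region, industry, revenue in records:
        cells[(region, industry)].append(revenue)
    return {
        cell: None if len(vals) < 2 or dominance_sensitive(vals, n, k) else sum(vals)
        for cell, vals in cells.items()
    }


# Hypothetical firms: (region, industry, revenue).
firms = [
    ("North", "steel", 500), ("North", "steel", 40), ("North", "steel", 30),
    ("North", "textile", 60), ("North", "textile", 55), ("North", "textile", 50),
    ("South", "steel", 70),
]
print(published_table(firms, n=1, k=80))
# {('North', 'steel'): None, ('North', 'textile'): 165, ('South', 'steel'): None}
```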

Inference protection techniques

• Perturbation-based techniques:

– Record-based perturbation: the records in the database are distorted before the statistics are computed.

– Result-based perturbation: the correct result is distorted before releasing it.

– The difference between the true value and the released value of a statistic is called bias.

– Perturbed statistics must be consistent, i.e. free of paradoxes: whatever the bias, the released results should be “possible” (for example, a perturbed count should never be negative).
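
The sketch below contrasts the two kinds of perturbation and computes the bias of each release. The Gaussian noise, the noise scales and the data are assumptions made for illustration; the slides do not prescribe a noise distribution. The result-based count is rounded and clamped at zero so that, whatever the noise, the released answer remains a possible count.

```python
import random


def perturb_records(values, rng, scale):
    """Record-based perturbation: distort each stored value once (Gaussian noise assumed)."""
    return [v + rng.gauss(0, scale) for v in values]


def true_count(values, threshold):
    return sum(1 for v in values if v > threshold)


def released_count(values, threshold, rng, scale):
    """Result-based perturbation of COUNT(value > threshold).

    Rounding to an integer and clamping at 0 keeps the answer "possible":
    a count can be neither fractional nor negative.
    """
    return max(0, round(true_count(values, threshold) + rng.gauss(0, scale)))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
salaries = [42, 55, 61, 70, 38, 90, 47]

# Record-based: statistics are computed on the distorted copy of the data.
perturbed_db = perturb_records(salaries, rng, scale=3.0)
mean_bias = sum(perturbed_db) / len(perturbed_db) - sum(salaries) / len(salaries)

# Result-based: the true result is distorted just before release.
released = released_count(salaries, 60, rng, scale=1.5)
count_bias = released - true_count(salaries, 60)  # released value minus true value
print(mean_bias, released, count_bias)
```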

Inference protection techniques

• Perturbation-based techniques:

– Data swapping: attribute values are exchanged between the records of the original SDB in such a way that the resulting modified SDB has no records in common with the original SDB.

– Random-sample queries: the statistic is computed over a randomly sampled subset of the actual query set. This only works if the query sets are large enough; otherwise, attacks based on small query sets become possible.
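
A sketch of random-sample queries follows, assuming a COUNT statistic and a sampling probability p. In the spirit of Denning's random-sample queries, each record's inclusion is derived from the query formula and the record itself, so the same query always sees the same sample and repeating it does not average out the error. Here that choice is made with a keyed HMAC; the key, the function names and the data are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"held-only-by-the-dbms"  # hypothetical key, never shown to users


def in_sample(formula, record_id, p):
    """Keep a record with probability about p, decided by a keyed hash.

    The same (formula, record) pair always gets the same decision, so
    repeating a query returns the same sample rather than fresh noise.
    """
    mac = hmac.new(SECRET_KEY, f"{formula}|{record_id}".encode(), hashlib.sha256)
    return int.from_bytes(mac.digest()[:8], "big") < p * 2**64


def rsq_count(records, formula, cond, p=0.9):
    """Estimate COUNT over X(formula) from a random sample of it, scaled by 1/p."""
    sampled = [rid for rid, rec in records.items()
               if cond(rec) and in_sample(formula, rid, p)]
    return round(len(sampled) / p)


# Hypothetical SDB: record id -> (dept, salary); 200 of the 300 records are in CS.
sdb = {i: ("CS" if i % 3 else "EE", 50 + i % 40) for i in range(300)}


def is_cs(rec):
    return rec[0] == "CS"


first = rsq_count(sdb, "dept = 'CS'", is_cs)
again = rsq_count(sdb, "dept = 'CS'", is_cs)
print(first == again)  # True: the repeated query is answered identically
```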

Inference protection techniques

• Perturbation-based techniques:

– Fixed perturbation: the values of the attributes used to compute statistics are modified in a fixed way (the perturbation does not vary from query to query). Because the perturbation is fixed, repeating a query and averaging the answers cannot improve the estimate.

– Query-based perturbation: The perturbation is different for different queries.

– Rounding: the result of a statistical query is rounded before being released. There are three variants: systematic, random and controlled rounding.
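
The sketch below illustrates fixed perturbation and two of the rounding variants. Seeding each record's noise with its identifier, the Gaussian distribution and the rounding base of 50 are all assumptions for illustration. Controlled rounding, which rounds the cells of a table so that the rounded values still add up to the rounded totals, is not shown.

```python
import random


def fixed_perturbed_sum(query_set, scale):
    """SUM under fixed perturbation; query_set maps record id -> true value.

    Each record's noise comes from a generator seeded with its id (an assumed
    scheme), so it is identical in every query that includes the record.
    """
    return sum(v + random.Random(rid).gauss(0, scale) for rid, v in query_set.items())


def systematic_round(x, base):
    """Systematic rounding: to the nearest multiple of base (halves round up)."""
    return base * ((x + base // 2) // base)


def random_round(x, base, rng):
    """Random rounding: up with probability (x mod base) / base, else down.

    The expected released value is exactly x, so the rounding is unbiased.
    """
    lower = base * (x // base)
    return lower + base if rng.random() < (x - lower) / base else lower


salaries = {1: 52, 2: 61, 3: 47, 4: 70}
print(fixed_perturbed_sum(salaries, 2.0) == fixed_perturbed_sum(salaries, 2.0))  # True
print(systematic_round(230, 50), systematic_round(224, 50))  # 250 200
print(random_round(230, 50, random.Random(1)) in (200, 250))  # True
```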