
A workload-adaptive mechanism for linear queries under local differential privacy

Ryan McKenna, Raj Kumar Maity, Arya Mazumdar, Gerome Miklau
College of Information and Computer Sciences

University of Massachusetts, Amherst

{rmckenna, rajkmaity, arya, miklau}@cs.umass.edu

ABSTRACT
We propose a new mechanism to accurately answer a user-provided set of linear counting queries under local differential privacy (LDP). Given a set of linear counting queries (the workload), our mechanism automatically adapts to provide accuracy on the workload queries. We define a parametric class of mechanisms that produce unbiased estimates of the workload, and formulate a constrained optimization problem to select a mechanism from this class that minimizes expected total squared error. We solve this optimization problem numerically using projected gradient descent and provide an efficient implementation that scales to large workloads. We demonstrate the effectiveness of our optimization-based approach in a wide variety of settings, showing that it outperforms many competitors, even outperforming existing mechanisms on the workloads for which they were intended.

PVLDB Reference Format:
Ryan McKenna, Raj Kumar Maity, Arya Mazumdar, Gerome Miklau. A workload-adaptive mechanism for linear queries under local differential privacy. PVLDB, 13(11): 1905-1918, 2020.
DOI: https://doi.org/10.14778/3407790.3407798

1. INTRODUCTION
In recent years, Differential Privacy [15] has emerged as the dominant approach to privacy, and its adoption in practical settings is growing. Differential privacy is achieved with carefully designed randomized algorithms, called mechanisms. The aim of these mechanisms is to extract utility from the data while adhering to the constraints imposed by differential privacy. Utility is measured on a task-by-task basis, and different tasks require different mechanisms. Utility-optimal mechanisms, or mechanisms that maximize utility for a given task under a fixed privacy budget, are still not known in many cases.

There are two main models of differential privacy: the central model and the local model. In the central model, users provide their data to a trusted data curator, who runs a privacy mechanism on the dataset in its entirety.


In the local model, users execute a privacy mechanism on their own data before sending it to the data curator. Local differential privacy (LDP) offers a stronger privacy guarantee than central differential privacy, as it does not rely on the assumption of a trusted data curator. For that reason, it has been embraced by several organizations like Google [18], Apple [36], and Microsoft [14] for the collection of personal data from customers. While the stronger privacy guarantee is a benefit of the local model, it necessarily leads to greater error than the central model [16], which makes error-optimal mechanisms an important goal.

Our focus is answering a workload of linear counting queries under local differential privacy. Answering a query workload is a general task that subsumes other common tasks, like estimating histograms, range queries, and marginals. Furthermore, the expressivity of linear query workloads goes far beyond these special cases, as it can include an arbitrary set of predicate counting queries. By defining the workload, the analyst expresses the exact queries they care about most, and their relative importance. There are several LDP mechanisms for answering particular fixed workloads, like histograms [1, 42, 5, 38], range queries [13, 39], and marginals [12, 39]. These mechanisms were carefully crafted to provide accuracy on the workloads for which they were designed, but their accuracy properties typically do not transfer to other workloads. Some LDP mechanisms are designed to answer an arbitrary collection of linear queries [4, 17], but they do not outperform simple baselines in practice.

In this paper, we propose a new mechanism that automatically adapts in order to prioritize accuracy on a target workload. Adaptation to the workload is accomplished by solving a numerical optimization problem, in which we search over an expressive class of unbiased LDP mechanisms for one that minimizes variance on the workload queries.

Workload-adaptation [21, 27, 33] is a much more developed topic in the central model of differential privacy and has led to mechanisms that offer best-in-class error rates in some settings [22]. Our work is conceptually similar to the Matrix Mechanism [27, 33], which also minimizes variance over a class of unbiased mechanisms. However, because the class of mechanisms we consider is different, the optimization problem is fundamentally different and requires a novel analysis and algorithmic solution. In addition, the optimal mechanism in our problem formulation depends on the setting of the privacy parameter, ε, while in the Matrix Mechanism this is not a factor. We thoroughly discuss the similarities and differences between these two mechanisms in Section 7.


Contributions. The paper consists of four main technical contributions.

• We propose a new class of mechanisms, called workload factorization mechanisms, that generalizes many existing LDP mechanisms, and we formulate an optimization problem to select a mechanism from this class that is optimally tailored to a workload (Section 3).

• We give an efficient algorithm to approximately solve this optimization problem, by reformulating it into an algorithmically tractable form (Section 4).

• We provide a theoretical analysis which illuminates error properties of the mechanism and justifies the design choices we made (Section 5).

• In a comprehensive set of experiments we test our mechanism on a range of workloads, showing that it consistently delivers lower error than competitors, by as much as a factor of 14.6 (Section 6).

2. BACKGROUND AND PROBLEM SETUP
In this section we introduce notation for the input data and query workload, as well as provide basic definitions of local differential privacy. A full review of notation is provided in the Appendix.

2.1 Input Data and Workload
Given a domain U of n distinct user types, the input data is a collection of N users 〈u1, . . . , uN〉, where each ui ∈ U. We commonly use a vector representation of the input data, containing a count for each possible user type:

Definition 2.1 (Data Vector) The data vector, denoted by x, is an n-length vector of counts indexed by user types u ∈ U such that:

x_u = Σ_{j=1}^{N} 1{u_j = u}   for all u ∈ U.

In the local model, we do not have direct access to x, but it is still useful to define it for the purpose of analysis. Below is a simple data vector one might obtain from a data set of student grades.

Example 2.2 (Student Data) Consider a data set of student grades, where U = {A, B, C, D, F}. Suppose 10 students got an A, 20 students got a B, 5 students got a C, and no students got a D or F. Then the data vector would be:

x = [10 20 5 0 0]^T

Linear counting queries have a similar vector representation, as shown in Definition 2.3.

Definition 2.3 (Linear Query) A linear counting query is an n-length vector w indexed by user types u ∈ U, such that the answer to the query is w^T x = Σ_{u∈U} w_u x_u.

A workload is a collection of p linear queries w_1, ..., w_p ∈ R^n organized into a p × n matrix W. Our goal is to accurately estimate answers to each workload query under local differential privacy, i.e., we want to privately estimate Wx.

The most commonly studied workload is the so-called Histogram workload, which is represented by an n × n identity matrix. A more interesting workload is given below:

Example 2.4 (Prefix workload) The prefix workload contains queries that compute the (unnormalized) empirical cumulative distribution function of the data, or the number of students that have grades ≥ A, ≥ B, ≥ C, ≥ D, ≥ F.

W = [ 1 0 0 0 0
      1 1 0 0 0
      1 1 1 0 0
      1 1 1 1 0
      1 1 1 1 1 ]
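To make the vector representations concrete, the following short numpy sketch (illustrative only, not part of the paper's implementation) builds the data vector from Example 2.2 and the Prefix workload from Example 2.4, and evaluates the true workload answers Wx.

```python
import numpy as np

# Data vector from Example 2.2: counts for the grades A, B, C, D, F.
x = np.array([10, 20, 5, 0, 0])

# Prefix workload from Example 2.4: row i sums the first i+1 counts.
n = len(x)
W = np.tril(np.ones((n, n)))

# Non-private workload answers Wx.
print(W @ x)   # [10. 30. 35. 35. 35.]
```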

The workload W is an input to our algorithm, reflecting the queries of interest to the analyst and therefore determining the measure of utility that will be used to assess algorithm performance. In this setup, we make no assumptions about the structure or contents of W, and allow it to be completely arbitrary, even including the same query multiple times or multiple linearly dependent queries.

2.2 Local Differential Privacy
Local differential privacy (LDP) [16] is a property of a randomized mechanism M that acts on user data. Instead of reporting their true user type, users report a randomized response obtained by executing M on their true input. These randomized responses allow an analyst to learn something about the population as a whole, while still providing the individual users a form of plausible deniability about their true input. The formal requirement on M is stated in Definition 2.5.

Definition 2.5 (Local Differential Privacy) A randomized mechanism M : U → O is said to be ε-LDP if and only if, for all u, u′ ∈ U and all S ⊆ O:

Pr[M(u) ∈ S] ≤ exp (ε) Pr[M(u′) ∈ S]

The output range O can vary between mechanisms. In some simple cases, it is the same as the input domain U, but it does not have to be. Choosing the output range is typically the first step in designing a mechanism. When the range of the mechanism is finite, i.e., |O| = m, we can completely specify a mechanism by a so-called strategy matrix Q ∈ R^{m×n}, indexed by (o, u) ∈ O × U. The mechanism M_Q(u) is then defined by:

Pr[M_Q(u) = o] = Q_{o,u}

This encoding of a mechanism essentially stores a probability for every possible input-output pair in the strategy matrix Q. We translate Definition 2.5 to strategy matrices in Proposition 2.6.

Proposition 2.6 (Strategy Matrix) A mechanism M_Q is ε-LDP if and only if the following conditions are satisfied:

1. Q_{o,u} ≤ exp(ε) Q_{o,u′}   for all o, u, u′.

2. Q_{o,u} ≥ 0 for all o, u, and Σ_o Q_{o,u} = 1 for all u.

Above, the first condition is the privacy constraint, ensuring that the output distributions for any two users are close, and the second is the probability simplex constraint, ensuring that each column of Q corresponds to a valid probability distribution. Representing mechanisms as matrices is useful because it allows us to reason about them mathematically with linear algebra [26, 24].


Table 1: Existing LDP mechanisms encoded as a strategy matrix. e_u is the one-hot encoding of u, H is the K × K Hadamard matrix, and d is a hyper-parameter.

Randomized Response [41]: input u ∈ [n], output o ∈ [n]; Q_{o,u} ∝ exp(ε) if o = u, and ∝ 1 if o ≠ u.

RAPPOR [18]: input u ∈ [n], output o ∈ {0,1}^n; Q_{o,u} ∝ exp(ε/2)^{n − ‖o − e_u‖_1}.

Hadamard [1]: input u ∈ [n], output o ∈ [K] with K = 2^⌈log2(n+1)⌉; Q_{o,u} ∝ exp(ε) if H_{o+1,u} = 1, and ∝ 1 if H_{o+1,u} = −1.

Subset Selection [42]: input u ∈ [n], output o ∈ {0,1}^n with ‖o‖_1 = d; Q_{o,u} ∝ exp(ε) if o_u = 1, and ∝ 1 if o_u = 0.

Example 2.7 shows how a simple mechanism, called randomized response¹, can be encoded as a strategy matrix.

Example 2.7 (Randomized Response) The randomized response mechanism [41] can be encoded as a strategy matrix in the following way:

Q = 1/(e^ε + n − 1) · [ e^ε  1   ...  1
                        1   e^ε  ...  1
                        ⋮    ⋮    ⋱   ⋮
                        1    1   ...  e^ε ]

For this mechanism, the output range is the same as the input domain, and hence the strategy matrix is square. The diagonal entries of the strategy matrix are proportional to e^ε, and the off-diagonal entries are proportional to 1. This means that each user reports their true input with probability proportional to e^ε and all other possible outputs with probability proportional to 1. It is easy to see that the conditions of Proposition 2.6 are satisfied. While this is one of the simplest mechanisms, many other mechanisms can also be represented in this way.
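As an illustration of this encoding, the following numpy sketch (our own, with function names chosen here only for exposition) constructs the randomized response strategy matrix from Example 2.7 and checks the two conditions of Proposition 2.6.

```python
import numpy as np

def randomized_response_strategy(n, eps):
    # Example 2.7: diagonal entries proportional to exp(eps), off-diagonal to 1.
    Q = np.ones((n, n))
    np.fill_diagonal(Q, np.exp(eps))
    return Q / (np.exp(eps) + n - 1)   # normalize so each column sums to 1

def is_ldp_strategy(Q, eps, tol=1e-9):
    # Proposition 2.6: probability simplex constraint plus privacy constraint.
    simplex = np.all(Q >= -tol) and np.allclose(Q.sum(axis=0), 1.0)
    # Privacy: within each row (output), max entry <= exp(eps) * min entry.
    private = np.all(Q.max(axis=1) <= np.exp(eps) * Q.min(axis=1) + tol)
    return simplex and private

Q = randomized_response_strategy(n=5, eps=1.0)
print(is_ldp_strategy(Q, eps=1.0))   # True
```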

For example, Table 1 shows how RAPPOR [18], Subset Selection [42] and Hadamard [2] can be expressed as a strategy matrix. Other mechanisms with more sophisticated structure, such as Hierarchical [13, 39] and Fourier [12], can also be expressed as a strategy matrix, but they require too much notation to explain in Table 1. A strategy matrix is simply a direct encoding of a conditional probability distribution, where every probability is explicitly enumerated as an entry of the matrix. Hence, the representation can encode any LDP mechanism, as long as U and O are both finite and the conditional probabilities can be calculated.

When executing the mechanism, each user reports a (randomized) response o_i = M_Q(u_i). When all users randomize their data with the same mechanism, these responses are typically aggregated into a response vector y ∈ R^m (indexed by elements o ∈ O), where y_o = Σ_{j=1}^{N} 1{o_j = o}. Much like the data vector, the response vector is essentially a histogram of responses, as it counts the number of users who reported each response. In the sequel, it is useful to think of the mechanism M_Q as being a function from x to y instead of from u_i to o_i. Thus, for notational convenience, we overload the definition of M_Q, allowing it to consume a data vector and return a response vector, so that M_Q(x) = y.

¹ The name of this mechanism should not be confused with the outputs of an arbitrary mechanism M, which we also call randomized responses.

The response vector y is often not that useful by itself, but it can be used to estimate more useful quantities, such as the data vector x or the workload answers Wx. This is typically achieved with a post-processing step, and does not impact the privacy guarantee of the mechanism.

3. THE FACTORIZATION MECHANISM
In this section, we describe our mechanism and the main optimization problem that underlies it. We begin with a high-level problem statement, and reason about it analytically until it is in a form we can deal with algorithmically. We present our key ideas and the main steps of our derivation, but defer the finer details of the proofs to Appendix A.

Our goal is to find a mechanism that has low expected error on the workload. This objective is formalized in Problem 3.1.

Problem 3.1 (Workload Error Minimization) Given a workload W, design an ε-LDP mechanism M* that minimizes worst-case expected L2 squared error. Formally,

M* = argmin_M { max_x E[ ‖Wx − M(x)‖₂² ] }

In the problem statement above, our goal is to search through the space of all ε-LDP mechanisms for the one that is best for the given workload. Because it is difficult to characterize an arbitrary mechanism M in a way that makes optimization possible, we do not solve the above problem in its full generality. Instead, we perform the search over a restricted class of mechanisms which is easier to characterize. While somewhat restricted, this class of mechanisms is quite expressive, and it captures many of the state-of-the-art LDP mechanisms available today [18, 42, 2, 12, 13, 39].

Definition 3.2 (Workload Factorization Mechanism) Given an ε-LDP strategy matrix Q ∈ R^{m×n} and a reconstruction matrix V ∈ R^{p×m} such that W = VQ, the Workload Factorization Mechanism (factorization mechanism for short) is defined as:

M_{V,Q}(x) = V M_Q(x)

Note that M_{V,Q} is defined in terms of M_Q, and it is parameterized by an additional reconstruction matrix V as well. This reconstruction matrix is used to estimate the workload query answers from the response vector output by M_Q. In fact, the workload query estimates produced by this class of mechanisms are unbiased, as:

E[M_{V,Q}(x)] = V E[M_Q(x)] = VQx = Wx.

Furthermore, M_{V,Q} inherits the privacy guarantee of M_Q by the post-processing principle of differential privacy [16].


Many existing LDP mechanisms can be represented as a factorization mechanism. For example, we show how the Randomized Response mechanism can be expressed as a factorization mechanism in Example 3.3.

Example 3.3 (Randomized Response) The randomized response mechanism uses Q as defined in Example 2.7 and V = Q⁻¹ to estimate the Histogram workload (W = I).

V = 1/(e^ε − 1) · [ e^ε + n − 2      −1        ...      −1
                       −1        e^ε + n − 2   ...      −1
                        ⋮             ⋮          ⋱        ⋮
                       −1            −1        ...  e^ε + n − 2 ]

While the randomized response mechanism is intended to be used to answer the Histogram workload, there is no reason why it cannot be used for other workloads as well. In fact, it is quite straightforward to see how it can be extended to answer an arbitrary workload, simply by using V = WQ⁻¹.
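The following sketch (ours, for illustration) carries out this extension numerically: it reuses the randomized response strategy Q and forms V = WQ⁻¹ for the Prefix workload, then checks the factorization constraint W = VQ.

```python
import numpy as np

n, eps = 5, 1.0
Q = (np.ones((n, n)) + (np.exp(eps) - 1) * np.eye(n)) / (np.exp(eps) + n - 1)

W = np.tril(np.ones((n, n)))       # Prefix workload from Example 2.4
V = W @ np.linalg.inv(Q)           # reconstruction matrix for this workload

print(np.allclose(V @ Q, W))       # True: W = VQ, so the estimate is unbiased
```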

3.1 Variance Derivation
While the factorization mechanism is unbiased for any workload factorization, different factorizations lead to different amounts of variance on the workload answers. This creates the opportunity to choose the workload factorization that leads to the lowest possible total variance. In order to do that, we need an analytic expression for the total variance in terms of V and Q, which we derive in Theorem 3.4.

Theorem 3.4 (Variance) The expected total squared error (total variance) of a workload factorization mechanism is:

E[ ‖Wx − M_{V,Q}(x)‖₂² ] = Σ_{u∈U} x_u Σ_{i=1}^{p} [ v_i^T Diag(q_u) v_i − (v_i^T q_u)² ]

where q_u denotes column u of Q and v_i^T denotes row i of V.

Notice above that the exact expression for variance depends on the data vector x, which we do not have access to, as it is a private quantity. We want our mechanism to work well for all possible x, so we consider the worst-case variance and, as a relaxation, the average-case variance instead.²

Corollary 3.5 (Worst-case variance) The worst-case variance of M_{V,Q} occurs when all users have the same worst-case type (i.e., x_u = N for some u), and is:

L_worst(V,Q) = N max_{u∈U} Σ_{i=1}^{p} [ v_i^T Diag(q_u) v_i − (v_i^T q_u)² ].

Corollary 3.6 (Average-case variance) The average-case variance of M_{V,Q} occurs when all user types have the same count (i.e., x_u = N/n for all u), and is:

L_avg(V,Q) = (N/n) Σ_{u∈U} Σ_{i=1}^{p} [ v_i^T Diag(q_u) v_i − (v_i^T q_u)² ].

With these analytic expressions for variance, we can analyze and compare existing mechanisms that can be expressed as a workload factorization mechanism. The variance for randomized response is shown in Example 3.7.

² Alternatively, if we had a prior distribution over x, we could use that to estimate variance.

Example 3.7 (Variance of Randomized Response) The worst-case and average-case variance of the factorization in Example 3.3 on the Histogram workload is:

L_worst(V,Q) = L_avg(V,Q) = N(n − 1) [ n/(e^ε − 1)² + 2/(e^ε − 1) ]

The expression above is obtained by simply plugging V and Q into the equations above and simplifying. Interestingly, the worst-case and average-case variance are the same for this workload factorization due to the symmetry in the workload and strategy matrices.
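These formulas are easy to evaluate numerically. The sketch below (ours, not the paper's code) computes the worst-case and average-case variance of the randomized response factorization from Example 3.3 directly from Corollaries 3.5 and 3.6, and checks the result against the closed form in Example 3.7.

```python
import numpy as np

def per_user_variance(V, Q):
    # For each user type u: sum_i [ v_i^T Diag(q_u) v_i - (v_i^T q_u)^2 ]
    quad = (V ** 2) @ Q        # entry (i, u) is v_i^T Diag(q_u) v_i
    lin = (V @ Q) ** 2         # entry (i, u) is (v_i^T q_u)^2
    return (quad - lin).sum(axis=0)

n, eps, N = 5, 1.0, 1000
Q = (np.ones((n, n)) + (np.exp(eps) - 1) * np.eye(n)) / (np.exp(eps) + n - 1)
V = np.linalg.inv(Q)           # Histogram workload (W = I), as in Example 3.3

per_u = per_user_variance(V, Q)
L_worst = N * per_u.max()      # Corollary 3.5
L_avg = (N / n) * per_u.sum()  # Corollary 3.6

closed_form = N * (n - 1) * (n / (np.exp(eps) - 1) ** 2 + 2 / (np.exp(eps) - 1))
print(np.allclose([L_worst, L_avg], closed_form))   # True
```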

3.2 Strategy Optimization
With an analytic expression for variance, we can state the optimization problem underlying the factorization mechanism. Our goal is to find a workload factorization that minimizes the total variance on the workload. To do that, we set up an optimization problem, using total variance as the objective function while taking into consideration the constraints that have to hold on V and Q. This is formalized in Problem 3.8.

Problem 3.8 (Optimal Factorization) Given a privacy budget ε and a workload W,

minimize_{V,Q}   L(V,Q)
subject to       W = VQ
                 Σ_o Q_{o,u} = 1   ∀u
                 0 ≤ Q_{o,u} ≤ exp(ε) Q_{o,u′}   ∀o, u, u′.

Above, L is a loss function that captures how good a given factorization is, such as the worst-case variance L_worst or the average-case variance L_avg. While our original objective in Problem 3.1 was to find the mechanism that optimizes worst-case variance, for practical reasons we use the average-case variance instead. The average-case variance is a smooth approximation of the worst-case variance, which leads to a more analytically and computationally tractable problem. Additionally, the smoothness of the average-case variance makes the corresponding optimization problem more amenable to numerical methods. We study the ramifications of this relaxation theoretically in Section 5.1.

When using L_avg as the objective function, we observe it can be expressed in a much simpler form using matrix operations, as shown in Theorem 3.9.

Theorem 3.9 (Objective Function) The objective function L(V,Q) = tr[V D_Q V^T] is related to L_avg(V,Q) by:

L_avg(V,Q) = (N/n) ( L(V,Q) − ‖W‖_F² ),

where D_Q = Diag(Q1) and tr[·] is the trace of a matrix.

From now on, when we refer to L(V,Q), we are referring to its definition in Theorem 3.9. The new objective is equivalent to L_avg up to additive and multiplicative constants, and hence can be used in place of it for the purposes of optimization.

With this simplified objective function, we observe that for a fixed strategy matrix Q, we can compute the optimal V in closed form. If the entries of the response vector were statistically independent and had equal variance, then this would simply be V = WQ†, where Q† is the Moore-Penrose pseudo-inverse of Q [11, 23].


However, since the entries of the response vector have unequal variance and are not statistically independent in general, this simple expression is not correct. We can still express the optimal V in closed form, as shown in Theorem 3.10.

Theorem 3.10 (Optimal V for fixed Q) For a fixed Q, the minimizer of L(V,Q) subject to W = VQ is given by:

V = W (Q^T D_Q⁻¹ Q)† Q^T D_Q⁻¹.

Note that we can assume D_Q is invertible without loss of generality. If it were not, then one entry of the diagonal would have to be 0, implying that a row of Q is all zero. Such a row corresponds to an output that never occurs under the privacy mechanism, and can be removed without changing the mechanism. Further note that for the above formula to apply, there must exist a V such that W = VQ, which is guaranteed if and only if W is in the row space of Q [35]. Expressed as a constraint, this is W = WQ†Q.
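The closed form in Theorem 3.10 is straightforward to implement. Below is a minimal numpy sketch (ours, assuming Q has no all-zero rows so that D_Q is invertible), together with a check of the factorization constraint.

```python
import numpy as np

def optimal_V(W, Q):
    # Theorem 3.10: V = W (Q^T D_Q^{-1} Q)^+ Q^T D_Q^{-1}, with D_Q = Diag(Q 1).
    D_inv = np.diag(1.0 / Q.sum(axis=1))
    M = Q.T @ D_inv @ Q
    return W @ np.linalg.pinv(M) @ Q.T @ D_inv

n, eps = 5, 1.0
Q = (np.ones((n, n)) + (np.exp(eps) - 1) * np.eye(n)) / (np.exp(eps) + n - 1)
W = np.tril(np.ones((n, n)))              # Prefix workload
V = optimal_V(W, Q)
print(np.allclose(V @ Q, W))              # True: W lies in the row space of Q
```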

Now that we know the optimal V for any Q, we can plug it into L(V,Q) to express the objective as a function of Q alone. Doing this, and simplifying further, leads to our final optimization objective, stated in Theorem 3.11.

Theorem 3.11 (Objective Function for Q) The objective function can be expressed as:

L(Q) = tr[ (Q^T D_Q⁻¹ Q)† (W^T W) ].
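For reference, a small sketch (ours) of this objective: it evaluates L(Q) via the pseudo-inverse and confirms that it agrees with tr[V D_Q V^T] when V is chosen optimally as in Theorem 3.10.

```python
import numpy as np

def objective(W, Q):
    # Theorem 3.11: L(Q) = tr[(Q^T D_Q^{-1} Q)^+ (W^T W)].
    D_inv = np.diag(1.0 / Q.sum(axis=1))
    return np.trace(np.linalg.pinv(Q.T @ D_inv @ Q) @ (W.T @ W))

n, eps = 5, 1.0
Q = (np.ones((n, n)) + (np.exp(eps) - 1) * np.eye(n)) / (np.exp(eps) + n - 1)
W = np.tril(np.ones((n, n)))
D_inv = np.diag(1.0 / Q.sum(axis=1))
V = W @ np.linalg.pinv(Q.T @ D_inv @ Q) @ Q.T @ D_inv     # optimal V (Theorem 3.10)
D = np.diag(Q.sum(axis=1))
print(np.isclose(np.trace(V @ D @ V.T), objective(W, Q)))  # True
```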

L(Q) is our final optimization objective. We are almost ready to restate the optimization problem in terms of Q. However, before we do that, it is useful to simplify the constraints of the problem. The constraints stated in Problem 3.8 are challenging to deal with algorithmically because there are a large number of them. Ignoring the factorization constraint, there are n²m + n constraints on Q, and each entry of Q is constrained by entries from the same column and entries from the same row.

By introducing an auxiliary optimization variable z ∈ R^m, we reduce this to nm + n constraints, so that each entry of Q is only constrained by entries from the same column and z. Specifically, z corresponds to the minimum allowable value on each row of Q, and every column of Q must be between z and exp(ε)z (coordinate-wise inequality). It is clear that this constraint, together with z ≥ 0, is exactly equivalent to Condition 1 of Proposition 2.6 plus the non-negativity part of Condition 2. Also note that the remaining normalization requirement in Condition 2 can be expressed in matrix form as Q^T 1 = 1. The final optimization problem underlying the workload factorization mechanism is stated in Problem 3.12.

Problem 3.12 (Strategy Optimization) Given a workload W and a privacy budget ε:

minimize_{Q,z}   tr[ (Q^T D_Q⁻¹ Q)† (W^T W) ]
subject to       W = WQ†Q
                 Q^T 1 = 1
                 0 ≤ z ≤ q_u ≤ exp(ε) z   ∀u.

4. OPTIMIZATION ALGORITHM
We now discuss our approach to solving Problem 3.12. It is a nonlinear optimization problem with linear and nonlinear constraints. While the objective is smooth (and hence differentiable) within the boundary of the constraint W = WQ†Q, it is not convex.

Algorithm 1 Projection onto bounded probability simplex
Input: r, z ∈ R^m, ε
  u, π = sort([z − r, exp(ε)z − r])    # sorted vector and corresponding permutation
  a_i = (−1)^{1[π_i ≤ m]}              for i = 1, ..., 2m
  b_i = Σ_{j=1}^{i−1} a_j              for i = 1, ..., 2m
  ρ = min{ 1 ≤ i ≤ 2m : 1^T z + b_i u_i + Σ_{j=1}^{i−1} a_j u_j > 1 }
  λ = (1 − 1^T z − b_ρ u_ρ − Σ_{j=1}^{ρ−1} a_j u_j) / b_ρ + u_ρ
  return clip(r + λ, z, exp(ε)z)

It is typically infeasible to find a closed-form solution to such a problem, and conventional numerical optimization methods are not guaranteed to converge to a global minimum for non-convex objectives. However, such numerical gradient-based methods have seen remarkable empirical success in a variety of domains, often finding high-quality local minima. That is the approach we take. However, rather than using an out-of-the-box commercial solver, which would not be able to scale to larger problem sizes, we provide our own optimization algorithm that achieves greater scalability by exploiting structure in the constraint set.

The algorithm we propose is an instance of projected gradient descent [34], a variant of gradient descent that handles constraints. To implement this algorithm, the key challenge is to project onto the constraint set. In other words, given a matrix R that does not satisfy the constraints, find the "closest" matrix Q that does satisfy the constraints. Ignoring the constraint W = WQ†Q for now, this sub-problem is stated formally in Problem 4.1.

Problem 4.1 (Projection onto LDP Constraints) Given an arbitrary matrix R, a vector z, and a privacy budget ε, the projection onto the privacy constraint, denoted Q = Π_{z,ε}(R), is obtained by solving the following problem:

minimize_Q   ‖Q − R‖_F²
subject to   Q^T 1 = 1
             z ≤ q_u ≤ exp(ε) z   ∀u.

Problem 4.1 is easier to solve than Problem 3.12 because the objective is now a quadratic function of Q. In addition, a key insight to solve this problem efficiently is to notice that it is closely related to the problem of projecting onto the probability simplex [40] (now with bound constraints), and admits a similar solution. Specifically, the form of the solution is stated in Proposition 4.2.

Proposition 4.2 (Projection Algorithm) The solution to Problem 4.1 may be obtained one column at a time using

q_u = clip(r_u + λ_u, z, exp(ε) z),

where clip clips the entries of r_u + λ_u to the range [z, exp(ε)z] entry-wise and λ_u is a scalar value that makes 1^T q_u = 1.

The solution is remarkably simple. Intuitively, we add the same scalar value to every entry of r_u and then clip those values that lie outside the allowed range. The scalar value λ_u is the Lagrange multiplier on the constraint 1^T q_u = 1, and is chosen so that q_u satisfies the constraint. It may be calculated through binary search or any other method to find the root of the function 1^T q_u − 1 = 0. We give an efficient O(m log m) algorithm to find λ_u and q_u in Algorithm 1.
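As an alternative to the sorting-based Algorithm 1, the following sketch (ours only, shown for illustration) solves each column's projection by finding the scalar λ_u with an off-the-shelf root finder (SciPy's brentq); it assumes the bounds are feasible, i.e., 1^T z ≤ 1 ≤ exp(ε) 1^T z.

```python
import numpy as np
from scipy.optimize import brentq

def project_column(r, z, eps):
    # Proposition 4.2: q_u = clip(r_u + lambda_u, z, exp(eps) z), where lambda_u
    # is chosen so the clipped column sums to 1 (found here by root finding).
    hi = np.exp(eps) * z
    f = lambda lam: np.clip(r + lam, z, hi).sum() - 1.0
    lam = brentq(f, (z - r).min() - 1.0, (hi - r).max() + 1.0)
    return np.clip(r + lam, z, hi)

def project(R, z, eps):
    # Problem 4.1: project every column of R independently.
    return np.column_stack([project_column(R[:, u], z, eps)
                            for u in range(R.shape[1])])

m, n, eps = 8, 4, 1.0
z = np.full(m, (1 + np.exp(-eps)) / (2 * m))    # feasible lower bounds
Q = project(np.random.rand(m, n), z, eps)
print(np.allclose(Q.sum(axis=0), 1.0))           # each column is a distribution
```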


Algorithm 2 Strategy optimization
Input: workload W ∈ R^{p×n}, privacy budget ε
  Initialize Q ∈ R^{m×n}, z ∈ R^m, β ∈ R₊
  α = β / (n exp(ε))
  for t = 1, ..., T do
    z ← clip(z − α ∇_z L(Q), 0, 1)
    Q ← Π_{z,ε}(Q − β ∇_Q L(Q))
  end for
  return Q

Now that we have discussed the projection problem and its solution, we are ready to state the full projected gradient descent algorithm for finding an optimized strategy. Algorithm 2 is an iterative algorithm, where in each iteration we perform a gradient descent plus projection step on the optimization parameters z and Q. The gradient ∇_Q L is easily obtained as L is a function of Q, but the gradient term ∇_z L is less obvious. However, by observing that Q is actually a function of z (from the projection step Π_{z,ε}), we can use the multi-variate chain rule to back-propagate the gradient from Q to z to obtain ∇_z L. We do not discuss the details of computing the gradients here, as it can be easily accomplished with automatic differentiation tools [20, 31].
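To make the structure of Algorithm 2 concrete, here is a toy sketch of our own: it keeps z fixed and uses a crude finite-difference gradient with backtracking instead of the automatic differentiation and tuned step sizes described above, so it only illustrates the descend-then-project loop on a tiny instance. It reuses the project helper from the projection sketch earlier in this section.

```python
import numpy as np

def objective(Q, WtW):
    # L(Q) from Theorem 3.11.
    D_inv = np.diag(1.0 / Q.sum(axis=1))
    return np.trace(np.linalg.pinv(Q.T @ D_inv @ Q) @ WtW)

def numerical_grad(Q, WtW, h=1e-6):
    # Finite differences, one entry at a time (only practical for tiny problems).
    G, base = np.zeros_like(Q), objective(Q, WtW)
    for idx in np.ndindex(Q.shape):
        Qp = Q.copy(); Qp[idx] += h
        G[idx] = (objective(Qp, WtW) - base) / h
    return G

n, m, eps, beta = 4, 16, 1.0, 1e-3
W = np.tril(np.ones((n, n))); WtW = W.T @ W
z = np.full(m, (1 + np.exp(-eps)) / (2 * m))
Q = project(np.random.rand(m, n), z, eps)        # random feasible initialization
print("initial objective:", objective(Q, WtW))
for t in range(100):
    G, step = numerical_grad(Q, WtW), beta
    while objective(project(Q - step * G, z, eps), WtW) > objective(Q, WtW) and step > 1e-12:
        step /= 2                                # crude backtracking on the step size
    Q = project(Q - step * G, z, eps)
print("final objective:", objective(Q, WtW))     # typically lower than the initial value
```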

We note that Algorithm 2 handles the constraint W = WQ†Q "for free" in the sense that we do not need to deal with it explicitly, as long as the step sizes and initialization are chosen appropriately. Specifically, as long as the initial Q satisfies the constraint, and the step sizes are sufficiently small, every subsequent Q in the algorithm will also satisfy the constraint. Intuitively, this is because as we move closer to the boundary of the constraint, the objective function blows up and eventually reaches a point of discontinuity when the constraint is not satisfied. Because we update using the negative gradient, which is a descent direction, we will never approach the boundary of the constraint. We discuss the choice of step size and initialization below. This trick is a very convenient way to handle a constraint that is otherwise challenging to enforce. We note that similar ideas have been used to deal with related constraints in prior work [43].

The step size for the gradient descent step must be supplied as input, and two different step sizes are used to update Q and z. Notice that we take a smaller step size to update z than Q. This is a heuristic we use to make sure z doesn't change too fast; it improves the robustness of the algorithm. We perform a hyper-parameter search to find a step size that works well, only running the algorithm for a few iterations in this phase, then running it longer once a step size is chosen. Decaying the step size at each iteration is also possible, as smaller step sizes typically work better in later iterations.

The final missing piece in Algorithm 2 is the initialization of Q, for which there are multiple options. One option is to initialize with the strategy matrix from an existing mechanism, such as the best one from Table 1. Then intuitively the optimized strategy will never be worse than the other mechanisms, because the negative gradient is a descent direction. This is an informal argument, as it is technically possible that the optimized strategy has better average-case variance but worse worst-case variance than the initial strategy. We do not take this approach, however, as we find initializing Q randomly tends to work better. Specifically, we let R be a random 4n × n matrix, where each entry is sampled from U[0, 1]. Then we obtain Q by projecting onto the constraint set; i.e., Q = Π_{z,ε}(R), where z = ((1 + e^{−ε})/(8n)) · 1 and 1 is a vector of ones. The choice of m is also an important consideration when initializing Q. While larger m leads to a more expressive strategy space, it also leads to more expensive optimization. Our choice of m = 4n represents a sweet spot that we found works well empirically across a variety of workloads. In general, a hyper-parameter search can be executed to find the best m. This hyper-parameter search does not degrade the privacy guarantee in any way because we can evaluate the quality of a strategy without consuming the privacy budget, by using the analytic formulas for variance.

It requires O(n²m + n³) time to evaluate the objective function and its gradient (assuming W^T W has been precomputed) and O(nm log m) time to perform the projection. Thus, the per-iteration time complexity of Algorithm 2 is O(n²m + n³ + nm log m), or O(n³) when using m = 4n.

5. THEORETICAL RESULTS
In this section, we answer several theoretical questions about our mechanism. First, we justify the relaxation in the objective function, used to make the optimization analytically tractable. Second, we theoretically analyze the error achieved by our mechanism, measured in terms of sample complexity. Third, we derive lower bounds on the achievable error of workload factorization mechanisms. All the proofs are deferred to Appendix A.

5.1 Relaxed Objective
In Section 3 we replaced our true optimization objective L_worst with a relaxation L_avg. In this section, we justify that choice theoretically, showing that L_worst is tightly bounded above and below in terms of L_avg.

Theorem 5.1 (Bounds on L_worst) Let W = VQ be an arbitrary factorization of W where Q is an ε-LDP strategy matrix. Then the worst-case variance L_worst(V,Q) and average-case variance L_avg(V,Q) are related as follows:

L_avg(V,Q) ≤ L_worst(V,Q) ≤ e^ε ( L_avg(V,Q) + (N/n) ‖W‖_F² )

Theorem 5.1 suggests that relaxing L_worst to L_avg does not significantly impact the optimization problem. Intuitively, this theorem holds because of the privacy constraint on Q, which guarantees that the column of Q for the worst-case user cannot be too different from any other column. Hence, all users must have a similar impact on the total variance of the mechanism. Empirically, we find that L_worst is often even closer to L_avg than the upper bound suggests. Furthermore, in some cases L_worst is exactly equal to L_avg, as we showed in Example 3.7.

5.2 Sample complexity
We gave an analytic expression for the expected total squared error (total variance) of our mechanism in Corollary 3.5. However, this quantity might be difficult to interpret, and it is more natural to look at the number of samples needed to achieve a fixed error instead. Furthermore, when running an LDP mechanism it is important to know how much data is required to obtain a target error rate, as that information is critical for determining an appropriate privacy budget.


Because the total variance increases with the number of individuals N and the number of workload queries m, we instead look at a normalized measure of variance.

Definition 5.2 (Normalized Variance) The normalized worst-case variance of M_{V,Q} is:

L_norm(V,Q) = max_x E[ (1/m) ‖ (1/N)(Wx − M_{V,Q}(x)) ‖₂² ]

L_norm is the same as L_worst up to constant factors, although it is more interpretable because it is a measure of variance on a single "average" workload query, where variance is measured on the normalized data vector x/N.

Corollary 5.3 (Normalized Variance) The normalized variance is:

L_norm(V,Q) = (1/(mN²)) L_worst(V,Q) = (1/(mN)) max_{u∈U} Σ_{i=1}^{p} [ v_i^T Diag(q_u) v_i − (v_i^T q_u)² ]

Interestingly, the dependence on N does not change with V and Q: it is always Θ(1/N), but the constant factor depends on the quality of the workload factorization.

Corollary 5.4 (Sample Complexity) The number of samples needed to achieve normalized variance α is:

N ≥ (1/(mα)) max_{u∈U} Σ_{i=1}^{p} [ v_i^T Diag(q_u) v_i − (v_i^T q_u)² ]

We can readily compute the sample complexity numerically for any factorization VQ. In fact, the sample complexity and worst-case variance of a mechanism are proportional, as evident from the above equation. Additionally, by replacing L_worst(V,Q) with a lower bound, we can get an analytical expression for the sample complexity in terms of the privacy budget ε and the properties of the workload W.
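As an example of this numerical computation, here is a short numpy sketch (ours) for the randomized response factorization on the Histogram workload; following the notation of this section, the normalizer m is taken to be the number of workload queries, which reproduces the closed form in Example 5.5.

```python
import numpy as np

def sample_complexity(V, Q, alpha=0.01):
    # Corollary 5.4: N >= (1/(m*alpha)) * max_u sum_i [v_i^T Diag(q_u) v_i - (v_i^T q_u)^2]
    quad = (V ** 2) @ Q
    lin = (V @ Q) ** 2
    worst_user = (quad - lin).sum(axis=0).max()
    m = V.shape[0]                       # number of workload queries
    return worst_user / (m * alpha)

n, eps = 5, 1.0
Q = (np.ones((n, n)) + (np.exp(eps) - 1) * np.eye(n)) / (np.exp(eps) + n - 1)
V = np.linalg.inv(Q)                     # randomized response on the Histogram workload
closed_form = ((n - 1) / (0.01 * n)) * (n / (np.exp(eps) - 1) ** 2 + 2 / (np.exp(eps) - 1))
print(np.isclose(sample_complexity(V, Q), closed_form))   # True, matching Example 5.5
```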

Example 5.5 (Sample complexity, Randomized Response) The Randomized Response mechanism described in Example 3.3 has sample complexity

N ≥ ((n − 1)/(αn)) [ n/(e^ε − 1)² + 2/(e^ε − 1) ]

on the Histogram workload.

Example 5.5 suggests the sample complexity of the randomized response mechanism grows roughly at a linear rate with the domain size n.

5.3 Lower Bound
For a given workload, a theoretical lower bound on the achievable error is useful for checking how close to optimal our strategies are. It also can be used to characterize the inherent difficulty of the workload. In this section, we derive an easily-computable lower bound on the achievable error under our mechanism in terms of the singular values of the workload matrix.

Theorem 5.6 (Lower Bound, Factorization Mechanism) Let W be a workload matrix and let Q be any ε-LDP strategy matrix. Then we have:

(1/exp(ε)) (λ₁ + ··· + λₙ)² ≤ L(Q)

where λ₁, ..., λₙ are the singular values of W and L(Q) is the loss function defined in Theorem 3.11.

This result is similar to lower bounds known to hold in the central model of differential privacy, based on the analysis of the Matrix Mechanism [29]. In both cases the hardness of a workload is characterized by its singular values.

Other lower bounds for this problem have characterized the hardness of a workload in terms of quantities like the largest L2 column norm of W [4], the so-called factorization norm of W [17], and the so-called packing number associated with W [9]. While interesting theoretically, the factorization norm and packing number are hard to calculate in practice. In contrast, our bound can be easily calculated.
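Indeed, the bound in Theorem 5.6 can be computed with a single SVD; a small sketch (ours) is shown below.

```python
import numpy as np

def error_lower_bound(W, eps):
    # Theorem 5.6: (lambda_1 + ... + lambda_n)^2 / exp(eps) <= L(Q)
    singular_values = np.linalg.svd(W, compute_uv=False)
    return singular_values.sum() ** 2 / np.exp(eps)

n, eps = 512, 1.0
print(error_lower_bound(np.eye(n), eps))                   # Histogram workload
print(error_lower_bound(np.tril(np.ones((n, n))), eps))    # Prefix: a larger bound
```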

Theorem 5.6 gives a lower bound on our optimization objective. Translating that back to worst-case variance gives us Corollary 5.7.

Corollary 5.7 (Worst-case variance) The worst-case variance of any factorization mechanism must be at least:

(N/(n exp(ε))) (λ₁ + ··· + λₙ)² − (N/n) ‖W‖_F² ≤ L_worst(V,Q)

Combining Corollary 5.7 with Corollary 5.4 and applying it to the Histogram workload gives us a lower bound on the sample complexity.

Example 5.8 (Lower Bound, Histogram Workload) Every workload factorization mechanism requires at least (1/α)(1/exp(ε) − 1/n) samples to achieve normalized variance α on the Histogram workload.

Note the very weak dependence on n in Example 5.8, which suggests that the sample complexity should not change much with n. Further, recall from Example 5.5 that the sample complexity of randomized response is linear in n. This suggests randomized response is not the best mechanism for the Histogram workload. This result is not new, as there are several mechanisms that are known to perform better than randomized response [1, 42, 38, 18]. We show empirically in Section 6 that some of these mechanisms achieve the optimal sample complexity for the Histogram workload up to constant factors (i.e., no dependence on n). Our mechanism also achieves the optimal sample complexity for this workload, but has better constant factors.

For other workloads, the sample complexity may depend on n. Calculating the exact dependence on n for other workloads requires deriving the singular values of the workload as a function of n in closed form, which may be challenging for workloads with complicated structure.

6. EXPERIMENTS
In this section we experimentally evaluate our mechanism. We extensively study the utility of our mechanism on a variety of workloads, domains, and privacy levels, and compare it against multiple competing mechanisms from the literature. We demonstrate consistent improvements in utility compared to other mechanisms in all settings (Sections 6.2, 6.3, and 6.4). We also study the robustness and scalability of our optimization algorithm (Sections 6.5 and 6.6). The source code for our mechanism, and for the other mechanisms represented as a strategy matrix, is available at https://github.com/ryan112358/workload-factorization-mechanism.


[Figure 1: Sample complexity of 7 algorithms (Randomized Response, Hadamard, Hierarchical, Fourier, Matrix Mechanism (L1), Matrix Mechanism (L2), Optimized) on 6 workloads (Histogram, Prefix, All Range, All Marginals, 3-Way Marginals, Parity) for ε ∈ [0.5, 4.0], with domain size 512. Each panel plots Samples (log scale) against Epsilon.]

6.1 Experimental setup

Workloads. We consider six different workloads in our empirical analysis, each of which can be defined for a specified domain size. These workloads are intended to capture common queries an analyst might want to perform on data and have been studied previously in the privacy literature. These workloads are Histogram, Prefix, All Range, All Marginals, 3-Way Marginals, and Parity. Histogram is the simplest workload, studied as a running example throughout the paper, and encoded as an identity matrix. Prefix was introduced in Example 2.4, and includes a set of range queries required to compute the empirical CDF over a 1-dimensional domain. All Range is a workload containing all range queries over a 1-dimensional domain, studied in [12]. All Marginals and 3-Way Marginals contain queries to compute the marginals over a multidimensional binary domain, and were studied in [13]. Parity also contains queries defined over a multidimensional binary domain, and was studied in [19].
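For concreteness, a few of these workload matrices can be constructed as follows (an illustrative sketch of our own; the exact encodings and parameterizations used in the paper's experiments may differ).

```python
import numpy as np
from itertools import product

def histogram_workload(n):
    return np.eye(n)

def prefix_workload(n):
    return np.tril(np.ones((n, n)))

def all_range_workload(n):
    # One row per contiguous range [i, j] over a 1-dimensional domain of size n.
    rows = [np.concatenate([np.zeros(i), np.ones(j - i + 1), np.zeros(n - j - 1)])
            for i in range(n) for j in range(i, n)]
    return np.array(rows)

def all_marginals_workload(k):
    # All marginal queries over a k-dimensional binary domain (n = 2^k cells).
    rows = []
    for mask in product([0, 1], repeat=k):              # attributes in the marginal
        for vals in product([0, 1], repeat=sum(mask)):  # their settings
            it = iter(vals)
            settings = [next(it) if m else None for m in mask]
            row = [float(all(s is None or s == p for s, p in zip(settings, point)))
                   for point in product([0, 1], repeat=k)]
            rows.append(row)
    return np.array(rows)

print(all_range_workload(8).shape)       # (36, 8)
print(all_marginals_workload(3).shape)   # (27, 8)
```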

Mechanisms. We compare our mechanism against six other state-of-the-art mechanisms, including Randomized Response [41], Hadamard [2], Hierarchical [13, 39], Fourier [12], and the Matrix Mechanism [27, 17] (both L1 and L2 versions). While the Matrix Mechanism is typically thought of as a technique for central differential privacy, it has been studied theoretically as a mechanism for local differential privacy as well [17]. This version of the "distributed" Matrix Mechanism is what we compare against in experiments.

The first four mechanisms are all particular instances of the class of factorization mechanisms, just with different factorizations. They were all designed to answer a fixed workload (e.g., Randomized Response was designed for the Histogram workload), but they can still be run on other workloads with minor modifications. In particular, for each mechanism we use the same Q across different workloads, but change V based on the workload, using Theorem 3.10.

We omit from comparison the Gaussian mechanism [4], as it is strictly dominated by the L2 Matrix Mechanism. We also omit from comparison RAPPOR [18] and Subset Selection [42], as they require exponential space to represent the strategy matrix, making it prohibitive to calculate worst-case variance and sample complexity. However, we note that these mechanisms have been previously compared with Hadamard, and shown to offer comparable performance on the Histogram workload [2].

Evaluation. Our primary evaluation metric for comparing algorithms is sample complexity, which we calculate exactly using Corollary 5.4 with α = 0.01. Recall that the sample complexity is proportional to the worst-case variance, but is appropriately normalized and easier to interpret. Furthermore, we remark that for most experiments in this section, no input data is required, as the sample complexities we report apply for the worst-case dataset. In practice, we have found that the variance on real-world datasets is quite close to the worst-case variance, as we demonstrate in Section 6.4. We also vary the privacy budget ε and domain size n, studying their impact on the sample complexity for each mechanism and workload.

6.2 Impact of Epsilon
Figure 1 shows how the workload and ε affect the sample complexity of each mechanism. We consider ε ranging from 0.5 to 4.0, fixing n to be 512. These privacy budgets are common in practical deployments of differential privacy, and local differential privacy in particular [38, 14, 18]. We state our main findings below:

• Our mechanism (labeled Optimized) is consistently the best in all settings: it requires fewer samples than every other mechanism on every workload and ε we tested.

• The magnitude of improvement over the best competitor varies between 1.0 (Histogram, ε = 0.5) and 14.6 (All Range, ε = 4.0).


[Figure 2: Sample complexity of 7 algorithms on 6 workloads for n ∈ [8, 1024], with ε = 1.0. Each panel plots Samples (log scale) against Domain size; same mechanisms and workloads as Figure 1.]

• The improvement is typically around 2.5 in the medium-privacy regime. In the very high-privacy regime with ε ≈ 0.5, our mechanism is typically quite close to the best competitor, and in the very low-privacy regime with ε ≈ 4.0, our mechanism matches randomized response, which is optimal in that regime. The reduction in required samples translates to a context that really matters: data collectors can now run their analyses on smaller samples to achieve their desired accuracy.

• The best competitor changes with the workload and ε. For example, the best competitor on the Prefix workload was Hierarchical, while the best competitor on the 3-Way Marginals workload was Fourier. In both cases, these mechanisms were specifically designed to offer low error on their respective workloads, but they don't work as well on other workloads. Additionally, Randomized Response is often the best mechanism at high ε, so even for a fixed workload, the best competitor is not always the same. On the other hand, our mechanism adapts effectively to the workload and ε, and works well in all settings. As a result, only one algorithm needs to be implemented, rather than an entire library of algorithms, and, accordingly, it is not necessary to select among alternative algorithms.

• Some workloads are inherently harder to answer than others: the number of samples required by our mechanism differs by up to two orders of magnitude between workloads. The easiest workload appears to be Histogram, while the hardest is Parity. This is consistent with the lower bound we gave in Theorem 5.6, which characterizes the hardness of the workload in terms of its singular values; the bound is much lower for Histogram than for Parity.

6.3 Impact of Domain Size
Figure 2 shows how the workload and the domain size n affect the sample complexity of each mechanism. We consider n ranging from 8 to 1024, fixing ε to be 1.0. We state our main findings below:

• For the Histogram workload, there is almost no dependence on the domain size for every mechanism except randomized response. This is consistent with our finding in Example 5.8 regarding the lower bound on sample complexity. This observation is unique to the Histogram workload, however.

• The mechanisms that were designed for a given workload, and those that adapt to the workload, have a better dependence on the domain size (smaller slope) than the mechanisms that do not. This includes the L2 Matrix Mechanism, which is worse than every other mechanism in most settings, but slowly overtakes the other mechanisms for large domain sizes.

• The sample complexity of our mechanism and other mechanisms tailored to the workload is generally O(√n), as the slope of the lines is ≈ 0.5 in log space.³ On the other hand, the sample complexity of the mechanisms not tailored to the workload is more like O(n) (as the slope of the line is ≈ 1.0). These findings are quite interesting: they suggest the improvements offered by workload adaptivity are more than just a constant factor; they grow with the domain size.

6.4 Impact of Dataset
Whereas results in previous sections focused on worst-case sample complexity, we now turn our attention to sample complexity on real-world benchmark datasets obtained from the DPBench study [22]. To calculate the sample complexity on real data, we simply replace L_worst in Corollary 5.4 with the exact (data-dependent) expression for total variance stated in Theorem 3.4.

In Figure 3a, we plot the sample complexity of each mechanism on three datasets for the Prefix workload, fixing n = 512 and ε = 1.0. We also plot the worst-case sample complexity for reference. As expected, our mechanism still outperforms all others on each dataset.

³ The slope of a line in log space corresponds to the power of a polynomial in linear space; i.e., log y = m log x → y = x^m.


[Figure 3: (a) Sample complexity on benchmark datasets (HEPTH, MEDCOST, NETTRACE, and Worst-case) for the Prefix workload, for each mechanism. (b) Worst-case variance (ratio to best found) of the optimized strategy for various m, from n to 16n, per workload. (c) Per-iteration time complexity of optimization for increasing domain sizes.]

In fact, all mechanisms performed pretty consistently, offering similar sample complexities for each dataset. The largest deviation between datasets occurs for the Hadamard mechanism, where it needs 1.69× more samples for the NETTRACE dataset than for the HEPTH dataset. The Optimized mechanism is even more consistent, as the largest deviation is only 1.006×. Additionally, the real-world sample complexity is very well-approximated by the worst-case sample complexity for the Optimized mechanism, as the maximum deviation is only 1.009×. This suggests the conclusions drawn based on worst-case sample complexity in Figure 1 and Figure 2 hold for real-world data as well. Although not shown, we repeated this experiment for other workloads and settings and made similar observations.

6.5 Initialization
Recall that our optimization algorithm is initialized with a random strategy matrix, and that different initial strategies can lead to different optimized strategies. In this section, we aim to understand how sensitive our optimization algorithm is to the different initializations, and whether it depends on m, the number of rows in the strategy matrix.

We fix n = 64 and ε = 1.0, and vary m from 2n to 16n. For each m, we compute 10 optimized strategies with different random initializations and record the worst-case variance for each strategy. In order to plot all workloads on the same figure, we normalize the worst-case variance to the best found across all trials. In Figure 3b, we plot the median variance ratio for each m as well as an error bar to indicate the min and max ratio obtained across the 10 trials. We observe that the optimization is quite robust to the initialization, and produces fairly consistent results between runs, as evidenced by the small error bars. Furthermore, the optimization is not very sensitive to the choice of m, as all optimized strategies are within a factor of 1.21 of the best found. Strategies tend to get closer to optimal for larger m, and eventually level off, with the exception of the Parity workload. We suspect this difference is due to the fact that Parity is a low-rank workload, and doesn't require a large strategy. Using m = 4n as we did in other experiments tends to produce strategies within a factor of 1.05 to 1.1 of the best found. With enough computational time and resources, we recommend using a hyper-parameter search to find the best m, as this extra 10% improvement is meaningful in practice.

6.6 Scalability of Optimization

We measure the scalability of optimization by looking at the per-iteration time complexity. In each iteration, we must evaluate the objective function (and its gradient) stated in Theorem 3.11, then project onto the constraint set using Algorithm 1. We assume W^T W has been precomputed, and note that the per-iteration time complexity depends on W^T W only through its size, not its contents. We therefore use the n × n identity matrix for W. Additionally, we let Q be a random 4n × n strategy matrix. In Figure 3c, we report the per-iteration time required for increasing domain sizes, averaged over 15 iterations. Optimization scales to domains as large as n = 4096, where it takes about 139 seconds per iteration. While expensive, it is not unreasonable to run a few hundred iterations in this case, and it is an impressive scalability result given that there are over 67 million optimization variables at that domain size. Additionally, we note that strategy optimization is a one-time cost, and it can be done offline before deploying the mechanism to the users. Furthermore, as we showed in Section 6.3, the number of samples required typically increases with the domain size, so there is good reason to run mechanisms on small domains whenever possible, compressing the domain if necessary. In general, the plot shows that the time grows at roughly an O(n^3) rate, as it took about 19 seconds for n = 2048 and 2.5 seconds for n = 1024, confirming the theoretical time complexity analysis.
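To make the dominant per-iteration cost concrete, the following minimal sketch (not the authors' implementation) times one evaluation of the simplified objective from Theorem 3.11, tr[(Q^T D_Q^{-1} Q)^† (W^T W)], for a random 4n × n strategy and an identity workload. It omits the gradient and the projection step of Algorithm 1, so it only approximates the per-iteration cost reported in Figure 3c.

```python
# Minimal sketch (not the authors' implementation) of the per-iteration cost
# experiment: time one evaluation of the simplified objective from Theorem 3.11,
# L(Q) = tr[(Q^T D_Q^{-1} Q)^+ (W^T W)], for a random 4n x n strategy and W = I.
import time
import numpy as np

def objective(Q, WtW):
    """Evaluate tr[(Q^T D_Q^{-1} Q)^+ (W^T W)] for a nonnegative strategy Q."""
    d = Q @ np.ones(Q.shape[1])            # diagonal of D_Q = Diag(Q 1)
    X = Q.T @ (Q / d[:, None])             # Q^T D_Q^{-1} Q, without forming D_Q explicitly
    return np.trace(np.linalg.pinv(X) @ WtW)

n = 1024
rng = np.random.default_rng(0)
Q = rng.random((4 * n, n))                 # random strategy with m = 4n rows
WtW = np.eye(n)                            # identity workload; only the size of W^T W matters

start = time.time()
val = objective(Q, WtW)
print(f"n={n}: objective={val:.2f}, objective-only time={time.time() - start:.2f}s")
```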

7. RELATED WORK

The mechanism we propose in this work is related to a large body of research in both the central and local models of differential privacy.

Answering linear queries under central differential privacy is a widely studied topic [3, 30, 33, 8, 28]. Many state-of-the-art mechanisms for this task, like the Matrix Mechanism [27], achieve privacy by adding Laplace or Gaussian noise to a carefully selected set of "strategy queries". This query strategy is tailored to the workload, and can even be optimized for it, as the Matrix Mechanism does. The optimization problem posed by the Matrix Mechanism has been studied theoretically [27, 29], and several algorithms have been proposed to solve it exactly or approximately [44, 28, 43, 33]. While similar in spirit to our mechanism, the optimization problem underlying the Matrix Mechanism is substantially



different from ours, as it requires a search over a different space of mechanisms tailored to central differential privacy. For both optimization problems, the optimization variable is a so-called "strategy matrix", but these represent fundamentally different things in each mechanism, and hence the constraints on the strategy matrix differ. For the Matrix Mechanism, the strategy matrix encodes a set of linear queries which will be answered with a noise-addition mechanism. In contrast, the strategy matrix for our mechanism encodes a conditional probability distribution.

Answering linear queries under local differential privacy has received less attention, but one notable idea is to directly apply mechanisms from central differential privacy to the local model. This translation can be achieved by simply executing the mechanism independently for each single-user database, then aggregating the results. This approach has been studied theoretically with the Gaussian mechanism [4] and the Matrix Mechanism [17]. While these mechanisms trivially provide privacy, they tend to have poor utility in practice, as they are not tailored to the local model. Another notable approach for this task casts it as a mean estimation problem, and uses LDP mechanisms designed for that [9]. These works provide a thorough theoretical treatment of the problem, showing bounds on achieved error, but no practical implementation or evaluation.

More work has been done to answer specific, fixed workloads of general interest, such as histograms [1, 42, 38, 41, 18, 25, 6, 37], range queries [13, 39], and marginals [12, 39]. A very nice summary of the computational complexity, sample complexity, and communication complexity of the mechanisms designed for the Histogram workload is given in [1]. Interestingly, even for the very simple Histogram workload, there are multiple mechanisms because the optimal mechanism is not clear. This is in stark contrast to the central model of differential privacy, where, for the Histogram workload, it is clear that the optimal strategy is the workload itself. Almost all of these mechanisms are instances of the general class of mechanisms we consider in Definition 3.2, just with different workload factorizations. The strategy matrices for these workloads were all carefully designed to offer low error on the workloads they were designed for, by exploiting knowledge about those specific workloads. However, none of these mechanisms perform optimization to choose the strategy matrix; instead, it is fixed in advance.

Kairouz et al. also propose an optimization-based approach to mechanism design [26]. The mechanism they propose is not designed to estimate histograms or workload queries, but rather for other statistical tasks, namely hypothesis testing and what they call information preservation. They also consider the class of mechanisms characterized by a strategy matrix (Proposition 2.6), and propose an optimization problem over this space to maximize utility for a given task. Moreover, for convex and sublinear utility functions, they show that the optimal mechanism is a so-called extremal mechanism, and state a linear program to find this optimal mechanism. Unfortunately, there are 2^n optimization variables in this linear program, making it infeasible to solve in practice. Furthermore, the restriction on the utility function (sublinear, convex) prevents the technique from applying to our setting.

8. CONCLUSION

We proposed a new LDP mechanism that adapts to a workload of linear queries provided by an analyst. We formulated mechanism design as a constrained optimization problem over an expressive class of unbiased LDP mechanisms and proposed a projected gradient descent algorithm to solve this problem. We showed experimentally that our mechanism outperforms existing LDP mechanisms in a variety of settings, even outperforming mechanisms on the workloads for which they were intended. In the full version of this paper [32], we propose a simple extension to our mechanism that can improve its practical performance even further, through a post-processing technique that ensures non-negativity.

Acknowledgements. We would like to thank Daniel Sheldon for his insights regarding Theorem 3.4. This work was supported by the National Science Foundation under grants CNS-1409143, CCF-1642658, CCF-1618512, TRIPODS-1934846; and by DARPA and SPAWAR under contract N66001-15-C-4067.

A. NOTATION AND MISSING PROOFS

Symbol | Meaning
n | domain size
N | number of users
ε | privacy budget
M | privacy mechanism
U | set of possible users, |U| = n
O | set of possible outcomes, |O| = m
W | p × n workload matrix
Q | m × n strategy matrix
V | p × m reconstruction matrix
x | data vector
y | response vector, y = M_Q(x)
1 | vector of ones
D_Q | Diag(Q1)
Q† | pseudo-inverse
q_u | u-th column of Q
v_i | i-th row of V
tr[·] | trace of a matrix
L_worst(V, Q) | worst-case variance
L_avg(V, Q) | average-case variance
L(V, Q) | optimization objective
L(Q) | optimization objective for Q
Diag(·) | diagonal matrix from a vector

Proof of Theorem 3.4. We begin by deriving the variance for a single query $v^T y$ (where $y = \mathcal{M}_Q(x)$). Note that $y$ is a sum of multinomial random variables $s_u$, instantiated with parameters $n = x_u$ and $p = q_u$, where $s_u$ is the response vector for all users of type $u$. Using the well-known formula for the covariance of a multinomial random variable, the covariance of a sum, and the variance of a linear combination of correlated random variables, we obtain:

$$\mathrm{Cov}[s_u] = x_u \big( \mathrm{Diag}(q_u) - q_u q_u^T \big)$$

$$\mathrm{Cov}[y] = \sum_u x_u \big( \mathrm{Diag}(q_u) - q_u q_u^T \big)$$

$$\mathrm{Var}[v^T y] = v^T \mathrm{Cov}[y]\, v = \sum_u x_u \big( v^T \mathrm{Diag}(q_u) v - (v^T q_u)^2 \big)$$

The total variance is obtained by summing over all the rows of $V$. This completes the proof.
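As a quick sanity check of this formula (not part of the paper), the following sketch simulates $y = \sum_u s_u$ with $s_u \sim \mathrm{Multinomial}(x_u, q_u)$ and compares the empirical variance of $v^T y$ against the closed form above.

```python
# Monte Carlo sanity check (not part of the paper) of the variance formula in
# Theorem 3.4: simulate y = sum_u s_u with s_u ~ Multinomial(x_u, q_u) and compare
# the empirical variance of v^T y against the closed-form expression.
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 6
Q = rng.random((m, n))
Q /= Q.sum(axis=0)                         # each column q_u is a distribution over outcomes
x = rng.integers(50, 200, size=n)          # number of users of each type
v = rng.standard_normal(m)                 # coefficients of one query on the response vector

# closed form: sum_u x_u (v^T Diag(q_u) v - (v^T q_u)^2)
exact = sum(x[u] * (v @ np.diag(Q[:, u]) @ v - (v @ Q[:, u]) ** 2) for u in range(n))

# empirical variance over repeated simulations
sims = [v @ sum(rng.multinomial(x[u], Q[:, u]) for u in range(n)) for _ in range(20000)]
print(exact, np.var(sims))                 # the two values should be close
```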

Proof of Theorem 3.9. The proof follows from simple algebraic manipulations of $L_{avg}$:

$$L_{avg}(V, Q) = \frac{N}{n} \sum_{u \in U} \sum_{i=1}^{p} \Big[ v_i^T \mathrm{Diag}(q_u) v_i - (v_i^T q_u)^2 \Big]$$

$$= \frac{N}{n} \Big[ \sum_{u \in U} \sum_{i=1}^{p} v_i^T \mathrm{Diag}(q_u) v_i - \|VQ\|_F^2 \Big]$$

$$= \frac{N}{n} \Big[ \sum_{i=1}^{p} v_i^T \mathrm{Diag}\Big(\sum_u q_u\Big) v_i - \|W\|_F^2 \Big]$$

$$= \frac{N}{n} \Big[ \mathrm{tr}[V D_Q V^T] - \|W\|_F^2 \Big]$$
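A small numerical check (not from the paper) of this identity: for any reconstruction matrix $V$ satisfying $VQ = W$, the double-sum form and the trace form above should agree.

```python
# Numerically verify the identity in Theorem 3.9 for a random strategy Q and a
# reconstruction matrix V satisfying VQ = W (here V = W Q^+, valid because the
# random Q has full column rank). This is a sanity check, not part of the paper.
import numpy as np

rng = np.random.default_rng(2)
n, m, p, N = 6, 24, 4, 1000
Q = rng.random((m, n))                     # nonnegative strategy matrix
W = rng.standard_normal((p, n))            # arbitrary workload
V = W @ np.linalg.pinv(Q)                  # one choice of V with VQ = W
d = Q @ np.ones(n)                         # diagonal of D_Q = Diag(Q 1)

# left-hand side: the double sum over users u and queries i
lhs = (N / n) * sum(
    V[i] @ np.diag(Q[:, u]) @ V[i] - (V[i] @ Q[:, u]) ** 2
    for u in range(n) for i in range(p)
)
# right-hand side: the compact trace form
rhs = (N / n) * (np.trace((V * d) @ V.T) - np.linalg.norm(W, "fro") ** 2)
assert np.isclose(lhs, rhs)
```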

Proof of Theorem 3.10. Observe that we can construct $V$ one row at a time because there are no interaction terms between $v_i$ and $v_j$ for $i \neq j$ in the objective function. Furthermore, we can optimize $v_i$ through the following quadratic program:

$$\min_{v_i} \; v_i^T D_Q v_i \quad \text{subject to} \quad Q^T v_i = w_i$$

The above problem is closely related to a standard norm-minimization problem, and can be transformed into one by making the substitution $u_i = D_Q^{1/2} v_i$. The problem becomes:

$$\min_{u_i} \; \|u_i\|_2^2 \quad \text{subject to} \quad (Q^T D_Q^{-1/2}) u_i = w_i$$

The unique solution to this problem is given by $u_i = (Q^T D_Q^{-1/2})^\dagger w_i$ [10]. Using the Hermitian reduction identity $X^\dagger = X^T (X X^T)^\dagger$ [7], we have:

$$v_i = D_Q^{-1/2} u_i = D_Q^{-1/2} (Q^T D_Q^{-1/2})^\dagger w_i = D_Q^{-1/2} D_Q^{-1/2} Q (Q^T D_Q^{-1/2} D_Q^{-1/2} Q)^\dagger w_i = D_Q^{-1} Q (Q^T D_Q^{-1} Q)^\dagger w_i$$

Applying this for all $i$, we arrive at the desired solution:

$$V = W (Q^T D_Q^{-1} Q)^\dagger Q^T D_Q^{-1}$$

Proof of Theorem 3.11. We plug in the optimal solution for $V$ given in Theorem 3.10 and simplify using linear algebra identities and the cyclic permutation property of the trace. We have $L(Q) \triangleq \min_V L(V, Q)$, which simplifies to:

$$L(Q) = \mathrm{tr}\big[W (Q^T D_Q^{-1} Q)^\dagger Q^T D_Q^{-1} D_Q D_Q^{-1} Q (Q^T D_Q^{-1} Q)^\dagger W^T\big]$$

$$= \mathrm{tr}\big[W (Q^T D_Q^{-1} Q)^\dagger (Q^T D_Q^{-1} Q) (Q^T D_Q^{-1} Q)^\dagger W^T\big]$$

$$= \mathrm{tr}\big[W (Q^T D_Q^{-1} Q)^\dagger W^T\big]$$

$$= \mathrm{tr}\big[(Q^T D_Q^{-1} Q)^\dagger (W^T W)\big]$$
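The following small check (not from the paper) verifies Theorems 3.10 and 3.11 numerically: the closed-form $V$ is unbiased, it achieves a lower weighted objective than another feasible reconstruction, and its objective value matches the simplified trace expression.

```python
# Numerical check (not from the paper) of Theorems 3.10 and 3.11: the closed-form
# V = W (Q^T D_Q^{-1} Q)^+ Q^T D_Q^{-1} satisfies VQ = W, achieves a lower weighted
# objective tr[V D_Q V^T] than another feasible reconstruction (the minimum-
# Frobenius-norm solution W Q^+), and its objective equals tr[(Q^T D_Q^{-1} Q)^+ W^T W].
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 8, 32, 5
Q = rng.random((m, n))                          # nonnegative strategy (full column rank)
W = rng.standard_normal((p, n))                 # arbitrary workload
d = Q @ np.ones(n)                              # diagonal of D_Q = Diag(Q 1)

X = Q.T @ (Q / d[:, None])                      # Q^T D_Q^{-1} Q
V_opt = W @ np.linalg.pinv(X) @ Q.T / d         # optimal V from Theorem 3.10
V_alt = W @ np.linalg.pinv(Q)                   # another feasible V with VQ = W

obj = lambda V: np.trace((V * d) @ V.T)         # tr[V D_Q V^T]
assert np.allclose(V_opt @ Q, W)                # unbiasedness holds
assert obj(V_opt) <= obj(V_alt) + 1e-9          # V_opt is no worse than the alternative
assert np.isclose(obj(V_opt),                   # Theorem 3.11: simplified objective agrees
                  np.trace(np.linalg.pinv(X) @ W.T @ W))
```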

Proof of Theorem 5.1. It is obvious that the worst-case variance is greater than or equal to the average-case variance. We now bound the worst-case variance from above. Using $u^*$ to denote the worst-case user, we have the following upper bound on the worst-case variance:

$$L_{worst}(V, Q) = N \max_{u \in U} \sum_{i=1}^{p} \Big[ v_i^T \mathrm{Diag}(q_u) v_i - (v_i^T q_u)^2 \Big]$$

$$\stackrel{(a)}{\le} N \sum_{i=1}^{p} v_i^T \mathrm{Diag}(q_{u^*}) v_i = \frac{N}{n} \sum_{u \in U} \sum_{i=1}^{p} v_i^T \mathrm{Diag}(q_{u^*}) v_i$$

$$\stackrel{(b)}{\le} e^{\varepsilon} \frac{N}{n} \sum_{u \in U} \sum_{i=1}^{p} v_i^T \mathrm{Diag}(q_u) v_i$$

$$\stackrel{(c)}{=} e^{\varepsilon} \Big( \frac{N}{n} \sum_{u \in U} \sum_{i=1}^{p} \Big[ v_i^T \mathrm{Diag}(q_u) v_i - (v_i^T q_u)^2 \Big] \Big) + e^{\varepsilon} \frac{N}{n} \|W\|_F^2$$

$$= e^{\varepsilon} \Big( L_{avg}(V, Q) + \frac{N}{n} \|W\|_F^2 \Big)$$

In step (a), we use the fact that $(v_i^T q_u)^2$ is non-negative. In step (b), we apply the fact that $q_{u^*} \le e^{\varepsilon} q_u$ for all $u$. In step (c), we express the bound in terms of $L_{avg}$, adding 0 in the form of $\|W\|_F^2 - \sum_u \sum_i (v_i^T q_u)^2$. This completes the proof.

Proof of Theorem 5.6. Consider the following optimization problem, which is closely related to Problem 3.12:

$$\min_{X \succeq 0} \; \mathrm{tr}\big[X^{-1}(W^T W)\big] \quad \text{subject to} \quad X_{uu} \le 1$$

Li et al. derived the SVD bound, which shows the minimum above is at least $\frac{1}{n}(\lambda_1 + \cdots + \lambda_n)^2$ [29]; see also [43]. Furthermore, if $X^*$ is the optimal solution and the constraint is replaced with $X_{uu} \le c$, then $cX^*$ remains optimal [30], in which case the bound becomes $\frac{1}{nc}(\lambda_1 + \cdots + \lambda_n)^2$.

We now argue that any feasible solution to our problem can be directly transformed into a feasible solution of the above related problem. Suppose $Q$ is a feasible solution to Problem 3.12 and let $X = Q^T D_Q^{-1} Q$. Note that the objective functions are now identical. We will argue that $X_{uu} \le \frac{e^{\varepsilon}}{n}$:

$$X_{uu} = \sum_o Q_{ou}^2 \frac{1}{\sum_{u'} Q_{ou'}} \le \sum_o Q_{ou}^2 \frac{e^{\varepsilon}}{n Q_{ou}} = \frac{e^{\varepsilon}}{n} \sum_o Q_{ou} = \frac{e^{\varepsilon}}{n}$$

Thus, we have shown that any feasible solution to Problem 3.12 gives rise to a corresponding feasible solution of the above problem. Thus, the SVD bound applies and we arrive at the desired result:

$$\frac{1}{e^{\varepsilon}}(\lambda_1 + \cdots + \lambda_n)^2 \le L(Q)$$
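The sketch below (not from the paper, and assuming the $\lambda_i$ in the SVD bound are the singular values of $W$, as in [29]) illustrates the bound numerically for a simple ε-LDP strategy whose columns sum to one, here n-ary randomized response.

```python
# Numerical illustration (not from the paper) of the bound in Theorem 5.6, assuming
# the lambda_i are the singular values of W: for an eps-LDP strategy Q whose columns
# sum to one (here, n-ary randomized response), the objective
# L(Q) = tr[(Q^T D_Q^{-1} Q)^+ W^T W] is at least (1/e^eps)(sum_i lambda_i)^2.
import numpy as np

rng = np.random.default_rng(4)
n, p, eps = 16, 10, 1.0
W = rng.standard_normal((p, n))

# n-ary randomized response: report the true item with probability proportional to e^eps
Q = np.ones((n, n))
np.fill_diagonal(Q, np.exp(eps))
Q /= Q.sum(axis=0)                         # columns sum to 1; row entries within a factor e^eps

d = Q @ np.ones(n)                         # diagonal of D_Q
L = np.trace(np.linalg.pinv(Q.T @ (Q / d[:, None])) @ W.T @ W)
bound = np.linalg.svd(W, compute_uv=False).sum() ** 2 / np.exp(eps)
print(bound, "<=", L)
assert bound <= L + 1e-9
```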



B. REFERENCES

[1] J. Acharya, Z. Sun, and H. Zhang. Communication efficient, sample optimal, linear time locally private discrete distribution estimation. arXiv preprint arXiv:1802.04705, 2018.
[2] J. Acharya, Z. Sun, and H. Zhang. Hadamard response: Estimating distributions privately, efficiently, and with little communication. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1120–1129, 2019.
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282. ACM, 2007.
[4] R. Bassily. Linear queries estimation with local differential privacy. In Proceedings of Machine Learning Research, volume 89, pages 721–729. PMLR, 2019.
[5] R. Bassily, K. Nissim, U. Stemmer, and A. G. Thakurta. Practical locally private heavy hitters. In Advances in Neural Information Processing Systems, pages 2288–2296, 2017.
[6] R. Bassily and A. Smith. Local, private, efficient protocols for succinct histograms. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 127–135, 2015.
[7] A. Ben-Israel and T. N. Greville. Generalized inverses: theory and applications, volume 15. Springer Science & Business Media, 2003.
[8] A. Bhaskara, D. Dadush, R. Krishnaswamy, and K. Talwar. Unconditional differentially private mechanisms for linear queries. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1269–1284. ACM, 2012.
[9] J. Blasiok, M. Bun, A. Nikolov, and T. Steinke. Towards instance-optimal private query release. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2480–2497. Society for Industrial and Applied Mathematics, 2019.
[10] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[11] G. Casella and R. L. Berger. Statistical inference, volume 2. Duxbury Pacific Grove, CA, 2002.
[12] G. Cormode, T. Kulkarni, and D. Srivastava. Marginal release under local differential privacy. In Proceedings of the 2018 International Conference on Management of Data, pages 131–146. ACM, 2018.
[13] G. Cormode, T. Kulkarni, and D. Srivastava. Answering range queries under local differential privacy. PVLDB, 12(10):1126–1138, 2019.
[14] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In Advances in Neural Information Processing Systems, pages 3571–3580, 2017.
[15] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.
[16] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[17] A. Edmonds, A. Nikolov, and J. Ullman. The power of factorization mechanisms in local and central differential privacy. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 425–438, 2020.
[18] U. Erlingsson, V. Pihur, and A. Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054–1067. ACM, 2014.
[19] M. Gaboardi, E. J. G. Arias, J. Hsu, A. Roth, and Z. S. Wu. Dual query: Practical private query release for high dimensional data. In International Conference on Machine Learning, pages 1170–1178, 2014.
[20] A. Griewank et al. On automatic differentiation. Mathematical Programming: recent developments and applications, 6(6):83–107, 1989.
[21] M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In Advances in Neural Information Processing Systems, pages 2339–2347, 2012.
[22] M. Hay, A. Machanavajjhala, G. Miklau, Y. Chen, and D. Zhang. Principled evaluation of differentially private algorithms using DPBench. In Proceedings of the 2016 International Conference on Management of Data, pages 139–154. ACM, 2016.
[23] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 3(1-2):1021–1032, 2010.
[24] N. Holohan, D. J. Leith, and O. Mason. Extreme points of the local differential privacy polytope. Linear Algebra and its Applications, 534:78–96, 2017.
[25] P. Kairouz, K. Bonawitz, and D. Ramage. Discrete distribution estimation under local privacy. In International Conference on Machine Learning, pages 2436–2444, 2016.
[26] P. Kairouz, S. Oh, and P. Viswanath. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems, pages 2879–2887, 2014.
[27] C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 123–134. ACM, 2010.
[28] C. Li and G. Miklau. An adaptive mechanism for accurate query answering under differential privacy. PVLDB, 5(6):514–525, 2012.
[29] C. Li and G. Miklau. Lower bounds on the error of query sets under the differentially-private matrix mechanism. Theory of Computing Systems, 57(4):1159–1201, 2015.
[30] C. Li, G. Miklau, M. Hay, A. McGregor, and V. Rastogi. The matrix mechanism: optimizing linear counting queries under differential privacy. The VLDB Journal, 24(6):757–781, 2015.
[31] D. Maclaurin, D. Duvenaud, and R. P. Adams. Autograd: Effortless gradients in numpy. In ICML 2015 AutoML Workshop, volume 238, 2015.
[32] R. McKenna, R. K. Maity, A. Mazumdar, and G. Miklau. A workload-adaptive mechanism for linear queries under local differential privacy. arXiv preprint arXiv:2002.01582, 2020.
[33] R. McKenna, G. Miklau, M. Hay, and A. Machanavajjhala. Optimizing error of high-dimensional statistical queries under differential privacy. PVLDB, 11(10):1206–1219, 2018.
[34] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[35] G. Strang. Introduction to linear algebra, volume 3. Wellesley-Cambridge Press, Wellesley, MA, 1993.
[36] A. G. Thakurta, A. H. Vyrros, U. S. Vaishampayan, G. Kapoor, J. Freudiger, V. R. Sridhar, and D. Davidson. Learning new words, Mar. 14 2017. US Patent 9,594,741.
[37] S. Wang, L. Huang, P. Wang, Y. Nie, H. Xu, W. Yang, X.-Y. Li, and C. Qiao. Mutual information optimally local private discrete distribution estimation. arXiv preprint arXiv:1607.08025, 2016.
[38] T. Wang, J. Blocki, N. Li, and S. Jha. Locally differentially private protocols for frequency estimation. In Proc. of the 26th USENIX Security Symposium, pages 729–745, 2017.
[39] T. Wang, B. Ding, J. Zhou, C. Hong, Z. Huang, N. Li, and S. Jha. Answering multi-dimensional analytical queries under local differential privacy. In Proceedings of the 2019 International Conference on Management of Data, pages 159–176, 2019.
[40] W. Wang and M. A. Carreira-Perpinan. Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. arXiv preprint arXiv:1309.1541, 2013.
[41] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
[42] M. Ye and A. Barg. Optimal schemes for discrete distribution estimation under locally differential privacy. IEEE Transactions on Information Theory, 64(8):5662–5676, 2018.
[43] G. Yuan, Y. Yang, Z. Zhang, and Z. Hao. Convex optimization for linear query processing under approximate differential privacy. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2005–2014. ACM, 2016.
[44] G. Yuan, Z. Zhang, M. Winslett, X. Xiao, Y. Yang, and Z. Hao. Low-rank mechanism: optimizing batch queries under differential privacy. PVLDB, 5(11):1352–1363, 2012.
