Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality
DIMACS Workshop on Big Data through the Lens of Sublinear Algorithms
Aug 28, 2015
Mark Braverman, Ankit Garg, Tengyu Ma, Huy Nguyen, David Woodruff
Distributed mean estimation
Statistical estimation:
– Unknown parameter $\theta$.
– Inputs to machines: i.i.d. data points $\sim D_\theta$.
– Output estimator $\hat{\theta}$.
Objectives:
– Low communication $C = |\Pi|$.
– Small loss $R := \mathbb{E}\big[\|\hat{\theta} - \theta\|^2\big]$.
[Figure: Big Data — distributed storage and processing. Each machine holds a small piece of the data and communicates via a shared blackboard to produce $\hat{\theta}$.]
Distributed sparse Gaussian mean estimation
• Ambient dimension $d$.
• Sparsity parameter $k$: $\|\theta\|_0 \le k$.
• Number of machines $m$.
• Each machine holds $n$ samples.
• Standard deviation $\sigma$.
• Thus each sample is a vector $X_j^{(t)} \sim \mathcal{N}(\theta_1, \sigma^2) \times \cdots \times \mathcal{N}(\theta_d, \sigma^2) \in \mathbb{R}^d$.
Goal: estimate $(\theta_1, \ldots, \theta_d)$.
How a higher value of each parameter affects the estimation problem:
• $d$ — harder
• $k$ — harder
• $m$ — easier
• $n$ — easier*
• $\sigma$ — harder
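For concreteness, here is a minimal simulation of the data model above (a sketch with illustrative parameter values of my choosing, not code from the talk):

```python
import numpy as np

def sample_instance(d=1000, k=10, m=50, n=20, sigma=1.0, seed=0):
    """Draw a k-sparse mean vector theta and the m x n x d array of samples."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    theta[support] = rng.normal(size=k)      # arbitrary values on the support
    # Machine j's t-th sample: X_j^(t) ~ N(theta, sigma^2 I_d).
    X = theta + sigma * rng.normal(size=(m, n, d))
    return theta, X

theta, X = sample_instance()
theta_hat = X.mean(axis=(0, 1))              # naive dense estimator: average everything
print(np.sum((theta_hat - theta) ** 2))      # squared loss R
```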
Distributed sparse Gaussian mean estimation
• Main result: if $|\Pi| = C$, then
  $$R \ge \Omega\!\left(\max\!\left(\frac{\sigma^2 d k}{nC},\; \underbrace{\frac{\sigma^2 k}{nm}}_{\text{statistical limit}}\right)\right)$$
• Tight up to a $\log d$ factor [GMN’14]; up to a constant factor in the dense case.
• For optimal performance, $C \gtrsim md$ (not $mk$) is needed!
(Legend: $d$ – dimension, $k$ – sparsity, $m$ – machines, $n$ – samples each, $\sigma$ – deviation, $R$ – squared loss.)
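To see why $C \gtrsim md$ is forced, compare the two terms in the bound: the communication term dominates the statistical limit until

```latex
\frac{\sigma^2 d k}{n C} \;\le\; \frac{\sigma^2 k}{n m}
\quad\Longleftrightarrow\quad
C \;\ge\; m d .
```

So reaching the statistical limit requires about $d$ bits of communication per machine, even though the parameter has only $k$ nonzero coordinates.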
Prior work (partial list)
• [Zhang-Duchi-Jordan-Wainwright’13]: the case when 𝑑 = 1 and general communication; and the dense case for simultaneous-message protocols.
• [Shamir’14]: Implies the result for 𝑘 = 1 in a restricted communication model.
• [Duchi-Jordan-Wainwright-Zhang’14, Garg-Ma-Nguyen’14]: the dense case (up to logarithmic factors).
• A lot of recent work on communication-efficient distributed learning.
Reduction from Gaussian mean detection
• Recall: $R \ge \Omega\big(\max\big(\frac{\sigma^2 dk}{nC}, \frac{\sigma^2 k}{nm}\big)\big)$.
• Gaussian mean detection:
  – A one-dimensional problem.
  – Goal: distinguish between $\mu_0 = \mathcal{N}(0, \sigma^2)$ and $\mu_1 = \mathcal{N}(\delta, \sigma^2)$.
  – Each player gets $n$ samples.
• Assume $R \ll \max\big(\frac{\sigma^2 dk}{nC}, \frac{\sigma^2 k}{nm}\big)$.
• Distinguish between $\mu_0 = \mathcal{N}(0, \sigma^2)$ and $\mu_1 = \mathcal{N}(\delta, \sigma^2)$.
• Theorem: If we can attain $R \le \frac{1}{16} k \delta^2$ in the estimation problem using $C$ communication, then we can solve the detection problem at $\sim C/d$ min-information cost.
• Using $\delta^2 \ll \sigma^2 d / (Cn)$, we get detection using $I \ll \frac{\sigma^2}{n\delta^2}$ min-information cost.
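The parameter choice in the last bullet is just a rearrangement:

```latex
\delta^2 \;\ll\; \frac{\sigma^2 d}{C n}
\quad\Longleftrightarrow\quad
\frac{C}{d} \;\ll\; \frac{\sigma^2}{n \delta^2},
```

so a min-information cost lower bound of $\Omega\big(\frac{\sigma^2}{n\delta^2}\big)$ for detection rules out such a low-error, low-communication estimator.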
The detection problem
• Distinguish between $\mu_0 = \mathcal{N}(0,1)$ and $\mu_1 = \mathcal{N}(\delta, 1)$.
• Each player gets $n$ samples.
• Want this to be impossible using $I \ll \frac{1}{n\delta^2}$ min-information cost.
The detection problem
• Equivalently: distinguish between $\mu_0 = \mathcal{N}(0, \tfrac{1}{n})$ and $\mu_1 = \mathcal{N}(\delta, \tfrac{1}{n})$, where each player gets one sample (its $n$ samples are replaced by their average, a sufficient statistic).
• Want this to be impossible using $I \ll \frac{1}{n\delta^2}$ min-information cost.
The detection problem
• By scaling everything by $\sqrt{n}$ (and replacing $\delta$ with $\delta\sqrt{n}$):
• Distinguish between $\mu_0 = \mathcal{N}(0,1)$ and $\mu_1 = \mathcal{N}(\delta, 1)$.
• Each player gets one sample.
• Want this to be impossible using $I \ll \frac{1}{\delta^2}$ min-information cost.
This is tight (for $m$ large enough; otherwise the task is impossible).
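The two normalization steps on the last two slides chain together via standard Gaussian facts (spelled out here for completeness):

```latex
\bar{X}_i = \frac{1}{n}\sum_{t=1}^{n} X_i^{(t)} \sim \mathcal{N}\!\left(\delta v, \tfrac{1}{n}\right),
\qquad
\sqrt{n}\,\bar{X}_i \sim \mathcal{N}\!\left(\delta\sqrt{n}\,v,\, 1\right).
```

So an $\Omega\big(\frac{1}{\delta'^2}\big)$ bound for one sample with mean shift $\delta' = \delta\sqrt{n}$ is exactly the desired $\Omega\big(\frac{1}{n\delta^2}\big)$ bound for the original problem.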
Information cost
[Figure: machines with inputs $X_1, X_2, \ldots, X_m \sim \mu_V$, where $\mu_V = \mathcal{N}(\delta V, 1)$, writing to a shared blackboard $\Pi$.]
$$IC(\pi) := I(\Pi; X_1 X_2 \cdots X_m)$$
Min-Information cost
[Figure: same blackboard setup, $X_i \sim \mu_V$ with $\mu_V = \mathcal{N}(\delta V, 1)$.]
$$minIC(\pi) := \min_{v \in \{0,1\}} I(\Pi; X_1 X_2 \cdots X_m \mid V = v)$$
Min-Information cost
$$minIC(\pi) := \min_{v \in \{0,1\}} I(\Pi; X_1 X_2 \cdots X_m \mid V = v)$$
• We will want this quantity to be $\Omega\big(\frac{1}{\delta^2}\big)$.
• Warning: this is not the same as $I(\Pi; X_1 X_2 \cdots X_m \mid V) = \mathbb{E}_{v \sim V}\, I(\Pi; X_1 X_2 \cdots X_m \mid V = v)$, because one case can be much smaller than the other.
• In our setting, the need to use $minIC$ instead of $IC$ arises because of the sparsity.
Strong data processing inequality
[Figure: same blackboard setup, $X_i \sim \mu_V$ with $\mu_V = \mathcal{N}(\delta V, 1)$.]
Fact: $|\Pi| \ge I(\Pi; X_1 X_2 \cdots X_m) = \sum_i I(\Pi; X_i \mid X_{<i})$.
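The fact combines two standard steps, which the slide states without proof:

```latex
|\Pi| \;\ge\; H(\Pi) \;\ge\; I(\Pi; X_1 X_2 \cdots X_m)
\;=\; \sum_{i=1}^{m} I(\Pi; X_i \mid X_{<i}),
```

the last equality being the chain rule for mutual information.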
Strong data processing inequality
• $\mu_V = \mathcal{N}(\delta V, 1)$; suppose $V \sim B_{1/2}$.
• For each $i$, $V - X_i - \Pi$ is a Markov chain.
• Intuition: “$X_i$ contains little information about $V$; there is no way to learn this information except by learning a lot about $X_i$.”
• Data processing: $I(V; \Pi) \le I(X_i; \Pi)$.
• Strong data processing: $I(V; \Pi) \le \beta \cdot I(X_i; \Pi)$ for some $\beta = \beta(\mu_0, \mu_1) < 1$.
Strong data processing inequality
• $\mu_V = \mathcal{N}(\delta V, 1)$; suppose $V \sim B_{1/2}$.
• For each $i$, $V - X_i - \Pi$ is a Markov chain.
• Strong data processing: $I(V; \Pi) \le \beta \cdot I(X_i; \Pi)$ for some $\beta = \beta(\mu_0, \mu_1) < 1$.
• In this case ($\mu_0 = \mathcal{N}(0,1)$, $\mu_1 = \mathcal{N}(\delta, 1)$):
  $$\beta(\mu_0, \mu_1) \sim \frac{I(V; \operatorname{sign}(X_i))}{I(X_i; \operatorname{sign}(X_i))} \sim \delta^2$$
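A quick numerical sanity check of the $\delta^2$ scaling through the sign channel (my own illustration; the ratio below is the heuristic from the slide, not the exact SDPI constant):

```python
import numpy as np
from scipy.stats import norm

def H(p):
    """Binary entropy in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for delta in [0.5, 0.1, 0.02]:
    # V ~ Bernoulli(1/2), X ~ N(delta * V, 1), S = sign(X).
    p1 = norm.cdf(delta)                  # P(S = + | V = 1)
    p0 = 0.5                              # P(S = + | V = 0)
    pS = 0.5 * (p0 + p1)                  # P(S = +)
    I_VS = H(pS) - 0.5 * (H(p0) + H(p1))  # I(V; S)
    I_XS = H(pS)                          # I(X; S) = H(S), since S is a function of X
    print(f"delta={delta}: ratio / delta^2 = {I_VS / I_XS / delta**2:.3f}")
```

The printed ratio settles near a constant ($\approx 0.11$) as $\delta \to 0$, consistent with $\beta \sim \delta^2$.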
“Proof”
• $\mu_V = \mathcal{N}(\delta V, 1)$; suppose $V \sim B_{1/2}$.
• Strong data processing: $I(V; \Pi) \le \delta^2 \cdot I(X_i; \Pi)$.
• We know $I(V; \Pi) = \Omega(1)$.
$$|\Pi| \ge I(\Pi; X_1 X_2 \cdots X_m) \gtrsim \sum_i I(\Pi; X_i) \ge \frac{1}{\delta^2} \sum_i \big(\text{info } \Pi \text{ conveys about } V \text{ through player } i\big) \gtrsim \frac{1}{\delta^2}\, I(V; \Pi) = \Omega\!\left(\frac{1}{\delta^2}\right)$$
Q.E.D.!
Issues with the proof
• This is the right high-level idea.
• Two main issues:
  – It is not clear how to deal with additivity over coordinates.
  – Dealing with $minIC$ instead of $IC$.
If the picture were this…
[Figure: only $X_1 \sim \mu_V$; machines $2, \ldots, m$ get $X_2, \ldots, X_m \sim \mu_0$, with $\mu_V = \mathcal{N}(\delta V, 1)$; shared blackboard $\Pi$.]
Then indeed $I(\Pi; V) \le \delta^2 \cdot I(\Pi; X_1)$.
Hellinger distance
• Solution to additivity: use the Hellinger distance
  $$h^2(f, g) = \frac{1}{2} \int_{\Omega} \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx$$
• Following [Jayram’09]: $h^2(\Pi_{V=0}, \Pi_{V=1}) \sim I(V; \Pi) = \Omega(1)$.
• $h^2(\Pi_{V=0}, \Pi_{V=1})$ decomposes into $m$ scenarios as above, using the fact that $\Pi$ is a protocol.
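For the specific pair in play there is a standard closed form (a known Gaussian fact, not stated on the slide), which makes the scaling concrete:

```latex
h^2\!\left(\mathcal{N}(0,1),\, \mathcal{N}(\delta,1)\right)
= 1 - e^{-\delta^2/8}
\;\asymp\; \delta^2
\quad \text{as } \delta \to 0 .
```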
$minIC$
• Dealing with $minIC$ is more technical. Recall:
  $$minIC(\pi) := \min_{v \in \{0,1\}} I(\Pi; X_1 X_2 \cdots X_m \mid V = v)$$
• This leads to our main technical statement, a “distributed strong data processing inequality”:
Theorem: Suppose $\Omega(1) \cdot \mu_0 \le \mu_1 \le O(1) \cdot \mu_0$, and let $\beta(\mu_0, \mu_1)$ be the SDPI constant. Then
$$h^2(\Pi_{V=0}, \Pi_{V=1}) \le O\big(\beta(\mu_0, \mu_1)\big) \cdot minIC(\pi)$$
Putting it together
Theorem: Suppose $\Omega(1) \cdot \mu_0 \le \mu_1 \le O(1) \cdot \mu_0$, and let $\beta(\mu_0, \mu_1)$ be the SDPI constant. Then
$$h^2(\Pi_{V=0}, \Pi_{V=1}) \le O\big(\beta(\mu_0, \mu_1)\big) \cdot minIC(\pi)$$
• With $\mu_0 = \mathcal{N}(0,1)$, $\mu_1 = \mathcal{N}(\delta, 1)$, and $\beta \sim \delta^2$, we get
  $$\Omega(1) = h^2(\Pi_{V=0}, \Pi_{V=1}) \le \delta^2 \cdot minIC(\pi).$$
• Therefore, $minIC(\pi) = \Omega\big(\frac{1}{\delta^2}\big)$.
Putting it together
Theorem: Suppose $\Omega(1) \cdot \mu_0 \le \mu_1 \le O(1) \cdot \mu_0$ (this condition is essential!), and let $\beta(\mu_0, \mu_1)$ be the SDPI constant. Then
$$h^2(\Pi_{V=0}, \Pi_{V=1}) \le O\big(\beta(\mu_0, \mu_1)\big) \cdot minIC(\pi)$$
• With $\mu_0 = \mathcal{N}(0,1)$, $\mu_1 = \mathcal{N}(\delta, 1)$: the condition $\Omega(1) \cdot \mu_0 \le \mu_1 \le O(1) \cdot \mu_0$ fails!
• We need an additional truncation step. Fortunately, the failure happens far out in the tails.
Summary
• Strong data processing, Hellinger distance, and a direct sum argument show: “you only get $\delta^2$ bits toward detection per bit of $minIC$,” giving an $\Omega\big(\frac{1}{\delta^2}\big)$ lower bound for Gaussian mean detection ($n \to 1$ sample; $minIC$).
• A reduction [ZDJW’13] lifts the detection bound to sparse Gaussian mean estimation.
• The estimation bound in turn yields the bound for distributed sparse linear regression.
Distributed sparse linear regression
• Each machine gets $n$ data points of the form $(A_j, y_j)$, where $y_j = \langle A_j, \theta \rangle + w_j$, $w_j \sim \mathcal{N}(0, \sigma^2)$.
• Promised that $\theta$ is $k$-sparse: $\|\theta\|_0 \le k$.
• Ambient dimension $d$.
• Loss $R = \mathbb{E}\big[\|\hat{\theta} - \theta\|^2\big]$.
• How much communication is needed to achieve statistically optimal loss?
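A minimal sketch of this data model (the Gaussian design and the parameter values are my assumptions; the slides do not fix a distribution for the $A_j$):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m, n, sigma = 500, 5, 20, 50, 0.1

theta = np.zeros(d)
theta[rng.choice(d, size=k, replace=False)] = 1.0   # a k-sparse parameter

# Machine i's data: n design rows A and responses y_j = <A_j, theta> + w_j.
data = []
for _ in range(m):
    A = rng.normal(size=(n, d))                     # assumed Gaussian design
    y = A @ theta + sigma * rng.normal(size=n)
    data.append((A, y))
```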
Distributed sparse linear regression (continued)
• Promised that $\theta$ is $k$-sparse: $\|\theta\|_0 \le k$. Ambient dimension $d$. Loss $R = \mathbb{E}\big[\|\hat{\theta} - \theta\|^2\big]$.
• How much communication is needed to achieve statistically optimal loss?
• We get: $C = \Omega(m \cdot \min(n, d))$ (small $k$ doesn’t help).
• [Lee-Sun-Liu-Taylor’15]: under some conditions, $C = O(m \cdot d)$ suffices.
A new upper bound (time permitting)
• For one-dimensional distributed Gaussian mean estimation (generalizes trivially to $d$ dimensions).
• For optimal statistical performance, $\Omega(m)$ communication is the lower bound.
• We give a simple simultaneous-message upper bound of $O(m)$.
• Previously: multi-round $O(m)$ [GMN’14] or simultaneous $O(m \log n)$ [folklore].
A new upper bound (time permitting)
(Stylized) main idea:
• Each machine wants to send the empirical average $y_i \in [0,1]$ of its input.
• Then the average $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$ is computed.
• Instead of $y_i$, each machine sends a single bit $b_i$ sampled from the Bernoulli distribution $B_{y_i}$.
• Form the estimate $\hat{y} = \frac{1}{m} \sum_{i=1}^{m} b_i$.
• This is “good enough” if $\operatorname{var}(y_i) \sim 1$.
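A toy simulation of this one-bit rounding trick (my sketch of the stylized idea above, not the paper’s full protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000
y = rng.uniform(0.4, 0.6, size=m)      # each machine's empirical average, mapped into [0, 1]

b = (rng.random(m) < y).astype(float)  # machine i sends one bit b_i ~ Bernoulli(y_i)
y_hat = b.mean()                       # one bit per machine instead of a real number

print(abs(y_hat - y.mean()))           # rounding noise: std <= 1/(2*sqrt(m))
```

Since $\mathbb{E}[b_i] = y_i$, the estimate is unbiased, and the rounding adds variance at most $\frac{1}{4m}$ — negligible when the $y_i$ themselves have constant variance, which is the “good enough” condition on the slide.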
Open problems
• Closing the gap for the sparse linear regression problem.
• Other statistical questions in the distributed framework. More general theorems?
• Can Strong Data Processing be applied to the two-party Gap Hamming Distance problem?
http://csnexus.info/
Organizers:
• Mark Braverman (Princeton University)
• Bobak Nazer (Boston University)
• Anup Rao (University of Washington)
• Aslan Tchamkerten, General Chair (Telecom Paristech)
http://csnexus.info/
Primary themes:
• Distributed Computation and Communication
• Fundamental Inequalities and Lower Bounds
• Inference Problems
• Secrecy and Privacy
Institut Henri Poincaré
http://csnexus.info/
Thank You!