Transcript of "Learning with Memory and Communication Constraints" (source: jsteinhardt/talks/communication.pdf, 2015-12-03)
Learning with Memory and Communication Constraints
Jacob Steinhardt*
Stanford University
July 30, 2015
*with John Duchi, Gregory Valiant, and Stefan Wager
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 1 / 20
Motivation
Computational constraints becoming bottleneck in many systems.
Not yet a good theory of computationally-bounded statistics.
Study sample complexity of resource-constrained learning algorithms.
(Cover, 1969; Hellman & Cover, 1970; Ben-David & Dichterman, 1998; Balcan et al., 2012; Berthet & Rigollet, 2013; Chandrasekaran & Jordan, 2013; Duchi, Jordan, & Wainwright, 2013; Zhang et al., 2013; Zhang, Wainwright, & Jordan, 2014; Christiano, 2014; Daniely, Linial, & Shalev-Shwartz, 2014; Garg, Ma, & Nguyen, 2014; Shamir, 2014; Braverman et al., 2015; S. & Duchi, 2015; S., Valiant, & Wager, 2015)
This work: memory, communication.
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 2 / 20
1 Memory, Communication, and Statistical Queries
2 Memory-Constrained Sparse Regression
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 3 / 20
Setting
Assume: polynomially many i.i.d. samples (x, ℓ(x)) ∈ X × {−1,+1}, with ℓ in some concept class F.
COM(b): each sample held by a separate party; each party can interactively broadcast up to b bits.
COM(b, k): each party gets k samples (instead of 1).
MEM(b): access data in a stream, store at most b bits of state.
Relate both classes to the well-studied statistical query model:
SQ: can query E[ψ(x, ℓ(x))] for any function ψ : X × {±1} → [−1, 1]; get output accurate to tolerance τ = 1/poly(n).
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 4 / 20
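An SQ oracle is easy to simulate from samples, which is one way to read the tolerance parameter. A minimal sketch (the function names and the 4/τ² batch size are my own illustration, not from the talk): by a Hoeffding bound, averaging on the order of 1/τ² fresh i.i.d. samples answers one query to tolerance τ with high probability.

```python
import random

def sq_oracle(psi, draw_sample, tau, rng=random):
    """Simulate one statistical query: estimate E[psi(x, label)] to
    (high-probability) tolerance tau by averaging ~4/tau^2 fresh
    i.i.d. samples. psi must map into [-1, 1]."""
    n = int(4 / tau ** 2)          # Hoeffding-style sample count
    total = 0.0
    for _ in range(n):
        x, y = draw_sample(rng)    # one fresh labeled sample
        total += psi(x, y)
    return total / n
```

For example, with `psi(x, y) = y` this estimates the label bias of the distribution to within roughly τ.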
Main Results: Communication
Theorem. If F is learnable with m samples and b bits of communication, then it is learnable with O(bm) statistical queries of tolerance τ = Ω(1/(2^b · m)).
Implications of theorem:
For any constant C > 0, COM(1) = COM(C log(n)) = SQ.
Let PARITY(n) be the problem where x ∼ Uniform({0,1}^n) and ℓ(x) = (−1)^(c⊤x) for unknown c ∈ {0,1}^n.
Then PARITY(n) ∉ COM(n/4).
In addition, PARITY(n) ∉ COM(n/16, n/4).
Open Problem. Can PARITY(n) be solved with n2/4 bits of memory?
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 5 / 20
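For concreteness, PARITY(n) is easy without resource constraints: collect linearly independent samples and solve for c by Gaussian elimination over GF(2). A sketch (my own illustrative code, not from the talk; bit vectors are stored as Python integer bitmasks, and ±1 labels are converted to parity bits before solving):

```python
import random

def parity_sample(n, c, rng):
    """One draw from PARITY(n): x uniform over {0,1}^n (as a bitmask),
    label = (-1)^(c.x) for the hidden parity vector c (also a bitmask)."""
    x = rng.getrandbits(n)
    dot = bin(c & x).count("1") % 2
    return x, (-1) ** dot

def recover_parity(n, eqs):
    """Gaussian elimination over GF(2). eqs is a list of (x_mask, b)
    meaning parity(c & x_mask) = b. Returns c as a bitmask, with any
    free coordinates set to 0."""
    pivots = {}                          # pivot column -> (row_mask, b)
    for mask, b in eqs:
        for col in range(n):             # reduce by existing pivots
            if not (mask >> col) & 1:
                continue
            if col in pivots:
                pmask, pb = pivots[col]
                mask, b = mask ^ pmask, b ^ pb
            else:
                pivots[col] = (mask, b)  # new pivot at this column
                break
    c = 0
    for col in sorted(pivots, reverse=True):   # back-substitution
        mask, b = pivots[col]
        val = b
        for j in range(col + 1, n):
            if (mask >> j) & 1:
                val ^= (c >> j) & 1
        c |= val << col
    return c
```

This uses roughly n samples and Θ(n²) bits just to hold them, which is exactly the resource the COM/MEM bounds above rule out.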
Main Results: Memory
Theorem. If F can be learned with m statistical queries of tolerance τ, then it can be learned with
O(log|F| · log(m/τ)) bits of state and
O(m log|F| / τ²) samples.
Caveat: reduction is not computationally efficient.
Implications of theorem:
Let REP be the class of efficiently representable problems: log|F| = O(n).
Then SQ ∩ REP ⊆ MEM(O(n)).
k-sparse linear regression in d dimensions can be solved with k · polylog(d) bits of state and d · poly(k) samples.
If the covariates are r-sparse, then only poly(r, k) samples are needed.
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 6 / 20
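The flavor of this reduction can be illustrated with a toy sketch (my own, under the assumption of a simple correlation-query learner; the talk's actual construction simulates an arbitrary SQ algorithm and, per the caveat above, is not computationally efficient): each query is answered from a fresh stretch of the stream using only a running sum, so the state between samples is one accumulator plus a hypothesis index, never the samples themselves.

```python
import random

def streaming_sq_learn(hypotheses, predict, stream, tau):
    """Toy memory-bounded learner over a finite class F: score each
    hypothesis h by the statistical query E[y * predict(h, x)],
    estimated from ~4/tau^2 stream samples per query. Between samples
    the state is a running sum plus the current/best hypothesis --
    O(log|F|)-flavored, independent of the number of samples."""
    batch = int(4 / tau ** 2)
    best_h, best_score = None, float("-inf")
    for h in hypotheses:               # one statistical query per h
        total = 0.0
        for _ in range(batch):
            x, y = next(stream)        # consume the stream, store nothing
            total += y * predict(h, x) # psi(x, y) = y*predict(h, x) in [-1, 1]
        if total / batch > best_score:
            best_h, best_score = h, total / batch
    return best_h
```

With threshold hypotheses on [0, 1] and labels sign(x − 0.5), the learner picks the threshold with the highest label correlation, i.e. the one nearest 0.5.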
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
[Diagram: parties 1–5, each holding one sample; transcript so far c1c2 = 10 (c1 = 1, c2 = 0), with the next bit c3 = ? about to be broadcast.]
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101) / p(c1:2 = 10)
= E[I[c1:3 = 101]] / p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ · ∑c [p(c) · 1/p(c)] = 4τ (in general, 2^b · τ).
Cumulative error: m · 2^b · τ.
=⇒ Okay as long as τ ≪ 1/(m · 2^b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
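The key step above, that the expected relative error is τ times the number of possible transcripts, is just the identity ∑c p(c) · (τ/p(c)) = τ · #transcripts. A quick numeric check (illustrative only; the distribution over transcripts is arbitrary):

```python
import random

def expected_relative_error(tau, probs):
    """E over transcripts c ~ p of tau/p(c): each transcript contributes
    p(c) * tau/p(c) = tau, so the sum is tau * (number of transcripts),
    i.e. 2^b * tau when c ranges over b-bit prefixes."""
    return sum(p * (tau / p) for p in probs if p > 0)
```

With b = 2 there are 4 possible prefixes c1c2, recovering the 4τ on the slide regardless of how likely each transcript is.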
![Page 40: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/40.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ .
=⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
![Page 41: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/41.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]︸ ︷︷ ︸statistical query
/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ .
=⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
![Page 42: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/42.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]︸ ︷︷ ︸statistical query
/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ .
=⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
![Page 43: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/43.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]︸ ︷︷ ︸statistical query
/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ .
=⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
![Page 44: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/44.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]︸ ︷︷ ︸statistical query
/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ .
=⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
![Page 45: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/45.jpg)
Reduction: Communication
Goal: reduce communication-constrained algorithm to SQ algorithm.
Idea: use queries to estimate probability that next bit communicated is 0 or 1.
Consider intermediate state of algorithm:
party: 1 2 3 4 50 1
01
1?
c1
c2
c3
p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101)/p(c1:2 = 10)
= E[I[c1:3 = 101]]︸ ︷︷ ︸statistical query
/p(c1:2 = 10)
Error: τ/p(c1:2).
E[τ/p(c1:2)] = τ ·∑c[p(c) ·1/p(c)] = 4τ (in general, 2bτ).
Cumulative error: m2bτ . =⇒ Okay as long as τ 1/(m2b)!
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 7 / 20
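The conditional-probability step above can be simulated numerically. A minimal sketch, assuming a toy joint distribution over the broadcast prefixes and an SQ oracle that perturbs expectations by at most the tolerance (all numbers hypothetical):

```python
import random

def sq_oracle(true_mean, tau):
    # Statistical-query oracle: returns the true expectation perturbed
    # by at most the tolerance tau (here: uniform noise).
    return true_mean + random.uniform(-tau, tau)

# Hypothetical probabilities for the broadcast prefixes on the slide.
p = {"10": 0.25, "101": 0.15}   # so p(c3 = 1 | c1:2 = 10) = 0.6

tau = 1e-4
# Estimate p(c3 = 1 | c1:2 = 10) = E[I[c1:3 = 101]] / p(c1:2 = 10),
# querying the numerator (the denominator is known from earlier steps).
num = sq_oracle(p["101"], tau)
est = num / p["10"]
true = p["101"] / p["10"]
# The error is at most tau / p(c1:2 = 10), matching the slide's bound.
assert abs(est - true) <= tau / p["10"]
```

Dividing a tolerance-τ query by the small prefix probability is exactly what inflates the per-step error to τ/p(c1:2).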
Reduction: Memory

Goal: represent an SQ algorithm in a memory-efficient way.

Step 1: replace queries with threshold queries (i.e., "Is E[ψ] > t?").

The algorithm is now a decision tree of depth m:

[Figure: binary tree with root query ψ, children ψ0 and ψ1, grandchildren ψ00, ψ01, ψ10, ψ11, …, with edges labeled by the answers 0 and 1.]

Issue: naïvely remembering the position in the tree requires Θ(m) memory.

Can we somehow identify "important" queries?

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 8 / 20
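Step 1's reduction from value queries to threshold queries can be made concrete: binary search over thresholds recovers E[ψ] to within the tolerance. A sketch under illustrative assumptions (the oracle below answers arbitrarily, here always 1, inside its tolerance window):

```python
def threshold_sq(psi_mean, t, tau):
    # Threshold query "Is E[psi] > t?": the answer is only reliable
    # outside the window (t - tau, t + tau).
    if psi_mean > t + tau:
        return 1
    if psi_mean < t - tau:
        return 0
    return 1  # inside the window the answer may be arbitrary

def estimate_mean(psi_mean, tau, lo=0.0, hi=1.0):
    # Recover E[psi] to within tau by binary search over thresholds:
    # a value query reduces to O(log(1/tau)) threshold queries.
    while hi - lo > 2 * tau:
        mid = (lo + hi) / 2
        if threshold_sq(psi_mean, mid, tau / 2):
            lo = mid - tau / 2   # answer 1 guarantees E[psi] >= mid - tau/2
        else:
            hi = mid + tau / 2   # answer 0 guarantees E[psi] <= mid + tau/2
    return (lo + hi) / 2

approx = estimate_mean(0.37, 0.01)
assert abs(approx - 0.37) <= 0.01
```

The invariant is that the true mean always stays inside [lo, hi], whose width shrinks geometrically toward the tolerance.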
Idea: Normalizing Queries

Consider a threshold query (ψ, t) of tolerance τ:

SQ(ψ, t) =
  1         : E[ψ] > t + τ
  0         : E[ψ] < t − τ
  arbitrary : otherwise

[Figure: number line for E[ψ] with the window [t − τ, t + τ] around t; the answer is forced to 0 left of the window and to 1 right of it.]

Call (ψ, t, τ) "good" if at least one of the answers 0, 1 narrows down F by a factor of 1/2.

Consider (ψ, t − τ/2, τ/2) and (ψ, t + τ/2, τ/2). At least one must be good.

Can always normalize queries to be good!

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 9 / 20
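The normalization claim is easy to check on a toy candidate set. A sketch, taking F to be a hypothetical finite set of possible values of E[ψ]:

```python
def is_good(F_means, t, tau):
    # A threshold query (psi, t, tau) is "good" if at least one answer
    # narrows the candidate set F by a factor of 2.
    # Answer 1 guarantees E[psi] >= t - tau; answer 0 guarantees E[psi] <= t + tau.
    consistent_1 = [m for m in F_means if m >= t - tau]
    consistent_0 = [m for m in F_means if m <= t + tau]
    half = len(F_means) / 2
    return len(consistent_1) <= half or len(consistent_0) <= half

# Hypothetical candidate set and original query (t, tau):
F = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
t, tau = 0.5, 0.2

# The original query can be bad: both answers leave most of F alive...
assert not is_good(F, t, tau)
# ...but of the two half-tolerance queries, at least one is good.
assert is_good(F, t - tau / 2, tau / 2) or is_good(F, t + tau / 2, tau / 2)
```

Intuitively, either at most half of F lies below t (making one sub-query good via answer 0) or at most half lies above it.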
Compression Scheme

Normalize all queries to be good.

At each node, color the child edge that reduces F by at least a factor of 1/2:

[Figure: the decision tree from before, with one colored outgoing edge at each node.]

Note: any path has at most log |F| colored edges.

Can remember the indices of the colored edges with log |F| · log(m) bits of memory!

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 10 / 20
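The bookkeeping can be sketched directly: store only the positions where the path followed the colored edge, and reconstruct the rest by flipping. (The per-step coloring below is a hypothetical lookup table; in the actual scheme it is recomputed while replaying the tree.)

```python
from math import ceil, log2

def compress_path(path, colored):
    # Store only the indices of steps where the colored
    # ("F-halving") edge was taken.
    return [i for i, bit in enumerate(path) if bit == colored[i]]

def decompress_path(indices, colored, m):
    # Every other step must have taken the uncolored edge.
    taken = set(indices)
    return [colored[i] if i in taken else 1 - colored[i] for i in range(m)]

m = 12
path    = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]  # answers along one root-to-leaf path
colored = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]  # colored answer at each step
idx = compress_path(path, colored)
assert decompress_path(idx, colored, m) == path

# With |F| = 32 (hypothetical), at most log2|F| = 5 colored edges lie on
# any path; each index costs ceil(log2(m)) bits: log|F| * log(m) total.
assert len(idx) <= 5
bits = len(idx) * ceil(log2(m))
```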
Summary

COM → SQ: simulate conditional probabilities of messages with statistical queries.

SQ → MEM: normalize queries, store a compressed representation of the decision path.

Next: study sparse regression in more detail.

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 11 / 20
1 Memory, Communication, and Statistical Queries
2 Memory-Constrained Sparse Regression
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 12 / 20
Setting

Sparse linear regression in R^d:

Y^(i) = 〈w∗, X^(i)〉 + ε^(i),  ‖w∗‖0 = k,  k ≪ d

Memory constraint:

(X^(i), Y^(i)) observed as a read-only stream

Only keep b bits of state Z^(i) between successive observations

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 13 / 20
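The streaming model above can be sketched as data generation plus a bounded state. A minimal version (dimensions, noise level, and the trivial placeholder "state" are all illustrative, not from the talk):

```python
import random

d, k, n = 100, 3, 50          # hypothetical: ambient dim, sparsity, stream length
support = random.sample(range(d), k)
w_star = {j: random.choice([-1.0, 1.0]) for j in support}  # k-sparse w*

def stream():
    # Read-only stream of observations (X^(i), Y^(i)) from Y = <w*, X> + eps.
    for _ in range(n):
        x = [random.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(w_star[j] * x[j] for j in support) + random.gauss(0.0, 0.1)
        yield x, y

# A memory-bounded learner may carry only b bits of state Z^(i) between
# observations; here the state is just a running count, as a placeholder.
state = 0
for x, y in stream():
    state += 1
assert state == n
```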
Problem Statement

How much data n is needed to obtain an estimator ŵ with E[‖ŵ − w∗‖₂²] ≤ ε?

Classical case (no memory constraint):

Theorem (Wainwright, 2009)
(k/ε) log(d) ≲ n ≲ (k/ε) log(d)

Achievable with O(d) memory (Agarwal et al., 2012; S., Wager, & Liang, 2015).

With memory constraint b:

Theorem (S. & Duchi, 2015)
(k/ε) · (d/b) ≲ n ≲ (k/ε²) · (d/b)

Exponential increase if b ≪ d!

[Note: up to log factors; assumes k log(d) ≲ b ≤ d]

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 14 / 20
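Plugging in hypothetical numbers makes the gap between the two regimes concrete (constants and log factors dropped):

```python
from math import log

# Hypothetical sizes, just to compare the two regimes up to constants.
d, k, eps = 10**6, 10, 0.1

n_classical = (k / eps) * log(d)       # ~ (k/eps) log d without memory limits
b = 10 * k * log(d)                    # a budget with k log d <~ b << d
n_constrained = (k / eps) * (d / b)    # ~ (k/eps) d/b lower bound

# With b near the sparsity level rather than near d, the data
# requirement jumps by the factor d/(b log d):
assert n_constrained / n_classical > 10
```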
![Page 80: Learning with Memory and Communication Constraintsjsteinhardt/talks/communication.pdf · 2015. 12. 3. · Learning with Memory and Communication Constraints Jacob Steinhardt* Stanford](https://reader035.fdocuments.us/reader035/viewer/2022071502/612250acd41c783b41614d56/html5/thumbnails/80.jpg)
Proof Overview
Lower bound:
information-theoreticstrong data-processing inequality
W ∗ X ,Y Zd
1
main challenge: dependence between X ,Y
Upper bound:count-min sketch + `1-regularized dual averagingmore regularization→ easier sketching problem
J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 15 / 20
Lower Bound Construction

Split coordinates into k blocks of size d/k

w∗ in each block: single non-zero coordinate J, equal to ±δ with equal probability

Direct sum argument: reduce to k = 1

(Figure: one block of d/k coordinates, with the non-zero coordinate J = 2 highlighted.)

Estimation to testing:

  E[‖w∗ − ŵ‖₂²] ≥ (δ²/2) · P[Ĵ ≠ J]

Looking ahead: bound the KL divergence between P_j and the base distribution P_0

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 16 / 20
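The estimation-to-testing step can be checked numerically in a toy single-block setting. Decoding Ĵ as the largest-magnitude coordinate of the estimate ŵ is one natural choice (the talk does not spell out the decoder); with that choice, whenever Ĵ ≠ J the squared error is at least δ²/2, which gives E[error] ≥ (δ²/2) · P[Ĵ ≠ J]:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, delta = 8, 0.5   # a single block (k = 1) of size d/k, signal size delta

worst = np.inf        # smallest error seen on trials where the test fails
for _ in range(5000):
    # Hard instance: one coordinate J carries +/- delta, the rest are zero.
    J = int(rng.integers(dim))
    w_star = np.zeros(dim)
    w_star[J] = delta * rng.choice([-1.0, 1.0])

    # An arbitrary estimate; decode J_hat as its largest-magnitude coordinate.
    w_hat = rng.normal(scale=delta, size=dim)
    J_hat = int(np.argmax(np.abs(w_hat)))

    if J_hat != J:
        # Here |w_hat[J_hat]| >= |w_hat[J]|, so the error contributed by
        # coordinates J and J_hat alone is at least delta^2 / 2
        # (minimized when both equal delta / 2).
        worst = min(worst, float(np.sum((w_hat - w_star) ** 2)))

# Hence E[error] >= (delta^2 / 2) * P[J_hat != J].
```

The comment's claim is the whole reduction: a good estimator must implicitly solve the testing problem of identifying J.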
Some Information Theory

Let X ∼ Uniform({±1}^d)

Let P_j(Z^(1:n)) be the distribution conditioned on J = j

Let P_0(Z^(1:n)) be the distribution with Y independent of X

Assouad's method:

  P[Ĵ ≠ J] ≥ 1/2 − √( (1/d) ∑_{j=1}^{d} D_kl( P_0(Z^(1:n)) ‖ P_j(Z^(1:n)) ) )

(Figure: the distribution of X_j over {−1, +1}, with the two cases separated by 2δ.)

Key fact: (Y, X_j) is independent of X_¬j under P_j

Intuition: D_kl(P_0 ‖ P_j) is small unless Z stores information about X_j; need to store the majority of the X_j to make the average D_kl large.

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 17 / 20
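The engine behind Assouad-style bounds is that small KL forces testing error close to 1/2. A minimal numerical check of the two-point version (constants here come from Pinsker's inequality, TV ≤ √(KL/2) with KL in nats, not from the slide's d-dimensional statement):

```python
import math

def kl_bernoulli(p, q):
    """D_kl(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# One-sample test between P0 = Ber(1/2) and P1 = Ber(p): the optimal
# (likelihood-ratio) test errs with probability (1 - TV)/2, and Pinsker's
# inequality TV <= sqrt(KL/2) turns small KL into test error near 1/2.
checks = []
for p in [0.1, 0.25, 0.4, 0.45, 0.6, 0.9]:
    tv = abs(p - 0.5)                 # total variation for Bernoullis
    min_test_error = (1 - tv) / 2     # exact optimal test error
    lower = 0.5 - math.sqrt(kl_bernoulli(0.5, p) / 2) / 2
    checks.append(min_test_error >= lower - 1e-12)
```

Averaging this kind of bound over the d coordinate-perturbations is what produces the √((1/d) ∑_j D_kl) term on the slide.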
Strong Data-Processing Inequality

Focus on a single index Z = Z^(i), with z̄ = z^(1:i−1) fixed.

Proposition. For any z̄,

  D_kl( P_0(Z | z̄) ‖ P_j(Z | z̄) ) ≤ 4δ² · I(X_j; Z | Y, Z̄ = z̄)    [mutual information]
                                   ≤ 4δ² · I(X_j; Z, Y | Z̄ = z̄)

Plug into Assouad:

  (1/d) ∑_{j=1}^{d} D_kl(P_0 ‖ P_j) ≤ (4δ²/d) ∑_{j=1}^{d} I(X_j; Z, Y | Z̄)
                                    ≤ (4δ²/d) · I(X; Z, Y | Z̄)    [≤ (4δ²/d) · (b + O(1))]

Only get 4δ²b/d bits per round!

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 18 / 20
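The proposition can be sanity-checked by exact enumeration in a one-coordinate toy model (an illustration, not the talk's construction): X ∈ {±1} uniform; under P_j, Y agrees with X with probability (1 + δ)/2, while under P_0, Y is independent of X; the message is Z = XY flipped with probability q. The mutual information below is computed under P_0, which is an assumption about the measure used; the constant 4 has plenty of slack in this toy:

```python
import itertools
import math

def check(delta, q):
    # Exact joint distribution over (x, y, z), each in {-1, +1}.
    def joint(agree_prob):
        table = {}
        for x, y, z in itertools.product([-1, 1], repeat=3):
            py = agree_prob if y == x else 1 - agree_prob
            pz = (1 - q) if z == x * y else q  # Z is a noisy copy of x*y
            table[(x, y, z)] = 0.5 * py * pz
        return table
    Pj = joint((1 + delta) / 2)   # Y correlated with X
    P0 = joint(0.5)               # Y independent of X

    # KL between the induced message distributions P0(Z) and Pj(Z).
    def marg_z(t):
        return {z: sum(p for (x, y, zz), p in t.items() if zz == z)
                for z in [-1, 1]}
    m0, mj = marg_z(P0), marg_z(Pj)
    kl = sum(m0[z] * math.log(m0[z] / mj[z]) for z in [-1, 1])

    # Conditional mutual information I(X; Z | Y) under P0, by enumeration.
    mi = 0.0
    for y in [-1, 1]:
        py = sum(p for (x, yy, z), p in P0.items() if yy == y)
        cond = {(x, z): P0[(x, y, z)] / py for x in [-1, 1] for z in [-1, 1]}
        px = {x: cond[(x, -1)] + cond[(x, 1)] for x in [-1, 1]}
        pz = {z: cond[(-1, z)] + cond[(1, z)] for z in [-1, 1]}
        mi += py * sum(p * math.log(p / (px[x] * pz[z]))
                       for (x, z), p in cond.items() if p > 0)

    return kl <= 4 * delta ** 2 * mi + 1e-12

results = [check(d, q) for d in [0.05, 0.1, 0.3] for q in [0.0, 0.1, 0.3]]
```

The key qualitative point survives the toy: the KL between the null and perturbed message distributions is controlled by δ² times how much the message reveals about X_j beyond Y.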
Upper Bound

Solve the ℓ1-regularized dual averaging problem (Xiao, 2010), with λ ≫ 1:

  w^(i) = argmin_w { ⟨θ^(i), w⟩ + λ√n ‖w‖₁ + (1/(2η)) ‖w‖₂² },

  θ^(i) = ∑_{i′=1}^{i−1} x^(i′) ( y^(i′) − ⟨w^(i′), x^(i′)⟩ ).

Hard part: determine the support of w^(i).

Need to distinguish |θ_j| ≥ λ√n (signal) from |θ_j| ≈ √n (noise)

Can use a count-min sketch; memory usage ≈ d log(d)/λ²

⟹ regularization decreases computation; seen before in the ℓ2 case (Shalev-Shwartz & Zhang, 2013; Bruer et al., 2014)

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 19 / 20
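The inner argmin has a well-known closed form: coordinatewise soft thresholding at level λ√n, which is why only the few coordinates of θ above the threshold matter. A sketch of that step (the parameter values are illustrative):

```python
import numpy as np

def dual_averaging_step(theta, lam, n, eta):
    """Closed-form minimizer of <theta, w> + lam*sqrt(n)*||w||_1
    + ||w||_2^2 / (2*eta): coordinatewise soft thresholding."""
    thresh = lam * np.sqrt(n)
    return -eta * np.sign(theta) * np.maximum(np.abs(theta) - thresh, 0.0)

# Coordinates with |theta_j| below lam*sqrt(n) are zeroed out, so the
# iterate's support is exactly the set of above-threshold coordinates.
theta = np.array([0.5, -3.0, 2.0, -0.2])
w = dual_averaging_step(theta, lam=1.0, n=4.0, eta=0.1)
# -> [0.0, 0.1, 0.0, 0.0]
```

The sparsity induced by the threshold is what makes the memory-efficient sketching approach possible in the first place.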
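The support-detection step can be illustrated with a bare-bones count-min sketch. This sketch assumes nonnegative counts (the actual algorithm must handle signed gradient updates, which takes more care), and the sizes below are illustrative, chosen so the sketch's overestimate stays well below the signal/noise gap:

```python
import numpy as np

class CountMinSketch:
    """r rows of width w; point queries never underestimate, and
    overestimate by roughly (total mass) / w per row."""
    def __init__(self, rows, width, seed=0):
        r = np.random.default_rng(seed)
        self.width = width
        self.a = r.integers(1, 2**31 - 1, size=rows)
        self.b = r.integers(0, 2**31 - 1, size=rows)
        self.row_idx = np.arange(rows)
        self.table = np.zeros((rows, width))

    def _cols(self, j):
        # One pairwise-independent-style hash per row.
        return (self.a * j + self.b) % (2**31 - 1) % self.width

    def update(self, j, c):
        self.table[self.row_idx, self._cols(j)] += c

    def query(self, j):
        return self.table[self.row_idx, self._cols(j)].min()

# A few "signal" coordinates with large counts (think |theta_j| >= lam*sqrt(n))
# among many small "noise" coordinates (think |theta_j| ~ sqrt(n)).
sketch = CountMinSketch(rows=5, width=2000)
signal = [7, 123, 998, 2500, 4999]
for j in signal:
    sketch.update(j, 1000.0)
noise = [j for j in range(1000) if j not in signal]
for j in noise:
    sketch.update(j, 1.0)

threshold = 500.0
recovered = {j for j in signal + noise if sketch.query(j) >= threshold}
```

Queries never underestimate, so signal coordinates always pass the threshold; a noise coordinate passes only if it collides with a signal coordinate in every row, which is vanishingly unlikely at this width. The √n noise floor in the talk plays the role of the small counts here.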
Discussion

Summary:

Upper and lower bounds on memory-constrained regression

Lower bound: extend the data-processing inequality to handle covariates

Upper bound: use an ℓ1 regularizer to reduce to sketching

Future work:

Close the gap (kd/bε vs. kd/bε²)

Weaken the upper-bound assumptions

J. Steinhardt (Stanford) Resource-Constrained Learning July 30, 2015 20 / 20