
Transcript of "Optimal Approximate Sampling from Discrete Probability Distributions" (Feras Saad, POPL 2020)

Page 1

Optimal Approximate Sampling from Discrete Probability Distributions

Feras Saad, Cameron Freer, Martin Rinard, Vikash Mansinghka

Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology

POPL 2020, New Orleans, LA, USA

https://github.com/probcomp/optimal-approximate-sampling

Page 2

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix


Page 4

What is a random sampling algorithm?

Let w := ( w1, ..., wn ) be a list of positive integers which sum to S.

Let p := ( p1, ..., pn ) be a probability distribution where pi = wi / S (i = 1, ..., n).

A sampling algorithm (sampler) for p is a randomized algorithm A such that:

Pr[A returns integer j] = pj    (j = 1, ..., n)

[Figure: outcomes 1, 2, 3, …, n drawn with probabilities p1, p2, p3, …, pn]

Page 5

Sampling is a fundamental operation in many fields

A sampler for p is a randomized algorithm A such that:

Pr[A returns integer j ] = pj (j = 1, ..., n)

Sampling is central to many fields…

● Robotics: Probabilistic Robotics (Thrun et al., 2005)
● Artificial Intelligence: Artificial Intelligence: A Modern Approach (Russell & Norvig, 1994)
● Computational Statistics: Random Variate Generation (Devroye, 1986)
● Operations Research: Simulation Techniques in Operations Research (Harling, 1958)
● Statistical Physics: Monte Carlo Methods in Statistical Physics (Binder, 1986)
● Financial Engineering: Monte Carlo Methods in Financial Engineering (Glasserman, 2003)
● Machine Learning: An Introduction to MCMC for Machine Learning (Andrieu et al., 2003)
● Systems Biology: Randomization and Monte Carlo Methods in Biology (Manly, 1991)
● Scientific Computing: Monte Carlo Strategies in Scientific Computing (Liu, 2001)


Page 6

Libraries provide algorithms for discrete sampling
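As a concrete illustration (ours, not a reproduction of the slide), one such library interface in C++; the weight list (1, 2, 1, 3, 5) anticipates the example used on the Error 1 slides below:

#include <iostream>
#include <random>
#include <vector>

int main() {
    // std::discrete_distribution normalizes a user-supplied weight list
    // internally, using floating-point arithmetic.
    std::vector<double> weights = {1, 2, 1, 3, 5};   // relative probabilities
    std::mt19937_64 prng(42);                        // pseudorandom bit source
    std::discrete_distribution<int> dist(weights.begin(), weights.end());

    std::vector<long> counts(weights.size(), 0);
    for (int i = 0; i < 120000; ++i)
        counts[dist(prng)] += 1;                     // outcomes are 0-indexed
    for (std::size_t j = 0; j < counts.size(); ++j)
        std::cout << "outcome " << j + 1 << ": " << counts[j] << "\n";
}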

Page 7

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix


Page 9

Libraries assume analytic model of sampling

[Figure: Entropy Source S emits an "infinitely-precise" uniform random variate U (e.g., 0.1941231237…), which Sampling Algorithm A transforms into an outcome in {1, ..., n} with probabilities (p1, ..., pn); this is the "real RAM" model]

Page 11

Pros & cons of analytic model of random sampling

Statisticians prove theorems about infinitely-precise transforms of U (assuming real RAM).

Library developers implement algorithms “directly” (using floating-point approximations).

But real RAM abstracts away details of entropy resources, numerical precision, computability, and complexity, all important for both CS theory and practice.

“Anyone who considers arithmetic methods of producing random digits is, of course, in a state of sin.”

“If one considers arithmetic methods in detail, it is quickly found that the critical thing about them is the very obscure, very imperfectly understood behavior of round-off errors in mathematics.”

“One might as well admit that one can prove nothing, because the amount of theoretical information about the statistical properties of the round-off mechanism is nil.”

“I have a feeling, however, that it is somewhat silly to take a random number and put it elaborately through a power series.”

Von Neumann, J. Various Techniques Used in Connection with Random Digits. In Monte Carlo Method. National Bureau of Standards Applied Mathematics Series, vol 12. 1951.


Page 18

The inversion method: A universal sampler

Let p := (p1, ..., pn) be a discrete probability distribution (0 < pi < 1, ∑i pi = 1).

Step 1: Use p to make n bins of the unit interval [0,1].

Step 2: Simulate U ~ Uniform([0,1]).

Step 3: Return the integer j such that U is in bin j.

"Throw a dart and choose the bin it lands in."

[Figure: unit interval [0,1] split into bins of widths p1, p2, p3, ..., pn, with the dart U landing in one bin]
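A minimal sketch (ours, not the talk's code) of the inversion method as libraries implement it with floating point; both error sources called out on the next slides are visible in the comments:

#include <random>
#include <vector>

// Inversion method: partition [0,1] into bins of widths p[0..n-1] and
// return the (1-indexed) bin containing a uniform draw U.
int inversion_sample(const std::vector<double>& p, std::mt19937_64& prng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double u = unif(prng);                 // U is discrete, only finitely many values
    double cumulative = 0.0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        cumulative += p[j];                // running sums accumulate rounding error
        if (u < cumulative) return static_cast<int>(j) + 1;
    }
    return static_cast<int>(p.size());     // guard: the float sum of p may fall below 1
}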


Page 22

Sources of error in floating-point samplers

Step 1: Use p to make n bins of the unit interval [0,1].

Step 2: Simulate U ~ Uniform([0,1]).

Step 3: Return the integer j such that U is in bin j.

Error 1: Computing the normalized probabilities and running sums.

Error 2: Computing the canonical "real"-valued uniform random variate U on the unit interval.

Page 23

Error 1: Computing target probabilities (C++ example)

The user specifies a list of non-negative numbers (i.e., relative probabilities).

The desired probabilities are [1/12, 2/12, 1/12, 3/12, 5/12].

The actual probabilities are not exact (IEEE 754 double-precision floats).
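A minimal sketch (ours) of Error 1: normalizing the weights (1, 2, 1, 3, 5) in double precision cannot represent 1/12, 1/6, or 5/12 exactly, since none of them is a dyadic rational (3/12 = 1/4 is the only dyadic entry):

#include <cstdio>

int main() {
    double w[] = {1, 2, 1, 3, 5};
    double S = 0;
    for (double x : w) S += x;          // S = 12 (exact, small integers)
    for (double x : w)
        std::printf("%.20f\n", x / S);  // 1/12 prints something like
                                        // 0.08333333333333332871..., not 1/12
}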

Page 26

Error 2: Computing the canonical variate U (C++ example)

Informally, the canonical "real" random variate U is generated by:

1. Draw k random bits, forming a k-bit integer Z in {0, …, 2^k - 1}.

2. Set U := Z / 2^k.

This is parametrized by the precision, i.e., the number of bits used to represent U in floating point.
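A minimal sketch (ours) of how the canonical variate is formed; conceptually this is what C++'s std::generate_canonical<double, 53> produces:

#include <cstdint>
#include <random>

double canonical_u(std::mt19937_64& prng) {
    const int k = 53;                       // significand bits of a double
    std::uint64_t z = prng() >> (64 - k);   // k-bit integer Z in {0, ..., 2^k - 1}
    return static_cast<double>(z) / 9007199254740992.0;   // U := Z / 2^53
}

Only 2^53 distinct values of U are possible, so the resulting sampler is discrete no matter how U is post-processed.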

Page 27

Floating-point samplers also waste random bits

The canonical uniform random variate U is constructed by:

1. Draw k random bits, forming a k-bit integer Z in {0, …, 2^k - 1}.

2. Set U := Z / 2^k.

In IEEE 754 double-precision floating point, k = 53 bits (the C++ default).

Using 53 bits per sample can be very wasteful (depending on p).

Example: Consider the distribution p = [½, ½]. Only 1 random bit is needed to sample from p, so naive inversion using U wastes 52 bits!

[Figure: unit interval split into p1 = ½ and p2 = ½, with the dart U]
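A minimal sketch (ours) of the contrast for p = [½, ½]: buffering the PRNG's output and consuming exactly one bit per sample, instead of 53:

#include <cstdint>
#include <random>

// Returns 1 or 2 with probability 1/2 each, consuming one stored bit per call.
int sample_fair_coin(std::mt19937_64& prng, std::uint64_t& buffer, int& bits_left) {
    if (bits_left == 0) { buffer = prng(); bits_left = 64; }  // refill lazily
    int b = static_cast<int>(buffer & 1);                     // use exactly 1 bit
    buffer >>= 1;
    --bits_left;
    return b + 1;
}

One 64-bit PRNG call now yields 64 samples, where naive inversion would spend 53 bits on each.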


Page 29

Recap: Inefficiencies & errors in common samplers

Floating-point sampling algorithms (MATLAB, numpy, R, C++, etc.):

● Waste random bits

a. Generating U using a full machine word (e.g., 53 bits) can use significantly more random bits than are theoretically needed to sample from p.

Wasting random bits => excessive calls to the underlying PRNG => slower wall-clock time.

● Have suboptimal sampling error

a. The sampler manipulates the probabilities (p1, …, pn) using limited-precision arithmetic.

b. The canonical "real" random variate U is discrete and only approximately uniform.

Modern samplers generate billions of samples per second => small errors can magnify.


Page 34

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix

Page 35

Goals for optimal approximate sampling

Design a limited-precision sampler for a discrete probability distribution that:

● is optimally efficient in its use of random bits

(uses the theoretically smallest number of random bits per sample, on average)

Idea: Rather than generate U all "at once", lazily generate random bits "one at a time" and use them optimally [Knuth & Yao, 1976].

● is optimally accurate

(attains the theoretically smallest sampling error, for the given precision)

Idea: Explicitly minimize the error over the set of achievable distributions of entropy-optimal limited-precision samplers.


Page 39

Key contributions

1. Precise formulation of optimal approximate sampling for discrete distributions.

2. Efficient algorithms for finding optimal approximations to discrete distributions.

3. Efficient algorithms for constructing entropy-optimal samplers.

4. Empirical comparisons to existing limited-precision samplers: superior wall-clock time and sampling accuracy.

5. Empirical comparisons to existing exact samplers: enables optimally trading off precision against accuracy.



Page 46

Random bit model of sampling

The random source S lazily emits fair 0-1 coin flips (replacing the uniform random variate U).

For a discrete distribution p := (p1, ..., pn), a sampler is a partial map from finite sequences of coin flips to outcomes:

A : ⊎k {0,1}^k → {1, ..., n}

For a continuous distribution F, a sampler is a partial map from finite sequences of coin flips to digits of the real output (in a number system, e.g., binary expansion, continued fraction):

A : ⊎k {0,1}^k → ⊎d {0,1}^d

D. Knuth, A. Yao. The complexity of nonuniform random number generation. In Algorithms and Complexity: New Directions and Recent Results. 1976.

Page 47

Analytic vs. random bit model of sampling

[Figure, analytic model: Entropy Source S emits an "infinitely-precise" uniform random variate U = 0.1941231237…, which Sampling Algorithm A maps to an outcome in {1, ..., n} with probabilities (p1, ..., pn)]

[Figure, random bit model: Entropy Source S lazily emits fair/independent bits b1 = 0, b2 = 1, b3 = 1, b4 = 0, …, which Sampling Algorithm A maps to an outcome in {1, ..., n} with probabilities (p1, ..., pn)]

Page 48

Every sampler in the random bit model is a tree

Let A be a sampling algorithm for a discrete distribution p.

We can represent A as a complete binary tree T (each internal node has two children).

1. Start at the root node.
2. Call flip; if 0 go to the left child, if 1 go to the right child.
3. If the child is a leaf, return the label of that node. Else go to 2.

T is called a discrete distribution-generating (DDG) tree.

Example: DDG tree for a fair die, p = (⅙, ⅙, ⅙, ⅙, ⅙, ⅙); it encodes a rejection sampler:

(001, 010, 011, 100, 101, 110) -> (1, 2, 3, 4, 5, 6)
(000, 111) -> reject and repeat

[Figure: depth-3 binary tree with 0/1 edge labels, six leaves labeled 1-6, and two reject leaves with back edges to the root]
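A minimal sketch (ours) of executing this DDG tree: descend three levels by flipping bits, accept leaves 001-110, and follow the back edges (reject) otherwise:

#include <random>

int flip(std::mt19937_64& prng) { return static_cast<int>(prng() & 1); }

// Rejection sampler for a fair die encoded by the depth-3 DDG tree above.
int roll_fair_die(std::mt19937_64& prng) {
    while (true) {
        int z = 0;
        for (int level = 0; level < 3; ++level)
            z = 2 * z + flip(prng);        // 0 = go left, 1 = go right
        if (1 <= z && z <= 6) return z;    // leaves 001..110 -> outcomes 1..6
        // leaves 000 and 111: reject and repeat (back edges to the root)
    }
}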


Page 53

Entropy-optimal sampling (Knuth-Yao 1976)

For a distribution p = (p1, …, pn), find the DDG tree with the least average number of flips (i.e., the "entropy-optimal" tree).


Page 55

Entropy-optimal sampling (Knuth-Yao 1976)

Theorem: The entropy-optimal tree has leaf i at level j iff the jth bit in the binary expansion of pi is 1.

Example: p = (1/2, 1/4, 1/4) (dyadic probabilities, no back edges).
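As a worked check of the theorem (ours, not on the slide): 1/2 = (0.100…)₂ and 1/4 = (0.010…)₂, so the tree has leaf 1 at level 1 and leaves 2 and 3 at level 2. The expected number of flips is 1·(1/2) + 2·(1/4) + 2·(1/4) = 1.5 = H(p), so in the dyadic case the sampler is exactly entropy-optimal with no back edges.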

Page 56

Example: p = (3/10, 7/10) (rational probabilities; the tree has back edges).

Page 57

Example: p = (1/π, 1/e, 1 - 1/π - 1/e) (irrational probabilities; uncomputable, not considered further).

Page 58

Bounding the number of bits in an entropy-optimal DDG

Theorem: The expected number of flips of an entropy-optimal sampler satisfies

H(p) ≤ E[# flips] < H(p) + 2,

where H(p) := ∑i pi log(1/pi) is the Shannon entropy.

Intuition: The lower bound is obvious.

Upper bound: The uniform distribution on {1, …, 2^k} is a full binary tree; it needs exactly k bits. Knuth-Yao prove there is a worst-case 2-bit cost for sampling with non-full binary trees.
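As a concrete instance (ours): for p = (3/10, 7/10), H(p) = -(0.3 log2 0.3 + 0.7 log2 0.7) ≈ 0.88 bits, so an entropy-optimal sampler uses fewer than 2.88 expected flips per sample, versus the 53 bits a double-precision inversion sampler consumes on every draw.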

Page 59

Exact entropy-optimal samplers = exponential size

Why not just use the exact entropy-optimal samplers from Knuth-Yao?

Example: The entropy-optimal tree for Binomial(n=50, p=0.61) has 10^104 levels (i.e., ~10^91 terabytes).

[Figure: log-linear plot of tree size, showing exponential growth]

Page 60

Exact entropy-optimal samplers = exponential size

Why not just use the exact entropy-optimal samplers from Knuth-Yao?

Theorem (3.5 and 3.6 in the main paper): Suppose (w1, …, wn) are positive integers summing to S; set pi = wi / S (i = 1, …, n). The entropy-optimal sampler for p has depth at most S - 1, and this bound is tight.

DDG trees are exponentially large in the number of bits needed to encode p!

- Θ(n log(S)) bits are needed to encode the input p.
- Θ(n S) bits are needed to encode DDG(p) in the worst case.

Page 61

Exact samplers from Knuth & Yao are often infeasible to construct in practice:

"Most of the algorithms which achieve these optimum bounds are very complex, requiring a tremendous amount of space." [Knuth & Yao 1976, p. 409]


Page 63

Approximate entropy-optimal sampling

We establish a principled and efficient algorithm for replacing

an entropy-optimal exact DDG tree (sampler) for p, with arbitrarily large depth,

with

an entropy-optimal closest approximate sampler for p that has any pre-specified depth k.


Page 65

Approximate entropy-optimal sampling

Problem: For a distribution p = (p1, …, pn), find an entropy-optimal depth-k DDG tree whose output distribution q = (q1, …, qn) minimizes the error Δ(p, q).

Error is inevitable, since the depth (precision) is limited to k bits.

In the random bit model, the depth of the DDG tree ⟺ the bit-precision of the sampler.

We allow the "depth-k" tree to have back edges.


Page 70

Summary of main result (Theorem 4.7 + Prop. 2.16)

Given any

- discrete probability distribution p,
- specification of k > 0 bits of precision, and
- measure of statistical error that is an f-divergence,

we efficiently construct the most accurate sampling algorithm among all entropy-optimal samplers for p that use k bits of precision.

Additional property: Our samplers are more accurate and entropy-efficient than any sampler that consumes at most k random bits per sample.

- This includes all floating-point samplers in standard libraries that transform a uniform variate U (e.g., R, MATLAB, numpy, scipy, C++, GNU GSL, etc.).


Page 73

f-Divergences: A family of statistical error metrics

Let S be the space of all probability distributions on some finite set.

Definition: A statistical divergence Δ on S is a function Δ(·||·) : S × S → [0, ∞] such that Δ(p||q) = 0 if and only if p = q.

Definition: An f-divergence is a statistical divergence such that

Δg(p||q) = ∑i g(qi / pi) pi

for some convex function g : (0, ∞) → ℝ with g(1) = 0 (g is called the generator of Δg).

f-divergences are widely used in information theory, statistics, machine learning, etc.


Page 75

Examples of f-divergences & generating functions
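The slide's table is a figure; under the convention Δg(p||q) = ∑i g(qi/pi) pi above, standard instances (our reconstruction, not necessarily the slide's exact list) include:

● Total variation ∑i |pi - qi|: generator g(t) = |t - 1|
● KL divergence ∑i pi log(pi/qi): generator g(t) = -log(t)
● Hellinger divergence ∑i (√pi - √qi)^2: generator g(t) = (√t - 1)^2
● Chi-square divergence ∑i (qi - pi)^2 / pi: generator g(t) = (t - 1)^2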

Page 76

Minimizing error is a balls-to-bins problem

Problem: For a distribution p = (p1, …, pn), find an entropy-optimal depth-k DDG tree whose output distribution q = (q1, …, qn) minimizes the error Δg(p, q).

Theorem (3.4 in the main paper): The output probabilities (q1, ..., qn) of an entropy-optimal depth-k DDG tree are all integer multiples of 1 / (2^k - 2^ℓ · I[ℓ < k]), for the same ℓ (0 ≤ ℓ ≤ k).

Optimization Problem (formal): Assign Z balls to n bins so as to minimize the error

Δg(p||q) = ∑i g(mi / (Z pi)) pi,    (OBJ-FUN)

where mi is the number of balls assigned to bin i (i = 1, …, n), i.e., qi = mi / Z.

(Solve OBJ-FUN separately for Z = 2^k; 2^k - 2^(k-1); …; 2^k - 2; 2^k - 1.)
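A sketch (ours; the paper's algorithm is O(n log n), this direct version is O(n + Z log n)) making OBJ-FUN concrete: because the objective is separable and convex in the mi, assigning each of the Z balls greedily to the bin with the largest marginal error decrease yields an optimal allocation. Here g(t) = |t - 1| (total variation):

#include <cmath>
#include <cstdio>
#include <queue>
#include <utility>
#include <vector>

int main() {
    std::vector<double> p = {1.0/12, 2.0/12, 1.0/12, 3.0/12, 5.0/12};
    const long long Z = 16;                       // e.g. k = 4 bits, ell = k
    auto term = [&](int i, long long m) {         // p_i * g(m / (Z * p_i))
        return p[i] * std::fabs(m / (Z * p[i]) - 1.0);
    };
    // Max-heap of (error decrease from adding one more ball to bin i, i).
    std::priority_queue<std::pair<double, int>> heap;
    std::vector<long long> m(p.size(), 0);
    for (int i = 0; i < static_cast<int>(p.size()); ++i)
        heap.push({term(i, 0) - term(i, 1), i});
    for (long long b = 0; b < Z; ++b) {           // place Z balls one at a time
        auto [gain, i] = heap.top();
        (void)gain;                               // gain only orders the heap
        heap.pop();
        ++m[i];
        heap.push({term(i, m[i]) - term(i, m[i] + 1), i});
    }
    for (std::size_t i = 0; i < p.size(); ++i)
        std::printf("q_%zu = %lld/%lld\n", i + 1, m[i], Z);
}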


Page 79

Minimizing error is a balls-to-bins problem

O(n log n) discrete optimization algorithm (see the main paper).

Page 80

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix

Page 81

Optimal samplers: Wall-clock runtime benefits

- Knuth–Yao Sampler from Roy et al. [2013]: scales O(n H(p)) <= O(n log(n))
- Inversion Sampler (GNU C++ Library): scales O(n)
- Optimal Approximate Sampler (OAS; this work): scales O(H(p)) <= O(log(n))

[Figure: runtime comparison showing 10^2x–10^3x and 10^3x–10^4x speedups for OAS over the baselines]

Page 82

Optimal samplers: Efficiency benefits

● Optimal approximate samplers: call the PRNG a variable number of times => efficiently using random bits (faster).

● Inversion sampler (and other floating-point samplers): call the PRNG a fixed number of times => wasting random bits (slower).


Page 86:

Optimal samplers: Accuracy benefits

Each panel: a common family of discrete distributions.

x-axis: relative (Hellinger) error of a baseline sampler to our optimal sampler.

y-axis: fraction of 500 random distributions whose relative error is ≤ the value on the x-axis.

100x to 10000x more accurate.

Baselines: Interval Sampler [Uyematsu & Li 2003]; Inversion Sampler [GNU C++ library]. OAS (this work).

Page 87:

Total-Variation(p, q) = ∑i | pi – qi |

“Manhattan Distance”

Low entropy: one outcome with most mass.

Medium entropy: few outcomes with most mass.

High entropy: many outcomes with equal mass.

TV sampling error increases as entropy increases

How many bits do I need? It depends…

[Plot: Error Measure: Total Variation. x-axis: Entropy of Target Distribution (Bits), ranging over low, medium, and high entropy; y-axis: Optimal Error.]

Page 88:

KL(p, q) = ∑i pi log(pi / qi)

“Information-Theoretic Distance”

Low entropy: one outcome with most mass.

Medium entropy: few outcomes with most mass.

High entropy: many outcomes with equal mass.

KL sampling error decreases as entropy increases

How many bits do I need? It depends…

[Plot: Error Measure: KL Divergence. x-axis: Entropy of Target Distribution (Bits), ranging over low, medium, and high entropy; y-axis: Optimal Error.]

Page 89:

Entropy of target distribution dictates required bit-precision for desired level of error.

Error vs. entropy behaves differently depending on the choice of error metric.

Optimal approximate sampler algorithms work for many choices of error metrics.

How many bits do I need? It depends…

[Plots, side by side: Error Measure: Total Variation and Error Measure: KL Divergence. x-axes: Entropy of Target Distribution (Bits); y-axes: Optimal Error.]
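A minimal sketch (ours, not from the slides) of the two error measures as defined above, in Python; we assume qi > 0 wherever pi > 0 so that the KL term is finite:

    import math

    def tv_error(p, q):
        # Total variation as defined on the slide: sum_i |p_i - q_i|.
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    def kl_error(p, q):
        # KL divergence: sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    # Example: a 3-outcome target and a 7-bit dyadic approximation.
    p = [0.2, 0.3, 0.5]
    q = [26/128, 38/128, 64/128]
    print(tv_error(p, q), kl_error(p, q))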

Page 90:

Optimal samplers versus exact samplers

Page 91:

- roughly the same bits/sample as the exact entropy-optimal sampler
- use significantly less precision than the exact entropy-optimal sampler
- small approximation error

Optimal samplers versus exact samplers


Page 94:

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix

Page 95:

Application of DDG sampling: Cryptography

In cryptography, sampling over finite discrete “lattices” is a key subproblem:

● Entropy-optimal sampling needed

(entropy is expensive resource)

● Limited-precision analysis needed

(for theoretical security guarantees)

L. Ducas and P. Q. Nguyen. Faster Gaussian Lattice Sampling Using Lazy Floating-Point Arithmetic. ASIACRYPT 2012.
Roy et al. High Precision Discrete Gaussian Sampling on FPGAs. SAC 2013.
N. Dwarakanath and S. Galbraith. Sampling from Discrete Gaussians for Lattice-Based Cryptography on a Constrained Device. Appl. Alg. Eng. Comm. Comp., 25(3). 2014.
J. Follath. Gaussian Sampling in Lattice Based Cryptography. Tatra Mt. Math. Publ. 60. 2014.
C. Du and G. Bai. Towards Efficient Discrete Gaussian Sampling for Lattice-Based Cryptography. FPL 2015.

Page 96:

Hardware architecture for sampling DDG trees

Roy, et al. High Precision Discrete Gaussian Sampling on FPGAs. SAC 2013.

The (N × k) binary probability matrix P is encoded into ROM and sampled as follows:

Algorithm: Knuth-Yao Sampling
Input: probability matrix P
Output: sample in [0, ..., N-1]

    d = 0
    col = 0
    while True:
        r = flip()
        d = 2*d + (1 - r)
        for row in [N-1, ..., 0]:
            d -= P[row][col]
            if d == -1:
                return row
        col = col + 1
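A runnable Python transcription of this pseudocode (a sketch: flip() is realized with one fair bit from random.getrandbits, and the example matrix P below is ours, not one from the talk):

    import random

    def knuth_yao_sample(P):
        # P is an N x k binary matrix: P[row][col] is the (col+1)-th bit of
        # the binary expansion of the probability of outcome `row`.
        N, k = len(P), len(P[0])
        d = 0        # position among the unresolved nodes at the current level
        col = 0
        while col < k:
            r = random.getrandbits(1)    # flip(): one fair random bit
            d = 2*d + (1 - r)            # descend one level in the DDG tree
            for row in range(N - 1, -1, -1):
                d -= P[row][col]         # account for leaves labeled `row` at this level
                if d == -1:
                    return row
            col += 1
        raise RuntimeError("columns of P exhausted; probabilities must sum to 1")

    # Example: p = (1/2, 1/4, 1/4) encoded with k = 2 columns.
    P = [[1, 0],
         [0, 1],
         [0, 1]]
    print(knuth_yao_sample(P))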

Page 97:

Related work

1. Exact sampling with near-optimal entropy and linear space:
The Fast Loaded Dice Roller [Saad et al., 2020] (AISTATS, coming soon)

2. Coalgebraic framework for implementing and composing entropy-preserving reductions from arbitrary input sources to output distributions:
Kozen & Soloviev [2018]; see also Pae & Loui [2006] for asymptotically-optimal variable-length conversions using coins of unknown bias

3. Limited-precision samplers for discrete distributions:
random graph [Blanca & Mihail 2012], geometric [Bringmann & Friedrich 2013], uniform [Lumbroso 2013], discrete Gaussian [Folláth 2014], general [Uyematsu & Li 2003]

4. Variants of the random bit model (biased/unknown/non-i.i.d. sources, variable precision):
[von Neumann 1951; Elias 1972; Stout & Warren 1984; Blum 1986; Roche 1991; Peres 1992; Han & Verdú 1993; Vembu & Verdú 1995; Abrahams 1996; Cicalese et al. 2006; Kozen 2014]

Page 98:

Talk Outline

● Introduction

● Issues with Floating-Point Samplers

● Overview of Approach

● Experimental Results

● Related Work

● Technical Appendix

Page 99:

Bernoulli sampling in the random bit model

Suppose p ∈ (0, 1) and write p = (0.p1p2p3…)₂ in its base-2 expansion.

analytic sampler:

    Generate uniform real U
    return 1 if U < p else 0

random bit sampler:

    i = 0
    repeat
        i = i + 1
        Generate random bit B
    until B ≠ pi
    return 1 if B < pi else 0
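A sketch of the random-bit sampler in Python (ours; the bits pi of p are produced lazily by a hypothetical helper bit_of, so only finitely many bits are consumed in expectation):

    import random

    def bit_of(p, i):
        # i-th bit (1-indexed) of the binary expansion 0.p1 p2 p3 ... of p.
        return int(p * 2**i) % 2

    def bernoulli(p):
        # Lazily compare a fair bit stream B1 B2 ... against the bits of p;
        # the first index where they differ decides the outcome.
        i = 0
        while True:
            i += 1
            B = random.getrandbits(1)
            pi = bit_of(p, i)
            if B != pi:
                return 1 if B < pi else 0

    # The expected number of flips is 2, independent of p.
    print(sum(bernoulli(0.3) for _ in range(10000)) / 10000)  # ~ 0.3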

Page 100:

Rejection sampling in the random bit model

Suppose ( w1, …, wn ) are positive integers summing to S and put

pi = wi / S ( i = 1, …, n ).

Fix an integer k so that 2^(k−1) < S ≤ 2^k.

[Table with 2^k cells, indexed 0, …, 2^k − 1: the first w1 cells are labeled 1, the next w2 cells are labeled 2, …, the next wn cells are labeled n (S labeled cells in total); the remaining 2^k − S cells are labeled REJECT.]

random bit sampler: "choose a random cell in this table"

    repeat
        generate k random bits forming integer Z
    until Z < S
    return Table[Z]

* What is the expected # of flips?
* Is the method efficient? (in space? in entropy?)
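A sketch of this rejection sampler in Python (ours), drawing k fair bits per attempt:

    import random

    def rejection_sample(w):
        # w: positive integer weights summing to S; returns i with prob w[i]/S.
        S = sum(w)
        k = max(1, (S - 1).bit_length())   # smallest k with 2^(k-1) < S <= 2^k
        # table[z] = outcome of cell z for z in [0, S); cells in [S, 2^k) reject.
        table = [i for i, wi in enumerate(w) for _ in range(wi)]
        while True:
            Z = random.getrandbits(k)      # k random bits forming an integer
            if Z < S:                      # accept; otherwise flip k fresh bits
                return table[Z]

    # Example: weights (1, 2, 3) => probabilities (1/6, 2/6, 3/6).
    counts = [0, 0, 0]
    for _ in range(6000):
        counts[rejection_sample([1, 2, 3])] += 1
    print(counts)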

Page 101:

Problem For distribution p = ( p1, …, pn ), find an entropy-optimal depth-k DDG tree whose output distribution q = ( q1, …, qn ) minimizes the error Δ(p, q).

Suppose k = 7 bits. Truncate P after column 7 to get Q.

Issue: the probabilities of q sum to < 1. Solution? Normalize Q by adding 1s randomly.

Example Suppose Δ is KL divergence: KL(p ‖ q) = ∑i pi log(pi / qi).
Suppose p = [ϵ, (1−ϵ)/2, (1−ϵ)/2] with ϵ ≪ 1/2^k.
Then the first k digits of ϵ are all 0, so at least one unit must be added to Q.

Case 1: Add units to q1 ⟹ KL(p ‖ q) < ∞
Case 2: Do not add units to q1 ⟹ KL(p ‖ q) = ∞

Naive truncation can result in large sampling errors

P (rows = binary expansions of p1, p2, p3):

p1 = 0.000000010111110
p2 = 0.010111100010110
p3 = 0.010100000101010

Q = the first k = 7 columns of P.

Page 102:

Problem For distribution p = ( p1, …, pn ), find an entropy-optimal depth-k DDG tree whose output distribution q = ( q1, …, qn ) minimizes the error Δ(p, q).

Suppose k = 7 bits. Truncate P after column 7 to get Q.

Issue: the probabilities of q sum to < 1. Solution? Normalize Q by adding 1s randomly.

Example Suppose Δ is Pearson chi-square: χ²(p ‖ q) = ∑i (pi − qi)² / pi.

Suppose p = [ϵ, ϵ, ϵ, ϵ, 1−4ϵ] with ϵ ≪ 1/2^k. Then the first k digits of 1−4ϵ are all ones, so one unit (2^−k) remains to be added to Q.

Case 1: Give the last unit to q5 ⟹ χ²(p ‖ q) = 4ϵ + (1−4ϵ−1)² / (1−4ϵ) ≈ 4ϵ ≈ 0.
Case 2: Give the last unit to q1 ⟹ χ²(p ‖ q) ≥ (ϵ − 1/2^k)² / ϵ ≈ 1/(2^(2k) ϵ) ≫ 0.

Naive truncation can result in large sampling errors

P (rows = binary expansions of p1, p2, p3):

p1 = 0.000000010111110
p2 = 0.010111100010110
p3 = 0.010100000101010

Q = the first k = 7 columns of P.
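A small numeric check (ours) of the chi-square example above, with hypothetical values k = 7 and ϵ = 2^−20, showing that where the leftover unit goes changes the error dramatically:

    def chi_square(p, q):
        # Pearson chi-square as defined on the slide: sum_i (p_i - q_i)^2 / p_i.
        return sum((pi - qi)**2 / pi for pi, qi in zip(p, q))

    k = 7
    eps = 2.0**-20                        # eps << 1/2^k
    p = [eps, eps, eps, eps, 1 - 4*eps]
    trunc = [int(pi * 2**k) for pi in p]  # k-bit truncation: [0, 0, 0, 0, 127]
    to_q5 = [m / 2**k for m in trunc[:-1]] + [(trunc[-1] + 1) / 2**k]  # unit -> q5
    to_q1 = [(trunc[0] + 1) / 2**k] + [m / 2**k for m in trunc[1:]]    # unit -> q1
    print(chi_square(p, to_q5))   # ~ 4*eps: tiny
    print(chi_square(p, to_q1))   # ~ 2^(-2k)/eps: large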

Page 103:

Naive truncation can result in large sampling errors

Summary of the previous examples: the error of the truncated distribution is sensitive to

- the target distribution p;

- the precision specification k;

- the definition of the error measure Δ.

We will develop a general strategy that guarantees the least possible error for any setting of these parameters.

Page 104:

Minimizing error is a balls-to-bins problem

Problem For distribution p = ( p1, …, pn ), find an entropy-optimal depth-k DDG tree whose output distribution q = ( q1, …, qn ) minimizes the error Δg(p, q).

Theorem (3.4 in main paper) The output probabilities ( q1, ..., qn ) of an entropy-optimal depth-k DDG tree are all integer multiples of 1 / (2^k − 2^ℓ · I[ℓ < k]), for the same ℓ (0 ≤ ℓ ≤ k).

Optimization Problem (formal) Assign Z balls to n bins so as to minimize the error:

Δg(p ‖ q) = ∑i g(mi / (Z pi)) pi ,    (OBJ-FUN)

where mi is the number of balls assigned to bin i (i = 1, …, n), i.e., qi = mi / Z.

(solve OBJ-FUN separately for Z = 2^k; 2^k − 2^(k−1); …; 2^k − 2; 2^k − 1)

Page 105:

Optimization Problem (formal) Assign Z balls to n bins so as to minimize the error:

Δg(p ‖ q) = ∑i g(mi / (Z pi)) pi ,    (OBJ-FUN)

where mi is the number of balls assigned to bin i (i = 1, …, n), i.e., qi = mi / Z.

(solve OBJ-FUN separately for Z = 2^k; 2^k − 2^(k−1); …; 2^k − 2; 2^k − 1)

In the next slides, we will solve this problem (for any value of Z).

Notation Denote the set of assignments of Z indistinguishable balls to n bins by

𝓜[n, Z] = { (m1, …, mn) | 0 ≤ mi ≤ Z and ∑i mi = Z }.

Minimizing error is a balls-to-bins problem
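A direct Python transcription (ours) of OBJ-FUN as an evaluation function, with the generator g passed in; e.g. g(x) = −log x recovers KL divergence:

    import math

    def obj_fun(m, p, Z, g):
        # Delta_g(p || q) = sum_i p_i * g(m_i / (Z * p_i)), where q_i = m_i / Z.
        return sum(pi * g(mi / (Z * pi)) for mi, pi in zip(m, p))

    # Example: evaluate a candidate assignment of Z = 2^7 balls under g(x) = -log(x).
    p = [0.2, 0.3, 0.5]
    Z = 2**7
    m = [26, 38, 64]                      # one element of M[n, Z]
    print(obj_fun(m, p, Z, lambda x: -math.log(x)))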

Page 106:

Theorem (4.11 in main paper) Let m_Z = (m1, ..., mn) be an optimum of OBJ-FUN over 𝓜[n, Z] for some Z > 0. Then m_(Z+1) = (m1, ..., mj + 1, …, mn) is an optimum of OBJ-FUN over 𝓜[n, Z+1], where

j = argmin_{i=1,…,n} pi ( g((mi + 1) / (Zpi)) − g(mi / (Zpi)) ).

In English: "Find the index j where incrementing mj gives the least increase in the error."

Proof (Sketch) The error delta pi ( g((mi + 1) / (Zpi)) − g(mi / (Zpi)) ) is the slope of the secant of g connecting the points mi / (Zpi) and (mi + 1) / (Zpi).

Leverage the fact that for any convex g the secant slopes increase from left to right, together with the optimality of m_Z, to show that any other solution m′_(Z+1) can always be made more optimal by moving one of its coordinates one unit (up or down) toward m_(Z+1).

[Figure: the secant slopes of a convex g increase from left to right.]

Idea: Greedy optimization (Version 1)

Page 107:

Theorem (4.11 in main paper) Let m_Z = (m1, ..., mn) be an optimum of OBJ-FUN over 𝓜[n, Z] for some Z > 0. Then m_(Z+1) = (m1, ..., mj + 1, …, mn) is an optimum of OBJ-FUN over 𝓜[n, Z+1], where

j = argmin_{i=1,…,n} pi ( g((mi + 1) / (Zpi)) − g(mi / (Zpi)) )

If we can find any optimum, we can apply the Theorem repeatedly to grow it until it sums to Z.
Observation: 𝓜[n, 0] has a single element, (0, ..., 0), which is therefore optimal.

Algorithm runtime = O(n + 2^k).
1. Initialize m = (0, …, 0).
2. For each t = 1, …, 2^k:

a. Find j as defined above.
b. Increment mj = mj + 1.

A first-pass algorithm (exponential time)
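A sketch (ours) of this first-pass algorithm; the argmin is recomputed naively at each step, so this version runs in O(n · 2^k) rather than the amortized bound on the slide:

    import math

    def greedy_assignment(p, Z, g):
        # Grow the optimal assignment from M[n, 0] = {(0, ..., 0)} one ball at
        # a time (Theorem 4.11): always increment the bin whose error increase
        # p_i * (g((m_i + 1)/(Z p_i)) - g(m_i/(Z p_i))) is smallest.
        n = len(p)
        m = [0] * n
        for _ in range(Z):
            j = min(range(n),
                    key=lambda i: p[i] * (g((m[i] + 1) / (Z * p[i]))
                                          - g(m[i] / (Z * p[i]))))
            m[j] += 1
        return m   # element of M[n, Z] chosen to minimize OBJ-FUN

    # Example with g(x) = -log(x) (KL divergence); g(0) = +inf, so every bin
    # with p_i > 0 is forced to receive at least one ball.
    g = lambda x: -math.log(x) if x > 0 else math.inf
    print(greedy_assignment([0.2, 0.3, 0.5], 2**7, g))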

Page 108:

Theorem (4.10 in main paper) Suppose m ∈ 𝓜[n, Z] for some Z > 0, not necessarily optimal; m can be made into an optimum of OBJ-FUN over 𝓜[n, Z] by greedy local moves:

    repeat forever:
        let j = argmin_{i=1,...,n} pi ( g((mi + 1) / (Zpi)) − g(mi / (Zpi)) )   // least cost of an increment: ϵj
        let l = argmin_{i=1,...,n} pi ( g((mi − 1) / (Zpi)) − g(mi / (Zpi)) )   // least cost of a decrement: ϵl
        if ϵj + ϵl < 0:
            set mj = mj + 1
            set ml = ml − 1
        else:
            return (m1, …, mn)

Use the Theorem to optimize an arbitrary initial assignment:

Algorithm worst-case runtime = O(n + 2^k).
1. Initialize m = (2^k, 0, …, 0).
2. Run the loop in the Theorem.

Idea: Greedy optimization (Version 2)

Page 109:

Combining greedy optimization algorithms

Theorem 4.11: "an optimal solution that sums to Z can quickly be made optimal for Z ± 1."
Theorem 4.10: "a non-optimal solution that sums to Z can be greedily made optimal for Z."

If we can find an initial tuple m* that has the following properties:
Property 1: m* is at most n units away from being optimal for 𝓜[n, ∑m*];
Property 2: | Z − ∑m* | ≤ n,

then we have the following linear-time algorithm for optimizing OBJ-FUN:

1. Initialize the solution to m*.
2. Use Thm 4.10 to make m* optimal for 𝓜[n, ∑m*].   // O(n) by Property 1
3. Use Thm 4.11 to make m* optimal for 𝓜[n, Z].     // O(n) by Property 2
4. Return m*.

Theorem (4.14 in main paper) The following initialization satisfies Properties 1 and 2:
mi = ⌊Zpi⌋ + I[ g((⌊Zpi⌋ + 1) / (Zpi)) < g(⌊Zpi⌋ / (Zpi)) ]   ( i = 1, …, n ).
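Putting the pieces together, a sketch (ours) of this pipeline; the function name `optimize` and the helper `cost` are our naming, and the argmins are recomputed naively for brevity, so this sketch is O(n²) per phase rather than the O(n) of the slide (or the O(n log n) of the paper's full algorithm):

    import math

    def optimize(p, Z, g):
        n = len(p)
        cost = lambda i, mi: p[i] * g(mi / (Z * p[i]))   # one term of OBJ-FUN

        # Theorem 4.14: initialize each bin to floor(Z p_i), rounding up when
        # the ceiling has the smaller per-bin error.
        m = []
        for i in range(n):
            f = math.floor(Z * p[i])
            m.append(f + 1 if cost(i, f + 1) < cost(i, f) else f)

        # Theorem 4.10: local swaps until no increment/decrement pair helps.
        while True:
            j = min(range(n), key=lambda i: cost(i, m[i] + 1) - cost(i, m[i]))
            l = min((i for i in range(n) if m[i] > 0),
                    key=lambda i: cost(i, m[i] - 1) - cost(i, m[i]))
            eps_j = cost(j, m[j] + 1) - cost(j, m[j])
            eps_l = cost(l, m[l] - 1) - cost(l, m[l])
            if eps_j + eps_l < 0 and j != l:
                m[j] += 1
                m[l] -= 1
            else:
                break

        # Theorem 4.11: add or remove balls one at a time until sum(m) == Z.
        while sum(m) < Z:
            j = min(range(n), key=lambda i: cost(i, m[i] + 1) - cost(i, m[i]))
            m[j] += 1
        while sum(m) > Z:
            l = min((i for i in range(n) if m[i] > 0),
                    key=lambda i: cost(i, m[i] - 1) - cost(i, m[i]))
            m[l] -= 1
        return m

    g = lambda x: -math.log(x) if x > 0 else math.inf
    print(optimize([0.2, 0.3, 0.5], 2**7, g))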

Page 110:

Intuition for the choice of initial assignment

Theorem (4.14 in main paper) The following initialization satisfies Properties 1 and 2:
mi = ⌊Zpi⌋ + I[ g((⌊Zpi⌋ + 1) / (Zpi)) < g(⌊Zpi⌋ / (Zpi)) ]   ( i = 1, …, n ).

[Figure: three cases for the generator — Case 1: g > 0 everywhere; Case 2: g < 0 on some region of (0, 1); Case 3: g < 0 on some region of (1, ∞) — showing which of the neighboring ratios ⌊Zpi⌋ / (Zpi) and (⌊Zpi⌋ + 1) / (Zpi) the initialization selects in each case.]

Page 111:

Approximate entropy-optimal sampling

We establish a principled and efficient algorithm for replacing

    an entropy-optimal exact DDG tree (sampler) for p, which may be very deep (arbitrarily large depth),

with

    an entropy-optimal closest approximate sampler for p that has any pre-specified depth k.