Transcript of "Learning and testing k-modal distributions", Rocco A. Servedio, Columbia University. Joint work (in progress) with Ilias Diakonikolas (UC Berkeley) and Costis Daskalakis (MIT).

Page 1:

Learning and testing k-modal distributions

Rocco A. Servedio, Columbia University

Joint work (in progress) with

Ilias Diakonikolas, UC Berkeley

Costis Daskalakis, MIT

Page 2:

What this talk is about

Probability distributions over [N] = {1,2,…,N}

Monotone increasing distribution: p(i) ≤ p(i+1) for all 1 ≤ i < N

(Whole talk: “increasing” means “non-decreasing”)


Page 3:

k-modal distributions

k-modal: k peaks and valleys

A 3-modal distribution:

A unimodal distribution: Another one:

Monotone distribution: 0-modal

Page 4:

The learning problem

Target distribution p is an unknown k-modal distribution over [N]

Algorithm gets samples from p

Goal: output a hypothesis h that is ε-close to p in total variation distance.

Want an algorithm that uses few samples and is computationally efficient.

Page 5:

The testing problem

q is a known k-modal distribution over [N].

Goal: output “yes” w.h.p. if p = q,

“no” w.h.p. if p is ε-far from q in total variation distance.

p is an unknown k-modal distribution over [N].

Algorithm gets samples from p.

Page 6:

Please note

Testing problem is not: given samples from an unknown distribution p, determine if p is k-modal versus ε-far from every k-modal distribution.

This problem requires Ω(√N) samples, even for k = 0.

[figure] Hard to distinguish: uniform over a random half of [N] vs. uniform over all of [N].

Page 7:

Why study these questions?

• k-modal distributions seem natural

• would be nice if k-modal structure were exploitable by efficient learning / testing algorithms

• post hoc justification: solutions exhibit interesting connections between testing and learning

Page 8:

The general case: learning

If we drop the k-modal assumption, the learning problem becomes:

Learn an arbitrary distribution over [N] to total variation distance ε.

Θ(N/ε²) samples are necessary and sufficient.
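For contrast with the structured case, here is a minimal sketch (not from the talk; names are illustrative) of the generic learner behind this bound: output the empirical distribution, with a sample budget on the order of N/ε².

```python
# Minimal sketch (illustrative) of the generic learner over [N] = {1, ..., N}:
# output the empirical distribution; roughly N/eps^2 samples make it eps-close
# to p in total variation distance with high probability.
from collections import Counter

def learn_arbitrary(samples, N):
    """Return the empirical pmf over {1, ..., N}."""
    counts = Counter(samples)
    m = len(samples)
    return {x: counts[x] / m for x in range(1, N + 1)}

def generic_sample_budget(N, eps):
    """Sample budget for the generic learner, up to constant factors."""
    return int(N / eps ** 2)
```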

Page 9:

The general case: testing

If we drop the k-modal assumption, the testing problem becomes:

q is a known, arbitrary distribution over [N].

p is an unknown, arbitrary distribution over [N]. Algorithm gets samples from p.

Goal: output “yes” if p = q, “no” if p is ε-far from q.

Roughly √N · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08].

Page 10:

This work: main learning result

We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. Its sample complexity and running time are both small (close to the lower bound below).

Close to optimal: an Ω(k log(N/k)/ε³)-sample lower bound holds for any algorithm.

Page 11:

Main testing result

We give a sample- and time-efficient algorithm that solves the k-modal testing problem over [N] to accuracy ε.

Any testing algorithm must use a nearly matching number of samples.

Testing is easier than learning!

Page 12:

Prior work

k = 0, 1: [BKR04] gave a poly(log N, 1/ε)-sample efficient algorithm for the testing problem (p, q both available via sample access).

k = 0, 1: [Birgé87, Birgé87a] gave an O(log N / ε³)-sample efficient algorithm for learning, and a matching lower bound.

We’ll use this algorithm as a black box in our results.

Page 13:

Outline of rest of talk

• Background: some tools

• Learning k-modal distributions

• Testing k-modal distributions

Page 14:

First tool: Learning monotone distributions

Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log N / ε³) samples and runs in time linear in its input size.

[B87b] also gave a matching Ω(log N / ε³) lower bound for learning a monotone distribution.

Page 15:

Second tool: Learning a CDF – the Dvoretzky–Kiefer–Wolfowitz inequality

Theorem [DKW56]: Let p be any distribution over [N] with CDF F.

Let F̂ be the empirical estimate of F obtained from m samples.

Then |F̂(x) − F(x)| ≤ ε for every x, with probability ≥ 1 − δ, provided m = O(log(1/δ)/ε²).

Morally, this means you can partition [N] into about 1/ε intervals, each of mass about ε under p, using Õ(1/ε²) samples.

Note: Õ(1/ε²) samples suffice (by an easy Chernoff bound argument).

[figure: true CDF vs. empirical CDF]
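To make the "morally" remark concrete, here is a minimal Python sketch, not from the paper, of the two ways the inequality gets used above: choosing a sample size so the empirical CDF is uniformly ε-accurate, and cutting [N] into intervals of empirical mass roughly ε. All function names are illustrative.

```python
import numpy as np

def empirical_cdf(samples, N):
    """Empirical CDF over {1, ..., N}: entry i-1 is the fraction of samples <= i."""
    counts = np.bincount(samples, minlength=N + 1)[1:N + 1]
    return np.cumsum(counts) / len(samples)

def dkw_sample_size(eps, delta):
    """Samples needed so that sup_x |F_hat(x) - F(x)| <= eps w.p. >= 1 - delta."""
    return int(np.ceil(np.log(2.0 / delta) / (2.0 * eps ** 2)))

def mass_partition(samples, N, eps):
    """Cut [N] into consecutive intervals of empirical mass roughly eps each.
    (A single point of mass much larger than eps gets its own interval; the talk
    notes such heavy points are easy to detect and handle separately.)"""
    F = empirical_cdf(samples, N)
    breakpoints, target = [0], eps
    for x in range(1, N + 1):
        if F[x - 1] >= target:
            breakpoints.append(x)
            target = F[x - 1] + eps
    if breakpoints[-1] != N:
        breakpoints.append(N)
    # interval j is {breakpoints[j] + 1, ..., breakpoints[j + 1]}
    return list(zip([b + 1 for b in breakpoints[:-1]], breakpoints[1:]))
```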

Page 16:

Learning k-modal distributions

Page 17:

The problem

Learn an unknown k-modal distribution over [N].


Page 18:

What should we shoot for?

Easy lower bound: need Ω(k log(N/k)/ε³) samples.

(Have to solve k monotone-distribution-learning problems, each over a domain of size about N/k, to accuracy ε.)

Want an algorithm that uses roughly this many samples and is computationally efficient.

Page 19:

The problem, again

Goal: learn an unknown k-modal distribution over [N].

We know how to efficiently learn an unknown monotone distribution…

Would be easy if we knew the k peaks/valleys…

Guessing them exactly: infeasible

Guessing them approximately: not too great either


Page 20:

A first approach

Break up [N] into many intervals:

p is not monotone on at most k of the intervals

So running monotone distribution learner on each interval will usually give a good answer.

Page 21:

First approach in more detail

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ of roughly equal (small) probability mass, and obtain estimates p̂(I_j) of the interval masses p(I_j).

(Assumes each point has small mass; heavier points are easy to detect and deal with.)

2. Run the monotone distribution learner on each interval I_j to get a hypothesis h_j for the conditional distribution of p on I_j.

(Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one as h_j.)

3. Combine the hypotheses in the obvious way: h = Σ_j p̂(I_j) · h_j.
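A minimal sketch of this first approach, assuming a black-box monotone learner in the spirit of [B87] (the `learn_monotone` signature below is hypothetical) and reusing the `mass_partition` helper sketched earlier; the hypothesis-selection step is simplified to picking whichever orientation fits the interval's empirical data better.

```python
# Sketch of the "first approach" (illustrative, not the paper's code).
# Assumes a black-box monotone learner with hypothetical signature
#   learn_monotone(samples, lo, hi, increasing) -> dict mapping each x in [lo, hi]
#   to an estimated conditional probability (summing to ~1 over the interval).
from collections import Counter

def learn_kmodal_first_approach(samples, N, eps, learn_monotone):
    intervals = mass_partition(samples, N, eps)              # step 1: DKW-based partition
    counts, m = Counter(samples), len(samples)
    hypothesis = {}
    for (lo, hi) in intervals:
        weight = sum(counts[x] for x in range(lo, hi + 1)) / m   # estimated mass of the interval
        sub = [x for x in samples if lo <= x <= hi]
        # step 2: run the monotone learner twice, once per orientation ...
        h_up = learn_monotone(sub, lo, hi, True)
        h_dn = learn_monotone(sub, lo, hi, False)
        # ... and keep whichever fits the interval's empirical data better
        # (a crude stand-in for the hypothesis-testing step on the slide).
        def fit(h):
            emp = lambda x: counts[x] / max(len(sub), 1)
            return sum(min(h.get(x, 0.0), emp(x)) for x in range(lo, hi + 1))
        best = h_up if fit(h_up) >= fit(h_dn) else h_dn
        # step 3: combine, scaled by the interval's estimated mass
        for x in range(lo, hi + 1):
            hypothesis[x] = weight * best.get(x, 0.0)
    return hypothesis
```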

Page 22:

Sketch of analysis

1. Use [DKW] to divide [N] into intervals and obtain mass estimates p̂(I_j): takes relatively few samples.

2. Run the monotone distribution learner on each interval to get h_j: this dominates the sample cost, since the learner is run once per interval.

3. Combine the hypotheses in the obvious way.

Total error ≤ (error from the ≤ k non-monotone intervals) + (error from the scaling factors) + (error from estimating the p(I_j)’s with the p̂(I_j)’s).

Page 23:

Improving the approach

The extra cost came from running the monotone distribution learner once per interval rather than only about k times (once per monotone piece of p).

If we could somehow check – more cheaply than learning – whether an interval is monotone before running the learner, could run the learner fewer times and save…

…this is a property testing problem!

More sophisticated algorithm: two new ingredients.

Page 24:

First ingredient: testing k-modal distributions for monotonicity

Consider the following property testing problem:

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

[figure: two distributions over [n] that are hard to distinguish]

Note: the k-modal promise on p might save us from the Ω(√N) lower bound above…

Page 25:

Efficiently testing k-modal distributions for monotonicity

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

Theorem: There is a sample-efficient tester for this problem.

We’ll use this to identify sub-intervals of [N] where p is monotone, or close to monotone.

…Can we efficiently learn close-to-monotone distributions?

Page 26:

Second ingredient: agnostically learning monotone distributions

Consider the following “agnostic learning” problem:

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) is at most roughly opt + ε.

If opt=0, this is the original “learn a monotone distribution” problem

Want to handle general case as efficiently as opt=0 case

Page 27:

agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) is at most roughly opt + ε.

Theorem: There is a computationally efficient learning algorithm for this problem that is as sample-efficient as the opt = 0 case.

Page 28:

agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) ≤ O(opt) + ε.

Theorem: There is a computationally efficient learning algorithm for this semi-agnostic problem that is as sample-efficient as the opt = 0 case.

The [Birgé87] monotone distribution learner does the job.

We will have opt = O(ε) when we use this, so the O(opt) + ε guarantee (versus opt + ε) doesn’t matter.

Page 29:

The learning algorithm: first phase

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ of roughly equal (small) probability mass, and obtain estimates p̂(I_j) of the interval masses.

2. Run the monotonicity testers (increasing and decreasing) on I_1, then I_1 ∪ I_2, etc., until the first time both say “no”, say at I_1 ∪ … ∪ I_j. Mark I_j and continue from I_{j+1}.

This makes at most one pair of tester invocations per interval in total.

(Alternative: use binary search to reduce the number of tester invocations.)

Page 30:

The algorithm

2. Run the testers on I_1, then I_1 ∪ I_2, etc., until the first time both say “no”.

Mark that interval and continue.

Each time an interval is marked,

• the block of unmarked intervals right before it is close-to-monotone; call this a superinterval

• (at least) one of the k peaks/valleys of p is “used up”

Page 31:

The learning algorithm: second phase

After this step, [N] is partitioned into: • at most k + 1 “superintervals”, each close to monotone; • at most k “marked” intervals, each of small weight.

Rest of algorithm:

3. Run the semi-agnostic monotone distribution learner on each superinterval to get an accurate hypothesis for p’s conditional distribution there.

4. Output the final hypothesis: the superinterval hypotheses, scaled by the estimated interval masses and combined.
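Putting the pieces together, here is a minimal end-to-end sketch of the tester-guided algorithm, with the same caveats as before: `test_monotone` and `learn_monotone_semiagnostic` are hypothetical stand-ins for the tester and semi-agnostic learner described on the surrounding slides, `mass_partition` is the earlier DKW helper, the interval granularity ε/(k+1) is a guessed parameter setting, and assigning zero mass to the marked intervals is a simplification (their true mass is small).

```python
# Sketch of the tester-guided k-modal learner (illustrative, not the paper's code).
# test_monotone(samples, lo, hi, increasing, eps) -> bool and
# learn_monotone_semiagnostic(samples, lo, hi, eps) -> dict are hypothetical stand-ins.
from collections import Counter

def learn_kmodal(samples, N, k, eps, test_monotone, learn_monotone_semiagnostic):
    intervals = mass_partition(samples, N, eps / (k + 1))   # granularity: a guessed setting
    counts, m = Counter(samples), len(samples)
    superintervals, marked, start = [], [], 0

    # Phase 1: grow a block of intervals until both testers say "no", then mark.
    for j in range(len(intervals)):
        lo, hi = intervals[start][0], intervals[j][1]
        sub = [x for x in samples if lo <= x <= hi]
        up = test_monotone(sub, lo, hi, True, eps)
        dn = test_monotone(sub, lo, hi, False, eps)
        if not up and not dn:
            if j > start:                                    # the block so far is close to monotone
                superintervals.append((intervals[start][0], intervals[j - 1][1]))
            marked.append(intervals[j])                      # this interval uses up a peak/valley
            start = j + 1
    if start < len(intervals):
        superintervals.append((intervals[start][0], intervals[-1][1]))

    # Phase 2: learn each superinterval semi-agnostically, rescale, and stitch.
    hypothesis = {}
    for (lo, hi) in superintervals:
        weight = sum(counts[x] for x in range(lo, hi + 1)) / m
        sub = [x for x in samples if lo <= x <= hi]
        h = learn_monotone_semiagnostic(sub, lo, hi, eps)
        for x in range(lo, hi + 1):
            hypothesis[x] = weight * h.get(x, 0.0)
    return hypothesis    # marked intervals get mass 0 here; their true mass is small
```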

Page 32:

Analysis of the algorithm

Sample complexity:

• runs of the tester: at most one pair per interval, each run using the tester’s sample bound

• runs of the semi-agnostic monotone learner: at most k + 1, each run using the learner’s sample bound.

Error rate:

• error from the (at most k) marked intervals

• total error from estimating the p(I_j)’s with the p̂(I_j)’s

• total error from the scaling factors

Page 33:

I owe you a tester

Theorem: There is a sample-efficient tester for this problem.

Algorithm gets samples from an unknown k-modal distribution p over [N].

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Page 34:

The testing algorithm

Algorithm: • Run [DKW] with suitable accuracy, and let p̂ be the resulting empirical PDF. • If there exist a ≤ b such that the average value of p̂ over [a, b] witnesses a violation of monotonicity, output “no”; otherwise output “yes”.

Completeness: if p is monotone increasing, the test passes w.h.p.

Page 35:

Soundness

Soundness lemma: If a k-modal distribution passes the test’s condition (no interval [a, b] witnesses a violation), then it is close to monotone increasing.

Algorithm: • Run [DKW] with suitable accuracy, and let p̂ be the resulting empirical PDF. • If there exist a ≤ b such that the average value of p̂ over [a, b] witnesses a violation of monotonicity, output “no”; otherwise output “yes”.

To prove the soundness lemma: show that under the lemma’s hypothesis, we can “correct” each peak/valley by “spending” only a small amount of variation distance per peak.

Page 36:

Correcting a peak of p

Lemma: If a k-modal distribution passes the test’s condition, then it is close to monotone increasing.

Consider a peak of p.

Draw a line at a height such that

(mass of the “hill” above the line) = (missing mass of the “valley” below the line).

Correct the peak by bulldozing the hill into the valley.

Page 37:

Why it works

[figure: a peak before and after the correction]

Lemma: If a k-modal distribution passes the test’s condition, then it is close to monotone increasing.

The mass moved by each correction is small (this is what the test’s condition on interval averages guarantees), so each correction costs only a small amount of variation distance, and after at most k corrections the distribution is monotone increasing.

Page 38:

Summary

Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N].

Upper bounds pretty close to lower bounds for these problems.

• Testing is easier than learning

• Learning algorithms have a testing component

Page 39:

Future work

More efficient algorithms for restricted classes of k-modal distributions?

• [DDS11]: any sum of n independent Bernoulli random variables is learnable using poly(1/ε) samples, independent of n

(a special type of unimodal distribution: the “Poisson Binomial Distribution”)

Page 40:

Thank you

Page 41:

Key ingredient: oblivious decomposition

Decompose [N] into intervals whose widths increase as powers of (1 + ε). Call these the oblivious buckets.

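A minimal sketch (illustrative, with an assumed rounding convention) of how the oblivious buckets can be generated:

```python
# Sketch of the oblivious decomposition: split {1, ..., N} into consecutive
# buckets whose widths grow roughly like powers of (1 + eps).
def oblivious_buckets(N, eps):
    buckets, lo, j = [], 1, 0
    while lo <= N:
        width = max(1, int((1 + eps) ** j))   # width of the j-th bucket
        hi = min(N, lo + width - 1)
        buckets.append((lo, hi))
        lo, j = hi + 1, j + 1
    return buckets

# Example: oblivious_buckets(1000, 0.5) yields buckets of widths 1, 1, 2, 3, 5, 7, ...
```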

Page 42:

Flattening a monotone distribution using the oblivious decomposition

Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p’s weight uniformly within each bucket of the oblivious decomposition.

Lemma [B87]: For any monotone decreasing distribution p, the flattened version p̄ is O(ε)-close to p in total variation distance.

[figure: true pdf vs. its flattened version]
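A small sketch of the flattening operation itself, reusing the `oblivious_buckets` helper above (the name `flatten` is illustrative):

```python
# Flattening: spread each oblivious bucket's mass uniformly over the bucket.
def flatten(p, N, eps):
    """p: dict mapping x in {1, ..., N} to p(x). Returns the flattened pdf."""
    p_bar = {}
    for (lo, hi) in oblivious_buckets(N, eps):
        mass = sum(p.get(x, 0.0) for x in range(lo, hi + 1))
        for x in range(lo, hi + 1):
            p_bar[x] = mass / (hi - lo + 1)
    return p_bar
```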

Page 43:

Learning monotone distributions using the oblivious decomposition [B87]

Reduce learning monotone distributions over [N] to accuracy ε to learning arbitrary distributions over a small set to accuracy ≈ ε:

• View p as an (essentially) arbitrary distribution over the O(log N / ε) oblivious buckets.

Algorithm: • Draw samples from p. • Output hypothesis: the flattened empirical distribution.

Analysis: the flattened empirical distribution is close to the flattened version of p (this is just learning a distribution over the O(log N / ε) buckets), which in turn is close to p (by the flattening lemma).
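A minimal sketch of this learner, again reusing `oblivious_buckets`; the hypothesis is simply the flattened empirical distribution (all names illustrative):

```python
# Sketch of the [B87]-style monotone learner.
from collections import Counter

def learn_monotone_birge(samples, N, eps):
    counts, m = Counter(samples), len(samples)
    h = {}
    for (lo, hi) in oblivious_buckets(N, eps):
        bucket_mass = sum(counts[x] for x in range(lo, hi + 1)) / m
        for x in range(lo, hi + 1):
            h[x] = bucket_mass / (hi - lo + 1)   # spread the bucket's empirical mass uniformly
    return h
```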

Page 44:

Testing monotone distributions using the oblivious decomposition

q: a known monotone distribution over [N]. p: an unknown monotone distribution over [N].

Can use the learning algorithm to get an O(log N / ε³)-sample algorithm for the testing problem.

But we can do better by using the oblivious decomposition directly:

q becomes q̄, a known distribution over the O(log N / ε) buckets; p becomes p̄, an unknown distribution over the buckets.

Testing equality of monotone distributions over [N] to accuracy ε reduces to testing equality of arbitrary distributions over the O(log N / ε) buckets to accuracy ≈ ε.

Using [BFFKRW02], this gives a testing algorithm whose sample complexity is roughly the square root of the number of buckets (times poly(1/ε)).

Can show a nearly matching lower bound for any tester.

Page 45:

[BKR04] implicitly gave an O(log²(n) · log log(n) / ε⁵)-sample algorithm for learning a monotone distribution.

Page 46: Learning and testing k-modal distributions Rocco A. Servedio Columbia University Joint work (in progress) with Ilias Diakonikolas UC Berkeley Costis Daskalakis.