Transcript of "Learning and testing k-modal distributions", Rocco A. Servedio, Columbia University. Joint work (in progress) with Ilias Diakonikolas (UC Berkeley) and Costis Daskalakis (MIT).

Page 1:

Learning and testing k-modal distributions

Rocco A. Servedio, Columbia University

Joint work (in progress) with

Ilias Diakonikolas, UC Berkeley

Costis Daskalakis, MIT

Page 2:

What this talk is about

Probability distributions over [N] = {1,2,…,N}

Monotone increasing distribution: p(i) ≤ p(i+1) for all 1 ≤ i < N

(Whole talk: “increasing” means “non-decreasing”)


Page 3:

k-modal distributions

k-modal: k peaks and valleys

A 3-modal distribution:

A unimodal distribution: Another one:

Monotone distribution: 0-modal

Page 4:

The learning problem

Target distribution p is an unknown k-modal distribution over [N]

Algorithm gets samples from p

Goal: output a hypothesis h that is ε-close to p in total variation distance.

Want an algorithm that uses few samples and is computationally efficient.

Page 5:

The testing problem

q is a known k-modal distribution over [N].

Goal: output “yes” w.h.p. if p = q,

“no” w.h.p. if p is ε-far from q in total variation distance.

p is an unknown k-modal distribution over [N].

Algorithm gets samples from p.

Page 6:

Please note

Testing problem is not: given samples from an unknown distribution p, determine if p is k-modal versus ε-far from every k-modal distribution.

This problem requires Ω(√N) samples, even for k = 0.

[figure] Hard to distinguish: uniform over a random half of [N] vs. uniform over all of [N].

Page 7:

Why study these questions?

• k-modal distributions seem natural

• would be nice if k-modal structure were exploitable by efficient learning / testing algorithms

• post hoc justification: solutions exhibit interesting connections between testing and learning

Page 8:

The general case: learning

If we drop the k-modal assumption, the learning problem becomes:

Learn an arbitrary distribution over [N] to total variation distance ε.

Θ(N/ε²) samples are necessary and sufficient.
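For contrast with the structured case, here is a minimal sketch (not from the talk; names are illustrative) of the generic learner behind this bound: output the empirical distribution, with a sample budget on the order of N/ε².

```python
# Minimal sketch (illustrative) of the generic learner over [N] = {1, ..., N}:
# output the empirical distribution; roughly N/eps^2 samples make it eps-close
# to p in total variation distance with high probability.
from collections import Counter

def learn_arbitrary(samples, N):
    """Return the empirical pmf over {1, ..., N}."""
    counts = Counter(samples)
    m = len(samples)
    return {x: counts[x] / m for x in range(1, N + 1)}

def generic_sample_budget(N, eps):
    """Sample budget for the generic learner, up to constant factors."""
    return int(N / eps ** 2)
```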

Page 9:

The general case: testing

If we drop the k-modal assumption, the testing problem becomes:

q is a known, arbitrary distribution over [N].

p is an unknown, arbitrary distribution over [N]. Algorithm gets samples from p.

Goal: output “yes” if p = q, “no” if p is ε-far from q.

Roughly √N · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08].

Page 10:

This work: main learning result

We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. Its sample complexity and running time are both small (close to the lower bound below).

Close to optimal: an Ω(k log(N/k)/ε³)-sample lower bound holds for any algorithm.

Page 11:

Main testing result

We give a sample- and time-efficient algorithm that solves the k-modal testing problem over [N] to accuracy ε.

Any testing algorithm must use a nearly matching number of samples.

Testing is easier than learning!

Page 12:

Prior work

k = 0, 1: [BKR04] gave a poly(log N, 1/ε)-sample efficient algorithm for the testing problem (p, q both available via sample access).

k = 0, 1: [Birgé87, Birgé87a] gave an O(log N / ε³)-sample efficient algorithm for learning, and a matching lower bound.

We’ll use this algorithm as a black box in our results.

Page 13:

Outline of rest of talk

• Background: some tools

• Learning k-modal distributions

• Testing k-modal distributions

Page 14:

First tool: Learning monotone distributions

Theorem [B87]: There is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log N / ε³) samples and runs in time linear in its input size.

[B87b] also gave a matching Ω(log N / ε³) lower bound for learning a monotone distribution.

Page 15:

Second tool: Learning a CDF – the Dvoretzky–Kiefer–Wolfowitz inequality

Theorem [DKW56]: Let p be any distribution over [N] with CDF F.

Let F̂ be the empirical estimate of F obtained from m samples.

Then |F̂(x) − F(x)| ≤ ε for every x, with probability ≥ 1 − δ, provided m = O(log(1/δ)/ε²).

Morally, this means you can partition [N] into about 1/ε intervals, each of mass about ε under p, using Õ(1/ε²) samples.

Note: Õ(1/ε²) samples suffice (by an easy Chernoff bound argument).

[figure: true CDF vs. empirical CDF]
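To make the "morally" remark concrete, here is a minimal Python sketch, not from the paper, of the two ways the inequality gets used above: choosing a sample size so the empirical CDF is uniformly ε-accurate, and cutting [N] into intervals of empirical mass roughly ε. All function names are illustrative.

```python
import numpy as np

def empirical_cdf(samples, N):
    """Empirical CDF over {1, ..., N}: entry i-1 is the fraction of samples <= i."""
    counts = np.bincount(samples, minlength=N + 1)[1:N + 1]
    return np.cumsum(counts) / len(samples)

def dkw_sample_size(eps, delta):
    """Samples needed so that sup_x |F_hat(x) - F(x)| <= eps w.p. >= 1 - delta."""
    return int(np.ceil(np.log(2.0 / delta) / (2.0 * eps ** 2)))

def mass_partition(samples, N, eps):
    """Cut [N] into consecutive intervals of empirical mass roughly eps each.
    (A single point of mass much larger than eps gets its own interval; the talk
    notes such heavy points are easy to detect and handle separately.)"""
    F = empirical_cdf(samples, N)
    breakpoints, target = [0], eps
    for x in range(1, N + 1):
        if F[x - 1] >= target:
            breakpoints.append(x)
            target = F[x - 1] + eps
    if breakpoints[-1] != N:
        breakpoints.append(N)
    # interval j is {breakpoints[j] + 1, ..., breakpoints[j + 1]}
    return list(zip([b + 1 for b in breakpoints[:-1]], breakpoints[1:]))
```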

Page 16:

Learning k-modal distributions

Page 17:

The problem

Learn an unknown k-modal distribution over [N].


Page 18:

What should we shoot for?

Easy lower bound: need Ω(k log(N/k)/ε³) samples.

(Have to solve k monotone-distribution-learning problems, each over a domain of size about N/k, to accuracy ε.)

Want an algorithm that uses roughly this many samples and is computationally efficient.

Page 19:

The problem, again

Goal: learn an unknown k-modal distribution over [N].

We know how to efficiently learn an unknown monotone distribution…

Would be easy if we knew the k peaks/valleys…

Guessing them exactly: infeasible

Guessing them approximately: not too great either


Page 20:

A first approach

Break up [N] into many intervals:

p is not monotone on at most k of the intervals

So running monotone distribution learner on each interval will usually give a good answer.

Page 21:

First approach in more detail

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ of roughly equal (small) probability mass, and obtain estimates p̂(I_j) of the interval masses p(I_j).

(Assumes each point has small mass; heavier points are easy to detect and deal with.)

2. Run the monotone distribution learner on each interval I_j to get a hypothesis h_j for the conditional distribution of p on I_j.

(Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one as h_j.)

3. Combine the hypotheses in the obvious way: h = Σ_j p̂(I_j) · h_j.
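A minimal sketch of this first approach, assuming a black-box monotone learner in the spirit of [B87] (the `learn_monotone` signature below is hypothetical) and reusing the `mass_partition` helper sketched earlier; the hypothesis-selection step is simplified to picking whichever orientation fits the interval's empirical data better.

```python
# Sketch of the "first approach" (illustrative, not the paper's code).
# Assumes a black-box monotone learner with hypothetical signature
#   learn_monotone(samples, lo, hi, increasing) -> dict mapping each x in [lo, hi]
#   to an estimated conditional probability (summing to ~1 over the interval).
from collections import Counter

def learn_kmodal_first_approach(samples, N, eps, learn_monotone):
    intervals = mass_partition(samples, N, eps)              # step 1: DKW-based partition
    counts, m = Counter(samples), len(samples)
    hypothesis = {}
    for (lo, hi) in intervals:
        weight = sum(counts[x] for x in range(lo, hi + 1)) / m   # estimated mass of the interval
        sub = [x for x in samples if lo <= x <= hi]
        # step 2: run the monotone learner twice, once per orientation ...
        h_up = learn_monotone(sub, lo, hi, True)
        h_dn = learn_monotone(sub, lo, hi, False)
        # ... and keep whichever fits the interval's empirical data better
        # (a crude stand-in for the hypothesis-testing step on the slide).
        def fit(h):
            emp = lambda x: counts[x] / max(len(sub), 1)
            return sum(min(h.get(x, 0.0), emp(x)) for x in range(lo, hi + 1))
        best = h_up if fit(h_up) >= fit(h_dn) else h_dn
        # step 3: combine, scaled by the interval's estimated mass
        for x in range(lo, hi + 1):
            hypothesis[x] = weight * best.get(x, 0.0)
    return hypothesis
```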

Page 22:

Sketch of analysis

1. Use [DKW] to divide [N] into intervals and obtain mass estimates p̂(I_j): takes relatively few samples.

2. Run the monotone distribution learner on each interval to get h_j: this dominates the sample cost, since the learner is run once per interval.

3. Combine the hypotheses in the obvious way.

Total error ≤ (error from the ≤ k non-monotone intervals) + (error from the scaling factors) + (error from estimating the p(I_j)’s with the p̂(I_j)’s).

Page 23:

Improving the approach

The extra cost came from running the monotone distribution learner once per interval rather than only about k times (once per monotone piece of p).

If we could somehow check – more cheaply than learning – whether an interval is monotone before running the learner, could run the learner fewer times and save…

…this is a property testing problem!

More sophisticated algorithm: two new ingredients.

Page 24:

First ingredient: testing k-modal distributions for monotonicity

Consider the following property testing problem:

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

[figure: two distributions over [n] that are hard to distinguish]

Note: the k-modal promise on p might save us from the Ω(√N) lower bound above…

Page 25:

Efficiently testing k-modal distributions for monotonicity

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Algorithm gets samples from unknown k-modal distribution p over [N].

Theorem: There is a sample-efficient tester for this problem.

We’ll use this to identify sub-intervals of [N] where p is monotone, or close to monotone.

…Can we efficiently learn close-to-monotone distributions?

Page 26:

Second ingredient: agnostically learning monotone distributions

Consider the following “agnostic learning” problem:

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) is at most roughly opt + ε.

If opt=0, this is the original “learn a monotone distribution” problem

Want to handle general case as efficiently as opt=0 case

Page 27:

agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) is at most roughly opt + ε.

Theorem: There is a computationally efficient learning algorithm for this problem that is as sample-efficient as the opt = 0 case.

Page 28:

agnostically learning monotone distributions

Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone.

Goal: output a hypothesis distribution h such that dTV(h, p) ≤ O(opt) + ε.

Theorem: There is a computationally efficient learning algorithm for this semi-agnostic problem that is as sample-efficient as the opt = 0 case.

The [Birgé87] monotone distribution learner does the job.

We will have opt = O(ε) when we use this, so the O(opt) + ε guarantee (versus opt + ε) doesn’t matter.

Page 29:

The learning algorithm: first phase

1. Use [DKW] to divide [N] into intervals I_1, …, I_ℓ of roughly equal (small) probability mass, and obtain estimates p̂(I_j) of the interval masses.

2. Run the monotonicity testers (increasing and decreasing) on I_1, then I_1 ∪ I_2, etc., until the first time both say “no”, say at I_1 ∪ … ∪ I_j. Mark I_j and continue from I_{j+1}.

This makes at most one pair of tester invocations per interval in total.

(Alternative: use binary search to reduce the number of tester invocations.)

Page 30:

The algorithm

2. Run the testers on I_1, then I_1 ∪ I_2, etc., until the first time both say “no”.

Mark that interval and continue.

Each time an interval is marked,

• the block of unmarked intervals right before it is close-to-monotone; call this a superinterval

• (at least) one of the k peaks/valleys of p is “used up”

Page 31:

The learning algorithm: second phase

After this step, [N] is partitioned into: • at most k + 1 “superintervals”, each close to monotone; • at most k “marked” intervals, each of small weight.

Rest of algorithm:

3. Run the semi-agnostic monotone distribution learner on each superinterval to get an accurate hypothesis for p’s conditional distribution there.

4. Output the final hypothesis: the superinterval hypotheses, scaled by the estimated interval masses and combined.
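Putting the pieces together, here is a minimal end-to-end sketch of the tester-guided algorithm, with the same caveats as before: `test_monotone` and `learn_monotone_semiagnostic` are hypothetical stand-ins for the tester and semi-agnostic learner described on the surrounding slides, `mass_partition` is the earlier DKW helper, the interval granularity ε/(k+1) is a guessed parameter setting, and assigning zero mass to the marked intervals is a simplification (their true mass is small).

```python
# Sketch of the tester-guided k-modal learner (illustrative, not the paper's code).
# test_monotone(samples, lo, hi, increasing, eps) -> bool and
# learn_monotone_semiagnostic(samples, lo, hi, eps) -> dict are hypothetical stand-ins.
from collections import Counter

def learn_kmodal(samples, N, k, eps, test_monotone, learn_monotone_semiagnostic):
    intervals = mass_partition(samples, N, eps / (k + 1))   # granularity: a guessed setting
    counts, m = Counter(samples), len(samples)
    superintervals, marked, start = [], [], 0

    # Phase 1: grow a block of intervals until both testers say "no", then mark.
    for j in range(len(intervals)):
        lo, hi = intervals[start][0], intervals[j][1]
        sub = [x for x in samples if lo <= x <= hi]
        up = test_monotone(sub, lo, hi, True, eps)
        dn = test_monotone(sub, lo, hi, False, eps)
        if not up and not dn:
            if j > start:                                    # the block so far is close to monotone
                superintervals.append((intervals[start][0], intervals[j - 1][1]))
            marked.append(intervals[j])                      # this interval uses up a peak/valley
            start = j + 1
    if start < len(intervals):
        superintervals.append((intervals[start][0], intervals[-1][1]))

    # Phase 2: learn each superinterval semi-agnostically, rescale, and stitch.
    hypothesis = {}
    for (lo, hi) in superintervals:
        weight = sum(counts[x] for x in range(lo, hi + 1)) / m
        sub = [x for x in samples if lo <= x <= hi]
        h = learn_monotone_semiagnostic(sub, lo, hi, eps)
        for x in range(lo, hi + 1):
            hypothesis[x] = weight * h.get(x, 0.0)
    return hypothesis    # marked intervals get mass 0 here; their true mass is small
```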

Page 32:

Analysis of the algorithm

Sample complexity:

• runs of the tester: at most one pair per interval, each run using the tester’s sample bound

• runs of the semi-agnostic monotone learner: at most k + 1, each run using the learner’s sample bound.

Error rate:

• error from the (at most k) marked intervals

• total error from estimating the p(I_j)’s with the p̂(I_j)’s

• total error from the scaling factors

Page 33:

I owe you a tester

Theorem: There is a sample-efficient tester for this problem.

Algorithm gets samples from an unknown k-modal distribution p over [N].

Goal: output “yes” w.h.p. if p is monotone increasing,

“no” w.h.p. if p is ε-far from monotone increasing.

Page 34:

The testing algorithm

Algorithm: • Run [DKW] with suitable accuracy, and let p̂ be the resulting empirical PDF. • If there exist a ≤ b such that the average value of p̂ over [a, b] witnesses a violation of monotonicity, output “no”; otherwise output “yes”.

Completeness: if p is monotone increasing, the test passes w.h.p.

Page 35:

Soundness

Soundness lemma: If a k-modal distribution passes the test’s condition (no interval [a, b] witnesses a violation), then it is close to monotone increasing.

Algorithm: • Run [DKW] with suitable accuracy, and let p̂ be the resulting empirical PDF. • If there exist a ≤ b such that the average value of p̂ over [a, b] witnesses a violation of monotonicity, output “no”; otherwise output “yes”.

To prove the soundness lemma: show that under the lemma’s hypothesis, we can “correct” each peak/valley by “spending” only a small amount of variation distance per peak.

Page 36:

Correcting a peak of p

Lemma: If a k-modal distribution passes the test’s condition, then it is close to monotone increasing.

Consider a peak of p.

Draw a line at a height such that

(mass of the “hill” above the line) = (missing mass of the “valley” below the line).

Correct the peak by bulldozing the hill into the valley.

Page 37:

Why it works

[figure: a peak before and after the correction]

Lemma: If a k-modal distribution passes the test’s condition, then it is close to monotone increasing.

The mass moved by each correction is small (this is what the test’s condition on interval averages guarantees), so each correction costs only a small amount of variation distance, and after at most k corrections the distribution is monotone increasing.

Page 38:

Summary

Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N].

Upper bounds pretty close to lower bounds for these problems.

• Testing is easier than learning

• Learning algorithms have a testing component

Page 39:

Future work

More efficient algorithms for restricted classes of k-modal distributions?

• [DDS11]: any sum of n independent Bernoulli random variables is learnable using poly(1/ε) samples, independent of n

(a special type of unimodal distribution: the “Poisson Binomial Distribution”)

Page 40:

Thank you

Page 41:

Key ingredient: oblivious decomposition

Decompose [N] into intervals whose widths increase as powers of (1 + ε). Call these the oblivious buckets.

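A minimal sketch (illustrative, with an assumed rounding convention) of how the oblivious buckets can be generated:

```python
# Sketch of the oblivious decomposition: split {1, ..., N} into consecutive
# buckets whose widths grow roughly like powers of (1 + eps).
def oblivious_buckets(N, eps):
    buckets, lo, j = [], 1, 0
    while lo <= N:
        width = max(1, int((1 + eps) ** j))   # width of the j-th bucket
        hi = min(N, lo + width - 1)
        buckets.append((lo, hi))
        lo, j = hi + 1, j + 1
    return buckets

# Example: oblivious_buckets(1000, 0.5) yields buckets of widths 1, 1, 2, 3, 5, 7, ...
```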

Page 42:

Flattening a monotone distribution using the oblivious decomposition

Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p’s weight uniformly within each bucket of the oblivious decomposition.

Lemma [B87]: For any monotone decreasing distribution p, the flattened version p̄ is O(ε)-close to p in total variation distance.

[figure: true pdf vs. its flattened version]
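A small sketch of the flattening operation itself, reusing the `oblivious_buckets` helper above (the name `flatten` is illustrative):

```python
# Flattening: spread each oblivious bucket's mass uniformly over the bucket.
def flatten(p, N, eps):
    """p: dict mapping x in {1, ..., N} to p(x). Returns the flattened pdf."""
    p_bar = {}
    for (lo, hi) in oblivious_buckets(N, eps):
        mass = sum(p.get(x, 0.0) for x in range(lo, hi + 1))
        for x in range(lo, hi + 1):
            p_bar[x] = mass / (hi - lo + 1)
    return p_bar
```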

Page 43:

Learning monotone distributions using the oblivious decomposition [B87]

Reduce learning monotone distributions over [N] to accuracy ε to learning arbitrary distributions over a small set to accuracy ≈ ε:

• View p as an (essentially) arbitrary distribution over the O(log N / ε) oblivious buckets.

Algorithm: • Draw samples from p. • Output hypothesis: the flattened empirical distribution.

Analysis: the flattened empirical distribution is close to the flattened version of p (this is just learning a distribution over the O(log N / ε) buckets), which in turn is close to p (by the flattening lemma).
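A minimal sketch of this learner, again reusing `oblivious_buckets`; the hypothesis is simply the flattened empirical distribution (all names illustrative):

```python
# Sketch of the [B87]-style monotone learner.
from collections import Counter

def learn_monotone_birge(samples, N, eps):
    counts, m = Counter(samples), len(samples)
    h = {}
    for (lo, hi) in oblivious_buckets(N, eps):
        bucket_mass = sum(counts[x] for x in range(lo, hi + 1)) / m
        for x in range(lo, hi + 1):
            h[x] = bucket_mass / (hi - lo + 1)   # spread the bucket's empirical mass uniformly
    return h
```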

Page 44:

Testing monotone distributions using the oblivious decomposition

q: a known monotone distribution over [N]. p: an unknown monotone distribution over [N].

Can use the learning algorithm to get an O(log N / ε³)-sample algorithm for the testing problem.

But we can do better by using the oblivious decomposition directly:

q becomes q̄, a known distribution over the O(log N / ε) buckets; p becomes p̄, an unknown distribution over the buckets.

Testing equality of monotone distributions over [N] to accuracy ε reduces to testing equality of arbitrary distributions over the O(log N / ε) buckets to accuracy ≈ ε.

Using [BFFKRW02], this gives a testing algorithm whose sample complexity is roughly the square root of the number of buckets (times poly(1/ε)).

Can show a nearly matching lower bound for any tester.

Page 45:

[BKR04] implicitly gave an O(log²(n) · log log(n) / ε⁵)-sample algorithm for learning a monotone distribution.

Page 46: Learning and testing k-modal distributions Rocco A. Servedio Columbia University Joint work (in progress) with Ilias Diakonikolas UC Berkeley Costis Daskalakis.