Distribution-Independent PAC Learning of …Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with...

Distribution-Independent PAC Learning of Halfspaces with Massart Noise

Themis Gouleakis (MPI)

Ilias Diakonikolas (UW Madison)

Christos Tzamos (UW Madison)

EXPLAINING THE TITLE

Main Result: First computationally efficient algorithm for learning halfspaces in

the distribution-independent PAC model with Massart noise.

Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 1

HALFSPACES

Class of functions such that

where

• Also known as: Linear Threshold Functions, Perceptrons, Linear Separators, Threshold Gates, Weighted Voting Games, …

• Extensively studied in ML since [Rosenblatt’58]

+

+

++

+ +

-- -

--

-


(DISTRIBUTION-INDEPENDENT) PAC LEARNING

: known class of functions

• Input: multiset of IID labeled examples from distribution such that:, where is fixed but arbitrary, and

for some fixed unknown target concept

• Goal: find hypothesis minimizing

samples

Unknown


f : Rd ! {±1}

(DISTRIBUTION-INDEPENDENT) PAC LEARNING WITH MASSART NOISE

: known class of functions

• Input: multiset of IID labeled examples from distribution such that:, where is fixed but arbitrary, and

where

for some fixed unknown target concept

• Goal: find hypothesis minimizing

Unknown samples


f : Rd ! {±1}

PAC LEARNING WITH OTHER NOISE

Massart Noise “in between” Random Classification Noise and Agnostic Model:• Random Classification Noise (RCN) [Angluin-Laird’88]:

– Special case of Massart noise: For all , we have that• Agnostic Model [Haussler’92, Kearns-Shapire-Sellie’94]:

– Adversary can flip arbitrary fraction of the labels:


RCNNoise Rate exactly η

MassartNoise Rate at most η

AgnosticArbitrary η fraction

LEARNING HALFSPACES WITH NOISE: PRIOR WORK

Sample Complexity Well-Understood for Learning Halfspaces in all these models.

Fact: samples suffice to achieve misclassification error

Computational Complexity• Halfspaces efficiently learnable in realizable PAC model

– [e.g., Maass-Turan’94].

• Polynomial-time algorithm for learning halfspaces with RCN

– [Blum-Frieze-Kannan-Vempala’96]

• Learning Halfspaces with Massart Noise

• Weak agnostic learning of LTFs is computationally intractable

– [Guruswami-Raghevendra’06, Feldman et al.’06, Daniely’16]


LEARNING HALFSPACES WITH MASSART NOISE: OPEN

Malicious misclassification noise [Sloan’88, Rivest-Sloan’94] (equivalent to Massart).

Open Problem [Sloan’88, Cohen’97, Blum’03]Is there a polynomial-time algorithm with non-trivial error for halfspaces?

(Or even for more restricted concept classes?)

[A. Blum, FOCS’03 Tutorial]: “Given labeled examples from an unknown Boolean disjunction, corrupted with 1% Massart

noise, can we efficiently find a hypothesis that achieves misclassification error 49%?”

No progress in distribution-free setting.

Efficient algorithms when marginal is uniform on unit sphere (line of work started by [Awasthi-Balcan-Haghtalab-Urner’15])


Main Theorem: There is an efficient algorithm that learns halfspaces on in the distribution-independent PAC model with Massart noise. Specifically, the algorithm outputs a hypothesis h with misclassification error

where is the upper bound on the Massart noise rate, and runs in time

MAIN ALGORITHMIC RESULT

Remarks:• Hypothesis is a decision-list of halfspaces.• Misclassification error is , as opposed to• First non-trivial guarantee in sub-exponential time.


First efficient algorithm for learning halfspaces with Massart noise.

INTUITION: LARGE MARGIN CASE

• Realizable Case:(Perceptron =) SGD on

• Random Classification Noise:SGD on

for

In both cases:

++

+++

---

--

--Target vector with

Marginal satisfies


LARGE MARGIN CASE: MASSART NOISE

Lemma 1: No convex surrogate works.

But…

Lemma 2: Let be the minimizer of

for There exists such that has: • and

•

++

--+

--+

+-

--+


z

SUMMARY OF APPROACH

Large-Margin Case:

• There exists convex surrogate with non-trivial error on unknown subset S.• Can algorithmically identify S using samples.• Use convex surrogate hypothesis on S. • Iterate on complement.

General Case: Reduce to Large Margin Case


Lemma 2: Let minimizer of for There exists such that has: • and• +

-

CONCLUSIONS AND OPEN PROBLEMS

• First efficient algorithm with non-trivial error guarantees for distribution-independent PAC learning of halfspaces with Massart noise.

• Misclassification error where is an upper bound on the noise rate.

Open Questions:

• Error ?• Other models of robustness?

Thank you! Questions?


Poster #226: 5-7 PM TodayEast Exhibition Hall B + C

Distribution-Independent PAC Learning of …Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with...

Documents

Transcript of Distribution-Independent PAC Learning of …Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with...