Distribution-Independent PAC Learning of …Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with...
Transcript of Distribution-Independent PAC Learning of …Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with...
Distribution-Independent PAC Learning of Halfspaces with Massart Noise
Themis Gouleakis (MPI)
Ilias Diakonikolas (UW Madison)
Christos Tzamos (UW Madison)
EXPLAINING THE TITLE
Main Result: First computationally efficient algorithm for learning halfspaces in
the distribution-independent PAC model with Massart noise.
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 1
HALFSPACES
Class of functions such that
where
• Also known as: Linear Threshold Functions, Perceptrons, Linear Separators, Threshold Gates, Weighted Voting Games, …
• Extensively studied in ML since [Rosenblatt’58]
+
+
++
+ +
-- -
--
-
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 2
(DISTRIBUTION-INDEPENDENT) PAC LEARNING
: known class of functions
• Input: multiset of IID labeled examples from distribution such that:, where is fixed but arbitrary, and
for some fixed unknown target concept
• Goal: find hypothesis minimizing
samples
Unknown
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 3
f : Rd ! {±1}
(DISTRIBUTION-INDEPENDENT) PAC LEARNING WITH MASSART NOISE
: known class of functions
• Input: multiset of IID labeled examples from distribution such that:, where is fixed but arbitrary, and
where
for some fixed unknown target concept
• Goal: find hypothesis minimizing
Unknown samples
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 4
f : Rd ! {±1}
PAC LEARNING WITH OTHER NOISE
Massart Noise “in between” Random Classification Noise and Agnostic Model:• Random Classification Noise (RCN) [Angluin-Laird’88]:
– Special case of Massart noise: For all , we have that• Agnostic Model [Haussler’92, Kearns-Shapire-Sellie’94]:
– Adversary can flip arbitrary fraction of the labels:
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 5
RCNNoise Rate exactly η
MassartNoise Rate at most η
AgnosticArbitrary η fraction
LEARNING HALFSPACES WITH NOISE: PRIOR WORK
Sample Complexity Well-Understood for Learning Halfspaces in all these models.
Fact: samples suffice to achieve misclassification error
Computational Complexity• Halfspaces efficiently learnable in realizable PAC model
– [e.g., Maass-Turan’94].
• Polynomial-time algorithm for learning halfspaces with RCN
– [Blum-Frieze-Kannan-Vempala’96]
• Learning Halfspaces with Massart Noise
• Weak agnostic learning of LTFs is computationally intractable
– [Guruswami-Raghevendra’06, Feldman et al.’06, Daniely’16]
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 6
LEARNING HALFSPACES WITH MASSART NOISE: OPEN
Malicious misclassification noise [Sloan’88, Rivest-Sloan’94] (equivalent to Massart).
Open Problem [Sloan’88, Cohen’97, Blum’03]Is there a polynomial-time algorithm with non-trivial error for halfspaces?
(Or even for more restricted concept classes?)
[A. Blum, FOCS’03 Tutorial]: “Given labeled examples from an unknown Boolean disjunction, corrupted with 1% Massart
noise, can we efficiently find a hypothesis that achieves misclassification error 49%?”
No progress in distribution-free setting.
Efficient algorithms when marginal is uniform on unit sphere (line of work started by [Awasthi-Balcan-Haghtalab-Urner’15])
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 7
Main Theorem: There is an efficient algorithm that learns halfspaces on in the distribution-independent PAC model with Massart noise. Specifically, the algorithm outputs a hypothesis h with misclassification error
where is the upper bound on the Massart noise rate, and runs in time
MAIN ALGORITHMIC RESULT
Remarks:• Hypothesis is a decision-list of halfspaces.• Misclassification error is , as opposed to• First non-trivial guarantee in sub-exponential time.
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 8
First efficient algorithm for learning halfspaces with Massart noise.
INTUITION: LARGE MARGIN CASE
• Realizable Case:(Perceptron =) SGD on
• Random Classification Noise:SGD on
for
In both cases:
++
+++
---
--
--Target vector with
Marginal satisfies
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 9
LARGE MARGIN CASE: MASSART NOISE
Lemma 1: No convex surrogate works.
But…
Lemma 2: Let be the minimizer of
for There exists such that has: • and
•
++
--+
--+
+-
--+
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 10
z
SUMMARY OF APPROACH
Large-Margin Case:
• There exists convex surrogate with non-trivial error on unknown subset S.• Can algorithmically identify S using samples.• Use convex surrogate hypothesis on S. • Iterate on complement.
General Case: Reduce to Large Margin Case
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 11
Lemma 2: Let minimizer of for There exists such that has: • and• +
-
CONCLUSIONS AND OPEN PROBLEMS
• First efficient algorithm with non-trivial error guarantees for distribution-independent PAC learning of halfspaces with Massart noise.
• Misclassification error where is an upper bound on the noise rate.
Open Questions:
• Error ?• Other models of robustness?
Thank you! Questions?
Diakonikolas, Gouleakis, Tzamos Learning Halfspaces with Massart Noise 12
Poster #226: 5-7 PM TodayEast Exhibition Hall B + C