Near-Minimax Optimal Learning with Decision Trees
University of Wisconsin-Madison and Rice University
Rob Nowak and Clay Scott
Supported by the NSF and the ONR
nowak@engr.wisc.edu
Basic Problem
Classification: build a decision rule based on labeled training data
Given n training points, how well can we do?
Smooth Decision Boundaries
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function
Mammen & Tsybakov ’99
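For reference, the minimax rate for this boundary class (a reconstruction of the cited result, not the slide's own formula; $R^*$ denotes the Bayes error):

% No classifier learned from n samples can beat the rate n^{-1/d}
% uniformly over Lipschitz decision boundaries in [0,1]^d:
\[
  \inf_{\hat f_n} \; \sup \; \mathbb{E}\bigl[R(\hat f_n)\bigr] - R^* \;\asymp\; n^{-1/d}
\]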
Dyadic Thinking about Classification Trees
Recursive dyadic partition
Pruned dyadic partition
Pruned dyadic tree
Hierarchical structure facilitates optimization
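To make "hierarchical structure facilitates optimization" concrete, here is a minimal Python sketch (not the authors' code; the function name, fixed per-leaf penalty, and toy data are illustrative) of the bottom-up dynamic program over a recursive dyadic partition: each cell either becomes a leaf or keeps its dyadic split, whichever has smaller penalized empirical risk.

import numpy as np

def prune_dyadic_tree(X, y, max_depth=6, penalty=0.05):
    """Return the cost of the best pruned dyadic tree and its leaf cells.

    Recursively splits [0,1]^d in half, cycling through coordinates
    (a recursive dyadic partition), then prunes bottom-up: a split is
    kept only if its children's total cost beats the cost of labeling
    the cell by majority vote.  `penalty` is an illustrative per-leaf
    complexity charge, standing in for the error-bound penalty term.
    """
    n, d = X.shape

    def best(cell_idx, lo, hi, depth, dim):
        # Cost of making this cell a leaf: misclassified points + penalty.
        n1 = int(y[cell_idx].sum())
        leaf_err = min(n1, len(cell_idx) - n1) / n
        leaf = (leaf_err + penalty, [(lo.copy(), hi.copy())])
        if depth == max_depth or len(cell_idx) == 0:
            return leaf
        # Dyadic split: cut the current dimension exactly in half.
        mid = 0.5 * (lo[dim] + hi[dim])
        left_mask = X[cell_idx, dim] < mid
        lo_r, hi_l = lo.copy(), hi.copy()
        hi_l[dim], lo_r[dim] = mid, mid
        cl, leaves_l = best(cell_idx[left_mask], lo, hi_l, depth + 1, (dim + 1) % d)
        cr, leaves_r = best(cell_idx[~left_mask], lo_r, hi, depth + 1, (dim + 1) % d)
        split = (cl + cr, leaves_l + leaves_r)
        # Dynamic program: keep the split only if it lowers total cost.
        return min(leaf, split, key=lambda t: t[0])

    lo, hi = np.zeros(d), np.ones(d)
    return best(np.arange(n), lo, hi, 0, 0)

# Toy usage: labels determined by a smooth boundary in [0,1]^2.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = (X[:, 1] > 0.5 + 0.2 * np.sin(4 * X[:, 0])).astype(int)
cost, leaves = prune_dyadic_tree(X, y)
print(f"pruned tree: {len(leaves)} leaves, penalized empirical risk {cost:.3f}")

Because children partition their parent cell, each subproblem is solved exactly once, and the returned tree is the exact optimum for the given penalty.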
The Classification Problem
Problem: given i.i.d. training data $(X_1, Y_1), \dots, (X_n, Y_n)$ with $X_i \in [0,1]^d$ and labels $Y_i \in \{0,1\}$, find a classifier with small probability of error.
Classifiers: maps $f : [0,1]^d \to \{0,1\}$, with risk $R(f) = \mathbb{P}(f(X) \ne Y)$.
The Bayes Classifier: $f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}$, where $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$; its risk $R^* = R(f^*)$ is the Bayes error, the best possible.
Minimum Empirical Risk Classifier: $\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)$, where $\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{f(X_i) \ne Y_i\}$.
Generalization Error Bounds
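A standard Occam-style bound of the kind applied to trees below (reconstructed; $c(f)$ is the length of a prefix code for classifier $f$, so $\sum_f 2^{-c(f)} \le 1$):

% Hoeffding's inequality plus a union bound weighted by 2^{-c(f)}:
% with probability at least 1 - delta, simultaneously for all f,
\[
  R(f) \;\le\; \hat{R}_n(f) \;+\; \sqrt{\frac{c(f)\log 2 + \log(1/\delta)}{2n}}
\]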
Selecting a good h
Convergence to Bayes Error
Ex. Dyadic Classification Trees
[Figure panels: labeled training data; Bayes decision boundary; complete RDP; pruned RDP; the resulting dyadic classification tree]
Codes for DCTs
[Figure: encoding a pruned dyadic tree, one code bit per node; ex. code: 0001001111 + 6 bits for leaf labels]
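A plausible reconstruction of the code-length formula, assuming one structure bit per node and one class-label bit per leaf:

% A binary tree with |T| leaves has 2|T| - 1 nodes, so
\[
  c(T) \;\approx\; (2|T| - 1) + |T| \;\approx\; 3|T| \ \text{bits}
\]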
Error Bounds for DCTs
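Plugging the roughly $3|T|$-bit prefix code into the Occam bound gives a plausible reconstruction of the DCT error bound:

% With probability at least 1 - delta, uniformly over pruned dyadic trees T:
\[
  R(T) \;\le\; \hat{R}_n(T) \;+\; \sqrt{\frac{3|T|\log 2 + \log(1/\delta)}{2n}}
\]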
Compare with CART: dyadic split positions are fixed in advance, so the penalty scales like $\sqrt{|T|/n}$; comparable bounds for trees with data-dependent split locations pick up an extra $\log n$ factor.
Rate of Convergence
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function
Mammen & Tsybakov ’99; C. Scott & RN ’02
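Reconstructed from the published results, the comparison is roughly:

% Minimax rate for Lipschitz boundaries (Mammen & Tsybakov '99) versus
% the rate DCTs achieve with the global bound (Scott & Nowak '02):
\[
  \text{minimax: } n^{-1/d}
  \qquad
  \text{DCT, global bound: } \Bigl(\tfrac{\log n}{n}\Bigr)^{1/(d+1)}
\]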
Why too slow?
Because the Bayes decision boundary is a (d-1)-dimensional manifold, “good” trees are unbalanced: their leaves concentrate near the boundary.
But the global bound favors all |T|-leaf trees equally, balanced or not.
Local Error Bounds in Classification
Spatial Error Decomposition: Mansour & McAllester ’00
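A sketch of the decomposition (reconstructed notation): because the leaves of $T$ partition the input space, the deviation between true and empirical risk splits cell by cell:

% R(T, A) and R_n(T, A) restrict the error to the cell A:
\[
  R(T) - \hat{R}_n(T) \;=\; \sum_{A \in T} \Bigl( R(T,A) - \hat{R}_n(T,A) \Bigr),
  \qquad
  R(T,A) = \mathbb{P}\bigl(T(X) \ne Y,\; X \in A\bigr)
\]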
Relative Chernoff Bound
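A minimal statement of the bound and the local deviation it yields (reconstructed):

% For p-hat the empirical mean of n i.i.d. Bernoulli(p) variables:
\[
  \mathbb{P}\bigl(\hat p \le (1-\epsilon)p\bigr) \;\le\; e^{-n p \epsilon^2 / 2}
\]
% Inverting at confidence delta: the deviation scales with sqrt(p),
% so cells with small probability mass contribute small local errors,
\[
  p \;\le\; \hat p + \sqrt{\frac{2\,p\,\log(1/\delta)}{n}}
\]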
Bounded densities: if the density of $X$ is bounded, then $\mathbb{P}(X \in A) \le c \cdot \mathrm{vol}(A)$ for every cell $A$, so deep (small) cells carry little probability mass.
Global vs. Local
Key: local complexity is offset by small volumes!
Local Bounds for DCTs
Unbalanced tree: J leaves, depth J-1
Global bound vs. local bound:
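Reconstructed orders of magnitude under the bounded-density assumption:

% Global bound: every one of the J leaves is charged equally,
\[
  \text{penalty}_{\mathrm{global}} \;\approx\; \sqrt{\frac{3J\log 2 + \log(1/\delta)}{2n}} \;=\; O\bigl(\sqrt{J/n}\bigr)
\]
% Local bound: a leaf at depth j has volume (hence mass) ~ 2^{-j},
% so the per-leaf terms form a convergent series, independent of J:
\[
  \text{penalty}_{\mathrm{local}} \;\approx\; \sum_{j=1}^{J-1} \sqrt{\frac{2^{-j}\,\bigl(j + \log(1/\delta)\bigr)}{n}} \;=\; O\bigl(\sqrt{1/n}\bigr)
\]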
Convergence to Bayes Error
Mammen & Tsybakov ’99; C. Scott & RN ’03
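The published near-minimax result, reconstructed:

% With spatially adaptive (local) penalties, DCTs come within a
% logarithmic factor of the minimax rate n^{-1/d}:
\[
  \mathbb{E}\bigl[R(\hat T_n)\bigr] - R^* \;=\; O\!\Bigl(\bigl(\tfrac{\log n}{n}\bigr)^{1/d}\Bigr)
\]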
Concluding Remarks
data-dependent bound
Neural Information Processing Systems 2002, 2003
nowak@engr.wisc.edu