Transcript of "Large-Scale Nearest Neighbor Classification with Statistical Guarantees" (...chengg/BigNN.pdf)

Page 1:

Large-Scale Nearest Neighbor Classification with Statistical Guarantees

Guang Cheng

Big Data Theory Lab, Department of Statistics

Purdue University

Joint Work with Xingye Qiao and Jiexin Duan

July 3, 2018

Page 2:

SUSY Data Set (UCI Machine Learning Repository)

Size: 5,000,000 particles from the accelerator

Predictor variables: 18 (properties of the particles)

Goal – Classification:

distinguish between a signal process and a background process

Page 3:

Nearest Neighbor Classification

The kNN classifier predicts the class of x ∈ R^d to be the most frequent class among its k nearest neighbors (in Euclidean distance).
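A minimal brute-force sketch of this rule (illustrative code, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Brute-force kNN: O(dN) distance computation plus O(N log N) sorting."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to all N points
    nn_idx = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    return int(y_train[nn_idx].mean() >= 0.5)    # majority vote over binary labels

# Toy usage: two Gaussian clusters labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.repeat([0, 1], 50)
print(knn_predict(X, y, np.array([1.8, 2.1]), k=5))
```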

Page 4:

Computational/Space Challenges for kNN Classifiers

Time complexity: O(dN + N log N)

dN: computing distances from the query point to all N observations in R^d

N log N: sorting the N distances and selecting the k nearest

Space complexity: O(dN). Here, N is the size of the entire dataset.

Page 5:

How to Conquer "Big Data"?

Page 6:

If We Have A Supercomputer...

Train on the entire dataset at once: oracle kNN

Figure: I’m Pac-Superman!

Page 7:

Construction of Pac-Superman

To build a Pac-Superman for Oracle-kNN, we need:

An expensive supercomputer

An operating system, e.g., Linux

Statistical software designed for big data, e.g., Spark, Hadoop

Complicated algorithms to write (e.g., MapReduce), with limited extensibility to other classification methods

Page 11:

Machine Learning Literature

In the machine learning area, there are already some nearest neighbor algorithms for big data:

Muja & Lowe (2014) designed an algorithm to search nearest neighbors in big data.

Comments: complicated; no statistical guarantee; lack of extensibility.

We hope to design an algorithmic framework that is:

Easy to implement

Backed by strong statistical guarantees

More generic and expandable (e.g., easy to apply to other classifiers: decision trees, SVM, etc.)

Page 12:

If We Don’t Have A Supercomputer...

What can we do?

Page 13:

If We Don’t Have A Supercomputer...

What about splitting the data?

Page 14:

Divide & Conquer (D&C) Framework

Goal: deliver insights different from those obtained in regression problems.

Page 15:

Big-kNN for SUSY Data

Oracle-kNN: kNN trained on the entire dataset

Number of subsets: s = N^γ, γ = 0.1, 0.2, ..., 0.8

Figure: Risk (error rate) and speedup of Big-kNN against gamma, with Oracle-kNN as the reference.

Risk of a classifier φ: R(φ) = P(φ(X) ≠ Y)

Speedup: the running time of Oracle-kNN divided by that of Big-kNN

Page 16:

Construction of Pac-Batman (Big Data)

Page 17:

We need a statistical understanding of Big-kNN...

Page 18:

Weighted Nearest Neighbor Classifiers (WNN)

Definition: WNN

For a dataset of size n, the WNN classifier assigns a weight $w_{ni}$ to the i-th nearest neighbor of x:

$\phi^{\mathrm{wnn}}(x) = \mathbf{1}\left\{ \sum_{i=1}^{n} w_{ni}\, \mathbf{1}\{Y_{(i)} = 1\} \ge \frac{1}{2} \right\}$, subject to $\sum_{i=1}^{n} w_{ni} = 1$.

When $w_{ni} = k^{-1}\mathbf{1}\{1 \le i \le k\}$, WNN reduces to kNN.
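A direct translation of this definition (a sketch; the helper names are illustrative):

```python
import numpy as np

def wnn_predict(X_train, y_train, x, weights):
    """WNN: apply weights (summing to 1) to the labels sorted by distance
    to x; predict class 1 iff the weighted vote reaches 1/2."""
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    return int(np.dot(weights, y_train[order]) >= 0.5)

def knn_weights(n, k):
    """The kNN special case: w_{ni} = 1/k for the k nearest, 0 otherwise."""
    w = np.zeros(n)
    w[:k] = 1.0 / k
    return w
```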

Page 19:

D&C Construction of WNN

Construction of Big-WNN

For a big dataset of size N = sn, split into s subsets of size n, the Big-WNN is constructed as

$\phi^{\mathrm{Big}}_{n,s}(x) = \mathbf{1}\left\{ s^{-1} \sum_{j=1}^{s} \phi^{(j)}_{n}(x) \ge \frac{1}{2} \right\}$,

where $\phi^{(j)}_{n}(x)$ is the local WNN classifier trained on the j-th subset.
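A sketch of this D&C construction, assuming the rows of X are randomly shuffled before splitting and a local 0/1 classifier such as `wnn_predict` above is supplied:

```python
import numpy as np

def big_wnn_predict(X, y, x, s, local_predict):
    """Big-WNN: run a local (W)NN classifier on each of the s subsets and
    majority-vote the s resulting 0/1 labels."""
    votes = [local_predict(Xj, yj, x)
             for Xj, yj in zip(np.array_split(X, s), np.array_split(y, s))]
    return int(np.mean(votes) >= 0.5)
```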

Page 20:

Statistical Guarantees: Classification Accuracy & Stability

Primary criterion: accuracy

Regret = Expected Risk − Bayes Risk = $\mathbb{E}_{D}[R(\phi_n)] - R(\phi_{\mathrm{Bayes}})$

A small regret is preferred.

Secondary criterion: stability (Sun et al., 2016, JASA)

Definition: Classification Instability (CIS)

The classification instability of a classification procedure Ψ is

$\mathrm{CIS}(\Psi) = \mathbb{E}_{D_1,D_2}\left[ P_X\big( \phi_{n1}(X) \ne \phi_{n2}(X) \big) \right]$,

where $\phi_{n1}$ and $\phi_{n2}$ are classifiers trained by the same procedure Ψ on two independent datasets $D_1$ and $D_2$ drawn from the same distribution.

A small CIS is preferred.
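CIS can be estimated by simulation; a sketch, assuming `sample_dataset()` draws a fresh training set and `train(...)` returns a classifier that maps a batch of points to 0/1 labels (both names are illustrative):

```python
import numpy as np

def estimate_cis(sample_dataset, train, X_eval, n_rep=100):
    """Monte Carlo CIS: average disagreement between classifiers trained
    on two independent datasets, evaluated at the points X_eval."""
    disagree = []
    for _ in range(n_rep):
        f1 = train(*sample_dataset())  # classifier from D1
        f2 = train(*sample_dataset())  # classifier from D2
        disagree.append(np.mean(f1(X_eval) != f2(X_eval)))
    return float(np.mean(disagree))
```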

Page 22:

Asymptotic Regret of Big-WNN

Theorem (Regret of Big-WNN)

Under regularity assumptions, with the number of subsets s upper bounded by the subset size n, we have, as n, s → ∞,

$\mathrm{Regret}(\text{Big-WNN}) \approx \underbrace{B_1\, s^{-1} \sum_{i=1}^{n} w_{ni}^2}_{\text{variance part}} + \underbrace{B_2 \left( \sum_{i=1}^{n} \frac{\alpha_i w_{ni}}{n^{2/d}} \right)^{2}}_{\text{bias part}},$

where $\alpha_i = i^{1+2/d} - (i-1)^{1+2/d}$, the $w_{ni}$ are the local weights, and the constants $B_1$ and $B_2$ are distribution-dependent quantities.
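As a sanity check (not on the slides), plugging the kNN weights $w_{ni} = k^{-1}\mathbf{1}\{i \le k\}$ into the theorem recovers the familiar bias-variance trade-off in k; the sum of the $\alpha_i$ telescopes:

```latex
% kNN special case: w_{ni} = 1/k for i <= k, 0 otherwise.
% Variance part: \sum_i w_{ni}^2 = 1/k, giving B_1/(sk).
% Bias part: \sum_{i=1}^{k} \alpha_i = k^{1+2/d} (telescoping sum), so
% \sum_i \alpha_i w_{ni} = k^{2/d} and the second term is B_2 (k/n)^{4/d}.
\mathrm{Regret}(\text{Big-}k\text{NN}) \;\approx\; \frac{B_1}{sk} \;+\; B_2 \left(\frac{k}{n}\right)^{4/d}
```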

Page 25:

Asymptotic Regret Comparison

Round 1

Page 26:

Asymptotic Regret Comparison

Theorem (Regret Ratio of Big-WNN and Oracle-WNN)

Given an Oracle-WNN classifier, we can always design a Big-WNN by adjusting its weights according to those in the oracle version such that

$\frac{\mathrm{Regret}(\text{Big-WNN})}{\mathrm{Regret}(\text{Oracle-WNN})} \to Q$,

where $Q = (\pi/2)^{4/(d+4)}$ and 1 < Q < 2.

E.g., in the Big-kNN case, given the oracle $k_O$ we set $k = \lfloor (\pi/2)^{d/(d+4)}\, k_O / s \rfloor$. We need a multiplicative constant $(\pi/2)^{d/(d+4)}$!

We name Q the Majority Voting (MV) Constant.

Page 29:

Majority Voting Constant

The MV constant decreases monotonically to one as d increases.

Figure: Q plotted against d.

For example:

d = 1: Q = 1.44
d = 2: Q = 1.35
d = 5: Q = 1.22
d = 10: Q = 1.14
d = 20: Q = 1.08
d = 50: Q = 1.03
d = 100: Q = 1.02
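These values are easy to reproduce; a minimal check (the helper name `big_knn_k` is illustrative, not from the slides):

```python
import numpy as np

d = np.array([1, 2, 5, 10, 20, 50, 100])
Q = (np.pi / 2) ** (4 / (d + 4))   # MV constant from the regret-ratio theorem
print(np.round(Q, 2))              # [1.44 1.35 1.22 1.14 1.08 1.03 1.02]

def big_knn_k(k_oracle, s, d):
    """Local neighbor count for Big-kNN, including the multiplicative constant."""
    return int((np.pi / 2) ** (d / (d + 4)) * k_oracle / s)
```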

Page 30:

How to Compensate Statistical Accuracy Loss, i.e., Q > 1

Why is there statistical accuracy loss?

Page 31:

Statistical Accuracy Loss due to Majority Voting

Accuracy loss comes from the transformation of a (continuous) percentage into a (discrete) 0-1 label.

Page 32:

Majority Voting/Accuracy Loss Once in Oracle Classifier

Majority voting is applied only once in the oracle classifier.

Figure: I’m Pac-Superman!

Page 33:

Majority Voting/Accuracy Loss Twice in D&C Framework

Page 34:

Next Step...

Is it possible to apply majority voting once in D&C?

Page 35:

A Continuous Version of D&C Framework

Page 36:

Accuracy Loss Occurs Only Once in the Continuous Version

Page 37:

Construction of Pac-Batman (Continuous Version)

Page 38:

Continuous Version of Big-WNN (C-Big-WNN)

Definition: C-Big-WNN

For a dataset of size N = sn, the C-Big-WNN is constructed as

$\phi^{\mathrm{CBig}}_{n,s}(x) = \mathbf{1}\left\{ \frac{1}{s} \sum_{j=1}^{s} S^{(j)}_{n}(x) \ge \frac{1}{2} \right\}$,

where $S^{(j)}_{n}(x) = \sum_{i=1}^{n} w_{ni,j} Y_{(i),j}(x)$ is the weighted average over the nearest neighbors of the query point x in the j-th subset.

Step 1 (percentage calculation): compute the local weighted average $S^{(j)}_{n}(x)$ on each subset.

Step 2 (majority voting): threshold the average of the s local percentages at 1/2, only once.
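A minimal sketch of the continuous version with kNN weights (illustrative; assumes the rows of X are randomly shuffled before splitting):

```python
import numpy as np

def c_big_knn_predict(X, y, x, s, k):
    """C-Big-kNN: average the local class-1 fractions S_n^(j)(x) over the
    s subsets, then apply the majority-vote threshold only once."""
    fractions = []
    for Xj, yj in zip(np.array_split(X, s), np.array_split(y, s)):
        nn = np.argsort(np.linalg.norm(Xj - x, axis=1))[:k]
        fractions.append(yj[nn].mean())   # local kNN percentage for class 1
    return int(np.mean(fractions) >= 0.5)
```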

Page 41:

C-Big-kNN for SUSY Data

Figure: Risk (error rate) and speedup against gamma for Big-kNN, C-Big-kNN, and Oracle-kNN.

Page 42:

Asymptotic Regret Comparison

Round 2

Page 43:

Asymptotic Regret Comparison

Theorem (Regret Ratio between C-Big-WNN and Oracle-WNN)

Given an Oracle-WNN, we can always design a C-Big-WNN by adjusting its weights according to those in the oracle version such that

$\frac{\mathrm{Regret}(\text{C-Big-WNN})}{\mathrm{Regret}(\text{Oracle-WNN})} \to 1$.

E.g., in the C-Big-kNN case, given the oracle $k_O$ we can set $k = \lfloor k_O / s \rfloor$. There is no multiplicative constant!

Note that this theorem applies directly to the optimally weighted NN (OWNN; Samworth, 2012, AoS), whose weights are optimized to achieve the minimal regret.

Page 46:

Practical Consideration: Speed vs Accuracy

Applying OWNN in each subset is time-consuming, although it leads to the minimal regret.

Rather, it is easier and faster to implement kNN in each subset (with some accuracy loss, though).

What is the statistical price (in terms of accuracy and stability) of trading accuracy for speed?

Page 49:

Statistical Price

Theorem (Statistical Price)

We can design an optimal C-Big-kNN by setting $k = \lfloor k_{O,\mathrm{opt}} / s \rfloor$ such that

$\frac{\mathrm{Regret}(\text{Optimal C-Big-kNN})}{\mathrm{Regret}(\text{Oracle-OWNN})} \to Q'$ and $\frac{\mathrm{CIS}(\text{Optimal C-Big-kNN})}{\mathrm{CIS}(\text{Oracle-OWNN})} \to \sqrt{Q'}$,

where $Q' = 2^{-4/(d+4)} \left(\frac{d+4}{d+2}\right)^{(2d+4)/(d+4)}$ and 1 < Q' < 2. Here, $k_{O,\mathrm{opt}}$ minimizes the regret of Oracle-kNN.

Interestingly, both Q and Q' depend only on the data dimension!

Page 50:

Statistical Price

Q′ converges to one as d grows, in a unimodal way.

Figure: Q′ plotted against d.

For example:

d = 1: Q′ = 1.06
d = 2: Q′ = 1.08
d = 5: Q′ = 1.08
d = 10: Q′ = 1.07
d = 20: Q′ = 1.04
d = 50: Q′ = 1.02
d = 100: Q′ = 1.01

The worst dimension is d* = 4, where Q′ = 1.089.
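Again reproducible in a few lines (a sketch; integer dimensions only):

```python
import numpy as np

def q_prime(d):
    """Statistical price Q' of optimal C-Big-kNN vs. Oracle-OWNN."""
    return 2.0 ** (-4 / (d + 4)) * ((d + 4) / (d + 2)) ** ((2 * d + 4) / (d + 4))

d = np.arange(1, 101).astype(float)
qp = q_prime(d)
print(int(d[np.argmax(qp)]), round(float(qp.max()), 3))  # worst case: d = 4, Q' ~= 1.089
```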

Page 51:

Simulation Analysis – kNN

Consider the classification problem for Big-kNN and C-Big-kNN (a data-generation sketch follows this list):

Sample size: N = 27,000

Dimensions: d = 6, 8

Class-conditional distributions: $P_0 \sim N(0_d, I_d)$ and $P_1 \sim N\big(\tfrac{2}{\sqrt{d}} 1_d, I_d\big)$

Prior class probability: $\pi_1 = \Pr(Y = 1) = 1/3$

Number of neighbors in Oracle-kNN: $k_O = N^{0.7}$

Number of subsets in D&C: $s = N^{\gamma}$, γ = 0.1, 0.2, ..., 0.8

Number of neighbors in Big-kNN: $k_d = \lfloor (\pi/2)^{d/(d+4)}\, k_O / s \rfloor$

Number of neighbors in C-Big-kNN: $k_c = \lfloor k_O / s \rfloor$
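A sketch of the data-generating model (function names are illustrative):

```python
import numpy as np

def sample_data(N, d, rng):
    """Mixture model: Y ~ Bernoulli(1/3), X | Y=0 ~ N(0, I_d),
    X | Y=1 ~ N((2/sqrt(d)) * 1_d, I_d)."""
    y = rng.binomial(1, 1 / 3, size=N)
    X = rng.standard_normal((N, d)) + y[:, None] * (2 / np.sqrt(d))
    return X, y

rng = np.random.default_rng(2018)
X, y = sample_data(N=27_000, d=6, rng=rng)
k_oracle = int(27_000 ** 0.7)   # oracle neighbor count k_O = N^0.7
```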

Page 52:

Simulation Analysis – Empirical Risk

Figure: Empirical Risk (Testing Error) against gamma for Big-kNN, C-Big-kNN, Oracle-kNN, and the Bayes rule. Left/Right: d = 6/8.

Page 53:

Simulation Analysis – Running Time

Figure: Running Time against gamma for Big-kNN, C-Big-kNN, and Oracle-kNN. Left/Right: d = 6/8.

Page 54:

Simulation Analysis – Empirical Regret Ratio

Figure: Empirical Ratio of Regret against gamma for Big-kNN and C-Big-kNN. Left/Right: d = 6/8.

Q: Regret(Big-kNN)/Regret(Oracle-kNN) or Regret(C-Big-kNN)/Regret(Oracle-kNN)

Page 55:

Simulation Analysis – Empirical CIS

Figure: Empirical CIS against gamma for Big-kNN, C-Big-kNN, and Oracle-kNN. Left/Right: d = 6/8.

Page 56:

Real Data Analysis – UCI Machine Learning Repository

Data      Size      Dim   Big-kNN  C-Big-kNN  Oracle-kNN  Speedup
htru2     17898     8     3.72     3.36       3.34        21.27
gisette   6000      5000  12.86    10.77      10.78       12.83
musk1     476       166   36.43    33.87      35.71       3.6
musk2     6598      166   10.43    9.98       10.14       14.68
occup     20560     6     5.09     4.87       5.19        20.31
credit    30000     24    21.85    21.37      21.5        22.44
SUSY      5000000   18    30.66    30.08      30.01       68.45

Table: Test error (risk, in %) of Big-kNN and C-Big-kNN compared to Oracle-kNN on real datasets; the best performance was shown in bold face on the slide. The speedup factor is the computing time of Oracle-kNN divided by that of the slower Big-kNN method. Oracle $k = N^{0.7}$; number of subsets $s = N^{0.3}$.
