Neyman-Pearson Criterion (NPC): A Model Selection...

33
Neyman-Pearson Criterion (NPC): A Model Selection Criterion for Asymmetric Binary Classification Jingyi Jessica Li Department of Statistics University of California, Los Angeles http://jsb.ucla.edu

Transcript of Neyman-Pearson Criterion (NPC): A Model Selection...

Page 1: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Neyman-Pearson Criterion (NPC): A Model Selection

Criterion for Asymmetric Binary Classification

Jingyi Jessica Li

Department of Statistics

University of California, Los Angeles

http://jsb.ucla.edu

Page 2: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection for binary classification: a motivating example

Automated disease diagnosis a binary classification problem

β€’ Features/predictors: ∼ 20K human gene expression levels

β€’ Response: binary disease status: 0 (diseased) and 1 (healthy)

Given a classification method (e.g., logistic regression),

β€’ What subset of genes exhibits the highest diagnostic power?

A model selection problem

1

Page 3: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection for binary classification: a motivating example

Automated disease diagnosis a binary classification problem

β€’ Features/predictors: ∼ 20K human gene expression levels

β€’ Response: binary disease status: 0 (diseased) and 1 (healthy)

Given a classification method (e.g., logistic regression),

β€’ What subset of genes exhibits the highest diagnostic power?

A model selection problem1

Page 4: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm Oracle classifier Practical classifier

Classical Ο†βˆ— = arg minΟ†

R(φ) φ̂ = arg minφ

RΜ‚(Ο†)

Neyman-Pearson Ο†βˆ—Ξ± = arg minR0(Ο†)≀α

R1(Ο†) Ο†Μ‚Ξ± by the NP umbrella algorithm

[Rigollet and Tong (2011)] [Tong, Feng and Li (2018)]

where Ξ± is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis

β€’ The two classes have asymmetric importance

β€’ Mispredicting a normal tissue sample as malignant

β‡’ patient anxiety & additional medical costsβ€”type II error R1(Ο†)

β€’ Mispredicting a tumor sample as normal

β‡’ life loss β€”type I error R0(Ο†)

Policy makers often like to enforce a pre-specified threshold Ξ± on R0(Ο†)

2

Page 5: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm Oracle classifier Practical classifier

Classical Ο†βˆ— = arg minΟ†

R(φ) φ̂ = arg minφ

RΜ‚(Ο†)

Neyman-Pearson Ο†βˆ—Ξ± = arg minR0(Ο†)≀α

R1(Ο†) Ο†Μ‚Ξ± by the NP umbrella algorithm

[Rigollet and Tong (2011)] [Tong, Feng and Li (2018)]

where Ξ± is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis

β€’ The two classes have asymmetric importance

β€’ Mispredicting a normal tissue sample as malignant

β‡’ patient anxiety & additional medical costsβ€”type II error R1(Ο†)

β€’ Mispredicting a tumor sample as normal

β‡’ life loss β€”type I error R0(Ο†)

Policy makers often like to enforce a pre-specified threshold Ξ± on R0(Ο†)

2

Page 6: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm Oracle classifier Practical classifier

Classical Ο†βˆ— = arg minΟ†

R(φ) φ̂ = arg minφ

RΜ‚(Ο†)

Neyman-Pearson Ο†βˆ—Ξ± = arg minR0(Ο†)≀α

R1(Ο†) Ο†Μ‚Ξ± by the NP umbrella algorithm

[Rigollet and Tong (2011)] [Tong, Feng and Li (2018)]

where Ξ± is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis

β€’ The two classes have asymmetric importance

β€’ Mispredicting a normal tissue sample as malignant

β‡’ patient anxiety & additional medical costsβ€”type II error R1(Ο†)

β€’ Mispredicting a tumor sample as normal

β‡’ life loss β€”type I error R0(Ο†)

Policy makers often like to enforce a pre-specified threshold Ξ± on R0(Ο†)

2

Page 7: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm Oracle classifier Practical classifier

Classical Ο†βˆ— = arg minΟ†

R(φ) φ̂ = arg minφ

RΜ‚(Ο†)

Neyman-Pearson Ο†βˆ—Ξ± = arg minR0(Ο†)≀α

R1(Ο†) Ο†Μ‚Ξ± by the NP umbrella algorithm

[Rigollet and Tong (2011)] [Tong, Feng and Li (2018)]

where Ξ± is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis

β€’ The two classes have imbalanced sample sizes

R(Ο†) = IP(Y = 0)R0(Ο†) + IP(Y = 1)R1(Ο†)

When IP(Y = 0) = IP(diseased)οΏ½ IP(Y = 1) = IP(healthy),

β€’ The classical oracle classifier may have an excessively large R0(Ο†)

β€’ The NP oracle classifier will have R0(Ο†) ≀ Ξ±

3

Page 8: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm Oracle classifier Practical classifier

Classical Ο†βˆ— = arg minΟ†

R(φ) φ̂ = arg minφ

RΜ‚(Ο†)

Neyman-Pearson Ο†βˆ—Ξ± = arg minR0(Ο†)≀α

R1(Ο†) Ο†Μ‚Ξ± by the NP umbrella algorithm

[Rigollet and Tong (2011)] [Tong, Feng and Li (2018)]

where Ξ± is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis

β€’ The two classes have imbalanced sample sizes

R(Ο†) = IP(Y = 0)R0(Ο†) + IP(Y = 1)R1(Ο†)

When IP(Y = 0) = IP(diseased)οΏ½ IP(Y = 1) = IP(healthy),

β€’ The classical oracle classifier may have an excessively large R0(Ο†)

β€’ The NP oracle classifier will have R0(Ο†) ≀ Ξ±

3

Page 9: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Paper and software of the Neyman-Pearson umbrella algorithm

R package nproc

https://CRAN.R-project.org/package=nproc

Email: [email protected]

Page 10: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (population)

β€’ Goal: Develop a model selection criterion to compare models (i.e., feature

subsets) under the Neyman-Pearson (NP) paradigm

β€’ Idea: prediction-based model selection

β€’ Compare two feature subsets based on the type II errors of their

corresponding NP oracle classifiers

in contrast to

β€’ Compare two feature subsets based on the risks of their corresponding

classical oracle classifiers

β€’ Question: Will the model selection results be different under the two

paradigms?

5

Page 11: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (population)

β€’ Goal: Develop a model selection criterion to compare models (i.e., feature

subsets) under the Neyman-Pearson (NP) paradigm

β€’ Idea: prediction-based model selection

β€’ Compare two feature subsets based on the type II errors of their

corresponding NP oracle classifiers

in contrast to

β€’ Compare two feature subsets based on the risks of their corresponding

classical oracle classifiers

β€’ Question: Will the model selection results be different under the two

paradigms?

5

Page 12: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (population)

β€’ Goal: Develop a model selection criterion to compare models (i.e., feature

subsets) under the Neyman-Pearson (NP) paradigm

β€’ Idea: prediction-based model selection

β€’ Compare two feature subsets based on the type II errors of their

corresponding NP oracle classifiers

in contrast to

β€’ Compare two feature subsets based on the risks of their corresponding

classical oracle classifiers

β€’ Question: Will the model selection results be different under the two

paradigms?

5

Page 13: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Prediction-based model selection under the two paradigms (population)

A linear discriminant analysis (LDA) example

Two features X1,X2 ∈ IR with the following class conditional distributions:

X{1} | (Y = 0) ∼ N (βˆ’5, 22) , X{1} | (Y = 1) ∼ N (0, 22) ,

X{2} | (Y = 0) ∼ N (βˆ’5, 22) , X{1} | (Y = 1) ∼ N (1.5, 3.52) .

We would like to select the better feature between the two

β€’ Classical oracle classifiers:

R(Ο†βˆ—{1}

)= 0.106 < R

(Ο†βˆ—{2}

)= 0.113

So X1 is the better feature

β€’ Neyman-Pearson (NP) oracle classifiers:

R1

(Ο†βˆ—Ξ±{1}

)vs. R1

(Ο†βˆ—Ξ±{2}

)β€’ Ξ± = 0.01, R1

(Ο†βˆ—Ξ±{1}

)= 0.431 > R1

(Ο†βˆ—Ξ±{2}

)= 0.299 =β‡’ X2 better

β€’ Ξ± = 0.20, R1

(Ο†βˆ—Ξ±{1}

)= 0.049 < R1

(Ο†βˆ—Ξ±{2}

)= 0.084 =β‡’ X1 better

6

Page 14: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Prediction-based model selection under the two paradigms (population)

A linear discriminant analysis (LDA) example

Two features X1,X2 ∈ IR with the following class conditional distributions:

X{1} | (Y = 0) ∼ N (βˆ’5, 22) , X{1} | (Y = 1) ∼ N (0, 22) ,

X{2} | (Y = 0) ∼ N (βˆ’5, 22) , X{1} | (Y = 1) ∼ N (1.5, 3.52) .

We would like to select the better feature between the two

β€’ Classical oracle classifiers:

R(Ο†βˆ—{1}

)= 0.106 < R

(Ο†βˆ—{2}

)= 0.113

So X1 is the better feature

β€’ Neyman-Pearson (NP) oracle classifiers:

R1

(Ο†βˆ—Ξ±{1}

)vs. R1

(Ο†βˆ—Ξ±{2}

)β€’ Ξ± = 0.01, R1

(Ο†βˆ—Ξ±{1}

)= 0.431 > R1

(Ο†βˆ—Ξ±{2}

)= 0.299 =β‡’ X2 better

β€’ Ξ± = 0.20, R1

(Ο†βˆ—Ξ±{1}

)= 0.049 < R1

(Ο†βˆ—Ξ±{2}

)= 0.084 =β‡’ X1 better

6

Page 15: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Prediction-based model selection under the two paradigms (population)

Special scenarios where prediction-based model selection is the same under the

two paradigms

Lemma 1

X{1} | (Y = 0) ∼ N(¡0

1, (Οƒ01)2), X{1} | (Y = 1) ∼ N

(Β΅1

1, (Οƒ11)2),

X{2} | (Y = 0) ∼ N(¡0

2, (Οƒ02)2), X{2} | (Y = 1) ∼ N

(Β΅1

2, (Οƒ12)2).

Let Ξ± ∈ (0, 1) , Ο†βˆ—Ξ±{1} and Ο†βˆ—Ξ±{2} be the NP oracle classifiers based on X{1} and

X{2} ∈ IR respectively, and Ο†βˆ—{1} and Ο†βˆ—{2} be the corresponding classical oracle

classifiers. Ifσ0

2

Οƒ12

=Οƒ0

1

Οƒ11

,

then

sign(R1

(Ο†βˆ—Ξ±{1}

)βˆ’ R1

(Ο†βˆ—Ξ±{2}

))= sign

(R(Ο†βˆ—{1}

)βˆ’ R

(Ο†βˆ—{2}

)).

for all Ξ±. The reverse statement also holds.

7

Page 16: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Prediction-based model selection under the two paradigms (population)

Special scenarios where prediction-based model selection is the same under the

two paradigms

Lemma 2

Let A1, A2 βŠ† {1, . . . , d} be two index sets. For a random vector X ∈ IRd , let

XA1 and XA2 be sub-vectors of X comprising of coordinates with indexes in A1

and A2 respectively, and assume they follow the class conditional distributions:

XA1 | (Y = 0) ∼ N (¡01,Σ1) , XA1 | (Y = 1) ∼ N (¡1

1,Ξ£1) ,

XA2 | (Y = 0) ∼ N (¡02,Σ2) , XA2 | (Y = 1) ∼ N (¡1

2,Ξ£2) ,

where ¡ij ∈ IR` denotes the mean vector of feature set Aj in class i , and

Ξ£j ∈ IR`Γ—` denotes the variance-covariance matrix of feature set Aj , j = 1, 2,

i = 0, 1.

The selected feature set Aβˆ—Ξ± = A1 or A2 under the NP paradigm is invariant to Ξ±

and is consistent with the selected feature set Aβˆ— under the classical paradigm.

8

Page 17: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (sample)

β€’ Goal: Develop a practical prediction-based model selection criterion under

the NP paradigm

β€’ Idea: Compare two feature subsets based on the estimated type II errors

of their corresponding NP classifiers (constructed by our NP umbrella

algorithm)

β€’ Question: How to construct a β€œgood” estimator of the type II error of an

NP classifier?

Leave out some class 1 data!

9

Page 18: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (sample)

β€’ Goal: Develop a practical prediction-based model selection criterion under

the NP paradigm

β€’ Idea: Compare two feature subsets based on the estimated type II errors

of their corresponding NP classifiers (constructed by our NP umbrella

algorithm)

β€’ Question: How to construct a β€œgood” estimator of the type II error of an

NP classifier?

Leave out some class 1 data!

9

Page 19: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (sample)

β€’ Goal: Develop a practical prediction-based model selection criterion under

the NP paradigm

β€’ Idea: Compare two feature subsets based on the estimated type II errors

of their corresponding NP classifiers (constructed by our NP umbrella

algorithm)

β€’ Question: How to construct a β€œgood” estimator of the type II error of an

NP classifier?

Leave out some class 1 data!

9

Page 20: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Model selection under the Neyman-Pearson paradigm (sample)

β€’ Goal: Develop a practical prediction-based model selection criterion under

the NP paradigm

β€’ Idea: Compare two feature subsets based on the estimated type II errors

of their corresponding NP classifiers (constructed by our NP umbrella

algorithm)

β€’ Question: How to construct a β€œgood” estimator of the type II error of an

NP classifier?

Leave out some class 1 data!

9

Page 21: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

The model selection criterion: NPC (Neyman-Pearson Criterion)

Statistical formulation

Given α, δ ∈ [0, 1], a classification method, and a feature subset

A βŠ† {1, . . . , p}, a practical NP classifier Ο†Μ‚Ξ±A is constructed by the NP

umbrella algorithm. Then the NPC for model A at level Ξ± is defined as

NPCΞ±A := RΜ‚1(Ο†Μ‚Ξ±A)

where RΜ‚1(Ο†) is the estimated type II error of any classifier Ο† on leave-out

class 1 data

β€’ Sample splitting: split training data into three parts

β€’ mixed classes 0 and 1 sample

β€’ left-out class 0 sample

}NP umbrella algorithm

=β‡’ Ο†Μ‚Ξ±A

‒ left-out class 1 sampleφ̂αA=⇒ NPCαA

β€’ Multiple random splits can be used to construct an ensemble estimator

with a smaller variance

10

Page 22: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Theoretical properties of NPC

Concentration of RΜ‚1(Ο†Μ‚Ξ±A) at R1(Ο†βˆ—Ξ±A)

Given reasonable conditions on

(1) the two class conditional distributions of X | (Y = 0) and

X | (Y = 1)

(2) the scoring function of the given classification method

(3) the two class samples sizes

we can show that with probability at least 1βˆ’ Ξ΄β€²

|RΜ‚1(Ο†Μ‚Ξ±A)βˆ’ R1(Ο†βˆ—Ξ±A)| ≀ C(Ξ΄β€²)

where

β€’ R1(Ο†βˆ—Ξ±A) is the population type II error of the method-specific

oracle classifier Ο†βˆ—Ξ±A (the classifier that shares the same scoring

function as the β€˜best’ classical classifier constrained by the

classification method)

β€’ The deviation upper bound C(Ξ΄β€²) increases as Ξ΄β€² decreases; also,

C(Ξ΄β€²) converges to zero as the sample sizes go to infinity11

Page 23: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

A glance at data (Fleischer et al., 2014 Genome Biology):

β€’ 46 (class 1) normal tissues V.S. 239 breast cancer (class 0) tissue

β€’ Methylation levels are measured at 468, 424 genome locations in every

tissue

β€’ After preprocessing and normalization, d = 19, 363οΏ½ n = 285

12

Page 24: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

Use L1 penalized logistic regression to generate a solution path

=β‡’ 41 nested feature sets

Apply model selection criteria:

●●

●●●

●●●●●●

●●●●●●

●●

●●●●●●●

●●●●●●●●●●●●●●●

βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●●●●●●●●●

●●

●●●●●●●

●●●●

●●

●●●●●●

●●●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●

●●●●●●●

●●●●●

●●●●●●

●●●●●

●●●●●●●●●

●● ●

●● ●

●●

●

●●

●●

● ●●

● ●

● ● ●

● ● ●●

● ●

● ● ●●

●

●

● ●●

● ●●

● ●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●

●●●●

●●

●

●●

●

●

●●

●●

●●●●●●●

●●●●●●●●●●●●●●

●

●

● ●●

●●

●●

●

●

●

●

●

●

●

● ●

●

●

●

● ● ● ● ● ●

● ● ● ● ●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ ZNF646

Remove: ZNF646 . . .

13

Page 25: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●

●●●●●●

●●

●●●●●●●

●●●●●●●●●●●●●●●

βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●●●●●●●●●

●●

●●●●

●●●

●●●●

●●

●●●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●

●●●●

●

●●●

●●

●●●●●

●●●●

●●●

●●

●●●●●●●●●

●

● ●

●● ●

●●

●

●●

●

●

●● ●

● ●

● ● ●

● ●

●

●

●

●

●

●●

● ●

● ● ● ●●

●

●● ●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●

●

●

●●●

●

●

●

●●

●●●●●●●●●

●●●●●●●●●●●●●●

●

●

● ●● ●

●

●

●

●

●●

●

●

●

●

● ●

● ●●

●●

● ●●

●

● ● ● ● ●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ ERAP1

- ZNF646

Remove: ZNF646, ERAP1 . . .

14

Page 26: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●●●●●●●

●●●●●●●●●

β—β—β—β—β—β—β—β—β—β—β—β—β—β—β—βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●

●●●●●●●●●

●●

●●●●

●●●

●●●

●

●●●●●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●

●

●●●

●●●●●

●●●

●●●●●●●●

●●●

●●●●●●

●●●

●

● ●

●● ●

●● ●

●

●

●●

●

●●

●●

● ● ●

● ●

●

●

●

●

●●

●

●●

●●

● ●● ●

●●

●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●●

●●

●

●

●●

●●●

●●●●●

●●●●

●●●●●●●●●●●●●●

●

●

● ●● ●

●

● ●

● ●

●

●

●●

●●

●

● ●●

●●

● ●●

●● ● ● ● ●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ LOC121952 (METTL21EP)

- ZNF646, ERAP1

Remove: ZNF646, ERAP1, LOC121952 . . .

15

Page 27: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●●●●●●●

●●●●●●●●●

β—β—β—β—β—β—β—β—β—β—β—β—β—β—β—βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●

●●●●●●●●

●●

●●●

●●●●

●●●

●

●●●●●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●●

●

●●

●●●●●

●●●

●●●●●●●●

●●●

●●●●●●

●●●

●

● ●

●● ●

●● ● ●

●

●●

●●

●● ●

● ● ●

● ● ●

●

●

●

●●

●

●●

●●

● ●● ●

●●

●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●●●

●

●

●

●●

●

●●

●●●●●●●

●●●●●●●

●●●●●●●●●

●

●

● ●● ●

●

● ● ●

●

●

●

●

●

●

● ●

● ●●

● ● ● ●●

●● ● ● ● ●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ GEMIN4

- ZNF646, ERAP1, LOC121952 (METTL21EP)

Remove: ZNF646, ERAP1, LOC121952, GEMIN4 . . .

16

Page 28: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●●●●●●●

●●●●●●●●●

β—β—β—β—β—β—β—β—β—β—β—β—β—β—β—βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●●●

●●●●●●

●●

●●●●●●●

●●●

●

●●●●●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●●●●

●●●●●●

●●●

●●●●

●●●●●●●

●●●●●●●

●●

●

● ●

●● ●

●● ● ● ●

●

● ●●

●● ●

● ●●

● ●●

●

●

●●

● ●● ●

●● ● ●

● ● ●● ●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●●●●●●●●

●●●●●

●●●●

●●●●●●●●●●●●●●●●●

●

●

● ●● ●

●

● ● ● ● ●

●

●●

●●

●

● ●

●● ● ●

● ● ●● ● ● ●

●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ BATF

- ZNF646, ERAP1, LOC121952 (METTL21EP), GEMIN4

Remove: ZNF646, ERAP1, LOC121952, GEMIN4, BATF . . .

17

Page 29: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●●

●●●●●

●●

●●●●●●●

β—β—β—β—β—β—β—β—β—β—β—β—β—β—β—βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●●●●

●●●

●●

●●

●●●●●●●

●●●

●

●●●●●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●●●●●

●●●●●

●●●●●●

●

●●●●●●●

●●●●●●●

●●

●

● ●

●● ●

●● ● ● ●

● ●●

●

● ● ●

●●

●● ●

●●

●

●●

● ●● ●

●● ● ●

● ● ●● ●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●●●●●●

●●

●●●

●●●●●●

●●●●●●●●●●●●●●●●●

●

●

● ●● ●

●

● ● ● ● ● ●

● ●

● ● ●

●●

●● ● ●

● ● ●● ● ● ●

●

● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC+ MIR21

- ZNF646, ERAP1, LOC121952 (METTL21EP), GEMIN4, BATF

Remove: ZNF646, ERAP1, LOC121952, GEMIN4, BATF, MIR21 . . .

18

Page 30: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

●●

●●●

●●●●●●●●●●●●

●●●●●●●●●

β—β—β—β—β—β—β—β—β—β—β—β—β—β—β—βˆ’250

βˆ’200

βˆ’150

βˆ’100

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

AIC●●

●●●

●●●●●●●●●

●●●

●●●●

●●●●●

●●●●

●●●●

●●●●●●

●

βˆ’180

βˆ’150

βˆ’120

βˆ’90

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

BIC

●●●●●

●●●●●●●●

●

●●●

●●●●●●

●●●●●

●●●

●●●●

●●●●●

●

● ●

●● ●

●● ● ● ●

● ● ●

●

●● ●

●● ●

●●

●●

●

●●

●

●●

●

●●

● ●●

●●

●●

●

0.00

0.01

0.02

0.03

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

Classical Criterion

●

●●●●●

●●●●●●●●●●●

●●●●●●●●●

●●●●●●●●●●●●●●

●

●

● ●● ●

●

● ● ● ● ● ● ●●

●●

●

● ● ●●

●● ● ●

●● ● ● ●

●● ● ● ● ● ● ● ● ●

●

0.0

0.1

0.2

0.3

0.4

0 10 20 30 40Feature Subset Index

Crit

erio

n Va

lue

NPC

- ZNF646, ERAP1, LOC121952 (METTL21EP), GEMIN4, BATF, MIR21

No obvious rise in NPC

19

Page 31: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Real data application: DNA methylation features for cancer diagnosis

Among the 41 genes,

protein-coding microRNA pseudogene

remained 32 (20) 3 0

removed 4 (2) 1 1

() means the number of genes that express proteins in breast cancer tissues

according to the Human Protein Atlas database

Observations:

β€’ 9 out of 32 genes do not yet have available protein expression data in

breast cancer tissues in the Human Protein Atlas database

β€’ A specificity higher than 90% is achievable with only three gene markers:

HMGB2, MIR195 and SPARCL1. Inclusion of SPARCL1 increases

specificity from ∼ 70% to over 90%

20

Page 32: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Conclusions

β€’ A new model selection criterion: NPC, tailored for asymmetric binary

classification under the NP paradigm

β€’ NPC allows users to select a model that achieve the best specificity among

candidate models while maintaining a high sensitivity

β€’ Useful in disease diagnosis, and other applications (network security

control, loan screening and prediction of regional conflicts)

β€’ Flexible to the choice of classification methods and the way of

constructing NP classifiers

21

Page 33: Neyman-Pearson Criterion (NPC): A Model Selection ...jsb.ucla.edu/sites/default/files/031919_NPC_ENAR2019.pdfModel selection under the Neyman-Pearson paradigm (population) Goal: Develop

Acknowledgements

Yiling Chen

(UCLA)

Dr. Xin Tong

(USC)

22