Neyman-Pearson Criterion (NPC): A Model Selection
Criterion for Asymmetric Binary Classification
Jingyi Jessica Li
Department of Statistics
University of California, Los Angeles
http://jsb.ucla.edu
Model selection for binary classification: a motivating example
Automated disease diagnosis is a binary classification problem
• Features/predictors: ∼20K human gene expression levels
• Response: binary disease status: 0 (diseased) and 1 (healthy)
Given a classification method (e.g., logistic regression),
• What subset of genes exhibits the highest diagnostic power?
A model selection problem
Two binary classification paradigms: classical vs. Neyman-Pearson

Paradigm         Oracle classifier                      Practical classifier
Classical        φ* = arg min_φ R(φ)                    φ̂ = arg min_φ R̂(φ)
Neyman-Pearson   φ*_α = arg min_{R0(φ) ≤ α} R1(φ)       φ̂_α by the NP umbrella algorithm
                 [Rigollet and Tong (2011)]             [Tong, Feng and Li (2018)]

where α is a user-specified upper bound on the type I error

The Neyman-Pearson paradigm is suitable for disease diagnosis
• The two classes have asymmetric importance
  • Mispredicting a normal tissue sample as malignant
    → patient anxiety & additional medical costs: the type II error R1(φ)
  • Mispredicting a tumor sample as normal
    → loss of life: the type I error R0(φ)
Policy makers often like to enforce a pre-specified threshold α on R0(φ)
Two binary classification paradigms: classical vs. Neyman-Pearson
The Neyman-Pearson paradigm is suitable for disease diagnosis
• The two classes have imbalanced sample sizes
  R(φ) = ℙ(Y = 0) R0(φ) + ℙ(Y = 1) R1(φ)
  When ℙ(Y = 0) = ℙ(diseased) ≪ ℙ(Y = 1) = ℙ(healthy),
  • The classical oracle classifier may have an excessively large R0(φ)
  • The NP oracle classifier will have R0(φ) ≤ α
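This imbalance effect can be made concrete with a small numeric sketch. We assume a hypothetical one-dimensional setup, X | (Y = 0) ∼ N(−5, 2²) and X | (Y = 1) ∼ N(0, 2²), with the diseased class rare, ℙ(Y = 0) = 0.05 (these specific numbers are our illustration, not from the slide): the classical Bayes rule lets the type I error balloon far above any sensible α, while the NP rule pins it at α by construction.

```python
from math import log
from statistics import NormalDist

# Hypothetical setup: X|Y=0 ~ N(-5, 2^2), X|Y=1 ~ N(0, 2^2),
# with the diseased class (Y=0) rare: P(Y=0) = 0.05.
mu0, mu1, sd, pi0 = -5.0, 0.0, 2.0, 0.05

# Classical (Bayes) oracle for equal-variance Gaussians: classify as 1
# iff x > c_bayes, where c_bayes solves pi1*f1(x) = pi0*f0(x).
c_bayes = (mu0 + mu1) / 2 + sd**2 / (mu1 - mu0) * log(pi0 / (1 - pi0))
r0_bayes = 1 - NormalDist(mu0, sd).cdf(c_bayes)   # type I error R0

# NP oracle at level alpha: threshold chosen so that R0 = alpha exactly.
alpha = 0.05
c_np = NormalDist(mu0, sd).inv_cdf(1 - alpha)
r0_np = 1 - NormalDist(mu0, sd).cdf(c_np)

print(round(r0_bayes, 3), round(r0_np, 3))  # Bayes R0 far exceeds alpha; NP R0 = alpha
```

With the rare class down-weighted in the overall risk, the Bayes threshold slides into the class 0 distribution, so nearly half of the diseased samples are misclassified; the NP threshold is set directly by the class 0 quantile instead.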
Paper and software of the Neyman-Pearson umbrella algorithm
R package nproc
https://CRAN.R-project.org/package=nproc
Email: [email protected]
Model selection under the Neyman-Pearson paradigm (population)
• Goal: Develop a model selection criterion to compare models (i.e., feature subsets) under the Neyman-Pearson (NP) paradigm
• Idea: prediction-based model selection
  • Compare two feature subsets based on the type II errors of their corresponding NP oracle classifiers
  in contrast to
  • Compare two feature subsets based on the risks of their corresponding classical oracle classifiers
• Question: Will the model selection results differ under the two paradigms?
Prediction-based model selection under the two paradigms (population)
A linear discriminant analysis (LDA) example
Two features X{1}, X{2} ∈ ℝ with the following class-conditional distributions:
X{1} | (Y = 0) ∼ N(−5, 2²),  X{1} | (Y = 1) ∼ N(0, 2²),
X{2} | (Y = 0) ∼ N(−5, 2²),  X{2} | (Y = 1) ∼ N(1.5, 3.5²).
We would like to select the better of the two features
• Classical oracle classifiers:
  R(φ*_{1}) = 0.106 < R(φ*_{2}) = 0.113, so X{1} is the better feature
• Neyman-Pearson (NP) oracle classifiers: compare R1(φ*_α,{1}) vs. R1(φ*_α,{2})
  • α = 0.01: R1(φ*_α,{1}) = 0.431 > R1(φ*_α,{2}) = 0.299 ⇒ X{2} better
  • α = 0.20: R1(φ*_α,{1}) = 0.049 < R1(φ*_α,{2}) = 0.084 ⇒ X{1} better
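The NP-side numbers in this example can be reproduced in closed form. Assuming the classifier thresholds the feature directly (the monotone LDA-type score), the NP oracle puts its cutoff at the (1 − α) quantile of the class 0 distribution, and its type II error is the class 1 probability below that cutoff. A minimal sketch (the helper name `np_oracle_type2` is ours):

```python
from statistics import NormalDist

def np_oracle_type2(alpha, mu0, sd0, mu1, sd1):
    """Type II error of the NP oracle that classifies as class 1 iff
    X > c, with c set so the type I error P(X > c | Y=0) equals alpha."""
    c = NormalDist(mu0, sd0).inv_cdf(1 - alpha)  # (1 - alpha) quantile of class 0
    return NormalDist(mu1, sd1).cdf(c)           # P(X <= c | Y=1)

for alpha in (0.01, 0.20):
    r1_x1 = np_oracle_type2(alpha, -5, 2, 0, 2)      # feature X{1}
    r1_x2 = np_oracle_type2(alpha, -5, 2, 1.5, 3.5)  # feature X{2}
    print(f"alpha={alpha}: R1(X1)={r1_x1:.3f}, R1(X2)={r1_x2:.3f}")
```

At α = 0.01 this gives 0.431 vs. 0.299 (so X{2} wins); at α = 0.20 it gives 0.049 vs. 0.084 (so X{1} wins), matching the slide and showing that the preferred feature can flip with α.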
Prediction-based model selection under the two paradigms (population)
Special scenarios where prediction-based model selection is the same under the two paradigms

Lemma 1
Suppose
X{1} | (Y = 0) ∼ N(µ^0_1, (σ^0_1)²),  X{1} | (Y = 1) ∼ N(µ^1_1, (σ^1_1)²),
X{2} | (Y = 0) ∼ N(µ^0_2, (σ^0_2)²),  X{2} | (Y = 1) ∼ N(µ^1_2, (σ^1_2)²).
Let α ∈ (0, 1), let φ*_α,{1} and φ*_α,{2} be the NP oracle classifiers based on X{1} and X{2} ∈ ℝ respectively, and let φ*_{1} and φ*_{2} be the corresponding classical oracle classifiers. If
σ^0_2 / σ^1_2 = σ^0_1 / σ^1_1,
then
sign(R1(φ*_α,{1}) − R1(φ*_α,{2})) = sign(R(φ*_{1}) − R(φ*_{2}))
for all α. The reverse statement also holds.
Prediction-based model selection under the two paradigms (population)
Special scenarios where prediction-based model selection is the same under the two paradigms

Lemma 2
Let A1, A2 ⊆ {1, . . . , d} be two index sets. For a random vector X ∈ ℝ^d, let X_{A1} and X_{A2} be the sub-vectors of X comprising the coordinates with indices in A1 and A2 respectively, and assume they follow the class-conditional distributions:
X_{A1} | (Y = 0) ∼ N(µ^0_1, Σ1),  X_{A1} | (Y = 1) ∼ N(µ^1_1, Σ1),
X_{A2} | (Y = 0) ∼ N(µ^0_2, Σ2),  X_{A2} | (Y = 1) ∼ N(µ^1_2, Σ2),
where µ^i_j ∈ ℝ^ℓ denotes the mean vector of feature set Aj in class i, and Σj ∈ ℝ^(ℓ×ℓ) denotes the variance-covariance matrix of feature set Aj, j = 1, 2, i = 0, 1.
Then the selected feature set A*_α = A1 or A2 under the NP paradigm is invariant to α and is consistent with the selected feature set A* under the classical paradigm.
Model selection under the Neyman-Pearson paradigm (sample)
• Goal: Develop a practical prediction-based model selection criterion under the NP paradigm
• Idea: Compare two feature subsets based on the estimated type II errors of their corresponding NP classifiers (constructed by our NP umbrella algorithm)
• Question: How do we construct a "good" estimator of the type II error of an NP classifier?
  Leave out some class 1 data!
The model selection criterion: NPC (Neyman-Pearson Criterion)
Statistical formulation
Given α, δ ∈ [0, 1], a classification method, and a feature subset A ⊆ {1, . . . , p}, a practical NP classifier φ̂_α,A is constructed by the NP umbrella algorithm. Then the NPC for model A at level α is defined as
NPC_α,A := R̂1(φ̂_α,A),
where R̂1(φ) is the estimated type II error of a classifier φ on the left-out class 1 data
• Sample splitting: split the training data into three parts
  • a mixed class 0 and class 1 sample }
  • a left-out class 0 sample          } NP umbrella algorithm ⇒ φ̂_α,A
  • a left-out class 1 sample: apply φ̂_α,A ⇒ NPC_α,A
• Multiple random splits can be used to construct an ensemble estimator with a smaller variance
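A minimal sketch of how one split of the NPC could be computed for a one-dimensional score, assuming the umbrella algorithm's order-statistic threshold rule from Tong, Feng and Li (2018): pick the smallest rank k among the n left-out class 0 scores whose violation probability ℙ(R0(φ̂) > α) bound is at most δ, threshold there, and estimate the type II error on the left-out class 1 sample. The function names are ours, and the score here is simply the raw feature value (no training step):

```python
import math
import random

def pick_rank(n, alpha, delta):
    # Smallest order-statistic rank k (1-indexed) among n class-0 scores
    # such that thresholding at the k-th order statistic guarantees
    # P(type I error > alpha) <= sum_{j=k}^{n} C(n,j)(1-alpha)^j alpha^(n-j) <= delta.
    for k in range(1, n + 1):
        viol = sum(math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
                   for j in range(k, n + 1))
        if viol <= delta:
            return k
    return None  # n is too small to honor (alpha, delta)

def npc_one_split(scores0, scores1, alpha, delta):
    # One-split NPC: threshold at the chosen order statistic of the
    # left-out class-0 scores, then estimate the type II error as the
    # fraction of left-out class-1 scores at or below the threshold.
    k = pick_rank(len(scores0), alpha, delta)
    threshold = sorted(scores0)[k - 1]  # classify as class 1 iff score > threshold
    return sum(s <= threshold for s in scores1) / len(scores1)

random.seed(7)
scores0 = [random.gauss(-5, 2) for _ in range(100)]  # left-out class 0
scores1 = [random.gauss(0, 2) for _ in range(200)]   # left-out class 1
print(pick_rank(100, 0.05, 0.05), npc_one_split(scores0, scores1, 0.05, 0.05))
```

With α = δ = 0.05 and n = 100 left-out class 0 scores, the rank works out to k = 99 (and no valid rank exists below n = 59, which is why the umbrella algorithm needs a minimum class 0 sample size). Averaging this estimate over multiple random splits yields the ensemble estimator with smaller variance.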
Theoretical properties of NPC
Concentration of R̂1(φ̂_α,A) at R1(φ*_α,A)
Given reasonable conditions on
(1) the two class-conditional distributions of X | (Y = 0) and X | (Y = 1),
(2) the scoring function of the given classification method,
(3) the two class sample sizes,
we can show that with probability at least 1 − δ′,
|R̂1(φ̂_α,A) − R1(φ*_α,A)| ≤ C(δ′),
where
• R1(φ*_α,A) is the population type II error of the method-specific oracle classifier φ*_α,A (the classifier that shares the same scoring function as the "best" classical classifier constrained by the classification method)
• The deviation upper bound C(δ′) increases as δ′ decreases; also, C(δ′) converges to zero as the sample sizes go to infinity
Real data application: DNA methylation features for cancer diagnosis
A glance at the data (Fleischer et al., 2014 Genome Biology):
• 46 normal (class 1) tissues vs. 239 breast cancer (class 0) tissues
• Methylation levels are measured at 468,424 genome locations in every tissue
• After preprocessing and normalization, d = 19,363 and n = 285
Real data application: DNA methylation features for cancer diagnosis
Use L1-penalized logistic regression to generate a solution path
⇒ 41 nested feature sets
Apply model selection criteria:
[Four scatter plots of criterion value vs. feature subset index (0-40), one per criterion: AIC, BIC, Classical Criterion, and NPC; the NPC panel marks the gene ZNF646]
Remove: ZNF646 . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots (AIC, BIC, Classical Criterion, NPC) recomputed with ZNF646 removed; the NPC panel marks ERAP1]
Remove: ZNF646, ERAP1 . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots recomputed with ZNF646 and ERAP1 removed; the NPC panel marks LOC121952 (METTL21EP)]
Remove: ZNF646, ERAP1, LOC121952 . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots recomputed with ZNF646, ERAP1, and LOC121952 (METTL21EP) removed; the NPC panel marks GEMIN4]
Remove: ZNF646, ERAP1, LOC121952, GEMIN4 . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots recomputed with ZNF646, ERAP1, LOC121952 (METTL21EP), and GEMIN4 removed; the NPC panel marks BATF]
Remove: ZNF646, ERAP1, LOC121952, GEMIN4, BATF . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots recomputed with ZNF646, ERAP1, LOC121952 (METTL21EP), GEMIN4, and BATF removed; the NPC panel marks MIR21]
Remove: ZNF646, ERAP1, LOC121952, GEMIN4, BATF, MIR21 . . .
Real data application: DNA methylation features for cancer diagnosis
[The four criterion plots recomputed with ZNF646, ERAP1, LOC121952 (METTL21EP), GEMIN4, BATF, and MIR21 removed]
No obvious rise in NPC
Real data application: DNA methylation features for cancer diagnosis
Among the 41 genes,

           protein-coding   microRNA   pseudogene
remained   32 (20)          3          0
removed    4 (2)            1          1

(·) is the number of genes that express proteins in breast cancer tissues according to the Human Protein Atlas database
Observations:
• 9 out of the 32 remaining genes do not yet have available protein expression data in breast cancer tissues in the Human Protein Atlas database
• A specificity higher than 90% is achievable with only three gene markers: HMGB2, MIR195, and SPARCL1. Including SPARCL1 increases the specificity from ∼70% to over 90%
Conclusions
• A new model selection criterion, NPC, tailored for asymmetric binary classification under the NP paradigm
• NPC allows users to select the model that achieves the best specificity among candidate models while maintaining a high sensitivity
• Useful in disease diagnosis and in other applications (network security control, loan screening, and prediction of regional conflicts)
• Flexible with respect to the choice of classification method and the way NP classifiers are constructed
Acknowledgements
Yiling Chen
(UCLA)
Dr. Xin Tong
(USC)