Be naive. Not idiot.
Leveraging the class-conditional independence assumption.
Sylvain Ferrandiz
21 February 2017
Class-conditional independence assumption
Often called simple, naive, or even idiot*
* Idiot's Bayes - not so stupid after all? Hand, D.J., & Yu, K. (2001). International Statistical Review, 69(3):385-399. ISSN 0306-7734.
$$\arg\max_y p(y \mid x_1, \dots, x_K) = \arg\max_y p(y) \prod_{k=1}^{K} p(x_k \mid y)$$
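A minimal Python sketch of the decision rule above for categorical features, assuming the class-conditional probabilities are estimated by counting. Laplace smoothing (`alpha`) is an assumption of this sketch, not something the slides prescribe, and `fit_naive_bayes` / `predict` are hypothetical names. Summing logs instead of multiplying probabilities gives the same argmax and avoids underflow when K is large.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(X, y, alpha=1.0):
    """Count-based estimates of p(y) and p(x_k | y) for categorical features."""
    class_counts = Counter(y)
    n_features = len(X[0])
    # value_counts[k][c][v]: number of rows of class c whose k-th feature equals v
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    values = [set() for _ in range(n_features)]
    for row, c in zip(X, y):
        for k, v in enumerate(row):
            value_counts[k][c][v] += 1
            values[k].add(v)
    return {"classes": class_counts, "counts": value_counts,
            "values": values, "alpha": alpha, "n": len(y)}


def predict(model, row):
    """argmax_y of log p(y) + sum_k log p(x_k | y)."""
    best_class, best_score = None, -math.inf
    for c, n_c in model["classes"].items():
        score = math.log(n_c / model["n"])
        for k, v in enumerate(row):
            count = model["counts"][k][c][v]
            # Laplace smoothing (assumption) keeps unseen values from zeroing the product
            score += math.log((count + model["alpha"]) /
                              (n_c + model["alpha"] * len(model["values"][k])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

For example, fitting on rows `[["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"], ["rainy", "cool"]]` with labels `["no", "no", "yes", "yes"]` and predicting `["sunny", "hot"]` returns `"no"`.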
Estimate univariate distributions
Parametric assumption?
→ Gaussian mixture
→ Multinomial distribution
Nonparametric assumption?
→ Kernels
→ Binning (discretization / grouping)
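For the parametric route, a minimal sketch: one Gaussian per class for a numeric feature, which is the one-component special case of a Gaussian mixture. The function names and the variance floor are assumptions of this sketch.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian_per_class(x, y):
    """Maximum-likelihood mean and variance of a numeric feature within each class."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    # variance floor (assumption) avoids division by zero for constant groups
    return {c: (fmean(v), max(pvariance(v), 1e-9)) for c, v in groups.items()}


def gaussian_log_density(value, mean, var):
    """log N(value; mean, var), usable as log p(x_k | y)."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)
```

The parametric choice is cheap but fails when the real class-conditional shape is skewed or multimodal, which is what motivates the nonparametric options.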
Winning binning?
Outliers
Missing values
Stability
* MODL: a Bayes optimal discretization method for continuous attributes. Boullé, M. (2006). Machine Learning, 65(1):131-165.
No parameter to validate
O(n log n)
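To illustrate why binning copes with outliers and missing values, a minimal sketch of equal-frequency binning. This is not MODL: MODL chooses the number and position of cut points itself by optimizing a Bayesian criterion, whereas `n_bins` here is a hypothetical user parameter, exactly the kind of parameter MODL avoids having to validate. The `MISSING` code is also hypothetical. The sort dominates the cost, hence O(n log n).

```python
import bisect
import math

MISSING = -1  # hypothetical code for the dedicated missing-value bin


def equal_frequency_cuts(values, n_bins):
    """Cut points splitting the observed (non-missing) values into roughly equal-count bins."""
    observed = sorted(v for v in values if v is not None and not math.isnan(v))  # O(n log n)
    if not observed:
        return []
    cuts = []
    for i in range(1, n_bins):
        cut = observed[i * len(observed) // n_bins]
        if not cuts or cut > cuts[-1]:  # skip duplicate cut points caused by ties
            cuts.append(cut)
    return cuts


def bin_index(value, cuts):
    """Bin of a value: missing values get their own bin, outliers land in an extreme bin."""
    if value is None or math.isnan(value):
        return MISSING
    return bisect.bisect_right(cuts, value)
```

Once every value is mapped to a bin index, p(x_k | y) can be estimated from per-class counts of bin indices, as for any categorical feature. An extreme value such as 1000 only shifts which bin it falls in, not the estimate of a mean or variance.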
Selective Naive Bayes
On predictive distributions and Bayesian networks. Kontkanen, P., Myllymäki, P., Silander, T., Tirri, H., & Grünwald, P. (2000). Statistics and Computing, 10:39-54.
With $s_k \in \{0, 1\}$:
$$\arg\max_y p(y \mid x_1, \dots, x_K) = \arg\max_y p(y) \prod_{k=1}^{K} p(x_k \mid y)^{s_k}$$
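A minimal sketch of the selective rule for one instance, assuming the per-class log-likelihoods log p(x_k | y) have already been computed (for instance from binned counts). Because p^0 = 1, a feature with s_k = 0 simply drops out of the product. `selective_nb_predict` is a hypothetical name.

```python
import math


def selective_nb_predict(log_prior, log_likelihoods, s):
    """argmax_y of log p(y) + sum_k s_k * log p(x_k | y) for one instance.

    log_prior: {class: log p(y)}
    log_likelihoods: one {class: log p(x_k | y)} dict per feature k
    s: per-feature exponents s_k (0 drops the feature, 1 keeps it)
    """
    def score(c):
        return log_prior[c] + sum(s_k * ll[c] for s_k, ll in zip(s, log_likelihoods))

    return max(log_prior, key=score)
```

Since the exponents are only multiplied in, the same function also accepts real-valued weights in place of binary s_k.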
Select features
An introduction to variable and feature selection. Guyon, I., & Elisseeff, A. (2003). Journal of Machine Learning Research, 3(Mar):1157-1182.
Wrapper approach?
Embedded approach?
→ Greedy optimization
→ Nested subsets
→ Direct objective optimization
Filter approach?
→ Mutual information
→ Weak learner
→ Cross-validation
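As an illustration of a filter criterion, a plug-in estimate of the mutual information between a discrete (for instance binned) feature and the class, I(X; Y) = Σ p(x, y) log(p(x, y) / (p(x) p(y))). This is a sketch, not the selection method the deck goes on to advocate; a filter would rank features by this score independently of any classifier.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += n_ab / n * math.log(n_ab * n / (px[a] * py[b]))
    return mi
```

Ranking then amounts to sorting features by `mutual_information(column, labels)` in decreasing order; pairwise redundancy between features is ignored, which is the usual weakness of filters.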
Forward Feature Selection
[Diagram: features A, B, C, D and E distributed between the pool of actual candidates, the pool of future candidates, and the features included in the model]
Draw independently
Include iff the AUROCC is improved
Keep it safe otherwise
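A minimal sketch of the forward selection loop, under stated assumptions: "Draw independently" is read as visiting candidates in random order, and "Keep it safe otherwise" as setting a rejected feature aside in the pool of future candidates for a later pass. `evaluate` is a hypothetical callback that trains the model on a feature subset and returns its AUROC on held-out data; `auroc` computes that score as the Mann-Whitney statistic.

```python
import random


def auroc(scores, labels):
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_selection(features, evaluate, max_passes=10, seed=0):
    """Greedy forward selection: include a feature iff it strictly improves the score."""
    rng = random.Random(seed)
    selected = []                 # features included in the model
    best = evaluate(selected)
    actual = list(features)       # pool of actual candidates
    for _ in range(max_passes):
        future = []               # pool of future candidates
        rng.shuffle(actual)       # assumption: "draw independently" = random order
        improved = False
        for f in actual:
            score = evaluate(selected + [f])
            if score > best:      # include iff the AUROC improves
                selected.append(f)
                best = score
                improved = True
            else:                 # assumption: keep it aside for a later pass
                future.append(f)
        if not improved or not future:
            break
        actual = future
    return selected, best
```

In practice `evaluate(subset)` would fit the selective naive Bayes on `subset` and return `auroc(scores, labels)` on validation data. The strict `>` means a feature that leaves the AUROC unchanged is not included.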
The averaging trick
$$w_k = \frac{\sum_{s \in S} s_k \, p(s \mid d)}{\sum_{s \in S} p(s \mid d)}$$
* A Parameter-Free Classification Method for Large Scale Learning. Boullé, M. (2009). Journal of Machine Learning Research, 10:1367-1385.
Explored models only
Nonparametric prior
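A minimal sketch of the averaging trick: given the explored subsets s and their unnormalized log posteriors log p(s | d), compute the weights w_k. How p(s | d) is obtained (Boullé's nonparametric prior) is not reproduced here; the log posteriors are hypothetical inputs. Only ratios of p(s | d) matter, so normalizing in log space with the max trick avoids overflow.

```python
import math


def averaged_weights(subsets, log_posteriors):
    """w_k = sum_s s_k p(s | d) / sum_s p(s | d), over the explored subsets only.

    subsets: list of 0/1 vectors s, one per explored model
    log_posteriors: log p(s | d) up to an additive constant, one per subset
    """
    m = max(log_posteriors)
    weights = [math.exp(lp - m) for lp in log_posteriors]  # rescaled p(s | d)
    total = sum(weights)
    n_features = len(subsets[0])
    return [sum(w * s[k] for w, s in zip(weights, subsets)) / total
            for k in range(n_features)]
```

The resulting w_k lie in [0, 1] and can presumably stand in for s_k as soft exponents, p(x_k | y)^{w_k}, so that no single subset has to be picked.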
+ Performance
+ Algorithm complexity
+ Nonparametric and stable (bye bye cross-validation!)
+ Numeric / Categorical (bye bye dummy-encoding!)
+ Interpretable*
- It's up to the user to find 'composite' features and capture correlational relationships, but…
  …it's where the fun is, ain't it?
*https://www.quora.com/What-makes-a-model-interpretable/answer/Claudia-Perlich
Feature surfacing
Users: CustomerId, Firstname, Lastname, Age
Sales: CustomerId, Product, Amount, Time
Web: CustomerId, Page, Time
Resulting table:
Users.Customer_Id
Users.Firstname
Users.Lastname
Users.Age
Outcome
Count(Sales.Product)
CountDistinct(Sales.Product)
Mean(Sales.Amount)
Sum(Sales.Amount) where Sales.Product = 'Mobile Data'
Count(Web.Page) where Day(Web.Time) in [6;7]
…
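A minimal standard-library sketch of how a few of the surfaced columns could be computed from the Sales and Web tables by grouping rows on CustomerId. The toy rows and the name `surface_features` are hypothetical, and reading `Day(Web.Time) in [6;7]` as the ISO day of week (6 = Saturday, 7 = Sunday) is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean

# Hypothetical toy rows following the Sales and Web schemas
sales = [
    {"CustomerId": 1, "Product": "Mobile Data", "Amount": 20.0, "Time": datetime(2017, 2, 18)},
    {"CustomerId": 1, "Product": "Phone", "Amount": 300.0, "Time": datetime(2017, 2, 20)},
    {"CustomerId": 2, "Product": "Mobile Data", "Amount": 10.0, "Time": datetime(2017, 2, 21)},
]
web = [
    {"CustomerId": 1, "Page": "/offers", "Time": datetime(2017, 2, 18)},  # Saturday
    {"CustomerId": 1, "Page": "/help", "Time": datetime(2017, 2, 21)},    # Tuesday
    {"CustomerId": 2, "Page": "/offers", "Time": datetime(2017, 2, 19)},  # Sunday
]


def surface_features(customer_ids, sales, web):
    """One row of aggregated features per customer, keyed by CustomerId."""
    sales_by_customer = defaultdict(list)
    for row in sales:
        sales_by_customer[row["CustomerId"]].append(row)
    web_by_customer = defaultdict(list)
    for row in web:
        web_by_customer[row["CustomerId"]].append(row)

    table = {}
    for cid in customer_ids:
        s = sales_by_customer.get(cid, [])
        w = web_by_customer.get(cid, [])
        table[cid] = {
            "Count(Sales.Product)": len(s),
            "CountDistinct(Sales.Product)": len({r["Product"] for r in s}),
            "Mean(Sales.Amount)": fmean(r["Amount"] for r in s) if s else None,
            "Sum(Sales.Amount) where Sales.Product = 'Mobile Data'":
                sum(r["Amount"] for r in s if r["Product"] == "Mobile Data"),
            # assumption: Day() = ISO day of week, so [6;7] = Saturday and Sunday
            "Count(Web.Page) where Day(Web.Time) in [6;7]":
                sum(1 for r in w if r["Time"].isoweekday() in (6, 7)),
        }
    return table
```

Each surfaced column is then just another feature to bin and to offer to the selective naive Bayes, which is how 'composite' features enter a model that otherwise treats features independently.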