Careful Use of Machine Learning Methods is needed for Mobile Application

A case study on Transportation-mode Detection by Yu et al. (2013)

Presented by

Muhammad Awais Shafique

Contents

1. Introduction
2. Transportation-mode detection
3. Practical use of SVM
4. Pitfall of CV accuracy
5. Model size reduction
6. Fast training by optimization
7. Multi-class SVM method
8. Non-machine learning issues
9. Conclusion

Introduction

• Machine learning methods are often applied as a black box.
• An example is transportation-mode detection.
• Collect data, apply algorithms, and compare results.
• Default settings may not be the best.
• The evaluation criterion (e.g. cross-validation) may not be appropriate.
• Some methods may not be applicable due to the resource constraints of mobile phones.

This paper focuses on using SVM and how its performance can be optimized.

Transportation-mode Detection

• The detector can use only up to 16 KB of memory.
• Data consist of log files containing signals from the gyroscope, accelerometer, and magnetometer.

• Classification was done among Still, Walk, Run, Bike, and Others.

• Five features were extracted by calculating the mean or standard deviation of the signals.

• Decision trees, AdaBoost, and SVM were employed.

Transportation-mode Detection

• Results  

Classifiers     CV accuracy (%)   Model size (KB)
Decision Tree   89.41             76.02
AdaBoost        91.11             1500.54
SVM             84.72             1379.97

Practical use of SVM

• The worse SVM performance may be due to the lack of
  • data scaling
  • parameter selection

• Given label-instance pairs $(y_1, \boldsymbol{x}_1), \ldots, (y_l, \boldsymbol{x}_l)$ with $y_i = \pm 1$, $\boldsymbol{x}_i \in \mathbb{R}^n \ \forall i$ as the training set, SVM solves the primal problem shown below.
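The slide's equation image is not preserved in this transcript; the standard soft-margin primal formulation, as used by LIBSVM, is:

```latex
\min_{\boldsymbol{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\boldsymbol{w}^T\boldsymbol{w} + C\sum_{i=1}^{l}\xi_i
\quad \text{subject to} \quad y_i\left(\boldsymbol{w}^T\phi(\boldsymbol{x}_i) + b\right) \ge 1 - \xi_i,\; \xi_i \ge 0,\; i = 1,\ldots,l
```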

Practical use of SVM (Cont.)

• Because $\boldsymbol{w}$ can become a huge vector, the dual optimization problem is solved instead (shown below).
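The standard dual form, consistent with the primal above:

```latex
\min_{\boldsymbol{\alpha}} \quad \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} - \boldsymbol{e}^T \boldsymbol{\alpha}
\quad \text{subject to} \quad \boldsymbol{y}^T \boldsymbol{\alpha} = 0,\; 0 \le \alpha_i \le C,\; i = 1,\ldots,l
```

where $Q_{ij} = y_i y_j K(\boldsymbol{x}_i, \boldsymbol{x}_j)$, $K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \phi(\boldsymbol{x}_i)^T \phi(\boldsymbol{x}_j)$, and $\boldsymbol{e}$ is the vector of all ones.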

Practical use of SVM (Cont.)

• $K(\boldsymbol{x}_i, \boldsymbol{x}_j)$ is the kernel function.
• The default kernel function in LIBSVM is the RBF (Gaussian) kernel, shown below.
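For reference, the RBF kernel as defined in LIBSVM:

```latex
K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp\left(-\gamma \lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert^2\right), \quad \gamma > 0
```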

• The optimal solution satisfies the expansion below.
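From the KKT optimality conditions of the dual (standard SVM theory):

```latex
\boldsymbol{w} = \sum_{i=1}^{l} \alpha_i y_i \phi(\boldsymbol{x}_i)
```

so the model must store the nonzero $\alpha_i$ together with their support vectors.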

Practical use of SVM (Cont.)

• Linear scaling of the features is done (a common form is shown below).
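A common linear scaling; the exact target range on the slide is not preserved, so [0, 1] is assumed here:

```latex
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
```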


Practical use of SVM (Cont.)

• Parameter selection:
  • regularization parameter $C$
  • kernel parameter ($\gamma$ in the case of the RBF kernel)

• Select the pair achieving the best five-fold CV accuracy (a grid-search sketch follows).
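A minimal grid-search sketch with scikit-learn; the placeholder data and the grid values (taken from the common LIBSVM guide ranges) are assumptions, not the paper's exact setup:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# X: feature vectors (e.g. means/std-devs of sensor signals), y: mode labels.
# Placeholder random data; replace with the real feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 5, size=500)

# Scale features linearly, then search C and gamma on an exponential grid,
# keeping the pair with the best five-fold CV accuracy.
X_scaled = MinMaxScaler().fit_transform(X)
param_grid = {"C": 2.0 ** np.arange(-5, 16, 2),
              "gamma": 2.0 ** np.arange(-15, 4, 2)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_scaled, y)
print(search.best_params_, search.best_score_)
```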

Practical use of SVM (Cont.)

• Results  

SVM procedures                         CV accuracy (%)
Linear scaling + parameter selection   89.20
Log scaling + parameter selection      90.48

Pitfall of CV accuracy

• Although CV accuracy is the most widely used evaluation measure, it can over-estimate the real performance.

• Assume each user records 10 log files and each log file generates 100 feature vectors.

Pitfall of CV accuracy (Cont.)

• Feature vectors in the same log file share some information.
• In the CV procedure, if data from one log file appear in both the training and validation sets, prediction becomes easy.

Pitfall of CV accuracy (Cont.)

• Therefore the standard instance-wise split of the data may easily over-estimate the real performance.

• To eliminate the sharing of meta-information, the data split should be made at a higher level, such as logs or users (a group-wise CV sketch follows the table below).

CV strategy        SVM CV accuracy (%)
Instance-wise CV   90.48
Log-wise CV        83.37
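A log-wise split can be expressed with scikit-learn's GroupKFold, grouping feature vectors by their source log file (illustrative, not the paper's implementation; data shapes are assumptions):

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

# Placeholder data: 100 logs x 10 vectors each; replace with real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 5, size=1000)
log_ids = np.repeat(np.arange(100), 10)  # which log file each vector came from

# Group-wise CV: all vectors of a log stay on the same side of each split,
# so no meta-information leaks from training into validation.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5), groups=log_ids)
print(scores.mean())
```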

Pitfall of CV accuracy (Cont.)

• Although log-wise CV is more reasonable, it is better to have an independent test set collected by a completely different group of users.

• The result confirms that instance-wise CV may severely over-estimate.

Classifiers     CV accuracy (%)   Test accuracy (%)   Model size (KB)
Decision Tree   89.41             77.77               76.02
AdaBoost        91.11             78.84               1500.54
SVM             90.48             85.14               1379.97

Pitfall of CV accuracy (Cont.)

• Similarly, in "Towards physical activity diary: motion recognition using simple acceleration features with mobile phones" by J. Yang (2009):

Reported CV accuracy     80 – 90 %
Reported test accuracy   < 70 %

                2 folds   5 folds   8 folds
CV accuracy     85.05     83.37     82.21
Test accuracy   85.33     85.14     84.66

Model size reduction

• Although good accuracy is achieved, the model size is much larger than 16 KB.

• The large size is due to the storage of the optimal solution $\boldsymbol{\alpha}$ and the support vectors.
• Because it is a multi-class problem and LIBSVM uses the one-against-one method, for a k-class problem the model size grows with the number of support vectors (see the estimate below), where n is the number of features.
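The slide's exact size formula is not preserved; a standard estimate for a LIBSVM one-against-one model, assuming each support vector is stored once with its n feature values plus k − 1 dual coefficients, is:

```latex
\text{model size} \approx \#\mathrm{SVs} \times (n + k - 1) \times \text{bytes per number}
```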

Model size reduction (Cont.)

• To reduce the model size, use a polynomial kernel (shown below), where $\gamma$ is the kernel parameter and $d$ is the degree.
• The kernel is the inner product of two vectors $\phi(\boldsymbol{x}_i)$ and $\phi(\boldsymbol{x}_j)$.
• If $d = 3$, the mapping $\phi(\boldsymbol{x})$ is low-dimensional and can be written out explicitly.
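The general polynomial kernel (the coefficient $r$ is commonly fixed, e.g. to 1; the slide's exact choice is not preserved):

```latex
K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \left(\gamma\, \boldsymbol{x}_i^T \boldsymbol{x}_j + r\right)^d
```

For $d = 3$ and $n$ features, $\phi(\boldsymbol{x})$ has only $\binom{n+3}{3}$ dimensions, so $\boldsymbol{w}$ can be stored explicitly.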

Model size reduction (Cont.)

• Only $\boldsymbol{w}$ and $b$ need to be stored.

• For d = 3, the model size turns out to be 2.28 KB (an expansion sketch follows).
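A sketch of the idea with scikit-learn: expand the features into the explicit degree-3 polynomial space, train a linear SVM there, and store only the resulting weight vectors. This substitutes LinearSVC for the paper's solver, purely as an illustration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Placeholder data; replace with the real scaled feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 5, size=500)

# Explicit degree-3 map phi(x): with n = 5 features this has only
# C(5 + 3, 3) = 56 dimensions, so w fits easily in a small model.
model = make_pipeline(PolynomialFeatures(degree=3), LinearSVC(C=1.0))
model.fit(X, y)

w = model.named_steps["linearsvc"].coef_  # one weight vector per class (one-vs-rest)
print(w.shape)  # small: (5, 56)
```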

Model size reduction (Cont.)

• Comparison  among  kernels

SVM method          Test accuracy (%)   Model size (KB)
RBF kernel          85.33               1287.15
Polynomial kernel   84.79               2.28
Linear kernel       78.51               0.24

Fast training by optimization

• The training of kernel SVM is known to be slow.
• Because $K(\boldsymbol{x}_i, \boldsymbol{x}_j)$ is used rather than $\phi(\boldsymbol{x}_i)$ or $\phi(\boldsymbol{x}_j)$ explicitly, the setting is very restricted.
• For linear SVM the optimization problem becomes the form below, where $\xi(\boldsymbol{w}; \boldsymbol{x}_i, y_i)$ is the loss function.
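This is the standard LIBLINEAR objective:

```latex
\min_{\boldsymbol{w}} \quad \frac{1}{2}\boldsymbol{w}^T\boldsymbol{w} + C \sum_{i=1}^{l} \xi(\boldsymbol{w}; \boldsymbol{x}_i, y_i)
```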

Fast training by optimization (Cont.)

• Commonly used loss functions are listed below.
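The three standard choices (hinge, squared hinge, and logistic):

```latex
\begin{aligned}
\xi_{\mathrm{L1}}(\boldsymbol{w}; \boldsymbol{x}, y) &= \max\left(0,\, 1 - y\,\boldsymbol{w}^T\boldsymbol{x}\right) \\
\xi_{\mathrm{L2}}(\boldsymbol{w}; \boldsymbol{x}, y) &= \max\left(0,\, 1 - y\,\boldsymbol{w}^T\boldsymbol{x}\right)^2 \\
\xi_{\mathrm{LR}}(\boldsymbol{w}; \boldsymbol{x}, y) &= \log\left(1 + e^{-y\,\boldsymbol{w}^T\boldsymbol{x}}\right)
\end{aligned}
```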

• The three loss functions are related, so they give similar test results.

Fast training by optimization (Cont.)

• Comparison scenarios:
  I.   LIBSVM: polynomial kernel with hinge loss.
  II.  LIBLINEAR (primal): linear SVM with squared hinge loss.
  III. LIBLINEAR (dual): linear SVM with squared hinge loss.

Fast training by optimization (Cont.)

• LIBSVM and LIBLINEAR (primal) give similar accuracy.
• The training time of LIBSVM is significantly higher.
• In theory, the primal and dual solvers give exactly the same model.

                    LIBSVM     LIBLINEAR (primal)   LIBLINEAR (dual)
Test accuracy (%)   84.79      84.52                84.31
Training time (s)   30519.10   1368.25              4039.20

Multi-class SVM

• SVM is designed for two-class classification.
• For multi-class problems, two methods are used:
  • one-against-one (stores k(k − 1)/2 weight vectors)
  • one-against-rest (stores k weight vectors)

• For 5 transport modes, we need 10 and 5 vectors respectively (see the sketch below).
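A sketch of the two strategies using scikit-learn's meta-estimators around a linear SVM (illustrative, not the paper's implementation):

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Placeholder data with k = 5 modes; replace with real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 5, size=500)

# One-against-one: k(k - 1)/2 = 10 binary classifiers for k = 5.
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
print(len(ovo.estimators_))  # 10

# One-against-rest: k = 5 binary classifiers.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
print(len(ovr.estimators_))  # 5
```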

Multi-class SVM (Cont.)

• Results    

SVM method         Test accuracy (%)   Model size (KB)
One-against-one    84.52               2.24
One-against-rest   83.95               1.12

Multi-class SVM (Cont.)

• Two hierarchical arrangements reduce the storage further:
  a. Hierarchy 1: 1 + 4(4 − 1)/2 = 7 weight vectors
  b. Hierarchy 2: 1 + 1 + 3(3 − 1)/2 = 5 weight vectors

Multi-class SVM (Cont.)

• Results  

SVM method         Test accuracy (%)   Model size (KB)
One-against-one    84.52               2.24
One-against-rest   83.95               1.12
Hierarchy 1        84.46               1.57
Hierarchy 2        84.53               1.12

Non-machine learning issues

• Feature engineering

• Extracting important features is one of the most crucial steps.
• Two frequency-domain features were added (a computation sketch follows):

  a. Peak magnitude: index of the highest FFT value.
  b. Ratio: ratio between the largest and second largest FFT values.
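A minimal sketch of the two features for one signal window; the windowing, sampling rate, and DC handling here are assumptions:

```python
import numpy as np

def fft_features(signal: np.ndarray) -> tuple[float, float]:
    """Return (peak index, largest/second-largest ratio) of the FFT magnitudes."""
    mags = np.abs(np.fft.rfft(signal))[1:]     # drop the DC component
    order = np.argsort(mags)
    peak_index = float(order[-1])              # index of the highest FFT value
    ratio = mags[order[-1]] / mags[order[-2]]  # largest vs. second largest
    return peak_index, float(ratio)

# Example: a noisy 2 Hz sine sampled at 32 Hz for 4 seconds.
t = np.arange(0, 4, 1 / 32)
x = np.sin(2 * np.pi * 2 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(fft_features(x))
```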

Non-machine learning issues (Cont.)

• Results  

CV strategy        5 features   Adding 2 FFT features
Instance-wise CV   89.90        92.98
Log-wise CV        85.05        89.26
Test accuracy      85.33        91.53

Non-machine learning issues (Cont.)

• Use of domain knowledge

• Use information from past predictions.
• Save power by not enabling the classifier in some situations.

Conclusion

• Direct use of a machine learning method may not give satisfactory results.

• The evaluation criterion must be chosen carefully; this study showed that standard CV accuracy can severely over-estimate the real performance.

• Practitioners should take care while employing classifiers and should have a deeper understanding of the methodology.