MapReduce based SVM
1. Introduction  2. Support Vector Machine
3. MapReduce  4. Development of System Model
5. Simulation Results  6. Conclusion
CloudSVM: Training an SVM Classifier in Cloud Computing Systems
F. Ozgur CATAK (1), M. Erdal BALABAN (2)
(1) TUBITAK - National Research Institute of Electronics and Cryptology (UEKAE)
(2) Istanbul University, Faculty of Business Administration, Department of Quantitative Methods
ICPCA / SWS 2012
28 Nov 2012
1 / 22 F. Ozgur CATAK - M. Erdal BALABAN CloudSVM: Training an SVM Classifier in Cloud Computing Systems
Motivation
Our Research Focus
Overcome the high space and time complexity of the Support Vector Machine algorithm
Train SVMs in cloud systems with MapReduce
Use the HDFS file system
Find a global classifier function
Contents
1. Introduction
   1.1 Support Vector Machine
   1.2 SVM Solutions
2. Support Vector Machine
   2.1 Definition
   2.3 Optimization Problem
   2.4 Lagrange Multiplier
3. MapReduce
   3.1 MapReduce - Cloud Computing Algorithm
   3.2 Schematic View of MapReduce
4. Development of System Model
   4.1 Overview
   4.2 CloudSVM Architecture Schematic View
   4.3 CloudSVM Algorithm MapReduce Function
5. Simulation Results
   5.1 Method
   5.2 UCI Dataset Results
   5.3 Convergence of CloudSVM
6. Conclusion
   Conclusion & Recommendation
   References
1. INTRODUCTION
Support Vector Machine - SVM
Developed from Statistical Learning Theory (Vapnik & Chervonenkis)
Supervised learning method in statistics and computer science
Analyzes data and recognizes patterns; used for classification and regression analysis
Aims for maximum generalization accuracy while avoiding overfitting
Issues
Computationally expensive to train
The quadratic optimization problem has O(m^3) time and O(m^2) space complexity, where m is the training set size
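To make these growth rates concrete, the sketch below estimates the memory needed for a dense m x m kernel matrix and the relative training time implied by cubic scaling. It assumes 8-byte floating-point entries and a solver that stores the full matrix; the function names are illustrative and do not come from the slides.

```python
def kernel_matrix_bytes(m, bytes_per_entry=8):
    """Memory for a dense m x m kernel matrix (assumes 8-byte floats)."""
    return m * m * bytes_per_entry


def relative_training_time(m, m_ref):
    """Time ratio versus a reference size under an O(m^3) cost model."""
    return (m / m_ref) ** 3


# Print the estimates for a few training set sizes.
for m in (1_000, 100_000, 1_000_000):
    gib = kernel_matrix_bytes(m) / 2**30
    ratio = relative_training_time(m, 1_000)
    print(f"m={m:>9,}: {gib:,.2f} GiB, {ratio:,.0f}x the time of m=1,000")
```

Under this cost model, doubling the training set multiplies memory by 4 and time by 8, which is why splitting the data across nodes is attractive.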
Solution - Feature Reduction
Singular Value Decomposition (SVD)
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Correlation Based Feature Selection (CFS)
Solution - Distributed Computing
Conventional distributed machine learning methods are complicated
Pre-Configured Intranet/Internet Environments
Costly
2. SUPPORT VECTOR MACHINE
Support Vector Machine
In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis.
An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.
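Let D be a set of n points of the form
$$D = \{(\vec{x}_i, y_i) \mid \vec{x}_i \in \mathbb{R}^m,\ y_i \in \{-1, 1\}\}_{i=1}^{n}$$
For each $\vec{x}_i$ in data set D:
$$\vec{w} \cdot \vec{x}_i - b \ge 1 \quad \text{if } y_i = 1 \tag{1}$$
$$\vec{w} \cdot \vec{x}_i - b \le -1 \quad \text{if } y_i = -1 \tag{2}$$
Or equivalently
$$y_i(\vec{w} \cdot \vec{x}_i - b) \ge 1, \quad \forall (\vec{x}_i, y_i) \in D \tag{3}$$
Each of these two hyperplanes lies at distance $|F(\vec{x}_i)| / \|\vec{w}\| = 1 / \|\vec{w}\|$ from the separating hyperplane, where $F(\vec{x}_i) = \vec{w} \cdot \vec{x}_i - b$ equals ±1 on the two hyperplanes, so the gap between them is $2 / \|\vec{w}\|$.
Maximizing the distance between these two hyperplanes is equivalent to:
$$\text{Minimize: } P(\vec{w}, b) = \tfrac{1}{2}\|\vec{w}\|^2 \quad \text{subject to: } y_i(\langle \vec{w}, \vec{x}_i \rangle - b) \ge 1 \tag{4}$$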
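The constraint and objective above can be checked numerically on a toy data set. The sketch below is only an illustration: the four 2-D points, the weight vector `w`, and the offset `b` are hand-picked assumptions, not values from the slides. It tests whether every point satisfies y_i(w·x_i − b) ≥ 1 and evaluates the objective (1/2)‖w‖².

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_margin(w, b, data):
    """True if every (x, y) pair satisfies y * (w.x - b) >= 1."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in data)


def primal_objective(w):
    """P(w, b) = 0.5 * ||w||^2 (b does not appear in the objective)."""
    return 0.5 * dot(w, w)


# Hypothetical, linearly separable toy data: (point, label).
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
w, b = (0.5, 0.5), 1.0

print("constraints hold:", satisfies_margin(w, b, data))
print("objective:", primal_objective(w))
# Distance from the separating hyperplane to each margin hyperplane.
print("1/||w||:", 1.0 / math.sqrt(dot(w, w)))
```

A smaller ‖w‖ gives a wider gap, but shrinking `w` too far violates the constraints, which is exactly the trade-off the minimization resolves.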
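By introducing Lagrange multipliers $\alpha_i \ge 0$, the previous linearly constrained problem can be expressed as follows.
Optimization Problem
$$\text{Minimize: } P(\vec{w}, b) = \tfrac{1}{2}\|\vec{w}\|^2 \quad \text{subject to: } y_i(\langle \vec{w}, \vec{x}_i \rangle - b) \ge 1 \tag{5}$$
Lagrange Multipliers
$$J(\vec{w}, b, \alpha) = \tfrac{1}{2}\|\vec{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\vec{w} \cdot \vec{x}_i - b) - 1 \right) \tag{6}$$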
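Lagrange Multiplier Solution
Minimize the Lagrange function $J(\vec{w}, b, \alpha)$ with respect to $\vec{w}$ and b. At the saddle point:
$$\text{Condition 1: } \frac{\partial J(\vec{w}, b, \alpha)}{\partial \vec{w}} = 0 \qquad \text{Condition 2: } \frac{\partial J(\vec{w}, b, \alpha)}{\partial b} = 0$$
Solving conditions 1 and 2 gives
$$\vec{w} = \sum_{i=1}^{n} \alpha_i y_i \vec{x}_i \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{7}$$
New Optimization Problem
$$\text{Maximize: } Q = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j \quad \text{subject to: } \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0 \tag{8}$$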
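As a small worked check of (7) and (8), the sketch below evaluates the dual objective Q and recovers the weight vector from a set of multipliers. The toy points and the multipliers `alpha` are hand-picked assumptions for illustration (only two points act as support vectors); no solver is involved.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recover_w(alpha, xs, ys):
    """w = sum_i alpha_i * y_i * x_i, as in equation (7)."""
    dim = len(xs[0])
    return tuple(sum(a * y * x[k] for a, x, y in zip(alpha, xs, ys))
                 for k in range(dim))


def dual_objective(alpha, xs, ys):
    """Q = sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j x_i.x_j."""
    n = len(xs)
    pair_sum = sum(alpha[i] * alpha[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
                   for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * pair_sum


# Hypothetical toy data and multipliers (not from the slides).
xs = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
ys = [1, 1, -1, -1]
alpha = [0.25, 0.0, 0.25, 0.0]

print("sum alpha_i y_i:", sum(a * y for a, y in zip(alpha, ys)))
print("w:", recover_w(alpha, xs, ys))
print("Q:", dual_objective(alpha, xs, ys))
```

Points with a zero multiplier drop out of the sum for w, which is why only the support vectors need to be kept once training is done.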
3. MapReduce - Cloud Computing Algorithm
MapReduce Overview
Breaks a large problem into smaller parts, solves them in parallel, and combines the results.
The programmer specifies map and reduce functions.
Transparent scaling: use the same code on MBs locally or TBs across thousands of machines.
One of the most popular cloud computing models
Elastic framework for software developers building parallel and distributed applications
Input and output files are stored on a distributed file system.
map(key1, value1) ⇒ list(key2, value2)
reduce(key2, list(value2)) ⇒ list(value3)
Figure: Overview of MapReduce
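The map and reduce signatures above can be illustrated with a single-process word count. This is a minimal sketch, not Hadoop code: the `run_mapreduce` driver is a hypothetical stand-in that performs the grouping (shuffle) step in memory, and all names are illustrative.

```python
from collections import defaultdict


def map_fn(key, value):
    """map(key1, value1) -> list of (key2, value2): one (word, 1) per word."""
    return [(word.lower(), 1) for word in value.split()]


def reduce_fn(key, values):
    """reduce(key2, list(value2)) -> list(value3): total count for the word."""
    return [sum(values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """In-memory stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)
    return {key2: reduce_fn(key2, values)
            for key2, values in sorted(groups.items())}


records = [("slide-1", "cloud svm"), ("slide-2", "cloud mapreduce")]
print(run_mapreduce(records, map_fn, reduce_fn))
```

In a real cluster the framework runs many map and reduce tasks on different machines and moves the grouped values between them; the two user-supplied functions stay the same.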
4. Development of System Model
CloudSVM
A new technique for training SVMs in the cloud with MapReduce
The training data set is uploaded to HDFS
With this approach, we find classifier functions for data sets stored in HDFS
What’s new in CloudSVM
SVM has O(m^3) time complexity and O(m^2) space complexity, where m is the data set size. This matters greatly for large-scale data sets and Big Data.
Figure : CloudSVM Architecture Schematic View.
CloudSVM Algorithm Map Function
SV_Global = ∅   {Empty global support vector set}
while h^t ≠ h^(t-1) do
    for l ∈ L do   {For each subset}
        D_l^t ← D_l^t ∪ SV_Global^t
    end for
end while
CloudSVM Algorithm Reduce Function
while h^t ≠ h^(t-1) do
    for l ∈ L do
        SV_l, h^t ← svm(D_l)   {Train merged dataset to obtain support vectors and hypothesis}
    end for
    for l ∈ L do
        SV_Global ← SV_Global ∪ SV_l
    end for
end while
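The map and reduce steps can be sketched as one in-memory loop. This is an illustration under stated assumptions, not the authors' implementation: the real system trains with LibSVM on Hadoop, while `train_1d_svm` here is a stand-in that solves the hard-margin SVM exactly for separable 1-D data (negatives left of positives). The global hypothesis is assumed to be the model trained on the merged global support vector set, which the slides do not spell out. Every subset is assumed to contain both classes.

```python
def train_1d_svm(points):
    """Exact hard-margin SVM for separable 1-D data (stand-in solver).

    points: iterable of (x, y) with y in {-1, +1}.
    Returns (support_vectors, threshold).
    """
    neg = max((p for p in points if p[1] == -1), key=lambda p: p[0])
    pos = min((p for p in points if p[1] == +1), key=lambda p: p[0])
    return {neg, pos}, (neg[0] + pos[0]) / 2.0


def cloud_svm(subsets, max_iter=20):
    """Iterate merge-and-train until the hypothesis stops changing."""
    sv_global = set()              # SV_Global = empty set
    h_prev = None
    for t in range(1, max_iter + 1):
        # Map step: merge the global support vectors into every subset.
        merged = [set(d) | sv_global for d in subsets]
        # Reduce step: train on each merged subset and collect its SVs.
        for d in merged:
            sv_l, _ = train_1d_svm(sorted(d))
            sv_global |= sv_l
        # Assumption: the global hypothesis is trained on the global SV set.
        _, h = train_1d_svm(sorted(sv_global))
        if h == h_prev:            # h^t == h^(t-1): converged
            return h, sv_global, t
        h_prev = h
    return h_prev, sv_global, max_iter


# Hypothetical data split into three subsets, as if stored on three nodes.
subsets = [
    [(-5.0, -1), (-3.0, -1), (4.0, 1), (6.0, 1)],
    [(-4.0, -1), (-1.0, -1), (2.0, 1), (7.0, 1)],
    [(-6.0, -1), (-2.0, -1), (3.0, 1), (5.0, 1)],
]
h, sv_global, iterations = cloud_svm(subsets)
print("threshold:", h, "iterations:", iterations)
```

No single node ever trains on the full data set; only support vectors are exchanged between iterations, which is what keeps each local problem small.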
5. SIMULATION RESULTS
Method
We used 10-fold cross-validation, dividing the set of samples at random into 10 approximately equal-size parts.
We used the hinge loss to test models trained with the CloudSVM algorithm. The empirical risk can be computed with an approximation:
$$l(f(\vec{x}), y) = \max\{0,\ 1 - y \, f(\vec{x})\} \tag{9}$$
$$R_{emp}(h) = \frac{1}{n} \sum_{i=1}^{n} l(h(\vec{x}_i), y_i) \tag{10}$$
According to the empirical risk minimization principle, the learning algorithm should choose a hypothesis $\hat{h}$ which minimizes the empirical risk:
$$\hat{h} = \arg\min_{h \in H} R_{emp}(h) \tag{11}$$
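Equations (9) and (10) translate directly into code. The sketch below is a minimal illustration: the linear scorer `h` and the four samples are made-up assumptions, chosen so that two points fall inside the margin or on the wrong side of it.

```python
def hinge_loss(score, y):
    """l(f(x), y) = max(0, 1 - y * f(x)), as in equation (9)."""
    return max(0.0, 1.0 - y * score)


def empirical_risk(h, samples):
    """R_emp(h) = mean hinge loss over the samples, as in equation (10)."""
    return sum(hinge_loss(h(x), y) for x, y in samples) / len(samples)


def h(x):
    """Hypothetical linear scorer w.x - b with w = (0.5, 0.5), b = 1."""
    return 0.5 * x[0] + 0.5 * x[1] - 1.0


# Hypothetical test samples: (point, label).
samples = [((2.0, 2.0), 1), ((1.0, 1.0), 1), ((0.0, 0.0), -1), ((2.0, 1.0), -1)]

for x, y in samples:
    print(x, y, hinge_loss(h(x), y))
print("empirical risk:", empirical_risk(h, samples))
```

Points classified correctly with a score of at least 1 in magnitude contribute zero loss, so the risk is driven only by margin violations.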
Software & Development Environment
Hadoop 0.23
Python 2.7
SciPy, NumPy (Scientific and Numeric Python Libraries)
pythonxy (Scientific-oriented Python Distribution based on Qt and Spyder)
MrJob 0.3.5 (Hadoop Streaming)
LibSVM
CentOS 6.2 64-bit
Table: Various UCI Datasets
Dataset     Rows  Features  γ    C  Iterations  SVs   Accuracy  Kernel Type
German      1000  24        100  1  5           606   0.7728    Linear
Heart       270   13        100  1  3           137   0.8259    Linear
Ionosphere  351   34        108  1  3           160   0.8423    Linear
Satellite   4435  36        100  1  2           1384  0.9064    Linear
Table: Data set prediction accuracy with iterations
German & Heart datasets: the loss values and the number of support vectors converge smoothly.
Table: Data set prediction accuracy with iterations
Ionosphere & Satellite datasets: the loss values and the number of support vectors converge smoothly.
6. CONCLUSION & RECOMMENDATION
Conclusion
We showed the simulation results
Stable and High Generalization Property
Independent of Network and Computer Infrastructure (Cloud Computing Based)
Recommendation
Multiclass Classification
Application to Real Datasets
Into how many parts can the data set be divided?
6. REFERENCES
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, NY (1995)
Chang, E.Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., Cui, H.: PSVM: Parallelizing Support Vector Machines on Distributed Computers. Advances in Neural Information Processing Systems 20 (2007)
Lu, Y., Roychowdhury, V., Vandenberghe, L.: Distributed parallel support vector machines in strongly connected networks. IEEE Trans. Neural Networks 19, 1167-1178 (2008)
Graf, H.P., Cosatto, E., Bottou, L., Durdanovic, I., Vapnik, V.: Parallel support vector machines: The cascade SVM. In: Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems (NIPS), pp. 521-528. MIT Press, Vancouver (2004)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI), pp. 10-10. USENIX Association, Berkeley (2004)