Extreme learning machine: Theory and applications
Transcript of Extreme learning machine: Theory and applications
Extreme learning machine: Theory and applications
G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew
Neurocomputing, 2006

Presenter: James Chou, 2012/03/15
Outline
- Introduction
- Single-hidden layer feed-forward neural networks
- Neural Network Mathematical Model
- Back Propagation algorithm
- ELM Mathematical Model
- Performance Evaluation
- Conclusion
Introduction
For the past decades, gradient-descent-based methods have mainly been used in learning algorithms for feed-forward neural networks.

Traditionally, all the parameters of a feed-forward neural network need to be tuned iteratively, which takes a very long time.

When the input weights and the hidden-layer biases are randomly assigned, SLFNs (single-hidden layer feed-forward neural networks) can simply be considered a linear system, and the output weights (linking the hidden layer to the output layer) can be computed through a simple generalized-inverse operation.
Introduction (Cont.)
Based on this idea, the paper proposes a simple learning algorithm for SLFNs called the extreme learning machine (ELM).

Unlike traditional learning algorithms, the extreme learning algorithm not only provides a smaller training error but also better generalization performance.
Single-hidden layer feed-forward neural networks
$F(\cdot)$ is the activation function:

$$\text{Output} = F\Big(\sum_{i=1}^{N} w_i x_i\Big)$$

Hard limiter function ($\theta$ is the threshold):

$$f(x) = \begin{cases} 1, & \text{when } x \ge \theta \\ 0, & \text{when } x < \theta \end{cases}$$

Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
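As a concrete illustration, a minimal NumPy sketch of these two activation functions (the vectorized form and the default θ = 0 are my additions):

```python
import numpy as np

def hard_limiter(x, theta=0.0):
    """Hard limiter: f(x) = 1 when x >= theta, 0 when x < theta."""
    return np.where(x >= theta, 1.0, 0.0)

def sigmoid(x):
    """Sigmoid: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))
```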
Single-hidden layer feed-forward neural networks (Cont.)
$G(\cdot)$ is the activation function; $L$ is the number of hidden-layer nodes.
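The slide's formula did not survive extraction; in the standard SLFN form (using the transcript's $G$, $L$, and $\beta$ notation), the network output is

$$F_L(x) = \sum_{i=1}^{L} \beta_i \, G(w_i \cdot x + b_i),$$

where $w_i$ and $b_i$ are the input weights and bias of the $i$-th hidden node and $\beta_i$ is its output weight.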
Neural Network Mathematical Model
Neural Network Mathematical Model (Cont.)
If ε = 0, this means $F_L(x) = f(x) = T$, where $T$ is the known target, and the cost function = 0.
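For reference, the cost function here is the usual sum of squared errors over the $N$ training pairs $(x_j, t_j)$; this explicit form is my addition:

$$E = \sum_{j=1}^{N} \big\| F_L(x_j) - t_j \big\|^2,$$

so ε = 0 for every sample implies $E = 0$.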
Neural Network Mathematical Model (Cont.)
The mathematical model is $H\beta = T$. From a linear algebra viewpoint, if the hidden layer has 20 nodes and there are 1000 training samples in total, $H$ is a large, non-square matrix, and $\beta$ must be solved from this system. How to calculate such a big inverse matrix is a traditional issue: we try to calculate the inverse of a 5000×50 matrix, but the PC crashes.
Back Propagation algorithm
The BP algorithm is the classic gradient-based algorithm for finding the best weight vectors by minimizing the cost function.
η is the learning rate.
Demo BP algorithm!
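A minimal sketch of one gradient-descent (BP-style) update for a sigmoid SLFN, assuming a squared-error cost; the array shapes and single-step API are illustrative choices, not the paper's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(W, b, beta, X, T, eta=0.1):
    """One gradient-descent step on all SLFN parameters.

    W: (d, L) input weights; b: (L,) hidden biases; beta: (L, m) output weights.
    X: (N, d) inputs; T: (N, m) targets; eta: learning rate.
    """
    H = sigmoid(X @ W + b)              # hidden-layer outputs, (N, L)
    E = H @ beta - T                    # output error, (N, m)
    grad_beta = H.T @ E                 # gradient of 0.5 * sum(E**2) w.r.t. beta
    dH = (E @ beta.T) * H * (1.0 - H)   # back-propagate through the sigmoid
    grad_W = X.T @ dH
    grad_b = dH.sum(axis=0)
    # Update rule: parameter <- parameter - eta * gradient
    return W - eta * grad_W, b - eta * grad_b, beta - eta * grad_beta
```

Note that every parameter (W, b, beta) is tuned iteratively here, which is exactly the slow step that ELM removes.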
ELM Mathematical Model
$H^+$ is the Moore-Penrose generalized inverse of the hidden-layer output matrix $H$:

$$H^+ = (H^T H)^{-1} H^T$$

(this closed form applies when $H^T H$ is invertible, i.e., when $H$ has full column rank).
ELM Mathematical Model (Cont.)
Moore-Penrose generalized inverse matrix: an application of a linear algebra theorem.

For a general linear system $Ax = y$, we say that $\hat{x}$ is a least-squares solution (l.s.s.) if

$$\|A\hat{x} - y\| = \min_x \|Ax - y\|,$$

where $\|\cdot\|$ denotes a norm in Euclidean space, $A \in \mathbb{R}^{m \times n}$, and $y \in \mathbb{R}^m$.

The resolution of a general linear system $Ax = y$, where $A$ may be singular and may even not be square, can be made very simple by the use of the Moore-Penrose generalized inverse.
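A quick numerical check that the Moore-Penrose pseudoinverse yields the least-squares solution (the 100×5 system size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))   # overdetermined system: more rows than columns
y = rng.standard_normal(100)

x_pinv = np.linalg.pinv(A) @ y                  # x = A^+ y
x_lsq = np.linalg.lstsq(A, y, rcond=None)[0]    # argmin_x ||Ax - y||
print(np.allclose(x_pinv, x_lsq))               # True: both are the l.s.s.
```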
ELM Mathematical Model (Cont.)
The mathematical model is $H\beta = T$. We can rewrite the formula as

$$\beta = H^+ T = (H^T H)^{-1} H^T T.$$

If the hidden layer has 20 nodes and there are 1000 training samples in total, $H$ is only 1000×20, so $H^T H$ is a small 20×20 matrix and the inversion is cheap.
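Putting this together, a minimal ELM training sketch in NumPy; the function names, the sigmoid choice, and the use of np.linalg.pinv (rather than the explicit normal-equations formula) are my assumptions, not the paper's prescriptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, L, rng=None):
    """ELM: random hidden parameters, least-squares output weights.

    X: (N, d) training inputs; T: (N, m) targets; L: number of hidden nodes.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    W = rng.standard_normal((d, L))   # random input weights, never tuned
    b = rng.standard_normal(L)        # random hidden biases, never tuned
    H = sigmoid(X @ W + b)            # hidden-layer output matrix, (N, L)
    beta = np.linalg.pinv(H) @ T      # beta = H^+ T, the least-squares fit
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta
```

The only training step is one pseudoinverse; there is no iteration and no learning rate.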
Performance Evaluation
Regression of SinC Function
Regression of SinC Function (Cont.)
100,000 training samples with 5-20% noise; 100,000 noise-free testing samples. The results, averaged over 50 training runs, are shown in the following table.
| Noise | Avg. training time (sec) | Avg. training RMS | Avg. testing RMS |
| --- | --- | --- | --- |
| 5% | 0.6462 | 0.0113 | 2.201e-04 |
| 10% | 0.6306 | 0.0224 | 2.753e-04 |
| 15% | 0.6427 | 0.0334 | 8.336e-04 |
| 20% | 0.6452 | 0.0449 | 1.1541e-03 |
Demo ELM!
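A self-contained usage sketch of this experiment; the uniform [-0.2, 0.2] noise, the [-10, 10] input range, 20 hidden nodes, and the 5000-sample size are my assumptions for a quick run, not the presenter's exact setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# SinC targets: y = sin(x)/x with y(0) = 1 (np.sinc(z) = sin(pi z)/(pi z))
x_train = rng.uniform(-10.0, 10.0, (5000, 1))
t_train = np.sinc(x_train / np.pi) + rng.uniform(-0.2, 0.2, x_train.shape)
x_test = rng.uniform(-10.0, 10.0, (5000, 1))
t_test = np.sinc(x_test / np.pi)          # noise-free test targets

# ELM with L = 20 random sigmoid hidden nodes
L = 20
W = rng.standard_normal((1, L))
b = rng.standard_normal(L)
beta = np.linalg.pinv(sigmoid(x_train @ W + b)) @ t_train

pred = sigmoid(x_test @ W + b) @ beta
print("testing RMSE:", np.sqrt(np.mean((pred - t_test) ** 2)))
```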
Real-World Regression Problems
Real-World Regression Problems (Cont.)
Real-World Regression Problems (Cont.)
Real-World Regression Problems (Cont.)
Real-World Very Large Complex Applications
Real Medical Diagnosis Application: Diabetes
Protein Sequence Classification
Conclusion
Advantages
- ELM needs less training time than the popular BP and SVM/SVR methods.
- The prediction performance of ELM is usually a little better than BP and close to SVM/SVR in many applications.
- Only one parameter needs to be tuned: L, the number of hidden-layer nodes.
- Nonlinear activation functions still work in ELM.

Disadvantages
- How to find the optimal solution?
- Local minima issue.
- Overfits easily.