Instructor : Omid Sojoodi Faculty of Electrical, Computer and...
Instructor : Omid Sojoodi
Faculty of Electrical, Computer and IT Engineering Qazvin Azad University
Contact Info: o_sojoodi@{ieee.org, m.ieice.org}
Machine learning: the study of self-modifying computer systems that can acquire new knowledge and improve their own performance
Survey on machine learning techniques:
induction from examples, Bayesian learning, artificial neural networks, instance-based learning, genetic algorithms, reinforcement learning, unsupervised learning, and biologically motivated learning algorithms
Textbooks:
Ethem Alpaydin (2010) “Introduction to Machine Learning”, 2nd edition. MIT Press.
Tom Mitchell (1997) “Machine Learning”, McGraw-Hill.
Grading: Assignments (20%), Presentation (30%), Final Examination (50%)
Machine Learning at AAAI
Journal of Machine Learning Research
Journal of Machine Learning Gossip (ML humor)
mdl-research.org
Machine Learning Database Repository at UC Irvine
David Aha's list of machine learning resources
Avrim Blum's Machine Learning Page
UCI - Machine Learning Repository
UTCS Machine Learning Research Group
Microsoft Bayesian Network Editor (MSBNx)
Weka 3 -- Machine Learning Software in Java
Journal of AI Research (online text)
C5/See5
MLC++, A Machine Learning Library in C++
Web->KB project
DELVE-Data for Evaluating Learning
Introduction (A1, M1) (A: Alpaydin, M: Mitchell)
Supervised Learning (A2, M7)
Bayesian Learning (A3, A16, M6)
Parametric Methods (A4)
Dimensionality Reduction (A6)
Decision Tree (A9, M3)
Multilayer Perceptron (A11, M4)
Kernel Machine (A13)
Reinforcement Learning (A18, M13)
Clustering (A7)
Machine Learning Experiments (A19)
Genetic Algorithms (M9)
Neural Networks: perceptrons, backpropagation, etc.
Pattern analysis: Bayesian learning, instance-based learning
Artificial intelligence: decision trees (in some courses)
Statistics: hypothesis testing
Data Mining: association rules and classification
(Relatively) unique to this course: concept learning, computational learning theory, genetic algorithms, reinforcement learning, decision trees (in-depth treatment)
Theory-bounded Research:
Networked Data Classification
Classification of Uncertain Data
Similarity-based Dimension Reduction (NCA, NDA, DNDA, SDA, …)
Nearest Neighbor Classification in Non-stationary Environments
Learning Distance Metric
Data Stream Classification
Ensemble Classifiers
…
Application-based Research:
ML Methods in Sensor Networks
ML Methods in Control and Robotics
ML Methods in Weather Forecasting
ML Methods in Financial Problems
ML Methods in Filtering (Spam Filtering/Document)
ML Methods in Machine Vision
ML Methods in Biomedical Engineering (Biomedical Image/Signal Processing)
…
How can machines (computers) learn? How can machines improve automatically with experience?
Benefits: improved performance; automated optimization; new uses of computers; reduced programming; insights into human learning and learning disabilities
Current status: The problem is still unsolved. Theoretical insights are emerging, and there are practical applications. Huge data volumes demand ML and provide opportunities for it (data mining).
State of the art: speech recognition; medical predictions; fraud detection; driving autonomous vehicles (highway and desert); board games (backgammon, chess); theoretical bounds on error, number of inputs needed, etc.
A program is said to learn from experience E with respect to task T and performance measure P if its performance at T, as measured by P, improves with experience E.
Examples: Playing checkers, Handwriting recognition, Robot driving, etc.
Goal of ML: “define precisely a class of problems that encompasses interesting forms of learning, to explore algorithms that solve such problems, and to understand the fundamental structure of learning problems and processes” (Mitchell, 1997)
Training experience:
direct vs. indirect (learning to play checkers): problem of credit assignment
degree of control over training examples (teacher-dependent or learner-generated)
closeness of the training example distribution to the true distribution over which P is measured: in many cases, ML algorithms assume that both distributions are similar, which may not be the case in practice.
Remaining design choices:
The exact type of knowledge to be learned.
A representation for this target knowledge.
A learning mechanism.
Type of knowledge to be learned: for example, we want to learn the best move in a board game. Can represent as a function (B: board states, M: moves):
ChooseMove : B → M, but it is hard to learn directly.
Another function (B: board states, R: real numbers):
V : B → R, which gives the evaluation of each board state.
V(b = win) = 100
V(b = lose) = −100
V(b = draw) = 0
V(b = otherwise) = V(b′), where b′ is the best final board state that can be reached from b.
However, this is not efficiently computable, i.e., it is a nonoperational definition. The goal of ML is to find an operational description of V; in practice, however, an approximation is all we can get.
Given an ideal target function V, we want to learn an approximate function $\hat{V}$.
Trade-off between rich and parsimonious representation. Example: $\hat{V}$ as a linear combination of the number of pieces, the number of particular relational situations on the board (e.g., threatened), etc. (represented as $x_i$) in board configuration b:
$$\hat{V}(b) = w_0 + \sum_{i=1}^{n} w_i x_i$$
where $w_i$ are the weight values to be learned.
Advantage of the above representation: reduction of scope (or dimensionality) from the original problem.
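As a minimal sketch of the linear representation, the snippet below evaluates a board from a feature vector. The feature meanings and the weight values are made up for illustration; only the form "bias plus weighted sum of features" comes from the slide.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w_0 + sum_i w_i * x_i; weights[0] is the bias w_0."""
    if len(weights) != len(features) + 1:
        raise ValueError("need one weight per feature plus a bias weight")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical board described by three features, e.g. own pieces,
# opponent pieces, own threatened pieces (assumed, not from the slides).
board_features = [12.0, 11.0, 1.0]
weights = [0.5, 1.0, -1.0, -2.0]  # illustrative values only
score = v_hat(weights, board_features)
```

The whole board is reduced to n numbers, which is the reduction of scope mentioned above.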
Given board states and the true V, we want to learn the weights $w_i$ that specify $\hat{V}$.
Start with a large set of input-target pairs $\langle b, V_{train}(b) \rangle$. Problem: we cannot come up with a full set of $\langle b, V_{train}(b) \rangle$ pairs. Solution: if $V_{train}(b)$ is unknown, set it to the current estimate $\hat{V}$ of its successor board state:
$$V_{train}(b) = \hat{V}(\mathrm{Successor}(b))$$
Last component in defining a learning algorithm: adjustment of weights.
We want to learn weights $w_i$ that best fit the set of training samples $\{\langle b, V_{train}(b) \rangle\}$.
How to define best fit? Once we have $\hat{V}$, we can calculate $\hat{V}(b)$ for all b in the training set and calculate the error (here MSE):
$$E = \frac{\sum_{\langle b, V_{train}(b) \rangle \in \text{training set}} \left(V_{train}(b) - \hat{V}(b)\right)^2}{|\text{training set}|}$$
How to reduce E?
Least Mean Squares (LMS) learning rule to minimize MSE:
Until the weights converge:
For each training example $\langle b, V_{train}(b) \rangle$:
1) Use the current weights to calculate $\hat{V}(b)$.
2) For each weight $w_i$, update it as $w_i \leftarrow w_i + \eta \, (V_{train}(b) - \hat{V}(b)) \, x_i$, where η is a small learning rate constant.
The error $V_{train}(b) - \hat{V}(b)$ and the input $x_i$ both contribute to the weight update.
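The LMS rule can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the training pairs are a toy data set generated from the target 1 + 2·x1 − x2, the learning rate 0.1 and the fixed number of passes are arbitrary choices, and the bias weight is updated with an implicit input x0 = 1.

```python
def v_hat(weights, features):
    """Linear evaluation w_0 + sum_i w_i * x_i."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_epoch(weights, examples, eta):
    """One pass over the training examples, applying the LMS update each time."""
    w = list(weights)
    for features, v_train in examples:
        error = v_train - v_hat(w, features)  # V_train(b) - V_hat(b)
        w[0] += eta * error                   # bias weight, implicit x_0 = 1
        for i, x in enumerate(features, start=1):
            w[i] += eta * error * x
    return w


def mse(weights, examples):
    """Mean squared error over the training set."""
    return sum((v - v_hat(weights, f)) ** 2 for f, v in examples) / len(examples)


# Toy training pairs (features, V_train) from the target 1 + 2*x1 - x2.
examples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 3.0),
            ((0.0, 1.0), 0.0), ((1.0, 1.0), 2.0)]
weights = [0.0, 0.0, 0.0]
initial_error = mse(weights, examples)
for _ in range(2000):  # fixed budget instead of a convergence test
    weights = lms_epoch(weights, examples, eta=0.1)
final_error = mse(weights, examples)
```

A real learner would stop when the weights or the error stop changing rather than after a fixed number of passes.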
Training experience: against experts, against self, table of correct moves, ...
Target function: board move, board value, ...
Representation of target function: polynomial, linear function of a small number of features, artificial neural network, …
Learning algorithm: gradient descent, linear programming, genetic algorithm, ...
Useful to think of ML as searching a very large space of possible hypotheses to best fit the data and the learner’s prior knowledge. For example, the hypothesis space for $\hat{V}$ would be all possible $\hat{V}$s with different weight assignments. Useful concepts regarding hypothesis space search:
Size of the hypothesis space.
Number of training examples available/needed.
Confidence in generalizing to new unseen data.
What algorithms exist for generalizable learners given specific training set? Requirements for convergence? Which algorithms are best for a particular domain? How much training data needed? Bounds on confidence, based on data size? How long to train? Use of prior knowledge? How to choose best training experience? Impact of the choice? How to reduce ML problem to function approximation? How can learner alter the representation itself?
Supervised learning: input-target pairs given. Unsupervised learning: only input distribution is given. Reinforcement learning: sparse reward signal is given for action based on sensory input; environment-altering actions.
Can machines themselves formulate their own learning tasks?
Can they come up with their own representations? Can they come up with their own learning strategy? Can they come up with their own motivation? Can they come up with their own questions/problems?
What if the machines are faced with multiple, possibly conflicting tasks? Can there be a meta learning algorithm? What if performance is hard to measure (i.e., hard to quantify, or even worse, subjective)?
Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to “learn” to calculate payroll. Learning is used when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition)
The solution changes in time (routing on a computer network)
The solution needs to be adapted to particular cases (user biometrics)
Learning general models from data of particular examples. Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce. Example in retail, from customer transactions to consumer behavior: people who bought “chips” also bought “yogurt”. Build a model that is a good and useful approximation to the data.
There is a connection between machine learning and data mining. Data mining:
“…the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.” (Hand et al., 2001)
[Figure: process diagram with the stages Define Problem → Data Collection → Data Preparation → Data Modeling → Interpretation/Evaluation → Implement/Deploy Model, with the label “Machine Learning”]
Retail: Market basket analysis, customer relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Control, robotics, troubleshooting
Medicine: Medical diagnosis
Telecommunications: Spam filters, intrusion detection
Bioinformatics: Motifs, alignment
Web mining: Search engines
...
Optimize a performance criterion using example data or past experience.
Role of statistics: inference from a sample
Role of computer science: efficient algorithms to solve the optimization problem and to represent and evaluate the model for inference
Association
Supervised Learning: Classification, Regression
Unsupervised Learning
Reinforcement Learning
Basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P ( chips | yogurt ) = 0.7
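A rough sketch of how such a conditional probability could be estimated from transaction data: count the baskets that contain X, then the fraction of those that also contain Y. The baskets below are invented for illustration and are not meant to reproduce the 0.7 figure.

```python
def conditional_probability(transactions, x, y):
    """Estimate P(y | x) as the fraction of baskets with x that also hold y."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Made-up baskets, for illustration only.
transactions = [
    {"yogurt", "chips"},
    {"yogurt", "chips", "milk"},
    {"yogurt", "bread"},
    {"chips", "soda"},
    {"yogurt", "chips", "soda"},
]
p_chips_given_yogurt = conditional_probability(transactions, x="yogurt", y="chips")
```

Here four baskets contain yogurt and three of those also contain chips, so the estimate is 3/4.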
Example: Credit scoring. Differentiating between low-risk and high-risk customers from their income and savings.
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
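The discriminant translates directly into code. In the sketch below the thresholds θ1 and θ2 are passed in as parameters; the numeric values in the usage lines are hypothetical, since in practice they would be learned from labeled customer data.

```python
def credit_risk(income: float, savings: float, theta1: float, theta2: float) -> str:
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"


# Hypothetical thresholds, chosen only to exercise the rule.
THETA1, THETA2 = 30_000.0, 5_000.0
good_customer = credit_risk(45_000.0, 8_000.0, THETA1, THETA2)
risky_customer = credit_risk(45_000.0, 1_000.0, THETA1, THETA2)
```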
Aka pattern recognition
Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
Character recognition: Different handwriting styles
Speech recognition: Temporal dependency
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
...
[Figure: face recognition, showing training examples of a person and test images. ORL dataset, AT&T Laboratories, Cambridge UK]
Example: Price of a used car
x: car attributes
y: price
y = g(x | θ), where g(·) is the model and θ are its parameters
Linear model: y = wx + w0
Navigating a car: angle of the steering wheel
Kinematics of a robot arm: α1 = g1(x, y), α2 = g2(x, y) [figure: robot arm with angles α1, α2 and point (x, y)]
Response surface design
Prediction of future cases: Use the rule to predict the output for future inputs
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Example applications: customer segmentation in CRM; image compression (color quantization); bioinformatics (learning motifs, i.e., sequences of amino acids); document clustering
Given: a set (sample) of data (observations). Task: build a model of the process that generated the data. This time, however, we are not trying to learn about the relationship between inputs and outputs; we just want to find (and/or take advantage of) structure in the data.
Sometimes called descriptive (rather than predictive) modeling. Problems that can be framed as unsupervised learning: dimensionality reduction, compression, probability density estimation.
Learning a policy: A sequence of outputs
No supervised output but delayed reward
Credit assignment problem
Game playing
Robot in a maze
Multiple agents, partial observability, ...
UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
Journal of Machine Learning Research (www.jmlr.org)
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Annals of Statistics
Journal of the American Statistical Association
...
International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
Computational Learning Theory (COLT)
International Conference on Artificial Neural Networks (ICANN)
International Conference on AI & Statistics (AISTATS)
International Conference on Pattern Recognition (ICPR)
...
Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output: Positive (+) and negative (–) examples
Input representation: x1: price, x2: engine power
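Training set:
$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$
$$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$$
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$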
$$(p_1 \le \text{price} \le p_2) \ \text{AND} \ (e_1 \le \text{engine power} \le e_2)$$
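$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$$
Error of h on the training set X:
$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\left(h(\mathbf{x}^t) \ne r^t\right)$$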
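To make the hypothesis and its empirical error concrete, here is a small sketch with an axis-aligned rectangle over (price, engine power). The rectangle bounds and the five labeled cars are invented for illustration; the error is the count of training instances where h disagrees with the label.

```python
def make_rectangle_hypothesis(p1, p2, e1, e2):
    """Return h with h(x) = 1 iff p1 <= price <= p2 and e1 <= power <= e2."""
    def h(x):
        price, power = x
        return 1 if p1 <= price <= p2 and e1 <= power <= e2 else 0
    return h


def empirical_error(h, training_set):
    """Number of training instances (x, r) on which h(x) differs from r."""
    return sum(1 for x, r in training_set if h(x) != r)


# Made-up training set: ((price, engine power), label).
training_set = [
    ((15.0, 110.0), 1),
    ((18.0, 130.0), 1),
    ((9.0, 60.0), 0),
    ((40.0, 300.0), 0),
    ((21.0, 150.0), 1),
]
h = make_rectangle_hypothesis(p1=12.0, p2=20.0, e1=100.0, e2=140.0)
errors = empirical_error(h, training_set)
```

With these bounds the last car falls outside the rectangle although it is labeled positive, so h makes one error.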
most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space (Mitchell, 1997).
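For the rectangle hypothesis class, the most specific hypothesis S can be sketched as the tightest rectangle around the positive examples. The data below are made up, and the consistency check simply verifies that no negative example falls inside the rectangle.

```python
def most_specific_rectangle(training_set):
    """Tightest axis-aligned rectangle (p1, p2, e1, e2) around the positives."""
    positives = [x for x, r in training_set if r == 1]
    if not positives:
        raise ValueError("need at least one positive example")
    prices = [price for price, _ in positives]
    powers = [power for _, power in positives]
    return min(prices), max(prices), min(powers), max(powers)


def is_consistent(rectangle, training_set):
    """True if the rectangle labels every training instance correctly."""
    p1, p2, e1, e2 = rectangle
    return all(
        (1 if p1 <= price <= p2 and e1 <= power <= e2 else 0) == r
        for (price, power), r in training_set
    )


# Made-up training set: ((price, engine power), label).
training_set = [
    ((15.0, 110.0), 1),
    ((18.0, 130.0), 1),
    ((21.0, 150.0), 1),
    ((9.0, 60.0), 0),
    ((40.0, 300.0), 0),
]
S = most_specific_rectangle(training_set)
```

Any rectangle that contains S and still excludes all negatives is another member of the version space.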
Choose h with largest margin
N points can be labeled in $2^N$ ways as +/–. H shatters N points if, for every one of these labelings, there exists an h ∈ H consistent with it. VC(H) = N is the largest number of points that H can shatter.
An axis-aligned rectangle shatters 4 points only!
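A brute-force check of shattering for axis-aligned rectangles can be sketched as follows. It relies on the observation that a labeling is realizable exactly when the bounding box of the positive points contains no negative point. The two point sets are illustrative choices; testing particular sets shows that one set of 4 points is shattered and that one set of 5 is not, which illustrates the claim without proving it for all 5-point sets.

```python
from itertools import product


def rectangle_can_realize(points, labels):
    """True if some axis-aligned rectangle contains exactly the positives."""
    positives = [p for p, label in zip(points, labels) if label == 1]
    if not positives:
        return True  # a rectangle placed away from all points works
    lo_x = min(x for x, _ in positives)
    hi_x = max(x for x, _ in positives)
    lo_y = min(y for _, y in positives)
    hi_y = max(y for _, y in positives)
    return not any(
        lo_x <= x <= hi_x and lo_y <= y <= hi_y
        for (x, y), label in zip(points, labels)
        if label == 0
    )


def shatters(points):
    """Check all 2^N labelings of the given points."""
    return all(
        rectangle_can_realize(points, labels)
        for labels in product((0, 1), repeat=len(points))
    )


diamond = [(0, 1), (1, 0), (0, -1), (-1, 0)]
diamond_plus_center = diamond + [(0, 0)]
four_ok = shatters(diamond)
five_ok = shatters(diamond_plus_center)
```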
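How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
Each strip has probability at most ε/4.
Pr that one instance misses a strip: $1 - \varepsilon/4$
Pr that N instances miss a strip: $(1 - \varepsilon/4)^N$
Pr that N instances miss any of the 4 strips: at most $4(1 - \varepsilon/4)^N$
Require $4(1 - \varepsilon/4)^N \le \delta$; using $(1 - x) \le \exp(-x)$, it suffices that
$4\exp(-\varepsilon N/4) \le \delta$, which gives $N \ge (4/\varepsilon)\log(4/\delta)$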
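The bound can be evaluated numerically. The sketch below assumes the natural logarithm (which is what the exp step in the derivation implies) and rounds up to a whole number of examples; the values ε = 0.1 and δ = 0.05 are arbitrary illustrations.

```python
import math


def pac_sample_size(epsilon: float, delta: float) -> int:
    """Smallest integer N with N >= (4 / epsilon) * ln(4 / delta)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))


n_examples = pac_sample_size(epsilon=0.1, delta=0.05)
# Sanity check against the un-relaxed condition 4 * (1 - eps/4)^N <= delta.
failure_bound = 4.0 * (1.0 - 0.1 / 4.0) ** n_examples
```

For these values the bound asks for 176 examples, and the un-relaxed failure probability is indeed below 0.05.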
Use the simpler one because:
It is simpler to use (lower computational complexity)
It is easier to train (lower space complexity)
It is easier to explain (more interpretable)
It generalizes better (lower variance - Occam’s razor)
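$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$$
$$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$
Train hypotheses $h_i(\mathbf{x})$, i = 1,...,K:
$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$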
![Page 57: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/57.jpg)
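$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \quad r^t \in \mathbb{R}, \quad r^t = f(x^t) + \epsilon$$
$$g(x) = w_1 x + w_0$$
$$g(x) = w_2 x^2 + w_1 x + w_0$$
$$E(g \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\left[r^t - g(x^t)\right]^2$$
$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\left[r^t - (w_1 x^t + w_0)\right]^2$$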
![Page 58: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/58.jpg)
Learning is an ill-posed problem; data is not sufficient to find a unique solution
The need for inductive bias, assumptions about H
Generalization: How well a model performs on new data
Overfitting: H more complex than C or f
Underfitting: H less complex than C or f
![Page 59: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/59.jpg)
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H),
2. Training set size, N,
3. Generalization error, E, on new data
As N↑, E↓
As c(H)↑, first E↓ and then E↑
![Page 60: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/60.jpg)
To estimate generalization error, we need data unseen during training. We split the data as:
Training set (50%)
Validation set (25%)
Test (publication) set (25%)
Use resampling when data is scarce
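A minimal sketch of the 50/25/25 split described above, assuming the examples fit in memory and can be shuffled freely; the function name and the fixed seed are illustrative choices, not part of the slides.

```python
import random


def split_dataset(examples, seed=0):
    """Shuffle the examples and split them 50% / 25% / 25%."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n = len(shuffled)
    n_train = n // 2
    n_val = n // 4
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test


train, validation, test = split_dataset(range(100))
```

The validation set is used to choose among models; the test set is touched only once, to report the final error.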
![Page 61: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/61.jpg)
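1. Model: $g(\mathbf{x} \mid \theta)$
2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\left(r^t, g(\mathbf{x}^t \mid \theta)\right)$
3. Optimization procedure: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$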
![Page 62: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/62.jpg)
![Page 63: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/63.jpg)
The world →unknown process →data.
Because of our lack of knowledge about the process, we model it as a random process and use probability theory to analyse it.
Result of tossing a coin ∈ {Heads, Tails}
Random variable D ∈ {1, 0}
Bernoulli: $P(D) = p_o^{D}(1 - p_o)^{(1 - D)}$
Sample: $\mathcal{D} = \{D^t\}_{t=1}^{N}$
Estimation: $p_o$ = #{Heads}/#{Tosses} = $\sum_t D^t / N$
Prediction of next toss: Heads if $p_o$ > ½, Tails otherwise
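The estimation and prediction steps can be sketched as follows. The toss sequence is made-up illustrative data (1 = Heads, 0 = Tails), and the function names are assumptions.

```python
def estimate_heads_probability(tosses):
    """Estimate p_o as #Heads / #Tosses, with Heads encoded as 1."""
    if not tosses:
        raise ValueError("need at least one toss")
    return sum(tosses) / len(tosses)


def predict_next_toss(p_o):
    """Predict Heads if the estimated p_o exceeds 1/2, Tails otherwise."""
    return "Heads" if p_o > 0.5 else "Tails"


# Hypothetical sample of N = 10 tosses containing 7 heads.
tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = estimate_heads_probability(tosses)
prediction = predict_next_toss(p_hat)
```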
![Page 64: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/64.jpg)
Credit scoring: Inputs are income and savings. Output is low-risk vs high-risk
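Input: $D = [D_1, D_2]^T$, Output: h ∈ {0, 1}
Prediction:
$$\text{choose } \begin{cases} h = 1 & \text{if } P(h = 1 \mid d_1, d_2) > 0.5 \\ h = 0 & \text{otherwise} \end{cases}$$
or
$$\text{choose } \begin{cases} h = 1 & \text{if } P(h = 1 \mid d_1, d_2) > P(h = 0 \mid d_1, d_2) \\ h = 0 & \text{otherwise} \end{cases}$$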
![Page 65: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/65.jpg)
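$$P(h \mid D) = \frac{P(h)\, p(D \mid h)}{p(D)} \qquad \text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
$$P(h = 0) + P(h = 1) = 1$$
$$p(D) = p(D \mid h = 1)P(h = 1) + p(D \mid h = 0)P(h = 0)$$
$$P(h = 0 \mid D) + P(h = 1 \mid D) = 1$$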
![Page 66: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/66.jpg)
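Generally want the most probable hypothesis given the training data
Maximum a posteriori hypothesis $h_{MAP}$:
$$h_{MAP} = \operatorname*{arg\,max}_{h \in H} P(h \mid D) = \operatorname*{arg\,max}_{h \in H} \frac{P(D \mid h)P(h)}{P(D)} = \operatorname*{arg\,max}_{h \in H} P(D \mid h)P(h)$$
The last step drops P(D) because it does not depend on h.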
![Page 67: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/67.jpg)
If all hypotheses are equally probable a priori:
$$P(h_i) = P(h_j), \quad \forall h_i, h_j \in H$$
then $h_{MAP}$ reduces to the Maximum Likelihood hypothesis:
$$h_{ML} = \operatorname*{arg\,max}_{h \in H} P(D \mid h)$$
![Page 68: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/68.jpg)
Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.
P(cancer) = 0.008, P(~cancer) = 0.992
P(+|cancer) = 0.98, P(-|cancer) = 0.02
P(+|~cancer) = 0.03, P(-|~cancer) = 0.97
How does P(cancer|+) compare to P(~cancer|+)? What is $h_{MAP}$?
![Page 69: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/69.jpg)
P(cancer|+) = P(+|cancer) P(cancer) / P(+) = (0.98)(0.008) / P(+) = 0.0078 / P(+)
P(~cancer|+) = P(+|~cancer) P(~cancer) / P(+) = (0.03)(0.992) / P(+) = 0.0298 / P(+)
Since 0.0298 > 0.0078, $h_{MAP}$ = ~cancer
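Normalizing the two numerators by P(+) = P(+|cancer)P(cancer) + P(+|~cancer)P(~cancer) gives the actual posteriors. A small sketch using the numbers from the example; the function name is an assumption.

```python
def posterior_given_positive(p_cancer, p_pos_given_cancer, p_pos_given_healthy):
    """Return (P(cancer|+), P(~cancer|+)) via Bayes' rule."""
    joint_cancer = p_pos_given_cancer * p_cancer
    joint_healthy = p_pos_given_healthy * (1.0 - p_cancer)
    evidence = joint_cancer + joint_healthy  # P(+)
    return joint_cancer / evidence, joint_healthy / evidence


p_cancer_pos, p_healthy_pos = posterior_given_positive(
    p_cancer=0.008, p_pos_given_cancer=0.98, p_pos_given_healthy=0.03
)
h_map = "cancer" if p_cancer_pos > p_healthy_pos else "~cancer"
```

Even after a positive test, the posterior probability of cancer is only about 0.21, because the prior is so small.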
![Page 70: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/70.jpg)
What is the most probable hypothesis given the training data, vs. what is the most probable classification?
Example:
P(h1|D) = 0.4, P(-|h1) = 0, P(+|h1) = 1
P(h2|D) = 0.3, P(-|h2) = 1, P(+|h2) = 0
P(h3|D) = 0.3, P(-|h3) = 1, P(+|h3) = 0
Is a new instance x classified as - or +?
$h_{MAP}$ = h1 classifies x as +, but P(-|x) = 0.6, so the most probable classification is -.
![Page 71: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/71.jpg)
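If a new instance can take classification $v_j \in V$, then the probability $P(v_j \mid D)$ that the correct classification of the new instance is $v_j$ is:
$$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$$
Thus, the optimal classification is:
$$\operatorname*{arg\,max}_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$$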
![Page 72: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/72.jpg)
Example:
P(h1|D) = 0.4, P(-|h1) = 0, P(+|h1) = 1
P(h2|D) = 0.3, P(-|h2) = 1, P(+|h2) = 0
P(h3|D) = 0.3, P(-|h3) = 1, P(+|h3) = 0
$$\sum_{h_i \in H} P(+ \mid h_i)\, P(h_i \mid D) = 0.4$$
$$\sum_{h_i \in H} P(- \mid h_i)\, P(h_i \mid D) = 0.6$$
x is classified as -
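The three-hypothesis example can be checked with a short sketch of the Bayes optimal rule. The dictionary layout and function name are illustrative assumptions; the probabilities are those of the example.

```python
def bayes_optimal(posteriors, predictions, labels):
    """Weight each hypothesis's prediction by its posterior and pick the best label."""
    scores = {
        v: sum(predictions[h][v] * posteriors[h] for h in posteriors)
        for v in labels
    }
    return max(scores, key=scores.get), scores


posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}  # P(h_i | D)
predictions = {  # P(v | h_i)
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}

label, scores = bayes_optimal(posteriors, predictions, labels=("+", "-"))
h_map = max(posteriors, key=posteriors.get)  # the single most probable hypothesis
```

Here `h_map` is h1 (which votes +), while the weighted vote over all hypotheses selects -.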
![Page 73: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/73.jpg)
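Given attribute values <a1, a2, ..., an>, give the classification v ∈ V:
$$v_{MAP} = \operatorname*{arg\,max}_{v_j \in V} P(v_j \mid a_1, \ldots, a_n)$$
$$v_{MAP} = \operatorname*{arg\,max}_{v_j \in V} \frac{P(a_1, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, \ldots, a_n)} = \operatorname*{arg\,max}_{v_j \in V} P(a_1, \ldots, a_n \mid v_j)\, P(v_j)$$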
Want to estimate P(a1, a2, ..., an|vj ) and P(vj ) from training data.
![Page 74: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/74.jpg)
• P(vj) is easy to calculate: Just count the frequency.
• The naive Bayes classifier simply assumes that the attribute values are conditionally independent given the target value, thus
$$v_{NB} = \operatorname*{arg\,max}_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$
![Page 75: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/75.jpg)
Naive_Bayes_Learn(examples)
  For each target value $v_j$
    $\hat{P}(v_j)$ ← estimate $P(v_j)$
    For each attribute value $a_i$ of each attribute a
      $\hat{P}(a_i \mid v_j)$ ← estimate $P(a_i \mid v_j)$
Classify_New_Instance(x)
$$v_{NB} = \operatorname*{arg\,max}_{v_j \in V} \hat{P}(v_j) \prod_{i=1}^{n} \hat{P}(x_i \mid v_j)$$
![Page 76: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/76.jpg)
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| Day1 | Sunny | Hot | High | Weak | No |
| Day2 | Sunny | Hot | High | Strong | No |
| Day3 | Overcast | Hot | High | Weak | Yes |
| Day4 | Rain | Mild | High | Weak | Yes |
| Day5 | Rain | Cool | Normal | Weak | Yes |
| Day6 | Rain | Cool | Normal | Strong | No |
| Day7 | Overcast | Cool | Normal | Strong | Yes |
| Day8 | Sunny | Mild | High | Weak | No |
| Day9 | Sunny | Cool | Normal | Weak | Yes |
| Day10 | Rain | Mild | Normal | Weak | Yes |
| Day11 | Sunny | Mild | Normal | Strong | Yes |
| Day12 | Overcast | Mild | High | Strong | Yes |
| Day13 | Overcast | Hot | Normal | Weak | Yes |
| Day14 | Rain | Mild | High | Strong | No |
![Page 77: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/77.jpg)
Consider PlayTennis again, and a new instance:
x = <Outlook = sunny, Temp = cool, Humid = high, Wind = strong>, V = {Yes, No}
$$v_{NB} = \operatorname*{arg\,max}_{v_k \in \{yes,\, no\}} P(v_k) \prod_i P(a_i \mid v_k)$$
P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.005
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.02
For example, P(yes) = 9/14, P(strong|yes) = 3/9 and P(strong|no) = 3/5.
Answer: PlayTennis(x) = no
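The two products can be reproduced by counting frequencies in the 14-day PlayTennis table. This is a minimal sketch with plain relative-frequency estimates (no smoothing); the tuple layout and function names are illustrative assumptions.

```python
from collections import Counter

# Rows: (outlook, temperature, humidity, wind, play_tennis), Day1..Day14.
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def train(data):
    """Count class frequencies and (attribute index, value, class) frequencies."""
    class_counts = Counter(row[-1] for row in data)
    cond_counts = Counter()
    for row in data:
        label = row[-1]
        for i, value in enumerate(row[:-1]):
            cond_counts[(i, value, label)] += 1
    return class_counts, cond_counts


def classify(instance, class_counts, cond_counts):
    """Return the label maximizing P(v) * prod_i P(a_i | v), plus all scores."""
    n = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # P(v)
        for i, value in enumerate(instance):
            score *= cond_counts[(i, value, label)] / count  # P(a_i | v)
        scores[label] = score
    return max(scores, key=scores.get), scores


class_counts, cond_counts = train(DATA)
label, scores = classify(("sunny", "cool", "high", "strong"), class_counts, cond_counts)
```

The unnormalized scores are roughly 0.0053 for yes and 0.0206 for no, matching the rounded values on the slide.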
![Page 78: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/78.jpg)
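In the above example: P(Wind = strong | PlayTennis = no) = $n_c/n$ = 3/5
If $n_c \approx 0$, then $\hat{P}(a_k \mid v_i) \approx 0$ and the whole product becomes zero. Use the m-estimate of probability instead:
$$\frac{n_c + m\,p}{n + m}$$
p is the prior estimate of the probability; with a uniform prior, p = 1/k, where k is the number of possible values for the attribute
k = 2 for the Wind attribute (weak, strong), so p = 0.5
m is a constant (equivalent sample size)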
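A small sketch of the m-estimate with a uniform prior p = 1/k. The function name and the choice m = 2 in the usage lines are assumptions for illustration.

```python
def m_estimate(n_c, n, k, m):
    """m-estimate (n_c + m*p) / (n + m) with uniform prior p = 1/k."""
    p = 1.0 / k
    return (n_c + m * p) / (n + m)


# Wind = strong given PlayTennis = no: n_c = 3, n = 5, k = 2 values for Wind.
raw = m_estimate(3, 5, k=2, m=0)        # m = 0 recovers the raw frequency 3/5
smoothed = m_estimate(3, 5, k=2, m=2)   # pulled slightly toward p = 0.5
unseen = m_estimate(0, 5, k=2, m=2)     # a zero count no longer gives probability 0
```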
![Page 79: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/79.jpg)
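General ways of measuring performance/error
Actions: $\alpha_i$ (e.g. assign class $C_i$ to some input)
Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$
Expected risk (Duda and Hart, 1973) for taking action $\alpha_i$:
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$$
$$\text{choose } \alpha_i \text{ if } R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$$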
![Page 80: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/80.jpg)
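$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
For minimum risk, choose the most probable class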
![Page 81: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/81.jpg)
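$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{otherwise} \end{cases} \qquad 0 < \lambda < 1$$
$$R(\alpha_{K+1} \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda\, P(C_k \mid \mathbf{x}) = \lambda$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k \ne i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
choose $C_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})\ \forall k \ne i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$; reject otherwise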
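The reject rule above can be sketched in a few lines. The function name, the `"reject"` return value, and the example posteriors are illustrative assumptions; ties between classes are not handled specially.

```python
def decide_with_reject(posteriors, reject_loss):
    """Choose the most probable class if its posterior exceeds 1 - lambda, else reject.

    posteriors: dict mapping class name -> P(C_i | x)
    reject_loss: the loss lambda of the reject action, 0 < lambda < 1
    """
    if not 0.0 < reject_loss < 1.0:
        raise ValueError("reject_loss must lie strictly between 0 and 1")
    best = max(posteriors, key=posteriors.get)
    if posteriors[best] > 1.0 - reject_loss:
        return best
    return "reject"


confident = decide_with_reject({"C1": 0.9, "C2": 0.1}, reject_loss=0.2)
uncertain = decide_with_reject({"C1": 0.55, "C2": 0.45}, reject_loss=0.2)
```

With λ = 0.2 a class is accepted only when its posterior exceeds 0.8, so the second input is rejected.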
![Page 82: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/82.jpg)
$g_i(\mathbf{x})$, i = 1,...,K: choose $C_i$ if $g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$
$$g_i(\mathbf{x}) = \begin{cases} -R(\alpha_i \mid \mathbf{x}) \\ P(C_i \mid \mathbf{x}) \\ p(\mathbf{x} \mid C_i)\, P(C_i) \end{cases}$$
(the posterior-based forms are ok for 0/1 loss)
K decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:
$$\mathcal{R}_i = \{\mathbf{x} \mid g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})\}$$
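Dichotomizer (K=2) vs Polychotomizer (K>2)
$$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$$
$$\text{choose } \begin{cases} C_1 & \text{if } g(\mathbf{x}) > 0 \\ C_2 & \text{otherwise} \end{cases}$$
Log odds:
$$\log \frac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})}$$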
![Page 84: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/84.jpg)
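(given) Probability of state k given evidence x: $P(S_k \mid \mathbf{x})$
(and) Utility of $\alpha_i$ when the state is k: $U_{ik}$
(then) Expected utility:
$$EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik}\, P(S_k \mid \mathbf{x})$$
$$\text{choose } \alpha_i \text{ if } EU(\alpha_i \mid \mathbf{x}) = \max_j EU(\alpha_j \mid \mathbf{x})$$
i.e. a rational choice maximizes expected utility (usually equivalent to minimizing expected risk).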
![Page 85: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/85.jpg)
Association rule: X → Y
People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y.
A rule implies association, not necessarily causation.
![Page 86: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/86.jpg)
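Support (X → Y):
$$P(X, Y) = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$$
Confidence (X → Y):
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$$
Lift (X → Y):
$$\frac{P(X, Y)}{P(X)\,P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$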
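The three measures can be computed by counting baskets. A minimal sketch; the transaction list is made-up illustrative data and the function name is an assumption.

```python
def rule_measures(transactions, x, y):
    """Return (support, confidence, lift) of the rule X -> Y over a list of item sets."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)           # baskets containing X
    n_y = sum(1 for t in transactions if y <= t)           # baskets containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)    # baskets containing both
    support = n_xy / n
    confidence = n_xy / n_x
    lift = support / ((n_x / n) * (n_y / n))
    return support, confidence, lift


# Hypothetical baskets from five customers.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
    {"milk", "bread"},
]

support, confidence, lift = rule_measures(transactions, {"milk"}, {"bread"})
```

A lift below 1, as here, indicates that buying X makes Y slightly less likely than its base rate, even though the confidence looks high.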
![Page 87: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/87.jpg)
For (X,Y,Z), a 3-item set, to be frequent (have enough support), (X,Y), (X,Z), and (Y,Z) should be frequent.
If (X,Y) is not frequent, none of its supersets can be frequent.
Once we find the frequent k-item sets, we convert them to rules: X, Y → Z, ... and X → Y, Z, ...
![Page 88: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/88.jpg)
![Page 89: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/89.jpg)
$\mathcal{X} = \{x^t\}_t$ where $x^t \sim p(x)$
Our model is a specified probability distribution; learning = estimating its parameters
Parametric estimation: Assume a form for $p(x \mid \theta)$ and estimate θ, its sufficient statistics, using $\mathcal{X}$, e.g., $N(\mu, \sigma^2)$ where θ = {μ, σ²}
Density estimation (can then be used for classification, etc.)
![Page 90: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/90.jpg)
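Likelihood of θ given the sample X: $l(\theta \mid \mathcal{X}) = p(\mathcal{X} \mid \theta) = \prod_t p(x^t \mid \theta)$
Log likelihood: $L(\theta \mid \mathcal{X}) = \log l(\theta \mid \mathcal{X}) = \sum_t \log p(x^t \mid \theta)$
Why the log? A product of many small numbers underflows; the log converts the product to a sum and cancels the exp in densities such as the Gaussian.
Maximum likelihood estimator (MLE): $\theta^* = \arg\max_\theta L(\theta \mid \mathcal{X})$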
![Page 91: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/91.jpg)
Bernoulli: Two states, failure/success, x in {0,1}
$$P(x) = p_o^{x}(1 - p_o)^{(1 - x)}$$
$$L(p_o \mid \mathcal{X}) = \log \prod_t p_o^{x^t}(1 - p_o)^{(1 - x^t)}$$
MLE: $p_o = \sum_t x^t / N$
Multinomial: K>2 states, $x_i$ in {0,1}
$$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$$
$$L(p_1, p_2, \ldots, p_K \mid \mathcal{X}) = \log \prod_t \prod_i p_i^{x_i^t}$$
MLE: $p_i = \sum_t x_i^t / N$
![Page 92: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/92.jpg)
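$p(x) = N(\mu, \sigma^2)$, with true parameters μ and σ:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
MLE for μ and σ² (estimated values m and s²):
$$m = \frac{\sum_t x^t}{N}$$
$$s^2 = \frac{\sum_t (x^t - m)^2}{N}$$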
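A minimal sketch of the two Gaussian ML estimates, m and s² (dividing by N, not N - 1). The sample values and function name are illustrative assumptions.

```python
def gaussian_mle(sample):
    """Return (m, s2): the ML estimates of the mean and variance of a Gaussian."""
    n = len(sample)
    if n == 0:
        raise ValueError("need at least one observation")
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n  # note: divides by N
    return m, s2


# Hypothetical sample of N = 8 observations.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
m, s2 = gaussian_mle(sample)
```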
![Page 93: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/93.jpg)
Unknown parameter θ
Estimator $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$
Bias: $b_\theta(d) = E[d] - \theta$
Variance: $E[(d - E[d])^2]$
Mean square error: $r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2]$ = Bias² + Variance
![Page 94: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/94.jpg)
• Bias: how much the expected value of the estimator differs from the true value.
• Variance: variation around the expected value.
• Examples:
  • Sample mean, m, is an unbiased estimator of the true mean. It's also a consistent estimator, since Var(m) tends to zero as N tends to infinity.
  • Sample variance s² (the ML estimate, which divides by N) is a biased estimator of the true variance: $E[s^2] = \frac{N-1}{N}\sigma^2$.
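The bias of the divide-by-N variance can be verified exactly on a tiny case by enumerating every possible sample instead of simulating. This sketch assumes a fair coin (values 0 and 1, true variance 0.25) and samples of size N = 3; both choices are illustrative.

```python
from itertools import product


def ml_variance(sample):
    """Variance estimate that divides by N (the ML estimate)."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / n


population = [0.0, 1.0]   # fair coin: each value has probability 1/2
true_variance = 0.25
N = 3

# All 2^N equally likely samples of size N.
samples = list(product(population, repeat=N))
expected_s2 = sum(ml_variance(s) for s in samples) / len(samples)
bias = expected_s2 - true_variance
```

The expectation comes out to (N - 1)/N times the true variance, so the estimator underestimates on average and the bias shrinks as N grows.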
![Page 95: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/95.jpg)
Treat θ as a random variable with prior p(θ)
Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ, i.e. an average over predictions using all values of θ, weighted by the probability of each θ value.
Maximum a Posteriori (MAP): $\theta_{MAP}$ = argmaxθ p(θ|X). This uses a prior.
Maximum Likelihood (ML): $\theta_{ML}$ = argmaxθ p(X|θ). This does not use a prior. If the prior is flat, MAP = ML.
Bayes': $\theta_{Bayes'}$ = E[θ|X] = ∫ θ p(θ|X) dθ
![Page 96: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/96.jpg)
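$x^t \sim N(\theta, \sigma_0^2)$ and $\theta \sim N(\mu, \sigma^2)$
$\theta_{ML} = m$
$$\theta_{MAP} = \theta_{Bayes'} = E[\theta \mid \mathcal{X}] = \frac{N/\sigma_0^2}{N/\sigma_0^2 + 1/\sigma^2}\, m + \frac{1/\sigma^2}{N/\sigma_0^2 + 1/\sigma^2}\, \mu$$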
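The posterior mean is a weighted average of the sample mean m and the prior mean μ. A small sketch of that formula; the function and parameter names, and the numbers in the usage lines, are illustrative assumptions.

```python
def gaussian_posterior_mean(m, n, sigma0_sq, mu, sigma_sq):
    """E[theta | X] for x^t ~ N(theta, sigma0_sq) with prior theta ~ N(mu, sigma_sq)."""
    data_weight = n / sigma0_sq     # N / sigma_0^2
    prior_weight = 1.0 / sigma_sq   # 1 / sigma^2
    return (data_weight * m + prior_weight * mu) / (data_weight + prior_weight)


# Hypothetical setting: sample mean 2.0, prior mean 0.0, unit variances.
few_data = gaussian_posterior_mean(m=2.0, n=4, sigma0_sq=1.0, mu=0.0, sigma_sq=1.0)
many_data = gaussian_posterior_mean(m=2.0, n=4000, sigma0_sq=1.0, mu=0.0, sigma_sq=1.0)
```

With few observations the estimate is pulled toward the prior mean; as N grows it approaches the ML estimate m.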
![Page 97: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/97.jpg)
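$$g_i(x) = p(x \mid C_i)\, P(C_i)$$
or equivalently
$$g_i(x) = \log p(x \mid C_i) + \log P(C_i)$$
$$p(x \mid C_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right]$$
$$g_i(x) = -\frac{1}{2}\log 2\pi - \log \sigma_i - \frac{(x - \mu_i)^2}{2\sigma_i^2} + \log P(C_i)$$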
![Page 98: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/98.jpg)
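Given the sample
$$\mathcal{X} = \{x^t, \mathbf{r}^t\}_{t=1}^{N}, \quad x \in \mathbb{R}, \quad r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}$$
ML estimates are
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad m_i = \frac{\sum_t x^t r_i^t}{\sum_t r_i^t} \qquad s_i^2 = \frac{\sum_t (x^t - m_i)^2\, r_i^t}{\sum_t r_i^t}$$
Discriminant becomes
$$g_i(x) = -\frac{1}{2}\log 2\pi - \log s_i - \frac{(x - m_i)^2}{2 s_i^2} + \log \hat{P}(C_i)$$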
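The estimates and the discriminant above can be combined into a small one-dimensional classifier. This is a sketch under the stated Gaussian assumption; the data, class names, and function names are made up for illustration, and every class is assumed to have nonzero variance.

```python
import math


def fit(xs, labels):
    """Per class: (prior estimate, mean m_i, variance s_i^2), all ML estimates."""
    n = len(xs)
    params = {}
    for c in sorted(set(labels)):
        members = [x for x, r in zip(xs, labels) if r == c]
        m = sum(members) / len(members)
        s2 = sum((x - m) ** 2 for x in members) / len(members)
        params[c] = (len(members) / n, m, s2)
    return params


def discriminant(x, prior, m, s2):
    """g_i(x) = -0.5 log 2pi - log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i)."""
    return (-0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(s2)
            - (x - m) ** 2 / (2.0 * s2) + math.log(prior))


def classify(x, params):
    """Choose the class with the largest discriminant value."""
    return max(params, key=lambda c: discriminant(x, *params[c]))


# Hypothetical training sample: two classes with equal priors and equal variances.
xs = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 7.5]
labels = ["A"] * 4 + ["B"] * 4
params = fit(xs, labels)
```

Because the priors and variances are equal here, the decision boundary falls halfway between the two class means, at 4.25.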
![Page 99: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/99.jpg)
Equal variances (and equal priors)
Single boundary at halfway between means
![Page 100: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/100.jpg)
Variances are different
Two boundaries
![Page 101: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/101.jpg)
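$$r = f(x) + \epsilon$$
estimator: $g(x \mid \theta)$
$$\epsilon \sim N(0, \sigma^2)$$
$$p(r \mid x) \sim N\left(g(x \mid \theta), \sigma^2\right)$$
$$L(\theta \mid \mathcal{X}) = \log \prod_{t=1}^{N} p(x^t, r^t) = \log \prod_{t=1}^{N} p(r^t \mid x^t) + \log \prod_{t=1}^{N} p(x^t)$$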
We want to learn the parameters of our model via Max. Likelihood
![Page 102: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/102.jpg)
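$$L(\theta \mid \mathcal{X}) = \log \prod_{t=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{\left[r^t - g(x^t \mid \theta)\right]^2}{2\sigma^2}\right]$$
$$= -N \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{N}\left[r^t - g(x^t \mid \theta)\right]^2$$
$$E(\theta \mid \mathcal{X}) = \frac{1}{2}\sum_{t=1}^{N}\left[r^t - g(x^t \mid \theta)\right]^2$$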
Important: so maximizing L is equivalent to minimizing MSE (assuming Gaussian noise)
![Page 103: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/103.jpg)
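$$g(x^t \mid w_1, w_0) = w_1 x^t + w_0$$
$$\sum_t r^t = N w_0 + w_1 \sum_t x^t$$
$$\sum_t r^t x^t = w_0 \sum_t x^t + w_1 \sum_t (x^t)^2$$
$$\mathbf{A} = \begin{bmatrix} N & \sum_t x^t \\ \sum_t x^t & \sum_t (x^t)^2 \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} \sum_t r^t \\ \sum_t r^t x^t \end{bmatrix}$$
$$\mathbf{w} = \mathbf{A}^{-1}\mathbf{y}$$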
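The 2×2 system w = A⁻¹y can be solved in closed form. A minimal sketch using only the sums that appear in A and y; the function name and the toy data (points on the line r = 2x + 1) are illustrative assumptions.

```python
def fit_line(xs, rs):
    """Solve the normal equations A w = y for w = (w0, w1) of g(x) = w1*x + w0."""
    n = len(xs)
    sx = sum(xs)                                   # sum_t x^t
    sxx = sum(x * x for x in xs)                   # sum_t (x^t)^2
    sr = sum(rs)                                   # sum_t r^t
    srx = sum(r * x for x, r in zip(xs, rs))       # sum_t r^t x^t
    det = n * sxx - sx * sx                        # determinant of A
    if det == 0:
        raise ValueError("A is singular: all x values are identical")
    w0 = (sxx * sr - sx * srx) / det
    w1 = (n * srx - sx * sr) / det
    return w0, w1


# Hypothetical noise-free data generated by r = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
rs = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, rs)
```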
![Page 104: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/104.jpg)
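$$g(x^t \mid w_k, \ldots, w_2, w_1, w_0) = w_k (x^t)^k + \cdots + w_2 (x^t)^2 + w_1 x^t + w_0$$
$$\mathbf{D} = \begin{bmatrix} 1 & x^1 & (x^1)^2 & \cdots & (x^1)^k \\ 1 & x^2 & (x^2)^2 & \cdots & (x^2)^k \\ \vdots & & & & \vdots \\ 1 & x^N & (x^N)^2 & \cdots & (x^N)^k \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}$$
$$\mathbf{w} = \left(\mathbf{D}^T\mathbf{D}\right)^{-1}\mathbf{D}^T\mathbf{r}$$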
![Page 105: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/105.jpg)
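Square Error:
$$E(\theta \mid \mathcal{X}) = \frac{1}{2}\sum_{t=1}^{N}\left[r^t - g(x^t \mid \theta)\right]^2$$
Relative Square Error:
$$E(\theta \mid \mathcal{X}) = \frac{\sum_{t=1}^{N}\left[r^t - g(x^t \mid \theta)\right]^2}{\sum_{t=1}^{N}\left[r^t - \bar{r}\right]^2}$$
Absolute Error: $E(\theta \mid \mathcal{X}) = \sum_t \left|r^t - g(x^t \mid \theta)\right|$
ε-sensitive Error: $E(\theta \mid \mathcal{X}) = \sum_t 1\left(\left|r^t - g(x^t \mid \theta)\right| > \epsilon\right)\left(\left|r^t - g(x^t \mid \theta)\right| - \epsilon\right)$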
![Page 106: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/106.jpg)
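$$E\left[(r - g(x))^2 \mid x\right] = \underbrace{E\left[(r - E[r \mid x])^2 \mid x\right]}_{\text{noise}} + \underbrace{\left(E[r \mid x] - g(x)\right)^2}_{\text{squared error}}$$
$$E_{\mathcal{X}}\left[\left(E[r \mid x] - g(x)\right)^2 \mid x\right] = \underbrace{\left(E[r \mid x] - E_{\mathcal{X}}[g(x)]\right)^2}_{\text{bias}} + \underbrace{E_{\mathcal{X}}\left[\left(g(x) - E_{\mathcal{X}}[g(x)]\right)^2\right]}_{\text{variance}}$$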
![Page 107: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/107.jpg)
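M samples $\mathcal{X}_i = \{x_i^t, r_i^t\}$, i = 1,...,M are used to fit $g_i(x)$, i = 1,...,M
$$\text{Bias}^2(g) = \frac{1}{N}\sum_t \left[\bar{g}(x^t) - f(x^t)\right]^2$$
$$\text{Variance}(g) = \frac{1}{NM}\sum_t \sum_i \left[g_i(x^t) - \bar{g}(x^t)\right]^2$$
$$\bar{g}(x) = \frac{1}{M}\sum_i g_i(x)$$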
![Page 108: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/108.jpg)
Example: $g_i(x) = 2$ has no variance and high bias
$g_i(x) = \sum_t r_i^t / N$ has lower bias with variance
As we increase complexity, bias decreases (a better fit to data) and variance increases (fit varies more with data)
Bias/Variance dilemma: (Geman et al., 1992)
![Page 109: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/109.jpg)
[Figure: bias and variance, showing the true function f, the individual fits $g_i$ and their average $\bar{g}$]
![Page 110: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/110.jpg)
Best fit “min error”
![Page 111: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/111.jpg)
Best fit, “elbow”
![Page 112: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/112.jpg)
Cross-validation: Measure generalization accuracy by testing on data unused during training
Regularization: Penalize complex models: E' = error on data + λ · model complexity
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, shortest description of data
Structural risk minimization (SRM)
![Page 113: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/113.jpg)
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$
Prior on models, p(model)
Regularization, when prior favors simpler models
Bayes, MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 17)
![Page 114: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/114.jpg)
Coefficients increase in magnitude as order increases:
1: [-0.0769, 0.0016]
2: [0.1682, -0.6657, 0.0080]
3: [0.4238, -2.5778, 3.4675, -0.0002]
4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]
Regularization:
$$E(\mathbf{w} \mid \mathcal{X}) = \frac{1}{2}\sum_{t=1}^{N}\left[r^t - g(x^t \mid \mathbf{w})\right]^2 + \lambda \sum_i w_i^2$$
![Page 115: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/115.jpg)
![Page 116: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/116.jpg)
Reduces time complexity: Less computation
Reduces space complexity: Fewer parameters
Saves the cost of observing the feature
Simpler models are more robust on small datasets
More interpretable; simpler explanation
Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions
![Page 117: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/117.jpg)
Feature selection: Choosing k<d important features, ignoring the remaining d – k
  Subset selection algorithms
Feature extraction: Project the original $x_i$, i = 1,...,d dimensions to new k<d dimensions, $z_j$, j = 1,...,k
  Principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA)
![Page 118: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/118.jpg)
There are $2^d$ subsets of d features
Forward search: Add the best feature at each step
  Set of features F initially Ø.
  At each iteration, find the best new feature: $j = \arg\min_i E(F \cup x_i)$
  Add $x_j$ to F if $E(F \cup x_j) < E(F)$
Hill-climbing O(d²) algorithm
Backward search: Start with all features and remove one at a time, if possible.
Floating search (Add k, remove l)
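Forward search can be sketched as a greedy loop over an error function E(F). In practice E would be a validation error of a model trained on the feature subset; here `toy_error` is a hypothetical stand-in in which only x1 and x3 are useful and each extra feature costs a small penalty.

```python
def forward_selection(features, error):
    """Greedily add the feature that lowers error(F) most; stop when none does."""
    selected = []
    best_error = error(selected)
    remaining = list(features)
    while remaining:
        # j = argmin_i E(F ∪ x_i)
        candidate = min(remaining, key=lambda f: error(selected + [f]))
        candidate_error = error(selected + [candidate])
        if candidate_error >= best_error:  # add x_j only if E(F ∪ x_j) < E(F)
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_error = candidate_error
    return selected, best_error


def toy_error(subset):
    """Hypothetical error: x1 and x3 reduce it, every feature adds a 0.01 penalty."""
    usefulness = {"x1": 0.3, "x3": 0.2}
    return 1.0 - sum(usefulness.get(f, 0.0) for f in subset) + 0.01 * len(subset)


selected, best_error = forward_selection(["x1", "x2", "x3", "x4"], toy_error)
```

Each pass evaluates at most d candidates and there are at most d passes, which is where the O(d²) evaluation count comes from.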
![Page 119: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/119.jpg)
Find a low-dimensional space such that when x is projected there, information loss is minimized. The projection of x on the direction of w is: z = wTx Find w such that Var(z) is maximized
Var(z) = Var(wTx) = E[(wTx – wTμ)2] = E[(wTx – wTμ)(wTx – wTμ)] = E[wT(x – μ)(x – μ)Tw] = wT E[(x – μ)(x –μ)T]w = wT ∑ w where Var(x)= E[(x – μ)(x –μ)T] = ∑
Maximize $\mathrm{Var}(z)$ subject to $\|w\| = 1$:
$$\max_{w_1} \; w_1^T \Sigma w_1 - \alpha (w_1^T w_1 - 1)$$
$\Sigma w_1 = \alpha w_1$; that is, $w_1$ is an eigenvector of $\Sigma$. Choose the one with the largest eigenvalue for $\mathrm{Var}(z)$ to be maximum.
Second principal component: maximize $\mathrm{Var}(z_2)$ subject to $\|w_2\| = 1$ and $w_2$ orthogonal to $w_1$:
$$\max_{w_2} \; w_2^T \Sigma w_2 - \alpha (w_2^T w_2 - 1) - \beta (w_2^T w_1 - 0)$$
$\Sigma w_2 = \alpha w_2$; that is, $w_2$ is another eigenvector of $\Sigma$, and so on.
$z = W^T (x - m)$
where the columns of $W$ are the eigenvectors of $\Sigma$, and $m$ is the sample mean.
Centers the data at the origin and rotates the axes.
Proportion of Variance (PoV) explained:
$$\mathrm{PoV} = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_k + \cdots + \lambda_d}$$
when $\lambda_i$ are sorted in descending order.
Typically, stop at PoV > 0.9.
Scree graph plots PoV vs. $k$; stop at the “elbow”.
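A minimal sketch of the first principal component and its PoV, using only the standard library. Power iteration is swapped in here for a full eigendecomposition (an assumption of this sketch, not the slides' method); it converges to the eigenvector of Σ with the largest eigenvalue. The data and the helper names are made up for illustration.

```python
import math


def mean_vector(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def covariance(X, m):
    # Sample covariance with 1/N normalization (an assumption of this sketch).
    n, d = len(X), len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in X) / n
             for j in range(d)] for i in range(d)]


def first_component(S, iters=200):
    """Power iteration: returns (w1, lambda1) with ||w1|| = 1."""
    d = len(S)
    w = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = [sum(S[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    lam = sum(w[i] * S[i][j] * w[j] for i in range(d) for j in range(d))  # w^T S w
    return w, lam


X = [[2.0, 1.9], [0.0, 0.1], [-1.0, -1.2], [3.0, 3.1], [1.0, 1.1]]  # toy data
m = mean_vector(X)
S = covariance(X, m)
w1, lam1 = first_component(S)
z = [sum(w1[j] * (x[j] - m[j]) for j in range(len(m))) for x in X]  # z = w1^T (x - m)
pov = lam1 / sum(S[i][i] for i in range(len(S)))  # trace(S) = sum of all eigenvalues
print(w1, lam1, pov)
```

The variance of the projected values `z` equals `lam1`, which matches Var(z) = w^T Σ w; on this nearly collinear toy data one component already explains well over 90% of the variance.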
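Find a small number of factors $z$, which when combined generate $x$:
$$x_i - \mu_i = v_{i1} z_1 + v_{i2} z_2 + \cdots + v_{ik} z_k + \varepsilon_i$$
where $z_j$, $j = 1,\dots,k$ are the latent factors with $E[z_j] = 0$, $\mathrm{Var}(z_j) = 1$, $\mathrm{Cov}(z_i, z_j) = 0$ for $i \neq j$;
$\varepsilon_i$ are the noise sources with $E[\varepsilon_i] = 0$, $\mathrm{Var}(\varepsilon_i) = \psi_i$, $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$, $\mathrm{Cov}(\varepsilon_i, z_j) = 0$; and $v_{ij}$ are the factor loadings.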
PCA: from $x$ to $z$, $z = W^T (x - \mu)$
FA: from $z$ to $x$, $x - \mu = Vz + \varepsilon$
In FA, factors zj are stretched, rotated and translated to generate x
Find a low-dimensional space such that when $x$ is projected, classes are well-separated.
Find $w$ that maximizes
$$J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2}$$
where
$$m_1 = \frac{\sum_t w^T x^t r^t}{\sum_t r^t}, \qquad s_1^2 = \sum_t (w^T x^t - m_1)^2 r^t$$
Between-class scatter:
$$(m_1 - m_2)^2 = (w^T \mathbf{m}_1 - w^T \mathbf{m}_2)^2 = w^T (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T w = w^T S_B w$$
where $S_B = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^T$.
Within-class scatter:
$$s_1^2 = \sum_t (w^T x^t - m_1)^2 r^t = \sum_t w^T (x^t - \mathbf{m}_1)(x^t - \mathbf{m}_1)^T w \, r^t = w^T S_1 w$$
where $S_1 = \sum_t r^t (x^t - \mathbf{m}_1)(x^t - \mathbf{m}_1)^T$, and
$$s_1^2 + s_2^2 = w^T S_W w \quad \text{where } S_W = S_1 + S_2$$
Find $w$ that maximizes
$$J(w) = \frac{w^T S_B w}{w^T S_W w} = \frac{|w^T (\mathbf{m}_1 - \mathbf{m}_2)|^2}{w^T S_W w}$$
LDA solution: $w = c \cdot S_W^{-1} (\mathbf{m}_1 - \mathbf{m}_2)$
Parametric solution: $w = \Sigma^{-1} (\mu_1 - \mu_2)$ when $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma)$
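As a rough illustration of the two-class LDA solution, the sketch below computes w = S_W^{-1}(m1 − m2) with c = 1 for two-dimensional inputs, inverting the 2×2 within-class scatter matrix by hand. The data points and the function names are hypothetical.

```python
def mean(X):
    n = len(X)
    return [sum(r[j] for r in X) / n for j in range(2)]


def scatter(X, m):
    """S_i = sum_t (x^t - m_i)(x^t - m_i)^T for one class (2x2 matrix)."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for r in X:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S


def fisher_direction(X1, X2):
    m1, m2 = mean(X1), mean(X2)
    S1, S2 = scatter(X1, m1), scatter(X2, m2)
    Sw = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]  # S_W = S_1 + S_2
    det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
    inv = [[Sw[1][1] / det, -Sw[0][1] / det],
           [-Sw[1][0] / det, Sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # w = S_W^{-1} (m1 - m2), taking the constant c = 1
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


X1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]  # toy class 1
X2 = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 6.0]]  # toy class 2
w = fisher_direction(X1, X2)
z1 = [w[0] * x[0] + w[1] * x[1] for x in X1]  # projections z = w^T x
z2 = [w[0] * x[0] + w[1] * x[1] for x in X2]
print(w, min(z1) > max(z2))
```

On this toy data the projected classes do not overlap, so a single threshold on z separates them.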
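Within-class scatter:
$$S_W = \sum_{i=1}^{K} S_i, \qquad S_i = \sum_t r_i^t (x^t - \mathbf{m}_i)(x^t - \mathbf{m}_i)^T$$
Between-class scatter:
$$S_B = \sum_{i=1}^{K} N_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T, \qquad \mathbf{m} = \frac{1}{K} \sum_{i=1}^{K} \mathbf{m}_i$$
Find $W$ that maximizes
$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$
The solution is given by the eigenvectors of $S_W^{-1} S_B$ with the largest eigenvalues; $S_B$ has a maximum rank of $K - 1$.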
Learn to approximate discrete-valued target functions.
Step-by-step decision making.
It can learn disjunctive expressions: the hypothesis space is completely expressive, avoiding the problems of restricted hypothesis spaces.
Inductive bias: prefer small trees over large trees.
Each instance holds attribute values. Instances are classified by filtering the attribute values down the decision tree to a leaf, which gives the final answer.
Internal nodes: attribute names or attribute values. Branching occurs at attribute nodes.
Internal decision nodes
• Univariate: uses a single attribute, $x_i$
  – Numeric $x_i$: binary split, $x_i > w_m$
  – Discrete $x_i$: n-way split for n possible values
• Multivariate: uses all attributes, $x$
Leaves
• Classification: class labels, or proportions
• Regression: numeric; average of $r$, or local fit
Learning is greedy; find the best split recursively (Breiman et al., 1984; Quinlan, 1986, 1993)
• Each path from root to leaf is a conjunction of constraints on the attribute values; the tree as a whole is a disjunction of these conjunctions, for example:
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Good at classification problems where:
• Instances are represented by attribute-value pairs.
• The target function has discrete output values.
• Disjunctive descriptions may be required.
• The training data may contain errors.
• The training data may contain missing attribute values.
Given a set of examples (training set), both positive and negative, the task is to construct a decision tree that describes a concise decision path. Using the resulting decision tree, we want to classify new instances of examples (either as yes or no).
A trivial solution is to explicitly construct a path for each given example. In this case, you will get a tree where the number of leaves is the same as the number of training examples. The problem with this approach is that it cannot deal with situations where some attribute values are missing or new kinds of situations arise. Consider also that some attributes may not count much toward the final classification.
Memorizing all cases may not be the best way. We want to extract a decision pattern that can describe a large number of cases in a concise way. In terms of a decision tree, we want to make as few tests as possible before reaching a decision, i.e., the tree should be shallow.
Basic idea: pick attributes that can clearly separate positive and negative cases. These attributes are more important than others: the final classification depends heavily on their values.
Main loop:
1. A ← the “best” decision attribute for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a new descendant of node
4. Sort training examples to leaf nodes
5. If training examples are perfectly classified, then STOP; else iterate over new leaf nodes
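The main loop can be written as a short recursive function. This is a sketch under assumptions: “best” is taken to mean highest information gain (as in ID3), the tree is stored as nested tuples and dicts, and the four-example data set is invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, labels, attr):
    n = len(rows)
    remainder = 0.0
    for v in set(r[attr] for r in rows):
        sub = [y for r, y in zip(rows, labels) if r[attr] == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    if len(set(labels)) == 1:            # step 5: perfectly classified, stop
        return labels[0]
    if not attrs:                        # nothing left to test: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))    # steps 1 and 2
    branches = {}
    for v in set(r[best] for r in rows):                      # step 3
        idx = [i for i, r in enumerate(rows) if r[best] == v]  # step 4
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [a for a in attrs if a != best])    # iterate on new leaves
    return (best, branches)


def classify(tree, row):
    while isinstance(tree, tuple):       # internal node: (attribute, branches)
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical toy data, for illustration only.
rows = [{"sky": "sunny", "wind": "weak"}, {"sky": "sunny", "wind": "strong"},
        {"sky": "rain", "wind": "weak"}, {"sky": "rain", "wind": "strong"}]
labels = ["yes", "yes", "yes", "no"]
tree = id3(rows, labels, ["sky", "wind"])
print(tree)
```

The sketch does not handle attribute values unseen during training or missing values; those would need extra branches or a default label.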
A1 or A2?
• Given the initial and final numbers of positive and negative examples for the attribute just tested, we want to decide which attribute is better.
• How to quantitatively measure which one is better?
Use Shannon’s information theory to choose the attribute that gives the maximum information gain.
Pick an attribute such that the information gain (or entropy reduction) is maximized.
Entropy measures the average surprisal of events. Less probable events are more surprising.
Given two events, H and T (Head and Tail), rare (uncertain) events give more surprise:
• H is more surprising than T if P(H) < P(T)
• H is more uncertain than T if P(H) < P(T)
How to represent “more surprising” or “more uncertain”?
Surprise(H) > Surprise(T) if P(H) < P(T):
$$P(H) < P(T) \;\Leftrightarrow\; \frac{1}{P(H)} > \frac{1}{P(T)} \;\Leftrightarrow\; \log\frac{1}{P(H)} > \log\frac{1}{P(T)} \;\Leftrightarrow\; -\log P(H) > -\log P(T)$$
• $S$ is a sample of training examples
• $p_+$ is the proportion of positive examples in $S$
• $p_-$ is the proportion of negative examples in $S$
• Entropy measures the average uncertainty in $S$:
$$\mathrm{Entropy}(S) \equiv -p_+ \log_2 p_+ - p_- \log_2 p_-$$
By performing some query, if you go from state S1 with entropy E(S1) to state S2 with entropy E(S2), where E(S1) > E(S2), your uncertainty has decreased. The amount by which uncertainty decreased, i.e., E(S1) − E(S2), can be thought of as information you gained (information gain) through getting answers to your query.
$$\mathrm{Entropy}(S) = -\sum_{i \in C} p_i \log_2 p_i$$
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v)$$
• $C$: categories (classifications)
• $S$: set of examples
• $A$: a single attribute
• $S_v$: set of examples where attribute $A = v$
| Day | Outlook | Temperature | Humidity | Wind | Play Tennis |
|---|---|---|---|---|---|
| Day1 | Sunny | Hot | High | Weak | No |
| Day2 | Sunny | Hot | High | Strong | No |
| Day3 | Overcast | Hot | High | Weak | Yes |
| Day4 | Rain | Mild | High | Weak | Yes |
| Day5 | Rain | Cool | Normal | Weak | Yes |
| Day6 | Rain | Cool | Normal | Strong | No |
| Day7 | Overcast | Cool | Normal | Strong | Yes |
| Day8 | Sunny | Mild | High | Weak | No |
| Day9 | Sunny | Cool | Normal | Weak | Yes |
| Day10 | Rain | Mild | Normal | Weak | Yes |
| Day11 | Sunny | Mild | Normal | Strong | Yes |
| Day12 | Overcast | Mild | High | Strong | Yes |
| Day13 | Overcast | Hot | Normal | Weak | Yes |
| Day14 | Rain | Mild | High | Strong | No |

• Which attribute to test first?
Which attribute is the best classifier?
• +: number of positive examples; −: number of negative examples
• Initial entropy = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94
• You can calculate the rest.
• Note: 0.0 × log 0.0 ≡ 0.0 even though log 0.0 is not defined.
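“The rest” can be checked with a short script. The sketch below re-enters the 14 training days and evaluates the information gain of each attribute; the function and variable names are illustrative choices, not part of the slides.

```python
import math
from collections import Counter

# The 14 days: Outlook, Temperature, Humidity, Wind, PlayTennis
DATA = """Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No"""
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [line.split() for line in DATA.splitlines()]


def entropy(labels):
    # Only classes that actually occur are counted, so log2(0) never arises.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain(rows, col):
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for v in set(r[col] for r in rows):
        sub = [r[-1] for r in rows if r[col] == v]   # labels of S_v
        remainder += len(sub) / len(rows) * entropy(sub)
    return entropy(labels) - remainder


gains = {a: gain(rows, i) for i, a in enumerate(ATTRS)}
best = max(gains, key=gains.get)
for a in ATTRS:
    print(a, round(gains[a], 3))
print("test first:", best)
```

The gains come out at roughly 0.247 for Outlook, 0.029 for Temperature, 0.152 for Humidity and 0.048 for Wind, so Outlook is the attribute to test first.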
Error at node $m$:
$$b_m(x) = \begin{cases} 1 & \text{if } x \in \mathcal{X}_m: x \text{ reaches node } m \\ 0 & \text{otherwise} \end{cases}$$
$$E_m = \frac{1}{N_m} \sum_t (r^t - g_m)^2 \, b_m(x^t), \qquad g_m = \frac{\sum_t b_m(x^t) \, r^t}{\sum_t b_m(x^t)}$$
After splitting:
$$b_{mj}(x) = \begin{cases} 1 & \text{if } x \in \mathcal{X}_{mj}: x \text{ reaches node } m \text{ and branch } j \\ 0 & \text{otherwise} \end{cases}$$
$$E'_m = \frac{1}{N_m} \sum_j \sum_t (r^t - g_{mj})^2 \, b_{mj}(x^t), \qquad g_{mj} = \frac{\sum_t b_{mj}(x^t) \, r^t}{\sum_t b_{mj}(x^t)}$$
Model Selection in Trees
Remove subtrees for better generalization (decrease variance).
• Prepruning: early stopping
• Postpruning: grow the whole tree, then prune subtrees that overfit on the pruning set
Prepruning is faster; postpruning is more accurate (requires a separate pruning set).
C4.5Rules (Quinlan, 1993)
Rule induction is similar to tree induction, but tree induction is breadth-first while rule induction is depth-first: one rule at a time.
• A rule set contains rules; rules are conjunctions of terms
• A rule covers an example if all terms of the rule evaluate to true for the example
• Sequential covering: generate rules one at a time until all positive examples are covered
• IREP (Fürnkranz and Widmer, 1994), Ripper (Cohen, 1995)
Networks of processing units (neurons) with connections (synapses) between them
• Large number of neurons: $10^{10}$
• Large connectivity: $10^5$
• Parallel processing
• Distributed computation/memory
• Robust to noise, failures
Levels of analysis (Marr, 1982)
1. Computational theory
2. Representation and algorithm
3. Hardware implementation
Reverse engineering: from hardware to theory
Parallel processing: SIMD vs. MIMD
Neural net: SIMD with modifiable local memory
Learning: update by training/experience
• Neuron switching time: ~0.001 second (1 ms)
• Number of neurons: ~$10^{10}$
• Connections per neuron: ~$10^4$ to $10^5$
• Scene recognition time: ~0.1 second (100 ms)
• 100 processing steps do not seem like enough ⇒ much parallel computation
• Many neuron-like threshold switching units (real-valued)
• Many weighted interconnections among units
• Highly parallel, distributed process
• Emphasis on tuning weights automatically: new learning algorithms, new optimization techniques, new learning principles
• Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Long training time (may need occasional, extensive retraining)
• Form of target function is unknown
• Fast evaluation of learned target function
• Human readability of result is unimportant
• Speech synthesis
• Handwritten character recognition
• Financial prediction, transaction fraud detection
• Driving a car on the highway
$$o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$
Sometimes we’ll use simpler vector notation:
$$o(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} > 0 \\ -1 & \text{otherwise} \end{cases}$$
The tunable parameters are the weights $w_0, w_1, \dots, w_n$, so the space $H$ of candidate hypotheses is the set of all possible real-valued weight vectors:
$$H = \{\vec{w} \mid \vec{w} \in \mathbb{R}^{(n+1)}\}$$
Perceptrons can represent basic Boolean functions. Thus, a network of perceptron units can compute any Boolean function.
What about XOR or EQUIV?
Perceptrons can only represent linearly separable functions.
• Output of the perceptron:
$$W_0 I_0 + W_1 I_1 - t > 0 \text{, then output is } 1$$
$$W_0 I_0 + W_1 I_1 - t \le 0 \text{, then output is } -1$$
The hypothesis space is a collection of separating lines.
Rearranging $W_0 I_0 + W_1 I_1 - t > 0$ (then output is 1), we get (if $W_1 > 0$)
$$I_1 > -\frac{W_0}{W_1} I_0 + \frac{t}{W_1},$$
where for points above the line the output is 1, and −1 for those below the line. Compare with
$$y = -\frac{W_0}{W_1} x + \frac{t}{W_1}.$$
Without the bias (t = 0), learning is limited to adjustment of the slope of the separating line passing through the origin. Three example lines with different weights are shown.
Only functions where the -1 points and 1 points are clearly separable can be represented by perceptrons.
The geometric interpretation is generalizable to functions of n arguments, i.e. perceptron with n inputs plus one threshold (or bias) unit.
For functions that take integer or real values as arguments and output either -1 or 1. Left: linearly separable (i.e., can draw a straight line between the classes). Right: not linearly separable (i.e., perceptrons cannot represent such a function)
Perceptrons cannot represent XOR!
With output 1 if $W_0 I_0 + W_1 I_1 - t > 0$ and −1 otherwise, XOR requires:
1. $(I_0, I_1) = (0, 0)$, output −1: $-t \le 0 \Rightarrow t \ge 0$
2. $(I_0, I_1) = (0, 1)$, output 1: $W_1 - t > 0 \Rightarrow W_1 > t$
3. $(I_0, I_1) = (1, 0)$, output 1: $W_0 - t > 0 \Rightarrow W_0 > t$
4. $(I_0, I_1) = (1, 1)$, output −1: $W_0 + W_1 - t \le 0 \Rightarrow W_0 + W_1 \le t$
From 2, 3 and 4, $2t < W_0 + W_1 \le t$, so $t < 0$; but $t \ge 0$ (from 1), a contradiction.
The weights do not have to be calculated manually. We can train the network with (input, output) pairs according to the following weight update rule:
$$w_i \leftarrow w_i + \eta (t - o) x_i$$
where $\eta$ is the learning rate parameter. Proven to converge if the input set is linearly separable and $\eta$ is small.
$$w_i \leftarrow w_i + \eta (t - o) x_i$$
When $t = o$, the weight stays the same.
When $t = 1$ and $o = -1$, the change in weight is $\eta (1 - (-1)) x_i > 0$ if the $x_i$ are all positive. Thus $\vec{w} \cdot \vec{x}$ will increase, and eventually the output $o$ will turn to 1.
When $t = -1$ and $o = 1$, the change in weight is $\eta (-1 - 1) x_i < 0$ if the $x_i$ are all positive. Thus $\vec{w} \cdot \vec{x}$ will decrease, and eventually the output $o$ will turn to −1.
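A small sketch of the perceptron training rule, under assumptions: inputs carry a constant first component x0 = 1 so that w0 plays the role of the bias, η = 0.5 is chosen only to keep the arithmetic exact, and `train_perceptron` is a hypothetical name. It learns AND, which is linearly separable, and fails on XOR, as the contradiction argument predicts.

```python
def output(w, x):
    """Threshold unit: +1 if w . x > 0, otherwise -1 (x[0] = 1 is the bias input)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def train_perceptron(examples, eta=0.5, max_epochs=100):
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:
            o = output(w, x)
            if o != t:
                errors += 1
                # w_i <- w_i + eta * (t - o) * x_i
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:   # every example classified correctly: converged
            break
    return w


AND = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
XOR = [([1, 0, 0], -1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], -1)]

w_and = train_perceptron(AND)
w_xor = train_perceptron(XOR)   # stops only because max_epochs runs out
print(all(output(w_and, x) == t for x, t in AND))
print(all(output(w_xor, x) == t for x, t in XOR))
```

Whatever weights the XOR run ends with, at least one of the four inputs is misclassified, since no single threshold unit can represent XOR.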
The perceptron rule cannot deal with noisy data. The delta rule will find an approximate solution even when the input set is not linearly separable.
Use a linear unit without the step function:
$$o(\vec{x}) = \vec{w} \cdot \vec{x}$$
Want to reduce the error by adjusting $\vec{w}$:
$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$
Want to minimize $E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$ by adjusting $\vec{w}$. Note: the error surface is defined by the training data D. A different data set will give a different surface. $E(w_0, w_1)$ is the error function above, and we want to move $(w_0, w_1)$ to a position with low E.
21
![Page 182: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/182.jpg)
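Gradient:
$\nabla E[\vec{w}] = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right]$
Training rule:
$\Delta \vec{w} = -\eta \nabla E[\vec{w}]$, i.e., $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$
22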
![Page 183: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/183.jpg)
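23
$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d \frac{\partial}{\partial w_i} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\
&= \sum_d (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - \vec{w} \cdot \vec{x}_d) \\
&= \sum_d (t_d - o_d)(-x_{i,d})
\end{aligned}$$
Since we want $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, we get $\Delta w_i = \eta \sum_d (t_d - o_d) x_{i,d}$.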
![Page 184: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/184.jpg)
Gradient-Descent(training_examples, η): Each training example is a pair of the form <$\vec{x}$, t>, where $\vec{x}$ is the vector of input values and t is the target output value. η is the learning rate (e.g., 0.05).
Initialize each $w_i$ to some small random value.
Until the termination condition is met, Do
– Initialize each $\Delta w_i$ to zero.
– For each <$\vec{x}$, t> in training_examples, Do
  * Input the instance $\vec{x}$ to the unit and compute the output o.
  * For each linear unit weight $w_i$, Do: $\Delta w_i \leftarrow \Delta w_i + \eta (t - o) x_i$
– For each linear unit weight $w_i$, Do: $w_i \leftarrow w_i + \Delta w_i$
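The batch procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the weights start at zero instead of small random values (to keep the run deterministic), the termination condition is a fixed number of epochs, and the noise-free data set generated from t = 1 + 2·x1 is hypothetical.

```python
def gradient_descent(training_examples, eta=0.05, epochs=500):
    """Batch gradient descent for a linear unit o = w . x."""
    n = len(training_examples[0][0])
    w = [0.0] * n  # assumption: zeros instead of small random values
    for _ in range(epochs):  # assumption: fixed epoch count as termination condition
        delta = [0.0] * n  # accumulate the weight changes over the whole data set
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]
        w = [wi + di for wi, di in zip(w, delta)]  # one update per pass over D
    return w

# Hypothetical data from t = 1 + 2 * x1, with a constant input x0 = 1.
batch_data = [((1.0, x1), 1.0 + 2.0 * x1) for x1 in (0.0, 1.0, 2.0, 3.0)]
w_batch = gradient_descent(batch_data)
```

With this small learning rate the weights approach (1, 2), the minimum of the squared-error surface for this data.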
24
![Page 185: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/185.jpg)
Gradient descent is effective in searching through a large or infinite hypothesis space H when:
H contains continuously parameterized hypotheses, and the error can be differentiated w.r.t. the parameters.
Limitations: convergence can be slow, and the search may end in a local minimum (finding the global minimum is not guaranteed).
25
![Page 186: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/186.jpg)
Avoiding local minima: incremental gradient descent, or stochastic gradient descent.
Instead of updating the weights based on all inputs in D, immediately update the weights after each input example:
$\Delta w_i = \eta (t - o) x_i$, instead of $\Delta w_i = \eta \sum_{d \in D} (t_d - o_d) x_{i,d}$.
Can be seen as minimizing the per-example error function
$E_d(\vec{w}) = \frac{1}{2} (t_d - o_d)^2$.
26
![Page 187: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/187.jpg)
In the standard version, error is defined over entire D. In the standard version, more computation is needed per weight update, but η can be larger. Stochastic version can sometimes avoid local minima.
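For comparison, here is a hedged Python sketch of the incremental (stochastic) version, where the weights change after every example rather than once per pass over D. The fixed example order, the zero initial weights, the epoch count, and the data set are assumptions made for illustration.

```python
def stochastic_gradient_descent(training_examples, eta=0.05, epochs=1000):
    """Incremental delta rule: update the weights after each single example."""
    n = len(training_examples[0][0])
    w = [0.0] * n  # assumption: zeros instead of small random values
    for _ in range(epochs):
        for x, t in training_examples:  # assumption: fixed order, no shuffling
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical noise-free data from t = 1 + 2 * x1, with a constant input x0 = 1.
sgd_data = [((1.0, x1), 1.0 + 2.0 * x1) for x1 in (0.0, 1.0, 2.0, 3.0)]
w_sgd = stochastic_gradient_descent(sgd_data)
```

Each update is cheap (one example), which is the trade-off described above: less computation per weight update, at the price of a noisier path and usually a smaller η.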
27
![Page 188: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/188.jpg)
Perceptron training rule is guaranteed to succeed if: the training examples are linearly separable, and the learning rate η is sufficiently small.
Linear unit training rule using gradient descent: asymptotic convergence to the hypothesis with minimum squared error, given a sufficiently small learning rate η, even when the training data contains noise, and even when the training data is not separable by H.
28
![Page 189: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/189.jpg)
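Differentiable threshold unit: sigmoid
$\sigma(y) = \frac{1}{1 + \exp(-y)}$
Interesting property:
$\frac{d\sigma(y)}{dy} = \sigma(y)(1 - \sigma(y))$
Output:
$o = \sigma(\vec{w} \cdot \vec{x})$
Other function:
$\tanh(y) = \frac{\exp(2y) - 1}{\exp(2y) + 1}$
29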
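The derivative property of the sigmoid is easy to check numerically. The sketch below is illustrative only; the helper names and the central-difference step size are assumptions.

```python
import math

def sigmoid(y):
    """Logistic sigmoid: 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_prime(y):
    """Derivative via the identity sigma'(y) = sigma(y) * (1 - sigma(y))."""
    s = sigmoid(y)
    return s * (1.0 - s)

def numeric_derivative(f, y, h=1e-6):
    """Central finite difference, used only to check the identity."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

def tanh_from_exp(y):
    """tanh written with exponentials: (exp(2y) - 1) / (exp(2y) + 1)."""
    return (math.exp(2.0 * y) - 1.0) / (math.exp(2.0 * y) + 1.0)
```

The identity matters in practice because the derivative can be computed from the unit's output alone, which is what the backpropagation formulas rely on.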
![Page 190: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/190.jpg)
30
Nonlinear decision surface
Another example: XOR
![Page 191: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/191.jpg)
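31
$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{\partial}{\partial w_i} \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 \\
&= \frac{1}{2} \sum_d 2 (t_d - o_d) \frac{\partial}{\partial w_i} (t_d - o_d) \\
&\;\;\vdots \\
&= -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}
\end{aligned}$$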
![Page 192: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/192.jpg)
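Initialize all weights to small random numbers. Until satisfied, Do
For each training example, Do
1. Input the training example to the network and compute the network outputs.
2. For each output unit k: $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$
3. For each hidden unit h: $\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k$
4. Update each network weight $w_{ji}$: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta \delta_j x_i$
Note: $w_{ji}$ is the weight from i to j, and $x_i$ is the input that unit j receives from i.
32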
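As a rough illustration of one backpropagation step, the Python sketch below computes the weight changes for a single training example in a network with one hidden layer of sigmoid units. The list-of-rows weight layout (index 0 of each row is a bias weight fed by a constant 1), the function names, and the example weights are assumptions, not taken from the slides.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def forward(w_hidden, w_out, x):
    """Return (hidden outputs, network outputs); each weight row is [bias, w_1, ...]."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + h
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return h, o

def backprop_deltas(w_hidden, w_out, x, t, eta):
    """Weight changes (dw_hidden, dw_out) for one example, following steps 1 to 4."""
    h, o = forward(w_hidden, w_out, x)
    xb, hb = [1.0] + list(x), [1.0] + h
    # Output units: derivative of the sigmoid times the error.
    delta_k = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden units: derivative times the error propagated back through w_out.
    # w_out[k][j + 1] is the weight from hidden unit j to output k (index 0 is the bias).
    delta_h = [hj * (1 - hj) * sum(w_out[k][j + 1] * delta_k[k] for k in range(len(o)))
               for j, hj in enumerate(h)]
    dw_out = [[eta * dk * v for v in hb] for dk in delta_k]
    dw_hidden = [[eta * dh * v for v in xb] for dh in delta_h]
    return dw_hidden, dw_out

def error(w_hidden, w_out, x, t):
    """Squared error 1/2 * sum_k (t_k - o_k)^2 for one example."""
    _, o = forward(w_hidden, w_out, x)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o))

# Hypothetical 2-2-1 network and a single training example.
w_hidden = [[0.1, -0.2, 0.3], [-0.1, 0.4, 0.2]]
w_out = [[0.05, 0.3, -0.4]]
x, t = (1.0, 0.0), (1.0,)
dw_hidden, dw_out = backprop_deltas(w_hidden, w_out, x, t, eta=0.5)
```

Each returned change equals −η times the partial derivative of the per-example error with respect to that weight, which can be confirmed with finite differences on `error`.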
![Page 193: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/193.jpg)
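For output unit:
$\delta_k = \underbrace{o_k (1 - o_k)}_{\sigma'(net_k)} \underbrace{(t_k - o_k)}_{\text{Error}}$
For hidden unit:
$\delta_h = \underbrace{o_h (1 - o_h)}_{\sigma'(net_h)} \underbrace{\sum_{k \in outputs} w_{kh} \delta_k}_{\text{Backpropagated error}}$
In sum, δ is the derivative times the error.
33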
![Page 194: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/194.jpg)
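Different formula for output and hidden units, with $\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}}$.
For output unit:
$\frac{\partial E_d}{\partial w_{ji}} = -\underbrace{(t_j - o_j)}_{\text{error}} \underbrace{o_j (1 - o_j)}_{\sigma'(net_j)} \underbrace{x_i}_{\text{input}}$
For hidden unit:
$\frac{\partial E_d}{\partial w_{ji}} = -\underbrace{\Big[ \sum_{k \in Downstream(j)} \delta_k w_{kj} \Big]}_{\text{error}} \underbrace{o_j (1 - o_j)}_{\sigma'(net_j)} \underbrace{x_i}_{\text{input}}$
34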
![Page 195: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/195.jpg)
Gradient descent over the entire network weight vector.
Easily generalized to arbitrary directed graphs.
Will find a local, not necessarily global, error minimum: in practice, often works well (can run multiple times with different initial weights).
Often include weight momentum α:
$\Delta w_{ji}(n) = \eta \delta_j x_{ji} + \alpha \Delta w_{ji}(n-1)$
where $x_{ji}$ is the input that unit j receives from i and n indexes the iteration.
Minimizes error over training examples: will it generalize well to subsequent examples?
Training can take thousands of iterations (slow!), but using the network after training is very fast.
35
![Page 196: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/196.jpg)
Boolean functions: every Boolean function is representable with two layers (the number of hidden units can grow exponentially in the worst case: one hidden unit per input example, and “OR” them).
Continuous functions: every bounded continuous function can be approximated with an arbitrarily small error (output units are linear).
Arbitrary functions: can be approximated with three layers (output units are linear).
36
![Page 197: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/197.jpg)
H-space = n-dimensional weight space (when there are n weights). The space is continuous, unlike decision tree or general-to-specific concept learning algorithms.
Inductive bias: Smooth interpolation between data points.
37
![Page 198: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/198.jpg)
38
![Page 199: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/199.jpg)
39
![Page 200: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/200.jpg)
Learned encoding is similar to standard 3-bit binary code. Automatic discovery of useful hidden layer representations is a key feature of ANN. Note: The hidden layer representation is compressed.
40
![Page 201: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/201.jpg)
Figure: error in two different robot perception tasks, showing training set and validation set error. Early stopping (halting training when the validation error starts to rise) helps performance on unobserved samples, but it must be applied carefully. Weight decay, validation sets, k-fold cross-validation, etc. can be used to overcome the overfitting problem.
41
![Page 202: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/202.jpg)
Applications:
Sequence recognition: speech recognition. Sequence reproduction: time-series prediction. Sequence association.
Network architectures: time-delay networks (Waibel et al., 1989); recurrent networks (Rumelhart et al., 1986).
42
![Page 203: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/203.jpg)
43
![Page 204: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/204.jpg)
44
![Page 205: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/205.jpg)
45
![Page 206: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/206.jpg)
46 (Le Cun et al, 1989)
![Page 207: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/207.jpg)
47
![Page 208: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/208.jpg)
Destructive: weight decay
$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$
$E' = E + \frac{\lambda}{2} \sum_i w_i^2$
Constructive: growing networks
(Ash, 1989) (Fahlman and Lebiere, 1989)
48
![Page 209: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/209.jpg)
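Consider weights $w_i$ as random variables with prior $p(w_i)$:
$p(\mathbf{w} | \mathcal{X}) = \frac{p(\mathcal{X} | \mathbf{w}) \, p(\mathbf{w})}{p(\mathcal{X})}$, $\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \log p(\mathbf{w} | \mathcal{X})$
$\log p(\mathbf{w} | \mathcal{X}) = \log p(\mathcal{X} | \mathbf{w}) + \log p(\mathbf{w}) + C$
$p(\mathbf{w}) = \prod_i p(w_i)$ where $p(w_i) = c \cdot \exp\left[ -\frac{w_i^2}{2 (1/2\lambda)} \right]$
$E' = E + \lambda \|\mathbf{w}\|^2$
Weight decay, ridge regression, regularization:
cost = data-misfit + λ · complexity
49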
![Page 210: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/210.jpg)
50
![Page 211: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/211.jpg)
ANN learning provides a general method for learning real-valued functions over continuous or discrete-valued attributes. ANNs are robust to noise. H is the space of all functions parameterized by the weights. The search through H is by gradient descent, which converges to local minima. Backpropagation gives novel hidden layer representations. Overfitting is an issue. More advanced algorithms exist.
51
![Page 212: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/212.jpg)
![Page 213: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/213.jpg)
Discriminant-based: no need to estimate densities first. Define the discriminant in terms of support vectors. The use of kernel functions, application-specific measures of similarity. No need to represent instances as vectors. Convex optimization problems with a unique solution.
2
![Page 214: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/214.jpg)
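3
$\mathcal{X} = \{\mathbf{x}^t, r^t\}$ where $r^t = +1$ if $\mathbf{x}^t \in C_1$ and $r^t = -1$ if $\mathbf{x}^t \in C_2$
find $\mathbf{w}$ and $w_0$ such that
$\mathbf{w}^T \mathbf{x}^t + w_0 \geq +1$ for $r^t = +1$
$\mathbf{w}^T \mathbf{x}^t + w_0 \leq -1$ for $r^t = -1$
which can be rewritten as
$r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \geq +1$
(Cortes and Vapnik, 1995; Vapnik, 1995)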
![Page 215: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/215.jpg)
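4
Margin: distance from the discriminant to the closest instances on either side.
Distance of $\mathbf{x}^t$ to the hyperplane is $\frac{|\mathbf{w}^T \mathbf{x}^t + w_0|}{\|\mathbf{w}\|}$
We require $\frac{r^t (\mathbf{w}^T \mathbf{x}^t + w_0)}{\|\mathbf{w}\|} \geq \rho, \; \forall t$
For a unique solution, fix $\rho \|\mathbf{w}\| = 1$; then, to maximize the margin,
$\min \frac{1}{2} \|\mathbf{w}\|^2$ subject to $r^t (\mathbf{w}^T \mathbf{x}^t + w_0) \geq +1, \; \forall t$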
![Page 216: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/216.jpg)
5
![Page 217: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/217.jpg)
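6
$\min \frac{1}{2}\|\mathbf{w}\|^2$ subject to $r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \geq +1, \; \forall t$
$$\begin{aligned}
L_p &= \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^N \alpha^t \left[ r^t(\mathbf{w}^T\mathbf{x}^t + w_0) - 1 \right] \\
&= \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{t=1}^N \alpha^t r^t(\mathbf{w}^T\mathbf{x}^t + w_0) + \sum_{t=1}^N \alpha^t
\end{aligned}$$
$\frac{\partial L_p}{\partial \mathbf{w}} = 0 \Rightarrow \mathbf{w} = \sum_{t=1}^N \alpha^t r^t \mathbf{x}^t$
$\frac{\partial L_p}{\partial w_0} = 0 \Rightarrow \sum_{t=1}^N \alpha^t r^t = 0$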
![Page 218: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/218.jpg)
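7
$$\begin{aligned}
L_d &= \frac{1}{2}(\mathbf{w}^T\mathbf{w}) - \mathbf{w}^T \sum_t \alpha^t r^t \mathbf{x}^t - w_0 \sum_t \alpha^t r^t + \sum_t \alpha^t \\
&= -\frac{1}{2}(\mathbf{w}^T\mathbf{w}) + \sum_t \alpha^t \\
&= -\frac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s (\mathbf{x}^t)^T \mathbf{x}^s + \sum_t \alpha^t
\end{aligned}$$
subject to $\sum_t \alpha^t r^t = 0$ and $\alpha^t \geq 0, \; \forall t$
Most $\alpha^t$ are 0 and only a small number have $\alpha^t > 0$; they are the support vectors.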
![Page 219: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/219.jpg)
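Not linearly separable:
$r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \geq 1 - \xi^t$
Soft error:
$\sum_t \xi^t$
New primal is
$L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_t \xi^t - \sum_t \alpha^t \left[ r^t(\mathbf{w}^T\mathbf{x}^t + w_0) - 1 + \xi^t \right] - \sum_t \mu^t \xi^t$
8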
![Page 220: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/220.jpg)
9
![Page 221: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/221.jpg)
10
Hinge loss:
$L_{hinge}(y^t, r^t) = \begin{cases} 0 & \text{if } y^t r^t \geq 1 \\ 1 - y^t r^t & \text{otherwise} \end{cases}$
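The hinge loss is zero when an instance is on the correct side of the margin and grows linearly otherwise. A small Python sketch follows; the function name and sample values are illustrative assumptions.

```python
def hinge_loss(y, r):
    """Hinge loss for model output y and label r in {-1, +1}."""
    margin = y * r
    return 0.0 if margin >= 1 else 1.0 - margin

# Hypothetical outputs for a positive instance (r = +1).
losses = [hinge_loss(y, 1) for y in (2.0, 0.5, -1.0)]
```

An output of 0.5 is on the correct side of the discriminant but inside the margin, so it is still penalized.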
![Page 222: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/222.jpg)
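11
ν-SVM:
$\min \frac{1}{2}\|\mathbf{w}\|^2 - \nu\rho + \frac{1}{N}\sum_t \xi^t$
subject to $r^t(\mathbf{w}^T\mathbf{x}^t + w_0) \geq \rho - \xi^t, \; \xi^t \geq 0, \; \rho \geq 0$
$L_d = -\frac{1}{2} \sum_{t=1}^N \sum_s \alpha^t \alpha^s r^t r^s (\mathbf{x}^t)^T \mathbf{x}^s$
subject to $\sum_t \alpha^t r^t = 0, \; 0 \leq \alpha^t \leq \frac{1}{N}, \; \sum_t \alpha^t \geq \nu$
ν controls the fraction of support vectors.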
![Page 223: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/223.jpg)
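Preprocess input x by basis functions:
$\mathbf{z} = \boldsymbol{\varphi}(\mathbf{x})$, $g(\mathbf{z}) = \mathbf{w}^T\mathbf{z}$, $g(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x})$
The SVM solution:
$\mathbf{w} = \sum_t \alpha^t r^t \mathbf{z}^t = \sum_t \alpha^t r^t \boldsymbol{\varphi}(\mathbf{x}^t)$
$g(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x}) = \sum_t \alpha^t r^t \boldsymbol{\varphi}(\mathbf{x}^t)^T \boldsymbol{\varphi}(\mathbf{x})$
$g(\mathbf{x}) = \sum_t \alpha^t r^t K(\mathbf{x}^t, \mathbf{x})$
12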
![Page 224: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/224.jpg)
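Polynomials of degree q:
$K(\mathbf{x}^t, \mathbf{x}) = (\mathbf{x}^T \mathbf{x}^t + 1)^q$
For example, with q = 2 and two-dimensional inputs:
$$\begin{aligned}
K(\mathbf{x}, \mathbf{y}) &= (\mathbf{x}^T\mathbf{y} + 1)^2 \\
&= (x_1 y_1 + x_2 y_2 + 1)^2 \\
&= 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2
\end{aligned}$$
$\boldsymbol{\varphi}(\mathbf{x}) = \left[ 1, \sqrt{2} x_1, \sqrt{2} x_2, \sqrt{2} x_1 x_2, x_1^2, x_2^2 \right]^T$
13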
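The point of the kernel is that the dot product in the six-dimensional feature space never has to be computed explicitly. The Python sketch below checks this for the degree-2 polynomial kernel on two-dimensional inputs; the function names and the sample vectors are assumptions chosen for illustration.

```python
import math

def poly_kernel(x, y, q=2):
    """Polynomial kernel (x . y + 1)^q computed in the input space."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** q

def phi(x):
    """Explicit feature map whose dot product gives the q = 2 kernel in 2-D."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, r2 * x1 * x2, x1 * x1, x2 * x2]

# Hypothetical input vectors.
xa, xb = (0.5, -1.0), (2.0, 3.0)
k_direct = poly_kernel(xa, xb)
k_mapped = sum(a * b for a, b in zip(phi(xa), phi(xb)))
```

Both routes give the same value (up to floating-point rounding), but the kernel route needs only one dot product in two dimensions.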
![Page 225: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/225.jpg)
Radial-basis functions:
$K(\mathbf{x}^t, \mathbf{x}) = \exp\left[ -\frac{\|\mathbf{x}^t - \mathbf{x}\|^2}{2 s^2} \right]$
14
![Page 226: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/226.jpg)
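Kernel “engineering”: defining good measures of similarity.
String kernels, graph kernels, image kernels, ...
Empirical kernel map: define a set of templates $\mathbf{m}_i$ and a score function $s(\mathbf{x}, \mathbf{m}_i)$:
$\boldsymbol{\varphi}(\mathbf{x}^t) = [s(\mathbf{x}^t, \mathbf{m}_1), s(\mathbf{x}^t, \mathbf{m}_2), \ldots, s(\mathbf{x}^t, \mathbf{m}_M)]$ and $K(\mathbf{x}, \mathbf{x}^t) = \boldsymbol{\varphi}(\mathbf{x})^T \boldsymbol{\varphi}(\mathbf{x}^t)$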
15
![Page 227: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/227.jpg)
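Fixed kernel combination:
$K(\mathbf{x}, \mathbf{y}) = \begin{cases} c K(\mathbf{x}, \mathbf{y}) \\ K_1(\mathbf{x}, \mathbf{y}) + K_2(\mathbf{x}, \mathbf{y}) \\ K_1(\mathbf{x}, \mathbf{y}) K_2(\mathbf{x}, \mathbf{y}) \end{cases}$
Adaptive kernel combination:
$K(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^m \eta_i K_i(\mathbf{x}, \mathbf{y})$
$L_d = \sum_t \alpha^t - \frac{1}{2} \sum_t \sum_s \alpha^t \alpha^s r^t r^s \sum_i \eta_i K_i(\mathbf{x}^t, \mathbf{x}^s)$
$g(\mathbf{x}) = \sum_t \alpha^t r^t \sum_i \eta_i K_i(\mathbf{x}^t, \mathbf{x})$
Localized kernel combination:
$g(\mathbf{x}) = \sum_t \alpha^t r^t \sum_i \eta_i(\mathbf{x} | \theta) K_i(\mathbf{x}^t, \mathbf{x})$
16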
![Page 228: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/228.jpg)
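1-vs-all
Pairwise separation
Error-Correcting Output Codes (section 17.5)
Single multiclass optimization:
$\min \frac{1}{2} \sum_{i=1}^K \|\mathbf{w}_i\|^2 + C \sum_i \sum_t \xi_i^t$
subject to $\mathbf{w}_{z^t}^T \mathbf{x}^t + w_{z^t 0} \geq \mathbf{w}_i^T \mathbf{x}^t + w_{i0} + 2 - \xi_i^t, \; \forall i \neq z^t, \; \xi_i^t \geq 0$
17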
![Page 229: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/229.jpg)
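Use a linear model (possibly kernelized):
$f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$
Use the ε-sensitive error function:
$e_\varepsilon(r^t, f(\mathbf{x}^t)) = \begin{cases} 0 & \text{if } |r^t - f(\mathbf{x}^t)| < \varepsilon \\ |r^t - f(\mathbf{x}^t)| - \varepsilon & \text{otherwise} \end{cases}$
$\min \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_t (\xi_+^t + \xi_-^t)$
subject to
$r^t - (\mathbf{w}^T\mathbf{x}^t + w_0) \leq \varepsilon + \xi_+^t$
$(\mathbf{w}^T\mathbf{x}^t + w_0) - r^t \leq \varepsilon + \xi_-^t$
$\xi_+^t, \xi_-^t \geq 0$
18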
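The ε-sensitive error ignores deviations smaller than ε and penalizes larger ones linearly. A short Python sketch, with a hypothetical function name and sample values:

```python
def eps_sensitive_error(r, f, eps=0.5):
    """Zero inside the eps-tube around the target r, linear outside it."""
    diff = abs(r - f)
    return 0.0 if diff < eps else diff - eps

# Hypothetical (target, prediction) pairs.
errors = [eps_sensitive_error(r, f) for r, f in ((1.0, 1.25), (1.0, 2.0), (3.0, 1.0))]
```

Compared with squared error, large deviations contribute linearly rather than quadratically, which makes the fit less sensitive to outliers.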
![Page 230: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/230.jpg)
19
![Page 231: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/231.jpg)
20
Polynomial Kernel Gaussian Kernel
![Page 232: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/232.jpg)
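Consider a sphere with center a and radius R:
$\min R^2 + C \sum_t \xi^t$
subject to $\|\mathbf{x}^t - \mathbf{a}\|^2 \leq R^2 + \xi^t, \; \xi^t \geq 0$
$L_d = \sum_t \alpha^t (\mathbf{x}^t)^T \mathbf{x}^t - \sum_{t=1}^N \sum_s \alpha^t \alpha^s (\mathbf{x}^t)^T \mathbf{x}^s$
subject to $0 \leq \alpha^t \leq C, \; \sum_t \alpha^t = 1$
21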
![Page 233: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/233.jpg)
22
![Page 234: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/234.jpg)
Kernel PCA does PCA on the kernel matrix (equal to canonical PCA with a linear kernel) Kernel LDA
23
![Page 235: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/235.jpg)
![Page 236: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/236.jpg)
How an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals. Examples: mobile robots, optimization in process control, board games, etc. Ingredients: a reward/penalty for each action, where the reinforcement signal can be significantly delayed. One approach: Q learning.
2
![Page 237: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/237.jpg)
Terminology:
State: state of the environment, obtained through sensors.
Action: alters the state.
Policy: choosing actions that achieve a particular goal, based on the current state.
Goal: desired configuration (or state).
Desired policy: from any initial state, choose actions that maximize the reward accumulated over time by the agent.
3
![Page 238: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/238.jpg)
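The agent has a state in an environment, takes an action, sometimes receives a reward, and the state changes:
$s_0 \xrightarrow{a_0 / r_0} s_1 \xrightarrow{a_1 / r_1} s_2 \xrightarrow{a_2 / r_2} \cdots$
Goal: learn to choose actions that maximize the discounted, cumulative reward:
$r_0 + \gamma r_1 + \gamma^2 r_2 + \cdots$, where $0 \leq \gamma < 1$.
That is, we want to learn a policy $\pi : S \to A$ that maximizes the above, where S is the set of states and A is the set of actions.
4
(Figure: agent-environment loop; the agent sends an action to the environment and receives the state and reward.)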
![Page 239: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/239.jpg)
5
Among K levers, choose the one that pays best.
Q(a): value of action a. Reward is $r_a$. Set $Q(a) = r_a$.
Choose $a^*$ if $Q(a^*) = \max_a Q(a)$.
Rewards stochastic (keep an expected reward):
$Q_{t+1}(a) \leftarrow Q_t(a) + \eta \left[ r_{t+1}(a) - Q_t(a) \right]$
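With stochastic rewards, the running estimate for a lever moves a fraction η of the way toward each new reward. The Python sketch below illustrates this; the function name, the value η = 0.5, and the reward sequence are assumptions.

```python
def update_q(q, reward, eta=0.1):
    """Move the estimate q a fraction eta of the way toward the observed reward."""
    return q + eta * (reward - q)

# Hypothetical payoffs observed from repeatedly pulling one lever.
rewards = [1.0, 0.0, 1.0, 1.0]
q = 0.0
history = []
for r in rewards:
    q = update_q(q, r, eta=0.5)
    history.append(q)
```

The estimate is an exponentially weighted average of past rewards, so recent rewards count more than old ones.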
![Page 240: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/240.jpg)
Deterministic vs. nondeterministic action outcomes. With or without prior knowledge about the effect of action on environmental state. Partially or fully known environmental state (e.g., Partially Observable Markov Decision Process [POMDP]).
6
![Page 241: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/241.jpg)
$s_t$: state of the agent at time t
$a_t$: action taken at time t
In $s_t$, action $a_t$ is taken, the clock ticks, reward $r_{t+1}$ is received, and the state changes to $s_{t+1}$.
Next state probability: $P(s_{t+1} | s_t, a_t)$
Reward probability: $p(r_{t+1} | s_t, a_t)$
Initial state(s), goal state(s)
Episode (trial): sequence of actions from the initial state to the goal
(Sutton and Barto, 1998; Kaelbling et al., 1996)
7
![Page 242: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/242.jpg)
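Policy: $\pi : S \to A$, $a_t = \pi(s_t)$
Value of a policy: $V^\pi(s_t)$
Finite-horizon:
$V^\pi(s_t) = E[r_{t+1} + r_{t+2} + \cdots + r_{t+T}] = E\left[ \sum_{i=1}^T r_{t+i} \right]$
Infinite horizon:
$V^\pi(s_t) = E[r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots] = E\left[ \sum_{i=1}^\infty \gamma^{i-1} r_{t+i} \right]$
where $0 \leq \gamma < 1$ is the discount rate.
8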
![Page 243: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/243.jpg)
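9
$$\begin{aligned}
V^*(s_t) &= \max_\pi V^\pi(s_t), \; \forall s_t \\
&= \max_{a_t} E\left[ \sum_{i=1}^\infty \gamma^{i-1} r_{t+i} \right] \\
&= \max_{a_t} E\left[ r_{t+1} + \gamma \sum_{i=1}^\infty \gamma^{i-1} r_{t+i+1} \right] \\
&= \max_{a_t} E\left[ r_{t+1} + \gamma V^*(s_{t+1}) \right]
\end{aligned}$$
Bellman’s equation:
$V^*(s_t) = \max_{a_t} \left( E[r_{t+1}] + \gamma \sum_{s_{t+1}} P(s_{t+1} | s_t, a_t) V^*(s_{t+1}) \right)$
Value of $a_t$ in $s_t$:
$V^*(s_t) = \max_{a_t} Q^*(s_t, a_t)$
$Q^*(s_t, a_t) = E[r_{t+1}] + \gamma \sum_{s_{t+1}} P(s_{t+1} | s_t, a_t) \max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1})$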
![Page 244: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/244.jpg)
10
• Immediate reward given only when entering the goal state G.
• Given any initial state, we want to generate an action sequence to maximize V.
![Page 245: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/245.jpg)
Discount rate: γ = 0.9
Top middle: 100 + γ·0 + γ²·0 + ... = 100
Top left: 0 + γ·100 + γ²·0 + ... = 90
Bottom left: 0 + γ·0 + γ²·100 + ... = 81
Note that these values are supposed to be obtained using the optimal policy π*.
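The three values can be reproduced by summing each reward sequence with discount rate 0.9. The Python sketch below does this; the function name and the truncation of each sequence after three steps (all later rewards being 0) are assumptions.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite reward list."""
    total = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        total = r + gamma * total
    return total

top_middle = discounted_return([100.0, 0.0, 0.0])   # goal reached in one step
top_left = discounted_return([0.0, 100.0, 0.0])     # goal reached in two steps
bottom_left = discounted_return([0.0, 0.0, 100.0])  # goal reached in three steps
```

Each extra step before the goal multiplies the value by 0.9, which is why states farther from G have lower values.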
11
r(s,a) V*(s) values
![Page 246: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/246.jpg)
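The environment, $P(s_{t+1} | s_t, a_t)$ and $p(r_{t+1} | s_t, a_t)$, is known.
There is no need for exploration.
Can be solved using dynamic programming.
Solve for
$V^*(s_t) = \max_{a_t} \left( E[r_{t+1}] + \gamma \sum_{s_{t+1}} P(s_{t+1} | s_t, a_t) V^*(s_{t+1}) \right)$
Optimal policy
$\pi^*(s_t) = \arg\max_{a_t} \left( E[r_{t+1} | s_t, a_t] + \gamma \sum_{s_{t+1}} P(s_{t+1} | s_t, a_t) V^*(s_{t+1}) \right)$
12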
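When the model is known, Bellman's equation can be turned into an iterative update (value iteration). The Python sketch below is one possible illustration, not the slides' own algorithm listing: the dictionary encoding `P[s][a] = [(probability, next_state, reward), ...]`, the stopping tolerance, and the four-state corridor are all assumptions.

```python
def value_iteration(P, gamma=0.9, tol=1e-9):
    """Repeat the Bellman backup until no state value changes by more than tol."""
    V = {s: 0.0 for s in P}
    while True:
        biggest_change = 0.0
        for s in P:
            if not P[s]:  # terminal state: no actions, value stays 0
                continue
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                       for outcomes in P[s].values())
            biggest_change = max(biggest_change, abs(best - V[s]))
            V[s] = best
        if biggest_change < tol:
            return V

def greedy_policy(P, V, gamma=0.9):
    """Pick, in each non-terminal state, the action with the best backed-up value."""
    return {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
            for s in P if P[s]}

# Hypothetical deterministic corridor A - B - C - G; entering G pays 100.
P = {
    "A": {"right": [(1.0, "B", 0.0)], "stay": [(1.0, "A", 0.0)]},
    "B": {"right": [(1.0, "C", 0.0)], "left": [(1.0, "A", 0.0)]},
    "C": {"right": [(1.0, "G", 100.0)], "left": [(1.0, "B", 0.0)]},
    "G": {},
}
V = value_iteration(P)
policy = greedy_policy(P, V)
```

For this corridor the values settle at roughly 81, 90, and 100 for A, B, and C, and the greedy policy moves right in every state.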
![Page 247: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/247.jpg)
![Page 248: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/248.jpg)
![Page 249: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/249.jpg)
Environment, $P(s_{t+1} | s_t, a_t)$, $p(r_{t+1} | s_t, a_t)$, is not known; model-free learning
There is need for exploration to sample from $P(s_{t+1} | s_t, a_t)$ and $p(r_{t+1} | s_t, a_t)$
Use the reward received in the next time step to update the value of the current state (action)
The temporal difference is the difference between the value of the current action and the value discounted from the next state
![Page 250: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/250.jpg)
ε-greedy: With probability ε, choose one action uniformly at random; choose the best action with probability 1 − ε
Probabilistic:

$$P(a | s) = \frac{\exp Q(s,a)}{\sum_{b=1}^{|\mathcal{A}|} \exp Q(s,b)}$$

Move smoothly from exploration to exploitation
Decrease ε
Annealing:

$$P(a | s) = \frac{\exp[Q(s,a)/T]}{\sum_{b=1}^{|\mathcal{A}|} \exp[Q(s,b)/T]}$$
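Both selection rules fit in a few lines. The following is a sketch under the assumption that the Q values of the current state are given as a dictionary; the function names are hypothetical.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: dict mapping action -> Q(s, a) for the current state."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))     # explore uniformly
    return max(q_values, key=q_values.get)    # exploit the best action

def softmax_probs(q_values, temperature=1.0):
    """P(a|s) proportional to exp(Q(s,a)/T)."""
    m = max(q_values.values())                # subtract the max for numerical stability
    w = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    z = sum(w.values())
    return {a: v / z for a, v in w.items()}

q = {"left": 1.0, "right": 2.0}
hot = softmax_probs(q, temperature=100.0)   # large T: close to uniform
cold = softmax_probs(q, temperature=0.01)   # small T: close to greedy
```

Lowering T over time (annealing) moves the agent from the near-uniform regime to the near-greedy one.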
![Page 251: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/251.jpg)
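$$Q^*(s_t, a_t) = E[r_{t+1}] + \gamma \sum_{s_{t+1}} P(s_{t+1} | s_t, a_t) \max_{a_{t+1}} Q^*(s_{t+1}, a_{t+1})$$

Deterministic: single possible reward and next state

$$Q(s_t, a_t) = r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$$

used as an update rule (backup)

$$\hat{Q}(s_t, a_t) \leftarrow r_{t+1} + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1})$$

Starting at zero, Q values increase, never decrease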
![Page 252: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/252.jpg)
γ = 0.9
Consider the value of the action marked by ‘*’:
If path A is seen first, Q(*) = 0.9 × max(0, 81) ≈ 73
Then B is seen, Q(*) = 0.9 × max(100, 81) = 90
Or, if path B is seen first, Q(*) = 0.9 × max(100, 0) = 90
Then A is seen, Q(*) = 0.9 × max(100, 81) = 90
Q values increase but never decrease
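The arithmetic of this example follows directly from the deterministic backup. A small sketch, where `backup` is a hypothetical helper and the immediate reward of the starred action is assumed to be 0:

```python
GAMMA = 0.9

def backup(reward, next_q_values):
    """Deterministic backup: r + gamma * max over next-state Q values."""
    return reward + GAMMA * max(next_q_values)

q_after_A = backup(0, [0, 81])     # only path A's value (81) is known yet
q_after_B = backup(0, [100, 81])   # path B reveals the value 100
```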
![Page 253: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/253.jpg)
When next states and rewards are nondeterministic (there is an opponent or randomness in the environment), we keep averages (expected values) instead of assignments
Q-learning (Watkins and Dayan, 1992):

$$\hat{Q}(s_t, a_t) \leftarrow \hat{Q}(s_t, a_t) + \eta\Big(r_{t+1} + \gamma \max_{a_{t+1}} \hat{Q}(s_{t+1}, a_{t+1}) - \hat{Q}(s_t, a_t)\Big)$$

Off-policy vs on-policy (Sarsa)
Learning V (TD-learning: Sutton, 1988)

$$V(s_t) \leftarrow V(s_t) + \eta\big(r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\big)$$
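One tabular Q-learning update can be sketched as follows. The state and action names and the function `q_learning_update` are illustrative assumptions.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, eta=0.1, gamma=0.9):
    """One off-policy TD update of the table Q[(state, action)]."""
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += eta * td_error   # move the estimate a step eta toward the target
    return td_error

Q = defaultdict(float)            # unseen pairs start at zero
actions = ["left", "right"]
Q[("s1", "right")] = 100.0
delta = q_learning_update(Q, "s0", "right", 0.0, "s1", actions)
```

Replacing `best_next` with the value of the action actually taken in `s_next` would give the on-policy (Sarsa) variant.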
![Page 254: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/254.jpg)
![Page 255: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/255.jpg)
![Page 256: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/256.jpg)
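Keep a record of previously visited states (actions)

$$e_t(s, a) = \begin{cases} 1 & \text{if } s = s_t \text{ and } a = a_t \\ \gamma\lambda\, e_{t-1}(s, a) & \text{otherwise} \end{cases}$$

$$\delta_t = r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$$

$$Q(s, a) \leftarrow Q(s, a) + \eta\, \delta_t\, e_t(s, a), \quad \forall s, a$$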
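A single Sarsa(λ) step with these traces might look like the sketch below. The dictionaries `Q` and `e`, the function name, and the two-step episode are assumptions chosen to keep the example small.

```python
def sarsa_lambda_step(Q, e, s, a, r, s_next, a_next, eta=0.1, gamma=0.9, lam=0.5):
    """Update every traced (state, action) pair in proportion to its eligibility."""
    for key in e:                 # decay all existing traces by gamma * lambda
        e[key] *= gamma * lam
    e[(s, a)] = 1.0               # the pair just visited gets trace 1
    delta = r + gamma * Q.get((s_next, a_next), 0.0) - Q.get((s, a), 0.0)
    for key, trace in e.items():
        Q[key] = Q.get(key, 0.0) + eta * delta * trace
    return delta

Q, e = {}, {}
sarsa_lambda_step(Q, e, "s0", "go", 0.0, "s1", "go")
sarsa_lambda_step(Q, e, "s1", "go", 100.0, "s2", "go")
```

After the second step the reward also updates the earlier pair `("s0", "go")`, scaled by its decayed trace.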
![Page 257: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/257.jpg)
![Page 258: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/258.jpg)
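Tabular: Q(s, a) or V(s) stored in a table
Regressor: Use a learner to estimate Q(s, a) or V(s)

$$E^t(\theta) = \big[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\big]^2$$

$$\Delta\theta = \eta\big[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\big]\nabla_{\theta} Q(s_t, a_t)$$

Eligibility:

$$\Delta\theta = \eta\, \delta_t\, e_t$$

$$\delta_t = r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)$$

$$e_t = \gamma\lambda\, e_{t-1} + \nabla_{\theta} Q(s_t, a_t), \quad \text{with } e_0 \text{ all zeros}$$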
![Page 259: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/259.jpg)
The agent does not know its state but receives an observation, $p(o_{t+1} | s_t, a_t)$, which can be used to infer a belief about states
Partially observable MDP
![Page 260: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/260.jpg)
Two doors, behind one of which there is a tiger
p: prob that tiger is behind the left door
R(aL) = −100p + 80(1−p), R(aR) = 90p − 100(1−p)
We can sense with a reward of R(aS) = −1
We have unreliable sensors
![Page 261: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/261.jpg)
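If we sense $o_L$, our belief in tiger’s position changes

$$p' = P(z_L | o_L) = \frac{P(o_L | z_L) P(z_L)}{P(o_L)} = \frac{0.7p}{0.7p + 0.3(1-p)}$$

$$R(a_L | o_L) = r(a_L, z_L) P(z_L | o_L) + r(a_L, z_R) P(z_R | o_L) = -100p' + 80(1-p') = \frac{-100 \cdot 0.7p + 80 \cdot 0.3(1-p)}{P(o_L)}$$

$$R(a_R | o_L) = r(a_R, z_L) P(z_L | o_L) + r(a_R, z_R) P(z_R | o_L) = 90p' - 100(1-p') = \frac{90 \cdot 0.7p - 100 \cdot 0.3(1-p)}{P(o_L)}$$

$$R(a_S | o_L) = -1$$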
![Page 262: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/262.jpg)
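$$V' = \sum_j \Big[\max_i R(a_i | o_j)\Big] P(o_j)$$

$$= \max\big(R(a_L | o_L), R(a_R | o_L), R(a_S | o_L)\big) P(o_L) + \max\big(R(a_L | o_R), R(a_R | o_R), R(a_S | o_R)\big) P(o_R)$$

$$= \max\big(-100p + 80(1-p),\; -43p - 46(1-p),\; 33p + 26(1-p),\; 90p - 100(1-p)\big)$$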
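The belief update and the value of sensing can be evaluated numerically. This sketch hard-codes the rewards and the 0.7/0.3 sensor reliabilities that appear in the equations; the function names are hypothetical.

```python
def belief_after_left(p, p_hit=0.7):
    """P(z_L | o_L) by Bayes' rule, where p is the prior P(z_L)."""
    p_oL = p_hit * p + (1 - p_hit) * (1 - p)
    return p_hit * p / p_oL

def value_without_sensing(p):
    """Best expected reward of opening a door right away."""
    return max(-100 * p + 80 * (1 - p), 90 * p - 100 * (1 - p))

def value_after_sensing(p):
    """V' as the maximum of the four linear pieces in the prior belief p."""
    q = 1 - p
    return max(-100 * p + 80 * q, -43 * p - 46 * q,
               33 * p + 26 * q, 90 * p - 100 * q)

p_new = belief_after_left(0.5)
gain = value_after_sensing(0.5) - value_without_sensing(0.5)
```

At p = 0.5 the gain is far larger than the sensing cost of 1, so sensing first is worthwhile under these numbers.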
![Page 263: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/263.jpg)
![Page 264: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/264.jpg)
Let us say the tiger can move from one room to the other with prob 0.8

$$p' = 0.2p + 0.8(1-p)$$

$$V' = \max\big(-100p' + 80(1-p'),\; 33p' + 26(1-p'),\; 90p' - 100(1-p')\big)$$
![Page 265: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/265.jpg)
When planning for episodes of two, we can take $a_L$, $a_R$, or sense and wait:

$$V_2 = \max\big(-100p + 80(1-p),\; 90p - 100(1-p),\; \max V' - 1\big)$$
![Page 266: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/266.jpg)
![Page 267: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/267.jpg)
Parametric: Assume a single model for p(x | Ci)
Semiparametric: p(x | Ci) is a mixture of densities
Multiple possible explanations/prototypes: Different handwriting styles, accents in speech
Nonparametric: No model; data speaks for itself
![Page 268: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/268.jpg)
$$p(x) = \sum_{i=1}^{k} p(x | G_i)\, P(G_i)$$

where $G_i$ are the components/groups/clusters, $P(G_i)$ the mixture proportions (priors), and $p(x | G_i)$ the component densities
Gaussian mixture where $p(x | G_i) \sim N(\mu_i, \Sigma_i)$, with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$
unlabeled sample $X = \{x^t\}_t$ (unsupervised learning)
![Page 269: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/269.jpg)
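Supervised: $X = \{x^t, r^t\}_t$
Classes $C_i$, $i = 1, ..., K$

$$p(x) = \sum_{i=1}^{K} p(x | C_i)\, P(C_i)$$

where $p(x | C_i) \sim N(\mu_i, \Sigma_i)$
$\Phi = \{P(C_i), \mu_i, \Sigma_i\}_{i=1}^{K}$

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \quad m_i = \frac{\sum_t r_i^t x^t}{\sum_t r_i^t}, \quad S_i = \frac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}$$

Unsupervised: $X = \{x^t\}_t$
Clusters $G_i$, $i = 1, ..., k$

$$p(x) = \sum_{i=1}^{k} p(x | G_i)\, P(G_i)$$

where $p(x | G_i) \sim N(\mu_i, \Sigma_i)$
$\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$
Labels, $r_i^t$ ?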
![Page 270: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/270.jpg)
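Find k reference vectors (prototypes/codebook vectors/codewords) which best represent data
Reference vectors, $m_j$, $j = 1, ..., k$
Use nearest (most similar) reference:

$$\|x^t - m_i\| = \min_j \|x^t - m_j\|$$

Reconstruction error

$$E\big(\{m_i\}_{i=1}^{k} \,\big|\, X\big) = \sum_t \sum_i b_i^t \|x^t - m_i\|^2$$

$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$$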
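Minimizing this reconstruction error by alternating assignments and mean updates gives k-means. The sketch below is one plain-Python way to write it; the initial centers are supplied by the caller, and the function name and toy data are assumptions.

```python
def kmeans(points, centers, iterations=20):
    """points, centers: lists of equal-length tuples. Returns (centers, labels)."""
    def sqdist(x, m):
        return sum((xi - mi) ** 2 for xi, mi in zip(x, m))

    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Hard assignment b_i^t: index of the nearest reference vector.
        labels = [min(range(len(centers)), key=lambda i: sqdist(x, centers[i]))
                  for x in points]
        # Move each reference vector to the mean of the points assigned to it.
        for i in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```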
![Page 271: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/271.jpg)
$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$$
![Page 272: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/272.jpg)
![Page 273: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/273.jpg)
![Page 274: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/274.jpg)
Log likelihood with a mixture model

$$\mathcal{L}(\Phi | X) = \sum_t \log p(x^t | \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t | G_i)\, P(G_i)$$

Assume hidden variables z, which when known, make optimization much simpler
Complete likelihood, $\mathcal{L}_C(\Phi | X, Z)$, in terms of x and z
Incomplete likelihood, $\mathcal{L}(\Phi | X)$, in terms of x
![Page 275: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/275.jpg)
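Iterate the two steps
1. E-step: Estimate z given X and current Φ
2. M-step: Find new Φ’ given z, X, and old Φ.

$$\text{E-step: } Q(\Phi | \Phi^l) = E\big[\mathcal{L}_C(\Phi | X, Z) \,\big|\, X, \Phi^l\big]$$

$$\text{M-step: } \Phi^{l+1} = \arg\max_{\Phi} Q(\Phi | \Phi^l)$$

An increase in Q increases incomplete likelihood

$$\mathcal{L}(\Phi^{l+1} | X) \geq \mathcal{L}(\Phi^l | X)$$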
![Page 276: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/276.jpg)
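$z_i^t = 1$ if $x^t$ belongs to $G_i$, 0 otherwise (labels $r_i^t$ of supervised learning); assume $p(x | G_i) \sim N(\mu_i, \Sigma_i)$
E-step:

$$E\big[z_i^t \,\big|\, X, \Phi^l\big] = \frac{p(x^t | G_i, \Phi^l)\, P(G_i)}{\sum_j p(x^t | G_j, \Phi^l)\, P(G_j)} = P(G_i | x^t, \Phi^l) \equiv h_i^t$$

M-step:

$$P(G_i) = \frac{\sum_t h_i^t}{N}, \quad m_i^{l+1} = \frac{\sum_t h_i^t x^t}{\sum_t h_i^t}, \quad S_i^{l+1} = \frac{\sum_t h_i^t (x^t - m_i^{l+1})(x^t - m_i^{l+1})^T}{\sum_t h_i^t}$$

Use estimated labels in place of unknown labels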
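For one-dimensional data the E- and M-steps reduce to a short loop. The following is a sketch, not the slides' implementation: the data, the initial parameters, and the small variance floor are assumptions added for illustration.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(xs, priors, means, variances):
    k, n = len(priors), len(xs)
    # E-step: h[t][i] = P(G_i | x^t, current parameters)
    h = []
    for x in xs:
        joint = [priors[i] * normal_pdf(x, means[i], variances[i]) for i in range(k)]
        total = sum(joint)
        h.append([j / total for j in joint])
    # M-step: the supervised estimates with soft labels h in place of r
    new_priors, new_means, new_vars = [], [], []
    for i in range(k):
        hi = sum(h[t][i] for t in range(n))
        mean = sum(h[t][i] * xs[t] for t in range(n)) / hi
        var = sum(h[t][i] * (xs[t] - mean) ** 2 for t in range(n)) / hi
        new_priors.append(hi / n)
        new_means.append(mean)
        new_vars.append(max(var, 1e-6))  # variance floor: an added safeguard
    return new_priors, new_means, new_vars

xs = [-0.2, 0.0, 0.2, 9.8, 10.0, 10.2]
priors, means, variances = [0.5, 0.5], [1.0, 9.0], [1.0, 1.0]
for _ in range(20):
    priors, means, variances = em_step(xs, priors, means, variances)
```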
![Page 277: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/277.jpg)
Figure annotation: $P(G_1 | x) = h_1 = 0.5$
![Page 278: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/278.jpg)
Regularize clusters
1. Assume shared/diagonal covariance matrices
2. Use PCA/FA to decrease dimensionality: Mixtures of PCA/FA

$$p(x^t | G_i) = N\big(m_i,\; V_i V_i^T + \psi_i\big)$$

Can use EM to learn $V_i$ (Ghahramani and Hinton, 1997; Tipping and Bishop, 1999)
![Page 279: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/279.jpg)
Dimensionality reduction methods find correlations between features and group features
Clustering methods find similarities between instances and group instances
Allows knowledge extraction through number of clusters, prior probabilities, cluster parameters, i.e., center, range of features.
Example: CRM, customer segmentation
![Page 280: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/280.jpg)
Estimated group labels $h_j$ (soft) or $b_j$ (hard) may be seen as the dimensions of a new k-dimensional space, where we can then learn our discriminant or regressor.
Local representation (only one $b_j$ is 1, all others are 0; only a few $h_j$ are nonzero) vs
Distributed representation (after PCA; all $z_j$ are nonzero)
![Page 281: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/281.jpg)
In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians, (unsupervised), we have a mixture of mixtures:

$$p(x | C_i) = \sum_{j=1}^{k_i} p(x | G_{ij})\, P(G_{ij})$$

$$p(x) = \sum_{i=1}^{K} p(x | C_i)\, P(C_i)$$
![Page 282: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/282.jpg)
Cluster based on similarities/distances
Distance measure between instances $x^r$ and $x^s$
Minkowski ($L_p$) (Euclidean for p = 2)

$$d_m(x^r, x^s) = \Big[\sum_{j=1}^{d} |x_j^r - x_j^s|^p\Big]^{1/p}$$

City-block distance

$$d_{cb}(x^r, x^s) = \sum_{j=1}^{d} |x_j^r - x_j^s|$$
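Both distances are one-liners in Python. A minimal sketch with hypothetical function names:

```python
def minkowski(xr, xs, p=2):
    """L_p distance; p = 2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(xr, xs)) ** (1.0 / p)

def city_block(xr, xs):
    """L_1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(xr, xs))

d2 = minkowski((0, 0), (3, 4), p=2)
d1 = city_block((0, 0), (3, 4))
```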
![Page 283: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/283.jpg)
Start with N groups each with one instance and merge two closest groups at each iteration
Distance between two groups $G_i$ and $G_j$:
Single-link:

$$d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$$

Complete-link:

$$d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$$

Average-link, centroid
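The merging procedure can be sketched directly from these definitions. The code assumes Euclidean distance between instances and stops at a caller-chosen number of groups; the function names and toy points are illustrative.

```python
import math

def group_distance(Gi, Gj, link="single"):
    """Single-link uses the closest pair, complete-link the farthest pair."""
    pairwise = [math.dist(xr, xs) for xr in Gi for xs in Gj]
    return min(pairwise) if link == "single" else max(pairwise)

def agglomerate(points, k, link="single"):
    """Start with one group per instance and merge the two closest until k remain."""
    groups = [[p] for p in points]
    while len(groups) > k:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda ij: group_distance(groups[ij[0]], groups[ij[1]], link))
        merged = groups.pop(j)    # j > i, so index i is unaffected by the pop
        groups[i].extend(merged)
    return groups

groups = agglomerate([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```

Recording the distance at which each merge happens would give the heights of a dendrogram.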
![Page 284: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/284.jpg)
Dendrogram
![Page 285: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/285.jpg)
Defined by the application, e.g., image quantization
Plot data (after PCA) and check for clusters
Incremental (leader-cluster) algorithm: Add one at a time until “elbow” (reconstruction error/log likelihood/intergroup distances)
Manually check for meaning
![Page 286: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/286.jpg)
![Page 287: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/287.jpg)
Questions:
Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation
![Page 288: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/288.jpg)
Criteria (Application-dependent):
Misclassification error, or risk (loss functions)
Training time/space complexity
Testing time/space complexity
Interpretability
Easy programmability
Cost-sensitive learning
![Page 289: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/289.jpg)
![Page 290: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/290.jpg)
Response surface design for approximating and maximizing the response function in terms of the controllable factors
![Page 291: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/291.jpg)
A. Aim of the study
B. Selection of the response variable
C. Choice of factors and levels
D. Choice of experimental design
E. Performing the experiment
F. Statistical analysis of the data
G. Conclusions and recommendations
![Page 292: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/292.jpg)
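The need for multiple training/validation sets
$\{T_i, V_i\}_i$: Training/validation sets of fold i
K-fold cross-validation: Divide X into K parts, $X_i$, $i = 1, ..., K$

$$
\begin{aligned}
V_1 &= X_1, & T_1 &= X_2 \cup X_3 \cup \cdots \cup X_K \\
V_2 &= X_2, & T_2 &= X_1 \cup X_3 \cup \cdots \cup X_K \\
&\;\;\vdots \\
V_K &= X_K, & T_K &= X_1 \cup X_2 \cup \cdots \cup X_{K-1}
\end{aligned}
$$

Any two training sets $T_i$ share K−2 parts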
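The fold construction can be sketched as a small generator. This version splits by striding and does not shuffle, which is an assumption; in practice the data would usually be shuffled first.

```python
def k_fold_splits(data, K):
    """Yield (training, validation) lists; part i is the validation set of fold i."""
    parts = [data[i::K] for i in range(K)]   # K nearly equal parts
    for i in range(K):
        validation = parts[i]
        training = [x for j, part in enumerate(parts) if j != i for x in part]
        yield training, validation

folds = list(k_fold_splits(list(range(10)), K=5))
shared = set(folds[0][0]) & set(folds[1][0])   # overlap of two training sets
```

With K = 5 and 10 instances, two training sets overlap in K−2 = 3 parts, that is, 6 instances.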
![Page 293: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/293.jpg)
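5 times 2 fold cross-validation (Dietterich, 1998)

$$
\begin{aligned}
T_1 &= X_1^{(1)}, & V_1 &= X_1^{(2)} \\
T_2 &= X_1^{(2)}, & V_2 &= X_1^{(1)} \\
T_3 &= X_2^{(1)}, & V_3 &= X_2^{(2)} \\
T_4 &= X_2^{(2)}, & V_4 &= X_2^{(1)} \\
&\;\;\vdots \\
T_9 &= X_5^{(1)}, & V_9 &= X_5^{(2)} \\
T_{10} &= X_5^{(2)}, & V_{10} &= X_5^{(1)}
\end{aligned}
$$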
![Page 294: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/294.jpg)
Draw instances from a dataset with replacement
Prob that we do not pick an instance after N draws

$$\Big(1 - \frac{1}{N}\Big)^N \approx e^{-1} = 0.368$$

that is, only 36.8% is new!
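The 36.8% figure can be checked both analytically and by simulation. A minimal sketch; the dataset size and seed are arbitrary assumptions.

```python
import math
import random

def prob_never_picked(N):
    """Probability that a given instance is missed in N draws with replacement."""
    return (1 - 1 / N) ** N

def bootstrap_sample(data, rng):
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = list(range(1000))
sample = bootstrap_sample(data, rng)
unseen_fraction = 1 - len(set(sample)) / len(data)   # instances left out of the sample
```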
![Page 295: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/295.jpg)
Error rate = # of errors / # of instances = (FN+FP) / N
Recall = # of found positives / # of positives = TP / (TP+FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP+FP)
Specificity = TN / (TN+FP)
False alarm rate = FP / (FP+TN) = 1 − Specificity
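These measures follow directly from the four confusion-matrix counts. A sketch with made-up counts:

```python
def classification_metrics(tp, fp, tn, fn):
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),            # sensitivity, hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # equals 1 - specificity
    }

m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```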
![Page 296: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/296.jpg)
![Page 297: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/297.jpg)
![Page 298: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/298.jpg)
![Page 299: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/299.jpg)
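$X = \{x^t\}_t$ where $x^t \sim N(\mu, \sigma^2)$
$m \sim N(\mu, \sigma^2/N)$

$$\frac{\sqrt{N}(m - \mu)}{\sigma} \sim Z$$

$$P\Big\{-1.96 < \frac{\sqrt{N}(m - \mu)}{\sigma} < 1.96\Big\} = 0.95$$

$$P\Big\{m - 1.96\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\frac{\sigma}{\sqrt{N}}\Big\} = 0.95$$

$$P\Big\{m - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\frac{\sigma}{\sqrt{N}}\Big\} = 1 - \alpha$$

100(1 − α) percent confidence interval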
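With known σ, the interval can be computed using only the standard library. A sketch; the sample values and the function name are assumptions.

```python
import math
from statistics import NormalDist, mean

def z_confidence_interval(xs, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha) percent interval for the mean, sigma known."""
    m = mean(xs)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    half = z * sigma / math.sqrt(len(xs))
    return m - half, m + half

lo, hi = z_confidence_interval([4.0, 5.0, 6.0, 5.0], sigma=1.0)
```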
![Page 300: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/300.jpg)
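One-sided interval:

$$P\Big\{\frac{\sqrt{N}(m - \mu)}{\sigma} < 1.64\Big\} = 0.95$$

$$P\Big\{m - 1.64\frac{\sigma}{\sqrt{N}} < \mu\Big\} = 0.95$$

$$P\Big\{m - z_{\alpha}\frac{\sigma}{\sqrt{N}} < \mu\Big\} = 1 - \alpha$$

When $\sigma^2$ is not known:

$$S^2 = \frac{\sum_t (x^t - m)^2}{N - 1}, \quad \frac{\sqrt{N}(m - \mu)}{S} \sim t_{N-1}$$

$$P\Big\{m - t_{\alpha/2, N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2, N-1}\frac{S}{\sqrt{N}}\Big\} = 1 - \alpha$$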
![Page 301: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/301.jpg)
Reject a null hypothesis if not supported by the sample with enough confidence
$X = \{x^t\}_t$ where $x^t \sim N(\mu, \sigma^2)$
$H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$
Accept $H_0$ with level of significance α if $\mu_0$ is in the 100(1 − α) percent confidence interval

$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2}, z_{\alpha/2})$$

Two-sided test
![Page 302: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/302.jpg)
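One-sided test: $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$
Accept if

$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty, z_{\alpha})$$

Variance unknown: Use t, instead of z
Accept $H_0: \mu = \mu_0$ if

$$\frac{\sqrt{N}(m - \mu_0)}{S} \in (-t_{\alpha/2, N-1}, t_{\alpha/2, N-1})$$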
![Page 303: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/303.jpg)
Single training/validation set: Binomial Test
If error prob is $p_0$, prob that there are e errors or less in N validation trials is

$$P\{X \leq e\} = \sum_{j=0}^{e} \binom{N}{j} p_0^j (1 - p_0)^{N-j}$$

Accept if this prob is less than 1 − α
Figure annotations: 1 − α; N = 100, e = 20
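The binomial tail probability is easy to evaluate exactly. In this sketch N = 100 and e = 20 come from the figure annotation, while p0 = 0.25 is a made-up value for illustration.

```python
from math import comb

def prob_at_most(e, N, p0):
    """P{X <= e} for X ~ Binomial(N, p0)."""
    return sum(comb(N, j) * p0 ** j * (1 - p0) ** (N - j) for j in range(e + 1))

# Hypothetical check: 20 errors in 100 validation trials against p0 = 0.25.
tail = prob_at_most(20, 100, 0.25)
```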
![Page 304: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/304.jpg)
Number of errors X is approximately normal with mean $Np_0$ and variance $Np_0(1 - p_0)$

$$\frac{X - Np_0}{\sqrt{Np_0(1 - p_0)}} \sim Z$$

Accept if this statistic for X = e is less than $z_{1-\alpha}$
Figure annotation: 1 − α
![Page 305: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/305.jpg)
Multiple training/validation sets
$x_i^t = 1$ if instance t is misclassified on fold i
Error rate of fold i:

$$p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$

With m and $S^2$ the average and variance of $p_i$, we accept $p_0$ or less error if

$$\frac{\sqrt{K}(m - p_0)}{S} \sim t_{K-1}$$

is less than $t_{\alpha, K-1}$
![Page 306: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/306.jpg)
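Single training/validation set: McNemar’s Test
$e_{01}$: number of examples misclassified by classifier 1 but not 2; $e_{10}$: number misclassified by 2 but not 1
Under $H_0$, we expect $e_{01} = e_{10} = (e_{01} + e_{10})/2$

$$\frac{(|e_{01} - e_{10}| - 1)^2}{e_{01} + e_{10}} \sim \chi^2_1$$

Accept if < $\chi^2_{\alpha, 1}$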
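The statistic takes two counts and compares against a chi-square critical value. A sketch with made-up counts; 3.84 is the standard critical value for α = 0.05 with one degree of freedom.

```python
CHI2_0_05_1 = 3.84   # chi-square critical value, alpha = 0.05, 1 degree of freedom

def mcnemar_statistic(e01, e10):
    """(|e01 - e10| - 1)^2 / (e01 + e10), with the continuity correction."""
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)

stat = mcnemar_statistic(15, 5)           # hypothetical disagreement counts
accept_h0 = stat < CHI2_0_05_1
```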
![Page 307: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/307.jpg)
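Use K-fold cv to get K training/validation folds
$p_i^1, p_i^2$: Errors of classifiers 1 and 2 on fold i
$p_i = p_i^1 - p_i^2$: Paired difference on fold i
The null hypothesis is whether $p_i$ has mean 0

$$H_0: \mu = 0 \text{ vs. } H_1: \mu \neq 0$$

$$m = \frac{\sum_{i=1}^{K} p_i}{K}, \quad s^2 = \frac{\sum_{i=1}^{K} (p_i - m)^2}{K - 1}$$

$$\frac{\sqrt{K}(m - 0)}{s} = \frac{\sqrt{K} \cdot m}{s} \sim t_{K-1}$$

Accept if in $(-t_{\alpha/2, K-1}, t_{\alpha/2, K-1})$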
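The paired t statistic can be computed with the standard library. A sketch; the error lists are made-up values and the critical value must still be looked up in a t table.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(errors_1, errors_2):
    """sqrt(K) * m / s for the paired differences p_i = p_i^1 - p_i^2."""
    p = [a - b for a, b in zip(errors_1, errors_2)]
    K = len(p)
    return math.sqrt(K) * mean(p) / stdev(p)   # stdev divides by K - 1

t_stat = paired_t_statistic([3, 1, 2, 2], [1, 1, 1, 1])
```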
![Page 308: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/308.jpg)
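Use 5×2 cv to get 5 replications of 2 training/validation folds (Dietterich, 1998)
$p_i^{(j)}$: difference between errors of 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5

$$\bar{p}_i = \frac{p_i^{(1)} + p_i^{(2)}}{2}, \quad s_i^2 = \big(p_i^{(1)} - \bar{p}_i\big)^2 + \big(p_i^{(2)} - \bar{p}_i\big)^2$$

$$\frac{p_1^{(1)}}{\sqrt{\sum_{i=1}^{5} s_i^2 / 5}} \sim t_5$$

Two-sided test: Accept $H_0: \mu_0 = \mu_1$ if in $(-t_{\alpha/2, 5}, t_{\alpha/2, 5})$
One-sided test: Accept $H_0: \mu_0 \leq \mu_1$ if < $t_{\alpha, 5}$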
![Page 309: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/309.jpg)
$$\frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \big(p_i^{(j)}\big)^2}{2 \sum_{i=1}^{5} s_i^2} \sim F_{10, 5}$$

Two-sided test: Accept $H_0: \mu_0 = \mu_1$ if < $F_{\alpha, 10, 5}$
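Both 5×2 cv statistics, the t statistic with 5 degrees of freedom and the F statistic with (10, 5) degrees of freedom, use the same per-replication variances. A sketch with a hypothetical input format: `p[i]` holds the two fold differences of replication i.

```python
import math

def five_by_two_statistics(p):
    """p[i] = (p_i^(1), p_i^(2)) for i = 0..4. Returns (t, f)."""
    s2 = []
    for p1, p2 in p:
        avg = (p1 + p2) / 2
        s2.append((p1 - avg) ** 2 + (p2 - avg) ** 2)
    t = p[0][0] / math.sqrt(sum(s2) / 5)                          # ~ t_5 under H0
    f = sum(p1 ** 2 + p2 ** 2 for p1, p2 in p) / (2 * sum(s2))    # ~ F_{10,5} under H0
    return t, f

t_stat, f_stat = five_by_two_statistics([(2.0, 0.0)] * 5)
```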
![Page 310: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/310.jpg)
Errors of L algorithms on K folds

$$X_{ij} \sim N(\mu_j, \sigma^2), \quad j = 1, ..., L, \; i = 1, ..., K$$

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_L$$

We construct two estimators to $\sigma^2$.
One is valid if $H_0$ is true, the other is always valid.
We reject $H_0$ if the two estimators disagree.
![Page 311: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/311.jpg)
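If $H_0$ is true:

$$m_j = \sum_{i=1}^{K} \frac{X_{ij}}{K} \sim N(\mu, \sigma^2/K)$$

$$m = \frac{\sum_{j=1}^{L} m_j}{L}, \quad S^2 = \frac{\sum_j (m_j - m)^2}{L - 1}$$

Thus an estimator of $\sigma^2$ is $K \cdot S^2$, namely,

$$\hat{\sigma}^2 = K \sum_{j=1}^{L} \frac{(m_j - m)^2}{L - 1}$$

$$\sum_j \frac{(m_j - m)^2}{\sigma^2 / K} \sim \chi^2_{L-1}, \quad SSb \equiv K \sum_j (m_j - m)^2$$

So when $H_0$ is true, we have

$$\frac{SSb}{\sigma^2} \sim \chi^2_{L-1}$$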
![Page 312: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/312.jpg)
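Regardless of $H_0$, our second estimator to $\sigma^2$ is the average of group variances $S_j^2$:

$$S_j^2 = \frac{\sum_{i=1}^{K} (X_{ij} - m_j)^2}{K - 1}, \quad \hat{\sigma}^2 = \sum_{j=1}^{L} \frac{S_j^2}{L} = \sum_j \sum_i \frac{(X_{ij} - m_j)^2}{L(K - 1)}$$

$$SSw \equiv \sum_j \sum_i (X_{ij} - m_j)^2$$

$$(K - 1)\frac{S_j^2}{\sigma^2} \sim \chi^2_{K-1}, \quad \frac{SSw}{\sigma^2} \sim \chi^2_{L(K-1)}$$

$$\Big(\frac{SSb/\sigma^2}{L - 1}\Big) \Big/ \Big(\frac{SSw/\sigma^2}{L(K - 1)}\Big) = \frac{SSb/(L - 1)}{SSw/(L(K - 1))} \sim F_{L-1, L(K-1)}$$

Accept $H_0: \mu_1 = \mu_2 = \cdots = \mu_L$ if this statistic is less than $F_{\alpha, L-1, L(K-1)}$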
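The two sums of squares and the F ratio can be computed in a few lines. A sketch with a hypothetical layout: `X[j]` is the list of K errors of algorithm j, and the numbers are made up.

```python
def anova_f(X):
    """X[j] = K error values of algorithm j. Returns (SSb, SSw, F)."""
    L, K = len(X), len(X[0])
    m_j = [sum(col) / K for col in X]                     # per-algorithm means
    m = sum(m_j) / L                                      # overall mean
    ssb = K * sum((mj - m) ** 2 for mj in m_j)            # between-group sum of squares
    ssw = sum((x - mj) ** 2 for col, mj in zip(X, m_j) for x in col)  # within-group
    f = (ssb / (L - 1)) / (ssw / (L * (K - 1)))
    return ssb, ssw, f

ssb, ssw, f_stat = anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

The resulting F value is then compared with the critical value for (L−1, L(K−1)) degrees of freedom from an F table.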
![Page 313: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/313.jpg)
If ANOVA rejects, we do pairwise posthoc tests

$$H_0: \mu_i = \mu_j \text{ vs. } H_1: \mu_i \neq \mu_j$$

$$t = \frac{m_i - m_j}{\sqrt{2}\, \sigma_w} \sim t_{L(K-1)}$$
![Page 314: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/314.jpg)
Comparing two algorithms:
Sign test: Count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B had the same error rate.
Comparing multiple algorithms:
Kruskal-Wallis test: Calculate the average rank of each algorithm over the N datasets, and check whether these ranks could have arisen by chance if all algorithms had equal error.
If the Kruskal-Wallis test rejects, we do pairwise posthoc tests to find which pairs have a significant rank difference.
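The sign test reduces to a binomial tail probability: under the null hypothesis each dataset is a fair coin flip between A and B. Below is a minimal sketch, assuming ties have already been discarded and a two-sided test is wanted; the function name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided sign test p-value; ties are assumed to be dropped beforehand."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # P(X >= k) for X ~ Binomial(n, 1/2)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail, capped at 1
    return min(1.0, 2 * tail)


# A beats B on 9 of 10 datasets.
p_value = sign_test_p_value(9, 1)
print(p_value)
```

A small p-value suggests that the win count is unlikely if the two algorithms had the same error rate.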
29
![Page 315: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/315.jpg)
![Page 316: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/316.jpg)
Genetic algorithms (GAs): based loosely on simulated evolution.
Hypotheses: described by bit strings (subject to interpretation in specific domains).
Search: a population of hypotheses, refined through mutation and crossover to increase fitness.
Applications: optimization problems, learning the topology and parameters of neural networks, and many more.
2
![Page 317: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/317.jpg)
Lamarck and others: species “transmute” over time (inheritance of acquired traits).
Darwin and Wallace: consistent, heritable variation among individuals in a population; natural selection of the fittest.
Mendel and genetics: a mechanism for inheriting traits; genotype → phenotype mapping.
3
![Page 318: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/318.jpg)
Search proceeds by mutation and crossover of hypotheses in the current population. It is basically a generate-and-test beam search.
Motivating factors:
Evolution is known to be successful.
GAs can search hypothesis spaces containing complex interacting parts.
GAs are easily parallelizable.
4
![Page 319: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/319.jpg)
Population: the set of current hypotheses.
Fitness: a predefined measure of success.
Elements of a GA: fitness test, selection, reproduction (mutation, crossover).
5
![Page 320: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/320.jpg)
GA(Fitness, Fitness_threshold, p, r, m)
Here p is the population size, r the fraction of the population replaced by crossover at each step, and m the mutation rate.
Initialize: P ← p random hypotheses
Evaluate: for each h in P, compute Fitness(h)
While [max_h Fitness(h)] < Fitness_threshold:
1. Select: probabilistically select (1 − r)·p members of P to add to Ps.
2. Crossover: probabilistically select r·p/2 pairs of hypotheses from P. For each pair <h1, h2>, produce two offspring by applying the Crossover operator. Add all offspring to Ps.
3. Mutate: invert a randomly selected bit in m·p random members of Ps.
4. Update: P ← Ps
5. Evaluate: for each h in P, compute Fitness(h)
Return the hypothesis from P that has the highest fitness.
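The loop above can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: it assumes fixed-length bit-list hypotheses, fitness-proportionate selection, single-point crossover, and a `max_generations` cap and `seed` argument that are not part of the slide.

```python
import random


def crossover(h1, h2, rng):
    """Single-point crossover on two equal-length bit lists."""
    point = rng.randrange(1, len(h1))
    return h1[:point] + h2[point:], h2[:point] + h1[point:]


def ga(fitness, fitness_threshold, p, r, m, length, max_generations=200, seed=0):
    rng = random.Random(seed)
    # Initialize: P <- p random hypotheses
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_generations):
        scores = [fitness(h) for h in population]
        if max(scores) >= fitness_threshold:
            break
        # Tiny offset keeps the weights valid even if every score is zero
        weights = [s + 1e-9 for s in scores]
        n_pairs = round(r * p / 2)
        # 1. Select: keep about (1 - r) * p members, chosen by fitness
        kept = rng.choices(population, weights=weights, k=p - 2 * n_pairs)
        next_pop = [h[:] for h in kept]
        # 2. Crossover: r * p / 2 pairs, two offspring per pair
        for _ in range(n_pairs):
            h1, h2 = rng.choices(population, weights=weights, k=2)
            next_pop.extend(crossover(h1, h2, rng))
        # 3. Mutate: flip one random bit in m * p random members
        for h in rng.sample(next_pop, round(m * p)):
            h[rng.randrange(length)] ^= 1
        # 4. Update: P <- Ps
        population = next_pop
    return max(population, key=fitness)


# Toy problem: maximize the number of 1 bits in a 20-bit string.
best = ga(sum, fitness_threshold=20, p=40, r=0.6, m=0.05, length=20)
print(best, sum(best))
```

Because there is no elitism in this sketch, the best hypothesis of one generation is not guaranteed to survive into the next.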
6
![Page 321: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/321.jpg)
Represent (Outlook = Overcast ∨ Rain) ∧ (Wind = Strong) by
Outlook Wind
011     10
Represent IF Wind = Strong THEN PlayTennis = yes by
Outlook Wind PlayTennis
111     10   10
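The encoding uses one bit per attribute value, with a 1 meaning the value is allowed. Here is a small sketch of that idea; the value orderings (Sunny, Overcast, Rain and Strong, Weak) are an assumption chosen to agree with the bit strings on the slide, and the helper names are hypothetical.

```python
# Assumed value order for each attribute (consistent with 011 and 10 above).
OUTLOOK = ["Sunny", "Overcast", "Rain"]
WIND = ["Strong", "Weak"]


def encode(values, allowed):
    """One bit per possible value: 1 if the constraint allows it, else 0."""
    return "".join("1" if v in allowed else "0" for v in values)


def satisfies(bits, values, value):
    """True if the given attribute value is allowed by the bit substring."""
    return bits[values.index(value)] == "1"


outlook_bits = encode(OUTLOOK, {"Overcast", "Rain"})
wind_bits = encode(WIND, {"Strong"})
print(outlook_bits, wind_bits)  # 011 10
```

An all-ones substring such as 111 places no constraint on the attribute, which is how the second rule ignores Outlook.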
7
![Page 322: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/322.jpg)
8
![Page 323: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/323.jpg)
Fitness proportionate selection: $\Pr(h_i) = \dfrac{Fitness(h_i)}{\sum_{j} Fitness(h_j)}$
Tournament selection: Pick h1, h2 at random with uniform probability. With probability p, select the more fit.
Rank selection: Sort all hypotheses by fitness. The probability of selection is proportional to rank.
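The three selection schemes differ only in how they turn fitness into a sampling distribution. The sketch below is illustrative: the function names are hypothetical, the default `p=0.8` is an arbitrary choice, and rank 1 is assumed to be the least fit hypothesis so that fitter hypotheses get larger ranks.

```python
import random


def fitness_proportionate(population, fitness, rng):
    """Pr(h) is proportional to Fitness(h)."""
    weights = [fitness(h) for h in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament(population, fitness, rng, p=0.8):
    """Draw two hypotheses uniformly; return the fitter one with probability p."""
    h1, h2 = rng.choice(population), rng.choice(population)
    better, worse = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return better if rng.random() < p else worse


def rank_selection(population, fitness, rng):
    """Pr(h) is proportional to the rank of h (rank 1 = least fit)."""
    ranked = sorted(population, key=fitness)
    ranks = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=ranks, k=1)[0]


rng = random.Random(0)
population = ["0001", "0111", "1111", "0000"]
ones = lambda h: h.count("1")
print(fitness_proportionate(population, ones, rng))
print(tournament(population, ones, rng))
print(rank_selection(population, ones, rng))
```

Rank and tournament selection depend only on the ordering of fitness values, so they are less sensitive to the scale of the fitness function than fitness-proportionate selection.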
9
![Page 324: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/324.jpg)
GABIL learns a disjunctive set of propositional rules and is competitive with C4.5.
Fitness: Fitness(h) = (percent_correct(h))^2
Representation: IF a1 = T ∧ a2 = F THEN c = T; IF a2 = T THEN c = F is represented by
a1 a2 c   a1 a2 c
10 01 1   11 10 0
Genetic operators:
want variable-length rule sets (as number of attributes can change)
want only well-formed bitstring hypotheses
10
![Page 325: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/325.jpg)
Start with
     a1 a2 c   a1 a2 c
h1 : 10 01 1   11 10 0
h2 : 01 11 0   10 01 0
1. Choose crossover points for h1, e.g., after bits 1 and 8.
2. Now restrict the crossover points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1, 3>, <1, 8>, <6, 8>.
If we choose <1, 3>, the result is
     a1 a2 c
h3 : 11 10 0
     a1 a2 c   a1 a2 c   a1 a2 c
h4 : 00 01 1   11 11 0   10 01 0
11
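The worked example can be reproduced with plain string slicing. This sketch assumes each rule occupies 5 bits (2 for a1, 2 for a2, 1 for c) and that a cut pair in h2 is valid when its offsets within a rule match those chosen in h1; the function names are hypothetical.

```python
def valid_cuts(cut1, h2_len, rule_len):
    """Cut pairs in h2 whose offsets within a rule match those of cut1 in h1."""
    a1, b1 = cut1
    return [
        (a, b)
        for a in range(h2_len + 1)
        for b in range(a + 1, h2_len + 1)
        if a % rule_len == a1 % rule_len and b % rule_len == b1 % rule_len
    ]


def two_point_crossover(h1, cut1, h2, cut2):
    """Swap the middle segments of h1 and h2; the cuts may differ per parent."""
    a1, b1 = cut1
    a2, b2 = cut2
    child_a = h1[:a1] + h2[a2:b2] + h1[b1:]
    child_b = h2[:a2] + h1[a1:b1] + h2[b2:]
    return child_a, child_b


h1 = "1001111100"  # 10 01 1  11 10 0
h2 = "0111010010"  # 01 11 0  10 01 0
print(valid_cuts((1, 8), len(h2), rule_len=5))  # [(1, 3), (1, 8), (6, 8)]
h3, h4 = two_point_crossover(h1, (1, 8), h2, (1, 3))
print(h3)  # 11100            -> 11 10 0
print(h4)  # 000111111010010  -> 00 01 1  11 11 0  10 01 0
```

Matching the offsets keeps every offspring a whole number of rules long, which is what makes the variable-length rule sets well formed.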
![Page 326: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/326.jpg)
Add new genetic operators, also applied probabilistically:
1. AddAlternative: generalize the constraint on ai by changing a 0 to 1.
2. DropCondition: generalize the constraint on ai by changing every 0 to 1.
And add new fields to the bitstring to determine whether to allow these:
a1 a2 c   a1 a2 c   AA DC
01 11 0   10 01 0   1  0
So now the learning strategy also evolves. (Allowing this increased accuracy.)
12
![Page 327: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/327.jpg)
Performance of GABIL is comparable to the symbolic rule/tree learning methods C4.5, ID5R, and AQ14.
Average performance on a set of 12 synthetic problems:
GABIL without AA and DC operators: 92.1% accuracy
GABIL with AA and DC operators: 95.2% accuracy
Symbolic learning methods ranged from 91.2% to 96.6%
13
![Page 328: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/328.jpg)
How to characterize the evolution of the population in a GA?
Schema = string containing 0, 1, * (“don’t care”)
Typical schema: 10**0*
Instances of the above schema: 101101, 100000, ...
An instance of length 4, say 0010, matches 2^4 = 16 schemas.
Characterize the population by the number of instances representing each possible schema:
m(s, t) = number of instances of schema s in the population at time t
We want to estimate m(s, t + 1) given m(s, t) and other factors.
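Schema matching and the count m(s, t) are easy to make concrete. The following sketch uses strings for both schemas and individuals; the helper names and the three-member population are illustrative assumptions.

```python
from itertools import product


def matches(schema, bits):
    """True if the bit string is an instance of the schema ('*' matches anything)."""
    return len(schema) == len(bits) and all(
        s in ("*", b) for s, b in zip(schema, bits)
    )


def schemas_of(bits):
    """All schemas a bit string is an instance of: each bit is kept or replaced by '*'."""
    return ["".join(choice) for choice in product(*[(b, "*") for b in bits])]


def count_instances(schema, population):
    """m(s, t): number of individuals in the population that match the schema."""
    return sum(matches(schema, h) for h in population)


print(matches("10**0*", "101101"))  # True
print(len(schemas_of("0010")))      # 16
print(count_instances("10**0*", ["101101", "100000", "111111"]))  # 2
```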
14
![Page 329: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/329.jpg)
m(s, t) can change as t changes, due to the following factors:
Selection: if individuals representing s get selected more often, m(s, ·) will increase.
Crossover
Mutation
Schema theorem: gives a lower bound on E[m(s, t + 1)].
15
![Page 330: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/330.jpg)
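$\bar{f}(t)$ = average fitness of the population at time t
m(s, t) = number of instances of schema s in the population at time t
$\hat{u}(s, t)$ = average fitness of instances of s at time t
$s \cap p_t$: instances of schema s in the population $p_t$ at time t
Probability of selecting h in one selection step:
$$\Pr(h) = \frac{f(h)}{\sum_{i=1}^{n} f(h_i)} = \frac{f(h)}{n \bar{f}(t)}$$
Mean fitness of instances of s at time t:
$$\hat{u}(s, t) = \frac{\sum_{h \in s \cap p_t} f(h)}{m(s, t)}$$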
16
![Page 331: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/331.jpg)
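Probability of selecting an instance of s in one step:
$$\Pr(h \in s) = \sum_{h \in s \cap p_t} \frac{f(h)}{n \bar{f}(t)} = \frac{\hat{u}(s, t)}{n \bar{f}(t)}\, m(s, t)$$
Expected number of instances of s after n selections:
$$E[m(s, t + 1)] = \frac{\hat{u}(s, t)}{\bar{f}(t)}\, m(s, t)$$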
17
![Page 332: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/332.jpg)
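Schema theorem:
$$E[m(s, t + 1)] \geq \frac{\hat{u}(s, t)}{\bar{f}(t)}\, m(s, t) \left(1 - p_c \frac{d(s)}{l - 1}\right) (1 - p_m)^{o(s)}$$
m(s, t) = number of instances of schema s in the population at time t
$\bar{f}(t)$ = average fitness of the population at time t
$\hat{u}(s, t)$ = average fitness of instances of s at time t
$p_c$ = probability of the single-point crossover operator
$p_m$ = probability of the mutation operator
l = length of a single bit string
o(s) = number of defined (non-“*”) bits in s
d(s) = distance between the leftmost and rightmost defined bits in s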
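The quantities defined above combine into the standard schema theorem lower bound, E[m(s, t+1)] ≥ (û(s,t) / f̄(t)) · m(s,t) · (1 − p_c · d(s)/(l − 1)) · (1 − p_m)^o(s). The sketch below evaluates that bound for a concrete population; the function names, the toy population, and the parameter values are assumptions for illustration.

```python
def matches(schema, bits):
    return all(s in ("*", b) for s, b in zip(schema, bits))


def order(schema):
    """o(s): number of defined (non-'*') positions."""
    return sum(c != "*" for c in schema)


def defining_length(schema):
    """d(s): distance between the leftmost and rightmost defined positions."""
    defined = [i for i, c in enumerate(schema) if c != "*"]
    return defined[-1] - defined[0] if defined else 0


def schema_bound(schema, population, fitness, p_c, p_m):
    """Lower bound on the expected number of instances of the schema at t + 1."""
    instances = [h for h in population if matches(schema, h)]
    if not instances:
        return 0.0
    m = len(instances)                                            # m(s, t)
    f_bar = sum(fitness(h) for h in population) / len(population)  # mean fitness
    u_hat = sum(fitness(h) for h in instances) / m                # mean fitness of instances
    l = len(schema)
    survive_crossover = 1 - p_c * defining_length(schema) / (l - 1)
    survive_mutation = (1 - p_m) ** order(schema)
    return (u_hat / f_bar) * m * survive_crossover * survive_mutation


population = ["101101", "100000", "111111", "000000"]
ones = lambda h: h.count("1")
print(schema_bound("10**0*", population, ones, p_c=0.6, p_m=0.01))
```

Short, low-order schemas with above-average fitness have the largest bound, since they are the least likely to be disrupted by crossover and mutation.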
18
![Page 333: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/333.jpg)
Genetic programming: a population of programs, each represented by a tree.
19
![Page 334: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/334.jpg)
20
![Page 335: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/335.jpg)
Goal: spell UNIVERSAL
Terminals:
CS (“current stack”) = name of the top block on the stack, or F if there is no current stack.
TB (“top correct block”) = name of the topmost correct block on the stack.
NN (“next necessary”) = name of the next block needed above TB in the stack.
21
![Page 336: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/336.jpg)
(MS x) (“move to stack”): if block x is on the table, moves x to the top of the stack and returns the value T. Otherwise, does nothing and returns the value F.
(MT x) (“move to table”): if block x is somewhere in the stack, moves the block at the top of the stack to the table and returns the value T. Otherwise, returns F.
(EQ x y) (“equal”): returns T if x equals y, and returns F otherwise.
(NOT x): returns T if x = F, else returns F.
(DU x y) (“do until”): executes the expression x repeatedly until expression y returns the value T.
22
![Page 337: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/337.jpg)
Trained to fit 166 test problems.
Using a population of 300 programs, found this after 10 generations:
(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
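To see why the evolved program works, it helps to run it. The following is a small interpreter sketch for the terminals and primitives described earlier; the `World` class, the tuple encoding of programs, the use of `True`/`False` for T/F, and the `MAX_ITERS` cap on DU loops are all assumptions made for this illustration.

```python
GOAL = list("UNIVERSAL")
MAX_ITERS = 100  # assumed safety cap on DU loops; not part of the slides


class World:
    """The stack is a list (bottom first); the table is a set of block names."""

    def __init__(self, stack, table):
        self.stack, self.table = list(stack), set(table)

    def correct_height(self):
        """Number of blocks, counted from the bottom, that already match GOAL."""
        n = 0
        for block, wanted in zip(self.stack, GOAL):
            if block != wanted:
                break
            n += 1
        return n


def evaluate(expr, w):
    """Evaluate a terminal (str) or an (op, arg, ...) tuple in world w."""
    if expr == "CS":
        return w.stack[-1] if w.stack else False
    if expr == "TB":
        n = w.correct_height()
        return w.stack[n - 1] if n else False
    if expr == "NN":
        n = w.correct_height()
        return GOAL[n] if n < len(GOAL) else False
    op, *args = expr
    if op == "MS":
        x = evaluate(args[0], w)
        if x in w.table:
            w.table.remove(x)
            w.stack.append(x)
            return True
        return False
    if op == "MT":
        x = evaluate(args[0], w)
        if x in w.stack:
            w.table.add(w.stack.pop())
            return True
        return False
    if op == "EQ":
        return evaluate(args[0], w) == evaluate(args[1], w)
    if op == "NOT":
        return evaluate(args[0], w) is False
    if op == "DU":
        for _ in range(MAX_ITERS):
            evaluate(args[0], w)
            if evaluate(args[1], w) is True:
                return True
        return False
    raise ValueError(f"unknown primitive: {op}")


PROGRAM = ("EQ",
           ("DU", ("MT", "CS"), ("NOT", "CS")),
           ("DU", ("MS", "NN"), ("NOT", "NN")))

world = World(stack=["A", "L"], table="UNIVERS")
evaluate(PROGRAM, world)
print("".join(world.stack))  # UNIVERSAL
```

The first DU loop empties the stack onto the table, and the second repeatedly moves the next needed block onto the stack until none is needed; the outer EQ only serves to sequence the two loops.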
23
![Page 338: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/338.jpg)
Lamarck (19th century):
Believed an individual's genetic makeup was altered by lifetime experience.
But current evidence contradicts this view.
What is the impact of individual learning on population evolution?
24
![Page 339: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/339.jpg)
Assume:
Individual learning has no direct influence on individual DNA.
But the ability to learn reduces the need to “hard wire” traits in DNA.
Then:
The ability of individuals to learn will support a more diverse gene pool, because learning allows individuals with various “hard wired” traits to be successful.
A more diverse gene pool will support faster evolution of the gene pool.
→ Individual learning (indirectly) increases the rate of evolution.
25
![Page 340: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/340.jpg)
Plausible example:
1. A new predator appears in the environment.
2. Individuals who can learn (to avoid it) will be selected.
3. The increase in learning individuals will support a more diverse gene pool,
4. resulting in faster evolution,
5. possibly resulting in new non-learned traits such as instinctive fear of the predator.
26
![Page 341: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/341.jpg)
Evolve simple neural networks:
Some network weights are fixed during the lifetime; others are trainable.
Genetic makeup determines which weights are fixed, and their values.
Results:
With no individual learning, the population failed to improve over time.
When individual learning was allowed:
Early generations: the population contained many individuals with many trainable weights.
Later generations: higher fitness, while the number of trainable weights decreased.
27
![Page 342: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/342.jpg)
Coevolution: escalating effect or complementary dependence (e.g., insects and flowering plants) between two or more species.
Cultural transmission: memes vs. genes.
28
![Page 343: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems](https://reader033.fdocuments.us/reader033/viewer/2022060918/60aab77f1af6562c253af9e7/html5/thumbnails/343.jpg)
Conduct randomized, parallel, hill-climbing search through H.
Approach learning as an optimization problem (optimize fitness).
Nice feature: evaluation of Fitness can be very indirect:
consider learning a rule set for multistep decision making
no issue of assigning credit/blame to individual steps
29