
Machine Learning

Math Essentials, Part 2

Jeff Howbert, Introduction to Machine Learning, Winter 2014

Gaussian distribution

Most commonly used continuous probability distribution

Also known as the normal distribution

Two parameters define a Gaussian:
– Mean μ: location of center
– Variance σ²: width of curve


Gaussian distribution

In one dimension

N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)


Gaussian distribution

In one dimension

N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

– Normalizing constant 1 / (2\pi\sigma^2)^{1/2}: ensures that the distribution integrates to 1
– \sigma^2 controls the width of the curve
– The exponential term causes the pdf to decrease as distance from the center increases
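As a quick check of the formula above, here is a minimal MATLAB sketch (not from the original slides) that evaluates the one-dimensional density directly; the grid limits and the example values μ = 0, σ² = 1 are arbitrary choices for illustration.

% Evaluate the one-dimensional Gaussian pdf N(x | mu, sigma2) on a grid
mu     = 0;                      % mean: location of the center
sigma2 = 1;                      % variance: width of the curve
x      = -5 : 0.01 : 5;

p = 1 ./ sqrt( 2 * pi * sigma2 ) .* exp( -( x - mu ).^2 / ( 2 * sigma2 ) );

plot( x, p );
xlabel( 'x' );
ylabel( 'N(x | \mu, \sigma^2)' );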


Gaussian distribution

[Figure: plots of the one-dimensional Gaussian pdf for (μ = 0, σ² = 1), (μ = 2, σ² = 1), (μ = 0, σ² = 5), and (μ = -2, σ² = 0.3)]


Multivariate Gaussian distribution

In d dimensions

N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

x and μ are now d-dimensional vectors
– μ gives center of distribution in d-dimensional space

σ² is replaced by Σ, the d × d covariance matrix
– Σ contains pairwise covariances of every pair of features
– Diagonal elements of Σ are variances σ² of individual features
– Σ describes the distribution's shape and spread
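A minimal MATLAB sketch (not from the slides) that evaluates this d-dimensional density at a single point; the example mean, covariance, and evaluation point are placeholder values.

% Evaluate the d-dimensional Gaussian density N(x | mu, Sigma) at one point
mu    = [ 0; 0 ];                    % center of the distribution
Sigma = [ 0.25 0.30;
          0.30 1.00 ];               % d x d covariance matrix (symmetric, positive definite)
x     = [ 0.5; -0.2 ];               % evaluation point

d     = length( mu );
diffv = x - mu;
p     = 1 / ( (2*pi)^(d/2) * sqrt( det( Sigma ) ) ) * ...
        exp( -0.5 * diffv' * ( Sigma \ diffv ) );
% The Statistics Toolbox function mvnpdf( x', mu', Sigma ) computes the same value.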


Multivariate Gaussian distribution

Covariance
– Measures tendency for two variables to deviate from their means in same (or opposite) directions at same time

[Figure: two scatter plots, one showing no covariance, the other high (positive) covariance]
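To make the covariance idea concrete, here is a small MATLAB sketch (not from the slides) that estimates the covariance of two synthetic variables; the sample size and the linear relationship between them are arbitrary choices.

% Two variables that tend to deviate from their means in the same direction
n = 1000;
a = randn( n, 1 );
b = 0.8 * a + 0.3 * randn( n, 1 );   % b mostly follows a, plus noise

% Sample covariance: average product of deviations from the means
c = mean( ( a - mean( a ) ) .* ( b - mean( b ) ) );

% cov( a, b ) returns the full 2 x 2 covariance matrix: variances on the
% diagonal, covariance off the diagonal (normalized by n-1 rather than n)
C = cov( a, b );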


Multivariate Gaussian distribution

In two dimensions

\boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{bmatrix}

[Figure: plot of the corresponding two-dimensional Gaussian]


Multivariate Gaussian distribution

In two dimensions

\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix}

[Figure: two-dimensional Gaussians for each of the three covariance matrices]


Multivariate Gaussian distribution

In three dimensions

rng( 1 );
mu    = [ 2; 1; 1 ];
sigma = [ 0.25 0.30 0.10;
          0.30 1.00 0.70;
          0.10 0.70 2.00 ];

x = randn( 1000, 3 );              % standard normal samples
x = x * chol( sigma );             % chol( sigma )' * chol( sigma ) = sigma, so samples get covariance sigma
x = x + repmat( mu', 1000, 1 );    % shift to mean mu
scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );


Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is x / || x ||
– Length of projection of y in direction of x is || y || ⋅ cos( θ )
– Orthogonal projection of y onto x is the vector
  projx( y ) = x ⋅ || y || ⋅ cos( θ ) / || x || = [ ( x ⋅ y ) / || x ||² ] x   (using dot product alternate form)

[Figure: vectors x and y with projx( y ) drawn along x]
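A minimal MATLAB sketch (not part of the slides) of the projection formula above; the two example vectors are arbitrary.

% Orthogonal projection of y onto x:  proj_x( y ) = [ ( x . y ) / ||x||^2 ] x
x = [ 2; 1; 0 ];
y = [ 1; 3; 1 ];

proj  = ( dot( x, y ) / norm( x )^2 ) * x;   % component of y along x
resid = y - proj;                            % remainder of y, orthogonal to x

dot( resid, x )                              % ~0, confirming orthogonality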


Linear models

There are many types of linear models in machine learning.
– Common in both classification and regression.
– A linear model consists of a vector w in d-dimensional feature space.
– The vector w attempts to capture the strongest gradient (rate of change) in the output variable, as seen across all training samples.
– Different linear models optimize w in different ways.
– A point x in feature space is mapped from d dimensions to a scalar (1-dimensional) output z by projection onto w:

z = w_0 + \mathbf{w} \cdot \mathbf{x} = w_0 + w_1 x_1 + \cdots + w_d x_d

(cf. Lecture 5b: w_0 ≡ α, \mathbf{w} ≡ β)
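A tiny MATLAB sketch (illustrative, not from the slides) of this projection step; the bias, weight vector, and input point are made-up values.

% Map a point x in d-dimensional feature space to a scalar z = w0 + w . x
w0 = -0.5;                 % bias term (alpha in the Lecture 5b notation)
w  = [ 1.2; -0.7; 0.3 ];   % model vector, one weight per feature
x  = [ 0.4; 1.0; 2.5 ];    % one sample

z = w0 + w' * x;           % scalar projection output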


Linear models

There are many types of linear models in machine learning.
– The projection output z is typically transformed to a final predicted output y by some function ƒ:

y = f( z ) = f( w_0 + \mathbf{w} \cdot \mathbf{x} ) = f( w_0 + w_1 x_1 + \cdots + w_d x_d )

  example: for logistic regression, ƒ is the logistic function
  example: for linear regression, ƒ( z ) = z

– Models are called linear because they are a linear function of the model vector components w_1, …, w_d.
– Key feature of all linear models: no matter what ƒ is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after transform.
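A brief MATLAB sketch (not from the slides) showing the two example transforms applied to the same range of projection outputs; the logistic function is the standard form 1 / (1 + e^{-z}).

% Two common choices of f applied to the projection output z
z = -6 : 0.1 : 6;

y_linear   = z;                        % linear regression: f( z ) = z
y_logistic = 1 ./ ( 1 + exp( -z ) );   % logistic regression: logistic function of z

plot( z, y_linear, z, y_logistic );
legend( 'f( z ) = z', 'logistic f( z )' );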


Geometry of projections

[Figure sequence over four slides, courtesy of Greg Shakhnarovich (CS195-5, Brown Univ., 2006); the final panel labels the margin.]


From projection to prediction

positive margin → class 1

negative margin → class 0
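A small MATLAB illustration (not from the slides) of this rule; w0, w, and the sample points are made-up values.

% Predict class labels from the sign of the margin z = w0 + w . x
w0 = 1.0;
w  = [ 2.0; -1.0 ];
X  = [ 0.5  0.2;          % each row is one sample x
      -1.0  1.5 ];

z    = w0 + X * w;        % margin for each sample
yhat = double( z > 0 );   % positive margin -> class 1, otherwise class 0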


Logistic regression in two dimensions

Interpreting the model vector of coefficients

From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]
w0 = B( 1 ), w = [ w1 w2 ] = B( 2 : 3 )

w0, w define location and orientation of decision boundary
– -w0 / || w || is the distance of the decision boundary from the origin
– decision boundary is perpendicular to w

magnitude of w defines gradient of probabilities between 0 and 1

[Figure: decision boundary and the vector w in the two-dimensional feature space]
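Building on the coefficients shown above, this MATLAB sketch (not part of the original slide) extracts w0 and w and computes the boundary's distance from the origin and its orientation; only B is taken from the slide.

% Interpret the fitted coefficients B = [ w0 w1 w2 ]
B  = [ 13.0460 -1.9024 -0.4047 ];
w0 = B( 1 );
w  = B( 2 : 3 )';

% The decision boundary is the set of x with w0 + w' * x = 0
dist_from_origin = -w0 / norm( w );   % signed distance of the boundary from the origin
unit_w           = w / norm( w );     % boundary is perpendicular to this direction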


Logistic function in d dimensions

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)


Decision boundary for logistic regression

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)