Machine Learning CUNY Graduate Center Lecture 2: Math Primer.
![Page 1: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/1.jpg)
Machine Learning
CUNY Graduate Center
Lecture 2: Math Primer
![Page 2: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/2.jpg)
2
Today
• Probability and Statistics
– Naïve Bayes Classification
• Linear Algebra
– Matrix Multiplication
– Matrix Inversion
• Calculus
– Vector Calculus
– Optimization
– Lagrange Multipliers
![Page 3: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/3.jpg)
3
Classical Artificial Intelligence
• Expert Systems
• Theorem Provers
• Shakey
• Chess
• Largely characterized by determinism.
![Page 4: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/4.jpg)
4
Modern Artificial Intelligence
• Fingerprint ID
• Internet Search
• Vision – facial ID, object recognition
• Speech Recognition
• Asimo
• Jeopardy!
• Statistical modeling to generalize from data.
![Page 5: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/5.jpg)
5
Two Caveats about Statistical Modeling
• Black Swans
• The Long Tail
![Page 6: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/6.jpg)
6
Black Swans
• Until the late 17th Century, all swans known to Europeans were white.
• Based on that evidence, it seemed impossible for a swan to be anything other than white.
• In 1697, black swans were discovered in Western Australia.
• Black Swans are rare, sometimes unpredictable events that have extreme impact.
• Almost all statistical models underestimate the likelihood of unseen events.
![Page 7: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/7.jpg)
7
The Long Tail
• Many events follow a heavy-tailed distribution, such as a power law.
• These distributions have a very long “tail”.
– I.e., a large region with significant probability mass, but low likelihood at any particular point.
• Often, interesting events occur in the Long Tail, but it is difficult to accurately model behavior in this region.
![Page 8: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/8.jpg)
8
Boxes and Balls
• 2 Boxes, one red and one blue.
• Each contains colored balls.
![Page 9: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/9.jpg)
9
Boxes and Balls
• Suppose we randomly select a box, then randomly draw a ball from that box.
• The identity of the Box is a random variable, B.
• The identity of the ball is a random variable, L.
• B can take 2 values, r or b.
• L can take 2 values, g or o.
![Page 10: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/10.jpg)
10
Boxes and Balls
• Given some information about B and L, we want to ask questions about the likelihood of different events.
• What is the probability of selecting a green ball?
• If I chose an orange ball, what is the probability that I chose from the blue box?
![Page 11: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/11.jpg)
11
Some basics
• The probability (or likelihood) of an event is the fraction of times that the event occurs out of n trials, as n approaches infinity.
• Probabilities lie in the range [0,1].
• Mutually exclusive events are events that cannot simultaneously occur.
– The sum of the likelihoods of a set of mutually exclusive and exhaustive events must equal 1.
• If two events are independent then,
p(X, Y) = p(X)p(Y)
p(X|Y) = p(X)
![Page 12: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/12.jpg)
12
Joint Probability – P(X,Y)
• A Joint Probability function defines the likelihood of two (or more) events occurring.
• Let n_ij be the number of times event i and event j simultaneously occur.

| | Orange | Green | Total |
|---|---|---|---|
| Blue box | 1 | 3 | 4 |
| Red box | 6 | 2 | 8 |
| Total | 7 | 5 | 12 |
![Page 13: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/13.jpg)
13
Generalizing the Joint Probability
![Page 14: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/14.jpg)
14
Marginalization
• Consider the probability of X irrespective of Y.
• The number of instances in column j is the sum of instances in each cell
• Therefore, we can marginalize or “sum over” Y:
![Page 15: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/15.jpg)
15
Conditional Probability
• Consider only instances where X = xj.
• The fraction of these instances where Y = yi is the conditional probability.
– “The probability of y given x”
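
The joint, marginal, and conditional probabilities described on the last few slides can be sketched directly from the Boxes and Balls count table. This is a minimal illustration, not the lecture's own code; the dictionary layout and function names are assumptions chosen for readability.

```python
from fractions import Fraction

# Counts from the Boxes and Balls table: outer key is the box, inner key the ball color.
counts = {
    "blue": {"orange": 1, "green": 3},
    "red": {"orange": 6, "green": 2},
}
total = sum(sum(row.values()) for row in counts.values())  # N = 12


def joint(box, ball):
    """p(B=box, L=ball): cell count divided by the total count."""
    return Fraction(counts[box][ball], total)


def marginal_ball(ball):
    """p(L=ball): sum the joint over every box (marginalize out B)."""
    return sum(joint(box, ball) for box in counts)


def conditional_ball_given_box(ball, box):
    """p(L=ball | B=box): cell count divided by that box's row total."""
    return Fraction(counts[box][ball], sum(counts[box].values()))


print(joint("red", "orange"))                       # 1/2
print(marginal_ball("orange"))                      # 7/12
print(conditional_ball_given_box("orange", "red"))  # 3/4
```

Exact fractions are used so the results can be checked by hand against the table.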
![Page 16: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/16.jpg)
16
Relating the Joint, Conditional and Marginal
![Page 17: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/17.jpg)
17
Sum and Product Rules
• In general, we’ll refer to a distribution over a random variable as p(X) and a distribution evaluated at a particular value as p(x).
Sum Rule
Product Rule
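
The equations for this slide did not survive extraction. The standard forms of the two rules, written in the p(X), p(Y) notation introduced above, are given here as a reconstruction rather than a copy of the slide:

```latex
% Sum rule: marginalize the joint over Y
p(X) = \sum_{Y} p(X, Y)

% Product rule: factor the joint into a conditional and a marginal
p(X, Y) = p(Y \mid X)\, p(X)
```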
![Page 18: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/18.jpg)
18
Bayes Rule
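
The slide's equation is missing from the transcript. Bayes rule follows from applying the product rule in both directions, p(X, Y) = p(Y|X) p(X) = p(X|Y) p(Y), and solving for p(Y|X). The standard statement, offered as a reconstruction, is:

```latex
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
```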
![Page 19: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/19.jpg)
19
Interpretation of Bayes Rule
• Prior: Information we have before observation.
• Posterior: The distribution of Y after observing X
• Likelihood: The likelihood of observing X given Y
[Figure: Bayes rule with its posterior, likelihood, and prior terms labeled]
![Page 20: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/20.jpg)
20
Boxes and Balls with Bayes Rule
• Assume I’m inherently more likely to select the red box (2/3, about 66.7%) than the blue box (1/3, about 33.3%).
• If I selected an orange ball, what is the likelihood that I selected the red box? – The blue box?
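
A short sketch of how this question can be answered with Bayes rule. It assumes the per-box ball counts from the Joint Probability table (6 of 8 balls in the red box are orange, 1 of 4 in the blue box) and the priors stated above; the variable names are illustrative.

```python
from fractions import Fraction

# Prior over boxes, as stated on the slide (red 2/3, blue 1/3).
prior = {"red": Fraction(2, 3), "blue": Fraction(1, 3)}

# Likelihood of drawing an orange ball from each box, taken from the count table.
likelihood_orange = {"red": Fraction(6, 8), "blue": Fraction(1, 4)}

# Evidence p(L=o): sum rule applied to the product rule.
evidence = sum(likelihood_orange[box] * prior[box] for box in prior)

# Posterior p(B=box | L=o) by Bayes rule.
posterior = {box: likelihood_orange[box] * prior[box] / evidence for box in prior}

print(evidence)           # 7/12
print(posterior["red"])   # 6/7
print(posterior["blue"])  # 1/7
```

Observing an orange ball raises the probability of the red box from 2/3 to 6/7.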
![Page 21: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/21.jpg)
21
Boxes and Balls
![Page 22: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/22.jpg)
22
Naïve Bayes Classification
• This is a simple case of a simple classification approach.
• Here the Box is the class, and the colored ball is a feature, or the observation.
• We can extend this Bayesian classification approach to incorporate more independent features.
![Page 23: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/23.jpg)
23
Naïve Bayes Classification
• Some theory first.
![Page 24: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/24.jpg)
24
Naïve Bayes Classification
• Assuming independent features simplifies the math.
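
The slide's derivation is not in the transcript. The usual form of the simplification is sketched below, where c is the class, x_1 … x_D are the features, and the features are assumed conditionally independent given the class (the symbols c, x_d, and D are notation chosen here, not taken from the slide):

```latex
p(c \mid x_1, \dots, x_D)
  = \frac{p(c)\, p(x_1, \dots, x_D \mid c)}{p(x_1, \dots, x_D)}
  \;\propto\; p(c) \prod_{d=1}^{D} p(x_d \mid c)
```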
![Page 25: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/25.jpg)
25
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY FIRM BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
![Page 26: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/26.jpg)
26
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY FIRM BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
Prior:
![Page 27: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/27.jpg)
27
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY SOFT BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
![Page 28: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/28.jpg)
28
Naïve Bayes Example Data
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY SOFT BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
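
A minimal sketch of Naïve Bayes on this table, using maximum-likelihood counts with no smoothing. It assumes the rows exactly as listed on slides 27 and 28; slides 25 and 26 list the sixth row as COLD HEAVY FIRM BLUE, and with that version the two classes tie. The data layout and function names are illustrative, not the lecture's.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Training rows as listed on slides 27 and 28: three features, then the class label.
data = [
    ("HOT", "LIGHT", "SOFT", "RED"),
    ("COLD", "HEAVY", "SOFT", "RED"),
    ("HOT", "HEAVY", "FIRM", "RED"),
    ("HOT", "LIGHT", "FIRM", "RED"),
    ("COLD", "LIGHT", "SOFT", "BLUE"),
    ("COLD", "HEAVY", "SOFT", "BLUE"),
    ("HOT", "HEAVY", "FIRM", "BLUE"),
    ("HOT", "LIGHT", "FIRM", "BLUE"),
]
query = ("HOT", "HEAVY", "FIRM")

class_counts = Counter(row[-1] for row in data)

# feature_counts[label][position][value] = rows of that class with that feature value.
feature_counts = defaultdict(lambda: defaultdict(Counter))
for *features, label in data:
    for position, value in enumerate(features):
        feature_counts[label][position][value] += 1


def unnormalized_posterior(label):
    """Prior p(c) times the product of per-feature likelihoods p(x_d | c)."""
    score = Fraction(class_counts[label], len(data))
    for position, value in enumerate(query):
        score *= Fraction(feature_counts[label][position][value], class_counts[label])
    return score


scores = {label: unnormalized_posterior(label) for label in class_counts}
evidence = sum(scores.values())
posterior = {label: score / evidence for label, score in scores.items()}

print(posterior["RED"])   # 3/5
print(posterior["BLUE"])  # 2/5
```

With these rows the query HOT HEAVY FIRM is classified as RED.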
![Page 29: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/29.jpg)
29
Continuous Probabilities
• So far, X has been discrete, taking one of M values.
• What if X is continuous?
• Now p(x) is a continuous probability density function.
• The probability that x will lie in an interval (a,b) is:
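
The equation itself is missing from the transcript. The standard definition, given here as a reconstruction, integrates the density over the interval:

```latex
p\bigl(x \in (a, b)\bigr) = \int_{a}^{b} p(x)\, dx
```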
![Page 30: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/30.jpg)
30
Continuous probability example
![Page 31: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/31.jpg)
31
Properties of probability density functions
Sum Rule
Product Rule
![Page 32: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/32.jpg)
32
Expected Values
• Given a random variable, with a distribution p(X), what is the expected value of X?
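
The slide's formula was not extracted. The standard definitions for the discrete and continuous cases, offered as a reconstruction, are:

```latex
\mathbb{E}[X] = \sum_{x} x\, p(x) \quad \text{(discrete)},
\qquad
\mathbb{E}[X] = \int x\, p(x)\, dx \quad \text{(continuous)}
```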
![Page 33: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/33.jpg)
33
Multinomial Distribution
• If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution.
• The probability of x being in state k is μk
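
The slide's equation is missing. One common way to write this distribution, assuming x is encoded as a 1-of-K binary vector (exactly one x_k equals 1), is sketched below; the encoding is an assumption, since the transcript does not show it:

```latex
p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k},
\qquad \mu_k \ge 0, \quad \sum_{k=1}^{K} \mu_k = 1
```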
![Page 34: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/34.jpg)
34
Expected Value of a Multinomial
• The expected value is the mean vector, μ.
![Page 35: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/35.jpg)
35
Gaussian Distribution
• One Dimension
• D-Dimensions
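
The density formulas did not survive extraction. The standard forms are given here as a reconstruction, with mean μ, variance σ², and, in D dimensions, mean vector μ and covariance matrix Σ:

```latex
% One dimension
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)

% D dimensions
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma)
  = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```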
![Page 36: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/36.jpg)
36
Gaussians
![Page 37: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/37.jpg)
37
How machine learning uses statistical modeling
• Expectation
– The expected value of a function is the hypothesis.
• Variance
– The variance is the confidence in that hypothesis.
![Page 38: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/38.jpg)
38
Variance
• The variance of a random variable describes how much variability around the expected value there is.
• Calculated as the expected squared error.
![Page 39: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/39.jpg)
39
Covariance
• The covariance of two random variables expresses how they vary together.
• If two variables are independent, their covariance equals zero.
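
A small sketch of expectation, variance, and covariance computed from a discrete joint distribution. Both joint tables below are hypothetical examples made up for illustration: the first has dependent variables, the second is a product of its marginals and so has zero covariance.

```python
from fractions import Fraction


def expect(joint, f):
    """E[f(x, y)] = sum over (x, y) of p(x, y) * f(x, y)."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


def covariance(joint):
    """cov[x, y] = E[(x - E[x]) * (y - E[y])]."""
    mean_x = expect(joint, lambda x, y: x)
    mean_y = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - mean_x) * (y - mean_y))


def variance_x(joint):
    """var[x] = E[(x - E[x])^2], the expected squared error around the mean."""
    mean_x = expect(joint, lambda x, y: x)
    return expect(joint, lambda x, y: (x - mean_x) ** 2)


# Hypothetical joint where x and y tend to agree (dependent).
dependent = {
    (0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}
# Hypothetical joint that factorizes as p(x) p(y) (independent).
independent = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

print(variance_x(dependent))    # 1/4
print(covariance(dependent))    # 1/8
print(covariance(independent))  # 0
```

The converse does not hold in general: zero covariance does not by itself imply independence.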
![Page 40: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/40.jpg)
40
Linear Algebra
• Vectors
– A one-dimensional array.
– If not specified, assume x is a column vector.
• Matrices
– A two-dimensional array.
– Typically denoted with capital letters.
– n rows by m columns
![Page 41: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/41.jpg)
41
Transposition
• Transposing a matrix swaps columns and rows.
![Page 42: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/42.jpg)
42
Transposition
• Transposing a matrix swaps columns and rows.
![Page 43: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/43.jpg)
43
Addition
• Two matrices can be added iff they have the same dimensions.
– A and B are both n-by-m matrices.
![Page 44: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/44.jpg)
44
Multiplication
• To multiply two matrices, the inner dimensions must be the same.
– An n-by-m matrix can be multiplied by an m-by-k matrix.
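
Transposition, addition, and multiplication can be sketched with plain nested lists. This is an illustration of the dimension rules above, not an efficient implementation; the function names and the example matrices A and B are assumptions.

```python
def transpose(a):
    """Swap rows and columns: an n-by-m matrix becomes m-by-n."""
    return [list(column) for column in zip(*a)]


def add(a, b):
    """Element-wise sum; both matrices must have the same dimensions."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a, b):
    """Product of an n-by-m and an m-by-k matrix, giving an n-by-k matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


A = [[1, 2, 3], [4, 5, 6]]    # 2-by-3
B = [[1, 0], [0, 1], [1, 1]]  # 3-by-2

print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
print(add(A, A))     # [[2, 4, 6], [8, 10, 12]]
print(matmul(A, B))  # [[4, 5], [10, 11]]
```

Calling `add(A, B)` or `matmul(A, A)` raises `ValueError`, since the dimensions do not satisfy the rules stated on the slides.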
![Page 45: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/45.jpg)
45
Inversion
• The inverse of an n-by-n (square) matrix A, when it exists, is denoted A^{-1}, and has the following property.
• Here I, the identity matrix, is an n-by-n matrix with ones along the diagonal and zeros elsewhere.
– I_ij = 1 iff i = j, 0 otherwise
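
The defining property is that multiplying a matrix by its inverse, in either order, gives the identity matrix. A minimal check for the 2-by-2 case is sketched below using the closed-form inverse; the example matrix A is hypothetical and the helper names are illustrative.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Closed-form inverse of a 2-by-2 matrix; fails if the determinant is zero."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


A = [[2, 1], [5, 3]]  # hypothetical example with determinant 1
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]

print(A_inv == [[3, -1], [-5, 2]])    # True
print(matmul(A, A_inv) == identity)   # True
print(matmul(A_inv, A) == identity)   # True
```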
![Page 46: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/46.jpg)
46
Identity Matrix
• Matrices are invariant under multiplication by the identity matrix.
![Page 47: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/47.jpg)
47
Helpful matrix inversion properties
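
The slide's list of properties is not in the transcript, so its exact contents are unknown. Three standard identities that hold for invertible matrices A and B of the same size are given here as likely candidates:

```latex
(AB)^{-1} = B^{-1} A^{-1},
\qquad
(A^{T})^{-1} = (A^{-1})^{T},
\qquad
(A^{-1})^{-1} = A
```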
![Page 48: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/48.jpg)
48
Norm
• The norm of a vector, x, represents the Euclidean length of the vector.
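
The formula is missing from the transcript. For an n-dimensional vector x, the Euclidean norm is usually written as follows (a reconstruction):

```latex
\lVert \mathbf{x} \rVert = \sqrt{\mathbf{x}^{T} \mathbf{x}} = \sqrt{\sum_{i=1}^{n} x_i^{2}}
```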
![Page 49: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/49.jpg)
49
Positive Definite-ness
• Quadratic form
– Scalar
– Vector
• Positive Definite matrix M
• Positive Semi-definite
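
The expressions attached to these bullets were lost in extraction. The standard definitions, offered as a reconstruction (the scalar coefficient a is notation chosen here), are:

```latex
% Quadratic forms
\text{scalar: } a x^{2},
\qquad
\text{vector: } \mathbf{x}^{T} M \mathbf{x}

% Positive definite
\mathbf{x}^{T} M \mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}

% Positive semi-definite
\mathbf{x}^{T} M \mathbf{x} \geq 0 \quad \text{for all } \mathbf{x}
```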
![Page 50: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/50.jpg)
50
Calculus
• Derivatives and Integrals
• Optimization
![Page 51: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/51.jpg)
51
Derivatives
• A derivative of a function defines the slope at a point x.
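
The slide's formula is missing. The usual limit definition of the derivative, given as a reconstruction, is:

```latex
f'(x) = \frac{df}{dx} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}
```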
![Page 52: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/52.jpg)
52
Derivative Example
![Page 53: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/53.jpg)
53
Integrals
• Integration is the inverse operation of differentiation (plus a constant).
• Graphically, an integral can be considered the area under the curve defined by f(x)
![Page 54: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/54.jpg)
54
Integration Example
![Page 55: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/55.jpg)
55
Vector Calculus
• Differentiation with respect to a matrix or vector
• Gradient
• Change of Variables with a Vector
![Page 56: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/56.jpg)
56
Derivative w.r.t. a vector
• Given a vector x, and a function f(x), how can we find f’(x)?
![Page 57: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/57.jpg)
57
Derivative w.r.t. a vector
• Given a vector x, and a function f(x), how can we find f’(x)?
![Page 58: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/58.jpg)
58
Example Derivation
![Page 59: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/59.jpg)
59
Example Derivation
Also referred to as the gradient of a function.
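
The gradient of a scalar function of a vector collects the partial derivative with respect to each component. The slide's worked example is not in the transcript, so the sketch below uses a hypothetical function, f(x) = xᵀx, whose gradient is 2x, and checks it against a central finite-difference approximation.

```python
def f(x):
    """Hypothetical example: f(x) = x^T x, the squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def numerical_gradient(func, x, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    gradient = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        gradient.append((func(forward) - func(backward)) / (2 * h))
    return gradient


x = [1.0, -2.0, 3.0]
approx = numerical_gradient(f, x)
exact = [2 * xi for xi in x]  # analytic gradient of x^T x

print(all(abs(a - e) < 1e-4 for a, e in zip(approx, exact)))  # True
```

The same finite-difference check is a common way to test a hand-derived gradient before using it in an optimizer.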
![Page 60: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/60.jpg)
60
Useful Vector Calculus identities
• Scalar Multiplication
• Product Rule
![Page 61: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/61.jpg)
61
Useful Vector Calculus identities
• Derivative of an inverse
• Change of Variable
![Page 62: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/62.jpg)
62
Optimization
• Have an objective function that we’d like to maximize or minimize, f(x)
• Set the first derivative to zero.
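
A short worked example of this recipe, using a hypothetical objective chosen for illustration. Setting the first derivative to zero finds a stationary point; the sign of the second derivative then shows whether it is a maximum or a minimum.

```latex
f(x) = -x^{2} + 4x

f'(x) = -2x + 4 = 0 \;\Rightarrow\; x = 2

f''(x) = -2 < 0 \;\Rightarrow\; x = 2 \text{ is a maximum, with } f(2) = 4
```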
![Page 63: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/63.jpg)
63
Optimization with constraints
• What if I want to constrain the parameters of the model?
– E.g., the mean is less than 10.
• Find the best likelihood, subject to a constraint.
• Two functions:
– An objective function to maximize
– An inequality that must be satisfied
![Page 64: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/64.jpg)
64
Lagrange Multipliers
• Find maxima of f(x,y) subject to a constraint.
![Page 65: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/65.jpg)
65
General form
• Maximizing:
• Subject to:
• Introduce a new variable, λ, and find a maximum.
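
The expressions on this slide were not extracted. The usual general form for an equality constraint is sketched here as a reconstruction, where λ is the Lagrange multiplier and the sign convention on λ is a choice:

```latex
\text{maximize } f(\mathbf{x}) \quad \text{subject to } g(\mathbf{x}) = 0

L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda\, g(\mathbf{x})

\nabla_{\mathbf{x}} L = \mathbf{0},
\qquad
\frac{\partial L}{\partial \lambda} = g(\mathbf{x}) = 0
```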
![Page 66: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/66.jpg)
66
Example
• Maximizing:
• Subject to:
• Introduce a new variable, λ, and find a maximum.
![Page 67: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/67.jpg)
67
Example
Now have 3 equations with 3 unknowns.
![Page 68: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/68.jpg)
68
Example
• Eliminate lambda
• Substitute and solve
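
The slide's actual objective and constraint are not recoverable from the transcript. The sketch below works through an assumed problem of the same shape: maximize f(x, y) = 1 − x² − y² subject to x + y − 1 = 0. Setting the partial derivatives of the Lagrangian to zero gives three linear equations in three unknowns, which are solved here by exact Gauss-Jordan elimination; the solver is a generic helper written for this illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Assumes the system has a unique solution.
    """
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Assumed problem: maximize f = 1 - x^2 - y^2 subject to g = x + y - 1 = 0.
# Lagrangian L = f + lam * g. Stationarity conditions:
#   dL/dx   = -2x + lam   = 0
#   dL/dy   = -2y + lam   = 0
#   dL/dlam =  x + y - 1  = 0
coefficients = [[-2, 0, 1],
                [0, -2, 1],
                [1, 1, 0]]
constants = [0, 0, 1]

x, y, lam = solve_linear(coefficients, constants)
print(x, y, lam)  # 1/2 1/2 1
```

For this assumed problem the constrained maximum is at x = y = 1/2 with λ = 1.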
![Page 69: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/69.jpg)
69
Why does Machine Learning need these tools?
• Calculus
– We need to identify the maximum likelihood, or minimum risk: optimization.
– Integration allows the marginalization of continuous probability density functions.
• Linear Algebra
– Many features lead to high-dimensional spaces.
– Vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
![Page 70: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/70.jpg)
70
Why does Machine Learning need these tools?
• Vector Calculus
– All of the optimization needs to be performed in high-dimensional spaces.
– Optimization of multiple variables simultaneously: Gradient Descent
– Want to take a marginal over high-dimensional distributions like Gaussians.
![Page 71: Machine Learning CUNY Graduate Center Lecture 2: Math Primer.](https://reader036.fdocuments.us/reader036/viewer/2022062323/5697c0211a28abf838cd2bfb/html5/thumbnails/71.jpg)
71
Next Time
• Linear Regression and Regularization
• Read Chapter 1.1, 3.1, 3.3