A.I. Algorithms Cogs 188


Grades

• 1 Midterm: 20%

• 1 Final Exam: 30%

• Assignment 0: 5%

• Assignment 1: 15%

• Assignment 2: 15%

• Assignment 3: 15%

Assignments are to be done individually (not in groups). Late assignments incur a 33% penalty for each day (or part of a day) they are late: submit an assignment 1 minute late and you lose 33% of the points; submit it 24 hours and 1 minute late and you lose 66% of the points.

Tentative Schedule

Date | Day | Topics Covered | Assignments
October 1st | Thursday | Machine Learning overview | Assignment 0 Assigned
October 6th | Tuesday | K-NN |
October 8th | Thursday | Linear Regression - Objective Function |
October 13th | Tuesday | Gradient Descent | Assignment 1 Assigned
October 15th | Thursday | Perceptron |
October 20th | Tuesday | Perceptron Revision |
October 22nd | Thursday | Statistics & Probability - Distributions |
October 27th | Tuesday | K-Means | Assignment 1 Due
October 29th | Thursday | Midterm |
November 3rd | Tuesday | Review and Hierarchical Clustering | Assignment 2 Assigned
November 5th | Thursday | EM-Algorithm |
November 10th | Tuesday | EM-Algorithm Cont. |
November 12th | Thursday | EM-Algorithm Revision |
November 17th | Tuesday | Genetic Algorithms | Assignment 2 Due
November 19th | Thursday | Genetic Algorithms - Cont. | Assignment 3 Assigned
November 24th | Tuesday | Genetic Algorithms - Examples |
November 26th | Thursday | No class, happy Thanksgiving! |
December 1st | Tuesday | Bayes Theorem |
December 3rd | Thursday | Naïve Bayes Classification |
December 8th | Tuesday | A.I. in Healthcare |
December 10th | Thursday | Review | Assignment 3 Due
December 16th | Wednesday | Final Exam |

Teaching Staff

• Instructor: Dr. Anjum Gupta, a3gupta@cogsci.ucsd.edu

• TA: Qiyuan, qiw103@eng.ucsd.edu

• If you are sending an email to us, please send all emails to both addresses; however, posting your questions on Canvas is recommended whenever possible.

Syllabus

1. Probability and Statistics
2. Python / Jupyter Notebook (TA sections)
3. Nearest Neighbor
4. Linear Regression, Logistic Regression
5. Perceptron
6. Bayes Theorem
7. K-means, Hierarchical Clustering
8. Genetic Algorithm
9. EM Algorithm

Learning

You are learning if you improve your performance with experience.

Big Picture

[Diagram: Input Data → Machine Learning Tools → Output. The tools draw on statistics, algorithms, graph theory, information theory, probability theory, game theory, linear algebra, and analytical geometry, combined with computer science and domain expertise.]

Things you can do with Machine Learning

• Given a voice stream, identify the speaker/language.

• Recognize handwritten numbers.

• Evaluate the “lifetime value” of a customer (or sales lead).

• Recognize faces or objects in a video stream.

• Given symptoms, diagnose a disease.

• Adjust a stock portfolio based on sentiment and clustering.

• Distinguish between a weed and a plant sapling.

• Analyze hand gestures, e.g. a glove that sends text messages.

• Too many to count individually. That's what makes machine learning so useful.

Machine Learning in Agriculture

Blue River Technologies: Differentiating weed vs plant saplings

Root AI: Identifying ripe tomatoes to pick.

Let’s start with our canonical two broad categories!

• Supervised – Discriminative Models

• Unsupervised – Generative Models

Classifying Data: tasks we humans can technically do ourselves, but where it would be nice to get some help and automate the work.

Understanding Data: tasks that we humans cannot do, e.g. generating new insights and extracting hidden information that the data contains.

Generally Speaking…

• Discriminative Models – Classifying Data

– Spam filter (Spam, Not Spam)

– Identify language from a voice stream

– Facial expression recognition

– Classify species according to some physical features

• Generative Models – Understanding Data

– Detect anomalies

– Finding probability of a scenario

– Predicting future outcomes

– Filling in missing data

Optimization Algorithms

• We will also learn two specific optimization algorithms.

– Gradient Descent

– Genetic Algorithms
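As a preview of the first of these, here is a minimal sketch of gradient descent minimizing a one-variable function; the function, learning rate, and step count are illustrative choices, not fixed by the course.

```python
# Minimal gradient descent sketch: minimize f(x) = (x - 3)^2.
# The gradient is f'(x) = 2 * (x - 3), so the minimum is at x = 3.

def gradient(x):
    return 2.0 * (x - 3.0)

x = 0.0              # arbitrary starting point
learning_rate = 0.1
for step in range(100):
    x = x - learning_rate * gradient(x)   # move against the gradient

print(round(x, 4))   # converges toward 3.0
```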

Handwritten Digits Example

Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)

Use these to learn a classifier which will label digit-images automatically…

Classification

[Figure: a digit image with the question “What is the number?”]

Handwritten Digits Example

Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)

Use these 20,000 images to understand something about the digits and handwriting.

Understanding Data: being able to generate the numbers!

These results are from one of the projects I worked on with a fellow graduate student, Eric Wiewiora.

[Figures: digits regenerated using the model of digit 2 and the model of digit 5]

Naïve Bayes Example: Fishing Data

Day | Outlook | Water Temperature | Pollutants in Water | Wind | Fish Present
Day1 | Sunny | Hot | High | Weak | No
Day2 | Sunny | Hot | High | Strong | No
Day3 | Overcast | Hot | High | Weak | Yes
Day4 | Rain | Mild | High | Weak | Yes
Day5 | Rain | Cool | Normal | Weak | Yes
Day6 | Rain | Cool | Normal | Strong | No
Day7 | Overcast | Cool | Normal | Strong | Yes
Day8 | Sunny | Mild | High | Weak | No
Day9 | Sunny | Cool | Normal | Weak | Yes
Day10 | Rain | Mild | Normal | Weak | Yes
Day11 | Sunny | Mild | Normal | Strong | Yes
Day12 | Overcast | Mild | High | Strong | Yes
Day13 | Overcast | Hot | Normal | Weak | Yes
Day14 | Rain | Mild | High | Strong | No
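Here is a minimal sketch of how Naïve Bayes uses this table: estimate the prior P(class) and the per-feature likelihoods P(value | class) from counts, then pick the class with the larger product. The query day at the bottom is an illustrative choice, not from the slides.

```python
from collections import Counter

# The fishing table above, as (Outlook, Water Temp, Pollutants, Wind, Fish Present).
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def naive_bayes(query):
    labels = Counter(row[-1] for row in data)      # counts of Yes / No
    scores = {}
    for label, n in labels.items():
        score = n / len(data)                      # prior P(class)
        for i, value in enumerate(query):
            match = sum(1 for row in data if row[-1] == label and row[i] == value)
            score *= match / n                     # P(feature_i = value | class)
        scores[label] = score
    return max(scores, key=scores.get), scores

# Illustrative query: sunny, cool, high pollutants, strong wind -> "No".
print(naive_bayes(("Sunny", "Cool", "High", "Strong")))
```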

Bayesian Networks are for Bayesians, although frequentists are also welcome.

You are given various variables. For example: imagine you are going fishing.

[Bayesian network nodes: Depth, Temperature, Light, Corals, Food Source, Fish Present]

You can map out some relationships among them through “expert knowledge,” then refine the structure and learn the exact parameters.

Now you can ask: what is the probability of fish in shallow, cold water with corals?

A complete Bayes net consists of the graph together with its probability tables.
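To make “graph plus probability tables” concrete, here is a toy sketch of a single fragment of such a network (Depth → Fish Present) with made-up numbers; the real network above has more nodes, and its parameters would be learned from data.

```python
# Toy Bayes-net fragment: Depth -> Fish Present, with MADE-UP probability tables.
# A full network would carry a table like these for every node given its parents.

p_depth = {"shallow": 0.4, "deep": 0.6}             # P(Depth)
p_fish_given_depth = {"shallow": 0.7, "deep": 0.3}  # P(Fish | Depth)

# Marginalize: P(Fish) = sum over depths of P(Fish | depth) * P(depth)
p_fish = sum(p_fish_given_depth[d] * p_depth[d] for d in p_depth)
print(p_fish)  # 0.7*0.4 + 0.3*0.6 = 0.46
```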

Liver Disorder Diagnostic Bayesian Network

Bayesian vs. Frequentist

Sherlock Holmes was apparently a frequentist: “I have no data yet. It is a capital mistake to theorize before one has data.” (A Scandal in Bohemia)

This, however, sounds like a Bayesian: “Intuition becomes increasingly valuable in the new information society precisely because there is so much data.” (John Naisbitt, author)

Complex Bayesian Networks have given way to “deep learning”

• New algorithms came along and replaced the idea of Bayesian networks.

• They still influence many unsupervised learning algorithms.

• Other algorithms, such as the EM algorithm, also help us model the “true” nature of the data.

• We will visit some of the generative algorithms later in the course.

• Let's start with the classification algorithms first.

Classification

• Can a computer learn to recognize objects?

• Shown 10,000 flowers, can a computer “understand” flowers? Can it say if the new photograph shown is a flower?

Iris Setosa Iris Versicolor Iris Virginica

Let’s try our brain’s algorithm!

Iris Setosa | Iris Versicolor | Iris Virginica

???

What is Similarity?

“The quality or state of being similar; likeness; resemblance; as, a similarity of features.” (Webster's Dictionary)

For example, someone writing software for the healthcare industry may have to deal with the question of “how similar are two patients?”

It depends on what you are comparing the two objects for.

There has been a whole lot of research, and many Ph.D. theses, just on the concept of similarity:

1. Patient Similarity Networks for Precision Medicine

2. Patient Similarity: Emerging Concepts in Systems and Precision Medicine

3. Machine learning of patient similarity: A case study on predicting survival in cancer patient after locoregional chemotherapy

Fish Sorting: For Packaging

[Figure: incoming fish pass an optical sensor and classifier and are routed by a sorting chamber into salmon and sea bass bins. Source: Pattern Classification, Chapter 1]

An Example

• “Sorting incoming Fish on a conveyor according to species using optical sensing”

[Figure: the two species to separate, sea bass and salmon]

• Problem Analysis

– Set up a camera and take some sample images to extract features

• Length

• Lightness

• Width

• Number and shape of fins

• Position of the mouth, etc…

• This is the set of all suggested features to explore for use in our classifier!


• Classification

– Select the length of the fish as a possible feature for discrimination

[Figure: histograms of length for the two classes. Source: Pattern Classification, Chapter 1]

The length is a poor feature alone!

Select the lightness as a possible feature.

[Figure: histograms of lightness for the two classes. Source: Pattern Classification, Chapter 1]

• Adopt the lightness and add the width of the fish

Fish feature vector: x^T = [x1, x2], where x1 = lightness and x2 = width.

• Plotting salmon and sea bass based on the two-dimensional feature vector.

Feature extraction

Task: to extract features which are good for classification.

Good features: • Objects from the same class have similar feature values.

• Objects from different classes have different values.

[Figures: examples of “good” features vs. “bad” features]

Basic concepts

Feature vector x = [x1, x2, …, xn]^T
- A vector of observations (measurements).
- x ∈ X is a point in the feature space X.

Hidden state y ∈ Y
- Cannot be directly measured.
- Patterns with equal hidden state belong to the same class.

Task
- To design a classifier (decision rule) q: X → Y which decides about a hidden state based on an observation.

[Figure: a pattern, e.g. a fish image, from which the feature vector is measured]

Text Classification

• Represent text as a vector of word counts.
• Stem words, so that “computer”, “computes”, etc. are all counted under “compute.”
• Each count in the vector is divided by the number of documents that word appears in: “Inverse Document Frequency.”
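A minimal sketch of that weighting, assuming raw word counts and the simple “divide by document frequency” form described above; real systems typically use a log-scaled IDF and apply stemming first.

```python
from collections import Counter

docs = [
    "the computer computes fast",
    "the dog runs fast",
    "the computer plays chess",
]

# Term counts per document (stemming omitted for brevity).
counts = [Counter(doc.split()) for doc in docs]

# Document frequency: in how many documents does each word appear?
df = Counter()
for c in counts:
    df.update(c.keys())

# Divide each count by the document frequency, so common words are downweighted.
vectors = [{word: n / df[word] for word, n in c.items()} for c in counts]
print(vectors[0])  # "the" gets weight 1/3; "computes" keeps weight 1
```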

Let's go back to agriculture: Grasshoppers vs. Katydids!

Given a collection of annotated data (in this case five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is.

Katydid or Grasshopper?

[Figure: measurable insect features: Thorax Length, Abdomen Length, Antennae Length, Mandible Size, Spiracle Diameter, Leg Length]

For any domain of interest, we can measure features, e.g. Color {Green, Brown, Gray, Other} or Has Wings?

We can store features in a database: My_Collection.

Insect ID | Abdomen Length | Antennae Length | Insect Class
1 | 2.7 | 5.5 | Grasshopper
2 | 8.0 | 9.1 | Katydid
3 | 0.9 | 4.7 | Grasshopper
4 | 1.1 | 3.1 | Grasshopper
5 | 5.4 | 8.5 | Katydid
6 | 2.9 | 1.9 | Grasshopper
7 | 6.1 | 6.6 | Katydid
8 | 0.5 | 1.0 | Grasshopper
9 | 8.3 | 6.6 | Katydid
10 | 8.1 | 4.7 | Katydid
11 | 5.1 | 7.0 | ???????

The classification problem can now be expressed as:

• Given a training database (My_Collection), predict the class label of a previously unseen instance

previously unseen instance =

[Scatter plot: Antennae Length (y-axis, 1-10) vs. Abdomen Length (x-axis, 1-10), showing the Grasshoppers and Katydids from My_Collection]

Katydid or Grasshopper?

[The same scatter plot, repeated across two slides: Antennae Length vs. Abdomen Length for Grasshoppers and Katydids]

We will also use this larger dataset as a motivating example…

Each of these data objects is called an…
• exemplar
• (training) example
• instance
• tuple

[Scatter plot, repeated: Antennae Length vs. Abdomen Length for Grasshoppers and Katydids, now with an unlabeled instance marked “????”]

Nearest Neighbor Classifier

If the nearest instance to the previously unseen instance is a Katydid,
    then class is Katydid;
    else class is Grasshopper.

[Scatter plot: Antennae Length vs. Abdomen Length for Katydids and Grasshoppers, with the unseen instance joined to its nearest neighbor]
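Here is a minimal sketch of that rule in Python, using the My_Collection data above and Euclidean distance (the choice of distance measure is ours; the slides return to this point later).

```python
import math

# (abdomen length, antennae length, class) from My_Collection above.
training = [
    (2.7, 5.5, "Grasshopper"), (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"), (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),     (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),     (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),     (8.1, 4.7, "Katydid"),
]

def nearest_neighbor(abdomen, antennae):
    # Label the query with the class of the single closest training point.
    def dist(row):
        return math.hypot(row[0] - abdomen, row[1] - antennae)
    return min(training, key=dist)[2]

print(nearest_neighbor(5.1, 7.0))  # instance 11 from the table -> Katydid
```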

Handwritten Digits Example

Database of 20,000 images of handwritten digits, each labeled by a human (Supervised Learning)

[28 x 28 greyscale; pixel values 0-255; labels 0-9]

Use these to learn a classifier which will label digit-images automatically…

[Figures: query images ("Image to label") shown next to their nearest neighbors in the training set]

Overall: error rate = 6% (on the test set)

[Scatter plot, repeated: Antennae Length vs. Abdomen Length for Grasshoppers and Katydids]

Classifying Insects

Each of these data objects is called an exemplar, a (training) example, an instance, or a tuple.

[Scatter plot, repeated: Antennae Length vs. Abdomen Length for Grasshoppers and Katydids, with the unlabeled instance “????”]

What else do we want?

• K-NN (K-Nearest Neighbors) is great!

• What is one obvious way we can improve our grasp on the classification problem?

Let's try to study the classification problem with some examples.

I am going to show you some classification problems which were shown to pigeons!

Let us see if you are as smart as a pigeon!

Pigeon Problem 1

Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3)

What class is this object: (8, 1.5)? What about this one: (4.5, 7), A or B?

Pigeon Problem 1 (same examples): (8, 1.5) is a B!

Here is the rule: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

Pigeon Problem 2

Examples of class A: (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3)

So this one, (7, 7), is an A. The rule is as follows: if the two bars are equal sizes, it is an A; otherwise it is a B.

Pigeon Problem 3

Examples of class A: (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7)

This one is really hard! What is (6, 6), A or B?

It is a B! The rule is as follows: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.
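The rules for the three problems are simple predicates, easy to state as code; a minimal sketch, with bar sizes as numbers:

```python
# The three pigeon-problem rules, written as predicates on (left, right) bar sizes.

def problem1(left, right):
    return "A" if left < right else "B"

def problem2(left, right):
    return "A" if left == right else "B"

def problem3(left, right):
    return "A" if (left + right) ** 2 <= 100 else "B"

print(problem1(8, 1.5))  # B
print(problem2(7, 7))    # A
print(problem3(6, 6))    # B, since (6 + 6)^2 = 144 > 100
```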

Why did we spend so much time with this game?

Because we wanted to show that almost all classification problems have a geometric interpretation; check out the next 4 slides…

Pigeon Problem 1, plotted.

Here is the rule again: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

[Scatter plot: Left Bar (y-axis, 1-10) vs. Right Bar (x-axis, 1-10); the two classes are separated by the diagonal line left = right]

Pigeon Problem 2, plotted.

Let me look it up… here it is: the rule is, if the two bars are equal sizes, it is an A; otherwise it is a B.

[Scatter plot: Left Bar vs. Right Bar, 1-10 on each axis; class A lies exactly on the line left = right]

Pigeon Problem 3, plotted.

The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.

[Scatter plot: Left Bar vs. Right Bar; the rule's boundary is the line left + right = 10]

Pigeon Problem 4

Examples of class A: (2, 2), (1, 7), (7, 3), (3, 8)
Examples of class B: (8, 6), (7, 6), (7, 5)

The rule again: if both squares are bigger than 6, it is a B; otherwise it is an A.

[Scatter plot: Left Bar (y) vs. Right Bar (x)]


Which of the “Pigeon Problems” can be solved by the Simple Linear Classifier?

1) Perfect
2) Useless
3) Perfect
4) Not so good


Nearest neighbor: pros and cons

Pros
• Simple.
• No assumptions about the distribution or shape of the different classes.
• Excellent performance on a wide range of tasks.
• Effective with a large training set.

Cons
• Time consuming: with n training points in R^d, the time to label a new point is O(nd).
• No insight into the domain.
• We would prefer a compact classifier.
• No good way to determine the parameter "k."
• Highly dependent on the distance measure used.

Some Variants

• K-Nearest Neighbors: pick the K nearest neighbors and take the majority vote.

• Parzen Window: pick an area around the point and look at the majority of points in that window.

• Many other variants. Nearest neighbor search is elementary but deserves proper attention: the best accuracy on the digits data is achieved by a variant of nearest neighbor.
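A minimal sketch of the K-nearest-neighbors variant, reusing the insect data from the earlier sketch; K = 3 is an illustrative choice.

```python
import math
from collections import Counter

# (abdomen length, antennae length, class): same toy data as the 1-NN sketch.
training = [
    (2.7, 5.5, "Grasshopper"), (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"), (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),     (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),     (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),     (8.1, 4.7, "Katydid"),
]

def knn(abdomen, antennae, k=3):
    # Sort training points by distance and let the k closest vote.
    by_distance = sorted(training,
                         key=lambda r: math.hypot(r[0] - abdomen, r[1] - antennae))
    votes = Counter(r[2] for r in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn(5.1, 7.0, k=3))  # Katydid (2 of the 3 nearest neighbors are Katydids)
```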

Distance Measures

[Scatter plot of unlabeled points.] How many clusters does this data have? Which two points are neighbors? The answers depend on the distance measure used.
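To see why, here is a small sketch in which the nearest neighbor of a query point changes depending on whether Euclidean or Manhattan distance is used; the points are made up for illustration.

```python
import math

# Two candidate neighbors for the query point (0, 0), chosen for illustration.
points = {"a": (2, 2), "b": (0, 3)}
query = (0, 0)

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan)]:
    nearest = min(points, key=lambda k: dist(points[k], query))
    print(name, "->", nearest)   # Euclidean -> a (2.83 < 3), Manhattan -> b (3 < 4)
```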

A Famous Problem: R. A. Fisher's Iris Dataset

• 3 classes

• 50 of each class

The task is to classify Iris plants into one of 3 varieties using the Petal Length and Petal Width.

Iris Setosa Iris Versicolor Iris Virginica

[Scatter plot: Petal Length vs. Petal Width for the three varieties: Setosa, Versicolor, Virginica]
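A minimal sketch of this task with scikit-learn, assuming it is installed; we classify on petal length and petal width only, and k = 3 is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, 2:4]    # petal length and petal width only
y = iris.target          # 0 = Setosa, 1 = Versicolor, 2 = Virginica

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out irises
```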