An Information Age: Math and Technology of Data

Transcript of An Information Age: Math and Technology of Data

Page 1: An Information Age: Math and Technology of Data

An Information Age: Math and Technology of Data

NCCTM 2018

Page 2: An Information Age: Math and Technology of Data

Introductions

● Sarah Ritchey: PhD Candidate at Duke in Applied Mathematics

● Blain Patterson: PhD Candidate at NC State in Mathematics Education

● Together developed and taught a course on machine learning for Duke TIP.

Page 3: An Information Age: Math and Technology of Data

Amazon

● How does Amazon recommend items?

● All purchases are tracked.

● This is a massive amount of data.

● Computers are needed to recognize patterns in data.

Page 4: An Information Age: Math and Technology of Data

● Netflix uses algorithms to recommend movies by leveraging patterns in data.

● How would you do it?

Critic   Star Wars   Raiders of the Lost Ark   Casablanca   Singin’ in the Rain
Sarah    ****        ****                      *            **
Blain    *****       ****                      **           *
Sophie   **          **                        ****         ***
DJ       **          *                         ***          ****
Joe      *****       ?                         ?            **

Page 5: An Information Age: Math and Technology of Data

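Preference Space

[Figure: scatterplot of the critics in preference space. Horizontal axis: Raiders rating (1-5). Vertical axis: Star Wars rating (1-5). Points: Blain at Star Wars 5, Sarah at Star Wars 4, DJ and Sophie at Star Wars 2.]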
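
Treating each critic as a point in preference space suggests one possible answer to "How would you do it?" for Joe's missing ratings: find the critics closest to Joe on the movies he has rated, then average their ratings for the movies he has not. The sketch below is only an illustration of that idea, not the method Netflix uses. The ratings come from the critic table; the function name `predict_missing` and the choice of the 2 closest critics are assumptions made for this example.

```python
from math import dist

# Ratings (1-5 stars) copied from the critic table; None marks a missing rating.
MOVIES = ["Star Wars", "Raiders of the Lost Ark", "Casablanca", "Singin' in the Rain"]
ratings = {
    "Sarah":  [4, 4, 1, 2],
    "Blain":  [5, 4, 2, 1],
    "Sophie": [2, 2, 4, 3],
    "DJ":     [2, 1, 3, 4],
}
joe = [5, None, None, 2]


def predict_missing(target, others, k=2):
    """Fill the target's missing ratings with the mean rating of the k closest critics.

    Hypothetical helper: closeness is Euclidean distance over the movies
    the target has already rated.
    """
    seen = [i for i, r in enumerate(target) if r is not None]

    def distance(name):
        return dist([target[i] for i in seen], [others[name][i] for i in seen])

    nearest = sorted(others, key=distance)[:k]
    predictions = {
        MOVIES[i]: sum(others[name][i] for name in nearest) / k
        for i, r in enumerate(target)
        if r is None
    }
    return nearest, predictions


nearest, predictions = predict_missing(joe, ratings)
print(nearest)      # ['Sarah', 'Blain']
print(predictions)  # {'Raiders of the Lost Ark': 4.0, 'Casablanca': 1.5}
```

Sarah and Blain are each one star away from Joe on the two movies he rated, so under these assumptions Joe would probably enjoy Raiders of the Lost Ark more than Casablanca.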

Page 6: An Information Age: Math and Technology of Data

Features

A feature is a data attribute used to make comparisons of objects. Features can be either quantitative (size, weight, etc.) or qualitative (color, shape, etc.).

Page 7: An Information Age: Math and Technology of Data

Movie Features

Feature   Star Wars   Raiders of the Lost Ark   Casablanca   Singin’ in the Rain

Page 8: An Information Age: Math and Technology of Data

Movie Features

Feature         Star Wars   Raiders of the Lost Ark   Casablanca   Singin’ in the Rain
Action (1-5)    5           4                         2            1
Romance (1-5)   1           2                         4            3
Length (min)    121         115                       102          103
Harrison Ford   Y           Y                         N            N
Year            1977        1981                      1942         1952

Page 9: An Information Age: Math and Technology of Data

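Feature Space

[Figure: scatterplot of the movies in feature space. Horizontal axis: Romance (1-5). Vertical axis: Action (1-5). Points: Star Wars at Action 5, Raiders at Action 4, Casablanca at Action 2, Singin’ at Action 1.]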

Page 10: An Information Age: Math and Technology of Data

Machine Learning

Machine learning is a subfield of computer science. The objective is to teach a computer to solve problems without explicitly programming it to do so.

Page 11: An Information Age: Math and Technology of Data

Problem Solving

Page 12: An Information Age: Math and Technology of Data

Prerequisites

● Coordinate Geometry

● Vectors

● Descriptive Statistics

● Machine Learning Terminology

Page 13: An Information Age: Math and Technology of Data

Software

● Google Sheets

● Desmos

● Orange

Page 14: An Information Age: Math and Technology of Data

Machine Learning 101

Page 15: An Information Age: Math and Technology of Data

Classification vs. Clustering

● Classification
    ○ Data that include attributes of some object and a categorical label are provided.
    ○ The goal is to place each object into the correct category.

● Clustering
    ○ Data that include attributes of some object are provided.
    ○ The goal is to group similar objects together.

Page 16: An Information Age: Math and Technology of Data

Pitch Prediction

Data on every pitch in the MLB is collected.

● Speed
● Break
● Location
● Count
● Outs
● Score
● Runners
● Pitcher
● Batter

Use this information to teach a computer to classify the pitch.

● Fastball
● Curveball
● Knuckleball
● Change Up
● Slider

We can then predict what type of pitch will be thrown next in real time.

Page 17: An Information Age: Math and Technology of Data

Social Network Clustering

Page 18: An Information Age: Math and Technology of Data

Popular Machine Learning Algorithms

● K-Nearest Neighbors

● Classification Trees

● K-Means

● Linear Regression

● Logistic Regression

● Gaussian Naive Bayes

● Support Vector Machines

Page 20: An Information Age: Math and Technology of Data

K-Nearest Neighbors

Page 21: An Information Age: Math and Technology of Data

Representing Data as a Vector

From our movie example, we have

SW = (5, 1, 121, 1, 1977)

RLA = (4, 2, 115, 1, 1981)

C = (2, 4, 102, 0, 1942)

SR = (1, 3, 103, 0, 1952)
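
As a rough illustration of why the vector form is useful, the sketch below builds these vectors from the rows of the movie feature table (encoding the Harrison Ford column as Y = 1, N = 0) and measures the Euclidean distance from Star Wars to each other movie. The helper name `to_vector` is made up for this example.

```python
from math import dist

# Rows of the movie feature table:
# (action, romance, length in minutes, Harrison Ford, year).
table = {
    "Star Wars":               (5, 1, 121, "Y", 1977),
    "Raiders of the Lost Ark": (4, 2, 115, "Y", 1981),
    "Casablanca":              (2, 4, 102, "N", 1942),
    "Singin' in the Rain":     (1, 3, 103, "N", 1952),
}


def to_vector(row):
    """Encode a table row as numbers: Y/N becomes 1/0, other entries are kept."""
    return tuple(1 if x == "Y" else 0 if x == "N" else x for x in row)


vectors = {title: to_vector(row) for title, row in table.items()}

# Distance from Star Wars to every other movie; smaller means more similar.
sw = vectors["Star Wars"]
for title, v in vectors.items():
    if title != "Star Wars":
        print(f"{title}: {dist(sw, v):.2f}")
```

Raiders of the Lost Ark comes out closest to Star Wars. Note that with raw values, the length and year features contribute far more to the distance than the 1-5 ratings, simply because their numbers are larger.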

Page 22: An Information Age: Math and Technology of Data

K-Nearest Neighbors

Suppose k = 3. For each point, v, in the validation data:

1. Find the 3 points (t1, t2, t3) from the training data that are the closest to v.

2. Determine the most common label of the 3 points t1, t2, t3.

3. Label v with the most common label.
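
The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function name `knn_label` is made up, distance is assumed to be Euclidean, and the height and weight values for the athletes are invented purely for demonstration.

```python
from collections import Counter
from math import dist


def knn_label(v, training, k=3):
    """Label point v with the most common label among its k nearest training points."""
    # Step 1: find the k training points closest to v.
    nearest = sorted(training, key=lambda item: dist(v, item[0]))[:k]
    # Step 2: determine the most common label among them.
    labels = [label for _, label in nearest]
    # Step 3: that label becomes the label of v.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical training data: (height in inches, weight in pounds) -> sport.
training = [
    ((78, 215), "basketball"),
    ((80, 225), "basketball"),
    ((81, 240), "basketball"),
    ((72, 250), "football"),
    ((74, 300), "football"),
    ((70, 205), "football"),
]

# Hypothetical validation points to label.
validation = [(79, 220), (73, 280)]
for v in validation:
    print(v, knn_label(v, training))
```

With this made-up data, the first validation point is labeled basketball and the second football. Choosing an odd k for a two-label problem avoids tied votes.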

Page 23: An Information Age: Math and Technology of Data

Football or Basketball?

Page 24: An Information Age: Math and Technology of Data

Football or Basketball?

Page 25: An Information Age: Math and Technology of Data

Measuring Model Performance

Page 26: An Information Age: Math and Technology of Data

Testing Supervised Learning Algorithms

Use known data to develop and test the algorithm.

● ~80% of the data is used as training data to train our algorithm.

● ~20% of the data is used as validation data to test our algorithm.
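
One way such a split might look in code is sketched below. The function names, the fixed random seed, and the placeholder records are assumptions for illustration; the accuracy function simply reports the fraction of validation points whose predicted label matches the known label.

```python
import random


def train_validation_split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and split it into training and validation lists."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of validation points whose predicted label matches the known label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


data = list(range(50))  # placeholder records standing in for labeled data
training, validation = train_validation_split(data)
print(len(training), len(validation))  # 40 10
```

Shuffling before splitting matters when the records are stored in some order, such as all of one label first.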

Page 27: An Information Age: Math and Technology of Data

Accuracy vs. Precision

Page 28: An Information Age: Math and Technology of Data

K-Means

Page 29: An Information Age: Math and Technology of Data

Clustering

The scatterplot shown to the right contains four groups. Sort these data into the four groups.

Page 30: An Information Age: Math and Technology of Data

Clustering

● How did you do it?

● What if we had more data?

● What if we had more than two features?

Page 31: An Information Age: Math and Technology of Data

K-Means Algorithm

1. Choose k random starting points called centroids.

2. For each point, calculate the distance to each centroid.

3. Label each point according to the nearest centroid.

4. Calculate the mean of the coordinates of the points with the same label.

5. Move each centroid to the mean coordinates calculated above.

6. Repeat steps 2-5 until the centroids no longer move.
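
The algorithm can be sketched in plain Python as follows. This is a minimal version under a few assumptions that the slide does not specify: the starting centroids are drawn from the data points themselves, distance is Euclidean, a centroid with no assigned points stays where it is, and the loop is capped at `max_iterations`. The random starting centroids are chosen once; only the labeling and centroid update are repeated.

```python
import random
from math import dist


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    # Choose k starting centroids (assumption: sampled from the data points).
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iterations):
        # Label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of the points carrying its label.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave in place
        # Stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can give different clusterings on harder data; running the algorithm several times and comparing results is a common precaution.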

Page 32: An Information Age: Math and Technology of Data

K-Means Example

Page 33: An Information Age: Math and Technology of Data

Additional Topics

Page 34: An Information Age: Math and Technology of Data

Classification Trees

Page 35: An Information Age: Math and Technology of Data

Regression

Page 36: An Information Age: Math and Technology of Data

Overview of Orange

Page 38: An Information Age: Math and Technology of Data

Questions?

Feel free to contact us at

[email protected]

724.977.3068

[email protected]

814.873.0113

Page 39: An Information Age: Math and Technology of Data

Thank You