An Information Age: Math and Technology of Data

NCCTM 2018

Introductions

● Sarah Ritchey: PhD Candidate at Duke in Applied Mathematics● Blain Patterson: PhD Candidate at NC State in Mathematics Education● Together developed and taught a course on machine learning for Duke TIP.

Amazon

● How does amazon recommend

items?

● All purchases are tracked.

● This is a massive amount of data.

● Computers are needed to

recognize patterns in data.

● Netflix uses algorithms to recommend movies by leveraging patterns in data.

● How would you do it?

Critic Star Wars Raiders of the Lost Arc

Casablanca Singin’ in the Rain

Sarah **** **** * **

Blain ***** **** ** *

Sophie ** ** **** ***

DJ ** * *** ****

Joe ***** ? ? **

Preference Space

Star Wars

5 ● Blain

4 ● Sarah

2 ● DJ ● Sophie

1 2 3 4 5 Raiders

Features

A feature is a data attribute used to make comparisons of objects. Features can be either quantitative (size, weight, etc.) or qualitative (color, shape, etc.).

Movie Features

Feature Star Wars Raiders of the Lost Arc

Movie Features

Feature Star Wars Raiders of the Lost Arc

Action (1-5) 5 4 2 1

Romance (1-5) 1 2 4 3

Length (min) 121 115 102 103

Harrison Ford Y Y N N

Year 1977 1981 1942 1952

Feature Space

Action

5 ● Star Wars

4 ● Raiders

2 ● Casablanca

1 ● Singin’

1 2 3 4 5 Romance

Machine Learning

Machine learning is a subfield of computer science. The objective is to teach a computer to solve problems without explicitly programing it to do so.

Problem Solving

Prerequisites

● Coordinate Geometry● Vectors● Descriptive Statistics● Machine Learning

Terminology

Software

● Google Sheets● Desmos● Orange

Machine Learning 101

Classification vs. Clustering

● Classification ○ Data that include attributes of some object and a categorical label are

provided.○ The goal is to place objects it into the correct category.

● Clustering ○ Data the include attributes of some object are provided. ○ The goal is to group the objects together that are similar.

Pitch PredictionData on every pitch in the MLB is collected.● Speed● Break● Location● Count● Outs● Score● Runners● Pitcher● Batter

Use this information teaching a computer to classify the pitch.● Fastball● Curveball● Knuckleball ● Change Up● Slider

We can then predict what type of pitch will be thrown next in real time.

Social Network Clustering

Popular Machine Learning Algorithms

● K-Nearest Neighbors

● Classification Trees

● K-Means

● Linear Regression

● Logistic Regression

● Gaussian Naive Bayes

● Support Vector Machines

Popular Machine Learning Algorithms

● K-Nearest Neighbors

● Classification Trees

● K-Means

● Linear Regression

● Logistic Regression

● Gaussian Naive Bayes

● Support Vector Machines

K-Nearest Neighbor

Representing Data as a Vector

From our movie example, we have

SW = (5, 1, 121, 1, 1977)

RLA = (4, 2, 115, 1, 1981)

C = (2, 4, 102, 0, 1924)

SR = (1, 3, 103, 0, 1952)

K-Nearest Neighbors

Suppose k = 3. For each point, v, in the validation data

1. Find the 3 points (t1, t2, t3) from the training data that are the closest to v.2. Determine the most common label of the 3 points t1, t2, t3.

3. Label v with the most common label.

Football or Basketball?

Measuring Model Performance

Testing Supervised Learning Algorithms

Use known data to develop and test the algorithm.

● ~80% training data is used to train our algorithm. ● ~20% validation data is used to test our algorithm.

Accuracy vs. Precision

K-Means

Clustering

The scatterplot shown to the right contains four groups. Sort these data into the four groups.

Clustering

● How did you do it?

● What if we had more data?

● What if we had more than two features?

K-Means Algorithm

1. Choose k random starting points called centroids. 2. For each point, calculate the distance to each centroid. 3. Label each point according to the nearest centroid. 4. Recalculate the mean of the coordinates of points with the

same label. 5. Move the centroid to the mean coordinates calculated

above. 6. Repeat steps 1-5 until the centroid no longer moves.

K-Means Example

Additional Topics

Classification Trees

Regression

Overview of Orange

Standards

Questions?

Feel free to contact us at

ser39@duke.edu

724.977.3068

bapatte3@ncsu.edu

814.873.0113

Thank You

An Information Age: Math and Technology of Data

Documents

Transcript of An Information Age: Math and Technology of Data

Technology + Singapore Math = Number Sense

Integrating technology in math

Technology - Age of sharing

Norwood Summer Math Elementary Math & Technology Deptartments.

Math Differentiation Through Technology

Modern Automotive Technology Chapter 6 - … Automotive Technology Technology Chapter 6 Automotive Measurement and Math. Chapter 6 Chapter 6 Automotive Measurement and Math …

Basic Math Area Formulas Math for Water Technology MTH 082 Lecture 2 Chapter 9 Math for Water Technology MTH 082 Lecture 2 Chapter 9.

Science, Technology, Engineering, and (mostly) Math

Math: Content Literacy and Technology

The Science, Technology, Engineering and Math of Golf€¦ · The Science, Technology, Engineering and Math of Golf. STEM stands for Science, Technology, Engineering and Math. The

Technology Enhanced Math Rehab

STEM (Science, Technology, Engineering, & Math) Mentors

W200 Math And Technology 2

Science, Computer Technology, Engineering & Math

Assistive Technology And Math

WST 091 Math Sludge Age Hydraulic Loading

Math + Technology = Student Success

Merging Math and Technology

Stone age technology

AGE AND TECHNOLOGY REPORT