From Machine Learning to Data Science By Jiqiong Qiu


From Machine Learning to Data Science

How to become a data scientist

Jiqiong QIU

Part 1: Machine Learning

Outline

• What is machine learning?

• Why use machine learning?

• When to use machine learning?

• How to learn machine learning?

• Machine Learning and Other Fields

What is Machine Learning?

Machine Learning is concerned with computer programs that automatically improve their performance through experience (data).

Recommendation System

Yale Face Database

What is Machine Learning?

Data → ML → Improved performance measure

Why use Machine Learning?

Face Recognition

• ‘define’ a face and hand-program the rules: difficult

• learn from data (observations) and recognize: easy stuff, even for a child

• an ‘ML-based face recognition system’ can be easier to build than a hand-programmed system

When to use Machine Learning?

1. there exists some ‘underlying pattern’ to be learned

  • so the ‘performance measure’ can be improved

2. but there is no programmable (easy) definition

  • so ‘ML’ is needed

3. somehow there is data about the pattern

  • so ML has some ‘inputs’ to learn from

key essence: these three conditions help decide whether to use ML

Which of the following is best suited for machine learning?

• predicting whether the next cry of the baby happens at an even-numbered minute or not

• determining whether a given graph contains a cycle

• deciding whether to approve credit card to some customer

• guessing whether the earth will be destroyed by the misuse of nuclear power in the next ten years
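The graph question in the list above has an exact algorithmic definition, so it can simply be programmed rather than learned. A minimal sketch, assuming a simple undirected graph stored as a dict that maps each node to a list of its neighbours (the name `has_cycle` and this input format are illustrative choices, not from the slides):

```python
def has_cycle(graph):
    """Return True if a simple undirected graph contains a cycle.

    Assumption: graph is a dict {node: [neighbours]} with every edge
    listed in both directions and no duplicate edges.
    """
    visited = set()
    for start in graph:
        if start in visited:
            continue
        # Iterative depth-first search; each entry is (node, parent we came from).
        stack = [(start, None)]
        visited.add(start)
        while stack:
            node, parent = stack.pop()
            for neighbour in graph[node]:
                if neighbour == parent:
                    continue  # the edge we just walked along, not a cycle
                if neighbour in visited:
                    return True  # reached a known node by a second route
                visited.add(neighbour)
                stack.append((neighbour, node))
    return False


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(has_cycle(triangle), has_cycle(chain))
```

No data and no performance measure are involved here: the rule is fully known in advance, which is why this kind of task fails the "no programmable definition" test.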

Fun but hard

Machine Learning: a mixture of theoretical and practical tools

theory oriented

• derive everything deeply for solid understanding

• less interesting to general audience

techniques oriented

• flash over the sexiest techniques broadly for shiny coverage

• too many techniques, hard to choose, hard to use properly

How to learn machine learning?

• MOOCs:

  • Coursera: Andrew Ng, Stanford Machine Learning

  • Caltech course / Andrew Ng Machine Learning (YouTube version)

  • edX: 15.071x The Analytics Edge

  • http://videolectures.net/

  • etc.

• Play time:

  • TopCoder / LeetCode / HackerRank

  • Kaggle / datascience.net

Machine Learning and Other Fields

• Statistics: quantifies numbers

• Data Mining: explains patterns

• Machine Learning: predicts with models

• Artificial Intelligence: behaves and reasons

Part 2: Data Science

Wait !!!

In real life

Problem is not well posed

Data is not perfect

Pre-processing

Data Visualisation

It’s not a Kaggle competition

Data Visualisation

Preprocessing

Make the right choice

Define the right metrics!

• MAPE: Mean Absolute Percentage Error

• MAE: Mean Absolute Error

• RMSE: Root Mean Square Error
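The three metrics named above can be sketched in a few lines of plain Python. The function names and the sample numbers are illustrative assumptions, not taken from the slides; MAPE is reported here in percent and is undefined whenever an actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: like MAE, but large errors weigh more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

For these made-up numbers MAE is 10.0, RMSE is about 12.91, and MAPE is about 6.67%. The same predictions rank differently depending on the metric: RMSE is pulled up by the single 20-unit miss, while MAPE treats the 10-unit and 20-unit misses as equally bad because each is 10% of its actual value.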

Make the right choice

Good: low training error and a simple classifier

Bad: high training error, or a classifier that is too complex

No free lunch!
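The trade-off between training error and classifier complexity can be illustrated on a tiny example. This is only a sketch under loud assumptions: the data is hand-made so that one training label is noise, the "simple" model is a single threshold, the "complex" model is a 1-nearest-neighbour rule that memorises the training set, and checking on held-out points is an addition for illustration rather than something the slide describes.

```python
def error_rate(predict, data):
    """Fraction of (x, label) pairs that predict(x) gets wrong."""
    return sum(predict(x) != y for x, y in data) / len(data)


def fit_threshold(train):
    """Simple model: predict 1 when x >= t, with t minimising training error."""
    candidates = sorted(x for x, _ in train)
    best_t = min(candidates, key=lambda t: error_rate(lambda x: int(x >= t), train))
    return lambda x: int(x >= best_t)


def fit_nearest_neighbour(train):
    """Complex model: memorise the training set, copy the closest point's label."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]


# Hand-made toy data (an assumption): the label at x = 3.0 is noise.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
held_out = [(1.4, 0), (2.9, 0), (3.2, 0), (4.4, 0), (6.4, 1), (8.4, 1)]

simple = fit_threshold(train)
complex_model = fit_nearest_neighbour(train)

for name, model in [("threshold", simple), ("1-nearest-neighbour", complex_model)]:
    print(name, "train:", error_rate(model, train), "held-out:", error_rate(model, held_out))
```

On this toy data the nearest-neighbour model reaches zero training error but misclassifies two of the six held-out points, because it has memorised the noisy label. The threshold model gets one training point wrong and none of the held-out ones. A different data set could favour the other model, which is the point of "no free lunch".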

Make the right choice

Benchmark is always helpful
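One way to read "benchmark" is as a naive baseline that any model should beat; that reading is an interpretation, not something the slide spells out. A minimal sketch, assuming a regression task, a baseline that always predicts the training mean, and hypothetical names (`beats_baseline`, `mean_absolute_error`) and numbers:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def beats_baseline(train_targets, test_targets, model_predictions):
    """Compare a model's MAE with a baseline that always predicts the training mean.

    Returns (model_is_better, model_mae, baseline_mae).
    """
    baseline_value = sum(train_targets) / len(train_targets)
    baseline_predictions = [baseline_value] * len(test_targets)
    baseline_mae = mean_absolute_error(test_targets, baseline_predictions)
    model_mae = mean_absolute_error(test_targets, model_predictions)
    return model_mae < baseline_mae, model_mae, baseline_mae


# Made-up numbers, for illustration only.
better, model_mae, baseline_mae = beats_baseline(
    train_targets=[10.0, 20.0, 30.0],
    test_targets=[10.0, 30.0],
    model_predictions=[12.0, 27.0],
)
print(better, model_mae, baseline_mae)
```

An error figure on its own says little; set against the baseline it shows how much of the result is due to the model rather than to the data being easy.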

Is this the end?

Google Flu Trends

Nobel Prize and Chocolate

Demo Time
