PyData SF 2016 --- Moving forward through the darkness

Moving Forward Through The Darkness the blindness of modeling and how to break through Chia-Chi@PyData SF 2016

Transcript of PyData SF 2016 --- Moving forward through the darkness

Page 1: PyData SF 2016 --- Moving forward through the darkness

Moving Forward Through The Darkness

the blindness of modeling and how to break through

Chia-Chi@PyData SF 2016


About Chia-Chi (George)
● Organizer of Taiwan R User Group and MLDM Monday
● 7 years of experience in quantitative trading in futures & options markets
● 5 years of consulting experience in machine learning & data mining
● 4 years of experience in e-commerce (as a consultant & on SaaS teams)
● 4 years of experience building recommendation and search engines
● Volunteer at PyCon APAC 2014 (program officer)
● Volunteer at PyCon APAC 2015 (program officer)
● I love Python and hope I can write Python every day!


Training models from data

is just like sketching pictures of the world


Jackson Pollock: The painting has a life of its own.

I try to let it come through.


As a data scientist … The data has a life of its own. I just try to let it come through.


The first step … is not picking up your pen!

It is choosing an angle and making some observations!


Now, the object you want to sketch ... is your data!


Try to sketch it: y = a_0 + a_1 x
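A minimal sketch of "sketching" the line y = a_0 + a_1 x with ordinary least squares, which picks a_0 and a_1 by minimizing the squared vertical distances from the points to the line. The function name `fit_line` and the sample points are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a_0 + a_1 * x (minimizes squared vertical distances)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a_1 = sxy / sxx               # slope
    a_0 = mean_y - a_1 * mean_x   # intercept: the line passes through the mean point
    return a_0, a_1

# Hypothetical noisy points scattered around a straight line.
a_0, a_1 = fit_line([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
```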


Which line is the most similar one?

It depends on your observation angle!


What does the angle mean ... in a machine learning problem?


Angle of Linear Regression


After choosing an angle ... you have chosen the question & the evaluator ...

(as a data scientist)


How to change the angle?

In the Linear Regression Problem
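One way to see "changing the angle" in linear regression, offered as an illustration rather than a reproduction of the talk's slides: the same points give different lines depending on which distances the evaluator measures. Regressing y on x minimizes vertical distances, regressing x on y minimizes horizontal distances, and total least squares minimizes perpendicular distances. `line_slopes` and the data are hypothetical.

```python
import math

def line_slopes(xs, ys):
    """Slopes (in y = a_0 + a_1 x form) of three least-squares lines, one per 'angle'."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Regress y on x: minimize vertical distances.
    vertical = sxy / sxx
    # Regress x on y (slope sxy / syy), then rewrite as y = ... : minimize horizontal distances.
    horizontal = syy / sxy
    # Total least squares: minimize perpendicular distances.
    perpendicular = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return vertical, perpendicular, horizontal

vertical, perpendicular, horizontal = line_slopes([0, 1, 2, 3], [0, 2, 1, 3])
```

All three lines pass through the mean point (1.5, 1.5); only the slope changes (0.8, 1.0, and 1.25 here), because each evaluator looks at the data from a different angle.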


The Metaphor: Data (the object) +

Evaluator (point of view | angle) => Model (picture)


Different Angles ... Different Pictures …

(in sketching)


Different questions ... Different models …

(in data science)


Whatever you observe … whatever you draw!

(both in sketching and data science)


The two keys that help you apply machine learning

in the real world


Can be learned ONLY through real practice

Can be learned from school or practice


Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a Method to Convert Data to Vectors (or Tensors)
● Decompose the Real Problem into Several ML or Math Problems
● Solve Each ML or Math Problem Individually
● Combine the Solutions of All ML or Math Problems
● Check: Does It Truly Solve the Real Problem?


Case Study: How to Build

a Recommendation System for a News Platform


User-Centered Recommendation

News you probably also want to read


[Figure: system architecture diagram. Labels: Platform; Tracking; Users' Behavior; Data Feed; Response; News Data; Results; Machine Learning; Server Group (two); Processing Data; Prediction Data]


Modeling Procedures -- Part 1:
● Problem: how can users reach more of the news they want to read?
● Data:
  ○ News Data (Article)
    ■ Title
    ■ Text
    ■ Time
    ■ Category
  ○ User Behavior Data
    ■ User views post
    ■ User clicks link
    ■ User does not click link


Modeling Procedures -- Part 2:
● Data to Vectors (or Tensors)
  ○ News Data
    ■ Term-document matrix (scikit-learn)
    ■ Word2Vec (gensim word2vec)
  ○ User Behavior Data
    ■ Event sampling (Spark Streaming, Kafka, or TrailDB)
● Construct a user-item matrix (user view | click | not-click events)
● Construct an item-item matrix (view-after-view, click-after-view, ...)
● Construct a user-item-time tensor cube
● Construct a user-item-item-time tensor cube
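A small sketch of the matrix-construction steps, assuming the sampled events arrive as `(user_id, item_id, event)` tuples in time order; the log format, event names, and helper functions are hypothetical. It counts events into a user × item × event tensor (the user-item click matrix is one slice of it) and builds a view-after-view item-item matrix. A production system would use sparse structures; dense NumPy arrays keep the sketch short.

```python
import numpy as np

EVENTS = ["view", "click", "not_click"]  # hypothetical event types

# Hypothetical event log in time order: (user_id, item_id, event).
log = [
    (0, 0, "view"), (0, 0, "click"), (0, 1, "view"), (0, 1, "not_click"),
    (1, 1, "view"), (1, 1, "click"), (1, 2, "view"), (1, 2, "click"),
    (2, 0, "view"), (2, 2, "view"), (2, 2, "click"),
]

def user_item_event_tensor(log, n_users, n_items):
    """Count each event type into a (user, item, event_type) tensor."""
    tensor = np.zeros((n_users, n_items, len(EVENTS)), dtype=int)
    for user, item, event in log:
        tensor[user, item, EVENTS.index(event)] += 1
    return tensor

def view_after_view(log, n_items):
    """counts[i, j] = how often a user viewed item j right after viewing item i."""
    counts = np.zeros((n_items, n_items), dtype=int)
    last_view = {}  # user_id -> most recently viewed item
    for user, item, event in log:
        if event != "view":
            continue
        if user in last_view:
            counts[last_view[user], item] += 1
        last_view[user] = item
    return counts

T = user_item_event_tensor(log, n_users=3, n_items=3)
clicks = T[:, :, EVENTS.index("click")]  # user-item click matrix
vav = view_after_view(log, n_items=3)
```

Adding a time-bucket axis to the same counting loop would give the user-item-time tensor cube.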


Modeling Procedures -- Part 3:
● Translate the Real Problem into ML (or Math) Problems and Solve Them
  ○ Real Problem: how can users reach more of the news they want to read?
  ○ ML (or Math) Problems:
    ■ Hottest & Newest
    ■ Content-Based Relations
    ■ Collaborative Filtering


Newest & Hottest: Sorting
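Both lists come down to a sort. A minimal sketch, assuming each article record carries a publish time and a click count; the field names, the use of clicks as the "hot" signal, and the records are hypothetical.

```python
from datetime import datetime

# Hypothetical article records; "clicks" stands in for whatever popularity signal is tracked.
articles = [
    {"id": "a1", "published": datetime(2016, 8, 13, 9, 0), "clicks": 120},
    {"id": "a2", "published": datetime(2016, 8, 13, 11, 0), "clicks": 40},
    {"id": "a3", "published": datetime(2016, 8, 12, 18, 0), "clicks": 300},
]

def newest(items, k=10):
    """Most recently published first."""
    return sorted(items, key=lambda a: a["published"], reverse=True)[:k]

def hottest(items, k=10):
    """Most clicked first; ties broken by recency."""
    return sorted(items, key=lambda a: (a["clicks"], a["published"]), reverse=True)[:k]
```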


Content-Based Relations: Clustering with

a term-document matrix

or

the results from word2vec
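A minimal sketch of the term-document route, assuming scikit-learn is available: `TfidfVectorizer` builds a TF-IDF-weighted document-term matrix and `KMeans` groups articles whose vocabularies overlap. The titles and `n_clusters=2` are hypothetical; word2vec-based article vectors could be passed to the same `KMeans` call instead.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical news titles: two about markets, two about football.
titles = [
    "stock market rally as shares rise",
    "shares fall as stock market slides",
    "team wins the football match",
    "football team loses match at home",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(titles)  # sparse matrix: one L2-normalized TF-IDF row per article

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)        # cluster id per article
```

Articles in the same cluster can then be recommended as "related" to one another.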


Collaborative Filtering: MF & Matrix Completion
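Matrix completion fills the unobserved cells of the user-item matrix from a low-rank factorization fitted only to the observed cells. The sketch below uses alternating least squares (ALS), one common way to fit matrix factorization; the function name, rank, regularization strength, and toy matrix are assumptions for illustration.

```python
import numpy as np

def als_complete(R, mask, rank=2, reg=0.01, n_iters=50, seed=0):
    """Fit R ~ P @ Q.T on cells where mask is True, then return the full reconstruction."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, rank))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item factors
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Fix Q and solve a small ridge regression for each user's factors.
        for u in range(n_users):
            obs = mask[u]
            Qo = Q[obs]
            P[u] = np.linalg.solve(Qo.T @ Qo + ridge, Qo.T @ R[u, obs])
        # Fix P and solve for each item's factors.
        for i in range(n_items):
            obs = mask[:, i]
            Po = P[obs]
            Q[i] = np.linalg.solve(Po.T @ Po + ridge, Po.T @ R[obs, i])
    return P @ Q.T  # predictions for every cell, including the unobserved ones

# Hypothetical low-rank "ratings" with one hidden cell per row.
R = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 1.0, 2.0])
mask = np.ones_like(R, dtype=bool)
mask[[0, 1, 2, 3], [1, 3, 0, 2]] = False
R_hat = als_complete(R, mask)
rmse_observed = float(np.sqrt(np.mean((R_hat - R)[mask] ** 2)))
```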


Modeling Procedures -- Part 4:
● Combine the Solutions and Check Them against the Real Problem
  ○ Ensemble Learning (for static combination)
  ○ Reinforcement Learning (for dynamic improvement)
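For the dynamic side, one simple reinforcement-learning-style option (an assumption here, not necessarily the setup used in the talk) is a multi-armed bandit that learns which recommendation strategy to show by observing clicks. Below, epsilon-greedy chooses among three strategies; the click rates are simulated and hypothetical.

```python
import random

def epsilon_greedy(click_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Choose one strategy per page view; the reward is 1 if the simulated user clicks."""
    rng = random.Random(seed)
    n_arms = len(click_rates)
    pulls = [0] * n_arms
    value = [0.0] * n_arms  # running mean click-through rate per strategy
    for _ in range(n_rounds):
        untried = [a for a in range(n_arms) if pulls[a] == 0]
        if untried:
            arm = untried[0]                                   # try every strategy once
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: value[a])   # exploit the best estimate
        reward = 1 if rng.random() < click_rates[arm] else 0   # simulated user response
        pulls[arm] += 1
        value[arm] += (reward - value[arm]) / pulls[arm]       # incremental mean update
    return pulls, value

# Hypothetical true click rates for "hottest", "content-based", and "collaborative filtering".
pulls, value = epsilon_greedy([0.02, 0.05, 0.10])
```

Mixing a fixed share of random exploration with greedy exploitation is the simplest scheme; UCB and Thompson sampling are common refinements.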


Recap: Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a Method to Convert Data to Vectors (or Tensors)
● Decompose the Real Problem into Several ML or Math Problems
● Solve Each ML or Math Problem Individually
● Combine the Solutions of All ML or Math Problems
● Check: Does It Truly Solve the Real Problem?


The Blindness in the Modeling Procedures


Blindness of Modeling Procedures:

● Choose a Real Problem
● Collect Related Data
● Choose a Method to Convert Data to Vectors (or Tensors)
● Decompose the Real Problem into Several ML or Math Problems
● Solve Each ML or Math Problem Individually
● Combine the Solutions of All ML or Math Problems
● Check: Does It Truly Solve the Real Problem?


Problem / Data

Problem-Driven: thinking about the data through the problem
Data-Driven: thinking about the problem through the data

The problem behind the problem
The information behind the data
Business insights


The Blindness between Data and Problem

Is there any related information in that data? Can the problem be answered with this data?


Case Study: Bookstore

"Could you use our POS data to find some ways to convert those users who originally disliked us?"


Case Study: Bookstore

"Could you use our POS data to find the potential users?"

"Potential" means users who want to buy it but haven't yet.


Data from POS ONLY has information

about converted users.

There is no information about users who dislike us or have not converted.


Thinking in Two Ways:

Data-Driven

Problem-Driven


Problem??? / POS Data

Problem-Driven: thinking about the data through the problem
Data-Driven: thinking about the problem through the data

The problem behind the problem
The information behind the data
Business insights


Grow the Bookstore's Revenue? / Data???

Problem-Driven: thinking about the data through the problem
Data-Driven: thinking about the problem through the data

The problem behind the problem
The information behind the data
Business insights


Case Study: LBS (Location-Based Service) Food Search

a story about how "delicious" is not delicious!

(this is also a blindness of NLP)


Data Thinking: First-Hand versus Second-Hand

for example, delicious versus "delicious"


A machine can NOT learn by itself.

It is just like a child: it learns from training data!

And sometimes it learns badly!


Blindness of Modeling Procedures:
● Choose a Real Problem
● Collect Related Data

● Choose a Method to Convert Data to Vectors (or Tensors)

● Decompose the Real Problem into Several ML or Math Problems
● Solve Each ML or Math Problem Individually
● Combine the Solutions of All ML or Math Problems
● Check: Does It Truly Solve the Real Problem?


The Blindness from Data to Vectors

Is any information lost when you convert your data?


Blindness of Unigrams
I love it (我愛它) = it loves me (它愛我)
I hate it (我恨它) = it hates me (它恨我)
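The effect is easy to check: as bags of single characters (unigrams), 我愛它 and 它愛我 are identical, while character bigrams keep enough word order to tell them apart. The helper `char_ngrams` is a hypothetical name for this illustration.

```python
from collections import Counter

def char_ngrams(text, n):
    """Bag of character n-grams; word order survives only inside each n-gram."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

a, b = "我愛它", "它愛我"  # "I love it" vs. "it loves me"
same_unigrams = char_ngrams(a, 1) == char_ngrams(b, 1)  # same three characters
same_bigrams = char_ngrams(a, 2) == char_ngrams(b, 2)   # {我愛, 愛它} vs. {它愛, 愛我}
```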


The Blindness from Mathematical Concepts

The gap between math and the real world:

when you put the units back into the formula …


Math in Elementary School … The Secret behind the Minus Operator
● 103 - 100 = 6 - 3?
● 103 dollars - 100 dollars = 6 dollars - 3 dollars?
● 103-dollar stock - 100-dollar stock = 6-dollar stock - 3-dollar stock?

(formula) (units)

● (103 - 100 = 6 - 3) (dollars)
● (103 - 100 = 6 - 3) (dollars of stock)

How do you choose the right coordinate for a stock price?
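The arithmetic makes the point: both moves are 3 dollars, but one is a 3% move and the other doubles the price. A common choice of coordinate for prices is the logarithm, where equal differences mean equal percentage moves. The sketch compares the coordinates; the helper name `price_move` is hypothetical.

```python
import math

def price_move(p0, p1):
    """Describe a move from price p0 to p1 in three coordinates."""
    return {
        "difference": p1 - p0,            # raw dollars: the elementary-school view
        "pct_change": (p1 - p0) / p0,     # relative to the starting price
        "log_return": math.log(p1 / p0),  # difference of log prices
    }

big = price_move(100, 103)  # 100-dollar stock rising to 103
small = price_move(3, 6)    # 3-dollar stock rising to 6
```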


Blindness of Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a Method to Convert Data to Vectors (or Tensors)

● Decompose the Real Problem into Several ML or Math Problems

● Solve Each ML or Math Problem Individually
● Combine the Solutions of All ML or Math Problems
● Check: Does It Truly Solve the Real Problem?


The Blindness from ML Frameworks

Classification & Clustering


What happens when an orange-apple classifier meets a banana?


What happens when a digit classifier meets a letter?

A -> 9
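Why a closed-set classifier can answer "9" for an "A": its softmax output is a probability distribution over the known classes only, so every input, however foreign, is assigned to one of them. The logits below are made up for illustration.

```python
import math

DIGITS = list(range(10))

def softmax(logits):
    """Turn raw scores into probabilities over the known classes only."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a digit classifier might emit for an image of the letter "A".
logits_for_A = [0.2, -1.0, 0.5, 0.1, 0.3, -0.4, 0.0, 0.8, 0.6, 1.9]
probs = softmax(logits_for_A)
predicted = DIGITS[max(range(10), key=lambda k: probs[k])]  # always some digit
```

There is no "none of the above" option: the probabilities always sum to 1 across the ten digits.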


The blindness of clustering methods


You cannot force two points into the same cluster


The fact is … we always get some data with labels,

but some without.

(1) How do we propagate labels? (2) How do we detect new labels with labelers?
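For question (1), one standard technique is graph-based label propagation: build a similarity graph over all points and let the few known labels spread along it. A minimal NumPy sketch, assuming an RBF similarity with a hand-picked `gamma` and toy 2-D points (all hypothetical); scikit-learn's `sklearn.semi_supervised.LabelPropagation` offers a maintained implementation.

```python
import numpy as np

def propagate_labels(X, y, n_classes, gamma=20.0, n_iters=100):
    """Spread known labels (y >= 0) to unlabeled points (y == -1) over an RBF similarity graph."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)                # edge weights: near points are strongly linked
    T = W / W.sum(axis=1, keepdims=True)         # row-normalized transition matrix
    labeled = y >= 0
    idx = np.flatnonzero(labeled)
    Y0 = np.zeros((len(X), n_classes))
    Y0[idx, y[idx]] = 1.0                        # one-hot rows for the known labels
    Y = Y0.copy()
    for _ in range(n_iters):
        Y = T @ Y                                # each point takes its neighbors' label mix
        Y[labeled] = Y0[labeled]                 # clamp the known labels
    return Y.argmax(axis=1)

# Two hypothetical groups of points, with one labeled point in each (-1 = unlabeled).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])
labels = propagate_labels(X, y, n_classes=2)
```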


New data & new labels keep coming all the time

in e-commerce retailers & in news platforms


What we need … (1) A classifier that works like a clustering method

(one-versus-all incremental classifier)

(2) A clustering method that works like a classifier (metric learning)


one-versus-all incremental classifier

[Figure: decision regions labeled "Not Class 1" and "Not Class 2"]
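A hypothetical reading of the idea, sketched under stated assumptions: each label gets its own binary "class k vs. not class k" decision, a point that every decision rejects is reported as "unknown" (a candidate new label for a human labeler), and adding a label creates one new decision without retraining the others. Here each binary decision is a distance-to-centroid threshold, a simple stand-in for any binary classifier with a reject region. The class name, `radius`, and the 2-D "fruit features" are all assumptions.

```python
import math

class OneVsAllIncremental:
    """One accept/reject decision per label; points no label accepts are 'unknown'."""

    def __init__(self, radius=1.0):
        self.radius = radius   # hypothetical acceptance threshold shared by all labels
        self.centroids = {}

    def add_class(self, label, examples):
        # Adding a label only creates its own decision; existing labels are untouched.
        n = len(examples)
        self.centroids[label] = tuple(sum(col) / n for col in zip(*examples))

    def accepts(self, label, x):
        # "class k" if close enough to its centroid, otherwise "not class k".
        return math.dist(x, self.centroids[label]) <= self.radius

    def predict(self, x):
        accepted = [lbl for lbl in self.centroids if self.accepts(lbl, x)]
        if not accepted:
            return "unknown"   # candidate for a new label
        return min(accepted, key=lambda lbl: math.dist(x, self.centroids[lbl]))

clf = OneVsAllIncremental(radius=1.0)
clf.add_class("orange", [(0.0, 0.0), (0.2, 0.0)])
clf.add_class("apple", [(3.0, 0.0), (3.2, 0.0)])
before = clf.predict((0.0, 5.0))                    # banana-like point: no class accepts it
clf.add_class("banana", [(0.0, 5.0), (0.2, 5.0)])   # new label, no retraining of the others
after = clf.predict((0.0, 5.0))
```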


Actually … you can also use a deep neural network

to do the metric learning


Metric learning always gives me a whole new angle

from which to observe the world
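A minimal illustration of what "a new angle" means in metric learning, not a deep-network implementation: learn a diagonal metric (one non-negative weight per feature) with a contrastive loss that pulls same-class pairs together and pushes different-class pairs beyond a margin. A deep network would replace the per-feature weights with a learned embedding. All names, points, and hyperparameters are hypothetical.

```python
def sq_dist(w, a, b):
    """Squared distance under a diagonal metric with non-negative weights w."""
    return sum(wk * (ak - bk) ** 2 for wk, ak, bk in zip(w, a, b))

def learn_diagonal_metric(pairs, dim, margin=4.0, lr=0.001, n_iters=2000):
    """Projected gradient descent on a contrastive loss over (a, b, similar) pairs."""
    w = [1.0] * dim                              # start from the plain Euclidean metric
    for _ in range(n_iters):
        grad = [0.0] * dim
        for a, b, similar in pairs:
            diff2 = [(ak - bk) ** 2 for ak, bk in zip(a, b)]
            if similar:
                # loss = d^2: its gradient w.r.t. w_k is (a_k - b_k)^2
                grad = [g + d for g, d in zip(grad, diff2)]
            elif sq_dist(w, a, b) < margin:
                # loss = margin - d^2 while the pair is still inside the margin
                grad = [g - d for g, d in zip(grad, diff2)]
        w = [max(0.0, wk - lr * g) for wk, g in zip(w, grad)]  # keep weights >= 0
    return w

# Hypothetical 2-D points: feature 0 carries the class, feature 1 is large-scale noise.
a0, a1 = (0.0, 0.0), (0.0, 10.0)   # class A
b0, b1 = (1.0, 0.0), (1.0, 10.0)   # class B
pairs = [(a0, a1, True), (b0, b1, True),
         (a0, b0, False), (a1, b1, False), (a0, b1, False), (a1, b0, False)]
w = learn_diagonal_metric(pairs, dim=2)
```

Under the Euclidean metric, a0 is closer to b0 than to its own classmate a1; under the learned weights the noisy feature is switched off, so same-class points become the nearest ones and a clustering method run in this metric behaves more like a classifier.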


Remember that ...! Whatever you observe …

whatever you draw!


Thanks for your attention!