PyData SF 2016 --- Moving forward through the darkness
Moving Forward Through the Darkness
the blindness of modeling and how to break through
Chia-Chi@PyData SF 2016
About Chia-Chi (George)
● Organizer of Taiwan R User Group and MLDM Monday
● 7 years of experience in quantitative trading in the futures & options markets
● 5 years of consulting experience in machine learning & data mining
● 4 years of experience in e-commerce (consulting & joining SaaS teams)
● 4 years of experience building recommendation and search engines
● Volunteer at PyCon APAC 2014 (program officer)
● Volunteer at PyCon APAC 2015 (program officer)
● I love Python and hope I can write Python every day!
Training models from data
is just like sketching pictures from the world
Jackson Pollock: The painting has a life of its own.
I try to let it come through.
As a data scientist … The data has a life of its own. I just try to let it come through.
The first step … is not picking up your pen!
It is choosing an angle and doing some observation!
Now, the object you want to sketch ... is your data!
Try to sketch it: y = a_0 + a_1 x
Which line is the most similar one?
It depends on your observation angle!
What does the angle mean ... in a machine learning problem?
Angle of Linear Regression
After choosing an angle ... you have chosen the question & the evaluator ...
(as a data scientist)
How to change the angle?
In the Linear Regression Problem
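A minimal sketch of two "angles" on the same data, assuming the angle is the choice of evaluator. One fit minimizes vertical distances (ordinary least squares), the other perpendicular distances (total least squares). The data is synthetic and the function names are hypothetical; the slides do not say which evaluators were pictured.

```python
import numpy as np


def fit_vertical(x, y):
    """Ordinary least squares: minimize vertical distances to the line."""
    a1, a0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest degree first
    return a0, a1


def fit_orthogonal(x, y):
    """Total least squares: minimize perpendicular distances to the line."""
    xm, ym = x.mean(), y.mean()
    centered = np.column_stack([x - xm, y - ym])
    # The first right singular vector is the direction of largest spread.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    a1 = dy / dx
    return ym - a1 * xm, a1


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=200)

ols = fit_vertical(x, y)    # (a_0, a_1) under the vertical-distance evaluator
tls = fit_orthogonal(x, y)  # (a_0, a_1) under the perpendicular-distance evaluator
print("vertical:", ols, "orthogonal:", tls)
```

Same data, two evaluators, two different lines: the picture depends on the angle.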
The Metaphor
Data (the object) + Evaluator (point of view | angle) => Model (picture)
Different angles ... different pictures …
(in sketching)
Different questions ... different models …
(in data science)
Whatever you observe … whatever you draw!
(both in sketching and data science)
The two keys
that help you apply machine learning in the real world
Can learn ONLY through real practice
Can learn from school or practice
Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a method to convert Data to Vectors (or Tensors)
● Decompose the Real Problem into several ML or Math Problems
● Solve each ML or Math Problem individually
● Combine the Solutions of all ML or Math Problems
● Check: does that truly solve the Real Problem?
Case Study: How to build
a Recommendation System for a News Platform
User-Centered Recommendation
News you probably also want to read
[Figure: system architecture. Labels: Platform; Tracking; Users Behavior; Data Feed; Response; News Data; Results; Machine Learning; Server Group (x2); Processing Data; Prediction Data]
Modeling Procedures -- Part 1:
● Problem: how to help users reach more of the news they want to read?
● Data:
  ○ News Data (Article)
    ■ Title
    ■ Text
    ■ Time
    ■ Category
  ○ User Behavior Data
    ■ User Views Post
    ■ User Clicks Links
    ■ User Does Not Click Links
Modeling Procedures -- Part 2:
● Data to Vectors (or Tensors)
  ○ News Data
    ■ TermDocumentMatrix (scikit-learn)
    ■ Word2Vec (gensim word2vec)
  ○ User Behavior Data
    ■ Event Sampling (Spark Streaming, Kafka, or TrailDB)
● Construct a user-item matrix (user view | click | not-click events)
● Construct an item-item matrix (view-after-view, click-after-view, ...)
● Construct a user-item-time tensor cube
● Construct a user-item-item-time tensor cube
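A minimal sketch of the first two constructions, assuming a small in-memory event log. The event tuples, the per-event weights, and the "seen right after" rule are all hypothetical choices for illustration; a real pipeline would read sampled events from the streaming layer.

```python
import numpy as np

# Hypothetical event log: (user, item, event), already ordered by time.
events = [
    ("u1", "n1", "view"), ("u1", "n2", "click"), ("u1", "n3", "view"),
    ("u2", "n2", "view"), ("u2", "n3", "not-click"),
]
# Assumed scores per event type (not from the talk).
weights = {"view": 1.0, "click": 2.0, "not-click": -1.0}

users = sorted({u for u, _, _ in events})
items = sorted({i for _, i, _ in events})
u_idx = {u: k for k, u in enumerate(users)}
i_idx = {i: k for k, i in enumerate(items)}

# user-item matrix: accumulate the weighted events per (user, item) pair.
user_item = np.zeros((len(users), len(items)))
for u, i, e in events:
    user_item[u_idx[u], i_idx[i]] += weights[e]

# item-item matrix: count "item b was seen right after item a" per user.
item_item = np.zeros((len(items), len(items)))
last_item = {}
for u, i, e in events:
    if e == "not-click":
        continue  # the user never reached this item
    if u in last_item:
        item_item[i_idx[last_item[u]], i_idx[i]] += 1
    last_item[u] = i

print(user_item)
print(item_item)
```

Adding a time bucket as a third index would turn either matrix into the tensor cubes listed above.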
Modeling Procedures -- Part 3:
● Real to ML (or Math) and Solve ML (or Math) Problems
  ○ Real Problem: how to help users reach more of the news they want to read?
  ○ ML (or Math) Problems:
    ■ Hottest & Newest
    ■ Content-Based Relations
    ■ Collaborative Filtering
Newest & Hottest : Sorting
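A minimal sketch of the sorting step, assuming each article carries a view count and a publication time. The field names and the tie-breaking rule are hypothetical.

```python
from datetime import datetime

# Hypothetical articles; "views" stands in for whatever hotness signal is tracked.
articles = [
    {"title": "A", "views": 120, "published": datetime(2016, 8, 12, 9)},
    {"title": "B", "views": 300, "published": datetime(2016, 8, 10, 9)},
    {"title": "C", "views": 120, "published": datetime(2016, 8, 13, 9)},
]

# Newest: most recent publication time first.
newest = sorted(articles, key=lambda a: a["published"], reverse=True)

# Hottest: most views first, newer article wins a tie.
hottest = sorted(articles, key=lambda a: (a["views"], a["published"]), reverse=True)

print([a["title"] for a in newest])
print([a["title"] for a in hottest])
```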
Content-Based Relations: Clustering
with the TermDocumentMatrix
or
with the results coming from word2vec
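A minimal sketch of content-based relations on a term-document matrix. To stay short it uses a cosine-similarity nearest-neighbour lookup rather than a full clustering algorithm, and it builds the matrix by hand instead of calling scikit-learn. The three documents are made up for illustration.

```python
from collections import Counter

import numpy as np

# Hypothetical news texts keyed by article id.
docs = {
    "n1": "stock market falls as oil price drops",
    "n2": "oil price rise lifts stock market",
    "n3": "local team wins football final",
}
ids = list(docs)
vocab = sorted({w for text in docs.values() for w in text.split()})
col = {w: j for j, w in enumerate(vocab)}

# Term-document matrix: one row per document, one column per term.
tdm = np.zeros((len(ids), len(vocab)))
for r, doc_id in enumerate(ids):
    for word, count in Counter(docs[doc_id].split()).items():
        tdm[r, col[word]] = count

# Cosine similarity between every pair of documents.
unit = tdm / np.linalg.norm(tdm, axis=1, keepdims=True)
sim = unit @ unit.T
np.fill_diagonal(sim, -1.0)  # a document should not recommend itself

most_related = {ids[r]: ids[int(sim[r].argmax())] for r in range(len(ids))}
print(most_related)
```

The same lookup works if each row is replaced by an averaged word2vec vector for the article.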
Collaborative Filtering: Matrix Factorization (MF) & Matrix Completion
Use only 20% of the data to regenerate the full image!
Ref: ipynb@github
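A minimal sketch of matrix completion by matrix factorization, assuming a synthetic low-rank matrix instead of the image from the referenced notebook. It uses alternating least squares with a small ridge penalty; the rank, penalty, and iteration count are hypothetical settings, and this is not the notebook's code.

```python
import numpy as np


def complete(M, mask, rank=2, lam=0.05, iters=50, seed=0):
    """Fill in M from the entries where mask is True, using rank-`rank` factors."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        # Fix V, solve a small ridge regression for each row factor.
        for i in range(n):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ M[i, mask[i]])
        # Fix U, solve a small ridge regression for each column factor.
        for j in range(m):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ M[mask[:, j], j])
    return U @ V.T


# Synthetic rank-2 "ratings" matrix; only about 20% of the entries are observed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
mask = rng.random(truth.shape) < 0.2

estimate = complete(truth, mask)
hidden = ~mask
rmse_mf = np.sqrt(np.mean((estimate[hidden] - truth[hidden]) ** 2))
rmse_mean = np.sqrt(np.mean((truth[mask].mean() - truth[hidden]) ** 2))
print("hidden-entry RMSE, factorization:", rmse_mf, "mean baseline:", rmse_mean)
```

In the recommendation setting the rows are users, the columns are news items, and the hidden entries are the articles a user has not seen yet.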
Modeling Procedures -- Part 4:
● Combine Solutions together and Check them against the Real Problem
  ○ Ensemble Learning (for static combination)
  ○ Reinforcement Learning (for dynamic improvement)
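A minimal sketch of the static combination, assuming each sub-solution returns a score per article and the ensemble is a fixed weighted sum. The scores, the weights, and the `blend` helper are hypothetical; the dynamic (reinforcement learning) variant would adjust the weights from user feedback and is not shown.

```python
def blend(score_tables, weights):
    """Rank items by the weighted sum of the scores from each recommender."""
    combined = {}
    for name, scores in score_tables.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores from the three sub-solutions.
scores = {
    "hottest": {"n1": 0.9, "n2": 0.2, "n3": 0.1},
    "content": {"n1": 0.1, "n2": 0.8, "n3": 0.3},
    "collab": {"n1": 0.2, "n2": 0.7, "n3": 0.9},
}
weights = {"hottest": 0.2, "content": 0.3, "collab": 0.5}  # assumed, fixed

ranking = blend(scores, weights)
print(ranking)
```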
Recap: Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a method to convert Data to Vectors (or Tensors)
● Decompose the Real Problem into several ML or Math Problems
● Solve each ML or Math Problem individually
● Combine the Solutions of all ML or Math Problems
● Check: does that truly solve the Real Problem?
The Blindness in the Modeling Procedures
Blindness of Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a method to convert Data to Vectors (or Tensors)
● Decompose the Real Problem into several ML or Math Problems
● Solve each ML or Math Problem individually
● Combine the Solutions of all ML or Math Problems
● Check: does that truly solve the Real Problem?
Problem <-> Data
Problem-Driven: Thinking about Data through the Problem
Data-Driven: Thinking about the Problem through Data
Problem behind the Problem
Information behind the Data
Business Insights
The Blindness between Data and Problem
Is there any related information in that data?
Could the problem be answered by this data?
Case Study: Bookstore
Could you use our POS data to find some methods to convert those users who originally dislike us?
Case Study: Bookstore
Could you use our POS data to find the potential users?
"Potential" means users who want to buy but haven't yet
Data from POS has information ONLY about converted users
There is no information about users who dislike us or who are unconverted
Thinking in Two Ways: Data-Driven and Problem-Driven
Problem: ???
Data: POS Data
Problem: Increase the Bookstore's Revenue?
Data: ???
Case Study: LBS Food Search
a story about how "delicious" is not delicious!
(this is also the blindness of NLP)
Data Thinking: First-Hand versus Second-Hand
for example, delicious versus "delicious"
A machine can NOT learn by itself.
It is just like a child: it learns from training data!
Sometimes it learns badly!
Blindness of Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a method to convert Data to Vectors (or Tensors)
● Decompose the Real Problem into several ML or Math Problems
● Solve each ML or Math Problem individually
● Combine the Solutions of all ML or Math Problems
● Check: does that truly solve the Real Problem?
The Blindness from Data to Vector
Is any information lost when you convert your data?
Blindness of unigram
I love it (我愛它) = it loves me (它愛我)
I hate it (我恨它) = it hates me (它恨我)
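A minimal sketch of this blindness, assuming character-level n-grams for the Chinese sentences. The `ngrams` helper is hypothetical. With unigrams the two sentences get the same bag of tokens; with bigrams the word order survives.

```python
from collections import Counter


def ngrams(text, n):
    """Bag of character n-grams of a string."""
    return Counter(text[k:k + n] for k in range(len(text) - n + 1))


a, b = "我愛它", "它愛我"  # "I love it" versus "it loves me"

same_unigrams = ngrams(a, 1) == ngrams(b, 1)  # order is thrown away
same_bigrams = ngrams(a, 2) == ngrams(b, 2)   # order is partly kept

print(same_unigrams, same_bigrams)
```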
The Blindness from Mathematical Concepts
The gap between math and the real world:
When putting the units back into the formula …
Math in Elementary School … The secret behind the minus operator
● 103 - 100 = 6 - 3 ?
● 103 dollars - 100 dollars = 6 dollars - 3 dollars ?
● 103-dollar stock - 100-dollar stock = 6-dollar stock - 3-dollar stock ?
(formula)(units)
● (103 - 100 = 6 - 3) (dollars)
● (103 - 100 = 6 - 3) (dollars of stock)
How to choose the right coordinate for a stock price?
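A minimal sketch of one possible answer, assuming the log of the price is used as the coordinate so that differences become log returns. This is a common convention and an assumption here; the slides only pose the question. The helper names are hypothetical.

```python
import math


def absolute_change(p0, p1):
    """Difference in dollars."""
    return p1 - p0


def log_return(p0, p1):
    """Difference in the log-price coordinate."""
    return math.log(p1 / p0)


# In dollars the two moves look identical ...
print(absolute_change(100, 103), absolute_change(3, 6))

# ... but in log-price coordinates a 3 -> 6 move (the price doubles)
# is far larger than a 100 -> 103 move (the price rises 3%).
print(log_return(100, 103), log_return(3, 6))
```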
Blindness of Modeling Procedures:
● Choose a Real Problem
● Collect Related Data
● Choose a method to convert Data to Vectors (or Tensors)
● Decompose the Real Problem into several ML or Math Problems
● Solve each ML or Math Problem individually
● Combine the Solutions of all ML or Math Problems
● Check: does that truly solve the Real Problem?
The Blindness from ML Frameworks
Classification & Clustering
What happens when an orange-apple classifier meets a banana?
What happens when a digit classifier meets a letter of the alphabet?
A -> 9
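A minimal sketch of this blindness, assuming a nearest-centroid classifier over made-up three-number colour features. A closed-set classifier has no way to answer "none of the above", so the banana is forced into one of the two known classes.

```python
import numpy as np

# Hypothetical class centroids in an invented (red, green, blue) feature space.
centroids = {
    "orange": np.array([0.9, 0.5, 0.1]),
    "apple": np.array([0.8, 0.1, 0.1]),
}


def classify(x):
    """Closed-set prediction: always returns one of the known labels."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))


banana = np.array([0.9, 0.8, 0.2])
label = classify(banana)
print(label)  # a known fruit, never "banana"
```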
The blindness of clustering methods
Cannot force two points into the same cluster
The fact is … We always get some data with labels
But some without
(1) How to propagate labels?
(2) How to detect new labels with labelers?
New Data & New Labels keep coming all the time
in e-commerce retailers & in news platforms
What we need …
(1) A Classifier that works like a clustering method (one-versus-all incremental classifier)
(2) A Clustering Method that works like a Classifier (Metric Learning)
one-versus-all incremental classifier
Not Class 1
Not Class 2
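A minimal sketch of a one-versus-all incremental classifier, assuming each class gets its own "this class or not" detector made of a centroid and a radius. The class name, the radius rule, and the feature values are hypothetical. A sample rejected by every detector is reported as unknown, and a new class can be added later without retraining the others.

```python
import numpy as np


class OneVsAllIncremental:
    def __init__(self):
        self.detectors = {}  # label -> (centroid, radius)

    def add_class(self, label, samples, margin=1.5):
        """Add one detector; existing detectors are left untouched."""
        samples = np.asarray(samples, dtype=float)
        centroid = samples.mean(axis=0)
        radius = margin * np.linalg.norm(samples - centroid, axis=1).max()
        self.detectors[label] = (centroid, radius)

    def predict(self, x):
        """Return the closest accepting class, or None if every detector says 'not'."""
        x = np.asarray(x, dtype=float)
        hits = []
        for label, (centroid, radius) in self.detectors.items():
            distance = np.linalg.norm(x - centroid)
            if distance <= radius:
                hits.append((distance, label))
        return min(hits)[1] if hits else None


clf = OneVsAllIncremental()
clf.add_class("orange", [[0.9, 0.5, 0.1], [0.95, 0.55, 0.05]])
clf.add_class("apple", [[0.8, 0.1, 0.1], [0.75, 0.15, 0.1]])

banana = [0.9, 0.8, 0.2]
before = clf.predict(banana)  # rejected by both detectors

# A labeler names the new class; only one detector is added.
clf.add_class("banana", [[0.9, 0.8, 0.2], [0.85, 0.85, 0.25]])
after = clf.predict(banana)
print(before, after)
```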
Metric Learning (shugon)
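A minimal sketch of the idea behind metric learning, assuming a very simple technique: a Mahalanobis metric built from the inverse within-class covariance, in the spirit of Relevant Component Analysis. This is a stand-in chosen for brevity, not the method shown on the slide, and the points are made up.

```python
import numpy as np


def learn_metric(X, y, eps=1e-6):
    """Mahalanobis matrix from the inverse of the within-class covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in np.unique(y)])
    within_cov = centered.T @ centered / len(X)
    return np.linalg.inv(within_cov + eps * np.eye(X.shape[1]))


def distance(M, a, b):
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(d @ M @ d))


# Two classes that differ along x but are widely spread along y.
X = [[0, -4], [0, 4], [0.2, 0], [-0.2, 0],
     [2, -4], [2, 4], [2.2, 0], [1.8, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

M = learn_metric(X, y)
euclid = np.eye(2)
a, same, other = [0, -4], [0, 4], [2, -4]

# Euclidean angle: the same-class pair looks farther apart than the mixed pair.
print(distance(euclid, a, same), distance(euclid, a, other))
# Learned angle: the labels reshape the space and the order flips.
print(distance(M, a, same), distance(M, a, other))
```

With the learned metric, an ordinary clustering method sees same-label points as neighbours, which is how a clustering method can behave like a classifier.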
Actually … you can also use a deep neural network
to construct the metric learning stuff
Metric Learning
always gives me a whole new angle
to observe the world
Remember that ...!
Whatever you observe … whatever you draw!
Thanks for your attention!