Post on 12-Jan-2017
Team: NCCU
A Linear Ensemble of Classification Models with Novel Backward Cumulative Features for MOOC Dropout Prediction
Chih-Ming Chen, Man-Kwan Shan, Ming-Feng Tsai, Yi-Hsuan Yang, Hsin-Ping Chen, Pei-Wen Yeh, and Sin-Ya Peng
Research Center for Information Technology Innovation, Academia Sinica
Department of Computer Science, National Chengchi University
a linear combination of several models
the proposed data engineering method is able to generate many distinct feature sets
Key Point Summary
• Latent Space Representation — Clustering Model — Skip-Gram Model — alleviates the feature sparsity problem
• Backward Cumulative Features — generate 30 distinct sets of features — alleviate the bias problem of statistical features
• Linear Model + Tree-based Model — a good match: each covers the other's weakness when using sparse features
Workflow
Train Data → Train Data (75%) + Validate Data (25%)
[Figure: Training Date Distribution and Testing Date Distribution — enrolment counts per month, 10/27/2013 to 7/27/2014]
Split the training data based on the time distribution to obtain stable results.
— 2 settings
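The time-based split above can be sketched as follows; the record format and function name are hypothetical, not the team's actual code:

```python
# Minimal sketch of a time-based 75/25 split (hypothetical record format:
# (enrolment_id, timestamp) pairs).

def time_based_split(records, train_ratio=0.75):
    """Split chronologically so validation mimics the later, unseen test period."""
    ordered = sorted(records, key=lambda r: r[1])  # oldest first
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

records = [(i, day) for i, day in enumerate([3, 1, 4, 1, 5, 9, 2, 6])]
train, valid = time_based_split(records)  # 6 early records, 2 late records
```

Splitting by time rather than at random keeps the validation set distributionally closer to the test period.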
Workflow — Submission method 1: cross-validation
Train Data → Train Data (75%) + Validate Data (25%)
Learned Model → Offline Evaluation
Learned Model → Test Data → Submission
— 2 settings
Workflow — Submission method 2: retrain the learned model on the full training data, and check if it leads to better performance.
Train Data → Learned Model → Test Data → Submission
— 2 settings
Prediction Model Overview
Raw Data (Student, Course, Time) → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier
A classical approach to a general prediction task.
— 2 solutions
Prediction Model Overview
Raw Data (Student, Course, Time) → Backward Cumulation → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier / Gradient Boosting Decision Trees
The feature engineering is tailored to the MOOC dataset.
— 2 solutions
Prediction Model Overview
Raw Data (Student, Course, Time) → Backward Cumulation → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier / Gradient Boosting Decision Trees → Linear Combination → Final Prediction
— 2 solutions
Prediction Model Overview — 2 solutions
Raw Data (Student, Course, Time) → Backward Cumulation → Features
solution 1 / solution 2 → Linear Combination → Final Prediction
Classifiers: Logistic Regression, Gradient Boosting Classifier, Support Vector Classifier (scikit-learn); Gradient Boosting Decision Trees (xgboost)
http://scikit-learn.org/stable/
https://github.com/dmlc/xgboost
Feature Extraction / Feature Engineering
— 2 solutions
Feature Extraction
Raw Data (Student, Course, Time) → Enrolment ID
• Bag-of-words Feature — Boolean (0/1) — Term Frequency (TF)
• Probability Value — Ratio — Naive Bayes
• Latent Space Feature — Clustering — DeepWalk
Feature Extraction — Bag-of-words Feature
Describes the enrolment status, e.g. the month of the course, the number of registrations.
Example event counts for one enrolment:
video 5 | problem 10 | wiki 0 | discussion 2 | navigation 0
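A minimal sketch of these bag-of-words features, with the event names taken from the example counts above (function name and log format are hypothetical):

```python
from collections import Counter

# Hypothetical sketch: count each event type per enrolment (term frequency),
# or threshold the counts to 0/1 (boolean bag-of-words).
EVENT_TYPES = ["video", "problem", "wiki", "discussion", "navigation"]

def bag_of_words(events, boolean=False):
    counts = Counter(e for e in events if e in EVENT_TYPES)
    tf = [counts[t] for t in EVENT_TYPES]  # fixed feature order
    return [1 if c > 0 else 0 for c in tf] if boolean else tf

logs = ["video"] * 5 + ["problem"] * 10 + ["discussion"] * 2
```

`bag_of_words(logs)` yields the TF vector, and `boolean=True` the 0/1 variant.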
Feature Extraction — Probability Value
e.g. the dropout ratio of the course; estimate the probability from observed data.
Naive Bayes score over the objects O = {O1, O2, …, Od} contained in an enrolment:
P(dropout | containing objects) ∝ P(dropout) · P(O1 | dropout) ⋯ P(Od | dropout)
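A minimal sketch of the ratio-style probability feature (the per-course dropout ratio); the (course_id, dropped) pair format and function name are hypothetical:

```python
from collections import defaultdict

# Hypothetical sketch: estimate each course's dropout ratio from observed
# enrolments, to be used as a feature value for that course's enrolments.
def dropout_ratio(enrolments):
    """enrolments: list of (course_id, dropped) pairs -> {course_id: ratio}."""
    seen, dropped = defaultdict(int), defaultdict(int)
    for course, d in enrolments:
        seen[course] += 1
        dropped[course] += int(d)
    return {c: dropped[c] / seen[c] for c in seen}

data = [("A", 1), ("A", 0), ("A", 1), ("B", 0)]
ratios = dropout_ratio(data)  # course A: 2/3, course B: 0.0
```

The Naive Bayes variant would multiply per-object conditional probabilities estimated the same way.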
Feature Extraction — Latent Space Feature
Latent Topic — K-means Clustering on (1) registered courses, (2) containing objects.
Some features are sparse → DeepWalk / Skip-Gram for obtaining a dense feature representation.
DeepWalk
https://github.com/phanein/deepwalk
The Goal — Find the representation of each node of a graph.
It is an extension of word2vec's Skip-Gram model.
The core idea is to model context information (in practice, a node's neighbours), so that similar objects are mapped to nearby points in the latent space.
From DeepWalk to the MOOC Problem
A heterogeneous graph of users and courses: U1, U2, U3 — Course A, Course B, Course C, Course D, Course E
https://github.com/phanein/deepwalk
From DeepWalk to the MOOC Problem
Random Walk: sample walks visit users and courses alternately (e.g. U1 → Course B, U2 → Course C → U1).
Treat random walks on the heterogeneous graph as sentences.
https://github.com/phanein/deepwalk
From DeepWalk to the MOOC Problem
Treat random walks on the heterogeneous graph as sentences; Skip-Gram then learns a dense vector for each node, e.g.
U1: 0.3 0.2 -0.1 0.5 -0.8
Course B: 0.1 0.3 -0.5 1.2 -0.3
https://github.com/phanein/deepwalk
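The walk-generation step can be sketched on a toy graph; the edges, walk length, and helper names here are illustrative assumptions, and the Skip-Gram training itself is left to a word2vec implementation:

```python
import random

# Hypothetical sketch of the DeepWalk idea on a user-course graph:
# generate random walks and treat each walk as a "sentence" that a
# Skip-Gram model (e.g. word2vec) can consume to learn node embeddings.
graph = {
    "U1": ["Course A", "Course B"],
    "U2": ["Course B", "Course C"],
    "Course A": ["U1"],
    "Course B": ["U1", "U2"],
    "Course C": ["U2"],
}

def random_walk(graph, start, length, rng):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))  # step to a random neighbour
    return walk

rng = random.Random(0)
sentences = [random_walk(graph, node, 5, rng) for node in graph]
# Co-occurring nodes in these "sentences" end up near each other
# in the learned latent space.
```

Because users and courses appear in the same walks, both get embedded into one shared space, which gives dense features even for sparse enrolment patterns.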
Performance
Bag-of-words: > 0.890
+ Probability: > 0.901
+ Naive Bayes: > 0.902
+ Latent Space: > 0.903
Further gains: Backward Cumulation, Models Combination.
Backward Cumulative Features — Motivation
Logs Table, 10/13 to 10/27 (O: activity logged that day, X: none):
U1: O X O X O O X O X X X X X X O
U2: X X X X O X X X X X O O O X X
U3: X X X X X X X X X X X O X O X
Users are active in different periods and produce different numbers of logs.
Backward Cumulative Features
Consider only the logs in the last N days (shown in the talk for N = 2, 3, 4, 5).
Logs Table, 10/13 to 10/27:
U1: O X O X O O X O X X X X X X O
U2: X X X X O X X X X X O O O X X
U3: X X X X X X X X X X X O X O X
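A minimal sketch of this last-N-days filtering, assuming a hypothetical (day, event) log format:

```python
# Hypothetical sketch of backward cumulation: keep only the log entries
# from the last N days before the cutoff, then extract features from
# that slice instead of from the full history.
def last_n_days(logs, cutoff_day, n):
    """logs: list of (day, event); keep entries with cutoff_day - n < day <= cutoff_day."""
    return [(d, e) for d, e in logs if cutoff_day - n < d <= cutoff_day]

logs = [(1, "video"), (3, "problem"), (5, "video"), (7, "wiki")]
```

Running the same feature extraction over each filtered slice (N = 1 … 30) produces the 30 distinct feature sets described next.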
Backward Cumulative Features — 2 strategies
Raw Data (Student, Course, Time) → Backward Cumulation with N = 1, 2, 3, …, 29, 30 → Feature Set 1, Feature Set 2, Feature Set 3, …, Feature Set 29, Feature Set 30
Backward Cumulative Features — 2 strategies
Strategy 1. Concatenate all 30 feature sets into one feature vector and train a single classifier.
Backward Cumulative Features — 2 strategies
Strategy 2. Build 30 distinct models, one per feature set, and average their predictions.
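The two strategies can be sketched as follows; the function names and toy data are hypothetical, not from the team's code:

```python
# Hypothetical sketch of the two strategies over the 30 feature sets.
# Strategy 1 concatenates all sets into one wide vector for a single model;
# Strategy 2 trains one model per set and averages the predicted probabilities.

def strategy1_concat(feature_sets):
    """feature_sets: the per-N feature vectors for one enrolment."""
    return [x for fs in feature_sets for x in fs]

def strategy2_average(per_model_probs):
    """per_model_probs: dropout probability from each per-N model."""
    return sum(per_model_probs) / len(per_model_probs)

sets = [[1, 0], [2, 1], [3, 1]]  # toy feature sets for N = 1, 2, 3
```

Strategy 1 lets one model weigh the time scales against each other; Strategy 2 trades that for lower variance through averaging.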
Prediction Model Overview
Raw Data (Student, Course, Time) → Backward Cumulation → Features → Logistic Regression / Gradient Boosting Classifier / Support Vector Classifier / Gradient Boosting Decision Trees (scikit-learn, xgboost)
Linear Combination: solution 1 × 0.5 + solution 2 × 0.5 → Final Prediction
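The final equal-weight linear combination can be sketched as follows; the probability values are illustrative:

```python
# Hypothetical sketch of the final linear ensemble: average the two
# solutions' predicted dropout probabilities with equal 0.5 weights.
def linear_combination(p_solution1, p_solution2, w1=0.5, w2=0.5):
    return [w1 * a + w2 * b for a, b in zip(p_solution1, p_solution2)]

p1 = [0.9, 0.2, 0.6]   # toy per-enrolment probabilities from solution 1
p2 = [0.7, 0.4, 0.6]   # toy per-enrolment probabilities from solution 2
final = linear_combination(p1, p2)
```

With equal weights this is a plain average; unequal weights would favour whichever solution validates better offline.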
What We Learned from the Competition
• Teamwork is important — share ideas, share solutions
• Model diversity & feature diversity — diverse models and features can capture different characteristics of the data
• Understand the data — the goal, the evaluation metric, the data structure
Start earlier …
Several things remain to be discussed, e.g. feature format, data partition, feature scale.
changecandy [at] gmail.com
Any Questions?