Donors Choose Project (1)

Post on 16-Apr-2017


Funding Education through Donors Choose

General Assembly 2016

Fernando Hidalgo

Problem Description

Task: Predict whether a Donors Choose project will get funded.

Experience: Donors Choose data from Sept 2002 to the present.

Performance: Classification accuracy, the number of correct predictions out of all predictions made.

The Data

Labels

Completed: 592,757

Expired: 261,536

Class Skewness: Use the F1 score to keep recall and precision in check.

Baseline: 0.69
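The baseline of .69 is consistent with always predicting the majority class (Completed), although the slides do not state how it was computed, so treat that reading as an assumption. The sketch below derives that figure from the label counts above and shows how an F1 score combines precision and recall. The function names and the confusion counts in the F1 example are hypothetical, chosen only for illustration.

```python
def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Label counts taken from the slide above.
counts = {"completed": 592_757, "expired": 261_536}
baseline = majority_baseline(counts)
print(f"baseline accuracy: {baseline:.2f}")  # baseline accuracy: 0.69

# Hypothetical confusion counts, for illustration only.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

Because F1 ignores true negatives, it reacts to mistakes on the positive class in a way that plain accuracy on skewed labels does not.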

Original Features

Feature: Description (type)

total_price_excluding_optional_support: Total price of the project in dollars (integer)
students_reached: Number of students the project reaches (integer)
school_type: Type of school: Charter, magnet, year_round, nlns, kipp, Charter_ready_promise (categorical)
date_posted: Day the project was posted (categorical)
resource_type: Type of resources the project asks for (categorical)
grade_level: Grade level of the project (categorical)
poverty_level: Poverty level (categorical)
school_state: State from which the project was posted (categorical)
Eligible_double_your_impact_match: Whether the project was eligible to be matched (categorical)
teacher_prefix: Prefix of the teacher posting the project (categorical)
primary_focus_area: The project's primary area of focus (categorical)
primary_focus_subject: The project's primary subject of focus (categorical)

Feature Engineering

New Feature: Description

price_per_student: total_price / students_reached
project_length: Date_expiration - date_posted
month_posted: Extracted from date_posted
day_posted: Extracted from date_posted
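The four derived features can be sketched in plain Python on a single project record. This is an illustration under assumptions, not the author's code: the lowercase key date_expiration, the choice of days as the unit of project_length, the reading of day_posted as day of the month, and the sample values are all hypothetical.

```python
from datetime import date


def engineer_features(project):
    """Derive the four new features from one project record (a dict)."""
    posted = project["date_posted"]
    expires = project["date_expiration"]  # assumed key name
    students = project["students_reached"]
    price = project["total_price_excluding_optional_support"]
    return {
        # Guard against projects that report zero students reached.
        "price_per_student": price / students if students else None,
        # Assumed unit: whole days between posting and expiration.
        "project_length": (expires - posted).days,
        "month_posted": posted.month,
        # Assumed meaning: day of the month, not day of the week.
        "day_posted": posted.day,
    }


# Hypothetical record, for illustration only.
example = {
    "date_posted": date(2016, 3, 15),
    "date_expiration": date(2016, 7, 13),
    "students_reached": 30,
    "total_price_excluding_optional_support": 450.0,
}
features = engineer_features(example)
print(features)
```

The same expressions carry over to a column-wise version on a data frame; the zero-student guard matters there too, since a division by zero would otherwise produce infinite values.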

Visualizations

Rate of Projects Funded to Total Projects per Resource

Rate of Projects Funded to Total Projects per Month

Rate of Projects Funded to Total Projects per Grade Level

Rate of Projects Funded to Total Projects per Primary Focus Area

Rate of Projects Funded to Total Projects per Teacher Prefix

Rate of Projects Funded to Total Projects per Poverty Level

Relationship Between Project Length and Funding

Relationship Between Project Price and Funding

Relationship Between Price per Student and Funding
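The "rate of projects funded to total projects" behind the charts listed above is a per-category ratio. A minimal sketch of that calculation follows; the field name funding_status, the value "completed", and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def funded_rate_by(projects, key):
    """Return funded / total projects for each value of `key`."""
    funded = defaultdict(int)
    total = defaultdict(int)
    for project in projects:
        category = project[key]
        total[category] += 1
        if project["funding_status"] == "completed":  # assumed label
            funded[category] += 1
    return {category: funded[category] / total[category] for category in total}


# Hypothetical records, for illustration only.
projects = [
    {"resource_type": "Books", "funding_status": "completed"},
    {"resource_type": "Books", "funding_status": "completed"},
    {"resource_type": "Books", "funding_status": "expired"},
    {"resource_type": "Technology", "funding_status": "completed"},
    {"resource_type": "Technology", "funding_status": "expired"},
]
rates = funded_rate_by(projects, "resource_type")
print(rates)
```

Passing a different key, such as grade_level or poverty_level, yields the ratios for the other charts.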

Predictive Model

The 3 Models:

1. AdaBoost

2. Random Forest

3. Logistic Regression

GridSearch Scores (F1 Score Metric)

Model                Score   Best Parameter
Random Forest        0.759   criterion: entropy
AdaBoost             0.7676  n_estimators: 60
Logistic Regression  0.811   penalty: L2
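A grid search over the three models, scored with F1, could look like the sketch below, which uses scikit-learn's GridSearchCV. The slides report only the best parameter per model, so the parameter grids, the number of folds, and the synthetic data here are assumptions, and the scores it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, skewed stand-in for the project data (roughly 70% positives).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.3, 0.7], random_state=0
)

# Assumed grids: each one varies only the parameter named in the slide.
candidates = {
    "Random Forest": (
        RandomForestClassifier(n_estimators=25, random_state=0),
        {"criterion": ["gini", "entropy"]},
    ),
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [20, 40, 60]},
    ),
    "Logistic Regression": (
        # liblinear supports both the L1 and L2 penalties.
        LogisticRegression(solver="liblinear"),
        {"penalty": ["l1", "l2"]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="f1", cv=3)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)
    print(name, round(search.best_score_, 3), search.best_params_)
```

Setting scoring="f1" makes the search pick the parameters with the highest cross-validated F1 rather than the highest accuracy.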

Simplest Model with Best Score: Logistic Regression

Checking Feature Significance:

Using Random Forest Classifier

The top 5 Features Seem to Have Most of the Predictive Power

Using Only the 5 Most Significant Features

1. Total_price_excluding_optional_support

2. Eligible_double_your_impact_match

3. Resource_Type_Books

4. Resource_Type_Technology

5. price_per_student

New Score with Logistic Regression: 0.8171
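The two steps above, ranking features with a random forest and refitting logistic regression on the top five, can be sketched with scikit-learn as follows. The synthetic data, the number of trees, and the three-fold cross-validation are assumptions for illustration, so the resulting score is unrelated to the .8171 reported here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 12 features, of which 5 carry signal.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)

# Rank features by impurity-based importance from a random forest.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
top5 = ranking[:5]

# Refit logistic regression on the five highest-ranked columns only.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X[:, top5], y, scoring="f1", cv=3
)
mean_f1 = scores.mean()
print("top 5 feature indices:", top5.tolist())
print("mean F1 on top 5:", round(mean_f1, 3))
```

In this sketch the features are ranked on the full data set before cross-validation, which can make the score slightly optimistic; ranking inside each fold avoids that.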

Overview

● Model improvement of 0.1271 over the baseline using Logistic Regression with F1 Score.

● Most of the predictive power lies in 5 features.

● Ethical Implications:

○ The features with the most predictive power are not ones that can be changed without fabrication.

Model Improvements

Add Prescriptive Data: Project Essays, Project Materials

Use Data Based on Location: Census

Skewed Data: Find Reasons

Methods