AAIR 2011 Forum
Putting the Spotlight on Identifying Students at Risk of Attrition
A Case Study in Applied Data Analytics
© 2011 IBM Corporation
Nathan Banks, IBM
Dean Ward, Edith Cowan University
Attrition within Universities
All universities are exposed to attrition.
Large resources are devoted to reducing attrition, with varying impacts and results.
Attrition occurs at various points; some of the major ones are:
o Applications offered to enrolments commenced
o Enrolment commenced to first census date
o First census date to end of 1st semester
o 1st semester to end of 1st year
o 1st year to 2nd year
o 2nd year to 3rd year
Each university’s “profile” is different; however, the following is a generalised view:
Attrition within Australian Universities – Completion Rate
[Chart: completion rate (0–100%) for the lowest, mid and highest universities at each stage: Commence; Commence and make it to first census date; Make it through the following year; Make it through to the end of Year 2; Make it through to the end of Year 3.]
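The completion rates in this chart are cumulative: attrition at each stage applies only to students who survived the earlier stages, so the share still enrolled is a running product of stage retention rates. The Python sketch below illustrates that arithmetic; the stage attrition rates in `STAGES` are hypothetical placeholders, not values read from the chart.

```python
from typing import List, Tuple

# Hypothetical stage attrition rates, for illustration only;
# these are NOT figures from the chart.
STAGES: List[Tuple[str, float]] = [
    ("Commence and make it to first census date", 0.05),
    ("Make it through the following year", 0.15),
    ("Make it through to the end of Year 2", 0.08),
    ("Make it through to the end of Year 3", 0.05),
]


def cumulative_completion(stages: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (stage, share of commencing students still enrolled) pairs.

    Each stage's attrition rate applies only to students who survived the
    previous stages, so the retained share is a running product.
    """
    remaining = 1.0
    result = []
    for name, attrition in stages:
        remaining *= 1.0 - attrition
        result.append((name, remaining))
    return result


if __name__ == "__main__":
    for name, share in cumulative_completion(STAGES):
        print(f"{name}: {share:.1%}")
```

A stage with modest attrition late in the pipeline still lowers the final completion rate, because it compounds with every earlier loss.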
Predictive Analytics and Attrition
Lots of activity in the sector:
1) New York – Mayor Bloomberg – “get the completion rate up” from 46.5% in 2005; analytics used at some universities; now 61% (October 2011)
2) Models deployed and in use at a number of universities in the USA: the US Coast Guard Academy and Arizona/Nevada universities implemented them and improved retention by 4%. “We have limited resources in terms of our student assistance program, and we want to make sure that we engage the right students and are not spending time on students that really don’t need the help.”
3) Bill & Melinda Gates Foundation – awarded $1M (USD) to unify data from five universities to “demonstrate the use of predictive analytics methods for improving student outcomes.” (May 2011, Western Interstate Commission for Higher Education)
• 640,000 anonymous student records
• 3,000,000 unit records
• 33 common variables
• First findings were due to be released on 28 October
ECU and IBM – Predictive Analytics
ECU, in partnership with IBM, sought to:
1) Determine a probability of attrition for major student cohorts (domestic students, undergraduates and certain postgraduates) using a 450-variable dataset
2) Provide a drill-through showing how each probability was arrived at, i.e. the driving variables
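The model eventually chosen in this deck is a CHAID tree, and in a tree the drill-through can be read straight off the path from the root to the student's leaf. A minimal Python sketch, assuming a hypothetical `Node` structure and made-up variables (`level`, `attendance`) and rates; it is not ECU's or IBM's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a (hypothetical) attrition tree."""
    attrition_rate: float                 # attrition rate of training students in this node
    variable: Optional[str] = None        # splitting variable; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def score(node: Node, student: Dict[str, str]) -> Tuple[float, List[Tuple[str, str]]]:
    """Return the student's predicted attrition rate and the path of
    (variable, value) splits that led to it, i.e. the driving variables."""
    path: List[Tuple[str, str]] = []
    while node.variable is not None:
        value = student.get(node.variable)
        child = node.children.get(value) if value is not None else None
        if child is None:  # missing or unseen category: stop at this node
            break
        path.append((node.variable, value))
        node = child
    return node.attrition_rate, path


# Hypothetical tree: first split on level, then attendance type for undergraduates.
TREE = Node(0.12, "level", {
    "UG": Node(0.15, "attendance", {"FT": Node(0.10), "PT": Node(0.25)}),
    "PG": Node(0.08),
})

if __name__ == "__main__":
    print(score(TREE, {"level": "UG", "attendance": "PT"}))
```

Stopping at the last matched node when a category is missing is one design choice among several; it returns a coarser estimate rather than failing.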
Overview
Agreed Scope of Modelling – Degree Type
Agreed Scope of Modelling – Semester of Study
Dealing with False Attrition – Combining/Excluding Course switchers
Model Building Approach
Predictive Characteristics Before 1st Semester
Attrition Model before Semester 1
Attrition Timing During 1st Semester
Predictive Characteristics Before 2nd Semester
Attrition During 2nd Semester
Model Score Distributions
Recommendations
Key Findings
Agreed Scope of Modelling – Degree Type
Students 2009–2011: 39,655
  Higher Education: 38,307 (97%)
    Undergraduate: 23,132 (58%)
      Dip, Adv Dip & Ass Dip: 108 (0%)
      Bachelor Pass: 21,808 (96%)
      Ba Hons, Grad Entry, Ass Deg: 5,045 (8%)
    Enabling: 3,021 (8%)
    Postgraduate: 11,881 (30%)
      Postgraduate Research: 506 (1%)
      Postgraduate Coursework: 11,375 (29%)
        Grad Cert: 2,316 (20%)
        Grad Dip & PG Dip: 3,348 (30%)
        Masters Coursework: 5,696 (50%)
        Doctorate Coursework: 15 (0%)
    Cross Institution & Other: 1,180 (3%)
  VET, Other: 1,348 (3%)
Agreed Scope of Modelling – Semester of Study
[Chart: attrition rate* (0–18%) by semester of study (1 to 5).]
* Attrition is for students enrolled between 2009 and 2010 at major campuses and excludes students who start a second course at ECU in that time.
Scope of Modelling – Semester of Study
Over 10% of students change course during the observation period.
If a course code changes from A to B, then a student’s results need to be combined; failure to do so creates false attrition.
If a student leaves course A for a completely different course Z, then the student’s results should not be combined, or progress in course Z will be overestimated.
There are many cases where the degree of overlap between the courses a student attempted is uncertain, e.g. moving from a double degree to a single degree, the same course but a different major, or the same field of study but a different course.
Dean and I made a case-by-case call to combine 735 courses and exclude 4,000 students who switched courses, guided by the following information:
o Course code
o Course description
o Course status (active, inactive, pre-expire)
o Course field of study, same or different
o Number of students moving between courses
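The combine/exclude calls were made by hand, case by case. Once those decisions are recorded, applying them to enrolment records can be automated. A minimal Python sketch, assuming the approved combinations are stored as a hypothetical set `COMBINE_PAIRS` of (old_code, new_code) pairs and that every other switcher is excluded:

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple

Enrolment = Tuple[int, str, int]  # (student_id, course_code, semester)

# Hypothetical case-by-case decisions: course changes treated as the same course.
COMBINE_PAIRS: Set[Tuple[str, str]] = {("A", "B")}


def resolve_switchers(
    enrolments: Iterable[Enrolment], combine_pairs: Set[Tuple[str, str]]
) -> Tuple[List[Enrolment], Set[int]]:
    """Combine or exclude course switchers.

    A student whose every course change is in combine_pairs has all records
    relabelled to their latest course code, so the change is not counted as
    attrition. Any other change excludes the student from modelling.
    Students who never change course are kept unchanged.
    """
    by_student = defaultdict(list)
    for student_id, course, semester in enrolments:
        by_student[student_id].append((semester, course))

    kept: List[Enrolment] = []
    excluded: Set[int] = set()
    for student_id, records in by_student.items():
        records.sort()  # chronological order by semester
        courses = [course for _, course in records]
        changes = [(a, b) for a, b in zip(courses, courses[1:]) if a != b]
        if all(change in combine_pairs for change in changes):
            final_course = courses[-1]
            kept.extend((student_id, final_course, sem) for sem, _ in records)
        else:
            excluded.add(student_id)
    return kept, excluded
```

Relabelling to the latest course code means progress is tracked under the course the student actually continued in.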
Model Building Approach
1. Remove exclusions, and split students completing their first, second and third semesters into separate datasets.
2. Split each modelling dataset into a training (75%) and validation (25%) dataset.
3. Classify each student as “attrites before next semester” or “does not attrite before next semester”.
4. Apply modelling techniques (neural networks, support vector machines, rule induction, logistic regression) to predict student attrition based on the characteristics of students in the training dataset.
5. Choose the best-performing models based on their ability to predict attrition of students in the validation sample, which were not used to build the model.
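Steps 2 to 5 can be sketched in plain Python: a seeded 75/25 split, then model selection by Gini coefficient (2 × AUC − 1, the predictivity measure used on later slides) on the validation sample. The function names and the row format (dicts with a hypothetical `attrite` 0/1 key) are assumptions; the pairwise AUC is O(n²) and is meant for illustration, not for tens of thousands of students.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def train_validation_split(rows: Sequence[Row], train_share: float = 0.75,
                           seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then cut into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def gini(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Gini = 2 * AUC - 1, where AUC is the probability that a random
    attriter (label 1) scores above a random non-attriter (label 0);
    ties count as half. O(n^2): fine for a sketch, slow for large data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("validation set needs both attriters and non-attriters")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return 2.0 * wins / (len(pos) * len(neg)) - 1.0


def best_model(models: Dict[str, Callable[[Row], float]], validation: Sequence[Row],
               label_key: str = "attrite") -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the validation rows; return the name with
    the highest Gini plus all Gini values."""
    labels = [int(row[label_key]) for row in validation]
    results = {name: gini([model(row) for row in validation], labels)
               for name, model in models.items()}
    return max(results, key=results.get), results
```

Each entry in `models` stands for an already-trained model (CHAID, logistic regression, etc.) wrapped as a function from a student row to an attrition score; only the validation rows are used to compare them.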
Predictive Variables Before 1st Semester
[Chart: predictivity (information value, axis 0.00 to 0.35) of each candidate variable: Detailed Basis Of Admission; Narrow Funding Category; Age At Enrolment; Attendance Type; Broad Citizenship; Advanced Standing; Birth Country; Course Credit Points Required; Course Total EFTSL; Year Arrival; Detailed Course Level; Home Campus; Years To Complete Course; Home Language; Year Completed School; Course Size at Enrolment; Postgraduate/Undergraduate; Broad Basis Of Admission; Broad Field Of Education; National SES Status; Non-English Speaking Background; Highest Parent Education; Scholarship; Competence In English; Secondary School National SES; Current Course Offer Round; Year 12 Student Count; Tertiary Entrance Rank (TER); Start Semester (1 or 2).]
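Information value measures how well a single variable separates attriters from non-attriters: for each category it adds (share of non-attriters − share of attriters) × ln(share of non-attriters / share of attriters). A stdlib Python sketch; the 0.5 count smoothing (to avoid log of zero for empty categories) is an assumption, not something the slides specify.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def information_value(values: Sequence[Hashable], labels: Sequence[int],
                      smoothing: float = 0.5) -> float:
    """Information value of one categorical variable against attrition.

    labels: 1 = attrited, 0 = did not attrite. `smoothing` is added to every
    category count (an assumption) so empty cells do not give log(0).
    """
    attrited = Counter(v for v, y in zip(values, labels) if y == 1)
    stayed = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = set(attrited) | set(stayed)
    total_attrited = sum(attrited.values()) + smoothing * len(categories)
    total_stayed = sum(stayed.values()) + smoothing * len(categories)

    iv = 0.0
    for category in categories:
        share_attrited = (attrited[category] + smoothing) / total_attrited
        share_stayed = (stayed[category] + smoothing) / total_stayed
        # (distribution difference) x (weight of evidence) for this category
        iv += (share_stayed - share_attrited) * math.log(share_stayed / share_attrited)
    return iv
```

Continuous variables such as age or TER would need to be binned into categories first; each term is non-negative, so IV is zero only when every category has the same share of attriters and non-attriters.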
Attrition Model before Semester 1
Due to the non-linear, interactive and categorical nature of the predictive characteristics, a CHAID (Chi-squared Automatic Interaction Detector) tree was found to perform best.
The natural first split of the CHAID tree is narrow funding category.
We instead force an undergraduate/postgraduate first split, which is highly correlated with narrow funding category.
Funding Category | Postgrad | Undergrad | Overall %
Commonwealth Government Supported | 35% | 86% | 70%
Domestic Tuition Fee | 47% | 0% | 15%
Fee-paying Overseas | 18% | 13% | 15%
UNKNOWN | 1% | 0% | 1%
Total | 100% | 100% | 100%
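CHAID chooses each split using a chi-squared test of the candidate predictor against the target. The sketch below computes only the Pearson statistic for one contingency table, plus a p-value for the 1-degree-of-freedom case (e.g. undergraduate/postgraduate vs attrited/stayed); full CHAID also merges similar categories and uses Bonferroni-adjusted p-values, which are omitted here. The counts in the usage lines are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi_square(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table: one row per predictor category, one column per outcome
    (e.g. [attrited, stayed])."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_1dof(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: rows = undergraduate, postgraduate; cols = attrited, stayed.
    stat, dof = chi_square([[300, 1700], [100, 900]])
    print(stat, dof, p_value_1dof(stat))
```

The 1-dof formula uses the fact that a chi-square variable with one degree of freedom is a squared standard normal; tables with more categories need the general chi-square distribution.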
First Semester Undergraduate – First 13 Nodes
First Semester Postgraduate – First 14 Nodes
Predictivity of Postgrad/Undergrad Models
[Chart: Semester 1 predictivity (Gini, axis 0 to 0.4) of the CHAID model for undergraduates and postgraduates.]
Detailed Basis Of Admission
Detailed Basis Of Admission
Narrow Funding Category
Commonwealth Government supported students make up the majority of the population (80%)
Domestic tuition fee paying students
Funding category is closely related to undergraduate/postgraduate status
Unknown represents incomplete data for 120 students (<1%)
Age At Enrolment (Undergraduate)
Age At Enrolment (Postgraduate)
Attendance Type
Broad Citizenship
Advanced Standing
Birth Country