Aspiring Minds

www.aspiringminds.com

Grading Programs using Machine Learning

Varun Aggarwal

Presented at KDD, 2014

Programming Assessments: Existing solutions

• Manual evaluation: Can’t scale; not standardized

• Test-case based evaluation:

• High false-positives – hard code, inadvertent errors

• High false-negatives – correct code but not efficient

• Similarity metric between control flow graphs, syntax trees:

• Need to handle multiple correct implementations – theoretically doesn’t fit in

• No mapping of metric to an objective feedback

Automatic grading of programs– Why?

• Widely performed - will help professors and TAs save a lot of time.

• Companies can recruit efficiently

• MOOCs - need automated open response assessments to really make it effective. True scaling of such system currently not achieved.

A model to predict the logical correctness of a program, given the control and data

dependencies it possesses.

Our Approach

Automata – Automatic program evaluation engine

Machine Learning based scoring engine

Evaluation of programming best practices

Asymptotic complexity evaluation

Lint-styled rule-based system to detect programs not following programming best

practices.

Measures the run-time of the code for various input sizes

and empirically derives the complexity.

Why programming modules give a better test-shortlist rate ?

• Programming has more predictive power in identifying good performers than Logical ability.

• Due to lower predictive power of Logical, a higher cut-off has to be applied to it as compared to Programming to get the same organizational efficiency.

• Higher the Programming capability of the person, requirement on Logical score is lesser.

• Given the person is lower than a given score on Programming, even having a higher logical ability does not help.

Evaluation Rubric

ML based scoring

Understanding the human process

Problem and Language independent

Features

Machine learning model

Ungraded programs

Graded programs

Predicted grades

Evaluation Rubric

Our Approach

Features

Ungraded programs

Graded programs

Predicted grades

Evaluation Rubric

Our Approach

Features

Ungraded programs

Graded programs

Predicted grades

Evaluation Rubric

Our Approach

Features

Ungraded programs

Graded programs

Predicted grades

void print_1(int N){ for(i =1 ; i<=N; i++){ print newline;

count = i; for(j=0; j<i; j++) print count;

count++; }}

12 33 4 54 5 6 7

OBJECTIVE To print the pattern of integers

An implementation

1. Are there loops? Are there print statements?

3. Is the conditional in the inner loop dependent on - a variable modified in the outer loop? - a variable used in the conditional of the outer

What does a grader look for?

2. Is there a nested-loop structure?

Grammar for expressing features• Simple features

• Keywords and Tokens (Counts):

• Tokens like for, if, return, break; function calls like printf, strrev, strcat; declarations like int, char• Operators like various arithmetic, logical, relational operators used• Character constants like ‘\0’, ‘ ’, ‘65’, ‘96’

• Capturing logical constructs (Interactions)

• Control flow structure

• Data-dependencies

• Data-dependencies in context of control-flow

CONTROL FEATURES – COUNTS

Counts of control-related keywords/tokens Ex. count(for) = 2

count(for-in-for) = 1 count(while) = 0

Control-context of these keywords- The Print command as loop(loop(print)))

for(i =1 ; i<=N; i++){

print newline;

count = i;

CONTROL FLOW GRAPH

i <= N

count = ij = 0

print(count)count++

Loop 1

Loop 2

Parent scope

for(j=0; j<i; j++)

print count; count++;

void print(int N){

TARGET PROGRAM

DATA OPERATION FEATURES IN CONTROL-CONTEXT

Counts of data-related tokens in context of the control structureEx. count(block1 :loop(loop(++))) = 2

count(block1 :loop(loop_cond(<))) = 1

Capture control-context of data-dependencies in groups of expressions

i++ j < i : var (i) related to var (j) : appearing in a loop(loop_cond) previously incremented : appearing in a loop The relation and the increment happen in the same block

Loop 1

Loop 2

Loop 1

Loop 2

Loop 1

Loop 2

Loop 1

i <= N

print(count)

count = ii++

count = 0

count++

Parent scope Parent scope

Loop 1

Loop 2

CONTROL FLOW INFORMATION ANNOTATED IN A-D-D GRAPH

Loop 1

• Deployed Automata in a major product-based company’s recruitment

• Analyzed the performance improvement in using Automata over test-case pass based selection criterion

• 22.6% candidates who were not being shortlisted through test-case pass were now shortlisted using Automata.

Case study

Experimental Results

Sort Problem

Doing it the one-class way!

PROBLEM All features Basic featuresMean Min25 Mean Min25

1 0.57 0.61 0.52 0.562 0.80 0.83 0.72 0.75

3 0.75 0.81 0.59 0.734 0.81 0.81 0.75 0.755 0.68 0.69 0.55 0.61

Betters test-case in all, but one

How good is the final ML-based score?

Validation Correlation >= 0.79Matches Inter-rater Correlation between two human raters

PROBLEM # of features Cross-val correl Train correl Validation correl Test Case Score

1 80 0.61 0.85 0.79 0.54

2 68 0.77 0.93 0.91 0.80

3 193 0.91 0.98 0.90 0.64

4 66 0.90 0.94 0.90 0.80

5 87 0.81 0.92 0.84 0.84

Can we get insight? • The most contributing feature for Find Digit problem -

int findDigit(int N, int digit){

LOOP (N != <constant value>){

N = N / <constant value>

Features for FindDigit problem analyzed. Given a multi-digit number and a digit, one has to find the number of times the digit appears in the number

Yes, we can! • The most contributing feature for Find Digit problem -

LOOP (N != <constant value>){

N = N / <constant value>

while(N != 0){

d = N%10;

if(d == digit)

N = N / 10;

Evaluation Rubric

Score Interpretation5 Completely correct and efficient

An efficient implementation of the problem using right control structures and data-dependencies.

4 Correct with some silly errorsCorrect control structures and closely matching data-dependencies. Some silly mistakes fail the code to pass test-cases.

3 Inconsistent logical structuresRight control structures start exist with few correct data dependencies

2 Emerging basic structuresAppropriate keywords and tokens present, showing some understanding of the problem

1 Gibberish codeSeemingly unrelated to problem at hand.

Automata – Sample report

Candidate’s source codeFeedback on programming best practices

Asymptotic complexity of the candidate’s solution

Test case pass/fail information

Problem summary

Do our fancy features help?

Control and Data dependency features add around 0.15 correlation points above token information.

PROBLEM Type of feature # of features Cross-val correl Train correl Validation correl

1All, w/o test case 35 0.57 0.72 0.56

Basic 60 0.62 0.87 0.41

2All, w/o test case 80 0.81 0.99 0.80

Basic 26 0.59 0.72 0.67

3All, w/o test case 190 0.87 0.97 0.90

Basic 26 0.74 0.89 0.74

4All, w/o test case 134 0.85 0.91 0.82

Basic 35 0.83 0.88 0.69

5All, w/o test case 166 0.66 0.81 0.64

Basic 40 0.61 0.78 0.61

Conclusion

• We propose the first machine learning based approach to automatically grade programs

• An innovative feature grammar is proposed which matches human intuition of grading programs.

• Models built for sample problems show promising results.

• We propose and demonstrate machine learning techniques to lower the need of human-graded data to build models.

Aspiring Minds | Automata

Transcript of Aspiring Minds | Automata

Aspiring Minds | Automata

Technology

Transcript of Aspiring Minds | Automata

Chapter 4 Finite Automata - comp.utm.my · Different Kinds of Automata •Automata are distinguished by the temporary memory •Finite Automata : no temporary memory •Pushdown Automata

(AMCAT) ASPIRING MINDS COMPUTER ADAPTIVE TEST A B · PDF file(AMCAT) ASPIRING MINDS COMPUTER ADAPTIVE TEST ... A - queue B - tree C - list D - stack 2. ... initially empty hash table

Himanshu Aggarwal Aspiring Minds Credentialing and efficient talent mobility…

Aspiring Minds | Outcomes using test scores

QUANTITATIVE ABILITY - Aspiring Minds Ability... · the importance of quantitative ability in engineering domain trainability as well, be it electronics, mechanical or civil. The

NATIONAL G EMPLOYABILITY REPORT - … · AMCAT® - Aspiring Minds’ flagship product is India’s Largest Employability Test. Conducted across the country throughout the year, ...

A typical employment ad. Data / findings source National Employability Study by Aspiring Minds: AMCAT scores of more than 120,000 technical graduates.

SKILLS - Aspiring Minds...is based on standardized rubrics of evaluation benchmarked against experts for every job role. The test suite is available in multiple languages, delivered

Aspiring Minds' Campus Analysis Report Gaya College of ...€¦ · Aspiring Minds' Campus Analysis Report Gaya College of Engineering,2019 (B.Tech/B.E, 2019) Aspiring Minds Assessment

NATIONAL A EMPLOYABILITY REPORT - Aspiring Minds · management graduates. The ‘National Employability Report: Graduates, Annual Report 2013’ quantifies their employability and

2021 Aspiring Minds Program - Welcome Evening · 2021 Aspiring Minds Program Welcome Evening Thursday 28 January 2021. Tonight’s Program 1. Learn about Brisbane State High School

ENGLISH - Aspiring Minds€¦module of AMCAT (Aspiring Minds Computer Adaptive Test), which is widely recognized as India's largest and only standardized employability test. The AMCAT

Aspiring to excellence

SWAYAM Courses: At a GlanceofActive Learning for Young Aspiring Minds), an indigenously developed platform aimed at providing learningopportunities to the learners through MOOCs (Massive

Aspiring Minds In Forbes India

AspIRIng gROwth

Aspiring Minds Sample Papers

Read the entire Article - Aspiring Minds

ONLINE EXAMINATIONS PROCEDURE & INSTRUCTIONS: … · 2020. 12. 17. · Exam Cell & Aspiring Minds to give a fair knowledge to Sathyabama IST students about the i-Assess platform in

WHAT'S TRENDING WITH UNIVERSITY SOFTWARE ENGINEERING COURSES?€¦ · RATIONALE •According to a survey conducted by Aspiring Minds [1] for employability focused study “As many