Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Post on 14-Jan-2015

Neil Brown, Research Associate at the University of Kent, presented this talk at Digibury on November 13, 2013. In it he explored how people learn to program, what they find difficult and what problems slow them down.

Observing programming novices

Neil Brown
University of Kent
@twistedsq

Digibury, 13 Nov 2013

Neil Brown, University of Kent, @twistedsq

How do people learn to program? (and how can we help them?)

How can we find this out at a large scale?

What We Make

BlueJ: 2.5 million users annually

Greenfoot: 0.4 million users annually

What Are They All Doing?

Some Small-Scale Studies

An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007

“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”

Study Size: 62 students

BIG DATA

Add recording to all BlueJ instances

(With explicit opt-in)

MEDIUM

How Much Data?

20,000 users per day ✓

≈ 25% opt-in? ✗ 40%

≈ 100KB data per user per day ✓

≈ 0.5GB per day ✗ ≈ 1 GB

≈ 200GB per year ✗ 300-400 GB
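The arithmetic behind these figures can be sketched as follows. This is a minimal illustration rather than the BlueJ team's own calculation: the class and method names are made up for this example, decimal units (1 GB = 1,000,000 KB) are assumed, and the slide's annotations are read as corrections to the original estimates (40% opt-in instead of 25%).

```java
// Hypothetical helper for the slide's back-of-envelope estimate.
class DataVolumeEstimate {
    // Daily volume in GB, assuming decimal units (1 GB = 1,000,000 KB).
    static double gbPerDay(int usersPerDay, double optInRate, double kbPerUserPerDay) {
        return usersPerDay * optInRate * kbPerUserPerDay / 1_000_000.0;
    }

    // Yearly volume in GB, assuming the same load on all 365 days.
    static double gbPerYear(double gbPerDay) {
        return gbPerDay * 365;
    }

    public static void main(String[] args) {
        double estimated = gbPerDay(20_000, 0.25, 100); // the original guess
        double observed = gbPerDay(20_000, 0.40, 100);  // with a 40% opt-in rate
        System.out.printf("estimated: %.1f GB/day, %.0f GB/year%n", estimated, gbPerYear(estimated));
        System.out.printf("observed:  %.1f GB/day, %.0f GB/year%n", observed, gbPerYear(observed));
    }
}
```

With the estimated inputs this gives 0.5 GB per day and roughly 180 GB per year, which the slide rounds to 200GB. At 40% opt-in it gives about 0.8 GB per day and roughly 290 GB per year, in line with the slide's "≈ 1 GB" and "300-400 GB".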

Headline statistics so far (5 months in)

140,000 opted-in users

600,000 projects

5,100,000 successful compilations

4,700,000 unsuccessful compilations

Hardware Specs

2 machines (1 for recording, 1 for analysis)

24 core 2.5GHz Xeon, 32GB RAM, 5TB RAID

Most common compile errors

Unknown variable 17%

Semi-colon expected 10%

Unknown method 7%

Bracket expected 7%

Unknown class 5%

Illegal start of expression 4%

Do they change during the term?
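One way percentages like these could be produced is to count how often each compiler message occurs and divide by the total. The sketch below assumes the input is a plain list of message strings; the class name, method name and sample data are hypothetical and do not reflect how the BlueJ data is actually stored or analysed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical tally of compiler error messages.
class ErrorTally {
    // Returns each distinct message with its share of all messages (in percent),
    // ordered from most to least frequent.
    static Map<String, Double> percentages(List<String> messages) {
        Map<String, Long> counts = messages.stream()
                .collect(Collectors.groupingBy(m -> m, Collectors.counting()));
        Map<String, Double> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> result.put(e.getKey(), 100.0 * e.getValue() / messages.size()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample, not real data from the study.
        List<String> sample = List.of(
                "Unknown variable", "Semi-colon expected", "Unknown variable", "Unknown method");
        percentages(sample).forEach((msg, pct) -> System.out.printf("%-20s %.0f%%%n", msg, pct));
    }
}
```

Answering the question of change during the term would mean running the same tally separately per time window (for example per week) and comparing the results.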

Compile errors over time

Rarer compile errors

65th most common compilation error:

非法的表达式开始 ("illegal start of expression", in Chinese)

Problematic if statements

What does this code do?

if (x >= 6 && x <= 9)
{
    x = 0;
}

Problematic if statements

What does this code do?

if (x*x >= 36 && x*x <= 81);
{
    x = 0;
}
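The effect of the semicolon after the condition can be checked directly. In the sketch below (class and method names are illustrative only), the first method is the version a novice most likely intended, and the second reproduces the slide's code: the semicolon is an empty statement that forms the whole body of the if, so the block that follows runs unconditionally.

```java
// Illustrative only: contrasts an if statement with and without a stray semicolon.
class StraySemicolon {
    // Intended behaviour: reset x only when x*x lies between 36 and 81.
    static int intended(int x) {
        if (x * x >= 36 && x * x <= 81) {
            x = 0;
        }
        return x;
    }

    // As on the slide: the ';' ends the if statement with an empty body,
    // so the following block always runs.
    static int withStraySemicolon(int x) {
        if (x * x >= 36 && x * x <= 81);
        {
            x = 0;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(intended(3));            // prints 3
        System.out.println(withStraySemicolon(3));  // prints 0
        System.out.println(intended(7));            // prints 0
    }
}
```

The code compiles without error, which is what makes the mistake hard for a novice to spot.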

Problematic if statements

How prevalent is this mistake?

How long does it take before people fix it?

Appeared in 0.15% of source files

Later fixed in half of them...
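A rough way to search source files for this mistake is a regular expression that looks for an if condition followed directly by a semicolon. The sketch below is an assumption about how such a check might look, not the analysis used in the talk. It handles at most one level of nested parentheses in the condition and would also flag occurrences inside comments or string literals.

```java
import java.util.regex.Pattern;

// Hypothetical detector for an if condition followed directly by a semicolon.
class EmptyIfDetector {
    // Matches "if (...)" followed by optional whitespace and ';'.
    // The condition may contain one level of nested parentheses.
    private static final Pattern EMPTY_IF =
            Pattern.compile("\\bif\\s*\\((?:[^()]|\\([^()]*\\))*\\)\\s*;");

    static boolean hasEmptyIf(String source) {
        return EMPTY_IF.matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println(hasEmptyIf("if (x*x >= 36 && x*x <= 81);{ x = 0;}")); // prints true
        System.out.println(hasEmptyIf("if (x >= 6 && x <= 9){ x = 0;}"));        // prints false
    }
}
```

Running a check like this over successive versions of the same file would show whether, and after how long, the semicolon was removed.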

Challenges

A lot of data -- and a lot of method questions, e.g.

- How do you measure error difficulty?
- What is a frequent error? (what is worth caring about?)
- How much can you get from this kind of data-set?

Scaling the analysis (already maxing out 24 cores)

Questions?