Transcript
Page 1: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Observing programming novices

Neil Brown, University of Kent, @twistedsq

Digibury, 13 Nov 2013

Page 2: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Neil Brown, University of Kent, @twistedsq

How do people learn to program? (And how can we help them?)

How can we find this out at a large scale?

Page 3: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What We Make

BlueJ Greenfoot

Page 6: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What We Make

BlueJ: 2.5 million users annually

Greenfoot: 0.4 million users annually

Page 7: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What Are They All Doing?

Page 8: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Some Small-Scale Studies

An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007

“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”

Page 9: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Study Size: 62 students

Page 10: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

BIG DATA

Add recording to all BlueJ instances

(With explicit opt-in)

Page 11: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

MEDIUM

Page 12: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

How Much Data?

20,000 users per day

≈ 25% opt-in?

≈ 100KB data per user per day

≈ 0.5GB per day

≈ 200GB per year

Page 13: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

How Much Data?

20,000 users per day ✓

≈ 25% opt-in? ✗ 40%

≈ 100KB data per user per day ✓

≈ 0.5GB per day ✗ ≈ 1 GB

≈ 200GB per year ✗ 300-400 GB
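The back-of-envelope arithmetic behind these figures can be reproduced in a few lines. The sketch below is illustrative only: the class and method names are made up for this example, and it assumes decimal units (1 GB = 1,000,000 KB) and a 365-day year, neither of which the slides state.

```java
public class DataVolumeEstimate {
    /** Daily volume in GB, assuming decimal units: 1 GB = 1,000,000 KB. */
    static double gbPerDay(int usersPerDay, double optInRate, double kbPerUserPerDay) {
        return usersPerDay * optInRate * kbPerUserPerDay / 1_000_000.0;
    }

    /** Yearly volume in GB, assuming the daily rate holds for 365 days. */
    static double gbPerYear(double gbPerDay) {
        return gbPerDay * 365;
    }

    public static void main(String[] args) {
        // Planning figures from the slide: 20,000 users, 25% opt-in, 100 KB each.
        double estimated = gbPerDay(20_000, 0.25, 100);
        // Same formula with the 40% opt-in rate reported as the actual value.
        double observed = gbPerDay(20_000, 0.40, 100);

        System.out.printf("estimated: %.1f GB/day, %.0f GB/year%n",
                estimated, gbPerYear(estimated));
        System.out.printf("observed:  %.1f GB/day, %.0f GB/year%n",
                observed, gbPerYear(observed));
    }
}
```

The planning figures give 0.5 GB per day and a little over 180 GB per year, which the slide rounds to 200GB. With the 40% opt-in rate the same formula gives about 0.8 GB per day and roughly 290 GB per year, in line with the reported ≈ 1 GB and 300-400 GB.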

Page 14: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Headline statistics so far (5 months in)

140,000 opted-in users

600,000 projects

5,100,000 successful compilations

4,700,000 unsuccessful compilations
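A few ratios follow directly from these headline numbers. The sketch below only divides the figures on the slide; the class and method names are hypothetical, and the per-user averages assume every opted-in user is counted once.

```java
public class HeadlineRatios {
    /** Share of all compilations that failed, as a fraction between 0 and 1. */
    static double failureShare(long successful, long unsuccessful) {
        return (double) unsuccessful / (successful + unsuccessful);
    }

    /** Simple average of a total over a number of users. */
    static double perUser(long total, long users) {
        return (double) total / users;
    }

    public static void main(String[] args) {
        long users = 140_000;
        long projects = 600_000;
        long successful = 5_100_000;
        long unsuccessful = 4_700_000;

        System.out.printf("failed compilations: %.1f%%%n",
                100 * failureShare(successful, unsuccessful));
        System.out.printf("projects per user: %.1f%n", perUser(projects, users));
        System.out.printf("compilations per user: %.0f%n",
                perUser(successful + unsuccessful, users));
    }
}
```

On these figures just under half of all compilations fail (about 48%), and the average opted-in user has roughly 4 projects and 70 compilations.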

Page 15: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Hardware Specs

2 machines (1 for recording, 1 for analysis)

24-core 2.5GHz Xeon, 32GB RAM, 5TB RAID

Page 16: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Most common compile errors

Unknown variable 17%

Semi-colon expected 10%

Unknown method 7%

Bracket expected 7%

Unknown class 5%

Illegal start of expression 4%
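A ranking like this can be produced by tallying error messages and converting the counts to percentages. The sketch below is one possible way to do that, not the project's actual analysis code: the class name, method name, and sample messages are all made up, and it assumes each recorded error has already been reduced to a category string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ErrorFrequency {
    /** Maps each distinct message to its percentage share, most common first. */
    static Map<String, Double> percentages(List<String> messages) {
        // Count how often each message occurs.
        Map<String, Long> counts = messages.stream()
                .collect(Collectors.groupingBy(m -> m, Collectors.counting()));

        // Sort by count, descending, and keep that order in a LinkedHashMap.
        // The order of messages with equal counts is unspecified.
        Map<String, Double> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> result.put(e.getKey(),
                        100.0 * e.getValue() / messages.size()));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the real data set.
        List<String> sample = Arrays.asList(
                "Unknown variable", "Semi-colon expected", "Unknown variable",
                "Unknown method", "Unknown variable", "Semi-colon expected");
        percentages(sample).forEach((msg, pct) ->
                System.out.printf("%-22s %.0f%%%n", msg, pct));
    }
}
```

Grouping on the message text alone treats translated compiler messages as separate errors, so a real analysis would need to normalise them first.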

Page 17: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Do they change during the term?

Page 18: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Compile errors over time

Page 19: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Rarer compile errors

65th most common compilation error:

非法的表达式开始 (Chinese for "illegal start of expression")

Page 21: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

What does this code do?

if (x >= 6 && x <= 9){ x = 0;}

Page 22: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

What does this code do?

if (x*x >= 36 && x*x <= 81);{ x = 0;}
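The semicolon after the condition is easy to miss. In Java it is an empty statement that becomes the entire body of the `if`, so the braces that follow form an ordinary block that always runs. The sketch below contrasts the two forms; the class and method names are made up for this illustration.

```java
public class StraySemicolonDemo {
    // Intended form: the block is the body of the if, so x is reset
    // only when the condition holds.
    static int intended(int x) {
        if (x * x >= 36 && x * x <= 81) {
            x = 0;
        }
        return x;
    }

    // Form from the slide: the ';' ends the if statement with an empty
    // body, and the block after it executes unconditionally.
    static int withStraySemicolon(int x) {
        if (x * x >= 36 && x * x <= 81);
        {
            x = 0;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int x : new int[] {3, 7, 12}) {
            System.out.println(x + " -> intended: " + intended(x)
                    + ", with stray semicolon: " + withStraySemicolon(x));
        }
    }
}
```

For the inputs 3, 7, and 12 the intended version returns 3, 0, and 12, while the version with the stray semicolon returns 0 every time. The code compiles without error, which is why the mistake can go unnoticed.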

Page 23: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

How prevalent is this mistake?

How long does it take before people fix it?

Appeared in 0.15% of source files

Later fixed in half of them...
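Measuring prevalence requires some way to spot the pattern in source text. The talk does not say how this was done; the sketch below is a simple assumed heuristic that looks for an `if` whose parenthesised condition is followed directly by a semicolon. All names are hypothetical, and the scanner ignores comments and string literals, so it can report false positives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyIfFinder {
    /** Returns the offset of every "if (...)" that is immediately followed by ';'. */
    static List<Integer> findEmptyIfs(String src) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        while ((i = src.indexOf("if", i)) >= 0) {
            int start = i;
            i += 2;
            // Skip "if" that is part of a longer identifier, e.g. "notify".
            boolean wordBefore = start > 0
                    && Character.isJavaIdentifierPart(src.charAt(start - 1));
            boolean wordAfter = i < src.length()
                    && Character.isJavaIdentifierPart(src.charAt(i));
            if (wordBefore || wordAfter) continue;

            int open = skipWhitespace(src, i);
            if (open >= src.length() || src.charAt(open) != '(') continue;

            // Walk forward to the parenthesis that closes the condition.
            int depth = 0;
            int close = open;
            for (; close < src.length(); close++) {
                char c = src.charAt(close);
                if (c == '(') depth++;
                else if (c == ')' && --depth == 0) break;
            }
            if (close >= src.length()) continue; // unbalanced parentheses

            int next = skipWhitespace(src, close + 1);
            if (next < src.length() && src.charAt(next) == ';') hits.add(start);
        }
        return hits;
    }

    private static int skipWhitespace(String s, int from) {
        while (from < s.length() && Character.isWhitespace(s.charAt(from))) from++;
        return from;
    }

    /** Fraction of files (0.0 to 1.0) that contain at least one hit. */
    static double fractionAffected(List<String> files) {
        if (files.isEmpty()) return 0.0;
        long affected = files.stream().filter(f -> !findEmptyIfs(f).isEmpty()).count();
        return (double) affected / files.size();
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "if (x >= 6 && x <= 9){ x = 0;}",
                "if (x*x >= 36 && x*x <= 81);{ x = 0;}");
        System.out.println("fraction affected: " + fractionAffected(files));
    }
}
```

Running the same check over successive snapshots of a file would show whether, and how quickly, the semicolon was later removed.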

Page 24: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

Page 25: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Challenges

A lot of data, and a lot of method questions, e.g.

- How do you measure error difficulty?
- What is a frequent error? (What is worth caring about?)
- How much can you get from this kind of data-set?

Scaling the analysis (already maxing out 24 cores)

Questions?