Digibury: Neil Brown - Observing Programming Novices on a Large Scale
Transcript of Digibury: Neil Brown - Observing Programming Novices on a Large Scale
Observing programming novices
Neil Brown, University of Kent, @twistedsq
Digibury, 13 Nov 2013
Neil Brown, University of Kent, @twistedsq
How do people learn to program? (And how can we help them?)
How can we find this out at a large scale?
What We Make
BlueJ: 2.5 million users annually
Greenfoot: 0.4 million users annually
What Are They All Doing?
Some Small-Scale Studies
An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007
“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”
Study Size: 62 students
BIG DATA (or rather, MEDIUM data)
Add recording to all BlueJ instances
(With explicit opt-in)
How Much Data?
Estimate, with outcome (✓ held, ✗ did not):
20,000 users per day ✓
≈ 25% opt-in? ✗ 40%
≈ 100KB data per user per day ✓
≈ 0.5GB per day ✗ ≈ 1 GB
≈ 200GB per year ✗ 300-400 GB
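The estimate on this slide is a straight multiplication, sketched below. The class and method names are hypothetical, and the sketch assumes decimal units (1 GB = 1,000,000 KB) and usage on all 365 days of the year, neither of which the slide states.

```java
final class DataVolumeEstimate {
    /** Estimated upload volume in GB per day (assumes 1 GB = 1,000,000 KB). */
    static double gbPerDay(int usersPerDay, double optInRate, double kbPerUserPerDay) {
        return usersPerDay * optInRate * kbPerUserPerDay / 1_000_000.0;
    }

    /** Scales a daily volume to a year (assumes the same volume on all 365 days). */
    static double gbPerYear(double gbPerDay) {
        return gbPerDay * 365;
    }

    public static void main(String[] args) {
        // Figures from the slide: 20,000 users/day, 100KB per user per day.
        double planned = gbPerDay(20_000, 0.25, 100);   // original 25% opt-in guess
        double revised = gbPerDay(20_000, 0.40, 100);   // observed 40% opt-in
        System.out.printf("planned: %.2f GB/day, %.1f GB/year%n", planned, gbPerYear(planned));
        System.out.printf("revised: %.2f GB/day, %.1f GB/year%n", revised, gbPerYear(revised));
    }
}
```

With the 25% guess this gives 0.5 GB per day and roughly 180 GB per year, in line with the slide's rounded estimate. Swapping in the observed 40% opt-in gives 0.8 GB per day and just under 300 GB per year, which is roughly consistent with the reported ≈ 1 GB and 300-400 GB.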
Headline statistics so far (5 months in)
140,000 opted-in users
600,000 projects
5,100,000 successful compilations
4,700,000 unsuccessful compilations
Hardware Specs
2 machines (1 for recording, 1 for analysis)
24 core 2.5 GHz Xeon, 32GB RAM, 5TB RAID
Most common compile errors
Unknown variable 17%
Semi-colon expected 10%
Unknown method 7%
Bracket expected 7%
Unknown class 5%
Illegal start of expression 4%
Do they change during the term?
Compile errors over time
Rarer compile errors
65th most common compilation error:
非法的表达式开始 ("illegal start of expression", as reported by a Chinese-language compiler)
Problematic if statements
What does this code do?
if (x >= 6 && x <= 9)
{
    x = 0;
}
Problematic if statements
What does this code do?
if (x*x >= 36 && x*x <= 81);
{
    x = 0;
}
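To make the question concrete, here is a small sketch comparing the snippet as written with the same code minus the semicolon after the condition. The class and method names are hypothetical. In Java the semicolon is an empty statement, so the if controls nothing and the block that follows runs every time.

```java
final class StraySemicolonDemo {
    /** As on the slide: the ';' ends the if, so the block below is unconditional. */
    static int withStraySemicolon(int x) {
        if (x * x >= 36 && x * x <= 81);   // empty statement is the whole if body
        {
            x = 0;                          // plain block, always executed
        }
        return x;
    }

    /** Same code without the semicolon: the block is the body of the if. */
    static int withoutSemicolon(int x) {
        if (x * x >= 36 && x * x <= 81) {
            x = 0;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int x : new int[] {1, 7, 20}) {
            System.out.println(x + " -> " + withStraySemicolon(x)
                    + " (stray semicolon), " + withoutSemicolon(x) + " (without)");
        }
    }
}
```

The version with the semicolon returns 0 for every input, while the version without it only resets values whose square lies between 36 and 81. The code compiles without an error either way, which is part of why the mistake is hard to spot.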
Problematic if statements
How prevalent is this mistake?
How long does it take before people fix it?
Appeared in 0.15% of source files
Later fixed in half of them...
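One way such a mistake could be counted across recorded source files is a simple text scan. The sketch below is an assumption for illustration, not the analysis used in the talk. The class and method names are hypothetical, and the scan ignores comments and string literals, so it can miscount in those cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmptyIfBodyScanner {
    // The keyword "if", optional whitespace, then the opening parenthesis.
    private static final Pattern IF_OPEN = Pattern.compile("\\bif\\s*\\(");

    /** Counts if statements whose condition is followed directly by ';'. */
    static int countEmptyIfBodies(String source) {
        int count = 0;
        Matcher m = IF_OPEN.matcher(source);
        while (m.find()) {
            int i = m.end();   // just past the opening parenthesis
            int depth = 1;
            // Walk forward to the parenthesis that closes the condition.
            while (i < source.length() && depth > 0) {
                char c = source.charAt(i++);
                if (c == '(') depth++;
                else if (c == ')') depth--;
            }
            // Skip whitespace between the condition and whatever follows it.
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) i++;
            if (depth == 0 && i < source.length() && source.charAt(i) == ';') count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEmptyIfBodies("if (x*x >= 36 && x*x <= 81);{ x = 0;}"));
        System.out.println(countEmptyIfBodies("if (x >= 6 && x <= 9){ x = 0;}"));
    }
}
```

Running the scan over successive snapshots of the same file would also show whether, and how quickly, the semicolon was later removed.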
Challenges
A lot of data, and a lot of method questions, e.g.
- How do you measure error difficulty?
- What is a frequent error? (What is worth caring about?)
- How much can you get from this kind of data-set?
Scaling the analysis (already maxing out 24 cores)
Questions?