Designing Programming Languages to Improve Software Quality David J. Pearce Software Quality New...

29
Designing Programming Languages to Improve Software Quality David J. Pearce Software Quality New Zealand, August 2013 @whileydave http://whiley.org http://github.com/DavePearce

Transcript of Designing Programming Languages to Improve Software Quality David J. Pearce Software Quality New...

Designing Programming Languages to Improve Software Quality

David J. Pearce

Software Quality New Zealand, August 2013

@whileydavehttp://whiley.org

http://github.com/DavePearce

Null Pointers

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965”

-- Tony Hoare

“Of all the exceptions a Java programmer might encounter, the null-pointer exception is among the most dreaded, and for good reason: it is one of the least informative exceptions that a program can signal.”

-- Eric Allen

(from java/lang/String.java)

Experiences with Null Pointers

/**

* …

* @throws NullPointerException if sb is

* null

*/

boolean contentEquals(StringBuffer sb) {

synchronized(sb) { … }

}

(also from java/lang/String.java)

/**

* Tests if this string starts with the

* specified prefix.

*

* @param prefix the prefix.

* @return true if the character sequence…

*/

public boolean startsWith(String prefix) {

char pa[] = prefix.value;

}

/**

* Tests if this string starts with the

* specified prefix.

*

* @param prefix the prefix.

* @return true if the character sequence…

*/

public boolean startsWith(String prefix) {

char pa[] = prefix.value;

}

Found 83/1101 java.lang methods were misdocumented!

Thoughts

• Why is documentation bad?– Because programmers write it– Programmers have deadlines– Documentation doesn’t help meet deadlines

• When is documentation not bad?– Think about types in Java– Method types automatically documented!– Compiler ensures they are accurate

@NonNull Types

void f(@NonNull Integer x) {

x.toString(); // safe!

}

void g(Integer x) {

x.toString(); // Syntax Error

}

void h(Integer x) {

if(x != null) {

x.toString(); // Safe!

} }

JML Specifications

public class IntVec { private int data[];

private int length = 0;

//@ invariant data != null;

//@ invariant length >= 0;

//@ invariant length <= data.length;

//@ requires size > 0;

public IntVec(int size) { data = new int[size]; }

//@ ensures \result >= 0;

public int size() { return length; }

More JML Specifications

/*@ public normal_behavior

@ requires s1 != null && s2 != null;

@ {|

@ requires s2.length==0 && s1.length==0; ensures !\result;

@ ensures \result == (

@ (\exists int i; 0<=i && i<s1.length && i<s2.length;

@ s1[i]<s2[i] && equal(s1,0,s2,0,i))

@ ||

@ (s1.length<s2.length && equal(s1,0,s2,0,s1.length)));

@ |}

@*/ public static pure model boolean lessThan(char[] s1, char[] s2);

(from JML spec of java/lang/String.java)

So … ?

• PROBLEM: modern languages make compile-time verification unnecessarily hard…

• ANSWER: design a language to simplify verification:– Pure Functions vs Impure Methods– Value Semantics– Structural subtyping & Flow Typing– Unbound Integers and Rationals– First-class Collections (sets, lists, maps)– Strict Concurrency Model (e.g. non-reentrancy)

Whiley

Overview

• What is Whiley?– Hybrid functional / imperative language– Designed specifically for verification– Compiles to JVM (also prototype C backend)

• Why another language?– Verification is really hard– Many features of Java it even harder!– I think it’s basically impossible for Java– See ESC/Java and JML as good efforts here

A Zoo of Unusual Types!

• Primitives:– e.g.

• Collections (lists, sets, maps):– e.g.

• Records and Tuples:– e.g.

• Unions and Intersections:– e.g.

• Negations– e.g.

[int] {string} {int=>string}

!int

any int real charnull bool

{int x, int y} (int,int)

int|null int&null

Flow Typing

Flow Typing

• A flow-sensitive approach to type checking• Types declared only for parameters and returns• Variables can have different types!• Conditionals and/or assignments cause retyping

int sum([int] items):

r = 0

for item in items:

r = r + item

return r

Flow Typing

• Type tests automatically retype variables!– (even on the false branch)

define Circle as {int x, int y, int r} define Rect as {int x, int y, int w, int h}

define Shape as Circle | Rect

real area(Shape s):

if s is Circle:

return PI * s.r * s.r

else:

return s.w * s.h

Flow Typing & Unions

• Cannot treat null|int like an int• Must distinguish cases by explicit type testing

null|int indexOf(string str, char c): …

[string] split(string str, char c): idx = indexOf(str,c) if idx is int: below = str[0..idx] above = str[idx..] return [below,above] else: return [str]

Can safely treat x as int here

Verification

Verification

• Function f():– Accepts an arbitrary integer …– Should return a natural number …– But, this implementation is broken!

define nat as int where $ >= 0

nat f(int x):

return x A compile time error!

define nat as int where $ >= 0

nat f(int x):

if x >= 0:

return x

else:

return 0

• Function f():– Accepts an arbitrary integer …– Returns a natural number …– This implementation satisfies the spec!

OK, because x implicitly a nat

Verification

• Function g():– Accepts a positive number …– And returns a natural number …– But, how to know pos subtypes nat ?

define nat as int where $ >= 0

define pos as int where $ > 0

nat g(pos x):

return xOK, because pos

implies nat

Verification

• Function h():– Accepts a natural number …– And returns a positive number …– But, how to know nat+1 gives pos ?

define nat as int where $ >= 0

define pos as int where $ > 0

pos h(nat x):

return x + 1OK, because

nat+1 gives pos

Verification

• Function h1() and h2() are identical

define nat as int where $ >= 0

define pos as int where $ > 0

pos h1(nat x):

return x + 1

int h2(int x) requires x>=0, ensures $>0:

return x + 1

Verification

• Function sum():– Accepts a list of natural numbers …– Then adds them together …– And returns a natural number.

define nat as int where $ >= 0

nat sum([nat] list): r = 0 for x in list where r >= 0: r = r + x return r

Ok, because adding nat to nat gives nat

Implementation

Compiler Overview

• Developed 2009 – present• 304 classes (80KLOC) with 934 end-end tests• Also, 133 verifier tests and 15178 type system unit tests

Performance

Eclipse Plugin

• Update Site: http://whiley.org/eclipse

http://whiley.org

@whileydavehttp://github.com/DavePearce