Principles of Software Construction: Objects, Design and...

74
toad Spring 2013 © 2012-13 J Aldrich, C Jaspan, C Garrod, C Kästner, and W Scherlis School of Computer Science School of Computer Science Principles of Software Construction: Objects, Design and Concurrency Static Analysis Jonathan Aldrich Charlie Garrod 15-214

Transcript of Principles of Software Construction: Objects, Design and...

Page 1: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad

Spring 2013

© 2012-13 J Aldrich, C Jaspan, C Garrod, C Kästner, and W Scherlis

School of Computer Science

School of Computer Science

Principles of Software Construction:

Objects, Design and Concurrency

Static Analysis

Jonathan Aldrich Charlie Garrod

15-214

Page 2: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 215-214 Aldrich

The four course themes

• Threads and Concurrency� Concurrency is a crucial system abstraction

� E.g., background computing while responding to users

� Concurrency is necessary for performance

� Multicore processors and distributed computing

� Our focus: application-level concurrency

� Cf. functional parallelism (150, 210) and systems concurrency (213)

• Object-oriented programming� For flexible designs and reusable code

� A primary paradigm in industry – basis for modern frameworks

� Focus on Java – used in industry, some upper-division courses

• Analysis and Modeling� Practical specification techniques and verification tools

� Address challenges of threading, correct library usage, etc.

• Design� Proposing and evaluating alternatives

� Modularity, information hiding, and planning for change

� Patterns: well-known solutions to design problems

Page 3: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 315-214 Aldrich

Static Analysis

• Analyzing the code, without executing it

• Find bugs / proof correctness

• Many flavors

Page 4: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 415-214 Aldrich

Find the Bug!

public void loadFile(String filename) throws IOExce ption {BufferedReader reader =

new BufferedReader(new FileReader(filename));

if (reader.readLine().equals("version 1.0"))return; // invalid version

String c;while ((c = reader.readLine()) != null) {

processData(c)}

reader.close();}

Page 5: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 515-214 Aldrich

Limits of Inspection

• People

• …are very high cost

• …make mistakes

• …have a memory limit

So, let’s automate inspection!

Page 6: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 615-214 Aldrich

Mental Model for Analyzing "Freed Resources"

free

allocated

allocateclose

close=>err(double close)

end path =>err(end path

without close)

Page 7: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 715-214 Aldrich

Find the Bug!

public void loadFile(String filename) throws IOExce ption {BufferedReader reader =

new BufferedReader(new FileReader(filename));

if (reader.readLine().equals("version 1.0"))return;

String c;while ((c = reader.readLine()) != null) {

processData(c)}

reader.close();}

initial state free

transition to allocated

final state allocated: ERROR!

transition to free

final state free is OK

Page 8: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 815-214 Aldrich

Static Analysis Finds “Mechanical” Errors

• Defects that result from inconsistently following simple, mechanical design rules

• Security vulnerabilities� Buffer overruns, unvalidated input…

• Memory errors� Null dereference, uninitialized data…

• Resource leaks� Memory, OS resources…

• Violations of API or framework rules� e.g. Windows device drivers; real time libraries; GUI frameworks

• Exceptions� Arithmetic/library/user-defined

• Encapsulation violations� Accessing internal data, calling private functions…

• Race conditions� Two threads access the same data without synchronization

Page 9: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 915-214 Aldrich

findbugs.sourceforge.net

Page 10: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1015-214 Aldrich

Example Tool: FindBugs

•Origin: research project at U. Maryland� Now freely available as open source� Standalone tool, plugins for Eclipse, etc.

•Checks over 250 “bug patterns”� Over 100 correctness bugs� Many style issues as well� Includes the two examples just shown

•Focus on simple, local checks� Similar to the patterns we’ve seen� But checks bytecode, not AST

• Harder to write, but more efficient and doesn’t require source

•http://findbugs.sourceforge.net/

Page 11: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1115-214 Aldrich

Example FindBugs Bug Patterns

•Correct equals()

•Use of ==

•Closing streams

•Illegal casts

•Null pointer dereference

•Infinite loops

•Encapsulation problems

•Inconsistent synchronization

• Inefficient String use

•Dead store to variable

Page 12: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1215-214 Aldrich

Demonstration: FindBugs

Page 13: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1315-214 Aldrich

Adapted from [Chaturvedi 2005]

Empirical Results on Static Analysis

•InfoSys study [Chaturvedi 2005]

� 5 projects� Average 700 function points each

� Compare inspection with and without static analysis

•Conclusions� Fewer defects� Higher productivity

Page 14: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1415-214 Aldrich

Outline

• Why static analysis?� Automated� Can find some errors faster than people� Can provide guarantees that some errors are found

• How does it work?

• What are the hard problems?

• How do we use real tools in an organization?

Page 15: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1515-214 Aldrich

Outline

• Why static analysis?

• How does it work?� Systematic exploration of program abstraction� Many kinds of analysis

•AST walker•Control-flow and data-flow•Type systems•Model checking

� Specifications frequently used for more information

• What are the hard problems?

• How do we use real tools in an organization?

Page 16: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1615-214 Aldrich

Abstract Interpretation

• Static program analysis is the systematic examination of an abstraction of a program’s state space

• Abstraction� Don’t track everything! (That’s normal interpretation)� Track an important abstraction

• Systematic� Ensure everything is checked in the same way

• Let’s start small…

Page 17: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1715-214 Aldrich

AST Analysis

Page 18: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1815-214 Aldrich

A Performance Analysis

What’s the performance problem?

public foo() {…

logger.debug(“We have ” + conn + “connections.”);}

Seems minor…but if this performance gain on 1000 servers means we need 1 less machine,we could be saving a lot of money

public foo() {…if (logger.inDebug()) {

logger.debug(“We have ” + conn + “connections.”);}

}

Page 19: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 1915-214 Aldrich

A Performance Analysis

•Check that we don’t create debug strings outside of a Logger.inDebug check

•Abstraction� Look for a call to Logger.debug()� Make sure it’s surrounded by an if (Logger.inDebug())

•Systematic� Check all the code

•Known as an Abstract Syntax Tree (AST) walker� Treats the code as a structured tree� Ignores control flow, variable values, and the heap� Code style checkers work the same way

• you should never be checking code style by hand

� Simplest static analysis: grep

Page 20: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2015-214 Aldrich

Abstract Syntax Trees

class X {Logger logger;public void foo() {

…if (logger.inDebug()) {

logger.debug(“We have ” + conn + “connections.”);

}}

}

class X

method foo

…fieldlogger

if stmt…

method invoc.

logger inDebug

block

method invoc.

logger debug parameter …

Page 21: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2115-214 Aldrich

Page 22: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2215-214 Aldrich

Type Checking

Page 23: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2315-214 Aldrich

Page 24: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2415-214 Aldrich

Type Checking

• Classifying values into types

• Checking whether operations are allowed on those types

• Detects a class of problems at compile time, e.g.� Method not found� Cannot compare int and boolean

Page 25: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2515-214 Aldrich

class X {Logger logger;public void foo() {

…if (logger.inDebug()) {

logger.debug(“We have ” + conn + “connections.”);

}}

}class Logger {

boolean inDebug() {…}void debug(String msg) {…}

}

class X

method foo

…fieldlogger

if stmt…

method invoc.

logger inDebug

block

method invoc.

logger debug parameter …

Logger

boolean

expects boolean

Logger

Logger ->boolean

String -> void

String

void

Page 26: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2615-214 Aldrich

Typechecking in different Languages

• In Perl…� No typechecking at all!

• In ML, no annotations required� Global typechecking

• In Java, we annotate with types� Modular typechecking� Types are a specification!

• In C# and Scala, no annotations for local variables� Required for parameters and return values

� Best of both?

foo() {a = 5;b = 3;bar(“A”, “B”);print(5 / 3);

}

bar(x, y) {print(x / y);

}

Page 27: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2715-214 Aldrich

Bug finding

Page 28: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2815-214 Aldrich

Intermission: Soundness and Completeness

Page 29: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 2915-214 Aldrich

Error exists No error exists

Error Reported True positive(correct analysis result)

False positive

No Error Reported False negative True negative(correct analysis result)

How does testing relate? And formal verification?

Sound Analysis: reports all defects-> no false negativestypically overapproximated

Complete Analysis:every reported defect is an actual defect -> no false positivestypically underapproximated

Page 30: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3015-214 Aldrich

Sound Analysis

All Defects

Complete Analysis

Unsoundand IncompleteAnalysis

Page 31: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3115-214 Aldrich

The Bad News: Rice's Theorem

• Every static analysis is necessarily incomplete or unsound or undecidable (or multiple of these)

"Any nontrivial property about the language recognized by a Turing machine is undecidable.“

Henry Gordon Rice, 1953

Page 32: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3215-214 Aldrich

Control-Flow Analysis

Page 33: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3315-214 Aldrich

An interrupt checker

• Check for the interrupt problem

• Abstraction� 2 states: enabled and disabled� Program counter

• Systematic� Check all paths through a function

• Error when we hit the end of the function with interrupts disabled

• Known as a control flow analysis� More powerful than reading it as a raw text file� Considers the program state and paths

Page 34: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3415-214 Aldrich

Example: Interrupt Problem

1. int foo() {2. unsigned long flags;3. int rv;4. save_flags(flags);5. cli();6. rv = dont_interrupt();7. if (rv > 0) {8. do_stuff();9. restore_flags();10. } else {11. handle_error_case();12. }13. return rv;14. }

Error: did not reenable interrupts on some path34

6-7

8-9 11

13

0

end

enabled

disabled

enabled

2-4

5

enabled

unknown

disabled

disabled

Page 35: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3515-214 Aldrich

Adding branching

• When we get to a branch, what should we do?� 1: explore each path separately

•Most exact information for each path•But—how many paths could there be?•Leads to an exponential state explosion

� 2: join paths back together•Less exact•But no state explosion

• Not just conditionals!� Loops, switch, and exceptions too!

Page 36: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3615-214 Aldrich

Data-Flow Analysis

Page 37: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3715-214 Aldrich

A null pointer checker

• Prevent accessing a null value

• Abstraction� Program counter� 3 states for each variable: null, not-null, and maybe-null

• Systematic� Explore all data values for all variables along all paths in a method or program

• Known as a data-flow analysis� Tracking how data moves through the program� Very powerful, many analyses work this way� Compiler optimizations were the first� Expensive

Page 38: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3815-214 Aldrich

Example: Null Pointer Problem

1. int foo() {2. Integer x = new Integer(6);3. Integer y = bar();4. int z;

5. if (y != null)6. z=x.intVal()+y.intVal();7. else {8. z = x.intVal();9. y = x;10. x = null;11. }12. return z + x.intVal();13. }

Error: may have null pointer on line 12

4-5

6 8-10

12

0

end

2

3

x � not-null

x � not-null, y -> maybe-null

x � not-null, y -> maybe-null

x � not-null,

y -> not-null

x � null,

y -> not-null

x � maybe-null, y -> not-null

Page 39: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 3915-214 Aldrich

Example: Method calls

1. int foo() {2. Integer x = bar();3. Integer y = baz();4. Integer z = noNullsAllowed(x, y);5. return z.intValue();6. }

7. Integer noNullsAllowed(Integer x, Integer y) {8. int z;9. z = x.intValue() + y.intValue();10. return new Integer(z);11. }

Two options:1. Global analysis2. Modular analysis with specifications

Page 40: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4015-214 Aldrich

Global Analysis

• Dive into every method call� Like branching, exponential without joins� Typically cubic (or worse) in program size even with joins

• Requires developer to determine which method has the fault� Who should check for null? The caller or the callee?

Page 41: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4115-214 Aldrich

Modular Analysis w/ Specifications

• Analyze each module separately

• Piece them together with specifications� Pre-condition and post-condition

• When analyzing a method� Assume the method’s precondition� Check that it generates the postcondition

• When the analysis hits a method call� Check that the precondition is satisfied� Assume the call results in the specified postcondition

• See formal verification and Dafny

Page 42: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4215-214 Aldrich

Example: Method calls1. int foo() {2. Integer x = bar();3. Integer y = baz();4. Integer z = noNullsAllowed(x, y);5. return z.intValue();6. }

7. @Nonnull Integer noNullsAllowed(8. @Nonnull Integer x,9. @Nonnull Integer y) {10. int z;11. z = x.intValue() + y.intValue();12. return new Integer(z);13. }

14. @Nonnull Integer bar();

15. @Nullable Integer baz();

Page 43: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4315-214 Aldrich

public void loadFile(String filename)BufferedReader reader = …;if (reader.readLine().equals("ver 1.0"))

return; // invalid versionString c;while ((c = reader.readLine()) != null) {

// load data}reader.close();

}

1-2

4 3

5

0

6

end8

Another Data-Flow Example

abstractions:needs-closing, closed, unknown

Page 44: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4415-214 Aldrich

Recap: Class invariants

• Is always true outside a class’s methods

• Can be broken inside, but must always be put back together again

public class Buffer {boolean isOpen;int available;/*@ invariant isOpen <==> available > 0 @*/

public void open() {isOpen = true;//invariant is brokenavailable = loadBuffer();

}}

ESC/Java, Dafny are akind of static analysis tool

Page 45: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4515-214 Aldrich

Other kinds of specifications

• Class invariants� What is always true when entering/leaving a class?

• Loop invariants� What is always true inside a loop?

• Lock invariant� What lock must you have to use this object?

• Protocols� What order can you call methods in?

•Good: Open, Write, Read, Close•Bad: Open, Write, Close, Read

Page 46: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4615-214 Aldrich

Model Checking

Page 47: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4715-214 Aldrich

Static Analysis for Race Conditions

• Race condition defined:

[From Savage et al., Eraser: A Dynamic Data Race Detector for Multithreaded Programs]� Two threads access the same variable� At least one access is a write� No explicit mechanism prevents the accesses from being simultaneous

• Abstraction� Program counter of each thread, state of each lock

• Abstract away heap and program variables

• Systematic� Examine all possible interleavings of all threads

• Flag error if no synchronization between accesses• Exploration is exhaustive, since abstract state abstracts all

concrete program state

• Known as Model Checking

Page 48: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4815-214 Aldrich

Model Checking for Race Conditions

thread1() {

read x;

}

thread2() {

lock();

write x;

unlock();

}

Interleaving 1: OK

Thread 1 Thread 2

read x

lock

write x

unlock

Page 49: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 4915-214 Aldrich

Model Checking for Race Conditions

thread1() {

read x;

}

thread2() {

lock();

write x;

unlock();

}

Interleaving 1: OK

Interleaving 2: OK

Thread 1 Thread 2

read x

lock

write x

unlock

Page 50: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5015-214 Aldrich

Model Checking for Race Conditions

thread1() {

read x;

}

thread2() {

lock();

write x;

unlock();

}

Interleaving 2: OK

Interleaving 3: Race

Thread 1 Thread 2

read x

lock

write x

unlock

Page 51: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5115-214 Aldrich

Model Checking for Race Conditions

thread1() {

read x;

}

thread2() {

lock();

write x;

unlock();

}

Interleaving 3: Race

Interleaving 4: Race

Thread 1 Thread 2

read x

lock

write x

unlock

Page 52: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5215-214 Aldrich

Outline

• Why static analysis?

• How does it work?

• What are the important properties?� Precision� Side effects� Modularity� Aliases� Termination

• How do we use real tools in an organization?

Page 53: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5315-214 Aldrich

Tradeoffs

• You can’t have it all1. No false positives2. No false negatives3. Perform well4. No specifications5. Modular

• You can’t even get 4 of the 5� Halting problem means first 3 are incompatible(Rice's theorem)

� Modular analysis requires specifications

• Each tool makes different tradeoffs

53

Page 54: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5415-214 Aldrich

Sound Analysis

All Defects

Complete Analysis

Page 55: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5515-214 Aldrich

Soundness / Completeness / Performance Tradeoffs

• Type checking does catch a specific class of problems, but does not find all problems

• Data-flow analysis for compiler optimizations must err on the safe side (only perform optimizations when sure it's correct)

• Many practical bug-finding tools analyses are unsound and incomplete� Catch typical problems� May report warnings even for correct code� May not detect all problems

• Overwhelming amounts of false negatives make analysis useless

• Not all "bugs" need to be fixed

Page 56: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5615-214 Aldrich

“False” Positives

1. int foo(Person person) {

2. if (person != null) {

3. person.foo();

4. }

5. return person.bar();

6. }

• Is this a false positive?

• What if that branch is never run in practice?

• Do you fix it? And how?

Error on line 5: Redundant comparison to null

Page 57: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5715-214 Aldrich

“False” Positives

1. public class Constants {

2. static int myConstant = 100;

3. }

• Is this a false positive?

• What if it’s in an open-source library you imported?

• What if there are 1000 of these?

Error on line 3: field isn’t final but should be

Page 58: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5815-214 Aldrich

Outline

• Why static analysis?

• How does it work?

• What are the important properties?

• How do we use real tools in an organization?� FindBugs @ eBay� SAL @ Microsoft� Coverity

Page 59: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 5915-214 Aldrich

True Positives

• Technical Defn: An issue that could result in a runtime error

• True Positives that we care about� 1. Any issue that the developer intends to fix� 2. (more subtle) Any issue that the developer wants to see, regardless of whether it is fixed

� Varies between projects and people

• Soundness and completeness are defined technically� But sometimes don’t exactly match what people want

Page 60: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6015-214 Aldrich

FindBugs at eBay

• eBay wants to use static analysis

• Need off the shelf tools

• Focus on security and performance

• Had bad past experiences� Too many false positives� Tools used too late in process

• Help them choose a tool and add it to the process

Page 61: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6115-214 Aldrich

How important is this issue?

1. void foo(int x, int y)

2. int z;

3. z = x + y;

4. }

Line 3: Dead store to local

How about this one?

void foo(int x, int y)List dataValues;dataValues = getDataFromDatabase(x, y);

}

Significant overhead, and not caught any other way!

Page 62: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6215-214 Aldrich

Tool Customization

• Turn on all defect detectors

• Run on a single team’s code

• Sort results by detector

• Assign each detector a priority

• Repeat until consensus (3 teams)

Page 63: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6315-214 Aldrich

Priority = Enforcement

•Priority must mean something� (otherwise it’s all “high priority”)

•High Priority� High severity functional issues� Medium severity, but easy to fix

•Medium Priority� Medium severity functional issues� Indicators to refactor� Performance issues

•Low Priority� Only some domain teams care about them� Stylistic issues

•Toss� Not cost effective and lots of noise

Page 64: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6415-214 Aldrich

Cost/Benefit Analysis

•Costs� Tool license� Engineers internally supporting tool� Peer reviews of defect reports

•Benefits� How many defects will it find?� What priority?

•Compare to cost equivalent of testing by QA Engineers� eBay’s primary quality assurance mechanism� Back of the envelope calculation� FindBugs discovers significantly more defects

• Order of magnitude difference• Not as high priority defects

Page 65: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6515-214 Aldrich

Quality Assurance at Microsoft

• Original process: manual code inspection� Effective when system and team are small� Too many paths to consider as system grew

• Early 1990s: add massive system and unit testing� Tests took weeks to run

•Diversity of platforms and configurations•Sheer volume of tests

� Inefficient detection of common patterns, security holes•Non-local, intermittent, uncommon path bugs

� Was treading water in Windows Vista development

• Early 2000s: add static analysis

Page 66: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6615-214 Aldrich

PREFast at Microsoft

• Concerned with memory usage

• Major cause of security issues

• Manpower to developer custom tool

Page 67: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6715-214 Aldrich

Standard Annotation Language (SAL)

• A language for specifying contracts between functions� Intended to be lightweight and practical� Preconditions and Postconditions� More powerful—but less practical—contracts supported in systems like JML or Spec#

• Initial focus: memory usage� buffer sizes� null pointers� memory allocation

Page 68: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6815-214 Aldrich

SAL is checked using PREfast

• Lightweight analysis tool� Only finds bugs within a single procedure� Also checks SAL annotations for consistency with code

• To use it (for free!)� Download and install Microsoft Visual C++ 2005 Express Edition•http://msdn.microsoft.com/vstudio/express/visualc/

� Download and install Microsoft Windows SDK for Vista•http://www.microsoft.com/downloads/details.aspx?familyid=c2b1e300-f358-4523-b479-f53d234cdccf

� Use the SDK compiler in Visual C++•In Tools | Options | Projects and Solutions | VC++ Directories add C:\Program Files\Microsoft SDKs\Windows\v6.0\VC\Bin (or similar)

• In project Properties | Configuration Properties | C/C++ | Command Line add /analyze as an additional option

Page 69: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 6915-214 Aldrich

Buffer/Pointer Annotations

_in

_inout

_out

_bcount(size)

_ecount(size)

_opt

The function reads from the buffer. The caller provides the buffer and initializes it.

The function both reads from and writes to buffer. The caller provides the buffer and initializes it.

The function writes to the buffer. If used on the return value, the function provides the buffer and initializes it. Otherwise, the caller provides the buffer and the function initializes it.

The buffer size is in bytes.

The buffer size is in elements.

This parameter can be NULL.

Page 70: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 7015-214 Aldrich

PREfast: Immediate Checks

• Library function usage� deprecated functions

•e.g. gets() vulnerable to buffer overruns� correct use of printf

•e.g. does the format string match the parameter types?

� result types•e.g. using macros to test HRESULTs

• Coding errors� = instead of == inside an if statement

• Local memory errors� Assuming malloc returns non-zero� Array out of bounds

Page 71: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 7115-214 Aldrich

SAL: the Benefit of Annotations

•Annotations express design intent� How you intended to achieve a particular quality attribute

• e.g. never writing more than N elements to this array

•As you add more annotations, you find more errors� Get checking of library users for free� Plus, those errors are less likely to be false positives

• The analysis doesn’t have to guess your intention

� Instant Gratification Principle

•Annotations also improve scalability through modularity� PreFAST uses very sophisticated analysis techniques� These techniques can’t be run on large programs� Annotations isolate functions so they can be analyzed one at a time

Page 72: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 7215-214 Aldrich

SAL: the Benefit of Annotations

•How to motivate developers?� Especially for millions of lines of unannotated code?

•Require annotations at checkin� Reject code that has a char* with no __ecount()

•Make annotations natural� Ideally what you would put in a comment anyway

• But now machine checkable• Avoid formality with poor match to engineering practices

• Incrementality� Check code ↔ design consistency on every compile� Rewards programmers for each increment of effort

• Provide benefit for annotating partial code• Can focus on most important parts of the code first• Avoid excuse: I’ll do it after the deadline

•Build tools to infer annotations� Inference is approximate� Unfortunately not yet available outside Microsoft

Page 73: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 7315-214 Aldrich

Impact at Microsoft

• Thousands of bugs caught monthly

• Significant observed quality improvements� e.g. buffer overruns latent in codebaes

• Widespread developer acceptance� Tiered Check-in gates� Writing specifications

Page 74: Principles of Software Construction: Objects, Design and ...aldrich/214/slides/24-staticanalysis.pdf · Multicore processors and distributed computing Our focus: application-level

toad 7415-214 Aldrich

Static Analysis in Engineering Practice

•A tool with different tradeoffs�Soundness: can find all errors in a given class� Focus: mechanically following design rules

•Major impact at Microsoft and eBay�Tuned to address company-specific problems�Affects every part of the engineering process