The Psychology of C# Analysis


Our C# expert Eric Lippert provides his take on the psychology of C# analysis, including the business case for C#, developer characteristics and analysis tools.


The Psychology of C# Analysis

Eric Lippert

C# Analysis Architect

Coverity

Intro

• Psychological factors in language design…

• … and compiler error messages…

• … and static analysis tools…

• … and funny pictures of cats.

Who is this guy?

• Compiler developer / language designer at Microsoft from 1996 through 2012

• Visual Basic, VBScript, JScript, VS Tools for Office, C# / Roslyn

• Static analysis architect for C# at Coverity since January

• I will use “we” totally inconsistently

• I have no formal background in static analysis

• I take an engineering rather than academic approach

This guy is you, not me

Body

The business case for C#

• Productive, successful professional developers who target Microsoft platforms make those platforms more attractive to Microsoft’s customers

• Original design goal was “a simple, modern, general-purpose language”

• Any language with an 800-page specification is no longer simple, but modern and general-purpose still apply

• Understanding developer psychology is key to achieving wide adoption of any developer tool

Target C# Developer Characteristics

• Professionals, not amateurs

• Engineers, not hackers

• Programming experts, not line-of-business experts

• Pragmatists, not academics

• Skeptics, not true believers

• Conservatives, not radicals

Conservatism

• C# developers hate breaking changes imposed by tools

• Even trivial breaking changes are agonized over

• In 11 years and 6 releases C# has never added a new reserved keyword

• New keywords are contextual so as to not be breaking

• This imposes considerable restrictions on new syntaxes

• For example, consider iterator blocks:

double yield = 123.4;
yield return yield;
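
A minimal compilable sketch of how this is possible (the class and method names here are illustrative): yield was never made a reserved word, so it remains a legal identifier, and only the token sequences yield return and yield break inside an iterator are treated as keywords.

using System.Collections.Generic;

class YieldDemo
{
    static IEnumerable<double> Values()
    {
        double yield = 123.4;   // "yield" is still a legal identifier...
        yield return yield;     // ...and a contextual keyword only in this position
    }
}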

Conservatism

• C# app developers also hate breaking their users

• Facilitating versionable components was a priority-1 design goal

• Numerous seemingly-counterintuitive features actually mitigate brittle-base-class failures:

class Base { public void M(int x) { } }
class Derived : Base { public void M(double x) { } }
...
derived.M(123); // Base.M or Derived.M?
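
The answer is Derived.M(double): overload resolution prefers an applicable method declared in the more-derived class, so an M(int) overload added to a later version of Base cannot silently capture calls that previously bound to Derived. A runnable sketch (the WriteLine calls are added here purely for illustration):

using System;

class Base { public void M(int x) { Console.WriteLine("Base.M(int)"); } }
class Derived : Base { public void M(double x) { Console.WriteLine("Derived.M(double)"); } }

class Program
{
    static void Main()
    {
        var derived = new Derived();
        // Prints "Derived.M(double)": the int argument is implicitly converted
        // to double rather than binding to the base class method, which is what
        // mitigates the brittle-base-class failure
        derived.M(123);
    }
}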

Conservatism

C# 4.0 added dynamic dispatch to facilitate interoperability with dynamic languages and “legacy” object models

• Enormous MVP community pushback

• “I will use this feature correctly, but my coworkers are going to abuse it and then I’m going to have to fix their god-awful hacked-up code”

• Anything that makes the compiler less capable of finding bugs is met with skepticism and resistance

• Completely redesigned based on early feedback
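
A small sketch of the tradeoff that worried the MVPs (names illustrative): with dynamic, binding moves from compile time to run time, so the compiler can no longer catch even trivial mistakes.

using System;

class DynamicDemo
{
    static void Main()
    {
        dynamic d = "hello";
        Console.WriteLine(d.Length);   // bound at run time; prints 5

        // A typo such as d.Lenght compiles without complaint and fails only
        // at run time with a RuntimeBinderException, exactly the loss of
        // compile-time bug finding that developers feared
    }
}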

Error reporting psychology

FAIL

Error reporting psychology

• Dealing with correct code is literally the smallest problem

• “Roslyn” does syntactic analysis of broken code in the time between keystrokes; semantic analysis takes a little longer

• Error messages need to be understandable, accurate, polite and diagnostic rather than prescriptive

• Let’s take a look at some examples

Error reporting psychology

A params parameter must be the last parameter in a formal parameter list

Is this saying:

• If there is a params parameter, it must be the last one? Or

• The last parameter, and only the last parameter, must always be a params parameter? Or

• The last parameter must be a params parameter; if others are as well, that’s fine too?

The error is only clear if the feature is already understood
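
For the record, the first reading is the correct one: a method need not have a params parameter at all, but if it has one, it must come last. A short sketch (method names illustrative):

class ParamsDemo
{
    static void A(int x, int y) { }                       // fine: no params parameter at all
    static void B(string label, params int[] values) { }  // fine: the params parameter is last

    // error: a params parameter must be the last parameter in a formal parameter list
    // static void C(params int[] values, string label) { }
}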

Error reporting psychology

Error messages must read the mind of a developer who wrote broken code and figure out what they meant.

class C
{
    public virtual static void M() { }
}

Here the compiler can only report the contradiction; it cannot know whether the author meant a virtual instance method or a static method, so any prescriptive suggestion would be wrong about half the time.

Error reporting psychology

Complex operator + (Complex x, Complex y) { ...

User-defined operator must be declared static and public

• This is an example of a prescriptive error done right

• The user absolutely positively has to do this to overload an operator

• Odds that they were not trying to overload an operator are low
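
A corrected sketch of what the error is prescribing (the Complex type and its fields are illustrative):

struct Complex
{
    public double Re, Im;

    // User-defined operators must be declared public and static
    public static Complex operator +(Complex x, Complex y)
    {
        return new Complex { Re = x.Re + y.Re, Im = x.Im + y.Im };
    }
}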

Warnings are harder than errors

• Must infer the developer’s erroneous thoughts

• The compiler must be fast

• This creates an opportunity for third-party tools

• Must be plausibly wrong

• A warning for code that no one would reasonably type is unhelpful

• Must be possible to eliminate the warning

• And ideally the warning should tell you how

• Must have a low false-positive rate

• Encouraging developers to change correct code is harmful

• We will return to this point later

What do C# developers want?

Rigidly defined areas of doubt and uncertainty

• Static type checking, type safety, memory safety…

• … that can be disabled if necessary.

• A compiler that infers developer intent…

• … with predictable behavior and understandable rules

• Actionable errors when inference fails…

• …rather than muddling on through and getting it wrong

It hurts because it’s true

C# was originally called SafeC

C# throws developers into the “Pit of Success”:

• Eliminate unimportant dangerous features entirely

• switch fall-through

• Restrict dangerous features to clearly-marked unsafe code regions

• Eliminate implementation-defined behaviours

• x = ++x + x++; is well-defined in C# …

• … but still a bad idea (see the sketch after this list)

• Define common undefined behaviours

• Accessing an array out of bounds causes an exception

• Mandate compiler warnings
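
A sketch of two of these guarantees in one console program (the class name is illustrative):

class PitOfSuccess
{
    static void Main()
    {
        int x = 3;
        x = ++x + x++;   // well-defined in C#: operands evaluate strictly left
                         // to right, so x ends up as 8 (but it is still a bad idea)

        int[] a = new int[3];
        a[5] = 42;       // no silent memory corruption: throws IndexOutOfRangeException
    }
}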

There are numerous defects that the Coverity C/C++ analysis checkers detect which are impossible, unlikely, or already warnings in C#.

Let’s look at a few dozen. Quickly. These are all defects found by Coverity in C/C++ that are not worth checking in C#…

C/C++ defects inapplicable to C#:

• Local read before assignment

• C# rejects programs that read uninitialized locals (see the sketch after this list)

• Uninitialized fields / arrays

• Fields and arrays are automatically zeroed out

• Treating a pointer to a variable as a pointer to an array

• Rare, must be marked as unsafe

• Buffer length arithmetic errors

• Strings and arrays know their lengths; checked at runtime

• Pointer/integer/char/bool/enum type errors

• Not inter-assignable in C# without explicit cast operators
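
The first two items in a compilable sketch (names illustrative): the commented-out line is rejected by definite-assignment analysis, and array elements can never be observed in an uninitialized state.

using System;

class DefiniteAssignment
{
    static void Main()
    {
        int x;
        // Console.WriteLine(x);  // rejected at compile time: use of unassigned local variable
        x = 1;                    // once definitely assigned, reads are legal
        Console.WriteLine(x);

        int[] a = new int[4];
        Console.WriteLine(a[0]);  // prints 0: fields and array elements are zeroed automatically
    }
}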

C/C++ defects inapplicable to C#:

• Failure to consistently check error return codes

• C# uses exceptions

• Accidental sign extension

• Either an error or a warning

• Implementation-defined side effect order

• Side effect order is well-defined

• Statement with no effect

• Is actually a parse-time error in C#

• Accidental use of ambiguous names

• C# requires that a simple name have a unique meaning in a block

C/C++ defects inapplicable to C#:

• sizeof mistakes

• C#’s sizeof operator only takes types

• Unintentional switch fall-through

• Is an error

• Unreachable code

• Is a warning

• Accidental assignment or comparison of variable to itself

• Yep, that’s a warning too

• Field never written or never read

• Man, that’s a lot of warnings

• Missing return statement

• Is illegal

• malloc without free / free without malloc / allocator–deallocator mismatch / use after free

• Not needed in a garbage-collected language

• Dereferencing an address that outlives the storage it refers to

• References to variables may not be stored in long-term storage

• Accidental use of function pointer

• Method group expressions can only be used in strictly limited locations

• Overriding errors

• The language was designed to mitigate brittle-base-class failures by default

Of course the compiler is not perfect…

Defects common to C/C++ and C#

• Copy-paste mistakes

• Expression contains variables but always has the same result

• You checked for null here, you dereferenced without checking there.

• Some infinite loops

• Dangling else and other indentation issues

• Array index out of bounds

• Integer overflow

• checked arithmetic is off by default

• Non-memory resource leaks

• Such as forgetting to close a file (see the sketch after this list)

• Stray semicolons

• Swapped arguments

• Unused return value

• Uncaught exception

• Missing or misordered critical sections

• Including non-atomic operations performed inconsistently inside critical sections

• And many more!
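
To illustrate the non-memory resource leak: garbage collection eventually reclaims memory, but a forgotten file handle stays open until finalization happens to run. A sketch of the idiom C# offers (path and method names illustrative):

using System;
using System.IO;

class ResourceDemo
{
    static string ReadFirstLine(string path)
    {
        // "using" guarantees Dispose runs even if an exception is thrown;
        // without it, the file handle leaks until the finalizer runs, and no
        // amount of garbage collection makes that a non-bug
        using (var reader = new StreamReader(path))
        {
            return reader.ReadLine();
        }
    }

    static void Main()
    {
        Console.WriteLine(ReadFirstLine("example.txt"));
    }
}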

And these are just a few that are common to C and C#; there are a whole host of defects specific to C# programs that we could find statically.

Let’s consider the psychological aspects of static analysis tools beyond the compiler.

Day one training at Coverity

Developer Adoption is Key

• Soundness is explicitly a non-goal

• We don’t want to find all defects or even most defects

• We want every defect reported to be a customer-affecting bug

• Developers won’t adopt a product that they perceive as making their jobs harder for no customer benefit

• Our business model requires adoption to drive renewals

• How do developers – who, remember, are using C# because they like a statically-typed language – react to static analysis tools?

Developer psychology WRT analysis tools

• Egotistical

• “I don’t need this tool for my code”

• “But my coworkers, on the other hand…”

• Clever management uses this trait to advantage

Developer psychology WRT analysis tools

• Skeptical, conservative, dismissive

• Resistant to change

• Quick to criticize “stupid” false positives

• The first five defects they see had better be true positives

Developer psychology WRT analysis tools

• “Busy” with, you know, “real work”

• Code annotations are unacceptable

• Analysis tool must adapt to customer’s build process

• Overnight analysis runs are acceptable – barely

Developer psychology WRT analysis tools

• Any change in what defects are reported on the same code over time – a.k.a. “churn” – is the enemy

• Randomized analysis is right out, unfortunately

• Any improvement to our analysis heuristics can cause unwanted churn

• We try to keep churn below 5% on every release

Developer psychology WRT analysis tools

• Responds well to perverse incentives

• Hard-to-understand defect reports are easy to ignore

• No downside to incorrectly triaging true positives as false positives

• Finding defects is hard; presenting evidence that prevents incorrect classification as a false positive is harder

• Deep analysis with theorem provers can be worse than shallow analysis with cheap heuristics

• Presenting the result is insufficient; the developer must understand the proof to fix the defect.

Displaying good defect messages

public void GetThing(Type type, bool includeFrobs)
{
    bool isFrob = (type != null) && typeof(IFrob).IsAssignableFrom(type);
    object instance = this.objects[this.name];
    if (instance is IFrob && includeFrobs)
    {
        [...]
    }
    else if (type.IsAssignableFrom(instance.GetType()))
    {
        [...]
    }
}

Displaying good defect messages

public void GetThing(Type type, bool includeFrobs)
{
    // Assuming type is null. type != null evaluated to false.
    bool isFrob = (type != null) && typeof(IFrob).IsAssignableFrom(type);
    object instance = this.objects[this.name];
    // instance is IFrob evaluated to true. includeFrobs evaluated to false.
    if (instance is IFrob && includeFrobs)
    {
        [...]
    }
    // Dereference after null check: dereferencing type while it is null.
    else if (type.IsAssignableFrom(instance.GetType()))
    {
        [...]
    }
}

Management psychology

• The first time static analysis runs there may be thousands of errors; typical rate is one defect per thousand LOC

• Academic answer: rank heuristics

• Pragmatic answer: ignore them all

• Simply ignore all defects in existing code

• Triage and fix defects in new code

• “Someday” get around to fixing defects in old code

• Why is this so popular?

• Old code is in the field. It works well enough. Risk is low.

• New code is unproven. It might work, or it might not. Risk is high.

Management psychology

• Management actually pays for the developer tools

• And typically has no idea how to use them effectively

• Middle management has perverse incentives too

• Time, cost, and complexity are easily measured; quality is not

• “Never upgrade the static analysis tool before release”

• Worse tools are better; better tools are worse

Worse is better; better is worse

[Graph: known defects over time. No tool improvements == management gets bonus]

[Graph: known defects over time. Tool upgrades find more defects == management gets no bonus]

The fix rate is the same in these two graphs, but if the tool improves faster than the fix rate, no bonus.

Good news

If you have a well-engineered product that:

• makes good use of theoretical and pragmatic approaches,

• finds real-world, user-affecting defects, and

• takes developer and management psychology into account

Then you can make a positive difference

Conclusion

Special thanks to Scott at BasicInstructions.net

Conclusion

• Theoretical static analysis techniques are awesome; we can and do use them in industry…

• … but doing all that math is actually only one small part of shipping a static analysis product

• Understanding developer and management psychology is necessary to ensure adoption of any developer tools

• C# was carefully designed to match a target developer mindset

• Coverity thinks about developer and manager psychology at every stage in the analysis and overall product design

• Research into better ways to present defects would be awesome

More information

• Learn about Coverity at www.Coverity.com

• Read “A Few Billion Lines Of Code Later”

• Find me on Twitter at @ericlippert

• Or read my C# blog at www.EricLippert.com

• Or ask me about C# at www.StackOverflow.com

Copyright 2013 Coverity, Inc.