Introduction | The Approach’s Overview | A Language of Pointers | The Type System | Operational Semantics | Type Safety | Type Inference | The Rest of C | Experiments | Summary

CCURED: TYPE-SAFE RETROFITTING OF LEGACY CODE

George Necula, Scott McPeak, Wes Weimer

Presented by Anastasia Braginsky

Some slides were taken from George Necula's presentation:
http://www.slidefinder.net/c/ccured_taming_pointers_george_necula/6827275

Problem

C is popular; it is part of the infrastructure

C is also unsafe and has a weak type system that can cause subtle bugs

Solution

Add type safety to C: make C “feel” as safe as Java

Catch memory safety errors by static analysis as much as possible

Add run-time checks to C programs, as few as possible (performance)

Minimal user effort

Add type inference to C

The CCured System

[Diagram: C Program → CCured Translator → Instrumented C Program → Compile & Execute]

Compile & Execute ends in one of two outcomes: Halt: Memory Safety Violation, or Success

Two Main Premises

Usually in C a large part of the program can be verified statically to be type safe

The remaining part can be instrumented with run-time checks to ensure that the execution is memory safe

In many applications, some loss of performance due to run-time checks is an acceptable price for type safety

Example C Program

Boxed integer: 31 bits holding an integer or a pointer, followed by a 1-bit tag

Un-boxing

C type int* is used to represent a boxed integer

[Figure: three example words, each an "integer or pointer" field followed by its tag bit]

0011…11101001 0

0101…10101110 0

0001…11000101 1
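A minimal C sketch of the boxed-integer encoding, assuming the tag convention implied by the example program's test `(int)e%2 == 0`: a low bit of 1 marks an integer and a low bit of 0 marks a pointer to another boxed value. The names `boxed`, `box_int`, `box_ptr`, `is_int`, `unbox_int` and `sum_boxed` are illustrative and do not come from the slides, and `uintptr_t` replaces the slide's `int` casts so the sketch is also valid on 64-bit machines.

```c
#include <stddef.h>
#include <stdint.h>

/* A boxed value: either (n << 1) | 1 for an integer n, or the (even,
   aligned) address of another boxed value. */
typedef uintptr_t boxed;

static boxed box_int(intptr_t n)     { return ((uintptr_t)n << 1) | 1u; }
static boxed box_ptr(const boxed *p) { return (uintptr_t)p; } /* low bit is 0 */
static int   is_int(boxed b)         { return (b & 1u) != 0; }

/* Follow pointers until an integer is found, then strip the tag.
   Nothing here checks that a "pointer" is valid; that gap is what
   CCured's run-time checks are meant to close. */
static intptr_t unbox_int(boxed b) {
    while (!is_int(b)) {
        b = *(const boxed *)b;
    }
    /* Right-shifting a negative value is implementation-defined in C,
       so this sketch is only meant for non-negative payloads. */
    return (intptr_t)b >> 1;
}

/* Sum of an array of boxed integers, mirroring the example loop. */
static intptr_t sum_boxed(const boxed *a, size_t n) {
    intptr_t acc = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        acc += unbox_int(a[i]);
    }
    return acc;
}
```

Every `boxed` word has the same C type whether it holds an integer or a pointer, which is why a static type system alone cannot tell the two apart.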

Example C Program

int * * a;                      // array
int i;                          // index
int acc;                        // accumulator
int * * p;                      // element ptr
int * e;                        // unboxer
acc = 0;
for (i=0; i<100; i++) {
    p = a + i;                  // ptr arithmetic
    e = *p;                     // read element
    while ( (int)e%2 == 0 ) {   // check tag
        e = * (int * * ) e;     // unbox
    }
    acc += ((int)e >> 1);       // strip tag
}

[Figure: memory diagram of the array of boxed integers with the pointers a, p and e, annotated with the pointer kinds SAFE, SEQuence and DYNamic]

Example C Program (same code and memory diagram as on the previous slide)

But due to aliases, all are considered to point to dynamic!

SAFE Pointers

SAFE pointer to type t

[Figure: ptr pointing at a single value of type t]

On use: null check

Can do: dereference
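As a rough illustration of the only check a SAFE pointer needs, the sketch below wraps a dereference in a null check. It assumes a SAFE pointer is represented as an ordinary C pointer; `safe_read_int` and the `check_result` codes are hypothetical names, and the sketch returns an error code where the slides describe halting with a memory safety violation.

```c
#include <stddef.h>

typedef enum { CHECK_OK = 0, CHECK_NULL } check_result;

/* Read through a SAFE pointer: one null check, then a plain load. */
static check_result safe_read_int(const int *p, int *out) {
    if (p == NULL) {
        return CHECK_NULL;   /* the instrumented program would halt here */
    }
    *out = *p;
    return CHECK_OK;
}
```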

SEQuence Pointers

SEQ pointer to type t

[Figure: an array of values of type t; the pointer carries base, ptr and end]

On use: null check, bounds check

Can do: dereference, pointer arithmetic
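A SEQ pointer can be pictured as a small record that travels with the pointer. The sketch below is one possible layout, not CCured's actual one: to stay within defined behaviour in standard C it stores element offsets (`off`, `len`) in place of the slide's `ptr` and `end` addresses. All names are illustrative.

```c
#include <stddef.h>

typedef struct {
    int      *base;  /* start of the underlying array */
    ptrdiff_t off;   /* stands in for ptr:  ptr = base + off */
    ptrdiff_t len;   /* stands in for end:  end = base + len */
} seq_int;

typedef enum { SEQ_OK = 0, SEQ_NULL, SEQ_BOUNDS } seq_result;

/* Pointer arithmetic is unchecked; only uses are checked. */
static seq_int seq_add(seq_int s, ptrdiff_t k) {
    s.off += k;
    return s;
}

/* Dereference with the null check and the bounds check. */
static seq_result seq_read(seq_int s, int *out) {
    if (s.base == NULL) {
        return SEQ_NULL;
    }
    if (s.off < 0 || s.off >= s.len) {
        return SEQ_BOUNDS;
    }
    *out = s.base[s.off];
    return SEQ_OK;
}
```

Checking at use time means a pointer may temporarily step outside its array, as many C loops do, without triggering a failure.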

DYNamic Pointers

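DYN pointer

[Figure: a DYN pointer carries home and ptr; the home area stores len, tags and the data words. In the example the words hold DYN, DYN, int and the tags are 1 1 0]

On use: null check, bounds check, tag check/update

Can do: dereference, pointer arithmetic, arbitrary typecasts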
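The tag mechanism can be sketched as follows, assuming one tag per word with 1 meaning "holds a pointer" and 0 meaning "holds an integer", as in the slide's example. The fixed area size, the field layout, all names, and the choice to report a tag mismatch as a failed check are assumptions of this sketch, not details taken from CCured.

```c
#include <stddef.h>
#include <stdint.h>

#define AREA_WORDS 3

typedef struct dyn_area dyn_area;

typedef struct {
    dyn_area *home;  /* area the pointer belongs to; NULL for non-pointers */
    ptrdiff_t off;   /* word offset inside home; stands in for ptr */
} dyn_ptr;

/* One dynamically typed word: an integer or a DYN pointer. */
typedef union { intptr_t i; dyn_ptr p; } dyn_word;

struct dyn_area {
    size_t        len;               /* number of words in use */
    unsigned char tags[AREA_WORDS];  /* 1: pointer, 0: integer */
    dyn_word      words[AREA_WORDS];
};

typedef enum { DYN_OK = 0, DYN_NULL, DYN_BOUNDS, DYN_TAG } dyn_result;

/* Null check and bounds check shared by every access. */
static dyn_result dyn_check(dyn_ptr p) {
    if (p.home == NULL) return DYN_NULL;
    if (p.off < 0 || (size_t)p.off >= p.home->len) return DYN_BOUNDS;
    return DYN_OK;
}

/* Writes update the tag so that later reads know what the word holds. */
static dyn_result dyn_write_int(dyn_ptr p, intptr_t v) {
    dyn_result r = dyn_check(p);
    if (r != DYN_OK) return r;
    p.home->words[p.off].i = v;
    p.home->tags[p.off] = 0;
    return DYN_OK;
}

static dyn_result dyn_write_ptr(dyn_ptr p, dyn_ptr v) {
    dyn_result r = dyn_check(p);
    if (r != DYN_OK) return r;
    p.home->words[p.off].p = v;
    p.home->tags[p.off] = 1;
    return DYN_OK;
}

/* Reading a word as a pointer requires its tag to say "pointer". */
static dyn_result dyn_read_ptr(dyn_ptr p, dyn_ptr *out) {
    dyn_result r = dyn_check(p);
    if (r != DYN_OK) return r;
    if (p.home->tags[p.off] != 1) return DYN_TAG;
    *out = p.home->words[p.off].p;
    return DYN_OK;
}
```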

A Formal Language

To simplify the presentation, the approach is described formally for a small language: CCured

It is then described informally how to extend the approach to handle the remaining C constructs

The Syntax

Types: τ ::= int | τ ref SAFE | τ ref SEQ | DYNAMIC

    Only integers or pointers
    τ ref SAFE, τ ref SEQ: ML syntax of references
    DYNAMIC: doesn’t carry the type of the pointed value

Expressions: e ::= x | n | e1 op e2 | (τ)e | e1 ⊕ e2 | !e

    n: integer literals
    e1 op e2: assortment of binary integer operations
    (τ)e: casting
    e1 ⊕ e2: pointer arithmetic
    !e: like *e in C

Commands: c ::= skip | c1; c2 | e1 := e2

    e1 := e2: memory update through a pointer, like *e1 = e2 in C

Example C Program, translated to CCured

C source (the subscripts number the occurrences of the pointer type constructor):

int *₁ *₂ a;                        // array
int i;                              // index
int acc;                            // accumulator
int *₃ *₄ p;                        // element ptr
int *₅ e;                           // unboxer
acc = 0;
for (i=0; i<100; i++) {
    p = a + i;                      // ptr arithmetic
    e = *p;                         // read element
    while ( (int)e%2 == 0 ) {       // check tag
        e = * (int *₆ *₇ ) e;       // unbox
    }
    acc += ((int)e >> 1);           // strip tag
}

CCured translation:

DYNAMIC ref SEQ a;                                    // array
int ref SAFE p_i;                                     // index
int ref SAFE p_acc;                                   // accumulator
DYNAMIC ref SAFE ref SAFE p_p;                        // element ptr
DYNAMIC ref SAFE p_e;                                 // unboxer
p_acc := 0;
for ( p_i := 0 ; !p_i<100 ; p_i := !p_i + 1 ) {
    p_p := (DYNAMIC ref SAFE) (a ⊕ !p_i);             // ptr arith
    p_e := !!p_p;                                     // read element
    while ( (int) !p_e % 2 == 0 ) {                   // check tag
        p_e := !! p_e;                                // unbox
    }
    p_acc := !p_acc + ((int)!p_e >> 1);               // strip tag
}

Annotations on the slide: DYNAMIC ref SEQ is a sequence pointer to DYN; DYNAMIC ref SAFE is a safe pointer to DYN; DYNAMIC is dynamic

The CCured Type System

The purpose is to maintain the separation between the statically typed and the untyped worlds

For the presented type system, assume that the program contains complete pointer kind information

The type environment provides the type of every variable name

The type system needs to give types, using derivation rules, to expressions and commands

The derivation rules: convertibility

“a ≤ b”: it is possible to convert type a to type b

τ ≤ τ                      reflexivity

τ ≤ int                    reading addresses

int ≤ τ ref SEQ            pointer arithmetic

int ≤ DYN                  for both int-to-pointer rules, dereferences are prevented by run-time checks; the pointer has lost its capability to perform memory operations

τ ref SEQ ≤ τ ref SAFE     the referenced type can’t change; bounds are checked by run-time checks
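The five convertibility rules translate almost directly into a predicate. The sketch below assumes a tiny tree representation of CCured types; `type`, `type_kind`, `type_equal` and `convertible` are hypothetical names. It checks only the rules as listed and does not attempt a transitive closure.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_REF_SAFE, T_REF_SEQ, T_DYNAMIC } type_kind;

typedef struct type {
    type_kind kind;
    const struct type *target;  /* referenced type for T_REF_*, else NULL */
} type;

/* Structural equality of two types. */
static bool type_equal(const type *a, const type *b) {
    if (a->kind != b->kind) return false;
    if (a->kind == T_REF_SAFE || a->kind == T_REF_SEQ) {
        return type_equal(a->target, b->target);
    }
    return true;
}

/* from <= to, one rule per line. */
static bool convertible(const type *from, const type *to) {
    if (type_equal(from, to)) return true;                          /* t <= t */
    if (to->kind == T_INT) return true;                             /* t <= int */
    if (from->kind == T_INT && to->kind == T_REF_SEQ) return true;  /* int <= t ref SEQ */
    if (from->kind == T_INT && to->kind == T_DYNAMIC) return true;  /* int <= DYN */
    if (from->kind == T_REF_SEQ && to->kind == T_REF_SAFE) {        /* t ref SEQ <= t ref SAFE */
        return type_equal(from->target, to->target);
    }
    return false;
}
```

Note what is absent: nothing converts a SAFE pointer to a SEQ pointer, and nothing converts between DYN and the statically typed reference types.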

The derivation rules: expressions

“x : τ”: expression x has type τ

(τ ref SAFE) 0 : τ ref SAFE                                  creating a safe null pointer

IF e : τ ref SAFE THEN !e : τ
IF e : DYN THEN !e : DYN                                     memory operations only for safe and dynamic pointers

IF ( e : τ’ AND τ’ ≤ τ ) THEN (τ)e : τ                       casting rules

IF ( e1 : int AND e2 : int ) THEN e1 op e2 : int             binary integer operations

IF ( e1 : τ ref SEQ AND e2 : int ) THEN e1⊕e2 : τ ref SEQ
IF ( e1 : DYN AND e2 : int ) THEN e1⊕e2 : DYN                pointer arithmetic only for sequence and dynamic pointers
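A small sketch of how the dereference and arithmetic rules could be checked, again over a hypothetical tree representation of types. Each function returns the result type, or NULL when no rule applies; the names are illustrative only.

```c
#include <stddef.h>

typedef enum { T_INT, T_REF_SAFE, T_REF_SEQ, T_DYNAMIC } type_kind;

typedef struct type {
    type_kind kind;
    const struct type *target;  /* referenced type for T_REF_*, else NULL */
} type;

static const type TYPE_INT = { T_INT, NULL };

/* Type of !e given the type of e. */
static const type *type_of_deref(const type *e) {
    if (e->kind == T_REF_SAFE) return e->target;  /* e : t ref SAFE  =>  !e : t   */
    if (e->kind == T_DYNAMIC)  return e;          /* e : DYN         =>  !e : DYN */
    return NULL;  /* int and SEQ: no rule; a SEQ pointer is cast to SAFE first */
}

/* Type of e1 (+) e2, pointer arithmetic. */
static const type *type_of_ptr_add(const type *e1, const type *e2) {
    if (e2->kind != T_INT) return NULL;
    if (e1->kind == T_REF_SEQ || e1->kind == T_DYNAMIC) return e1;
    return NULL;  /* SAFE pointers do not support arithmetic */
}

/* Type of e1 op e2, binary integer operation. */
static const type *type_of_int_op(const type *e1, const type *e2) {
    return (e1->kind == T_INT && e2->kind == T_INT) ? &TYPE_INT : NULL;
}
```

This split explains the cast in the translated example: `a ⊕ !p_i` has a SEQ type, so it is cast to a SAFE reference before it can be read.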

The derivation rules: commands

IF ( e1 : τ ref SAFE AND e2 : τ ) THEN e1 := e2 is well typed

IF ( e1 : DYN AND e2 : DYN ) THEN e1 := e2 is well typed

Homes

H is a set of allocated memory areas (which are called homes)

A home is represented by its starting address and its size

All homes are disjoint

A special null-home: 0 ∈ H, size(0) = 1

Safe pointers and integers have no representation overhead over C

Sequence and dynamic pointers carry their home with them

[Figure: memory containing two disjoint homes, one starting at h1 and the other at h2]

Casts

Any integer with value n can be cast to a sequence or dynamic pointer with value n and the null-home

    No further memory operations are possible through it

Any sequence or dynamic pointer with value n and a home with starting address h can be cast to an integer with value n+h

Any dynamic pointer can be cast to a different dynamic pointer with the same value and home

    No dynamic ↔ sequence casts, since the type system does not allow them

Any sequence pointer with value n and a home with starting address h can be cast to a safe pointer with value n+h

    Only if 0 ≤ n < size(home) (run-time check)
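The cast rules can be modelled with a pointer value that carries its home. In the sketch below addresses are plain numbers that are never dereferenced, and `home`, `fat_ptr`, `int_to_fat`, `fat_to_int` and `seq_to_safe` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t start; size_t size; } home;  /* starting address h, size */
typedef struct { home h; intptr_t n; } fat_ptr;         /* SEQ or DYN pointer: home plus value n */

static const home NULL_HOME = { 0, 1 };                 /* the special null-home, size 1 */

/* int -> SEQ/DYN: value n, null-home. */
static fat_ptr int_to_fat(intptr_t n) {
    fat_ptr p;
    p.h = NULL_HOME;
    p.n = n;
    return p;
}

/* SEQ/DYN -> int: value n + h. */
static intptr_t fat_to_int(fat_ptr p) {
    return (intptr_t)p.h.start + p.n;
}

/* SEQ -> SAFE: allowed only if 0 <= n < size(home); result is n + h. */
static bool seq_to_safe(fat_ptr p, uintptr_t *out) {
    if (p.n < 0 || (size_t)p.n >= p.h.size) {
        return false;  /* run-time check fails */
    }
    *out = p.h.start + (uintptr_t)p.n;
    return true;
}
```

In this model, giving the null-home size 1 lets a null sequence pointer (n = 0) become a null safe pointer, while any other integer-derived pointer fails the bounds check. A pointer cast to an integer and back keeps its numeric value but ends up in the null-home, so it can no longer be used for memory access.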

Run-time checks

A null-pointer check for memory operations that use a safe pointer

A memory access boundaries check and a non-pointer check (null-home) for sequence and dynamic pointers

Programs that cast pointers to integers and then back to pointers will not be able to use the resulting pointers as memory addresses

Well-typed CCured programs

Can fail:

    Due to a failed run-time check

Cannot fail:

    Due to unexpected types

    Due to trying to access an invalid memory location

Theorem I (Progress and type preservation)

IF e : τ (for a valid type τ)

AND the contents of each memory address correspond to the typing constraints of the home to which it belongs

THEN EITHER one of the run-time checks fails during the evaluation of the expression e

OR ELSE e evaluates to a value v AND v is a valid value of type τ

Theorem II (Progress for commands)

For any command c which is built from valid types

IF the contents of each memory address correspond to the typing constraints of the home to which it belongs

THEN EITHER the command execution fails due to run-time checks

OR ELSE the command succeeds and the contents of each memory address still correspond to the typing constraints of the home to which it belongs

Type inference algorithm

Given a C program, translate the pointer types to make the program well-typed in the CCured type system

The C program already uses types of the form “τ ref”. What remains is to discover whether each one should be safe, sequence or dynamic.

τ ref q, where q is a qualifier ranging over the set {SAFE, SEQ, DYN}

The overall strategy is to find as many SAFE and SEQ pointers as possible

Algorithm overview

1. Introduce a qualifier variable for each syntactic occurrence of the pointer type constructor in the C program

2. Scan the program and collect a set of constraints C on these qualifier variables

3. Solve the system of constraints to produce a substitution S of qualifier variables with qualifier values

    S(int) = int

    S(τ ref q) = DYNAMIC          if S(q) = DYN
    S(τ ref q) = S(τ) ref S(q)    otherwise

4. Apply the substitution to the types of the C program to produce a CCured program
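Step 4 can be sketched as a recursive walk over a type that rewrites it in place according to S. The `ctype` layout, the use of an array indexed by qualifier-variable number to represent S, and the name `apply_subst` are all assumptions made for this illustration.

```c
#include <stddef.h>

typedef enum { Q_SAFE, Q_SEQ, Q_DYN } qual;
typedef enum { K_INT, K_REF, K_DYNAMIC } kind;

typedef struct ctype {
    kind k;
    int qvar;              /* qualifier variable index, used when k == K_REF */
    qual q;                /* resolved qualifier, meaningful after substitution */
    struct ctype *target;  /* referenced type for K_REF, NULL otherwise */
} ctype;

/* Apply the substitution S (indexed by qualifier variable) to t in place. */
static void apply_subst(ctype *t, const qual *S) {
    if (t->k != K_REF) {
        return;                      /* S(int) = int */
    }
    if (S[t->qvar] == Q_DYN) {       /* S(t ref q) = DYNAMIC if S(q) = DYN */
        t->k = K_DYNAMIC;
        t->target = NULL;            /* DYNAMIC carries no referenced type */
        return;
    }
    t->q = S[t->qvar];               /* otherwise S(t) ref S(q) */
    apply_subst(t->target, S);
}
```

For the array `a` of the running example, with the inner qualifier resolved to DYN and the outer one to SEQ, this yields a SEQ reference to DYNAMIC, matching `DYNAMIC ref SEQ a` in the translated program.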

Constraint Generation Rules

Convertibility

int ≤ τ ref q adds the constraint {q ≠ SAFE} to C

τ1 ref q1 ≤ τ2 ref q2 adds the constraints {q1 ← q2} and {q1 = q2 = DYN OR τ1 = τ2 = int} to C

q1 ← q2 means that either SEQ can be cast to SAFE (q1 is SEQ and q2 is SAFE) or the qualifiers are equal

Expressions and commands

If e1 : τ ref q and e2 : int then e1⊕e2 : τ ref q, adding {q ≠ SAFE} to C (pointer arithmetic)

Constraint Collection

Additional rules to bridge the gap between C and CCured:

    Allow memory access through SEQ (not just SAFE) pointers

    Allow ints to be read or written through DYNAMIC pointers

    In both cases: implicit cast, no run-time checks

In a memory write, allow a conversion of the value being written to the referenced type

For each type of the form τ ref q’ ref q, collect a constraint q = DYN => q’ = DYN

Final Set of Constraints

ARITH: q ≠ SAFE

CONV: q ← q’

POINTSTO: q = DYN => q’ = DYN

ISDYN: q = DYN

EQ: q = q’

Constraint Solving

1. Propagate the ISDYN constraints using the constraints EQ, CONV, and POINTSTO

2. All qualifier variables involved in ARITH constraints are set to SEQ, and this information is propagated using the constraints EQ and CONV

3. Make all the other variables SAFE

The whole type inference process is linear in the size of the program!
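The three solving steps can be sketched as below. This is a simple fixpoint loop for illustration, not the linear-time procedure the slide refers to, and the propagation directions are assumptions read off the convertibility rules: DYN flows in both directions along CONV and EQ, DYN flows from q to q’ along POINTSTO, and SEQ flows only from the destination of a conversion back to its source (a SEQ pointer may become SAFE, but not the reverse). All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { Q_SAFE = 0, Q_SEQ = 1, Q_DYN = 2 } qual;
typedef enum { ARITH, CONV, POINTSTO, ISDYN, EQ } ckind;

/* ARITH and ISDYN use q only.  CONV: q converts to q2 (q <- q2).
   POINTSTO: q = DYN => q2 = DYN.  EQ: q = q2. */
typedef struct { ckind kind; int q, q2; } constraint;

/* Raise s[v] to `to` if it is currently lower; report whether it changed. */
static bool raise_to(qual *s, int v, qual to) {
    if (s[v] >= to) return false;
    s[v] = to;
    return true;
}

static void solve(const constraint *cs, size_t n, qual *s, size_t nvars) {
    size_t i;
    bool changed;

    for (i = 0; i < nvars; i++) s[i] = Q_SAFE;

    /* Step 1: seed DYN from ISDYN, then propagate via EQ, CONV, POINTSTO. */
    for (i = 0; i < n; i++)
        if (cs[i].kind == ISDYN) s[cs[i].q] = Q_DYN;
    do {
        changed = false;
        for (i = 0; i < n; i++) {
            const constraint *c = &cs[i];
            if (c->kind == EQ || c->kind == CONV) {
                if (s[c->q] == Q_DYN)  changed |= raise_to(s, c->q2, Q_DYN);
                if (s[c->q2] == Q_DYN) changed |= raise_to(s, c->q, Q_DYN);
            } else if (c->kind == POINTSTO) {
                if (s[c->q] == Q_DYN)  changed |= raise_to(s, c->q2, Q_DYN);
            }
        }
    } while (changed);

    /* Step 2: non-DYN ARITH variables become SEQ; propagate via EQ, CONV. */
    for (i = 0; i < n; i++)
        if (cs[i].kind == ARITH) raise_to(s, cs[i].q, Q_SEQ);
    do {
        changed = false;
        for (i = 0; i < n; i++) {
            const constraint *c = &cs[i];
            if (c->kind == EQ) {
                if (s[c->q] == Q_SEQ)  changed |= raise_to(s, c->q2, Q_SEQ);
                if (s[c->q2] == Q_SEQ) changed |= raise_to(s, c->q, Q_SEQ);
            } else if (c->kind == CONV) {
                if (s[c->q2] == Q_SEQ) changed |= raise_to(s, c->q, Q_SEQ);
            }
        }
    } while (changed);

    /* Step 3: every variable not raised above stays SAFE. */
}
```

Ordering the qualifiers SAFE < SEQ < DYN lets a single helper, `raise_to`, express both propagation phases, and it guarantees that a variable already marked DYN is never lowered to SEQ in step 2.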

Handling the rest of C

In the DYNAMIC world, structures and arrays are simply alternative notations for saying how many bytes of storage to allocate

Explicit de-allocation is ignored (a garbage collector is used)

The address-of operator in C can yield a pointer to a stack-allocated variable: an additional run-time check ensures that a stack pointer is not copied to the heap or to globals

DYNAMIC function pointers and variable-argument functions are handled by passing a hidden argument which specifies the types of all arguments passed (checked by the callee)

Source Changes

There are still a few cases in which a legal program will stop with a failed run-time check; some manual intervention is still necessary

Pointer cast to integer and then back to pointer: make it all void*

Some programs attempt to store the addresses of stack variables into memory allocated on the heap

Calling functions in libraries that were not compiled with CCured: write a wrapper function

Experimental Results

Program    LOC     %Safe  %Seq  %Dyn  CCured Ratio  Purify Ratio
compress   1590    87     12    0     1.25          28
go         29315   96     4     0     2.01          51
ijpeg      31371   36     1     62    2.15          30
li         7761    93     6     0     1.86          50
bh         2053    80     18    0     1.53          94
bisort     707     90     10    0     1.03          42
em3d       557     85     15    0     2.44          7
ks         973     92     8     0     1.47          31
health     725     93     7     0     0.94          25

Bugs Found

ks passes FILE* to printf, not char*

compress, ijpeg: array bound violations

go: 8 array bound violations

go: 1 uninit variable as array index

Many involve multi-dimensional arrays

Purify only found go uninit bug

ftpd buffer overrun bug

Conclusions

C is a popular and useful programming language, but it needs type safety

Even in C programs most pointers can be verified to be type safe; the rest can be checked at run time

This work provides the ability to infer simply and accurately which pointers need to be checked at run time

Since the majority of the pointers are safe, the overheads are smaller than those of comparable tools

The presented type system is formally defined and proved

QUESTIONS?

Thank you!