

The static code analysis rules for diagnosing potentially unsafe constructions from the viewpoint of 64-bit programs

Evgeniy Ryzhkov, October, 2008


Abstract

The article formulates rules for diagnosing potentially unsafe syntactic constructions in the source code of C++ programs and describes the principles of building a static source code analyzer that supports these rules.

Introduction

The task of static source code analysis has been known for a long time [1], and there are established methods of solving it both in theory and in practice.

However, the progress of industrial software development poses new tasks for the developers of static code analyzers, such as porting application code to 64-bit platforms and supporting parallel programming. These tasks involve many peculiarities and problems [2, 3] that many programmers already face, and various tools and methods can help diagnose them [4].

This article considers one approach to diagnosing problems in the code of 64-bit applications, namely the development of a specialized static code analyzer.



A static code analyzer consists of two parts:

- the front-end compiler, a unit that performs lexical and syntactic analysis (parsing) of the source code and builds the parse tree for further analysis;
- a set of rules for diagnosing potentially unsafe constructions.

By potentially unsafe constructions we mean constructions in program code that can cause a program to operate incorrectly after the application is ported to a 64-bit platform. They should not be confused with defects [5] in program code, which are errors and must be corrected in any case. Unlike defects, the potentially unsafe constructions diagnosed by the static code analyzer must be reviewed by a programmer, and it is the programmer who decides whether the code should be considered incorrect in a particular situation. If the programmer considers the code incorrect, it must be corrected.

Thus, the task of a static code analyzer is to diagnose potentially unsafe constructions with the help of a set of rules.


Analysis unit development

The principles of building a static code analyzer are well studied and described in the literature [6]. That is why a traditional approach to building the analysis unit is taken for an analyzer intended for developing 64-bit applications.

Since the code analyzer being developed is intended for the C and C++ languages, the design of the analysis unit has to start from the class of grammar these languages belong to.

C++ is defined by a context-free grammar (in the Chomsky classification), so C++ programs are parsed by a syntactic analyzer that recognizes context-free grammars. Lexical analysis, in turn, is based on a regular grammar. Both lexical and syntactic analysis are needed because of the nature of the rules being checked.

C++ is parsed by the recursive descent method with backtracking. This parser is implemented in the VivaCore code analysis library [7].

The result of parsing is a derivation tree. Compared with an abstract syntax tree, the derivation tree contains more information, which is sometimes necessary for further analysis. A special algorithm then traverses the tree and checks the specific rules.


Data types

Before discussing specific rules for diagnosing potentially unsafe constructions, we need to decide which architecture the rules are formulated for. What matters most here is the part of the architecture called the data model. A data model [2] is the correlation of the sizes of the basic data types on a particular architecture. For example, 64-bit Windows uses the LLP64 data model, while 64-bit Linux uses LP64 (in LLP64 the type long stays 32-bit, while in LP64 it is 64-bit; pointers are 64-bit in both). All the rules below are given for the LLP64 architecture, but they can be applied to LP64 as well after the definitions of the basic types are changed accordingly.

Let T be the set of all basic and derived integer types of C++, including pointers. Examples: int, bool, short int, size_t, void*, pointers to classes.

Let S be the set of sizes of these types (in bytes), so that every $t \in T$ has a size $s \in S$. Examples: 1, 2, 4, 8, 16, 32, 64.

The sets T and S have different numbers of members: T has more members than S.

Let us introduce the operation $\xi_{32}$, which maps a C++ type to its size on the 32-bit architecture, $\xi_{32}(t) = s$, and the operation $\xi_{64}$, which maps a type to its size on the 64-bit architecture, $\xi_{64}(t) = s$, where $t \in T$ and $s \in S$. Formally, $\xi_{32}: T \to S$ and $\xi_{64}: T \to S$.


Let $\tilde{T}$ be the set of all memsize-types (types of variable size) of C++, $\tilde{T} \subset T$. Examples: size_t, ptrdiff_t, int*, void*.

Members of $\tilde{T}$ have the following property:

$$\forall t \in \tilde{T}: \quad \xi_{32}(t) = s, \quad \xi_{64}(t) = s^*, \quad s \neq s^*.$$

In other words, memsize-types are $\tilde{T} = \{t \in T : \xi_{32}(t) \neq \xi_{64}(t)\}$.

Let $T_{32} \subset T$ be the set of all data types that are 32-bit on both the 32-bit and the 64-bit architecture, that is $T_{32} = \{t \in T : \xi_{32}(t) = \xi_{64}(t) = 4\}$ (sizes are in bytes). An example: int.

By analogy, let $T_{64} \subset T$ be the set of all data types that are 64-bit on both architectures, $T_{64} = \{t \in T : \xi_{32}(t) = \xi_{64}(t) = 8\}$. An example: long long.


The sizes of all memsize-types on a 32-bit architecture equal the same number $q = 4$ (4 bytes): $\forall t \in \tilde{T}: \xi_{32}(t) = q$. The sizes of all memsize-types on a 64-bit architecture equal $q^* = 8$ (8 bytes): $\forall t \in \tilde{T}: \xi_{64}(t) = q^*$.

Let $P \subset T$ be the set of pointer types of the C++ language.

Let us introduce the indirection operation $*: P \to T$, which yields the data type the pointer points to: $*p = t$. An example: $*(\texttt{int*}) = \texttt{int}$.

Let D be the set of all types derived from the double type. Examples: double, long double.


Rules of code correctness analysis

All the rules of code correctness analysis are presented as functions that take arguments (which differ from rule to rule) and return true if the code is incorrect and false if it is correct. The rules were derived from a study of errors that occur when porting code to 64-bit platforms [2].

Conversion of 32-bit integer types to memsize-types

Explicit and implicit conversions of 32-bit integer types to memsize-types should be considered unsafe.

Examples:

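unsigned a, c;
size_t b = a;
array[c] = 1;

Here $t_1$ is the type being converted and $t_2$ is the resulting type.

$$F_1(t_1, t_2) = \begin{cases} \text{true}, & \text{if } t_1 \in T_{32},\ t_2 \in \tilde{T}, \\ \text{false}, & \text{otherwise.} \end{cases}$$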


Conversion of memsize-types to 32-bit integer types

Explicit and implicit conversions of memsize-types to 32-bit integer types should be considered unsafe.

An example:

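size_t a;
unsigned b = a;

Here $t_1$ is the type being converted and $t_2$ is the resulting type.

$$F_2(t_1, t_2) = \begin{cases} \text{true}, & \text{if } t_1 \in \tilde{T},\ t_2 \in T_{32}, \\ \text{false}, & \text{otherwise.} \end{cases}$$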


Memsize-types in virtual functions

A virtual function should be considered unsafe if it satisfies these conditions:

a) the function is defined in a base class and in a derived class;

b) the types of the functions' arguments do not coincide, but are equivalent on a 32-bit system (for example, unsigned and size_t) and non-equivalent on a 64-bit one.


An example:

class Base {
  virtual void foo(size_t);
};
class Derive : public Base {
  virtual void foo(unsigned);
};

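Let $M_1 = (m_{11}, \ldots, m_{1n})$ and $M_2 = (m_{21}, \ldots, m_{2n})$ be the tuples of argument types of the function in the base class and in the derived class, with all $m_{ji} \in T$. The situation should be considered unsafe when $M_1$ and $M_2$ coincide in 32-bit mode and differ in 64-bit mode:

$$F_3(M_1, M_2) = \begin{cases} \text{true}, & \text{if } \xi_{32}(m_{1i}) = \xi_{32}(m_{2i}) \text{ for all } i = 1, \ldots, n \text{ and } \xi_{64}(m_{1i}) \neq \xi_{64}(m_{2i}) \text{ for some } i, \\ \text{false}, & \text{otherwise.} \end{cases}$$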


Memsize-types in overloaded functions

A call of an overloaded function with a memsize-type argument should be considered unsafe if the function is overloaded for both 32-bit and 64-bit integer data types.

An example:

void WriteValue(__int32);
void WriteValue(__int64);
...
ptrdiff_t value;
WriteValue(value);


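Consider a call of a function with n actual arguments. If there are two or more overloaded functions with the same number of arguments, the following check is performed for a pair of them. Let

$A = (a_1, \ldots, a_n)$ be the tuple of types of the actual arguments of the call;

$A_1 = (a_{11}, \ldots, a_{1n})$ be the tuple of types of the formal parameters of the first overloaded function;

$A_2 = (a_{21}, \ldots, a_{2n})$ be the tuple of types of the formal parameters of the second overloaded function.

$$F_4(A, A_1, A_2) = \begin{cases} \text{true}, & \text{if } a_i \in \tilde{T} \text{ and } \left(a_{1i} \in T_{32},\ a_{2i} \in T_{64} \text{ or } a_{1i} \in T_{64},\ a_{2i} \in T_{32}\right) \text{ for some } i = 1, \ldots, n, \\ \text{false}, & \text{otherwise.} \end{cases}$$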


Conversion of pointer types

An explicit conversion of one pointer type to another should be considered unsafe if one of them points to a 32-bit or 64-bit type and the other points to a memsize-type.

An example:

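int *array;
size_t *sizetPtr = (size_t *)(array);

Here $p_1, p_2 \in P$ are the original and the resulting pointer types.

$$F_5(p_1, p_2) = \begin{cases} \text{true}, & \text{if } *p_1 \in \tilde{T},\ *p_2 \in T_{32} \cup T_{64} \text{ or } *p_2 \in \tilde{T},\ *p_1 \in T_{32} \cup T_{64}, \\ \text{false}, & \text{otherwise.} \end{cases}$$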

Conversion of memsize-types to double

Explicit and implicit conversions of a memsize-type to double and vice versa should be considered unsafe.

An example:

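size_t a;
double b = a;

$$F_6(t_1, t_2) = \begin{cases} \text{true}, & \text{if } t_1 \in \tilde{T},\ t_2 \in D \text{ or } t_1 \in D,\ t_2 \in \tilde{T}, \\ \text{false}, & \text{otherwise.} \end{cases}$$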


Memsize-types in a function with the variable number of arguments

Passing a memsize-type (other than a pointer) to a function with a variable number of arguments should be considered unsafe.

An example:

size_t a;
printf("%u", a);

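Let $K = (k_1, \ldots, k_m)$ be the tuple of the actual argument types of a call to a function with a variable number of arguments, where m is the number of arguments in the call.

$$F_7(K) = \begin{cases} \text{true}, & \text{if } k_i \in \tilde{T} \setminus P \text{ for some } i = 1, \ldots, m, \\ \text{false}, & \text{otherwise.} \end{cases}$$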


Dangerous constants

Constants with particular values should be considered unsafe. Let N be the set of integer numbers that can be written in the C++ language, and let $C \subset N$ be the set of "dangerous" constants. Examples of "dangerous" constants: 4, 32, 0xffffffff, etc.

$$F_8(c) = \begin{cases} \text{true}, & \text{if } c \in C, \\ \text{false}, & \text{otherwise.} \end{cases}$$


Memsize-types in unions

The presence of memsize-type members in unions should be considered unsafe.

An example:

union PtrNumUnion {
  char *m_p;
  unsigned m_n;
} u;

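Let U be the tuple of the data types of all the members of the union.

$$F_9(U) = \begin{cases} \text{true}, & \text{if } u \in \tilde{T} \text{ for some } u \in U, \\ \text{false}, & \text{otherwise.} \end{cases}$$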


Exceptions and memsize-types

Throwing and catching exceptions of memsize-types should be considered unsafe.

An example:

char *p1, *p2;
try {
  throw (p1 - p2);
}
catch (int) {
  ...
}

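Here t is the type of the exception object being thrown or caught.

$$F_{10}(t) = \begin{cases} \text{true}, & \text{if } t \in \tilde{T}, \\ \text{false}, & \text{otherwise.} \end{cases}$$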


Conclusion

The rules for diagnosing potentially unsafe constructions from the viewpoint of 64-bit applications considered in this article can be implemented in any static code analyzer.

At present, however, they are fully implemented only in the Viva64 code analyzer (www.viva64.com). Viva64 diagnoses errors specific to 64-bit Windows applications. It is a lint-like static analyzer of C/C++ code that integrates into the Visual Studio 2005/2008 development environment and provides a convenient user interface for checking projects.



References

1. Scott Meyers, Martin Klaus. A First Look at C++ Program Analyzers. 1997. http://www.viva64.com/go.php?url=13
2. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform. RSDN Magazine #1-2007, pp. 65-75.
3. Alexey Kolosov, Evgeniy Ryzhkov, Andrey Karpov. 32 OpenMP traps for C++ developers. RSDN Magazine #2-2008, pp. 3-17.
4. E. A. Ryzhkov, A. N. Karpov. Approaches to verification and testing of 64-bit applications. "Information Technologies" No. 7, 2008, pp. 41-45.
5. S. McConnell. Perfect code. Master-class. Translated from English. Moscow: "Russian Edition" publishing house; St. Petersburg, 2007. 896 pp., illustrations.
6. A. V. Gordeev, A. U. Molchanov. System software. St. Petersburg: Piter, 2002. 736 pp., illustrations.
7. Evgeniy Ryzhkov, Andrey Karpov. The essence of the VivaCore code analysis library. RSDN Magazine #1-2008, pp. 56-63.
