The static code analysis rules for diagnosing potentially unsafe constructions from the
viewpoint of 64-bit programs
Evgeniy Ryzhkov, October, 2008
Abstract
The article formulates rules for diagnosing potentially unsafe syntactic constructions in the source code of C++
programs and describes the principles of building a static source code analyzer that implements these rules.
Introduction
Static source code analysis is a long-established task [1], and there are traditional methods of solving it both in
theory and in practice.
However, the progress of industrial software development sets new tasks for the developers of static code
analyzers, such as porting application code to 64-bit platforms and supporting parallel programming. These tasks
involve many peculiarities and problems [2, 3] that many programmers already face. Various tools and methods can
help diagnose them [4].
This article considers one approach to diagnosing problems in the code of 64-bit applications, namely the
development of a specialized static code analyzer.
A static code analyzer consists of two parts:
- the compiler front end, a unit that performs lexical and syntactic analysis of the source code and builds the
parse tree for further analysis;
- a set of rules for diagnosing potentially unsafe constructions.
By potentially unsafe constructions we mean constructions in program code that can cause a program to operate
incorrectly when the application is ported to a 64-bit platform. They should not be confused with defects [5], which
are errors and must be corrected in any case. Unlike defects, potentially unsafe constructions diagnosed by the
static code analyzer must be reviewed by a programmer, and it is the programmer who decides whether the code is
incorrect in a particular situation. If the programmer considers the code incorrect, it must be corrected.
Thus, the task of a static code analyzer is to diagnose potentially unsafe constructions with the help of a set of rules.
Analysis unit development
The principles of building a static code analyzer are well studied and reviewed in the literature [6]. That is why a
traditional approach to building the analysis unit is a reasonable choice for an analyzer intended for the
development of 64-bit applications.
As the code analyzer being developed is intended for the C and C++ languages, the construction of the analysis unit
should proceed from what is known about the class of grammar these languages belong to.
C++ is defined by a context-free grammar (in Chomsky's classification). To parse C++ programs, a syntactic analyzer
recognizing a context-free grammar is used, while lexical analysis is implemented on the basis of a regular grammar.
Both lexical and syntactic analyses are needed because of the peculiarities of the rules being checked.
Parsing of C++ is implemented by the recursive descent method with backtracking. This recognition is implemented
in the code analysis library VivaCore [7].
The result of parsing the code is a derivation tree. Compared with an abstract syntax tree, the derivation tree
contains more information, which is sometimes necessary for further analysis. After that, a special algorithm
traverses the tree and checks concrete rules.
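The traversal step can be pictured with a minimal sketch. It is not VivaCore code: the `Node` and `Rule` types and the `traverse` function are hypothetical names introduced only for illustration, and a real derivation tree carries far more information than the two strings used here.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical parse-tree node (an assumption, not the VivaCore data structure).
struct Node {
    std::string kind;                              // e.g. "Assignment", "Call"
    std::string text;                              // source fragment used in the report
    std::vector<std::unique_ptr<Node>> children;
};

// A rule returns true when the node looks potentially unsafe.
struct Rule {
    std::string name;
    std::function<bool(const Node&)> matches;
};

// Depth-first traversal: every rule is checked against every node.
void traverse(const Node& node, const std::vector<Rule>& rules,
              std::vector<std::string>& warnings) {
    for (const Rule& rule : rules) {
        if (rule.matches(node)) {
            warnings.push_back(rule.name + ": " + node.text);
        }
    }
    for (const auto& child : node.children) {
        traverse(*child, rules, warnings);
    }
}
```

Each rule stays independent of the traversal, so new diagnostics can be added without touching the tree-walking code.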
Data types
Before we turn to the rules of diagnosing potentially unsafe constructions, we need to decide which architecture the
rules are worked out for. The most important part of an architecture for us is the data model. A data model [2] is
the correlation of the sizes of the basic data types on a particular architecture. For example, the data model of
64-bit Windows is called LLP64, while 64-bit Linux uses LP64. All the rules below are given for the LLP64 data
model, but they can be applied to LP64 as well after replacing the definitions of the basic types.
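As a rough illustration, a data model can be recognized from the sizes of just three types. The function names below are hypothetical, and the ILP32 entry (the usual name for the 32-bit model in which int, long and pointers are all 4 bytes) is added for comparison; it is not named in the article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Names a data model from the sizes (in bytes) of int, long and a pointer.
// Only three well-known models are recognized in this sketch.
std::string data_model_name(std::size_t int_size, std::size_t long_size,
                            std::size_t pointer_size) {
    if (int_size == 4 && long_size == 4 && pointer_size == 4) return "ILP32";
    if (int_size == 4 && long_size == 4 && pointer_size == 8) return "LLP64";
    if (int_size == 4 && long_size == 8 && pointer_size == 8) return "LP64";
    return "unknown";
}

// Data model of the platform this code is compiled for.
std::string host_data_model() {
    return data_model_name(sizeof(int), sizeof(long), sizeof(void*));
}
```

The only difference between LLP64 and LP64 in this table is the size of long, which is why the rules carry over once the basic type definitions are replaced.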
Let's introduce the set $T$, the set of all basic and derived integer types of C++, including pointers. Examples: int,
bool, short int, size_t, void*, pointers to classes.
Let's introduce the set $S$, the set of sizes of these types (in bytes), so that $\forall t \in T\ \exists s \in S$.
Examples: 1, 2, 4, 8, 16, 32, 64.
The numbers of members in $T$ and $S$ differ: there are more members in $T$ than in $S$.
Let's introduce the match operation $\sigma_{32}$, which maps a C++ type to the size of this type on the 32-bit
architecture: $\sigma_{32}(t) = s$, $s \in S$, and the operation $\sigma_{64}$, which maps a type to its size on the
64-bit architecture: $\sigma_{64}(t) = s$, $s \in S$. Formally these operations look as follows:
$\sigma_{32}: T \to S$ and $\sigma_{64}: T \to S$.
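Let's introduce the set $T'$, the set of all memsize-types (types of variable size) of C++, $T' \subset T$. Examples:
size_t, ptrdiff_t, int*, void*.
Members of the set $T'$ have the following property, where $\sigma_{32}(t)$ and $\sigma_{64}(t)$ denote the size of
type $t$ on the 32-bit and 64-bit architectures:
$$\forall t \in T':\quad \sigma_{32}(t) = s,\quad \sigma_{64}(t) = s^{*},\quad s \neq s^{*},\quad s, s^{*} \in S.$$
In other words, memsize-types are $T' \subset T,\ \forall t \in T': \sigma_{32}(t) \neq \sigma_{64}(t)$.
Let's introduce the set $T_{32} \subset T$, all the data types that are 32-bit on both the 32-bit and 64-bit
architectures, that is $T_{32} \subset T,\ \forall t_{32} \in T_{32}: \sigma_{32}(t_{32}) = \sigma_{64}(t_{32})$. An
example: int.
By analogy, let's introduce the set $T_{64} \subset T$, all the data types that are 64-bit on both the 32-bit and
64-bit architectures. An example: long long.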
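The three sets can be sketched as a small classifier over a pair of sizes. The `TypeSizes` and `TypeClass` names are hypothetical, and the sample sizes assume a 32-bit model with 4-byte pointers alongside the LLP64 model; an analyzer would take them from its own type tables.

```cpp
#include <cassert>
#include <cstddef>

enum class TypeClass { Memsize, Fixed32, Fixed64, Other };

// Sizes of one type (in bytes) under the 32-bit and the 64-bit data model.
struct TypeSizes {
    std::size_t size32;
    std::size_t size64;
};

// Memsize: the size differs between the models (the set T').
// Fixed32 / Fixed64: 4 or 8 bytes under both models (the sets T32 and T64).
TypeClass classify(TypeSizes t) {
    if (t.size32 != t.size64) return TypeClass::Memsize;
    if (t.size32 == 4) return TypeClass::Fixed32;
    if (t.size32 == 8) return TypeClass::Fixed64;
    return TypeClass::Other;
}

// Sample entries (assumed sizes for a 32-bit model and LLP64).
constexpr TypeSizes kSizeT{4, 8};
constexpr TypeSizes kInt{4, 4};
constexpr TypeSizes kLongLong{8, 8};
constexpr TypeSizes kShort{2, 2};
```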
The sizes of all memsize-types on a 32-bit architecture equal one number $q = 4$ (4 bytes): $\forall t \in T'$ it is
true that $\sigma_{32}(t) = q$. The sizes of all memsize-types on a 64-bit architecture equal the number $q^{*} = 8$
(8 bytes): $\forall t \in T'$ it is true that $\sigma_{64}(t) = q^{*}$.
Let's introduce the set $P$, the "pointer" data types of the C++ language, $P \subset T'$.
Let's introduce the indirection operation $*$ in the following way:
$$*: P \to T.$$
This operation yields the data type pointed to by the pointer: $*p = t$. An example: $*(\text{int*}) = \text{int}$.
Let's introduce the set $D$ consisting of double and all the types derived from it. Examples: double, long double.
Rules of code correctness analysis
All the rules of code correctness analysis are presented as functions that receive some arguments (different for
different rules) and return true if the code is incorrect and false if the code is correct. All the rules are compiled
from the study of errors that occur when porting code to 64-bit platforms [2].
Conversion of 32-bit integer types to memsize-types
Explicit and implicit conversions of 32-bit integer types to memsize-types should be considered unsafe.
Examples:
unsigned a, c;
size_t b = a;
array[c] = 1;
$$F_1(t_1, t_2) = \begin{cases} \text{true}, & \text{if } t_1 \in T_{32} \wedge t_2 \in T', \\ \text{false}, & \text{otherwise.} \end{cases}$$
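One reason such conversions deserve a look is that the arithmetic may already have wrapped in 32 bits before the value is widened. The sketch below is only an illustration: std::uint32_t stands in for unsigned and std::uint64_t for a 64-bit size_t, the function names are hypothetical, and int is assumed to be 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// The addition is performed in 32 bits and wraps before the result is widened.
std::uint64_t widen_after_add(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Widening one operand first makes the addition itself 64-bit.
std::uint64_t widen_before_add(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) + b;
}
```

With a = 0xFFFFFFFF and b = 1, the first function returns 0 and the second returns 0x100000000.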
Conversion of memsize-types to 32-bit integer types
Explicit and implicit conversions of memsize-types to 32-bit integer types should be considered unsafe.
An example:
size_t a;
unsigned b = a;
$$F_2(t_1, t_2) = \begin{cases} \text{true}, & \text{if } t_1 \in T' \wedge t_2 \in T_{32}, \\ \text{false}, & \text{otherwise.} \end{cases}$$
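The effect of such a narrowing conversion can be shown with fixed-width types. This is a sketch under the assumption that std::uint64_t models a 64-bit size_t and std::uint32_t models unsigned; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Keeps only the low 32 bits, as an implicit conversion to unsigned would.
std::uint32_t truncate_to_32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}

// A check a programmer might add when the narrowing is intentional.
bool fits_in_32(std::uint64_t value) {
    return value <= std::numeric_limits<std::uint32_t>::max();
}
```

A value such as 0x100000001 silently becomes 1 after truncation, which is why the programmer has to decide whether the range of the source value makes the conversion acceptable.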
Memsize-types in virtual functions
A virtual function should be considered unsafe if it satisfies the following conditions:
a) The function is declared both in a base class and in a derived class.
b) The types of the functions' arguments do not coincide but are equivalent on a 32-bit system (for example,
unsigned and size_t) and non-equivalent on a 64-bit one.
An example:
class Base {
virtual void foo(size_t);
};
class Derive : public Base {
virtual void foo(unsigned);
};
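Let's consider the tuples $M_1 = (m_{1,1}, \dots, m_{1,n})$ and $M_2 = (m_{2,1}, \dots, m_{2,n})$ of members of the
set $T$ (the argument types of the two functions). The situation should be considered unsafe when the tuples $M_1$
and $M_2$ coincide in 32-bit mode and differ in 64-bit mode. With $\sigma_{32}$ and $\sigma_{64}$ denoting the size
of a type on the 32-bit and 64-bit architectures:
$$F_3(M_1, M_2) = \begin{cases} \text{true}, & \text{if } \forall i \in \{1, \dots, n\}: \sigma_{32}(m_{1,i}) = \sigma_{32}(m_{2,i}) \ \wedge\ \exists i \in \{1, \dots, n\}: \sigma_{64}(m_{1,i}) \neq \sigma_{64}(m_{2,i}), \\ \text{false}, & \text{otherwise.} \end{cases}$$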
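The consequence on a 64-bit system can be reproduced on any platform by substituting fixed-width types. In this sketch std::uint64_t plays the role of size_t and std::uint32_t the role of unsigned; the return values and the `call_through_base` helper are hypothetical additions used only to observe which function runs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string foo(std::uint64_t) { return "Base::foo"; }
};

struct Derived : Base {
    // Different parameter type: this declares a new virtual function and
    // hides Base::foo instead of overriding it.
    virtual std::string foo(std::uint32_t) { return "Derived::foo"; }
};

// A call through the base interface never reaches Derived::foo.
std::string call_through_base(Base& object) {
    return object.foo(std::uint64_t{1});
}
```

In C++11 and later, marking the derived function with `override` turns this silent mismatch into a compilation error.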
Memsize-types in overloaded functions
A call of an overloaded function with an argument of a memsize-type should be considered unsafe when the function is
overloaded for both 32-bit and 64-bit integer data types.
An example:
void WriteValue(__int32);
void WriteValue(__int64);
...
ptrdiff_t value;
WriteValue(value);
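Let's consider a call of a function with $n$ actual arguments. If there are two or more overloaded functions with the
same number of arguments, the following check is performed.
$A = (a_1, \dots, a_n)$ is the tuple of types of the function's actual parameters;
$A_1 = (a_{1,1}, \dots, a_{1,n})$ is the tuple of types of the first overloaded function's formal parameters;
$A_2 = (a_{2,1}, \dots, a_{2,n})$ is the tuple of types of the second overloaded function's formal parameters;
$$F_4(A, A_1, A_2) = \begin{cases} \text{true}, & \text{if } \exists i \in \{1, \dots, n\}: a_i \in T' \wedge \big((a_{1,i} \in T_{32} \wedge a_{2,i} \in T_{64}) \vee (a_{2,i} \in T_{32} \wedge a_{1,i} \in T_{64})\big), \\ \text{false}, & \text{otherwise.} \end{cases}$$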
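The following sketch shows how the same call expression selects a different overload depending on the width of the argument type. It uses std::int32_t and std::int64_t in place of the compiler-specific __int32 and __int64, and the template parameter stands in for a memsize-type whose width depends on the platform; the return strings and the `write_zero` helper are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

std::string WriteValue(std::int32_t) { return "32-bit overload"; }
std::string WriteValue(std::int64_t) { return "64-bit overload"; }

// MemsizeLike models a type such as ptrdiff_t: the call below is written
// once, but overload resolution depends on how wide the type is.
template <typename MemsizeLike>
std::string write_zero() {
    MemsizeLike value = 0;
    return WriteValue(value);
}
```

If the two overloads behave differently, for example by writing a different number of bytes, the program changes behavior after porting without any change to its source.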
Conversion of pointers' types
Explicit conversion of one pointer type to another should be considered unsafe if one of them points to a 32-bit or
64-bit type and the other to a memsize-type.
An example:
int *array;
size_t *sizetPtr = (size_t *)(array);
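$$F_5(p_1, p_2) = \begin{cases} \text{true}, & \text{if } (*p_1 = t_1 \in T' \wedge *p_2 = t_2 \in T_{32} \cup T_{64}) \vee (*p_2 = t_2 \in T' \wedge *p_1 = t_1 \in T_{32} \cup T_{64}), \\ \text{false}, & \text{otherwise.} \end{cases}$$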
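What goes wrong when a 32-bit array is viewed through a pointer to an 8-byte memsize-type can be sketched as follows. std::memcpy is used here instead of the pointer cast so that the illustration itself stays within defined behavior; the function names are hypothetical, and std::uint64_t models a 64-bit size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads sizeof(std::uint64_t) bytes starting at the first array element,
// which is what dereferencing the converted pointer would touch.
std::uint64_t read_as_64(const std::int32_t (&array)[2]) {
    std::uint64_t value = 0;
    std::memcpy(&value, array, sizeof(value));
    return value;
}

// Number of 32-bit elements covered by one 64-bit access.
constexpr std::size_t elements_per_access() {
    return sizeof(std::uint64_t) / sizeof(std::int32_t);
}
```

One access spans two neighbouring elements, so the value read is a mix of both rather than the first element alone.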
Conversion of memsize-types to double
Explicit and implicit conversions of a memsize-type to double and vice versa should be considered unsafe.
An example:
size_t a;
double b = a;
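$$F_6(t_1, t_2) = \begin{cases} \text{true}, & \text{if } (t_1 \in T' \wedge t_2 \in D) \vee (t_2 \in T' \wedge t_1 \in D), \\ \text{false}, & \text{otherwise.} \end{cases}$$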
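The risk can be illustrated with a round trip through double. The sketch assumes that double is an IEEE 754 binary64 value with a 53-bit significand and uses std::uint64_t to model a 64-bit size_t; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Converts to double and back; exact only while the value fits in 53 bits.
std::uint64_t round_trip_through_double(std::uint64_t value) {
    return static_cast<std::uint64_t>(static_cast<double>(value));
}
```

Every 32-bit value survives the round trip, while a 64-bit value such as 2^53 + 1 does not, so code that was exact on a 32-bit system may lose precision after porting.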
Memsize-types in a function with a variable number of arguments
Passing a memsize-type (except for pointers) to a function with a variable number of arguments should be considered
unsafe.
An example:
size_t a;
printf("%u", a);
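Let $K = (k_1, \dots, k_m)$ be the tuple of the types of all the actual arguments passed to a function with a variable
number of arguments, where $m$ is the number of arguments in the call.
$$F_7(K) = \begin{cases} \text{true}, & \text{if } \exists i \in \{1, \dots, m\}: k_i \in T' \setminus P, \\ \text{false}, & \text{otherwise.} \end{cases}$$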
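One possible correction for the printf case is a conversion specifier whose width follows size_t. The sketch below uses "%zu" from the C99 and C++11 standard libraries; older runtime libraries may not support it, and the `format_size` wrapper is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a size_t with "%zu", whose expected argument type is size_t
// under every data model.
std::string format_size(std::size_t value) {
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%zu", value);
    return buffer;
}
```

The variadic call itself is still unchecked by the type system, which is why the rule flags the argument rather than the format string.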
Dangerous constants
Constants with particular values should be considered unsafe. Let's introduce the set $N$ of integer numbers that can
be written in the C++ language, and the set of "dangerous" constants $C \subset N$. Examples of "dangerous"
constants: 4, 32, 0xffffffff, etc.
$$F_8(c) = \begin{cases} \text{true}, & \text{if } c \in C, \\ \text{false}, & \text{otherwise.} \end{cases}$$
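A minimal sketch of this rule is a membership test on the literal's value. The table below contains only the three constants named in the text and is therefore an illustrative subset of $C$, not the full set used by any analyzer; the identifiers are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative subset of the set C: only the constants named in the text.
constexpr std::array<std::uint64_t, 3> kDangerousConstants{4, 32, 0xffffffff};

// Rule F8: true when the literal's value belongs to the set C.
bool is_dangerous_constant(std::uint64_t value) {
    return std::find(kDangerousConstants.begin(), kDangerousConstants.end(),
                     value) != kDangerousConstants.end();
}
```

These values are suspicious because they often encode 32-bit assumptions: 4 as the size of a pointer, 32 as the number of bits in it, and 0xffffffff as the largest size_t value.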
Memsize-types in unions
The presence of members of memsize-types in unions should be considered unsafe.
An example:
union PtrNumUnion {
char *m_p;
unsigned m_n;
} u;
Let $U$ be the tuple of the types of all the members of the union.
$$F_9(U) = \begin{cases} \text{true}, & \text{if } U \cap T' \neq \varnothing, \\ \text{false}, & \text{otherwise.} \end{cases}$$
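The union in the example works on a 32-bit system only because both members occupy the same bytes. A compile-time size comparison makes that assumption explicit; the sketch below uses hypothetical helper names and is one possible guard, not a technique prescribed by the article.

```cpp
#include <cassert>
#include <cstdint>

// True when both members occupy the same number of bytes, so that writing
// one member rewrites every byte of the other.
template <typename PointerLike, typename Number>
constexpr bool members_fully_overlap() {
    return sizeof(PointerLike) == sizeof(Number);
}

// Host-platform check for the PtrNumUnion members (char* and unsigned).
constexpr bool ptr_num_union_overlaps_here() {
    return members_fully_overlap<char*, unsigned>();
}
```

On a 64-bit system the pointer is 8 bytes and unsigned is 4, so the numeric member covers only half of the pointer.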
Exceptions and memsize-types
Throwing and handling exceptions of memsize-types should be considered unsafe.
An example:
char *p1, *p2;
try {
    throw (p1 - p2);
}
catch (int) {
    ...
}
$$F_{10}(t) = \begin{cases} \text{true}, & \text{if } t \in T', \\ \text{false}, & \text{otherwise.} \end{cases}$$
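Exception handlers are matched by type without integer conversions, which is why a handler written for int stops catching a pointer difference once ptrdiff_t becomes 64-bit. The sketch below models this with fixed-width types; the `which_handler` template and its return strings are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Throws a value of type Thrown and reports which handler caught it.
// std::int32_t models int; std::int64_t models a 64-bit ptrdiff_t.
template <typename Thrown>
std::string which_handler() {
    try {
        throw Thrown{};
    } catch (std::int32_t) {
        return "catch (int32)";
    } catch (...) {
        return "catch (...)";
    }
}
```

Without the catch-all handler, the 64-bit case would leave the exception unhandled.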
Conclusion
The rules for diagnosing potentially unsafe constructions from the viewpoint of 64-bit applications considered in
the article may be implemented in any static code analyzer.
At present, however, they are implemented in complete form only in the Viva64 code analyzer (www.viva64.com).
Viva64 diagnoses errors specific to 64-bit Windows applications. It is a lint-like static analyzer of C/C++ code
that integrates into the Visual Studio 2005/2008 development environment and provides a convenient user interface
for checking projects.
References
1. Scott Meyers, Martin Klaus. A First Look at C++ Program Analyzers. 1997. http://www.viva64.com/go.php?url=13.
2. Andrey Karpov, Evgeniy Ryzhkov. 20 issues of porting C++ code on the 64-bit platform. RSDN Magazine #1-2007.
pp. 65-75.
3. Alexey Kolosov, Evgeniy Ryzhkov, Andrey Karpov. 32 OpenMP traps for C++ developers. RSDN Magazine #2-2008.
pp. 3-17.
4. E. A. Ryzhkov, A. N. Karpov. Approaches to verification and testing of 64-bit applications. "Information
Technologies" №7, 2008. pp. 41-45.
5. S. McConnell. Perfect code. Master-class / translated from English. Moscow: "Russian Edition" publishing house,
St. Petersburg, 2007. 896 pp.: illustrations.
6. A. V. Gordeev, A. U. Molchanov. System software. St. Petersburg: Piter, 2002. 736 pp.: illustrations.
7. Evgeniy Ryzhkov, Andrey Karpov. The essence of the VivaCore code analysis library. RSDN Magazine #1-2008.
pp. 56-63.