INFO 631 Prof. Glenn Booker


Page 1: INFO  631 Prof . Glenn Booker

www.ischool.drexel.edu

INFO 631 Prof. Glenn Booker

Week 3 – Complexity Metrics and Models

INFO631 Week 3

Page 2: INFO  631 Prof . Glenn Booker


Origin

• Complexity metrics were developed by computer scientists and software engineers

• Strongly based on empirical (real world) measurement, with little theory

• Primarily broken into internal and external measures

Page 3: INFO  631 Prof . Glenn Booker


Internal versus External

• Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)

• External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.)

Page 4: INFO  631 Prof . Glenn Booker


Internal Measures

Page 5: INFO  631 Prof . Glenn Booker


Internal Product Attributes

• Size measures
  – Input to prediction models
  – Normalizing factor for cost, productivity, etc.
  – Progress during development

• Typically use lines of code (LOC) or function point counts
  – LOC is a better measure for predicting cost and schedule

Page 6: INFO  631 Prof . Glenn Booker


Lines of Code

• Simple complexity metric, often based on number of executable statements or instruction statements
  – Highest defect rates often occur in small modules
  – Larger modules have a smaller defect rate (if they exist at all) - until too cumbersome
  – Optimum module size ~ 250 lines

Page 7: INFO  631 Prof . Glenn Booker


Function Points

• Function points help avoid biases due to the programming language(s) used

• Provide a more “fair” basis for comparing different environments

• Focus on how much work the program accomplishes, not how concisely it is expressed

Page 8: INFO  631 Prof . Glenn Booker


Halstead Metrics

• Also known as Software Science, 1977

• Examine program as compilable “tokens”

• Tokens are either operators (+, -) or operands (variables)

• Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc.

• Not widely used
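As a rough illustration of the metrics named above, the sketch below computes vocabulary, length, volume and difficulty from token lists using the commonly cited Halstead definitions. The function name, the toy statement and its hand tokenization are assumptions for illustration; they are not from the slides.

```python
import math

def halstead(operators, operands):
    """Compute basic Halstead metrics from lists of operator and operand tokens."""
    n1 = len(set(operators))   # number of distinct operators
    n2 = len(set(operands))    # number of distinct operands
    N1 = len(operators)        # total operator occurrences
    N2 = len(operands)         # total operand occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
    }

# Toy statement (hypothetical), tokenized by hand:  z = x + x - y
metrics = halstead(operators=["=", "+", "-"], operands=["z", "x", "x", "y"])
print(metrics)
```

How a real tool splits source text into operators and operands is language-specific, which is one reason counts from different tools rarely agree.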

Page 9: INFO  631 Prof . Glenn Booker


Data Structure (Halstead)

• Halstead’s η2 - number of distinct operands in a module
  – Operands include: number of variables, number of unique constants, and number of labels

• Operand usage (OU)
  – OU = η2/N2 where N2 is the total number of operand references

Page 10: INFO  631 Prof . Glenn Booker


Software Complexity

• Is a characteristic that influences the resources needed to build and maintain it

• Many different characteristics of software relate to complexity

• These complexity characteristics revolve around the structure of the software

Page 11: INFO  631 Prof . Glenn Booker


Types of Structural Measures

• Control flow
  – Addresses sequence in which instructions are executed
  – Iteration and looping

• Data flow
  – Follows trail of data as it is created and handled
  – Depicts behavior of data as it interacts with the program

Page 12: INFO  631 Prof . Glenn Booker


Types of Structural Measures

• Data structure
  – Concerned with organization of data itself
  – Provides information about difficulties in handling data and in defining test cases

Page 13: INFO  631 Prof . Glenn Booker


Control Flow

• Modeled by directed graphs (control flow graphs)
  – Each node corresponds to a single program statement
  – Arcs (directed edges) indicate flow of control from one statement to another

Page 14: INFO  631 Prof . Glenn Booker


Control Flow

• Control flow graphs are useful for:
  – Analysis (estimating number of defects)
  – Expressing complexity by a single value
  – Assessing testability and test coverage

Page 15: INFO  631 Prof . Glenn Booker


Basic Control Constructs

[Figure: control flow graphs for the basic control constructs: If A then X else Y; If A then X; While A do X; Repeat X until A; Case A of a1 : X1 ... an : Xn. Note: t=true f=false]

Page 16: INFO  631 Prof . Glenn Booker


Cyclomatic Complexity

• McCabe, 1976

• Based on a program’s control flow graph

• Related to number of separate graphable areas, or number of linearly independent paths in the program

• Complexity MC = edges - nodes + 2*(# of unconnected parts of the graph; 1 for a single program)

Page 17: INFO  631 Prof . Glenn Booker


Cyclomatic Complexity

• Complexity under 10 generally desired

• Can also find MC as number of binary decisions (yes/no) plus one
  – Multiple choice decisions with ‘n’ choices count as (n-1) binary decisions

• Ignores differences among specific types of control structures

Page 18: INFO  631 Prof . Glenn Booker


Cyclomatic Complexity

• Uses of complexity metric:
  – Identify complex modules needing detailed inspection or redesign
  – Identify simple modules needing minimal inspection and/or testing
  – Estimate programming, testing and maintenance effort
  – Identify potentially troublesome code

Page 19: INFO  631 Prof . Glenn Booker


Control Flow Representation of Programs

• Software programs can be represented by linear directed segments combined with the basic control flow constructs

• Control flow constructs may be nested, e.g. an IF statement can be inside of a WHILE loop

Page 20: INFO  631 Prof . Glenn Booker


Control Flow Representation of Programs

• Example:

[Figure: control flow graph with elements numbered 1 to 14]

McCabe cyclomatic complexity (MC) - counts the number of linearly independent paths through a program

MC = # of edges - # of nodes + 2

Linearly independent paths for example:
<2, 11>
<2, 10, 12, 14>
<2, 10, 12, 13, 12, 14>
<1, 3, 5, 6, 9>
<1, 4, 6, 9>
<1, 4, 6, 7, 8, 9>

Page 21: INFO  631 Prof . Glenn Booker


Control Flow--Linearly Independent Paths

[Figure: control flow graph with nodes a to g and edges numbered 1 to 10]

MC = edges - nodes + 2 = 10 - 7 + 2 = 5

Set of linearly independent paths:
b1: abcg
b2: abcbcg
b3: abefg
b4: adefg
b5: adfg

Any arbitrary path is equal to a linear combination of the linearly independent paths listed above

For example, path abcbefg is equal to: b2 + b3 - b1
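The arithmetic on this slide can be checked with a short sketch. The edge list below is an assumption: it is inferred from the five listed paths (which together use exactly ten distinct edges among nodes a to g), since the figure itself does not survive in the transcript. The function names are hypothetical.

```python
# Edge list inferred from the listed paths b1..b5 (assumption; the figure is lost).
EDGES = [("a", "b"), ("a", "d"), ("b", "c"), ("b", "e"), ("c", "b"),
         ("c", "g"), ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")]

def cyclomatic_complexity(edges, components=1):
    """MC = edges - nodes + 2 * (number of connected components)."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components

def decisions_plus_one(edges):
    """Same value from decisions: a node with n successors counts as n-1 binary decisions."""
    out_degree = {}
    for src, _ in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    return sum(d - 1 for d in out_degree.values()) + 1

def path_vector(path, edges):
    """Count how many times a path (a string of node letters) traverses each edge."""
    counts = [0] * len(edges)
    for src, dst in zip(path, path[1:]):
        counts[edges.index((src, dst))] += 1
    return counts

mc = cyclomatic_complexity(EDGES)
b1, b2, b3 = (path_vector(p, EDGES) for p in ("abcg", "abcbcg", "abefg"))
combo = [x + y - z for x, y, z in zip(b2, b3, b1)]   # b2 + b3 - b1
target = path_vector("abcbefg", EDGES)
print(mc, decisions_plus_one(EDGES), combo == target)
```

Treating each path as a vector of edge-traversal counts is what makes "linear combination" meaningful here: adding and subtracting paths means adding and subtracting those vectors.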

Page 22: INFO  631 Prof . Glenn Booker


Knots - Control Flow Crossovers

• Knot measure -- total number of points at which control flow lines cross

      IF (TIME) 30,30,10
10    CALL TEMP1
      IF (X1) 20,20,40
20    Y1=Y+1
      Y2=0
      CALL TEMP2
      GO TO 50
30    Z1=1
40    CALL TEMP3
      Z2=Z2+1
50    CALL TEMP4

How many are here?
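One way to count knots mechanically is sketched below. It rests on several assumptions that are not from the slide: each jump is written as (source statement, target statement) using the position of the statement in the listing (1 to 11), all flow lines are drawn on the same side of the listing, and two jumps form a knot when exactly one end of one jump lies strictly inside the span of the other. The function name and the jump list are hypothetical.

```python
from itertools import combinations

def count_knots(jumps):
    """Count pairs of jumps whose flow lines must cross when drawn beside the listing."""
    knots = 0
    for (a, b), (p, q) in combinations(jumps, 2):
        lo1, hi1 = sorted((a, b))
        lo2, hi2 = sorted((p, q))
        # The spans overlap partially (neither nested nor disjoint), so the lines cross.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots

# Jumps read off the FORTRAN fragment, by statement position (assumed numbering):
#   statement 1  IF (TIME) 30,30,10 -> label 30 (statement 8), label 10 (statement 2)
#   statement 3  IF (X1) 20,20,40   -> label 20 (statement 4), label 40 (statement 9)
#   statement 7  GO TO 50           -> label 50 (statement 11)
jumps = [(1, 8), (1, 2), (3, 4), (3, 9), (7, 11)]
knots = count_knots(jumps)
print(knots)
```

A different drawing convention, such as flow lines on both sides of the listing, can give a different count, so the result is only meaningful relative to the stated assumptions.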

Page 23: INFO  631 Prof . Glenn Booker


Syntactic Constructs

• Examine effect of using specific control structures on defect rate

• Is, by definition, language-specific

• Can result in statistically significant relationships
  – e.g. Lo used this to show that DO WHILE should be avoided in COBOL

Page 24: INFO  631 Prof . Glenn Booker


External Measures

Page 25: INFO  631 Prof . Glenn Booker


Computational Complexity

• Examines algorithmic efficiency and use of machine resources (memory, I/O, storage)

• Studies quantitative aspects of solutions to computational problems

• Examples may include sorting efficiency for a database, managing I/O constraints across a large scale network, etc.

Page 26: INFO  631 Prof . Glenn Booker


Psychological Complexity

• Concerned with characteristics of software that affect human performance
  – Injection of defects (when and why does a programmer make errors?)
  – Ease of building the software (effort required)
  – Ease of maintenance (effort required)

Page 27: INFO  631 Prof . Glenn Booker


Data Structure (Database)

• Database size per program size (DBSPPS)
  – DBSPPS = DBS/PS
    • Where DBS is database size in bytes or characters
    • PS is program size in source instructions
  – Used in COCOMO model as a cost driver
    • Ordinal scale measure derived from DBSPPS

Page 28: INFO  631 Prof . Glenn Booker


Fan-in and Fan-out

• Focus is the interaction among code modules
  – Fan-in = # of modules which call a given module
  – Fan-out = # of modules which are called by a given module

• Or, more formally...

Page 29: INFO  631 Prof . Glenn Booker


Fan-in and Fan-out

• Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module

• Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module

Page 30: INFO  631 Prof . Glenn Booker


Fan-in and Fan-out

• Do fan-in and fan-out affect software quality?
  – Large fan-in modules may be interpolation or look-up routines - no defect correlation
  – Large fan-out often relates to high defect rate - has a high defect correlation

• Is large fan-in and fan-out bad?

Page 31: INFO  631 Prof . Glenn Booker


Fan-in and Fan-out

• Information flow complexity
  – Henry and Kafura: Size * (fan-in * fan-out)^2
  – Shepperd: (fan-in * fan-out)^2

• Henry and Kafura measure helps predict the number of software maintenance problems

Henry, S. and D. Kafura, IEEE Transactions on Software Engineering, 1981. SE-7(5): pp. 510-518.
Shepperd, M., Software Engineering Journal, 1990. 5(1) (January): pp. 3-10.
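A minimal sketch of the two information flow formulas follows. It uses the simplified fan-in and fan-out counts (number of calling and called modules) rather than the formal flow-based definitions, and the call pairs, module names and module size are all hypothetical.

```python
from collections import defaultdict

def fan_counts(calls):
    """Return {module: (fan_in, fan_out)} from a list of (caller, callee) pairs."""
    callers = defaultdict(set)   # callee -> modules that call it
    callees = defaultdict(set)   # caller -> modules it calls
    for caller, callee in calls:
        callers[callee].add(caller)
        callees[caller].add(callee)
    modules = {m for pair in calls for m in pair}
    return {m: (len(callers[m]), len(callees[m])) for m in modules}

def henry_kafura(size, fan_in, fan_out):
    """Size * (fan-in * fan-out)^2"""
    return size * (fan_in * fan_out) ** 2

def shepperd(fan_in, fan_out):
    """(fan-in * fan-out)^2"""
    return (fan_in * fan_out) ** 2

# Hypothetical call pairs for illustration only.
calls = [("main", "parse"), ("main", "report"),
         ("parse", "lookup"), ("report", "lookup"), ("lookup", "log")]
counts = fan_counts(calls)
fan_in, fan_out = counts["lookup"]            # called by 2 modules, calls 1
hk = henry_kafura(size=50, fan_in=fan_in, fan_out=fan_out)
sh = shepperd(fan_in, fan_out)
print(counts["lookup"], hk, sh)
```

Note that any module with a fan-in or fan-out of zero scores zero under both formulas, whatever its size.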

Page 32: INFO  631 Prof . Glenn Booker


Structure Metrics

• Shepperd measure correlates with software development time

• Information flow metric (Henry & Selig) HC = C * (fan-in * fan-out)^2
  – where C is the cyclomatic complexity

Page 33: INFO  631 Prof . Glenn Booker


Structure Metrics

• System complexity (Card & Glass)
  – Based on structural complexity (average fan-out squared) and data complexity (based on number of I/O variables and fan-out)
  – Quantified effect of complexity on error rate

Page 34: INFO  631 Prof . Glenn Booker


Module Call Graph

• Module - a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier
  – Or, a distinct, named group of LOC

• The module call graph shows which modules call each other, and what key information is passed among them

Page 35: INFO  631 Prof . Glenn Booker


Module Call Graph example

[Figure: module call graph with modules Main, Read_Scores, Find_Ave, Average and Print_Ave; the calls are labeled with the data items scores, average and eof]

Page 36: INFO  631 Prof . Glenn Booker


Module Coupling Measures

• Average number of calls per module (ANCPM)

• Fraction of modules that make calls (FMC)

ANCPM = (Number of Interconnections) / (Number of Modules)

FMC = (Number of Modules that call) / (Number of Modules)
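Both ratios are easy to compute once the call graph is written as a list of (caller, callee) pairs. In the sketch below the module names come from the call graph example, but the calls themselves are an assumption, because the arrows of that figure are not recoverable from the transcript. The function name is hypothetical.

```python
def coupling_measures(modules, calls):
    """Return (ANCPM, FMC) for a set of modules and a list of (caller, callee) pairs."""
    ancpm = len(calls) / len(modules)            # interconnections per module
    calling = {caller for caller, _ in calls}    # modules that make at least one call
    fmc = len(calling) / len(modules)
    return ancpm, fmc

modules = ["Main", "Read_Scores", "Find_Ave", "Average", "Print_Ave"]
# Assumed calls (the figure's arrows are lost): Main drives three modules,
# and Find_Ave delegates to Average.
calls = [("Main", "Read_Scores"), ("Main", "Find_Ave"),
         ("Main", "Print_Ave"), ("Find_Ave", "Average")]
ancpm, fmc = coupling_measures(modules, calls)
print(ancpm, fmc)
```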

Page 37: INFO  631 Prof . Glenn Booker


Information Flow Measures

• Types of information flows
  – Local direct flow
    • Module invokes a 2nd module & passes info to it
    • Invoked module returns result to the caller
  – Local indirect flow
    • Invoked module returns info that is subsequently passed to a second invoked module
  – Global flow
    • Info flows from one module to another via a global data structure

Page 38: INFO  631 Prof . Glenn Booker


IEEE-STD-982

• Number of Entries and Exits per Module, ‘m’
  – Like fan-in and fan-out
  – m = entries + exits

• Software Science measures

Page 39: INFO  631 Prof . Glenn Booker


IEEE-STD-982

• Graph-Theoretic Complexity
  – Static Complexity: C = Edges - Nodes + 1
  – Generalized Static Complexity: based on summing resources needed for each module (e.g. storage, access time, etc.)
  – Dynamic complexity: complexity as it changes over time across a network

Page 40: INFO  631 Prof . Glenn Booker


IEEE-STD-982

• Cyclomatic complexity

• Minimal Unit Test Case Determination
  – Determine number of independent paths through a module, to get minimum number of test cases for unit testing

• Data or information flow complexity
  – Fan-in and fan-out of variables

Page 41: INFO  631 Prof . Glenn Booker


IEEE-STD-982

• Design Structure adds weighted (%) average of six parameters:
  1. Whether designed top down (Y/N)
  2. Module inter-dependence
  3. Module dependence on prior processing
  4. Database size (# of elements)
  5. Database compartmentalization
  6. Module single entrance and exit (Y/N)
  – Weighting is chosen to meet project needs

Page 42: INFO  631 Prof . Glenn Booker


Other Measures

• Compiler measures
  – Size (bytes of compiled code)
  – Number of symbols and variables
  – Cross-reference of all labels
  – Statement count

Page 43: INFO  631 Prof . Glenn Booker


Other Measures

• Configuration Management Library Measures
  – Number of code modules
  – Number of versions of each module
  – History of change dates of each module
  – Module size
  – Number of related documents for each module

Page 44: INFO  631 Prof . Glenn Booker


Availability Metrics

• Most information systems are critical to day-to-day operations
  – Witness: Google or Blackberry being offline for mere minutes is news

• Availability depends on 1) how often the system goes down, and 2) how long it takes to restore it after a crash
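The two factors above are commonly combined into the steady-state relation availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR is the mean time to restore. That formula is a standard one and is not stated on the slide; the function name and the sample figures below are hypothetical.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical system: fails on average every 999 hours, takes 1 hour to restore.
a = availability(mttf_hours=999, mttr_hours=1)
print(f"{a:.3%}")

# Halving the restore time helps as much as doubling the time between failures.
faster_restore = availability(999, 0.5)
fewer_failures = availability(1998, 1)
```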

Page 45: INFO  631 Prof . Glenn Booker


Availability Metrics

• Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive

• Often measure availability by the number of 9’s in the desired level of availability
  – Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.
  – How many nines can you afford?

Page 46: INFO  631 Prof . Glenn Booker


Availability Metrics

No. of 9’s    Availability    Down time per year
2             99%             87.6 hours
3             99.9%           8.8 hours
4             99.99%          53 minutes
5             99.999%         5.3 minutes
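The down time figures follow directly from the number of nines. A minimal sketch, assuming a 365-day year (8760 hours) and a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)

def downtime_hours_per_year(nines):
    """Allowed down time per year for an availability with the given number of nines."""
    unavailability = 10 ** (-nines)   # e.g. 3 nines -> 0.001
    return unavailability * HOURS_PER_YEAR

for nines in range(2, 6):
    hours = downtime_hours_per_year(nines)
    print(f"{nines} nines: {hours:.2f} hours = {hours * 60:.1f} minutes per year")
```

Each extra nine cuts the allowed down time by a factor of ten, which is why cost tends to climb steeply with each additional nine.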

Page 47: INFO  631 Prof . Glenn Booker


Achieving High Availability

• Many techniques are used to help ensure that high levels of availability are possible
  – Duplicate systems (clustering)
  – RAID data duplication
  – Duplicate power supplies
  – Independent power supplies
  – Uninterruptible power supplies (UPS)

Page 48: INFO  631 Prof . Glenn Booker


Availability and Code Quality

• Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability
  – Consistent methods for measurement and definitions of terms are needed for further refinement

Page 49: INFO  631 Prof . Glenn Booker


Customer Outage Data

• In order to determine availability, the actual customer-visible system outage time needs to be collected
  – In order to get this data, the customer must place a very high priority on availability
  – This data could be used to identify software components which most reduce availability

Page 50: INFO  631 Prof . Glenn Booker


Availability

• We also expect that availability for a new system should increase over the first couple of years of its use

• Defect causal analysis can help eliminate the root causes of defects, thereby improving availability