COCOMO II and Big Data

Rachchabhorn Wongsaroj*, Jo Ann Lane,

Supannika Koolmanojwong, Barry Boehm

*Bank of Thailand and Center for Systems and Software Engineering

Computer Science Department, Viterbi School of Engineering

University of Southern California

28th International Forum on COCOMO and System/Software Cost Modeling

Outline

Big Data Concept

COCOMO II Cost Factors

COCOMO II Cost Factors and Big Data

Future Work

(c) USC CSSE


Source: IBM

Big Data concept


The 3 V's of Big Data (IBM)

Volume: the amount of data generated

Variety: the different data types and sources

Velocity: the speed at which data is generated, moves in and out, and moves around

Big Data

Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute)

Big Data concept (figure, source: IBM): Volume, Variety, and Velocity across people-to-people, people-to-machine, and machine-to-machine data, with examples such as 8 billion messages/day (845M active users), 340 million Tweets/day (140M active users), and 20 hours of video uploaded every minute.

Big Data Landscape (figure, source: Sajal Das, Keith Marzullo)

Big Data Landscape (cont.) (figure, source: blogs.forbes.com/davefeinleib)

Big Data problems

Lots of data is being created and collected

World interconnection

Data quantity

Data quality

Data variety

Data timeliness

COCOMO II

COCOMO Black Box Model

Inputs:

product size estimate

product, process, platform, and personnel attributes

reuse, maintenance, and increment parameters

organizational project data

Outputs:

development and maintenance cost and schedule estimates

cost and schedule distribution by phase, activity, and increment

recalibration to organizational data

COCOMO II Cost Factors

Significant factors of development cost:

Scale drivers are sources of exponential effort variation.

Cost drivers are sources of linear effort variation: product, platform, personnel, and project attributes, with effort multipliers associated with the cost driver ratings.

Each factor is rated between very low and very high per rating guidelines.
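As a minimal sketch of how the two kinds of factors combine, the COCOMO II.2000 post-architecture effort equation is PM = A × Size^E × ∏ EM_i, with E = B + 0.01 × ∑ SF_j: scale factors SF_j act through the exponent, while cost-driver effort multipliers EM_i act as a product. The code assumes the published COCOMO II.2000 calibration (A = 2.94, B = 0.91; Boehm et al., 2000); the helper name `effort_person_months` and the example inputs are hypothetical.

```python
import math

# Published COCOMO II.2000 calibration constants (Boehm et al., 2000).
# Replace with locally recalibrated values where available.
A = 2.94
B = 0.91


def effort_person_months(ksloc, scale_factors, effort_multipliers):
    """Hypothetical helper: COCOMO II post-architecture effort estimate.

    ksloc              -- size in thousands of source lines of code
    scale_factors      -- rated values of PREC, FLEX, RESL, TEAM and PMAT
    effort_multipliers -- rated values of the cost drivers (1.0 = Nominal)
    """
    # Scale drivers enter the exponent: exponential effort variation.
    exponent = B + 0.01 * sum(scale_factors)
    # Cost drivers multiply the result: linear effort variation.
    return A * ksloc ** exponent * math.prod(effort_multipliers)


# Usage: all five scale factors at their published Nominal values.
nominal_sf = [3.72, 3.04, 4.24, 3.29, 4.68]
print(round(effort_person_months(100, nominal_sf, [1.0] * 17)))  # prints 465
```

With the scale factors at Nominal (sum 18.97, so E ≈ 1.10) and every multiplier at 1.0, a 100 KSLOC product comes to roughly 465 person-months. Doubling size multiplies effort by about 2.14 rather than 2, whereas a single 1.5 multiplier scales it by exactly 1.5, which is the exponential/linear distinction in numbers.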

Scale Drivers

Precedentedness (PREC): degree to which the system is new and past experience applies

Development Flexibility (FLEX): need to conform with specified requirements

Architecture/Risk Resolution (RESL): degree of design thoroughness and risk elimination

Team Cohesion (TEAM): need to synchronize stakeholders and minimize conflict

Process Maturity (PMAT): SEI CMM process maturity rating

| Scale Factors (Wi) | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| Precedentedness (PREC) | thoroughly unprecedented | largely unprecedented | somewhat unprecedented | generally familiar | largely familiar | thoroughly familiar |
| Development Flexibility (FLEX) | rigorous | occasional relaxation | some relaxation | general conformity | some conformity | general goals |
| Architecture/Risk Resolution (RESL)* | little (20%) | some (40%) | often (60%) | generally (75%) | mostly (90%) | full (100%) |
| Team Cohesion (TEAM) | very difficult interactions | some difficult interactions | basically cooperative interactions | largely cooperative | highly cooperative | seamless interactions |
| Process Maturity (PMAT) | Weighted average of "Yes" answers to the CMM Maturity Questionnaire | | | | | |

* % significant module interfaces specified, % significant risks eliminated
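The scale-factor ratings above turn into the exponent E by looking up each driver's numeric value and summing. This sketch assumes the COCOMO II.2000 scale-factor values published in Boehm et al. (2000), B = 0.91, and a PMAT rating already mapped to a level; the helper `scale_exponent` and the example ratings are hypothetical.

```python
# Published COCOMO II.2000 scale-factor values (Boehm et al., 2000), listed
# from Very Low to Extra High; check them against the book before use.
LEVELS = ["VL", "L", "N", "H", "VH", "XH"]
SCALE_FACTORS = {
    "PREC": [6.20, 4.96, 3.72, 2.48, 1.24, 0.00],
    "FLEX": [5.07, 4.05, 3.04, 2.03, 1.01, 0.00],
    "RESL": [7.07, 5.65, 4.24, 2.83, 1.41, 0.00],
    "TEAM": [5.48, 4.38, 3.29, 2.19, 1.10, 0.00],
    "PMAT": [7.80, 6.24, 4.68, 3.12, 1.56, 0.00],
}
B = 0.91  # COCOMO II.2000 exponent base


def scale_exponent(ratings):
    """Hypothetical helper: E = B + 0.01 * sum of the five rated scale factors.

    ratings -- dict mapping each scale driver to a level, e.g. {"PREC": "L", ...}
    """
    if set(ratings) != set(SCALE_FACTORS):
        raise ValueError("rate all five scale drivers exactly once")
    total = sum(SCALE_FACTORS[name][LEVELS.index(level)]
                for name, level in ratings.items())
    return B + 0.01 * total


nominal = {name: "N" for name in SCALE_FACTORS}
unprecedented = dict(nominal, PREC="VL")  # e.g. a first-of-its-kind big data system
print(round(scale_exponent(nominal), 4), round(scale_exponent(unprecedented), 4))  # prints 1.0997 1.1245
```

Moving only PREC from Nominal to Very Low raises E by about 0.025; because E is an exponent on size, that penalty grows with the size of the product.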


Scale Drivers

Elaboration of the PREC rating scales:

| Precedentedness feature | Very Low | Nominal / High | Extra High |
|---|---|---|---|
| Organizational understanding of product objectives | General | Considerable | Thorough |
| Experience in working with related software systems | Moderate | Considerable | Extensive |
| Concurrent development of associated new hardware and operational procedures | Extensive | Moderate | Some |
| Need for innovative data processing architectures, algorithms | Considerable | Some | Minimal |

Cost Drivers

Product Factors: Reliability (RELY), Data (DATA), Complexity (CPLX), Reusability (RUSE), Documentation (DOCU)

Platform Factors: Time constraint (TIME), Storage constraint (STOR), Platform volatility (PVOL)

Personnel Factors: Analyst capability (ACAP), Programmer capability (PCAP), Applications experience (APEX), Platform experience (PLEX), Language and tool experience (LTEX), Personnel continuity (PCON)

Project Factors: Software tools (TOOL), Multisite development (SITE), Required schedule (SCED)

Cost Drivers and Big Data

Product Factors (cont’d)

Required Software Reliability (RELY): measures the extent to which the software must perform its intended function over a period of time.

Ask: what is the effect of a software failure?

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| RELY descriptors | slight inconvenience | low, easily recoverable losses | moderate, easily recoverable losses | high financial loss | risk to human life | |


Data Base Size (DATA): captures the effect that large data requirements have on product development, chiefly the effort required to generate the test data that will be used to exercise the program.

Calculate the data/program size ratio (D/P):

$$\frac{D}{P} = \frac{\text{Data Base Size (bytes)}}{\text{Program Size (SLOC)}}$$

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| DATA | | DB bytes/Pgm SLOC < 10 | 10 ≤ D/P < 100 | 100 ≤ D/P < 1000 | D/P ≥ 1000 | |

IBM: the data base size of Big Data scales from terabytes to zettabytes.
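Running the D/P calculation at big data scale shows why the conclusion calls for an EXTRA HIGH DATA rating. This sketch assumes the COCOMO II.2000 DATA multipliers from Boehm et al. (2000) and treats every D/P ≥ 1000 as Very High; the helper `data_rating` and the 100 KSLOC example are hypothetical.

```python
# Published COCOMO II.2000 DATA effort multipliers (Boehm et al., 2000).
# The scale defines only Low through Very High for DATA.
DATA_EM = {"L": 0.90, "N": 1.00, "H": 1.14, "VH": 1.28}


def data_rating(db_bytes, program_sloc):
    """Hypothetical helper: rate DATA from the data/program size ratio D/P."""
    dp = db_bytes / program_sloc
    if dp < 10:
        level = "L"
    elif dp < 100:
        level = "N"
    elif dp < 1000:
        level = "H"
    else:
        level = "VH"  # open-ended: D/P = 1e3 and D/P = 1e16 rate the same
    return dp, level, DATA_EM[level]


# Hypothetical 100 KSLOC system over a 1 TB and a 1 ZB database.
for db_bytes in (10**12, 10**21):
    print(data_rating(db_bytes, 100_000))
```

A 1 TB database already gives D/P = 10,000,000 and a 1 ZB database gives 10^16, yet both land on Very High with the same 1.28 multiplier: the scale saturates long before terabyte-to-zettabyte volumes.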


Source: (c)2012 Enterprise Strategy Group

Product Complexity (CPLX): complexity is divided into five areas: control operations, computational operations, device-dependent operations, data management operations, and user interface management operations.

Select the area or combination of areas that characterize the product or a sub-system of the product.


Module Complexity Ratings vs. Type of Module: use a subjective weighted average of the attributes, weighted by their relative product importance.

Control Operations

Very Low: Straight-line code with a few non-nested structured programming operators: DOs, CASEs, IF-THEN-ELSEs. Simple module composition via procedure calls or simple scripts.

Low: Straightforward nesting of structured programming operators. Mostly simple predicates.

Nominal: Mostly simple nesting. Some intermodule control. Decision tables. Simple callbacks or message passing, including middleware-supported distributed processing.

High: Highly nested structured programming operators with many compound predicates. Queue and stack control. Homogeneous, distributed processing. Single-processor soft real-time control.

Very High: Reentrant and recursive coding. Fixed-priority interrupt handling. Task synchronization, complex callbacks, heterogeneous distributed processing. Single-processor hard real-time control.

Extra High: Multiple resource scheduling with dynamically changing priorities. Microcode-level control. Distributed hard real-time control.

Computational Operations

Very Low: Evaluation of simple expressions, e.g., A=B+C*(D-E).

Low: Evaluation of moderate-level expressions, e.g., D=SQRT(B**2-4.*A*C).

Nominal: Use of standard math and statistical routines. Basic matrix/vector operations.

High: Basic numerical analysis: multivariate interpolation, ordinary differential equations. Basic truncation and roundoff concerns.

Very High: Difficult but structured numerical analysis: near-singular matrix equations, partial differential equations. Simple parallelization.

Extra High: Difficult and unstructured numerical analysis: highly accurate analysis of noisy, stochastic data. Complex parallelization.

Device-dependent Operations

Very Low: Simple read and write statements with simple formats.

Low: No cognizance needed of particular processor or I/O device characteristics. I/O done at GET/PUT level.

Nominal: I/O processing includes device selection, status checking, and error processing.

High: Operations at physical I/O level (physical storage address translations; seeks, reads, etc.). Optimized I/O overlap.

Very High: Routines for interrupt diagnosis, servicing, masking. Communication line handling. Performance-intensive embedded systems.

Extra High: Device timing-dependent coding, micro-programmed operations. Performance-critical embedded systems.

Data Management Operations

Very Low: Simple arrays in main memory. Simple COTS-DB queries, updates.

Low: Single file subsetting with no data structure changes, no edits, no intermediate files. Moderately complex COTS-DB queries, updates.

Nominal: Multi-file input and single file output. Simple structural changes, simple edits. Complex COTS-DB queries, updates.

High: Simple triggers activated by data stream contents. Complex data restructuring.

Very High: Distributed database coordination. Complex triggers. Search optimization.

Extra High: Highly coupled, dynamic relational and object structures. Natural language data management.

User Interface Management

Very Low: Simple input forms, report generators.

Low: Use of simple graphic user interface (GUI) builders.

Nominal: Simple use of widget set.

High: Widget set development and extension. Simple voice I/O, multimedia.

Very High: Moderately complex 2D/3D, dynamic graphics, multimedia.

Extra High: Complex multimedia, virtual reality.
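The rule of a subjective weighted average of area ratings, weighted by relative product importance, can be made explicit as follows. This sketch assumes levels are scored 0 (Very Low) to 5 (Extra High), that the average is rounded half up to the nearest level, and uses the COCOMO II.2000 CPLX multipliers from Boehm et al. (2000); the helper name, area keys, and weights are hypothetical.

```python
LEVELS = ["VL", "L", "N", "H", "VH", "XH"]
# Published COCOMO II.2000 CPLX effort multipliers (Boehm et al., 2000).
CPLX_EM = [0.73, 0.87, 1.00, 1.17, 1.34, 1.74]


def cplx_rating(area_ratings, weights):
    """Hypothetical helper: weighted average of per-area complexity ratings.

    area_ratings -- level per complexity area, e.g. {"data_management": "VH"}
    weights      -- relative product importance of each rated area
    Assumption: levels are scored 0..5 and the average is rounded half up.
    """
    total_weight = sum(weights[area] for area in area_ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(LEVELS.index(level) * weights[area]
                for area, level in area_ratings.items()) / total_weight
    index = int(score + 0.5)  # round half up (score is never negative)
    return LEVELS[index], CPLX_EM[index]


# Hypothetical big data analytics subsystem, weighted 5:3:2.
ratings = {"data_management": "VH", "control": "H", "computational": "H"}
weights = {"data_management": 5, "control": 3, "computational": 2}
print(cplx_rating(ratings, weights))  # prints ('VH', 1.34)
```

The subjectivity stays in the choice of weights; the code only makes the arithmetic repeatable. Rounding half up is used deliberately, since Python's built-in `round` rounds halves to even.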


Source: (c)2012 Enterprise Strategy Group


Execution Time Constraint (TIME): measures the constraint imposed upon a system in terms of the percentage of available execution time expected to be used by the system consuming the execution time resource.

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| TIME | | | ≤ 50% use of available execution time | 70% | 85% | 95% |

Platform Factors

Source: http://www.parstream.com/product/

Source: (c)2012 Enterprise Strategy Group

Main Storage Constraint (STOR): measures the degree of main storage constraint imposed on a software system or subsystem.

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| STOR | | | ≤ 50% use of available storage | 70% | 85% | 95% |
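The TIME and STOR rating scales use the same utilization anchors, so one helper can rate both. A minimal sketch, assuming the COCOMO II.2000 multipliers from Boehm et al. (2000) and a simple rule for values that fall between anchors (take the next higher level); the helper `utilization_rating` and the example figures are hypothetical.

```python
# Published COCOMO II.2000 multipliers (Boehm et al., 2000). TIME and STOR share
# the same utilization anchors and define no level below Nominal.
ANCHORS = [(50, "N"), (70, "H"), (85, "VH"), (95, "XH")]
EM = {
    "TIME": {"N": 1.00, "H": 1.11, "VH": 1.29, "XH": 1.63},
    "STOR": {"N": 1.00, "H": 1.05, "VH": 1.17, "XH": 1.46},
}


def utilization_rating(driver, used, available):
    """Hypothetical helper: rate TIME or STOR from resource utilization.

    used, available -- the same unit for both (CPU-seconds, bytes, ...)
    Assumption: a value between two anchors takes the next higher level,
    and anything above 95% is treated as Extra High.
    """
    pct = 100.0 * used / available
    level = next((lvl for limit, lvl in ANCHORS if pct <= limit), "XH")
    return pct, level, EM[driver][level]


print(utilization_rating("TIME", 40, 100))    # prints (40.0, 'N', 1.0)
print(utilization_rating("STOR", 800, 1000))  # prints (80.0, 'VH', 1.17)
```

Rounding between anchors upward is a conservative choice: it never rates a platform as less constrained than its measured utilization suggests.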


Big Data Storage

The largest big data practitioners (Google, Facebook, Apple, etc.) run what are known as hyperscale computing environments.

The key requirements of big data storage are that it:

must be capable of handling large volumes of data

must be scalable to accommodate growth

must provide the input/output operations per second (IOPS) needed to deliver data to analytic tools

Analyst Capability (ACAP): analysts work on requirements, high-level design, and detailed design. Consider analysis and design ability, efficiency and thoroughness, and the ability to communicate and cooperate.

Programmer Capability (PCAP): evaluate the capability of the programmers as a team rather than as individuals. Consider ability, efficiency and thoroughness, and the ability to communicate and cooperate.

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| ACAP | 15th percentile | 35th percentile | 55th percentile | 75th percentile | 90th percentile | |
| PCAP | 15th percentile | 35th percentile | 55th percentile | 75th percentile | 90th percentile | |
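Team percentiles do not always fall exactly on the ACAP and PCAP anchors, so some mapping rule is needed. This sketch assumes a nearest-anchor rule with ties going to the lower (more conservative) level, and uses the COCOMO II.2000 multipliers from Boehm et al. (2000); the helper `capability_rating` is hypothetical.

```python
PERCENTILES = [15, 35, 55, 75, 90]
LEVELS = ["VL", "L", "N", "H", "VH"]
# Published COCOMO II.2000 multipliers (Boehm et al., 2000).
EM = {
    "ACAP": [1.42, 1.19, 1.00, 0.85, 0.71],
    "PCAP": [1.34, 1.15, 1.00, 0.88, 0.76],
}


def capability_rating(driver, team_percentile):
    """Hypothetical helper: rate ACAP or PCAP from a team percentile.

    Assumption: choose the level whose anchor percentile is nearest;
    on a tie, min() keeps the first index, i.e. the lower level.
    """
    i = min(range(len(PERCENTILES)),
            key=lambda k: abs(PERCENTILES[k] - team_percentile))
    return LEVELS[i], EM[driver][i]


print(capability_rating("ACAP", 80))  # prints ('H', 0.85)
```

Unlike the scale factors, these multipliers fall below 1.0 at higher ratings, so a stronger team reduces the estimate linearly.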

Personnel Factors


Applications Experience (APEX): assess the project team's equivalent level of experience with this type of application.

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| APEX | ≤ 2 months | 6 months | 1 year | 3 years | 6 years | |

Personnel Factors (cont’d)


Source: (c)2012 Enterprise Strategy Group

Platform Experience (PLEX): assess the project team's equivalent level of experience with this platform, including the OS, graphical user interface, database, networking, and distributed middleware.

| | Very Low | Low | Nominal | High | Very High | Extra High |
|---|---|---|---|---|---|---|
| PLEX | ≤ 2 months | 6 months | 1 year | 3 years | 6 years | |
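The applications and platform experience scales (APEX and PLEX in the cost-driver list) share the same month anchors. A minimal sketch, assuming a team is rated at the highest anchor it has reached and using the COCOMO II.2000 multipliers from Boehm et al. (2000); the helper `experience_rating` and the example team are hypothetical.

```python
ANCHOR_MONTHS = [2, 6, 12, 36, 72]  # 2 months, 6 months, 1 year, 3 years, 6 years
LEVELS = ["VL", "L", "N", "H", "VH"]
# Published COCOMO II.2000 multipliers (Boehm et al., 2000).
EM = {
    "APEX": [1.22, 1.10, 1.00, 0.88, 0.81],
    "PLEX": [1.19, 1.09, 1.00, 0.91, 0.85],
}


def experience_rating(driver, months):
    """Hypothetical helper: rate APEX or PLEX from equivalent team experience.

    Assumption: take the highest level whose anchor has been reached;
    anything short of 6 months counts as Very Low.
    """
    i = max((k for k, m in enumerate(ANCHOR_MONTHS) if months >= m), default=0)
    return LEVELS[i], EM[driver][i]


# Hypothetical team: 4 months on a new distributed platform, 3 years in the domain.
print(experience_rating("PLEX", 4), experience_rating("APEX", 36))  # prints ('VL', 1.19) ('H', 0.88)
```

A team that knows the application domain well but is new to a distributed data platform picks up a penalty on PLEX that the APEX credit only partly offsets.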


Source: (c)2012 Enterprise Strategy Group


Source: (c)2012 Enterprise Strategy Group


Conclusion - Scale Drivers and Big Data

| Scale Drivers | COCOMO II Coverage |
|---|---|
| Precedentedness (PREC) | Covered |
| Development Flexibility (FLEX) | Covered |
| Architecture/Risk Resolution (RESL) | Covered |
| Team Cohesion (TEAM) | Covered |
| Process Maturity (PMAT) | Covered |

Conclusion - Cost Drivers and Big Data

| Cost Drivers | COCOMO II Coverage / Future Work |
|---|---|
| Reliability (RELY) | Covered |
| Data (DATA) | Need to define an EXTRA HIGH cost rating for terabyte-to-zettabyte data projects |
| Complexity (CPLX) | Covered, but needs more detail for Big Data custom-developed solutions (25% of all projects) |
| Reusability (RUSE) | Covered |
| Documentation (DOCU) | Covered |
| Time constraint (TIME) | Covered |
| Storage constraint (STOR) | Covered |
| Platform volatility (PVOL) | Covered |

Conclusion - Cost Drivers and Big Data (cont.)

| Cost Drivers | COCOMO II Coverage / Future Work |
|---|---|
| Analyst capability (ACAP) | Covered |
| Programmer capability (PCAP) | Covered |
| Applications experience (APEX) | Covered |
| Platform experience (PLEX) | Covered |
| Language and tool experience (LTEX) | Covered |
| Personnel continuity (PCON) | Covered |
| Software tools (TOOL) | Covered |
| Multisite development (SITE) | Covered |
| Required schedule (SCED) | Covered |

References

Boehm, B. W., et al. (2000). Software Cost Estimation with COCOMO II. Prentice Hall, New Jersey.

Boehm, B. W. (1981). Software Engineering Economics. Prentice Hall, New Jersey.

McKinsey Global Institute (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. June 2011 (www.mckinsey.com/mgi).

Zikopoulos, P., and Eaton, C. (2011). Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Osborne Media.

Enterprise Strategy Group. Research Report: The Convergence of Big Data Processing and Integrated Infrastructure.

Wikipedia. "Big data." http://en.wikipedia.org/wiki/Big_data