COCOMO II and Big Data
Rachchabhorn Wongsaroj*, Jo Ann Lane,
Supannika Koolmanojwong, Barry Boehm
*Bank of Thailand and Center for Systems and Software Engineering
Computer Science Department, Viterbi School of Engineering
University of Southern California
28th International Forum on COCOMO and System/Software Cost Modeling
Outline
Big Data Concept
COCOMO II Cost Factors
COCOMO II Cost Factors and Big Data
Future Work
(c) USC CSSE
Source: IBM
Big Data concept
3V’s concepts of Big Data (IBM)
Volume -- The amount of data generated
Variety -- The different data types and sources
Velocity -- The speed at which data is generated and moved around
Big Data
Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze (McKinsey Global Institute)
Examples of volume, variety, and velocity across people-to-people, people-to-machine, and machine-to-machine interactions:
8 billion messages/day, 845M active users
340 million tweets/day, 140M active users
20 hours of video uploaded every minute
Source: IBM
Big Data concept
Lots of data is being created & collected
World interconnection
Data quantity
Data quality
Data variety
Data timeliness
Big Data problems
COCOMO II
Inputs:
product size estimate
product, process, platform, and personnel attributes
reuse, maintenance, and increment parameters
organizational project data
Outputs:
development and maintenance cost and schedule estimates
cost and schedule distribution by phase, activity, and increment
recalibration to organizational data
COCOMO Black Box Model
COCOMO II – Cost factor
Significant factors of development cost:
Scale drivers are sources of exponential effort variation.
Cost drivers are sources of linear effort variation: product, platform, personnel, and project attributes, with effort multipliers associated with cost driver ratings.
Each factor is rated between very low and very high per rating guidelines
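The split described above (scale drivers exponential, cost drivers multiplicative) can be sketched as a small function. The constants A = 2.94 and B = 0.91 and the nominal scale-factor weights used below are quoted from the published COCOMO II.2000 calibration; treat them as illustrative assumptions and recalibrate to organizational data, as the model intends.

```python
# Sketch of the COCOMO II effort equation: PM = A * Size^E * product(EM),
# where E = B + 0.01 * sum(scale factor weights). A = 2.94 and B = 0.91
# are the published COCOMO II.2000 constants (assumed here).
def cocomo_effort(ksloc, scale_factors, effort_multipliers):
    A, B = 2.94, 0.91
    E = B + 0.01 * sum(scale_factors)      # exponential effort variation
    em_product = 1.0
    for em in effort_multipliers:          # linear effort variation
        em_product *= em
    return A * ksloc ** E * em_product

# All-nominal 100-KSLOC project: nominal weights for PREC, FLEX, RESL,
# TEAM, PMAT from the II.2000 calibration; all 17 effort multipliers 1.0.
nominal_pm = cocomo_effort(100, [3.72, 3.04, 4.24, 3.29, 4.68], [1.0] * 17)
```

With these assumed values the all-nominal project comes out near 465 person-months; lowering any scale driver rating raises the exponent, so the effort penalty grows superlinearly with size.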
Precedentedness (PREC) Degree to which system is new and past experience applies
Development Flexibility (FLEX) Need to conform with specified requirements
Architecture/Risk Resolution (RESL)
Degree of design thoroughness and risk elimination
Team Cohesion (TEAM) Need to synchronize stakeholders and minimize conflict
Process Maturity (PMAT) SEI CMM process maturity rating
Scale Drivers
Scale Factors (Wi), rated Very Low through Extra High:
Precedentedness (PREC): thoroughly unprecedented / largely unprecedented / somewhat unprecedented / generally familiar / largely familiar / thoroughly familiar
Development Flexibility (FLEX): rigorous / occasional relaxation / some relaxation / general conformity / some conformity / general goals
Architecture/Risk Resolution (RESL)*: little (20%) / some (40%) / often (60%) / generally (75%) / mostly (90%) / full (100%)
Team Cohesion (TEAM): very difficult interactions / some difficult interactions / basically cooperative interactions / largely cooperative / highly cooperative / seamless interactions
Process Maturity (PMAT): weighted average of "Yes" answers to CMM Maturity Questionnaire
* % significant module interfaces specified, % significant risks eliminated
Scale Drivers
Elaboration of the PREC rating scales:
Feature ratings for precedentedness (Very Low / Nominal-High / Extra High):
Organizational understanding of product objectives: General / Considerable / Thorough
Experience in working with related software systems: Moderate / Considerable / Extensive
Concurrent development of associated new hardware and operational procedures: Extensive / Moderate / Some
Need for innovative data processing architectures, algorithms: Considerable / Some / Minimal
Cost Drivers
Product Factors
Reliability (RELY)
Data (DATA)
Complexity (CPLX)
Reusability (RUSE)
Documentation (DOCU)
Platform Factors
Time constraint (TIME)
Storage constraint (STOR)
Platform volatility (PVOL)
Personnel Factors
Analyst capability (ACAP)
Programmer capability (PCAP)
Applications experience (APEX)
Platform experience (PLEX)
Language and tool experience (LTEX)
Personnel continuity (PCON)
Project Factors
Software tools (TOOL)
Multisite development (SITE)
Required schedule (SCED)
Cost Drivers and Big Data
Required Software Reliability (RELY): measures the extent to which the software must perform its intended function over a period of time.
Ask: what is the effect of a software failure?
Product Factors (cont’d)
RELY Descriptors:
Very Low: slight inconvenience
Low: low, easily recoverable losses
Nominal: moderate, easily recoverable losses
High: high financial loss
Very High: risk to human life
Data Base Size (DATA): captures the effect large data requirements have on development, including the effort to generate test data that will be used to exercise the program.
Calculate the data/program size ratio (D/P):
D/P = DataBaseSize (Bytes) / ProgramSize (SLOC)
DATA ratings (DB bytes / Pgm SLOC):
Low: D/P < 10
Nominal: 10 <= D/P < 100
High: 100 <= D/P < 1000
Very High: D/P >= 1000
Product Factors (cont’d)
IBM: the database size of Big Data scales from terabytes to zettabytes.
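As a quick illustration, the D/P thresholds above can be wrapped in a small lookup (a hypothetical helper for this walkthrough, not part of any COCOMO tool):

```python
# Hypothetical helper applying the DATA rating thresholds from the slide:
# D/P = database size in bytes / program size in SLOC.
def data_rating(db_bytes, program_sloc):
    dp = db_bytes / program_sloc
    if dp < 10:
        return "Low"
    elif dp < 100:
        return "Nominal"
    elif dp < 1000:
        return "High"
    else:
        # A terabyte-scale database over a modest program lands here,
        # motivating the proposed Extra High rating for Big Data projects.
        return "Very High"

# 5 MB test database against a 100-KSLOC program: D/P = 50 -> Nominal.
rating = data_rating(5_000_000, 100_000)
```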
Product Complexity (CPLX): complexity is divided into five areas:
control operations,
computational operations,
device-dependent operations,
data management operations, and
user interface management operations.
Select the area or combination of areas that characterize the product or a sub-system of the product.
Product Factors (cont’d)
Module Complexity Ratings vs. Type of Module: use a subjective weighted average of the attributes, weighted by their relative product importance.
Control Operations:
Very Low: Straightline code with a few non-nested structured programming operators: DOs, CASEs, IF-THEN-ELSEs. Simple module composition via procedure calls or simple scripts.
Low: Straightforward nesting of structured programming operators. Mostly simple predicates.
Nominal: Mostly simple nesting. Some intermodule control. Decision tables. Simple callbacks or message passing, including middleware-supported distributed processing.
High: Highly nested structured programming operators with many compound predicates. Queue and stack control. Homogeneous distributed processing. Single-processor soft real-time control.
Very High: Reentrant and recursive coding. Fixed-priority interrupt handling. Task synchronization, complex callbacks, heterogeneous distributed processing. Single-processor hard real-time control.
Extra High: Multiple resource scheduling with dynamically changing priorities. Microcode-level control. Distributed hard real-time control.
Computational Operations:
Very Low: Evaluation of simple expressions: e.g., A=B+C*(D-E)
Low: Evaluation of moderate-level expressions: e.g., D=SQRT(B**2-4.*A*C)
Nominal: Use of standard math and statistical routines. Basic matrix/vector operations.
High: Basic numerical analysis: multivariate interpolation, ordinary differential equations. Basic truncation, roundoff concerns.
Very High: Difficult but structured numerical analysis: near-singular matrix equations, partial differential equations. Simple parallelization.
Extra High: Difficult and unstructured numerical analysis: highly accurate analysis of noisy, stochastic data. Complex parallelization.
Product Factors (cont’d)
Device-dependent Operations:
Very Low: Simple read, write statements with simple formats.
Low: No cognizance needed of particular processor or I/O device characteristics. I/O done at GET/PUT level.
Nominal: I/O processing includes device selection, status checking and error processing.
High: Operations at physical I/O level (physical storage address translations; seeks, reads, etc.). Optimized I/O overlap.
Very High: Routines for interrupt diagnosis, servicing, masking. Communication line handling. Performance-intensive embedded systems.
Extra High: Device timing-dependent coding, micro-programmed operations. Performance-critical embedded systems.
Data Management Operations:
Very Low: Simple arrays in main memory. Simple COTS-DB queries, updates.
Low: Single file subsetting with no data structure changes, no edits, no intermediate files. Moderately complex COTS-DB queries, updates.
Nominal: Multi-file input and single file output. Simple structural changes, simple edits. Complex COTS-DB queries, updates.
High: Simple triggers activated by data stream contents. Complex data restructuring.
Very High: Distributed database coordination. Complex triggers. Search optimization.
Extra High: Highly coupled, dynamic relational and object structures. Natural language data management.
User Interface Management:
Very Low: Simple input forms, report generators.
Low: Use of simple graphic user interface (GUI) builders.
Nominal: Simple use of widget set.
High: Widget set development and extension. Simple voice I/O, multimedia.
Very High: Moderately complex 2D/3D, dynamic graphics, multimedia.
Extra High: Complex multimedia, virtual reality.
Product Factors (cont’d)
Execution Time Constraint (TIME): measures the constraint imposed upon a system in terms of the percentage of available execution time expected to be used by the system.
TIME ratings:
Nominal: <= 50% use of available execution time
High: 70%
Very High: 85%
Extra High: 95%
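These thresholds read naturally as a lookup. The helper below is a hypothetical sketch that assumes each listed percentage is the upper bound of its rating band:

```python
# Hypothetical TIME rating lookup from the percentage of available
# execution time the system is expected to consume (band edges assumed).
def time_rating(pct_used):
    if pct_used <= 50:
        return "Nominal"
    elif pct_used <= 70:
        return "High"
    elif pct_used <= 85:
        return "Very High"
    else:
        return "Extra High"   # e.g., 95% of available execution time
```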
Platform Factors
Source: http://www.parstream.com/product/
Main Storage Constraint (STOR): measures the degree of main storage constraint imposed on a software system or subsystem.
STOR ratings:
Nominal: <= 50% use of available storage
High: 70%
Very High: 85%
Extra High: 95%
The largest big data practitioners (Google, Facebook, Apple, etc.) run what are known as hyperscale computing environments.
The key requirements of big data storage are that it:
must be capable of handling large volumes of data
must be scalable to accommodate growth
must provide the input/output operations per second (IOPS) needed to deliver data to analytic tools
Big Data Storage
Analyst Capability (ACAP): analysts work on requirements, high-level design, and detailed design. Consider analysis and design ability, efficiency and thoroughness, and the ability to communicate and cooperate.
Programmer Capability (PCAP): evaluate the capability of the programmers as a team rather than as individuals. Consider ability, efficiency and thoroughness, and the ability to communicate and cooperate.
ACAP ratings: Very Low 15th percentile / Low 35th / Nominal 55th / High 75th / Very High 90th
PCAP ratings: Very Low 15th percentile / Low 35th / Nominal 55th / High 75th / Very High 90th
Personnel Factors
Applications Experience (AEXP): assess the project team's equivalent level of experience with this type of application.
AEXP ratings: Very Low 2 months / Low 6 months / Nominal 1 year / High 3 years / Very High 6 years
Personnel Factors (cont’d)
Platform Experience (PEXP): assess the project team's equivalent level of experience with this platform, including the OS, graphical user interface, database, networking, and distributed middleware.
PEXP ratings: Very Low 2 months / Low 6 months / Nominal 1 year / High 3 years / Very High 6 years
Personnel Factors (cont’d)
Conclusion - Scale Drivers and Big Data
Scale Drivers COCOMO II Coverage
Precedentedness (PREC) Covered
Development Flexibility (FLEX) Covered
Architecture/Risk Resolution (RESL) Covered
Team Cohesion (TEAM) Covered
Process Maturity (PMAT) Covered
Cost Drivers COCOMO II Coverage / Future Work
Reliability (RELY) Covered
Data (DATA) Need to define an Extra High rating for terabyte-to-zettabyte data projects
Complexity (CPLX) Covered, but needs more detail for Big Data custom-developed solutions (25% of all projects)
Reusability (RUSE) Covered
Documentation (DOCU) Covered
Time constraint (TIME) Covered
Storage constraint (STOR) Covered
Platform volatility (PVOL) Covered
Conclusion - Cost Drivers and Big Data
Conclusion - Cost Drivers and Big Data
Cost Drivers COCOMO II Coverage / Future Work
Analyst capability (ACAP) Covered
Programmer capability (PCAP) Covered
Applications experience (APEX) Covered
Platform experience (PLEX) Covered
Language and tool experience (LTEX) Covered
Personnel continuity (PCON) Covered
Software tools (TOOL) Covered
Multisite development (SITE) Covered
Required schedule (SCED) Covered
References
Barry W. Boehm, et al. (2000), Software Cost Estimation with COCOMO II, Prentice Hall, New Jersey.
Barry W. Boehm (1981), Software Engineering Economics, Prentice Hall, New Jersey.
McKinsey Global Institute (June 2011), Big Data: The Next Frontier for Innovation, Competition, and Productivity (www.mckinsey.com/mgi).
Zikopoulos, P., and Eaton, C. (2011), Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Osborne Media.
Enterprise Strategy Group, Research Report: The Convergence of Big Data Processing and Integrated Infrastructure.
http://en.wikipedia.org/wiki/Big_data