Achieving Better Reliability With Software Reliability Engineering
Russel D’Souza

Page 1:

Achieving Better Reliability With Software Reliability Engineering

Russel D’Souza

Page 2:

The Software Problem

Customers demand:
► More reliable software
► Faster products
► Cheaper products

Success or failure in meeting these demands affects:
► Market share
► Profitability

The demands conflict, causing risk and overwhelming pressure.

Page 3:

Problem Aggravated

Software glitches cause:
► Loss of competitive position and market share
► Poor-quality products and high costs of defects
► Lack of security and fault protection
► Loss of consumer confidence
► Slow response to consumers’ needs
► Unsatisfactory return on software investment

Page 4:

Real World Problems

► Defective software cost industry $175 billion in Y2K
► Loss of a single network cell costs $18K per minute of downtime
► More than 110 million computers are online, connected via the Internet, and are prone to virus attacks and defects
► More than 90% of institutions reported insider abuse of network access in the year 2000

Page 5:

The Solution – Software Reliability Engineering (SRE)

► Reduces or eliminates defects from software
► Designs software for reliability, fault tolerance, and rapid fault recovery
► Maximizes use of proven SRE models
► Applies existing statistical models to real-world software environments
► Adds to and integrates with other good processes and practices without replacing them

Page 6:

What is SRE?

► A sub-discipline of software engineering based on a solid body of theory that includes operational profiles, random-process software reliability models, statistical estimation, and sequential sampling theory
► Works by quantitatively characterizing the operational behavior of software-based systems
► Based on two fundamental ideas:
  • Deliver the desired functionality for a product efficiently by quantitatively characterizing the expected use of the product, precisely focusing resources on the most used and most critical functions
  • Make testing realistically represent field conditions

Page 7:

SRE Is Widely Applicable

► Technically speaking, you can apply SRE to any software-based product, beginning at the start of any release cycle.
► Economically speaking, the complete SRE process may be impractical for small components (involving perhaps less than 2 staff months of effort), unless they are used in a large number of products.
► SRE is independent of development technology and platform.
► SRE requires no changes in architecture, design, or code, but it may suggest changes that would be beneficial.

Page 8:

The SRE Process

Page 9:

List Associated Systems
► List all systems associated with the product that must be tested independently. They are of two types: the base product and its variations, and supersystems.

Develop Operational Profiles
► An operational profile is a complete set of commands with their probabilities of occurrence
► System testers and system engineers are included in this activity
► Testers come into closer contact with the product's users, which lets them get feedback on what system behavior is acceptable, what is not, and how users employ the product
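An operational profile can be represented as a mapping from each operation to its probability of occurrence. The sketch below assumes usage counts are available from field data or estimates; the operation names and counts are hypothetical and do not come from the slides.

```python
from collections import Counter


def build_operational_profile(usage_counts):
    """Normalize usage counts into occurrence probabilities per operation."""
    total = sum(usage_counts.values())
    if total <= 0:
        raise ValueError("usage counts must sum to a positive number")
    return {op: count / total for op, count in usage_counts.items()}


# Hypothetical field data: how often users invoked each operation.
usage = Counter({"process_call": 7000, "generate_bill": 2000, "update_account": 1000})
profile = build_operational_profile(usage)

# List operations from most used to least used.
for op, prob in sorted(profile.items(), key=lambda item: item[1], reverse=True):
    print(f"{op}: {prob:.2f}")
```

Keeping the profile as plain probabilities makes it easy to reuse later when allocating test cases and test time.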

Page 10:

Define “Just Right” Reliability

► Define what “failure” means for the product
► Failure is defined as any departure of system behavior in execution from user needs
► Failure intensity is the number of failures per unit time
► Choose a common measure for all failure intensities, either failures per some natural unit or failures per hour
► Set the total system failure intensity objective (FIO) for each associated system using field data, such as customer satisfaction surveys related to measured failure intensity, or an analysis of competing products, balancing among the major quality characteristics users need
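As a rough illustration of expressing failure intensity in a common measure, the sketch below converts failures per natural unit (here, transactions) into failures per hour and compares the result with an FIO. All numbers, including the objective, are hypothetical.

```python
def failure_intensity(failures, exposure):
    """Failures per unit of exposure (natural units or hours)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return failures / exposure


def per_hour(fi_per_natural_unit, natural_units_per_hour):
    """Convert failures per natural unit into failures per hour."""
    return fi_per_natural_unit * natural_units_per_hour


# Hypothetical data: 3 failures over 60,000 transactions,
# on a system that handles 5,000 transactions per hour.
fi_per_transaction = failure_intensity(3, 60_000)
fi_per_hour = per_hour(fi_per_transaction, 5_000)

FIO_PER_HOUR = 0.5  # hypothetical failure intensity objective
print(f"measured: {fi_per_hour:.2f} failures/hour, objective: {FIO_PER_HOUR}")
print("objective met" if fi_per_hour <= FIO_PER_HOUR else "objective not met")
```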

Page 11:

Prepare for Test
► Use the operational profiles to prepare the test cases and the test procedures
► Select test cases within the operation on a uniform basis

Execute Test
► Allocate test time among feature test, load test, and regression test
  • Feature tests execute test cases with interactions and effects of the field environment minimized
  • Load tests execute test cases simultaneously, with full interactions and all the effects of the field environment
  • Regression tests execute some or all feature tests and are designed to reveal failures caused by faults introduced by program changes
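One way to read the selection rule above is as a two-stage draw: pick an operation according to the operational profile, then pick a test case within that operation uniformly. The sketch below assumes that interpretation; the profile, the test case names, and the `select_test_cases` helper are hypothetical.

```python
import random


def select_test_cases(profile, test_cases, n, seed=None):
    """Pick n test cases: operation by profile probability, case uniformly."""
    rng = random.Random(seed)  # seeded generator for a reproducible test plan
    operations = list(profile)
    weights = [profile[op] for op in operations]
    selected = []
    for _ in range(n):
        # Stage 1: choose the operation in proportion to its expected use.
        op = rng.choices(operations, weights=weights, k=1)[0]
        # Stage 2: choose a test case for that operation uniformly.
        selected.append((op, rng.choice(test_cases[op])))
    return selected


# Hypothetical operational profile and test case inventory.
profile = {"process_call": 0.7, "generate_bill": 0.2, "update_account": 0.1}
test_cases = {
    "process_call": ["local", "long_distance", "conference"],
    "generate_bill": ["monthly", "final"],
    "update_account": ["change_address"],
}

picked = select_test_cases(profile, test_cases, n=10, seed=42)
for op, case in picked:
    print(op, case)
```

Because selection follows the profile, heavily used operations receive proportionally more of the test effort.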

Page 12:

Guiding Test

► Involves guiding the product’s system test phase and release
► Failure data is interpreted differently for software we are developing and software we acquire. We attempt to remove the faults that are causing failures
► For developed software, we estimate the FI/FIO ratio from the times of failure events or the number of failures per time interval, using reliability estimation programs such as CASRE (Computer Aided Software Reliability Estimation)
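To make the FI/FIO ratio concrete, the sketch below estimates current failure intensity from the failures seen in a recent time window and divides by the objective. This is a deliberately crude stand-in: it does not reproduce the reliability growth models that a tool such as CASRE fits, and the failure times, window, and FIO are hypothetical.

```python
def fi_fio_ratio(failure_times, now, window, fio):
    """Estimate FI/FIO using failures observed in the most recent window.

    failure_times: cumulative test hours at which each failure occurred.
    now: total test hours accumulated so far.
    window: how many recent hours to use for the estimate.
    fio: failure intensity objective in failures per hour.
    """
    if window <= 0 or fio <= 0:
        raise ValueError("window and fio must be positive")
    start = max(0.0, now - window)
    span = now - start
    if span <= 0:
        raise ValueError("no test time accumulated yet")
    recent = [t for t in failure_times if start < t <= now]
    return (len(recent) / span) / fio


# Hypothetical failure log: gaps between failures grow as faults are removed.
failure_times = [2, 5, 9, 15, 24, 38, 60, 95]
ratio = fi_fio_ratio(failure_times, now=100, window=50, fio=0.02)
print(f"FI/FIO = {ratio:.1f}")
```

A ratio well above 1 suggests more fault removal is needed before release; a ratio at or below 1 suggests the objective has been reached.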

Page 13:

Software Reliability Engineering (SRE): Types of Tests

Reliability Growth Test
► SRE is used to estimate and track reliability
► The main objective of this test is to find and remove faults
► Includes feature, load, and regression tests
  • A feature test is one in which operations are executed separately, with interactions and effects of the field environment minimized
  • A load test, on the other hand, is carried out in an environment similar to that in the actual field. It is subdivided into two types: acceptance test and performance test
  • A regression test is the execution of randomly selected or all feature tests after a significant change in a system build

Certification Test
► Makes a binary decision about the software being tested, i.e., the software is either accepted or rejected
► Certification test is generally used only for load tests
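The accept/reject decision in a certification test can be sketched with a sequential probability ratio test, in line with the sequential sampling theory mentioned earlier. The example assumes failures follow a Poisson process and tests "failure intensity equals the FIO" against "failure intensity equals gamma times the FIO". The discrimination ratio `gamma`, the risk levels `alpha` and `beta`, and the sample data are assumptions chosen for illustration.

```python
import math


def certification_decision(failure_times, fio, gamma=2.0, alpha=0.1, beta=0.1):
    """Return 'accept', 'reject', or 'continue' after the observed failures.

    failure_times: cumulative time of each failure, in the time unit of 1/fio.
    gamma: discrimination ratio (> 1), alpha: supplier risk, beta: consumer risk.
    """
    if gamma <= 1 or fio <= 0:
        raise ValueError("gamma must exceed 1 and fio must be positive")
    a = math.log(beta / (1 - alpha))  # accept threshold on log-likelihood ratio
    b = math.log((1 - beta) / alpha)  # reject threshold on log-likelihood ratio
    for n, t in enumerate(failure_times, start=1):
        normalized = t * fio  # failure time in multiples of 1/FIO
        accept_at = (a - n * math.log(gamma)) / (1 - gamma)
        reject_at = (b - n * math.log(gamma)) / (1 - gamma)
        if normalized >= accept_at:
            return "accept"
        if normalized <= reject_at:
            return "reject"
    return "continue"


# Hypothetical runs against an FIO of 0.01 failures per hour.
print(certification_decision([350], fio=0.01))
print(certification_decision([10, 20, 30, 40], fio=0.01))
print(certification_decision([100], fio=0.01))
```

This sketch checks the boundaries only at failure instants. A fuller version would also allow acceptance when enough failure-free time has passed since the last failure.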

Page 14:

Software Reliability Models (SRM)

► Modeling techniques can be divided into prediction modeling and estimation modeling
► Both techniques are based on observing and accumulating failure data and analyzing it with statistical inference

Features of a good SRM:
► Gives good predictions of future failure behavior
► Computes useful quantities
► Is simple enough for many to use
► Is widely applicable
► Is based on sound assumptions
► Becomes and remains stable
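The slides do not name a specific model. As one well-known example, the sketch below uses Musa's basic execution time model, in which failure intensity decays exponentially with execution time. The parameter values (`lam0`, the initial failure intensity, and `nu0`, the total expected failures) are hypothetical.

```python
import math


def expected_failures(tau, lam0, nu0):
    """Expected cumulative failures after execution time tau."""
    return nu0 * (1.0 - math.exp(-lam0 * tau / nu0))


def failure_intensity_at(tau, lam0, nu0):
    """Failure intensity after execution time tau."""
    return lam0 * math.exp(-lam0 * tau / nu0)


def extra_time_to_objective(lam_present, lam_objective, lam0, nu0):
    """Additional execution time needed to go from the present intensity to the objective."""
    return (nu0 / lam0) * math.log(lam_present / lam_objective)


# Hypothetical parameters: 10 failures per CPU hour initially, 100 total failures.
lam0, nu0 = 10.0, 100.0
present = failure_intensity_at(10.0, lam0, nu0)
extra = extra_time_to_objective(present, 1.0, lam0, nu0)

print(f"failures expected after 10 CPU hours: {expected_failures(10.0, lam0, nu0):.1f}")
print(f"failure intensity after 10 CPU hours: {present:.2f} per CPU hour")
print(f"extra CPU hours to reach 1 failure per CPU hour: {extra:.1f}")
```

These are the kind of "useful quantities" a model should compute: how many failures to expect, the current failure intensity, and how much more testing is needed to reach an objective.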

Page 15:

Conclusion

SRE is a field of engineering where you:
► Design, build, and balance testing and other reliability improvement approaches for a software product
► Allocate testing resources in accordance with the use and criticality of operations
► Control the software-based products you develop, rather than the process controlling you
► Can be confident of the reliability and availability of the products
► Can deliver them in minimum time and at minimum cost for the high levels of reliability and availability achieved

Thus SRE is a vital skill to possess to be competitive in today’s marketplace.