Software Reliability: The “Physics” of “Failure” SJSU ISE 297 Donald Kerns 7/31/00.

Page 1

Software Reliability: The “Physics” of “Failure”

SJSU ISE 297

Donald Kerns

7/31/00

Page 2

Thesis:

The field of software engineering is a very complex mix of technology, management and human psychology.

Glib usage of the phrase “software reliability” implies gross simplifications that make the measurement useless in all but the most narrowly defined situations.

A significant increase in the sophistication of the general field of software engineering will be necessary before true measures of “software reliability” are meaningful.

Page 3

Software is not monolithic, yet the literature treats it as such. Different types of software have different failure modes and consequences…

Page 4

Embedded software

• Does your Furby have bugs?

– Microwave? Car?

• Well-defined applications

• Little data

• Tightly constrained resources in the delivered product drive...

• High technical complexity

Page 5

Batch/Database Driven Software

• Industrial scale applications

• Highly data intensive

• Usually only a small percentage of the functionality is user interaction

• Once running, most apparent “software defects” are actually defective data or business rules

Page 6

User interactive

• Usually event driven

• Defects include:

– broken functionality

– behavior different from user expectations

– lack of interoperability with other software

Page 7

Usage of the word “reliability” implies defects and failures, but the community (much less the literature) has yet to settle on what exactly constitutes a software failure. Most definitions cover only the first type.

• Catastrophic failures

• Functional failure

• Poor performance

• Wrong answers

• Does not conform to user expectations

All are “failures”, yet the standard reliability measures are at a loss to evaluate their different consequences.
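Here is a minimal C sketch of this point. It assumes the simplest textbook measures: failure intensity (failures per operating hour) and MTBF (operating hours per failure). The enum mirrors the five failure types listed on this slide. The names failure_class, failure_intensity, mtbf and count_class are hypothetical. Both measures count every logged failure as 1, so they cannot tell a log of crashes from a log of unmet expectations.

```c
#include <stddef.h>

/* The five failure types listed on the slide (names are illustrative). */
typedef enum {
    CATASTROPHIC,
    FUNCTIONAL,
    POOR_PERFORMANCE,
    WRONG_ANSWER,
    UNMET_EXPECTATION   /* does not conform to user expectations */
} failure_class;

/* Simple point estimate of failure intensity: failures per operating hour.
   The class of each failure is never consulted. */
static double failure_intensity(const failure_class events[], size_t n, double hours)
{
    (void)events; /* deliberately ignored: this measure treats all classes alike */
    return (double)n / hours;
}

/* Simple point estimate of MTBF: operating hours per failure (requires n > 0). */
static double mtbf(size_t n, double hours)
{
    return hours / (double)n;
}

/* A per-class count keeps exactly the information the two measures above discard. */
static size_t count_class(const failure_class events[], size_t n, failure_class c)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (events[i] == c)
            k++;
    return k;
}
```

Consider four catastrophic failures and four unmet-expectation failures, each logged over 1,000 operating hours. Both logs give the same failure intensity (0.004 per hour) and the same MTBF (250 hours). Only a per-class breakdown such as count_class keeps the difference.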

Page 8

Frequently, software is just the most visible element of a complex system. Almost all system defects start out appearing as software errors.

Page 9

Example: “Fire Bay #1”

[Figure: Normal configuration: 2-satellite bus, with Bay #2 and Bay #1, each with its own wiring (Bay #2 wiring, Bay #1 wiring). “Cost saving” configuration: bigger satellite bus, with a single Bay #1 but both the Bay #2 wiring and the Bay #1 wiring.]

The separation failure was a software defect only if the software had been modified to fire Bay #1 through the Bay #2 wiring and still failed to do so.

Page 10

More examples

• The system isn’t getting the signal, we must have a software defect!

– Is the system configured to scan that part of the spectrum?

– Is the system configured to report the signal during that portion of the target identification?

– Is the system configured to report signals of that priority?

• The Built-In Test software is reporting a failed component, we must have a software defect!

– No, by reporting a failed component the software is functioning CORRECTLY.

– It is the component that has a defect.
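Here is a small C sketch of the Built-In Test argument. It assumes a hypothetical BIT that checks a measured value against an allowed window. bit_check, bit_software_correct and the window bounds are illustrative assumptions, not a real BIT interface. The software is correct when its verdict matches the state of the hardware, so a BIT_FAIL verdict on a faulty component counts as a success.

```c
#include <stdbool.h>

/* Hypothetical BIT verdicts; the slide does not specify an interface. */
typedef enum { BIT_PASS, BIT_FAIL } bit_verdict;

/* Hypothetical BIT check: the component passes if the measured value
   lies inside the allowed window [lo, hi]. */
static bit_verdict bit_check(double measured, double lo, double hi)
{
    return (measured >= lo && measured <= hi) ? BIT_PASS : BIT_FAIL;
}

/* The BIT software is working correctly whenever its verdict matches the
   real state of the component, including when that verdict is BIT_FAIL. */
static bool bit_software_correct(bit_verdict verdict, bool component_faulty)
{
    return (verdict == BIT_FAIL) == component_faulty;
}
```

Take a hypothetical window of 4.75 to 5.25 V. A reading of 3.1 V yields BIT_FAIL. If the component really is faulty, bit_software_correct returns true and the defect belongs to the component.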

Page 11

Does software age? No, but the behavior of software depends on the environment that it is executing in, and that environment may degrade.

• Changes in environment may reveal failure modes that have lain dormant for the life of the software.

    /* strcmp() returns 0 when the strings match */
    if (strcmp(compiler, "Visual C++") == 0)
        do_compile_things();
    else if (strcmp(compiler, "Borland C++") == 0)
        crash_in_flames();   /* dormant until the build environment changes */

• Common environment changes:

– Change in system configuration (OS, hardware, applications).

– Increased processor loading due to above.

– Decreased available memory due to above.

– Increased network traffic due to growth.

– Intentional or unintentional self-modifying code.
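Here is a runnable C version of the slide's fragment, offered as a sketch. The slide does not define do_compile_things() or crash_in_flames(), so here they are hypothetical stubs that return an outcome. count_failures is likewise added for illustration. The program text never changes, yet its observed failure count depends entirely on which environments it meets.

```c
#include <string.h>

/* Hypothetical outcomes standing in for the slide's do_compile_things()
   and crash_in_flames(); the slide does not define either function. */
typedef enum { COMPILED, CRASHED, UNKNOWN_TOOLCHAIN } outcome;

static outcome do_compile_things(void) { return COMPILED; }

/* The latent defect: reachable only when the environment is "Borland C++". */
static outcome crash_in_flames(void) { return CRASHED; }

/* Same dispatch as the slide. strcmp() returns 0 when the strings match,
   so each test compares its result against 0. */
static outcome build(const char *compiler)
{
    if (strcmp(compiler, "Visual C++") == 0)
        return do_compile_things();
    else if (strcmp(compiler, "Borland C++") == 0)
        return crash_in_flames();
    return UNKNOWN_TOOLCHAIN;
}

/* Counts failures across a history of deployment environments.
   The code is identical for every entry; only the environment differs. */
static int count_failures(const char *envs[], size_t n)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++)
        if (build(envs[i]) == CRASHED)
            failures++;
    return failures;
}
```

A deployment history of three Visual C++ builds reports zero failures. Change two of those entries to Borland C++ and the identical code fails twice. A reliability estimate taken before the change would say nothing about the dormant branch.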

Page 12

“Does not meet customer expectations” is considered a software defect; however, there is almost always a mismatch between customer expectations and the economics of the situation.

• Windows normally ships with tens of thousands of defects. Would you pay 10x as much for 10x fewer defects?

• Heretical thought:

– The methods for producing 80-90% defect-free software have been known since the late 1960s (inspections, formal requirements, design and test).

– Why aren’t they being used?

– The field of software engineering is a very complex mix of technology, management and human psychology.

Page 13

Finally, even if customer expectations are clearly documented at the beginning of a software development project and properly met during implementation, installing that software system significantly changes the environment that produced those expectations. This yields new expectations.

• “Well, since that data is now on the computer, we should be able to…”

– Share it with our other systems.

– Work on it with spreadsheets

– Put it on the web

– Share it with our Aunt Sally

• “What do you mean that costs more? The software is defective. Fix it!”

Page 14

The software AND customer communities will need to address all of these issues in a formal, comprehensive, and consistent manner before the phrase “software reliability” has meaning.

Page 15

SEI S/W Capability Maturity Model

• 1) Initial. The software process is characterized as ad hoc, and occasionally even chaotic. Few processes are defined, and success depends on individual effort and heroics.

• 2) Repeatable. Basic project management processes are established to track cost, schedule, and functionality. The necessary process discipline is in place to repeat earlier successes on projects with similar applications.

• 3) Defined. The software process for both management and engineering activities is documented, standardized, and integrated into a standard software process for the organization. All projects use an approved, tailored version of the organization's standard software process for developing and maintaining software.

• 4) Managed. Detailed measures of the software process and product quality are collected. Both the software process and products are quantitatively understood and controlled.

• 5) Optimizing. Continuous process improvement is enabled by quantitative feedback from the process and from piloting innovative ideas and technologies.

Page 16

Software “maturity”

Page 17

Four years of consistent effort...