FOR0383 Software Quality Assurance

16/06/22 Dr Andy Brooks 1 FOR0383 Software Quality Assurance Lecture 2 ESA Ariane 5 Rocket Flight 501

20/04/23 Dr Andy Brooks 1

FOR0383 Software Quality Assurance

Lecture 2

ESA Ariane 5 Rocket Flight 501

4 June 1996

• at ~40 seconds into launch

• at an altitude of ~3700m

• the launcher veered off path and began to break up

• the self-destruct system was triggered

• ~$500 million (uninsured, maiden flight)

• the launcher was unmanned

Board of Inquiry

• what was the cause of failure?

• was appropriate testing undertaken?

• what corrective actions should there be?

• the report by the Board of Inquiry was completed in less than 6 weeks

Weather conditions

• the weather was acceptable

• there was no risk of lightning

• but visibility had worsened for a time

• the launch was delayed by about 1 hr

The Challenger Space Shuttle disaster was partly due to the weather. Overnight conditions at the launch pad had been extremely cold, which made the rubber O-rings in the solid rocket boosters stiff and less able to seal their joints.

Briefly

• nominal behaviour of the launcher until H0 + 36 seconds

• the backup Inertial Reference System fails

• the active Inertial Reference System fails

– after the backup

• all the rocket nozzles are swivelled into extreme positions

• the launcher breaks up and the self-destruct system is triggered

Recovery of material

• debris fell back to the ground, scattered over a wide area (5 km × 2.5 km)

• despite mangrove swamps, the two Inertial Reference Systems were recovered

• telemetry data was received on the ground

• trajectory data was received from radar stations

• optical observations (camera and film)

Unrelated Anomaly

• at H0 + 22 seconds

• variations began in the hydraulic pressure of the main engine nozzle actuators, at a frequency of 10 Hz

• “This phenomenon is significant and has not yet been fully explained, but after consideration it has not been found relevant to the failure.”

Inertial Reference System (SRI, from the French Système de Référence Inertielle)

• complex piece of equipment

• measures attitude and movements in space

• its output is transmitted to the On-Board Computer (OBC), which executes the flight control program

• to improve reliability, two SRIs operated in parallel with identical hardware and software

First question to ask: how is the system backed up?...

Equipment Redundancy

• there are two On-Board Computers

• and a number of other units in the flight control system are also duplicated

So, what really happened?

• the OBC received incorrect data

• the SRI had declared a failure due to a software exception (Operand Error)

• a value being converted from 64-bit floating point was too large for the target 16-bit signed integer

• this particular data conversion was not protected
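The failing conversion can be sketched as follows. This is a minimal Python illustration, not the SRI code (the flight software was written in Ada); the names `to_int16` and `OperandError` are hypothetical. Because Python integers never overflow, the 16-bit range has to be checked explicitly to mimic the overflow.

```python
INT16_MIN = -32768  # -2**15, smallest 16-bit signed integer
INT16_MAX = 32767   # 2**15 - 1, largest 16-bit signed integer


class OperandError(Exception):
    """Hypothetical stand-in for the exception raised when a conversion overflows."""


def to_int16(x: float) -> int:
    """Convert a 64-bit float to a 16-bit signed integer, with no handler around it.

    Python ints never overflow, so the 16-bit range is checked explicitly and
    OperandError is raised when the rounded value does not fit.
    """
    n = round(x)  # round to nearest; Python rounds exact halves to even
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return n
```

Here `to_int16(12345.6)` returns 12346, while `to_int16(40000.0)` raises `OperandError`. With no handler around the call, the exception propagates out of the module, which is the unprotected case described above.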

…Different Trajectory

• the Operand Error occurred because Ariane 5 built up horizontal velocity much more quickly than Ariane 4

– Ariane 5 built up horizontal velocity five times more quickly than Ariane 4

• the failure context was precisely determined from memory readouts from the recovered SRIs
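The effect of the faster build-up of horizontal velocity on the safety margin can be worked through with a little arithmetic. The sketch assumes, purely for illustration, that the converted value grows in proportion to horizontal velocity; `fits_after_scaling` is a hypothetical helper.

```python
INT16_MAX = 32767  # largest 16-bit signed integer


def fits_after_scaling(peak: float, scale: float) -> bool:
    """True if a value that peaked at `peak` still fits in 16 bits after scaling.

    Assumes the converted value grows in proportion to horizontal velocity;
    that linear relationship is a simplification made for this illustration.
    """
    return abs(peak * scale) <= INT16_MAX


# Largest peak that leaves enough headroom for five-fold growth.
largest_safe_peak = INT16_MAX / 5
```

Under that assumption, a five-fold increase leaves room only for values whose Ariane 4 peak stayed below about 6,553, i.e. 20% of the 16-bit range: `fits_after_scaling(6000, 5)` is `True`, but `fits_after_scaling(7000, 5)` is `False`.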

Ariane family

…No useful purpose

• the software module which generated the exception served no useful purpose after launch!

• it was simply re-used from Ariane 4

“Effective reuse requires design by contract. Without a precise specification attached to each reusable component - precondition, postcondition, invariant - no one can trust a supposedly reusable component. Without a specification, it is probably safer to redo than to reuse.”

Jean-Marc Jézéquel and Bertrand Meyer, “Design by Contract: The Lessons of Ariane”, IEEE Computer, January 1997, p. 130
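Design by contract can be sketched even in a language without built-in support, by writing the precondition and postcondition into the component's interface and checking them. The function and exception names below are hypothetical, and Python is used only for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class ContractViolation(Exception):
    """Hypothetical exception raised when a caller breaks the documented precondition."""


def convert_to_int16(x: float) -> int:
    """Convert a 64-bit float to a 16-bit signed integer.

    Precondition:  INT16_MIN <= x <= INT16_MAX  (the caller must guarantee this)
    Postcondition: INT16_MIN <= result <= INT16_MAX and |result - x| <= 0.5
    """
    if not INT16_MIN <= x <= INT16_MAX:
        raise ContractViolation(f"precondition violated: {x} is outside the 16-bit range")
    result = round(x)
    # Postcondition check; Python skips assert statements when run with -O.
    assert INT16_MIN <= result <= INT16_MAX and abs(result - x) <= 0.5
    return result
```

A team reusing this component for a new launcher can see from the precondition alone that it must show its inputs stay between -32768 and 32767; without the written contract, that obligation stays implicit.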

Unprotected variables?

• 3 variables were left unprotected “because a maximum workload target of 80% had been set for the SRI computer”

– remember, this is a real-time system

• the justification was not given in the source code

• the reasoning was that the variables were either physically limited or had a large safety margin

– this was true for Ariane 4

• the decision to protect some but not all of the variables was taken jointly by project partners
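Because the justification never reached the source code, nothing prompted anyone to re-check it for Ariane 5. One way to keep such decisions visible is to record, next to each conversion, whether it is protected and why, and then re-check the recorded assumptions whenever the operating envelope changes. This sketch is hypothetical: the record type, the variable names `var_a` to `var_c` and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ConversionRecord:
    """Hypothetical record of one float-to-int16 conversion and its justification."""
    variable: str
    protected: bool
    assumed_max: float  # largest magnitude the designers expected the value to reach
    justification: str  # why leaving it unprotected was considered safe


def broken_assumptions(records: List[ConversionRecord],
                       predicted_max: Dict[str, float]) -> List[str]:
    """List unprotected variables whose predicted maximum exceeds the assumed one.

    `predicted_max` maps each variable to the largest magnitude expected on the
    new vehicle's trajectory.
    """
    return [
        r.variable
        for r in records
        if not r.protected and predicted_max[r.variable] > r.assumed_max
    ]


# Invented example values.
records = [
    ConversionRecord("var_a", True, 0.0, "protected by an exception handler"),
    ConversionRecord("var_b", False, 20000.0, "physically limited"),
    ConversionRecord("var_c", False, 10000.0, "large safety margin"),
]
predicted = {"var_a": 50000.0, "var_b": 15000.0, "var_c": 50000.0}
```

Here `broken_assumptions(records, predicted)` returns `['var_c']`: the "large safety margin" recorded for `var_c` no longer holds under the new predictions, while the protected `var_a` is skipped.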

The specification of exception handling contributed to the failure. The specification required that, on an exception:

• the failure should be indicated on the databus

– the OBC interpreted the diagnostic data it was sent as valid flight data, causing the nozzle deflections

– remember, the backup SRI failed first

• the failure context should be stored in EEPROM

• the SRI processor should be shut down

• this approach addressed random hardware failures, but not a software design error: both SRIs ran identical software, so both failed on the same data
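A toy model can show why duplicating identical units helps against a random hardware fault but not against a design fault in software the units share. It is a hypothetical Python sketch: the `SRI` class, `redundant_reading` and the `hardware_fault` flag are invented for illustration, the numeric inputs are made up, and it does not model the OBC misreading diagnostic data as flight data.

```python
from typing import List, Optional

INT16_MIN, INT16_MAX = -32768, 32767


class SRI:
    """Toy inertial reference unit; every instance runs identical software."""

    def __init__(self, hardware_fault: bool = False) -> None:
        self.hardware_fault = hardware_fault  # models a random hardware failure
        self.shut_down = False
        self.failure_context = None  # stands in for the record stored in EEPROM

    def step(self, raw: float) -> Optional[int]:
        """Return a 16-bit reading, or None once the unit has shut down."""
        if self.shut_down:
            return None
        n = round(raw)
        overflow = not INT16_MIN <= n <= INT16_MAX
        if self.hardware_fault or overflow:
            # Specified handling: store the failure context, then shut down.
            cause = "Operand Error" if overflow else "hardware fault"
            self.failure_context = (cause, raw)
            self.shut_down = True
            return None
        return n


def redundant_reading(units: List[SRI], raw: float) -> Optional[int]:
    """Give the same input to every unit; return the first healthy reading, if any."""
    readings = [unit.step(raw) for unit in units]
    return next((r for r in readings if r is not None), None)


# Random hardware fault in one unit: the other unit keeps supplying data.
hw_units = [SRI(hardware_fault=True), SRI()]
# Shared software design fault: both units see the same out-of-range value.
sw_units = [SRI(), SRI()]
```

With `hw_units`, `redundant_reading(hw_units, 20000.0)` still returns 20000 from the healthy unit. With `sw_units`, `redundant_reading(sw_units, 40000.0)` returns `None`: both units shut down on the same input, so no valid reading remains.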

Testing

• no test was performed to verify that the SRI would behave correctly when subject to the countdown and trajectory of Ariane 5

• the SRI specification did not contain Ariane 5 trajectory data as a functional requirement

“It would have been technically feasible to include almost the entire inertial reference system in the overall system simulations which were performed. For a number of reasons it was decided to use the simulated output of the inertial reference system, not the system itself or its detailed simulation. Had the system been included, the failure could have been detected.”
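The kind of test the Board had in mind can be sketched as a replay of trajectory data through the unit. In this hypothetical Python sketch the conversion stands in for the whole SRI, and the two profiles are invented values, not Ariane 4 or Ariane 5 flight data; only the five-fold factor comes from the inquiry findings summarised earlier.

```python
from typing import Iterable, Optional

INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Hypothetical stand-in for the exception raised when a conversion overflows."""


def to_int16(x: float) -> int:
    """Unprotected conversion: raises OperandError if x does not fit in 16 bits."""
    n = round(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return n


def first_failing_sample(profile: Iterable[float]) -> Optional[int]:
    """Replay a trajectory profile through the conversion.

    Returns the index of the first sample that raises, or None if all succeed.
    """
    for i, sample in enumerate(profile):
        try:
            to_int16(sample)
        except OperandError:
            return i
    return None


# Invented sample values, not flight data. The second profile grows five times
# faster, mirroring the Ariane 4 / Ariane 5 difference in horizontal velocity.
ariane4_like = [0.0, 1000.0, 3000.0, 6000.0, 9000.0]
ariane5_like = [5 * v for v in ariane4_like]
```

`first_failing_sample(ariane4_like)` returns `None`, while `first_failing_sample(ariane5_like)` returns 4 (45,000 does not fit in 16 bits). The same code passes or fails depending only on the trajectory it is given, which is why the Ariane 5 trajectory needed to be part of the test.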

Recommendations

R1 … no software function should run during flight unless it is needed

R2 … test facility must include as much real equipment as possible… Complete simulations must take place...

R3 … do not allow sensors to stop sending best effort data
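R3 can be illustrated with a conversion that saturates instead of stopping. Clamping to the nearest representable value and flagging it is one possible reading of "best effort data", assumed here for illustration; the names `to_int16_best_effort` and `Reading` are hypothetical.

```python
from typing import NamedTuple

INT16_MIN, INT16_MAX = -32768, 32767


class Reading(NamedTuple):
    value: int       # value that is still sent to the OBC
    saturated: bool  # True if the rounded value had to be clamped to fit


def to_int16_best_effort(x: float) -> Reading:
    """Round, then clamp to the 16-bit range instead of raising an exception.

    Assumes x is finite (round() raises on NaN or infinity).
    """
    n = round(x)
    clamped = min(max(n, INT16_MIN), INT16_MAX)
    return Reading(clamped, clamped != n)
```

For example, `to_int16_best_effort(40000.0)` gives `Reading(value=32767, saturated=True)`: the unit keeps sending data, and the receiver can decide how far to trust a saturated value.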

… more Recommendations

R5 review all flight software… identify all implicit assumptions

R9 include external participants when reviewing specifications, code and justification documents (someone with a fresh mind can sometimes easily spot mistakes that the authors miss)

R14 provide more transparent organisation of co-operation among partners