OPS Forum Embracing the future - a retrospective look 05.09.2008

Embracing the future - a retrospective look. Michael Jones, OPS-G Forum, 5th September 2008

Description

Preparing for the future in ESA's Operations Directorate is more important than ever. This talk reviews how we have handled change in the past, to help us prepare better for future transformations. In particular, we must prepare for changes in ESOC's workload during 2012-18 and cope with organisational changes, such as financial reform, happening at Agency level. Coping with change is nothing new to ESOC: this ability has been there from the beginning. The speaker has been involved in transformations at various points in his career, and he has selected four subjects for this forum, all of which offer useful lessons for the future.

Transcript of OPS Forum Embracing the future - a retrospective look 05.09.2008

Embracing the future - a retrospective look

Michael Jones, OPS-G Forum

5th September 2008

Embracing the Future: a Retrospective Look - 5th Sept. 2008 - M. Jones, OPS-GD

Contents

1. “Governance”

2. Change can be slower than you think!

3. FFP contracts – the magic bullet?

4. The Black Swan: the improbable in operations

5. Software Dependability

Governance

“The use of institutions, structures of authority and even collaboration to allocate resources and coordinate or control activity [in society or the economy].” (Wikipedia)

Governance in the SSA Programme

What is the SSA Programme? It provides a systematic capability for surveillance of man-made objects in the space around the Earth, and provides warnings of collisions that may endanger space activities or even life on Earth.

Governance = making decisions on how the programme and deployed assets are to be run.

Data Systems Governance: Data Systems Task Force

The Data Systems Task Force must:

“Ensure the availability of adequate strategy and plans for the mission data infrastructure and monitor the execution of those plans in order to ensure timely availability of the mission data infrastructure.”

This means that the DSTF in effect carries out governance of the data systems infrastructure.

Conclusions on Governance

Governance is a buzzword that you will continue to hear!

Recent or emerging examples are: Establishment of the ESA Security Office; Software licence governance for ESA and Third Party Software.

Mike Jones’s proposed definition of “governance” to fit its usage in ESA:

“The process of making decisions, the oversight of the results of those decisions and also the oversight of organisations or structures of authority for decision making.”

Change can be slower than you think: Example 1: SCOS-2000

SCOS-2 (which became SCOS-2000) was a new mission control system (MCS) infrastructure developed from scratch.

A very brief summary of the timeline of the project up to 2002:

1. Start of project as SCOS-2: 1992.
2. Version 1 (as SCOS-2) used for Huygens, MTP and Teamsat: late 1997.
3. Re-engineering of SCOS-2 (mainly the TC chain): 1997-1998.
4. Parallel production of architectural designs for both SCOS-1 and SCOS-2 baselines for the INTEGRAL MCS: 2nd half of 1998.
5. Adoption of SCOS-2 as the INTEGRAL MCS baseline: January 1999. INTEGRAL was the first major ESA science spacecraft based on SCOS-2.
6. SCOS-2 renamed SCOS-2000: 2000.
7. Support of the INTEGRAL LEOP: 17th October 2002, using SCOS-2000 rel. 2.3.

So it took 10 years to reach the point at which the new infrastructure became generally accepted, twice the 5 years originally planned.

Change can be slower than you think: Example 1: SCOS-2000 - Conclusion

Developing a new mission control system infrastructure from scratch is difficult and time consuming.

First lesson: try to avoid building new MCS infrastructures (“evolution, not revolution”).

Second lesson: if you have to do it, develop a simple version first.

Change can be slower than you think: Example 2: Intel/Linux MCS Infrastructure

In 2001, it was decided to port SCOS-2000 to Linux.

The port was straightforward: by 2002, a SCOS-2000 version was available that could run on either SUN Solaris or Linux platforms.

Outside ESOC, SCOS-2000 (S2K) became popular as a licensable product and, with one exception, external (non-ESOC) projects using it have been based on Linux.

At ESOC, the move to the Linux version proceeded cautiously in two stages:

1. a pilot project with a Linux server and SUN clients (Herschel Planck, S2K rel. 4);
2. a Linux transition project to install Linux clients in all the common areas.

Stage 1 was successfully completed ca. 2006. Stage 2, started in 2007, has been completed for the MCR.

Intel workstations for the remaining common areas will be procured this year together with a reserve of spares.

We are now aiming to support the Herschel Planck LEOP using the new Linux infrastructure installed by the LIT project.

Change can be slower than you think: Conclusion – Linux

It has taken more than 6 years to reach the point of having a common Intel/Linux infrastructure.

Where you have a large installed base of workstations (ca. 1400 in this case), change is quite slow, since the missions already running on the old platforms will not want to change, or will not be able to.

FFP Contracts – the Magic Bullet?

Before 1996, most software development at ESOC was done under fixed-unit-price conditions.

Implications: ESOC “owned the risk” for the software requirements and their implementation. The contractor companies took no responsibility: they simply provided staff man-hours.

In 1996 firm-fixed price (FFP) contracts for development of spacecraft control systems and simulators were introduced with the new frame contracts.

Prime motivation: to move contract staff off-site to their own companies’ premises, since an FFP regime is much more suitable for off-site work.

FFP became the rule for most work awarded under these frame contracts, achieving far more rigorous scrutiny of requirements by the frame contractors, better competition, equitable risk sharing between ESA and its suppliers, and formal change control via contract change notices (CCNs).

FFP Contracts – the Magic Bullet?

Firm-fixed price contracts have been rather successful for MCS, simulator and station back-end software.

But:

Firm Fixed Price does not mean Firm Fixed Schedule!

The contractor can underestimate the work to be done.

Recent example: Herschel Planck MPS, where the cheapest offer was taken and the contractor had underestimated the budget by a factor of nearly 10.

FFP Contracts – The Magic Bullet? Conclusions

1. The lowest acceptable offer may not always be the right choice, particularly if the schedule is important.

A careful evaluation of the management plan and the technical solution is needed to ensure that the schedule can be met.

2. For schedule-critical developments, more sophisticated techniques such as Earned Value Analysis may be needed.
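Earned Value Analysis is a general project-control technique rather than something the talk details. In its usual textbook form it compares, at a status date, the planned value (PV) of work scheduled so far, the earned value (EV) of work actually completed, and the actual cost (AC). The sketch below computes the standard indicators; the class name and all figures are hypothetical and do not come from any ESOC contract.

```python
from dataclasses import dataclass


@dataclass
class EarnedValueStatus:
    """Hypothetical status of a development at one reporting date (same currency unit throughout)."""
    budget_at_completion: float  # BAC: total planned budget
    planned_value: float         # PV: budgeted value of work scheduled to be done by now
    earned_value: float          # EV: budgeted value of work actually completed by now
    actual_cost: float           # AC: cost actually incurred so far

    @property
    def schedule_variance(self) -> float:
        return self.earned_value - self.planned_value  # SV < 0 means behind schedule

    @property
    def schedule_performance_index(self) -> float:
        return self.earned_value / self.planned_value  # SPI < 1 means behind schedule

    @property
    def cost_variance(self) -> float:
        return self.earned_value - self.actual_cost  # CV < 0 means over budget

    @property
    def cost_performance_index(self) -> float:
        return self.earned_value / self.actual_cost  # CPI < 1 means over budget

    @property
    def estimate_at_completion(self) -> float:
        # One common forecast: assume the cost efficiency seen so far persists.
        return self.budget_at_completion / self.cost_performance_index


# Hypothetical report: work worth 400 was planned by now, only 250 has been earned, at a cost of 500.
status = EarnedValueStatus(budget_at_completion=1000.0, planned_value=400.0,
                           earned_value=250.0, actual_cost=500.0)
print(f"SV  = {status.schedule_variance:.1f}")
print(f"SPI = {status.schedule_performance_index:.3f}")
print(f"CV  = {status.cost_variance:.1f}")
print(f"CPI = {status.cost_performance_index:.2f}")
print(f"EAC = {status.estimate_at_completion:.1f}")
```

The value of the technique for schedule-critical work is that an SPI well below 1 flags slippage early, from progress actually earned, rather than waiting for a missed delivery date.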

Black Swans

“Black Swan” - title of a book by Nassim Nicholas Taleb.

A black swan is a large-impact, hard-to-predict, and rare event beyond the realm of normal expectations.

The term comes from the ancient Western conception that 'all swans are white', which held until Europeans encountered black swans in Australia.

Black Swan: The Turkey Example

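A turkey before and after Thanksgiving: fed every day for 1000 days, it grows ever more confident that it will be fed tomorrow, until Thanksgiving arrives. A history of a process over 1000 days tells you nothing about what will happen next.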
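The turkey's reasoning can be made concrete with Laplace's rule of succession, which estimates the probability of success on the next trial as (s + 1)/(n + 2) after s successes in n trials. This estimator is our choice for illustration, not something taken from the talk. The sketch shows that the turkey's confidence is at its highest on day 1000, exactly when its history is about to become worthless.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Laplace's estimate of P(success on the next trial) after `successes` in `trials`."""
    return Fraction(successes + 1, trials + 2)


def turkey_confidence(days_fed: int) -> list[Fraction]:
    """Estimated probability of being fed tomorrow, after each consecutive day of feeding."""
    return [rule_of_succession(day, day) for day in range(1, days_fed + 1)]


confidence = turkey_confidence(1000)
print(f"after day 1:    {float(confidence[0]):.3f}")
print(f"after day 1000: {float(confidence[-1]):.4f}")
# Day 1001 is Thanksgiving. Nothing in the 1000-day history pointed to it,
# so the estimate kept rising right up to the event.
```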

The Black Swan: How we deal with the unexpected in operations

Try to make operations fully predictable:
1. Plan and prepare ground segments very carefully.
2. Technically validate them thoroughly.
3. Prepare procedures and plans for operations.
4. Operationally validate them through an extensive simulations programme aimed at training all the teams and ensuring that systems, documentation and operations staff all work together.

The operations validation also includes contingency and anomaly cases, to ensure that the unexpected can be handled.

This is the discipline of Operations Engineering.

The Black Swan and Software

ESOC uses systems containing lots of software.

In the real world, much software is complex: no single person can understand it completely.

“Complex” in this case means “big”: complexity varies as a power of the size.

The behaviour of any complex software system cannot be fully understood, so highly improbable or “black swan” events may occur.
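One simple way to see why complexity grows faster than size is to count the potential pairwise interactions among n components, n(n - 1)/2, which grows roughly as the square of n. The pairwise model and the component counts below are illustrative assumptions, not measurements of any ESOC system; real dependency graphs are sparser, but the superlinear trend is the point.

```python
def pairwise_interactions(components: int) -> int:
    """Number of distinct pairs among `components` parts: n(n - 1) / 2."""
    return components * (components - 1) // 2


for n in (10, 100, 1000, 10000):
    # Size grows 10x from one row to the next; potential interactions grow about 100x.
    print(f"{n:>6} components -> {pairwise_interactions(n):>12,} potential interactions")
```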

Black Swan: The MCS Incident during the MSG-1 LEOP (28th August 2002)

A number of the client workstations in the MCR, PSR and SSR suddenly became unusable and dropped back to the SUN login screen.

The software coordinator (Softcoor) logged into the A server from the SSR and restarted the system.

This appeared to work, but then a SCOS-2000 communications task stopped processing on the server; two telecommanding tasks (multiplexer and releaser) crashed.

Attempts to switch clients to the redundant B server also failed.

Fortunately, the spacecraft was safe in the meantime: despite the problems with the clients, telemetry was received and processed on both the A and B servers.

Softcoor then decided to move to a third chain, the C-system. He was then able to log out all clients on the A and B chains and to restart the servers on both of them.

The systems were made available to the flight control and project teams about 20 minutes later.

Black Swan: MSG-1 Diagnosis and Conclusions

Diagnosis: The server had been started as a foreground task, remotely from a SUN workstation (WS) in the SSR. This created a dependency between the server and the SSR SUN WS.

For reasons unknown, this SUN had a problem and went to “login” status, which stopped the server tasks that had been started directly from it.

There was an implementation error in the MISCdynamic server relating to CORBA event processing.

Problem resolution:
1. Start the server as a background task.
2. Correct one CORBA call in the MISCdyn server.

A full explanation of everything that happened (for example, why the SSR SUN went to “login” in the first place) was not possible, since the logs were inadequate.
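On Unix systems, a common mechanism behind this kind of dependency is that processes attached to a login session receive a hangup signal (SIGHUP) when that session ends. The talk does not say this was the exact mechanism at MSG-1, so treat it as an assumption. The sketch below, for a POSIX system, contrasts launching a task inside the caller's session with detaching it into a new session (similar in spirit to `setsid` or `nohup`). The function name and the placeholder command are hypothetical and are not the SCOS-2000 start-up procedure.

```python
import os
import subprocess
import sys


def start_server_task(command: list[str], detach: bool) -> subprocess.Popen:
    """Start a hypothetical server task.

    detach=False: the child stays in the caller's session, so it shares the fate
    of the terminal or login session it was started from.
    detach=True: the child calls setsid() before exec, becoming the leader of a new
    session with no controlling terminal, so hangups in the launching session are not sent to it.
    """
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # never read from the launching terminal
        stdout=subprocess.DEVNULL,  # a real server would write to a log file instead
        stderr=subprocess.DEVNULL,
        start_new_session=detach,   # POSIX only: run setsid() in the child
    )


if __name__ == "__main__":
    placeholder = [sys.executable, "-c", "import time; time.sleep(2)"]
    attached = start_server_task(placeholder, detach=False)
    detached = start_server_task(placeholder, detach=True)
    my_session = os.getsid(0)
    print("attached task shares my session:", os.getsid(attached.pid) == my_session)
    print("detached task shares my session:", os.getsid(detached.pid) == my_session)
    for proc in (attached, detached):
        proc.terminate()
        proc.wait()
```

The redirection of standard input and output matters as much as the new session: a detached task that still writes to the original terminal can fail when that terminal goes away.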

MSG-1 Incident: Discussion

Problems in complex software cannot be excluded.

The ESOC approach is very practical and sound: a software coordinator thoroughly familiar with the system, assisted by a highly qualified software support team, with both fully involved in the simulations campaign. This ensured a quick recovery in the MSG-1 case.

An operations engineering technique is applied to software engineering.

Software Dependability

“Software dependability” seeks to quantify how much we can rely on a software system to function as required.

However, it is impossible, with any reasonable effort, to ensure that there are no errors in a large software system such as SCOS-2000, which comprises several million lines of code written since the mid-1990s.

There is a widespread misapprehension that it is possible to quantify the errors in computer code.

Software Dependability: Example 1 – Misunderstanding Software Bugs

“Even if the tools are better, the number of bugs in newly written code has remained constant at around five per “function point”. . . Worse,. . . only about 85% of these bugs are eliminated before software is put into use.” [my underlining] (Economist Technology Quarterly, March 6, 2008)

You can measure the number of bugs found before putting the software into use, but you cannot know how many bugs remain unless the software is very simple.
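Taking the quoted averages at face value, the arithmetic is simple: about 5 × (1 - 0.85) = 0.75 defects per function point survive into use. The sketch applies this to a hypothetical system size. It also reflects the point of the slide: the number of bugs found is a count, while the "remaining" figure is only an estimate from averages, because a project's real removal rate can only be established after release, as bugs surface in use.

```python
def estimated_residual_defects(function_points: float,
                               defects_per_fp: float = 5.0,
                               removal_efficiency: float = 0.85) -> float:
    """Expected defects still present at release, from industry-average figures.

    This is an average-based estimate, not a measurement of any particular system.
    """
    injected = function_points * defects_per_fp
    return injected * (1.0 - removal_efficiency)


# Hypothetical system of 10,000 function points.
print(round(estimated_residual_defects(10_000)))
```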

Black Swan: Example 2 – Misunderstanding Software Bugs

It is impossible to demonstrate a negative proposition such as “there are no run-time errors”.

Absence of evidence is not evidence of absence.

“The supplier shall verify the software code ensuring: . . .

7. absence of run-time errors;

8. absence of memory leaks . . .” (Source: ECSS-E-40C)
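A small illustration of why "absence of run-time errors" cannot be verified by testing alone: the hypothetical function below emulates 32-bit wrap-around and fails for exactly one input out of 2^32, so a campaign of 10,000 random tests will almost certainly pass without ever reaching it. The function, the input range, the seed and the test count are all invented for illustration.

```python
import random


def seconds_per_tick(raw_ticks: int) -> int:
    """Hypothetical conversion using 32-bit unsigned arithmetic, with a latent bug."""
    ticks = (raw_ticks + 1) % 2**32  # emulates 32-bit wrap-around
    return 3600 // ticks             # ZeroDivisionError when raw_ticks == 2**32 - 1


def random_test_campaign(trials: int, seed: int = 2008) -> int:
    """Run `trials` random tests over the full 32-bit input range; return the failure count."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        try:
            seconds_per_tick(rng.randrange(2**32))
        except ZeroDivisionError:
            failures += 1
    return failures


print("failures observed in random testing:", random_test_campaign(10_000))
try:
    seconds_per_tick(2**32 - 1)
except ZeroDivisionError:
    print("yet the run-time error is there for raw_ticks = 2**32 - 1")
```

A clean test run is absence of evidence: it says nothing about the one input that was never drawn.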

Black Swan: Conclusion on Example 2

“There are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.” (Donald Rumsfeld, U.S. Secretary of Defense, 2001 to 2006)

Software Dependability: Example 3 – Software Criticality

ECSS-E-40C puts tailoring according to software criticality in a normative annex. For example, the standard requires 100% path coverage testing for Class B criticality software.

Critique: 100% coverage testing is, in practice, impossible for very complex systems. Even if you achieve 100% coverage, there is still no guarantee that the software is free from error.
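To illustrate the second point of the critique: the hypothetical function below has two independent decisions and therefore four execution paths. Four test cases cover every path, yet a division by zero remains for an input none of them uses. This is an illustration of the argument, not an example taken from ECSS-E-40C or from ESOC software.

```python
def normalise(value: float, divisor: float, clamp: bool) -> float:
    """Hypothetical helper with two independent decisions, hence four paths."""
    if clamp:
        value = min(value, 100.0)
    if divisor < 0:
        divisor = -divisor
    return value / divisor  # latent defect: divisor == 0 is never handled


# One test per path: (clamp True/False) x (divisor negative / non-negative).
PATH_TESTS = [
    ((150.0, 2.0, True), 50.0),
    ((150.0, -2.0, True), 50.0),
    ((150.0, 2.0, False), 75.0),
    ((150.0, -2.0, False), 75.0),
]

for args, expected in PATH_TESTS:
    assert normalise(*args) == expected  # every path exercised, every test passes
print("100% path coverage achieved, all tests passed")

try:
    normalise(1.0, 0.0, clamp=False)
except ZeroDivisionError:
    print("run-time error on an input the covering tests never used")
```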

Discussion: For on-board software, it is reasonable to take quite heavy measures during development to ensure dependable software. ECSS-E-40C shows a very strong influence from on-board development practice and does not take into account the impact on ground software, which is typically much bigger and more complex.

Final Conclusions

You can have good governance; develop your ground systems carefully, piloting new technology and taking plenty of time; and ensure your industrial partners are fully motivated via competitive firm-fixed price contracts. But you can still be hit by unexpected problems in operations, especially in complex software.

The way to successfully tackle these unpredictable anomalies or incidents is to have a skilled team, fully familiar with the software and fully involved in the sims campaign.

Questions?