OPS Forum Embracing the future - a retrospective look 05.09.2008
Embracing the future - a retrospective look
Michael Jones, OPS-G Forum
5th September 2008
Embracing the Future: a Retrospective Look - 5th Sept. 2008 - M. Jones, OPS-GD
Contents
1. “Governance”
2. Change can be slower than you think!
3. FFP contracts – the magic bullet?
4. The Black Swan: the improbable in operations
5. Software Dependability
Governance
“The use of institutions, structures of authority and even collaboration to allocate resources and coordinate or control activity [in society or the economy].” (Wikipedia)
Governance in the SSA Programme
What is the SSA (Space Situational Awareness) Programme? It provides a systematic capability for surveillance of man-made objects in the space around the Earth, and provides warnings of collisions that may endanger space activities or even life on Earth.
Governance = making decisions on how the programme and deployed assets are to be run.
Data Systems Governance: Data Systems Task Force
The Data Systems Task Force must:
“Ensure the availability of adequate strategy and plans for the mission data infrastructure and monitor the execution of those plans in order to ensure timely availability of the mission data infrastructure.”
This means that the DSTF in effect carries out governance of the data systems infrastructure.
Conclusions on Governance
Governance is a buzz word that you will continue to hear!
Recent or emerging examples are: the establishment of the ESA Security Office; software licence governance for ESA and third-party software.
Mike Jones’s proposed definition of “governance” to fit its usage in ESA:
“The process of making decisions, the oversight of the results of those decisions and also the oversight of organisations or structures of authority for decision making.”
Change can be slower than you think: Example 1: SCOS-2000
SCOS-2 (which became SCOS-2000) was a new MCS infrastructure developed from scratch.
A very brief summary of the timeline of the project up to 2002:
Start of project as SCOS-2: 1992
Version 1 (as SCOS-2) used for Huygens, MTP and Teamsat: late 1997
Re-engineering of SCOS-2 (mainly the TC chain): 1997-1998
Parallel production of architectural designs for both SCOS-1 and SCOS-2 baselines for the INTEGRAL MCS: 2nd half of 1998
Adoption of SCOS-2 as the INTEGRAL MCS baseline: January 1999 (INTEGRAL was the first major ESA science spacecraft based on SCOS-2)
SCOS-2 renamed SCOS-2000: 2000
Support of the INTEGRAL LEOP: 17th October 2002, using SCOS-2000 rel. 2.3
So it took 10 years to reach the point at which the new infrastructure became generally accepted; the original plan was 5 years.
Change can be slower than you think: Example 1: SCOS-2000 - Conclusion
Developing a new mission control system infrastructure from scratch is difficult and time consuming.
First lesson – try to avoid building new MCS infrastructures – “evolution, not revolution”.
Second lesson – if you have to do it, develop a simple version first.
Change can be slower than you think: Example 2: Intel/Linux MCS Infrastructure
In 2001, it was decided to port SCOS-2000 to Linux.
The port was straightforward: by 2002 a SCOS-2000 version was available that could run on either SUN Solaris or Linux platforms.
Outside ESOC, S2K became popular as a licensable product and, with one exception, external (non-ESOC) projects using SCOS-2000 have been based on Linux.
At ESOC, the move to the Linux version proceeded cautiously in two stages:
1. a pilot project with a Linux server and SUN clients (Herschel Planck, S2K rel. 4);
2. a Linux transition project to install Linux clients in all the common areas.
Stage 1 was successfully completed ca. 2006. Stage 2, started in 2007, has been completed for the MCR.
Intel workstations for the remaining common areas will be procured this year together with a reserve of spares.
We are now aiming at supporting Herschel Planck LEOP using the new Linux infrastructure installed by the LIT project.
Change can be slower than you think: Conclusion – Linux
It has taken more than 6 years to reach the point of having a common Intel/Linux infrastructure.
Where there is a large installed base of workstations (ca. 1400 in this case), change is slow, since the missions already running on the old platforms will not want to change, or will not be able to.
FFP Contracts – the Magic Bullet?
Before 1996 most software development at ESOC was done under fixed-unit price conditions.
Implications: ESOC “owned the risk” for the software requirements and their implementation. The contractor companies took no responsibility - they simply provided man-hours of staff.
In 1996 firm-fixed price (FFP) contracts for development of spacecraft control systems and simulators were introduced with the new frame contracts.
Prime motivation: Move contract staff off-site to their own companies’ premises; FFP regime much more suitable for off-site work.
FFP became the rule for most work awarded under these frame contracts, achieving: Far more rigorous scrutiny of requirements by frame contractors; Better competition; Equitable risk sharing between ESA and its suppliers; Formal change control (contract change notices - CCNs).
FFP Contracts – the Magic Bullet?
Firm-fixed price contracts have been rather successful for MCS, simulator and station back-end software.
But:
Firm Fixed Price does not mean Firm Fixed Schedule!
The contractor can underestimate the work to be done.
Recent example: Herschel Planck MPS, where the cheapest offer was taken and the contractor had underestimated the budget by a factor of nearly 10.
FFP Contracts – The Magic Bullet? Conclusions
1. The lowest acceptable offer may not always be the right choice, particularly if the schedule is important.
A careful evaluation of management plan and technical solution is needed to ensure that the schedule can be met.
2. For schedule-critical developments, a look at more sophisticated techniques such as Earned Value Analysis may be needed.
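Earned Value Analysis compares the value of the work actually completed with the work planned to date and with its cost. The sketch below applies the standard earned-value formulas (SV = EV - PV, SPI = EV/PV, CPI = EV/AC, and the simplest CPI-based forecast EAC = BAC/CPI). The class name and all figures are hypothetical, and how these indicators would be obtained under an FFP contract, where the supplier's actual costs may not be visible to ESA, is not covered here.

```python
from dataclasses import dataclass


@dataclass
class EarnedValueSnapshot:
    """Hypothetical status of a development at a reporting date (one currency unit)."""
    bac: float  # budget at completion: total planned cost of the work
    pv: float   # planned value: budgeted cost of work scheduled up to this date
    ev: float   # earned value: budgeted cost of work actually performed
    ac: float   # actual cost of the work performed

    def schedule_variance(self) -> float:
        return self.ev - self.pv  # negative => behind schedule

    def cost_variance(self) -> float:
        return self.ev - self.ac  # negative => over budget

    def spi(self) -> float:
        return self.ev / self.pv  # schedule performance index, < 1 => behind

    def cpi(self) -> float:
        return self.ev / self.ac  # cost performance index, < 1 => over budget

    def estimate_at_completion(self) -> float:
        # Simplest forecast: assumes the cost efficiency to date continues.
        return self.bac / self.cpi()


# Hypothetical figures, not taken from any ESA project.
status = EarnedValueSnapshot(bac=1000.0, pv=400.0, ev=200.0, ac=500.0)
print(status.schedule_variance())       # -200.0
print(status.spi())                     # 0.5
print(status.cpi())                     # 0.4
print(status.estimate_at_completion())  # 2500.0
```

An SPI of 0.5 means only half of the work scheduled to date has been done, which is the kind of early warning of schedule slip that matters for schedule-critical developments.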
Black Swans
“Black Swan” - title of a book by Nassim Nicholas Taleb.
A black swan is a large-impact, hard-to-predict, and rare event beyond the realm of normal expectations.
The term comes from the ancient Western conception that 'all swans are white'.
Black Swan: The Turkey Example
A turkey before and after Thanksgiving: a history of a process over 1000 days tells you nothing about what will happen next.
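The turkey's reasoning can be made concrete with a naive inductive estimator. The sketch below uses Laplace's rule of succession, (s + 1)/(n + 2) after s successes in n trials, as a stand-in for "learning from history"; this choice of estimator is an assumption for illustration, not something taken from the source.

```python
from fractions import Fraction


def naive_confidence(days_fed: int) -> Fraction:
    """Laplace's rule of succession, (successes + 1) / (trials + 2),
    where every observed day so far was a 'success' (the turkey was fed)."""
    return Fraction(days_fed + 1, days_fed + 2)


for day in (1, 10, 100, 1000):
    print(day, float(naive_confidence(day)))
# Confidence is at its highest (about 0.998) after 1000 days, just before
# the event that nothing in the 1000-day history hinted at.
```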
The Black Swan: How we deal with the unexpected in operations
Try to make operations fully predictable: Plan and prepare ground segments very carefully. Technically validate them thoroughly. Prepare procedures and plans for operations. Operationally validate them through an extensive simulations programme aimed at training all the teams and ensuring that systems, documentation and operations staff all work together.
The operations validation also includes contingencies or anomaly cases to ensure the unexpected can be handled.
This is the discipline of Operations Engineering.
The Black Swan and Software
ESOC uses systems containing lots of software.
In the real world, much software is complex: no single person can understand it completely.
“Complex” in this case means “Big” - complexity varies as a power of the size.
Behaviour of any complex software system cannot be fully understood - highly improbable or “black swan” events may occur.
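One simple illustration of why complexity can grow faster than size (an assumption for illustration, not the author's own model): if any two of n components may interact, the number of possible pairwise interactions is

```latex
I(n) = \binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2}
```

Going from 10 to 1000 components (100 times bigger) raises the number of possible pairwise interactions from 45 to 499,500, more than 11,000 times as many.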
Black Swan: The MCS Incident during the MSG-1 LEOP (28th August 2002)
A number of the client workstations in the MCR, PSR and SSR suddenly became unusable and dropped back to the SUN login screen.
The software coordinator (Softcoor) logged into the A server from the SSR and restarted the system.
This appeared to work, but then a SCOS-2000 communications task stopped processing on the server; two telecommanding tasks (multiplexer and releaser) crashed.
Attempts to switch clients to the redundant B server also failed.
Fortunately, the spacecraft was safe in the meantime: despite the problems with the clients, telemetry was received and processed on both the A and B servers.
Softcoor then took the decision to move to a third chain, the C-system. He was then able to log out all clients on the A and B chains and to restart the servers on both chains.
The systems were made available to the flight control and project teams about 20 minutes later.
Black Swan: MSG-1 Diagnosis and Conclusions
Diagnosis: The server had been started as a foreground task, remotely from a SUN workstation in the SSR; this created a dependency between the server and that SSR SUN workstation.
For reasons unknown, this SUN had a problem and went to “login” status, resulting in the stopping of the server tasks started directly from this SUN.
There was an implementation error in the MISCdynamic server relating to CORBA event processing.
Problem resolution:
1. Start the server as a background task.
2. Correct one CORBA call in the MISCdyn server.
A full explanation of everything that happened was not possible - for example, why the SSR SUN went to “login” in the first place - since the logs were inadequate.
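The first corrective action (starting the server as a background task) can be illustrated with a minimal sketch in Python, assuming a POSIX host. It is not the procedure used at ESOC, whose details are not given here: `start_detached` is a hypothetical helper that runs a command in its own session via `start_new_session=True` (which calls `setsid()` in the child), so the process has no controlling terminal and does not receive a hang-up signal when the terminal or login session that launched it goes away. It also sends the process output to a log file, since inadequate logs prevented a full diagnosis in this incident.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def start_detached(cmd: list[str], log_path: Path) -> subprocess.Popen:
    """Hypothetical helper: start cmd as a background process that does not
    depend on the terminal or login session that launched it (POSIX only)."""
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,  # no input taken from the launching terminal
            stdout=log,                # persistent record for later diagnosis
            stderr=subprocess.STDOUT,  # errors go to the same log
            start_new_session=True,    # setsid() in the child: no controlling terminal
        )


# Usage with a stand-in "server" (hypothetical; any command would do).
log_file = Path(tempfile.mkdtemp()) / "server.log"
server = start_detached([sys.executable, "-c", "print('server started')"], log_file)
server.wait()
print(log_file.read_text().strip())
```

The parent closes its copy of the log file as soon as the child has started; the child keeps its own file descriptor, so logging continues independently of the launching session.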
MSG-1 Incident: Discussion
Problems in complex software cannot be excluded.
The ESOC approach is very practical and sound: a software coordinator thoroughly familiar with the system, assisted by a highly qualified software support team, both fully involved in the simulations campaign. This ensured quick recovery in the MSG-1 case.
An operations engineering technique is applied to software engineering.
Software Dependability
“Software dependability” seeks to quantify how much we can rely on a software system to function as required.
However, it is impossible with any reasonable effort to ensure there are no errors in a large software system such as SCOS-2000, which comprises several million lines of code written since the mid-1990s.
There is a widespread misapprehension that it is possible to quantify the errors in computer code.
Software Dependability: Example 1 – Misunderstanding Software Bugs
“Even if the tools are better, the number of bugs in newly written code has remained constant at around five per “function point”. . . Worse,. . . only about 85% of these bugs are eliminated before software is put into use.” [my underlining] (Economist Technology Quarterly, March 6, 2008)
You can measure the number of bugs found before putting the software into use;
But you cannot know how many bugs remain, unless the software is very simple.
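The arithmetic in the quoted claim is easy to reproduce: 5 bugs per function point with 85% removed leaves about 0.75 per function point. The sketch below (hypothetical function names and system size) also shows the catch: a removal efficiency needs the number of bugs still remaining, which is exactly what cannot be measured when the software goes into use, so the 85% figure rests on an estimate of that unknown.

```python
def residual_defects(function_points: float,
                     defects_per_fp: float = 5.0,
                     removal_rate: float = 0.85) -> float:
    """Expected defects left after release, under the quoted injection and removal rates."""
    return function_points * defects_per_fp * (1.0 - removal_rate)


def removal_efficiency(found_before_release: int, remaining_after_release: int) -> float:
    """Fraction of all defects removed before release.
    The second argument cannot be observed at release time."""
    return found_before_release / (found_before_release + remaining_after_release)


# Hypothetical system of 1000 function points under the quoted rates.
print(round(residual_defects(1000), 1))  # 750.0

# The same measured count (850 found) implies very different efficiencies
# depending on how many defects are assumed to remain.
for assumed_remaining in (150, 850, 5000):
    print(assumed_remaining, round(removal_efficiency(850, assumed_remaining), 3))
```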
Black Swan: Example 2 – Misunderstanding Software Bugs
It is impossible to demonstrate a negative proposition such as “there are no run-time errors”.
Absence of evidence is not evidence of absence.
“The supplier shall verify the software code ensuring: . . .
7. absence of run-time errors;
8. absence of memory leaks . . .” (Source: ECSS-E-40C)
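A minimal illustration of why passing tests cannot demonstrate the absence of run-time errors (the function is hypothetical and not taken from any ESOC system): the suite below passes, yet an input it never tried still fails at run time.

```python
def average_bit_rate(bytes_received: int, seconds: float) -> float:
    """Hypothetical helper: mean data rate in bits per second."""
    return 8 * bytes_received / seconds


def test_average_bit_rate() -> None:
    assert average_bit_rate(1000, 1.0) == 8000.0
    assert average_bit_rate(0, 2.0) == 0.0


test_average_bit_rate()  # the suite passes: no run-time error observed

# An untested input still raises a run-time error.
failed_on_untested_input = False
try:
    average_bit_rate(1000, 0.0)
except ZeroDivisionError:
    failed_on_untested_input = True
print(failed_on_untested_input)  # True
```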
Black Swan: Conclusion on Example 2
There are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.
Donald Rumsfeld, U.S. Secretary of Defense, 2001 to 2006
Software Dependability: Example 3 – Software Criticality
ECSS-E-40C puts tailoring according to software criticality in a normative annex. For example, the standard requires 100% path coverage testing for Class B criticality software.
Critique: 100% coverage testing is, in practice, impossible for very complex systems; even if you ensure 100% coverage testing, there is still no guarantee that the software is free from error.
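Both halves of this critique can be shown with two small sketches (hypothetical functions, not taken from ECSS or any ESOC system): the number of paths through a sequence of independent if/else decisions doubles with each decision, and a function can pass tests that cover all of its paths and still be wrong.

```python
def count_paths(decisions: int) -> int:
    """Distinct execution paths through a straight-line sequence of
    independent two-way decisions (if/else) with no loops."""
    return 2 ** decisions


for n in (10, 30, 60):
    print(f"{n} decisions -> {count_paths(n):,} paths")


def is_leap_year(year: int) -> bool:
    # Deliberately incomplete rule: omits the century exceptions
    # (1900 is not a leap year, 2000 is).
    if year % 4 == 0:
        return True
    return False


# These two tests execute both paths (100% path coverage) and pass...
assert is_leap_year(2008) is True
assert is_leap_year(2007) is False
# ...yet the function is still wrong: it reports 1900 as a leap year.
print(is_leap_year(1900))  # True (incorrect)
```

Loops whose iteration count depends on the input make the number of paths unbounded, so for real ground software the counts above are a lower bound.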
Discussion: For on-board software it is reasonable to take quite heavy measures during development to ensure dependable software. ECSS-E-40C shows a very strong influence from on-board development practice and does not take into account the impacts for ground software, which is typically much bigger and more complex.
Final Conclusions
You can have good governance; develop your ground systems in a careful way, piloting new technology and taking plenty of time; and ensure your industrial partners are fully motivated via competitive firm-fixed price contracts. But you can still be hit by unexpected problems in operations, especially in complex software.
The way to successfully tackle these unpredictable anomalies or incidents is to have a skilled team, fully familiar with the software and fully involved in the sims campaign.
Questions?