Is it safe? - ASQasq.org/asd/2012/04/is-it-safe.pdf•Of course it isn’t “safe”! •Our...
Transcript of Is it safe? - ASQasq.org/asd/2012/04/is-it-safe.pdf•Of course it isn’t “safe”! •Our...
-
Is it safe? Bill McArthur Director, Safety and Mission Assurance Johnson Space Center
-
Safety and Mission Assurance NASA, Lyndon B. Johnson Space Center
• Is it safe? • How do you answer that question when we’ve
put people atop a vehicle loaded with millions of pounds of rocket fuel?
• Of course it isn’t “safe”! • Our challenge is to achieve the best
understanding of the risk our mission poses to the public and our workforce, to include flight crews, in order to ensure that the goals we hope to achieve justify the risk.
• Is it safe enough?
-
Lessons Learned (the hard way)
• Processes grew out of painful experiences. • Human spaceflight organizations focused on a few, high
visibility missions. • Why do people, well-trained and committed to
excellence, make horrific mistakes? – Over confidence – Schedule pressure – Budget pressure – Extremely complex systems – Normalization of deviance – Failure to heed “lessons learned”
• What is different today?
-
JSC S&MA Director
NASA Administrator
Products
Process
Process
Culture
Process
Culture
Products
Culture
Process
Products
Process
Culture
Process
Products
Products
Process
Culture
Culture
Culture
Rote compliance – “No, because …”
HarlanRaines
Merrell (GA3/NA2)McCarty (LA/NA)
GermanyColonnaCohen Aldrich
Graham
Moore Cohen
Engineering
Technical
Base Funding
Thompson
GriffinKraft
Lunney Aldrich Kohrs Nicholson
FletcherLowPainePaine
Gilruth
Limited career path prestige and potential within Safety
Minimal communications within and between
Safety organizations reduced integration and
organizational effectiveness. Prior to the
Challenger accident there was not a strong
Level II SR&QA integration function.
Safety products were not used to drive operations products (flight rules, crew procedures, etc.) S&MA products integrated into operations products.
“Silent Safety” era
Space Shuttle Program (SSP) controlled Safety funding
1970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969
1970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969
Democratic majorityDemocratic majority Republican majority
President
Senate Majority
House Majority
Nixon Ford Carter Reagan Bush
Democratic majority
Safety became a rote compliance organization. Rote
compliance to requirements without regard for sound engineering
judgment and risk trades gave SR&QA a bad reputation. This
reputation has been very hard to overcome once established and
hindered effectiveness and recruitment – “no one wants to be a
traffic cop.”
Brief period of Engineering Technical Base funding
Safety products must be available early if they
are to influence design and operations. The
hazard reports and FMEA/CILs for the shuttle
program were always lagging behind the design
process and therefore were ineffective in influencing
the design and operation during the DDT&E phase
of the program. Prior to the loss of STS-51L the
flight rules and crew procedures were developed
without them and did not even reference them.
Certification by similarity can result in unforeseen
problems. Be very cautious about certifying
hardware by similarity, especially when safety is
weak. Variations in design, materials, manufacture,
operations, and operational environments may
invalidate similarity assumptions.
Comprehensive safety requirements need to be
developed and adopted at program inception. A
comprehensive set of safety requirements are key to
the success of a program. With the exception of a
good set of reliability and quality requirements, such a
set was lacking at the start of the shuttle program.
Use of hazard analysis methodologies was also
lacking. This resulted in the failure to identify and
adequately control hazards early in the program.
Political Factors
Apollo
Launch Prep
Design & Construction
Design & Construction
† Crew and vehicle lost during launch of STS-51L (January 28, 1986)
Design & Construction
Approach and Landing Tests Launch Site Interface Testing On display at Smithsonian Institute
Launch Prep Operational
Design & Construction
Design & Construction
STA à Challenger
Operational
Operational
OV101 Enterprise
OV102 Columbia
OV103 Discovery
OV104 Atlantis
OV105 Endeavour
Vehicles
OV99 Challenger
Rodney
Frosch Lovelace Beggs Fletcher Truly
Key Personnel
AA Safety
Shuttle Program Manager
Shuttle Safety Manager
Congress approves Space Shuttle Program (April 1972) à
† ASTP crew critically injured.
1:101970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969
Skylab
ASTP
Shuttle
1:10 1:10 1:10 1:10 1:10 1:18 à 1:34 1:35 1:35
Space Station
NASA HQ Office of Safety &
Mission Assurance (OSMA)
formed.
Retrospective Risk
OSMA did not exist at this time
Orbiter Project Manager
Maintenance, logistics, and
supportability analysis must start
early and account for the impacts
of ground processing cycles on
time/life/cycle limits. Such analysis
was not performed early in the
program, which resulted in hardware
life being used up on the ground.
JSC Center Director
Major Spaceflight Programs
Final report of the Space Transportation
System Safety Risk Assessment Ad Hoc
Committee, 1987
Augustine Report,
1990
Paté-Cornell
Report, 1990
Funding source impacts on Safety. A lack of
independent funding can lead to Safety being
compromised - “Don’t bite the hand that feeds
you.” At the very least, safety needs to have
independent funding for an independent
assessment function to ensure that areas
dependant on funding from the program are not
being unduly influenced or compromised.
Extensive re-evaluation and update
of existing FMEAs and CILs.
NASA HQ Office of Safety & Mission Assurance (OSMA) era
S&MA
Culture
“Silent Safety”
Safety products lagged the
development process and were not
considered to be in the critical path.
Rather than being used to drive the
design and operations, Safety
products, such as hazard reports and
FMEAs, were being used to document
and justify the design and operation.
Safety not watching trend
data such as time-cycle data.
Quality and Reliability disciplines were
strong, proactive, and effective.
Strong design requirements were in place
and enforced:
- NHB 5300.4(1D-2)
- JSCM-8080
- NSTS 07700
- NHB 1700.7A
1986 Roger’s Commission Report
S&MA staffing levels and
funding decreasing as flight
rates rapidly increase.
Unique and effective electrical,
electronic, and electromechanical
parts program.
Notional
Strength/
Effectiveness
of NASA
S&MA for the
Space Shuttle
Program
Shuttle Program inherited a weak safety organization
from Apollo. Years of mission success, budget
pressure, and attrition had taken a toll.
Office of Safety and Mission
Assurance formed at NASA
Headquarters.
S&MA assigned a
position in the
MER to review
and sign-off on
CHITs.
Report of the Presidential Commission on the Space Shuttle Challenger Accident
“… faults include a lack of problem reporting requirements, inadequate trend analysis, misrepresentation of criticality and lack of involvement in critical discussions. A properly staffed, supported, and robust safety organization might well have avoided these faults and thus eliminated the communication failures.”
Menear (NA)
First flights (glide and powered) were
crewed flights. Prior programs had
always used non-crewed flights to
validate design prior to risking crew.
Launch Rate by Year
Limited career path potential within the Safety
organization made it difficult to attract and retain
skilled people. Safety organization must have
similar career potential and support from human
resources, including recruitment emphasis, to ensure
a functional capability base.
Loss of
Challenger,
STS-51L
-
Culture
Culture
Process
Culture
Process
Process
Culture
Culture
Culture
Products
JSC S&MA Director
Improving career path potential within Safety (Safety Technical Authority, Training, Rotations, etc.)
Space Shuttle Program (SSP) controlled Safety funding
Safety products integrated into operations products.
Participatory S&MA
Changed to "risk-based" management of safety. “Yes, if …”
NASA HQ Office of Safety & Mission Assurance (OSMA) era
April 2004 – Shuttle S&MA Office established (code MX w/ matrix support from NA/NC) – Integrated S&MA across centers
Sept 2004 – Governance model – established S&MA Tech Authorities (included previous IA org role – no more separate IA org)
S&MA Problem Investigation Team – Era of proactive support to the Mission Management Team
Roe
Doremus (MX)McArthur (MX)Wilcutt (MX)Harris (MA)Merrell (GA3/NA2)
PoulosGermany
Erminger (MA3/NA2)
Greene
Hollaway Dittemore Parsons Hale Shannon
WilcuttMarshallCasperHarlan
ShawNicholson
CoatsAbbey Estess HowellHuntoonCohen
O’ConnorGregoryRodney
GoldinTruly
For a lesson learned to be beneficial, it must be
captured and implemented in future programs in
the form of requirements, organizational
structures, and/or policies.
Return to “Silent Safety”
Obama
1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992 Calendar Year
Republican majorityDemocratic majority Democratic majorityRepublican majorityDemocratic majorityPresident
Senate Majority
House Majority
Bush BushClinton
Democratic majority Republican majority Democratic majority
SSIAT
► The S&MA role should include: mandatory
participation on prevention/resolution teams
and in problem categorization, investigation of
escapes and diving catches and
dissemination of lessons learned.
► One causal factor is success-engendered
safety optimism. The SSP must rigorously
guard against the tendency to accept risk
solely because of prior success.
► Risk Management process erosion created
by the desire to reduce costs
► NASA S&MA should be restored to the
process in its previous role of an independent
oversight body, and not be simply a "safety
auditor."
► SSP should systematically evaluate and
eliminate all potential human single point
failures.
NIAT
► Develop a Risk Management Plan.
► Develop tools to access, analyze,
manage, and mitigate risk.
► Define “allowable risk” as a function
of the program/project.
► The S&MA role should include: mandatory
participation on prevention/resolution teams and in
problem categorization, investigation of escapes and
diving catches, and dissemination of lessons learned.
► Management needs to foster an environment
where problems are raised without fear of reprisal.
► Develop methods of quantitative risk assessment.
► Quantitative methods of risk assessment and
safety need to be integrated.
► Maintain and communicate a database of lessons
learned.
RAND
► A strong independent safety
organization is critical to maintaining
and/or improving safety.
► Establish an Independent Safety
Assurance Office.
► Establish a “Three-Key CoFR”
process in which NASA, the ISAO,
and the operational contractor share.
1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992 Calendar Year
Brief period of Engineering Technical Base funding
O’Keefe Griffin BoldenScolese
GAO: NASA Infrastructure, 1996
GAO: NASA Workforce Reductions, 1996
Super light Weight Tank Independent Assessment, 1997Agreement between the United States of America and the
Russian Federation Concerning Cooperation in the Exploration
and Use of Outer Space for Peaceful Purposes, 1992
NASA’s Implementation Plan for Space Shuttle Return to Flight and Beyond, 2005
SSIAT, 1999 NIAT, 2000
Government Mandatory Inspection Points (GMIPS), Independent Assessment Final Report, 2004
Columbia Crew Survival
Investigation Report, 2008
Calendar
On display at Smithsonian Institute
Operational
Operational
Operational
Operational
† Crew and vehicle lost during entry of STS-107
OV101 Enterprise
OV102 Columbia
OV103 Discovery
OV104 Atlantis
OV105 Endeavour
OV99 Challenger
Shuttle
1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:68 1:68 1:68 1:68 1:68 1:68 Retrospective Risk
Contingency Shuttle Crew Support era
ISS
Phase 1: Shuttle-Mir
Phase 2: 1st element launchSpace Station
1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992
1:89
Effective August 31, 2011, the Space
Shuttle Program becomes the Space
Shuttle Transition and Retirement Office.
Shuttle Safety Manager
Products
Culture
Process
► Improve risk assessment capability through use of
quantitative and qualitative risk assessment tools.
Benefits: Helped frame risk discussions and characterized
Safety risks for program and agency management.
► Safety must be involved early to be effective and must be
committed to “near real-time/real-time” support for flight
operations.
Benefits: Provided MMT with information needed for risk-
based decision making.
Aldrich Report, 1992
Limited career path potential within Safety
Rote compliance – “No, because …”
Columbia Crew Survival
Investigation Report
► During the design of the space
shuttle, NASA focused on preventing
an accident through high reliability
and redundancy. As a result,
traditional crew survival principles
were not part of the SSP.
► Crew survival should be an
integral part of all human spaceflight
programs. Safety should extend
beyond just preventing the accident
and include provisions for minimizing
the consequences of the accident.
Loss of
Columbia,
STS-107
Columbia Accident Investigation Board (CAIB) Report, 2003
“... tendency to
accept risk solely
because of prior
success.” -SSIAT
“… success-engendered
safety optimism.” -SSIAT
Culture
“… echoes of Challenger ...”
“Faster, Better, Cheaper”
Culture
“Cultural traits and organizational
practices detrimental to safety
were allowed to develop…”
~40% budget
reduction over 10
years. -CAIB
Maturing of PRA and other
S&MA risks tools increase
contributions and
effectiveness of SR&QA.
Various efforts to improve
effectiveness of S&MA.
AA Safety
Shuttle Program Manager
Shuttle Safety Manager
NASA Administrator
Orbiter Project Manager
JSC Center Director
Continuing budget
reductions and
workforce attrition.
Culture
“… NASA is not functioning as
a learning organization.”
Culture
Process
Products
ISS Phase 1 Manager
asked for EVA operations
safety reviews. Start of
greater S&MA involvement
in operational safety.
Dittamore Stich
Currie (MX)
Constellation Program Cancelled
► Cooperation and Coordination within and between
organizations is needed to increase organizational
effectiveness - avoid “sandboxing.”
Benefits: (1) Better coordination and sharing of technical
data resulted in improved timeliness and quality of
S&MA risk assessments.
(2) Earlier coordination and resolution of technical
issues resulted in better S&MA support to the SSP
and OSMA.
► Safety must be more than a rote compliance organization.
(Rote compliance to requirements without regard for sound
engineering risk trades gave SR&QA a bad reputation.)
Benefits: Improved working relationship with other
organizations and added real value to program
mission support.
McArthur
Safety needs a balance between
systems specialist and systems
engineers to facilitate affective
risk-based decision making.
S&MA staffing shifts
toward a better skills
balance with training
programs to increase
effectiveness.
Launch Rate by Year
Safety and Mission
Assurance
Space Shuttle Program
Legacy Report
JSC Flight Safety Office
● Established MX office – Integrated S&MA across centers.
● S&MA included in Chief Engineer’s Council events.
● Changed to "risk-based" management of safety.
● SMSR process improvements – focus on risks (3x4 matrix),
participation of Engineering.
● Utilized analytical tools for risk characterization and trade
studies.
● Fostered a “Yes, if …” mindset rather than the old “No,
because …” Offered alternative technical solutions if a
disapproval or non-concur was submitted.
● Considered rationale and applicability of requirements rather
than blind applicability – no longer just a “traffic cop” with
iron checklist.
● Risks were highlighted in a real-time manner.
● Developed the capability for rapid S&MA team response to
significant events or anomalies.
● Set the expectation that all of the Safety Directorate was
accessible to support real-time operations.
● Developed detailed Shuttle PRA model.
● Developed expertise and capability to quickly generate
quantitative and qualitative risk assessments for significant
technical issues and anomalies.
● Improved utility of traditional/heritage risk management tools
(Hazards, CILs, etc).
● Broad implementation of risk analysis techniques (7
Elements of Flight Rationale)
CAIB Report
► NASA is not functioning as a
learning organization.
► NASA’s safety culture has become
reactive, complacent, and dominated
by unjustified optimism.
► Careers in safety have lost
organizational prestige.
► An Independent Safety Assurance
function should be created that would
hold one of “three keys” in the CoFR
process … effectively giving this
function the ability to stop any
launch.
► Safety is largely dependent upon
the SSP for funding.
► Detrimental organizational and
cultural traits were allowed to form.
► Establish an independent
technical engineering authority.
► NASA HQ OS&MA should have
direct line authority over the entire
SSP safety organization and should
be independently funded.
-
Future Current Challenges
• Constrained budgets – less oversight
• Reductions in the contractor workforce
• Reductions in the Civil Service workforce
• Risk Informed Decision Making/Continuous Risk Management
• We all are responsible for identifying opportunities to reduce risk, day-in and day-out
• Is it safe (enough)? If not, SAY SO!
-
Managing Risk
• The systematic identification, assessment, prioritization and mitigation of risk is essential to our success.
• One of the most effective ways to reduce risk during the development and production of new vehicles is an emphasis on quality assurance, up front.
– “Quality goes in before the name goes on.”
• Success requires commitment at all levels.
-
Managing Risk
• Smart people learn from their mistakes; a wise people learn from the mistakes of others
• Study history, especially “lessons-learned”
– Avoid mistakes of the past
– Emulate the successes
• Promote a culture that accepts “bad news” without retribution
– Do not shoot the messenger
– Encourage dissenting opinions
– Bad news is not like a fine wine…
-
Managing Risk
• We have forged partnerships which at one time were unthinkable.
– ISS
• We will succeed…or fail…together
• Trust – Insight versus oversight
– When is “better” the enemy of “good enough”?
– “Business as usual” is not good enough.
-
How can the sky be the limit…
…when there are footprints on the moon?