Is it safe? - ASQasq.org/asd/2012/04/is-it-safe.pdf•Of course it isn’t “safe”! •Our...

11
Is it safe? Bill McArthur Director, Safety and Mission Assurance Johnson Space Center

Transcript of Is it safe? - ASQasq.org/asd/2012/04/is-it-safe.pdf•Of course it isn’t “safe”! •Our...

  • Is it safe? Bill McArthur Director, Safety and Mission Assurance Johnson Space Center

  • Safety and Mission Assurance NASA, Lyndon B. Johnson Space Center

    • Is it safe? • How do you answer that question when we’ve

    put people atop a vehicle loaded with millions of pounds of rocket fuel?

    • Of course it isn’t “safe”! • Our challenge is to achieve the best

    understanding of the risk our mission poses to the public and our workforce, to include flight crews, in order to ensure that the goals we hope to achieve justify the risk.

    • Is it safe enough?

  • Lessons Learned (the hard way)

    • Processes grew out of painful experiences. • Human spaceflight organizations focused on a few, high

    visibility missions. • Why do people, well-trained and committed to

    excellence, make horrific mistakes? – Over confidence – Schedule pressure – Budget pressure – Extremely complex systems – Normalization of deviance – Failure to heed “lessons learned”

    • What is different today?

  • JSC S&MA Director

    NASA Administrator

    Products

    Process

    Process

    Culture

    Process

    Culture

    Products

    Culture

    Process

    Products

    Process

    Culture

    Process

    Products

    Products

    Process

    Culture

    Culture

    Culture

    Rote compliance – “No, because …”

    HarlanRaines

    Merrell (GA3/NA2)McCarty (LA/NA)

    GermanyColonnaCohen Aldrich

    Graham

    Moore Cohen

    Engineering

    Technical

    Base Funding

    Thompson

    GriffinKraft

    Lunney Aldrich Kohrs Nicholson

    FletcherLowPainePaine

    Gilruth

    Limited career path prestige and potential within Safety

    Minimal communications within and between

    Safety organizations reduced integration and

    organizational effectiveness. Prior to the

    Challenger accident there was not a strong

    Level II SR&QA integration function.

    Safety products were not used to drive operations products (flight rules, crew procedures, etc.) S&MA products integrated into operations products.

    “Silent Safety” era

    Space Shuttle Program (SSP) controlled Safety funding

    1970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969

    1970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969

    Democratic majorityDemocratic majority Republican majority

    President

    Senate Majority

    House Majority

    Nixon Ford Carter Reagan Bush

    Democratic majority

    Safety became a rote compliance organization. Rote

    compliance to requirements without regard for sound engineering

    judgment and risk trades gave SR&QA a bad reputation. This

    reputation has been very hard to overcome once established and

    hindered effectiveness and recruitment – “no one wants to be a

    traffic cop.”

    Brief period of Engineering Technical Base funding

    Safety products must be available early if they

    are to influence design and operations. The

    hazard reports and FMEA/CILs for the shuttle

    program were always lagging behind the design

    process and therefore were ineffective in influencing

    the design and operation during the DDT&E phase

    of the program. Prior to the loss of STS-51L the

    flight rules and crew procedures were developed

    without them and did not even reference them.

    Certification by similarity can result in unforeseen

    problems. Be very cautious about certifying

    hardware by similarity, especially when safety is

    weak. Variations in design, materials, manufacture,

    operations, and operational environments may

    invalidate similarity assumptions.

    Comprehensive safety requirements need to be

    developed and adopted at program inception. A

    comprehensive set of safety requirements are key to

    the success of a program. With the exception of a

    good set of reliability and quality requirements, such a

    set was lacking at the start of the shuttle program.

    Use of hazard analysis methodologies was also

    lacking. This resulted in the failure to identify and

    adequately control hazards early in the program.

    Political Factors

    Apollo

    Launch Prep

    Design & Construction

    Design & Construction

    † Crew and vehicle lost during launch of STS-51L (January 28, 1986)

    Design & Construction

    Approach and Landing Tests Launch Site Interface Testing On display at Smithsonian Institute

    Launch Prep Operational

    Design & Construction

    Design & Construction

    STA à Challenger

    Operational

    Operational

    OV101 Enterprise

    OV102 Columbia

    OV103 Discovery

    OV104 Atlantis

    OV105 Endeavour

    Vehicles

    OV99 Challenger

    Rodney

    Frosch Lovelace Beggs Fletcher Truly

    Key Personnel

    AA Safety

    Shuttle Program Manager

    Shuttle Safety Manager

    Congress approves Space Shuttle Program (April 1972) à

    † ASTP crew critically injured.

    1:101970 197619751974197319721971 1977 198319821981198019791978 1984 199019891988198719861985Calendar Year 1968 1969

    Skylab

    ASTP

    Shuttle

    1:10 1:10 1:10 1:10 1:10 1:18 à 1:34 1:35 1:35

    Space Station

    NASA HQ Office of Safety &

    Mission Assurance (OSMA)

    formed.

    Retrospective Risk

    OSMA did not exist at this time

    Orbiter Project Manager

    Maintenance, logistics, and

    supportability analysis must start

    early and account for the impacts

    of ground processing cycles on

    time/life/cycle limits. Such analysis

    was not performed early in the

    program, which resulted in hardware

    life being used up on the ground.

    JSC Center Director

    Major Spaceflight Programs

    Final report of the Space Transportation

    System Safety Risk Assessment Ad Hoc

    Committee, 1987

    Augustine Report,

    1990

    Paté-Cornell

    Report, 1990

    Funding source impacts on Safety. A lack of

    independent funding can lead to Safety being

    compromised - “Don’t bite the hand that feeds

    you.” At the very least, safety needs to have

    independent funding for an independent

    assessment function to ensure that areas

    dependant on funding from the program are not

    being unduly influenced or compromised.

    Extensive re-evaluation and update

    of existing FMEAs and CILs.

    NASA HQ Office of Safety & Mission Assurance (OSMA) era

    S&MA

    Culture

    “Silent Safety”

    Safety products lagged the

    development process and were not

    considered to be in the critical path.

    Rather than being used to drive the

    design and operations, Safety

    products, such as hazard reports and

    FMEAs, were being used to document

    and justify the design and operation.

    Safety not watching trend

    data such as time-cycle data.

    Quality and Reliability disciplines were

    strong, proactive, and effective.

    Strong design requirements were in place

    and enforced:

    - NHB 5300.4(1D-2)

    - JSCM-8080

    - NSTS 07700

    - NHB 1700.7A

    1986 Roger’s Commission Report

    S&MA staffing levels and

    funding decreasing as flight

    rates rapidly increase.

    Unique and effective electrical,

    electronic, and electromechanical

    parts program.

    Notional

    Strength/

    Effectiveness

    of NASA

    S&MA for the

    Space Shuttle

    Program

    Shuttle Program inherited a weak safety organization

    from Apollo. Years of mission success, budget

    pressure, and attrition had taken a toll.

    Office of Safety and Mission

    Assurance formed at NASA

    Headquarters.

    S&MA assigned a

    position in the

    MER to review

    and sign-off on

    CHITs.

    Report of the Presidential Commission on the Space Shuttle Challenger Accident

    “… faults include a lack of problem reporting requirements, inadequate trend analysis, misrepresentation of criticality and lack of involvement in critical discussions. A properly staffed, supported, and robust safety organization might well have avoided these faults and thus eliminated the communication failures.”

    Menear (NA)

    First flights (glide and powered) were

    crewed flights. Prior programs had

    always used non-crewed flights to

    validate design prior to risking crew.

    Launch Rate by Year

    Limited career path potential within the Safety

    organization made it difficult to attract and retain

    skilled people. Safety organization must have

    similar career potential and support from human

    resources, including recruitment emphasis, to ensure

    a functional capability base.

    Loss of

    Challenger,

    STS-51L

  • Culture

    Culture

    Process

    Culture

    Process

    Process

    Culture

    Culture

    Culture

    Products

    JSC S&MA Director

    Improving career path potential within Safety (Safety Technical Authority, Training, Rotations, etc.)

    Space Shuttle Program (SSP) controlled Safety funding

    Safety products integrated into operations products.

    Participatory S&MA

    Changed to "risk-based" management of safety. “Yes, if …”

    NASA HQ Office of Safety & Mission Assurance (OSMA) era

    April 2004 – Shuttle S&MA Office established (code MX w/ matrix support from NA/NC) – Integrated S&MA across centers

    Sept 2004 – Governance model – established S&MA Tech Authorities (included previous IA org role – no more separate IA org)

    S&MA Problem Investigation Team – Era of proactive support to the Mission Management Team

    Roe

    Doremus (MX)McArthur (MX)Wilcutt (MX)Harris (MA)Merrell (GA3/NA2)

    PoulosGermany

    Erminger (MA3/NA2)

    Greene

    Hollaway Dittemore Parsons Hale Shannon

    WilcuttMarshallCasperHarlan

    ShawNicholson

    CoatsAbbey Estess HowellHuntoonCohen

    O’ConnorGregoryRodney

    GoldinTruly

    For a lesson learned to be beneficial, it must be

    captured and implemented in future programs in

    the form of requirements, organizational

    structures, and/or policies.

    Return to “Silent Safety”

    Obama

    1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992 Calendar Year

    Republican majorityDemocratic majority Democratic majorityRepublican majorityDemocratic majorityPresident

    Senate Majority

    House Majority

    Bush BushClinton

    Democratic majority Republican majority Democratic majority

    SSIAT

    ► The S&MA role should include: mandatory

    participation on prevention/resolution teams

    and in problem categorization, investigation of

    escapes and diving catches and

    dissemination of lessons learned.

    ► One causal factor is success-engendered

    safety optimism. The SSP must rigorously

    guard against the tendency to accept risk

    solely because of prior success.

    ► Risk Management process erosion created

    by the desire to reduce costs

    ► NASA S&MA should be restored to the

    process in its previous role of an independent

    oversight body, and not be simply a "safety

    auditor."

    ► SSP should systematically evaluate and

    eliminate all potential human single point

    failures.

    NIAT

    ► Develop a Risk Management Plan.

    ► Develop tools to access, analyze,

    manage, and mitigate risk.

    ► Define “allowable risk” as a function

    of the program/project.

    ► The S&MA role should include: mandatory

    participation on prevention/resolution teams and in

    problem categorization, investigation of escapes and

    diving catches, and dissemination of lessons learned.

    ► Management needs to foster an environment

    where problems are raised without fear of reprisal.

    ► Develop methods of quantitative risk assessment.

    ► Quantitative methods of risk assessment and

    safety need to be integrated.

    ► Maintain and communicate a database of lessons

    learned.

    RAND

    ► A strong independent safety

    organization is critical to maintaining

    and/or improving safety.

    ► Establish an Independent Safety

    Assurance Office.

    ► Establish a “Three-Key CoFR”

    process in which NASA, the ISAO,

    and the operational contractor share.

    1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992 Calendar Year

    Brief period of Engineering Technical Base funding

    O’Keefe Griffin BoldenScolese

    GAO: NASA Infrastructure, 1996

    GAO: NASA Workforce Reductions, 1996

    Super light Weight Tank Independent Assessment, 1997Agreement between the United States of America and the

    Russian Federation Concerning Cooperation in the Exploration

    and Use of Outer Space for Peaceful Purposes, 1992

    NASA’s Implementation Plan for Space Shuttle Return to Flight and Beyond, 2005

    SSIAT, 1999 NIAT, 2000

    Government Mandatory Inspection Points (GMIPS), Independent Assessment Final Report, 2004

    Columbia Crew Survival

    Investigation Report, 2008

    Calendar

    On display at Smithsonian Institute

    Operational

    Operational

    Operational

    Operational

    † Crew and vehicle lost during entry of STS-107

    OV101 Enterprise

    OV102 Columbia

    OV103 Discovery

    OV104 Atlantis

    OV105 Endeavour

    OV99 Challenger

    Shuttle

    1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:35 1:68 1:68 1:68 1:68 1:68 1:68 Retrospective Risk

    Contingency Shuttle Crew Support era

    ISS

    Phase 1: Shuttle-Mir

    Phase 2: 1st element launchSpace Station

    1993 199919981997199619951994 2000 200620052004200320022001 2007 20112010200920081991 1992

    1:89

    Effective August 31, 2011, the Space

    Shuttle Program becomes the Space

    Shuttle Transition and Retirement Office.

    Shuttle Safety Manager

    Products

    Culture

    Process

    ► Improve risk assessment capability through use of

    quantitative and qualitative risk assessment tools.

    Benefits: Helped frame risk discussions and characterized

    Safety risks for program and agency management.

    ► Safety must be involved early to be effective and must be

    committed to “near real-time/real-time” support for flight

    operations.

    Benefits: Provided MMT with information needed for risk-

    based decision making.

    Aldrich Report, 1992

    Limited career path potential within Safety

    Rote compliance – “No, because …”

    Columbia Crew Survival

    Investigation Report

    ► During the design of the space

    shuttle, NASA focused on preventing

    an accident through high reliability

    and redundancy. As a result,

    traditional crew survival principles

    were not part of the SSP.

    ► Crew survival should be an

    integral part of all human spaceflight

    programs. Safety should extend

    beyond just preventing the accident

    and include provisions for minimizing

    the consequences of the accident.

    Loss of

    Columbia,

    STS-107

    Columbia Accident Investigation Board (CAIB) Report, 2003

    “... tendency to

    accept risk solely

    because of prior

    success.” -SSIAT

    “… success-engendered

    safety optimism.” -SSIAT

    Culture

    “… echoes of Challenger ...”

    “Faster, Better, Cheaper”

    Culture

    “Cultural traits and organizational

    practices detrimental to safety

    were allowed to develop…”

    ~40% budget

    reduction over 10

    years. -CAIB

    Maturing of PRA and other

    S&MA risks tools increase

    contributions and

    effectiveness of SR&QA.

    Various efforts to improve

    effectiveness of S&MA.

    AA Safety

    Shuttle Program Manager

    Shuttle Safety Manager

    NASA Administrator

    Orbiter Project Manager

    JSC Center Director

    Continuing budget

    reductions and

    workforce attrition.

    Culture

    “… NASA is not functioning as

    a learning organization.”

    Culture

    Process

    Products

    ISS Phase 1 Manager

    asked for EVA operations

    safety reviews. Start of

    greater S&MA involvement

    in operational safety.

    Dittamore Stich

    Currie (MX)

    Constellation Program Cancelled

    ► Cooperation and Coordination within and between

    organizations is needed to increase organizational

    effectiveness - avoid “sandboxing.”

    Benefits: (1) Better coordination and sharing of technical

    data resulted in improved timeliness and quality of

    S&MA risk assessments.

    (2) Earlier coordination and resolution of technical

    issues resulted in better S&MA support to the SSP

    and OSMA.

    ► Safety must be more than a rote compliance organization.

    (Rote compliance to requirements without regard for sound

    engineering risk trades gave SR&QA a bad reputation.)

    Benefits: Improved working relationship with other

    organizations and added real value to program

    mission support.

    McArthur

    Safety needs a balance between

    systems specialist and systems

    engineers to facilitate affective

    risk-based decision making.

    S&MA staffing shifts

    toward a better skills

    balance with training

    programs to increase

    effectiveness.

    Launch Rate by Year

    Safety and Mission

    Assurance

    Space Shuttle Program

    Legacy Report

    JSC Flight Safety Office

    ● Established MX office – Integrated S&MA across centers.

    ● S&MA included in Chief Engineer’s Council events.

    ● Changed to "risk-based" management of safety.

    ● SMSR process improvements – focus on risks (3x4 matrix),

    participation of Engineering.

    ● Utilized analytical tools for risk characterization and trade

    studies.

    ● Fostered a “Yes, if …” mindset rather than the old “No,

    because …” Offered alternative technical solutions if a

    disapproval or non-concur was submitted.

    ● Considered rationale and applicability of requirements rather

    than blind applicability – no longer just a “traffic cop” with

    iron checklist.

    ● Risks were highlighted in a real-time manner.

    ● Developed the capability for rapid S&MA team response to

    significant events or anomalies.

    ● Set the expectation that all of the Safety Directorate was

    accessible to support real-time operations.

    ● Developed detailed Shuttle PRA model.

    ● Developed expertise and capability to quickly generate

    quantitative and qualitative risk assessments for significant

    technical issues and anomalies.

    ● Improved utility of traditional/heritage risk management tools

    (Hazards, CILs, etc).

    ● Broad implementation of risk analysis techniques (7

    Elements of Flight Rationale)

    CAIB Report

    ► NASA is not functioning as a

    learning organization.

    ► NASA’s safety culture has become

    reactive, complacent, and dominated

    by unjustified optimism.

    ► Careers in safety have lost

    organizational prestige.

    ► An Independent Safety Assurance

    function should be created that would

    hold one of “three keys” in the CoFR

    process … effectively giving this

    function the ability to stop any

    launch.

    ► Safety is largely dependent upon

    the SSP for funding.

    ► Detrimental organizational and

    cultural traits were allowed to form.

    ► Establish an independent

    technical engineering authority.

    ► NASA HQ OS&MA should have

    direct line authority over the entire

    SSP safety organization and should

    be independently funded.

  • Future Current Challenges

    • Constrained budgets – less oversight

    • Reductions in the contractor workforce

    • Reductions in the Civil Service workforce

    • Risk Informed Decision Making/Continuous Risk Management

    • We all are responsible for identifying opportunities to reduce risk, day-in and day-out

    • Is it safe (enough)? If not, SAY SO!

  • Managing Risk

    • The systematic identification, assessment, prioritization and mitigation of risk is essential to our success.

    • One of the most effective ways to reduce risk during the development and production of new vehicles is an emphasis on quality assurance, up front.

    – “Quality goes in before the name goes on.”

    • Success requires commitment at all levels.

  • Managing Risk

    • Smart people learn from their mistakes; a wise people learn from the mistakes of others

    • Study history, especially “lessons-learned”

    – Avoid mistakes of the past

    – Emulate the successes

    • Promote a culture that accepts “bad news” without retribution

    – Do not shoot the messenger

    – Encourage dissenting opinions

    – Bad news is not like a fine wine…

  • Managing Risk

    • We have forged partnerships which at one time were unthinkable.

    – ISS

    • We will succeed…or fail…together

    • Trust – Insight versus oversight

    – When is “better” the enemy of “good enough”?

    – “Business as usual” is not good enough.

  • How can the sky be the limit…

    …when there are footprints on the moon?