    Government of Ontario IT Standard (GO-ITS)

    GO-ITS Number 37

    Enterprise Incident Management Process

    Version 2.0

    Status: Approved

    Prepared for the Information Technology Standards Council (ITSC) under the delegated authority of the Management Board of Cabinet

    Queen's Printer for Ontario, 2010

    Last Review Date: 2010-04-01

    Sensitivity: Unclassified Approved Version #: 2.0

    Copyright & Disclaimer

    Government of Ontario reserves the right to make changes in the information contained in this publication without prior notice. The reader should in all cases consult the Document History to determine whether any such changes have been made.

    2010 Government of Ontario. All rights reserved.

    Other product or brand names are trademarks or registered trademarks of their respective holders. This document contains proprietary information of Government of Ontario; disclosure or reproduction is prohibited without the prior express written permission from Government of Ontario.

    Template Info

    Template Name: GO-ITS Template
    Template #: 09.03.25
    Template Version No.: 1.0
    Template Author: Design: PMCoE; Boilerplate: TAB/OCCTO
    Template Completion Date: 2009-03-26

    Document History (including ITSC and ARB approval dates)

    Date Summary

    2009-06-17 Version 1.7: presented to ITSC

    2009-07-16 Version 1.8: reflects feedback from Stakeholders, received up to and including 2009-07-16

    2009-08-14 Version 1.9: reflects additional roles and new principle regarding security related incidents

    2009-09-09 Version 1.94: reflects feedback since August 19 and injection of Urgency / Impact definitions (Section 6.4)

    2010-02-02 Version 1.95: accepts all changes in version 1.94 and incorporates results of discussions held in Dec 2009 and Jan 2010 with ITSM Leads and ITS / OCCTO OEIP

    2010-02-08 Version 1.95: updated subsequent to meeting with Head, Corporate Architecture Branch, OCCTO, post ITSML discussion of 2010-02-04. Suggestions received at ITSML embedded.

    2010-02-10 Version 1.95: updated to modify references to Post-Mortem terminology (changed to Major Incident Review) per discussion / feedback from ITSML

    2010-03-03 Version 1.97: inserted effective date for this revised version as July 1, 2010

    Hyperlink inserted in Appendix for MIP Normative reference

    2010-03-17 Endorsed: IT Standards Council endorsement

    2010-03-19 Version 2.0 Final Draft post-ITSC endorsement of 2010-03-17

    Section 4.2.10 and Principle 9: removed specific reference to Service Management Branches and replaced with generic wording "appropriate branches"

    Updated Section 4.3.1 Process Flow: added box in diagram to reflect User Reporting Incident

    Section 6.2.1: added clarification statement to describe illustrative characteristic of diagram

    2010-04-01 Approved: Architecture Review Board approval

    GO-ITS 37 Enterprise Incident Management Process Page 2 of 40

    Table of Contents

    1. FOREWORD
    2. INTRODUCTION
    2.1. Background
    2.2. Purpose
    2.3. Value to the Business
    2.4. Basic Concepts
    2.5. Scope
    2.5.1. In Scope
    2.6. Applicability Statements
    2.6.1. Organization
    2.6.2. Requirements Levels
    2.6.3. Compliance Requirements
    3. STANDARDS LIFECYCLE MANAGEMENT
    3.1. Contact Information
    3.1.1. Roles and Responsibilities
    3.2. Recommended Versioning and/or Change Management
    3.3. Publication Details
    4. TECHNICAL SPECIFICATION
    4.1. Process Principles
    4.2. Process Roles and Responsibilities
    4.2.1. Enterprise Incident Management Process Owner
    4.2.2. Incident Manager (IM)
    4.2.3. Situation Manager (SM)
    4.2.4. Queue Manager (QM)
    4.2.5. Service Desk Manager (SDM)
    4.2.6. Service Desk Team Lead
    4.2.7. Service Desk Agent (SDA)
    4.2.8. Incident Analyst (IA)
    4.2.9. Service Owner
    4.2.10. Major Incident Manager (MIM)
    4.2.11. Partner Incident Management Liaison
    4.3. Process Flows
    4.3.1. Incident Management Process Overview
    4.3.2. Incident Management Process Tasks
    4.4. Linkages to other processes
    4.5. Incident Management Process Quality Control
    4.6. Metrics
    4.7. Standard Process Parameters
    5. RELATED STANDARDS
    5.1. Impacts to Existing Standards
    5.2. Impacts to Existing Environment
    6. APPENDICES
    6.1. Normative References
    6.1.1. Major Incident Protocol
    6.2. Informative References
    6.2.1. Enterprise Differentiation: Process, Procedure, Work Instruction
    6.2.2. Definitions: Urgency and Impact
    7. GLOSSARY

    1. Foreword

    Government of Ontario Information Technology Standards (GO-ITS) are the official publications on the guidelines, preferred practices, standards and technical reports adopted by the Information Technology Standards Council (ITSC) under delegated authority of the Management Board of Cabinet (MBC). These publications support the responsibilities of the Ministry of Government Services (MGS) for coordinating standardization of Information & Information Technology (I&IT) in the Government of Ontario. Publications that set new or revised standards provide enterprise architecture guidance, policy guidance and administrative information for their implementation. In particular, GO-ITS describe where the application of a standard is mandatory and specify any qualifications governing the implementation of standards.

    2. Introduction

    2.1. Background

    The requirement for an all-encompassing OPS Incident Management standard was prompted by the positioning of all infrastructure service and support within Infrastructure Technology Services (ITS), a new organization within the OPS mandated in 2005 to deliver these types of services to the OPS. The ITS organization was created in 2006 to achieve this goal. Establishment of this goal required an update of the requirements for the GO-ITS Standard for Incident Management based on the situation described above. The result was an updated version of GO-ITS #37, created and approved in July of 2007.

    During February 2009, a series of outages to Ontario.ca infrastructure prompted I&IT Executive Management to conduct a review of both Incident and Change Management processes and procedures.

    The review identified deficiencies in a number of areas, including procedures, operational process management and behaviour. The review made specific recommendations to address the deficiencies, and these recommendations have subsequently been sanctioned by ITELC. Accordingly, the OPS Enterprise IT Service Management Program (OEIP) has updated the Enterprise Incident Management Process Standard to incorporate the recommendations.

    This document redefines certain aspects of the enterprise Incident Management Principles, Roles and the associated process model. Updates to GO-ITS #37 include:

    Principles, Roles, Responsibilities and the high-level process flow required to ensure an enterprise perspective of Incident Management for the OPS

    Definition of a Major Incident Protocol at the process standard level

    Incorporation of ITIL¹ V3 (2007) concepts, introduction of a service-based focus for enterprise incident management disciplines and the natural evolution of IT Service Management within the OPS

    These standard elements continue to provide a single unified process for enterprise Incident Management within the OPS. Use of a single process and supporting information will enable OPS-wide management and reporting for the enterprise Incident Management process through establishment of associated metrics.

    GO-ITS 44 ITSM Terminology Reference Model Portable Guide provides a common information model for key process parameters that require standardization across the OPS to ensure consistency, reliable business intelligence and to support end-to-end cross-jurisdictional service management. GO-ITS 44 will be updated with additional values defined as part of GO-ITS 37. Please refer to: http://www.gov.on.ca/MGS/en/IAndIT/STEL02_047295.html

    2.2. Purpose

    The goals of the enterprise Incident Management process are to restore normal service operation as quickly as possible, minimize the adverse impact on business operations and ensure that the best possible levels of service quality and availability are maintained.

    This process standard describes best practices to be utilized for Incident Management. The process design is organizationally agnostic and is not constrained by the status quo. Implementation of the process may require organizational or behavioural transformation.

    ¹ ITIL and IT Infrastructure Library are registered trademarks of the Office of Government Commerce (OGC), U.K.

    2.3. Value to the Business

    The value of Incident Management includes:

    The ability to detect and resolve incidents, which results in lower downtime to the business, which in turn means higher availability of the service. This means that the business is able to exploit the functionality of the service as designed.

    The ability to align IT activity to real-time business priorities. This is because Incident Management includes the capability to identify business priorities and dynamically allocate resources as necessary.

    The ability to identify potential improvements to services. This happens as a result of understanding what constitutes an incident and also from being in contact with the activities of business operational staff.

    The Service Desk can, during its handling of incidents, identify additional service or training requirements found in IT or the business.

    2.4. Basic Concepts

    ITIL defines an incident as: An unplanned interruption to an IT service or reduction in the quality of an IT service. Failure of a service component or element item that has not yet impacted service is also considered an incident (e.g. failure of one disk from a mirrored set).

    Incident Management is the process for dealing with all incidents. This can include:

    failures, questions or queries reported by the users (usually via a telephone call to the Service Desk)

    anomalies detected by technical staff

    automatically detected errors or conditions reported by event monitoring tools

    The Service Desk Agent (SDA) captures the pertinent information and logs, classifies and prioritizes the incident.

    The priority of an Incident is primarily determined by the impact on the business and the urgency with which a resolution or work-around is needed (as defined in Appendix 6.2.2). Objective targets for resolving Incidents are defined in Service Level Agreements (SLAs). Major Incidents, which typically have highest impact and demand quicker resolution, follow the same process as any other Incident, but are managed by a separate procedure.
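
    As an illustration of how impact and urgency can combine into a priority, a minimal sketch follows. The three-level scale, the 1 to 5 priority range and the additive rule are assumptions made for this example only; the authoritative definitions are those given in the appendix on Urgency and Impact.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level scale used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (highest) to 5 (lowest).

    Assumed rule: the priority number falls as combined impact and urgency rise.
    """
    return 7 - (int(impact) + int(urgency))


# A high-impact, high-urgency incident receives the highest priority.
assert priority(Level.HIGH, Level.HIGH) == 1
```

    An additive rule keeps the example short; a lookup table keyed on (impact, urgency) would serve equally well where the mapping is not symmetric.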

    The Service Desk takes advantage of diagnostic scripts to capture and verify information that is required to quickly resolve the event. In the case where the Service Desk cannot achieve resolution, this information helps in ensuring the Incident is assigned to the appropriate Tier 2 group for action. The Service Desk Agent often references Incident Patterns, the Known Error database and any available Knowledge Management records to obtain any information that will assist them in attempting to resolve the Incident at first point of contact (FPOC).

    If the Incident cannot be resolved at first point of contact, the Service Desk Agent assigns the incident to a group with more specialized skills. (This is known as Functional Escalation).

    Tier 1-N Thresholds

    Each support tier is allocated a certain amount of time to resolve the incident, following which the Incident must be functionally escalated to a more specialized group. The amount of time allocated to each tier is set so that service restoration occurs within the agreed targets, as defined in the SLA/SLO. These allocations may be adjusted from time to time based upon staffing models, experience on supporting the various services and ongoing changes to service specifications and components.
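
    The relationship between per-tier time allocations and the overall restoration target can be sketched as follows. The allocation values and the four-hour target are hypothetical figures chosen for illustration; the standard does not prescribe them.

```python
from datetime import timedelta

# Hypothetical time allocated to each support tier (not values from the standard).
TIER_ALLOCATIONS = {
    1: timedelta(minutes=30),
    2: timedelta(hours=2),
    3: timedelta(hours=1, minutes=30),
}

# Hypothetical restoration target taken from an SLA/SLO.
SLA_TARGET = timedelta(hours=4)


def allocations_fit_target(allocations, target):
    """True when the tier allocations together stay within the agreed target."""
    return sum(allocations.values(), timedelta()) <= target


def escalation_due(tier, time_in_tier, allocations=TIER_ALLOCATIONS):
    """True once an incident has used up the time allocated to its current tier."""
    return time_in_tier >= allocations[tier]
```

    For example, under these assumed values an incident that has spent 45 minutes at Tier 1 is due for functional escalation, while one that has spent 45 minutes at Tier 2 is not.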

    Queues, Support Groups and Functional Escalation

    The Incident Management system supports the practice of Queues and Queue Management: each queue represents a view of all Incidents assigned to an organization at all levels of priority. This provides a Queue Manager with an overall perspective of how the Incident Management process is being executed across all support groups within an organization at any given time. Should a certain part of the organization be experiencing a backlog related to incidents in their respective queues, the Service Desk Manager may be asked by the Queue Manager to perform Hierarchical Escalation, to notify more senior management of the situation in an effort to relieve the pressure on any specific queue.

    This basic concept applies to the design of the Incident Management process within the Ontario Public Service; however, organization maturity currently prevents the industry best practice from being strictly followed. It is important to note this concept as it describes the desired organizational behaviour or future-state model.

    Various support groups have also been established in each OPS organization based upon areas of functional expertise. An Incident can be assigned to any one of these support groups, where it is then assigned to an individual member of that group to undertake incident diagnosis and resolution. All of these support groups must roll up into an organizational queue view, so that the overall perspective is available to the Queue Manager.

    A Service Desk Agent who cannot resolve an Incident at FPOC assigns it to the appropriate Tier 2 Support Group, based upon the initial diagnosis (this is called Functional Escalation).

    Once the Service Desk Agent has assigned the incident to a Tier 2 Incident Analyst, one of three things typically occurs:

    Resolution: the Incident Analyst restores service and informs the Service Desk.

    Re-assignment: the Incident Analyst concludes that the cause of the incident does not lie in their area of expertise and assigns the incident back to the Service Desk for re-assignment to a more appropriate group.

    Functional escalation: the Incident Analyst cannot resolve the incident within the defined threshold and requests that the Incident be assigned to a Tier 3 support group with more specialized skills.
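
    The three typical Tier 2 outcomes can be sketched as a small decision function. The function name, its boolean parameters and the NEXT_STOP mapping are hypothetical and serve only to illustrate the routing described above.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RESOLUTION = "resolution"
    REASSIGNMENT = "re-assignment"
    FUNCTIONAL_ESCALATION = "functional escalation"


# Who is informed of, or receives, the incident after each outcome.
NEXT_STOP = {
    Outcome.RESOLUTION: "Service Desk",
    Outcome.REASSIGNMENT: "Service Desk",
    Outcome.FUNCTIONAL_ESCALATION: "Tier 3 support group",
}


def tier2_outcome(restored: bool, in_expertise: bool,
                  threshold_exceeded: bool) -> Optional[Outcome]:
    """Classify what happens to an incident held by a Tier 2 Incident Analyst."""
    if restored:
        return Outcome.RESOLUTION
    if not in_expertise:
        return Outcome.REASSIGNMENT
    if threshold_exceeded:
        return Outcome.FUNCTIONAL_ESCALATION
    return None  # still being worked within the defined threshold
```

    Note that both resolution and re-assignment lead back to the Service Desk, whereas only functional escalation moves the incident to a more specialized group.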

    A Queue Manager role may also be established for an individual Support Group to monitor their respective queues at regular intervals to identify any incidents that have not been assigned to individuals or have not been resolved within defined thresholds and to take proactive action before being prompted by the overall Queue Manager.
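
    A periodic queue review of this kind can be sketched as a simple filter. The QueuedIncident record and its field names are hypothetical; they are not taken from any OPS tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class QueuedIncident:
    """Hypothetical record for an incident sitting in a support group queue."""
    incident_id: str
    assignee: Optional[str]   # None when not yet assigned to an individual
    age: timedelta            # time since the incident entered the queue
    threshold: timedelta      # time allowed for this tier
    resolved: bool = False


def needs_attention(queue: List[QueuedIncident]) -> List[QueuedIncident]:
    """Return open incidents that are unassigned or past their threshold."""
    return [
        incident for incident in queue
        if not incident.resolved
        and (incident.assignee is None or incident.age > incident.threshold)
    ]
```

    Running such a check at regular intervals lets the group's Queue Manager act before the overall Queue Manager has to prompt.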

    Accountability

    Regardless of the support staff and organization to which an incident may be assigned, the Incident Manager (part of the OPS ITSD organization) remains accountable for ensuring that enterprise Incident Management process and procedures are followed and that prompt incident resolution activities are undertaken with Service Level Objectives in mind.

    Irrespective of who restores service (Service Desk Agent or Tier 2-N Support group), the OPS ITSD remains accountable for confirming with the customer and / or end-user that service has been restored and for verifying the accuracy of the condition/reason code, prior to closing the Incident.

    Inputs to the Incident Management process include²:

    Incident records from calls to the Service Desk

    Service Level Objectives (from SLAs)

    Capacity Management thresholds

    Incident resolution details from the Knowledge base

    Incident patterns (and workarounds) from Incident Knowledge Management database

    Known Errors from Problem Management

    CI data from Configuration Management

    Outputs from this process include³:

    Closed Incidents

    Services restored

    Requests for Change (RFCs)

    Incident resolution

    Inconsistencies found while interrogating the CMDB

    Consistent, meaningful (and maintained) Incident records

    Meaningful management information

    ² Source: Copyright 2003-2007 Ahead Technology Inc.

    ³ Source: Copyright 2003-2007 Ahead Technology Inc.

    2.5. Scope

    2.5.1. In Scope

    Incident Management includes any event which disrupts, or which could disrupt, a service. This includes events which are communicated directly by users through the Service Desk or events detected through an automated interface from event management to Incident Management tools.

    For purposes of clarity, any use of the terms Incident Manager, Incident Management or Incidents within this document includes the enterprise perspective described in Section 2.1.

    Service Requests do not represent a disruption to agreed service, but are a way of meeting the customer's needs and may be addressing a specific aspect or feature of the service being provided (Service Fulfillment). This will be documented in the Service Level Agreement with each customer and the Service Level Objective will be outlined therein. Service requests are dealt with by a separate Request Fulfilment process.

    Service Requests in the OPS are currently tracked under the same incident management enabling technology used by the Service Desk for incident logging.

    Incident Management Scope:

    IS:

    "How To" and technical questions

    Steps and procedures to manage Major Incidents

    IS NOT:

    Service Requests (Request fulfillment). This is handled in the OPS through Service Order Desk Online (SODO)

    Root Cause Analysis (part of Problem Management)

    Establishment of communication thresholds for customers (these are defined through Service Level Management)

    2.6. Applicability Statements

    2.6.1. Organization

    Government of Ontario IT Standards and Enterprise Solutions and Services apply (are mandatory) for use by all ministries/clusters and to all former Schedule I and IV provincial government agencies under their present classification (Advisory, Regulatory, Adjudicative, Operational Service, Operational Enterprise, Trust or Crown Foundation) according to the current agency classification system.

    Additionally, this applies to any other new or existing agencies designated by Management Board of Cabinet as being subject to such publications, i.e. the GO-ITS publications and enterprise solutions and services, and particularly applies to Advisory, Regulatory, and Adjudicative Agencies (see also procurement link, OPS paragraph). Further included is any agency which, under the terms of its Memorandum of Understanding with its responsible Minister, is required to satisfy the mandatory requirements set out in any of the Management Board of Cabinet Directives (cf. Operational Service, Operational Enterprise, Trust, or Crown Foundation Agencies).

    As new GO-IT standards are approved, they are deemed mandatory on a go-forward basis. Specifically, in the case of this revised version of GO-ITS 37 (placeholder anticipating approved version number set to 2.0), the effective date has been established as July 1, 2010.

    When implementing or adopting any Government of Ontario IT standards or IT standards updates, ministries and I&IT Clusters must follow their organization's pre-approved policies and practices for ensuring that adequate change control, change management and risk mitigation mechanisms are in place and employed.

    For the purposes of this document, any reference to ministries or the Government includes applicable agencies.

    http://www.doingbusiness.mgs.gov.on.ca/mbs/psb/psb.nsf/english/bpsdef.html

    http://www.itstandards.gov.on.ca/

    2.6.2. Requirements Levels

    Within this document, certain wording conventions are followed. There are precise requirements and obligations associated with the following terms:

    Must

    This word, or the terms "REQUIRED" or "SHALL", means that the statement is an absolute mandatory requirement.

    Should

    This word SHOULD, or the adjective "RECOMMENDED", means that there may exist valid reasons in particular circumstances to ignore the recommendation, but the full implications (e.g., business functionality, security, cost) must be understood and carefully considered before deciding to ignore the recommendation.

    2.6.3. Compliance Requirements

    Execution of this process at the operational level requires use of procedures, work instructions and enabling technology to automate certain workflow aspects. These elements will be produced by the organization selected by OEIP as the Operational Process Manager. Pending formalization of an ITSM Process Lifecycle Management protocol, the following statements are presented to ensure that these elements are fully compliant with this Standard:

    Procedures must be developed by decomposing each process step from section 4.3 into procedural sub-tasks. These procedures must be submitted to the Enterprise Process Owner for certification that they comply with the spirit and intent of the Process Standard.

    Work Instructions must be developed by decomposing all procedural sub-tasks into further sub-tasks. These must be then submitted to the Enterprise Process Owner for certification that they comply with the certified process and procedures.

    Functional Requirements must be developed for enabling technology that will be used to automate aspects of the Work Instructions and procedures. Functional Requirements must also be submitted to the Enterprise Process Owner for certification that they align with the certified procedures.

    Any subsequent modifications to the Procedures, Work Instructions or enabling technology must be managed via Enterprise Change Management and will require authorization by OEIP.

    3. Standards Lifecycle Management

    3.1. Contact Information

    3.1.1. Roles and Responsibilities

    Provide the following information:

    Accountable Role Definition

    The individual ultimately accountable for the process of developing this standard. There must be exactly one accountable role identified. The accountable person also signs off as the initial approver of the proposed standard before it is submitted for formal approval to ITSC and ARB. (Note: in the OPS this role is at a CIO/Chief or other senior executive level)

    Accountable Role:
    Title: Head, Corporate Architecture Branch (OCCTO)
    Ministry: MGS
    Division: OCCTO

    Responsible Role Definition

    The organization responsible for the development of this standard. There may be more than one responsible organization identified if it is a partnership/joint effort. (Note: the responsible organization(s) provides the resource(s) to develop the standard)

    Responsible Organization:
    Ministry: MGS
    Division: OCCTO
    Branch: Corporate Architecture

    Support Role Definition

    The support role is the resource(s) to which the responsibility for actually completing the work and developing the standard has been assigned. There may be more than one support role identified. If there is more than one support role identified, the following contact information must be provided for each of them. If there is more than one support role, the first role identified should be that of the editor, the resource responsible for coordinating the overall effort.

    Support Role (Editor):
    Ministry: MGS
    Division: OCCTO
    Branch: Corporate Architecture
    Section: ITSM
    Job Title: Lead, OPS Enterprise ITSM Program
    Name: Norm Watt
    Phone: 416-327-3542
    Email: [email protected]

    The above individual will be contacted by the Standards Section once a year, or as required, to discuss and determine potential changes and/or updates to the standard (including version upgrades and/or whether the standard is still relevant and current).

    Consulted
    Please indicate who was consulted as part of the development of this standard. Include individuals (by role and organization) and committees, councils and/or working groups. (Note: consulted means those whose opinions are sought, generally characterized by two-way communications such as workshops):

    Organization Consulted (Ministry/Cluster) | Division | Branch | Date

    Committee/Working Group Consulted | Date
    ITSM Leads | Dec 2009 and Feb 2010

    Informed
    Please indicate who was informed during the development of this standard. Include individuals (by role and organization) and committees, councils and/or working groups. (Note: informed means those who are kept up-to-date on progress, generally characterized by one-way communication such as presentations):

    Organization Informed (Ministry/Cluster) | Division | Branch | Date

    Committee/Working Group Informed | Date

    3.2. Recommended Versioning and/or Change Management

    Changes (i.e. all revisions, updates, versioning) to the standard require authorization from the responsible organization.

    Once a determination has been made by the responsible organization to proceed with changes, the Standards Section, Technology Adoption Branch, OCCTO, will coordinate and provide assistance with respect to the approvals process.

    The approval process for changes to standards will be determined based on the degree and impact of the change. The degree and impact of changes fall into one of two categories:

    Minor changes - requiring communication to stakeholders. No presentations required. No ITSC or ARB approvals required. Changes are noted in the Document History section of the standard;

    Major changes - requiring a presentation to ITSC for approval and ARB for approval (Note: ARB reserves the right to delegate their approval to ITSC)

    Below are guidelines for differentiating between minor and major changes:

    Major:

    represents a change to one or more of Scope, Principles, Roles or high-level Process Flow

    responds to legislative changes

    Minor:

    does not impact other standards (e.g. updated Glossary information or updated Informative or Normative reference documentation)
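
    The guidelines above can be read as a simple decision rule. The following is a minimal sketch only: the class `ProposedChange`, its field names and the function `classify_change` are hypothetical and are not defined by this standard. Treating a change that impacts other standards as major is an assumption drawn from the fact that only two categories exist.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    """Attributes of a proposed change (hypothetical field names)."""
    alters_scope_principles_roles_or_flow: bool = False
    responds_to_legislative_change: bool = False
    impacts_other_standards: bool = False


def classify_change(change: ProposedChange) -> str:
    """Return "major" or "minor" following the Section 3.2 guidelines."""
    if (change.alters_scope_principles_roles_or_flow
            or change.responds_to_legislative_change):
        return "major"
    if not change.impacts_other_standards:
        return "minor"
    # Assumption: a change that impacts other standards cannot be minor,
    # so it falls into the only remaining category.
    return "major"
```

    A glossary update, for example, would be built as `ProposedChange()` and classified as minor, so it would only need to be noted in the Document History.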

    3.3. Publication Details

    All approved Government of Ontario IT Standards (GO-ITS) are published on the ITSC Intranet web site. Please indicate with a checkmark below if this standard is also to be published on the public, GO-ITS Internet Site.

    Standard to be published on both the OPS Intranet and the GO-ITS Internet web site (available to the public, vendors etc.) ;

    4. Technical Specification

    4.1. Process Principles

    Principles are established to ensure that the process identifies the desired outcomes or behaviours related to adoption at an enterprise level. They also serve to provide direction for the development of procedures and (as necessary) work instructions that will ensure consistent execution of the process. The absence of well-defined and well understood principles may result in process execution that is not aligned with the process standard. Process Principles for OPS enterprise Incident Management are listed below.

    Principle 1:

    A single enterprise Incident Management process shall be used across the OPS in support of I & IT services.

    Rationale:

    Single support model eliminates costs and inefficiencies of multiple models for different services

    Establishment of a Single Point of Contact (SPOC) OPS IT Service Desk (ITSD) in FY 2006/2007 implied a single incident management process for OPS I & IT Incident Management

    Implications:

    Legacy Incident Management related procedures and work instructions must be integrated and aligned to OPS enterprise Incident Management process

    Application Support groups must adapt existing procedures and work instructions to comply with the OPS enterprise Incident Management process

    Principle 2:

    Incident classification must identify the Service(s) that is / are impacted (from the Customer's perspective).

    Rationale:

    OPS Service Directive

    OEIP business architecture principle to establish a Service Focus for ITSM processes

    Enable implementation of ISAM (Integrated Service Agreement Model)

    Implications:

    Staff must adopt an end-to-end service perspective for all incidents

    Service classification requirements must be defined and included in enabling technology

    Cluster Service owners must identify the services/hierarchy

    A service Configuration hierarchy must exist in order to identify impacted services

    Staff must be trained in new classification techniques

    Incident messaging with user/customer must communicate the service that the user feels is impacted

    Internal assignment routing, currently component-based, may have to be modified

    Principle 3:

    The OPS ITSD shall be the single entry point into the enterprise Incident Management process and will manage Incidents through their complete lifecycle, including: assignment, functional and hierarchical escalation, tracking, communication and closure.

    Rationale:

    Consistent management and coordination of Incident resolution

    Single accountability for execution of enterprise Incident Management process

    Ability to share topical information within a single group and provide enterprise perspective

    Ability to cross-reference other incidents and establish incident priority from an enterprise perspective

    Implications:

    Effective diagnostic scripts and Support Models are required to assist in triage of incidents and ensure accurate assignment to the appropriate Tier-N resources

    ITSD Senior Management must support the objective assessment of reported Incidents and ensure criteria for Impact and Urgency (used to determine Priority) are established and communicated to Customers through the Service Level Management process

    Incident assignments / re-assignments to Tier-N support must occur via Service Desk only

    Principle 4:

    The OPS ITSD shall act as the single point of contact for all communication regarding reported Incidents.

    Rationale:

    Consistent support interface for customers

    Consistent delivery and coordination of communications to internal staff

    Reduces duplicative messaging and ensures common perspective is provided to customers and to I & IT senior management

    IT Tier 2-N support staff are more productive since they are protected from interruptions and the need to manage communications

    Implications:

    Assistance and incident status information must be available (7*24) from the OPS ITSD throughout the entire lifecycle of the incident

    OPS ITSD and technical support staff will have to adjust their messaging to describe impacts / status in terminology that is service-focussed and customer-based rather than technical in nature

    OPS ITSD will distribute all Major Incident communications (sanctioned by the Major Incident Manager)

    Tier 2-N resources may request OPS ITSD staff to coordinate dialogue with end-user or customers (used to gather additional detail or information to effect incident resolution) if they are unable to contact the end-user directly.

    Customers or I & IT Clusters must have in place a mechanism to broadly disseminate information provided to them by the OPS ITSD

    Principle 5:

    An Incident must be logged through the OPS ITSD as a pre-requisite for engagement of any Tier 2-N Support Staff, including external Service Providers.

    Rationale:

    The Incident record is the source of record with the OPS ITSD for all incident resolution activities undertaken by any support staff. Failure to document these activities increases the risk of delayed resolution.

    Implications:

    OPS ITSD procedures must identify the minimum level of information required to initiate an Incident record and to enable effective investigation and diagnosis.

    Principle 6:

    Closure of incidents shall be dependent upon validating with either the end-user or the customer that service has been restored.

    Rationale:

    Obtaining positive confirmation of incident resolution ensures that the customer is satisfied with the service delivered

    Validation step enhances the image of the IT organization

    Implications:

    Customers will identify an appropriate level of resource to accept the validation request.

    A suitable mechanism must be defined to deal with circumstances when end-user(s) cannot be reached for validation within a pre-defined time period.
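
    The closure rule in Principle 6 can be sketched as a small decision function. This is an illustration under stated assumptions, not a prescribed mechanism: the name `closure_decision`, the returned state names and the three-day `VALIDATION_WINDOW` are hypothetical, since the standard only calls for "a pre-defined time period" and "a suitable mechanism" without specifying either.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: placeholder value. The standard does not state the period.
VALIDATION_WINDOW = timedelta(days=3)


def closure_decision(resolved_at: datetime,
                     now: datetime,
                     user_confirmed: Optional[bool]) -> str:
    """Decide what happens to a resolved incident awaiting validation.

    user_confirmed is True (service restored), False (not restored),
    or None (end-user or customer not yet reached).
    """
    if user_confirmed is True:
        return "close"
    if user_confirmed is False:
        return "reopen"
    if now - resolved_at >= VALIDATION_WINDOW:
        # Assumption: hand off to whatever fallback procedure is defined
        # for end-users who cannot be reached in time.
        return "apply-unreachable-procedure"
    return "await-validation"
```

    Keeping the unreachable case as its own outcome, rather than closing silently, preserves the intent that closure depends on validation.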

    Principle 7:

    There shall be notification & escalation procedures that ensure consistent timely incident resolution and communication of progress relative to Service Level Agreements.

    Rationale:

    Setting customer expectation for timing of periodic status reports will prevent interruptions caused by requests for status

    More effective delivery of end to end service as IT staff will have a clear understanding of Incident SLOs which will guide appropriate functional and hierarchical escalation

    Incidents resolved within customer expectations will increase customer satisfaction

    Implications:

    Clear triggers and thresholds must be defined for functional & hierarchical escalation, as well as any periodic status notifications (this implies some form of automation); Service Level Objectives (documented in Service Level Agreements) must be clearly and explicitly defined and linked to these thresholds

    A single Escalation procedure must exist for functional and hierarchical escalation and must be adhered to by all participants in the Incident Management process

    A single Notification procedure must exist for notification.

    Any unique requirements for service specific notification thresholds must be documented and managed through the Service Level Management process and outputs from these situations must be configured within the ITSD enabling technology to support the requirements

    Templates and scripts are required to ensure consistency of messaging

    Customer Messaging must be tailored to deliver a customer perspective

    Messaging for internal Service Provider community may carry different level of detail, and this will be managed through local work instructions at the OPS ITSD
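
    The link between Service Level Objectives and escalation or notification thresholds described in Principle 7 can be sketched as follows. This is a minimal illustration, assuming thresholds are expressed as fractions of the SLO: the `Threshold` class, the `due_actions` function and the 50% / 75% / 100% values are hypothetical placeholders, not values taken from this standard or from any Service Level Agreement.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class Threshold:
    """A trigger expressed as a fraction of the Service Level Objective."""
    fraction_of_slo: float
    action: str


# Assumption: placeholder values. Real thresholds are defined through
# the Service Level Management process.
THRESHOLDS = [
    Threshold(0.50, "status notification"),
    Threshold(0.75, "functional escalation"),
    Threshold(1.00, "hierarchical escalation"),
]


def due_actions(elapsed: timedelta, slo: timedelta) -> List[str]:
    """Return every action whose threshold the incident has reached."""
    used = elapsed / slo  # dividing two timedeltas yields a float
    return [t.action for t in THRESHOLDS if used >= t.fraction_of_slo]
```

    For an incident three hours into a four-hour objective, `due_actions(timedelta(hours=3), timedelta(hours=4))` returns the status notification and the functional escalation, but not yet the hierarchical escalation.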

    Principle 8:

    All Incident information, including resolution details, shall be logged in an accessible Incident Management repository.

    Rationale:

    Single source of data for all enterprise incidents, ensures consistent view and authoritative source for management of incidents

    Tracking of progress enables ability to escalate

    Provides knowledge base to enable:

    Reduction in Mean Time to Resolve (MTTR) for similar incidents by applying previous workaround

    Analysis and identification of Problems (by Problem Management Process)

    Audit trail informs reporting (Service Level Management)

    Implications:

    Incident Management must be supported by an integrated IT support system with a common database for logging all incident & resolution information

    Incident Management and Problem Management must have access to the same database

    Validation of accuracy of resolution details must occur before any auto-closure of tickets

    Principle 9:

    A separate procedure shall be established to manage resolution of Major Incidents that will include nomination of a single Manager for the incident. This resource will be assigned from a pool of management staff within the OPS ITSD, the appropriate Branch of the I & IT Cluster or Corporate Security.

    Rationale:

    Special leadership may be required to secure and manage resources to ensure prompt resolution of major incidents

    Establishment of an accountable Lead will ensure ownership of the Major Incident and provide an objective point of escalation and contact throughout the life of the incident from declaration to Major Incident Review

    Implications:

    Criteria for Major Incident declaration must be defined, documented and communicated to Stakeholders and then linked to Incident prioritization activities at the OPS ITSD

    o Criteria may vary by Service - It is neither reasonable nor efficient to define one-size-fits-all criteria that apply to all Incidents.

    o It is an expensive undertaking to invoke Major Incident Procedures and secure and coordinate the resources required to deal with a Major Incident. Therefore, care must be taken to prevent subjective or reactive declaration by specifying objective, quantifiable attributes for an Incident to be declared Major.

    Ability to engage and receive confirmation of acceptance from the accountable Major Incident Manager must be 7*24

    Incident Analyst staff in any organization must be contactable on a 7*24 basis to support Major Incidents

    Some Major Incidents may not require special leadership if resolution activities are outside the span of control of the OPS I & IT community (e.g. major power outage or major weather situation across the province)

    Staff involved in Incident Management and Service Level Management functions must be trained in the Major Incident Procedure

    Logistics, facilities and technical requirements for a Situation or WAR Room must be identified and provisioned to support prolonged or multiple incident events. This information must be made widely available to all Stakeholders in the enterprise incident management process.
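
    The requirement in Principle 9 for objective, quantifiable declaration criteria that may vary by Service can be sketched as a per-service lookup. This is an illustration only: the service names, the two attributes (users affected, outage minutes) and every number below are invented for the example and do not come from this standard, which leaves the criteria to be defined and communicated to Stakeholders.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MajorIncidentCriteria:
    """Quantifiable thresholds for one service (hypothetical attributes)."""
    min_users_affected: int
    min_outage_minutes: int


# Assumption: invented services and numbers, for illustration only.
CRITERIA_BY_SERVICE: Dict[str, MajorIncidentCriteria] = {
    "email": MajorIncidentCriteria(min_users_affected=1000, min_outage_minutes=30),
    "payroll": MajorIncidentCriteria(min_users_affected=50, min_outage_minutes=15),
}


def qualifies_as_major(service: str, users_affected: int, outage_minutes: int) -> bool:
    """Apply the criteria defined for the given service, if any exist."""
    criteria = CRITERIA_BY_SERVICE.get(service)
    if criteria is None:
        # No documented criteria: no objective basis for declaration.
        return False
    return (users_affected >= criteria.min_users_affected
            and outage_minutes >= criteria.min_outage_minutes)
```

    The same incident profile (60 users, 20 minutes) qualifies for the hypothetical payroll service but not for email, which is the point of allowing criteria to vary by Service.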

    Principle 10:

    Any proposed service restoration activity, which has the potential to impact other services or other customers of the same service, must be approved by the Service Owner(s) before being undertaken.

    Rationale:

    Ensures that incident resolution activities do not impact other services or other users of the same service

    Ensures a business perspective is considered before possible disruptive actions are taken for incident resolution

    Implications:

    Service Owner(s) must be contactable 7*24

    As an alternative to 7*24 availability, a defined policy must be developed by the Service Owner that will outline the proposed approach for each of the Services in the catalogue of the Service Provider. This policy must be shared with Stakeholders and embedded in all Service Level Agreements. The Incident Manager or Major Incident Manager (see Principle 9 above) would be contacted to provide requisite approval (after due consideration of the policy).

    An ability to relate components and enabling services is required to understand potential impact to other users. This information is typically obtained from the infrastructure Configuration Management Database (CMDB).
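
    Relating a component to the services that depend on it, as the implication above requires, amounts to walking a dependency graph. The sketch below assumes a toy in-memory map in place of a real CMDB: the component and service names, the `DEPENDENTS` structure and the `impacted_services` function are all hypothetical and do not reflect any actual OPS configuration data or CMDB product interface.

```python
from collections import deque
from typing import Dict, List, Set

# Assumption: toy data. Key = configuration item, values = items that
# depend directly on it.
DEPENDENTS: Dict[str, List[str]] = {
    "db-server-01": ["hr-database", "finance-database"],
    "hr-database": ["HR Portal"],
    "finance-database": ["Expense Claims"],
}
SERVICES: Set[str] = {"HR Portal", "Expense Claims"}


def impacted_services(component: str) -> Set[str]:
    """Breadth-first walk from a component to every dependent service."""
    seen: Set[str] = {component}
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen & SERVICES
```

    If a restoration activity required restarting `db-server-01`, the result would name both services, and therefore both Service Owners whose approval Principle 10 calls for.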

    Principle 11:

    Incident resolution activities must commence as soon as possible for all Incidents regardless of Priority.

    Rationale:

    Industry best practice supports determining as soon as possible the extent and effort required to resolve incidents

    Delaying resolution activities for a seemingly minor or misdiagnosed incident could increase the impact to customer (activities to resolve incidents reported during non-prime shifts, if deferred to next business day, can result in service-affecting impact to the customer)

    Implications:

    Unresolved incidents must be monitored on a periodic basis and their impact re-assessed based on Service Level Objectives

    Local work instructions at the OPS ITSD must prescribe that a sweeping of the incident queues be performed on a periodic basis to ensure outstanding incidents have been actioned in support of Service Level Objectives

    Ability to engage active support of Tier 2-N resources off normal hours

    Priority 2 and Priority 3 incidents that are assigned to Tier 2-N support groups outside of regular business hours may not be actioned until next business day. Current practice is to place these Incidents in a Pending state within the enabling technology. This can result in misleading Availability and Performance metrics. Service Level Managers must be prepared to address these concerns if / when they are raised by Customers.
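
    The periodic sweep of incident queues called for under Principle 11 can be sketched as a filter over open tickets. This is a minimal illustration under stated assumptions: the `Incident` record, its field names, the status values and the `max_idle` interpretation of a Service Level Objective are hypothetical. Pending tickets are deliberately included in the sweep, since the text notes that the Pending state can otherwise mask outstanding work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    ticket_id: str
    status: str               # hypothetical values: "Open", "Pending", "Resolved"
    last_actioned: datetime
    max_idle: timedelta       # assumption: longest time allowed without action


def sweep(queue: List[Incident], now: datetime) -> List[str]:
    """Return ids of unresolved tickets idle beyond their limit.

    Pending tickets are not exempt from the check.
    """
    return [
        inc.ticket_id
        for inc in queue
        if inc.status != "Resolved" and now - inc.last_actioned > inc.max_idle
    ]
```

    Run at the start of a business day, such a sweep would surface a ticket parked overnight in Pending so that its impact can be re-assessed.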

    Principle 12:

    All Service Owners and OPS Service Providers shall fulfill their roles in compliance with the OPS enterprise Incident Management process.

    Rationale:

    Consistent participation from all Stakeholders is required to ensure success of the enterprise Incident Management process

    Implications:

    Underpinning Contracts (UCs) with external service providers must reflect the enterprise Incident Management process requirements

    Operating Level Agreements (OLAs) between internal service providers must be in place and reflect enterprise Incident Management process requirements

    Principle 13:

    A mechanism must be in place to identify security-related incidents and engageappropriate support staff to resolve the issue.

    Rationale:

    Security related incidents may require specialized skills that are not resident in the ITSD organization.

    Implications:

    A security support group must be established and staffed on a 7*24 basis.

    Special procedures must be defined and agreed to by the OPS ITSD and CSB to address security related incidents.

    ITSD staff must be provided with initial and ongoing training to ensure they are equipped to identify potential security related incidents

    This mechanism must be bi-directional in nature as Corporate Security Branch (CSB) must have the ability to pro-actively inform the OPS ITSD of a security-related Incident

    4.2. Process Roles and Responsibilities

    Each process requires specific roles to undertake defined responsibilities for process design, development, execution and management. An organization may choose to assign more than one role to an individual. Additionally, the responsibilities of one role could be mapped to multiple individuals.

    One role is accountable for each process activity. With appropriate consideration of the required skills and managerial capability, this person may delegate certain responsibilities to other individuals. However, it is ultimately the job of the person who is accountable to ensure that the job gets done.

    Regardless of the mapping of responsibilities within an organization, specific roles are necessary for the proper operation & management of the process. This section lists the mandatory roles and responsibilities that must be established to execute the Incident Management process.

    Process Task | Incident Manager (All Incidents) | Major Incident Manager (P1) | Situation Manager (P2) | Service Desk Agent | Incident Analyst (Tier 2-N) | Service Owner | Partner IM Liaison

    Log & Classify Incident: A R
    Prioritize Incident: A R
    Declare Major Incident: A,R I C I
    Perform Tier 1 Diagnosis: A R
    Functional Escalation: A R R R C
    Perform Tier-N Diagnosis: A R I
    Resolve Incident: A A* A* R,I R I
    Monitor Incident: A R R
    Close Incident: A R

    Legend: R = Responsible, A = Accountable, C = Consult before, I = Informed

    A*

    Major Incident Manager is Accountable to resolve Major Incidents per Major Incident protocol

    Situation Manager may be called upon to resolve other Incidents as deemed necessary by the Incident Manager

    4.2.1. Enterprise Incident Management Process Owner

    The Process Owner owns the process and the supporting documentation for the process. The Process Owner provides process leadership to the IT organization by overseeing the process and ensuring that the process is followed by the organization. When the process isn't being followed or isn't working well, the Process Owner is responsible for identifying why and ensuring that required actions are taken to correct the situation. In addition, the Process Owner is responsible for the approval of all proposed changes to the process, and development of process improvement plans.

    Responsibilities

    Ensures that the process is defined, documented, maintained and communicated at an Enterprise level through appropriate vehicles (IT Standards Council / Corporate ARB).

    Undertakes periodic review of all ITSM processes from an Enterprise perspective and ensures that a methodology of Continuous Service Improvement (including applicable Process-level supporting metrics) is in place to address shortcomings and evolving requirements.

    Ensures that all Enterprise ITSM processes are considered and managed in an integrated manner, taking into consideration OPS Policies and Directives and factoring in evolving trends in technology and practice.

    Solicits OPS Stakeholders and communities of interest to identify Enterprise ITSM process requirements for consideration by the Enterprise ITSM Program. Coordinates, presents and recommends options for the prioritization, development and delivery of these to the appropriate governing body.

    Ensures Enterprise ITSM procedures and work instructions and functional requirements for enabling technology are aligned with the enterprise process.

    Segregation of Duties
    The role of Enterprise Process Owner is separate and distinct from that of the Incident Manager and the roles shall be separately staffed. The Enterprise IM Process Owner shall reside in OCCTO, while the Enterprise Incident Manager shall reside in the organization of the OPS infrastructure Service Provider.

    4.2.2. Incident Manager (IM)

    The Incident Manager is accountable for managing execution of the Incident Management process and directing the activities of all OPS I&IT organizations required to respond to incidents in compliance with SLAs and SLOs. The Incident Manager is accountable for the lifecycle of all incidents and acts as the incident management point of escalation for incident notification and for hierarchical escalation.

    Responsibilities

    Develops and maintains an appropriate level of incident management procedures and / or work instructions to support the needs of the business.

    Ensures that Incident Management staff are trained and familiar with IM procedures

    Monitors IT support staff performance of the Incident Management process; creates and executes action plans when necessary to ensure effective operation and continuous improvement

    Manages Incident resource allocation and workload distribution

    Invokes the Major Incident Procedure, as appropriate

    Engages upper levels of management as appropriate

    Ensures that a Major Incident Review is conducted for all major incidents and that recommended action items are completed.

    Provides information for management related to OPS ITSD performance

    Highlights trends resulting from recurring incidents for review by Problem Management.

    Monitors performance of the Incident Management process and identifies process improvements to the Enterprise IM Process Owner

    4.2.3. Situation Manager (SM)

    The Situation Manager is called upon by the Incident Manager to manage escalations of Incidents meeting pre-specified criteria (typically of a Priority 2 (P2) level). The SM is accountable for taking actions necessary to resolve P2 Incidents and restore service.

    Responsibilities

    Resolve the escalated Incident leveraging resources provided by the Incident Coordinator

    Identify and lead the required members of the resolution team to develop the plan to restore service or create a workaround

    Ensure that status messages are provided by the ITSD for periodic progress reports based on the defined Notification schedule

    Perform escalation evaluations

    Coordinate the establishment of resolution teams

    Provide point-of-contact for resolution teams

    Manage further hierarchical and functional escalations

    Recommend activating Disaster Recovery Process (as necessary)

    4.2.4. Queue Manager (QM)

    The Queue Manager monitors the queue to ensure that all incident tickets assigned to various support groups in their organization are promptly actioned and / or escalated within defined thresholds in support of Service Level Agreements / Objectives (SLAs/SLOs). This Role is predominantly concerned with the overall performance of resources involved in the Incident Management process, and is defined to establish an objective perspective on how Incident Management is being undertaken within a specific organization. As such, there are no specific Accountabilities.

    Responsibilities

    Address process execution issues encountered by support personnel and ensure that all tickets assigned to a queue are promptly actioned.

    Monitor the incident queues

    Ensure that all incidents placed in a queue are assigned to the appropriate resource within the queue

    Monitor all incidents and advise support group members of upcoming and actual Service Level Breaches (Note: Engaging the support group will only occur if a Service Desk Analyst has not already performed this action.)

    Respond to the escalated incidents in a timely and appropriate fashion to minimize the effect of incidents on agreed service levels

    Follow defined escalation path, as defined in the escalation policy

    Facilitate support resource commitment and allocation

    Attend incident review meetings as required

    Participate in process improvement sessions

    4.2.5. Service Desk Manager (SDM)

    The SDM is accountable for all aspects of the OPS ITSD and for effective management of the Incident Queues across the OPS I & IT organizations.

    Responsibilities

    Manages overall Service Desk activities

    Acts as escalation point for Team Leads

    Monitors incident volumes and trends to ensure appropriate staffing levels

    Recommends procedural improvements to the Incident Manager

    4.2.6. Service Desk Team Lead

    Ensures currency and effectiveness of diagnostic scripts used to perform incident triage

    Manages shift schedules to ensure appropriate staffing and skill levels are maintained

    Acts as escalation point for Service Desk Agents in difficult or controversial situations

    Arranges staff training and awareness sessions

    Produces statistics and management reports

    Undertakes HR activities as required

    Assists Service Desk Agents when workloads are high or more experience is required

    4.2.7. Service Desk Agent (SDA)

    The Service Desk Agent provides the single point of contact for customers during the incident lifecycle.

    Responsibilities

    Authenticates the caller (User or Customer) and captures minimum level of defined contact information

    Authenticates the level of support to which the individual reporting the incident is entitled

    Creates an Incident record for the new incident or updates the record for existing incidents

    Classifies the incident

    Ensures that description of all incident resolution activities is accurately captured in incident records

    Continually updates incident records with progress / status information

    o to reflect their own activities

    o to support Tier 2-N resources as / if requested

    Attempts Incident resolution at first point of contact (Tier 1) using diagnostic scripts and knowledge records such as Known Errors

    If unable to restore service within predefined threshold, performs Functional Escalation and assigns incident to the appropriate Tier 2 support group

    Facilitates functional escalation between Tier-2 and Tier-N support groups and records circumstances in the incident record

    Informs the Queue and / or Incident Manager of any non-minor Incidents

    Keeps the customer or user updated on incident progress based on notification protocol

    Obtains user (or customer) concurrence that the support actions provided addressed their needs prior to closing the Incident

    4.2.8. Incident Analyst (IA)

    Incident Analysts are Tier 2-N support group staff in each organization who provide progressively greater technical expertise to resolve Incidents that have not been resolved at the previous tier.

    Responsibilities

    Responds to assigned incidents within agreed timeframes

    Diagnoses, develops workarounds and / or attempts to resolve assigned incidents

    Requests assistance from other Tier 2 support areas via the Incident or Queue Manager

    If unable to resolve, requests functional escalation via the OPS ITSD

    Keeps the OPS ITSD informed of progress on assigned incidents via incident enabling technology

    Notifies the OPS ITSD as soon as it is known that the expected resolution will not occur within service thresholds

    When requested by the Queue and / or Incident Manager, provides technical assistance for other Tier-N resources

    When requested by the Queue and / or Incident Manager, provides technical communication / explanation to customers and / or end-users.

    Ensures creation of an incident record for all / any activities undertaken related to remedial action for technology or service supported

    When designated by the Major Incident Manager as the technical lead for a Major Incident, the Incident Analyst has additional responsibilities:

    o Undertake the technical leadership of the analysis and diagnosis, and develop the subsequent action plan to remediate the Major Incident

    o Provide periodic updates and status reports to the Major Incident Manager to ensure communication and notification requirements of the Incident Management process are satisfied

    4.2.9. Service Owner

    In addition to the general Service Owner responsibilities identified in enterprise Problem Management process (e*PM), the Service Owner has additional responsibilities specific to the enterprise Incident Management process. (Note: These fall under the broad category of the Service Support Model that is the responsibility of the Service Owner to define and maintain.)

    In order to provide seamless, end-to-end support for Incident Management for OPS I & IT services, it is necessary to document all aspects of the Support Model. As the I & IT Clusters are accountable for the Application component of many of the OPS Services, the enterprise Incident Management process must be informed with key aspects of the support structure for applications.

    The Service Owner is responsible for the identification, documentation and maintenance of internal partner solution / service knowledge required to inform the Support Model used by the OPS ITSD.

    Responsibilities

    Define and establish the support model (including required skills for Tier 2-N support staff) up to and including the application

    Provide information, via the Partner Liaison, to the ITSD. This would include items such as service / solution descriptions, diagnostic content, mandatory information capture at Tier 1 and First Point of Contact (FPOC) resolution steps for use by Service Desk Agents

    Maintain the above information and inform the appropriate parties of updates:

    o Partner Liaison for support model updates

    o Service Level Manager for revisions to Service Level Objectives

    Develop local procedure information in support of incident management for Cluster services / solutions and obtain endorsement from enterprise Incident Manager that these align with OPS ITSD procedures


    4.2.10. Major Incident Manager (MIM)

    In certain cases of incidents, a Major Incident Manager may be required to manage resolution activities. The Incident Manager or delegate will make this determination, and as required, will assign a single individual to undertake the MIM role for the service recovery activities related to that Incident. The MIM is accountable for taking actions necessary to resolve a Major Incident and restore service. In all cases a Major Incident will be classified using Urgency / Impact definitions documented in Section 6.4 of this Standard. By definition Major Incidents will be classified as Priority 1 (P1). Activities managed by this individual may cross organizational boundaries. The MIM will be selected from a pool of managers within the appropriate Branches in the OPS I&IT organization (ITS, Clusters or CSB).

    The administrative aspects of the major incident will continue to be managed through the OPS ITSD and the Incident Manager or delegate will continue to perform responsibilities related to incident notification, escalation and communication. The Incident Manager maintains ownership and accountability for the lifecycle of the Incident. This allows the MIM to fully focus effort and attention upon managing the technical resolution of the incident.

    Responsibilities

    Identifies the required members of the resolution team, and requests their participation via the ITSD

    Ensures that a systematic approach is used to evaluate the reported symptoms, impacts and contributing factors of the incident

    Ensures assignment of key Incident Analyst to develop the optimum plan to restore service or create a workaround

    Provides timely updates to the OPS ITSD to ensure the incident record is maintained.

    Ensures that status messages are provided to the OPS ITSD for periodic progress reports based on the Major Incident Notification Schedule

    Undertakes functional escalation based upon pre-defined thresholds for the service being supported. (Note: Problem Management resources may also be requested should a workaround not be found and a real-time Root Cause Analysis (RCA) be required.)

    Provides documentation for Major Incident Review report.

    4.2.11. Partner Incident Management Liaison

    The Partner Liaison provides a point of contact between the Incident Manager and partner organizations (e.g. Clusters, CSB, 3rd Party Service Providers) to enable effective and efficient execution of the IM process.

    Responsibilities

    Coordinates with Service Owners in their organization to provide and maintain the Support Model information required by the OPS ITSD SDAs (e.g. service / solution descriptions, diagnostic approach, mandatory information capture and First Point of Contact (FPOC) resolution steps).

    Provides ITSD with accurate partner organization information relative to the IM process (including VIP Lists, Location details, organizational and / or staff changes, etc.)

    Coordinates incident resolution activities within an organization

    Acts as the escalation point for any organizational issues regarding execution of the Incident Management process


    4.3. Process Flows

    4.3.1. Incident Management Process Overview


    4.3.2. Incident Management Process Tasks

    No | Task | Roles | Input, Trigger | Description

    1.0 | Report Incident | User, Ops Staff | User-perceived service outage or degradation, Monitoring Event | Users may call or email the service desk to report an incident. Event Monitoring may also proactively indicate an incident before the user is impacted.

    2.0 | Log & Classify Incident | SDA | Service Desk informed of Incident | SDA creates incident record and captures user contact information, classification data and details about symptoms.

    3.0 | Prioritize Incident | SDA | Incident classified | SDA prioritizes the incident, based upon Impact and Urgency (usually via a predetermined formula).

    4.0 | Declare Major Incident | SDA, IM | Major Incident criteria are met | SDA determines that Incident meets agreed criteria for Major Incident and informs the Incident Manager, who determines whether or not to declare a Major Incident and what parts of the Major Incident Protocol will be invoked.

    5.0 | Perform Tier 1 Diagnosis | SDA | Incident prioritized | Service Desk Agent conducts initial diagnosis to discover the full symptoms of the incident and to determine exactly what has gone wrong and how to correct it. The agent will use diagnostic scripts and known error information to assist in this task.

    6.0 | Functional Escalation | SDA, QM | SD cannot restore service within agreed threshold | If SDA cannot restore service at first point of contact within the predetermined timeframe, the incident will be assigned to an Incident Analyst (Tier 2 support) to attempt to restore service within Service Level targets. This functional escalation is repeated to Tier 3 and so on (if the Tier 2 Incident Analyst cannot resolve the incident within a defined threshold).

    7.0 | Perform Tier-N Diagnosis | IA | Functional escalation | Incident Analysts will conduct further diagnosis to determine how to restore service.

    8.0 | Resolve Incident | SDA, IA | Diagnosis has indicated probable resolution | The Incident Analyst or SDA takes (or coordinates) necessary action to restore service and conducts testing to ensure that service is restored. (Note: this could include asking the user to take actions, e.g. rebooting the computer.)

    9.0 | Monitor Incident | IM, QM | Incident logged | Incidents are monitored throughout their lifecycle. Queue Manager ensures that incidents assigned to Tier N support groups are resolved or functionally escalated within defined thresholds. Incident Manager monitors thresholds and may escalate or manage notifications if Service Level Targets are in jeopardy.

    10.0 | Close Incident | SDA | Analyst indicates service restoration | SDA requests the User to confirm that service has been restored from their perspective and then closes the incident. If the user cannot be reached within the agreed threshold, the SDA follows the predefined policy for such situations.
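    The functional escalation in Task 6.0 can be illustrated with a minimal sketch. It assumes a per-tier time threshold in minutes and a fixed highest tier; the names `Incident`, `escalate_if_overdue` and `EXAMPLE_THRESHOLDS`, and the threshold values themselves, are hypothetical and are not defined by this Standard. Actual thresholds would come from the support model for the service concerned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Incident:
    incident_id: str
    tier: int = 1  # Tier 1 is the Service Desk Agent (SDA)
    resolved: bool = False
    escalations: List[Tuple[int, int]] = field(default_factory=list)


def escalate_if_overdue(
    incident: Incident,
    minutes_at_tier: int,
    thresholds: Dict[int, int],
    max_tier: int,
) -> bool:
    """Move the incident up one tier when its current tier exceeded its threshold.

    Returns True if a functional escalation took place.
    """
    if incident.resolved or incident.tier >= max_tier:
        return False
    limit = thresholds.get(incident.tier)
    if limit is None or minutes_at_tier <= limit:
        return False
    incident.escalations.append((incident.tier, incident.tier + 1))
    incident.tier += 1
    return True


# Hypothetical thresholds (minutes per tier), for illustration only.
EXAMPLE_THRESHOLDS = {1: 15, 2: 60}

incident = Incident("INC-0001")
escalate_if_overdue(incident, minutes_at_tier=20,
                    thresholds=EXAMPLE_THRESHOLDS, max_tier=3)
print(incident.tier, incident.escalations)
```

    In this sketch each call escalates by at most one tier, which mirrors the tier-by-tier repetition described in Task 6.0; the Queue Manager and Incident Manager monitoring described in Task 9.0 is not modelled.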


    4.4. Linkages to other processes

    Process: Problem Management

    Linkage:

    PM requires that Incident Management capture sufficient and accurate information to enable problem identification:

    o Proper closure codes
    o Proper classification
    o Link new Incidents to existing Problems
    o Known defective components (based upon event monitoring and component alarms).

    PM makes information available that can support Incident resolution activities (e.g. Known Errors, workarounds, patterns)

    Enabling technology must be able to define relationship between Incident, Problem and Known Error records

    Incident Management may identify potential Problems to Problem Management

    Process: Enterprise Change Management (ECM)

    Linkage:

    Should restoration of a service require modification of a component under the control of Configuration Management, then ECM must be engaged

    Enabling technology must be able to define relationship between Incident and Change records

    Process: Configuration Management

    Linkage:

    A portable guide was developed as an OPS Standard in 2004. This portable guide will be updated to reflect Enterprise requirements in the near future. At that time it will be linked to Incident Management so that a faulty CI can be referenced in the Incident record

    Process: Service Level Management

    Linkage:

    Although this process has not yet been formalized at the enterprise level, there is an expectation that incident escalation thresholds are defined to support SLAs and OLAs.

    Service and Component Classification schemas must be used consistently across ITSM Processes such as Incident, Change and Problem Management to enable industry best practice process integration. Failure to adopt a common approach to implementing these three processes will result in needless re-work and additional administrative overhead for operational staff.
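    The requirement that enabling technology relate Incident, Problem, Known Error and Change records can be pictured with a small data-structure sketch. All class and field names below (`IncidentRecord`, `Problem`, `KnownError`, `link_to_problem`) are hypothetical illustrations and do not describe the OPS enabling technology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnownError:
    known_error_id: str
    workaround: str


@dataclass
class Problem:
    problem_id: str
    known_error: Optional[KnownError] = None
    incident_ids: List[str] = field(default_factory=list)


@dataclass
class IncidentRecord:
    incident_id: str
    classification: str
    problem_id: Optional[str] = None  # link to an existing Problem
    change_ids: List[str] = field(default_factory=list)  # related Change records
    closure_code: Optional[str] = None


def link_to_problem(incident: IncidentRecord, problem: Problem) -> Optional[str]:
    """Link an incident to an existing problem; return a known workaround if any."""
    incident.problem_id = problem.problem_id
    if incident.incident_id not in problem.incident_ids:
        problem.incident_ids.append(incident.incident_id)
    return problem.known_error.workaround if problem.known_error else None


# Example with made-up identifiers and text.
problem = Problem("PRB-0007", KnownError("KE-0003", "Restart the print spooler"))
record = IncidentRecord("INC-0042", classification="Printing")
print(link_to_problem(record, problem))
```

    Keeping the link on both records lets Incident Management look up workarounds while Problem Management can count the incidents attached to a problem; a shared classification value on each record is what makes such cross-process reporting consistent.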

    4.5. Incident Management Process Quality Control

    Certain aspects of execution of the Incident Management process are monitored, as a quality control measure, to identify opportunities to improve process effectiveness and efficiency.

    Monitoring: The Incident Manager is responsible for monitoring certain aspects of the activities performed by the Incident Management team on a regular basis. This serves a twofold purpose:

    1. The Incident Manager can identify any bottlenecks at the operational level and take appropriate corrective action.

    2. Both the Incident Manager and the enterprise Process Owner can identify opportunities for improvement at the process and procedural level.

    Reporting involves measuring the process via metrics and recording how well it behaves in relation to the objectives or targets specified in the metrics. Metrics provide the Incident Management personnel with feedback on the process. They also provide the Incident Management Process Owner with the necessary information to review overall process health and to undertake continual service improvement techniques.

    Evaluating the process involves regular reviews of the execution of the process and identification of possible improvements or actions to address performance gaps. Every process is only as good as its last improvement; hence, the feedback loop of continuous improvement is inherent in every process.

    4.6. Metrics

    Metrics are intended to provide a useful measurement of a process's effectiveness and efficiency. Metrics are also required for strategic decision support. The following need careful consideration:

    Reporting metrics will be readily measurable (preferably automated collection and presentation of data)

    Metrics will be chosen to reflect process activity (how much work is done?), process quality (how well was it done?) and process execution (to review and plan the job on hand).

    The Enterprise Incident Management Process Owner is accountable for the definition of an appropriate suite of metrics to determine the overall health of the Enterprise Incident Management process.

    The Incident Manager will develop and run the reports and may develop other metrics to monitor other operational aspects of process execution, such as workload and resource balancing

    The following represents the initial suite of metrics that will be used to analyze process performance, identify opportunities for improvements and for strategic decision support. Any count of Incidents must exclude Service Requests.

    Workload:

    Total numbers of Incidents per period (as a control measure) (excluding Service Requests)

    Number and percentage of major incidents

    Size of current Incident backlog

    Process Effectiveness:

    Number and percentage of incidents re-assigned

    Number and percentage of incidents incorrectly classified

    Average Call Time with no escalation (ITSD metric)

    Percentage of incidents resolved within agreed response time

    Average time for Tier 2-N support to respond to functionally escalated incident

    Process Efficiency:

    Percentage of Incidents closed by the Service Desk without reference to other levels of support (often referred to as first point of contact)

    Mean time to resolve incidents (MTTR)

    Percentage of Incidents resolved on first attempt.

    Percentage of assigned Incidents resolved within Service Level Objectives (total and broken down by queue)

    Aging Report showing # and % of assigned Incidents per organization that have been outstanding for longer than periods as designated from time to time by the IM Process Owner
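    A few of the metrics above can be computed from incident records along the following lines. This is a sketch under stated assumptions: the `Record` fields and the `incident_metrics` function are hypothetical, the first point of contact percentage is taken over resolved incidents (the Standard does not fix the denominator), and MTTR is expressed in hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Record:
    record_id: str
    is_service_request: bool
    is_major: bool
    opened: datetime
    resolved: Optional[datetime] = None
    escalated: bool = False  # True if assigned beyond the Service Desk


def percentage(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def incident_metrics(records: List[Record]) -> Dict[str, float]:
    # Any count of Incidents must exclude Service Requests.
    incidents = [r for r in records if not r.is_service_request]
    resolved = [r for r in incidents if r.resolved is not None]
    major = [r for r in incidents if r.is_major]
    first_contact = [r for r in resolved if not r.escalated]
    hours = [(r.resolved - r.opened).total_seconds() / 3600 for r in resolved]
    return {
        "total_incidents": len(incidents),
        "major_incidents": len(major),
        "major_pct": percentage(len(major), len(incidents)),
        "backlog": len(incidents) - len(resolved),
        "first_contact_pct": percentage(len(first_contact), len(resolved)),
        "mttr_hours": sum(hours) / len(hours) if hours else 0.0,
    }


# Made-up sample data: one Service Request and three Incidents.
start = datetime(2010, 4, 1, 9, 0)
sample = [
    Record("SR-1", True, False, start, start + timedelta(hours=1)),
    Record("INC-1", False, False, start, start + timedelta(hours=1)),
    Record("INC-2", False, True, start, start + timedelta(hours=3), escalated=True),
    Record("INC-3", False, False, start),
]
print(incident_metrics(sample))
```

    Filtering out Service Requests once, at the top of the function, keeps every downstream count consistent with the rule that Incident counts exclude them.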

    4.7. Standard Process Parameters

    For an enterprise process to be effective, parameters used for the classification, categorization, prioritization and closure of incidents must be consistently used across OPS. Special attention must be given to parameters required for consistency of reporting. This is particularly important for the provision of reliable business intelligence.

    Please refer to the Classification Model section of the GO-ITS 44 ITSM Terminology Reference Model Portable Guide for standard process parameters and allowable values for Incident Management.

    Please refer to the State Model section of the GO-ITS 44 ITSM Terminology Reference Model Portable Guide for standard status/state parameters and their definitions for Incident Management.

    5. Related Standards

    5.1. Impacts to Existing Standards

    GO-IT Standard | Impact | Recommended Action

    GO-ITS 44 Terminology Reference Model | GO-ITS 37 re-defines the Urgency and Impact classification elements. | Future repatriation of all TRM elements into the appropriate ITSM process standards.

    GO-ITS 55 Incident Management Contextual Model and Service Desk Interaction Model | GO-ITS 55 contains Role definitions that are redundant. | Eliminate all Role definitions from GO-ITS 55. Update pending.

    GO-ITS 38 Enterprise Problem Management | Nil | N/A

    5.2. Impacts to Existing Environment

    Impacted Infrastructure | Impact | Recommended Action

    EIT | New Role of Major Incident Manager must be implemented | Future EIT update


    6. Appendices

    6.1. Normative References

    6.1.1. Major Incident Protocol

    Located on the GO-ITS web site (http://www.itstandards.gov.on.ca/) - Title: Normative Reference to GO-ITS 37 - Major Incident Protocol

    6.2. Informative References

    6.2.1. Enterprise Differentiation: Process, Procedure, Work Instruction

    Note: The following diagram depicts three levels of task descriptions that are often confused with one another:

    This example is from Enterprise Change Management and illustrates the level of information required in task descriptions

    Level 1 tasks are defined in a Process. They specify what action must be taken and who is involved.

    Level 2 tasks are defined in Procedures that decompose each level 1 task into more granular operational tasks, and additionally, prescribe how the activity should be performed.

    Level 3 tasks represent Work Instructions: they are a further decomposition of procedure-level tasks that typically are defined to address any unique local requirements when performing a procedural task.


    6.2.2. Definitions: Urgency and Impact

    The following table provides the framework for classifying the Urgency and Impact of Incidents, which are then used to establish Incident Priority. Urgency and Impact were originally defined in GO-ITS 44, Terminology Reference Model, to ensure that local process implementations used common terminology. ITSM has matured across the OPS and enterprise processes are now in place for Incident, Problem and Change Management. The definitions have been updated to reflect best practices. This is the first step in relocation of Classification elements from the TRM into the corresponding ITSM process standard.

    Classification: Impact

    Definition: Measure of scope and criticality to business. Often equal to the extent to which an Incident leads to distortion of agreed or expected service levels.

    Field Values and Criteria (at least 1 criterion must be met):

    Field Value: High

    o A failure of an IT Business Service affecting multiple organizations (see footnote 4)
    o A failure affecting public safety
    o A Security-related incident affecting a large number of users across multiple organizations where total loss or compromise of critical business data may result.
    o A Core network outage or a network outage affecting mission critical government location
    o A failure affecting >1000 Users
    o A failure that affects a money back guarantee public service offering
    o Mission-critical applications fully unavailable
    o Citizen-facing government websites

    Field Value: Medium

    Failure of an IT Business Service affecting a single organization which may include:

    o A network outage affecting business critical government offices
    o A security related incident affecting large number of users where work may be seriously impeded / interrupted within large groups or some business information may be at risk.
    o A failure or serious degradation affecting >500 users
    o A failure that affects a public-facing non-guaranteed service offering
    o Failure of business-critical applications
    o A failure affecting all users in a single organization

    Field Value: Low

    All remaining failures of IT Business Services which may include:

    o Single user(s)
    o A small isolated group of users with a common failure (single application, location, a failure on one of several IT Business Services utilized)
    o Security related incident affecting single or small number of users where some business data may be subject to limited compromise.

    Footnote 4: As it relates to an IT Business Service, an organization is deemed to be a Ministry
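    As an illustration of how the "at least one criterion" rule works, the sketch below encodes only a subset of the Impact criteria (user counts, number of organizations, public safety, and whole-organization failures). The `ImpactFacts` fields and the `classify_impact` function are hypothetical names, and treating any incident that spans more than one organization as High is a simplifying assumption; criteria that depend on service type or security considerations are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ImpactFacts:
    users_affected: int = 1
    organizations_affected: int = 1  # an organization is deemed to be a Ministry
    public_safety: bool = False
    all_users_in_organization: bool = False


def classify_impact(facts: ImpactFacts) -> str:
    """Return "High", "Medium" or "Low"; one matching criterion is enough."""
    # High criteria are checked first so the most severe match wins.
    if (
        facts.public_safety
        or facts.organizations_affected > 1
        or facts.users_affected > 1000
    ):
        return "High"
    if facts.users_affected > 500 or facts.all_users_in_organization:
        return "Medium"
    # All remaining failures fall through to Low.
    return "Low"


print(classify_impact(ImpactFacts(users_affected=750)))
```

    Checking the High criteria before the Medium ones matters because a single incident can satisfy criteria at more than one level; the sketch assumes the highest matching level applies.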


    High

    A formal SLA is in place that specifies an IT restoration of service time of


    7. Glossary

    Term | Description

    Assignment | Assignment occurs when an incident is assigned by the ITSD to a Tier 2-N support group within the OPS to attempt incident resolution. The assigned support group must respond in accordance with the OPS Incident Management Process/Procedures and their actions may be directed by the OPS Incident Manager. (see Dispatch)

    Customer | Someone who buys goods or Services. The Customer of an IT Service Provider is the person or group that defines and agrees the Service Level Targets. The term Customers is also sometimes informally used to mean Users, for example "this is a Customer-focused Organization".

    Diagnostic Scripts | Documents used by the Service Desk to help classify and resolve incidents. These documents, based upon input from specialist support groups and suppliers, identify key questions to be asked to obtain details about what has gone wrong, with suggestions for resolution activities to be performed.

    Dispatch | Dispatch occurs when the ITSD assigns an Incident to a Service Provider outside the OPS to attempt resolution. Provider behaviour is specified by an Underpinning Contract and the OPS Incident Manager does not have authority to direct the provider's activities other than coordination of activities between the provider and other OPS Support groups

    ECM | The Enterprise Change Management Process. OPS GO-IT Standard 38

    Error | (Service Operation) A design flaw or malfunction that causes a Failure of one or more Configuration Items or IT Services. A mistake made by a person or a faulty Process that affects a CI or IT Service is also an Error.

    Escalation | An Activity that obtains additional Resources when these are needed to meet Service Level Targets or Customer expectations. Escalation may be needed within any IT Service Management Process, but is most commonly associated with Incident Management, Problem Management and the management of Customer complaints. There are two types of Escalation: Functional Escalation and Hierarchic Escalation.

    External Service Provider | An IT Service Provider that is part of a different Organization from its Customer. An IT Service Provider may have both Internal Customers and External Customers.

    Functional Escalation | Transferring an Incident, Problem or Change to a technical team with a higher level of expertise to assist in an Escalation.

    Hierarchical Escalation | Informing or involving more senior levels of management to assist in an Escalation.

    Impact | A measure of the effect of an Incident, Problem or Change on Business Processes. Impact is often based on how Service Levels will be affected. Impact and Urgency are used to assign Priority.

    Incident | An unplanned interruption to an IT Service or reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet affected Service is also an Incident. For example Failure of one disk from a mirror set.

    Incident Management | The Process responsible for managing the Lifecycle of all Incidents. The primary Objective of Incident Management is to return the IT Service to Customers as quickly as possible.

    Incident Pattern | A pattern exists for each high level business service, to define how the ITSD interacts with OPS service chain partners such as Clusters, Ministries and corporate providers to resolve reported incidents

    Incident Record | A Record containing the details of an Incident. Each Incident record documents the Lifecycle of a single Incident.

    Internal Service Provider | An IT Service Provider that is part of the same Organization as its Customer. An IT Service Provider may have both Internal Customers and External Customers.

    Ishikawa Diagram | A technique that helps a team to identify all the possible causes of a Problem. Originally devised by Kaoru Ishikawa, the output of this technique is a diagram that looks like a fishbone.

    IT Service | A Service provided to one or more Customers by an IT Service Provider. An IT Service is based on the use of Information Technology and supports the Customer's Business Processes. An IT Service