Intel® Itanium Processor Family Error Handling Guide ...

  • Document Number: 249278-004

    Intel® Itanium® Processor Family Error Handling Guide

    February 2010

    Legal Lines and Disclaimers

    INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

    Intel may make changes to specifications and product descriptions at any time, without notice.

    Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

    Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

    The Itanium® processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

    Copies of documents which have an order number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.

    Intel, Itanium, Pentium, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

    *Other names and brands may be claimed as the property of others.

    Copyright © 2001-2010, Intel Corporation. All Rights Reserved.

    Contents

    1 Introduction
      1.1 Purpose
      1.2 Target Audience
      1.3 Related Documents
      1.4 Terminology

    2 Machine Check Architecture
      2.1 Overview
      2.2 Itanium® Processor Family Firmware Model
      2.3 Machine Check Error Handling Model
      2.4 MCA Scope
      2.5 Error Severity
        2.5.1 Hardware-Corrected Errors
        2.5.2 Firmware-Corrected Errors
        2.5.3 OS Recoverable Errors
        2.5.4 Fatal Errors
      2.6 Software Handling
        2.6.1 PAL Responsibilities
        2.6.2 SAL Responsibilities
        2.6.3 Operating System Responsibilities
      2.7 Multiple Errors
        2.7.1 SAL Issues Related to Nested Errors
      2.8 Expected MCA Usage Model

    3 Processor Error Handling
      3.1 Processor Errors
        3.1.1 Processor Cache Check
        3.1.2 Processor TLB Check
        3.1.3 System Bus Check
        3.1.4 Processor Register File Check
        3.1.5 Processor Microarchitecture Check
      3.2 Processor Error Correlation
      3.3 Processor CMC Signaling
        3.3.1 CMC Masking
        3.3.2 Error Severity Escalation

    4 Platform Error Handling
      4.1 Platform Errors
        4.1.1 Memory Errors
        4.1.2 I/O Errors
        4.1.3 OEM-Specific Errors
      4.2 Platform Error Correlation
      4.3 Platform-Corrected Error Signaling
        4.3.1 Scope of Platform Errors
        4.3.2 Handling Corrected Platform Errors
      4.4 Platform MCA Signaling
        4.4.1 Global Signal Routing
        4.4.2 Error Escalation

    Figures
      2-1 Itanium® Processor Family Firmware Machine Check Handling Model
      2-2 Machine Check Error Handling Flow
      2-3 Error Types and Severity
      2-4 Multiple MCA Events

    Tables
      3-1 Processor Machine Check Event Masking
      3-2 Machine Check Event Escalation

    Revision History

    Version Description Date

    -004

    • Updates for Intel® QuickPath Interconnect-based platforms.
    • Corrections and clarifications based upon the current Intel® Itanium® Architecture Software Developer’s Manual and Intel® Itanium® Processor Family System Abstraction Layer Specification.
    • Corrected definitions in the Terminology section.
    • Added reference to the DIG64 Corrected Platform Error Polling Interface Specification.
    • Clarified global error signaling and error containment requirements.
    • Removed Chapter 5: Error Records because of redundancy with the Intel® Itanium® Processor Family System Abstraction Layer Specification.

    February 2010

    -003

    • Updated links to related documents.
    • Changes to reflect updated trademarks.
    • Provided differentiation for machine checks vs. MCA references.
    • Updated PAL, EFI, PMI, Data Poisoning and MCA definitions.
    • Updated diagram for the Error Handling Flow.
    • Removed references to IA-32 Operating Environment.
    • Removed Chapter 6: OS Error Handling.
    • Corrected Local and Global MCA references.

    September 2003

    -002

    • Added definitions for