Sonexion™ 3000 Release Notes (2.1.0-002) S-2533
Contents

1 About Sonexion™ 3000 Release Notes (2.1.0-002) S-2533
2 Sonexion 3000 Terms, Abbreviations, and Definitions
3 Software Versions and Requirements
4 What Is Supported in Sonexion 2.1.0
5 Sonexion 3000 Components and Hardware List
6 Bug Fixes, Features, Improvements, and Known Issues for 2.1.0-002
7 Firmware
8 Notices and Precautions
1 About Sonexion™ 3000 Release Notes (2.1.0-002) S-2533
This guide includes information about bugs, features, and components in the Sonexion 3000 (2.1.0-002) release.
Release 2.1.0 SU 002
This is the initial release of this publication. This version includes updated information about Sonexion software release 2.1.0-002, released November 2016. This information pertains only to model Sonexion 3000, not to earlier models.
Table 1. Record of Revision
Publication Title: Sonexion™ 3000 Release Notes 2.1.0-002 S-2533
Date: November 2016
Updates: This is the original release of this document, released for 2.1.0 SU 002.
Scope and Audience
This publication is written for Cray personnel and users to familiarize them with this release and model. It does not include information about installation, repair, or day-to-day operation of a Sonexion 3000 system.
Feedback
Visit the Cray Publications Portal at http://pubs.cray.com and make comments online using the Contact Us button in the upper-right corner, or email [email protected]. Your comments are important to us and we will respond within 24 hours.
Typographic Conventions
Monospace          Indicates program code, reserved words, library functions, command-line prompts, screen output, file/path names, and other software constructs.
Monospaced Bold    Indicates commands that must be entered on a command line or in response to an interactive prompt.
Oblique or Italics Indicates user-supplied values in commands or syntax definitions.
Proportional Bold  Indicates a GUI window, GUI element, cascading menu (Ctrl→Alt→Delete), or keystrokes (press Enter).
\ (backslash)      At the end of a command line, indicates the Linux® shell line continuation character (lines joined by a backslash are parsed as a single line).
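For example, the following generic Linux command (an illustration only, not a command taken from this release) is parsed as a single command line because of the trailing backslash:

  tar -czf /tmp/logs.tar.gz \
      /var/log/messages /var/log/dmesg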
Trademarks
The following are trademarks of Cray Inc. and are registered in the United States and other countries: CRAY and design, SONEXION, URIKA, and YARCDATA. The following are trademarks of Cray Inc.: APPRENTICE2, CHAPEL, CLUSTER CONNECT, CRAYDOC, CRAYPAT, CRAYPORT, DATAWARP, ECOPHLEX, LIBSCI, NODEKARE. The following system family marks, and associated model number marks, are trademarks of Cray Inc.: CS, CX, XC, XE, XK, XMT, and XT. The registered trademark LINUX is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. Other trademarks used in this document are the property of their respective owners.
2 Sonexion 3000 Terms, Abbreviations, and Definitions

2U4N, 2U 4-Node Intel server
Generic term referring to the Intel 2U 4-Node server used for the Sonexion 2000 CMU and the CNG.
ADU, Additional DNE Unit
Deprecated term that refers to the additional MDS nodes and MDT storage supported by the Lustre DNE (Distributed Namespace) Phase 1 feature. For the 3000, this term is synonymous with the additional MMUs that may be optionally installed in up to 8 storage racks, 1 per rack.
Base MMU, Base Metadata Management Unit
The base MMU is the MMU that is always installed in the base rack. The base MMU provides two MDS nodes along with two MDTs. MDT0 functions as the default MDT and as the root MDT for DNE phase 1. MDT1 requires DNE in order to be utilized.
Base Rack
The first rack in a Sonexion storage cluster that contains the SMU and base MMU along with the rack networking infrastructure and from 1 to 6 SSUs.
CLI, Command Line Interface
A text-based interface that is used to operate software and operating systems.
CLP, Sonexion Linux Platform
Base OS used by all the rack components.
CNG, CIFS/NFS Gateway
2U4N server configured to export the Lustre file system to CIFS and NFS clients.
CMU, Cluster Management Unit
The Sonexion component that provides the physical deployment of the MDS, MGS, and MGMT server nodes and associated storage. This term is deprecated for the 3000 platform and is functionally equivalent to the combination of SMU and base MMU installed in the base rack.
Critical, Critical Array State
The state of a GridRAID or MDRAID array where the subsequent failure of 1 more storage component may lead to data loss.
CSI, Cray Sonexion Installer
Sonexion software used for manufacturing and installing Sonexion systems.
CSSM, Cray Sonexion System Manager
Sonexion platform, software, and hardware management system.
CSMS, Cray Sonexion Management Server
Sonexion MGMT node. The primary and secondary instances of the CSSM software and all associated components and services running on a server node in the CMU.
CMU Storage
Sonexion storage enclosure dedicated to the CMU. Deprecated for the Sonexion 3000, as the SMU and MMU have their own storage resources.
Data Block
A component of a “parity group” (or “stripe”) containing actual user data, also referred to as a “data chunk” or “data unit.”
Degraded, Degraded Array State
The state of a GridRAID or MDRAID array operating with one failed storage component.
Distributed Spare, Distributed Spare Volume
The aggregate collection of distributed spare data blocks in a GridRAID array that comprises a single logical spare volume for the specific GridRAID array that contains it. Each distributed spare contains the equivalent of one physical drive’s worth of distributed spare space and is used as the target of the GridRAID reconstruction process and the primary data source for the GridRAID rebalance process.
DMN, Dual LMN
Refers to the “Dual Local Management Networks” (also called “Dual Management Networks”) feature.
DNE, Distributed Namespace
Lustre DNE Phase 1 feature supported in Lustre 2.5 that allows multiple MDS/MDT components to operate within a single file system.
EAC, Embedded Application Controller
SBB form factor x86-based application controller that provides the CPU platform for code executing as part of the Sonexion file system cluster components.
EAN, External Administration Network
Customer administration network, external to the Sonexion solution. Connected to the CSMS nodes in order to provide access to the CSSM software.
ECN, Enterprise Client Network
Refers to the 10GbE or 40GbE data network connecting non-Lustre enterprise clients to the optional CIFS/NFS Gateway (CNG).
ESM, Embedded Server Module
Deprecated term for an Embedded Application Controller (EAC) because it implies general server functionality that is not supported on the dedicated Sonexion Embedded Application Controllers (EACs).
ESU, Expansion Storage Unit
A 5U84 storage enclosure with two SAS EBOD controllers installed in place of the EACs.
Expansion Rack
The additional racks (beyond the base rack) in a Sonexion storage cluster that contain the rack networking infrastructure and some number of SSUs. Sometimes called "storage rack."
Failed, Failed Array State
The state of a GridRAID or MDRAID array that has experienced data loss and has been failed by the system.
GB/sec, Gigabytes per Second
10^9 bytes per second
Gbit/sec, Gigabit per Second
10^9 bits per second
GbE, Gigabit Ethernet
Ethernet standard that transmits at 1 gigabit per second.
GridRAID
Sonexion implementation of parity declustered RAID. A RAID level organization that combines RAID 6 data protection with a declustering methodology. GridRAID overcomes single drive throughput bottlenecks by distributing parity groups and spare space across all storage components in an array.
ICL, Inter-Controller Link
A link that connects two controllers or two servers together. Used in Sonexion as a dedicated HA communication path.
ISL, Inter-Switch Link
A connection between two related switches.
KiB, Kibibyte
1024 bytes
LCN, Lustre Client Network
High speed data network connecting Lustre clients to the Sonexion Local Data Switches (LDS).
ldiskfs, Lustre Disk File System
Lustre version of a patched Ext4 file system.
LDN, Local Data Network
A dual InfiniBand or 40GbE network with switches installed in all racks, connecting all servers and enclosures as needed and used as uplink points to the end user client infrastructure.
LDS, Local Data Switch
A dual InfiniBand or 10GbE network switch installed in a Sonexion rack as part of the LDN and used for providing high speed data connectivity. Used as uplink points to the end user client infrastructure.
LMN, Local Management Network
A private 1GbE network connecting all Sonexion servers and enclosures.
LMS, Local Management Switch
A 1GbE switch installed in a Sonexion rack as part of the LMN and used for providing private management network connectivity for all Sonexion servers and enclosures.
Lustre®
Open source clustered file system trademarked by Xyratex/Seagate.
Lustre Servers
The set of Lustre servers that comprise the Lustre file system; includes the MGS, MDS, and multiple OSSes.
MDS, Metadata Server
Lustre server component that manages the Lustre file system metadata.
MDT, Metadata Target
Lustre component, a storage volume that holds the Lustre file system metadata.
MGMT, Management Server Node
One of two Sonexion management servers that provide management functions for the storage cluster.
MGMT0
The primary Sonexion management server, typically used for web access and SSH logins for managing the storage cluster.
MGMT1
The secondary management server, typically used to provide boot services to nodes in the storage cluster.
MGS, Management Server
Lustre server component that manages the Lustre MGT.
MGT, Management Target
Lustre component, the storage volume holding the Lustre file system management data that allows clients to discover, mount, and operate the file system.
MMU, Metadata Management Unit
A 2U24 enclosure with two EACs and associated storage that provides dual MDS nodes and dual MDTs. Lustre requires the use of the built-in DNE phase 1 feature in order to make use of multiple MDTs and multiple concurrent MDSes.
NIS, Network Information Service
Maintains and distributes a central directory of user and group information in a network.
Normal, Normal Array Activity
Characterizes the activity of a GridRAID or MDRAID array that is engaged in processing I/O only and is not conducting any recovery, sync, or RAID checking activities.
Offline, Array Is Offline
The array is not available.
Optimal, Optimal Array State
The state of a GridRAID or MDRAID array where all drives in the array are operational without the involvement of spare volumes or dedicated hot spares. For GridRAID this is equivalent to the “Redundant 0/2” terminology.
OSS, Object Storage Server
Lustre server component that operates and manages the Lustre OSTs.
OST, Object Storage Target
Lustre component, a storage volume that holds Lustre file system data.
Parity Block
Component of a parity group that contains protection information for the group derived from the set of data blocks in the parity group. Also referred to as a “parity chunk” or “parity unit.”
Parity Group
The set of “data blocks” and derivative “parity blocks” that together comprise a protected data set. Also referred to as a “stripe.”
RAID Check, RAID Consistency Check
The process whereby the system periodically checks that the parity information is consistent for every “parity group” (stripe) in the array. This process is sometimes referred to as “parity scrubbing.”
RAS System, Reliability, Availability, Serviceability System
Sonexion feature providing system RAS features.
Rebalance, Rebalance Process
Phase 2 of the 2-phase GridRAID recovery process whereby a GridRAID array essentially copies reconstructed data from a distributed spare volume in the array to a physical replacement drive, freeing the distributed spare volume when complete for future reuse.
Rebalancing, Rebalancing Array Activity
Characterizes the activity of a GridRAID array that is engaged in the rebalance phase of the recovery process.
Reconstructing, Reconstructing Array Activity
Characterizes the activity of a GridRAID array that is engaged in the reconstruction phase of the recovery process.
Reconstruction, Reconstruction Process
Phase 1 of the 2-phase GridRAID recovery process whereby a GridRAID array reconstructs the data from a missing storage component onto one of the distributed spare volumes.
Recovering, Recovering Array Activity
Characterizes the activity of a GridRAID or MDRAID array that is engaged in the recovery process.
Recovery, Recovery Process
The process whereby a GridRAID or MDRAID array recovers from a storage component failure (see the illustrative command example at the end of this section).
Rebuild, Rebuild Process
The single-phase recovery process whereby an MDRAID array reconstructs data for a failed drive and copies it to a dedicated replacement drive.
SED, Self-Encrypted Drive
A disk drive that automatically encrypts/decrypts data to/from the media.
SMU, System Management Unit
A 2U24 enclosure with dual EACs that provides two MGMT nodes and associated storage. There is always only one SMU in a Sonexion file system cluster, and it is always installed in the base rack. In conjunction with the base MMU, the SMU replaces the functionality of the earlier Sonexion CMU component.
Spare Volume, GridRAID Spare Volume or Distributed Spare Volume
The aggregation of the equivalent of one drive's worth of distributed spare space considered collectively as a logical spare drive or volume and used as the target of the GridRAID repair operation.
SSU, Scalable Storage Unit
A 5U84 storage enclosure and dual EACs (Embedded Application Controllers) that provides dual OSSes and associated storage.
SSU Addition
Refers to the process of increasing the storage capacity of a Sonexion file system by incorporating additional SSUs into the cluster.
SSU Expansion
Refers to the attachment of an ESU to each SSU, thus increasing the amount of storage managed by each SSU.
Storage Component
Refers to an individual drive when considered as part of a configured GridRAID or MDRAID array.
Storage Rack
See "Expansion Rack."
TB, Terabyte
10^12 bytes
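The GridRAID and MDRAID array states and activities defined above are reported through the standard Linux MD layer. As an illustration only (not a documented Sonexion procedure; the device name md0 is an example), the current state and activity of an MDRAID array can be inspected on a server node as follows:

  # Summary of all MD arrays and any check/recovery activity in progress
  cat /proc/mdstat
  # Per-array details: overall state, current sync action, and degraded flag
  cat /sys/block/md0/md/array_state
  cat /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/degraded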
3 Software Versions and Requirements

This section provides information about the environment and software required for the Sonexion 2.1.0-002 software release.
Cray Sonexion System Manager (CSSM) Version
Current Revision: CSSM 2.1.0 Build v2.1.0-r29315, 2016-06-30
Firmware (SMU/MMU/SSU): GOBI OneStor USM STX_GOBI_R1.16
Firmware (ESU): USM r4.1.16
Sonexion 3000: yes
Lustre Server (x86_64 Architecture)
Current Version:
  Operating System: Scientific Linux 6.5
  Kernel: 2.6.32-431.17.1.x2.1.32.x86_64
  File System: lustre-2.5.1.x7-241_2.6.32_431.17.1.x2.1.32.x86_64_g541638b
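As an illustrative cross-check only (assuming shell access to a Lustre server node; this is not a documented Sonexion procedure), the running kernel and installed Lustre server version can be compared against the versions listed above:

  # Running kernel version
  uname -r
  # Installed Lustre server packages
  rpm -qa | grep -i lustre
  # Lustre version reported by the Lustre control utility
  lctl get_param version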
Required Customer-Supplied Network Infrastructure*
DHCP Server  Provides the MGMT nodes’ IP addresses for browser connections (the customer can choose to use a static IP address configuration for the “public” interfaces on the MGMT nodes).
NTP Server   Synchronizes clocks across the cluster’s nodes.
DNS Server   Resolves LDAP and NTP hostnames on the MGMT nodes.

* Manual workarounds may be available for environments without these servers. Contact your support representative for more information.
To view Lustre performance information, port 3306 must be open between the browser and the server hosting CSSM (GUI).
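A minimal way to verify the port requirement from the browser host is shown below; this is an illustration only, and the hostname mgmt0.example.com is a placeholder for the actual MGMT node running CSSM:

  # Check that TCP port 3306 on the CSSM host is reachable from the browser host
  nc -zv mgmt0.example.com 3306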
4 What Is Supported in Sonexion 2.1.0

Qualified Functionality
● Installation and deployment of Sonexion 3000 systems with the following SSU configurations:
○ Single or Multi-SSU (SSU Only)
○ SSU + Single ESU (SSU+1)
● High Availability:
○ SSU Node Failover/Failback
○ SMU Node Failover/Failback
○ MMU Node Failover/Failback
○ Dual Management Network Switch Redundancy (DMN)
○ Dual PDU Redundancy
● Lustre 2.5.1: Lustre Performance Monitoring of LMT
● Kernel/OS: Scientific Linux (SL) 6.5 / OS 6.2
● CSCLI - Sonexion Command-Line Interface
● CSSM - Sonexion System Manager
● Chrome, Firefox, Safari, and Internet Explorer 11 browsers for Windows, Linux, and MacOS
● Support File Bundle Collection
● High Speed Interface: Mellanox CX-4 - EDR / FDR HCA
● RAID Stack: Updates to optimize GridRAID and SCSI performance.
● Drives: All 4K Native, T10-PI Format Type2
○ Seagate Thunderbolt 10K 900GB
○ Seagate Valkyrie 15K, 300GB
○ Seagate Tardis (HPC) 10K, 4TB (SED)
○ Seagate Makara 4TB, 6TB (SED)
○ Seagate Makara+ 8TB (SED)
○ Seagate Gibson SSD 800GB (SED)
● FRU Replacements
● RAS (Reliability, Availability, Serviceability)
○ RAS Infrastructure: CLI, Nagios and Ganglia plugins, REST API
○ Guided Walkthrough Repairs: 2U24 / 4U24 / 5U84 Drives, 2U24 / 4U24 PCMs, 2U Quad Server PSUs
○ Fault Isolation: 5U84 cooling module, 2U24 / 5U84 I/O controllers
○ Telemetry: service events, IEMs (Interesting Event Messages), inventory snapshots
● Extra MMU (ADU/DNE) Addition Procedure
● SSU Addition Procedure (SSU Only and SSU+1)
New Functionality
● Introduction of split SMU/MMU architecture that combines to make a CMU (see Sonexion 3000 Hardware Guide H-6144)
● Introduction of Mellanox IB EDR capabilities
● Introduction of next generation AP controller platform (Laguna Seca) using Intel Haswell CPUs
● Support for Mellanox SB7790 EDR Switches
● Support for Mellanox CX-4 HCAs
● Support for 4K Native and SED Hard Drives
● Support for 10K RPM HPC Drive
● Support for GOBI OneStor USM STX_GOBI_R1.16
NOT Supported in Sonexion 2.1.0
● CNG
● 40Gb Ethernet Data Fabric
● Intel Omni-Path Data Fabric
5 Sonexion 3000 Components and Hardware List

Sonexion 3000 is a next-generation HPC storage platform that delivers industry-leading performance and durability using a 12Gb SAS architecture and the Intel Grantley/Haswell platform. The Sonexion 3000 platform builds upon Sonexion’s history of HPC excellence by offering substantial upgrades and enhancements to system components and hardware. For a more thorough discussion of this hardware, please see the Sonexion 3000 Hardware Guide H-6144.

CMU
The re-engineered CMU consists of two sub-components: the System Management Unit (SMU) and Metadata Management Unit (MMU), housed in separate 2U24 enclosures, which replace the Intel quad server and adjacent EBOD from the previous models.
SMU -- dual MGMT nodes in an HA pair:
● 2U24 enclosure
○ Dual PSUs
● Dual basic EACs
● 12 drives
○ 7 x Thunderbolt 10K HDDs (900 GB 2.5-inch)
○ 5 x Valkyrie 15K HDDs (300 GB 2.5-inch)
MMU -- dual MDS nodes in an HA pair:
● 2U24 enclosure
○ Dual PSUs
● Dual standard EACs
● 22 drives
○ 22 x Thunderbolt 10K HDDs (900 GB 2.5-inch)
SSU
Each SSU hosts dual OSS nodes in an HA pair.
● 5U84 G2 enclosure
○ Dual PSUs
● Dual standard EACs
● 84 drives
○ 82 x HDDs (SAS 3.5-inch)
○ 2 x SSDs (SAS 2.5 inch)
ESU
The ESU uses the 5U84 G2 enclosure with 6Gb EBOD controllers and high capacity HDDs.
● 5U84 G2 enclosure
○ Dual PSUs
● 6Gb EBOD controllers
● 82 HDDs (SAS 3.5 inch)
Management Switches
Dual Brocade ICX6610 switches are used for the management network (LMN).
● Base rack
○ Dual Brocade ICX6610 switches (24-port or 48-port, 1 GbE)
● Expansion rack
○ Dual Brocade ICX6610 switches (24-port, 1 GbE)
Network Switches
Dual Mellanox SB7790 EDR switches are used for the Lustre client network (LCN).
● Dual Mellanox SB7790 EDR (36-port, 100Gb InfiniBand)
5U84 G2
The re-engineered 5U84 enclosure (5U84 G2) offers the following features:
● Enhanced LED display
● Improved drawer release
● Redesigned side card cover
● Improved sensor placement
EACs
Two EACs, basic and standard, are supported:
● Basic EAC
○ 64 GB DRAM
○ E5-2609 v3 CPU
○ 12Gb SAS controller
○ Dual 128GB SSDs
○ FDR IB
Figure 1. EAC for SMU Nodes (Basic EAC)
● Standard EAC
○ 64 GB DRAM
○ E5-2618L v3 CPU
○ 12Gb SAS controller
○ Single 128GB SSD
○ EDR IB
○ 12 Gb SAS card
Figure 2. EAC for MMU and SSU (Standard EAC)
6 Bug Fixes, Features, Improvements, and Known Issues for 2.1.0-002

This section lists the bug fixes, features, improvements, and known issues for 2.1.0-002 at the time of this writing.

Bug Fixes
799286 R35 SSU6 Disk Monitor reports SSDs as Hot Spares
808825 Heartbeat loss: CPUs executing ldlm_bl tasks on behalf of drop_caches
810767 Multiple Deactivated and Nearly Full OSTs Resulted in High OSS Node Load Averages and Unusability of FS
814266 Pool modification commands
815953 1.5.0 Qualification - When removing a controller fail-over does not occur.
816617 crashed with LBUG on ASSERTION( lock != NULL )
820179 CSSM warns of critical firmware issue
820215 Multiple Nodes Powered Off
820639 SMP after perf of mds-survey was backgrounded
821046 Pool modification commands
821304 Slow raid rebuilds on MDRAID
821657 SU 17 upgrade problems
821763 powered down
821931 Sonexion 1.5 Control+c "exits" instead of "interrupts"
822520 n002 failed over to n003 - LBUG: (ldlm_flock.c:849:ldlm_export_flock_put()) ASSERTION( flock->blocking_export != ((void *)0) ) failed
822661 t0db database issues after 3 SSU add
822717 App hung in cl_sync_io_wait; bulk io rpcs stuck in unregistering phase
823580 The spacing in the output from "cscli show_nodes" is not correct on the Sonexion 900.
823581 Error messages when beSystemNetConfig script was run on 1.3.1 system
823633 stonith and failover - HA Timeout
823922 MDS stalls, cpu soft lockup while running mdtest
824493 mds failovers with kdump during purie rhine rel-runs
824993 Unable to mount Filesystem from the Cray
825073 During 2.0 SU-06 install, 3 OSS nodes went down, install hung in beFreezeHA
825622 Unable to power up/communicate with SSU node
825638 Services in "pending" status in CSSM
825854 Sonexion cscli errors
826067 failover following disk failure and md0 had to be manually assembled
826087 Seagate SSDs too small to replace Hitachi SSDs
826317 ASSERTION( lock->l_export == opd->opd_exp ) failed:
826698 RAID Check Disabled - getting ***Error: I/O timeouts*** on multiple nodes
826806 8 Disk(s) failure(s) Slot 70 SSD I/O errors
826856 S7 - n007 & n006 panicked and md3 resources fail to start
827614 twistd memory usage (snx11128n000)
827656 db.py, inventory.py: ERROR updating t0db database error after drive replacement
827730 1.5.0 Upgrade beUpdatePuppet failure
827828 powered off and failed to fail-over
828105 Numerous OSTs fail to run monthly raid-check
828474 NEO 1.5 Upgrade Planning / Resource Request
828499 Sonexion fails to persist CLOSE event in changelog mask after unmount
828609 MMU and ADU/DNE drives replaced with spare drive, but spare was marked as failed as soon as it was inserted.
828782 Nodes show up as unknown
828958 Sonexion node falsely reports as non-responsive
829002 MDT corruption on the main Lustre filesystem
829283 stopping raid-check causes multiple nodes to go down, full power cycle needed to recover f/s
829453 Sonexion fails to reinstall replaced node n030
829576 filesystem inconsistency
829750 8 failed slots, OST not starting
829787 node fails to kdump after lbug crash
830030 SWO - Changelog index count
830925 failover testing failure
831490 LBUG/ASSERTION "fid_is_sane(&md.body->fid1) ) failed:"
831540 mds, mgs crashes, ldiskfs panic following "ldiskfs_xattr_inode_iget: error while reading EA inode"
831793 Kernel panic, Failover failed
832154 frequent sas driver messages
832511 Ping rpc hung in unregistering phase with rq_receiving_reply set
832809 Lustre not starting, resource failed actions "not installed"
833268 Failback taking a lot longer under heavy load.
833608 Staging a directory sometimes results in zero length files
834135 hlus01 - n018 - "CRITICAL:device OSS reported unhealthy"
834414 ASSERTION( get_current()->journal_info == ((void *)0) ) failed
834486 fs down after MDT crash "ldiskfs_xattr_inode_iget: Backpointer from EA inode 2300579986 to parent invalid."
834793 n009 failed over to n008 for unknown reason, failover was successful
834796 mds controller downed.
834805 MDS node crashed
834945 n211 failed to failover to n210
835090 soft lockup on MDS lead to fs failure, MDS and MGS nodes both down
835240 Application failure due to client eviction
835282 S072 - n007 Kernel panic - not syncing: LDISKFS-fs (device md3)
835444 LBUG: (osc_page.c:333:osc_page_delete()) Trying to teardown failed: -16; ASSERTION( 0 )
835485 Sonexion stripe 8 files missing OST parts
835883 Client evicted from expired blocking callback timer
836705 failback resulted in _md66-fsys (ocf::heartbeat:XYMNTR)
838390 SU 15 install
838584 client crash in osc_cache.c:3107:discard_cb() LBUG after OST failover or failback
838602 single-shared file IOR jobs hang during OST failover/failback
838832 IOR jobs fail during IB cable pull test
839072 IOR data compare error during OST failover test
839147 Snx 1.3.1 SU25 caused serious upgrade delay due to incompatible DB entries
839275 Lustre recovery issues on OST failback, recovery sometimes hits hard limit
839678 2.0SU19 cscli fs_info and show_nodes not working
839743 IOR fails w/data check errors
840345 md66-fsys unmount hang during mdt failback
841983 stonithed for HA failure in mdadm_conf_regenerate
842237 cscli failover triggers STONITH
826317 OSS node crashed with assertion failure: ASSERTION( lock->l_export == opd->opd_exp ) failed
831827, 832154 Multiple Instances of nodes being unexpectedly powered off
832809 Add error message to catch the situation where drives are swapped between two enclosures.
838602 ldlm: lost BL AST during failover
840984 We need to validate all RPM packages from both base and SU repos before installation of SU to avoid issues described in CSLTR-6550. Additionally we need to add a check for successful diskless image creation.
Seagate Internal Unable to see failed disk slot location in GUI
Seagate Internal During Resiliency Testing, pulled 2 drives and pdrepair did not start
Seagate Internal 2.0 SU-11.68, errors from post-install step
Seagate Internal Max write performance sometimes requires re-reading block bitmaps into memory
Seagate Internal support bundle collection hanging
Seagate Internal log rotate not running correctly on 1.2 system
Seagate Internal extraneous node entries in t0db netdev table caused beSystemNetConfig to fail
Seagate Internal CSSM did not allow configuration of LDAP via Configuration tab
Seagate Internal su-1.3.1-023.87 exposes CLSTR-4175 on systems upgraded from NEO 1.2.x
Seagate Internal 2.0 SU11-61 node powered off during fail over testing.
Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing
Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing
Seagate Internal 3 down OSS nodes around the time of n000 failure, OST resources being stopped/restarted
Seagate Internal 2 MDS nodes crash on DNE system while running beSystemNetConfig script, "osp_obd_connect()) ASSERTION( osp->opd_connects == 1 ) failed"
Seagate Internal 2.0 SU11.6X GUI and CSCLI not reporting lustre status correctly
Seagate Internal MGS failover nid problems on 2.0 systems
Seagate Internal file per process IOR jobs fail w/short write during failover/failback testing
Seagate Internal MGS n002 node stonithed when setting lustre parameter
Seagate Internal CLONE - manual MDS failover on 180-ssu fs failed this morning; admin intervention required to complete failover. Timeout in start_xyraid too short
Seagate Internal beSystemNetConfig.sh should avoid erase_params if possible, or should warn about params erased
Seagate Internal 2.0 SU11.6X GUI and CSCLI not reporting lustre status correctly
Seagate Internal cscli show_nodes shows inaccurate target status
Seagate Internal Very slow to read file inode. `ls` shows file metadata with '?' marks.
Seagate Internal Unable to mount lustre clients, even from n01, during installation.
Seagate Internal 3 disks on single md failed during RAID check ... node stonithed, md4 didn't start on the partner node
Seagate Internal S005 - cron.hourly issues - MySQL /var/lib/mysql/mysql.sock
Seagate Internal stonith problems, stonith of MDS fails (after MDS crash that failed to cleanly panic)
Seagate Internal Remove ‘lctl notransno’ and ‘lctl readonly’ commands from the XYMNTR stop operation.
Seagate Internal In order to support the switch in image handling in SU, Trinity code has to be updated to handle the new versioning schema - currently it relies only on the cs-release package version instead of considering the SU version to
Seagate Internal Generate special lustre_config for Sonexion 3000 case. Incorrect lustre configuration of primary MGS node is fixed.
Seagate Internal The problem is that active targets are updated based on the resources active on that node, not considering the primary roles of the node. Fixed that part.
Seagate Internal tests: conflicting locks are not flushed properly
Seagate Internal llite: Lustre I/O hung waiting for page
Seagate Internal “MRP-3603 osp: wakeup osp_precreate_reserve on umount”
Seagate Internal ofd: handle last_rcvd file can’t update properly
Seagate Internal tests: race MDT->OST reconnection with create
Seagate Internal llite: add forgotten copy_from/to_user
Seagate Internal Need to enforce that management node arrays are comprised of uniform drive types.
Seagate Internal plex service restarts on active management node due to exceeding memory usage threshold allocated to a single service on management node.
Seagate Internal Due to fault in puppet manifest regeneration logic in SU script it can throw false errors that can be interpreted by user as actual issues.
Seagate Internal when applying SUs the Pacemaker config is not updated, so all updates related to adjustments in HA (e.g. ustonith..) are not in effect.
Seagate Internal Updates between different builds of the same SU version do not work
Seagate Internal tests: In interop, ensure to save/restore correct debug flags
Seagate Internal tests: lnet-selftest Error inserting modules
Seagate Internal Modified ll_find_alias to avoid cache corruption
Seagate Internal ldlm: fixing a server crash with ASSERTION(flock->blocking_export != 0)) failure
Seagate Internal test: wait on MDS for ost-pool proc entry to update
Seagate Internal scrub: NOT assign LMA for EA inode
Seagate Internal osd: Add nodelalloc to ldisk mount options
Seagate Internal tests: customise ior, simul cmds and MPIRUN
Seagate Internal osd-ldiskfs: pass uid/gid/xtime directly to ldiskfs
Seagate Internal Port LU-7130 changes b_neo_stable_2.x
Seagate Internal nrs: add lock to protect TBF rule linkage
Seagate Internal tests: skip several tests for CLIENTONLY mode
Seagate Internal test: racer on NFS
Seagate Internal ldlm: soft lockup in ldlm_plain_compat_queue
Seagate Internal Remove force option from XYMNTR Lustre lazy umount path
Seagate Internal MGMT framework fails to restart successfully on active management node when memory usage threshold allocated to a single service on management node has been exceeded.
Seagate Internal Failed Laguna Seca not resolving once it was repaired when the firmware alert is unknown.
Seagate Internal MRPD Collection failing with permission denied
Seagate Internal Due to having two MDTs (combined with MGS and a separate one) on Laguna Seca systems, the Lustre upcall may be set incorrectly
Seagate Internal Increase timeout for crash memory dumping procedure
Seagate Internal Repetitive error messages while communicating with GEM are suppressed appropriately.
Seagate Internal DDUMP monitoring not able to clear statesave bit in ses_page 2, triggering DDUMP collection frequently.
Seagate Internal Mgmt node should not have lustre status as Started. It should be N/a. Added hotfix to not check lustre status for mgmt nodes.
Seagate Internal ldlm: Wrong evict during failover
Features
797089 Monitor/measure PDU power consumption
793717 Sonexion - heartbeat is insufficient to detect MDS failure
793269 Access LMT data on Sonexion filesystem
793583 SNMP Monitoring
800652 Enable Nagios Notifications
Improvements
833368 Many OSS nodes will not come online after SU10 and FW updates
831400 ldap does not appear to be functioning on the sonexion, user can not create files
Seagate Internal ADU add failing after usm 3.26 upgrade
822718 CSSM GUI does not allow failover/failback of MGS (n002) to its partner
828842 Disk watcher daemon KILLDRIVE alert trigger under high IO load
833048 OST down after drive problem; OSS stonith'd, OST fails to start on partner
826047 S14 slow array/disk errors.
818646 raid-check and rebuilds should have minimal impact on production jobs
827265 SWO due to multiple drives going offline following sled reset
834222 9 disk failed --- OST down
835920 reports several disk disabled and n041 does not mount all of the disks
835764 several disk failures & OST failed-over to n224.
832398 disk problems, failover, n250 couldn't reassemble disk.
836071 lost disks due to a raid disassemble
825255 OSS nodes is down:
834918 powered down.
827027 monitor timeout, node stonithed
827028 monitor timeout, node stonithed
Seagate Internal MDS nodes stopped
Seagate Internal MDS nodes stopped
Seagate Internal 2.0 SU10 pm -q not working as admin
827716 After disk failure during raidcheck, sync_min is left at a non-zero value
835952 SWO Lustre - FS offline after split brain resulted in double mount
830809 OST should not be allowed to failback to node without infiniband connectivity
838419 stonithed after HA timeout - prm-snmp-heartbeat:0_monitor_10000 Timed Out
819194 Disk Watcher Daemon interrogating all disks
833048 OST down after drive problem; OSS stonith'd, OST fails to start on partner
826047 S14 slow array/disk errors.
828842 Disk watcher daemon KILLDRIVE alert trigger under high IO load
823922 MDS stalls, cpu soft lockup while running mdtest
Seagate Internal Documented procedure to backup, drop, and recreate the t0db, mysql and LMT databases.
813897 Puppet is not starting after cold boot of snx11003 and LMT database corruption
827425 Change the behavior of Lustre changelogs so that a client changelog config problem (or some other unexpected client issue) cannot take out the file system
Seagate Internal 1.2.1 install, mds start hangs during beSystemNetConfig.sh
Seagate Internal tests: customise the list of loads
Seagate Internal Preserve timestamps in the Ganglia plugin when streaming data to the Ganglia server. This ensures accuracy of plotted data in ganglia-web.
Known Issues

DOC-1323: Sonexion 3000 - Daily Mode CLI commands list from online help is out of date and needs to be updated. Workaround: No workaround at this time.
FMW-18954: After BIOS update of GOBI, node does not always boot up correctly. Workaround: No workaround at this time.
MRP-3515: I/O from all clients is halted when one client loses power. Workaround: I/O resumes after approximately half an hour, dependent on the workload prior to the event.
MRP-3559: 2.1 Aero RC13 LustreError general protection fault panics. Workaround: No workaround at this time.
NEO-2690: On Sonexion 3000 platform, controllers will power up automatically when power is applied. Workaround: No workaround at this time.
NEO-2715: Intermittently, after mgmt node(s) come back up after being shut down, xybridge does not start up correctly on both nodes. Workaround: No workaround at this time.
NEO-2782: Firmware for Thunderbolt and Valkyrie HDDs is missing in 2.1 release. Workaround: No workaround at this time.
NEO-2789: MDS and MGS nodes died while doing failover of MGS node. Workaround: No workaround at this time.
NSIT-12: Drives accessible via left side expander may intermittently disappear from the system. Root cause is under investigation. Workaround: No workaround at this time.
NSIT-17: "Verify" capability is missing from GOBI usmtool in Sonexion 2.1. Workaround: No workaround at this time.
OSG-1773: Can't boot with Live key on Sonexion 3000 system. Likely problem with drive setup mismatch between BIOS and kickstart options. Workaround: No workaround at this time.
OSG-1850: xybridge link down on secondary mgmt node after install. Possible mismatch with enclosure firmware and driver. Workaround: No workaround at this time.
OSG-1852: Both mgmt nodes went down during FOFB. Unknown cause at this time. Workaround: No workaround at this time.
OSG-1937: Watchdog timer is too short for kdumps to complete. Workaround: No workaround at this time.
OSG-1946: Can't set LDAP after RC18 install. TRT-4571. Workaround: No workaround at this time.
OSG-1947: Gemhpi and ses_monitor spinning on unresponsive enclosure. Workaround: No workaround at this time.
OSG-1950: /tmp disk full preventing Lustre from mounting. Workaround: No workaround at this time.
OSG-1957: There are two variations of HDDs within the SMU enclosure (300GB 15K RPM and 900GB 10K RPM). The RAID arrays created during installation may not have respected the differences of the HDDs and may have constructed RAID arrays containing HDDs from both drive variants. Physical Impact: SMU RAID arrays consist of mixed drive capacities/variants and, as such, the size of the array is based on the lowest capacity point present in the array (RAID10 = 600GB instead of 1.8TB / RAID1 = 300GB instead of 900GB). Functional Impact: SMU RAID array sizes are lower than detailed in the Architectural Specification; for large systems (~50+ nodes) this may have an impact by consuming all available storage prior to the clean-up operation, while for small systems (~<50 nodes) there should be little impact, as the storage is cleaned regularly. RAS Impact: No impact. Corrective Action: OEM re-install of 2.1-GA-RMJT or utilize Seagate script / procedure available via Seagate FAEs. Workaround: No workaround at this time.
OSG-1961: Watchdog timer is too short for vmcore dumps to complete. Workaround: No workaround at this time.
OSG-1978: Rebuild failed to start on md64 when drive was failed. Workaround: No workaround at this time.
RAMA-907: MGS node shows green in GUI heatmap when Lustre not started. Workaround: No workaround at this time.
RAMA-908: Dashboard shows downed nodes as green on heatmap. Workaround: No workaround at this time.
RAS-473: RAS treats the SBB FRU Health status 'unknown' value inconsistently. Workaround: No workaround at this time.
RAS-484: No alert notification from RAS after triggering an uncorrectable ECC error on LS. Workaround: No workaround at this time.
SCRUF-1348: Nodes n02 and n03 are physically swapped, contradicting the Architectural Specification, which details n02 as the top node and n03 as the bottom node within the MMU enclosure. Physical Impact: MMU nodes will contradict the Sonexion 3000 Architectural Specification, with n02 being the bottom physical node and n03 being the top physical node. Functional Impact: System will continue to operate as intended with no functional impact on performance or operation. RAS Impact: RAS will continue to be functional and identify the correct node, in the case of a failure, if the nodes were identified as follows at the point of OEM installation. Top Node: purpose: [mgs=primary, mds=secondary]. Bottom Node: purpose: [mds=primary, mgs=secondary]. Corrective Action: OEM re-install of 2.1-GA-RMJT or utilize Seagate script / procedure available via Seagate FAEs. Workaround: No workaround at this time.
SCRUF-1369: ipmi-stonith is not being configured to use ipmi-sec on L300. Workaround: No workaround at this time.
SCRUF-1371: SU-002.13 script doesn't update mgmt HA config. Workaround: No workaround at this time.
TRT-4361: Active/Active MDT are not properly configured on Sonexion 3000. Physical Impact: Sonexion 3000 MDT not initialized as active/active. Functional Impact: Sonexion 3000 MDT not initialized as active/active. RAS Impact: No impact. Corrective Action: OEM re-install of 2.1-GA-RMJT or utilize Seagate script / procedure available via Seagate FAEs. Workaround: No workaround at this time.
TRT-4398: When accepting LNET routing files from the user, the DOS file format of CR and LF is not handled and gives an error. Workaround: Replace the CR/LF line endings with only LF using an external utility (see the example following this list).
TRT-4416: ses_monitor.py displaying "Possible bad or baulky drive" messages in logs. Workaround: No workaround at this time.
TRT-4534: In certain scenarios, while doing repeated failover and failback of an ADU node, cscli failback -n on the ADU may not work as expected. Workaround: No workaround at this time.
TRT-4545: While doing OEM install on a Sonexion 3000 system, intermittently the installation screen will not show the status bar for all nodes. Refreshing the browser tends to make the status bar accurate. Workaround: No workaround at this time.
TRT-4631: Upload ssl certificate results in JAVA error: Invalid process stage: expected 49, actual 50. Workaround: No workaround at this time.
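For TRT-4398, the following is a minimal illustration of stripping DOS carriage returns from an LNET routing file before it is uploaded; the file name lnet_routes.conf is a placeholder:

  # Convert CRLF (DOS) line endings to LF (Unix) in place
  sed -i 's/\r$//' lnet_routes.conf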
7 Firmware

This section specifies component firmware qualified for the Sonexion 2.1.0-002 release.

SSU Firmware Versions
This table lists qualified versions of firmware sub-components for the 5U84 G2 storage system released under GOBI OneStor USM STX_GOBI_R1.16 for the SSU enclosure.

SSU Component Firmware Sub-Component Version Number
BMC Firmware: 0.01.0013
CPLD Firmware: 0.03.0004
BIOS Firmware: 0.02.0024
GEM
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
ConfigCRC 0x00000000
VPD structure 0x06
VPD CRC 0xD7CA1702
Eth Switch EEPROM CRC 0x45CD694A
GEMSat
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
Bootloader 1.00
ConfigCRC Not present
VPD structure 0x06
VPD CRC 0x992781BB
CPLD 2.1
Midplane
CPLD 0x13
VPD structure 0x0C
VPD CRC 0x1E7457CB
PCM1
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
PCM2
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
Fan Controller 1
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 2
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 3
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 4
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 5
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Sideplane
Element0 Firmware: 4.0.0.67|BL=6.10|FC=0x9E928EC0|VR=0x06|VC=0x699F059B|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032
Element1 Firmware: 4.0.0.67|BL=6.10|FC=0xF3F1DF4D|VR=0x06|VC=0x42845E7B|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032
Element2 Firmware: 4.0.0.67|BL=6.10|FC=0xD82BA4C7|VR=0x06|VC=0x25D4B564|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032
Element3 Firmware: 4.0.0.67|BL=6.10|FC=0x1709EDAC|VR=0x06|VC=0xAC2E8A42|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032
SMU Firmware Versions
This table lists qualified versions of firmware sub-components for the 2U24 storage system under GOBI OneStor USM STX_GOBI_R1.16 for the SMU enclosure, when the product leaves the factory.
SMU Component Firmware Sub-Component Version Number
BMC Firmware: 0.01.0013
CPLD Firmware: 0.03.0004
BIOS Firmware: 0.02.0024
GEM
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
ConfigCRC 0x00000000
VPD structure 0x06
VPD CRC 0xD7CA1702
Eth Switch EEPROM CRC 0x45CD694A
GEMSat
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
Bootloader 1.00
ConfigCRC Not present
VPD structure 0x06
VPD CRC 0x992781BB
CPLD 2.1
Midplane
CPLD 0x13
VPD structure 0x0C
VPD CRC 0x1E7457CB
PCM1
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
PCM2
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
MMU Firmware Versions
This table lists qualified versions of firmware sub-components for the MMU enclosures, when the product leaves the factory.
MMU Component Firmware Sub-Component Version Number
BMC Firmware: 0.01.0013
CPLD Firmware: 0.03.0004
BIOS Firmware: 0.02.0024
GEM
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
ConfigCRC 0x00000000
VPD structure 0x06
VPD CRC 0xD7CA1702
Eth Switch EEPROM CRC 0x45CD694A
GEMSat
Firmware 4.3.1.19
Firmware date Mar 22 2016 18:30:05
Bootloader 1.00
ConfigCRC Not present
VPD structure 0x06
VPD CRC 0x992781BB
CPLD 2.1
Midplane
CPLD 0x13
VPD structure 0x0C
VPD CRC 0x1E7457CB
PCM1
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
PCM2
Firmware 1.00 | 1.05
VPD structure 0x05
VPD CRC 0x41BEF99C
ESU Firmware Versions

ESU Component Firmware Sub-Component Version Number
EBOD
Firmware 4.0.0.75
Bootloader 5.07
VPD Structure 0x06
VPD CRC 0xB8D3D512
ConfigCRC 0x5BD2C2E8
GEM CPLD 0x14
Power CPLD 0x00176CF8
Midplane
CPLD 0x03
VPD structure 0x10
VPD CRC 0x7BE4F602
PCM1
Firmware 2.29|2.1E|2.00
VPD structure 0x03
VPD CRC 0x486003DF
PCM2
Firmware 2.29|2.1E|2.00
VPD structure 0x03
VPD CRC 0x486003DF
Fan Controller 1
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 2
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 3
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 4
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Fan Controller 5
Device FW 01.0F
VPD version 0x05
Config 0x636B4986
Sideplane
Element0 Firmware: 4.0.0.75|BL=0610|FC=0x9E928EC0|VR=0x06|VC=0x699F059B|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032
Element1 Firmware: 4.0.0.75|BL=0610|FC=0xF3F1DF4D|VR=0x06|VC=0x42845E7B|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032
Element2 Firmware: 4.0.0.75|BL=0610|FC=0xF2F500C7|VR=0x06|VC=0x25D4B564|CR=0x12|PC=N/A|EV=0x80040002|SV=3.06-B032
Element3 Firmware: 4.0.0.75|BL=0610|FC=0x4E77DD42|VR=0x06|VC=0xAC2E8A42|CR=0x12|PC=N/A|EV=0x80050002|SV=3.06-B032
Rack Component Firmware Versions
This table lists qualified versions of firmware for the Sonexion 3000 rack components (switches), when the product leaves the factory.

Rack Component Version Number
Mellanox SB7790 EDR (36-port IB) 11.0300.0354
Brocade ICX-6610-24 (24-port) 08.0.30
Brocade ICX-6610-48 (48-port) 08.0.30
Disk Drive Firmware Versions
This table lists qualified versions of disk drive firmware, when the product leaves the factory.

Drive Model Firmware Version
Seagate 300GB HDD (ST300MP0065) [SMU/MMU 2U24] K003
Seagate 900GB HDD (ST900MM0008) [SMU/MMU 2U24] K002
Seagate 800GB SSD (ST800FM0053) [SSU] XGEG
Seagate 4TB HDD (ST4000NM0074) [SSU & ESU] KT05
Seagate 4TB HPC HDD (ST4000NM0031) [SSU & ESU] KTF2
Seagate 6TB HDD (ST6000NM0074) [SSU & ESU] KT05
Seagate 8TB HDD (ST8000NM0095) [SSU & ESU] KT01
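As an illustration only (not the supported firmware upgrade procedure; contact Cray Service for firmware upgrades), the firmware revision reported by an individual drive can be read with standard Linux tools and compared against the qualified versions above; replace /dev/sda with the device of interest:

  # Print drive identity information, including the firmware/revision field
  smartctl -i /dev/sda | grep -i -E 'firmware|revision'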
8 Notices and Precautions

The following statements list specific known issues and help ensure safety and safe operation.

Electrical Considerations
● This equipment is designed to be installed on a dedicated circuit.
● The dedicated circuit must have circuit breaker or fuse protection. Protection of capacity equal to the current rating of the distribution unit must be provided and must meet all applicable codes and regulations.
● Warning! HIGH LEAKAGE CURRENT. Ground (earth) connection is essential before connecting a supply.
● The Sonexion rack has multiple input power connectors. Disconnect all supply power for complete isolation.
● A safe electrical ground (earth) connection must be provided to the power supply cords.
● When power cycling any enclosure, wait approximately 10 seconds before re-applying power. Use the power supply's ON/OFF switch to manage the power.
● After completing all assemblies and prior to powering on any system, perform a ground (earth) continuity anddielectric strength test.
● Verify that any circuit breakers installed in the facility are adequately sized, to avoid the possibility of the facility's circuit breakers tripping in the event of a fault within the Sonexion rack and causing down time.
● When handling disk drives or components, avoid touching the printed circuit boards. You must observe all conventional ESD precautions.
Load Ratings and General Precautions
The rack has load ratings as described below:
● Base Rack with 6 SSUs:
○ Static Load (HDDs installed in SSUs): 1004 kg
○ Dynamic Load (no HDDs installed in SSUs): 568 kg
● Storage Rack with 7 SSUs (no Additional MMUs):
○ Static Load (HDDs installed in SSUs): 1034 kg
○ Dynamic Load (no HDDs installed in SSUs): 581 kg
● Frame load ratings are not dependent on side panels, doors, or other components for structural support.
● The customer is responsible for ensuring that the floor will support the static and dynamic load rating of the rack. This is especially important for installations that involve raised flooring.
● After removing the packaging and before moving the Sonexion rack to the final location, the bottom two SSUs must be fully populated with disk drives.
● Because of the weight and size of the Sonexion rack, it is possible for the rack to topple over while it is being moved. Do not tip the rack more than 10 degrees from a level surface or when rolling down an incline or ramp. Ensure that the outriggers are properly installed to prevent possible toppling.
● When moving the Sonexion rack, the drives must be removed from all except the bottom two SSUs. When removing the disk drives, ensure that each drive is re-installed in the exact same drive slot and the exact same enclosure.
● When loading the rack, fill from the bottom up and empty from the top down.
● Do not slide more than one drawer out of any SSU enclosure in the rack at a time to avoid the danger of the rack toppling over.
● Do not leave any enclosure bay empty.
● Contact Cray Service for firmware upgrades.
● Only trained service personnel may service any field replacement unit (FRU), and must follow the approved documented procedures for the FRU.
● Replacement of a cooling fan in any SSU enclosure must be completed within 2 minutes.
● When opening a drawer on any SSU enclosure, do not leave the drawer open longer than 2 minutes.
● When replacing a disk drive in any SSU enclosure, unlatch the drive and wait 5 seconds for the drive to spin down before removal.