Sun Enterprise Server Maintenance
IT-ETC-033
Sun Microsystems LTD, Citygate, Cross Street, Sale, Manchester M33 7JF, UK
Revision E June 2001, Brian Jackson
Copyright © 2001 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303, U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying,
distribution, and decompilation. No part of this product or document may be reproduced in any form by any means
without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a
registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun Logo, SunVTS, OpenBoot, Sun Enterprise, UltraSPARC, Solstice SyMON, Gigaplane,
SPARCstorage, RSM, RSM Array, SunFastEthernet, SunFDDI, StorEdge, SunDiag, SunPCI, SunBus, AnswerBook, and
OBDiag are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc.
in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun
Microsystems, Inc.
The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees.
Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user
interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface,
which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written
license agreements.
U.S. Government approval required when exporting the product.
RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Govt is subject to restrictions of FAR 52.227-14(g)
(2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015 (b)(6/95) and DFAR 227.7202-3(a).
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS,
AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH
DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Introduction to Sun Enterprise Servers
  Additional Resources
  Enterprise Introduction
  Ex000 servers versus Ex500 servers
  Server Specifications
    Sun Enterprise 3000
    Sun Enterprise 3500
    Sun Enterprise 4500
    Sun Enterprise 5500
    Sun Enterprise 6500
  Reliability, Availability, and Serviceability Features
  Reliability
  Availability
  Serviceability
  Scalability
  Concurrent Maintenance Tools
    Dynamic Reconfiguration
    Alternate Pathing
  Monitoring and Administration
    Solstice SyMON
Hardware Component Overview
  The Sun Enterprise 3000 Server
    Specifications
  The Sun Enterprise 3500 Server
    Specifications
  The Sun Enterprise 4000/4500 Server
    Specifications
  The Sun Enterprise 5500 Server
    Specifications
  The Sun Enterprise 6500 Server
    Specifications
  Gigaplane Architecture
    Centerplane Configuration
Copyright 1999 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services June 2001, rev E
    Centerplane Numbering Scheme
  PCM/Slot layout
  Performance
  Hot Swap and Hot Plug
  Power Supplies
    Power/Cooling Module (PCM)
    Peripheral Power Supply (PPS)
  Hot Pluggable Boards
    Hot Plug Architecture
  Sun Enterprise Deskside Chassis Designs
  Common and unique components
  Exercise: Component Removal and Replacement
Bus Structures and Types
  UPA Bus Architecture
    The CPU/memory Board and the UPA Bus
    Ultra Port Architecture Features
  SBus Architecture
    SBus Features
  PCI Architecture
    PCI Mechanical Specifications
    PCI Electrical Specifications
    PCI Board connectors
  SCSI Introduction
    Small Computer System Interface Features
    Fast SCSI – Higher Bus Speed
    Wide SCSI – Wider Is Better
    Differential SCSI – Less Interference
    Ultra2 SCSI
    Termination
    Cable quality
    Conclusion
  SCSI implementation on I/O boards
  Fibre Channel Interface
CPU/Memory and Clock Boards
  CPU/Memory+ Board
    CPU Module
    400 MHz, 8MB Ecache Module
    CPU Module Handling Precautions
    Removing and Replacing a CPU Module
  Memory
  Board Status Indicators
  Clock+ Board
    Console Bus
    Clocks
    Reset logic
    TOD/NVRAM
    Serial, keyboard, mouse ports
    JTAG
    Remote console commands
    XIR
    LED Status codes
  Passive Boards
    Filler Panel
    Load Board
I/O Boards
  Types of I/O Boards:
    I/O Addressing
  SBus I/O Boards:
    SBus I/O Boards – Type 1
    SBus+ I/O Board – Type 4
    SBus I/O Boards – Type 1
    SBus+ I/O Boards – Type 4
  Graphics I/O Boards:
    Graphics I/O Board – Type 2
    Graphics+ I/O Board – Type 5
    Graphics I/O Board – Type 2
    Graphics+ I/O Board – Type 5
  PCI I/O Boards:
    PCI+ I/O Board – Type 3
  Board Status Indicators
    Enterprise 3500 Fibre Channel Interface Board
    SCSI Disk Board
    SCSI Disk Board Addressing
Open Boot PROM / NVRAM
  Introducing OBP
  Features of OBP
  The OBP User Interface
  System Testing Commands
  Informational Commands
  The Device Tree
  Displaying the Device Tree
    Using the .properties Command
    Using the dev Command
  Listing System Devices
  Displaying Device Aliases
    Device Alias Commands
    nvalias Command
  Open Boot PROM Commands for the NVRAM
    The printenv Command
  General NVRAM parameters
  Platform specific NVRAM parameters
  Environmental monitoring
  NVRAM security
  NVRAMRC editing commands
  Updating Flash PROM and FCode
  Correcting a Faulty Flash PROM
  Synchronizing NVRAM/TOD chips
Power on self test (POST)
  Introducing POST
  Self test overview
  POST control commands
    s-flag
    v-flag
  POST Menus
    option 7
  POST Board status messages
  Sample error messages
  POST error reporting
    show-post-results
  When things go wrong
  Accessing and Displaying POST
    tip session
Internal Disk Subsystems
  Internal Storage Capacities
    The SCSI Disk Board
    The SCSI Disk Board Addressing
  Disk Addressing
    Examples
  Sun Enterprise 3500
    Fibre Channel Interface Board
  Disk Addressing
    probe-fcal-all
    world-wide numbers
  E3500 boot disk replacement
  E3500 data disk replacement
  Sun Enterprise 3000
  I/O Addressing test
Solaris Support Utilities
  How Solaris References System Components
    Logical Device Names
    Physical Device Names
    Instance Names
  Configuring Components in Solaris
    Automatic Device Configuration
  Displaying System Configuration Information
    The prtconf Utility
    The sysdef Utility
    The format Utility
  Displaying Diagnostic Information
    The dmesg Command
    The prtdiag Command
  Setting NVRAM Configuration Parameters From Solaris
    The eeprom Command
SunVTS System Diagnostics
  Introduction
    SunVTS Software Overview
  Test categories
  Hardware and software requirements
  Starting the SunVTS Software
  The SunVTS Graphical Interface
  The SunVTS Window Panels
  The SunVTS Window Icons
  The SunVTS Menu Selections
  The Schedule Options Menu
  The Test Execution Menu
  The Advance Options Menu
  Intervention Mode
  Performance Monitor Panel
  Using SunVTS in TTY Mode
  Negotiating the SunVTS TTY Interface
  Running SunVTS Remotely
    Requirements
    Running SunVTS Through a Remote Login
    Running SunVTS Through telnet or tip
  SunVTS Test Summary
    Advanced Frame Buffer Test
    SunATM Adapter Test
    Audio Test
    Bidirectional Parallel Port Printer Test
    Compact Disc Test
    Frame Buffer, GX, GX+ and TGX Options Test
    Disk and Floppy Drives Test
    ECP 1284 Parallel Port Printer Test
    Sun Enterprise Network Array Test
    StorEdge 1000 Enclosure Test
    Frame Buffer Test
    Fast Frame Buffer Test
    SunVTS Test Summary
    Floating Point Unit Test
    Sun GigabitEthernet Test
    Intelligent Fibre Channel Processor Test
    Dual Basic Rate ISDN (DBRI) Chip
    M64 Video Board Test
    Multiprocessor Test
    Network Hardware Test
    SPARCstorage Array Controller Test
    Physical Memory Test
    Prestoserve Test
    Serial Asynchronous Interface Test
    Sun Enterprise Cluster 2.0 Network Hardware Test
    Environmental Sensing Card Test
    Soc+ Host Adapter Card Test
    Serial Parallel Controller Test
    Serial Ports Test
    SunButtons Test
    SunDials Test
    HSI Board Test
    Sun PCi Test
    System Test
    Tape Drive Test
    Virtual Memory Test
  Test Message Syntax
Alternate Pathing
  Introducing Alternate Pathing
  Supported Devices
    Disk Devices
    Network Devices
  Installing AP
  How AP Works
  Physical paths
  Metadisk
  Disk Pathgroup
  Metanetwork
  AP With Mirroring
  AP and DR
  AP State Database
  Creating the AP State Database
    The apinst Utility
  Creating a disk pathgroup and metadisks
  Using the metadisks
  Placing your boot disk under AP control
  Manually switching the active path
  Automatic disk pathgroup switching (AP2.1)
  Creating a network pathgroup
  Alternately pathing the primary network interface
  Switching a network pathgroup
Dynamic Reconfiguration
  Introducing Dynamic Reconfiguration
    What Is Dynamic Reconfiguration?
    Benefits of DR
    Disadvantages of DR
    Supported Hardware
    DR Limitations
  Displaying Board Status
    Basic Status Display
    Detailed Status Display
  Reconfiguration Considerations
    Device driver interface DDI
    Suspend-Safe and Suspend-Unsafe Devices
    Hot-Plug Hardware
    Permanent memory management
    Required additions to /etc/system
  Dynamic Reconfiguration Procedures
    Removing a CPU/Memory Board
    Installing or Replacing a CPU/Memory Board
    Removing an I/O Board
    Removing Boards that Use Detach-Unsafe Drivers
    Installing a New I/O Board
    Installing a Replacement I/O Board
Introduction to Sun Enterprise Servers 1
Additional Resources
● http://docs.sun.com
● Server Rack Installation Manual, Part Number 802-7573
● Sun Enterprise 6500/5500/4500 Systems Installation Guide, Part Number 805-2631
● SPARC Hardware Platform Guide, Part Number 802-5341
● Solstice SyMON User's Guide, Part Number 802-5355
● Sun Enterprise 6x00, 5x00, 4x00, and 3x00 Systems Dynamic Reconfiguration User's Guide, Part Number 806-0280-05
● Sun Enterprise Expansion Cabinet Installation and Service Manual, Part Number 805-4009
● Sun Enterprise 6/5/4/3x00 Systems SIMM Installation Guide, Part Number 802-5032
● SBus+ and Graphics+ I/O Boards (100 MB/sec. Fibre Channels) for Sun Enterprise 6/5/4/3x00 Systems, Part Number 805-2704
● PCI+ I/O Board Installation and Component Replacement for Sun Enterprise 6/5/4/3x00 Systems, Part Number 805-1372
● Sun Enterprise 3500 System Reference Manual, Part Number 805-2630
● Sun Enterprise 6500/5500/4500 System Reference Manual, Part Number 805-2632
● Sun Enterprise Server Alternate Pathing User's Guide, Part Number 805-5444
● Sun Enterprise 6x00/5x00/4x00 Disk Board Installation Guide, Part Number 802-6740
● Sun Enterprise Systems Peripheral Power Supply Installation Guide, Part Number 802-5033
● Sun Enterprise Systems Power/Cooling Module Installation Guide, Part Number 802-6244
● Sun Enterprise 6/5/4/3x00 Systems Board Installation Guide, Part Number 805-4007
Enterprise Introduction
This course introduces you to some new concepts and some new
hardware. It is intended to give you an adequate understanding of the
enterprise computing environment and how Sun servers, software, and
applications fit into that enterprise. After you have been introduced to
the systems and understand their capabilities, you will have an
opportunity to take the systems apart and put them back together.
A main goal of this course is to help you understand the enterprise
computing environment better so that you can develop the
appropriate concurrent maintenance strategy. Troubleshooting a system
in the enterprise computing environment is quite different from
troubleshooting a desktop.
You must understand the function that the system you are working on
has in a company’s enterprise computing environment and how
critical it is that the system continue to operate while you troubleshoot
and repair it. A company can no longer afford to shut down a
mission-critical element of its enterprise operation while you perform
maintenance on that system.
Sun Microsystems has developed several products that can assist you
in performing your tasks with a minimal effect on the customer’s
enterprise computing environment. This course introduces you to
those products and tools and shows you how to be proficient with
them so you can safely work on Sun Enterprise servers.
Ex000 Servers versus Ex500 Servers
The original Enterprise servers, the E3000, E4000, E5000 and E6000,
have been upgraded in a process that marketing called a “mid-life
enhancement”.
The enhanced servers are called the E3500, E4500, E5500 and E6500.
Note – The key difference is that the Ex000 servers run their
interconnect at 83 MHz.
The E3500, E4500 and E5500 run their interconnect at 100 MHz using
enhanced system boards and centreplane.
The E6500 is constrained to run at a maximum interconnect speed of
90 MHz.
● E6000 v E6500
The E6000 is housed in a 56” cabinet whilst the E6500 is housed in
a 68” cabinet. This makes room for an additional A5000 or D1000.
● E5000 v E5500
The E5000 is housed in a 56” cabinet whilst the E5500 is housed in
a 68” cabinet. This again makes room for an additional A5000 or
D1000.
● E4000 v E4500
No major difference, apart from faster interconnect.
● E3000 v E3500
Very different. The E3500 has been totally re-designed. There are
too many differences to outline briefly here, but we shall cover them all.
Server Specifications - E3000
Figure 1-1 The Sun Enterprise 3000 Cabinet
Main system features and options:
● Deskside chassis
● Enterprise 3000 is a four-slot model
● One CPU/memory+ and one I/O+ board minimum
● 1 to 6 UltraSPARC CPU modules
● 64 Mbytes to 12 Gbytes of RAM
● Up to ten internal SCSI disk drives
● Ultra SCSI-2 CD-ROM32 and 4mm- or 8mm-tape drive
Server Specifications - E3500
Figure 1-2 The Sun Enterprise 3500 Cabinet
Main system features and options:
● Deskside chassis
● Five-slot system (Enterprise 3000 is a four-slot model)
● One CPU/memory+ and one I/O+ board minimum
● 1 to 8 UltraSPARC CPU modules
● 64 Mbytes to 16 Gbytes of RAM
● Up to eight internal FC-AL disk drives
● Ultra SCSI-2 CD-ROM32 and 4mm- or 8mm-tape drive
● Over 6 Tbytes of external storage
Server Specifications - E4500
Figure 1-3 The Sun Enterprise 4500 Cabinet
Main system features and options:
● Desktop chassis
● Eight-slot system, four in front and four in back
● One CPU/Memory+ and one I/O+ board minimum
● 1 to 14 UltraSPARC CPU modules
● 64 Mbytes to 28 Gbytes of RAM
● Up to 33.6 Gbytes of internal storage mounted on four disk
boards.
● Ultra SCSI-2 CD-ROM32 and 4mm- or 8mm-tape drive
● Over 10 Tbytes of external storage
Server Specifications - E5500
Figure 1-4 The Sun Enterprise 5500 Cabinet
Main system features and options:
● Datacentre cabinet
● An E4500, without cosmetic panels, mounted in a cabinet
● 1 to 14 UltraSPARC CPU modules
● 64 Mbytes to 28 Gbytes of RAM
● Up to 720 Gbytes of internal storage, comprising four disk boards
and A5000 or D1000 disk trays.
● Ultra SCSI-2 CD-ROM32 and 4mm- or 8mm- tape drive
● Over 10 Tbytes of external storage
Server Specifications - E6500
Figure 1-5 The Sun Enterprise 6500 Cabinet
Main system features and options:
● Datacentre cabinet
● Sixteen-slot system, eight in front and eight in back
● Minimum configuration: one CPU/memory+ and one I/O+ board
● 1 to 30 UltraSPARC CPU modules
● 64 Mbytes to 60 Gbytes of RAM
● Up to 576 Gbytes of internal storage, comprising two disk boards
and A5000 or D1000 disk trays.
● Ultra SCSI-2 CD-ROM32 and 4mm- or 8mm- tape drive
● Over 20 Tbytes of external storage
Reliability, Availability, and Serviceability Features
RAS is a set of enterprise computing technologies that furnish a high
degree of protection for corporate data (reliability), provide near
continuous data access (availability), and incorporate procedures to
correct problems with minimal business impact (serviceability).
These capabilities, commonly known as RAS, are a standard part of
traditional monolithic, centralized processing systems. Many
businesses today are moving to network computing where the flexible,
scalable architecture enables them to easily expand IT systems as their
needs grow while maintaining a reliable, stable computing
environment. Sun Microsystems has become a trusted vendor of safe,
innovative network computing solutions by delivering mainframe-
class RAS features and capabilities in their commercial computing
solutions.
New features that improve data integrity, system reliability, and
availability include a simpler system design, improved environmental
and hardware monitoring tools, redundant power and cooling, and
hot plug design for some components. Hot plug means that these
system components can be replaced or added while the server is up
and running. Serviceability features include requiring only one tool for
disassembly and re-assembly (a Phillips screwdriver), identical
components across the Sun Enterprise server family, and improved
diagnostics utilities.
The focus of the RAS feature set is to warn the operator about problems
and to act on their effects. New sensors in the hardware are monitored
by the software and cover just about everything. For example, the
software monitors the temperature not only of each board but also of
each central processing unit (CPU) module, as well as the state of each
fan. There are unique monitoring tools, such as Sun Management Centre,
which can display the state of the machine down to the board level and
which work on a “predictive failure” model. For example, Sun Management
Centre provides the system administrator with warnings indicating the
likely effects of a detected problem on the system.
Reliability
Sun Enterprise Ex500 systems have many features that improve their
reliability, which is defined as their ability to run continuously and
correctly. These features demonstrate continuous improvement and
Sun’s commitment to quality systems. The goal is to minimize the
burden on system operators and system administrators.
ECC and Parity Protection
● End-to-end error checking and correction (ECC) protection of data
● Address and control lines are parity protected
● Improved hardware monitors (time-outs and parity)
Enhanced Environmental Monitoring
● Advanced monitoring tools for power supplies, fans,
CPU/memory, input/output (I/O) boards, disks, and system
temperatures.
So, if a CPU module overheats, it will be taken off-line by the
system.
Availability
The following describes some of the availability features of the Sun
Enterprise servers:
System Monitoring
System monitoring enhancements improve reliability by directing
error messages to other applications that can dynamically alter the
system’s configuration without stopping or rebooting the system.
New capabilities of power on self test (POST) analyze parts and
report failures to the automatic reconfiguration software.
Automatic System Reconfiguration (ASR)
Uses the POST output to identify and remove failed components
from the system’s configuration before rebooting the system.
Hot pluggable power supplies and disk drives that have failed can
be replaced without any system downtime or reboot, which
increases the system’s availability.
Redundant Components
This feature provides for an immediate replacement of a failed
component. A redundant power supply provides the current
necessary for the system to continue to operate if another power
supply fails. Large systems have multiple power supplies, each
capable of providing power for a specific number of boards (not
specific boards or slots). Should two or more power supplies fail,
the system’s ASR software would reconfigure for fewer boards,
reducing the power requirements to that of the available power
supplies, and continue to operate at reduced capacity until the
failed power supplies are replaced.
Serviceability
The following describes some of the serviceability features of the Sun
Enterprise servers:
Hot Plug and Hot Swap Components
Does away with the need for downtime.
Dynamic Reconfiguration
Eliminates the need for a reboot to logically attach a new or
replacement board.
Improved Diagnostics
Identify a system component failure more accurately.
The tests that run on system components at power on illuminate status
light emitting diodes (LEDs).
Scalability
The modular design allows customers to expand and enhance the
system as they require. Because Sun has leveraged the same
technology across the entire line of servers, from small (2-4 CPU)
workgroup servers to large (up to 30 CPU) enterprise servers, upgrade costs
can be kept to a minimum and customers can protect their
investments.
The following lists the hardware components that are the same in
workgroup servers and enterprise servers:
● CPU/Memory(+), and I/O(+) boards
● Clock boards
● Power and cooling modules
● Peripheral power supplies (184-watt and 195-watt models)
Concurrent Maintenance Tools
Dynamic Reconfiguration
Dynamic Reconfiguration (DR) is the ability to alter the configuration
of a running system by bringing components online or taking them
off-line without disrupting system operation or requiring a system
reboot. With DR, system boards can be logically and physically
included in the system configuration, or logically and physically
removed while the system is running.
This is useful in mission-critical environments if a system board has
failed and needs to be replaced or if new system boards need to be
added to the system for additional performance and capacity. DR is a
critical part of the concurrent maintenance strategy prevalent in the
enterprise computing environment.
Alternate Pathing
Alternate Pathing (AP) creates a new layer of device drivers called
meta-disks and meta-networks, which route access to one of two
physical device drivers. Applications and the operating system
components, including the disk management software, use the meta-
device name to access the resource. Only the drivers know the actual
physical paths.
The active path can be manually switched from the primary to the
alternate, at any time, with no interruption to data traffic. With AP
software operating and configured, automatic switch-over to the
alternate path occurs if the primary path fails. A manual AP switch
back to the primary path is required after service has been completed.
Meta-device definitions are stored in an AP state database that is used
early in the boot process. There are usually several copies of this
database. You must create the meta-devices yourself; the system does
not automatically create these for you.
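The switching behaviour described above can be sketched in a few lines of Python. This is an illustrative model only, not the AP software itself: the class name MetaDevice, its methods, and the path strings are all hypothetical. The sketch assumes that a failure of the active path triggers an automatic switch to the other path (the text only describes the primary-to-alternate case), and that a repaired path stays inactive until it is switched back manually.

```python
class MetaDevice:
    """Toy model of an AP meta-device with a primary and an alternate path.

    All names here are hypothetical; this is not the AP driver interface.
    """

    def __init__(self, name, primary_path, alternate_path):
        self.name = name
        self.paths = {"primary": primary_path, "alternate": alternate_path}
        self.healthy = {"primary": True, "alternate": True}
        self.active = "primary"

    def _check(self, which):
        if which not in self.paths:
            raise KeyError(which)

    def physical_path(self):
        """Applications only see the meta-device name; this is the hidden path."""
        return self.paths[self.active]

    def switch(self, target):
        """Manual switch, allowed at any time to a healthy path."""
        self._check(target)
        if not self.healthy[target]:
            raise RuntimeError(f"{target} path of {self.name} is not healthy")
        self.active = target

    def report_failure(self, which):
        """Mark a path failed; switch over automatically if it was active."""
        self._check(which)
        self.healthy[which] = False
        if which == self.active:
            other = "alternate" if which == "primary" else "primary"
            if not self.healthy[other]:
                raise RuntimeError(f"no healthy path left for {self.name}")
            self.active = other

    def report_repair(self, which):
        """Mark a path healthy again; the active path is NOT changed."""
        self._check(which)
        self.healthy[which] = True


disk = MetaDevice("meta-disk-0", "path-via-board-A", "path-via-board-B")
disk.report_failure("primary")   # automatic switch-over to the alternate
disk.report_repair("primary")    # service completed, still on the alternate
disk.switch("primary")           # manual switch back
print(disk.physical_path())
```

The point of the model is the asymmetry: switch-over on failure is automatic, but the return to the primary path is always a deliberate manual step.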
Monitoring and Administration
Sun Management Centre
Sun Management Centre, formerly known as SyMON, is a
comprehensive system monitoring tool for the Sun Enterprise servers.
Its graphical user interface (GUI) and intuitive design make it easy to
learn and use.
Sun Management Centre is a powerful system management solution
that dramatically increases RAS by allowing system administrators to
monitor and quickly manage large enterprise system configurations.
Sun Management Centre addresses the following system management
functions:
● Manages thousands of systems
● Supports heterogeneous GUI (Java technology-based)
● Supports full Simple Network Management Protocol (SNMP)
connectivity
● Supports active configuration management controls (supports DR)
● Supports historical data storage
● Supports system management capabilities
Hardware Component Overview
The Sun Enterprise 3000 Server
The Enterprise 3000 is a deskside tower enclosure. All the boards plug into the rear of the E3000.
The clock board is located in the lower right, next to board slot 1. The clock board has its own slot and does not use one of the four slots for the CPU/memory or I/O boards.
There are four slots in the bottom portion of the cabinet for CPU/memory boards and I/O boards. The slots are numbered 1, 3, 5, and 7, from right to left.
A fully loaded E3000 will require 2 power/cooling modules (PCMs), the first located above slots 1 and 3, the second located above slots 5 and 7. A third PCM can be used for redundant power in a fully loaded system.
If a third PCM is not used, a fan tray must be installed above the peripheral power supplies to provide cooling.
The peripheral power supply is located in the lower left of the cabinet rear. A spot for a redundant peripheral power supply is located to the right of the first peripheral power supply.
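The pairing of PCM positions and board slots described above can be written as a small lookup. This is a minimal sketch in Python: the labels "first" and "second" and the function name are illustrative, and the sketch covers only the two PCM positions that the text ties to specific slots, not the third (redundant) position.

```python
# Which PCM position sits above which E3000 board slots, as described in the
# text: the first PCM above slots 1 and 3, the second above slots 5 and 7.
E3000_PCM_FOR_SLOT = {1: "first", 3: "first", 5: "second", 7: "second"}


def pcms_needed(occupied_slots):
    """Return the PCM positions that must be fitted for the occupied slots."""
    unknown = set(occupied_slots) - set(E3000_PCM_FOR_SLOT)
    if unknown:
        raise ValueError(f"not an E3000 board slot: {sorted(unknown)}")
    return sorted({E3000_PCM_FOR_SLOT[slot] for slot in occupied_slots})


print(pcms_needed([1, 3]))        # two boards sharing one PCM position
print(pcms_needed([1, 5]))        # one board under each PCM position
print(pcms_needed([1, 3, 5, 7]))  # fully loaded system
```

Note that two boards placed in slots 1 and 5 need both PCM positions populated, whereas the same two boards in slots 1 and 3 need only the first.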
Internal Disk Drives
The E3000 holds up to ten internal hot-plug disk drives. The disks are all driven from the I/O board in slot 1.
Disk addressing is covered in chapter 8.
[Figure: Sun Enterprise 3000 rear view. Labels: three 300-watt PCMs; four board slots (7, 5, 3, 1); clock board; peripheral power supply #1; peripheral power supply #2 (optional).]
The Sun Enterprise 3000 Server
Specifications
Table 2-1 Sun Enterprise 3000 Server Specifications and Features

Features                                System/Board Configuration
Number of Gigaplane slots               Four slots. Minimum configuration requires one I/O and one system board
Number of processors                    One to six Superscalar SPARC Version 9, UltraSPARC microprocessor modules
CPU interface                           One to six 128-bit Ultra Port Architecture (UPA) slots
Memory                                  256 Mbytes to 12 Gbytes
System interconnect                     Gigaplane, 2.68 GB/sec at 83 MHz
Three different power supply systems    Up to three power and cooling modules (PCM) (power supply + fan module) for system and I/O boards. A peripheral power supply (PPS1) for auxiliary power and a peripheral power supply/AC (PPS0)
Internal disk                           Up to ten 3.5 inch hot-pluggable SCSI disk drives
Internal tape                           8 mm, 4 mm, and 0.25 inch
CD-ROM                                  SunCD12 drive standard
Height                                  65 cm (25.5 inches)
Width                                   43 cm (17.0 inches)
Depth                                   60 cm (23.5 inches)
The Sun Enterprise 3500 Server
The Sun Enterprise 3500 is vastly different from the E3000.
There are five slots in the bottom portion of the cabinet for
CPU/memory boards and I/O boards. The slots are numbered 1, 3, 5,
7, and 9 from right to left.
The Sun Enterprise 3500 server comes with at least one power/cooling
module located above slots 1 and 3. If a second power/cooling module
is required, it would fit above slots 5 and 7, to the left of the first
power/cooling module.
A fan tray above the peripheral power supply is also included in an
entry configuration.
A third power/cooling module can be used for redundant power in a
system with three or more boards. To install the third power/cooling
module, the existing fan tray, located to the left of the second
power/cooling module, must be removed. The third power/cooling
module fits into this slot.
In addition to three power/cooling modules, a second peripheral
power supply is required for full N+1 power supply redundancy in a
five-board Sun Enterprise 3500 server configuration.
The first peripheral power supply is located in the lower left of the
cabinet rear. A spot for the second, optional peripheral power supply
is located in the lower right of the Sun Enterprise 3500 cabinet front.
On the Sun Enterprise 3000 system, the second peripheral power supply
is located in the rear of the cabinet. It was redesigned and moved to the
front of the Sun Enterprise 3500 system cabinet in order to provide
space for the additional system slot.
The second peripheral power supply on the Sun Enterprise 3500 server
is now rated at 195 watts, instead of the 184-watt peripheral power supply
used on the Sun Enterprise 3000 server.
Internal FC-AL Drives
The Sun Enterprise 3500 server has two internal disk banks (four disks per disk bank), which support up to eight 9.1-GB FC-AL disks with optional dual-port connections. The number of internal disks supported in the Sun Enterprise 3500 server was reduced in order to provide room for the additional system slot in the rear of the server.
The inclusion of the fifth system slot in the back of the cabinet required that the optional second peripheral power supply be redesigned and moved to the front of the cabinet, resulting in less space in the front of the cabinet for disk drives.
The newer drives, however, can be configured to provide better disk availability than that offered by the Sun Enterprise 3000 server. Each of the two disk banks can have one or two FC-AL loops connected to the installed drives for a total of up to four loops. Dual-loop configurations provide a highly-available, redundant hardware configuration.
Because the two banks are independent, a full configuration of eight disk drives requires a minimum of two loops: one for each bank of four drives. On the other hand, a minimum configuration requires only one FC-AL connection for up to four disk drives.
The new FC-AL drives in the Sun Enterprise 3500 server still provide the hot-swap capability offered with the internal SCSI drives on the Sun Enterprise 3000 server.
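The loop-count rules above reduce to simple arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes drives fill one bank completely before the second bank is used, which the text does not state explicitly.

```python
import math

DRIVES_PER_BANK = 4      # four disks per disk bank
MAX_LOOPS_PER_BANK = 2   # each bank can have one or two FC-AL loops
MAX_DRIVES = 8           # two banks of four drives


def fcal_loop_range(drive_count):
    """Return (minimum_loops, maximum_loops) for a given number of drives.

    Assumes one bank is filled before the other is used.
    """
    if not 1 <= drive_count <= MAX_DRIVES:
        raise ValueError("the E3500 holds between 1 and 8 internal drives")
    banks_in_use = math.ceil(drive_count / DRIVES_PER_BANK)
    # Every bank in use needs at least one loop and can take at most two.
    return banks_in_use, banks_in_use * MAX_LOOPS_PER_BANK


for drives in (4, 5, 8):
    print(drives, fcal_loop_range(drives))
```

Under that assumption, four drives need at least one loop, while anything from five to eight drives needs at least two, with four loops giving a dual-loop connection to both banks.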
Disk addressing is covered in chapter 8.
[Figure: Sun Enterprise 3500, front and rear views. Labels: three 300-watt power/cooling modules; five board slots (9, 7, 5, 3, 1); clock board; peripheral power supply with AC inlet; FC-AL interface board; key switch; second peripheral power supply; fan tray; internal FC-AL disks; 32X CD-ROM; tape drive.]
The Sun Enterprise 3500 Server
Specifications
Table 2-2 Sun Enterprise 3500 Server Specifications and Features

Features                                System/Board Configuration
Number of Gigaplane slots               Five slots. Minimum configuration requires one I/O and one system board
Number of processors                    One to eight Superscalar SPARC Version 9, UltraSPARC microprocessor modules
CPU interface                           One to eight 128-bit Ultra Port Architecture (UPA) slots
Memory                                  256 Mbytes to 16 Gbytes
System interconnect                     Gigaplane, 2.68 GB/sec at 83 MHz, 3.2 GB/sec at 100 MHz
Three different power supply systems    Up to three power and cooling modules (PCM) (power supply + fan module) for system and I/O boards. A peripheral power supply (PPS1) for auxiliary power and a peripheral power supply/AC (PPS0)
Internal disk                           Up to eight 3.5 inch hot-swappable FC-AL disk drives with dual porting
Internal tape                           8 mm, 4 mm, and 0.25 inch
CD-ROM                                  SunCD32 drive standard
Height                                  65 cm (25.5 inches)
Width                                   43 cm (17.0 inches)
Depth                                   60 cm (23.5 inches)
The Sun Enterprise 4000/4500 Server
A compact mid-range server with tremendous computing power, this server nearly
doubles the expandability of the Sun Enterprise 3500 server.
You can install up to fourteen UltraSPARC II processor modules in a single chassis
with four CPU/memory boards in the front and three CPU/memory boards in back.
You can install up to four Sun Enterprise 4500 servers in a single data center cabinet.
When properly configured, each Enterprise 4500 system can support over 4
Terabytes of disk storage.
The Enterprise 4500, like the Sun Enterprise 5500 and 6500 servers, uses a horizontal
card cage.
[Figure: Sun Enterprise 4500, front and rear views. Labels: power/cooling modules; peripheral power supply; tape drive (optional); 32X CD-ROM drive; CPU/memory, I/O, and disk boards; clock board; key switch.]
The Sun Enterprise 4500 Server
Specifications
Table 2-3 Sun Enterprise 4500 Server Specifications and Features

Features                                   System/Board Configuration
Number of Gigaplane slots                  Eight slots. Minimum configuration requires one I/O and one system board
Number of processors                       Two to 14 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface                              One to 14 128-bit Ultra Port Architecture (UPA) slots
Memory                                     256 Mbytes to 28 Gbytes
System interconnect                        Gigaplane, 2.68 GB/sec (E4000 at 83 MHz), 3.2 GB/sec (at 100 MHz)
Two different power supply modules used    Up to four PCM (300 watt power supply + fan module) for system and I/O boards. One PPS1 (184 watt peripheral power supply) for auxiliary power
Internal disk                              Up to eight 9.1 GByte disk drives on up to four disk boards
Internal tape                              8 mm, 4 mm, and 0.25 inch
CD-ROM                                     SunCD32 drive standard
Height                                     34 cm (13.5 inches)
Width                                      50 cm (19.7 inches)
Depth                                      56 cm (22 inches)
The Sun Enterprise 5500 Server
The Sun Enterprise 5500 is a 68-inch data center cabinet with an 8-slot E4500 card cage
mounted inside. The data center cabinet provides power distribution and cooling for the
system and up to one half terabyte of disk space. Each Enterprise 5500 data center rack
can accommodate up to four StorEdge A5000 disk subsystems. The Sun Enterprise 5000
server can accommodate up to six removable storage modules (RSMs). The system,
when configured with the proper features and options, can support over six terabytes
of disk space. This requires additional disk expansion racks.
Note: You can have A5000s or D1000s
[Figure: Sun Enterprise 5500 cabinet, front and rear views. Labels: CPU/memory, I/O, and disk board slots; power/cooling modules; power sequencer; optional second power sequencer; peripheral power supply; clock board; key switch; 32X CD-ROM drive; tape drive (optional); cabinet fan tray; Sun StorEdge Library FlexiPack tray or hub tray; Sun StorEdge A5000; Sun StorEdge D1000 array.]
The Sun Enterprise 5500 Server
Specifications
Table 2-4 Sun Enterprise 5500 Server Specifications and Features

Features                                   System/Board Configuration
Number of Gigaplane slots                  Eight slots. Minimum configuration requires one I/O and one system board
Number of processors                       Two to 14 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface                              Up to 14 128-bit Ultra Port Architecture (UPA) slots
Memory                                     256 Mbytes to 28 Gbytes
System interconnect                        Gigaplane, 2.68 GB/sec (E5000 at 83 MHz), 3.2 GB/sec (at 100 MHz)
Two different power supply modules used    Up to four PCM (300 watt power supply + fan module) for system and I/O boards. A PPS1 (184 watt peripheral power supply) for auxiliary power
Internal disk                              Up to eight 9.1 GByte disk drives on up to four disk boards
A5200 option                               Up to four subassemblies for over 1 TByte of storage
Internal tape                              8 mm, 4 mm, and 0.25 inch
CD-ROM                                     SunCD32 drive standard
Height                                     173 cm (68.3 inches)
Width                                      77 cm (30 inches)
Depth                                      99 cm (39 inches)
The Sun Enterprise 6500 Server
The Sun Enterprise 6500 server is a 68-inch data center cabinet with a 16-slot card
cage: eight board slots in front and eight in back. The E6000 holds one fewer storage
array, since it is housed in a 56-inch cabinet.
Note: You can have A5000s or D1000s
[Figure: Sun Enterprise 6500 cabinet, front and rear views. Labels: CPU/memory and I/O board slots; power/cooling modules; power sequencer; optional second power sequencer; peripheral power supply; clock board; key switch; CD-ROM drive; tape drive (optional); cabinet fan tray; Sun StorEdge Library FlexiPack tray or hub tray; Sun StorEdge D1000 array; Sun StorEdge A5000.]
The Sun Enterprise 6500 Server
Specifications
Table 2-5 Sun Enterprise 6500 Specifications and Features

Features                                   System/Board Configuration
Number of Gigaplane slots                  16 slots. Minimum configuration requires one I/O and one system board
Number of processors                       Two to 30 Superscalar SPARC Version 9, UltraSPARC II microprocessor modules
CPU interface                              Up to 30 128-bit Ultra Port Architecture (UPA) slots
Memory                                     256 Mbytes to 60 Gbytes
System interconnect                        Gigaplane, 2.68 GB/sec at 84 MHz
Two different power supply modules used    Up to eight PCM (300 watt power supply + fan module) for system and I/O boards. A PPS1 (184 watt peripheral power supply) for auxiliary power
Internal disk                              Up to four 18.2 GByte disk drives on two disk boards (slots 14 and 15 only)
A5200 option                               Up to three subassemblies for over 760 GByte of storage
Internal tape                              8 mm, 4 mm, and 0.25 inch
CD-ROM                                     SunCD32 drive standard
Height                                     E6500: 173 cm (68.3 inches); E6000: 141 cm (56 inches)
Width                                      77 cm (30 inches)
Depth                                      99 cm (39 inches)
Gigaplane Architecture
Ultra Port Architecture (UPA)
The gigaplane interconnect is based around the Sun4u (UPA)
architecture. Each board on the gigaplane is assigned two UPA port
numbers. The system uses these port numbers to derive the addressing
information that is passed to the Solaris kernel.
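As a rough illustration of two port numbers per board, the Python sketch below assumes that the board in slot n receives the consecutive port numbers 2n and 2n+1. That numbering is an assumption made for this example, not something stated in the text above; the slot assignment layout in the course material is the authoritative source, and the function name is hypothetical.

```python
PORTS_PER_BOARD = 2  # from the text: each board is assigned two UPA port numbers


def upa_ports_for_slot(slot):
    """Return the two UPA port numbers for a board slot.

    ASSUMPTION: slot n maps to ports 2n and 2n+1 (consecutive numbering).
    """
    if slot < 0:
        raise ValueError("slot numbers are non-negative")
    first = slot * PORTS_PER_BOARD
    return first, first + 1


for slot in (0, 1, 7):
    print(slot, upa_ports_for_slot(slot))
```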
Board Layout
● CPU/memory boards are usually in even-numbered slots in the
front (component side down) of E4500, E5500, and E6500 systems.
● I/O boards are usually in odd-numbered slots in the back.
▼ I/O boards are in the back because of the interface ports and
connected cables.
Note – You can install any CPU/Memory board in any slot, front or
back and you can install any I/O board in any slot, front or back.
You must install an I/O board in slot 1 to drive the internal CD-ROM
and tape unit.
The clock board has its own special slot, which is numbered slot 16 in
all the systems.
Packet Switched Bus
● 256-bit data width (plus error correction)
● Out-of-order completion
▼ A centerplane transaction does not tie up the bus. Due to the
packet nature of bus data, you can have up to 112 transactions
waiting for completion. Because there are no unused cycles
when different boards access the centerplane, we have a
sustained bandwidth that is 97 percent of the maximum.
● Pipeline transactions
▼ Up to 7 outstanding transactions from each processor
▼ Up to 7 outstanding transactions from each board on the
Gigaplane.
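The figure of 112 pending transactions is consistent with seven outstanding transactions per board across the sixteen slots of the largest system. That reading is an inference from the numbers quoted above, not something the text states, and the function name below is hypothetical.

```python
MAX_OUTSTANDING_PER_BOARD = 7  # from the text: up to 7 per board


def max_pending_transactions(board_slots):
    """Upper bound on pending Gigaplane transactions for a given slot count.

    ASSUMPTION: every slot holds a board and each board uses its full quota.
    """
    if board_slots < 0:
        raise ValueError("slot count cannot be negative")
    return board_slots * MAX_OUTSTANDING_PER_BOARD


print(max_pending_transactions(16))  # sixteen-slot E6000/E6500 card cage
print(max_pending_transactions(8))   # eight-slot E4500/E5500 card cage
```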
Gigaplane Speed
● Sun Enterprise x000 systems use a clock speed of 83 MHz
▼ 83 MHz provides for up to 2.6 Gbytes per second of bandwidth
● Sun Enterprise x500 systems use a clock speed of 100 MHz
▼ 100 MHz provides for up to 3.2 Gbytes per second of bandwidth
Note – You can install a 100 MHz board in the 83 MHz system and it
should operate properly, although the board will only run at 83MHz.
But, installing an 83 MHz board in a 100 MHz system changes the
gigaplane speed to 83 MHz.
The 100 MHz boards are identified with a plus (+) sign in their
product name.
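The bandwidth figures and the mixed-board rule can be checked with a little arithmetic. The Python sketch below is illustrative: it assumes one 256-bit data transfer per clock cycle and 1 Gbyte = 10^9 bytes, and the function names are hypothetical. With a nominal 83 MHz clock the result is about 2.66 Gbytes per second, close to the rounded figures quoted in this chapter.

```python
DATA_WIDTH_BITS = 256  # Gigaplane data width, excluding error correction bits


def peak_bandwidth_gbytes_per_sec(clock_mhz):
    """Peak bandwidth, assuming one full-width transfer per clock cycle."""
    bytes_per_cycle = DATA_WIDTH_BITS // 8          # 32 bytes
    return bytes_per_cycle * clock_mhz / 1000       # MHz * bytes -> Gbytes/s


def gigaplane_clock_mhz(board_speeds_mhz):
    """The bus runs at the speed of the slowest board installed."""
    return min(board_speeds_mhz)


clock = gigaplane_clock_mhz([100, 100, 83])   # one 83 MHz board in an Ex500
peak = peak_bandwidth_gbytes_per_sec(clock)
sustained = 0.97 * peak                       # 97 percent figure from the text
print(clock, round(peak, 3), round(sustained, 3))
```

A single 83 MHz board therefore costs the whole system roughly a sixth of its peak interconnect bandwidth, which is why it is worth checking for the plus (+) sign before fitting a board to an Ex500.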
Centerplane Configuration
The centerplane is a backplane with more connections to the bus for
the same linear space.
It does not matter which type of board plugs into which side or slot,
with the exception of slot 1, which we will talk about later.
The main considerations are that you want the boards as close to one
another as possible to reduce noise and latency.
You should place boards with external cabling in the back of the
system.
The next page gives a layout of the UPA port numbers assigned to
each gigaplane slot. We have included the SCSI assignments for the
slots; we will cover this later in the course.
[Figure: Centerplane configuration. CPU/memory boards and I/O boards plug into both the system front and the system rear of the centerplane and share the address bus and data bus; the clock board has its own position.]
Centreplane Slot Assignment
E3000 PCM and Slot Layout
Note: If you do not have PS5 in place, you will need to fit a fan tray in
its place to provide cooling for the PPSs
E3500 PCM and Slot Layout
Note: If you do not have PS5 in place, you will need to fit a fan tray in
its place to provide cooling for the PPSs
E4500 - 6500 PCM and Slot Layout
Performance
Memory performance
Memory performance is improved by:
● 512 bits plus ECC (error correction code) = 576 bits transfer per
CPU clock cycle
● Cache-to-cache transfers, with the same-line buffer reducing
latency and processor intervention
● High memory bandwidth
▼ 500 Mbytes per second per bank (600 Mbytes per second for 2
banks on one board)
▼ Up to 16-way interleaved memory
● Address and data packets, 2-cycles each, so contention delay is
small
I/O performance
I/O performance is improved by:
● Multiple I/O boards for greater bandwidth
● Efficient interrupt processing
▼ Interrupt packets carry data and interrupts route to any CPU
● Two SBus controllers, three SBus slots per SBus I/O+ board
▼ 64-bit, 25 MHz, 64-byte bursts
▼ 100 Mbytes per second direct memory access (DMA) read, 120
Mbytes per second DMA write for each SBus
▼ Double-buffered streaming buffers for read-ahead, write-
behind
● Graphics I/O card replaces one SBus and slot with a UPA bus and
graphics adapter slot. Other components and ports are the same.
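To put the SBus DMA figures in context, the Python sketch below compares them with the theoretical peak of a 64-bit bus at 25 MHz. The peak value is computed here, not quoted from the text, and assumes one 64-bit transfer per clock cycle; the function name is hypothetical.

```python
SBUS_WIDTH_BITS = 64
SBUS_CLOCK_MHZ = 25
DMA_READ_MBYTES = 100    # per SBus, from the text
DMA_WRITE_MBYTES = 120   # per SBus, from the text


def sbus_peak_mbytes_per_sec(width_bits=SBUS_WIDTH_BITS, clock_mhz=SBUS_CLOCK_MHZ):
    """Theoretical peak, assuming one full-width transfer every clock cycle."""
    return width_bits // 8 * clock_mhz


peak = sbus_peak_mbytes_per_sec()
print(peak)                       # theoretical peak in Mbytes per second
print(DMA_READ_MBYTES / peak)     # fraction of peak reached by DMA reads
print(DMA_WRITE_MBYTES / peak)    # fraction of peak reached by DMA writes
```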
Hot swap and Hot Plug devices
Be aware of the difference between the above:
Hot Swap
The unit automatically detaches from the system software.
Examples are:
● PCMs
● PPSs
Hot Plug
The unit has to be manually detached from the system software.
Examples are:
● Disk drives
● CPU/Memory boards
● I/O boards
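The distinction can be captured as a simple lookup that answers the practical question before a replacement: does this unit need to be detached from the system software by hand first? This is only a sketch of the classification given above; the names are hypothetical and it does not call any real Solaris command.

```python
# Classification taken from the two lists above.
HOT_SWAP = {"PCM", "PPS"}
HOT_PLUG = {"disk drive", "CPU/memory board", "I/O board"}


def needs_manual_detach(component):
    """Return True if the unit must be detached from the system software by hand."""
    if component in HOT_PLUG:
        return True
    if component in HOT_SWAP:
        return False
    raise KeyError(f"unclassified component: {component}")


for unit in ("PCM", "disk drive", "I/O board"):
    print(unit, needs_manual_detach(unit))
```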
Power Supplies
Power/Cooling Module (PCM)
AC Input, 100-240V AC @ 5.5A, DC Output
PCM power supplies are used in Enterprise 3x00, 4x00, 5x00, and 6x00
systems. There must be a 300W PCM for every two adjacent boards in
the system, because the fans inside the PCM are the only cooling for
those boards. This means that if a board is added to the system, there
must be an associated PCM. If one is not present, it must be added.
Each 300W PCM supplies enough power for two boards, although in a
fully loaded configuration one supply can be lost and there will still be
enough power for the remaining boards (N+1).
The PCMs:
● Are hot pluggable
● Supply cooling for two adjacent boards
● Operate in redundant current share mode (N+1)
+3.3V    +5V    +2.0V    Maximum continuous
51A      32A    5A       300 watts
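The rule that every two adjacent boards need their own PCM can be sketched as a small check. This is an illustration only: the pairing used here (slots 0 and 1 share PCM position 0, slots 2 and 3 share position 1, and so on) is an assumption rather than something taken from the slot layout figure, and the function names are hypothetical.

```python
def pcms_required(occupied_slots):
    """Return the PCM positions needed to cool the occupied board slots.

    Assumption (hypothetical pairing): slots 2k and 2k+1 are the two
    adjacent boards cooled by PCM position k.
    """
    return {slot // 2 for slot in occupied_slots}


def missing_pcms(occupied_slots, installed_pcms):
    """Return PCM positions that are needed for cooling but not installed."""
    return sorted(pcms_required(occupied_slots) - set(installed_pcms))
```

With boards in slots 0, 1, and 4 and PCMs in positions 0 and 1, `missing_pcms([0, 1, 4], [0, 1])` reports that position 2 still needs a PCM. Check the real slot-to-PCM pairing for the system in question before relying on this.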
Peripheral Power Supply (PPS)
The PPS is used in Enterprise systems to power internal SCSI devices
(CD, tape, and disks), in addition to the devices below.
There are two types of PPS; one with an AC input which is specific to
the E3x00 systems and one without an AC Input common to all the
servers.
Backup PPS
You will find one PPS per 4x00, 5x00, or 6x00 system, and one or two
PPSs in the E3x00. This is because the PPS in the 4x00, 5x00, and 6x00
systems powers the CD-ROM and tape only, whilst the PPS in an E3x00
powers the internal disks.
Losing a PPS in an E3x00 brings the system down, hence the backup.
The PPS provides the following:
● +5Vdc and +12Vdc peripheral tray power
● +5Vdc and +12Vdc drive precharge (nonredundant)
● +3.3Vdc and +5Vdc system precharge (nonredundant)
● +5Vdc redundant system power
● +12Vdc redundant power for PCM fans
● +12Vdc redundant power for E3000/E3500 Auxiliary Fan Module
● +5Vdc auxiliary power for Clock Board remote console serial port
● E4000/E4500 Keyswitch Assembly fan power
● E5000/E5500 and E6000/E6500 AC Input Box fan power
Internal Disk Board
It is the PCM, not the PPS, that powers the disk board.
Peripheral Power Supply (PPS)
184 Watt PPS, used in the E4x00, E5x00, and E6x00. Used as a backup
PPS in an E3000. Part number 300-1301.
184 Watt PPS with AC input, used as the main PPS in an E3000. Part
number 300-1307.
195 Watt PPS, used as a backup PPS in an E3500. Part number 300-1358.
300-1358 - AC Input 100-240V AC @ 3A, DC Output:
+5V    +5V    +12.0V    -12.0V    +14V    Maximum continuous
20A    5A     13A       1.5A      1A      195 watts
300-1301/1307 - AC Input 100-240V AC @ 3A, DC Output:
+5V    +5V    +12.0V    +12.0V    Maximum continuous
20A    5A     13A       1.5A      184 watts
PCM and PPS Status Lights
Status LED Codes
Green    Yellow    Description
Off      Off       No AC input
On       Off       Normal operation
On       On        Fan failure
Off      On        DC output failure
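The LED table above is a two-input lookup, which can be written down directly. A minimal sketch; the names `LED_STATUS` and `decode_leds` are made up for this example.

```python
# Status table for the PCM and PPS LEDs, keyed by (green_on, yellow_on).
LED_STATUS = {
    (False, False): "No AC input",
    (True, False): "Normal operation",
    (True, True): "Fan failure",
    (False, True): "DC output failure",
}


def decode_leds(green_on, yellow_on):
    """Return the table's description for a green/yellow LED combination."""
    return LED_STATUS[(bool(green_on), bool(yellow_on))]
```

For example, `decode_leds(True, True)` returns `"Fan failure"`.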
Hot Pluggable Boards
Hot Plug Architecture
The CPU/Memory+ boards and the I/O+ boards are hot pluggable
under certain conditions.
You can remove a system board only when its amber light is the only
light that is on, and even then there are checks to be made to ensure
the board may be removed.
In the middle of the centerplane connector are three large pins that are
larger and longer than the Gigaplane connector pins.
These connectors provide for connection to the power bus before data
and address pins make contact in the Gigaplane connectors. Each of
the power connectors is a different length, which provides for a
sequential connection process.
The first pin to make contact when a board is plugged into the card
slot is the ground pin.
Next is the precharge voltage connection. This applies a low voltage to
the logic and prepares the logic for full voltage with less current drain
at contact than would be required if the precharge was not provided.
This eliminates the power surge that would otherwise corrupt data and
address lines and cause systems to halt when boards are inserted.
Warning – The precharge voltage is provided by the PPS. Ensure the
precharge is available before attempting a hot-plug.
# /usr/platform/sun4u/sbin/prtdiag -v | grep precharge
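When the check has to be repeated across many systems, the same filtering can be done on captured prtdiag output. The sketch below is an assumption-laden illustration: it assumes each precharge line ends in a status word such as OK, the sample text is hypothetical, and real prtdiag output may be laid out differently.

```python
def precharge_status(prtdiag_text):
    """Map each line that mentions 'precharge' to True if its status is OK."""
    status = {}
    for line in prtdiag_text.splitlines():
        if "precharge" in line.lower():
            fields = line.split()
            # Assumed layout: description words followed by one status word.
            name = " ".join(fields[:-1])
            status[name] = fields[-1].upper() == "OK"
    return status


def precharge_available(prtdiag_text):
    """True only if precharge lines exist and every one of them reports OK."""
    status = precharge_status(prtdiag_text)
    return bool(status) and all(status.values())


# Hypothetical sample, not captured from a real system.
SAMPLE = """\
    Peripheral 5.0v precharge   OK
    Peripheral 12v precharge    OK
    System 3.3v precharge       OK
    System 5.0v precharge       FAIL
"""
```

Treating "no precharge lines found" as not available is a deliberate choice here: it is safer to refuse a hot-plug than to assume precharge is present.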
Trigger Pin
Just before the data and address pins in the Gigaplane connectors make
contact, a logic pin called the trigger pin makes connection. This
informs the clock board to suspend activity on the Gigaplane for 200 ms
whilst the board insertion completes.
Caution – You cannot hot-plug the clock board, for two reasons.
Firstly, there are no precharge connections for the slot, and secondly,
it is the clock board that controls board insertion.
Common and Unique components
Common Components
One of the features of the Exx00 range is the commonality between
major components. Some common components include:
● CPU/Memory boards
● I/O boards
● CPU Modules
● Memory
● PCMs
● PPSs
● Clock Boards
● Disk Boards
Unique Components
Some unique components are:
● AC Input units
● Media bays
● Load boards
● E3500, IB boards
● E3500, auxiliary PPS
● E3500, FC-AL drives
● E6500 load boards
Cooling Considerations
Filler Panel
The filler panel shown below directs airflow inside the card cage and
helps shield electromagnetic interference (EMI) type emissions from
interfering with normal operations.
Caution – Empty slots in Enterprise 4x00 and 5x00 systems must have
a filler panel installed. Whenever you remove a board and do not
immediately replace it, you must install a filler panel.
Figure: Filler panel (callout: spring fingers)
Cooling & Loading Considerations
Load Board
The load board shown below does the same tasks as the filler panel,
but it also helps maintain a constant load on the power supply system,
reducing the occurrences of voltage spikes.
Whenever you remove a system board from an E6x00 and do not
immediately replace it, you must install a load board in its place.
Caution – Load boards are used only in Enterprise 6x00 systems.
All slots in Enterprise 6x00 systems that do not contain system boards
must have a load board installed.
Figure: Load board (callouts: spring fingers, centerplane connector)
Exercise: Component Removal and Replacement
Sun Enterprise 4500, 5500, and 6500 Systems FRU Removal Procedures
Caution – Before beginning any procedure to remove static-sensitive
components from any Sun Enterprise server, attach an approved ESD
wrist strap to your wrist and connect the other end to the system
chassis. Connect the ESD mat provided to the same chassis and verify
that it is properly grounded before proceeding. Always place removed
system components on the ESD mat provided.
Removing the Power and Cooling Modules
Note – Remember the following rules for hot-plug replacement of a
PCM: The peripheral power supply must be operational (to provide
precharge current). Hot-plugging requires adequate redundancy of
electrical power or an overload condition might occur when a power
supply is removed. Use the prtdiag command to determine if
precharge current is present before removing or installing a hot
pluggable power supply.
1. Use a #1 Phillips screwdriver to turn each quarter-turn access
screw on the power supply to the unlocked position.
2. Pull the end of the extraction lever outward to release the power
supply from the centerplane.
Figure 2-1 Extracting a Power and Cooling Module
3. Slide the power and cooling module out of the chassis.
Removing the Peripheral Power Supplies
1. Use a #1 Phillips screwdriver to unlock the quarter-turn access
slots on the power supply.
2. Pull the ends of the extraction levers outward to release the power
supply from the centerplane.
Figure 2-2 E5500/6500 PPS Removal
Figure 2-3 E3500 PPS/AC Removal
Removing the Auxiliary Peripheral Power Supply 1 (PPS1) From the E3500
1. Release the power supply from the system chassis by loosening
the captive screws.
2. Pull the ends of the extraction levers outward to release the power
supply from the centerplane.
3. Pull the power supply straight out.
Figure 2-4 E3500 Auxiliary Peripheral Power Supply 1 Removal
Removing the Removable Media Tray
1. On the E3500/E4500, remove the front bezel.
a. Grasp the front bezel on both sides near the center.
b. Place your thumbs on top of the front bezel and place your
other fingers at the slight indentations under the front bezel
for leverage.
c. Pull the front bezel straight out toward you and set it aside.
2. Loosen the bottom two captive screws that secure the media tray
to the chassis tray.
Figure 2-5 E3500/E4500 Media Tray Removal
3. Use a screwdriver in the notch at the bottom center of the tray to
assist in separating the media tray from the rear slip connectors,
and pull out the tray.
E5500/E6500
1. Remove the left side panel.
2. Release the device enclosure from the media tray by removing
three screws on the left side of the media tray.
3. Pull the device enclosure forward and disconnect the data and
power cables from the rear of each device.
4. After the cabling is removed, remove the device enclosure from
the media tray.
Figure 2-6 E5500/6500 Media Tray Removal
Bus Structures and Types 3
UPA Bus Architecture
The CPU/memory Board and the UPA Bus
The figure below shows the relationship between the CPU modules
and the system board. The area within the shaded box is supported by
the UPA bus.
The table below shows the bus widths for different system
functions.
UPA and Gigaplane bus widths
UPA bus                         Gigaplane bus
Processor: 128 data + 16 ECC    256 data + 32 ECC
SYSIO: 64 data + 8 ECC          41 address
FFB: 64 data
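Every ECC-protected path in the bus width table carries 8 check bits for each 64 data bits, the same ratio that turns the 512-bit memory transfer into 576 bits. A short sketch of that arithmetic follows; the dictionary and function names are illustrative only.

```python
# (data bits, ECC bits) for each ECC-protected path in the table.
PATHS = {
    "UPA processor": (128, 16),
    "UPA SYSIO": (64, 8),
    "Gigaplane": (256, 32),
}


def total_width(data_bits, ecc_bits):
    """Total number of signal bits transferred, data plus ECC."""
    return data_bits + ecc_bits


def ecc_bits_per_64(data_bits, ecc_bits):
    """ECC bits carried for every 64 bits of data."""
    return ecc_bits * 64 // data_bits
```

For the Gigaplane, `total_width(256, 32)` gives 288 bits, and `ecc_bits_per_64` returns 8 for every entry in `PATHS`.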
Ultra Port Architecture Features
The Ultra Port Architecture (UPA) supports the high-performance
UltraSPARC design. Sun Microsystems created this new component
interconnect bus to optimize data transfers between devices and
system boards. Designed specifically for multitasking, multiprocessing
environments, the UPA interconnect handles multiple simultaneous
requests for data transfers between processors, memory, and I/O devices.
UPA features include:
● Packet-switched bus
● High speed (1.6 Gbytes/second)
● High bandwidth
● Direct CPU to memory without crossbar switching
● Improved 3D graphics acceleration
This new high-performance architecture has a processor-to-memory
interconnect using the UPA bus. The UPA bus runs at one-half the
CPU clock rate because it is twice as wide. This enables the CPU to
load each half of the bus’s data before the next bus cycle.
To increase the data flow between the CPU and other subsystems, the
UPA uses crossbar packet switching. Packets from various subsystems,
such as memory, graphics, and I/O devices can be multiplexed. This
allows multiple transactions to occur seemingly simultaneously, with
peak transfers in excess of 1.6 Gbytes per second.
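The quoted peak rate follows from the bus width and clock. The sketch below assumes a 128-bit UPA data path (the processor width listed earlier) and takes the UPA clock as half the CPU clock, as described above. The 200 MHz CPU clock in the usage note is a hypothetical input chosen for the example, not a figure from this course.

```python
def upa_peak_bytes_per_second(cpu_clock_hz, data_width_bits=128):
    """Peak UPA transfer rate: bytes per bus cycle times bus cycles per second."""
    upa_clock_hz = cpu_clock_hz // 2  # UPA runs at one-half the CPU clock rate
    return (data_width_bits // 8) * upa_clock_hz
```

Under these assumptions, `upa_peak_bytes_per_second(200_000_000)` evaluates to 1,600,000,000 bytes per second, which lines up with the 1.6 Gbytes per second figure.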
SBus Architecture
SBus Features
Sun Microsystems designed the SunBus™ (SBus) to provide the
SPARC™ products with a high-performance, space-efficient, and
cost-effective system bus. The 25 MHz 32-bit data and address SBus
specifications have been adopted by the Institute of Electrical and
Electronics Engineers (IEEE) and are available to third-party
developers.
SBus provides for device autoconfiguration. Installing SBus expansion
boards is easy because each board carries an EPROM containing
machine-independent Forth code that describes the board’s function and
contains a POST that is compatible with Sun system POST commands. The
system retrieves configuration information from the expansion boards at
power-up, thereby identifying and initializing all devices connected on
the SBus.
PCI Bus Architecture
PCI Mechanical Specifications
PCI boards have two basic form factors: standard or long length (312
mm) and short length (119-167 mm). Board edge connectors are keyed
for 3.3V signaling, 5V signaling, or universal signaling. Universal
boards are designed to fit in 3.3V or 5V connectors.
The 32-Bit, 124 pin PCI connector has 120 signal pins and 4 key pins.
The 32-Bit connector defines the system signaling as 3.3V or 5V. An
optional 64-Bit extension is built into the same connector molding
extending the number of pins to 184.
A 32-Bit PCI board identifies itself for 32-Bit transfers when it is
installed in a 32-Bit or 64-Bit connector. A 32-Bit PCI board can be
installed in either a 32-Bit or 64-Bit connector.
A 64-Bit PCI board identifies itself for 32-Bit transfers when it is
installed in a 32-Bit connector. A 64-Bit PCI board identifies itself for
64-Bit transfers when it is installed in a 64-Bit connector.
The signals that enable 64-bit operation are REQ64 and ACK64. They
are Side A Pin-60 and Side B Pin-60 of the 32-bit connector.
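The width rules above reduce to taking the narrower of the board and the connector. A minimal sketch, with a hypothetical function name:

```python
def pci_transfer_width(board_bits, connector_bits):
    """Return the transfer width a PCI board uses in a given connector."""
    if board_bits not in (32, 64) or connector_bits not in (32, 64):
        raise ValueError("PCI boards and connectors are 32 or 64 bits wide")
    # A board only runs 64-bit transfers when both sides support them.
    return min(board_bits, connector_bits)
```

So a 64-bit board in a 32-bit connector runs 32-bit transfers, and only a 64-bit board in a 64-bit connector runs 64-bit transfers.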
PCI Electrical Specifications
The PCI specification provides for 3.3V and 5V signaling. Signaling is
determined by the motherboard. Signaling for a 3.3V PCI board is at
3.3V. Signaling for a 5V PCI board is at 5V. Signaling for a universal
PCI board is at 3.3V or 5V.
All PCI connectors require four power rails: +3.3V, +5V, +12V, and
-12V. The distinction between 3.3V and 5V PCI boards is in the
signaling protocol, not the connector power rails. The maximum
power allowed for a PCI board is 25 Watts from all four power rails
combined.
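The signaling and power rules above can be expressed as two small checks. This is an illustrative sketch only: the function names and the keying labels are made up for the example, and the per-rail wattages are whatever the caller supplies.

```python
MAX_BOARD_WATTS = 25  # limit across all four rails combined


def board_fits(board_keying, slot_signaling):
    """True if a board keyed '3.3V', '5V' or 'universal' suits the slot."""
    if slot_signaling not in ("3.3V", "5V"):
        raise ValueError("slot signaling must be '3.3V' or '5V'")
    if board_keying == "universal":
        return True  # universal boards signal at either level
    if board_keying not in ("3.3V", "5V"):
        raise ValueError("unknown board keying")
    return board_keying == slot_signaling


def within_power_limit(watts_by_rail):
    """True if the board's draw summed over all rails is within the limit."""
    return sum(watts_by_rail.values()) <= MAX_BOARD_WATTS
```

For example, `board_fits("universal", "5V")` is true while `board_fits("3.3V", "5V")` is false, because signaling is set by the motherboard and not by the power rails.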
PCI Board Connections
PCI Boards are shown with the solder side up because this is the
orientation in many PCI systems.
SCSI SBus card
You will find a number of SCSI connections within the Exx00 servers.
There are SBus SCSI cards, PCI SCSI cards, and each I/O board has an
on-board SCSI port which is driven by a FEPS chip on the board.
Single-Ended Fast/Wide (SunSwift), part number 501-2739
SCSI PCI card
Single-Ended Ultra/Wide (SunSwift PCI), part number 501-2741
This is a PCI SCSI card. Note the driver chip. This card will provide an
Ultra-SCSI bus.
SCSI Features - Fast SCSI
Small Computer System Interface Features
The Small Computer System Interface (SCSI)-1 standard defines two
modes of data transfer: asynchronous (handshaking) and synchronous
(streamed) mode. SCSI-1 synchronous transfer rates are limited to 5
Mbytes per second. In many environments this is acceptable. But in
configurations with multiple high-performance devices on the bus, 5
Mbytes per second can make the bus a bottleneck.
Besides a better-defined set of required features, the SCSI-2 standard
defines several optional features that have an impact on users: Fast,
Wide, differential, and tagged queueing. A specific implementation
can be SCSI-2-compliant, yet implement none of these four features. In
fact, all current Sun Microsystems SCSI disk and CD-ROM products,
as well as the tape drive devices, are compliant with SCSI-2. There are
many more features to the SCSI-2 standard than these four options.
This section discusses only these options, because they are the most
commonly used features of SCSI-2.
Fast SCSI – Higher Bus Speed
The SCSI-2 standard defines an option known as Fast SCSI, which
increases the synchronous transfer rate to 10 Mbytes per second. The
terms Fast SCSI and 10-Mbyte SCSI are synonymous, and are used
interchangeably. The term SCSI-2 is often incorrectly used to mean
Fast SCSI.
Devices running at 10 Mbytes per second, at 5 Mbytes per second, and
asynchronously can be mixed on a SCSI bus. Transfer rates are negotiated
on an individual basis between the host and each device.
Fast SCSI requires the proper protocol chips in both the host adapter
and device controller, as well as a modified software driver. Solaris 2.0
(and higher) software supports Fast SCSI.
The SPARC desktop systems developed after the SPARCstation™ 10
offer the fast SCSI host adapter on the system board. There are also
additional host adapter SBus cards available from Sun Microsystems
that support Fast SCSI.
Wide SCSI, Differential SCSI
Wide SCSI – Wider Is Better
In SCSI-1, all data transfer paths are parallel and 8 bits wide. The
SCSI-2 standard defines two options that widen the bus to 16 or 32
bits. Each of these options is referred to as Wide SCSI. Most
implementations of Wide SCSI are 16 bits wide and also implement
the Fast option, thus yielding burst-transfer rates of 20 Mbytes per
second.
Differential SCSI — Less Interference
The SCSI standard defines two types of electrical interfaces:
single-ended and differential. Single-ended uses a 50-pin, high-density
connector. Differential SCSI uses special hardware drivers and
receivers that reference the signals to each other rather than to ground.
Sun Microsystems’ differential implementation uses a slightly larger,
industry-standard, 68-pin, high-density connector.
There is no performance benefit to differential SCSI, but it
accommodates considerably longer SCSI bus lengths than does the
single-ended interface. Differential SCSI busses can be up to 25 meters
(82 feet) in length.
Single-ended SCSI is limited to 6 meters (19.7 feet) total bus length. In
fact, the SCSI-2 standard recommends that busses with Fast SCSI
devices be limited to 3 meters. However, with high-quality shielded
cables and proper active (regulated) bus termination, 6-meter Fast
busses are quite reliable.
SCSI Termination, Ultra-SCSI
Termination
SCSI buses need to be correctly terminated. If the bus is not
terminated, signal reflections on the bus can cause SCSI transport
errors. There are two types of termination: active (or
regulated) and passive (or standard). Active termination is the better
of the two.
Ultra-SCSI
Ultra-SCSI is also known as Fast-20. It combines the features of Fast
SCSI with Wide SCSI and doubles the transfer rate to 40 Mbytes per
second. This increase in transfer rate requires the faster (33 MHz) PCI
bus systems to handle the increased transfer speeds.
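The burst rates quoted for the SCSI variants all follow from the transfer clock multiplied by the bus width in bytes. The sketch below tabulates that arithmetic. The transfer clocks (5, 10, and 20 million transfers per second) are the values implied by the rates in the text, and the names are illustrative.

```python
def burst_rate_mbytes(megatransfers_per_second, width_bits):
    """Burst rate in Mbytes per second for a parallel SCSI bus."""
    return megatransfers_per_second * width_bits // 8


# (million transfers per second, bus width in bits) for each variant.
VARIANTS = {
    "SCSI-1 synchronous": (5, 8),
    "Fast SCSI": (10, 8),
    "Fast/Wide SCSI": (10, 16),
    "Ultra-SCSI (Fast-20, Wide)": (20, 16),
}

RATES = {name: burst_rate_mbytes(*spec) for name, spec in VARIANTS.items()}
```

`RATES` works out to 5, 10, 20, and 40 Mbytes per second respectively, matching the figures given for each variant.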
SCSI Icons, cable quality
Cable Quality
The following figures assume Sun cables are being used. Ensure your
customer is using these cables, or cables of a similar quality.
SCSI icons
Below are the icons which denote single-ended and differential.
The icon on the left is for a single-ended SCSI controller or
terminator.
To the left is the icon for a differential SCSI controller or
terminator.
Conclusion - SCSI Cable Lengths
The signal frequency and the electrical wiring can then be used to
determine the maximum cable length. The following table shows the
maximum cable length in meters (m):
Cable length
Signal freq.        Devices    Single-ended    Differential
SCSI-2 Fast/Wide    1-16       6.00 m          25.00 m
Ultra-SCSI          1-4        3.00 m          12.50 m
Ultra-SCSI          5-8        1.50 m          6.25 m
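The cable length table can be turned into a lookup for checking a planned configuration. This is a sketch under one stated assumption: a bus with exactly four Ultra-SCSI devices is treated as belonging to the 3.00 m row. The bus type labels and function name are hypothetical.

```python
# (bus type, maximum devices, single-ended metres, differential metres)
MAX_CABLE_M = [
    ("fast-wide", 16, 6.00, 25.00),
    ("ultra", 4, 3.00, 12.50),
    ("ultra", 8, 1.50, 6.25),
]


def max_cable_length(bus_type, devices, differential=False):
    """Return the maximum total cable length in metres for a SCSI bus."""
    if devices < 1:
        raise ValueError("a bus needs at least one device")
    # Rows are ordered so the first match is the least restrictive one.
    for kind, max_devices, single_ended, diff in MAX_CABLE_M:
        if kind == bus_type and devices <= max_devices:
            return diff if differential else single_ended
    raise ValueError("no supported configuration for this bus and device count")
```

For example, `max_cable_length("ultra", 6)` returns 1.5, while the same bus with differential wiring returns 6.25.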
SCSI Implementation on Ex00 I/O Boards
Caution – You must include the internal cable lengths of the I/O
boards and peripherals in your calculations.
Device         Internal cable length
I/O boards     0.5 m
Disk boards    1.0 m
I/O Board in Slot 1
This is a special case, since the I/O board in slot 1 drives the internal
CD-ROM and Tape drive.
Rules
E3500           4.5 m cable length supported
E4x00           4.5 m cable length supported
E5x00, E6x00    SCSI devices are not supported on slot 1 in an E6500,
                apart from the internal CD-ROM and tape.
I/O Boards in all other slots
All other slots support 5.5m of cable length
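The internal lengths have to be subtracted from the bus limit to find how much external cable remains. The sketch below shows that subtraction; the dictionary keys and function name are hypothetical, and the 6.0 m limit in the usage note is the single-ended Fast/Wide figure quoted earlier.

```python
# Internal cable lengths in metres, from the table above.
INTERNAL_LENGTH_M = {"io_board": 0.5, "disk_board": 1.0}


def external_cable_budget(bus_limit_m, internal_devices):
    """Return the external cable length left after internal runs are counted."""
    used = sum(INTERNAL_LENGTH_M[device] for device in internal_devices)
    return bus_limit_m - used
```

`external_cable_budget(6.0, ["io_board"])` gives 5.5, which is consistent with the 5.5 m supported on slots other than slot 1.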
Fibre Channel Interface
Fibre Channel
SCSI is by far the most common peripheral interconnect today,
although others are in common use. The primary disk interconnect
used by Sun today is Fibre Channel (FC), an ANSI standard (ANSI
X3T9.3) that defines a SCSI-like command set but which is carried via
a fiber optic connection instead of copper wires. Sun’s SPARCstorage
Array uses a Fibre Channel connection to carry standard SCSI-2
commands and data. Although Fibre Channel is an ANSI standard, it
has been brought under the SCSI-3 umbrella. Future FC standards will
be generated as a subset of the SCSI-3 specification, which includes a
bewildering variety of options, for command sets, interconnect media,
and interoperability.
Fibre Channel Topologies
The familiar SCSI-2 really has only one or two ways to connect: a tree
of peripherals is connected to a host. Alternatively, the peripheral tree
is connected to two hosts via some sort of multi-initiator arrangement.
Fibre Channel has three very different topology options:
● point-to-point, in which a device connects to exactly one other
device;
● arbitrated loop (normally abbreviated FC-AL), in which the
peripherals and one or more hosts are connected together in a ring
topology using many point-to-point links;
FC-AL is architecturally similar to a full-duplex FDDI;
● fabric, in which switches and hubs are used to create an arbitrarily
complex network, possibly including multiple paths from a host to
a peripheral.
These topologies are shown overleaf.
Topologies
World Wide Names (WWN)
Fibre Channel devices use a flat, universal addressing structure in
which every device is assigned a unique address, known as the world
wide name (WWN). The WWN must be unique in the FC topology;
because Fibre Channel domains can potentially be connected into
arbitrary fabrics, the usual practice is to assign completely unique
WWNs to devices, in much the same way that Ethernet addresses are
assigned uniquely.
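Because every WWN must be unique within the topology, a simple duplicate check is a useful sanity test when recording device addresses. The sketch below assumes WWNs are written as 16 hexadecimal digits, optionally separated by colons. The function name and the values in the usage note are hypothetical.

```python
from collections import Counter


def duplicate_wwns(wwns):
    """Return the WWNs (normalized to lowercase hex) that appear more than once."""
    normalized = [wwn.replace(":", "").lower() for wwn in wwns]
    counts = Counter(normalized)
    return sorted(wwn for wwn, seen in counts.items() if seen > 1)
```

Called with `["2000:0020:3700:0001", "2000002037000001", "2000002037000002"]`, the function reports `2000002037000001` as a duplicate, because the first two entries differ only in formatting.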
The SPARCstorage Array uses the simplest of these options, a point-to-
point link that connects a disk array controller to one or two hosts. The
controller connects to a host via a point-to-point link using a two-
strand fiber cable. Fibre Channel is a full-duplex medium, requiring a
strand for each direction. The SPARCstorage Array can be connected
to two hosts through the simple expedient of having two
(independent) FC interfaces. Expanding the point-to-point mechanism
into a more complex network is impossible without resorting to hubs
and switches and the use of a fabric.
Fibre Channel Transfer specifications
The FC standard defines several classes of signal, corresponding to
different capabilities when combined with actual fiber connectors.
Each signal type uses a different type of laser, so the varieties are not
interchangeable. The classes are normally described in terms of their
data speed: 25 MB/sec, 50 MB/sec, and 100 MB/sec. Because FC is
a full-duplex standard, transferring between two devices at double
these speeds is theoretically possible, although in practice few devices
are capable of handling this much data. Although Sun has fielded over
20,000 SPARCstorage Arrays using FC-25, the industry as a whole
deferred acceptance of Fibre Channel until the arrival of FC-100 parts.
The market seems to have bypassed FC-50 completely. A few vendors
are now delivering products capable of FC-100 interoperability, but
little volume has been achieved to date (mid 1996). However, every
major storage vendor is planning FC-100 products in late 1996 or early
1997, and a safe bet is that high-end storage will be dominated by FC-
100 products by 1998.
Fibre Channel Distance Capability
One of the most useful capabilities of the FC medium is that its lasers
are capable of transmitting signals reliably over distances that are far
in excess of those attainable using standard copper SCSI technology.
Whereas SCSI-2 is limited to six meters in single-ended
implementations and 25 meters using differential transceivers, Fibre
Channel uses 50 micron multimode fiber capable of 2 km transmission
distance, although Sun itself offers cable lengths only up to 15 meters.
The FC standard permits distances up to 10 km.
One of the most useful capabilities made possible by Fibre Channel is
the ability to geographically disperse storage across much wider
distances than with other technologies.
With a practical cabling distance of several kilometers, it is possible to
mirror data onto two different disk arrays located on opposite ends of
a campus, or even nearby in a metropolitan area. Because the FC
connection operates at full disk subsystem speed, disaster recovery can
be simplified without loss of performance. This capability is similar to
that offered by a few mainframe disk vendors, with one major
exception: the FC operates at full FC speeds with negligible
transmission latency, whereas the wide-area disk mirroring available
on some mainframe storage units is subject to significant delays due to
wide-area networking latency. For bandwidth-sensitive applications,
Fibre Channel Cable
As its name implies, the fibre channel devices use a glass fibre instead
of a copper wire to carry the signal from the source to the destination.
The glass fibre shown in Figure 3-1 is about the thickness of three
sheets of paper.
Figure 3-1 Cross Section of a Fibre Optic Cable
The jacket on the fibre cable provides something a connecting device
can bond with, because the glass fibre is too thin and fragile for direct
access. The connector ends of the cable are precision manufactured to
guide the end of the glass fibre so it matches up exactly with the
transceiver port. If the glass fibre is not aligned perfectly with the laser
LED, the light does not pass along the cable.
Caution – Be careful how you handle fibre cable. Do not bend it more
tightly than its minimum bend radius.
Figure 3-1 callouts: buffer coating; 125 micron cladding; 62.5 micron core of pure glass fibre
Fibre Channel Interface - FC/OM and GBIC
The jacket helps prevent the cable from being bent or kinked. Any
damage to the glass causes a loss of signal. If the cable is bent sharply,
the laser beam will not go around the corner. If the cable is cracked or
crushed, the laser beam bounces back because it cannot pass through.
Figure 3-2 FC/OM and GBIC Optical Cable and Connector
The fibre channel optical module (FC/OM, predecessor to the GBIC)
and GBIC fibre cable plug and module connectors are keyed so they
can connect together only one way. Always observe the two pieces and
ensure they are properly aligned before connecting them.
Dual Porting
Fibre-channel allows disk drives and arrays to be dual ported. This
gives a great RAS advantage; alternate pathing or dynamic multi-pathing (DMP) software can be installed which protects the storage
from a failing I/O path.
Dual porting has implications for device addressing, which we shall
look at in chapter 5.
CPU/Memory and Clock Boards 4
Sun Enterprise 3x00/4x00/5x00/6x00 CPU/Memory Boards
CPU/Memory+ board block diagram showing the major component groups and the
interconnecting buses.
The CPU/Memory+ board includes:
● An Address Controller (AC+)
● Eight Data Controllers (DC+s)
● A Bootbus Controller, also known as the fhc
● Onboard devices (including a Flash PROM and SRAM)
● Two UPA bus CPU processor slots
CPU/Memory Board - Overview
CPU/Memory Board Component Layout.
Note the plastic cover over the address and data controllers. It is there
to prevent the heatsinks being knocked loose on a board insertion or
removal.
Loose heatsinks are a frequent cause of unreliability in the field.
If you find a loose heatsink in the field, replace the board.
The older boards do not have this cover. Be especially careful with
these boards.
CPU/Memory Board - Physical
The 501-2976 supports 2 MB cache modules and runs at 83 MHz.
The 501-4312 supports 8 MB cache modules and runs at 83 MHz.
The 501-4882 supports 8 MB cache modules and runs at 100 MHz.
Memory DIMMs
Each CPU/Memory+ board has 16 DIMM sockets, which are divided
into two banks of 8 DIMMs each.
Bank 0 and bank 1 DIMMs occupy alternate slot locations; bank 0
DIMMs are in the even numbered slots, and bank 1 DIMMs are in odd
numbered slots.
Memory DIMMs come in sizes ranging from 8 MBytes to 128 MBytes
each.
Memory must be installed in a complete bank of eight DIMMs with
each DIMM being the same size, type, and speed. Bank 0 can contain
different size DIMMs than bank 1.
UPA Ports
Proc 0 is assigned the first port number associated with the slot, proc 1
the second.
DC-DC converters
These ensure that the CPU modules receive the voltage they
require. You do not necessarily have to upgrade a CPU/Memory board
if you upgrade the CPU module.
CPU Modules
Processing power on each CPU/Memory+ board is provided by one or
two UltraSPARC II CPU modules, each with 0.5 to 8 Mbytes of local
high-speed external cache (Ecache) memory, depending on the module.
Supported modules are listed below.
167 MHz, 0.5/1.0 MB Ecache
250 MHz, 1.0/4.0 MB Ecache
336 MHz, 4.0 MB Ecache
400 MHz, 4.0/8.0 MB Ecache
Figure 4-1 UltraSPARC II CPU Module
A CPU/Memory+ board is not required to contain an UltraSPARC II
processor module and can operate as a memory-only board.
[Figure 4-1 labels: 288-pin connector and 144-pin connector, each with screws]
400 MHz, 8 MB Ecache processor modules
When you try to install Solaris 2.5.1 HW 11/97 or 2.6 HW 3/98 on an
Ex000 server with a 400 MHz/8 MB cache CPU module, booting from
CD-ROM or a network install server gives the error message:
Fast Data Access MMU Miss error
or panics with:
mutex_enter: bad mutex.
This is because the 8 MB cache is not supported without the
patches listed below. The procedure is as follows.
NOTE: This procedure requires downloading and applying patches so
the install client must have a network connection.
1. Verify the OBP version by typing at the ok prompt:
ok .version
Or check at the UNIX prompt:
# /usr/sbin/prtconf -V
If needed, upgrade to at least flash PROM version 3.2.21 using patch
103346-22 or greater.
2. ok setenv auto-boot? false
3. ok reset
4. ok limit-ecache-size
5. ok boot cdrom (at least 2.5.1 HW 11/97 or 2.6 HW 3/98)
6. Install the OS but do not allow auto-reboot!
7. # init 0
8. ok reset (usually not needed with 2.6)
9. ok limit-ecache-size
10. ok boot
11. Make sure you have a network connection, FTP to
sunsolve.sun.com, and get the latest kernel patch (minimum levels to
support the 400 MHz/8 MB cache are listed):
Solaris 2.5.1 --> 103640-27 and prtdiag patch 104595-08.
Solaris 2.6 --> 105181-14
12. Change run level to single-user mode using init S.
13. Install patches from ftp download directory.
14. Reboot.
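The minimum patch levels in step 11 can be checked mechanically. The following is a minimal Python sketch, not a Sun tool: it assumes a patch ID has the form base-revision (as in 103640-27) and that a later revision of the same base satisfies the minimum. The function names and the list of installed patches are hypothetical.

```python
def parse_patch(patch_id):
    """Split a patch ID such as '103640-27' into (base, revision)."""
    base, _, rev = patch_id.partition("-")
    return base, int(rev)


def meets_minimum(installed, minimum):
    """True if 'installed' is the same patch as 'minimum' at an equal or later revision."""
    inst_base, inst_rev = parse_patch(installed)
    min_base, min_rev = parse_patch(minimum)
    return inst_base == min_base and inst_rev >= min_rev


# Minimum levels quoted in step 11 of the procedure.
MINIMUM_PATCHES = {
    "2.5.1": ["103640-27", "104595-08"],
    "2.6": ["105181-14"],
}


def missing_patches(release, installed_ids):
    """Return the minimum patches for 'release' that 'installed_ids' does not satisfy."""
    return [
        required
        for required in MINIMUM_PATCHES[release]
        if not any(meets_minimum(have, required) for have in installed_ids)
    ]


if __name__ == "__main__":
    # Hypothetical installed patch list, for illustration only.
    print(missing_patches("2.5.1", ["103640-31", "104595-06"]))
```

With the hypothetical list shown, the kernel patch is new enough but the prtdiag patch is reported as missing, because revision 06 is below the minimum of 08.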
CPU Module Handling Precautions
Use the following precautions when handling UltraSPARC II modules:
Caution – Do not handle the modules by touching the gold pins on the
compression connectors. The natural oils on your hands cause these
connectors to oxidize and corrode over time. Corroded
connector pins cause the module to fail, requiring you to replace the
module again.
Caution – Handle the UltraSPARC modules by the edges only. Do not
handle them by the heatsinks because they can break easily.
Warning – The heatsinks attached to the UltraSPARC processor chip
can get very hot. Avoid touching the heatsink because you can get a
severe burn. You could also damage the module if you drop it.
Removing and Replacing a CPU Module
Use a 3/32 hex-driver to loosen all screws on each of the compression
connectors on the module to be removed (three screws for the 288-pin
connector, two screws for the 144-pin connector).
Lift the module straight up, off the board mating surface and the
single standoff that positions the module on the board.
Figure 4-2 Removing a CPU Module
Each module is located on the main board with a single standoff and is
connected to the main board by two spring loaded connectors. The
pins within the connectors are compressed to the corresponding
board’s mating surfaces by a compression bar which, when secured
with screws, connects the module connector pins to the board’s
corresponding connector surface.
To ensure that the connectors are correctly aligned, you must align the
post on the main board (MLB) with the corresponding hole in the module. When
you have the post and hole aligned, you can insert the five hex-socketed
screws and finger tighten them. Now you must torque the
screws, in the order described below, to six inch-pounds using the
torque-driver (Sun part number 560-2324) supplied with the system.
Ignore the reference to Method B. The torque sequence has gone
through a number of changes.
Take up the slack on each screw, then go around the screws in the order
shown, putting a 1/4 turn on each screw. Each screw should reach the
correct torque setting at the same time.
FOLLOW THIS PROCEDURE. IT IS IMPORTANT.
DO NOT MAKE UP YOUR OWN SEQUENCE.
DO NOT RUSH THIS PROCEDURE.
Memory Interleaving
Enterprise servers allow up to 16-way interleaving. There is an OBP
parameter which sets up interleaving.
memory-interleave
min disables interleaving, max sets interleaving to the maximum
possible factor. How you populate memory will have a major effect on
system performance. The rules are below.
Note – You must set memory-interleave=min to allow dynamic
reconfiguration of CPU/Memory boards.
Memory Configuration Rules
The following rules apply to configuring the system’s memory:
● DIMMs are 72-pin.
● Eight DIMMs form a bank.
● All DIMMs in a bank must have the same capacity.
● The first bank of memory can be either Bank 0 or Bank 1.
● Performance is better with many smaller banks
than with fewer, larger banks.
● Install one bank on each CPU/Memory board before installing the
second bank on any board.
● Install the highest density banks (128MB DIMMs) first, then
medium density banks (32MB DIMMs), and finally the lowest
density banks (8MB DIMMs).
All DIMMs in a bank should have the same speed rating. If DIMMs of
different speeds are mixed in a bank, the bank will function, but at the
lowest speed.
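The configuration rules above can be expressed as a short check and an ordering step. This is only a sketch in Python under simplifying assumptions: a bank is described by the list of its DIMM capacities in MB, boards are numbered from 0, and the function names are hypothetical. It does not model speed ratings or interleave factors.

```python
BANK_SIZE = 8  # DIMMs per bank, per the rules above


def bank_is_valid(dimm_sizes_mb):
    """A bank is valid when it is full (eight DIMMs) and every DIMM has the same capacity."""
    return len(dimm_sizes_mb) == BANK_SIZE and len(set(dimm_sizes_mb)) == 1


def install_order(bank_dimm_sizes_mb, board_count):
    """Plan where each bank goes: highest density first, and one bank on every
    board before any board receives its second bank.

    bank_dimm_sizes_mb holds one DIMM size per bank (for example 128 for a
    bank of eight 128 MB DIMMs). Returns (board, fill_pass, dimm_size_mb)
    tuples, where fill_pass is 0 for a board's first bank and 1 for its second.
    """
    if len(bank_dimm_sizes_mb) > 2 * board_count:
        raise ValueError("each board holds at most two banks")
    ordered = sorted(bank_dimm_sizes_mb, reverse=True)
    plan = []
    for index, size in enumerate(ordered):
        board = index % board_count
        fill_pass = index // board_count
        plan.append((board, fill_pass, size))
    return plan


if __name__ == "__main__":
    print(bank_is_valid([32] * 8))
    print(install_order([8, 128, 32], board_count=2))
```

For three banks on two boards, the 128 MB and 32 MB banks go on boards 0 and 1 first, and the 8 MB bank becomes the second bank on board 0.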
CPU/Memory Board Status Indicators
CPU/Memory+ boards have three LEDs indicating the status of that
board.
With the advent of dynamic reconfiguration (DR), the meaning of the
amber service LED has changed.
Before DR, the only time a board had an amber light on was when it
had failed POST. The correct meaning of the amber light, as
highlighted below, is that the board is in low power mode: either it has a
fault or it has been DR’d out.
Table 4-1
Power  Service  Running  Condition
Off    Off      Off      Board has no electrical power
Off    On       Off      Board is in low power mode, can be unplugged
Off    Off      On       Undefined
Off    On       On       Undefined
On     Off      Off      System is hung, either in POST/OpenBoot or in the operating system
On     Off      On       Hung in OS
On     On       Off      Hung in POST/OBP or hung in OS and has failed component on board
On     On       On       Hung in POST/OBP or hung in OS and has failed component on board
On     Off      Flash    OS running
On     On       Flash    OS running and failed component on board
On     Flash    Off      Slow flash = POST. Fast flash = OBP.
On     Flash    On       Undefined
The General Rules
The following lists the general LED condition rules for the
CPU/Memory+ boards:
● If no LEDs are lit, there is no electrical power to the board.
● If the green Power and Running LEDs are not lit, and only the
amber light is lit, the board is ready for removal.
● If no LEDs are flashing, the system is hung or in the process of
booting up.
● It used to be the case that the board required service if the amber
Service LED was lit continuously (not flashing).
The amber light is not a fault light; it is a low power indicator.
There may well be a fault, or equally the board may have been
dynamically reconfigured out of operation.
● It is a normal condition for the Service LED to flash during POST
testing.
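Table 4-1 is effectively a lookup from three LED states to one condition, which can be sketched as a small Python dictionary. The table contents are taken from Table 4-1; the function name and the lowercase state strings are assumptions made for this illustration.

```python
HUNG_WITH_FAULT = "Hung in POST/OBP or hung in OS and has failed component on board"

# Keys are (Power, Service, Running) as listed in Table 4-1.
LED_CONDITIONS = {
    ("off", "off", "off"): "Board has no electrical power",
    ("off", "on", "off"): "Board is in low power mode, can be unplugged",
    ("off", "off", "on"): "Undefined",
    ("off", "on", "on"): "Undefined",
    ("on", "off", "off"): "System is hung, either in POST/OpenBoot or in the operating system",
    ("on", "off", "on"): "Hung in OS",
    ("on", "on", "off"): HUNG_WITH_FAULT,
    ("on", "on", "on"): HUNG_WITH_FAULT,
    ("on", "off", "flash"): "OS running",
    ("on", "on", "flash"): "OS running and failed component on board",
    ("on", "flash", "off"): "Slow flash = POST. Fast flash = OBP.",
    ("on", "flash", "on"): "Undefined",
}


def decode_leds(power, service, running):
    """Return the Table 4-1 condition for the three LED states."""
    key = tuple(state.strip().lower() for state in (power, service, running))
    return LED_CONDITIONS.get(key, "Combination not listed in Table 4-1")


if __name__ == "__main__":
    print(decode_leds("Off", "On", "Off"))
```

Note that the amber-only row decodes to low power mode rather than to a fault, which matches the rule that the amber light is not a fault light.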
Clock+ Board Introduction
There are, at the time of writing, four different clock boards. The main
difference between them is the clock ratio.
501-2975 provides a 1:2 clock ratio.
501-4286 supports 1:2 and 1:3 clock ratios.
501-4946 supports 1:2, 1:3, and 1:4 clock ratios.
501-5365 supports 1:2, 1:3, 1:4, 1:5, and 1:6 clock ratios.
These ratios are used to derive the Gigaplane frequency from the processor frequency. The maximum Gigaplane speed
is 100 MHz.
So, for example, for 400 MHz processors you would need a 501-4946, which supports the 1:4 ratio (400 MHz / 4 = 100 MHz).
Note – Full details of which clock board is used alongside which
processor module are provided in the FE Handbook.
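The 400 MHz example can be generalised as a small calculation. The Python sketch below assumes that the Gigaplane frequency is the processor frequency divided by the ratio's divisor, and that the smallest divisor keeping the result at or below 100 MHz is the one wanted. Both are inferences from the example above, not statements from the FE Handbook, which remains the authority. The function names are hypothetical.

```python
MAX_GIGAPLANE_MHZ = 100

# Divisors supported by each clock board, from the list above (1:2 -> 2, and so on).
CLOCK_BOARDS = {
    "501-2975": {2},
    "501-4286": {2, 3},
    "501-4946": {2, 3, 4},
    "501-5365": {2, 3, 4, 5, 6},
}


def required_divisor(cpu_mhz):
    """Smallest listed divisor that keeps cpu_mhz / divisor at or below the maximum."""
    for divisor in range(2, 7):
        if cpu_mhz / divisor <= MAX_GIGAPLANE_MHZ:
            return divisor
    raise ValueError(f"no listed ratio supports {cpu_mhz} MHz")


def suitable_boards(cpu_mhz):
    """Part numbers of the clock boards that support the required divisor."""
    divisor = required_divisor(cpu_mhz)
    return sorted(pn for pn, ratios in CLOCK_BOARDS.items() if divisor in ratios)


if __name__ == "__main__":
    for mhz in (167, 250, 336, 400):
        print(mhz, required_divisor(mhz), suitable_boards(mhz))
```

For 400 MHz the sketch selects the 1:4 ratio, which agrees with the 501-4946 example; the newer 501-5365 also lists that ratio.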
Clock+ Board Block Diagram
The Clock+ board block diagram below shows a high level view of the
functionality of the Clock+ board.
Clock+ Board Block Diagram
The Clock+ board consists of the following subsystems:
● Console Bus
● Clocks
● Reset logic
● JTAG logic and interface port for factory testing only
● Centerplane connector signals monitoring
[Block diagram labels: LEDs, serial ports, keyboard/mouse, console, clocks, reset, JTAG, clock frequency, led [2.0], console bus, clock bus, reset bus, JTAG bus, centerplane connector, reset button, reset button (XIR)]
Clock+ Board - Physical
Backpanel and Connectors
Clock+ Board Console Bus
Note – The console bus passes information such as environmental
data and POST information around the system. It is a ‘back door’ path between
boards.
Console Bus
The Console Bus provides CPU/Memory+ boards access to global
system control and status as well as to the keyboard, mouse, and serial
ports. In addition, there is an NVRAM/time-of-day (TOD) chip that
maintains the date and time, and 8 Kbytes of data, when the power to
the system is shut off.
The state of physical hardware conditions is maintained in registers on
the Clock+ board. Each of these registers has inputs generated from
other subsystems on the Clock+ board, from other boards, or from the
power supplies in the system. Some Clock+ board registers are
reserved for controlling various states of the machine.
The Clock+ board allows you to connect an ASCII terminal to the
serial port and a Sun keyboard and mouse to the keyboard port. This
allows you to interface to the local system console. The serial port
allows POST messages to be displayed to a local ASCII terminal.
You can configure the serial port for standard serial devices, such as
modems and printers.
Clocks
The clock subsystem generates the clocks for the entire system. The
base clock is synthesized and then divided into various frequencies.
These clock signals are then distributed to the centerplane by an array
of driver chips. Two clocks for processor slots and one system timing
clock go to each of the board slots on the centerplane.
Clock synthesizer and drivers. The clock synthesizer generates the
base clock signal, which is divided into several different signals by the
clock divider. These clocks are then distributed to the centerplane by
the clock drivers.
Clock+ Board - Overview
Reset Logic
Generates and sends reset commands to all system boards when either
an XIR or POR reset signal is received.
TOD/NVRAM
Centralized time-of-day (TOD) chip that includes NVRAM. You can
copy the contents to each I/O board in the system for redundancy and
backup.
Serial, keyboard and mouse ports
There are two tty connections, along with the kbd/mouse.
JTAG
There is a JTAG (Joint Test Action Group) connection between the
system ASICs and the Clock board. POST information is passed
around the system via JTAG. There is a further connection on the clock
board which is blanked off and used for factory testing only.
Clock+ Board Reset Logic
There are four circuits that control system reset and error state.
● Manual Reset
● System Reset
● System Error Reset
● Externally initiated reset (XIR)
We can initiate resets in a number of ways:
● Power the machine off and on. This is the power-on reset (POR).
● Type reset at the ok prompt. This is a software reset (SOR).
● Use the reset buttons on the clock board. The button labelled POR
will initiate a power-on reset.
The button labelled XIR will run an externally initiated reset (see
below).
● Use the remote console commands.
Remote Console Commands
The remote console feature is a very basic method of controlling the
Exx00 servers. A customer may send reset commands to the servers
via the ttya port. The system is constantly monitoring ttya for the
commands listed below.
Key sequence           Reset
CR CR ~ CNTL SHFT P    Power cycle reset
CR CR ~ CNTL SHFT R    Software reset
CR CR ~ CNTL SHFT X    XIR
On receiving the key sequences on ttya, the system will initiate the
appropriate reset.
Clock+ Board - XIR resets
Note – The secure position of the keyswitch disables the remote
console.
Enter remote console characters with a 0.5 to 5 second delay.
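The remote console sequences and the timing rule can be modelled as follows. This Python sketch is illustrative only: it assumes the 0.5 to 5 second delay applies between each pair of consecutive characters, it represents the final CNTL SHFT keystroke by its letter, and the function names are hypothetical. It does not send anything to ttya.

```python
MIN_DELAY_S = 0.5
MAX_DELAY_S = 5.0

# Final key of each sequence (entered with CNTL SHFT), after CR CR ~.
RESET_BY_FINAL_KEY = {
    "P": "Power cycle reset",
    "R": "Software reset",
    "X": "XIR",
}


def delays_ok(timestamps):
    """True if every gap between consecutive keystroke times (seconds) is 0.5 to 5 seconds."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(MIN_DELAY_S <= gap <= MAX_DELAY_S for gap in gaps)


def reset_for(keys, timestamps):
    """Return the reset type for a keystroke sequence, or None if it would not match."""
    if len(keys) != 4 or len(timestamps) != 4:
        return None
    if keys[:3] != ["CR", "CR", "~"]:
        return None
    if not delays_ok(timestamps):
        return None
    return RESET_BY_FINAL_KEY.get(keys[3])


if __name__ == "__main__":
    print(reset_for(["CR", "CR", "~", "X"], [0.0, 1.0, 2.0, 3.0]))
```

A sequence typed too quickly, such as one with a 0.1 second gap, returns None in this model, which is the behaviour the delay rule is intended to illustrate.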
Externally Initiated Reset XIR
This is a useful reset to use if you are resetting a hung
machine. When an XIR occurs, memory is cleared and a
“snapshot” of the CPU registers and processes is saved.
To view this snapshot of CPU registers, you must be at the ok
prompt. Type
ok .xir-state-all
This displays information similar to the following:
CPU ID#1
TL=1 TT=3
TPC=e0028688 TnPC=e0028688 TSTATE=9900001e06
CPU ID#5
TL=1 TT=3
TPC=e002755c TnPC=e0027560 TSTATE=4477001e03
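If you capture this output to a file, the register values can be pulled out per CPU for comparison. The following Python sketch assumes the output keeps the name=value form shown above and that all values are lowercase hexadecimal; the function name is hypothetical, and the sketch makes no attempt to decode what the registers mean.

```python
import re

SAMPLE = """CPU ID#1
TL=1 TT=3
TPC=e0028688 TnPC=e0028688 TSTATE=9900001e06
CPU ID#5
TL=1 TT=3
TPC=e002755c TnPC=e0027560 TSTATE=4477001e03
"""


def parse_xir_state(text):
    """Parse XIR state output into {cpu_id: {register_name: integer_value}}."""
    cpus = {}
    # Splitting on a pattern with one capture group yields
    # [prefix, id1, body1, id2, body2, ...].
    parts = re.split(r"CPU ID#(\d+)", text)
    for cpu_id, body in zip(parts[1::2], parts[2::2]):
        registers = {}
        for name, value in re.findall(r"([A-Za-z]+)=([0-9a-f]+)", body):
            registers[name] = int(value, 16)
        cpus[int(cpu_id)] = registers
    return cpus


if __name__ == "__main__":
    for cpu, regs in parse_xir_state(SAMPLE).items():
        print(cpu, {name: hex(value) for name, value in regs.items()})
```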
Note – It is outside the scope of this course to go into decoding the XIR
log reports. An XIR does not override the NVRAM auto-boot? variable.
You can initiate an XIR either by using the XIR button on the
Clock+ board or the remote console XIR sequence.
Clock+ Board Status Indicators
LED States
Note – The Clock+ Board LEDs display the same information as the
system LEDs.
This has led people in the past to assume that the clock board has a
fault on it.
Always check for other fault conditions before assuming a clock board
fault.
Table 4-2 Clock+ Board LED States
Power  Service   Cycling   Condition
Off    Off       Off       No power
Off    On        Off       Failure mode
Off    Off       On        Failure mode
Off    On        On        Failure mode
On     Off       Off       Hung in POST/OBP or OS
On     Off       On        Hung in OS
On     On        Off       Hung in POST/OBP; hung in OS / failed component
On     On        On        Hung in POST/OBP; hung in OS / failed component
On     Off       Flashing  OS running normally
On     On        Flashing  OS running / failed component
On     Flashing  Off       Slow flash = POST. Fast flash = OBP.
On     Flashing  On        OS or OBP error
I/O Boards 5
I/O Boards
Types of I/O Boards:
The Enterprise systems support five types of I/O boards, identified
as follows.
● Type 1 – SBus I/O board with FC-OM Fibre Channel
● Type 4 – SBus+ I/O board with FC-AL Fibre Channel
● Type 2 – Graphics I/O board with FC-OM Fibre Channel
● Type 5 – Graphics+ I/O board with FC-AL Fibre Channel
● Type 3 – PCI+ I/O board
The + denotes boards capable of connecting to the 100 MHz Gigaplane
bus in the x500 series. Each board has three LEDs that provide board
status codes.
I/O Addressing
It is essential that you fully understand how disk subsystems,
networks, SBus cards, and PCI cards are addressed.
If your customer has errors on the database /engineering/parts, you
need to find where this partition is mounted.
If your customer tells you hme4 is faulty, where do you start?
We will be going through many examples of I/O addresses.
Physical paths are derived using UPA port numbers and device driver
names.
These are the most common driver names that may appear in a device
path.
fas - driver for fast/wide SCSI FEPS controllers
hme - driver for Fast Ethernet
isp - driver for differential SCSI controllers and the SunSwift card
sf - driver for soc+ or socal Fibre Channel Arbitrated Loop (FC-AL)
soc - driver for SPARCstorage Array (SSA) controllers
socal - driver for serial optical controllers for FC-AL (soc+)
pln - SPARCstorage Array nexus driver
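When a customer reports a faulty device, one way to start is to pick the driver names out of its physical path. The Python sketch below does this for the drivers listed above. It assumes that each path component has the form name@address, optionally with a vendor prefix such as SUNW, before the name; the function name and the sample SBus path are hypothetical and shown only for illustration.

```python
# Driver names and descriptions from the list above.
DRIVERS = {
    "fas": "fast/wide SCSI FEPS controller",
    "hme": "Fast Ethernet",
    "isp": "differential SCSI controller or SunSwift card",
    "sf": "soc+ or socal Fibre Channel Arbitrated Loop (FC-AL)",
    "soc": "SPARCstorage Array (SSA) controller",
    "socal": "serial optical controller for FC-AL (soc+)",
    "pln": "SPARCstorage Array nexus driver",
}


def drivers_in_path(path):
    """Return (driver, description) pairs for the known drivers named in a device path."""
    found = []
    for component in path.strip("/").split("/"):
        node = component.split("@", 1)[0]  # drop the unit address after '@'
        name = node.split(",")[-1]         # drop a vendor prefix such as 'SUNW,'
        if name in DRIVERS:
            found.append((name, DRIVERS[name]))
    return found


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(drivers_in_path("/sbus@3,0/SUNW,fas@3,8800000/sd@0,0"))
```

Components such as sbus or sd are ignored here simply because they are not in the list above, not because they are unimportant.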
System Slot 1
Slot 1 in an Enterprise server will always have an I/O board installed,
since the on-board SCSI FEPS chip on that board drives the internal
CD-ROM and tape drive.
SBus I/O Boards
Block diagram of the SBus I/O board showing two SBuses connecting
the components and SBus card slots.
Onboard devices include a Flash PROM, SRAM, and environmental
sensors.
SBus I/O Board – Type 1
The Type 1 was the original 83 MHz SBus I/O board.
The SBus I/O board provides the following interface connections:
● Two SBus channels for three SBus slots
● SunFastEthernet
● Fast/wide SCSI-II
● Two OLC sockets for FC/OM (Fibre Channel – Optical Module)
interface converter modules
Part Numbers 501-2977, 501-4287 (83 MHz)
[Figure labels: SBus 0, SBus 1, SBus 2]
SBus + I/O Board – Type 4
A Type 4 I/O board is the newer 100 MHz SBus I/O board, which
differs from a Type 1 in its on-board serial optical controller.
The SBus+ I/O board provides the following interface connections:
● Two SBus channels for three SBus slots
● SunFastEthernet
● Fast/wide SCSI-II
● Two FC-AL sockets for hot-pluggable gigabit interface converter
(GBIC) modules
Part Numbers 501-4266 (83 MHz), 501-4883 (83, 90, 100 MHz)
[Figure labels: SBus 0, SBus 1, SBus 2]
SBus I/O Board – Type 1, Physical layout
This is the original dual SYSIO board. Type 1 boards have an on-board
SOC chip, which drives two on-board Fibre Channel optical modules
(FC-OM). These are otherwise known as optical link controllers (OLC).
The on-board FC-OMs are used to drive a SPARCstorage Array. You
may drive 2 km of fibre cable from these boards.
Note the connector layout. pln@a is on the right and pln@b is on the
left.
SBus + I/O Board – Type 4, Physical layout
A Type 4 board has an on-board SOC+, otherwise known as the socal
(SOC arbitrated loop).
The SOC+ drives two on-board GBICs, which are used to drive the
A500 disk systems. You may drive 500m of fibre cable from these
boards.
The GBIC on the right is addressed as sf@0; the one on the left is
addressed as sf@1.
Graphics I/O Boards
The Graphics(+) I/O board is similar to the SBus(+) I/O board, with the
following differences:
● The Graphics I/O boards (Type 2 and Type 5) have one SBus
implemented with one SYSIO chip with two SBus card slots.
● The Graphics I/O board has one UPA port number assigned to the
SYSIO chip, and one UPA port for a fast-frame buffer.
Graphics I/O Board – Type 2
The Graphics I/O board shown below provides you with the SBus you
need and a UPA interface for those systems on which you need to
install a monitor.
The Graphics I/O board provides the following interface connections:
● One SBus channel, for two SBus slots
● One UPA slot for Creator and Creator3D graphics cards
● SunFastEthernet
● Fast/wide SCSI-II
● Two OLC sockets for FC/OM interface converter modules
Part Numbers 501-2749, 501-4288 (83 MHz)
[Figure labels: SBus 0, SBus 1, UPA Bus]
Graphics+ I/O Board – Type 5
The Graphics+ I/O board shown below is the 100 MHz “+” version of
the Type 2 board.
The Graphics+ I/O board provides the following interface connections:
● One SBus channel, for two SBus slots
● One UPA slot for Creator and Creator3D graphics cards
● SunFastEthernet
● Fast/wide SCSI-II
● Two FC-AL sockets for hot-pluggable gigabit interface converter
(GBIC) modules
Part Number 501-4884 (83, 90, 100 MHz)
[Figure labels: SBus 0, SBus 1, UPA Bus]
Graphics I/O Board – Type 2, Physical layout
The difference from a Type 1 is that both sbus0 and sbus2 are driven
from one SYSIO chip, which takes the second UPA port number for the
board.
The first UPA port number is assigned to the Creator3D graphics card.
Graphics+ I/O Board – Type 5, Physical layout
The difference from a Type 4 is that both sbus0 and sbus2 are driven
from one SYSIO chip, which takes the second UPA port number for the
board.
The first UPA port number is assigned to the Creator3D graphics card.
PCI+ I/O Board – Type 3
The PCI+ I/O board provides the following interface connections:
● There are risers for 32- or 64-bit cards, 33- or 66-MHz cards, and
3.3- or 5-volt cards. The riser must match the specification of the
PCI card used
● One on-board 10/100-Mb-per-second Ethernet port (twisted pair)
● Ultra SCSI
The diagram of the PCI interface board shown below has two PCI
interface connectors to which you must connect a riser for the specific
type of PCI card you are installing.
The PCI+ I/O board provides the following interface connections:
● Four PCI bus channels for two configurable interface riser card
slots
● SunFastEthernet
● On-board SCSI implemented by an ISP 1040 controller, which
gives an Ultra SCSI connection.
Note: Ultra SCSI transfer rates are not supported as of 6/98, and
should be disabled.
Refer to PCI I/O Product Note, 805-3364-10 of September 1997.
Part Numbers 501-4325 (83 MHz), 501-4926 (100 MHz)
[Figure labels: PCI Bus 0, PCI Bus 1]
PCI+ I/O Board – Type 3, Physical layout
Type 3 boards have PSYCHO chips instead of SYSIO chips.
PCI0 on the right takes the first UPA port number.
PCI1 on the left takes the second UPA port number.
PCI+ I/O Board – Type 3 Port Definitions
/pci@x,4000/SUNW,hme@1,1
is the device path (or physical name) for the onboard Fast Ethernet port
on a PCI I/O board. This port is controlled by the PCI 0 Psycho chip
on the board.
/pci@y,4000/SUNW,isptwo@3
is the device path (or physical name) for the onboard UltraSCSI port
on a PCI I/O board. This port is controlled by the PCI 1 Psycho chip
on the board.
The PCI slot labelled J3200 is driven from PCI0 and has a device path
beginning with
/pci@x,2000/
which denotes that it can drive PCI cards at 33 MHz or 66 MHz.
Similarly, the PCI slot labelled J4200 is driven from PCI1 and has a
device path beginning with
/pci@y,2000/
which denotes that it can drive PCI cards at 33 MHz or 66 MHz.
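The two kinds of PCI path described above differ only in the number after the comma in the first component. The Python sketch below splits that component and reports whether the path points at an onboard device or at a riser slot. It assumes the value before the comma is the UPA port number of the Psycho chip (the x or y in the paths above); the port values used in the example are hypothetical, as is the function name.

```python
# Meaning of the second field in /pci@<port>,<offset>/, per the port definitions above.
PCI_OFFSETS = {
    "4000": "onboard devices (Ethernet or SCSI)",
    "2000": "PCI riser slot",
}


def describe_pci_root(path):
    """Return (upa_port, description) for a path starting with /pci@<port>,<offset>/."""
    first = path.strip("/").split("/")[0]
    if not first.startswith("pci@"):
        raise ValueError(f"not a PCI device path: {path!r}")
    port, _, offset = first[len("pci@"):].partition(",")
    return port.strip(), PCI_OFFSETS.get(offset, "offset not covered here")


if __name__ == "__main__":
    # Port numbers 6 and 7 are made up for this example.
    print(describe_pci_root("/pci@6,4000/SUNW,hme@1,1"))
    print(describe_pci_root("/pci@7,2000/"))
```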
Board Status Indicators
CPU/Memory+ boards and I/O boards have three LEDs indicating
the status of that board.
With the advent of dynamic reconfiguration (DR), the meaning of the
amber service LED has changed.
Before DR, the only time a board had an amber light on was when it
had failed POST. The correct meaning of the amber light, as
highlighted below, is that the board is in low power mode: either it has a
fault or it has been DR’d out.
LED Status Codes
Table 5-1 LED Codes for the CPU/Memory+ and I/O Boards
Power  Service  Running  Condition
Off    Off      Off      Board has no electrical power
Off    On       Off      Board is in low power mode, can be unplugged
Off    Off      On       Undefined
Off    On       On       Undefined
On     Off      Off      System is hung, either in POST/OpenBoot or in the operating system
On     Off      On       Hung in OS
On     On       Off      Hung in POST/OBP or hung in OS and has failed component on board
On     On       On       Hung in POST/OBP or hung in OS and has failed component on board
On     Off      Flash    OS running
On     On       Flash    OS running and failed component on board
On     Flash    Off      Slow flash = POST. Fast flash = OBP.
On     Flash    On       Undefined
The General Rules
The following lists the general LED condition rules for the
CPU/Memory+ and I/O+ boards:
● If no LEDs are lit, there is no electrical power to the board.
● If the green Power and Running LEDs are not lit, and only the
amber light is lit, the board is ready for removal.
● If no LEDs are flashing, the system is hung or in the process of
booting up.
● It used to be the case that the board required service if the amber
Service LED was lit continuously (not flashing).
The amber light is not a fault light; it is a low power indicator.
There may well be a fault, or equally the board may have been
dynamically reconfigured out of operation.
● It is a normal condition for the Service LED to flash during POST
testing.
Enterprise 3500 Fibre Channel Interface Board
This is a new board designed to provide connectivity to the internal
disk drives in the Sun Enterprise 3500 server. The internal disk drives
operate with the fibre channel arbitrated loop (FC-AL) architecture.
Each of the four potential FC-AL loops corresponds to one of four
gigabit interface converter (GBIC) modules on the Fibre channel
interface board.
[Figure labels: GBIC LA, GBIC LB, GBIC UA, GBIC UB]
Part Number 501-4820
SCSI Disk Board
You can install up to four SCSI disk boards in the Sun Enterprise 4x00
and 5x00 systems and two in the Sun Enterprise 6x00. Each SCSI disk
board can contain one or two 2.1, 4.2, or 9.1 GByte 7200 RPM disk
drives.
SCSI Disk Board Addressing
SCSI disk addressing depends on the drive position and on the Gigaplane
slot that the SCSI disk board is plugged into. We will cover addressing in
chapter 8.
Part Numbers 501-3113 (no disks), 501-4168, 501-5137
[Figure label: High-density UltraSCSI connector]
Open Boot PROM/NVRAM 6
Introducing OBP
History
The original SPARC boot PROM was based on revision 1.x.
A boot command at this revision was of the form
>b sd(3,0,0)
The first OpenBoot PROM was OBP 2.x. The disadvantage of this
revision was that to upgrade the firmware, you had to change the
chip.
Enterprise servers operate on OBP 3.x, which has the advantage that it
is downloadable.
The OpenBoot architecture provides a significant increase in
functionality and portability when compared to proprietary systems of
the past. Although this architecture was first implemented by Sun
Microsystems as OpenBoot on SPARC systems, its design is processor-
independent.
Caution – Don’t get mixed up between NVRAM and OBP.
The OBP holds device drivers and POST code, and provides some user
diagnostics.
The NVRAM holds the hostid, MAC address, time-of-day and
parameters which dictate how the OBP code will interact with the
system.
Refer back to your desktop course notes.
Introducing OBP (cont)
Open Boot PROM on each CPU/Memory Board
The PROMs on each CPU/Memory board all contain the same OBP and
POST and should all be at the same revision. The OBP loaded into
memory at boot time will be from the POST master.
Open Boot PROM on each I/O Board
The PROMs on the I/O boards hold FCode and iPOST specific to
that type of board.
Master NVRAM
Resides on the Clock board.
Backup NVRAM
Backup copies reside on each I/O board. There are no backup NVRAM chips on the
CPU/Memory boards.
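Because the PROMs on all CPU/Memory boards should be at the same revision, it can help to compare the revisions you have recorded for each board. The Python sketch below flags boards that differ from the most common revision. The slot numbers and revision strings in the example are made up, the function name is hypothetical, and how you collect the revisions is outside the sketch.

```python
from collections import Counter


def mismatched_proms(revisions):
    """Given {board_slot: prom_revision}, return the slots whose revision
    differs from the most common revision."""
    if not revisions:
        return {}
    most_common, _ = Counter(revisions.values()).most_common(1)[0]
    return {slot: rev for slot, rev in revisions.items() if rev != most_common}


if __name__ == "__main__":
    # Hypothetical slots and revisions, for illustration only.
    print(mismatched_proms({0: "3.2.21", 2: "3.2.21", 4: "3.2.16"}))
```

The most common revision is not necessarily the newest one, so treat the result as a list of boards to investigate rather than a list of boards to downgrade.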
Introducing OBP (cont)
POST and OpenBoot work together in the system to test and manage
system hardware. When the system is turned on, or if a system reset is
issued, POST detects and tests buses, power supplies, boards, CPUs,
DIMMs, and many board functions.
Only POST can configure the system hardware at power up, and only
POST can enable hot-pluggable boards (if DR and AP are not present
and operating).
ok prompt
Once POST is completed, OBP checks the NVRAM parameters to see
how it should configure the system. The OBP is then loaded into main
memory. The system may then return to the ok prompt, assuming it
has been set up to do so.
{6} ok
Note – The number preceding the ok prompt specifies the
POST/JTAG master. It is usually the first CPU module in the system.
Features of OBP
Plug-in Device Drivers
A plug-in device driver is usually loaded from a plug-in device,
such as an SBus card. You can use a plug-in device driver to boot
the operating system from a device other than the default boot
device. Another example would be to display text on an output
device, other than the one attached to ttya, before the operating
system has loaded its own device drivers.
FCode Interpreter
Plug-in drivers are written in a machine-independent interpreted
language called FCode. Each OpenBoot system PROM contains an
FCode interpreter. This means that the same device and driver can
be used on machines with different types of CPUs (SPARC, Intel).
Device tree
The device tree is a data structure describing the devices
(permanently installed and plug-in) attached to a system. Both the
user and the operating system can determine the hardware
configuration of the system by inspecting the device tree.
Forth toolkit
The OpenBoot User Interface is based on the interactive
programming language Forth. You can combine sequences of user
commands to form complete programs. This provides a powerful
capability for debugging hardware and software.
Flash Programmable
Being flash programmable makes upgrading the system's POST, OBP, and
I/O device FCode fast, easy, and inexpensive.
You can upgrade several Sun Enterprise servers with little
downtime to the enterprise. The new OBP program information
can come from a CD-ROM or a network server.
POST
The code that runs the power-on self tests resides within the OBP chip.
It too can be upgraded, for example to include tests for newly released boards.
Recovery Features
These keyboard functions reset variable parameters in the
NVRAM configuration file.
Note – These keyboard functions work only from a local keyboard.
They do not work from an ASCII terminal or remote access terminal
connected to the system's serial port A.
If your system is down because it does not complete POST, you must
connect a Sun keyboard to the keyboard connector to enable these
recovery functions.
To activate these recovery functions:
1. Start with power off.
2. Press and hold the Stop key and the key for the required function (F, N, or D) simultaneously.
3. Apply power to the system while continuing to hold the keys
down until the keyboard LEDs flash.
The key combinations and functions available are:
Stop-F
Forces I/O to ttya. Enter Forth command mode on ttya before
probing hardware. Use fexit to continue probing hardware.
Stop-N
Resets NVRAM contents to default values.
Stop-D
Sets the diag-switch? parameter variable to true and enables
verbose output during POST.
The OBP User Interface
The OBP user interface is based on an interactive command interpreter
that gives you access to an extensive set of functions for hardware and
software development, fault isolation, and debugging.
You can enter the OpenBoot environment, that is, get to the ok prompt, in the following ways:
● Shut down the operating system.
# shutdown -y -g0 -i0
● Execute the Stop-A keystroke sequence.
You will sometimes see Stop-A referred to as L1-A.
● Press the reset switch on systems equipped with one
(not recommended unless absolutely necessary).
● Power-cycle the system
(also not recommended).
Note – A reset gets you to the OpenBoot user interface (the
ok prompt) only if the OBP parameter auto-boot? is set to false.
System Testing Commands
The Open Boot PROM contains many commands used to test the
system hardware.
test-all
Tests all devices that have built-in self test methods. Testing starts with
the current device node, or the specified device, and includes all
children.
test (device-specifier)
Tests the specified device. The NVRAM diag-switch? parameter
and the front panel keyswitch control the verbosity and depth of
the test command.
!Caution – After entering the OpenBoot command to probe something,
a WARNING message is displayed. It informs you that if the operating
system has been running, you must type the reset-all command
before you probe anything. Failure to do this causes the system to
hang (lock up).
probe-scsi
Identifies devices attached to the (primary) SCSI bus.
probe-scsi-all
Identifies devices attached to all SCSI host adapters on all system
boards.
probe-fcal-all
Identifies devices within the E3500 on the FC-AL loops.
watch-clock
Tests the clock function.
watch-net
Monitors the network connection.
watch-net-all
Monitors all network connections of built-in and plugged-in
networking cards.
Informational Commands
Some OpenBoot commands provide information about the system
components, including their contents if applicable.
banner
Displays the power-on banner.
.enet-addr
Displays the current Ethernet address.
.idprom
Displays the “ID PROM” contents.
.traps
Displays a list of SPARC trap types.
.version
Displays the PROM version for all the boards in the system.
The Device Tree
Devices are attached to a host computer through a hierarchy of
interconnected buses.
OpenBoot represents the interconnected buses and their attached
devices as a tree of nodes.
Such a tree is called the device tree. A node representing the host
computer’s main physical address bus forms the tree’s root node.
The physical address generally represents a physical characteristic
unique to the device (such as the bus address or the slot number
where the device is installed).
The use of physical addresses to identify devices prevents device
addresses from changing when other devices are installed or removed.
Note – The system builds the device tree structure after POST and
holds it in memory.
This structure maps low-level (physical) device addresses to
high-level (logical) device names.
For example, /sbus@3,0/SUNW,fas@3,8800000/sd@0,0 maps to
/dev/dsk/c0t0d0s0
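To make the structure of a physical device path concrete, the following minimal Python sketch splits a path of the form used in this chapter into its nodes. It assumes the usual OpenBoot layout of node-name@unit-address, optionally followed by :arguments; the function name parse_device_path is hypothetical and is not part of any Sun tool.

```python
def parse_device_path(path):
    """Split an OpenBoot device path into (node name, unit address) pairs."""
    components = []
    for part in path.strip("/").split("/"):
        # Drop optional device arguments such as ":a" that follow the address.
        part, _, _args = part.partition(":")
        # Everything before "@" is the node name, everything after is the address.
        name, _, address = part.partition("@")
        components.append((name, address))
    return components


nodes = parse_device_path("/sbus@3,0/SUNW,fas@3,8800000/sd@0,0")
for name, address in nodes:
    print(f"{name:10s} unit address {address}")
```

Each level of the path is one node in the device tree, so the list reads from the bus nearest the root down to the device itself.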
Typical Device Tree
OpenBoot deals directly with hardware devices in the system. Each
device has a unique name representing the type of device and where
that device is located in the system addressing structure. The
following example shows a typical device tree.
Figure 6-1 Typical Device Tree
(Figure not reproduced. Nodes shown: machine (root), ac, fhc, sbus, upa, central, cpu-module, memory, ethernet, SUNW,hme, SUNW,fas, SUNW,socal, sf, ssd, scsi-disk, scsi-tape, eeprom, flashprom, zs, clock-board.)
Displaying the Device Tree
You can browse the device tree to examine and modify individual
device tree nodes. The device tree browsing commands are similar to
the Solaris commands for changing (cd), listing (ls), and printing
(pwd) the current directory in the Solaris file system. Selecting a device
node makes it the current node.
Table 6-1 Commands for Browsing the Device Tree
Command                  Description
.properties              Displays the names and values of the current node's properties.
dev device-path          Chooses the indicated device node, making it the current node.
dev node-name            Searches for a node with the given name in the subtree below the current node, and chooses the first such node found.
dev ..                   Chooses the device node that is the parent of the current node.
dev /                    Chooses the root machine node.
device-end               Exits the device tree.
ls                       Displays the names of the current node's children.
pwd                      Displays the device path name that names the current node.
show-devs [device-path]  Displays all the devices directly beneath the specified device in the device tree. The show-devs command, used by itself, shows the entire device tree.
Using the .properties Command
The .properties command displays the names and values of all the
properties in the current node:
ok dev /zs@1,f0000000
ok .properties
address ffee9000
port-b-ignore-cd
port-a-ignore-cd
keyboard
device_type serial
slave 00000001
intr 0000000c 00000000
interrupts 0000000c
reg 00000001 0000000 00000008
name zs
ok
Using the dev Command
The dev command sets the current node to the named node so you can
view its contents. For example, to make the ACME company's SBus
device named ACME,widget the current node:
ok dev /sbus/ACME,widget
The find-device command is identical to the dev command,
differing only in the way the input pathname is passed.
ok /sbus/ACME,widget find-device
Note – After choosing a device node with dev or find-device,
you usually cannot execute that node's methods, because dev does not
establish the current instance. For a detailed explanation of this issue,
refer to Writing FCode 3.x Programs, part number 802-3239-10.
Listing System Devices
The show-devs command displays a listing of all devices currently
available in the system. If a device has been added to a disable list
(discussed in the next section) but the system has not been reset or
gone through POST, the device still shows up in the show-devs listing.
Conversely, a device can be physically installed in the system chassis
but not show up in the following listing because it is named in the
disabled-board-list. You must remove the entry from the disabled board
list after the board has been replaced, and then reset the system so that
POST and OBP add the device back to the listing.
The following device listing is from a Sun Enterprise 4000.
ok show-devs
/SUNW,ffb@2,0
/counter-timer@7,3c00
/sbus@7,0
/counter-timer@6,3c00
/fhc@6,f8800000
/sbus@6,0
/counter-timer@3,3c00
/sbus@3,0
/fhc@2,f8800000
/SUNW,UltraSPARC@5,0
/SUNW,UltraSPARC@4,0
/fhc@4,f8800000
/SUNW,UltraSPARC@1,0
/SUNW,UltraSPARC@0,0
/fhc@0,f8800000
/central@1f,0
/virtual-memory
/memory@0,0
/aliases
/options
/chosen
/openprom
/packages
/sbus@7,0/SUNW,fas@3,8800000
/sbus@7,0/SUNW,hme@3,8c00000
/sbus@7,0/SUNW,fas@3,8800000/st
/sbus@7,0/SUNW,fas@3,8800000/sd
/fhc@6,f8800000/sbus-speed@0,500000
/fhc@6,f8800000/eeprom@0,300000
/fhc@6,f8800000/flashprom@0,0
/fhc@6,f8800000/environment@0,400000
/fhc@6,f8800000/ac@0,1000000
/sbus@6,0/SUNW,soc@d,10000
/sbus@3,0/SUNW,fas@3,8800000
/sbus@3,0/SUNW,hme@3,8c00000
/sbus@3,0/SUNW,soc@d,10000
/sbus@3,0/SUNW,fas@3,8800000/st
/sbus@3,0/SUNW,fas@3,8800000/sd
/sbus@3,0/SUNW,soc@d,10000/SUNW,pln@a0000000,78c0c9
/sbus@3,0/SUNW,soc@d,10000/SUNW,pln@a0000000,78c0c9/SUNW,ssd
/fhc@2,f8800000/sbus-speed@0,500000
/fhc@2,f8800000/eeprom@0,300000
/fhc@2,f8800000/flashprom@0,0
/fhc@2,f8800000/environment@0,400000
/fhc@2,f8800000/ac@0,1000000
/fhc@4,f8800000/flashprom@0,0
/fhc@4,f8800000/sram@0,200000
/fhc@4,f8800000/environment@0,400000
/fhc@4,f8800000/simm-status@0,600000
/fhc@4,f8800000/ac@0,1000000
/fhc@0,f8800000/flashprom@0,0
/fhc@0,f8800000/sram@0,200000
/fhc@0,f8800000/environment@0,400000
/fhc@0,f8800000/simm-status@0,600000
/fhc@0,f8800000/ac@0,1000000
/central@1f,0/fhc@0,f8800000
/central@1f,0/fhc@0,f8800000/clock-board@0,900000
/central@1f,0/fhc@0,f8800000/zs@0,904000
/central@1f,0/fhc@0,f8800000/zs@0,902000
/central@1f,0/fhc@0,f8800000/eeprom@0,908000
/openprom/client-services
/packages/disk-label
/packages/obp-tftp
/packages/deblocker
/packages/terminal-emulator
ok
Listing System Available Devices
!Caution – If you boot the operating system, exit from the operating
system into OpenBoot without resetting the system, then use some
OpenBoot commands, the commands might not work as expected. In
this case, you might have to power cycle the system to restore normal
operation.
For example, suppose you boot the operating system, exit to
OpenBoot, then execute the probe-scsi command. You may find that
probe-scsi fails and hangs the system, and that you cannot resume
the operating system (ok go). To regain control of the system, you must
perform a hardware reset (power cycle or reset switch).
The correct method for executing OpenBoot probe commands is to
reset the system before entering the command. You must type
reset-all as the first OBP command, then invoke the desired probe
command, as shown:
ok reset-all
ok probe-scsi-all
sifting Command
sifting acts much like the UNIX grep command: it lists the OpenBoot
commands whose names contain the string you supply. If you cannot
remember the exact name of a command, type, for example:
ok sifting test
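The grep-like behaviour of sifting can be pictured with a few lines of Python. This is only an illustration of substring matching over a list of names, not how the firmware is implemented; the function name sift and the short command list (taken from commands named in this chapter) are assumptions made for the example.

```python
def sift(words, pattern):
    """Return the words whose names contain pattern."""
    return [word for word in words if pattern in word]


# A handful of command names mentioned in this chapter.
commands = ["test", "test-all", "probe-scsi", "probe-scsi-all",
            "watch-net", "watch-clock"]

print(sift(commands, "test"))
print(sift(commands, "scsi"))
```

The real command searches the whole OpenBoot dictionary, so its output is considerably longer than this toy list suggests.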
Displaying Device Aliases
The devalias command prints a listing of shortcuts, or nicknames, for
long device paths. The system has no trouble remembering long
device paths, but humans do, so the device alias list was created.
You should be familiar with one or two of these aliases, such as disk
and cdrom, because you have used both of them to boot the system.
You can always use the entire device path at the ok prompt when
booting.
Systems usually have predefined device aliases for the most
commonly used devices, such as the following listing taken from a Sun
Enterprise 3500.
ok devalias
disk /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
disksocal /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
diskbrd /sbus@3,0/SUNW,fas@3,8800000/sd@a,0
diskisp /sbus@3,0/QLGC,isp@0,10000/sd@0,0
net /sbus@3,0/SUNW,hme@3,8c00000
cdrom /sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f
tape /sbus@3,0/SUNW,fas@3,8800000/st@4,0
scsi /sbus@3,0/SUNW,fas@3,8800000
disk0 /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
disk1 /sbus@3,0/SUNW,fas@3,8800000/sd@1,0
disk2 /sbus@3,0/SUNW,fas@3,8800000/sd@2,0
disk3 /sbus@3,0/SUNW,fas@3,8800000/sd@3,0
disk4 /sbus@3,0/SUNW,fas@3,8800000/sd@4,0
disk5 /sbus@3,0/SUNW,fas@3,8800000/sd@5,0
tape0 /sbus@3,0/SUNW,fas@3,8800000/st@4,0
tape1 /sbus@3,0/SUNW,fas@3,8800000/st@5,0
ttya /central/fhc/zs@0,902000:a
ttyb /central/fhc/zs@0,902000:b
keyboard /central/fhc/zs@0,904000
keyboard! /central/fhc/zs@0,904000:forcemode
name aliases
ok
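If you capture devalias output from a console log, a short script can turn it into a lookup table. The Python sketch below assumes one alias per line in the form "alias device-path"; the function name parse_devalias is hypothetical, and the sample lines are a subset of the Sun Enterprise 3500 listing above.

```python
def parse_devalias(listing):
    """Build {alias: device path} from the text printed by devalias."""
    aliases = {}
    for line in listing.splitlines():
        fields = line.split()
        # Keep only "alias /device/path" lines; skip prompts and other text.
        if len(fields) == 2 and fields[1].startswith("/"):
            aliases[fields[0]] = fields[1]
    return aliases


sample = """\
disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
net /sbus@3,0/SUNW,hme@3,8c00000
cdrom /sbus@3,0/SUNW,fas@3,8800000/sd@6,0:f
ttya /central/fhc/zs@0,902000:a
"""
aliases = parse_devalias(sample)
print(aliases["cdrom"])
```

If an alias appears twice in the listing, the later line wins in this sketch; whether that matches the firmware's own precedence is not something the listing confirms.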
Device Alias Commands
A device alias, or simply, alias, is a shorthand representation of a
device path.
For example, the boot disk, partition a, can be aliased as disk, which
represents the complete device path name to the boot disk drive.
The devalias commands are used to examine, create, and change
aliases.
Table 6-2 Device Alias Commands
Command                     Description
devalias                    Displays all current device aliases.
devalias alias              Displays the device path name corresponding to alias.
devalias alias device-path  Creates and defines an alias representing device-path. If an alias with the same name already exists, the new value supersedes the old.
!Caution – User-defined aliases are lost after a system reset or power
cycle.
To create permanent aliases, use the nvalias command.
ok devalias disk
disk /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@0,0
ok devalias disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
ok devalias disk
disk /sbus@3,0/SUNW,fas@3,8800000/sd@0,0
ok
This changes the default boot disk from one in a storage subsystem
connected to a GBIC (socal@d) to a local disk on a fast SCSI SBus card.
nvalias command
An easy method of setting up an alias is to use the show-disks
command.
Example
We will set up a boot device on the first disk on a disk board located in
slot 3.
{0} ok show-disks
a) /sbus@7,0/SUNW,fas@3,8800000/sd
b) /sbus@3,0/SUNW,fas@3,8800000/sd
q) NO SELECTION
Enter Selection, q to quit: a
/sbus@7,0/SUNW,fas@3,8800000/sd has been selected.
Type ^Y ( Control-Y ) to insert it in the command line.
e.g. ok nvalias mydev ^Y
         for creating devalias mydev for
/sbus@7,0/SUNW,fas@3,8800000/sd
{0} ok nvalias bootdisk CTRL-Y
Pressing Control-Y here inserts
/sbus@7,0/SUNW,fas@3,8800000/sd. You must then add @a,0 to complete the path.
To set the boot device, change the boot-device NVRAM parameter:
ok setenv boot-device bootdisk
ok reset
Open Boot PROM Commands for the NVRAM
Whenever you are not sure of the correct command or what the
command is used for, you can ask for help. The OBP displays a listing
of the commands available.
ok help
After listing and selecting a command you think might be the one you
want, you can ask for help on that one command.
Type help command-name or help category-name for more specific help.
Note – Use ONLY the first word of a category-name or category
description.
For example, type help select
ok help select
Main categories are:
Repeated loops
Defining new commands
Numeric output
Radix (number base conversions)
Arithmetic
Memory access
Line editor
System and boot configuration parameters
Select I/O devices
Floppy eject
Power on reset
Diag (diagnostic routines)
Resume execution
File download and boot
nvramrc (making new commands permanent)
Enable/Disable selected hardware subsystems
Environmental monitor
OBP Commands for Displaying and Changing the NVRAM Parameters
The printenv Command
The printenv command displays NVRAM parameter names, current
values, and default values.
The following is a listing of current parameter names. Each system
type and model can have different parameters available. Desktops
have one set; single main logic board (MLB) servers, such as the Sun
Enterprise 250, have a different set; and multiple CPU board servers,
such as the Sun Enterprise 5500, have yet another set.
To display the contents of the NVRAM, use the printenv command.
ok printenv
Variable Name          Value          Default Value
disabled-memory-list
disabled-board-list
configuration-policy   board          component
memory-interleave      max            max
diag-passes            1              1
diag-verbosity         0              0
diag-continue?         false          false
tpe-link-test?         true           true
scsi-initiator-id      7              7
keyboard-click?        false          false
keymap
ttyb-mode              9600,8,n,1,-   9600,8,n,1,-
ttya-mode              9600,8,n,1,-   9600,8,n,1,-
ttyb-rts-dtr-off       false          false
ttyb-ignore-cd         true           true
ttya-rts-dtr-off       false          false
ttya-ignore-cd         true           true
reboot-flag            false          false
reboot-posc            4294582272     0
reboot-posl            0              0
reboot-cmd             boot net -r
diag-level             min            min
env-monitor            enabled        enabled
#power-cycles          4
system-board-serial#   802F01F0
system-board-date      34cf6a6b
fcode-debug?           false          false
output-device          screen         screen
input-device           keyboard       keyboard
load-base              16384          16384
boot-command           boot           boot
auto-boot?             true           true
auto-boot-on-error?    false          false
watchdog-reboot?       false          false
diag-file
diag-device            net            net
boot-file
boot-device            net            disk net
local-mac-address?     false          false
ansi-terminal?         true           true
screen-#columns        80             80
screen-#rows           34             34
silent-mode?           false          false
use-nvramrc?           false          false
nvramrc
security-mode          none
security-password
security-#badlogins    0
oem-logo
oem-logo?              false          false
oem-banner
oem-banner?            false          false
hardware-revision
last-hardware-update
diag-switch?           false          false
ok
To show a specific parameter, for example the diag-switch? variable,
type printenv followed by the variable name.
ok printenv diag-switch?
diag-switch? = true
ok
You can modify the values of the configuration variables, and any
changes you make remain in effect even after a power cycle.
!Caution – Configuration variables should be adjusted cautiously.
These NVRAM variables determine the startup routine of the system
so their configuration, if incorrect, can cause the system to operate in
an unexpected manner.
To change a parameter, use the setenv command. To change the
diagnostic switch:
ok setenv diag-switch? true
ok set-defaults
The set-defaults command restores the default setting of all
parameters.
ok set-default variable
The set-default variable command resets the value of variable
to the default setting.
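The relationship between setenv, set-default, and set-defaults can be summarised with a toy model in Python. This is a sketch for study purposes only: the class name Nvram and its methods are invented for the example, and real firmware may treat some variables (security settings, for instance) differently from what this simplified model shows.

```python
class Nvram:
    """Toy model of NVRAM variables: each has a current and a default value."""

    def __init__(self, defaults):
        self.defaults = dict(defaults)
        self.values = dict(defaults)

    def setenv(self, name, value):
        if name not in self.defaults:
            raise KeyError(f"unknown variable: {name}")
        self.values[name] = value

    def set_default(self, name):
        # Reset one variable, as "set-default variable" does.
        self.values[name] = self.defaults[name]

    def set_defaults(self):
        # Reset every variable, as "set-defaults" does.
        self.values = dict(self.defaults)

    def printenv(self, name):
        return f"{name} = {self.values[name]}"


nvram = Nvram({"diag-switch?": "false", "auto-boot?": "true"})
nvram.setenv("diag-switch?", "true")
print(nvram.printenv("diag-switch?"))
nvram.set_default("diag-switch?")
print(nvram.printenv("diag-switch?"))
```

The point of the model is that a changed value persists until one of the two reset commands is used, which is why the caution above about adjusting variables applies.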
General NVRAM Parameters
Below are the NVRAM parameters which apply to all Sun servers. The
list is as compiled by the Solaris eeprom command.
Note – Not all OpenBoot systems support all parameters. Defaults
may vary depending on the system and the PROM revision.
List of NVRAM Configuration Parameters
Variable              Typical Default     Description
auto-boot?            true                If true, boot automatically after power-on or reset.
boot-command          boot                Command executed if auto-boot? is true.
boot-device           disk net            Device from which to boot.
boot-file             empty string        File to boot. An empty string lets the secondary booter choose the default.
diag-device           net                 Diagnostic boot source device.
diag-file             empty string        File from which to boot in diagnostic mode.
diag-level            platform-dependent  Diagnostics level. Values include off, min, max and menus.
diag-switch?          false               If true, run in diagnostic mode.
fcode-debug?          false               If true, include name parameter for plug-in device FCodes.
hardware-revision     N/A                 System version information.
input-device          keyboard            Input device used at power-on (usually keyboard, ttya, or ttyb).
keyboard-click?       false               If true, enable keyboard click.
last-hardware-update  N/A                 System update information.
local-mac-address?    false               If true, network drivers use their own MAC address, not system’s.
Variable              Typical Default     Description
nvramrc               empty               Contents of NVRAMRC.
output-device         screen              Output device used at power-on (usually screen, ttya, or ttyb).
screen-#columns       80                  Number of on-screen columns (characters/line).
screen-#rows          34                  Number of on-screen rows (lines).
scsi-initiator-id     7                   SCSI bus address of host adapter, range 0-7.
security-mode         none                Firmware security level (options: none, command, or full). If set to command or full, system will prompt for PROM security password.
security-password     N/A                 Firmware security password (never displayed). Can be set only when security-mode is set to command or full.
selftest-#megs        1                   Megabytes of RAM to test. Ignored if diag-switch? is true.
tpe-link-test?        true                Enable 10baseT link test for built-in twisted pair Ethernet.
ttya-mode             9600,8,n,1,-        TTYA line discipline (baud rate, #bits, parity, #stop, handshake).
ttyb-mode             9600,8,n,1,-        TTYB line discipline (baud rate, #bits, parity, #stop, handshake).
ttya-ignore-cd        true                If true, operating system ignores carrier-detect on TTYA.
ttyb-ignore-cd        true                If true, operating system ignores carrier-detect on TTYB.
ttya-rts-dtr-off      false               If true, operating system does not assert DTR and RTS on TTYA.
ttyb-rts-dtr-off      false               If true, operating system does not assert DTR and RTS on TTYB.
use-nvramrc?          false               If true, execute commands in NVRAMRC during system start-up.
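The ttya-mode and ttyb-mode values pack five settings into one comma-separated string (baud rate, #bits, parity, #stop, handshake). As a hedged illustration, the Python sketch below splits such a value into named fields; the names SerialMode and parse_tty_mode are invented for this example, and the stop-bit field is kept as text because the table does not say which values it may take.

```python
from collections import namedtuple

SerialMode = namedtuple("SerialMode", "baud data_bits parity stop_bits handshake")


def parse_tty_mode(value):
    """Split a ttya-mode/ttyb-mode string into its five named fields."""
    fields = value.split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {value!r}")
    baud, bits, parity, stop, handshake = fields
    # Baud rate and data bits are numeric; the rest stay as written.
    return SerialMode(int(baud), int(bits), parity, stop, handshake)


mode = parse_tty_mode("9600,8,n,1,-")
print(mode)
```

A check like this can be handy when comparing the serial settings of a terminal against the values shown by printenv.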
Platform specific NVRAM Commands
The OpenBoot PROM Version 3.x used in Sun Enterprise server
systems now includes additional parameters for managing the
hardware. These new parameters include:
● disabled-board-list
Is a list of boards, by system backplane slot number, to be disabled
at boot up. This example puts the boards in slots 4 and 6 in the
NVRAM disabled-board-list parameter:
ok setenv disabled-board-list 46
To return disabled-board-list to its default value, type:
ok set-default disabled-board-list
● disabled-memory-list (whole board at a time)
Is a list of CPU boards whose memory is to be disabled and
left unused by the operating system. The value (for example, 7a)
names the CPU boards, here those in slots 7 and 10 (hexadecimal a),
containing the memory that is to be disabled. There is no way to
disable individual memory banks at this time.
The CPU modules, if any, on the board continue to operate
normally.
To disable the memory on the CPU boards in slots 7 and 10, type:
ok setenv disabled-memory-list 7a
● memory-interleave
Used to enable or disable memory interleaving. Values are min to
disable memory interleaving and max to set the maximum possible
memory interleaving.
● configuration-policy
Defines how the system handles devices when they fail POST.
The values are component, board, or system.
For example, if a SYSIO chip on an I/O board in slot 5 fails its self
test, POST disables the entire board if the variable is set to board.
POST disables only the SBus if the variable is set to component.
● sbus-probe-default
sbus-probe-default d3120
This variable defines the SBus device probe order on an I/O
board per SBus, where:
d = On-board SOC
3 = On-board FEPS
0-2 = SBus slots 0, 1, and 2
On a Type 2 and a Type 5 I/O board, since there is only 1 SBus, the
probe order will be:
d 3 2 0 (no slot 1)
To change the default probe order to ‘123d0’, enter the following at
the ok prompt:
ok setenv sbus-probe-default 123d0
Remember that this changes the default probe order for all boards in
the system. You can also use this variable to skip an SBus slot: simply
leave it out of the list of devices to probe. To change the probe order for
a specific board, use the sbus-specific-probe variable.
● sbus-specific-probe
This variable controls the SBus probe order on a given list of
boards. To set the probe order as 320 on I/O board 4, enter the
following at the ok prompt:
ok setenv sbus-specific-probe 4:320
The number preceding the ‘:’ is the slot number; the characters
following it are the SBus device numbers in the desired probe
order. All unlisted I/O boards in the system use the default
probe order as defined by the sbus-probe-default NVRAM
variable.
Multiple boards can be defined by this variable as follows:
ok setenv sbus-specific-probe 4:320 6:d3210
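The compact encodings used by these platform-specific variables are easy to misread, so here is a minimal Python sketch that expands them. It assumes that each character of a board list is one hexadecimal slot digit (as the 7a example for slots 7 and 10 suggests) and that sbus-specific-probe entries are separated by spaces. The function names are hypothetical, and the slot key is kept as text because the notes do not say how slots above 9 are written in that variable.

```python
# Device codes documented for sbus-probe-default.
SBUS_DEVICES = {
    "d": "on-board SOC",
    "3": "on-board FEPS",
    "0": "SBus slot 0",
    "1": "SBus slot 1",
    "2": "SBus slot 2",
}


def slots_from_board_list(value):
    """Return the slot numbers named by a disabled-board-list style value."""
    # Assumption: one hexadecimal digit per slot, so "a" means slot 10.
    return [int(digit, 16) for digit in value.strip()]


def parse_specific_probe(value):
    """Map slot (as written) -> probe order from an sbus-specific-probe value."""
    orders = {}
    for entry in value.split():
        slot, _, order = entry.partition(":")
        orders[slot] = [SBUS_DEVICES[code] for code in order]
    return orders


print(slots_from_board_list("46"))
print(slots_from_board_list("7a"))
for slot, order in parse_specific_probe("4:320 6:d3210").items():
    print(f"board {slot}: " + ", ".join(order))
```

Expanding a value this way before typing setenv makes it easier to confirm that the right boards and the intended probe order are being selected.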
Environmental Monitoring
Some functions of the OBP do not use input from a user. These
functions are preprogrammed operations that start automatically after
the system has booted. Some take input from the Solaris operating
system and perform tasks as described in their initial configuration.
These configurations might not be configurable by you or the
operating system.
● ok disable-environmental-monitor
Stops the monitoring of power supply status, board temperatures,
and board hot plug while the screen displays the ok prompt.
● ok enable-environmental-monitor
Starts monitoring power supply status, board temperatures and
board hot plug while the screen displays the ok prompt.
Note – This environmental-monitor function is enabled by default.
Console messages for environmental conditions appear as follows:
● PROM NOTICE: Overtemp detected on board <n>.
● PROM NOTICE: System has cooled down.
● PROM WARNING: Board <n> is too hot.
● PROM NOTICE: Insufficient power detected.
● PROM NOTICE: Power supply restored.
● PROM NOTICE: Board insert detected.
● PROM NOTICE: Reset Initiated...
If a board is over the predetermined temperature, the PROM
sends a warning message to the console and performs a reset
command, which results in POST disabling the faulty board and the system
rebooting the operating system.
If insufficient power is detected and is not fixed in 30 seconds, the OBP
initiates a reset to allow POST to deconfigure some of the boards
according to the amount of available power.
NVRAM Security
The NVRAM system security variables are:
● security-mode
Sets the firmware security level (options: none , command, or full ).
Default is none .
● security-password
Sets the firmware security password (never displayed). No default.
● security-#badlogins
Holds the number of incorrect security password attempts. No
default.
!Caution – Do not set a password at the OBP level yourself.
Your customer may or may not wish to set one.
If the customer sets a password and then forgets it, there is no way to
recover back to a default.
NVRAMRC Editing Commands for the NVRAM
The script editor, nvedit, lets you create and modify the nvramrc script
using the commands listed in Table 6-3.
Table 6-3 nvramrc Script Editor Commands
Command                    Description
nvalias alias device-path  Stores the command “devalias alias device-path” in the script. The alias persists until either nvunalias or set-defaults is executed.
$nvalias                   Performs the same function as nvalias, except that it takes its arguments, name-string device-string, from the stack.
nvedit                     Enters the script editor. If data remains in the temporary buffer from a previous nvedit session, resumes editing those previous contents. If not, reads the contents of nvramrc into the temporary buffer and begins editing it.
nvquit                     Discards the contents of the temporary buffer, without writing it to nvramrc. Prompts for confirmation.
nvrecover                  Recovers the contents of nvramrc if they have been lost as a result of the execution of set-defaults; then enters the editor as with nvedit. nvrecover fails if nvedit is executed between the time that the nvramrc contents were lost and the time that nvrecover is executed.
nvrun                      Executes the contents of the temporary buffer.
nvstore                    Copies the contents of the temporary buffer to nvramrc; discards the contents of the temporary buffer.
nvunalias alias            Deletes the specified alias from nvramrc.
$nvunalias                 Performs the same function as nvunalias, except that it takes its argument, name-string, from the stack.
NVRAM Command Precautions
There are two commands you should understand along with their
implications:
● set-defaults and the escape hatch Stop-N
▼ Sets all NVRAM variables to the default values
Note – Key switch in secure position will inhibit Stop key functions.
● use-nvramrc? Set to false
▼ Clears the nvramrc memory location.
If any device aliases had been set, they would have been stored in nvramrc,
along with any other tests or code required to execute during
POST and boot.
The nvrecover command can restore the contents, provided you have not run the
nvstore command after typing the set-defaults command. If the
nvstore command was run, the contents of the nvramrc memory
area are not recoverable. This is one more reason why it is important
to write down the contents of the nvramrc before attempting
any changes to it.
Updating Flash PROM and FCode
Do you need to update?
At the ok prompt, type .version. The banner command gives the OBP
revision but not the FCode revisions.
ok .version
Slot 1 - I/O Type 4 FCODE 1.8.7 1997/12/08 15:39 iPOST 3.4.4 1997/08/26 17:37
Slot 3 - I/O Type 3 FCODE 1.8.7 1997/05/09 11:18 iPOST 3.0.2 1997/05/01 10:56
Slot 7 - I/O Type 1 FCODE 1.8.3 1997/11/14 12:41 iPOST 3.4.6 1998/04/16 14:22
Slot 9 - CPU/Memory OBP 3.2.16 1998/06/08 16:58 POST 3.9.4 1998/06/09 16:25
You can also use the .properties command to display the CPU/Memory
board Flash PROM revision in hexadecimal ASCII, but this is a long
way round to get the information above.
It is included here to demonstrate how the Flash PROMs connect to the
fhc (the fire-hose controller, also known as the bootbus controller).
Note – Remember that the show-devs command lists all the devices in
the OpenBoot device tree, which you need for the following
commands.
ok cd /fhc@12,f8800000/flashprom@0,0
ok .properties
version 4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39
model SUNW,525-1431
name flashprom
Note – 4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39 is the hex code
for OBP 3.2.16 1999
ok cd /fhc@e,f8800000/flashprom@0,0
ok .properties
version 46 43 4f 44 45 20 31 2e 38 2e 33 20 31 39 39 37
model SUNW,525-1432
name flashprom
Note – 46 43 4f 44 45 20 31 2e 38 2e 33 20 31 39 39 37 = FCODE 1.8.3
1997
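The hexadecimal ASCII strings in the two Notes above can be checked without an ASCII chart. The following is a minimal sketch in Python, run on any workstation rather than at the ok prompt; the function name decode_version_property is made up for this illustration and is not part of OBP or Solaris.

```python
def decode_version_property(hex_bytes: str) -> str:
    """Convert space-separated hex byte values into the ASCII text they encode."""
    # bytes.fromhex ignores the spaces between the byte values.
    return bytes.fromhex(hex_bytes).decode("ascii")


# The two "version" property values shown in the .properties output above.
cpu_prom = "4f 42 50 20 20 20 33 2e 32 2e 31 36 20 31 39 39 39"
io_prom = "46 43 4f 44 45 20 31 2e 38 2e 33 20 31 39 39 37"

print(repr(decode_version_property(cpu_prom)))
print(repr(decode_version_property(io_prom)))
```

Note that the CPU/Memory board string carries three spaces (20 20 20) between OBP and the revision number.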
Use the .properties command to display the I/O Board SOC
Controller FCode revision.
ok cd /sbus@2,0/SUNW,soc@d,10000
ok .properties
soc-fcode 1.3 95/09/28
model 501-2069
name SUNW,soc
Use the .properties command to display the I/O Board SOC+
Controller FCode revision.
ok cd /sbus@2,0/SUNW,socal@d,10000
ok .properties
version @(#) FCode 1.11 97/12/07
model 501-3060
name SUNW,socal
Checking version under UNIX
At the UNIX prompt, you can obtain the OBP revision level using:
# prtconf -V
Where do I obtain the latest revisions?
At the time of this writing, patch 103346-24 updates the OBP to 3.2.24.
The patch is available on the SunSolve CD and from sunsolve.sun.com.
Both the Flash PROM and FCode updates are included in this patch.
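Deciding whether an update is needed comes down to comparing the installed revision with the one in the patch. The sketch below, in Python, shows one way to do that comparison numerically, so that 3.2.9 would sort before 3.2.16. The helper names parse_revision and needs_update are hypothetical; the two sample strings are taken from the .version output and from the flash-update listing in this module.

```python
import re


def parse_revision(text: str) -> tuple:
    """Extract the first dotted revision (for example 3.2.16) as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no revision found in {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(installed: str, available: str) -> bool:
    """True when the installed revision is older than the available one."""
    return parse_revision(installed) < parse_revision(available)


installed = "OBP 3.2.16 1998/06/08 16:58"   # from the .version output
available = "OBP 3.2.24 1999/12/23 17:31"   # from the flash-update listing

print(needs_update(installed, available))
```

Comparing tuples of integers avoids the string-comparison trap in which "3.2.9" would appear newer than "3.2.16".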
Caution – You cannot use patchadd or installpatch to upgrade
the Flash PROM and FCode.
You must obtain the patch, uncompress it, and extract the files.
You then use the Flash PROM programming utility to update the
OpenBoot PROM on the CPU/Memory boards and the FCode on the I/O
boards.
Example (use whichever command matches the compression of the archive you obtained)
# zcat 103346-24.tar.Z | tar xvf -
# gzcat 103346-24.tar.gz | tar xvf -
The gzcat utility does not come as standard on Solaris 2.6 systems, but
is available on the Sunsolve CD, under the directory
/cdrom/cdrom0/gzip/bin/svr4
Procedure to update FlashPROM and FCode
Caution – As a consequence of the upgrade, the system’s NVRAM
configuration variables may be reset to their default values.
If you have any custom NVRAM configuration, note it down
before proceeding.
Change to the directory extracted in the previous step. The flash
update is performed by running the UNIX program in that directory.
# cd 103346-24
# ./flash-update-<latest-rev>
Generating flashprom driver...
Generating SUNW,Ultra-Enterprise flash-update program...
Current System Board PROM Revisions:
---------------------------------------------------------
Board 0: cpu OBP 3.2.23 1999/10/01 10:07 POST 3.9.23 1999/10/01 17:54
Board 2: cpu OBP 3.2.23 1999/10/01 10:07 POST 3.9.23 1999/10/01 17:54
Board 1: Dual SBus + IO Board
FCODE 1.8.23 1999/10/01 10:07 iPOST 3.4.23 1996/03/16 17:55
Board 3: Dual PCI IO Board
FCODE 1.8.23 1999/10/01 10:07 iPOST 3.0.23 1999/10/01 17:55
Available ’Update’ Revisions:
-----------------------------------------
CPU/Memory Board:
OBP 3.2.24 1999/12/23 17:31
POST 3.9.24 1999/12/23 17:35
IO Graphics Board:
I/O Type 2 FCODE 1.8.24 1999/12/23 17:29
iPOST 3.4.24 1999/12/23 17:34
IO Graphics + Board:
I/O Type 5 FCODE 1.8.24 1999/12/23 17:34
iPOST 3.4.24 1999/12/23 17:34
Dual Sbus IO Board:
I/O Type 1 FCODE 1.8.24 1999/12/23 17:29
iPOST 3.4.24 1999/12/23 17:34
Dual Sbus + IO Board:
I/O Type 4 FCODE 1.8.24 1999/12/23 17:30
iPOST 3.4.24 1999/12/23 17:34
Dual PCI IO Board:
I/O Type 3 FCODE 1.8.24 1999/12/23 17:30
iPOST 3.0.24 1999/12/23 17:34
Verifying Checksums: Okay
Do you wish to flash update your firmware? y/[n]: y
Are you sure? y/[n]: y
Updating Board 0: Type ’cpu’
1 Erasing ... Done.
1 Verifying Erase... Done.
1 Programming... Done.
1 Verifying Program... Done.
Updating Board 2: Type ’cpu’
1 Erasing... Done.
1 Verifying Erase... Done.
1 Programming... Done.
1 Verifying Program... Done.
Updating Board 1: Type ’dual-sbus’
1 Erasing... Done.
1 Verifying Erase... Done.
1 Programming... Done.
1 Verifying Program... Done.
Updating Board 3: Type ’upa-sbus’
1 Erasing... Done.
1 Verifying Erase... Done.
1 Programming... Done.
1 Verifying Program... Done.
#
NOTE: The flash proms are write protected by either of the
following two conditions:
a) Front panel key switch in secure mode.
b) Jumper (P601) removed on clock board.
At the time of writing, systems are shipped with the jumper on the
clock board installed.
This means that the PROMs are write protected only when the front
panel key switch is in the secure position.
If the PROMs are detected to be write protected, the flash update
process fails with the following message:
FPROM Write Protected: Check Write Enable Jumper or Front Panel Key Switch.
Caution – If there is a power failure while the Flash PROMs are being
upgraded, follow the recovery steps described under Correcting a
Faulty Flash PROM.
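The two write-protect conditions in the NOTE above combine as a simple logical rule. The Python sketch below only restates that rule for a pre-flight checklist; it does not read any hardware state, and the function and parameter names are invented for this illustration.

```python
def flash_prom_writable(keyswitch_secure: bool, p601_jumper_installed: bool) -> bool:
    """Return True when neither write-protect condition from the NOTE applies.

    The PROMs are write protected if the front panel key switch is in the
    secure position OR jumper P601 has been removed from the clock board.
    """
    write_protected = keyswitch_secure or not p601_jumper_installed
    return not write_protected


# Factory state described in the text: jumper installed, key switch not secure.
print(flash_prom_writable(keyswitch_secure=False, p601_jumper_installed=True))
```

With the jumper installed as shipped, the key switch position is the only condition left to check before running the flash update.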
Correcting a Faulty Flash PROM
You will have a problem if you lose power in the middle of a Flash
PROM update. If the system has only one CPU/Memory board, you may
need to replace it.
If there are two CPU/Memory boards, however, there are a number of
options for recovery.
update-proms
Assuming the system gets to the ok prompt, there will be a message
stating that
xxxxx
To synchronize all Flash PROMs of the same board type to the most
current level available in the system, type
ok update-proms
prom-copy
You can copy the contents of one I/O board’s Flash PROM to another
I/O board, for example from the board in slot 3 to the board in
slot 9. To do this, type
ok prom-copy 3 9
Correcting a Faulty Flash PROM - Updating within Extended POST
You can reprogram a corrupted PROM if another board of the same
type with uncorrupted code is available.
Refer to the Flash PROM Programming Guide, 805-5579, for more
information.
To reprogram a faulty FlashPROM:
1. Connect an ASCII terminal to Serial Port A.
2. Remove the board with corrupted code from the backplane.
3. Install a known good board in any available slot.
4. Turn the keyswitch to On.
5. Wait 15 seconds and press s to enter Extended POST.
6. Select f for fcopy from the Extended POST Menus.
7. Insert the board with corrupted code into the backplane (the board
is hot-pluggable).
8. Select 4 for Activate System Board and follow the instructions.
9. Select 1 to copy the code and follow the instructions.
10. Turn the keyswitch to Standby.
Synchronizing NVRAM/TOD chips
The NVRAM/TOD chips on the Clock board and on all I/O boards
contain the same information, including the NVRAM environmental
variables and configuration settings.
The master NVRAM/TOD parameters are kept on the NVRAM chip
on the Clock board.
On occasion, you will see a message at the ok prompt stating:
Clock TOD does not match any I/O board
This means that the NVRAM/TOD chip on the Clock board and the chips
on the I/O boards have got out of step.
Figure 6-2 illustrates how to recover a corrupted TOD Clock value.
Figure 6-2 NVRAM/TOD Contents Can Be Copied Automatically or Manually From One Source to Another
This happens, for example, when a new I/O board is fitted.
To correct the time of day, copy the correct information from the clock
board to the I/O boards.
ok copy-clock-tod-to-io-boards
Correcting a Corrupted NVRAM/TOD
The master chip itself can also become corrupted.
If this happens, copy the contents from an I/O board with the correct
data to the clock board TOD chip.
ok (ioboard# in hex) copy-io-board-tod-to-clock-tod
In this example the correct data is on the I/O board in slot three.
ok 3 copy-io-board-tod-to-clock-tod
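Because the board number is given in hex, slots above 9 are easy to mistype. The Python sketch below builds the command text for a given slot; it is only a convenience for preparing what to type at the ok prompt. The function name is hypothetical, and the 0 to 15 slot range is an assumption about the largest chassis rather than something stated in this module.

```python
def copy_tod_command(io_board_slot: int) -> str:
    """Build the OBP command that copies an I/O board's TOD to the clock board.

    The slot number is written in hex, as the command expects.
    The 0-15 range check is an assumption about the number of slots.
    """
    if not 0 <= io_board_slot <= 15:
        raise ValueError(f"slot {io_board_slot} is outside the assumed range 0-15")
    return f"{io_board_slot:x} copy-io-board-tod-to-clock-tod"


print(copy_tod_command(3))
print(copy_tod_command(11))
```

For slot 3 the result matches the example above; for slot 11 the board number becomes b.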
Power On Self Test (POST) 7
Introducing POST
Always runs after a reset
The Sun Enterprise servers always execute the power on self tests
(POST) at power up and whenever a system reset is initiated. The
POST initializes all of the hardware devices before OBP starts booting
the operating system. The POST also identifies new boards that have
been installed in the system and makes them available to the OBP and
the system.
Checks the environment
Once POST is complete, the OpenBoot PROM environmental
monitoring process checks the temperature sensors in the system to
detect any overheating. If the sensed temperature is above a
predefined level, a warning message is written to the system
console. If it exceeds a second, higher predefined level,
the OBP disables the board and places it into low power mode.
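The two-level behaviour described above can be pictured as a pair of thresholds. The Python sketch below is purely illustrative: the text does not give the actual levels, so the 60 and 75 degree values, the function name, and the returned labels are all hypothetical.

```python
WARNING_LEVEL_C = 60.0    # hypothetical value; the real level is not given here
SHUTDOWN_LEVEL_C = 75.0   # hypothetical value; the real level is not given here


def temperature_action(temp_c: float,
                       warning_c: float = WARNING_LEVEL_C,
                       shutdown_c: float = SHUTDOWN_LEVEL_C) -> str:
    """Map a sensed temperature to the action described in the text."""
    if shutdown_c <= warning_c:
        raise ValueError("the shutdown level must be above the warning level")
    if temp_c > shutdown_c:
        return "disable-board"   # board disabled and placed in low power mode
    if temp_c > warning_c:
        return "warn"            # warning message written to the system console
    return "ok"


for reading in (40.0, 65.0, 80.0):
    print(reading, temperature_action(reading))
```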
POST Output on Serial Port A
To service Exx00 servers effectively, there must be either a VT100-type
terminal connected to ttya or a tip session from another system.
POST resides on each system board
POST resides in the OBP on each CPU/Memory+ board.
POST sets LED indicators
POST controls the status LEDs on the system front panel and all
boards.
Only POST can configure the system hardware at power up, and only
POST can enable hot-pluggable boards (if DR and AP are not present
and operating).
Level of testing
Over 90 percent of system board interconnects
Over 80 percent of each system board ASICs
Identify 95% of detectable faults to FRU level
Performance
Runtime should be less than 90 seconds (diag-level set to minimum)
Code size should be less than 256 Kbytes for CPU boards
Code size should be less than 64 Kbytes for I/O boards
Coverage
POST is designed to test just about everything that is internal to the
system and the system boards. POST tests the following:
● CPU modules and caches
● System board ASICs (DC, AC, and FHC)
● Busses (SBus, UPA, centerplane, boot-bus)
● I/O ASICs (Sysio, FEPS, SOC)
● Clock board and console bus devices (NVRAM, TOD, EEPROM)
● DIMMs
● Environmental sensors
What POST doesn’t cover
POST will not test SBus cards or PCI cards.
In fact, there is a jumper on the PCI riser 501-8888 to enable or disable
JTAG. Disable it or POST may hang.
Bootbus Controller
Otherwise known as the fhc (fire hose controller). Each board in the
system has an fhc, which connects to a bootbus running on the
Gigaplane and to various on-board ASICs, including the SRAM and
temperature sensors.
The purpose of the bootbus is twofold. It passes the POST data around
the system, and is used by the clock board to pass NVRAM
parameters to the CPU/Memory boards.
Also connected to the fhc is the JTAG scan controller.
JTAG
JTAG is a 4-wire connection between various ASICs in the system. The
spec was developed by the Joint Test Action Group, from which it takes
its name, and is defined by IEEE 1149.1.
Its purpose here is to pass POST information between boards and
ASICs, assuming the ASICs are JTAG compliant.
Warning – Not all ASICs in the system are JTAG compliant, in particular
the ASICs on PCI cards plugged into a Type 3 I/O board.
Set the JTAG jumper on the PCI riser appropriately.
For details regarding JTAG specs, scan rings etc refer to
http://solutions.sun.com/embedded/databook/pdf/whitepapers/WPR-0018-01.pdf
POST Master
After a power-on reset (POR), each CPU module checks itself, its cache,
and its Gigaplane interface using JTAG loops via the bootbus. POST
runs in SRAM on each board. The first CPU that passes is elected the
POST Master, normally (0,0).
Once the master has been determined, the CPU and the OBP on the master
system board run the self-test routines for each I/O board and then set
the I/O board configuration parameters according to the resident firmware.
OBP Parameters
diag-switch?  false  Diagnostic level determined by diag-level parameter
              true   Full (verbose) diagnostics run
diag-level    min    Minimum diagnostics run
              max    Full (verbose) diagnostics run
Keyswitch Positions
Normal power-on      Diagnostic level determined by diag-level parameter
Diagnostic power-on  Full (verbose) diagnostics run
Note – The diag-switch? and diag-level parameters are not
particularly useful on the Enterprise servers, since if you want to run
full diagnostics, you can power on the system by turning the
keyswitch to the diagnostic position.
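The interaction between the keyswitch and the two OBP parameters can be summarised as a small decision function. The Python sketch below follows the two tables above; the function name and the "normal"/"diagnostic" strings are labels chosen for this illustration, not firmware identifiers.

```python
def effective_diag_level(keyswitch: str, diag_switch: bool, diag_level: str) -> str:
    """Return "min" or "max" according to the OBP Parameters and Keyswitch tables.

    keyswitch   -- "normal" or "diagnostic" (labels chosen for this sketch)
    diag_switch -- value of the diag-switch? parameter
    diag_level  -- value of the diag-level parameter, "min" or "max"
    """
    if keyswitch not in ("normal", "diagnostic"):
        raise ValueError(f"unknown keyswitch position: {keyswitch!r}")
    if diag_level not in ("min", "max"):
        raise ValueError(f"unknown diag-level: {diag_level!r}")
    # A diagnostic power-on or diag-switch? = true forces full diagnostics.
    if keyswitch == "diagnostic" or diag_switch:
        return "max"
    # Otherwise the diag-level parameter decides.
    return diag_level


print(effective_diag_level("normal", False, "min"))
print(effective_diag_level("diagnostic", False, "min"))
```

This also shows why the Note calls the parameters not particularly useful: turning the keyswitch to the diagnostic position overrides both of them.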
Power on Self Test Overview
Sample Output
The following is an example of what you see if you have an ASCII
terminal device connected to serial port ttya on the clock board of an
Enterprise x000/x500 server.
POST runs a complete and in-depth set of tests when the system
keyswitch is set to the diagnostic position or the NVRAM parameter
diag-switch? is set to true.
Hardware Power ON
POST COMPLETE
7,0>
7,0>@(#) POST 3.9.4 1998/06/09 16:25
7,0> SelfTest Initializing (Diag Level 10, ENV 0000ff00) IMPL 0011 MASK 20
7,0>Board 7 CPU FPROM Test
7,0>Board 7 Basic CPU Test
7,0> Set CPU UPA Config and Init SDB Data
7,0> SRAM Mode = 22, Clock Mode = 4:1, PCON = 6fa, MCAP = 0
7,0>Board 7 MMU Enable Test
7,0> DMMU Init
7,0> IMMU Init
7,0> Mapping Selftest Enabling MMUs
7,0>Board 7 Ecache Test
7,0> Ecache Probe
7,0> Ecache Tags
7,0> Ecache Quick Verify
7,0> Ecache Init
7,0> Ecache RAM
7,0> Ecache Address Line
7,0> Configure Ecache Limit
7,0>Ecache Size = 00400000, Limited to 00400000
7,0>Board 7 FPU Functional Test
7,0> FPU Enable
7,0>Board 7 Board Master Select Test
7,0> Selecting a Board Master
7,0>Board 7 FireHose Devices Test
7,0>Board 7 Address Controller Test
7,0> AC Initialization
7,0> AC DTAG Init
7,0>Board 7 Dual Tags Test
7,0> AC DTAG Init
7,0>Board 7 FireHose Controller Test
7,0> FHC Initialization
7,0>Board 7 JTAG Test
7,0> Verify System Board Scan Ring
7,0>Board 7 Centerplane Test
7,0> Centerplane Join
7,0>Setting JTAG Master
7,0>Clear JTAG Master
7,0>Board 7 Setup Cache Size Test
7,0> Setting Up Cache Size
7,0>Board 7 System Master Select Test
7,0> Setting System Master
7,0>POST Master Selected (JTAG,CENTRAL)
7,0>Board 16 Clock Board Test
7,0> Clock Board Initialization
7,0> Clock Board Temperature Check
7,0>Board 16 Clock Board Serial Ports Test
7,0>Board 16 NVRAM Devices Test
7,0> M48T59 (TOD) Init
7,0>Board 7 System Board Probe Test
7,0> Probing all CPU/Memory BDA
7,0> Probing System Boards
7,0> Probing CPU Module JTAG Rings
7,0>Setting System Clock Frequency
7,0> CPU Module mid 14 Checked in OK (speed code = 4)
7,0> CPU mid 18 Version=00170011.20000507
7,0> CPU Module mid 18 Checked in OK (speed code = 4)
7,0> CPU mid 19 Version=00170011.20000507
7,0> CPU Module mid 19 Checked in OK (speed code = 4)
7,0> ******** Clock Reset - retesting
7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
7,0>
7,0>@(#) POST 3.9.4 1998/06/09 16:25
7,0> SelfTest Initializing (Diag Level 40, ENV 0000ff80) IMPL 0011 MASK 20
7,0>Board 7 CPU FPROM Test
7,0> CPU/Memory Board FPROM Checksum Test
7,0>Board 7 Basic CPU Test
7,0> FPU Registers and Data Path Test
7,0> Instruction Cache Tag RAM Test
7,0> Instruction Cache Instruction RAM Test
7,0> Instruction Cache Next Field RAM Test
7,0> Instruction Cache Pre-decode RAM Test
7,0> Data Cache RAM Test
7,0> Data Cache Tags Test
7,0> DMMU Registers Access Test
7,0> DMMU TLB DATA RAM Access Test
7,0> DMMU TLB TAGS Access Test
7,0> IMMU Registers Access Test
7,0> IMMU TLB DATA RAM Access Test
7,0> IMMU TLB TAGS Access Test
7,0> Set CPU UPA Config and Init SDB Data
7,0> SRAM Mode = 22, Clock Mode = 3:1, PCON = 6fa, MCAP = 0
7,0>Board 7 MMU Enable Test
7,0> DMMU Init
7,0> IMMU Init
7,0> Mapping Selftest Enabling MMUs
7,0>Board 7 Ecache Test
7,0> Ecache Probe
7,0> Ecache Tags
7,0> Ecache Quick Verify
7,0> Ecache Init
7,0> Ecache RAM
7,0> Ecache 6N RAM Pattern Test
7,0> Ecache Address Line
7,0> Configure Ecache Limit
7,0>Ecache Size = 00400000, Limited to 00400000
7,0>Board 7 FPU Functional Test
7,0> FPU Enable
7,0>Board 7 Board Master Select Test
7,0> Selecting a Board Master
7,0>Board 7 FireHose Devices Test
7,0> PROM Datapath Test
7,0> FHC CPU SRAM Test
7,0>Board 7 Address Controller Test
7,0> AC Registers Test
7,0> AC Initialization
7,0> Memory Registers Test
7,0> Memory Registers Initialization Test
7,0> AC DTAG Init
7,0>Board 7 Dual Tags Test
7,0> AC DTAG Test
7,0> AC DTAG Init
7,0>Board 7 FireHose Controller Test
7,0> FHC Initialization
7,0>Board 7 JTAG Test
7,0> Verify System Board Scan Ring
7,0>Board 7 Centerplane Test
7,0> Centerplane and Arbiter Check Test
7,0>Setting JTAG Master
7,0>Clear JTAG Master
7,0> Centerplane Join
7,0>Setting JTAG Master
7,0>Clear JTAG Master
7,0>Board 7 Setup Cache Size Test
7,0> Setting Up Cache Size
7,0>Board 7 System Master Select Test
7,0> Setting System Master
7,0>POST Master Selected (JTAG,CENTRAL)
Note – At this point POST has completed the system board testing and
assigned a master to start testing other boards on the backplane. For
example, each I/O board has its own PROM containing information
about the board (type, revision, speed) and tests for components and
interfaces. The tests are initiated by the master CPU. I/O POST reports
from these tests are sent to the master, indicating the state of the
system. The master CPU deactivates I/O boards or components
according to the report.
7,0>Board 16 Clock Board Test
7,0> Clock Board Registers Test
7,0> Clock Board Initialization
7,0> Clock Board Temperature Check
7,0>Board 16 Clock Board Serial Ports Test
7,0> 85C30 Register Test
7,0> 85C30 Serial Ports Test
7,0> Keyboard Loopback
7,0> Mouse Loopback
7,0> Serial Port B Loopback
7,0> Remote Serial Port A Loopback
7,0> Remote Serial Port B Loopback
7,0>Board 16 NVRAM Devices Test
7,0> M48T59 (TOD) Init
7,0> M48T59 (TOD) Functional Part 1 Test
7,0> NVRAM (Non-Destructive) Test
7,0>Board 7 System Board Probe Test
7,0> Probing all CPU/Memory BDA
7,0> Probing System Boards
7,0> Probing CPU Module JTAG Rings
7,0>Setting System Clock Frequency
7,0> CPU Module mid 14 Checked in OK (speed code = 4)
7,0> CPU mid 18 Version=00170011.20000507
7,0> CPU Module mid 18 Checked in OK (speed code = 4)
7,0> CPU mid 19 Version=00170011.20000507
7,0> CPU Module mid 19 Checked in OK (speed code = 4)
7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
7,0>TESTING BOARD 1
7,0>Board 1 JTAG Test
7,0> Verify System Board Scan Ring
7,0>Board 1 Centerplane Test
7,0> Centerplane Check
7,0>Board 1 Address Controller Test
7,0> AC Registers Test
7,0> AC Initialization
7,0>Setting Freq to 25MHZ
7,0> Memory Registers Test
7,0> Memory Registers Initialization Test
7,0> AC DTAG Init
7,0>Board 1 FireHose Controller Test
7,0> FHC Initialization
7,0>Board 1 NVRAM Devices Test
7,0> M48T59 (TOD) Init
7,0> M48T59 (TOD) Functional Part 1 Test
7,0> NVRAM (Non-Destructive) Test
7,0>TESTING BOARD 3
7,0>Board 3 JTAG Test
7,0> Verify System Board Scan Ring
7,0>Board 3 Centerplane Test
7,0> Centerplane Check
7,0>Board 3 Address Controller Test
7,0> AC Registers Test
7,0> AC Initialization
7,0>Setting Freq to 25MHZ
7,0> Memory Registers Test
7,0> Memory Registers Initialization Test
7,0> AC DTAG Init
7,0>Board 3 FireHose Controller Test
7,0> FHC Initialization
7,0>Board 3 NVRAM Devices Test
7,0> M48T59 (TOD) Init
7,0> M48T59 (TOD) Functional Part 1 Test
7,0> NVRAM (Non-Destructive) Test
7,0>Re-mapping to Local Device Space
7,0>Begin Central Space Serial Port access
7,0>Enable AC Control Parity
7,0>Hotplug Trigger Test
7,0>Init Counters for Hotplug
7,0>Board 7 Cross Calls Test
7,0> Cross Calls Test
7,0>Displaying PROM Versions
7,0>Slot 1 IO Type 4 FCODE 1.8.7 1997/12/8 15:39 iPOST 3.4.6 1998/4/16 14:22
7,0>Slot 3 IO Type 4 FCODE 1.8.7 1997/12/8 15:39 iPOST 3.4.6 1998/4/16 14:22
7,0>Slot 7 CPU/Memory OBP 3.2.16 1998/6/8 16:58 POST 3.9.4 1998/6/9 16:25
7,0>Slot 9 CPU/Memory OBP 3.2.16 1998/6/8 16:58 POST 3.9.4 1998/6/9 16:25
7,0>Board 7 Environmental Probe Test
7,0> Environmental Probe
7,0>Checking Power Supply Configuration
7,0>Power is more than adequate, load 4 ps 3
7,0>Reconfig memory due to POR or CLOCK RESET
7,0>Reconfig memory due to DIAG_LEVEL
7,0>Board 7 Probing Memory SIMMS Test
7,0> Probe SIMMID
7,0> Populated Memory Bank Status
7,0> bd # Size Address Way Status
7,0> 9 256 Normal
7,0>Board 7 Memory Configuration Test
7,0> Memory Interleaving
7,0> Total banks with 8MB SIMMs = 0
7,0> Total banks with 32MB SIMMs = 1
7,0> Total banks with 128MB SIMMs = 0
7,0> Total banks with 256MB SIMMs = 0
7,0> Overall memory default speed = 60ns
7,0>Do OPTIMAL INTLV
7,0> Board 9 AC rev 5 RCTIME = 0 (Tras 71)
7,0> Memory Refresh Enable
7,0>Board 7 SIMMs Test
7,0> MP Memory SIMM Clear Test
7,0> Memory Size is 256Mbytes
7,0> CPU MID 18 clearing 00000000.00004000 to 00000000.05500000
7,0> CPU MID 19 clearing 00000000.05500000 to 00000000.0aa00000
7,0> CPU MID 14 clearing 00000000.0aa00000 to 00000000.10000000
7,0> CPU MID 14 clearing 00000000.00000000 to 00000000.00004000
7,0> Memory Walking Rows and Columns Test
7,0> MP Memory SIMM (6N RAM Patterns) Test
7,0> Memory Size is 256Mbytes
7,0> CPU MID 18 testing 00000000.00000000 to 00000000.05500000
7,0> CPU MID 19 testing 00000000.05500000 to 00000000.0aa00000
7,0> CPU MID 14 testing 00000000.0aa00000 to 00000000.10000000
7,0> MP Memory SIMM (moving inverse) Test
7,0> Memory Size is 256Mbytes
7,0> CPU MID 18 testing 00000000.00000000 to 00000000.05500000
7,0> CPU MID 19 testing 00000000.05500000 to 00000000.0aa00000
7,0> CPU MID 14 testing 00000000.0aa00000 to 00000000.10000000
7,0>Slave CPU Functional Tests
7,0> Slave CPU MID 18 started
9,0>Board 9 Functional CPU 0 Test
9,0> Dcache Init
9,0> Dcache Enable Test
9,0> Dcache Functionality Test
9,0> Ecache Stress Test
9,0> Ecache Functional Test
9,0> CPU Dispatch (Multi-Scalar) Test
9,0> SPARC Atomic Instructions Test
9,0> SPARC Prefetch Instructions Test
9,0> CPU Softint Registers and Interrupts Test
9,0> Uni-Processor Cache Coherence Test
9,0> Branch Memory Test
9,0> SDB ECC CE Test
9,0> SDB ECC Uncorrectable Test
9,0> FPU Instruction Test
7,0> Slave CPU MID 19 started
9,1>Board 9 Functional CPU 1 Test
9,1> Dcache Init
9,1> Dcache Enable Test
9,1> Dcache Functionality Test
9,1> Ecache Stress Test
9,1> Ecache Functional Test
9,1> CPU Dispatch (Multi-Scalar) Test
9,1> SPARC Atomic Instructions Test
9,1> SPARC Prefetch Instructions Test
9,1> CPU Softint Registers and Interrupts Test
9,1> Uni-Processor Cache Coherence Test
9,1> Branch Memory Test
9,1> SDB ECC CE Test
9,1> SDB ECC Uncorrectable Test
9,1> FPU Instruction Test
7,0>Board 7 Functional CPU 0 Test
7,0> Dcache Init
7,0> Dcache Enable Test
7,0> Dcache Functionality Test
7,0> Ecache Stress Test
7,0> Ecache Functional Test
7,0> CPU Dispatch (Multi-Scalar) Test
7,0> SPARC Atomic Instructions Test
7,0> SPARC Prefetch Instructions Test
7,0> CPU Softint Registers and Interrupts Test
7,0> Uni-Processor Cache Coherence Test
7,0> Branch Memory Test
7,0> SDB ECC CE Test
7,0> SDB ECC Uncorrectable Test
7,0> FPU Instruction Test
7,0>TESTING IO BOARD 1
7,0>Board 1 I/O FPROM Test
7,0> I/O Board EPROM checksum Test
7,0>@(#) iPOST 3.4.6 1998/04/16 14:22
7,0> TESTING IO BOARD 1 ASICs
7,0> TESTING SysIO Port 0
7,0>Board 1 SysIO Registers Test
7,0> SysIO Register Initialization
7,0> IOMMU Registers and RAM Test
7,0> Streaming Buffer Registers and RAM Test
7,0> SBus Control and Config Registers Test
7,0> SysIO RAM Initialization
7,0>Board 1 SysIO Functional Test
7,0> Clear Interrupt Map and State Registers
7,0> SysIO Interrupts Test
7,0> SysIO Timers/Counters Test
7,0> IOMMU Virtual Address TLB Tag Compare Test
7,0> Streaming Buffer Flush Test
7,0> DMA Merge Buffer Test
7,0> SYSIO ECC Correctable Test
7,0> SYSIO ECC UnCorrectable Test
7,0> SysIO Sbus Probe Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>Board 1 OnBoard IO Chipset (SOC) Test
7,0> SOC SRAM Test
7,0> SOC Registers Test
7,0> SOC Interrupt Test
7,0> Clear Interrupt Map and State Registers Test
7,0> TESTING SysIO Port 1
7,0>Board 1 SysIO Registers Test
7,0> SysIO Register Initialization
7,0> IOMMU Registers and RAM Test
7,0> Streaming Buffer Registers and RAM Test
7,0> SBus Control and Config Registers Test
7,0> SysIO RAM Initialization
7,0>Board 1 SysIO Functional Test
7,0> Clear Interrupt Map and State Registers
7,0> SysIO Interrupts Test
7,0> SysIO Timers/Counters Test
7,0> IOMMU Virtual Address TLB Tag Compare Test
7,0> Streaming Buffer Flush Test
7,0> DMA Merge Buffer Test
7,0> SYSIO ECC Correctable Test
7,0> SYSIO ECC UnCorrectable Test
7,0> SysIO Sbus Probe Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>Board 1 OnBoard IO Chipset (FEPS) Test
7,0> FAS366 Registers Test
7,0> ESP FAS366 DVMA burst mode read/write Test
7,0> FAS366 FIFO TO DMA Test
7,0> DMA TO FAS366 FIFO Test
7,0> FEPS (Ethernet) Registers Test
7,0> FEPS Ethernet(BM, DP83840, Twister) Internal Loopbacks Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>IO BOARD 1 TESTED
7,0>TESTING IO BOARD 3
7,0>Board 3 I/O FPROM Test
7,0> I/O Board EPROM checksum Test
7,0>@(#) iPOST 3.4.6 1998/04/16 14:22
7,0> TESTING IO BOARD 3 ASICs
7,0> TESTING SysIO Port 0
7,0>Board 3 SysIO Registers Test
7,0> SysIO Register Initialization
7,0> IOMMU Registers and RAM Test
7,0> Streaming Buffer Registers and RAM Test
7,0> SBus Control and Config Registers Test
7,0> SysIO RAM Initialization
7,0>Board 3 SysIO Functional Test
7,0> Clear Interrupt Map and State Registers
7,0> SysIO Interrupts Test
7,0> SysIO Timers/Counters Test
7,0> IOMMU Virtual Address TLB Tag Compare Test
7,0> Streaming Buffer Flush Test
7,0> DMA Merge Buffer Test
7,0> SYSIO ECC Correctable Test
7,0> SYSIO ECC UnCorrectable Test
7,0> SysIO Sbus Probe Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>Board 3 OnBoard IO Chipset (SOC) Test
7,0> SOC SRAM Test
7,0> SOC Registers Test
7,0> SOC Interrupt Test
7,0> Clear Interrupt Map and State Registers Test
7,0> TESTING SysIO Port 1
7,0>Board 3 SysIO Registers Test
7,0> SysIO Register Initialization
7,0> IOMMU Registers and RAM Test
7,0> Streaming Buffer Registers and RAM Test
7,0> SBus Control and Config Registers Test
7,0> SysIO RAM Initialization
7,0>Board 3 SysIO Functional Test
7,0> Clear Interrupt Map and State Registers
7,0> SysIO Interrupts Test
7,0> SysIO Timers/Counters Test
7,0> IOMMU Virtual Address TLB Tag Compare Test
7,0> Streaming Buffer Flush Test
7,0> DMA Merge Buffer Test
7,0> SYSIO ECC Correctable Test
7,0> SYSIO ECC UnCorrectable Test
7,0> SysIO Sbus Probe Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>Board 3 OnBoard IO Chipset (FEPS) Test
7,0> FAS366 Registers Test
7,0> ESP FAS366 DVMA burst mode read/write Test
7,0> FAS366 FIFO TO DMA Test
7,0> DMA TO FAS366 FIFO Test
7,0> FEPS (Ethernet) Registers Test
7,0> FEPS Ethernet (BM, DP83840, Twister) Internal Loopbacks Test
7,0> SysIO Register Initialization Test
7,0> SysIO RAM Initialization Test
7,0> Clear Interrupt Map and State Registers Test
7,0>IO BOARD 3 TESTED
7,0>SYSTEM LEVEL TESTING
7,0>Board 7 Cache Coherency Test
7,0> Multi-Processor Cache Coherence Test
7,0> Testing CPU MID 18
7,0> Testing CPU MID 19
7,0>Probing for Disk System boards
7,0>Board 7 System Interrupts Test
7,0> System Interrupts Test
7,0>Checking Power Supply Configuration
7,0>Power is more than adequate, load 4 ps 3 (Four boards, and 3 power supplies)
7,0> Check Board Present Test
7,0> Board Present Interrupt Test
7,0>
7,0> System Board Status
7,0>-----------------------------------------------------------------
7,0> Slot Board Status Board Type Failures
7,0>-----------------------------------------------------------------
7,0> 0 | Not installed | |
7,0> 1 | Normal |+IO Type 4 |
7,0> 2 | Not installed | |
7,0> 3 | Normal |+IO Type 4 |
7,0> 4 | Not installed | |
7,0> 5 | Not installed | |
7,0> 6 | Not installed | |
7,0> 7 | Normal |+CPU/Memory |
7,0> 8 | Not installed | |
7,0> 9 | Normal |+CPU/Memory |
7,0> 16 | Normal | Clock Board |
7,0>-----------------------------------------------------------------
7,0>
7,0> CPU Module Status
7,0>-----------------------------------------------------------------
7,0> MID OK Cache Speed Version
7,0>-----------------------------------------------------------------
7,0> 14 | y | 4096 | 248 | 00170011.20000507
7,0> 18 | y | 4096 | 248 | 00170011.20000507
7,0> 19 | y | 4096 | 248 | 00170011.20000507
7,0>-----------------------------------------------------------------
7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
7,0> Populated Memory Bank Status
7,0> bd # Size Address Way Status
7,0> 9 256 0 0 Normal
7,0>
7,0> POST COMPLETE
7,0>Entering OBP
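Every line of the POST output above starts with a prefix such as 7,0> or 9,1>. Judging from lines like "9,1>Board 9 Functional CPU 1 Test", the prefix appears to identify the board and CPU that produced the line; treat that reading as an assumption. The Python sketch below shows one way to split a captured console log (one message per line) on that prefix and count messages per CPU. The function name and the short sample are illustrative only.

```python
import re
from collections import Counter

# Prefix assumed to mean "board,cpu>" followed by the message text.
PREFIX = re.compile(r"^(\d+),(\d+)>\s?(.*)$")


def parse_post_line(line: str):
    """Return (board, cpu, message) for a POST line, or None if it has no prefix."""
    match = PREFIX.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3).strip()


# A few lines taken from the sample output above.
sample = """\
7,0> Slave CPU MID 18 started
9,0>Board 9 Functional CPU 0 Test
9,0> Dcache Init
7,0> Slave CPU MID 19 started
9,1>Board 9 Functional CPU 1 Test
9,1> Dcache Init
7,0>Board 7 Functional CPU 0 Test
"""

records = [r for r in map(parse_post_line, sample.splitlines()) if r is not None]
per_cpu = Counter((board, cpu) for board, cpu, _ in records)

for (board, cpu), count in sorted(per_cpu.items()):
    print(f"board {board} cpu {cpu}: {count} lines")
```

Grouping by prefix makes it easier to see which slave CPU was running when a failure message appeared.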
POST Control Commands
The following are the control commands for POST.
Note – Enter these commands on the terminal connected to ttya, or on
the keyboard of the workstation running the tip session.
Do not try to enter these commands on the Sun keyboard connected to
the clock board.
Each toggle key turns its feature on or off with each stroke of the key.
There are two particularly useful commands:
s - Toggle Stop flag
When this flag is set, POST stops on completion at the extended POST
menus. Get into the habit of pressing the s key during POST, which then
puts you into extended POST.
v - Toggle verbose print flag
Normally, the only way to get a display of POST on ttya is to power on
in diagnostic mode or to have diag-switch? set to true.
Pressing the v key during a normal power-on displays POST on ttya.
L - Toggle loop on full POST
POST Menus
Option 7... Display system summary
This is the most useful command, since it gives a display of the final
system configuration:
7,0> System Board Status
7,0>-----------------------------------------------------------------
7,0> Slot   Board Status     Board Type     Failures
7,0>-----------------------------------------------------------------
7,0>  0   | Not installed  |              |
7,0>  1   | Normal         |+IO Type 4    |
7,0>  2   | Not installed  |              |
7,0>  3   | Normal         |+IO Type 4    |
7,0>  4   | Not installed  |              |
7,0>  5   | Not installed  |              |
7,0>  6   | Not installed  |              |
7,0>  7   | Normal         |+CPU/Memory   |
7,0>  8   | Not installed  |              |
7,0>  9   | Normal         |+CPU/Memory   |
7,0> 16   | Normal         | Clock Board  |
7,0>-----------------------------------------------------------------
7,0>
7,0> CPU Module Status
7,0>-----------------------------------------------------------------
7,0> MID   OK   Cache   Speed   Version
7,0>-----------------------------------------------------------------
7,0> 14  | y  | 4096  | 248   | 00170011.20000507
7,0> 15  | y  | 4096  | 248   | 00170011.20000507
7,0> 18  | y  | 4096  | 248   | 00170011.20000507
7,0> 19  | y  | 4096  | 248   | 00170011.20000507
7,0>-----------------------------------------------------------------
7,0>System Frequency (MHz),fcpu=248, fmod=124, fsys=82, fgen=496
7,0> Populated Memory Bank Status
7,0> bd #   Size   Address   Way   Status
7,0>   9    256    0         0     Normal
Warning – Note the MID address for the processors. POST numbers
processors in decimal (as does Solaris) whereas OBP numbers the
processors in hex.
BE AWARE OF THIS DIFFERENCE....
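The decimal/hex difference is easy to trip over when moving between POST output and the OBP prompt. The following is a minimal sketch, not a Sun tool, that converts the MIDs from the sample summary in both directions. The function names are hypothetical.

```python
def mid_to_obp(mid: int) -> str:
    """Return the hex form of a MID that POST or Solaris shows in decimal."""
    return format(mid, "x")


def obp_to_mid(text: str) -> int:
    """Return the decimal MID for a hex value as shown by OBP."""
    return int(text, 16)


if __name__ == "__main__":
    # MIDs taken from the CPU Module Status table in the sample output.
    for mid in (14, 15, 18, 19):
        print(f"POST/Solaris MID {mid:2d} -> OBP {mid_to_obp(mid)}")
```

Note that MIDs 18 and 19 appear in OBP as 12 and 13, which are easily mistaken for decimal numbers.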
Experiment with the POST menus. Some of the tests return the message
STILL UNDER DEVELOPEMENT
and should not be relied upon too heavily for fault finding.
POST Board Status Messages
On completion of testing, POST displays the status of each board.
There are four board status types:
Normal
On-line/Failed    A component on that board has failed POST.
Low-power mode    Either the whole board has failed POST, or the
                  OBP parameter configuration-policy is set to board,
                  or the board has been detached using Dynamic
                  Reconfiguration (DR).
Not Installed
7,0> System Board Status
7,0>-----------------------------------------------------------------
7,0> Slot   Board Status     Board Type     Failures
7,0>-----------------------------------------------------------------
7,0>  0   | Not installed  |              |
7,0>  1   | Normal         |+IO Type 4    |
7,0>  2   | Not installed  |              |
7,0>  3   | Low Power Mode |+IO Type 4    | AC
7,0>  4   | Not installed  |              |
7,0>  5   | Not installed  |              |
7,0>  6   | Not installed  |              |
7,0>  7   | Online/failure |+CPU/Memory   | CPU 1
7,0>  8   | Not installed  |              |
7,0>  9   | Normal         |+CPU/Memory   |
7,0> 16   | Normal         | Clock Board  |
7,0>-----------------------------------------------------------------
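When POST output has been captured to a file from a tip session, the board status table can be scanned for boards that need attention. The following is a minimal sketch under the assumption that each table row keeps the `|` separators shown above. The function names and the abbreviated sample are hypothetical and are not part of POST.

```python
SAMPLE = """\
7,0> Slot Board Status Board Type Failures
7,0> 1 | Normal |+IO Type 4 |
7,0> 3 | Low Power Mode |+IO Type 4 | AC
7,0> 7 | Online/failure |+CPU/Memory | CPU 1
7,0> 16 | Normal | Clock Board |
"""


def parse_board_status(text):
    """Return one dict per table row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        body = line.split(">", 1)[-1]          # drop the "7,0>" prefix
        fields = [f.strip() for f in body.split("|")]
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        slot, status, board_type, failures = fields
        rows.append({
            "slot": int(slot),
            "status": status,
            "type": board_type.lstrip("+"),
            "failures": failures,
        })
    return rows


def boards_needing_attention(rows):
    """Return rows whose status is neither Normal nor Not installed."""
    return [r for r in rows if r["status"] not in ("Normal", "Not installed")]


if __name__ == "__main__":
    for row in boards_needing_attention(parse_board_status(SAMPLE)):
        print(row["slot"], row["status"], row["failures"])
```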
Sample Error Messages
POST Error Reporting
You can view the output from the last POST by running the show-post-results command. You can examine the report for error
messages. The report generated by the show-post-results command
displays a synopsis of the POST tests in a less confusing manner than
the actual POST output you observed using the serial port connection.
The symbols used in the show-post-results report are defined as
follows:
● P = present
● *** = failed component
● NOT = Not found
● 0 = no failures
The following is sample output from the show-post-results command.
ok show-post-results
Slot 0 - Status=Okay, Type: CPU/Memory
    Cpu0=P Cpu0-OK=P FailCode=0 Cpu1=P Cpu1-OK=P FailCode=0
    AC=P FHC=P SRAM=P FPROM=P LabCon=Not Ovtemp=Not
    Bank0=0 Bank1=0 DTag0=P DTag1=P JTAG=P CntrPl=P
    Bank0=P Bank1=Not DC=ff
Slot 1 - Status=Fail, Type: IO board Type 2
    Sysio0=P Sysio1=P FEPS=P FEPSFC=0 SOC=P FFB=P
    Sbus0=P Sbus2=P
    AC=P FHC=P SRAM=P FPROM=P LabCon=Not Ovtemp=Not
    TODC=*** JTAG=P CntrPl=P DC=ff
Slot 2 - Status=Okay, Type: CPU/Memory
    Cpu0=P Cpu0-OK=P FailCode=0 Cpu1=P Cpu1-OK=P FailCode=0
    AC=P FHC=P SRAM=P FPROM=P LabCon=Not Ovtemp=Not
    Bank0=0 Bank1=0 DTag0=P DTag1=P JTAG=P CntrPl=P
    Bank0=Not Bank1=Not DC=ff
Slot 16 - Status=Fail, Type: Clock
    Clock=P Serial=P KbdMse=P PPS-DC=P DCReg0=P DCReg1=P
    AC=P ACFan=P KeyFan=P PSFail=0 Ovtemp=Not TODC=P RKFan=P
P = Present or Passed
*** = Failed Component
Not = Not present
ok
The next few pages provide a key to the show-post-results output.
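Because a failed component is marked with ***, a captured show-post-results report can be searched mechanically. The following is a minimal sketch, assuming the report has been saved as text with one "Slot N - Status=..." header per board. The regular expressions and function name are hypothetical and are not part of OBP.

```python
import re

SAMPLE = """\
Slot 1 - Status=Fail, Type: IO board Type 2
Sysio0=P Sysio1=P FEPS=P FEPSFC=0 SOC=P FFB=P
Sbus0=P Sbus2=P
AC=P FHC=P SRAM=P FPROM=P LabCon=Not Ovtemp=Not
TODC=*** JTAG=P CntrPl=P DC=ff
Slot 2 - Status=Okay, Type: CPU/Memory
Cpu0=P Cpu0-OK=P FailCode=0 Cpu1=P Cpu1-OK=P FailCode=0
P = Present or Passed
*** = Failed Component
"""

SLOT_RE = re.compile(r"^Slot (\d+) - Status=(\w+), Type: (.+)$")
PAIR_RE = re.compile(r"([\w.-]+)=(\S+)")   # matches NAME=VALUE with no spaces


def failed_components(report):
    """Map slot number to the component names reported as ***."""
    failures = {}
    slot = None
    for line in report.splitlines():
        line = line.strip()
        header = SLOT_RE.match(line)
        if header:
            slot = int(header.group(1))
            continue
        if slot is None:
            continue
        for name, value in PAIR_RE.findall(line):
            if value == "***":
                failures.setdefault(slot, []).append(name)
    return failures


if __name__ == "__main__":
    print(failed_components(SAMPLE))
```

The legend lines at the end of the report contain spaces around the equals sign, so they are not mistaken for component entries.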
POST Error Reporting - definitions
CPU/Memory Board
Cpu0/Cpu1 CPU modules on the board
CPU{0,1}-OK CPU module status
FailCode Failure code (valid only if CPU failed)
AC Address Controller
FHC Fire Hose Controller
SRAM Static RAM
FPROM Flash PROM
LabCon Lab Console
Ovtemp Overtemp
Bank0 Bank0 status (a bit indicates a missing or failed SIMM)
Bank1 Bank1 status (a bit indicates a missing or failed SIMM)
DTag0 DTags0 status
DTag1 DTags1 status
JTAG JTAG status
CntrPl Centerplane status
DC Data Controllers (0 bit indicates a failed DC)
I/O Board
Sysio0 SysIO 0 status
Sysio1 SysIO 1 status
FEPS Onboard FEPS chip
FEPSFC FEPS fail code (valid only if failed)
SOC Onboard SOC status
FFB FFB card status
Sbus0 SBus0 slot status
Sbus1 SBus1 slot status
Sbus2 SBus2 slot status
AC Address Controller
FHC Fire Hose Controller
SRAM Static RAM
FPROM Flash PROMs
LabCon Lab Console
Ovtemp Overtemp
TODC Time of Day Clock
JTAG JTAG status
CntrPl Centerplane status
DC Data Controllers (0 bit indicates a failed DC)
Disk Board
Disk0 Disk0 ID (valid only if disk present)
Disk1 Disk1 ID (valid only if disk present)
Disk0P Disk0 Present
Disk1P Disk1 Present
VDDOK SCSI VDD status
Fan Fan Fail status
JTAG JTAG status
Clock Board
Clock Clock running
Serial Serial Port
KbdMse Keyboard/Mouse status
PPS-DC Peripheral PS ok (all DC levels OK)
AC AC power status
ACFan AC box fan status
KeyFan KeySwitch fan status
PSFail Power Supply fail status
(bit position indicates which ps failure)
Ovtemp Overtemp
TODC Time of Day Clock
V5-P Peripheral 5V
V12-P Peripheral 12V
V5-Aux Auxiliary 5V
V5P-PC Peripheral 5V Precharge
V12-PC Peripheral 12V Precharge
V3-PC System 3.3V Precharge
V5-PC System 5.0V Precharge
RKFan Rack Fan Status
3.3V Clock board 3.3 V
5.0V Clock board 5.0 V
When things go wrong...
What constitutes a minimum system?
If you have a system that hangs under POST, or gives unpredictable
results, run POST with a minimum configuration.
You can run POST with a clock board and a CPU/Memory board with
one CPU module and no memory. POST does not need any memory,
since it runs in SRAM on each board.
Frequency Margining
Again, if you have intermittent faults, increase the frequency of the
Gigaplane interconnect to trap these faults.
Do not margin it too high, since the system will then fail automatically.
Loop on Diagnostics
Remember the loop function, which you can set on the POST control
menu.
Warning – POST does not check SBus cards, or peripherals. It is no use
running POST with a loop command and with frequency margined
high, if the fault is that the system will not see any disks.
Accessing and Displaying POST
To access the host’s operating system from the console and to interact
with OBP and POST programs, you must access the system’s serial
port A. For interactive capability you must have an ASCII terminal
with keyboard attached to serial port A.
tip session
The best method of getting POST output is to tip into serial port A
from another Sun system. Typically, you will tip out of port B on a
workstation. The method is outlined below.
workstation# more /etc/remote | grep hardwire
hardwire:dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:
workstation# tip hardwire
connected
Note the tip commands:
~#    break (Stop-A)
~.    exit
[Figure: an ASCII terminal or workstation connected to serial port A through a null modem cable]
Internal Disk Subsystems
Internal Storage Capacities
Sun Enterprise systems have the following maximum internal storage
capacities:
● Sun Enterprise 3000 – Up to ten 18.2-Gbyte SCSI drives are used to
populate the internal bays
● Sun Enterprise 3500 – Up to eight 36.4-Gbyte FC-AL dual-ported
disk drives can be used to populate the internal bays
● Sun Enterprise 4x00 and Enterprise 5x00 – Up to eight 18.2-Gbyte
SCSI drives, mounted on four disk boards
● Sun Enterprise 6x00 – Up to four 18.2-Gbyte SCSI drives, mounted
on two disk boards
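The maximum internal capacity of each model follows directly from the drive counts and drive sizes listed above. The following is a minimal worked calculation. The dictionary and function names are hypothetical, and the figures are raw drive capacity only, before any formatting or mirroring.

```python
# (number of drives, Gbytes per drive), taken from the list above
CONFIGS = {
    "Enterprise 3000": (10, 18.2),
    "Enterprise 3500": (8, 36.4),
    "Enterprise 4x00/5x00": (8, 18.2),
    "Enterprise 6x00": (4, 18.2),
}


def max_internal_gbytes(model):
    """Return the raw maximum internal capacity in Gbytes for a model."""
    drives, size = CONFIGS[model]
    return round(drives * size, 1)


if __name__ == "__main__":
    for model in CONFIGS:
        print(f"{model}: {max_internal_gbytes(model)} Gbytes")
```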
Disk Subsystems
Sun Enterprise Servers can support several terabytes of disk storage
when external assemblies are used.
This module focuses on disks that are configured as internal devices.
The SCSI Disk Board
With the exception of the Sun Enterprise 3500 and 3000 systems, the
Sun Enterprise x500 servers support dual-SCSI disk boards that
contain one or two UltraSCSI disk drives.
The disk board capacity for these servers varies as follows:
● The Sun Enterprise 4x00 supports up to four disk boards.
● The Sun Enterprise 5x00 supports up to four disk boards.
● The Sun Enterprise 6x00 supports a maximum of two disk boards.
This is because the disk boards do not put a load on the
Gigaplane. Indeed, the only thing a disk board takes from
the Gigaplane is power. Putting more than two disk boards in an
E6x00 would leave spaces on the bus, which is not allowed. (This
is why load boards are fitted in empty slots.)
Disk boards are limited to slots 14 and 15, which are the slots
closest to the Gigaplane terminators.
The SCSI Disk Board Addressing
SCSI addressing is assigned according to the Gigaplane slot in which
the board is installed, as shown in Table 8-1.
Note – The SCSI disk board requires a SCSI-2 interface from an I/O
board that connects to the external SCSI-2 port. The SCSI disk boards
can be daisy-chained, so only one interface is required.
Table 8-1 Default Drive Address Settings
SLOT   DISK 0    DISK 1      SLOT   DISK 0    DISK 1
       ADDRESS   ADDRESS            ADDRESS   ADDRESS
0      4         5           8      10        11
1      6         7           9      0         1
2      0         1           10     12        13
3      10        11          11     2         3
4      2         3           12     14        15
5      12        13          13     8         9
6      8         9           14     0         1
7      14        15          15     10        11
Jumpers J0702 and J0703 override the default drive address settings
assigned by the centerplane slot position, as shown in Table 8-2.
Table 8-2 SCSI Disk Board Disk Addressing Override Jumper
Configurations
JUMPER   PINS    SETTING       DESCRIPTION
J0702    1-2     Out           Disk 0 default address selection
         1-2     In            Disk 0 manual address selection
         A0-A3   As required   Disk 0 address select
J0703    1-2     Out           Disk 1 default address selection
         1-2     In            Disk 1 manual address selection
         A0-A3   As required   Disk 1 address select
J0705    1-2     As required   Disk 0 delay spin
J0706    1-2     As required   Disk 1 delay spin
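Because disk boards can be daisy-chained onto one SCSI bus, it is worth checking that the default addresses of two slots do not collide before deciding whether the override jumpers are needed. The following is a minimal sketch that encodes Table 8-1 as a lookup. The collision check is an illustration only, and the names are hypothetical.

```python
# slot -> (disk 0 address, disk 1 address), copied from Table 8-1
DEFAULT_ADDRESSES = {
    0: (4, 5),    1: (6, 7),    2: (0, 1),    3: (10, 11),
    4: (2, 3),    5: (12, 13),  6: (8, 9),    7: (14, 15),
    8: (10, 11),  9: (0, 1),    10: (12, 13), 11: (2, 3),
    12: (14, 15), 13: (8, 9),   14: (0, 1),   15: (10, 11),
}


def default_targets(slot):
    """Return the default (disk 0, disk 1) SCSI addresses for a slot."""
    return DEFAULT_ADDRESSES[slot]


def address_conflicts(slot_a, slot_b):
    """Return the set of default addresses that two slots have in common."""
    return set(default_targets(slot_a)) & set(default_targets(slot_b))


if __name__ == "__main__":
    print(default_targets(14), default_targets(15))
    print(address_conflicts(14, 15))   # empty set: no collision
    print(address_conflicts(2, 9))     # both slots default to 0 and 1
```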
Disk Addressing
You can type a complete physical path name or a complete logical path
name to specify the device or controller. How Solaris derives device
addresses is covered in the upcoming Solaris module. In this module,
you are given sample addresses both for SCSI devices and FC-AL
devices.
● Physical addresses are designed to follow a hardware tree to a
specific device.
● Logical addresses allow applications to point to a specific device
on a specific bus.
● Solaris performs the translation between logical and physical
addresses transparently to the end user.
Examples
A typical physical path name for a disk device is:
/sbus@3,0/SUNW,fas@3,880000/sd@0,0:a,raw
or
/sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w2100002037000f96,0:a,raw
A typical logical path name is:
c2t1d0s1
Additional information on addressing that is specific to the server type
is covered with the individual servers.
Sun Enterprise 3500
Enterprise 3500 Fibre Channel Interface Board
This is a new board designed to provide connectivity to the internal
disk drives in the Sun Enterprise 3500 server. The internal disk drives
operate with the fibre channel arbitrated loop (FC-AL) architecture.
Each of the four potential FC-AL loops corresponds to one of four
gigabit interface converter (GBIC) modules on the Fibre channel
interface board.
Figure 8-1 Sun Enterprise 3500 Fibre Channel Interface Board
The Fibre channel interface board comes with two hot-pluggable GBIC
modules. The 2-meter fibre channel cables establish a loop or
connection with the internal disk drives. This board is part of the
standard internal disk drive option. If no internal drives are ordered,
this board is not present.
Table 8-3 GBIC to Disk Drive Bay and Drive Port Connection
Disk Drives    Drive Port    GBIC Name
0, 1, 2, 3     A             GBIC LA (lower bank)
0, 1, 2, 3     B             GBIC LB (lower bank)
4, 5, 6, 7     A             GBIC UA (upper bank)
4, 5, 6, 7     B             GBIC UB (upper bank)
[Figure 8-1 labels: GBIC LA, GBIC LB, GBIC UA, GBIC UB; Part Number 501-4820]
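The mapping in Table 8-3 is regular enough to express as a small function: the bank letter comes from the drive number and the second letter from the drive port. The following is a minimal sketch, and the function name is hypothetical.

```python
def gbic_for(drive: int, port: str) -> str:
    """Return the GBIC name serving a given internal drive and drive port."""
    if drive not in range(8):
        raise ValueError("internal drives are numbered 0-7")
    port = port.upper()
    if port not in ("A", "B"):
        raise ValueError("drive port must be A or B")
    bank = "L" if drive < 4 else "U"   # lower bank: 0-3, upper bank: 4-7
    return f"GBIC {bank}{port}"


if __name__ == "__main__":
    print(gbic_for(0, "A"))
    print(gbic_for(5, "B"))
```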
Enterprise 3500 Fibre Channel Interface Board
The Sun Enterprise 3500 can be ordered without internal disk drives.
Any of the bootable external Sun StorEdge disk products (such as the
Sun StorEdge UniPack, MultiPack, D1000, A3500, and A5X00
products) can be used as a boot device for a Sun Enterprise 3500
without internal disk drives. Such a configuration does not require an
FC-AL Interface board because the FC-AL Interface board’s only
purpose is to connect to internal disk bays.
The interface boards can be connected to the SBus I/O Board and the
Graphics I/O Board, both of which come with a pair of on-board 100
MB/second FC-AL sockets. In addition, both types of boards support
an SBus Host Adapter that has a pair of 100 MB/second FC-AL sockets.
Each of these pairs of sockets can support the internal disk drives in
the Sun Enterprise 3500 or the Sun StorEdge A5000, but they cannot be
split up so that one supports one type of device while the other socket
supports a different type of device.
However, a PCI-only configuration in a Sun Enterprise 3500 does not
provide a way to connect the internal FC-AL disk drives. This is
because the PCI I/O Board does not have on-board FC-AL sockets and
there currently is no PCI FC-AL card available. So, if you want to use
the internal disk drives in the Sun Enterprise 3500, you must have at
least one SBus I/O or one Graphics I/O Board installed. There are no
plans to add on-board FC-AL sockets to the PCI I/O Board because
there is not enough physical space on the board to accommodate
on-board FC-AL sockets.
Even though the FC-AL connection cannot be split between internal
and external connection, the individual FC-AL connections on the
FC-AL Interface board are logically independent. The components do
get their power through a single connection. However, the power to
the FC-AL Interface board comes from the backplane which is
supported by redundant power supplies. Therefore the design has
practically no single point of failure.
Fibre Channel Interface Board
The FC-AL board comes with two GBIC modules and one 2-meter
fibre channel cable to establish one loop (connection).
Figure 8-2 Basic FC-AL Loop
One GBIC module is installed on the FC-AL Interface board and,
typically, the other is installed on the I/O board (or SBus card) leaving
three empty FC-AL sockets on the FC-AL Interface board. Each
additional loop requires two additional GBIC modules and one 2-
meter fibre channel cable. The GBIC modules on the FC-AL Interface
board are exactly the same as those used in the Sun StorEdge A5X00
arrays, FC-AL SBus Host Adapter, and on the SBus I/O board.
[Figure 8-2 labels: SBus I/O board (Gigaplane bus connector, address control, data control, UPA bus, three SBus cards, SCSI, Ethernet) joined by a fibre channel cable to the interface board, whose GBICs lead to the lower disk bays and to the upper disk bays]
Sun Enterprise 3500 - Disk Addressing
A typical configuration, as illustrated in the Sun Enterprise 3500 Disk Configuration figure,
takes advantage of the dual-ported capability of the Sun Enterprise 3500 disk structure.
Having two paths to each disk eliminates the path to disk as a single point of failure.
Sun Enterprise 3500 Disk Configuration
In the Sun Enterprise 3500, the lower four drives are configurable as one group of disks, or
they can be accessed as two smaller independent groups of disks. The configuration is
application dependent. The same is true for the upper four disk bays.
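[Figure labels: two I/O boards and the interface board (IB) with GBICs LA, LB, UA, and UB; disk bays 0 1 2 3 and 4 5 6 7; loop IDs e1 e0 dc da and ef e8 e4 e2; slots 1, 3, 5, 7, 9]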
probe-fcal-all
A new command has been introduced to look at the FC-AL disk on an
E3500.
{e} ok probe-fcal-all
/sbus@6,0/SUNW,socal@d,10000/sf@1,0
/sbus@6,0/SUNW,socal@d,10000/sf@0,0
WWN 200d080020940232 Loopid 1
WWN 21000020370cbc0e Loopid e1
Disk SEAGATE ST19171FCSUN9.0G117E9804P938
/sbus@2,0/SUNW,socal@d,10000/sf@1,0
/sbus@2,0/SUNW,socal@d,10000/sf@0,0
WWN 2005080020940232 Loopid 1
WWN 21000020370d8ad0 Loopid ef
Disk SEAGATE ST19171FCSUN9.0G117E9814T324
Each disk in an E3500 has an independent world-wide number
(WWN). These numbers are assigned by the manufacturer and are
unique to the disk. The FC-AL specification states that each
component in a fibre channel loop must have a unique WWN. This
includes the interface boards.
The WWN of the IBs is derived from the host MAC address, in this
case 8:00:20:94:02:32
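In the probe-fcal-all output, the last twelve hex digits of each interface board WWN (200d080020940232 and 2005080020940232) are the host MAC address. The following is a minimal sketch that extracts them. It assumes, from this sample output only, that the MAC address occupies the low 48 bits of the WWN, and the function name is hypothetical.

```python
def mac_from_ib_wwn(wwn: str) -> str:
    """Return the MAC address embedded in the low 48 bits of a 64-bit WWN."""
    digits = wwn.lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("expected a 16-digit hexadecimal WWN")
    mac = digits[-12:]                      # last 12 hex digits = 48 bits
    return ":".join(mac[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    for wwn in ("200d080020940232", "2005080020940232"):
        print(wwn, "->", mac_from_ib_wwn(wwn))
```

Both interface board WWNs yield the same address, written here with two digits per byte.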
The WWN is mapped to a logical path at install time.
Do a long listing on the logical path to view how the numbers relate.
# ls -l /dev/dsk/c0t0d0s0
lrwxrwxrwx 1 root root 74 Jan 22 15:00 /dev/dsk/c0t0d0s0 ->
../../devices/sbus@2,0/SUNW,socal@d,10000/ssd@w21000020370d8ad0,0:a
Fortunately, we do not have to boot the device using the WWN. We can
boot using the disk ID.
ok boot /sbus@2,0/SUNW,socal@d,10000/ssd@0,0
The proper approach is to put the above path in the boot-device parameter
of the NVRAM and then boot from the alias:
ok devalias disk
disk=/sbus@2,0/SUNW,socal@d,10000/ssd@0,0
ok boot disk
Sun Enterprise 3500 - Boot Disk Replacement
A host that boots from a non-mirrored FCAL disk (either an A5000 or
the E3500 internal disks) will have to overcome the hard-coded World
Wide Number (WWN) that each of these disks uses as an integral part
of their device path.
On failure of the boot disk the systems administrator must ensure that
this WWN is correctly updated throughout the system to ensure it will
reboot.
Procedure
When the boot disk is replaced and the system is booted from CD-ROM,
a device tree is built in memory as part of the boot sequence.
However, when the data is restored from a backup tape, the old
path_to_inst file with the old WWN is put back on the disk.
To recover, mount the restored root filesystem on /a. Run the
following commands to rebuild the devices tree:
# drvconfig -r /a -p /a/etc/path_to_inst
# cd /devices
# find . -print | cpio -pduVm /a/devices
# disks -r /a
# devlinks -r /a
NOTE: It is currently necessary to use both "drvconfig" and "find |
cpio" because of bugid 4161768: drvconfig does not work properly with
socal disks.
Restore the other filesystems on that disk, or comment out their
entries in /a/etc/vfstab. At a minimum, all the Solaris
filesystems (root, /var, /usr, /opt, and so on) must be recovered.
Reboot the system from the recovered disk.
For full details, see Internal SRDB 17658
Sun Enterprise 3500 - Data Disk Replacement
We will still have to overcome the hard-coded World Wide Number
(WWN) that each of these disks uses as an integral part of their device
path.
Procedure
Ensure that the following patches, or later revisions of them, are installed:
Solaris 2.6
sf/socal/ib/luxadm patch - 105375-10
ssd patch - 105356-08
Solaris 2.5.1
sf/socal/ib/luxadm patch - 105310-08
ssd patch - 104708-16
These provide support for the luxadm commands on the E3500.
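A patch ID consists of a base number and a revision, so a later revision of the same base number satisfies the requirement. The following is a minimal sketch of that comparison using the patch levels listed above. The installed list in the example is hypothetical, and gathering it from a real system is left out.

```python
# Minimum patch levels from the list above, keyed by Solaris release
REQUIRED = {
    "2.6": ["105375-10", "105356-08"],
    "2.5.1": ["105310-08", "104708-16"],
}


def parse_patch(patch_id):
    """Split '105375-10' into ('105375', 10)."""
    base, revision = patch_id.split("-")
    return base, int(revision)


def satisfies(installed, required):
    """True if installed is the same patch at the required revision or later."""
    inst_base, inst_rev = parse_patch(installed)
    req_base, req_rev = parse_patch(required)
    return inst_base == req_base and inst_rev >= req_rev


def missing_patches(release, installed):
    """Return the required patches not satisfied by the installed list."""
    return [req for req in REQUIRED[release]
            if not any(satisfies(patch, req) for patch in installed)]


if __name__ == "__main__":
    # Hypothetical installed patch list for a Solaris 2.6 host
    print(missing_patches("2.6", ["105375-12", "105356-07"]))
```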
Unmount the disk and then stop it:
# luxadm stop <logical path, physical path or WWN ...>
Remove the device entries with the following command:
# luxadm remove_device <logical path, physical path or WWN ...>
Replace the disk and then run:
# luxadm insert <no arguments required>
This recreates the device entries. The device is now ready to be
used.
For full details, see Internal SRDB 18595
Sun Enterprise 3000 Disk Addressing
As you look at the front of an E3000, the top four disks are assigned
SCSI targets 0-3, and the bottom six disks are assigned SCSI targets 10-15.
Note that the system addresses the disks in hex, so targets 10-15 appear as a-f.
Note: All ten drives plus the tape unit and CD-ROM are driven from
the onboard SCSI controller in slot 1.
I/O Addressing Test
The following output has been generated from an E3500.
Outline all the boards within the system, with part numbers.
You may assume that we have 400MHz processors.
{e} ok show-disks
a) /pci@b,4000/SUNW,isptwo@3/sd
b) /sbus@7,0/SUNW,fas@0,8800000/sd
c) /sbus@7,0/SUNW,fas@3,8800000/sd
d) /sbus@6,0/SUNW,socal@d,10000/sf@1,0/ssd
e) /sbus@6,0/SUNW,socal@d,10000/sf@0,0/ssd
f) /sbus@3,0/SUNW,fas@3,8800000/sd
g) /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd
h) /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd
q) NO SELECTION
Solaris Support Utilities
How Solaris References System Components
In the Solaris 2.x and 7 operating environments, system components
are referenced in three different ways:
● Logical device names – Names used by system administrators and
software to access system resources.
● Physical device names – Names that represent the full device path
name in the device information hierarchy (or tree).
● Instance names – The kernel’s abbreviated names for every
possible device on the system. dmesg displays instance names,
such as sd0 and sd1.
Logical Device Names
These names are symbolically linked to their corresponding physical
device (/devices) names. The logical names are located in the /dev
directory and are created at the same time as the physical names.
It is important to remember that in most cases, software applications
and system administrators view system resources (such as disk)
through their logical names. When a system fault occurs, it might be
necessary to translate a device’s logical name to some physical
identifier so that you can repair the problem. The next few pages will
show you the relationship between the logical name and the physical
name.
The following examples show the logical names of a diskette drive and
hard disk drive 0.
# ls /dev/diskette*
/dev/diskette
/dev/diskette0
# ls /dev/rdsk/c0t0d0*
c0t0d0s0 c0t0d0s1 c0t0d0s2 c0t0d0s3
c0t0d0s4 c0t0d0s5 c0t0d0s6 c0t0d0s7
Figure 9-1 shows the relationship of the hard disk drive logical name
syntax to traditional SCSI components.
Figure 9-1 Logical Name Syntax
/dev/[r]dsk/c#t#d#s#
    c#    Controller number
    t#    Target number
    d#    Disk or logical unit number (LUN)
    s#    Slice or partition number
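The logical name syntax can be taken apart mechanically when a fault report quotes a name such as c2t1d0s1. The following is a minimal sketch that splits a logical disk name into the four fields of Figure 9-1. The function name is hypothetical, and names without a target field are not handled.

```python
import re

# Optional /dev/dsk/ or /dev/rdsk/ prefix, then c#t#d#s#
LOGICAL_RE = re.compile(r"^(?:/dev/r?dsk/)?c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_logical_name(name):
    """Return the controller, target, disk (LUN), and slice numbers."""
    match = LOGICAL_RE.match(name)
    if not match:
        raise ValueError(f"not a c#t#d#s# logical name: {name!r}")
    controller, target, disk, slice_no = map(int, match.groups())
    return {
        "controller": controller,
        "target": target,
        "disk": disk,
        "slice": slice_no,
    }


if __name__ == "__main__":
    print(parse_logical_name("c2t1d0s1"))
    print(parse_logical_name("/dev/rdsk/c0t0d0s7"))
```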
Physical Device Names
The physical names are located in the /devices directory, where the
entries are created during installation, during subsequent automatic
device configuration, or by using the drvconfig command. The device file
provides a pointer to the kernel device drivers.
● The following examples show the relationship of the diskette drive
and hard disk drive 0 physical names to their logical names.
Note – The following example is from an Enterprise 450.
# ls -l /dev/diskette*
lrwxrwxrwx 1 root root 49 Aug 5 13:52 /dev/diskette ->
/devices/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
lrwxrwxrwx 1 root root 49 Aug 5 13:52 /dev/diskette0 ->
/devices/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
# ls -l /dev/rdsk/c0t0d0s0
lrwxrwxrwx 1 root root 45 Aug 5 13:52 /dev/rdsk/c0t0d0s0 ->
/devices/pci@1f,4000/scsi@3/sd@0,0:a,raw
● The next two examples show the corresponding OBP device tree
and devalias entries for the same two devices.
ok show-devs
.
/pci@1f,4000/ebus@1/fdthree@14,3023f0:c
.
/pci@1f,4000/scsi@3/disk
.
ok devalias
.
floppy /pci@1f,4000/ebus@1/fdthree
.
disk0 /pci@1f,4000/scsi@3/sd@0,0
.
Instance Names
In the Solaris 2.x and 7 environments, the instance name is bound to
the physical name by references in the /etc/path_to_inst file.
The device instance is the number that follows the quoted physical
name on each line of the file, ahead of the driver name. The kernel
uses these names to identify every possible device instance.
The instance numbers are assigned in order of insertion/configuration
and therefore do not necessarily follow any recognizable or usable
pattern. However, they do map to groupings of the minor device
numbers listed in the /devices/... sub-directories.
The following example shows the entries in the /etc/path_to_inst
file for the same diskette drive and hard disk drive 0 seen earlier.
"/pci@1f,4000/ebus@1/fdthree@14,3023f0" 0 "fd"
"/pci@1f,4000/scsi@3/sd@0" 0 "sd"
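An instance name such as sd0 is the driver name followed by the instance number, so the file can be turned into a lookup from instance name to physical name. The following is a minimal sketch, assuming each entry has the three fields shown above with straight double quotes. The function name is hypothetical.

```python
import shlex

SAMPLE = '''\
"/pci@1f,4000/ebus@1/fdthree@14,3023f0" 0 "fd"
"/pci@1f,4000/scsi@3/sd@0" 0 "sd"
'''


def parse_path_to_inst(text):
    """Return (instance name, physical name) pairs from path_to_inst text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        path, instance, driver = shlex.split(line)
        entries.append((driver + instance, path))
    return entries


if __name__ == "__main__":
    for name, path in parse_path_to_inst(SAMPLE):
        print(name, path)
```

Looking up sd0 in the result gives the physical name to follow through the /devices tree.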
Configuring Components in Solaris (cont)
Automatic Device Configuration
The kernel, consisting of a small generic core with a platform-specific
component and a set of modules, is configured automatically in the
Solaris environment.
A kernel module is a hardware or software component that is used to
perform a specific task on the system. An example of a loadable kernel
module is a device driver that is loaded when the device is accessed.
The system determines what devices are attached to it at boot time.
Then the kernel configures itself dynamically, loading needed modules
into memory. At this time, device drivers are loaded when devices,
such as disk and tape devices, are accessed for the first time. This
process is called autoconfiguration because all kernel modules are
loaded automatically when needed.
Adding New Components to Solaris
Note – The following procedure should be used only when
configuring components that are not hot-pluggable, or when Dynamic
Reconfiguration is unavailable.
If Solaris is running, perform the following steps:
1. Become superuser.
2. Create the /reconfigure file.
# touch /reconfigure
The /reconfigure file causes the Solaris software to check for the
presence of any newly installed devices the next time you turn on
or boot your system.
3. Shut down the system.
# shutdown -i0 -g30 -y
4. Turn off power to the system after it is shut down.
5. Turn off the system.
6. Install the device.
7. Turn on the power to the system.
The system will boot to multiuser mode and the login prompt will
be displayed.
8. Verify that the device has been configured.
Note – If the system is in OBP, execute the boot -r command to force
a Solaris reconfiguration.
How to Add a Device Driver
This procedure assumes that the device has already been added to the
system.
1. Become superuser.
2. Place the tape, diskette, or CD-ROM into the appropriate drive.
3. Use the pkgadd command to install the driver.
# pkgadd -d device package-name
where
-d device
Identifies the device pathname.
package-name
Identifies the package name that contains the device driver.
4. Verify that the package has been added correctly by using the
pkgchk command. The system prompt returns with no response if
the package is installed correctly.
# pkgchk package-name
Displaying System Configuration Information - prtconf, sysdef, and format
Solaris provides you with a variety of utilities that you can use to
monitor Sun Enterprise systems. The following is a list of utilities to
display system and device configuration information:
● prtconf – Displays system configuration information, including
total amount of memory and the device configuration as described
by the system’s device hierarchy. The output displayed by this
command depends upon the type of system.
● sysdef – Displays device configuration information including
system hardware, pseudo devices, loadable modules, and selected
kernel parameters.
● format – Displays both logical and physical device names.
The prtconf Utility
The following prtconf output is displayed on an Enterprise 450
system. To execute the prtconf command, type the following:
# /usr/sbin/prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 256 Megabytes
System Peripherals (Software Nodes):

SUNW,Ultra-4
    packages (driver not attached)
        terminal-emulator (driver not attached)
        deblocker (driver not attached)
        obp-tftp (driver not attached)
        disk-label (driver not attached)
        ufs-file-system (driver not attached)
    openprom (driver not attached)
        client-services (driver not attached)
    options, instance #0
    aliases (driver not attached)
    memory (driver not attached)
    virtual-memory (driver not attached)
    associations
        slot2disk
        slot2led
        slot2dev
    pci, instance #0
        ebus, instance #0
            auxio (driver not attached)
            power, instance #0 (driver not attached)
            SUNW,pll (driver not attached)
            sc (driver not attached)
            se, instance #0
            su, instance #0
            su, instance #1
            ecpp, instance #0 (driver not attached)
            fdthree, instance #0
            eeprom (driver not attached)
            flashprom (driver not attached)
.
.
The sysdef Utility
The following sysdef output is displayed on an Enterprise 450 system.
To execute the sysdef command, type the following:
# /usr/sbin/sysdef
*
* Hostid
*
  8095febb
*
* sun4u Configuration
*
*
* Devices
*
packages (driver not attached)
    terminal-emulator (driver not attached)
    deblocker (driver not attached)
    obp-tftp (driver not attached)
    disk-label (driver not attached)
    ufs-file-system (driver not attached)
openprom (driver not attached)
    client-services (driver not attached)
options, instance #0
aliases (driver not attached)
memory (driver not attached)
virtual-memory (driver not attached)
associations (driver not attached)
    slot2disk (driver not attached)
    slot2led (driver not attached)
    slot2dev (driver not attached)
counter-timer (driver not attached)
pci, instance #0
    ebus, instance #0
        auxio (driver not attached)
        power, instance #0 (driver not attached)
        SUNW,pll (driver not attached)
        sc (driver not attached)
        se, instance #0
        su, instance #0
        su, instance #1
        fdthree, instance #0
        eeprom (driver not attached)
        flashprom (driver not attached)
        SUNW,envctrl, instance #0
    network, instance #0 (driver not attached)
    scsi, instance #0
        disk (driver not attached)
        tape (driver not attached)
        sd, instance #0
        sd, instance #1
        sd, instance #2
        sd, instance #3
        sd, instance #4 (driver not attached)
        sd, instance #5 (driver not attached)
        sd, instance #6 (driver not attached)
        sd, instance #7 (driver not attached)
        sd, instance #8 (driver not attached)
        sd, instance #9 (driver not attached)
        sd, instance #10 (driver not attached)
        sd, instance #11 (driver not attached)
        sd, instance #12 (driver not attached)
        sd, instance #13 (driver not attached)
        sd, instance #14 (driver not attached)
    scsi, instance #1
        disk (driver not attached)
        tape (driver not attached)
        sd, instance #15
        sd, instance #16
        sd, instance #17
        sd, instance #18
        sd, instance #19 (driver not attached)
        sd, instance #20 (driver not attached)
        sd, instance #21 (driver not attached)
        sd, instance #22 (driver not attached)
        sd, instance #23 (driver not attached)
        sd, instance #24 (driver not attached)
        sd, instance #25 (driver not attached)
        sd, instance #26 (driver not attached)
        sd, instance #27 (driver not attached)
        sd, instance #28 (driver not attached)
        sd, instance #29 (driver not attached)
pci, instance #1
mc (driver not attached)
    bank (driver not attached)
        dimm (driver not attached)
        dimm (driver not attached)
        dimm (driver not attached)
        dimm (driver not attached)
    bank (driver not attached)
    bank (driver not attached)
    bank (driver not attached)
SUNW,UltraSPARC-II (driver not attached)
pci, instance #2
pci, instance #3
pci, instance #4
    SUNW,m64B, instance #0
pci, instance #5
pseudo, instance #0
    clone, instance #0
    ip, instance #0
    tcp, instance #0
.
.
*
* Loadable Objects
*
* Loadable Object Path = /platform/SUNW,Ultra-4/kernel
*
misc/platmod
misc/sparcv9/platmod
*
* Loadable Object Path = /platform/sun4u/kernel
*
cpu/sparcv9/SUNW,UltraSPARC-II
cpu/sparcv9/SUNW,UltraSPARC-IIi
cpu/sparcv9/SUNW,UltraSPARC
*
* Loadable Object Path = /kernel
*
drv/isp
drv/log
drv/le
.
.
*
* Loadable Object Path = /usr/kernel
*
drv/sparcv9/tnf
drv/sparcv9/audiocs
drv/sparcv9/dbri
strmod/u8lat2
*
* System Configuration
*
* swap files
swapfile             dev  swaplo  blocks    free
/dev/dsk/c0t0d0s1   32,1      16  308800  308800
*
* Tunable Parameters
*
 5095424    maximum memory allowed in buffer cache (bufhwm)
    3898    maximum number of processes (v.v_proc)
      99    maximum global priority in sys class (MAXCLSYSPRI)
    3893    maximum processes per user id (v.v_maxup)
      30    auto update time limit in seconds (NAUTOUP)
      25    page stealing low water mark (GPGSLO)
       5    fsflush run rate (FSFLUSHR)
      25    minimum resident memory for avoiding deadlock (MINARMEM)
      25    minimum swapable memory for avoiding deadlock (MINASMEM)
.
.
The format Utility
The format utility is normally used to prepare a disk drive for access
by the Solaris operating system. Maintenance personnel also use this
utility as a visibility tool to determine which disk drives can be “seen”
by Solaris. To execute the format command, type the following:
# format
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /pci@1f,4000/scsi@3/sd@0,0
       1. c0t3d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /pci@1f,4000/scsi@3/sd@3,0
Specify disk (enter its number):
This format example identifies two 9 GByte SCSI disk drives (sd@0,0
and sd@3,0).
Note – Press Control-d to exit the format utility.
Here is a rather more realistic example:
# format
AVAILABLE DISK SELECTIONS:
       0. c0t12d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
          /sbus@3,0/SUNW,fas@3,8800000/sd@c,0
       1. c0t13d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
          /sbus@3,0/SUNW,fas@3,8800000/sd@d,0
       2. c1t0d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
          /sbus@7,0/SUNW,fas@3,8800000/sd@0,0
       3. c1t1d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
          /sbus@7,0/SUNW,fas@3,8800000/sd@1,0
       4. c2t4d0 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@2/rdriver@4,0
       5. c2t4d1 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@2/rdriver@4,1
       6. c2t4d2 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@2/rdriver@4,2
       7. c3t5d3 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@3/rdriver@5,3
       8. c3t5d4 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@3/rdriver@5,4
       9. c3t5d5 <SYMBIOS-RSMArray2000-0204 cyl 8182 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@3/rdriver@5,5
      10. c4t5d0 <SYMBIOS-RSMArray2000-0205 cyl 8108 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@4/rdriver@5,0
      11. c4t5d2 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@4/rdriver@5,2
      12. c4t5d3 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@4/rdriver@5,3
      13. c4t5d4 <SYMBIOS-RSMArray2000-0205 cyl 8108 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@4/rdriver@5,4
      14. c5t4d1 <SYMBIOS-RSMArray2000-0205 cyl 8106 alt 2 hd 64 sec 64>
          /pseudo/rdnexus@5/rdriver@4,1
Specify disk (enter its number):
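When format is used purely as a visibility tool, it can be handy to reduce a saved disk listing to the logical device names alone. The following is a minimal sketch under the assumption that each disk entry has the "N. cNtNdN <label>" layout shown in the examples. The function name list_disks is hypothetical.

```shell
#!/bin/sh
# Sketch: print the logical device name (cNtNdN) of every disk in a saved
# copy of the "AVAILABLE DISK SELECTIONS" list read from standard input.
list_disks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ { print $2 }'
}

# Demonstration on the two-disk listing from the first format example.
list_disks <<'EOF'
AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /pci@1f,4000/scsi@3/sd@0,0
       1. c0t3d0 <SUN9.0G cyl 4924 alt 2 hd 27 sec 133>
          /pci@1f,4000/scsi@3/sd@3,0
Specify disk (enter its number):
EOF
```

Comparing the resulting list before and after a hardware change gives a quick indication of which drives Solaris can see.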
Displaying Diagnostic Information
In addition to monitoring utilities, Solaris provides you with
commands that you can use to display diagnostic information. The
following commands are used for this purpose:
● dmesg – Looks in a system buffer for recently printed diagnostic
messages and prints them on the standard output.
● prtdiag – Displays system configuration and diagnostic
information. The diagnostic information lists any failed Field
Replaceable Units (FRUs) in the system.
Note – The /var/adm/messages file contains error messages relating to the
current operating system initialization.
The dmesg Command
The following dmesg output is from an Enterprise 450 system. To
execute the dmesg command, type the following:
# /usr/sbin/dmesg
Mon Aug 9 12:50:07 MDT 1999
Aug 5 14:02:31 proto144 unix: pseudo-device: winlock0
Aug 5 14:02:31 proto144 unix: winlock0 is /pseudo/winlock@0
Aug 5 14:02:31 proto144 unix: pseudo-device: devinfo0
Aug 5 14:02:31 proto144 unix: devinfo0 is /pseudo/devinfo@0
Aug 5 14:02:32 proto144 unix: pseudo-device: vol0
Aug 5 14:02:32 proto144 unix: vol0 is /pseudo/vol@0
Aug 5 14:02:32 proto144 unix: pseudo-device: llc10
Aug 5 14:02:32 proto144 unix: llc10 is /pseudo/llc1@0
Aug 5 14:02:32 proto144 unix: pseudo-device: pm0
Aug 5 14:02:32 proto144 unix: pm0 is /pseudo/pm@0
Aug 5 14:02:32 proto144 unix: pseudo-device: tod0
Aug 5 14:02:32 proto144 unix: tod0 is /pseudo/tod@0
Aug 5 14:02:32 proto144 unix: ecpp0 at ebus0: offset 14,3043bc
Aug 5 14:02:32 proto144 unix: ecpp0 is /pci@1f,4000/ebus@1/ecpp@14,3043bc
Aug 5 14:02:59 proto144 unix: SUNW,hme0: Link Down - cable problem?
Aug 5 14:03:09 proto144 last message repeated 2 times
Aug 6 10:07:50 proto144 unix: syncing file systems...
Aug 6 10:07:50 proto144 unix: done
Aug 6 10:08:37 proto144 unix: SunOS Release 5.7 Version Generic_106541-04 64-bit [UNIX(R) System V Release 4.0]
Aug 6 10:08:37 proto144 unix: Copyright (c) 1983-1999, Sun Microsystems, Inc.
Aug 6 10:08:37 proto144 unix: Ethernet address = 8:0:20:95:fe:bb
Aug 6 10:08:37 proto144 unix: mem = 262144K (0x10000000)
Aug 6 10:16:45 proto144 unix: avail mem = 250568704
Aug 6 10:16:45 proto144 unix: root nexus = Sun Enterprise 450 (UltraSPARC-II 296MHz)
Aug 6 10:16:45 proto144 unix: pci0 at root: UPA 0x1f 0x4000
Aug 6 10:16:45 proto144 unix: pci0 is /pci@1f,4000
Aug 6 10:16:45 proto144 unix: pci1 at root: UPA 0x1f 0x2000
Aug 6 10:16:45 proto144 unix: pci1 is /pci@1f,2000
.
.
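Because dmesg output can be long, it often helps to filter it for one class of message. The following is a minimal sketch that picks out network link failures such as the hme0 entry in the example. It assumes the "month day time host unix: device: message" field layout shown in that output. The function name link_down_events is hypothetical.

```shell
#!/bin/sh
# Sketch: report the timestamp and interface of each "Link Down" message in
# dmesg-style text read from standard input.
link_down_events() {
    awk '/Link Down/ {
        dev = $6
        sub(/:$/, "", dev)      # drop the trailing colon after the device name
        print $1, $2, $3, dev
    }'
}

# Demonstration on three lines taken from the example output.
link_down_events <<'EOF'
Aug 5 14:02:32 proto144 unix: ecpp0 at ebus0: offset 14,3043bc
Aug 5 14:02:59 proto144 unix: SUNW,hme0: Link Down - cable problem?
Aug 5 14:03:09 proto144 last message repeated 2 times
EOF
```

On a live system the same function could be fed directly from the dmesg command through a pipe.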
The prtdiag Command
To execute the prtdiag command, type the following:
# /usr/platform/platform-name/sbin/prtdiag -v
Note – The command options are -l, which logs information to disk if
any error is found, and -v, which provides verbose output.
The following is an example of a prtdiag command output.
● CPU
========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 7    14     0      248     4.0   US-II    2.0
 9    18     0      248     4.0   US-II    2.0
 9     9     1      248     4.0   US-II    2.0
● Memory group
=============================== Memory ==================================

                                              Intrlv   Intrlv.
Brd   Bank   MB    Status   Condition  Speed  Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 9     0     256   Active      OK       60ns   1-way
● I/O boards
========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                       Model
---  ----  ----  ----  -------------------------  --------
 1   SBus   25     0   DOLPHIN,sci
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,socal/sf (scsi-3)     501-3060
 3   SBus   25     0   DOLPHIN,sci
 3   SBus   25     3   SUNW,hme
 3   SBus   25     3   SUNW,fas/sd (block)
 3   SBus   25     3   SUNW,socal/sf (scsi-3)     501-3060
● Detached boards
No failures found in System
===========================
● Fatal hardware reset
▼ This information is collected from components after a
hardware failure. This information is useful in determining the
correct FRU to be replaced.
No failures found in System
===========================
● POST-detected failures
No System Faults found
======================
● OS detected system faults
▼ System-detected faults light the yellow LED on the failing
board.
▼ You can repair system-detected faults (overtemp, fan failure,
power supply failure). These faults are removed from the
display when they are repaired.
Most recent AC Power Failure:
=============================
Fri Mar 12 10:44:07 1999
● Environmental display
========================= Environmental Status =========================

Keyswitch position is in Normal Mode
System Power Status: Redundant
System LED Status:    GREEN    YELLOW    GREEN
Normal                 ON       OFF      BLINKING

Fans:
-----
Unit   Status
----   ------
Rack   OK
Key    OK
AC     OK

System Temperatures (Celsius):
------------------------------
Brd   State   Current  Min  Max  Trend
---  -------  -------  ---  ---  ------
 1     OK        8      37   38  stable
 3     OK       44      44   44  stable
 7     OK       40      39   41  stable
 9     OK       44      43   45  stable
CLK    OK       35      35   35  stable

Power Supplies:
---------------
Supply                          Status
---------                       ------
0                               OK
1                               OK
2                               OK
3                               OK
PPS                             OK
    System 3.3v                 OK
    System 5.0v                 OK
    Peripheral 5.0v             OK
    Peripheral 12v              OK
    Auxilary 5.0v               OK
    Peripheral 5.0v precharge   OK
    Peripheral 12v precharge    OK
    System 3.3v precharge       OK
    System 5.0v precharge       OK
AC Power                        OK
● Firmware levels
========================= HW Revisions =========================

ASIC Revisions:
---------------
Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type      Attributes
---  ---  --  -----  -----  ----  ----  ----  ----------      ----------
 0    1    5                                  CPU             98MHz Capable
 1    1    5    1      1                 22   Dual-SBus-SOC+  98MHz Capable
 2    1    5                                  CPU             98MHz Capable
 3    1    5    1      1                 22   Dual-SBus-SOC+  98MHz Capable
 4    1    5                                  CPU             98MHz Capable
 6    1    5                                  CPU             98MHz Capable

System Board PROM revisions:
----------------------------
Board 0:   OBP    3.2.24 1999/12/23 17:31   POST   3.9.24 1999/12/23 17:35
Board 1:   FCODE  1.8.24 1999/12/23 17:30   iPOST  3.4.24 1999/12/23 17:34
Board 2:   OBP    3.2.24 1999/12/23 17:31   POST   3.9.24 1999/12/23 17:35
Board 3:   FCODE  1.8.24 1999/12/23 17:30   iPOST  3.4.24 1999/12/23 17:34
Board 4:   OBP    3.2.24 1999/12/23 17:31   POST   3.9.24 1999/12/23 17:35
Board 6:   OBP    3.2.24 1999/12/23 17:31   POST   3.9.24 1999/12/23 17:35
Setting NVRAM Configuration Parameters From Solaris
The eeprom Command
Solaris provides system administrators and service personnel with the
ability to change system configuration parameters in NVRAM so that
they can take effect when the system is restarted. This is accomplished
by using the eeprom command.
The eeprom command displays or changes the values of parameters in
the EEPROM.
It processes parameters in the order given. When processing a
parameter accompanied by a value, eeprom makes the indicated
alteration to the EEPROM; otherwise it displays the parameter’s value.
When given no parameter specifiers, eeprom displays the values of all
EEPROM parameters.
The following are examples of the eeprom commands available:
● To display all configuration parameter settings, type
# eeprom
● To display the current setting of the auto-boot? parameter, type
# eeprom auto-boot?
● To disable boards in slots 3 and 5, type
# eeprom disable-board-list=35
● To set configuration policy to board, type
# eeprom configuration-policy=board
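Before restarting a system it can be useful to confirm a single setting from a saved copy of the eeprom output. The following is a minimal sketch that assumes one name=value pair per line, as eeprom prints when run with no arguments. The function name nvram_value is hypothetical, and the sample values are illustrative only.

```shell
#!/bin/sh
# Sketch: print the value of one parameter from name=value lines on standard
# input, such as a saved copy of the output of "eeprom" with no arguments.
nvram_value() {
    awk -v key="$1" '
        index($0, key "=") == 1 {
            print substr($0, length(key) + 2)   # text after "name="
            exit
        }'
}

# Sample settings; the values are illustrative, not taken from a real system.
nvram_value 'auto-boot?' <<'EOF'
auto-boot?=true
disable-board-list=35
configuration-policy=board
EOF
```

The parameter name is quoted because some shells treat the trailing question mark as a wildcard character.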
SunVTS System Diagnostics 10
Introduction
SunVTS Software Overview
SunVTS is Sun’s online validation test suite. With SunVTS, you can verify
the functionality of most of Sun’s hardware devices. You can use the
SunVTS tests to stress certain areas of the system as needed for
diagnostic and troubleshooting purposes.
The SunVTS diagnostic software is the successor to SunDiag™
diagnostics, which shipped with the Solaris 2.4 operating system and
earlier releases. SunVTS runs on the Solaris 2.5 operating system and
later releases.
Like its SunDiag predecessor, SunVTS software can run concurrently
with customer applications and the Solaris operating system. SunVTS
is a vital part of the Sun Enterprise server concurrent maintenance
strategy.
Test Categories
SunVTS is comprised of many individual tests that support testing of a
wide range of products and peripherals. Most of the tests are capable
of testing devices in a 32-bit or 64-bit Solaris environment.
Use SunVTS to test one device or multiple devices. Some of the major
test categories are:
● Audio Tests
● Communication (Serial and Parallel) Tests
● Graphic/Video Tests
● Memory Tests
● Network Tests
● Peripherals (Disks, Tape, CD-ROM, Printer, Floppy) Tests
● Processor Tests
● Storage Tests
Hardware and Software Requirements
The following lists the requirements to run SunVTS Version 3.1
software successfully in the Common Desktop Environment (CDE):
● The Solaris 7 3/99 operating environment
● The SunVTS 3.1 package
● Operating system kernel configured to support all peripherals to
be tested
● Superuser access to start up the SunVTS software
● Connection of loopback connectors, installation of test media, or
the availability of disk space
Note – In this module, all references to SunVTS imply SunVTS 3.1.
Starting the SunVTS Software
The SunVTS program is run when the superuser types one of the
following commands. The /opt/SUNWvts/bin directory must be
included in the PATH variable.
● sunvts – Runs the SunVTS kernel and default graphical interface
(CDE) on the local machine
● sunvts -l – Runs the SunVTS kernel and OPEN LOOK graphical
interface on the local machine
● sunvts -t – Runs the SunVTS kernel in TTY mode, vtstty
● sunvts -h host_name – Runs the graphical interface on the local
machine while connecting and testing a remote machine
Note – The SUNWvts package and, if needed, the SUNWvtsx package must
be installed on both local and remote machines to perform remote
diagnostics.
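The PATH requirement above can be met in the superuser's shell before the sunvts command is typed. The following is a minimal sketch for a Bourne or Korn shell. The function name add_to_path is hypothetical.

```shell
#!/bin/sh
# Sketch: append a directory to PATH only if it is not already listed.
add_to_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$dir" ;;
    esac
    export PATH
}

add_to_path /opt/SUNWvts/bin
```

Checking for the directory first keeps PATH from growing each time the snippet is run in the same session.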
The SunVTS Graphical Interface
The initial SunVTS graphic menu is shown in Figure 10-1.
Figure 10-1 SunVTS Graphical Interface
The SunVTS Window Panels
The five major panels of the SunVTS window are:
● System Status Panel – Test status, host name, model type, number
of passes and errors, and elapsed time are displayed in the upper
area of the SunVTS menu.
● System Map – This area of the initial menu displays a logical device
view consisting of a selectable list of devices to test by default. You
can turn each device test on or off by clicking on the check box.
You can select particular devices, such as CPUs, network
interfaces, or disks, by clicking on the plus sign box.
● Select Devices – This area of the SunVTS menu enables you to
quickly select the devices to test, including a default set (shown in
Figure 6-2).
● Select mode – A SunVTS test session runs in one of two test modes:
Connection test mode and Functional test mode.
▼ Connection Test Mode
In Connection test mode, the tests determine if the devices are
connected to the system you are testing and if they are
accessible. Functional testing is not done in this mode, but the
devices are accessed to establish system connection and
accessibility.
You can safely run this mode when the system is online. When
SunVTS testing is started in Connection test mode, each test is
run sequentially until all tests are run.
The limited nature of the tests in this mode makes it possible to
run periodic checks for configuration verification on the system.
▼ Functional Test Mode
In Functional test mode, the tests check the operation of the
system devices. This mode finds any faults and exercises the
system by running tests that increase the load and stress on it.
Do not run critical applications on the system or use the
system for production purposes in Functional test mode.
● Test Messages – This area displays any information or error
messages that are issued during test executions.
The SunVTS Window Icons
Seven icons are provided at the top of the SunVTS menu. These are:
● Start – Begins the test, according to the selections made in the
System map, Select devices, and Select mode areas. Progressive
updates are displayed in the Information Panel during testing.
● Stop – Stops current testing, without exiting SunVTS.
● Reset – Returns the System map area to its previous state.
● Host – Provides a submenu in which you can enter a remote host
name for a test connection. This host must be reachable, with
SunVTS installed.
● Log – Displays the log file, and provides a menu to select the
amount of information to log, including errors, information, and
UNIX messages (/var/adm/messages ).
● Meter – Invokes the performance monitor utility, which
graphically displays system resource activity during testing.
● Quit – Exits the SunVTS program.
The SunVTS Menu Selections
The top horizontal bar of the SunVTS menu has four selections with
lists of associated submenus.
● Commands – This menu provides the following commands:
▼ Start testing – Begins testing
▼ Stop testing – Halts testing
▼ Connect to host – Specifies the host name of the target host
▼ Trace test – Selects a test to trace, and a location for the output
▼ Reprobe system – Probes the hardware
▼ Quit VTS – Exits SunVTS
● View – This menu provides two options:
▼ Open System map – Displays full device selection list
▼ Close System map – Displays default device selection list
● Options – The following selections are available:
▼ Thresholds – Specifies number of passes, errors, and time to run
▼ Notify – Specifies a user to mail with test status information
▼ Schedule, Test Execution, and Advanced – Runs specified number of tests with stress, verbose, core file, or run on error option (described in the sections that follow)
▼ Option files – Loads, stores, or removes a test options file
● Reports – Two selections are provided:
▼ System configuration – Displays the system configuration report as obtained with the prtconf command
▼ Log files – Displays the log file and allows selection of the level of information to log
The Schedule Options Menu
Clicking on the Schedule option beneath the Options selection on the
horizontal bar of the SunVTS window displays the window in
Figure 10-2.
Figure 10-2 Schedule Options Window
The available options are:
● Auto Start – Runs tests selected in a previously saved option file
using a command-line specification when sunvts is invoked.
● Single Pass – Runs only one pass of each selected test.
● System Concurrency – Specifies the maximum number of tests that
can be run concurrently on the machine being tested.
● Group Concurrency – Specifies the number of tests to be run at the
same time in the same group.
The Test Execution Menu
Clicking on Test Execution beneath the Options selection on the upper
horizontal bar of the SunVTS menu displays the window in
Figure 10-3.
Figure 10-3 Test Execution Options Window
The following is a list of options available in the Test Execution menu:
● Stress – Runs certain tests in stress mode, working the system
harder than normal.
● Verbose – Enables more information to be logged and displayed
during testing.
● Core file – Allows for a core dump generation in the SunVTS bin
directory when abnormal conditions occur. The core file name
format is core.testname.xxxxxx.
● Run on Error – Continues testing until the max_errors value is
reached.
● Max Passes – Specifies the maximum number of passes that tests
can run. A value of zero indicates no limit.
● Max Errors – States the maximum number of errors any test allows
before stopping. A value of zero causes tests to continue regardless
of errors.
● Max Time – Specifies the maximum number of minutes tests are
allowed to run. A value of zero indicates no limit.
● Number of Instances – Specifies the number of instances to run for
each test that is scalable.
The Advanced Options Menu
Clicking on Advanced beneath the Options selection on the topmost
horizontal bar of the SunVTS window displays the window in Figure 10-4.
Figure 10-4 Advanced Options Window
The available options are:
● System Override – Supersedes group and test options in favor of
the options selected in a Global Options window; set all options on
all test group and test option menus.
● Group Override – Supersedes specific test options in favor of the
group options set in a Group Options window.
● Group Lock – Protects specific group options from being changed
by the options set at the system level. (System Override
supersedes this option.)
● Test Lock – Protects specific test options from being changed by
options set at the group or system level. (System Override and
Group Override supersede this option.)
Intervention Mode
Certain tests require that you intervene before you can run the test
successfully. These include tests that require media or loopback
connectors.
● Loopback connectors are required to run certain tests, such as
serial port tests, successfully.
See the SunVTS Test Reference Manual for more information about
loopback connectors, and which tests need them.
● Media (tapes, diskettes, or CD-ROMs) must be present in the
drive(s) before the system is probed at SunVTS startup. If this is
not done, the following error message is displayed:
Using old or damaged tapes and diskettes may cause errors in corresponding tests.
You cannot select these tests until you enable the intervention mode.
This setting reminds you that you must intervene before the test can
be successfully completed.
Performance Monitor Panel
The performance monitor displays system resource activity. A brief
description of each component follows Figure 10-5.
Figure 10-5 Perfmeter Window
The information displayed with the SunVTS Performance Monitor is
the same as that displayed by the operating system perfmeter utility.
● cpu – Percentage of CPU used per second
● pkts – Ethernet packets per second
● page – Paging activity in pages per second
● swap – Jobs swapped per second
● intr – Number of device interrupts per second
● disk – Disk use in transfers per second
● cntxt – Number of context switches per second
● load – Average number of runnable processes over the last minute
● colls – Collisions per second detected on the Ethernet
● errs – Errors per second on receiving packets
Using SunVTS in TTY Mode
If you use the SunVTS software in TTY mode, no frame buffer is
required. To run in TTY mode, perform the following steps:
1. Start the SunVTS kernel with the vtsk command.
# /opt/SUNWvts/bin/vtsk
2. Start the SunVTS TTY User Interface with the vtstty command
# /opt/SUNWvts/bin/vtstty
or the sunvts command with the -t option.
# /opt/SUNWvts/bin/sunvts -t
Figure 10-6 SunVTS Window
Negotiating the SunVTS TTY Interface
The SunVTS TTY interface provides a screen with four working
panels: Message, Status, Control, and System map.
The following keys are used with the TTY interface:
● Tab – Selects a screen panel for keyboard input
● Spacebar – Selects an option within a panel
● Arrows – Moves between the options in a panel
● Esc – Closes pop-up option windows
● Control-l – Refreshes the TTY window
Figure 10-7 Various Working Panels of the SunVTS TTY Interface
Control panel
Status panel
System map
Message area
Running SunVTS Remotely
A testing session can be run across a network or even a modem.
Both the kernel and the user interface components are used in remote
testing.
Requirements
The following requirements must be met to run SunVTS on a remote
system:
● There must be network connectivity between the local and remote
system.
● You must install the same revision of SunVTS on both the local
and remote system.
Running SunVTS Through a Remote Login
1. Use the xhost command to allow the remote system to display on
your local system.
$ /usr/openwin/bin/xhost + remote_hostname
where remote_hostname is the name of the remote system.
2. Log in to the remote system and substitute user to root.
$ rlogin remote_hostname
$ su -
3. Start SunVTS.
# /opt/SUNWvts/bin/sunvts -display local_hostname:0
where local_hostname is the name or IP address of the local system.
Note – The SunVTS kernel starts on the remote system and the user
interface displays on your system.
4. Configure SunVTS for the test session and start the tests.
5. Review the SunVTS logs for test results.
You can view the remote system test logs through the local
SunVTS interface. The log files are stored on the system under test
(SUT).
Running SunVTS Through telnet or tip
You can run SunVTS on a remote system, with the TTY interface,
through a telnet or tip session.
You need to set the correct terminal type and number of columns and
rows before starting the interface. The steps below describe this
process.
1. Use the echo command to display the value of the TERM variable:
Note – In this example, the TERM variable is a Korn or Bourne shell
variable and the value is sun-cmd.
$ echo $TERM
sun-cmd
2. Use the stty command to display the settings of your terminal:
$ stty
speed 9600 baud; -parity hupcl
rows = 60; columns = 80; ypixels = 780; xpixels = 568;
switch = <undef>;
brkint -inpck -istrip icrnl -ixany imaxbel onlcr
echo echoe echok echoctl echoke iexten
Note – You must have a minimum of 80 columns and 24 rows to run
the SunVTS TTY interface.
Write down the values of your TERM variable and rows and
columns settings. You will need these values later.
3. Connect to the remote system using either the telnet or tip
commands.
4. Become superuser on the remote system.
5. Identify your terminal type and settings in the telnet (or tip)
session window:
# TERM=sun-cmd
# stty rows 60
# stty columns 80
6. Start SunVTS with the TTY interface:
# /opt/SUNWvts/bin/sunvts -t
7. Configure SunVTS for the test session and start the tests.
8. Review the SunVTS logs for test results.
You can view the remote system test logs through the local
SunVTS TTY interface. The log files are stored on the system under
test (SUT).
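Steps 2 and 5 above can be combined into a small helper. The following is a minimal sketch, assuming a POSIX-compatible shell (for example, ksh) on the local system. The function names tty_dims and tty_setup_cmds are hypothetical and are not part of SunVTS. The sketch reads stty output in the "rows = N; columns = M;" form shown in step 2, checks the minimum of 80 columns and 24 rows, and prints the commands to type in the remote session.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not SunVTS commands.

# Print "ROWS COLUMNS" extracted from stty output on standard input.
# Expects the "rows = N; columns = M;" form shown in step 2; other
# systems may print these settings in a different form.
tty_dims() {
    sed -n 's/.*rows = \([0-9][0-9]*\); columns = \([0-9][0-9]*\);.*/\1 \2/p' | head -n 1
}

# Print the commands to enter on the remote system (step 5), or fail
# if the terminal is smaller than the 80-column by 24-row minimum.
# Usage: tty_setup_cmds TERM_VALUE ROWS COLUMNS
tty_setup_cmds() {
    if [ "$2" -lt 24 ] || [ "$3" -lt 80 ]; then
        echo "terminal too small: need at least 80 columns and 24 rows" >&2
        return 1
    fi
    echo "TERM=$1"
    echo "stty rows $2"
    echo "stty columns $3"
}

# Example with the values from step 2.
dims=$(echo "rows = 60; columns = 80; ypixels = 780; xpixels = 568;" | tty_dims)
set -- $dims
tty_setup_cmds sun-cmd "$1" "$2"
```

In practice you would pipe the real stty output into tty_dims on the local system before opening the telnet or tip session, then type the printed commands at the remote prompt.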
SunVTS Test Summary
SunVTS supports a number of tests that are applicable to Sun
Enterprise servers. This section gives a brief description of these
tests. Further details on each test can be found in the SunVTS 3.x Test
Reference Manual.
Advanced Frame Buffer Test
The afbtest verifies the functionality of the Advanced Frame Buffer.
Note – This test supports Function Test mode only.
Caution – Do not run any other application or screen saver program
that uses the AFB accelerator port while running afbtest. This
combination causes SunVTS to return incorrect errors.
SunATM Adapter Test
The atmtest checks the functionality of the SunATM-155 and
SunATM-622 SBus and PCI bus adapters. It runs only in loopback
(external or internal) mode. The Asynchronous Transfer Mode (ATM)
adapter and the ATM device driver must be present.
To run the atmtest in external loopback mode, a loopback connector
must be attached to the ATM adapter. The internal loopback mode
does not require a loopback connector.
Note – This test supports Function Test mode only.
Note – Do not run nettest while running atmtest.
Note – Bring the ATM interface down to make sure that the interface is
in offline mode before running atmtest.
Audio Test
The audiotest verifies the hardware and software components of the
audio subsystem. This test supports all Sun audio implementations.
Note – This test supports Connection and Function Test modes.
Note – The audio device is an exclusive use device. Only one process
or application can interface with it at a time.
Bidirectional Parallel Port Printer Test
The bpptest verifies the functionality of the bidirectional parallel port.
The bpptest verifies that your SBus card and its parallel port are
working properly by attempting to transfer a data pattern from the
SBus card to the printer.
Note – This test supports Connection and Function Test modes.
Compact Disc Test
The cdtest checks the CD-ROM unit by reading the CD. cdtest is
not a scalable test. Each track is classified as follows:
● Mode 1 uses error detection/correction code (288 bytes).
● Mode 2 uses that space for auxiliary data, or as an audio track.
Note – Load a compact disc into the drive before starting the test.
Note – This test supports Connection and Function Test modes.
Frame Buffer, GX, GX+ and TGX Options Test
The cg6 test verifies the cgsix frame buffer and the graphics options
offered with most SPARC based workstations and servers.
Note – This test supports Function Test mode only.
Disk and Floppy Drives Test
The disktest verifies the functionality of hard disk drives and floppy
drives using three subtests: Media, File System, and Asynchronous
I/O. Most disk drives, such as SCSI disks, native or SCSI floppy disks,
IPI, and so on, are supported. The type of drive being tested is
displayed at the top of the Test Parameter option menu.
The WriteRead option of the Media subtest is allowed only if a
selected partition is not mounted. By default, disktest does not
mount any partitions.
Caution – If a power failure occurs while the Media subtest is running
in WriteRead mode, disk data might be destroyed.
Caution – Running the Media subtest on a disk partition in the
WriteRead mode can cause data corruption if the same partition is
being used by other programs. Only select this mode when the system
is offline (not used by any other users or programs).
Note – This test supports Connection and Function Test modes.
ECP 1284 Parallel Port Printer Test
The ecpptest verifies the functionality of the ecpp IEEE 1284 parallel
printer port device.
Note – The ecpp device is an exclusive use device. Only one
application can interface with it at a time.
Note – This test supports Connection and Function Test modes.
Sun Enterprise Network Array Test
The enatest is used to provide configuration verification, fault
isolation, and repair validation of the Sun Enterprise Network Array.
The Sun Enterprise Network Array is a high availability mass storage
subsystem consisting of:
● SCSI fibre channel protocol host adapters with dual 100-Megabyte
FC-AL ports.
● A disk enclosure.
● A front panel display for configuration information.
● Up to two interface boards in the enclosure, which provide
FC-AL connections to the enclosure and also provide status
information and control of the conditions within the enclosure.
● Other field-replaceable units (FRUs) within the enclosure,
including power supply units, fan trays, and the backplane.
enatest detects all Sun Enterprise Network Array™ enclosures
connected to the host and collects relevant configuration information.
Note – This test supports Connection and Function Test modes.
StorEdge 1000 Enclosure Test
The enctest tests the StorEdge 1000 enclosures. Each enclosure
supports either twelve 1-inch 4-Gbyte drives or eight 1.6-inch 9-Gbyte
drives and has redundant power and cooling. Two enclosure models are
available:
● StorEdge A1000 - Disk Tray with the hardware RAID controller
● StorEdge D1000 - Disk Tray without the hardware RAID
controller.
You can use enctest for validation, configuration
verification, repair verification, and fault isolation of both models.
The enctest probe detects all the connected StorEdge enclosures and
displays the status of the various elements in the enclosure.
Note – This test supports Connection and Function Test modes.
Frame Buffer Test
The fbtest is a generic test for all dumb frame buffers used with the
Solaris 2.x and Solaris 7 software.
Note – This test supports Function Test mode only.
Fast Frame Buffer Test
The ffbtest verifies the functionality of the Fast Frame Buffer.
ffbtest can detect and adapt to the video modes of single- and
double-buffer versions of the fast frame buffer (FFB).
Note – This test supports Function Test mode only.
Floating Point Unit Test
The fputest checks the floating point unit on machines with the
SPARC-based architecture.
Note – This test supports Connection and Function Test modes.
Sun GigabitEthernet Test
The gemtest provides functional test coverage of the Sun
GigabitEthernet SBus and PCI bus adapters. It runs in loopback
(external or internal) mode and must not be selected at the same
time as nettest. The gemtest provides better fault isolation than
nettest.
Note – This test supports Function Test mode only.
Intelligent Fibre Channel Processor Test
The ifptest tests the functionality of the PCI FC_AL card when there
are no devices attached to the loop. The driver checks for devices on
the fibre loop. If devices are detected the driver blocks any diagnostic
commands.
Note – When devices are attached to the loop, do not run ifptest.
Instead, run disktest on the individual devices. This tests the
whole subsystem, including the FC_AL controller.
Note – This test supports Connection and Function Test modes.
Dual Basic Rate ISDN (DBRI) Chip
The isdntest verifies the functionality of the ISDN portion of the
Dual Basic Rate ISDN (DBRI) chip.
Note – This test supports Function Test mode only.
M64 Video Board Test
The m64test tests the PCI-based M64 video board by performing the
following subtests:
● Video Memory test
● RAMDAC test
● Accelerator Port test
Caution – Do not run any other application or screen saver program
that uses the Pineapple accelerator port while running m64test. Do
not run power management software. These programs cause SunVTS
to return incorrect errors.
Note – This test supports Function Test mode only.
Multiprocessor Test
The mptest verifies the functionality of multiprocessing hardware.
mptest can test up to 256 processors.
Note – This test supports Connection and Function Test modes.
Network Hardware Test
The nettest checks all the networking hardware on the system CPU
board and separate networking controllers (for example, a second
SBus Ethernet controller). For this test to be meaningful, the machine
under test must be attached to a network with at least one other
system on the network.
Note – This version of nettest is used for all networking devices,
including Ethernet (ie and le), token ring (tr, trp), quad Ethernet
(QED), fiber optic (fddi, nf, bf, pf), SPARCcluster™ 1 System, ATM
(sa, ba), HiPPI, and 100-Mbit per second Ethernet (be, hme) devices.
Note – This test supports Connection and Function Test modes.
SPARCstorage Array Controller Test
The plntest checks the functionality of the controller board on the
SPARCstorage™ Array.
The SSA controller card is an intelligent, CPU-based board with its
own memory and ROM-resident software. In addition to providing a
communications link to the disk drives, it also buffers data between
the host system and disk drives in its nonvolatile RAM (NVRAM). For
data to go from the host to a particular disk, it must first be
successfully transferred to this NVRAM space.
The host machine, SBus host adapter card, fiber-channel connection,
and the SSA controller board must be working properly to perform
this data transfer operation. By verifying and stressing this operation,
plntest can isolate failures on the SSA disk drives from failures on
the SSA controller board.
Note – This test supports Connection and Function Test modes.
Physical Memory Test
The pmemtest checks the physical memory of the system. The
pmemtest locates parity errors, hard and soft error correction code
(ECC) errors, memory read errors, and addressing problems.
This test reads through all available physical memory. It does not write
to any physical memory location.
Note – This test supports Connection and Function Test modes.
Prestoserve Test
Prestoserve is a Network File System (NFS) accelerator. It reduces the
frequency of disk I/O access by caching the written data blocks in
nonvolatile memory. Prestoserve then flushes the cached data to disk
asynchronously, as necessary.
The pstest verifies the Prestoserve accelerator’s functionality with the
following three checks:
● Board battery check
● Board memory check
● Board performance and file I/O access check
Note – This test supports Function Test mode only.
Serial Asynchronous Interface Test
The saiptest checks the functionality of the Serial Asynchronous
Interface card through its device driver.
Note – You must run the saiptest in Intervention mode.
Note – This test supports Function Test mode only.
Sun Enterprise Cluster 2.0 Network Hardware Test
The scitest verifies the functionality of the Sun Enterprise Cluster 2.0
by checking the networking hardware. For this test to be meaningful,
the cluster must already be configured before the test is run.
After finding the cluster nodes (targets), scitest performs the
following tests:
● Random test sends out 256 packets with random data length and
random data.
● Incremental test sends out packets with length from minimum to
maximum packet size using incremental data.
● Pattern test sends 256 packets of maximum length.
Note – This test supports Function Test mode only.
Environmental Sensing Card Test
The sentest checks the SCSI Environmental Sensing card (SEN)
installed in the SPARCstorage RSM to monitor the enclosure
environment. The SEN card monitors the enclosure’s over-temperature
condition, fan-failures, power-supply failures, and drive activity.
sentest verifies the following control functions in the enclosure:
● Alarm (enable/disable) – sentest toggles the alarm to the disable
state, then to the enable state.
● Alarm time (0-0xfff seconds) – sentest sets the time (from 0 to
4095), then reads it back to verify the time setting.
● Drive fault LED (DL0-DL6) – sentest toggles each LED to its OFF
and ON states.
Note – This test supports Connection and Function Test modes.
SOC+ Host Adapter Card Test
The socaltest aids the validation and fault isolation of the SOC+
host adapter card. In the case of a faulty card, the test tries to isolate
the fault to the card, the Gigabit Interface Converter (GBIC) module,
or the DMA between the host adapter card and the host memory.
Note – This test supports Function Test mode only.
Serial Parallel Controller Test
The spiftest accesses card components such as the cd-180 and ppc2
chips, and the serial and parallel ports through the serial parallel
controller device driver.
Note – The spiftest must be run in Intervention mode.
Note – This test supports Function Test mode only.
Serial Ports Test
The sptest checks the system’s on-board serial ports (zs[0,1], zsh[0,1],
se[0,1], se_hdlc[0,1]), as well as any multi-terminal interface (ALM2)
boards (mcp[0-3]). Data is written and read in asynchronous and
synchronous modes utilizing various loopback paths.
Note – The sptest must be run in Intervention mode.
Note – This test supports Connection and Function Test modes.
SunButtons Test
The sunbuttons test verifies that the SunButtons graphics
manipulation device is working correctly.
Note – This test supports Function Test mode only.
SunDials Test
The sundials test verifies that the SunDials graphics manipulation
device controls are working properly. sundials also verifies the
connection between the dialbox and serial port.
Note – This test supports Function Test mode only.
HSI Board Test
The sunlink test verifies the functionality of the SBus and PCI bus
High Speed Serial Interface (HSI) boards by using the High-level Data
Link Control (HDLC) protocol. sunlink initializes and configures the
selected channel.
Note – This test will not pass unless you install the correct loopback
connectors or port to port cables on the ports you are testing.
Note – This test supports Function Test mode only.
Sun PCi Test
The sunpcitest tests the SunPCi™ plug-in PCI card, which is an x86
processor embedded in an add-on card.
Note – This test supports Function Test mode only.
System Test
The systest checks the CPU board by exercising the I/O, memory,
and CPU channels simultaneously as threads. There is no quick test
option for systest; it is a CPU stress test.
Note – This test supports Function Test mode only.
Tape Drive Test
The tapetest synchronous I/O test writes a pattern to a specified
number of blocks (or, for a SCSI tape, writes to the end of the tape).
The tapetest then rewinds the tape and reads and compares the data
just written. The tapetest asynchronous I/O test sends a series of up
to five asynchronous read/write requests to the tape drive, writing to
the tape and then reading and comparing the data. The tapetest file
test writes four files to the tape and then reads them back, comparing
the data. For tape library testing, the pass count is incremented only
after all tapes in the library have been tested.
Note – A blank writable tape (scratch tape) must be loaded before you
start this test.
Note – This test supports Connection and Function Test modes.
Virtual Memory Test
The vmemtest checks virtual memory; that is, it tests the combination
of physical memory and the swap partitions of the disk(s).
Note – This test supports Function Test mode only.
Test Message Syntax
All SunVTS test messages follow this format:
SUNWvts.testname[.subtest_name].message_number date time testname device_name [FRU_path] ERROR|FATAL|INFO|WARNING|VERBOSE message
Table 10-1 lists the SunVTS test message arguments and gives a brief
description.
Table 10-1 SunVTS Test Message Arguments
Argument          Description
SUNWvts           SunVTS package name
testname          SunVTS test name
subtest_name      The subtest module name (optional)
message_number    The message identifier, which is a unique number for the test.
                  The number is usually within the following ranges:
                  VERBOSE: 1 - 1999
                  INFO: 2000 - 3999
                  WARNING: 4000 - 5999
                  ERROR/FATAL: 6000 - 7999
                  FATAL: 8000 - 9998
                  (The number 9999 is reserved for any possible old message
                  types in previous SunVTS releases for compatibility reasons.)
date time         Tells when the error occurred
testname          The name of the test reporting the error
device_name       The device being tested when the error occurred
FRU_path          A full Solaris device path of the failed FRU; this argument
                  varies, depending on the type of test running when the error
                  occurred
message           Contains test messages, in addition to probable cause and
                  recommended action
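The message number ranges in Table 10-1 can be applied mechanically when scanning a log. The following is a minimal sketch, assuming a POSIX-compatible shell. The function names vts_msg_number and vts_msg_class are hypothetical and are not SunVTS tools, and the sample message identifier is made up for illustration. Because the table says the numbers are usually within these ranges, treat the result as a hint and not as a replacement for the severity keyword in the message itself.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not SunVTS tools.

# Extract the message number from the first field of a message line,
# for example SUNWvts.disktest.8011 -> 8011 (hypothetical identifier).
vts_msg_number() {
    echo "$1" | sed 's/.*\.//'
}

# Print the message class implied by a message number, using the
# usual ranges listed in Table 10-1.
vts_msg_class() {
    n=$1
    case $n in
        ''|*[!0-9]*) echo "UNKNOWN"; return 1 ;;
    esac
    if   [ "$n" -ge 1 ]    && [ "$n" -le 1999 ]; then echo "VERBOSE"
    elif [ "$n" -ge 2000 ] && [ "$n" -le 3999 ]; then echo "INFO"
    elif [ "$n" -ge 4000 ] && [ "$n" -le 5999 ]; then echo "WARNING"
    elif [ "$n" -ge 6000 ] && [ "$n" -le 7999 ]; then echo "ERROR/FATAL"
    elif [ "$n" -ge 8000 ] && [ "$n" -le 9998 ]; then echo "FATAL"
    elif [ "$n" -eq 9999 ]; then echo "RESERVED"
    else echo "UNKNOWN"; return 1
    fi
}

# Example: classify a hypothetical message identifier.
vts_msg_class "$(vts_msg_number SUNWvts.disktest.8011)"
```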
Alternate Pathing (Appendix A)
Introducing Alternate Pathing
Alternate Pathing (AP) provides high availability to storage and
network devices. With AP, you have two physical paths to the same
A5000 or SSA storage array or network interface, transparent to the
operating system.
Only one path can be active at a time. If a path fails, the alternate path
can be switched in place of the failed path. Path switching does not
always occur automatically; you might need to switch it manually.
The system uses the meta-device, a name representing the end object
(such as the disk partition or network interface), but does not use the
physical path names to access the device.
Note – The AP material covered in this module applies to the AP 2.2
support that Solaris 7 provides for the Sun Enterprise x000 and x500
servers.
Supported Devices
Disk Devices
AP supports the StorEdge A5000 and SPARCstorage arrays.
SCSI devices are not supported. The StorEdge A3000 is not supported,
but has its own internal AP capability.
After you set up AP for disks, you can use Solstice DiskSuite Version
4.1 and Sun Volume Manager Versions 2.3, 2.4, and 2.5 normally.
(However, on installation, Dynamic Multipathing (DMP) automatically
disables itself in Volume Manager 2.5 if AP is already installed.)
Caution – You must make sure that any AP devices used by these
products are used by their meta-device names only.
You can place your boot disk and primary network interface under AP
control. This makes it possible for the system to boot unattended, even
if the primary network or boot disk controller is not accessible, as long
as a usable alternate path for these devices is defined and available.
Network Devices
The following network devices are supported by AP 2.2:
● SunFastEthernet 2.0 (hme)
● Sun FDDI™ 5.0 (nf) SAS and DAS
● Lance Ethernet (le)
● Quad Ethernet (qe)
● Sun Quad FastEthernet (qfe)
● GigabitEthernet (ge)
Installing AP
Solaris 7
Solaris 7 supports AP 2.2.
Solaris 2.6
Solaris 2.6 supports AP 2.1.
Solaris 2.5.1
Solaris 2.5.1 supports AP 2.0.
Installing AP
Install the following packages:
● SUNWapr – AP subsystem (root)
● SUNWapu – AP subsystem (usr)
● SUNWapdv
Documentation:
● SUNWabap – AP AnswerBook (AP 2.0 only; AP 2.1 documentation is in the
Hardware AnswerBook, SUNWabhdw)
● SUNWapdoc – AP man pages
● Apply all appropriate patches
The installation process uses the pkgadd command to
install the AP packages. There is no order dependency.
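As an illustration of the installation step, the following minimal sketch prints a single pkgadd command for the packages listed above instead of running it. It assumes a POSIX-compatible shell. The function name ap_pkgadd_cmd and the directory /var/tmp/ap_packages are hypothetical; substitute the location of your AP distribution. SUNWabap is left out because the list above ties it to AP 2.0.

```shell
#!/bin/sh
# Sketch only: prints the command; it does not install anything.

# Usage: ap_pkgadd_cmd PACKAGE_DIRECTORY
# PACKAGE_DIRECTORY is wherever the AP packages were unpacked
# (a hypothetical path is used in the example below).
ap_pkgadd_cmd() {
    echo "pkgadd -d $1 SUNWapr SUNWapu SUNWapdv SUNWapdoc"
}

ap_pkgadd_cmd /var/tmp/ap_packages
```

Review the printed command, then run it as superuser and apply the appropriate patches afterward.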
How AP Works
AP creates a new layer of device drivers (meta-disks and
meta-networks), each of which uses one of two physical device drivers to
access the device. Applications and the OS components, including the disk
management software, use the meta-device name to access the
resource. Only the drivers know the actual physical paths.
No component other than AP is aware that the two normal device paths
lead to the same device. This can cause problems for applications that
use the physical paths instead of the meta-devices to scan or inspect
disk or network devices; they might identify the two paths as
separate devices.
AP automatically switches from the active disk path to the alternate
disk path if the active path fails. Additionally, you can manually
switch the active path to the alternate at any time, with no
interruption to active traffic that uses the meta-device, for both
disks and networks.
Note – In the Enterprise x000 and x500 computers there is no
automatic switch-over to the alternate path during a DR operation.
Meta-device definitions are stored in an AP state database that is used
early in the boot process. There are usually several copies of this
database. You must create the meta-devices yourself; the system will
not automatically create these for you.
Note – AP can be used with Sun Enterprise Volume Manager (SEVM)
or Solstice DiskSuite (SDS).
Physical Paths
For the purposes of AP, an I/O device is either a disk or network device.
The only types of disk device currently supported by AP are the
StorEdge A5000 and the SPARCstorage Array (SSA). In this module,
the term disk always refers to one of these devices.
An I/O adapter is the controller for an I/O device such as an A5000
SOC+ adapter.
A device node is a path in the devices directory that is used to access a
physical device, such as /dev/dsk/c0t1d1s0.
The term physical path refers to the electrical path from the host to a
disk or network.
Metadisk
A metadisk is a logical name that enables you to access a physical disk
device without having to specify the particular path to the device. You
reference a metadisk just as if it were a real device, using an
AP-specific device node, such as /dev/ap/dsk/mc0t1d1s0. The AP
software determines which path is active and uses that path to access
the device.
The path /dev/ap/dsk/mc0t1d1s0 is used to access a slice on a
metadisk, regardless of which pln port is currently active (handling
I/O) for the metadisk. For the A5000, the sf ports (representing an
SOC+ adapter) are where AP activates the paths.
Disk Pathgroup
A disk pathgroup consists of two physical paths leading to the same
storage array. When a physical path is part of a pathgroup, it is called
an alternate path. An alternate path to a disk can be uniquely identified
by the pln or sf port that the alternate path uses.
Make sure that you understand the use of the term alternate. It means
either possible path, not just the spare path. The path in use is the
active alternate.
Only one alternate path at a time is allowed to handle disk I/O. The
alternate path that is currently handling I/O is called the active
alternate.
One of the alternate paths is designated the primary path. The primary
path is initially made the active alternate. Although you can change
which path is the active alternate, the primary path is always the same.
The primary path has several properties:
● It is initially the active alternate.
● It provides the metadisk name.
● It identifies the metadisk.
You reference a disk pathgroup by specifying the pln or sf port (such
as pln1 or sf7) that corresponds to the primary path. For example, if
the primary path is sf1, the pathgroup name is msf1.
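The naming rules described above can be sketched as two small helpers. This is a minimal sketch, assuming a POSIX-compatible shell and assuming that the names always follow the two examples given in this module (sf1 becomes msf1, and /dev/dsk/c0t1d1s0 becomes /dev/ap/dsk/mc0t1d1s0 when that device node belongs to the primary path). The function names are hypothetical and are not AP commands.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not AP commands.

# Pathgroup name for the port of the primary path: sf1 -> msf1.
ap_pathgroup_name() {
    echo "m$1"
}

# Metadisk device node for a device node on the primary path:
# /dev/dsk/c0t1d1s0 -> /dev/ap/dsk/mc0t1d1s0
ap_metadisk_node() {
    case $1 in
        /dev/dsk/*) echo "/dev/ap/dsk/m${1#/dev/dsk/}" ;;
        *) echo "not a /dev/dsk device node: $1" >&2; return 1 ;;
    esac
}

ap_pathgroup_name sf1
ap_metadisk_node /dev/dsk/c0t1d1s0
```

Helpers like these are only a convenience for scripts that must refer to AP devices by their meta-device names; the AP state database remains the authority on which names exist.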
Some considerations are:
● Both array interfaces in a pathgroup must be attached to the same
array
● Only one interface is active at a time through the meta-device
● There must be exactly two adapters in a pathgroup
● If you have two interface boards, consider connecting a path to
each
● If you are using hubs in your configuration, use a separate hub for
each interface
Metanetwork
A metanetwork, just like a metadisk, is a logical interface that enables
you to access a network through either of two physical paths without
having to reference either path explicitly within your scripts and
programs. You reference a metanetwork by using a metanetwork
interface name such as mle1.
Interface mle1 is used to access the metanetwork, regardless of which
physical adapter (le1 or le6) is currently active for the metanetwork
device.
Network Pathgroup
Similar to a disk pathgroup, a network pathgroup consists of two
network adapters connected to the same physical network.
To specify a network pathgroup, use the metanetwork interface name,
such as mle1. Just as with a disk pathgroup, you use this name when
you switch the active alternate.
Some considerations are:
● Network adapters in a pathgroup must be attached to the same
subnet
● Only one adapter is active at a time
● Use a separate hub for each path for even more redundancy
● There must be exactly two adapters in a pathgroup
● Both network adapters must be of the same device type
AP With Mirroring
AP is similar to, but not the same as, disk mirroring. Disk mirroring
replicates data to separate devices and thus achieves data redundancy.
AP, on the other hand, achieves pathing redundancy. Disk mirroring
and AP are complementary; you can use them together to achieve both
data redundancy and pathing redundancy.
Mirroring occurs on top of AP, which enables switching of the
underlying adapters used to implement the mirror from one board to
another without disruption of the disk mirroring or any active I/O.
AP does not provide mirroring itself.
AP and DR
AP supports Dynamic Reconfiguration (DR), which is used to logically
attach and detach system boards from the operating system without
having to halt and reboot.
For example, with DR you can detach a board from the operating
system, physically remove and service the board, and then re-insert
the board and attach it to the operating system again. You can do all of
this without halting the operating system or terminating any user
applications.
If a board that you want to detach is connected to an alternately
pathed I/O device, you can first use AP to redirect the I/O
flow to a controller on a different board. You can then use DR to
detach the system board without interrupting the I/O flow.
The AP State Database
AP maintains a database that contains information about
all defined meta-disks, meta-networks, and their
corresponding alternate paths and properties. Each
system will have its own database.
Conceptually, a single AP database is maintained in a
single system. However, you should set up multiple
copies of this database. In this way, if a given database
copy is not accessible or becomes corrupted, AP can
automatically begin to use a current, non-corrupted
database copy. All of the AP databases synchronize their
contents during system initialization and DR operations.
● You must dedicate an entire raw disk slice, of at least
300 Kbytes, to each AP database copy. As configured
at the factory, slice 4 of the root disk is appropriately
sized for an AP database (2 Mbytes) and is not
allocated to any other purpose.
When choosing partitions for the AP database, remember
that:
● You should set up at least three to five database
copies.
● The database copies should have no I/O adapters in
common with each other. This helps protect against
an adapter failure.
● The copies can be on any slice of any type of disk
device. They do not need to be on devices that AP
supports, and do not need to have alternate paths.
● Especially if you are using Dynamic Reconfiguration
(DR), the database copies should be on I/O adapters
on different system boards so that at least one
database copy is always accessible if one of the
system boards is detached. Generally, you should
have one separate copy per system board.
Creating the AP Database
Before you can begin configuring AP, you must create at
least one AP database. The AP database is created with
the apdb command. You can use apdb to create the
original database or a copy.
The apdb Command
# apdb -c /dev/rdsk/c0t3d0s4 -f
The -c (create) option is followed by the raw disk slice
that will contain the new AP database copy. Each copy
requires its own dedicated slice, which must be at least
300 Kbytes in size.
The -f (force) option is only necessary to create the first
AP database copy. It is not used otherwise.
If you want an AP database copy to reside on an AP
disk, you must create two copies of the AP database. The
AP configuration process can only access database
locations by the physical disk slice address, and is not
aware of meta-devices at this level.
You must create this database copy twice, specifying
each of the physical paths to the AP meta-disk. For
example, if c1 and c9 are connected to the same AP
pathgroup, to create a copy of the AP database residing
on target 3, slice 4, use the following two commands:
# apdb -c /dev/rdsk/c1t3d0s4 -f
# apdb -c /dev/rdsk/c9t3d0s4
The AP software will be aware of two copies of the
database when actually there is only one, because the
disk is accessible through two paths. This database
"alias" is safe, because AP always updates and accesses
its database copies sequentially. The AP copy is updated
twice with the same information, but this is insignificant
overhead.
The aliasing happens outside of AP. AP is not aware
that the two database entries refer to the same physical slice.
Example
# apdb -c /dev/rdsk/c0t1d0s4 -f
The -c option specifies the raw disk slice (under /dev/rdsk) where
you want to create the database copy. You must dedicate an entire disk
partition to each database copy. The disk partition must have at least
300 Kbytes. The -f (force) option is only necessary to create the first
AP database copy.
# apconfig -D
path: /dev/rdsk/c3t3d0s1
major: 32
minor: 145
timestamp: Wed Mar 10 18:45:58 1999
checksum: 2636010350
default: yes
corrupt: no
inaccessible: no

path: /dev/rdsk/c3t3d0s6
major: 32
minor: 150
timestamp: Wed Mar 10 18:50:43 1999
checksum: 2636010350
default: no
synced: yes
corrupt: no
inaccessible: no
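A listing like this can be summarised with a short filter. The sketch below counts the database copies that are neither corrupt nor inaccessible. It assumes that apconfig -D prints one "key: value" pair per line and starts each copy with a "path:" line; the function name usable_db_copies is hypothetical, and the sample text stands in for real command output.

```shell
#!/bin/sh
# Hypothetical helper: count AP database copies reported as usable.
# Reads apconfig -D style text ("key: value" per line) from standard input.
usable_db_copies() {
    awk '
        $1 == "path:" { if (seen && ok) n++; seen = 1; ok = 1 }
        $1 == "corrupt:" && $2 != "no" { ok = 0 }
        $1 == "inaccessible:" && $2 != "no" { ok = 0 }
        END { if (seen && ok) n++; print n + 0 }
    '
}

# Sample input copied from the listing in this section.
usable_db_copies <<'EOF'
path: /dev/rdsk/c3t3d0s1
default: yes
corrupt: no
inaccessible: no

path: /dev/rdsk/c3t3d0s6
default: no
synced: yes
corrupt: no
inaccessible: no
EOF
```

On a live system the same function would read from a pipe, for example apconfig -D | usable_db_copies.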
AP Utility Examples
Identifying Disk Host Adapter Instances
The apinst command identifies all disk host adapter ports and, for each
port, lists its name and instance number and the disk special files
(/dev/dsk) of the targets attached to it.
# apinst
isp0
    /dev/dsk/c0t0d0
    /dev/dsk/c0t1d0
    /dev/dsk/c0t2d0
pln0
    /dev/dsk/c1t0d0
    /dev/dsk/c1t1d0
    /dev/dsk/c1t2d0
    /dev/dsk/c1t3d0
    /dev/dsk/c1t4d0
    /dev/dsk/c1t5d0
pln1
    /dev/dsk/c2t0d0
    /dev/dsk/c2t1d0
    /dev/dsk/c2t2d0
    /dev/dsk/c2t3d0
    /dev/dsk/c2t4d0
    /dev/dsk/c2t5d0
sf0
    /dev/dsk/c3t0d0
    /dev/dsk/c3t1d0
    /dev/dsk/c3t2d0
    /dev/dsk/c3t3d0
    /dev/dsk/c3t4d0
    /dev/dsk/c3t5d0
sf1
    /dev/dsk/c4t0d0
    /dev/dsk/c4t1d0
    /dev/dsk/c4t2d0
    /dev/dsk/c4t3d0
    /dev/dsk/c4t4d0
    /dev/dsk/c4t5d0
Meta-Disk Configuration
# ssaadm disp c1
SPARCstorage Array 110 Configuration (ssaadm version: 1.20 97/05/14)
Controller path:
/devices/sbus@45,0/SUNW,soc@0,0/SUNW,pln@b0000000,8a0e2f:ctlr
                 DEVICE STATUS
       TRAY 1        TRAY 2        TRAY 3
slot
1      Drive: 0,0    Drive: 2,0    Drive: 4,0
2      NO SELECT     NO SELECT     NO SELECT
3      NO SELECT     NO SELECT     NO SELECT
4      NO SELECT     NO SELECT     NO SELECT
5      NO SELECT     NO SELECT     NO SELECT
6      Drive: 1,0    Drive: 3,0    Drive: 5,0
7      NO SELECT     NO SELECT     NO SELECT
8      NO SELECT     NO SELECT     NO SELECT
9      NO SELECT     NO SELECT     NO SELECT
10     NO SELECT     NO SELECT     NO SELECT
                 CONTROLLER STATUS
Vendor:        SUN
Product ID:    SSA110
Product Rev:   1.0
Firmware Rev:  3.12
Serial Num:    00000083BE1D
Accumulate Performance Statistics: Enabled
For A5000s, you would use:
# luxadm disp c2
Note that the luxadm command includes the ssaadm
command functionality. You could use luxadm to obtain
information for both A5000 and SSA devices.
Creating a Disk Pathgroup and Meta-Disks
1. Use apdisk to create an uncommitted disk
pathgroup. The apdisk command creates the meta-
disk names and updates the AP database with the
alternate paths for all six SSA disks.
# apdisk -c -p pln0 -a pln1
The -c operand specifies creation of a pathgroup,
and the -p and the -a operands specify the primary
and alternate paths, respectively.
2. Verify the results with apconfig -S -u.
# apconfig -S -u
c1 pln0 P A
c3 pln1
metadiskname(s):
    mc1t5d0 U
    mc1t4d0 U
    mc1t3d0 U
    mc1t2d0 U
    mc1t1d0 U
    mc1t0d0 U
Note that the entries are uncommitted.
3. Use apdb -C to commit the new database entries.
# apdb -C
4. Use apconfig -S to view the new disk entries in the
database. Note that the U is now gone.
# apconfig -S
c1 pln0 P A
c3 pln1
metadiskname(s):
    mc1t5d0
    mc1t4d0
    mc1t3d0
    mc1t2d0
    mc1t1d0
    mc1t0d0
5. Run drvconfig to create the new metadevice entries
in the /devices directory. The -i operand ensures
that only AP metadevices are created.
# drvconfig -i ap_dmd
6. Use the ls command to confirm that the device
nodes have been created.
# ls /devices/pseudo/ap_dmd*
/devices/pseudo/ap_dmd@0:128,blk
/devices/pseudo/ap_dmd@0:128,raw
/devices/pseudo/ap_dmd@0:129,blk
/devices/pseudo/ap_dmd@0:129,raw
/devices/pseudo/ap_dmd@0:130,blk
/devices/pseudo/ap_dmd@0:130,raw
...
7. Use apconfig -R to create the /dev directory links
to the new /devices directory nodes. /dev/ap/dsk
and /dev/ap/rdsk links for each possible partition
on each drive will be created, just as the disks
command does for regular disk devices.
# apconfig -R
8. Use the ls command to confirm that the /dev links
to the device nodes have been created.
# ls -l /dev/ap/dsk
total 8
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s0 -> ../../../devices/pseudo/ap_dmd@0:128,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s1 -> ../../../devices/pseudo/ap_dmd@0:129,blk
lrwxrwxrwx 1 root 40 Jul 27 16:47 mc1t0d0s2 -> ../../../devices/pseudo/ap_dmd@0:130,blk
Similar entries will exist for /dev/ap/rdsk.
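The eight steps above always run in the same order, so it can help to print the sequence for a given pair of ports before typing anything as root. The following is a dry-run sketch only: ap_disk_plan is a hypothetical helper that echoes the commands named in this procedure and never executes them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, but do not run, the command sequence
# for creating a disk pathgroup.
# $1 = primary port, $2 = alternate port (names as reported by apinst).
ap_disk_plan() {
    primary=$1
    alternate=$2
    if [ -z "$primary" ] || [ -z "$alternate" ] || [ "$primary" = "$alternate" ]; then
        echo "usage: ap_disk_plan PRIMARY ALTERNATE (two different ports)" >&2
        return 1
    fi
    cat <<EOF
apdisk -c -p $primary -a $alternate
apconfig -S -u
apdb -C
apconfig -S
drvconfig -i ap_dmd
apconfig -R
ls -l /dev/ap/dsk
EOF
}

ap_disk_plan pln0 pln1
```

Keeping the plan as plain text makes it easy to review each verification step (apconfig -S -u, apconfig -S, ls) before committing the database entries.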
Using the Meta-Devices
You must modify every reference to a physical device
node (such as a path name that begins with /dev/dsk or
/dev/rdsk) to use the corresponding meta-disk device
node, the path that begins with /dev/ap/dsk or
/dev/ap/rdsk.
If a partition is currently mounted under a physical path
name, it should be unmounted and remounted under the
meta-disk path name. This can be done by changing the
vfstab file and having the meta-device become active on
the next reboot.
Do not do this for the boot device.
If you are placing the boot disk under AP control, you
will need to modify the vfstab file by using the apboot
command. Refer to the following page for further
information.
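For non-boot disks, the path substitution described above is mechanical. The sketch below shows the mapping under two assumptions: the physical path is given through the primary controller of the pathgroup (the meta-disk name in the examples, mc1t0d0, is built from the primary controller c1), and the helper name to_metadisk is hypothetical. It only prints the new path; it does not edit vfstab.

```shell
#!/bin/sh
# Hypothetical helper: map a physical disk path to its AP meta-disk path.
# Assumes the path uses the pathgroup's primary controller.
to_metadisk() {
    case $1 in
        /dev/dsk/c*|/dev/rdsk/c*)
            echo "$1" | sed -e 's|^/dev/dsk/c|/dev/ap/dsk/mc|' \
                            -e 's|^/dev/rdsk/c|/dev/ap/rdsk/mc|'
            ;;
        *)
            # Anything else, including an existing meta-disk path, is left as-is.
            echo "$1"
            ;;
    esac
}

to_metadisk /dev/dsk/c1t0d0s0
to_metadisk /dev/rdsk/c1t0d0s0
```

Do not apply this substitution to the boot disk entries; those must be changed with apboot as described in this section.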
Placing the Boot Disk Under AP Control
1. Create an AP pathgroup for the physical path that
includes the boot disk.
2. Run apboot, specifying the boot meta-disk name, to
define the new AP boot device. apboot modifies
/etc/vfstab and /etc/system.
# apboot mc2t0d0
where mc2t0d0 is the meta-disk name of the boot
disk.
apboot examines /etc/vfstab and replaces the
physical device name of the boot disk, such as
/dev/dsk/c2t0d0sx, with the meta-disk name,
such as /dev/ap/dsk/mc2t0d0sx. It also edits
/etc/system so that the drivers required for AP
boot disk usage are force loaded.
Do not manually replace the physical devices in
/etc/vfstab with meta-disks for the boot disk.
Instead, use the apboot command to ensure that all
required changes are made. Just changing
/etc/vfstab will prevent the system from booting.
3. Set the OBP environment variable boot-device to
the physical path most likely to be used for booting.
Do not use multiple device names from the devalias
command, including the other path.
4. Define an OBP devalias for the alternate boot
device physical path in case you need to perform a
manual boot from the alternate path. Set the OBP
boot-device parameter to this name. Do not add it
to the boot-device parameter value.
5. At this point, just reboot the system to begin using
the AP boot device.
Warning – If you want to create a new AP database copy
after you have placed the boot disk under AP control,
and the new database copy is to be located on a partition
controlled by a pln port that does not control any of the
current AP database copies, you must first remove the
boot disk from AP control. Make sure that the new AP
database has been created. Then place the boot disk
under AP control again. Failure to follow this procedure
may cause the AP database to become inaccessible
during boot.
Manually Switching the Active Path
Note – You can perform a switch at any time, even while
I/O is occurring on the device. You might want to
experiment with the switching process to verify that you
understand it and that your system is set up properly,
rather than wait until a critical situation occurs.
1. Use apconfig -S to view the current configuration:
# apconfig -S
c1 pln0 P A
c3 pln1
metadiskname(s):
    mc1t5d0
    mc1t4d0
    mc1t3d0
    mc1t2d0
    mc1t1d0
2. To perform the switch, use apconfig -P -a, where
-P identifies the pathgroup and -a specifies the path
to become active.
# apconfig -P pln0 -a pln1
3. Verify the results with the apconfig -S command.
You can see that the active alternate has been
switched to pln1.
# apconfig -S
c1 pln0 P
c3 pln1 A
metadiskname(s):
    mc1t5d0
    mc1t4d0
    mc1t3d0
    mc1t2d0
    mc1t1d0
Note – Remember that switch operations take effect
immediately.
Automatic Disk Pathgroup Switching (AP 2.1)
AP 2.1 provides the ability to automatically switch the
active path of a disk pathgroup. This will occur only
under two conditions:
● The currently active path has failed
● DR requests the switch (Enterprise 10000 only)
If AP detects that a path has failed, it will be marked
with a T in the apconfig -S output.
# apconfig -S
c1 pln0 P A
c3 pln1 T
metadiskname(s):
    mc1t5d0
    mc1t4d0
    mc1t3d0
    mc1t2d0
    mc1t1d0
When a path is marked T (tried), AP will not
automatically switch to it. You can reset the tried flag by:
● Rebooting the system
● Using DR to detach and then reattach the board
● Resetting the flag manually with apdisk -w. Specify
the tried path, not the pathgroup name.
# apdisk -w pln1
#
Note – Resetting the flag manually should only be done
after the cause of the failure has been repaired.
You can still manually switch to a path marked tried
with the apconfig -P command.
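Because a tried path is silently skipped by automatic switching, it is worth scanning for the T flag from time to time. The filter below is a sketch only: tried_paths is a hypothetical name, and it assumes apconfig -S prints each path on its own line as "controller port flags", as in the listing in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the port name of every path flagged T (tried).
# Reads apconfig -S style text from standard input.
tried_paths() {
    awk '$1 ~ /^c[0-9]+$/ {
        for (i = 3; i <= NF; i++)
            if ($i == "T") print $2
    }'
}

# Sample input modelled on the listing in this section.
tried_paths <<'EOF'
c1 pln0 P A
c3 pln1 T
metadiskname(s):
    mc1t5d0
    mc1t4d0
EOF
```

Each name printed is the argument that apdisk -w expects once the underlying fault has been repaired.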
Creating a Network Pathgroup
This example assumes that you are creating a network
pathgroup using physical interfaces le0 and le2, with
le0 as the primary interface.
1. Use apnet to create an uncommitted network
pathgroup. The apnet command creates the meta-
interface names and updates the AP database with
the alternate paths.
# apnet -c -p le0 -a le2
The -c operand specifies creation of a pathgroup,
and the -p and the -a operands specify the primary
and alternate paths, respectively.
2. Verify the results with apconfig -N -u.
# apconfig -N -u
metanetwork: mle0 U
physical devices:
    le2
    le0 P A
3. Use apdb -C to commit the new database entries.
# apdb -C
4. Use apconfig -N to view the new network entries
in the database. Note that the U is now gone.
# apconfig -N
metanetwork: mle0
physical devices:
    le2
    le0 P A
Alternately Pathing the Primary Network Interface
The primary network interface between your system and the other
machines on the network is difficult to configure down. There are
three ways to solve this problem:
● Create the appropriate AP database entries, create a new
/etc/hostname.mxxx file or rename the corresponding
/etc/hostname.xxx file, and then reboot your system.
● Set up a script file to perform the transition in your system
without rebooting.
● Log in to your system from another network interface so that you
can stay connected when the primary network interface is
disabled, and then enter the following commands:
# ifconfig qe0 down unplumb
# ifconfig qe4 down unplumb
# ifconfig mqe0 plumb
# ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up
You can also execute these commands all on one line, separated
with semicolons. Ensure that you do not have any syntax errors.
Remember to remove any /etc/hostname.qe0 and
/etc/hostname.qe4 files, and add the /etc/hostname.mqe0 file.
An example of a script to perform this operation is shown
below.
● Generate a script to configure the qe0 and qe4 interfaces down,
then configure up the meta-network interface. This method does
not require you to reboot your system, but you will briefly lose all
communication over the primary network interface.
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
qe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
        ether 0:0:be:0:8:c5
# cat > /tmp/washington.restart
ifconfig qe0 down unplumb
ifconfig qe4 down unplumb
ifconfig mqe0 plumb
ifconfig mqe0 inet 136.162.22.45 netmask + broadcast + up
^D
# chmod 700 /tmp/washington.restart
# nohup /tmp/washington.restart &
# ifconfig -a
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
        inet 127.0.0.1 netmask ff000000
mqe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 136.162.22.45 netmask ffffff00 broadcast 136.162.22.255
        ether 0:0:be:0:8:c5
#
Boot Time Interface Failure
If the primary network path fails at boot time, AP will switch the
primary interface to the other alternate. An automatic switch due
to an error will not occur at any other time.
Switching a Network Pathgroup
Remember that you can switch the active interface of a
network pathgroup while the meta-interface is active.
The change is recorded in the state databases. The new
active path will be used until you switch back, even after
a reboot.
To switch the active interface, use the apconfig
command. The change will occur immediately. There is
no commit process for pathgroup switching.
# apconfig -P mle0 -a le2
You can see that the switch has occurred by using the
apconfig -N command.
# apconfig -N
metanetwork: mle0
physical devices:
    le2 A
    le0 P
Note – Remember that switch operations take effect
immediately; there is no commit process for them.
Warning – When you switch interfaces, AP does not
check that the interface you are going to is the correct
path. AP does not know if the new interface is connected
to the wrong subnet, disconnected, or inoperative.
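Since AP performs no checks on the target interface, the only confirmation available from AP itself is the A flag in the apconfig -N listing. The following sketch extracts the active interface for each meta-network so that a switch can be verified in a script. The function name active_interfaces is hypothetical, and the parser assumes one physical device per line, as in the listing in this section.

```shell
#!/bin/sh
# Hypothetical helper: print "metanetwork active-interface" pairs.
# Reads apconfig -N style text from standard input.
active_interfaces() {
    awk '
        $1 == "metanetwork:" { meta = $2; next }
        {
            for (i = 2; i <= NF; i++)
                if ($i == "A") print meta, $1
        }
    '
}

# Sample input modelled on the listing in this section.
active_interfaces <<'EOF'
metanetwork: mle0
physical devices:
    le2 A
    le0 P
EOF
```

This confirms only what AP has recorded. Whether the new interface is cabled to the right subnet still has to be checked separately, for example by testing connectivity from another host.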
Dynamic Reconfiguration B
Introduction to Dynamic Reconfiguration
What Is Dynamic Reconfiguration?
Dynamic Reconfiguration (DR) is the ability to alter the configuration
of a running system by bringing components online or taking them
offline without disrupting system operation or requiring a system
reboot. With the availability of DR, system boards can be logically and
physically included in the system configuration, or logically
deactivated and removed while the system is running.
DR is useful in mission-critical environments if a system board fails
and must be replaced or if new system boards need to be added to the
system for additional performance and capacity. It is a critical part of
the concurrent maintenance strategy prevalent in the enterprise
computing environment.
Note – DR capability requires that the system OBP be at revision 3.2.22
or later (refer to the prtconf -V command) and the operating system
be Solaris 7 5/99 or later (refer to the /etc/release file).
Benefits of DR
DR increases system availability and flexibility by allowing the hot-
swap CPU/memory and I/O board functionality that the Sun
Enterprise 3000-6000 server hardware has supported from the
beginning. Hot-swap functionality means that the components can be
physically and logically removed or added while the system is
running.
DR includes:
● Dynamic attachment of system boards making them available for
use without rebooting the system
● Dynamic detachment of system boards making them ready for
physical removal without rebooting the system
● Display of board status
● Initiation of board testing
Disadvantages of DR
The main disadvantage is that, to dynamically add and remove
CPU/Memory boards, you must set memory_interleaving to min (that is,
disable it), because DR cannot handle memory interleaved across boards.
Disabling interleaving has a major impact on performance.
Supported Hardware
Table 2-1 lists the supported system board types that the cfgadm
command displays. System I/O boards are classified by numerical type
value.
Table 2-1 DR Supported Boards
Type         Name and Identifying Characteristics
CPU/mem      CPU/memory board with at least one CPU module
Mem          CPU/memory board with no CPU module
Disk board   System board containing two SCSI disk drives
Type 1       SBus I/O board with 3 SBus slots and 2 FC-OM
Type 2       Graphics I/O with 1 UPA slot, 2 SBus slots and 2 FC-OM
Type 3       PCI+ I/O board with 2 PCI card adapter slots
Type 4       SBus+ I/O board with 3 SBus slots and 2 GBICs
Type 5       Graphics+ I/O with 1 UPA slot, 2 SBus slots and 2 GBICs
Caution – Do not assume that because an I/O board supports DR, the
SBus cards on it also support DR. For a complete list of supported hardware,
refer to http://sunsolve5.sun.com/sunsolve/Enterprise-dr/
Limitations to Dynamic Reconfiguration
Slot 1 board cannot be removed
● The slot 1 board provides the electrical path to devices on the clock
board, and is normally the lowest-numbered working I/O board.
First CPU board cannot be removed
● The POST Master is also set up as the JTAG Master, and the JTAG
Master controls the DR POST, so this board cannot be detached
with DR.
It is not too difficult to crash the system...
● Inserting a failed board can immediately crash the system.
Connecting a bad board that passes POST can also crash the
system.
● Bending a pin when inserting a board can crash the system. Hardware
slots are not isolated.
● Inserting a board too slowly can panic Solaris. If an interrupt is
in flight and the pause pin is asserted for more than one second
during the insert, Solaris will panic.
Fails using 168MHz modules
● POST fails during DR connect on a 168 MHz machine. A DR connect
operation with a CPU/Memory board that has UltraSPARC I
modules can fail or take a long time.
Fails in single user mode
● DR connect operations performed in single user mode cause the
system to hang.
Displaying Board Status
Basic Status Display using cfgadm
When used without options, the cfgadm command displays
information about all known attachment points, the collective term for a
board and its card cage slot (or receptacle).
There are two types of system names for attachment points:
● Physical attachment point – Describes the software driver and
location of the card cage slot. For example:
/devices/central@1f,0/fhc@0,f8800000/clock-board@0,900000:sysctrl,slot0
● Logical attachment point – An abbreviated name created by the
system to refer to the physical attachment point. For example:
sysctrl0:slot0
DR displays the status of the slot, the board, and the attachment point.
The DR definition of a board also includes the devices connected to it.
The term occupant is used to refer to the combination of board and
attached devices.
The following display shows a typical cfgadm output:
Ap_Id            Receptacle     Occupant       Condition
ac0:bank0        connected      configured     ok
ac0:bank1        empty          unconfigured   unknown
ac1:bank0        connected      configured     ok
ac1:bank1        empty          unconfigured   unknown
ac2:bank0        connected      configured     ok
ac2:bank1        empty          unconfigured   unknown
sysctrl0:slot0   connected      configured     ok
sysctrl0:slot1   connected      configured     ok
sysctrl0:slot2   disconnected   unconfigured   unknown
sysctrl0:slot3   connected      configured     ok
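On a fully populated system this listing gets long, and the entries of interest are usually the ones that are not in the ok condition. The filter below is a small sketch: not_ok_points and sample_cfgadm are hypothetical names, and the parser assumes that cfgadm prints one attachment point per line with four whitespace-separated columns, as in the display in this section.

```shell
#!/bin/sh
# Hypothetical helper: list attachment points whose Condition is not "ok".
# Reads basic cfgadm output from standard input.
not_ok_points() {
    awk '$1 != "Ap_Id" && NF >= 4 && $4 != "ok" { print $1, $4 }'
}

# Sample data copied from the display in this section.
sample_cfgadm() {
    cat <<'EOF'
Ap_Id Receptacle Occupant Condition
ac0:bank0 connected configured ok
ac0:bank1 empty unconfigured unknown
ac1:bank0 connected configured ok
ac1:bank1 empty unconfigured unknown
ac2:bank0 connected configured ok
ac2:bank1 empty unconfigured unknown
sysctrl0:slot0 connected configured ok
sysctrl0:slot1 connected configured ok
sysctrl0:slot2 disconnected unconfigured unknown
sysctrl0:slot3 connected configured ok
EOF
}

sample_cfgadm | not_ok_points
```

An unknown condition is normal for empty or disconnected receptacles, so the output is a list of things to look at rather than a list of faults.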
The following tables list the possible states of the receptacle and occupant.
The LED assignments are as you look at them from left to right.
Receptacle Status   Explanation
Empty               No board is present in the slot. All LEDs are off.
Disconnected        A board is present but is electrically disconnected. The system can identify the board type. The board is in low power mode and can be unplugged at any time. LED state: off on off
Connected           The board is electrically connected and powered up. The system is actively monitoring the board for temperature and cooling. LED state: on off off

Occupant Status     Explanation
Configured          Devices on the board are fully initialized and can be mounted or configured for use. LED state: on off blink
Unconfigured        The unconfigured state covers all device states that are not configured, including receptacles in the empty state. LED state: on off off
Conditions   Explanation
Unknown      The current condition cannot be determined. This situation results either when a new board is inserted in a running system, or a board is placed on the disabled board list before a reboot. A transition to a connected receptacle state changes an attachment point condition from unknown to either OK or Failed.
OK           No problems detected. This condition occurs only after a board has been connected. This condition persists either until the board is physically removed, or a problem is detected. An OK condition requires correct hardware compatibility, correct firmware revision, adequate power, adequate cooling, and adequate precharge.
Failing      A failing condition occurs when a board that was in the OK condition develops a problem.
Failed       The board has failed POST/OBP. A failed condition can occur either during bootup or after a failed connect attempt. This condition is considered uncorrectable and will persist until the board is physically removed.
Unusable     Either an attachment point has incompatible hardware, or an empty attachment point lacks power, cooling, or precharge current.
Detailed Status Display using cfgadm -v
For a more detailed status report, use the command cfgadm -v. The
-v option turns on expanded (verbose) descriptions.
Figure B-1 shows a breakdown of each field found in the output of the
cfgadm -v command. The example shown is of a 64MB memory
module.
ac0:bank0 connected configured ok slot0 64mb base 0x00000000
May 1 13:00 memory n /devices/fhc@0,f8800000/ac@0,1000000/bank0
Figure B-1 Detail status display entry
[Figure callouts: Attachment point; Slot electrical condition; Board operational condition; Board status; Location; Board type; Board activity (board not busy); Physical ID and location]
Reconfiguration Considerations
Device Driver Interface (DDI)
For a device to fully conform to DR, it must comply with the following:
The device driver must support DDI_ATTACH, DDI_DETACH, and
DDI_SUSPEND/RESUME.
All drivers support DDI_ATTACH but not all drivers support
DDI_DETACH and DDI_SUSPEND/RESUME.
A DR detach must pause (quiesce) the operating system, and to do
this the driver must be suspend-safe.
Suspend-Safe and Suspend-Unsafe Devices
A driver is suspend-safe if it supports operating system quiescence, that
is, if it does not access memory or interrupt the system while the
operating system is in quiescence (suspend/resume).
It also guarantees that when a suspend request is successfully
completed, the device that the driver manages does not attempt to
access memory, even if the device is open when the suspend request is
made.
A suspend-unsafe device is one that allows a memory access or a system
interruption while the operating system is in quiescence.
Suspend-safe drivers allow the operating system to:
● Stop user threads.
● Execute the DDI_SUSPEND call in each device driver.
● Stop the clock and CPUs.
The operating system refuses a quiescence request if a suspend-unsafe
device is open.
To manually suspend the device, you will have to issue a modunload
command.
Testing for Suspend-Safe Drivers
The quiesce-test option tests for suspendable drivers. For example:
# cfgadm -x quiesce-test sysctrl0:slot<number>
Note – All tape drivers are considered suspend-unsafe.
Hot-Plug Hardware
Hot-plug boards and modules have special connectors that supply
electrical power to the board or module before the data pins make
contact. Boards and devices that do not have hot-plug connectors
cannot be inserted or removed while the system is running.
Caution – Before inserting a board into the centreplane, it is essential
that the precharge voltages are present. Ensure the PPS is supplying
these voltages by typing:
# /usr/platform/sun4u/sbin/prtdiag -v | grep precharge
I/O boards and CPU/memory boards used in Enterprise x000 and
x500 systems are hot-plug devices. Some devices, such as the clock
board, are not hot-plug modules and cannot be removed while the
system is running.
Permanent Memory Management
Certain parts of memory cannot be paged out during a detach. This
permanent memory includes the kernel and OBP.
The kernel is loaded to high order memory during boot up.
The kernel must be confined to one system board, a process known as
caging the kernel.
The only system board that cannot be removed from a running
system is the board in the lowest numerical slot.
It is recommended that steps be taken to force the kernel to load on
that board so only one system board is restricted.
Required additions to /etc/system
The following entries must be added to the /etc/system file. The
following enables DR on I/O boards:
set soc:soc_enable_detach_suspend=1
set pln:pln_enable_detach_suspend=1
The following enables DR on CPU/Memory boards:
set kernel_cage_enable=1
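Before attempting a DR operation it is easy to confirm that these three entries are present. The following is a sketch, not a supported tool: missing_dr_settings is a hypothetical helper, and it matches each entry as an exact whole line, so an entry written with different spacing would be reported as missing. The demonstration runs against a temporary file rather than the real /etc/system.

```shell
#!/bin/sh
# Hypothetical helper: print each DR-related setting that is absent from
# the given file. $1 = file to inspect (defaults to /etc/system).
missing_dr_settings() {
    file=${1:-/etc/system}
    for setting in \
        'set soc:soc_enable_detach_suspend=1' \
        'set pln:pln_enable_detach_suspend=1' \
        'set kernel_cage_enable=1'
    do
        # -F fixed string, -x whole line, -q quiet
        grep -F -x -q "$setting" "$file" 2>/dev/null || echo "$setting"
    done
}

# Demonstration on a scratch copy containing only the kernel cage entry.
demo=${TMPDIR:-/tmp}/system.demo.$$
printf '%s\n' 'set kernel_cage_enable=1' > "$demo"
missing_dr_settings "$demo"
rm -f "$demo"
```

Changes to /etc/system take effect only at the next boot, so a clean report from this check does not by itself prove that the running kernel has the settings applied.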
Procedures - Removing a CPU/Memory Board
Note – Performing the following board removal procedures is the
responsibility of the system administrator. However, it is important for
you to understand these procedures in order to assist where possible.
The memory modules on a CPU/memory board can be shared by
other CPU/memory boards. Therefore, you must halt all use of
memory modules on a board before you can remove the board.
1. Log into the system console as root.
2. Use the cfgadm command to determine the system name for the
CPU/memory board and associated memory banks.
Note – A CPU/memory board can have up to two banks of memory.
Memory banks have logical names of the form ac<number>:bank<number>.
The term ac<number> identifies the driver instance, but the number is not
directly related to the board slot number. The bank number is either
bank0 or bank1.
Note – For the example in this procedure, the board is ac1, which has
one memory bank (bank1).
Also, verify that you can relocate the memory modules on the
CPU board.
# cfgadm -v
You cannot unconfigure non-relocatable memory pages in the
memory span (a section of memory that is reserved for system
use). Non-relocatable memory is identified as permanent in a
cfgadm listing.
3. If the memory is relocatable, stop all activity in the memory
modules on the board.
# cfgadm -c unconfigure ac1:bank1
This step halts all accesses by other CPU/memory boards and
prevents any further memory use until the board is replaced.
4. Verify that the CPUs on the board are not bound to any processes
running in the system.
If a CPU is bound to a process, the board cannot be removed until
the process is unbound.
The CPUs are identified by numbers that are related to the board
number. The first CPU number is twice the board number (2*n).
The second CPU number is twice the board number, plus one (2*n
+ 1).
To list all bound processes, use the pbind command. If any of the
listed processes show the CPUs in question, the related boards
cannot be removed until those processes are unbound.
The following example shows that process ID 1145 is bound to
processor 10 (board number 5, CPU 0). The pbind -u (unbind)
command unbinds the process. The pbind -q (query) command
shows that process ID 1145 is no longer bound.
# pbind
process id 1145: 10
# pbind -u 1145
# pbind -q 1145
process id 1145: not bound
5. Unconfigure the board.
# cfgadm -c unconfigure sysctrl0:slot<number>
where <number> is the slot location (number) in the card
cage.
6. If the previous step did not also disconnect the board, disconnect
the board by typing the following command:
# cfgadm -c disconnect sysctrl0:slot<number>
7. When the LEDs on the board indicate that the board is ready for
removal (two outer LEDs off and the middle LED on), you can
physically remove the board.
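The CPU numbering rule in step 4 (2*n and 2*n + 1 for board n) can be turned into a quick check of pbind output. The sketch below is illustrative only: board_cpus, cpu_board, and bound_to_board are hypothetical helper names, and the parser assumes pbind prints lines of the form "process id PID: CPU", as in the example above.

```shell
#!/bin/sh
# Hypothetical helpers for the 2*n / 2*n+1 CPU numbering rule.

# Print the two CPU numbers that belong to board $1.
board_cpus() {
    echo "$(( $1 * 2 )) $(( $1 * 2 + 1 ))"
}

# Print the board number that CPU $1 belongs to.
cpu_board() {
    echo "$(( $1 / 2 ))"
}

# Read pbind output on standard input and print the process IDs that are
# bound to either CPU of board $1.
bound_to_board() {
    board=$1
    awk -v lo="$(( board * 2 ))" -v hi="$(( board * 2 + 1 ))" '
        $1 == "process" && $2 == "id" {
            pid = $3
            sub(/:$/, "", pid)
            if ($4 == lo || $4 == hi) print pid
        }
    '
}

board_cpus 5
echo "process id 1145: 10" | bound_to_board 5
```

Every process ID printed by bound_to_board would need to be unbound with pbind -u before the board can be unconfigured.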
Caution – If a replacement board is not available and you remove the
board, you must fill the empty slot to maintain the proper flow of
cooling air in the card cage. For Sun Enterprise 3000, 3500, 4000, 4500,
5000, and 5500 systems, use a dummy board (part number 504-2592).
For Sun Enterprise 6000 or 6500 systems, use a load board
(part number 501-3142).
Procedures - Installing or Replacing a CPU/Memory Board
1. Verify that the selected board slot can accept a board.
# cfgadm
The states and conditions should be:
▼ Empty, Unconfigured, Unknown
2. Physically insert the board into the slot and watch for an
acknowledgment on the system console or in the system log file.
The acknowledgment is of the form,
Name inserted into slot <number>
where Name is the name of the system board being installed and
<number> is the slot location (number) in the card cage.
After a CPU/memory board is inserted, the states and conditions
should become:
▼ Disconnected, Unconfigured, Unknown
Note – Any other states or conditions should be considered an error.
Operating System Quiescence
During a board insertion, the operating system is briefly paused. This
pause is known as operating system quiescence. All operating system
and device activity on the backplane must cease for a few seconds
during a critical phase of the operation. When prompted, reply yes to
continue, or no to stop the configuration process and allow the
operating system to continue operating normally.
Before quiescence can be achieved, the operating system must
temporarily suspend all processes, CPUs, and device activities. If the
operating system cannot achieve quiescence, it displays the reasons,
which can include the following:
● A user thread did not suspend
● Real-time processes are running
● A device exists that cannot be paused by the operating system
The conditions that cause processes to fail to suspend are generally
temporary. Examine the reasons for the failure. If the operating system
encountered a transient condition causing a failure to suspend a
process, you can try the operation again.
3. Configure the board.
# cfgadm -v -c configure sysctrl0:slot <number>
This command should both connect and configure the receptacle.
Use the cfgadm command to verify this.
The states and conditions for a connected and configured
attachment point should be:
▼ Connected, Configured, OK
Now the system is aware of the usable devices on the board and
the devices can be used.
4. Configure the memory devices on the board in Solaris.
# drvconfig -i ac
5. Determine the system numbers of the new CPU modules.
# psrinfo
4 on-line since 5/15/99 08:01:14
5 on-line since 5/15/99 08:01:19
6 powered-off since 5/16/99 09:27:21
In this example, there is one new CPU module (system number 6).
The module has not yet been enabled, so it is listed as being
powered off.
The system number for a CPU is equal to twice the board number,
plus 0 for CPU module 0, or 1 for CPU module 1. In the example
shown, system number 6 represents module 0 on board number 3.
6. Enable the new CPU module or modules.
# psradm -n 6
where 6 is the system number of the CPU module to be enabled.
7. Test the new memory banks.
# cfgadm -o test_type -t ac<number>:bank0
# cfgadm -o test_type -t ac<number>:bank1
where test_type is one of three memory tests:
▼ Quick – Writes a pattern of ones and zeros.
▼ Normal – Detects specific memory address failures.
▼ Extended – Tests interference between memory cells.
Note – The ac<number> value can be found in the basic or detailed
status display.
8. Configure the new memory banks.
# cfgadm -c configure ac<number>:bank0
# cfgadm -c configure ac<number>:bank1
9. Verify that the board and the memory banks are configured.
▼ For the CPU status, use the psrinfo or mpstat commands.
▼ For the memory status, use the prtconf or vmstat commands.
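
Steps 5 and 6 depend on reading psrinfo output and converting a CPU system number back to a board and module. The following is a minimal sketch of that arithmetic, not part of the Sun procedure. The function names are hypothetical, and the parsing assumes psrinfo lines start with the system number followed by the state, as in the example in step 5.

```shell
#!/bin/sh
# Convert a CPU system number to its location.
# system number = 2 * board + module, so board = n / 2 and module = n % 2.
cpu_location() {
    cpu=$1
    echo "board $((cpu / 2)) module $((cpu % 2))"
}

# Read psrinfo-style lines on stdin and print the system numbers of
# CPUs reported as powered-off.
powered_off_cpus() {
    awk '$2 == "powered-off" { print $1 }'
}

# On a live system (not run here):
#   for c in $(psrinfo | powered_off_cpus); do cpu_location "$c"; done
```

For the example output in step 5, powered_off_cpus yields 6, and cpu_location 6 reports board 3, module 0, which matches the text.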
Procedures - Removing an I/O Board
This procedure assumes that all activity going to the I/O board to be
removed has been stopped, file systems have been unmounted, and
network interfaces have been shut down.
Or, if AP is in use, all I/O functions have been switched to the
alternate I/O board.
1. Verify that all I/O activity to the board has been terminated.
2. Check the status of the board.
# cfgadm
For a board removal or replacement, the states and conditions
must be one of the following sets:
If the board is OK, the state is:
▼ Connected, Configured, OK
If the board is failing, the state is:
▼ Connected, Configured, Failing
3. Unconfigure the board.
# cfgadm -c unconfigure sysctrl0:slot <number>
4. Use the cfgadm command to confirm that the board is
unconfigured.
If the unconfigure operation failed, verify that:
▼ The board is Detach-Safe.
▼ Activity on the board has been quiesced.
Caution – A failure of step 4 results in a partially unconfigured
condition. If this happens, attempt to unconfigure again. A
configuration operation is not permitted at this point.
5. When the board is unconfigured, you can do one of the following:
▼ Leave the board in the system unconfigured
▼ Configure the board
▼ If the unconfigure operation did not disconnect the board
automatically, disconnect it manually by typing the following
command:
# cfgadm -v -c disconnect sysctrl0:slot <number>
6. If you wish to remove the board from the card cage, first verify the
board status.
▼ Use the cfgadm command to verify that the board is logically
disconnected.
▼ Check the LEDs on the board to verify that the board is
electrically disconnected. The two outer LEDs must be off and
the middle LED must be on.
Caution – If a replacement board is not available and you remove the
board, you must fill the empty slot to maintain the proper flow of
cooling air in the card cage. For Sun Enterprise 6000 or 6500 systems,
use a load board (part number 501-3142). For all other systems, use a
dummy board (part number 504-2592).
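
The logical-disconnect check in step 6 can be sketched as a filter over a cfgadm listing. This is a minimal sketch under stated assumptions, not part of the Sun procedure: the function names are hypothetical, and the listing is assumed to have the column order Ap_Id, Type, Receptacle, Occupant, Condition with attachment points named like sysctrl0:slot3. It does not replace the LED check, which confirms the electrical state.

```shell
#!/bin/sh
# Read a cfgadm listing on stdin and print "receptacle occupant condition"
# (lowercased) for one attachment point.
# Assumed column order: Ap_Id Type Receptacle Occupant Condition.
slot_state() {
    awk -v ap="$1" '$1 == ap { print tolower($3), tolower($4), tolower($5) }'
}

# Succeed only if the listing on stdin shows the attachment point as
# disconnected and unconfigured, that is, logically disconnected.
slot_is_disconnected() {
    state=$(slot_state "$1")
    case $state in
        "disconnected unconfigured "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live system (not run here):
#   cfgadm | slot_is_disconnected sysctrl0:slot1 && echo "check the LEDs next"
```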
Procedures - Removing Boards that Use Detach-Unsafe Drivers
Some drivers do not yet support DR on Sun Enterprise 3x00, 4x00,
5x00, and 6x00 systems. DR cannot detach these drivers, but you can
remove some undetachable drivers manually.
1. Halt all use of the device controller.
2. Halt the use of all other controllers of the same type on all boards
in the machine.
The remaining controllers can be used again after the DR
unconfigure operation is complete.
3. Use UNIX commands to manually close all such drivers on the
board and use the modunload command to unload them.
# modinfo | grep tape
107 f66a0000 dfe9 33 1 st (SCSI tape driver 1.1)
# modunload -i 107
#
4. Disconnect the board.
# cfgadm -c disconnect sysctrl0:slot <number>
The disconnected board can be physically removed now or at a
later time.
Caution – Many third-party drivers (those purchased from vendors
other than Sun Microsystems) do not yet properly support the
standard Solaris software modunload interface. Test these driver
functions during the qualification and installation phases of any third-
party device.
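
The module lookup in step 3 can be sketched as a small filter that pulls the module ID out of modinfo output. This is a minimal sketch, not part of the Sun procedure: the function name module_id is hypothetical, and it assumes the module ID is the first column and the module name the sixth, as in the st example above.

```shell
#!/bin/sh
# Read modinfo output on stdin and print the ID (first column) of the
# module whose name (sixth column) equals the argument.
module_id() {
    awk -v name="$1" '$6 == name { print $1 }'
}

# On a live system (not run here):
#   id=$(modinfo | module_id st)
#   [ -n "$id" ] && modunload -i "$id"
```

Matching on the name column avoids picking up unrelated modules whose description happens to contain the search word.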
Procedures - Installing a New I/O Board
1. Verify that the selected board slot is ready for a board.
# cfgadm
The states and conditions should be:
▼ Empty, Unconfigured, Unknown
2. Physically insert the board into the slot and look for an
acknowledgment on the console in the form of
Name board inserted into slot <number>
After an I/O board is inserted, the states and conditions should
become:
▼ Disconnected, Unconfigured, Unknown
Note – Any other states or conditions should be considered an error.
3. Connect any peripheral cables and interface modules to the board.
4. Configure the board by typing the following command:
# cfgadm -v -c configure sysctrl0:slot <number>
Note – This command should both connect and configure the
receptacle.
5. Verify with the cfgadm command.
The states and conditions for a connected and configured
attachment point should be:
▼ Connected, Configured, OK
Now the system is also aware of the usable devices that reside on
the board and all devices that can be mounted or configured to be
used.
If the command fails to connect and configure the board and slot,
try the connection and configuration as separate steps:
a. Connect the board and slot by typing the following:
# cfgadm -v -c connect sysctrl0:slot <number>
The states and conditions for a connected attachment point
should be:
▼ Connected, Unconfigured, OK
Now the system is aware of the board, but not the usable
devices which reside on the board. Temperature is monitored
and power and cooling affect the attachment point condition.
b. Configure the board and slot by typing the following:
# cfgadm -v -c configure sysctrl0:slot <number>
The states and conditions for a configured attachment point
should be:
▼ Connected, Configured, OK
Now the system is also aware of the usable devices that reside
on the board and all devices that can be mounted or
configured.
6. Reconfigure the devices on the board.
# drvconfig; devlinks; disks; ports; tapes;
Reconfiguring the system normally falls under one or more of the
following categories:
● Board removal – If you remove a board that is not to be replaced,
you can (but do not have to) execute the reconfiguration sequence
to clean up the /dev links for disk devices.
● Board change – If you remove a board and then insert it into a
different slot, or replace a board with another board that has
different I/O devices, you must execute the reconfiguration
sequence to configure the I/O devices associated with the board.
● Board installation – When adding a board, you must execute the
reconfiguration sequence to configure the I/O devices associated
with the board.
● Board replacement – If you replace a board with another board
that hosts the same set of I/O devices, inserting the replacement
into the same slot, you might not need to execute the
reconfiguration sequence.
The console should display a list of devices and their addresses.
7. Activate the devices on the board using commands, such as mount
and ifconfig, as appropriate.
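
The four reconfiguration cases listed in step 6 can be captured as a small lookup. This is a minimal sketch, not part of the Sun procedure: the function name reconfig_needed and the case keywords are hypothetical labels for the categories in the text.

```shell
#!/bin/sh
# Report whether the reconfiguration sequence
# (drvconfig; devlinks; disks; ports; tapes) is needed for a given case.
reconfig_needed() {
    case $1 in
        removal)      echo optional ;;          # cleans up /dev links only
        change)       echo required ;;          # new slot or different I/O devices
        installation) echo required ;;          # newly added board
        replacement)  echo "possibly not" ;;    # same devices, same slot
        *)            echo unknown; return 1 ;;
    esac
}
```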
Procedures - Installing a Replacement I/O Board
This procedure assumes that you have previously performed the
Removing an I/O Board procedure discussed earlier in this module.
1. If you are not continuing from the procedure Removing an I/O
Board, use the cfgadm command and select a card cage slot to use,
but do not insert the board yet.
2. View the configuration list and verify that the slot is unconfigured.
# cfgadm
3. Insert the board in the slot and look for an acknowledgment on the
console, such as:
Name board inserted into slot <number>
4. Use the cfgadm command again to look for the system name
assigned to the new board.
5. Configure the board using the system name for the board.
# cfgadm -c configure sysctrl0:slot <number>
6. Configure any I/O devices on the board using commands, such as
drvconfig and devlinks, as appropriate.
7. Activate the devices on the board using commands, such as mount
and ifconfig, as appropriate.