E.S. 3par Basic Training

August 2021 by Carlo Curiale, SME

Transcript of E.S. 3par Basic Training

Page 1: E.S. 3par Basic Training

August 2021

by Carlo Curiale, SME

E.S. 3par Basic Training

Page 2: E.S. 3par Basic Training

OVERVIEW

This is an introductory training course on HP 3PAR systems currently supported by Park Place Technologies. By the end of this course, you should be able to:

Identify the different systems

Understand logs to be requested

Read and interpret hard drive logs

Read alerts generated by REM and ParkView

Determine correct hard drive part numbers

Page 3: E.S. 3par Basic Training

TABLE OF CONTENTS

• What is a 3par?

• Terminology of Important Hardware Parts and Components

• Terminology of Important Logical Components

• 3PAR Model Naming Conventions

• Replacing 3PAR Components

• Basic Logs to Request

• Hard Drive Logs to Request

• How to Read Log Output

• How to Interpret 3par Alerts

• How to Interpret ParkView Alerts

• How to Determine Drive Part Number

• Action Plan Example

Page 4: E.S. 3par Basic Training

WHAT IS A 3PAR?

A 3par is a disk-based enterprise-class storage array built around a controller/shelf design. Built with the ability to customize the level of redundancy desired by the end user, 3par arrays are highly resilient, scalable, and are built to withstand multiple component failures. The 3par brand was purchased by HP in 2010 and has since been developed into several different 3par product tiers.

7000/8000 series

10000(V-Class)

Page 5: E.S. 3par Basic Training

TERMINOLOGY OF IMPORTANT HARDWARE PARTS AND COMPONENTS

Disk

A disk, or drive, is the individual physical device where data is stored on the array. Backline will typically refer to the physical location of a disk by its cage position, written as <cage>:<magazine>:<slot> (for example, 3:4:2).

Magazine

Equivalent to a drive “sled”, a disk magazine holds disks within the drive cage. Depending on the 3Par model, a magazine holds either one or four disks.

Cage

Equivalent to a disk “shelf”, a cage contains the magazines that contain the drives for the array.

Node

Equivalent to a “controller”, nodes run the operating system that controls and manages the array. The nodes also serve as the interface between the storage array and any hosts that attach to it, and they act as an interface between enclosures. A basic 3Par array will contain at least two redundant nodes but can contain up to eight depending on the model.

Service Processor (SP)

A service processor (abbreviated as SP) is a 1U server that sits in the same rack as the 3Par. Customers can opt to use a virtual SP instead of a physical SP. The SP is responsible for monitoring the array and sending alerts and notifications for it, as well as providing a route for CLI access to the array itself. The SP is standalone: it is not required for normal array management, and it has no impact on array I/O should it fail.
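The <cage>:<magazine>:<slot> notation described under Disk can be split into its parts mechanically. The following is a minimal Python sketch; the names `CagePos` and `parse_cage_pos` are illustrative and are not part of any 3PAR tooling.

```python
from typing import NamedTuple


class CagePos(NamedTuple):
    """Physical location of a disk, as written in <cage>:<magazine>:<slot>."""
    cage: int
    magazine: int
    slot: int


def parse_cage_pos(text: str) -> CagePos:
    """Split a '<cage>:<magazine>:<slot>' string into integer fields."""
    parts = text.strip().split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a cage position: {text!r}")
    cage, magazine, slot = (int(p) for p in parts)
    return CagePos(cage, magazine, slot)


# Example from the Disk definition: 3:4:2
pos = parse_cage_pos("3:4:2")
print(f"Cage {pos.cage}, Magazine {pos.magazine}, Slot {pos.slot}")
```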

[Figure: drive magazines and cages for F-Class, T-Class and 10000(V-Class), 7000/8000 LFF, and 7000/8000 SFF systems, with the magazine, its drives (positions 0 to 3), and the cage labeled]

Page 6: E.S. 3par Basic Training

TERMINOLOGY OF IMPORTANT LOGICAL COMPONENTS

Physical Disk (PD) ID

A physical disk ID (PD ID) is a logical ID number that is dynamically assigned to each unique disk that has been installed in a 3Par. A PD ID number does not indicate any physical location of a disk within an array, unlike the cage position.

Chunklets and Logical Disks (LDs)

A chunklet is the smallest measurable unit in which data is stored on a 3Par. A single chunklet represents either 256MB or 1GB of actual data, depending on the 3Par model. Chunklets get stored and dynamically relocated around all PDs in the array and make up groups of logical disks (or LDs), which make up virtual volumes.

Virtual Volumes (VVs) and Common Provisioning Groups (CPGs)

Virtual Volumes (or VVs) are the larger blocks of storage that are connected to and used by hosts. Virtual volumes are stored within common provisioning groups (or CPGs), which are the equivalent of storage “pools” and make up the physical disk space that stores data.

Page 7: E.S. 3par Basic Training

3PAR MODEL NAMING CONVENTIONS

The model number of the 3PAR StoreServ system can give an idea of the configuration.

The first digit is the Series:

F-Class, T-Class, 10000(V-Class), 7000, 8000, 20000

Note: All 20000 series will be handled by Backline AEG as it is a new product

The second digit is the number of Nodes:

2-8, depending on the Series

(Note: A 4 or 8 Node system can be initially configured with fewer Nodes for future expansion, so a 7400 can possibly contain 2 Nodes instead of 4)

The third digit specifies the type of Drives:

0 all spinning, 4 mix of SSD and spinning, 5 all SSD

(Note: This is used more commonly on 7000/8000 and newer, not as much on F, T, and V-Class)

The fourth digit is always 0

7000 Series

These have Converged versions, indicated with a “c” at the end of the model number

8450

Model: 8000 Nodes: 4 Drive type: all SSD drives

7440c

Model: 7000 Nodes: 4 Drive type: mixed Converged

T800

Model: T-Class Nodes: 8 Drive type: all spinning drives

V800 / 10800

Model: 10000 V-Class Nodes: 8 Drive type: all spinning drives
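The naming rules on this slide can be applied mechanically. The sketch below is a hedged Python illustration of those rules only; `decode_model`, `ModelInfo`, and the regular expression are assumptions made for this example, and the result describes what the model number suggests, not what is actually installed (a 7400 may ship with 2 nodes).

```python
import re
from typing import NamedTuple

# Series prefixes as listed on the slide. "V" and "10" both mean 10000 (V-Class).
SERIES = {"F": "F-Class", "T": "T-Class", "V": "10000 (V-Class)",
          "10": "10000 (V-Class)", "7": "7000", "8": "8000", "20": "20000"}
DRIVE_TYPES = {"0": "all spinning", "4": "mix of SSD and spinning", "5": "all SSD"}

# series prefix, node digit, drive-type digit, literal 0, optional converged "c"
_MODEL_RE = re.compile(r"^(F|T|V|10|20|7|8)(\d)([045])0(C?)$")


class ModelInfo(NamedTuple):
    series: str
    nodes: int          # node count implied by the model number
    drive_type: str
    converged: bool


def decode_model(model: str) -> ModelInfo:
    """Decode a model number such as '7440c' using the slide's rules."""
    match = _MODEL_RE.match(model.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    series, nodes, drives, conv = match.groups()
    return ModelInfo(SERIES[series], int(nodes), DRIVE_TYPES[drives], conv == "C")


for name in ("8450", "7440c", "T800", "V800", "10800"):
    print(name, decode_model(name))
```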

Page 8: E.S. 3par Basic Training

REPLACING 3PAR COMPONENTS

There are few hot swappable parts on a 3PAR system. Most parts are hot pluggable and require a user to log in to the Service Processor to run pre-checks and to prepare the system for service through the CLI. The commands differ depending on the type of system being serviced. Post-checks are also done to ensure the service was completed properly. This is especially true when replacing hard drives.

F, T, and (10000)V-Class drives are hot pluggable. The T-Class and (10000)V-Class drive cage contains magazines with 4 drives in each magazine. Before the magazine can be removed to replace the failed drive, a “servicemag” command must be run to prepare the magazine.

This involves vacating data from the drives into spare space. Once the failed drive is replaced and the “servicemag” is resumed, the system relocates that data to the 4 healthy drives. There are no spare drives, just spare space dedicated on each drive.

***Best practice is to prepare and replace no more than 2 drives per visit.

7000 and 8000 series systems contain hot swappable drives and multiple drives can be replaced once the required checks are done to ensure drives are ready for replacement.

Page 9: E.S. 3par Basic Training

BASIC LOGS TO REQUEST

Zendesk Macro Logs

Basic System Info

showsys

Displays basic system info such as system type and serial number

showversion -a -b

Displays system OS levels

showinventory

Displays system installed components

showalert -n

Shows new system alerts

checkhealth

Displays basic health information of the system

Batteries

showbattery

Displays basic battery info for all installed batteries

shownodeenv

Displays system environmental conditions

Page 10: E.S. 3par Basic Training

BASIC LOGS TO REQUEST CONT.

Cage IO Modules or Degraded Cages

showcage -i

Displays cage installed components

showcage -d

Displays detailed cage information

showalert -n

Displays new system alerts

Failed Node

shownode

Displays node status

showeeprom

Displays last known boot status

showeeprom -dead <node#>

Displays last known boot status of failed node

showalert -all

Displays all system alerts

Page 11: E.S. 3par Basic Training

BASIC LOGS TO REQUEST CONT.

Power Supply

shownode -ps (for node PSU issues)

Displays node power supplies

showcage -d cage<#> (for cage PSU issues)

Displays cage power supplies

showcage -d

Request if cage# is not known

InSplore

Backline AEG will request this only if needed, because the file is very large
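The issue-to-command groupings on the Basic Logs slides can be kept as a simple lookup table. This is only a sketch in Python: the dictionary keys and the helper `logs_to_request` are hypothetical names chosen here, while the commands themselves are copied from the slides (placeholders such as `<node#>` are left for the requester to fill in).

```python
from typing import Dict, List

# Commands copied from the Basic Logs slides, grouped by issue type.
LOGS_BY_ISSUE: Dict[str, List[str]] = {
    "basic system info": ["showsys", "showversion -a -b", "showinventory",
                          "showalert -n", "checkhealth"],
    "batteries": ["showbattery", "shownodeenv"],
    "cage io modules or degraded cages": ["showcage -i", "showcage -d",
                                          "showalert -n"],
    "failed node": ["shownode", "showeeprom", "showeeprom -dead <node#>",
                    "showalert -all"],
    "power supply": ["shownode -ps", "showcage -d cage<#>"],
}


def logs_to_request(issue: str) -> List[str]:
    """Return the commands to request for an issue type (case-insensitive)."""
    key = issue.strip().lower()
    if key not in LOGS_BY_ISSUE:
        raise KeyError(f"no log list defined for issue: {issue!r}")
    return list(LOGS_BY_ISSUE[key])


for command in logs_to_request("Failed Node"):
    print(command)
```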

Page 12: E.S. 3par Basic Training

HARD DRIVE LOGS TO REQUEST

Engineering Support’s primary focus will be gathering and reading hard drive logs to create Action Plans. All other issues will go to Backline AEG once logs are received.

Drives must be in a Failed status, not Degraded, for replacement.

The required Zendesk Macro logs to request for hard drive failures are as follows:

Disk Failure

showsys
showpd -failed -degraded -i
servicemag status
showversion
showcage

showsys

Provides system information (serial number, system name, system type)

showpd -failed -degraded -i

Shows the currently failed and degraded drives, as well as the model number needed to determine the drive part number. The drive must be in a Failed status for replacement.

servicemag status

Shows if the drive has been prepped and is ready for replacement. For 7000/8000 series the system almost always automatically runs the servicemag to prep the failed drive for replacement.

Page 13: E.S. 3par Basic Training

HARD DRIVE LOGS TO REQUEST CONT.

showversion

Shows the current OS version in order to determine alternate part numbers if necessary. Always engage Backline if an alternate part is requested.

showcage

Used to determine whether the cage is LFF (Large Form Factor) or SFF (Small Form Factor). Only required for 7000 and 8000 systems.

Page 14: E.S. 3par Basic Training

HOW TO READ LOG OUTPUT

The following is a breakdown of the information once logs are received.

showsys

cli% showsys
                                                                      --------------(MB)----------------
   ID ------Name------ ----Model---- -Serial- Nodes Master ClusterLED TotalCap AllocCap FreeCap FailedCap
25148 ParkPlaceMRO7400 HPE_3PAR 7400  1625148     2      0        Off  1671168  1154048  517120         0

showpd -failed -degraded -i

cli% showpd -failed -degraded -i
Id CagePos State  ----Node_WWN---- --MFR-- -----Model------ -Serial- -FW_Rev- Protocol MediaType -----AdmissionTime-----
47 2:1:3   failed 2000B452539B56AA SEAGATE SEGLE0600GBFC15K 6SL5L89E 3P01     FC       Magnetic  2013-07-08 10:49:03 MST

Here we can see that PDID 47 has failed. The drive must be in a failed status for replacement.

The location is shown under CagePos: 2:1:3 (Cage 2, Magazine 1, Drive 3). If there is a “?” next to the failed drive, contact Backline.

Note: For 7000/8000 series the drive location will always be 0, because 3par considers the magazine location as the drive.

The model of the drive is SEGLE0600GBFC15K; this is what is used to determine the drive part number needed.

The serial number of the drive is 6SL5L89E.
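The fields called out above can be pulled from a showpd data row by position. The sketch below assumes the column order shown in the sample header (Id, CagePos, State, Node_WWN, MFR, Model, Serial, ...) and whitespace-separated fields; `FailedDisk` and `parse_showpd_row` are illustrative names, and output from other OS versions may be laid out differently.

```python
from typing import NamedTuple


class FailedDisk(NamedTuple):
    pd_id: int
    cage_pos: str
    state: str
    model: str
    serial: str


def parse_showpd_row(row: str) -> FailedDisk:
    """Parse one data row of 'showpd -failed -degraded -i' output."""
    fields = row.split()
    if len(fields) < 10:
        raise ValueError(f"unexpected showpd row: {row!r}")
    # Positions follow the sample header: 0=Id 1=CagePos 2=State 5=Model 6=Serial
    return FailedDisk(int(fields[0]), fields[1], fields[2], fields[5], fields[6])


row = ("47 2:1:3 failed 2000B452539B56AA SEAGATE SEGLE0600GBFC15K "
       "6SL5L89E 3P01 FC Magnetic 2013-07-08 10:49:03 MST")
disk = parse_showpd_row(row)

# A drive qualifies for replacement only in a failed state; a "?" in the row
# means Backline should be contacted first.
replaceable = disk.state == "failed" and "?" not in row
print(disk, replaceable)
```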

servicemag status

cli% servicemag status
No servicemag operations logged

This drive has not been vacated and is NOT ready for replacement.

Page 15: E.S. 3par Basic Training

HOW TO READ LOG OUTPUT CONT.

cli% servicemag status
Cage 14, magazine 0:
The magazine was successfully brought offline by a servicemag start command.
The command completed at Tue Jul 6 16:02:44 2021.
servicemag start -wait -pdid 272 -- Succeeded

For most F-Class, T-Class and 10000(V-Class) systems, servicemag will not be run until the customer prompts the system, or in the few cases where AEG has remote access to the system. The line servicemag start -wait -pdid <pdid#> -- Succeeded means the drive is vacated and ready to replace.

For 7000/8000 systems the servicemag will automatically be run when a drive failure occurs.
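The readiness check described above amounts to looking for a succeeded servicemag start line for the PD ID in question. A minimal Python sketch follows; `servicemag_ready` is a hypothetical helper, and it only recognizes the two output shapes shown in this training.

```python
def servicemag_ready(status_output: str, pd_id: int) -> bool:
    """Return True if the output reports a succeeded 'servicemag start' for pd_id."""
    for line in status_output.splitlines():
        line = line.strip()
        if line.startswith("servicemag start") and line.endswith("-- Succeeded"):
            tokens = line.split()
            if "-pdid" in tokens:
                idx = tokens.index("-pdid")
                if idx + 1 < len(tokens) and tokens[idx + 1] == str(pd_id):
                    return True
    return False


NOT_STARTED = "No servicemag operations logged"
COMPLETED = """Cage 14, magazine 0:
The magazine was successfully brought offline by a servicemag start command.
The command completed at Tue Jul 6 16:02:44 2021.
servicemag start -wait -pdid 272 -- Succeeded"""

print(servicemag_ready(NOT_STARTED, 47))   # drive has not been vacated
print(servicemag_ready(COMPLETED, 272))    # drive is vacated and ready
```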

showversion

cli% showversion
Release version 3.2.2 (MU6)
Patches: P99,P107,P119,P131,P135,P139,P149,P154,P160,P162,P165,P167

Component Name    Version
CLI Server        3.2.2 (P165)
CLI Client        3.2.2
System Manager    3.2.2 (P165)
Kernel            3.2.2 (MU6)
TPD Kernel Code   3.2.2 (MU6)
TPD Kernel Patch  3.2.2 (P165)

This output is used only when an alternate drive part number is requested. Those cases must move to Backline to determine compatibility.

Page 16: E.S. 3par Basic Training

HOW TO READ LOG OUTPUT CONT.

showcage

cli% showcage
Id Name  LoopA Pos.A LoopB Pos.B Drives Temp  RevA RevB Model FormFactor
 0 cage0 1:0:1     0 0:0:1     0      4 26-27 4082 4082 DCN1  SFF

Use the cage number containing the failed disk, as provided in the showpd -failed -degraded -i output, to determine whether the drive needed is LFF (Large Form Factor) or SFF (Small Form Factor).

Example: The drive in 0:3:0 (cage 0, mag 3, slot 0) is failed, so we search the output from the command to determine the form factor of cage 0. Remember, the drive slot will always be 0 in 7000 and 8000 series systems.
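The lookup in this example can be sketched in Python as follows. The helper `cage_form_factor` is hypothetical, and it assumes only that each showcage data row starts with the cage Id and ends with the FormFactor value, as in the sample output on this slide.

```python
def cage_form_factor(showcage_output: str, cage_id: int) -> str:
    """Return 'SFF' or 'LFF' for the given cage Id from showcage output."""
    for line in showcage_output.splitlines():
        fields = line.split()
        # Data rows start with the cage Id and end with the form factor.
        if len(fields) >= 2 and fields[0] == str(cage_id) and fields[-1] in ("SFF", "LFF"):
            return fields[-1]
    raise LookupError(f"cage {cage_id} not found in showcage output")


SHOWCAGE = """Id Name LoopA Pos.A LoopB Pos.B Drives Temp RevA RevB Model FormFactor
0 cage0 1:0:1 0 0:0:1 0 4 26-27 4082 4082 DCN1 SFF"""

failed_cage_pos = "0:3:0"                      # from showpd -failed -degraded -i
cage_id = int(failed_cage_pos.split(":")[0])   # the first field is the cage
print(cage_form_factor(SHOWCAGE, cage_id))
```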

Page 17: E.S. 3par Basic Training

HOW TO INTERPRET 3PAR ALERTS

The alert shown was generated from the 3par system. The information highlighted is only enough to state when the alert was generated and the physical location of the failed drive.

There is not enough information to tell us the drive type or whether the drive is ready to be replaced.

The hard drive logs provided by the customer, or a ParkView alert, are therefore needed to determine the drive part number and the state of the drive.

Alert example

1203986: 3PAR-1203986-V:(Major)PD213|comp_sw_cage_sled285885908123649 COMP_STA
Customer notification from HP SP12927 Realtime Alert Process

Notification id: P21537
Notify time: 2020/03/29 01:04:40.00 (User, -0600 MDT)
Installed machine: 3PAR INSERV 1203986
Site: 1, Customer

Event urgency: alert
Event count: 1
Event location: Site
Event time: 2020/03/29 00:04:40.00 (-0500 CDT)
Event description: 3PAR INSERV Component state change

Abstract:
(Major)PD213|comp_sw_cage_sled285885908123649 COMP_STATE failed. Physical Disk v

Text:
Event id: 86388062 Node 1 Cust Alert - Yes, Svc Alert - Yes Severity: Major

Event time: Sun Mar 29 00:04:40 2020
Event type: Component state change Alert ID: 998 MsgID: 600fa
Component: Physical Disk 119 Magazine 285885908123649
Short Dsc: Magazine 3:0:3, Physical Disk 119 Failed
Event String: Magazine 3:0:3, Physical Disk 119 Failed (Vacated {0x45}, InvalidMedia {0x98}, Failed Hardware {0x99})

TPD level for InServ 1203986 is 3.1.2.592
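The two facts this alert does provide, the physical location and the PD ID, sit on the "Short Dsc" line. A hedged Python sketch of extracting them is shown below; the regular expression is written against this one sample alert, and `parse_3par_alert` is an illustrative name.

```python
import re
from typing import Optional, Tuple

# Matches text such as "Magazine 3:0:3, Physical Disk 119 Failed"
_ALERT_RE = re.compile(r"Magazine (\d+:\d+:\d+), Physical Disk (\d+) Failed")


def parse_3par_alert(text: str) -> Optional[Tuple[str, int]]:
    """Return (cage position, PD ID) from a 3par alert, or None if not found."""
    match = _ALERT_RE.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


ALERT_EXCERPT = """Component: Physical Disk 119 Magazine 285885908123649
Short Dsc: Magazine 3:0:3, Physical Disk 119 Failed"""

print(parse_3par_alert(ALERT_EXCERPT))
```

The drive model and servicemag state are not in the alert, so the hard drive logs are still required before an Action Plan can be written.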

Page 18: E.S. 3par Basic Training

HOW TO INTERPRET PARKVIEW ALERTS

The alert shown was generated by ParkView. The alert contains enough information to determine the drive location, the model of the drive and the drive failed status.

The drive PDID is highlighted at the end of the PATROL Object ID line to show how to locate it.

With this information it is not necessary to ask the customer for logs. Once the drive part number is identified, the Action Plan can be created.

Authorized by: user bu_centralpark_prod
Creation Date: 2021-07-20 1:45:47 PM
Physical Disk problem on 192.168.2.152 with 3:1:0 (HITACHI - 900 GB). This physical disk is in critical/unrecoverable state. Reported status: Error.

Hardware Health Report (Tue Jul 20 13:42:42 2021)
======================

Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
On Host : cd7d9d33-4f0a-e711-ad88-00155d059602 (192.168.2.152)
On TrueSight Device: cd7d9d33-4f0a-e711-ad88-00155d059602
PATROL Object ID : /MS_HW_PHYSICALDISK/MS_HW_HP3PARhdfcd7d9d33-4f0a-e711-ad88-00155d059602_5000CCA0579897B3-id-56
Internal Device ID : 5000CCA0579897B3-id-56
Connector Used : MS_HW_HP3PAR.hdf
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Size : 900 GB

This Object Is Attached to:

Storage: cage3 (HP M6710/7000 Encl)
Type: Enclosure
Serial Number: ECMCBA1TF3U78L
Alternative Part Number: QR490A
Serial Number: MXN3045188

cd7d9d33-4f0a-e711-ad88-00155d059602

============================================================
Parameter: Status (Currently in ALARM State)
------------------------------------------------------------
Current Value: 2 (Failed) - Error
Unit : 0 = OK ; 1 = Degraded ; 2 = Failed
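Because the ParkView alert is a set of "Label : value" lines, the location, PD ID, model, and status can be read out with a few patterns. The following Python sketch assumes one field per line, as in the excerpt embedded in the code; `parse_parkview_alert` and the pattern names are hypothetical and are not part of ParkView.

```python
import re
from typing import Dict, Optional

SAMPLE_ALERT = """Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
Internal Device ID : 5000CCA0579897B3-id-56
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Current Value: 2 (Failed) - Error
"""

# Status codes as listed on the alert's "Unit" line.
STATUS_NAMES = {0: "OK", 1: "Degraded", 2: "Failed"}

_PATTERNS = {
    "cage_pos": r"^Monitored Object\s*:\s*(\d+:\d+:\d+)",
    "pd_id": r"^Internal Device ID\s*:\s*\S+-id-(\d+)",  # PDID follows "-id-"
    "model": r"^Model\s*:\s*(\S+)",
    "serial": r"^Serial Number\s*:\s*(\S+)",             # first match is the disk
    "status": r"^Current Value\s*:\s*(\d+)",
}


def parse_parkview_alert(text: str) -> Dict[str, Optional[str]]:
    """Extract the fields needed for an Action Plan from a ParkView alert."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        result[name] = match.group(1) if match else None
    if result["status"] is not None:
        result["status"] = STATUS_NAMES.get(int(result["status"]), "Unknown")
    return result


print(parse_parkview_alert(SAMPLE_ALERT))
```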

Page 19: E.S. 3par Basic Training

HOW TO DETERMINE DRIVE PART NUMBER

Using the provided log output or the ParkView alert, locate and copy the drive model number.

cli% showpd -failed -degraded -i
Id CagePos State  ----Node_WWN---- --MFR-- -----Model------ -Serial- -FW_Rev- Protocol MediaType -----AdmissionTime-----
56 3:1:0   failed 2000B452539B56AA HITACHI HCBRE0900GBAS10K KXJPXJMX 3P01     FC       Magnetic  2013-07-08 10:49:03 MST

ParkView Alert

Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
On Host : cd7d9d33-4f0a-e711-ad88-00155d059602 (192.168.2.152)
On TrueSight Device: cd7d9d33-4f0a-e711-ad88-00155d059602
PATROL Object ID : /MS_HW_PHYSICALDISK/MS_HW_HP3PARhdfcd7d9d33-4f0a-e711-ad88-00155d059602_5000CCA0579897B3-id-56
Internal Device ID : 5000CCA0579897B3-id-56
Connector Used : MS_HW_HP3PAR.hdf
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Size : 900 GB

Use the {3PAR} Drive Parts Matrix for Engineering Support - KB2200011 to locate the correct drive.

Page 20: E.S. 3par Basic Training

HOW TO DETERMINE DRIVE PART NUMBER CONT.

Navigate to the correct system type in the Drive Parts Matrix for Engineering Support, then use the drive model number copied from the logs or ParkView alert. Here we can see the required drive part number for HCBRE0900GBAS10K is 697389-001.

It is important to verify the drive model is being looked up under the correct system tab as the same drive model number can be used for other 3par systems, but the drives will not be compatible among the different systems. Using drives from different systems can cause some systems to crash.

It is also important to note whether the cage is SFF or LFF. As we can see with drive model HSSC0480S5xnNMRI, it is offered in both versions. The showcage log requested will state the form factor of the cage containing the failed drive.

Lastly, drives are replaced like for like. If the model number is not listed in the Drive Parts Matrix for Engineering Support, or if there are questions about using an alternate part number, Backline must be engaged to determine compatibility.
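The lookup rules above (match on system tab and model number, and escalate when there is no like-for-like match) can be sketched as a keyed table. This Python example is illustrative only: the real data lives in the Drive Parts Matrix (KB2200011), the two entries below are the ones quoted in this training, and the "7000" tab for HCBRE0900GBAS10K is an assumption inferred from the HP M6710/7000 enclosure in the ParkView example. Form factor is left out of the key for brevity.

```python
from typing import Dict, Optional, Tuple

# (system tab, drive model) -> part number. Sample entries only.
DRIVE_PARTS: Dict[Tuple[str, str], str] = {
    ("10000 (V-Class)", "SEGLE0600GBFC15K"): "657889-001",
    ("7000", "HCBRE0900GBAS10K"): "697389-001",  # tab name is an assumption
}


def lookup_part(system_tab: str, model: str) -> Optional[str]:
    """Return the like-for-like part number, or None if Backline must be engaged."""
    return DRIVE_PARTS.get((system_tab, model))


part = lookup_part("10000 (V-Class)", "SEGLE0600GBFC15K")
print(part if part else "Not in matrix: engage Backline")

# The same model under a different system tab is not treated as a match.
print(lookup_part("F-Class", "SEGLE0600GBFC15K"))
```

Keying on both the system tab and the model number reflects the warning that the same drive model can appear under several systems without being interchangeable.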

Page 21: E.S. 3par Basic Training

ACTION PLAN EXAMPLE

HP 3Par T and V(10000) Class Failed Disk Drive - Action Plan

***** ACTION PLAN *****
Created by Team: Engineering Support

*** Current Issue ***
Failed Disk Drive ID 236 4:7:3

Evidence of the issue:
Customer provided logs

*** Resolution steps ***

Parts needed: Qty=1 657889-001

Detailed steps to resolve:
Replace Failed Physical Disk ID 236 Cage 4 Mag 7 Disk 3 SN 6SL5L89E

If there are MULTIPLE disk failures, please contact PPT 3Par support for review before running any servicemags (or advising the customer to start any servicemags)

1. Confirm with Customer the servicemag has been run and completed, and the disk is ready for replacement.

**Servicemag MUST be run and confirmed completed prior to disk replacement**

To check the status to see if a servicemag process has been initiated:

servicemag status

Have customer run the following commands to initiate the servicemag procedure if not done so already (this process can take multiple hours to complete):

servicemag start -pdid 236

Using the output gathered, we can then create the Action Plan. Completing the AP template consists of filling in the blanks with this information.

From the logs or alert we have determined the failed drive is on a 10000(V-Class).

The failed drive PDID is 236

The location of the failure is 4:7:3

The model number is SEGLE0600GBFC15K

The serial number is 6SL5L89E

Servicemag has not been run

Part number is 657889-001 as determined using the model number and searching the 10000(V-Class) tab in the Drive Parts Matrix for Engineering Support
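Filling in the blanks of the template can be illustrated with a short Python sketch. The function `action_plan_summary` is hypothetical and reproduces only the issue and parts lines of the example Action Plan, using the values listed above.

```python
def action_plan_summary(pd_id: int, cage_pos: str, serial: str, part_number: str) -> str:
    """Fill the issue and parts lines of the Action Plan template."""
    cage, mag, disk = cage_pos.split(":")
    return "\n".join([
        "*** Current Issue ***",
        f"Failed Disk Drive ID {pd_id} {cage_pos}",
        "",
        "*** Resolution steps ***",
        f"Parts needed: Qty=1 {part_number}",
        "",
        "Detailed steps to resolve:",
        f"Replace Failed Physical Disk ID {pd_id} Cage {cage} Mag {mag} "
        f"Disk {disk} SN {serial}",
    ])


plan_text = action_plan_summary(236, "4:7:3", "6SL5L89E", "657889-001")
print(plan_text)
```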

Page 22: E.S. 3par Basic Training

ACTION PLAN EXAMPLE CONT.

To monitor the status and confirm the servicemag has completed successfully:

servicemag status -d

2. Once parts are available and disk is ready, FE schedules access.
3. Bring internet capable laptop with USB-serial dongle
4. Connect to the Service Processor using the red crossover cable into the laptop NIC port and INT port on the SP
5. Set laptop IP to: 10.255.155.49 Subnet: 255.255.255.248
6. Browse to: 10.255.155.54
7. Login with username: spvar
8. Passwords are version dependent (contact support)
9. On SPOCC homepage, click on “Support” in left hand column
10. Under “Action” click on “Guided Maintenance”
11. Under “Drive Cage” click on “Disk Drive”
12. Follow prompts to replace the disk and initiate the resume rebuild
13. If the initial output shows more than one failed disk, contact support
14. The original PD# will still show as failed until the rebuild completes; it will then be automatically removed from the system reporting console.

Special instructions: Servicemag MUST be run and confirmed completed prior to disk replacement

Supporting documents:

3Par - Connecting to the Service Processor
http://centralpark/Service%20Delivery/Knowledge%20Base/3Par/Service%20Guides/3Par%20-%20Connecting%20to%20the%20Service%20Processor.pdf

3Par T and V-Class FE Reference Slides - Disks
http://centralpark/Service%20Delivery/Knowledge%20Base/3Par/Service%20Guides/3Par%20T%20and%20V-Class%20FE%20Reference%20Slides%20-%20Disks.pdf

*** Timeline ***

Next steps: ES to assign FE, FE to order part and schedule with the customer
Next owner: Engineering Support

***** END OF ACTION PLAN *****
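The laptop settings in steps 5 and 6 of the Action Plan work because both addresses fall inside the same small subnet. As a quick sanity check, the standard-library `ipaddress` module in Python can confirm this; the variable names below are illustrative.

```python
import ipaddress

# Step 5: laptop address and subnet mask. Step 6: address browsed to on the SP.
laptop = ipaddress.ip_interface("10.255.155.49/255.255.255.248")
sp_address = ipaddress.ip_address("10.255.155.54")

network = laptop.network
print(network)                    # the subnet the laptop belongs to
print(sp_address in network)      # whether the SP is directly reachable
print([str(host) for host in network.hosts()])
```

A 255.255.255.248 mask leaves six usable host addresses, so a mistyped laptop IP outside that range would not reach the SP over the crossover cable.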

Page 23: E.S. 3par Basic Training

Thank You

Page 24: E.S. 3par Basic Training

KB LINK IN ZENDESK GUIDE

• https://parkplacetechhelp.zendesk.com/hc/en-us/articles/4404540969113