Basic Storage Networking Technology Certified Storage Engineer (SCSE, S10-201)
SNIA Certified Storage Engineer (SCSE) book / study guide (S10-201)
Michael Boelen - rootkit.nl
Usage notes
Last updated 23 June 2009
Goal: Provide a study guide for SNIA SCSE, S10-201
Audience: Storage administrators and architects
License: Creative Commons license
Notes: All information in this study guide is collected from books and internet sources. Although terms and data were checked, information can be incorrect or missing.
This book is a guide that collects information about SAN and NAS technology and serves as a preparation guide for the S10-201 exam.
This book is a work in progress. Suggestions or input are appreciated (Contact form).
Progress:
Stage 1: Initial writing 100%
Stage 2: Markup 70%
Stage 3: Extend information 1%
1. Explain and recognize basic Storage Networking Technology Components and Concepts (9%)
1.1 Compare and contrast how the disk technologies of Fibre Channel, ATA, SATA, SCSI, and SAS operate
ATA (IDE)
Also known as parallel ATA (PATA)
8- or 16-bit interface
Maximum theoretical speed 100 MB/s (ATA-6)
Fibre Channel:
A 24-bit address (FCID) consists of the following 3 parts (in order): Domain (1-239), Area (0-255) and Port (the node address; for loop devices this is the AL_PA). Each part is 8 bits: an 8-bit Domain ID, an 8-bit Area ID and an 8-bit Port ID.
• Domain - The domain is a unique number assigned to each switch in a logical fabric. A domain ID assigned to a switch can range from 1 to 239. This number comprises the first 8 bits of the FCID.
• Area - The 8-bit area field is assigned by the switch as well. It can range from 0 to 255. In some third-party switches this number is assigned by using the physical port number (that is, port 3 out of 16 ports), limiting availability on some operating systems. The Cisco MDS assigns these sequentially regardless of the physical port number.
• Port - The port field is also 8 bits, ranging from 0 to 255. This field is unique in that it is also used to assign the arbitrated loop physical address (AL_PA) for devices that use loop. For a device that is not using arbitrated loop, it is common to see the field set to 0, although this is not required.
http://www.cisco.com/en/US/prod/collateral/ps4159/ps6409/ps4358/prod_white_paper0900aecd80285738_ns512_Networking_Solutions_White_Paper.html
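To make the layout concrete, here is a minimal Python sketch that splits a 24-bit FCID into its three 8-bit fields and checks the switch domain range. The function names and the sample FCID 0x0A0300 are illustrative assumptions, not part of any switch API.

```python
def split_fcid(fcid: int) -> tuple[int, int, int]:
    """Split a 24-bit Fibre Channel ID into (domain, area, port)."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 24 bits")
    domain = (fcid >> 16) & 0xFF  # first (most significant) 8 bits
    area = (fcid >> 8) & 0xFF     # middle 8 bits
    port = fcid & 0xFF            # last 8 bits (AL_PA for loop devices)
    return domain, area, port


def is_valid_switch_domain(domain: int) -> bool:
    """Switch domain IDs are limited to 1-239."""
    return 1 <= domain <= 239


# Hypothetical FCID: domain 10, area 3, port 0 (a device not using loop)
print(split_fcid(0x0A0300))
```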
SAS (Serial Attached SCSI):
Max 128 devices (first generation), max 256 devices (second generation)
Max 3 Gb/s, will be 6 Gb/s in the near future
Hot-pluggable
SAS devices can communicate with both SATA and SCSI devices (SAS backplanes are compatible with SATA devices). A specific difference between SCSI and SAS devices is that SAS devices add two data ports, each of which resides in a different SAS domain. This enables redundancy (failover): if one path fails, there is still communication along a separate and independent path.
Study Guide / Book for SNIA Certified Storage Engineer (SCSE, S10-201) http://www.rootkit.nl/files/book_snia_certified_storage_engineer_s10-...
1 sur 17 14/01/2012 18:37
SATA (Serial ATA):
Serial link
Current standard maximum speed: 6 Gbit/s
Most disks currently can't saturate even 1.5 Gbit/s
Supports Native Command Queuing (NCQ) to reorder incoming commands
7-pin connector for data, 15-pin connector for power
When converting SAS to SATA, use an adapter or cable.
Example: http://www.cs-electronics.com/sas-products.htm
SCSI
Parallel
Up to 320 MB/s (Ultra-320 SCSI) or even 640 MB/s (Ultra-640 SCSI)
Define differences between serial and parallel approaches within a configuration
PATA: Master/Slave, shared bus
SATA: Serial ATA, point-to-point topology, no shared bus
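Parallel technologies have disadvantages like skew (bits sent at the same time don't arrive at the same time). Serial approaches send the bits one after another over a single lane, which avoids the skew issues of parallel solutions. They often use 8b/10b encoding (every 8 data bits are sent as a 10-bit symbol); the 2 extra bits are used for:
Clock recovery
DC balance
Special characters (e.g. for frame delimiting and word alignment)
Error detection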
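The 2 extra bits cost bandwidth: only 8 of every 10 transmitted bits carry data, so the usable rate is 80% of the line rate. A small worked example in Python for the SATA line rates; it ignores frame and protocol overhead, so real throughput is somewhat lower.

```python
def payload_mb_per_s(line_rate_gbit: float) -> float:
    """Usable data rate (MB/s) of a serial link that uses 8b/10b encoding.

    Every 8 data bits travel as a 10-bit symbol, so 80% of the line rate
    carries data. Framing and protocol overhead are not included.
    """
    data_bits_per_s = line_rate_gbit * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> megabytes


for rate in (1.5, 3.0, 6.0):  # SATA line rates in Gbit/s
    print(f"{rate} Gbit/s -> {payload_mb_per_s(rate):.0f} MB/s")
```

This is why a 1.5 Gbit/s SATA link is usually quoted as 150 MB/s rather than 187.5 MB/s.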
SAS expander: forwards (routes) connection requests between SAS devices
http://www.freebsd.org/doc/en/articles/storage-devices/scsi.html
http://www.storagereview.com/articles/200406/20040625TCQ_1.html?page=0%2C4
http://support.dell.com/support/edocs/storage/p62517/en/chapterb.htm
Related terms
Tagged Command Queuing (TCQ)
Technology built into some ATA and SCSI hard drives. It allows the operating system to queue up multiple read and write requests to a hard drive at the same time. This lets the drive optimize the order in which it executes read and write commands, without the operating system having to take care of the ordering.
SCSI tagged command queuing (TCQ) applies to the device, device controller, firmware and device driver.
Native Command Queuing (NCQ)
A more intelligent queuing mechanism than TCQ. It works by incorporating queuing into the disk, device controller, firmware and device driver (operating system). All these parts work together to achieve maximum efficiency.
See NCQ: http://www.wdc.com/en/library/sata/2579-001076.pdf
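The reordering that TCQ and NCQ make possible can be illustrated with a toy elevator (SCAN) scheduler in Python. It is a sketch under simplifying assumptions: LBA distance stands in for seek time, rotational position (which real drive firmware also weighs) is ignored, and the request list is made up.

```python
def elevator_order(head_lba: int, queued_lbas: list[int]) -> list[int]:
    """Order queued requests like a simple elevator (SCAN) scheduler.

    Requests at or above the head position are served going up; the rest
    are served on the way back down.
    """
    upward = sorted(lba for lba in queued_lbas if lba >= head_lba)
    downward = sorted((lba for lba in queued_lbas if lba < head_lba), reverse=True)
    return upward + downward


def total_seek(head_lba: int, order: list[int]) -> int:
    """Sum of LBA distances travelled when serving requests in this order."""
    distance, position = 0, head_lba
    for lba in order:
        distance += abs(lba - position)
        position = lba
    return distance


arrival = [900, 100, 550, 120, 870]  # order in which the host issued requests
head = 500
queued = elevator_order(head, arrival)
print(queued, total_seek(head, arrival), total_seek(head, queued))
```

Served in arrival order the head travels 2830 blocks; reordered it travels 1200, which is the kind of saving the drive can only make when it sees several commands at once.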
1.2 Describe Array Technology/Virtualization
Goal: hiding the real disks from the application.
Virtualization can be implemented at several layers, including:
Host: Application, HBA, OS
Network: Switch, Router, Gateway
Storage: Array, Library, Device
File/Record virtualization: one or more objects are visible as one
File system virtualization: combining multiple data sources into one big chunk
Tape media: better utilization of tape drives
Benefits of virtualization:
Backup & Restore
Clustering
Snapshots
Replication
Migration
Transformation
Caching
Security
Quality of Storage Services & Policies
Pooling
Describe virtualization implementation techniques and management strategies (e.g., in-band and out-of-band)
host-based:
storage-based: mainly used for segmentation and security. Segmentation/virtualization helps when performing upgrades, migrating data, etc.
Switch-based virtualization (in-band / out-of-band):
in-band: control and data travel the same path. Benefits are easier installation (no specific software required), offloading, and the possibility of performance optimizations in the data path.
out-of-band: control and data have their own paths
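Whichever variant is used, the virtualization layer keeps a map from virtual blocks to physical blocks. In-band, the appliance applies this map to every I/O in the data path; out-of-band, a separate metadata server hands the map to host-side agents, which then send I/O directly to the storage. A minimal Python sketch of such a map, assuming plain concatenation of extents; the LUN names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    lun: str     # physical LUN name (hypothetical)
    start: int   # first physical block used on that LUN
    length: int  # number of blocks in this extent


class VirtualVolume:
    """Concatenates physical extents into one virtual block address space."""

    def __init__(self, extents: list[Extent]):
        self.extents = extents

    def resolve(self, virtual_block: int) -> tuple[str, int]:
        """Translate a virtual block number to (physical LUN, physical block)."""
        if virtual_block < 0:
            raise IndexError("negative block number")
        offset = virtual_block
        for ext in self.extents:
            if offset < ext.length:
                return ext.lun, ext.start + offset
            offset -= ext.length
        raise IndexError("block beyond end of virtual volume")


vol = VirtualVolume([Extent("array1_lun5", 1000, 500), Extent("array2_lun2", 0, 300)])
print(vol.resolve(10), vol.resolve(600))
```

The application only ever sees one 800-block volume; migrating an extent to another array means changing the map, not the host.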
1.3 Define SAS and SATA technology
See 1.1.
SATA: uses Native Command Queuing.
See http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1131788,00.html
SAS devices cannot plug into SATA controllers
SATA devices can plug into SAS controllers
Identify a legal vs. illegal SAS topology layout
Legal topology:
Directly attached to initiator
Attached to expander
Illegal topology:
More than one fan-out expander per SAS domain
Explain the routing mechanism that occurs in a SAS expander topology
Direct routing: SAS host to directly attached devices
Table routing: SAS host to other expander devices
Subtractive routing: forwards unresolved connection requests when neither direct nor table routing succeeds
Fan-out expanders
Never use subtractive routing, but table routing instead. Fan-out expanders usually have a bigger routing table.
Maximum of one fan-out expander in a SAS domain
Often at the top of the chain
Edge expanders
May use subtractive routing.
Subtractive routing happens upstream (to other expanders) and direct routing downstream.
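The routing decision described above can be sketched in Python as a lookup that tries direct routing, then table routing, then subtractive routing. This is a simplified model for study purposes, not a real SAS implementation; the device names and phy numbers are hypothetical.

```python
from typing import Optional


class Expander:
    """Simplified SAS expander routing decision (not a real SAS implementation)."""

    def __init__(self, direct: dict[str, int], table: dict[str, int],
                 subtractive_phy: Optional[int] = None):
        self.direct = direct                    # SAS address -> phy of an attached end device
        self.table = table                      # SAS address -> phy leading to another expander
        self.subtractive_phy = subtractive_phy  # upstream phy (edge expanders only)

    def route(self, sas_address: str) -> tuple[str, int]:
        """Return (routing method, outgoing phy) for a connection request."""
        if sas_address in self.direct:
            return "direct", self.direct[sas_address]
        if sas_address in self.table:
            return "table", self.table[sas_address]
        if self.subtractive_phy is not None:
            return "subtractive", self.subtractive_phy  # unresolved: send upstream
        raise LookupError(f"no route to {sas_address}")


edge = Expander(direct={"disk_a": 1, "disk_b": 2}, table={}, subtractive_phy=8)
fan_out = Expander(direct={}, table={"disk_a": 0, "disk_c": 1})  # no subtractive phy
print(edge.route("disk_a"), edge.route("disk_x"), fan_out.route("disk_c"))
```

The fan-out expander has no subtractive phy, so an address missing from its table is simply unreachable; that is why it needs the bigger routing table.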
2. Perform Storage Networking Administration (24%)
2.1 Optimize redundancy within a switched environment; adapt to changing needs and demands
Use multipathing software that supports both load balancing and path failover. Red Hat Linux (and others as well) has device-mapper multipath, Solaris 10 has MPxIO and IRIX has XVM. Another benefit is that firmware can be upgraded without disrupting service: use multiple paths to a target and temporarily disable one path while it is upgraded.
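The two functions of multipathing software, load balancing and failover, can be sketched as a round-robin path selector that skips failed paths. This Python model only illustrates the concept; it is not how device-mapper multipath or MPxIO are implemented, and the path names are hypothetical.

```python
import itertools


class MultipathDevice:
    """Round-robin load balancing over healthy paths, with failover (a sketch)."""

    def __init__(self, paths: list[str]):
        self.paths = list(paths)
        self.failed: set[str] = set()
        self._cycle = itertools.cycle(self.paths)

    def fail(self, path: str) -> None:
        """Mark a path down, e.g. after an error or to upgrade its firmware."""
        self.failed.add(path)

    def restore(self, path: str) -> None:
        self.failed.discard(path)

    def next_path(self) -> str:
        """Return the next healthy path, checking each path at most once."""
        for _ in range(len(self.paths)):
            path = next(self._cycle)
            if path not in self.failed:
                return path
        raise RuntimeError("all paths failed")


dev = MultipathDevice(["hba0->ctrlA", "hba1->ctrlB"])
print([dev.next_path() for _ in range(4)])  # alternates between both paths
dev.fail("hba0->ctrlA")                      # e.g. during a firmware upgrade
print([dev.next_path() for _ in range(2)])  # only hba1->ctrlB is used
```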
2.2 Explain HBA configuration parameters; justify the reasons for each parameter setting
QueueDepth
If the number of outstanding I/Os per device is expected to be above 32, QueueDepth needs to be increased. The storage and/or HBA vendor usually has documents describing how to adjust the value and how to measure which value gives the best performance. A common starting point is to divide the storage array's total queue length by the number of HBAs. If QueueDepth is undersized, performance can degrade due to Storport throttling of its device queue.
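The division rule can be written as a tiny worked calculation. This Python sketch assumes the worst case (every HBA keeps every LUN fully busy at the same time) and a hypothetical array port queue of 2048 entries; the real figure comes from the array vendor's documentation.

```python
def queue_depth_per_lun(array_port_queue: int, hba_count: int, luns_per_hba: int = 1) -> int:
    """Rule-of-thumb queue depth so that all HBAs together cannot overrun the array port.

    Assumes every HBA keeps every LUN fully busy at the same time (worst case).
    Rounds down, since a fraction of a queue slot is useless.
    """
    if hba_count < 1 or luns_per_hba < 1:
        raise ValueError("need at least one HBA and one LUN")
    return array_port_queue // (hba_count * luns_per_hba)


# Hypothetical array port with 2048 queue entries shared by 16 HBAs
print(queue_depth_per_lun(2048, 16))      # one LUN per HBA
print(queue_depth_per_lun(2048, 16, 4))   # four LUNs per HBA
```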
I/O coalesce
I/O coalescing controls the number of CPU interrupts, for more efficient CPU utilization. Turn on the I/O coalesce parameter in high-performance environments. However, when adjusting the related parameters it's important to find the most suitable values: reducing the number of interrupts too far can cause poor performance (higher latency). The right values depend mainly on the workload.
CoalesceMsCnt is the wait time in milliseconds, CoalesceRspCnt is the count of pending responses.
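The effect of the two parameters can be explored with a simplified model in Python. It assumes an interrupt is raised when either the response count reaches CoalesceRspCnt or CoalesceMsCnt milliseconds have passed since the oldest pending response, whichever comes first; this is an assumption for illustration, not the HBA driver's actual algorithm, and the arrival times are made up.

```python
def count_interrupts(arrivals_ms: list[float], ms_cnt: float, rsp_cnt: int) -> int:
    """Count interrupts under a simplified coalescing model.

    An interrupt fires when rsp_cnt responses are pending, or when ms_cnt
    milliseconds have passed since the oldest pending response.
    ms_cnt=0 or rsp_cnt=1 means one interrupt per response (no coalescing).
    """
    interrupts, pending, first_pending = 0, 0, 0.0
    for t in sorted(arrivals_ms):
        if pending and t - first_pending >= ms_cnt:
            interrupts += 1      # the timer expired before this response arrived
            pending = 0
        if pending == 0:
            first_pending = t
        pending += 1
        if pending >= rsp_cnt:
            interrupts += 1      # the response-count threshold was reached
            pending = 0
    if pending:
        interrupts += 1          # the timer eventually flushes the remainder
    return interrupts


burst = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]  # response arrival times in ms
print(count_interrupts(burst, ms_cnt=0, rsp_cnt=1), count_interrupts(burst, ms_cnt=1, rsp_cnt=4))
```

Six responses cause six interrupts without coalescing and two with it; the price is that the responses at 5.0 and 5.1 ms wait up to 1 ms before the CPU sees them.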
Parameter              Abbr.  Range                       Notes
ConnectionOption       CO     0-3                         See note 1 below
DataRate               DR     0-3                         See note 2 below
FrameSize              FR     512, 1024, 2048
HardLoopID             HD     0-125
ResetDelay             RD     0-255
EnableBIOS             EB     0, 1                        See notes 3, 6 below
EnableHardLoopID       HL     0, 1                        See note 3 below
EnableFCPErrRecovery   EF     0, 1                        See note 3 below
ExecutionThrottle      ET     1-65535                     See note 5 below
EnableExtendedLogging  EL     0, 1                        See notes 3, 4 below
LoginReTryCount        LR     0-255
EnableLipReset         LP     0, 1                        See note 5 below
PortDownRetryCount     PD     0-255
EnableLIPFullLogin     FL     0, 1                        See note 3 below
LinkDownTimeOut        LT     0-240
EnableTargetReset      TR     0, 1                        See notes 3, 5 below
MaximumLUNsPerTarget   ML     0, 8, 16, 32, 64, 128, 256  See note 5 below
LinkDownError          LD     0, 1                        See notes 3, 5 below
FastErrorReporting     FE     0, 1                        See notes 3, 5 below
Parameter                       QLogic default  EMC-approved setting
Data Rate                       0 (1 Gb/s)      2 (AutoSelect)
Execution Throttle              16              256
Connection options (topology)   2               2
Loop Reset Delay                5               5
Enable LIP Full Login           Yes             Yes
Enable Target Reset             No              Yes
Port Down Retry Count           8               45
Link Down Timeout               30              45
LUNs Per Target                 8               256
Adapter Hard Loop ID            Enabled         Disabled
Hard Loop ID                    0               0
Descending Search LoopID        0               1
Operation Mode                  0               0
Interrupt Delay Timer           0               0
Enable Interrupt (24xx HBAs)    No              No
(Connection option 2 = Loop preferred, otherwise point-to-point)
Execution Throttle: specifies the maximum number of I/O commands allowed to execute on an HBA port. When a port's execution throttle is reached, no new commands are executed until the current command finishes. Default 256; range 1-256; OS: Windows.
Frame Size: specifies the size of a Fibre Channel frame per I/O. Default 2048; range 512-2048; OS: All.
Fibre Channel Data Rate: specifies the HBA adapter data rate. When set to Auto, the adapter auto-negotiates the data rate with the connecting SAN device. Default Auto; values 1 (Auto), 2 (1Gb), 3 (2Gb), 4 (4Gb); OS: All.
Maximum Queue Depth: specifies the maximum number of I/O commands allowed to execute/queue on an HBA port. Default 32; range 1-65535; OS: VMware ESX.
Maximum Scatter Gather List Size: specifies the size of the list of DMA items that are reported to the SCSI mid-level per I/O request. Default 32; range 1-255; OS: VMware ESX.
Maximum Sectors: specifies the maximum number of disk sectors that are reported to the SCSI mid-level per I/O request. Default 512; values 512, 1024, 2048; OS: VMware ESX.
2.3 Define troubleshooting methodologies and tools within scenarios
SAN zoning problems cause the majority of issues. Common problems are:
Missing targets from the host zone
Host zone configured to see the wrong targets
Incorrect WWN alias(es) resulting from new or replaced hardware
New zone(s) not added to the active configuration
Switch zoning modifications are the most common change in a SAN, which explains the increased chance of mistakes. Also, zoning cannot be fully automated, since it requires human decisions about which initiators may access which targets.
Host HBA issues occur almost as frequently as SAN zoning problems.
Disk zoning / LUN masking provides another layer of manual configuration that can lead to problems.
FC cabling problems: use a clear naming and cabling convention to avoid problems and to speed up debugging.
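Several of the common zoning mistakes listed above can be caught by cross-checking the zoning data against the name server. The Python sketch below works on hypothetical data structures (in practice you would fill them from the switch's zone and name server output); it is not a vendor tool, and all names and WWNs are made up.

```python
def zoning_problems(active_cfg: set[str], zones: dict[str, set[str]],
                    aliases: dict[str, str], logged_in: set[str]) -> list[str]:
    """Report some common zoning mistakes.

    active_cfg: names of the zones in the active (effective) configuration
    zones:      zone name -> alias names that are members of the zone
    aliases:    alias name -> WWN
    logged_in:  WWNs currently registered with the fabric name server
    """
    problems = []
    for zone in sorted(zones):
        if zone not in active_cfg:
            problems.append(f"zone {zone} is not in the active configuration")
        for alias in sorted(zones[zone]):
            wwn = aliases.get(alias)
            if wwn is None:
                problems.append(f"zone {zone}: alias {alias} is not defined")
            elif wwn not in logged_in:
                problems.append(f"zone {zone}: alias {alias} ({wwn}) is not logged in")
    return problems


# Hypothetical fabric: host2's HBA was replaced, and its zone was never activated.
aliases = {"host1": "10:00:00:00:00:00:00:01", "host2": "10:00:00:00:00:00:00:02",
           "array_p0": "20:00:00:00:00:00:00:01"}
zones = {"host1_array": {"host1", "array_p0"}, "host2_array": {"host2", "array_p0"}}
logged_in = {"10:00:00:00:00:00:00:01", "20:00:00:00:00:00:00:01",
             "10:00:00:00:00:00:00:99"}  # new HBA of host2, not yet in the alias
for problem in zoning_problems({"host1_array"}, zones, aliases, logged_in):
    print(problem)
```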
Explain reasons to add or remove Inter Switch Links (ISLs)
Adding and removing ISLs is the result of connecting or disconnecting E_Ports (Expansion ports).
Reasons:
Load sharing
Failover
Connecting fabrics and increasing throughput, or adding links to an existing ISL trunk.
Analyze port log-in, fabric log-in and process log-in
Fabric Login (FLOGI):
Login after connecting to a fabric switch.
Related ports: F_Port to N_Port (or NL_Port)
Related information:
WWN
S_ID
Protocol
Fibre Channel class
Zoning
Port Login (PLOGI):
Two N_Ports establish a session with each other (typically an HBA port logging in to a storage port; a node also performs a PLOGI to the fabric name server).
Related ports: N_Port to N_Port
Related information:
WWN
S_ID
ULP
Fibre Channel class
BB Credit
Process Login (PRLI):
Process login is used to set up the environment between related processes on an originating N_Port and a responding N_Port.
Related ports: ULP (SCSI-3 to SCSI-3)
Related information:
LUN
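The order of these logins for an initiator can be summarized in a short Python sketch. The well-known addresses FFFFFE (fabric login server) and FFFFFC (directory/name server) come from the Fibre Channel standards; the target FCID is hypothetical, and the sketch leaves out steps such as name server registration and state change registration.

```python
# Well-known Fibre Channel addresses
FABRIC_LOGIN_SERVER = 0xFFFFFE  # destination of FLOGI
NAME_SERVER = 0xFFFFFC          # directory (name) server, reached with a PLOGI


def login_sequence(target_fcid: int) -> list[tuple[str, int]]:
    """Typical order of logins performed by an initiator N_Port (simplified)."""
    return [
        ("FLOGI", FABRIC_LOGIN_SERVER),  # obtain an FCID (S_ID) from the fabric
        ("PLOGI", NAME_SERVER),          # log in to the name server to query targets
        ("PLOGI", target_fcid),          # N_Port to N_Port login with the target
        ("PRLI", target_fcid),           # process login: set up the SCSI-3 (FCP) session
    ]


for els, destination in login_sequence(0x0A0300):
    print(f"{els:6} -> {destination:06X}")
```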
Isolate bandwidth issues and errors related to time outs
Bandwidth issues are often found on the ISLs, where paths come together. Monitoring bandwidth usage is important in tracing the source of these kinds of problems.
Common symptoms: SCSI time-out errors are one of the symptoms of this kind of problem.
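Monitoring usually means sampling a port's byte counter twice and converting the difference to a percentage of link capacity. A minimal Python sketch, assuming the counter does not wrap between samples and assuming roughly 400 MB/s of usable bandwidth for a 4 Gb/s ISL; the counter values are hypothetical.

```python
def link_utilization(bytes_start: int, bytes_end: int, interval_s: float,
                     link_mb_per_s: float) -> float:
    """Percentage of link capacity used between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    mb_per_s = (bytes_end - bytes_start) / interval_s / 1e6
    return 100.0 * mb_per_s / link_mb_per_s


# Hypothetical samples taken 10 s apart on a 4 Gb/s ISL (about 400 MB/s)
print(f"{link_utilization(0, 3_600_000_000, 10, 400):.0f}%")
```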
Identify process to add a configured switch to an existing fabric
Brocade:
Clear configuration (configDefault or cfgClear)
Copy configuration from another switch (or backup)
Save configuration (cfgSave)
Set time out values, buffer-to-buffer settings
Configure network parameters
Configure fabric parameters (BB Credit, R_A_TOV, E_D_TOV, switch PID format, Domain ID)
Enable/Disable ports
Configure port speeds
Configure Zoning
BB Credit: configures the number of buffers that are available to attached devices for frame receipt. Default 16; values range 1-16.
R_A_TOV: Resource Allocation Time Out Value. This works with E_D_TOV to determine switch actions when presented with an error condition.
E_D_TOV: Error Detect Time Out Value. This timer is used to flag a potential error condition when an expected response is not received within the set time.
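BB credits also limit distance: a sender stops when it runs out of credits and only resumes when R_RDY comes back, so a long link needs enough credits to cover the round trip. A rough Python estimate under stated assumptions: light needs about 5 microseconds per km of fiber, frames are full size (2148 bytes), and the link uses 8b/10b encoding at the 1/2/4 Gb/s Fibre Channel line rates. Real switches have their own long-distance settings, so treat the result as a ballpark figure.

```python
import math

FIBER_US_PER_KM = 5.0   # approximate propagation delay in fiber
FULL_FRAME_BYTES = 2148  # maximum Fibre Channel frame, headers included
LINE_RATE_BAUD = {1: 1.0625e9, 2: 2.125e9, 4: 4.25e9}  # nominal speed -> line rate


def bb_credits_needed(distance_km: float, speed_gbps: int) -> int:
    """Estimate the BB credits needed to keep a long link streaming full frames."""
    frame_time_us = FULL_FRAME_BYTES * 10 / LINE_RATE_BAUD[speed_gbps] * 1e6  # 10 bits per byte
    round_trip_us = 2 * distance_km * FIBER_US_PER_KM  # frame out, R_RDY back
    return math.ceil(round_trip_us / frame_time_us)


for speed in (1, 2, 4):
    print(f"10 km at {speed} Gb/s: {bb_credits_needed(10, speed)} credits")
```

The result matches the usual rule of thumb of about one credit per 2 km at 1 Gb/s, doubling with each speed step.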
Set communications mode between two fabrics
Brocade switches: set interopmode to 1 to talk to other vendors (note: it needs to be enabled on all switches within the fabric)
M-EOS switches: use “open” mode
Notes:
According to the documentation, the domain ID must be between 97 and 127 for interoperability (depending on mode and vendor).
Changes after activation of interoperability mode, per switch feature:
Domain IDs
Some vendors cannot use the full range of 239 domains within a fabric.
For example, with McData switches domain IDs are restricted to the range 97-127. This is to accommodate McData's nominal restriction to this same range. Domain IDs can either be set up statically (the Cisco MDS switch accepts only that one domain ID; if it does not get it, it isolates itself from the fabric) or as preferred (if the switch does not get its requested domain ID, it accepts any assigned domain ID).
Timers
All Fibre Channel timers must be the same on all switches, as these values are exchanged by E ports when establishing an ISL. The timers are F_S_TOV, D_S_TOV, E_D_TOV, and R_A_TOV.
F_S_TOV
Verify that the Fabric Stability Time Out Value timers match exactly.
D_S_TOV
Verify that the Distributed Services Time Out Value timers match exactly.
E_D_TOV
Verify that the Error Detect Time Out Value timers match exactly.
R_A_TOV
Verify that the Resource Allocation Time Out Value timers match exactly.
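Comparing the four timers before connecting an ISL is a simple check. A Python sketch that assumes you have already read the values from each switch into dictionaries; the values below are hypothetical and are not vendor defaults.

```python
FC_TIMERS = ("F_S_TOV", "D_S_TOV", "E_D_TOV", "R_A_TOV")


def timer_mismatches(switch_a: dict[str, int], switch_b: dict[str, int]) -> list[str]:
    """List the FC timers that are missing or differ between two switches."""
    mismatches = []
    for name in FC_TIMERS:
        a, b = switch_a.get(name), switch_b.get(name)
        if a is None or b is None or a != b:
            mismatches.append(f"{name}: {a} vs {b}")
    return mismatches


# Hypothetical values in milliseconds, as read from each switch
brocade = {"F_S_TOV": 5000, "D_S_TOV": 5000, "E_D_TOV": 2000, "R_A_TOV": 10000}
mds = {"F_S_TOV": 5000, "D_S_TOV": 5000, "E_D_TOV": 2000, "R_A_TOV": 20000}
print(timer_mismatches(brocade, mds))
```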
Trunking
Trunking is not supported between switches from different vendors. This feature may be disabled on a per-port or per-switch basis.
Default zone
The default zone behavior of permit (all nodes can see all other nodes) or deny (all nodes are isolated when not explicitly placed in a zone) may change.
Zoning attributes
Zones may be limited to the pWWN, and other proprietary zoning methods (physical port number) may be eliminated.
Note: Brocade uses the cfgsave command to save the fabric-wide zoning configuration. This command does not have any effect on Cisco MDS 9000 Family switches if they are part of the same fabric. You must explicitly save the configuration on each switch in the Cisco MDS 9000 Family.
Zone propagation
Some vendors do not pass the full zone configuration to other switches, only the active zone set gets passed.
Verify that the active zone set or zone configuration has correctly propagated to the other switches in the fabric.
VSAN
Interop mode only affects the specified VSAN.
TE ports and PortChannels
TE ports and PortChannels cannot be used to connect Cisco MDS to non-Cisco MDS switches. Only E ports can be used to connect to non-Cisco MDS switches. TE ports and PortChannels can still be used to connect a Cisco MDS to other Cisco MDS switches, even when in interop mode.
FSPF
The routing of frames within the fabric is not changed by the introduction of interop mode. The switch continues to use src-id, dst-id, and ox-id to load balance across multiple ISL links.
Domain reconfiguration disruptive
This is a switch-wide impacting event. Brocade and McData require the entire switch to be placed in offline mode and/or rebooted when changing domain IDs.
Domain reconfiguration nondisruptive
This event is limited to the affected VSAN. Only Cisco MDS 9000 Family switches have this capability: only the domain manager process for the affected VSAN is restarted, not the entire switch.
Name server
Verify that all vendors have the correct values in their respective name server database.
IVR
IVR-enabled VSANs can be configured in any interop mode.
Brocade's msplmgmtdeactivate command must explicitly be run prior to connecting from a Brocade switch to either Cisco MDS 9000 Family switches or McData switches. The Brocade platform management service that this command disables uses Brocade proprietary frames to exchange platform information, which Cisco MDS 9000 Family switches and McData switches do not understand. Rejecting these frames causes the common E ports to become isolated.
Validate interoperability among vendors
Address resolution (ARP) can be an issue; there are two protocols:
FARP
ARP over FCP
FCIP can assist in combining hardware from several vendors
Validate domain IDs on switches
Each switch has a unique domain ID. A fabric permits up to 239 switches and therefore allows 239 domain IDs. Even when using separate fabrics, it's good practice to avoid using the same domain IDs, to make merging the fabrics in the future a lot easier.
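Before a merge, the domain ID rules above can be checked mechanically: every ID must be in the range 1-239, and no ID may be used in both fabrics. A Python sketch with hypothetical switch names and IDs:

```python
def domain_id_problems(fabric_a: dict[str, int], fabric_b: dict[str, int]) -> list[str]:
    """Check domain IDs (switch name -> domain ID) before merging two fabrics."""
    problems = []
    for fabric in (fabric_a, fabric_b):
        for switch, domain in sorted(fabric.items()):
            if not 1 <= domain <= 239:
                problems.append(f"{switch}: domain ID {domain} is outside 1-239")
    owner_in_a = {domain: switch for switch, domain in fabric_a.items()}
    for switch, domain in sorted(fabric_b.items()):
        if domain in owner_in_a:
            problems.append(f"domain ID {domain} is used by {owner_in_a[domain]} and {switch}")
    return problems


print(domain_id_problems({"sw_a1": 1, "sw_a2": 2}, {"sw_b1": 2, "sw_b2": 3}))
```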
Connect switch to a fabric
Before connecting a switch, clear its configuration first. Brocade:
1. Log in as root
2. switchdisable
3. cfgdisable
4. cfgclear
5. passwddefault
6. portstatsclear
7. portlogclear
8. reboot
9. configUpload
2.5 Identify results of ISL oversubscription
Common oversubscription ratio: 7:1
ISL ports should be monitored. An ISL port performing at 80% capacity could indicate possible oversubscription.
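The oversubscription ratio is the total bandwidth of the edge ports divided by the total bandwidth of the ISLs they share. A worked Python example with a hypothetical edge switch (28 host ports and 4 ISLs, all at 4 Gb/s) that lands on the 7:1 ratio mentioned above:

```python
from fractions import Fraction


def oversubscription_ratio(edge_ports: int, edge_gbps: int,
                           isl_count: int, isl_gbps: int) -> Fraction:
    """Total edge-port bandwidth divided by total ISL bandwidth."""
    return Fraction(edge_ports * edge_gbps, isl_count * isl_gbps)


# Hypothetical edge switch: 28 host ports at 4 Gb/s, 4 ISLs at 4 Gb/s
ratio = oversubscription_ratio(28, 4, 4, 4)
print(f"{ratio.numerator}:{ratio.denominator}")
```

Adding ISLs lowers the ratio; for example, the same 28 ports over 7 ISLs would give 4:1.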
2.6 Create/configure and modify zone sets
Brocade
Create initial Fabric configuration:
Switch1:admin> cfgcreate "Fabric1", "LinuxNode1Zone1"
Once the configuration is created, additional zones can be added with the cfgadd command:
Switch1:admin> cfgadd "Fabric1", "LinuxNode1Zone2"
Switch1:admin> cfgsave
Effective configuration: the active set, currently loaded in memory and enforced by the fabric.
Defined configuration: all zoning objects defined on the switch; saved to flash with cfgSave. A defined configuration is made effective with cfgEnable.
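When many zones change at once, it can help to generate the CLI commands as text and review them before pasting them into the switch. The Python sketch below only builds strings using the Brocade commands alicreate, zonecreate, cfgcreate/cfgadd, cfgsave and cfgenable; it does not talk to a switch. All names and WWNs are hypothetical, and the exact command behavior should be checked against your Fabric OS documentation.

```python
def brocade_zoning_commands(cfg: str, zones: dict[str, list[str]],
                            aliases: dict[str, str], cfg_exists: bool) -> list[str]:
    """Build Brocade zoning commands as text for review (nothing is executed)."""
    cmds = [f'alicreate "{name}", "{wwn}"' for name, wwn in aliases.items()]
    for zone, members in zones.items():
        member_list = ";".join(members)  # zone members are separated by semicolons
        cmds.append(f'zonecreate "{zone}", "{member_list}"')
    zone_list = ";".join(zones)
    verb = "cfgadd" if cfg_exists else "cfgcreate"  # add to, or create, the configuration
    cmds.append(f'{verb} "{cfg}", "{zone_list}"')
    cmds.append("cfgsave")                # save the defined configuration to flash
    cmds.append(f'cfgenable "{cfg}"')     # make it the effective configuration
    return cmds


commands = brocade_zoning_commands(
    "Fabric1",
    {"LinuxNode1Zone2": ["linuxnode1_hba0", "array_spa0"]},
    {"linuxnode1_hba0": "10:00:00:00:00:00:00:01", "array_spa0": "20:00:00:00:00:00:00:01"},
    cfg_exists=True,
)
print("\n".join(commands))
```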
Implement zoning for single server and cluster applications
xxx
Create backup of zone database prior to zone modification
Brocade: configUpload (to FTP)
Configure zones within a redundant fabric
Important: First apply configuration change to fabric 1. When the change is successful it can be applied to fabric 2.
Explain how zone is stored and distributed throughout the fabric
A new switch (with a cleared zoning configuration) will receive the zoning configuration of the existing fabric when it joins.
Default zone membership includes all ports or WWNs that do not have a specific membership association. Access between default zone members is controlled by the default zone policy.
Explain the possible zoning conflicts that cause fabric segmentation
Brocade switch: fabstatsshow (shows reasons for fabric segmentation)
Type mismatch: occurs when the name of a zone object in one fabric is also used for a different type of zone object in the other fabric.
Example:
Fabric A: alias: Mkt_Host 1,16
Fabric B: zone: Mkt_Host 1,16
Content mismatch: occurs when the name and type of a zone object in one fabric are also used in the other fabric, but the content or order is different.
Example:
Fabric A: alias: Eng_Stor wwn1; wwn2
Fabric B: alias: Eng_Stor wwn2; wwn1
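Both conflict types can be detected before a merge by comparing the two zone databases object by object. A Python sketch that models each database as name -> (object type, ordered member list), using the Mkt_Host and Eng_Stor examples above; the data model is an assumption for illustration.

```python
# A zone database: object name -> (object type, ordered member list)
ZoneDb = dict[str, tuple[str, list[str]]]


def merge_conflicts(fabric_a: ZoneDb, fabric_b: ZoneDb) -> list[str]:
    """Report type and content mismatches for names that exist in both fabrics."""
    conflicts = []
    for name in sorted(fabric_a.keys() & fabric_b.keys()):
        type_a, members_a = fabric_a[name]
        type_b, members_b = fabric_b[name]
        if type_a != type_b:
            conflicts.append(f"type mismatch: {name} is {type_a} in A, {type_b} in B")
        elif members_a != members_b:  # list comparison, so member order matters too
            conflicts.append(f"content mismatch: {name}")
    return conflicts


fabric_a = {"Mkt_Host": ("alias", ["1,16"]), "Eng_Stor": ("alias", ["wwn1", "wwn2"])}
fabric_b = {"Mkt_Host": ("zone", ["1,16"]), "Eng_Stor": ("alias", ["wwn2", "wwn1"])}
print(merge_conflicts(fabric_a, fabric_b))
```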
Perform fabric merge without zoning conflict
Tips:
Clear device if it was part of another fabric
Brocade: switches in a fabric will not merge unless the PID formats are exactly the same.
Different time-out values on E_Ports can cause fabric segmentation.
Segmentation errors can occur if a switch has a bigger zone database than the allowed maximum size. Usually the oldest/lightest switch determines how big the database can be within a fabric.
Different VSANs on both fabrics.
ACL/allow list on VSAN, blocking (valid) traffic.
The name of a zone in Fabric A should not be used for a different type of zone in Fabric B. For example, if you create a zone named myZone in Fabric A, you should not use the same name as an alias, zone configuration, or zoneset name in Fabric B. In this scenario, merging the fabrics will cause a zone type mismatch.
If an alias, zone, zoneset, or zone configuration name is the same on both Fabric A and Fabric B, but the content differs between the two fabrics, the fabrics will not merge.
Follow these steps as you prepare to merge SAN fabrics:
1. Check for conflicting domain IDs on both fabrics before merging. Usually the switch with the lowest WWN gets the principal role.
2. Check for conflicting zone definitions before merging.
3. Verify that the fabric islands have the same feature licenses before merging.
4. Verify that all switch parameters are compatible with the fabric before merging.
5. When possible, use the same hardware.
6. Merge the fabrics using one ISL at a time.
Explain instances of zone name clash
Clash can happen when:
- pWWN and FC ID are not unique between fabrics
- The same zone name is used, but with different members or a different order
Configure active zone sets
Zone set consists of one or more zones.
Usually only one zone set can be active at a time (the SAN should be idle or shut down to change the configuration).
2.8 Identify best practices for storage allocation in Fibre Channel SAN
Adding storage to a new host
EMC:
Create RAID group
Bind LUN
Create storage group
Register host
Present LUN to host
Upgrading:
EMC: Extend LUN
NetApp: Extend volume or iSCSI LUN
3. Manage Storage Networks (21%)
3.1 Compare Storage Device Management to Storage Network Management
Discriminate among the components, characteristics and functions
Hub: older devices which send incoming data to all ports
Switch: common devices with increased throughput compared with hubs, due to the point-to-point connections
Director: chassis with switch blades
Create volumes in NAS environment
NetApp:
Create aggregate and add disks to it
Create volume
Configure characteristics of volume (minimal read-ahead, snapshots etc.)
Contrast scalability issues between SAN and NAS
NAS: file based (commonly NFS/CIFS, sometimes iSCSI)
SAN: block based (Fibre Channel, iSCSI)
SANs scale better, since they don't reach practical limits as easily or quickly. NAS filers have a maximum number of concurrent users / data throughput, after which additional filers have to be added.
NAS filers are usually easier to manage and provide easy access to data for Unix and Windows clients via NFS/CIFS.
Identify business context for NAS (e.g., email repository, content archiving)
NAS is often used for sharing documents, file stores, content archiving, email repositories, backups
Identify business context for SAN (e.g., database repository, data replication)
Storage with low-latency demands, like databases and OLTP. Also mass storage demands, including data replication.
3.2 Describe Configuration Management Elements
xxx
Explain HBA Configuration Management Elements
xxx
Construct host-side configuration of HBAs
xxx
Identify Virtual HBA (e.g., iSCSI, VN Port)
A virtual HBA is a port presented to, for example, a virtual machine guest.
VN port: Virtual Node port, connected to a virtual node (e.g. host or storage device).
Define OS-based technology concepts
xxx
3.3 Explain Change Management Process (ITIL)
Identify steps needed to bring environment back to a controlled situation (e.g., host is swapped out or a device is changed)
xxx
Implementing decommission of hardware (e.g., classify information to understand proper disposal methods, erasure of passwords, configs and zone sets, disk, tape, and data)
Cisco devices: clear zone database (clears zone information of a VSAN)
Passwords: clear passwords
Configs: clear the configuration before reusing or throwing hardware away
Zone sets: xxx
Disk: xxx
Tape: remove from catalog (remove or 'expire' the tape media) and use the company's disposal method
3.4 Optimize redundancy within a switched environment
At least 2 HBAs in each host / storage array, if possible
Don't use too many ISLs
3.5 Apply steps to add a configured switch to an existing fabric (e.g., verify that domain ID is unique, ensure zone names are unique, back up existing zone before changes, validate existing admin account has unique username/password on new switch)
3.6 Using scenarios, illustrate reasons to add or remove ISLs (Inter Switch Links)
Increasing throughput, connecting more fabrics together.
Determine impact of adding an ISL (e.g., more options for SAN expansion, allows configuration to take full advantage of ports)
More ISLs mean better use of the ports (and less oversubscription). They also make expansion of the SAN possible.
Determine impact of removing an ISL (e.g., degraded performance)
Degraded performance, possible increased latency
3.7 Identify processes that occur on a switch during a fabric merge (e.g., name services, protocol sequence, and principal switch selection)
While merging, the following processes happen:
Zone set passing
Name server distribution
Negotiation of (shortest) paths
Principal switch selection/negotiation (usually the lowest WWN wins)
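Principal switch selection can be sketched as picking the minimum of (priority, WWN): the switch priority is compared first (a lower value is preferred), and the lowest WWN breaks ties, which is why the lowest WWN usually wins when priorities are left at their defaults. The switch names, priorities and WWNs below are hypothetical.

```python
def principal_switch(switches: list[tuple[str, int, str]]) -> str:
    """Return the name of the principal switch from (name, priority, WWN) tuples.

    The lowest priority value wins; a tie is broken by the lowest WWN.
    """
    def rank(switch: tuple[str, int, str]) -> tuple[int, int]:
        _, priority, wwn = switch
        return priority, int(wwn.replace(":", ""), 16)

    return min(switches, key=rank)[0]


fabric = [("sw1", 254, "10:00:00:05:1e:00:00:02"),
          ("sw2", 254, "10:00:00:05:1e:00:00:01"),
          ("sw3", 254, "10:00:00:60:69:00:00:09")]
print(principal_switch(fabric))  # equal priorities, so the lowest WWN (sw2) wins
```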
3.8 Using scenarios, illustrate common blocking problems to fabric merge
xxx
Selection of switch as primary (e.g., lowest worldwide name)
Highest switch priority (lowest priority value)
Lowest worldwide name (tie-breaker)
Awareness of fabric behavior upon merge (e.g., takes 5-10 minutes to stabilize because of background processes)
Tips:
- Use one ISL at a time
Activation of new production zone sets once the merge is complete (e.g., two switches on Fabric A, and one HBA going to each fabric)
3.9 Using scenarios, determine appropriate methodologies and tools for troubleshooting zone sets
Validation of host and LUNs
Validation of HBA logged into fabric
Validation of zone set
Brocade: zoneShow
Validation of active zone library
Brocade: cfgShow
Validation of storage subsystem being logged into the switch
3.10 Predict the symptoms when the distance limitations of long-wave and short-wave fiber have been exceeded
Explain why there are excessive SCSI retransmit errors (e.g., intermittent loss of signal)
- Signal loss
- Oversubscription
3.11 Create or modify zone sets using best practices
xxx
3.12 Using scenarios, illustrate additional conflicts that could cause fabric segmentation
(see initial reasons in 2.7)
If an Extended Fabrics port is to be installed on a SilkWorm 2000 Series switch, the fabric-wide configuration parameter fabric.ops.mode.longDistance must be set to 1 on all switches operating within the fabric. Additionally, each long distance port must be set using the portCfgLongDistance command. Both ports of a long distance ISL must be configured identically, otherwise fabric segmentation will occur.
Validate switch modes are set to be the same
xxx
Verify ISLs are working correctly
Example messages on Brocade:
0x1023fc60 (tThad): Apr 3 22:11:44
WARNING FW-ABOVE, 3, eportTXPerf004 (E Port TX Performance 4) is above high boundary. current value : 95462KB/s. (faulty)
Normal message:
0x1023fc60 (tThad): Apr 3 22:11:52
WARNING FW-BELOW, 3, eportTXPerf004 (E Port TX Performance 4) is below low boundary. current value : 12591KB/s. (normal)
Brocade: portErrShow
     frames      enc   crc   too   too   bad   enc   disc  link  loss  loss  frjt  fbsy
     tx    rx    in    err   shrt  long  eof   out   c3    fail  sync  sig
=====================================================================
  4: 617m  2.8g  0     2     0     0     0     268k  0     0     2     9     0     0     << switch_one
  4: 2.8g  617m  0     29    0     0     0     1     333   0     1     5     0     0     << switch_two
Possible causes:
Length of cabling
GBIC issue
Dirty SFP
More information: Brocade portErrShow.pdf
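To scan portErrShow output for the counters that usually point at cabling or optics problems (crc err, enc out, loss sync, loss sig), a Python sketch follows. It assumes the 14-column layout of the sample output above; other Fabric OS versions print different columns, so the COLUMNS list is an assumption to adjust, and the sample text is copied from the switch_one row.

```python
COLUMNS = ["frames_tx", "frames_rx", "enc_in", "crc_err", "too_shrt", "too_long",
           "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync", "loss_sig",
           "frjt", "fbsy"]
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def to_count(text: str) -> int:
    """Convert abbreviated counters such as '268k' or '2.8g' to integers."""
    multiplier = SUFFIX.get(text[-1].lower())
    if multiplier:
        return round(float(text[:-1]) * multiplier)
    return int(text)


def parse_porterrshow(output: str) -> dict[int, dict[str, int]]:
    """Map port number -> counters, skipping header and separator lines."""
    ports = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":") or not fields[0][:-1].isdigit():
            continue
        values = fields[1:1 + len(COLUMNS)]  # ignore trailing comments such as '<< switch_one'
        ports[int(fields[0][:-1])] = dict(zip(COLUMNS, map(to_count, values)))
    return ports


def suspicious(counters: dict[str, int],
               names: tuple[str, ...] = ("crc_err", "enc_out", "loss_sync", "loss_sig")) -> dict[str, int]:
    """Return the non-zero physical-layer error counters of one port."""
    return {name: counters[name] for name in names if counters[name] > 0}


SAMPLE = """\
     frames      enc   crc   too   too   bad   enc   disc  link  loss  loss  frjt  fbsy
     tx    rx    in    err   shrt  long  eof   out   c3    fail  sync  sig
  4: 617m  2.8g  0     2     0     0     0     268k  0     0     2     9     0     0
"""
ports = parse_porterrshow(SAMPLE)
print(suspicious(ports[4]))
```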
4. Perform Data Protection and Recovery (14%)
4.1 Describe the different back-up and restore configurations
Make daily/weekly backups of all available configurations. Most vendors have a way to download the configuration ofswitches and store it. If needed, adjust available tooling.
Describe the technical advantages and disadvantages of each configuration (i.e., performance)
xxx
Identify external requirements that are uniquely satisfied by serverless backup or third-party copy
xxx
4.2 Analyze potential backup problems (e.g., open file, out of space, virus scanner)
xxx
Using scenarios, analyze the trade-offs with disk-to-tape, back-up window, media, silo (e.g., low cost, portable, but slow)
xxx
Using scenarios, explain advantages of disk-to-disk method (e.g., physical space, space on media,security and access to data)
xxx
Using scenarios, explain the advantages of off-host (e.g., dedicated back-up server, speed vs. cost)
xxx
Using scenarios, explain advantage of LAN-free (e.g., tapes and disks on a dedicated fabric)
Low overhead on servers
High speed
Tape devices and backup disks could be zoned or placed in a dedicated fabric.
Explain ways to maximize user time and minimize back-up window
Use LAN-free, serverless backups, snapshot technology, or backup from a passive node.
4.3 Ensure Fibre Channel Security
Physical security: do not allow physical access to unauthorized people.
Prevent physical access
Prevent remote access through IP security measures (i.e. putting devices into a specific VLAN)
Hard-zone the devices
Lock down E_Port creation (Brocade: portCfgEport)
Disable ports (Brocade: portCfgPersistentDisable)
Data encryption: store data encrypted when needed. If needed, encrypt data before putting it on the wire.
Zoning:
hard
soft
mixed
LUN masking: “exports” a LUN only to the systems which are allowed to use it.
Show how to implement port authentication protocols
CHAP
FCAP
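The core of CHAP is a challenge-response exchange in which the secret itself never crosses the link. A Python sketch of plain CHAP as defined in RFC 1994 (response = MD5 of identifier, secret and challenge); the Fibre Channel variant DH-CHAP adds a Diffie-Hellman exchange on top of this, which the sketch does not show. The shared secret is hypothetical.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response as defined in RFC 1994: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


# Authenticator side: send a random challenge, then verify the peer's answer.
secret = b"shared-secret"          # hypothetical secret configured on both ports
identifier, challenge = 1, os.urandom(16)
answer = chap_response(identifier, secret, challenge)  # computed by the peer
ok = hmac.compare_digest(answer, chap_response(identifier, secret, challenge))
print("authenticated" if ok else "rejected")
```

A fresh random challenge per login prevents an eavesdropper from simply replaying an old response.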
Perform processes to secure a fabric
Host isolation refers to ensuring only one initiator (host) per SAN zone, which prevents a misbehaving HBA or host driver from interfering with any of the other hosts in the SAN.
Compare the difference between hard and soft zoning regarding security
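Hard zoning: the members of a zone are physical switch ports (also known as port zoning).
Soft zoning: the members of a zone are WWNs (node WWNs or port WWNs); zoning is handled within the fabric switches. Soft zoning lets you create symbolic names for the zones and zone members.
Regarding security, hard zoning is the stricter of the two: it is enforced by the switch hardware on every frame. Classic soft zoning only hides devices by filtering name server responses, so a host that already knows a device's address may still be able to reach it.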
Explain the process to configure secure management access to Fibre Channel switches
Use protocols with encryption like SSH (instead of telnet) and HTTPS (instead of HTTP).
4.4 Explain how to recover a clustered storage configuration
xxx
5. Implement Storage Networks (17%)
5.1 Define the role of bridges and the differences between PCI-X and PCI-e
PCIe-to-PCI-X bridges allow legacy PCI-X devices to be used in PCI Express systems.
PCI-X uses conventional PCI technology: it is a double-wide (64-bit) version of PCI with up to 4 times the clock speed. It was needed for hardware such as Gigabit Ethernet, Fibre Channel and Ultra320 SCSI cards.
• A PCI-X v1.0 slot runs at up to 133 MHz.
• If a conventional PCI card is installed in a PCI-X slot, the clock speed of the other PCI-X slots on the same bus may be reduced.
PCI Express is a completely new approach, so PCI Express cards cannot be installed in conventional PCI or PCI-X slots, and conventional PCI or PCI-X cards cannot be installed in a PCI Express slot.
PCI Express slot compatibility:
• 1x PCI-e cards fit in 1x, 4x, 8x and 16x PCI-e slots.
• 4x PCI-e cards fit in 4x, 8x and 16x PCI-e slots.
• 8x PCI-e cards fit in 8x and 16x PCI-e slots.
• 16x PCI-e cards fit only in 16x PCI-e slots.
So a fast 16x PCI-e card will not fit in an 8x (or smaller) slot.
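The slot rules and the PCI/PCI-X speed difference can be checked with a little arithmetic. A minimal sketch, assuming nominal clock rates (33 MHz/32-bit for conventional PCI, 133 MHz/64-bit for PCI-X) and peak theoretical values that ignore protocol overhead.

```python
def pcie_card_fits(card_lanes: int, slot_lanes: int) -> bool:
    """Rule from the text: a card fits in a slot with the same number of lanes or more."""
    return card_lanes <= slot_lanes

def parallel_bus_mb_s(clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak parallel-bus bandwidth: clock rate times bus width in bytes."""
    return clock_mhz * bus_width_bits / 8

for card in (1, 4, 8, 16):
    slots = [s for s in (1, 4, 8, 16) if pcie_card_fits(card, s)]
    print(f"{card}x card fits in: {slots}")

print(parallel_bus_mb_s(33, 32))   # conventional PCI, 32-bit: 132.0 MB/s
print(parallel_bus_mb_s(133))      # PCI-X 133 MHz, 64-bit: 1064.0 MB/s
```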
5.2 Compare the RAID levels and implementation (e.g., hardware, software, host-based)
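• RAID 0: striping across disks, no redundancy.
• RAID 1: mirroring.
• RAID 2: bit-level striping with Hamming-code error correction (rarely used).
• RAID 3: byte-level striping with a dedicated parity disk.
• RAID 4: block-level striping with a dedicated parity disk.
• RAID 5: block-level striping with parity distributed across all disks.
• RAID 6: block-level striping with two independent parity blocks distributed across all disks (survives two disk failures).
• RAID 0+1: a mirror of two striped sets.
• RAID 1+0: a stripe across mirrored pairs.
Hardware vs. software RAID: hardware RAID generally performs better, because the RAID controller, not the host CPU, does the work (such as parity calculations).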
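RAID 3, 4 and 5 protect data with XOR parity: the parity block is the XOR of the data blocks in a stripe, so any single lost block equals the XOR of all the surviving blocks. A minimal sketch of one stripe on a hypothetical 4-disk set; parity rotation across disks (RAID 5) and the second parity of RAID 6 are not modeled.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks plus one parity block (hypothetical contents).
data = [b"\x0f\xf0\xaa", b"\x01\x02\x03", b"\xff\x00\x55"]
parity = xor_blocks(data)

# Simulate losing disk 1: XOR of the surviving data blocks and the parity rebuilds it.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[lost])   # True
```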
Describe technical benefits and limitations of the different RAID levels
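RAID 5: slow for small random writes, because every write must also update the parity information. A small write costs four disk operations (read old data, read old parity, write new data, write new parity) and keeps two disks busy. An 8-disk RAID 5 set can therefore serve up to 8 reads at the same time, but only about 4 writes, each of which needs twice as many operations per disk.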
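The write penalty can be turned into a rough sizing calculation. A minimal sketch, assuming small random I/O, no benefit from controller write cache, the commonly used back-end penalties (RAID 1/1+0: 2, RAID 5: 4, RAID 6: 6) and a hypothetical per-disk IOPS figure.

```python
# Back-end disk I/Os caused by one small host write (commonly used values).
WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 1+0": 2, "RAID 5": 4, "RAID 6": 6}

def host_iops(disks: int, disk_iops: float, read_fraction: float, level: str) -> float:
    """Host IOPS a RAID set can sustain when each disk delivers disk_iops back-end I/Os."""
    backend_capacity = disks * disk_iops
    cost_per_host_io = read_fraction * 1 + (1 - read_fraction) * WRITE_PENALTY[level]
    return backend_capacity / cost_per_host_io

# Hypothetical: 8 disks at 180 IOPS each.
print(host_iops(8, 180, 1.0, "RAID 5"))     # 100% reads: 1440.0
print(host_iops(8, 180, 0.0, "RAID 5"))     # 100% writes: 360.0
print(host_iops(8, 180, 0.0, "RAID 1+0"))   # 100% writes: 720.0
```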
5.3 Implementing Switch Technology
Differentiate among Core/Edge, Cascaded and Mesh designs
• Cascaded: inexpensive and easy to extend, but low reliability and low scalability.
• Ring: same as the cascaded topology, but with better reliability.
• Core/Edge: the best flexibility and reliability; a multi-layer design (examples: tiered, hybrid).
• Mesh: can be full or partial. Good for any-to-any traffic; the downside is that ISLs use up valuable ports.
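How quickly a full mesh consumes ports follows from simple counting: n switches need n(n-1)/2 ISLs, and each switch gives up n-1 ports. A minimal sketch with hypothetical 16-port switches and, by default, one ISL per switch pair.

```python
def full_mesh_isls(switches: int) -> int:
    """Every pair of switches is connected: n * (n - 1) / 2 ISLs."""
    return switches * (switches - 1) // 2

def ports_left_for_devices(switches: int, ports_per_switch: int, isls_per_pair: int = 1) -> int:
    """Ports left for hosts and storage after each switch connects to every other switch."""
    isl_ports_per_switch = (switches - 1) * isls_per_pair
    return switches * (ports_per_switch - isl_ports_per_switch)

for n in (2, 4, 6, 8):
    print(f"{n} switches: {full_mesh_isls(n)} ISLs, "
          f"{ports_left_for_devices(n, 16)} of {n * 16} ports free")
```

With 8 switches, 28 ISLs are needed and 56 of the 128 ports are spent on ISLs, which is why larger fabrics usually move to a core/edge design.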
Explain fan-in and fan-out ratios
Fan-out: ratio of storage ports to hosts (1:4)
Fan-in: ratio of hosts to storage ports (7:1)
Identify the slot to place the HBA for maximum performance and reliability
When using SSDs: ALWAYS use a single port per PCI-e HBA card. Do not use multiple ports on the same HBA card, because the SSD bandwidth will be limited by the PCI bus.
Avoid putting more HBAs in a server than the bus throughput can support.
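Whether a bus can carry a set of HBA ports is a matter of adding up nominal throughputs. A minimal sketch, assuming nominal per-direction values for Fibre Channel ports (1/2/4/8 Gbit/s at roughly 100/200/400/800 MB/s), a shared 64-bit 133 MHz PCI-X bus and 250 MB/s per PCIe 1.x lane; real throughput is lower once protocol overhead is included.

```python
# Assumption: nominal usable throughput per direction for each FC speed (MB/s).
FC_PORT_MB_S = {1: 100, 2: 200, 4: 400, 8: 800}

# Assumption: nominal peak bus throughput (MB/s).
PCIX_133_MB_S = 1064          # 64-bit PCI-X at 133 MHz, shared by all cards on the bus
PCIE_GEN1_LANE_MB_S = 250     # PCIe 1.x, per lane, per direction

def bus_is_sufficient(hba_port_speeds_gbps: list[int], bus_mb_s: float) -> bool:
    """True if the summed nominal throughput of all HBA ports fits within the bus."""
    demand = sum(FC_PORT_MB_S[speed] for speed in hba_port_speeds_gbps)
    return demand <= bus_mb_s

print(bus_is_sufficient([4, 4], PCIX_133_MB_S))             # True: 800 <= 1064
print(bus_is_sufficient([4, 4, 4], PCIX_133_MB_S))          # False: 1200 > 1064
print(bus_is_sufficient([8, 8], 4 * PCIE_GEN1_LANE_MB_S))   # False: 1600 > 1000 (4x slot)
```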
5.4 Implementing Virtualization
xxx
Tape libraries can be virtualized (VTL: virtual tape library) to make applications believe they are writing to a normal tape unit. In reality, these virtual tapes are disks (or parts of disks) and perform much better than conventional tape units.
Explain the reasons for virtualizing servers (e.g., ability to fail over, load balance, fully utilize physical assets)
Better hardware utilization, lower power consumption, more centralized management, load balancing, and clustering and failover possibilities by placing VMs on different hosts.
5.5 Implementing NAS
xxx
List NFS/CIFS common parameters (e.g., which OS, journaling level, stateful/stateless)
NFS: UDP or TCP, port 2049, versions 2, 3 and 4, usually on Linux/Solaris. NFS versions 2 and 3 are stateless (NFSv4 is stateful), so no intervention is needed when failing over: a failure is transparent to client and server, and recovery does not require actions such as rebooting the system to free up resources or state.
CIFS: TCP, port 445, usually on Windows, stateful; intervention is required at failover because the state has to be recovered. With CIFS, the client maintains the connection and the open file names, directories and various other aspects of the files and directories. Because CIFS is a stateful protocol, losing the underlying connection is a problem: the client does not know when to recreate the connection. File content is cached through a cooperative process between client and server code, and this is where problems can occur. The state survives only as long as the session between the server and the client survives, and this session survives only as long as the underlying network connection (generally TCP/IP) survives.
See http://www.snia.org/images/tutorial_docs/Networking/JimPinkerton-SMB2_Big_Improvements_Remote_FS_Protocol-v3.pdf
Explain when “no block” level access is significant or insignificant (e.g., FSCK-CHKDSK, forensics)
With file-level protocols, the NAS itself is responsible for the integrity of the file system. However, when data is served via block-based access (SAN/iSCSI), file system checks (FSCK/CHKDSK) and forensics have to be performed by the host system that owns the file system.
Compare NDMP with standard NAS file level back-up (e.g., scalability, block vs. file, offloading ofwork to NAS unit)
xxx
6. Monitor Storage Networking Performance (9%)
6.1 Use tools to assess the performance of a network storage environment for analysis
Switch performance: Brocade example:
switch1:admin> portPerfShow 5
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15  Total
---------------------------------------------------------------------------------------
    0    0  21m  28m  31m    0 8.4m    0  28m  21m  31m    0 8.4m    0    0    0   178m
    0    0  20m  29m  31m    0  10m    0  29m  20m  31m    0  10m    0    0    0   182m
    0    0  18m  36m  31m    0  14m    0  36m  18m  31m    0  14m    0    0    0   201m
    0    0  17m  34m  30m    0 7.0m    0  34m  17m  31m    0 7.0m    0    0    0   179m
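Captured portPerfShow output can be post-processed to find the busiest ports. A minimal sketch that parses the sample shown above and ranks ports by average throughput. Assumptions: the output layout matches that sample, and the k/m/g suffixes are decimal multipliers of bytes per second; check this against the Fabric OS documentation for your release.

```python
from statistics import mean

# Assumption: decimal multipliers for the value suffixes.
SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}

def to_bytes(value: str) -> float:
    """Convert a cell such as '8.4m' or '0' into a number."""
    if value[-1] in SUFFIX:
        return float(value[:-1]) * SUFFIX[value[-1]]
    return float(value)

def parse_portperfshow(text: str) -> list[list[float]]:
    """Return one list of per-port values per sample row (the Total column is dropped)."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    ports = len(lines[0]) - 1            # last header cell is 'Total'
    rows = []
    for cells in lines[1:]:
        if not cells or set(cells[0]) == {"-"}:
            continue                     # skip the dashed separator line
        rows.append([to_bytes(c) for c in cells[:ports]])
    return rows

sample = """
    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15  Total
---------------------------------------------------------------------------------------
    0    0  21m  28m  31m    0 8.4m    0  28m  21m  31m    0 8.4m    0    0    0   178m
    0    0  20m  29m  31m    0  10m    0  29m  20m  31m    0  10m    0    0    0   182m
    0    0  18m  36m  31m    0  14m    0  36m  18m  31m    0  14m    0    0    0   201m
    0    0  17m  34m  30m    0 7.0m    0  34m  17m  31m    0 7.0m    0    0    0   179m
"""

rows = parse_portperfshow(sample)
averages = [mean(col) for col in zip(*rows)]
busiest = sorted(range(len(averages)), key=lambda p: averages[p], reverse=True)[:3]
for port in busiest:
    print(f"port {port}: {averages[port] / 1e6:.1f} MB/s average")
```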
HBA performance: xxx
Establish baselines (e.g., performance-based, trending, configuration, as built)
Use tools like MRTG, Cacti and RRDtool to create initial baselines.
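Once tools like these have collected data from a known-good period, the baseline can be turned into an alert threshold. A minimal sketch, assuming the baseline is simply the mean and standard deviation of past samples and that anything more than k = 3 standard deviations away is worth a look; the sample values are hypothetical.

```python
from statistics import mean, stdev

def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Baseline = mean and standard deviation of measurements from a known-good period."""
    return mean(samples), stdev(samples)

def deviates(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a measurement that lies more than k standard deviations from the mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd

# Hypothetical hourly throughput samples (MB/s) for one switch port during a normal week.
normal = [178, 182, 201, 179, 185, 190, 176, 188]
baseline = build_baseline(normal)
print(deviates(195, baseline))   # within normal variation
print(deviates(320, baseline))   # well above the baseline
```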
Use a time server across environments for log correlation, security, discovery process and troubleshooting
Time synchronization is important for troubleshooting, when debugging issues and comparing log events with error messages. It is also important when investigating security breaches and events, to trace back every step of an investigation.
Protocol: NTP, port 123 (UDP)
Brocade switches: configure the time on the principal switch. The other switches in the fabric synchronize their time with the principal switch.
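NTP estimates how far a client clock is off from the four timestamps of one request/response exchange, using the offset and round-trip delay formulas from the NTP specification (RFC 5905). A minimal sketch with a hypothetical exchange in which the switch clock runs 2 seconds behind the time server.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Clock offset and round-trip delay from one NTP exchange.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Hypothetical: client 2.0 s behind, 10 ms network delay each way, 1 ms server processing.
offset, delay = ntp_offset_and_delay(t1=100.000, t2=102.010, t3=102.011, t4=100.021)
print(round(offset, 3), round(delay, 3))   # 2.0 0.02
```

The offset estimate assumes the outbound and return paths take equally long; asymmetric paths show up as an error in the offset.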
Another reason to have the correct time is the discovery process that happens through RSCN (Registered State Change Notification). When a new disk array is attached to a switch in the fabric, the HBAs registered in the switch's notification list are notified and can start discovering the new devices/LUNs.
SCSI discovery process
In modern SCSI transport protocols, there is an automated process for "discovery" of the IDs. SSA initiators "walk the loop" to determine which devices are present and then assign each one a 7-bit "hop-count" value. Serial Storage Architecture (SSA) is an IBM-developed serial interface that basically runs the SCSI-2 software protocol.
The good news about SSA compared to SCSI is:
• It is far easier to configure and cable: no termination needed!
• It is built with HA features. The SSA loop architecture (as opposed to a SCSI bus) has no SPOF (see diagram below). If part of a loop fails, the device driver automatically and transparently reconfigures itself so that all SSA devices can still be accessed without any noticeable interruption.
• It uses no SCSI ID addressing, which means no hassle with setting up the adapters.
• The SSA loop can transport 4 times 20 MByte/s: two independent reads and two independent writes across each loop direction. Current adapter implementations allow for 35 MByte/s per adapter.
• SSA uses no bus arbitration, as opposed to SCSI; instead, a network-like scheme is used. Data is sent and received in 128-byte packets, and all devices on the loop can request time slots independently. SCSI, in turn, needs bus arbitration, which can lead to performance deadlocks if an initiator doesn't release the bus in time.
• SSA allows 25 meters between each two devices. In addition, a fiber-optic extender allows data transfers across 50-micrometer optical cables over distances of up to 2.4 km. This even makes it suitable for site disaster recovery if configured properly.
• Most SSA adapters support two independent loops, which makes it possible to attach mirrored disks to different loops for higher availability.
• The SSA loops are symmetrical, twisted-pair and potential-free: no TERMPWR potential shift problems.
FC-AL initiators use the LIP (Loop Initialization Protocol) to interrogate each device port for its WWN (World Wide Name). For iSCSI, because of the unlimited scope of the (IP) network, the process is quite complicated. These discovery processes occur at power-on/initialization time and also if the bus topology changes later, for example if an extra device is added.
Analyze performance implications on the fabric involving RAID, caching and connectivity configurations (i.e., identifying potential bottlenecks among these indicators)
xxx
Cache: optimizing cache usage can give a large performance gain on the storage system, because more data can be served quickly from the cache instead of from the much slower disks. While cache memory is usually a good thing, it should be disabled if only small random reads are being used.
• NetApp: sysstat -x 5
• EMC Navisphere (CLI): navicli -h XXX getcache
Example:
# navicli -h 192.168.29.133 getcache -pdp -high -low
Prct Dirty Cache Pages = 51
High Watermark: 80
Low Watermark: 60
When 80% of the cache pages are dirty, the array flushes the cache down to 60%. Currently 51% of the pages are dirty.
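The high/low watermark pair works as a hysteresis: flushing starts at the high watermark and only stops at the low one, so the array does not toggle flushing on and off around a single threshold. A minimal sketch of that rule over a hypothetical series of observed dirty-page percentages; the vendor's actual flushing algorithm (for example idle flushing) is not modeled.

```python
def simulate_watermarks(dirty_levels: list[float], high: float = 80, low: float = 60) -> list[bool]:
    """For each observed dirty-page percentage, return whether the array is flushing.

    Flushing starts when dirty pages reach the high watermark and continues
    until they drop to the low watermark.
    """
    flushing = False
    states = []
    for dirty in dirty_levels:
        if not flushing and dirty >= high:
            flushing = True
        elif flushing and dirty <= low:
            flushing = False
        states.append(flushing)
    return states

levels = [51, 70, 80, 75, 65, 60, 55]   # hypothetical observations
print(list(zip(levels, simulate_watermarks(levels))))
```

Note that at 70% the array is not flushing on the way up, while at 65% it still is on the way down.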
RAID level: it is important to use the RAID level best suited to the required safety and read and/or write speed. By using several different RAID levels within the storage tiers, data processing can be improved considerably.
Monitor, collect, and analyze trending information to avoid bottlenecks or resource constraints on the system architecture
Monitoring logs is probably the most basic form of tracking the health of any system. Checking trends with tools like RRDtool and SNMP can also give valuable information about the health and growth rate of the affected systems. Monitoring tools like Nagios and Zabbix are useful to respond to problems in time. Brocade switches provide the commands portPerfShow and portErrShow.
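Trend data becomes actionable when it is extrapolated, for example to estimate when an array will run out of capacity. A minimal sketch, assuming growth is roughly linear so that an ordinary least-squares line is a fair fit; the weekly samples are hypothetical.

```python
from typing import Optional

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until_full(days: list[float], used_tb: list[float], capacity_tb: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches capacity (None if not growing)."""
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None
    day_full = (capacity_tb - intercept) / slope
    return day_full - days[-1]

# Hypothetical weekly samples: day number and TB used on one array.
days = [0, 7, 14, 21, 28]
used = [40.0, 42.0, 44.0, 46.0, 48.0]
print(days_until_full(days, used, capacity_tb=60.0))
```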
6.2 Develop and follow steps for problem resolution
xxx
Analyze Resolve problem; document problem tracking, root cause analysis, problem resolution,problem prevention timeline
Root cause analysis (RCA): a document describing the events around a major issue/problem, often with additional information about follow-up actions, the problem description, a timeline of events and the problem resolution/solution.
Analyze and document compliance/non-compliance to customer Service Level Agreement
xxx
6.3 Assess methods to reduce performance impacts when adding long distance connections
• Use a proper number of buffer-to-buffer credits.
• Use asynchronous replication instead of synchronous to prevent huge (application) delays, if the RPO can be higher than zero.
• Set the speed on both sides of the link to a fixed value (instead of auto-negotiation).
Analyze when an increase in buffer-to-buffer credit is necessary
Buffer-to-buffer credits are a form of storage distance extension. Each credit allows one frame to be sent before an acknowledgment (R_RDY) has to come back, so the number of unacknowledged frames (buffer credits) determines how many frames can be in flight; compare it with the window size in TCP connections. If the fiber-optic span is longer than the available credits can cover, the sender runs out of credits while waiting for acknowledgments and the throughput drops sharply. Longer (and faster) links therefore need more credits.
Brocade formula: Buffer Credits = ((Distance in km) * (Data Rate) * 1000) / 2112
Brocade switches can also use LD mode (dynamic long distance mode) to automatically adjust the buffer-to-buffer credit value.
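The Brocade formula can be evaluated for typical link lengths. A minimal sketch, assuming "Data Rate" is the nominal link speed in Gbit/s and rounding up to whole credits; with that reading the formula gives roughly one credit per 2 km at 1 Gbit/s and one per km at 2 Gbit/s, the usual rule of thumb for full-size frames.

```python
import math

def brocade_bb_credits(distance_km: float, data_rate_gbps: float) -> int:
    """Credits = distance_km * data_rate * 1000 / 2112, rounded up.

    Assumption: data_rate_gbps is the nominal link speed (1, 2, 4, 8, ...).
    """
    return math.ceil(distance_km * data_rate_gbps * 1000 / 2112)

for rate in (1, 2, 4, 8):
    print(f"50 km at {rate} Gbit/s: {brocade_bb_credits(50, rate)} credits")
```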
Use LSANs or VSANs to isolate traffic such that only required traffic is transferred
VSAN: virtual SAN or “virtual fabric”, used to achieve isolation without having to set up a physically separate fabric. If a switch does not support VSANs, create a SAN as small as possible, but with room for growth.
LSAN: logical SAN, sharing devices (zone information) across fabrics (LSAN zones are usually prefixed with "lsan_").
Explain when to use compression/encryption and in which sequence
Order: Compression first, then encryption.
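Compression is useful for text-based information, which has a high compression ratio. Compression is not useful for encrypted links (like VPN tunnels) or for already compact formats like audio, video and images. Encrypted data looks random and cannot be compressed, which is why compression has to come before encryption.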
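The reason for the order can be demonstrated directly: compressing text-like data saves a lot, while compressing already-encrypted data saves nothing. A minimal sketch using zlib, assuming random bytes from os.urandom are a fair stand-in for the output of a good cipher (no real encryption is performed here).

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)

log_text = b"Fabric segmented, zone conflict\n" * 200   # repetitive, text-like data
encrypted_like = os.urandom(len(log_text))            # stand-in for ciphertext

print(f"text:      {compression_ratio(log_text):.3f}")
print(f"encrypted: {compression_ratio(encrypted_like):.3f}")
```

Compressing first also means less data has to be encrypted and sent over the link.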
7. Provide Storage Networking Business Continuance (6%)
7.1 Describe archiving/nearline
Nearline storage is used to tier data onto cheaper storage, usually with a larger capacity. It can also hold information that does not need high-performance storage at that moment and can be stored on a lower-performance (and cheaper) array. Common uses are archiving information and additional backups.
Define Content Addressable Storage (CAS) (e.g., hand-offs)
Content Addressable Storage/Content Addressed Storage (CAS) and Fixed Content Storage (FCS) are different names for storage of documents that do not change over time, where each object is addressed by its content instead of its location. If the same document is stored in multiple places, it is only stored once. Information is accessed using specific IDs that the CAS system generates when the object is created.
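The core CAS idea fits in a few lines: the ID of an object is derived from its content, so identical documents get the same ID and are stored once, and a retrieved object can be checked against its ID. A minimal in-memory sketch; SHA-256 is an assumption for illustration, as commercial CAS products use their own ID schemes.

```python
import hashlib

class ContentStore:
    """Minimal content-addressed store: an object's address is the hash of its content."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self._objects.setdefault(address, content)   # identical content is kept only once
        return address

    def get(self, address: str) -> bytes:
        content = self._objects[address]
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("content does not match its address")
        return content

    def __len__(self) -> int:
        return len(self._objects)

store = ContentStore()
a = store.put(b"scanned invoice 2009-06-23")
b = store.put(b"scanned invoice 2009-06-23")   # same document stored again
print(a == b, len(store))                       # True 1
```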
7.2 Identify protocols and technologies best used for implementing business recovery solutions
DWDM or IP extenders (in combination with FCIP or iFCP).
7.3 Identify techniques and processes to be used as part of a business continuance solution
• Host-based replication
• LAN-based replication
• SAN-based replication
• CDP (Continuous Data Protection)
7.4 Explain how to perform data transfers, migrations, and replications
Synchronous replication: both source and target need to acknowledge the data transfer before the application is notified.
Asynchronous replication: the source acknowledges the write and notifies the application; afterwards the data is replicated to the target device.
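The practical difference shows up in write latency: a synchronous write waits for at least one round trip to the remote site, an asynchronous write does not. A minimal sketch, assuming light in fiber covers about 1 km per 5 microseconds, one round trip per write and a hypothetical 0.5 ms local write time; real links add equipment latency and may need more than one round trip per write.

```python
# Assumption: ~5 microseconds per km of fiber, one way.
FIBER_US_PER_KM = 5.0

def write_latency_ms(local_ms: float, distance_km: float, synchronous: bool) -> float:
    """Latency the application sees for one write.

    Synchronous: acknowledged only after the remote array confirms, so one
    round trip over the link is added. Asynchronous: acknowledged locally.
    """
    if not synchronous:
        return local_ms
    round_trip_ms = 2 * distance_km * FIBER_US_PER_KM / 1000
    return local_ms + round_trip_ms

for km in (10, 100, 1000):
    print(f"{km} km: sync {write_latency_ms(0.5, km, True):.1f} ms, "
          f"async {write_latency_ms(0.5, km, False):.1f} ms")
```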
Resolving Fabric Merge Conflicts
Whenever two fabrics merge, SDV (SAN Device Virtualization) merges its database. A merge conflict can occur when there is a run-time information conflict or a configuration mismatch. Run-time conflicts can occur due to:
• Identical pWWNs being assigned to different virtual devices
• The same virtual devices being assigned different pWWNs
• A mismatch between the virtual device and its virtual FC ID
A blank commit is a commit operation that does not contain configuration changes and enforces the SDV configuration of the committing switch fabric-wide. A blank commit operation resolves merge conflicts by pushing the configuration from the committing switch throughout the fabric, thereby reinitializing the conflicting virtual devices. Exercise caution while performing this operation, as it can easily take some virtual devices offline.
Merge failures resulting from a pWWN conflict can cause a failure with the device alias as well. A blank commit operation on a merge-failed VSAN within SDV should resolve the merge failure in the device alias.
You can avoid merge conflicts due to configuration mismatch by ensuring that:
• The pWWN and device alias entries for a virtual device are identical (in terms of primary and secondary).
• There are no virtual device name conflicts across VSANs in fabrics.
Zoning conflict parameters
When merging two fabrics, zoning information from the two previously separated fabrics is merged as much as possible into the new fabric. Sometimes the zoning information is inconsistent and cannot be merged. Segmentation due to zoning is usually flagged by the error message "Fabric segmented, zone conflict" in the error logs. One solution is to make sure the zoning information on both switches is consistent before bringing up the ISL.
Upgrading firmware on Brocade switches: the internal process is as follows:
1. The firmwareDownload -s command is entered, and you respond to the prompts.
2. The firmware is downloaded to the secondary partition.
3. The primary and secondary boot pointers are swapped.
4. The CP boots from the firmware in the new primary partition.
Say no to autocommit and yes to reboot after the download. After a few days of stable operation, run the firmwareCommit command; the new firmware is then copied to the secondary partition as well.
http://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a309.html
Sources used: http://www.scsita.org/aboutscsi/sas/tutorials/SAS_General_overview_public.pdf http://www.directron.com/ncqvstcq.html