Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.
Express5800/ft series servers
Product Information
Fault-Tolerant General Purpose Servers
Express5800/ft Series Servers
High Availability Technologies
© NEC Corporation 2013Page 3
Approaches to Reliability and Availability
▐ Select and combine hardware and software technologies for availability
[Diagram: paths from a single server (typical servers) to higher availability]
• Enhance fault tolerance of the hardware (higher availability of a single server): partially redundant hardware (e.g. HDD, PSU), then redundant hardware (dual modular architecture), i.e. a fault tolerant server
• Enhance availability of the system (higher availability of the system): cluster software, failover across multiple servers
• Combine both: FT server + cluster (FT server cluster)
Callouts in the diagram:
• Continuous operation despite hardware failures
• Simplified installation and operation
• Enhanced HW/SW failure resilience
• For large-scale systems with scalable nodes etc.
Select the best availability solution according to system requirements
FT Server and Cluster Solution Comparison
Cluster system (EXPRESSCLUSTER)
• Aim: Achieve availability / scalability / load balancing
• Technology: Failover, load balancing
• Resilience: Hardware / software failures
• Failover process: Failover to other servers
• Service during failure: Operation is interrupted for the failover process (several minutes to 10 minutes)
• Performance enhancement: Add CPU or node. Supports servers with 4 or more sockets
• Supported apps: Failover settings are required for each app (creation of script batch files)
• Features load balancing as well as availability
• Software failure-resilient
• Suitable for large-scale systems (scalable nodes)
Fault tolerant server
• Aim: High availability of a single server
• Technology: Lockstep (CPU & MEM) and failover (I/O) (synchronized in normal conditions)
• Resilience: Hardware failures
• Failover process: Isolate faulty component
• Service during failure: Continuous operation (no interruption)
• Performance enhancement: Add CPU
• Supported apps: General applications. No modifications needed
• System configuration requires no app modifications
• Continuous operation without interruption
• Ideal for 24-7 systems, email and Web servers
[Diagram: in the cluster system, a failure triggers failover to another server; in the fault tolerant server, the failed component (CPU, memory or HDD) is isolated.]
ft servers provide hardware availability and can be installed quickly and easily.
The ft server + EXPRESSCLUSTER solution takes advantage of both solutions.
Recovery Process from HW Failures
Cluster system: service intermittence (system down for a few mins to 10 mins)
In service → Failure → Start failover process
1. Interruption (a few secs)
2. Determine failover host (a few secs to 1-2 mins)
3. Takeover of cluster resources (e.g. NW settings and disks) (a few secs to 1 min)
4. Restart apps (a few secs to a few mins)
Failover complete → Restart service → In service
The failed server is repaired or replaced after failover.
Express5800/ft series server: continuous operation (non-stop service)
In service (Module #0 and Module #1 processing in lockstep) → Failure
1. Instantaneous isolation of the faulty module (processing continues)
2. Resynchronization after replacement of the faulty module
Recovery complete → In service
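The two-step recovery of the ft server described above (isolate the faulty module, then resynchronize after replacement) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class names `Module` and `LockstepPair` and their methods are hypothetical and are not part of any NEC software, and real lockstep is implemented in hardware rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """One half of a hypothetical duplexed pair (names are illustrative only)."""
    name: str
    state: int = 0
    healthy: bool = True
    online: bool = True

    def step(self, value: int) -> int:
        if not self.healthy:
            raise RuntimeError(f"{self.name} hardware fault")
        self.state += value
        return self.state


class LockstepPair:
    """Runs every operation on all online modules and isolates a failing one."""

    def __init__(self) -> None:
        self.modules = [Module("Module #0"), Module("Module #1")]

    def process(self, value: int) -> int:
        results = []
        for m in self.modules:
            if not m.online:
                continue
            try:
                results.append(m.step(value))
            except RuntimeError:
                # Step 1: isolate the faulty module; the other keeps serving.
                m.online = False
        if not results:
            raise RuntimeError("both modules failed")
        return results[0]

    def replace_and_resync(self, index: int) -> None:
        # Step 2: swap in a new module and copy state from the survivor.
        survivor = next(m for m in self.modules if m.online)
        self.modules[index] = Module(self.modules[index].name, state=survivor.state)


pair = LockstepPair()
pair.process(5)
pair.modules[0].healthy = False   # simulated hardware failure
after_failure = pair.process(7)   # still answered by Module #1
pair.replace_and_resync(0)
after_resync = pair.process(1)    # both modules process in lockstep again
```

The caller of `process` never sees the failure, which mirrors the "non-stop service" path; a cluster would instead surface an interruption while the failover steps run.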
Express5800/ft Series Servers
Optional Features to Increase Fault Tolerance
Express Report Service Support
• Isolate the failed components to continue operation.
• Monitor hardware status at the service center.
• Support the system proactively to ensure continuous availability.
[Diagram: Express Report Service. Steps ① to ④ link the client's ft server, NEC (monitoring center) and the NEC service center. Labels: hardware monitoring & detection, failure, isolation, continuous operation, alert, notification, replace, recovery.]
Only the alert information is sent out, using dedicated software (secure environment), via the internet (mail server) or a public line (modem connection).
Peripheral Devices
Support for Redundant Peripheral Devices
▐ Selection of LTO or DAT and support for redundant backup*
◆ Double backup configuration is supported to provide for failures during backup
◆ LTO or DAT drives are offered for selection
[Diagram: ft series. Module #1 and Module #2 each connect through a SAS controller to a backup device.]
Data is output from each module to achieve backup redundancy. Both backups are created almost simultaneously.
* Configuration of standalone backup is also supported
▐ A two UPS configuration provides tolerance against UPS defects*
[Diagram: ft series. The PSUs of Module #1 and Module #2 connect to UPS (uninterruptible power supply) units.]
Connecting each UPS to separate power sources helps avoid being affected by failures of the power sources.
* Single UPS configuration is also supported. UPS is controlled through the network
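The double backup idea above, where the same data goes to two backup devices so that a failure during backup still leaves one good copy, can be sketched as follows. This is a minimal illustration only: the function `redundant_backup` and the device callables are hypothetical stand-ins, and the ft series performs this at the hardware and driver level rather than through code like this.

```python
from typing import Callable, Dict, List


def redundant_backup(data: bytes,
                     devices: Dict[str, Callable[[bytes], None]]) -> List[str]:
    """Write the same data to every backup device; return the names that succeeded."""
    succeeded = []
    for name, write in devices.items():
        try:
            write(data)
            succeeded.append(name)
        except OSError:
            # A failure of one device during backup does not abort the other copy.
            continue
    if not succeeded:
        raise OSError("all backup devices failed")
    return succeeded


store: Dict[str, bytes] = {}


def healthy_drive(data: bytes) -> None:
    store["module2"] = data


def failing_drive(data: bytes) -> None:
    raise OSError("tape drive fault")  # simulated failure during backup


result = redundant_backup(b"payload", {
    "backup device (Module #1)": failing_drive,
    "backup device (Module #2)": healthy_drive,
})
```

Here the backup still completes on the second device even though the first one fails partway through.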
Enhancement SW
ft series + EXPRESSCLUSTER for Higher Availability
▐ Clusters with ft servers enhance both HW and SW availability
[Diagram: an ft server (primary) and an ft server (secondary), each with Module #0 and Module #1 running the OS and apps.]
• Hardware failure: handled by the ft series server
• Software failure: EXPRESSCLUSTER monitors SW and fails over to the secondary server
Highest level of availability suitable for critical systems
Benefits of ft Series + EXPRESSCLUSTER
▐ Clusters using ft servers deliver the benefits of both solutions
Function
• Express5800/ft server: Lockstep and failover (within a server)
• Cluster system (configured by normal servers): Failover (between multiple servers)
• Cluster system (configured by ft servers): Failover (between multiple servers)
HW failure tolerance: treatment
• Express5800/ft server: ★★★ Isolate faulty module (within the server)
• Cluster system (normal servers): ★★☆ Failover from the primary server to the secondary server
• Cluster system (ft servers): ★★★ Isolate faulty module within the primary server (no failover between nodes)
HW failure tolerance: treatment time
• Express5800/ft server: Instantaneous
• Cluster system (normal servers): Few minutes (depends on the time necessary to start up apps)
• Cluster system (ft servers): Instantaneous
SW failure tolerance: treatment
• Express5800/ft server: - (app-level failures can be resolved by SingleServerSafe software)
• Cluster system (normal servers): ★★☆ Failover from the primary server to the secondary server
• Cluster system (ft servers): ★★☆ Failover from the primary server to the secondary server
SW failure tolerance: treatment time
• Express5800/ft server: -
• Cluster system (normal servers): Several minutes (depends on the time necessary to start up apps)
• Cluster system (ft servers): Several minutes (depends on the time necessary to start up apps)
Periodical maintenance (SW update)
• Express5800/ft server: ★★☆ Active Upgrade enables OS patches to be applied with only short interruption
• Cluster system (normal servers): ★★★ Each node can be separated for upgrade
• Cluster system (ft servers): ★★★ Each node can be separated for upgrade
Performance enhancement
• Express5800/ft server: ★★☆ Add CPU
• Cluster system (normal servers): ★★★ Add CPU or nodes
• Cluster system (ft servers): ★★☆ Add CPU
Apps settings
• Express5800/ft server: ★★★ General apps can be used without special modifications
• Cluster system (normal servers): ★☆☆ Takeover process is required for each app
• Cluster system (ft servers): ★☆☆ Takeover process is required for each app
Legend: ★★★: Excellent, ★★☆: Good, ★☆☆: Fair
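To show how the star ratings could feed a choice of configuration, the sketch below transcribes them as numbers (3 = Excellent, 2 = Good, 1 = Fair) and filters by minimum requirements. The dictionary keys, the `candidates` helper and the "meet every minimum" rule are assumptions made for illustration; the slides do not prescribe a selection procedure.

```python
# Star ratings transcribed from the comparison (3 = Excellent, 2 = Good, 1 = Fair).
# None marks the "-" entry for SW failure tolerance on a single ft server.
RATINGS = {
    "ft server": {
        "hw_failure": 3, "sw_failure": None, "maintenance": 2,
        "performance": 2, "app_settings": 3,
    },
    "cluster (normal servers)": {
        "hw_failure": 2, "sw_failure": 2, "maintenance": 3,
        "performance": 3, "app_settings": 1,
    },
    "cluster (ft servers)": {
        "hw_failure": 3, "sw_failure": 2, "maintenance": 3,
        "performance": 2, "app_settings": 1,
    },
}


def candidates(requirements):
    """Return configurations whose rating meets every minimum in `requirements`."""
    return [
        name
        for name, ratings in RATINGS.items()
        # An unrated ("-") entry is treated as 0, so it never meets a minimum.
        if all((ratings[key] or 0) >= minimum for key, minimum in requirements.items())
    ]


hw_and_sw = candidates({"hw_failure": 3, "sw_failure": 2})
no_app_changes = candidates({"app_settings": 3})
```

With these inputs, a requirement for top-rated HW tolerance plus SW failover leaves only the cluster of ft servers, while a requirement for unmodified apps leaves only the single ft server.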
ft server + Hyper-V + EXPRESSCLUSTER
▐ Clusters configured on Hyper-V on an ft server
[Diagram: one ft series server (Module #0 and Module #1) running Hyper-V™ 2.0 with two guest OSes, each running apps.]
• Hardware failure: handled by the ft server
• Software failure: EXPRESSCLUSTER monitors SW. In the event of a SW failure, the operation fails over to another guest OS
High HW and SW availability for virtualized environments
EXPRESSCLUSTER X SingleServerSafe
▐ SW is monitored on the ft server to automatically restart the SW in the event of a failure.
◆ SingleServerSafe (SSS) monitors the server and SW status at all times.
◆ In the event of a failure, SSS restarts the service, process, OS etc. to resume operation.
◆ The ft server and SSS in tandem can handle both HW and SW failures.
[Diagram: SingleServerSafe restarts services, processes and apps, or reboots the OS.]
SW availability can be improved even for a single ft server
By enabling failure detection and restart/reboot, SSS helps handle a wide range of failures with a single server.
By using the optional monitoring function of EXPRESSCLUSTER, SSS is capable of more detailed monitoring, including the detection of stalls in databases.
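The detect-then-restart behaviour described above can be sketched as a loop that tries recovery actions until a health check passes. This is an illustration only, not SingleServerSafe's actual logic or API: the `recover` function is hypothetical, and the escalation order (service, then process, then OS reboot) is an assumption, since the slide only lists what SSS can restart.

```python
from typing import Callable, List, Tuple


def recover(is_healthy: Callable[[], bool],
            actions: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run recovery actions in order until the health check passes.

    Returns the names of the actions that were executed.
    """
    taken: List[str] = []
    if is_healthy():
        return taken
    for name, action in actions:
        action()
        taken.append(name)
        if is_healthy():
            return taken
    raise RuntimeError("failure persists after all recovery actions")


# Simulated fault that a service restart alone does not clear.
state = {"service_ok": False}


def restart_service() -> None:
    pass  # in this simulation the fault survives a service restart


def restart_process() -> None:
    state["service_ok"] = True


def reboot_os() -> None:
    state["service_ok"] = True


taken = recover(lambda: state["service_ok"], [
    ("restart service", restart_service),
    ("restart process", restart_process),
    ("reboot OS", reboot_os),
])
```

In this run the monitor stops escalating once the process restart clears the fault, so the OS reboot is never triggered.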