Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


Transcript of Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.

Page 1: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.

Express5800/ft series servers
Product Information

Fault-Tolerant General Purpose Servers

Page 2: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.

Express5800/ft Series Servers
High Availability Technologies

Page 3: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.

© NEC Corporation 2013

Approaches to Reliability and Availability

▐ Select and combine hardware and software technologies for availability

[Diagram: availability options arranged along two directions]

Enhance fault tolerance of the hardware (higher availability of a single server):
• Single server (typical servers)
• Partially redundant hardware (e.g. HDD, PSU)
• Redundant hardware (dual modular architecture): fault tolerant server

Enhance availability of the system (higher availability of the system):
• Cluster software: failover across multiple servers
• FT server + cluster: FT server cluster

• Continuous operation despite hardware failures
• Simplified installation and operation

• Enhanced HW/SW failure resilience
• For large-scale systems with scalable nodes etc.

Select the best availability solution according to system requirements

Page 4: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


FT Server and Cluster Solution Comparison

Fault tolerant server
• Aim: High availability of a single server
• Technology: Lockstep (CPU & MEM) and failover (I/O), synchronized in normal conditions
• Resilience: Hardware failures
• Failover process: Isolate faulty component
• Service during failure: Continuous operation (no interruption)
• Performance enhancement: Add CPU
• Supported apps: General applications; no modifications needed

• System configuration requires no app modifications
• Continuous operation without interruption
• Ideal for 24-7 systems, email and Web servers

[Diagram: two modules, each with CPU, memory and HDD; on a failure, the faulty component is isolated]

Cluster system
• Aim: Achieve availability / scalability / load balancing
• Technology: Failover, load balancing (EXPRESSCLUSTER)
• Resilience: Hardware / software failures
• Failover process: Failover to other servers
• Service during failure: Operation is interrupted for the failover process (several minutes to 10 minutes)
• Performance enhancement: Add CPU or node. Supports servers with 4 or more sockets
• Supported apps: Failover settings are required for each app (creation of script batch files)

• Features load balancing as well as availability
• Software failure-resilient
• Suitable for large-scale systems (scalable nodes)

[Diagram: on a failure, operation fails over to the other server in the cluster]

ft servers provide hardware availability and can be installed quickly and easily.
The ft servers + EXPRESSCLUSTER solution takes advantage of both solutions.
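The lockstep-and-isolation behaviour described for the fault tolerant server can be pictured with a toy model. The following is only a minimal sketch, not NEC's implementation: real lockstep runs in hardware at the CPU and memory level, and the `Module` and `LockstepPair` names are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical stand-in for one CPU/memory module of an ft server."""
    name: str
    state: int = 0
    healthy: bool = True


class LockstepPair:
    """Toy model: both modules apply the same operations in step."""

    def __init__(self) -> None:
        self.modules = [Module("Module #0"), Module("Module #1")]

    def active(self) -> list:
        return [m for m in self.modules if m.healthy]

    def process(self, value: int) -> int:
        """Apply one operation to every healthy module and return the result."""
        active = self.active()
        if not active:
            raise RuntimeError("no healthy module left")
        for module in active:
            module.state += value
        return active[0].state

    def isolate(self, index: int) -> None:
        """Take a faulty module out of service; the other one keeps running."""
        self.modules[index].healthy = False

    def replace_and_resync(self, index: int) -> None:
        """Swap in a new module and copy the survivor's state into it."""
        survivor = self.active()[0]
        self.modules[index] = Module(self.modules[index].name, survivor.state)


pair = LockstepPair()
pair.process(1)             # both modules hold state 1
pair.isolate(1)             # simulated hardware fault in Module #1
pair.process(2)             # service continues on Module #0 alone
pair.replace_and_resync(1)  # replacement module is brought back in step
print([m.state for m in pair.modules])
```

The point of the sketch is that callers of `process` never see the fault: the operation stream continues on the surviving module, which matches the "continuous operation (no interruption)" entry in the comparison.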

Page 5: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


Recovery Process from HW Failures

Cluster system
In service → Failure → Start failover process
1. Interruption (a few secs)
2. Determine failover host (a few secs to 1-2 mins)
3. Takeover of cluster resources (e.g. NW settings and disks) (a few secs to 1 min)
4. Restart apps (a few secs to a few mins)
Failover complete → Restart service → In service
Service is interrupted: system down for a few mins to 10 mins. The failed server is then repaired or replaced.

Express5800/ft series server
In service → Failure → Continuous operation (non-stop service)
1. Instantaneous isolation of the faulty module
2. Resynchronization after replacement
Recovery complete → In service

[Diagram: Module #0 and Module #1 process in lockstep; after the faulty module is isolated, the remaining module continues processing until the faulty module is replaced]
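To see how the four cluster failover steps add up, the durations can be summed as ranges. This is a rough sketch under stated assumptions: the slide only says "a few secs" and "a few mins", so the values `FEW_SECS = 5` and `FEW_MINS = 300` below are placeholders chosen for illustration, not NEC figures.

```python
# Assumed values in seconds; the slide gives only "a few secs" / "a few mins".
FEW_SECS = 5
FEW_MINS = 5 * 60

# (step, shortest, longest) following the four numbered failover steps
FAILOVER_STEPS = [
    ("Interruption", FEW_SECS, FEW_SECS),
    ("Determine failover host", FEW_SECS, 2 * 60),
    ("Takeover of cluster resources", FEW_SECS, 60),
    ("Restart apps", FEW_SECS, FEW_MINS),
]


def downtime_range(steps):
    """Return (best case, worst case) total downtime in seconds."""
    shortest = sum(low for _, low, _ in steps)
    longest = sum(high for _, _, high in steps)
    return shortest, longest


shortest, longest = downtime_range(FAILOVER_STEPS)
print(f"Estimated failover downtime: {shortest} s to {longest / 60:.1f} min")
```

With these placeholder values the worst case comes to roughly eight minutes, which sits inside the "a few mins to 10 mins" window quoted for the cluster system. The ft server has no equivalent sum, because isolation of the faulty module is described as instantaneous.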

Page 6: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.

Express5800/ft Series Servers
Optional Features to Increase Fault Tolerance

Page 7: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


Express Report Service Support

• Isolate the failed components to continue operation.
• Monitor hardware status at the service center.
• Support the system proactively to ensure continuous availability.

[Diagram: Express Report Service, steps ① to ④. Modules with CPU, Mem and HDD pass through Failure, Isolation, Continuous operation, Replace and Recovery. Hardware monitoring & detection raises an alert, and notifications pass between the client, the NEC monitoring center and the NEC service center.]

Only the alert information will be sent out with dedicated software (secure environment)

Via the internet (mail server) or public line (modem connection)

Page 8: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


Support for Redundant Peripheral Devices

▐ Selection of LTO or DAT and support for redundant backup*

◆ Double backup configuration is supported to provide for failures during backup
◆ LTO or DAT drives are offered for selection

[Diagram: ft series server with Module #1 and Module #2, each with a SAS controller connected to its own backup device]

Data is output from each module to achieve backup redundancy. Both backups are created almost simultaneously.

* Configuration of standalone backup is also supported

▐ A two UPS configuration provides tolerance against UPS defects*

[Diagram: ft series server with Module #1 and Module #2, each PSU connected to its own UPS (uninterruptible power supply)]

Connecting each UPS to separate power sources helps avoid being affected by failures of the power sources

* Single UPS configuration is also supported. UPS is controlled through the network

Peripheral Devices
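The double backup idea above, where the same data goes to two devices so that a failure of one during backup is survivable, can be sketched in a few lines. This is an illustration only: `write_backups` and `FailingDevice` are hypothetical names, in-memory buffers stand in for LTO or DAT drives, and the sketch writes to the devices one after the other rather than almost simultaneously as the slide describes.

```python
import io


class FailingDevice:
    """Hypothetical device that simulates a drive failure during backup."""

    def write(self, data: bytes) -> int:
        raise OSError("simulated drive failure")


def write_backups(data: bytes, devices: dict) -> list:
    """Write data to every device and return the names that succeeded.

    The backup counts as successful if at least one copy was written.
    """
    succeeded = []
    for name, device in devices.items():
        try:
            device.write(data)
            succeeded.append(name)
        except OSError:
            # One failed drive must not abort the other copy.
            continue
    if not succeeded:
        raise OSError("all backup devices failed")
    return succeeded


drive_1 = io.BytesIO()
result = write_backups(b"payload", {"drive 1": drive_1, "drive 2": FailingDevice()})
print(result)
```

Here the second drive fails mid-backup, yet the call still completes because the first copy was written.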

Page 9: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


ft series + EXPRESSCLUSTER for Higher Availability

▐ Clusters with ft servers enhance both HW and SW availability

Enhancement SW

[Diagram: ft server (primary) and ft server (secondary), each running OS and apps on Module #0 and Module #1]

• Hardware failure: handled by the ft series server
• Software failure: EXPRESSCLUSTER monitors SW; failover to secondary server

Highest level of availability suitable for critical systems

Page 10: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


Benefits of ft Series + EXPRESSCLUSTER

▐ Clusters using ft servers deliver the benefits of both solutions

Express5800/ft server
• Function: Lockstep and failover (within a server)
• HW failure tolerance, treatment: ★★★ Isolate faulty module (within the server)
• HW failure tolerance, treatment time: Instantaneous
• SW failure tolerance, treatment: - (app-level failures can be resolved by SingleServerSafe software)
• SW failure tolerance, treatment time: -
• Periodical maintenance (SW update): ★★☆ Active Upgrade enables OS patches to be applied with only a short interruption
• Performance enhancement: ★★☆ Add CPU
• Apps settings: ★★★ General apps can be used without special modifications

Cluster system (configured by normal servers)
• Function: Failover (between multiple servers)
• HW failure tolerance, treatment: ★★☆ Failover from the primary server to the secondary server
• HW failure tolerance, treatment time: Few minutes (depends on the time necessary to start up apps)
• SW failure tolerance, treatment: ★★☆ Failover from the primary server to the secondary server
• SW failure tolerance, treatment time: Several minutes (depends on the time necessary to start up apps)
• Periodical maintenance (SW update): ★★★ Each node can be separated for upgrade
• Performance enhancement: ★★★ Add CPU or nodes
• Apps settings: ★☆☆ Takeover process is required for each app

Cluster system (configured by ft servers)
• Function: Failover (between multiple servers)
• HW failure tolerance, treatment: ★★★ Isolate faulty module within the primary server (no failover between nodes)
• HW failure tolerance, treatment time: Instantaneous
• SW failure tolerance, treatment: ★★☆ Failover from the primary server to the secondary server
• SW failure tolerance, treatment time: Several minutes (depends on the time necessary to start up apps)
• Periodical maintenance (SW update): ★★★ Each node can be separated for upgrade
• Performance enhancement: ★★☆ Add CPU
• Apps settings: ★☆☆ Takeover process is required for each app

Enhancement SW

Legend: ★★★: Excellent, ★★☆: Good, ★☆☆: Fair
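One way to use the star ratings above when selecting a solution is to encode them as numbers and filter by minimum requirements. The sketch below is an assumption-laden illustration, not an NEC tool: the key names are invented here, stars are mapped to 1 to 3, and the "-" entry for SW failure tolerance on a standalone ft server is treated as 0.

```python
# Star ratings from the comparison: 3 = Excellent, 2 = Good, 1 = Fair.
# 0 stands in for the "-" entry (assumption made for this sketch).
RATINGS = {
    "ft server": {
        "hw_failure": 3, "sw_failure": 0, "maintenance": 2,
        "performance": 2, "app_settings": 3,
    },
    "cluster (normal servers)": {
        "hw_failure": 2, "sw_failure": 2, "maintenance": 3,
        "performance": 3, "app_settings": 1,
    },
    "cluster (ft servers)": {
        "hw_failure": 3, "sw_failure": 2, "maintenance": 3,
        "performance": 2, "app_settings": 1,
    },
}


def candidates(requirements: dict) -> list:
    """Return the solutions whose rating meets every minimum requirement."""
    return [
        name
        for name, rating in RATINGS.items()
        if all(rating[key] >= minimum for key, minimum in requirements.items())
    ]


# A critical system that needs top HW tolerance and at least good SW tolerance
print(candidates({"hw_failure": 3, "sw_failure": 2}))
# A system that must run general apps without modification
print(candidates({"hw_failure": 3, "app_settings": 3}))
```

The first query leaves only the cluster of ft servers and the second leaves only the standalone ft server, which mirrors the trade-off in the ratings: adding the cluster gains SW failure tolerance at the cost of per-app takeover settings.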

Page 11: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


ft server + Hyper-V + EXPRESSCLUSTER

▐ Clusters configured on Hyper-V on an ft server

[Diagram: one ft series server (Module #0 and Module #1) running Hyper-V™ 2.0 with two guest OSes, each running apps]

• Hardware failure: handled by the ft server
• Software failure: EXPRESSCLUSTER monitors SW. In the event of a SW failure, the operation fails over to another guest OS

High HW and SW availability for virtualized environments

Enhancement SW

Page 12: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.


EXPRESSCLUSTER X SingleServerSafe

▐ SW is monitored on the ft server to automatically restart the SW in the event of a failure.

◆ SingleServerSafe (SSS) monitors the server and SW status at all times.
◆ In the event of a failure, SSS restarts the service, process, OS etc. to resume operation.
◆ The ft server and SSS in tandem can handle both HW and SW failures.

[Diagram: SingleServerSafe restarts apps, services and processes, and reboots the OS]

SW availability can be improved even for a single ft server

Enhancement SW

By enabling failure detection and restart/reboot, SSS helps handle a wide range of failures with a single server. By using the optional monitoring function of EXPRESSCLUSTER, SSS is capable of more detailed monitoring, including the detection of stalled databases.
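The monitor-then-restart behaviour described for SSS can be pictured as an escalation loop: check health, try the mildest recovery action, and escalate only if the check still fails. The sketch below is a generic illustration and does not use any SingleServerSafe API; the `recover` function, the simulated actions and the escalation order (service, then process, then OS) are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def recover(
    is_healthy: Callable[[], bool],
    actions: List[Tuple[str, Callable[[], None]]],
) -> List[str]:
    """Run recovery actions in order until the health check passes.

    Returns the names of the actions that were attempted.
    """
    attempted: List[str] = []
    for name, action in actions:
        if is_healthy():
            break
        attempted.append(name)
        action()
    return attempted


# Simulated system state; a real monitor would probe services and processes.
state = {"healthy": False}


def restart_service() -> None:
    pass  # simulated: this restart does not clear the fault


def restart_process() -> None:
    state["healthy"] = True  # simulated: this restart clears the fault


def reboot_os() -> None:
    state["healthy"] = True


ESCALATION = [
    ("restart service", restart_service),
    ("restart process", restart_process),
    ("reboot OS", reboot_os),
]

attempted = recover(lambda: state["healthy"], ESCALATION)
print(attempted)
```

In this run the OS reboot is never reached, because the health check passes after the second action. Ordering the actions from least to most disruptive keeps the interruption as short as the fault allows.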

Page 13: Express5800/ft series servers Product Information Fault-Tolerant General Purpose Servers.
