Clustering - GSL

7/28/2019 Clustering - GSL


Microsoft Clustering GSL

Produced by: Kingsley Bell



Distributed Operations Windows

Windows Server Support Contact Information

Windows Regional Services EMEA:

Hotline *448 6868

Email Address # IT TIS RDO EMEA DO Windows

Manager: Aleet Kavia *448 7753

Back Office team lead: Edwin Broersma *443 9606

Front Office team lead: Barry Roberts *448 5483

Windows Production Services (Global):

Hotline *650 8888

Email Address # IT TIS RDO Windows Prod Svcs

Manager: Tejendra Dhiman *650 8860

Remedy GIM / RFC queue: TIS_RDO_DO_WIN_PROD_SVCS

Remedy GIM / RFC queues:

EMEA Asset Management: TIS_RDO_EMEA_DO_WIN_ASSET_MGT

EMEA Equities & PrimeServices: TIS_RDO_EMEA_DO_WIN_EQ_PS

EMEA Fixed Income & Deriv.: TIS_RDO_EMEA_DO_WIN_FID_DRV

EMEA Back Office: TIS_RDO_EMEA_DO_WIN_IBO_BO




Contents

What is MSCS?

Cluster Overview

Cluster groups

Resources

Credit Suisse Naming Standards

Failover

Disaster Recovery

Load Balancing

Questions & Answers




What is MSCS?

A cluster consists of two or more computers working together to provide a higher level of availability, reliability, and scalability than can be obtained by using a single computer. Microsoft cluster technologies guard against three specific types of failure:

Application and service failures, which affect application software and essential services.

System and hardware failures, which affect hardware components such as CPUs, drives, memory, network adapters, and power supplies.

Site failures in multisite organizations, which can be caused by natural disasters, power outages, or connectivity outages.

The ability to handle failure allows server clusters to meet requirements for high availability, which is the ability to provide users with access to a service for a high percentage of time while reducing unscheduled outages.
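To make "a high percentage of time" concrete, the sketch below converts an availability figure into expected unscheduled downtime per year and estimates the idealised availability of a service that stays up while any one of several nodes is up. It is a back-of-the-envelope model, not an MSCS calculation. It assumes a 365-day year, independent node failures and instant failover. None of these hold exactly for a real cluster: failover takes time, and shared storage or a site outage can take down every node at once.

```python
# Worked example for availability as "a high percentage of time".
# Assumptions (not from the slides): a 365-day year, independent node
# failures, and zero failover time.

HOURS_PER_YEAR = 365 * 24  # 8760 hours


def annual_downtime_hours(availability: float) -> float:
    """Expected unscheduled downtime per year for an availability in 0..1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(node_availability: float, nodes: int) -> float:
    """Idealised availability of a service that runs while any node is up.

    The service is down only when every node is down at the same time,
    which assumes the nodes fail independently of each other.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    print(f"{annual_downtime_hours(0.999):.2f} hours/year of downtime at 99.9%")
    print(f"{parallel_availability(0.99, 2):.4f} availability for two 99% nodes")
```

For example, 99.9% availability still allows roughly 8.76 hours of unscheduled downtime a year. Under these assumptions, two independent 99% nodes give about 99.99%.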

In a server cluster, each server owns and manages its local devices and has a copy of the operating system and the applications or services that the cluster is managing. Devices common to the cluster, such as disks in common disk arrays and the connection media for accessing those disks, are owned and managed by only one server at a time. For most server clusters, the application data is stored on disks in one of the common disk arrays, and this data is accessible only to the server that currently owns the corresponding application or service.

Server clusters are designed so that the servers in the cluster work together to protect data, keep applications and services running after failure on one of the servers, and maintain consistency of the cluster configuration over time.




Cluster Overview




Cluster Groups

Cluster groups are used to group together all the resources required to run an application or instance

A cluster group can run on only one physical node at a time; no other node can access its resources, e.g. disks

Multiple cluster groups can run simultaneously on the same node

When a cluster group is moved to another node, all resources in that group are taken offline and brought online on the other node

An Active/Active cluster is one where 2 cluster groups are running on 2 different physical nodes at the same time

In case of a node failure, the cluster service automatically starts the whole cluster group on another available node

The first cluster group is used to operate the cluster; no other resources should be placed in this group

Cluster groups have several configuration options: preferred owners, failover (threshold and period) and failback options
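The group behaviour described above can be sketched as a small model. A group has exactly one owner at a time and moves as a whole to the first available preferred owner. Once it has already failed over `failover_threshold` times within `failover_period_s`, it is left offline. This is a simplified, hypothetical model (the class, field and node names are illustrative). It is not the cluster service's actual algorithm, and it ignores failback and local resource restarts.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class ClusterGroup:
    """Hypothetical model of a cluster group: one owner node at a time."""

    name: str
    preferred_owners: List[str]   # tried in this order on failover
    failover_threshold: int       # max failovers allowed ...
    failover_period_s: float      # ... within this sliding window (seconds)
    owner: Optional[str] = None   # None means the group is offline
    recent_failovers: Deque[float] = field(default_factory=deque)

    def fail_over(self, up_nodes: Set[str], now: float) -> Optional[str]:
        """Move the whole group to another available node, or leave it offline."""
        # Forget failovers that fall outside the sliding period.
        while self.recent_failovers and now - self.recent_failovers[0] > self.failover_period_s:
            self.recent_failovers.popleft()

        # Threshold already reached within the period: stop moving the group.
        if len(self.recent_failovers) >= self.failover_threshold:
            self.owner = None
            return None

        # First preferred owner that is up and is not the node that just failed.
        for node in self.preferred_owners:
            if node in up_nodes and node != self.owner:
                self.recent_failovers.append(now)
                self.owner = node  # every resource in the group moves together
                return node

        self.owner = None  # no eligible node is available
        return None


if __name__ == "__main__":
    group = ClusterGroup("SQLGROUP1", ["NODE1", "NODE2", "NODE3"],
                         failover_threshold=2, failover_period_s=3600, owner="NODE1")
    print(group.fail_over({"NODE2", "NODE3"}, now=0))    # NODE1 has failed
    print(group.fail_over({"NODE1", "NODE3"}, now=60))   # NODE2 has failed
    print(group.fail_over({"NODE2", "NODE3"}, now=120))  # threshold reached
```

The three calls print `NODE2`, `NODE1` and `None`. The third failover within the hour is refused, which is the purpose of the threshold and period setting: it stops a faulty group bouncing between nodes indefinitely.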




Resources

Resources reside in cluster groups

All resources required for a specific function should be grouped together

When a resource is in a cluster, it should be administered only through the cluster

Typically, each cluster group has a network name and an associated IP address through which its resources can be accessed

A large number of resource types can be created, which together can provide a complete clustered application ecosystem, e.g.:

IP Address

Network Name

File share

Generic Service

Physical Disk

Some resources have required dependencies, e.g. the Network Name requires an IP Address

You can create your own dependencies, for example a service cannot start until a file share is online

Each resource has a number of configuration options

Some applications create new cluster resource types, e.g. Microsoft SQL Server

If the application is not cluster aware, the Generic Service / Generic Application resource can be used for a roll-your-own solution

Resources must be available on every node that may own them

Cluster-aware applications install the required binaries on all nodes at installation time

Generic applications need the required binaries installed manually on each node
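Resource dependencies determine start-up order. A resource can only come online after everything it depends on, and it must go offline before its dependencies do. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The dependency set is illustrative, built from the examples on this slide (Network Name needs an IP Address, a service waits for a file share). The File Share dependencies on Physical Disk and Network Name are an assumption for this example.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Illustrative resources for one cluster group. Each entry lists the
# resources it depends on, i.e. those that must be online first.
# The exact dependency set is an assumption for this example.
GROUP_DEPENDENCIES: Dict[str, Set[str]] = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},                  # required dependency
    "File Share": {"Physical Disk", "Network Name"},
    "Generic Service": {"File Share"},               # user-defined dependency
}


def online_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order in which to bring resources online: dependencies first."""
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order in which to take resources offline: dependents first."""
    return list(reversed(online_order(deps)))


if __name__ == "__main__":
    print(" -> ".join(online_order(GROUP_DEPENDENCIES)))
    print(" -> ".join(offline_order(GROUP_DEPENDENCIES)))
```

The ordering of independent resources (for example IP Address and Physical Disk) is not fixed. Only the dependency constraints are guaranteed. A circular dependency makes `static_order()` raise `graphlib.CycleError` instead of returning an order.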




Failover

MSCS does not provide a seamless failover solution: in case of failure, resources are shut down on one node and then brought up on another node

Careful consideration should be given when configuring resource parameters, e.g. the "Affect the group" option

Cluster resources should not be overcommitted, so that capacity remains for a node failure. If one cluster group requires 80% of a node's computing power, that amount of capacity should always be available elsewhere in the cluster; if 2 cluster groups each need 55% of a node's compute power, the cluster should have 3 nodes

Keep all nodes in a cluster at the same specification

Individual resource failures can initiate a cluster failover

Node failure will initiate a cluster failover
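The sizing rule above is a small bin-packing check. A cluster group cannot be split across nodes, so the question is whether every group can still be placed whole on a surviving node after one node is lost. Below is a minimal brute-force sketch. It assumes loads are expressed as whole percentages of one node and, per the "same specification" guideline, that all nodes are identical. The function names are hypothetical.

```python
from itertools import product
from typing import Sequence


def fits(demands: Sequence[int], nodes: int, capacity: int = 100) -> bool:
    """True if every group can run whole on one node without exceeding capacity.

    demands: per-group load as a percentage of one node. Groups cannot be
    split across nodes, so this tries every group-to-node assignment by
    brute force (fine for a handful of groups, exponential beyond that).
    """
    for assignment in product(range(nodes), repeat=len(demands)):
        load = [0] * nodes
        for demand, node in zip(demands, assignment):
            load[node] += demand
        if all(total <= capacity for total in load):
            return True
    return False


def survives_node_failure(demands: Sequence[int], nodes: int, capacity: int = 100) -> bool:
    """True if all groups still fit after any single node is lost.

    Nodes are assumed identical, so losing any one node leaves
    nodes - 1 equivalent nodes.
    """
    return fits(demands, nodes - 1, capacity)


if __name__ == "__main__":
    print(survives_node_failure([80], nodes=2))      # one 80% group, 2 nodes
    print(survives_node_failure([55, 55], nodes=2))  # two 55% groups, 2 nodes
    print(survives_node_failure([55, 55], nodes=3))  # two 55% groups, 3 nodes
```

This reproduces the slide's examples. A single 80% group survives on 2 nodes. Two 55% groups do not fit on 2 nodes, because after a failure one node would need 110%. With 3 nodes, they still fit after any one node is lost.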




Disaster Recovery

DR nodes should be installed with just enough resources to run the cluster, e.g. 3 production nodes require 2 DR nodes

2+1

3+2

4+3

DR nodes typically have the cluster service disabled, or run just the default cluster group with all other cluster groups offline

DR nodes need to have the application installed

The DR configuration needs to be updated whenever the production configuration is changed

Credit Suisse utilises the following 3rd-party vendor technologies to aid DR failover:

EMC SRDF (Symmetrix Remote Data Facility)

Cisco LAM (Local Area Mobility)

When using SRDF, the cluster disk resources cannot be brought online until the disks are in a split state

The IP address and network name cannot be brought online while they are in use elsewhere in the estate

The DR node naming standard mirrors the production node names: XNYC19P11013A -> XNYC19B11013A

CNYC19P11013 -> CNYC19B11013

With the use of LAM, however, the virtual server names can be brought online in a DR scenario (CNYC19P11013A)
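The sizing pattern and naming standard on this slide can be expressed as two small helpers. Both are inferred only from the examples given (2+1, 3+2, 4+3 and the two name pairs). In particular, treating the 7th character as the environment letter, `P` for production and `B` for DR, is an assumption. The function names are hypothetical.

```python
def dr_node_count(production_nodes: int) -> int:
    """DR nodes for a production cluster, following the 2+1, 3+2, 4+3 pattern."""
    if production_nodes < 2:
        raise ValueError("the N+(N-1) pattern starts at 2 production nodes")
    return production_nodes - 1


def dr_name(production_name: str) -> str:
    """Map a production node or cluster name to its DR counterpart.

    Assumption inferred from two examples only: the 7th character (index 6)
    is the environment letter, 'P' for production and 'B' for DR,
    e.g. XNYC19P11013A -> XNYC19B11013A.
    """
    if len(production_name) < 7 or production_name[6] != "P":
        raise ValueError(f"not a production name: {production_name!r}")
    return production_name[:6] + "B" + production_name[7:]


if __name__ == "__main__":
    for prod in (2, 3, 4):
        print(f"{prod}+{dr_node_count(prod)}")
    for name in ("XNYC19P11013A", "CNYC19P11013"):
        print(name, "->", dr_name(name))
```

Names with LAM enabled are the exception. In that case, the production virtual server name is brought online in DR unchanged rather than mapped.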




MSCS Clusters DR

Collection of clustered Windows servers with shared disks, IP addresses, network names and SQL resources; 2+1 or 3+2

[Diagram: Prod A, Prod B and DR nodes; PROD and DR shared storage linked by SRDF; LAM; non-routed heartbeat VLAN; Corp networks; Slough Global Switch]

Enable LAM in DR

Stop production resources

Split storage

Import disk groups in DR

Start network and storage cluster resources in DR

Start SQL resources in DR
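The DR steps above must run strictly in order. SRDF disks cannot be brought online until the storage is split, and the production IP addresses and network names cannot come online in DR while they are still in use. Below is a minimal runbook sketch that halts at the first failed step. `placeholder_step` is a hypothetical stand-in for the real operational task (enabling LAM, splitting SRDF, importing disk groups, and so on), which this sketch does not implement.

```python
from typing import Callable, List, Tuple

# Each step is a (description, action) pair; an action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_dr_failover(steps: List[Step]) -> List[str]:
    """Run DR steps strictly in order, halting at the first failure."""
    completed: List[str] = []
    for description, action in steps:
        if not action():
            raise RuntimeError(
                f"DR failover halted at {description!r}; completed so far: {completed}"
            )
        completed.append(description)
    return completed


def placeholder_step() -> bool:
    """Hypothetical stand-in: replace with the real operational task."""
    return True


DR_STEPS: List[Step] = [
    ("Enable LAM in DR", placeholder_step),
    # Frees the production IP addresses and network names, which cannot
    # come online in DR while they are still in use in the estate.
    ("Stop production resources", placeholder_step),
    # SRDF disks cannot be brought online in DR until they are split.
    ("Split storage", placeholder_step),
    ("Import disk groups in DR", placeholder_step),
    ("Start network and storage cluster resources in DR", placeholder_step),
    ("Start SQL resources in DR", placeholder_step),
]

if __name__ == "__main__":
    for step in run_dr_failover(DR_STEPS):
        print("done:", step)
```

Halting instead of continuing is deliberate. For example, starting SQL resources after a failed storage split would only produce further resource failures.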




Load Balancing

Service is provided by the NOC

Cisco GSS (Global Site Selector)

Round-robin or weighted balancing

Session aware

End-point node checking (ping)

Node port end-point checking, e.g. ports 80, 21, 443

Can also query website connectivity, e.g. to detect 404 Page Not Found errors
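Cisco GSS performs this balancing in the network. The sketch below only illustrates two ideas from this slide, weighted round-robin selection and end-point checking. It does not show how GSS is implemented or configured. It assumes a caller-supplied health check; `port_is_open` shows a TCP port check using the standard `socket` module. Ping (ICMP) checks, session awareness and HTTP error detection are not shown, and the node names in the usage note are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterator


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """End-point check: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def weighted_round_robin(
    weights: Dict[str, int], is_healthy: Callable[[str], bool]
) -> Iterator[str]:
    """Yield end points in weighted round-robin order, skipping unhealthy ones.

    A node with weight 2 appears twice per cycle and a node with weight 1
    once; giving every node weight 1 is plain round robin.
    """
    schedule = [node for node, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one node needs a positive weight")
    position = 0
    while True:
        # Try at most one full cycle of the schedule before giving up.
        for _ in range(len(schedule)):
            node = schedule[position]
            position = (position + 1) % len(schedule)
            if is_healthy(node):
                yield node
                break
        else:
            raise RuntimeError("no healthy end points")
```

With weights `{"web1": 2, "web2": 1}` and both nodes healthy, the first six picks are web1, web1, web2, web1, web1, web2. If web1 fails its check, every pick goes to web2. In practice, `is_healthy` could wrap `port_is_open(node, 443)`.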




Questions & Answers

Clustering solutions would be expected to support:

Automatic failover of application processes/services in the event of node failure

Automatic restart of application processes/services in the event of process failure

Automatic load balancing of peer processes/services in a cluster

Automatic reallocation of processes/services to ensure best utilisation of the cluster

Management and software deployment and provisioning at the cluster level rather than individual node level

Given these reasonable expectations of a clustering solution, a number of facets of the GSL / SlatePlus clustering arrangements don't quite seem to match them.

Why is it necessary to install services explicitly on every machine, rather than installing a service to the cluster and letting the clustering solution manage the deployment of services to the cluster nodes?

What role does GSL play in providing the clustering solution, rather than relying upon the MS product? For instance, is GSL Gateway necessary, or could it be replaced in whole or in part by MS components?

What 'templates' exist for stateless services, where we can run multiple instances of services concurrently in the cluster?

What 'templates' exist for stateful services, where we would want only one instance of a particular service to run at one time, but where we do want the benefits of the cluster, i.e. automatic failover to another node and automatic restart?