Clustering - GSL


  • 7/28/2019 Clustering - GSL


    Microsoft Clustering GSL

    Produced by: Kingsley Bell


    Distributed Operations Windows

    Windows Server Support Contact Information

    Windows Regional Services EMEA:

    Hotline *448 6868

    Email Address # IT TIS RDO EMEA DO Windows

    Manager: Aleet Kavia *448 7753

    Back Office team lead: Edwin Broersma *443 9606

    Front Office team lead: Barry Roberts *448 5483

    Windows Production Services (Global):

    Hotline *650 8888

    Email Address # IT TIS RDO Windows Prod Svcs

    Manager: Tejendra Dhiman *650 8860

    Remedy GIM / RFC Queue: TIS_RDO_DO_WIN_PROD_SVCS

    Remedy GIM / RFC Queues:

    EMEA Asset Management: TIS_RDO_EMEA_DO_WIN_ASSET_MGT

    EMEA Equities & Prime Services: TIS_RDO_EMEA_DO_WIN_EQ_PS

    EMEA Fixed Income & Derivatives: TIS_RDO_EMEA_DO_WIN_FID_DRV

    EMEA Back Office: TIS_RDO_EMEA_DO_WIN_IBO_BO


    Contents

    What is MSCS?

    Cluster Overview

    Cluster groups

    Resources

    Credit Suisse Naming Standards

    Failover

    Disaster Recovery

    Load Balancing

    Questions & Answers


    What is MSCS?

    A cluster consists of two or more computers working together to provide a higher level of availability, reliability, and scalability than can be obtained by using a single computer. Microsoft cluster technologies guard against three specific types of failure:

    Application and service failures, which affect application software and essential services.

    System and hardware failures, which affect hardware components such as CPUs, drives, memory, network adapters, and power supplies.

    Site failures in multisite organizations, which can be caused by natural disasters, power outages, or connectivity outages.

    The ability to handle failure allows server clusters to meet requirements for high availability, which is the ability to provide users with access to a service for a high percentage of time while reducing unscheduled outages.
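    To put "a high percentage of time" into numbers, the sketch below converts an availability figure into the maximum unscheduled downtime it allows per year (8,760 hours, ignoring leap years). The availability levels are illustrative only and are not targets stated in this deck; for example, 99.9% availability allows about 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def downtime_hours_per_year(availability: float) -> float:
    """Maximum unscheduled downtime per year for an availability between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative availability levels, not SLA figures from this document
    for level in (0.99, 0.999, 0.9999):
        hours = downtime_hours_per_year(level)
        print(f"{level:.2%} available -> {hours:.2f} hours of downtime per year")
```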

    In a server cluster, each server owns and manages its local devices and has a copy of the operating system and the applications or services that the cluster is managing. Devices common to the cluster, such as disks in common disk arrays and the connection media for accessing those disks, are owned and managed by only one server at a time. For most server clusters, the application data is stored on disks in one of the common disk arrays, and this data is accessible only to the server that currently owns the corresponding application or service.

    Server clusters are designed so that the servers in the cluster work together to protect data, keep applications and services running after failure on one of the servers, and maintain consistency of the cluster configuration over time.


    Cluster Overview


    Cluster Groups

    Cluster groups are used to group together all resources required to run an application or instance.

    A cluster group can run on only one physical node at a time; no other node can access its resources, e.g. disks.

    Multiple cluster groups can run simultaneously on the same node.

    When a cluster group is moved to another node, all resources in that group are taken offline and brought online on the other node.

    An Active/Active cluster is one where two cluster groups are running on two different physical nodes.

    In case of a node failure, the cluster service automatically starts the whole cluster group on another available node.

    The first cluster group is used to operate the cluster; no other resources should be placed in this group.

    Cluster groups have several configuration options: preferred owners, failover (threshold and period), and failback options.
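    The failover behaviour described on this slide can be sketched as a small simulation. This is a simplified, hypothetical model written in Python, not the MSCS API: the class and method names are invented for illustration, preferred owners are tried first and then any other surviving node, and the failover threshold is treated as the maximum number of failovers allowed within a rolling period (in hours), after which the group is left offline. Failback is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterGroup:
    """Hypothetical, simplified model of a cluster group (not the MSCS API)."""
    name: str
    resources: List[str]                # brought online in list order
    preferred_owners: List[str]         # nodes in order of preference
    failover_threshold: int = 2         # max failovers allowed...
    failover_period_hours: float = 6.0  # ...within this rolling window
    owner: Optional[str] = None
    online: bool = False
    failover_times: List[float] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def move_to(self, node: str) -> None:
        """Move the whole group: all resources offline, then online on the new node."""
        if self.owner is not None:
            for res in reversed(self.resources):
                self.events.append(f"{res} offline on {self.owner}")
        self.online = False
        self.owner = node  # the group is owned by exactly one node at a time
        for res in self.resources:
            self.events.append(f"{res} online on {node}")
        self.online = True

    def fail_over(self, failed_node: str, up_nodes: List[str], now_hours: float) -> bool:
        """React to failed_node going down; return True if the group is back online."""
        # Forget failovers that fall outside the rolling period
        cutoff = now_hours - self.failover_period_hours
        self.failover_times = [t for t in self.failover_times if t > cutoff]
        if len(self.failover_times) >= self.failover_threshold:
            self.online = False  # threshold reached within the period: stay offline
            return False
        # Preferred owners first, then any other surviving node
        others = [n for n in up_nodes if n not in self.preferred_owners]
        for node in self.preferred_owners + others:
            if node != failed_node and node in up_nodes:
                self.failover_times.append(now_hours)
                self.move_to(node)
                return True
        self.online = False  # no surviving node can host the group
        return False
```

    For example, a group whose preferred owners are NODE1, NODE2 and NODE3 (hypothetical names) moves to NODE2 when NODE1 fails; with the default threshold of 2 failovers in 6 hours, a third failure inside that window leaves the group offline.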


    Resources

    Resources reside in cluster groups

    All resources required for a specific function should be grouped together

    When a resource is in a cluster, it should be administered only through the cluster

    Typically each cluster group has a network name and an associated IP address through which its resources can be accessed

    A large number of resource types can be created, which together can provide a complete clustered application ecosystem, e.g.:

    IP Address

    Network Name

    File Share

    Generic Service

    Physical Disk

    Some resources have required dependencies, e.g. the Network Name requires an IP Address

    You can create your own dependencies, e.g. so that a service cannot start until a file share is online

    Each resource has a number of configuration options

    Some applications create new cluster resources, e.g. MS SQL Server

    If the application is not cluster-aware, the Generic Service or Generic Application resource type can be used for a roll-your-own solution

    Resources must be available on each node that may own them

    Cluster-aware applications install the required binaries on all nodes at installation time

    Generic applications need the required binaries installed on each node manually
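    Resource dependencies determine the order in which a group's resources are started and stopped: a resource is brought online only after everything it depends on, and taken offline before it. The sketch below shows this ordering rule as a topological sort using Python's standard `graphlib` module (Python 3.9+); it illustrates the rule and is not the Cluster service's own code. Only the Network Name to IP Address dependency comes from the slide; the File Share and Generic Service dependencies are hypothetical, mirroring the "service waits for a file share" example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each resource maps to the set of resources it depends on (hypothetical group).
DEPENDENCIES: Dict[str, Set[str]] = {
    "Physical Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},     # Network Name requires an IP Address
    "File Share": {"Physical Disk"},    # assumed: the share lives on the clustered disk
    "Generic Service": {"File Share"},  # custom dependency: service waits for the share
}


def online_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order in which to bring resources online: dependencies first."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order in which to take resources offline: dependents first."""
    return list(reversed(online_order(deps)))


if __name__ == "__main__":
    try:
        print("online: ", online_order(DEPENDENCIES))
        print("offline:", offline_order(DEPENDENCIES))
    except CycleError as err:
        # A circular dependency means no valid start order exists
        print("invalid dependency graph:", err)
```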


    Failover

    MSCS does not provide a seamless failover solution: in case of failure, resources are shut down on one node and then brought online on another node

    Careful consideration should be given when configuring resource parameters, e.g. the "Affect the group" setting

    Cluster resources should not be overcommitted, so that capacity remains for a node failure. If one cluster group requires 80% of a node's computing power, that amount of spare capacity should always be available in the cluster in case of node failure. If 2 cluster groups each need 55% of a node's computing power, they cannot share one node, so the cluster should have 3 nodes

    Keep all nodes in a cluster to the same specification

    Individual resource failures can initiate a cluster failover

    Node failure will initiate a cluster failover
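    The sizing examples on this slide (a group needing 80% requires that much spare capacity; two groups at 55% need 3 nodes) can be checked mechanically. The sketch below is a brute-force placement check in Python that assumes identical nodes, as recommended above, expresses each group's load as a percentage of one node, and treats a group as unsplittable. It only covers a single node failure and is meant for the small group counts typical of these clusters.

```python
from itertools import product
from typing import Sequence


def fits(loads: Sequence[float], nodes: int, capacity: float = 100.0) -> bool:
    """True if every cluster group can be placed on some node without exceeding capacity."""
    if nodes <= 0:
        return len(loads) == 0
    # Try every assignment of groups to nodes (fine for a handful of groups)
    for placement in product(range(nodes), repeat=len(loads)):
        used = [0.0] * nodes
        for load, node in zip(loads, placement):
            used[node] += load
        if all(u <= capacity for u in used):
            return True
    return False


def survives_single_node_failure(loads: Sequence[float], nodes: int,
                                 capacity: float = 100.0) -> bool:
    # Nodes are identical, so losing any one node leaves nodes - 1 equivalent nodes
    return fits(loads, nodes, capacity) and fits(loads, nodes - 1, capacity)


def minimum_nodes(loads: Sequence[float], capacity: float = 100.0) -> int:
    """Smallest cluster that still runs every group after one node fails."""
    if any(load > capacity for load in loads):
        raise ValueError("a single group needs more than one node's capacity")
    nodes = 1
    while not survives_single_node_failure(loads, nodes, capacity):
        nodes += 1
    return nodes


if __name__ == "__main__":
    print(minimum_nodes([80]))      # one group at 80% of a node -> 2
    print(minimum_nodes([55, 55]))  # two groups at 55% each -> 3
```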


    Disaster Recovery

    DR nodes should be installed with just enough resources to run the cluster, e.g. 3 production nodes require 2 DR nodes

    Production + DR nodes: 2+1, 3+2, 4+3

    DR nodes typically have the cluster service disabled, or running just the default cluster group with all other cluster groups offline

    DR nodes need to have the application installed

    The DR configuration needs to be updated whenever the production configuration is changed

    Credit Suisse utilises the following third-party vendor technologies to aid DR failover:

    EMC SRDF (Symmetrix Remote Data Facility)

    Cisco LAM (Local Area Mobility)

    When using SRDF, the cluster disk resources cannot be brought online unless the disks are in a split state

    The IP Address and Network Name resources cannot be brought online if they are already in use in the estate

    The DR node naming standard reflects the production node names, e.g.:

    XNYC19P11013A -> XNYC19B11013A

    CNYC19P11013 -> CNYC19B11013

    With the use of LAM, however, the virtual server names (e.g. CNYC19P11013A) can be brought online in a DR scenario
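    The DR sizing and naming rules on this slide are mechanical enough to script. The sketch below is an illustration built on assumptions: the DR node count follows the 2+1, 3+2, 4+3 pattern (one fewer DR node than production nodes), and the name mapping is inferred solely from the two examples given, i.e. four letters and two digits, then `P` for production replaced by `B` for DR, with the rest of the name unchanged. The meaning of the other name segments is not documented here and is not interpreted; the function names are hypothetical.

```python
import re

# Assumed pattern, inferred only from the two examples on the slide:
# 4 letters + 2 digits, then P (production), then the rest of the name.
_PROD_NAME = re.compile(r"^([A-Z]{4}\d{2})P(.+)$")


def dr_node_name(prod_name: str) -> str:
    """Map a production node name to its DR counterpart (P -> B)."""
    match = _PROD_NAME.match(prod_name.upper())
    if match is None:
        raise ValueError(f"{prod_name!r} does not match the assumed production naming pattern")
    return f"{match.group(1)}B{match.group(2)}"


def dr_node_count(prod_nodes: int) -> int:
    """2+1, 3+2, 4+3: one fewer DR node than production nodes."""
    if prod_nodes < 2:
        raise ValueError("a cluster needs at least 2 production nodes")
    return prod_nodes - 1


if __name__ == "__main__":
    for name in ("XNYC19P11013A", "CNYC19P11013"):
        print(name, "->", dr_node_name(name))
    print("3 production nodes ->", dr_node_count(3), "DR nodes")
```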


    MSCS Clusters DR

    Collection of clustered Windows servers with shared disks, IP addresses, network names and SQL resources. 2+1 or 3+2.

    [Diagram labels: Shared Storage (PROD), Shared Storage (DR), SRDF, LAM, Heartbeat VLAN (non-routed), Corp, Prod A, Prod B, DR, Slough Global Switch]

    Enable LAM in DR

    Stop production resources

    Split storage

    Import disk groups in DR

    Start Network and storage cluster resources in DR

    Start SQL resources in DR
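    The DR failover steps above, together with the constraints on the previous slide (SRDF disks cannot come online until the storage is split; the IP address and network name cannot come online while still in use in production), can be sketched as a runbook that refuses to run a step before its preconditions hold. The state flags and step functions are hypothetical placeholders in Python; a real runbook would call the storage, network and cluster tooling at each step, which is not shown. Requiring LAM to be enabled before network resources start in DR is an assumption drawn from the step order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DrState:
    """Hypothetical flags tracking progress of a DR failover."""
    lam_enabled: bool = False
    prod_resources_running: bool = True
    storage_split: bool = False
    disk_groups_imported: bool = False
    network_and_storage_online: bool = False
    sql_online: bool = False


def enable_lam(s: DrState) -> None:
    s.lam_enabled = True


def stop_production(s: DrState) -> None:
    s.prod_resources_running = False


def split_storage(s: DrState) -> None:
    s.storage_split = True


def import_disk_groups(s: DrState) -> None:
    if not s.storage_split:
        raise RuntimeError("SRDF disks must be split before use in DR")
    s.disk_groups_imported = True


def start_network_and_storage(s: DrState) -> None:
    if s.prod_resources_running:
        raise RuntimeError("IP address / network name still in use in production")
    if not s.lam_enabled:
        raise RuntimeError("LAM not enabled in DR")  # assumed precondition
    if not s.disk_groups_imported:
        raise RuntimeError("disk groups not imported in DR")
    s.network_and_storage_online = True


def start_sql(s: DrState) -> None:
    if not s.network_and_storage_online:
        raise RuntimeError("network and storage resources must be online first")
    s.sql_online = True


RUNBOOK: List[Tuple[str, Callable[[DrState], None]]] = [
    ("Enable LAM in DR", enable_lam),
    ("Stop production resources", stop_production),
    ("Split storage", split_storage),
    ("Import disk groups in DR", import_disk_groups),
    ("Start network and storage cluster resources in DR", start_network_and_storage),
    ("Start SQL resources in DR", start_sql),
]


def run_runbook(state: DrState) -> List[str]:
    """Run each step in order; a failed precondition raises and stops the failover."""
    completed = []
    for name, step in RUNBOOK:
        step(state)
        completed.append(name)
    return completed
```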


    Load Balancing

    Load balancing is provided as a service by the NOC

    Cisco GSS (Global Site Selector)

    Round-robin or weighted balancing

    Session-aware

    Endpoint node checking (ping)

    Endpoint port checking, e.g. ports 80, 21, 443

    Can also query website availability, e.g. detect 404 Page Not Found errors
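    The selection and health-check ideas on this slide can be illustrated with a generic sketch; it does not use or describe the Cisco GSS configuration or API. It combines a TCP port check (the "port 80, 21, 443" style endpoint check) with smooth weighted round-robin over the endpoints that pass it; equal weights give plain round-robin. Endpoint names, addresses and weights are hypothetical, and ping (ICMP), HTTP content checks such as detecting 404 responses, and session awareness are not shown.

```python
import socket
from typing import Callable, Dict, Optional


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Endpoint port check: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class WeightedRoundRobin:
    """Smooth weighted round-robin across endpoints that pass a health check."""

    def __init__(self, weights: Dict[str, int], is_healthy: Callable[[str], bool]) -> None:
        self.weights = dict(weights)
        self.is_healthy = is_healthy
        self.current = {name: 0 for name in self.weights}

    def pick(self) -> Optional[str]:
        healthy = [name for name in self.weights if self.is_healthy(name)]
        if not healthy:
            return None  # no endpoint passed its health check
        total = sum(self.weights[name] for name in healthy)
        # Each healthy endpoint earns its weight; the leader is chosen and pays back the total
        for name in healthy:
            self.current[name] += self.weights[name]
        chosen = max(healthy, key=lambda name: self.current[name])
        self.current[chosen] -= total
        return chosen


if __name__ == "__main__":
    # Hypothetical endpoints; a real check would probe each server's service port
    endpoints = {"web-a": ("127.0.0.1", 80), "web-b": ("127.0.0.1", 8080)}
    lb = WeightedRoundRobin(
        {"web-a": 2, "web-b": 1},
        is_healthy=lambda name: tcp_port_open(*endpoints[name], timeout=0.5),
    )
    print([lb.pick() for _ in range(6)])
```

    With weights 2 and 1 and both endpoints healthy, the smooth variant interleaves picks (A, B, A) rather than sending two consecutive requests to the heavier endpoint.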


    Questions & Answers

    Clustering solutions would be expected to support:

    Automatic failover of application processes/services in the event of node failure

    Automatic restart of application processes/services in the event of process failure

    Automatic load balancing of peer processes/services in a cluster

    Automatic reallocation of processes/services to ensure best utilisation of the cluster

    Management, software deployment and provisioning at the cluster level rather than at the individual node level

    Given these reasonable expectations of a clustering solution, a number of facets of the GSL / SlatePlus clustering arrangements don't quite seem to match them.

    Why is it necessary to install services explicitly on every machine, rather than installing a service to the cluster and letting the clustering solution manage the deployment of services to the cluster nodes?

    What role does GSL play in providing the clustering solution, rather than relying on the MS product? For instance, is GSL Gateway necessary, or could it be replaced in whole or in part by MS components?

    What 'templates' exist for stateless services, where we can run multiple instances of a service concurrently in the cluster?

    What 'templates' exist for stateful services, where we want only one instance of a particular service to run at a time, but still want the benefits of the cluster, i.e. automatic failover to another node and automatic restart?
