towards ns service - Louisiana Tech University


TechEd 2002© 2002 Microsoft Corporation. All rights reserved.This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary. 1

CSC469/557: Winter 2006

High Availability and Performance Computing: Towards non-stop services in HPC/HEC/Enterprise IT Environments

Chokchai (Box) Leangsuksun
Associate Professor, Computer Science
Director, eXtreme Computing Research Group
Center for Entrepreneurship and Information Technology
Louisiana Tech University

Box's 1 minute Bio

• MS and PhD in CS (1995):
  • MS Thesis: Parallel C compiler
  • PhD Thesis: Resource management/allocation in Heterogeneous Parallel Distributed Computing
• 7 years in Highly Reliable Software/system industry (Lucent)
  • Architect, PM, Tech lead (15-30 team size)
  • R&D -> 4 major network management products
• Associate Professor in CS since 2002.
  • 12 graduate students (4 PhD)
  • Collaborations with national and industry labs (e.g. ORNL, Intel, Ericsson, Dell, CRAY, etc.)

Tux idol contestants

tux02 tux03

Outline

• Background and Motivation
• Current R&D & Educational projects
  • HA-OSCAR Architecture, infrastructure and System management
  • Design & Dependability Analysis
  • Education in HAPC
• Conclusion

High Performance Computing

• HPC: "Hardware and software techniques devised for building computer systems to quickly perform large amounts of computation"
• HPC is not the same as high throughput

Example of HPC app

Page is excerpted from David Klepacki's presentation

HPC goes mainstream

• Dual core is available now
• Multi core is imminent
• At SC|05, Bill Gates gave a keynote as HPC goes mainstream
• MS is in HPC clusters (Windows)
• More critical applications require HPC

How to achieve HPC

• Work hard: add more powerful unit(s).
  • Faster CPU
  • More CPUs, parallel architecture
  • Faster connectivity
• Work smart: better algorithms to take advantage of parallelism
  • Multi-programming / multi-processing (Unix fork)
  • Multi-threading
  • Parallel programming (MPI, OpenMP, PVM)
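A minimal sketch of the "work smart" idea, assuming a toy workload (a sum of squares) that is split into chunks and handed to a pool of threads. The function names and the worker count are illustrative, not from the slides. In CPython, threads mainly help with I/O-bound work; for CPU-bound work a process pool or MPI would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum_of_squares(bounds):
    """Sum i*i for i in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Split [0, n) into roughly equal chunks and sum them with a thread pool."""
    if n <= 0:
        return 0
    step = (n + workers - 1) // workers  # ceiling division so no element is dropped
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent task; partial sums are combined at the end.
        return sum(pool.map(chunk_sum_of_squares, chunks))
```

The same decomposition (partition the data, compute partial results independently, combine) carries over to fork-based multi-processing and to message passing.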

Parallel Architectures

Hardware Architectures for HPC and their Parallel Programming Models

• Distributed Memory Systems, MPP, Clusters: Message Passing
• Shared Memory Systems (SMP): Shared Memory Programming
• Specialized Architectures, Vector Processing: Data Parallel Programming (SIMD)
• The Grid: Grid Computing

TOP500.org

The charts are from top500.org

Stop here: 11/30/2005

Production HPC system in the real world.

Availability density of each node in the ASC White (was one of the top HPC systems)

[Chart: "Availability for White"; x-axis: node index; y-axis: availability. Average = 0.9872, STDEV = 0.03292]

• The average is 0.98, with standard deviation 0.033.
• The availability of most nodes is above 0.95, with a few below 0.8.
• This indicates that some nodes manifest more outages than others.
• If the runtime system is not aware of these nodes' unreliability, the result may be low total system performance, extended application completion time, or failure.

Nodes MTTF density (in hours)

[Chart: "Nodes MTTF for White"; x-axis: node index; y-axis: MTTF (hours). Mean = 3923, STDEV = 1217]

• The average is 3923 hours, with standard deviation 1217.
• The maximum is 5592 hours.
• The minimum is 230 hours.

Nodes Downtime (in hours)

[Chart: "Nodes TDT for White"; x-axis: node index; y-axis: TDT (hours). Mean = 355, STDEV = 56]

• The average is 355 hours, with standard deviation 56.
• Most nodes have a total downtime of around 100 hours.
• Some failure events cost more time due to a prolonged repair process, and thus increase the average TDT.

HPC Buildout

[Diagram labels: High Performance Buildout, High Serviceability, High Availability]

Availability

• A measurement representing the ratio of uptime to total time
• High availability: the ability of a system to perform its function continuously (without interruption) for a significantly longer period of time than the reliabilities of its individual components would suggest.
• High availability is most often achieved through fault tolerance.

Degree of Availability

System Type              Unavailability (minutes/year)   Availability (percent)   Availability Class
Unmanaged                50,000                          90                       1
Managed                  5,000                           99                       2
Well-managed             500                             99.9                     3
Fault-tolerant           50                              99.99                    4
High Availability        5                               99.999                   5
Very High Availability   0.5                             99.9999                  6
Ultra Availability       0.05                            99.99999                 7

Quiz. Find out for each 9's

(Same table as on the "Degree of Availability" slide: system type, unavailability in minutes/year, availability in percent, and availability class for classes 1 through 7.)
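One way to work the quiz is to convert each availability class into yearly downtime. The sketch below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines):
    """Yearly downtime for an availability with `nines` nines (1 -> 90%, 2 -> 99%, ...)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in range(1, 8):
        print(f"class {nines}: {downtime_minutes_per_year(nines):.3f} minutes/year")
```

Class 1 (90%) comes to 52,560 minutes/year, so the table's 50,000, 5,000, 500, ... figures are the same values rounded to an order of magnitude.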

Quiz #2

• Say the machine costs $20M
• Availability is 92.1%
• What is the downtime and the cost of downtime?
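A worked sketch of Quiz #2. The downtime follows directly from the availability; the cost figure rests on an assumption that is not stated on the slide, namely that the cost of downtime is the unavailable fraction of the $20M machine cost over one year. Function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def yearly_downtime_hours(availability):
    """Hours per year the system is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


def yearly_downtime_cost(availability, machine_cost):
    """Assumption: lost value = unavailable fraction of the machine cost."""
    return (1.0 - availability) * machine_cost


hours = yearly_downtime_hours(0.921)            # about 692 hours per year
cost = yearly_downtime_cost(0.921, 20_000_000)  # about $1.58M under the assumption above
```

A different cost model (for example, lost revenue per hour of outage) would give a different dollar figure for the same 692 hours.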

Unavailability = No performance and functionality

Availability enables Performance

• Performance-oriented enterprise/shared major computing resources: 7/24/365
• Losses of $195K - $58M with 3.5 hrs (Meta Group report)
• Service provider regulation/mandate
  • FCC mandate (Class 5 local switch)
  • Loss of time and opportunities
  • Life-threatening
  • National security (homeland defense)

What is involved (in non-stop services)?

[Diagram labels: users, Operation, Config/Mgt, Tool, App/Service, HW, Training, ???]

Goals

Towards non-stop services in HPC/HEC environments

• High Availability (Reliability)
• High Serviceability (planned downtime)
• High Performance Computing (HPC)
• We want them all

HA-OSCAR+: High Availability and Performance Computing Architecture and Management System

+ Open Source Cluster Application Resources (HAPC infrastructure and System management)

Beowulf: Linux HPC Cluster

[Diagram label: pbs, maui, SGE, nfs, etc.]

Unavailable = No performance and functionality

HA-OSCAR Goals

• High availability for HPC clusters
• Serviceability: simplicity
• Transparent: preserve existing investments, no change required, retrofittable
• Production-quality software release

Related Works

• Serviceability: easy Beowulf builders
  • OSCAR, ROCKS, Scyld
• High Availability
  • LinuxHA, Kimberlite, HP ServiceGuard, Redhat AS
• Monitoring
  • Mon, net-SNMP, Ganglia, Clumon, dproc

HA-OSCAR Beowulf

[Diagram label: Optional image servers]

Key solutions

• High Availability
  • Redundancy (at the head/service node)
  • Monitoring & self-healing (auto detection and recovery)
    • Critical services (pbs scheduler, maui, grid, SGE, network, authentication, member, etc.)
    • Resources (filesystem, CPU usage, etc.)
    • Hardware and network
• Higher Serviceability
  • Self-build, self-configuration
  • Hot-upgrade cluster OS or services

Self-healing Schemes

Monitoring and recovery

• Enhancements based on the kernel.org MON, IPMI, and net-SNMP frameworks
• Recovery
  • Associative Response
    • Local recovery, e.g. restart, checkpoint
    • Failover (simple or impersonate/clone)
    • Admin-defined actions
  • Adaptive Response
    • Previous state and number of retries
    • Acceleration (time-series)
    • E.g. maui dies: restart. After 3 retries within 3 mins, fail over

Adaptive recovery state diagram

[State diagram. States: working, failure, Failover. Transition labels: Detect; Alert; previous state, # counter, recovery (shown twice); threshold reached after # retries; switch over & take control at the standby; after the primary node is repaired, optional fallback]
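The adaptive response can be sketched as a small policy object that counts recent local restarts and escalates to failover once a threshold is reached. This is a minimal illustration built from the "3 retries within 3 mins" example, not HA-OSCAR's actual implementation; the class name, method name, and returned action strings are hypothetical.

```python
from collections import deque


class AdaptiveRecovery:
    """Restart locally until `max_retries` restarts fall inside `window_s` seconds."""

    def __init__(self, max_retries=3, window_s=180.0):
        self.max_retries = max_retries
        self.window_s = window_s
        self.restarts = deque()  # timestamps (seconds) of recent local restarts
        self.state = "working"

    def on_failure(self, now):
        """Return the action for a failure detected at time `now` (seconds)."""
        # Forget restarts that fall outside the sliding time window.
        while self.restarts and now - self.restarts[0] > self.window_s:
            self.restarts.popleft()
        if len(self.restarts) >= self.max_retries:
            # Threshold reached: switch over and let the standby take control.
            self.state = "failover"
            return "failover"
        # Otherwise attempt a local recovery and remember when it happened.
        self.restarts.append(now)
        self.state = "working"
        return "restart"
```

With the defaults, failures at 0, 10, and 20 seconds each trigger a restart and a fourth failure at 30 seconds triggers failover, while failures spaced 100 seconds apart keep being handled by local restarts.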

HA-OSCAR Cluster

Self-build & Self-configuration

• How to build HA-OSCAR
  • Can retrofit an existing Linux Beowulf
  • Or start with the OSCAR/ROCKS installation tool
• Wizard GUI-based installation

HA-OSCAR installation

• Adopts ease of build (self-build, config w/o OS pre-loaded)
• 30 min - 1.5 hrs installation (retrofit)
• Takes almost the same time for disaster recovery

Installation steps:
• Step 1
• Step 2: create head image
• Step 3: input and clone image
• Step 4: config & build standby
• Step 5 (optional): web admin to add/config more services

Hardware Management abstraction

• IPMI (Intelligent Platform Management Interface)
  • A widely accepted specification for server management.
  • Defines an abstract interface for intelligent hardware that monitors the health of a server's hardware inventory.
  • Alerting, power control, asset tracking, health monitoring, and event logging are IPMI's management strategies.
• OpenIPMI and OpenHPI (SA Forum)

Our early observations

01/25/2004 | 00:31:19 | Sys Fan 1 | critical
01/25/2004 | 00:31:19 | Sys Fan 3 | critical
01/25/2004 | 00:31:19 | Sys Fan 4 | critical
01/25/2004 | 00:31:19 | Processor 1 Fan | ok
01/25/2004 | 00:31:20 | Processor 2 Fan | ok

• Can set thresholds in managed elements to trigger events with severity levels
• Automatic failure trend analysis -> prediction
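As a hedged illustration of how such event lines could feed a monitor, the sketch below parses the pipe-separated format shown above and picks out the sensors reporting a critical severity. It assumes the four-field layout of the sample (date, time, sensor, severity) and a month/day/year date; the function names are hypothetical and no IPMI library is involved.

```python
from datetime import datetime

SAMPLE_LOG = """\
01/25/2004 | 00:31:19 | Sys Fan 1 | critical
01/25/2004 | 00:31:19 | Sys Fan 3 | critical
01/25/2004 | 00:31:19 | Sys Fan 4 | critical
01/25/2004 | 00:31:19 | Processor 1 Fan | ok
01/25/2004 | 00:31:20 | Processor 2 Fan | ok
"""


def parse_events(text):
    """Turn 'date | time | sensor | severity' lines into (timestamp, sensor, severity) tuples."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        date, time, sensor, severity = [field.strip() for field in line.split("|")]
        stamp = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
        events.append((stamp, sensor, severity))
    return events


def critical_sensors(events):
    """Names of sensors whose event severity is 'critical', in log order."""
    return [sensor for _, sensor, severity in events if severity == "critical"]
```

A recovery policy could then be keyed on the returned sensor names, for example raising an alert when several system fans go critical at the same timestamp.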

A failure prediction & policy-based recovery management

• Detection: the damage is done!
• Prediction
  • Trend analysis
  • Anticipate imminent failures
  • Better handling
  • More difficult for multiple events
• Example of IPMI events and trend analysis
  • E.g. CPU temperature rising too fast within 5 min -> prepare to checkpoint, fail over and restart
  • Memory bit error detected -> take the node out
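The CPU temperature example can be sketched as a simple trend check: fit a straight line to the readings from the last 5 minutes and act when the slope exceeds a limit. This is only one possible form of trend analysis; the least-squares fit, the 2 degrees-per-minute limit, and the function names are assumptions made for illustration.

```python
def slope_per_minute(samples):
    """Least-squares slope of (seconds, value) samples, in value units per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return 60.0 * num / den if den else 0.0


def predict_action(samples, now, window_s=300, max_rise_per_min=2.0):
    """Pick a recovery action from the temperature trend over the last `window_s` seconds."""
    recent = [(t, v) for t, v in samples if now - t <= window_s]
    if slope_per_minute(recent) > max_rise_per_min:
        return "checkpoint, failover and restart"
    return "ok"
```

For example, readings of 50, 53, 56, 59, and 62 degrees taken one minute apart rise at 3 degrees per minute and would trigger the checkpoint-and-failover action under the assumed limit.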

HA-OSCAR monitoring and recovery restructure

01/25/2004 | 00:31:19 | Sys Fan 1 | critical
01/25/2004 | 00:31:19 | Sys Fan 3 | critical
01/25/2004 | 00:31:19 | Sys Fan 4 | critical
01/25/2004 | 00:31:19 | Processor 1 Fan | ok
01/25/2004 | 00:31:20 | Processor 2 Fan | ok

Retrieving os/sw states from (dead?) systems

• Storing/retrieving os/sw states on the local storage in stealth mode, e.g.,
  • message log (process state saving)
  • Ability to resume application execution from dead nodes
• Cluster "Passive monitoring"
• "A Report of Invention has been filed and Tech is preparing a patent application."

HA-OSCAR Status

• HA-OSCAR beta was released in March 2004
• Failure prediction and policy-based recovery proof of concept
• Prediction for known problems or events
  • A single event type can be anticipated
  • Active monitoring
  • Head node is achievable
  • Challenging for very large scale compute nodes and multi-events
• Automatic/transparent checkpoint/restart recovery for MPI apps (HA-OSCAR + FT LAM/MPI)

Roadmap

• Port to other platforms (e.g. ?? SGI, Cray)
• Federated HEC systems (see HEC future work)
• More sophisticated trend analysis and predictions (multi-event correlations)
• Grid-aware HA-OSCAR
• Support other cluster distributions (ROCKS, SCYLD)
• Multi-head n+1 active-active
• Hot-upgrade cluster (OS/CMS)
• Fault-tolerant applications/services framework
• FCAPS management & carrier grade

Three HA-OSCAR flavors

Flavors:
• HA-OSCAR Active-Hot Standby
• HA-ROCKS Active-Hot Standby (lab grade)
• HA-OSCAR 2+1 Active-Active (lab grade)

Monitoring (IPMI option in all three flavors):
• CPU temperature
• CPU status
• Memory bit error
• CPU fan speed

Other monitoring entries listed in the table: Heartbeat (3 sec); gmond, gmetad; httpd; nis; NSF; SGE; pbs, maui, nfs, httpd

• Appeared on the front cover of two major Linux magazines, and in various technical papers and research exhibitions.
• Web site: http://xcr.cenit.latech.edu/ha-oscar

HA-OSCAR DEMO at the end

Design and Dependability Analysis Research

Reality Checks

• Great! We got HA Beowulf!
• But how much improvement?
  • The total uptime?
  • Aggregate performance?
• Analytical model and prediction
• Statistical technique to compare uptime
• How many 9's? (downtime per year)
• Stochastic Reward Net with the SPNP package
• Identical hardware parameters between Beowulf and HA-OSCAR multi-heads

go to modeling

Availability Model

[Timing diagram. Legend: server up; server down & repair. Panels: Availability model (S1); HA-OSCAR dual head model (S1, S2, S1&S2). Axis: time]

HA-OSCAR SRN model

• Server sub-model
• Switches
• Compute nodes

Server Sub Model

Primary (P) server:
• P server up
• P server down
• Failover
• P server repair
• Failback

Standby (S) server:
• S is up and ready
• S takes control
• S server down
• S repair
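To get a rough feel for what a standby server buys, the sketch below uses a much simpler model than the SRN: each server alternates between up (mean time MTTF) and repair (mean time MTTR), the two servers fail independently, failover is instantaneous, and planned downtime is ignored. All of these are simplifying assumptions, and the function names are illustrative.

```python
def availability(mttf_h, mttr_h):
    """Steady-state availability of one server: uptime share of an up/repair cycle."""
    return mttf_h / (mttf_h + mttr_h)


def dual_head_availability(mttf_h, mttr_h):
    """Availability of a primary/standby pair that is down only when both are down.

    Assumes independent failures and instantaneous failover.
    """
    single = availability(mttf_h, mttr_h)
    return 1.0 - (1.0 - single) ** 2


# Example: 1000-hour MTTF and 24-hour repair time.
single_head = availability(1000, 24)          # about 0.9766
dual_head = dual_head_availability(1000, 24)  # about 0.9995
```

Failover delay, switch failures, and scheduled downtime are left out of this simple product formula.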

Instantaneous Availability

[Chart] Steady (A) = 99.993% (36 min) vs. Beowulf (A) = 99.65% (30 hr)

More analysis & comparisons

[Chart: "HA-OSCAR solution vs traditional Beowulf: Unavailable performance (based on 1 Tflop machine in a year)"; x-axis: MTTF (hours), 1000 to 10000; y-axis: Gflop, log scale; series: Beowulf, HA-OSCAR]

[Chart: "HA-OSCAR solution vs traditional Beowulf"; x-axis: MTTF (hours), 1000 to 10000; y-axis: Availability, 0.90 to 1.00; series: Beowulf, HA-OSCAR]

[Chart: "Lost investment due to unavailability (based on $20M)"; x-axis: MTTF (hours), 1000 to 10000; y-axis: million $, 0 to 2; series: Beowulf, HA-OSCAR]

Ours .9999 vs ~ 0.911K vs. 10M TFLOP (1T system)
$70K vs $2M ($20M system)

Model parameters: scheduled downtime = 200 hrs, repair time = 24 hrs, monitoring interval = 10 s; both unplanned and planned downtime.

Some HA-OSCAR measurements

• 3-5 sec manual failover time
• 0.9% CPU usage at each monitoring interval

[Chart: "Comparison of network usages for HA-OSCAR different polling sizes"; x-axis: HA-OSCAR Mon polling interval (s): 1, 2, 5, 10, 15, 20, 30, 60; y-axis: HA-OSCAR network load in packets/min, measured by TCPtrace, 0 to 300]