Clustered Systems for Massive Parallelism
N. Xiong@ GSU Slide 1
Chapter 05
Clustered Systems for
Massive Parallelism
N. Xiong
Georgia State University
N. Xiong@ GSU Slide 2
Review and Introduction
N. Xiong@ GSU Slide 3
Chapter 05 Main Contents
- Design Objectives of Clusters and MPPs
- Cluster and MPP System Architectures
- Design Principles of Clustered Systems
- Multiple Job Scheduling and Management
- Virtual Clustering and Resource Provisioning
- Homework Problems
N. Xiong@ GSU Slide 4
Design Objectives of Clustered Systems
- Scalability
- Packaging
- Control
- Homogeneity
- Security
N. Xiong@ GSU Slide 5
Design Objectives of Clustered Systems
N. Xiong@ GSU Slide 6
Fundamental Cluster Design Issues
- Scalable Performance
- Single System Image
- Availability Support
- Cluster Job Management
- Internode Communication
- Fault Tolerance and Recovery
- Growth of Servers in HPC and HTC Systems
N. Xiong@ GSU Slide 7
Resource-Sharing in Cluster Systems
N. Xiong@ GSU Slide 8
An Idealized Cluster Architecture
- Conventional databases and OLTP monitors offer users a desktop environment
- The cluster supports parallel programming based on standard languages and communication libraries
- A user-interface subsystem combines the advantages of the Web interface and the Windows GUI
N. Xiong@ GSU Slide 9
Node Architectures and System Packaging
Two types of cluster nodes:
- Compute nodes
- Service nodes
N. Xiong@ GSU Slide 10
Compute Node Examples
N. Xiong@ GSU Slide 11
Modular Packaging of IBM BlueGene/L System
N. Xiong@ GSU Slide 12
Cluster System Interconnects
N. Xiong@ GSU Slide 13
High-Bandwidth Interconnects
N. Xiong@ GSU Slide 14
An InfiniBand Cluster Interconnection Network
N. Xiong@ GSU Slide 15
High-bandwidth Interconnects in Top-500 Systems
N. Xiong@ GSU Slide 16
Hardware, Software, and Middleware Support
N. Xiong@ GSU Slide 17
Design Principles of Clusters
Single-System-Image (SSI) Features
- Single System
- Single Control
- Symmetry
- Location Transparent
N. Xiong@ GSU Slide 18
Design Principles of Clusters
Single-System-Image Layers
- Application Software Layer
- Hardware or Kernel Layer
- Middleware Layer
N. Xiong@ GSU Slide 19
Design Principles of Clusters
Single-System-Image Composition
- Single Entry Point
- Single File Hierarchy
- Single I/O, Networking, and Memory Space
- Other Desired SSI Features
N. Xiong@ GSU Slide 20
Single Entry Point
N. Xiong@ GSU Slide 21
Single File Hierarchy
- It is persistent.
- It is fault tolerant to some degree.
- Examples: Network File System (NFS) and Andrew File System (AFS).
N. Xiong@ GSU Slide 22
Single File Hierarchy
N. Xiong@ GSU Slide 23
Single I/O, Networking, and Memory Space
- Single Input/Output
- Single Networking
- Single Point of Control
- Single Memory Space
N. Xiong@ GSU Slide 24
Single I/O, Networking, and Memory Space
N. Xiong@ GSU Slide 25
An Example
N. Xiong@ GSU Slide 26
Other Desired SSI Features
- Single Job Management System
- Single User Interface
- Single Process Space
N. Xiong@ GSU Slide 27
Middleware Support for SSI Clustering
N. Xiong@ GSU Slide 28
High Availability Through Redundancy
- Reliability
- Availability
- Serviceability
N. Xiong@ GSU Slide 29
Availability and Failure Rate
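The relationship named in this slide title can be sketched in a few lines of Python, assuming the standard steady-state definition: availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. The function names and the sample figures are illustrative assumptions, not values from the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes the standard definition MTTF / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime per year implied by a given availability."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * hours_per_year


if __name__ == "__main__":
    # Hypothetical node: fails on average every 1000 h, takes 10 h to repair.
    a = availability(1000.0, 10.0)
    print(f"availability = {a:.4f}")
    print(f"downtime per year = {annual_downtime_hours(a):.1f} h")
```

Under this definition, availability can be raised either by lengthening MTTF or by shortening MTTR, for example through redundancy and failover.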
N. Xiong@ GSU Slide 30
Availability Values of Several Representative Systems
N. Xiong@ GSU Slide 31
Redundancy Techniques
N. Xiong@ GSU Slide 32
Fault-Tolerant Cluster Configurations
- Hot Standby
- Mutual Takeover
- Fault-Tolerance
N. Xiong@ GSU Slide 33
Recovery Schemes
- Backward recovery
- Forward recovery: in real-time systems
N. Xiong@ GSU Slide 34
Checkpointing and Recovery Techniques
- Kernel, Library, and Application Levels
- Checkpoint Overheads
- Choosing an Optimal Checkpoint Interval
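The transcript does not preserve the formula the slides use for the checkpoint interval. As an illustration of the trade-off only, the sketch below uses Young's first-order approximation, interval = sqrt(2 x checkpoint cost x MTTF), which is a swapped-in, well-known estimate and not necessarily the one in the course material. All names and numbers are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost: float, mttf: float) -> float:
    """Young's first-order estimate of the checkpoint interval.

    checkpoint_cost: time to write one checkpoint.
    mttf: mean time to failure of the system.
    Both must use the same time unit; the result is in that unit.
    """
    if checkpoint_cost <= 0 or mttf <= 0:
        raise ValueError("checkpoint_cost and mttf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mttf)


def checkpoint_overhead_fraction(checkpoint_cost: float, interval: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints (failure-free run)."""
    if checkpoint_cost < 0 or interval <= 0:
        raise ValueError("checkpoint_cost must be non-negative, interval positive")
    return checkpoint_cost / (interval + checkpoint_cost)


if __name__ == "__main__":
    # Hypothetical job: 60 s per checkpoint, one failure per day on average.
    interval = young_checkpoint_interval(60.0, 24 * 3600.0)
    print(f"checkpoint roughly every {interval:.0f} s")
    print(f"overhead = {checkpoint_overhead_fraction(60.0, interval):.2%}")
```

The estimate reflects the tension the slide names: checkpointing too often wastes time on overhead, while checkpointing too rarely loses more work on each failure.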
N. Xiong@ GSU Slide 35
Checkpointing Parallel Programs
N. Xiong@ GSU Slide 36
Cluster Job Scheduling and Management
Cluster Job Management Issues
- A user server
- A job scheduler
- A resource manager
N. Xiong@ GSU Slide 37
Cluster Job Types
- Serial jobs
- Parallel jobs
- Interactive jobs
- Batch jobs
- Foreign jobs
N. Xiong@ GSU Slide 38
Multi-Job Scheduling Schemes
N. Xiong@ GSU Slide 39
Share Cluster Nodes
- Dedicated Mode
- Space Sharing
- Time Sharing
N. Xiong@ GSU Slide 40
Migration Scheme Issues
- Node Availability
- Migration Overhead
- Recruitment Threshold: the amount of time a workstation stays unused before the cluster considers it an idle node
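The recruitment threshold defined above can be illustrated with a minimal sketch. It assumes each workstation reports how long it has been unused; the `Workstation` type, the `idle_nodes` helper, and the sample values are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Workstation:
    name: str
    unused_seconds: float  # time since the owner last used the machine


def idle_nodes(stations: Iterable[Workstation],
               recruitment_threshold_s: float) -> List[str]:
    """Return names of workstations the cluster may treat as idle.

    A workstation counts as idle once it has stayed unused for at
    least the recruitment threshold.
    """
    if recruitment_threshold_s < 0:
        raise ValueError("recruitment threshold must be non-negative")
    return [w.name for w in stations
            if w.unused_seconds >= recruitment_threshold_s]


if __name__ == "__main__":
    stations = [
        Workstation("ws01", 30.0),
        Workstation("ws02", 900.0),
        Workstation("ws03", 300.0),
    ]
    # Hypothetical threshold of 5 minutes.
    print(idle_nodes(stations, recruitment_threshold_s=300.0))
```

A low threshold recruits more nodes but raises the chance that an owner returns soon and a foreign job must be migrated away; a high threshold does the opposite.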
N. Xiong@ GSU Slide 41
Virtual Clustering and Resource Provisioning
N. Xiong@ GSU Slide 42
Five Virtual Cluster Research Projects
N. Xiong@ GSU Slide 43
Live VM Migration and Cluster Management
N. Xiong@ GSU Slide 44
Effect by Live Migration
N. Xiong@ GSU Slide 45
Dynamic Virtual Resource Provisioning
N. Xiong@ GSU Slide 46
Autonomic Adaptation of Virtual Environments
N. Xiong@ GSU Slide 47
Some References and Further Reading
N. Xiong@ GSU Slide 48
Homework Problems
N. Xiong@ GSU Slide 49
Homework Problems