Compute Cluster
-
Upload
daniel-chan -
Category
Technology
-
view
199 -
download
0
Transcript of Compute Cluster
e
Building the Best Computer Clusters for Mechanical Design Environments
Daniel Chan and Ed Huott
GE Corporate R&D Center e
e
Key Challenges
• A Highly Diverse Environment – Many Different Businesses – Engineering, Chemistry, Financial, Services
• Short-Term Technical Support – Quick Turnaround
• Costs – Downtime, System Admin., Software
Licenses, Productivity
e
Design Requirements
• Web-Centric – Simplify Job Submission in a Heterogeneous
Environment – Access from Anywhere and Anytime
• Minimize Complexity and Cost • Stable and Highly-Available
– Isolate Network, NFS and Insufficient Disk Space Problems, Software License Management
e
Reasons for Having a Heterogeneous Environment
• Performance
• Costs
• Legacy Applications
e
Let’s Do An Experiment
GE Corporate Standard PIII 450 MHz 256 MB RAM
$1400
Sun Ultra 10 UltraSPARC 440 MHz
256 MB RAM $5000
NT 4.0 Unix/Solaris 2.6
CFX 5 23,862 Nodes
102 MB Required
Use Perl script to launch 1, 2 and 3 concurrent jobs, respectively, for 30 times
Extract wall clock times from output files then use Minitab to analyze data
Test stability, multitasking and paging capabilities
e
Results For Sun Ultra 10
4000 5000 6000
0
10
20
30
40
50
C4
Freq
uenc
y
Histogram of C4, with Normal Curve
1700 1710 1720 1730 1740 1750 1760
0
1
2
3
4
5
6
7
8
C4
Freq
uenc
y
Histogram of C4, with Normal Curve
3320 3340 3360 3380 3400 3420 3440 3460 3480 3500
0
10
20
30
C4
Freq
uenc
y
Histogram of C4, with Normal Curve1 job 2 jobs
3 jobs
e
Results for Dell OptiPlex G1x
1924 1925 1926 1927 1928
0
10
20
TOTAL_CLOCK_TIME
Freq
uenc
y
Histogram of TOTAL_CLOCK_TIME, with Normal Curve
3900 3950 4000
0
10
20
30
40
C4Fr
eque
ncy
Histogram of C4, with Normal Curve
5800 5850 5900
0
10
20
30
40
50
TOTAL_CLOCK_TIME
Freq
uenc
y
Histogram of TOTAL_CLOCK_TIME, with Normal Curve
1 job 2 jobs
3 jobs
e
Scorecard for Sun Ultra 10
Mean Standard Deviation
Z
1 job 1724 15.9 10.8 2 jobs 3442
(2X) 26.1 13.2
3 jobs 5144 (3X)
350.2 1.5
ZUSL
=−
=µ
σµσ01.
Paging
e
Scorecard For Dell Optiplex GX1
Mean Standard Deviation
Z
1 job 1926 1.2 161 2 jobs 3909
(2.1X) 15.6 25
3 jobs 5828 (3X)
26.4 22
ZUSL
=−
=µ
σµσ01.
e
Summary of Results
• The SUN workstation is about 10% faster, but nearly 5 times as expensive
• Both systems are just as stable for a period of one week
• NT can multitask
• NT can page and appears to have “better” memory management scheme
e
APNASA Parallel Performance
-2 -1 0 1 2 3 4 5 6 7 8 9 10 11
200
400
600
800
1000
1200
1400
PIII 550 MHz Xeon2 MB Cache, 8-Way SMP
SGI Origin 2000R10K 195 MHz
PII 333 MHz Cluster
TIM
E (S
EC)
No. of Processors
308,115 Grid Points
e
APNASA Parallel Performance
0 1 2 3 4 5 6 7 8 9 10 110
1
2
3
4
5
6
7
8
9IDEAL
PII 333 MHz Cluster
PIII 550 MHz Xeon2 MB Cache, 8-Way SMP
SGI Origin 2000R10K 195 MHz
Spee
d U
p
No. of Processors
308,115 Grid Points
e 0.80.911 2 3 4 5 6 7 8 91010 20
103
104
81118 VERTICESPII 333 MHz NT
81118 VERTICESWORKSTATIONS
23862 VERTICESWORKSTATIONS
23862 VERTICESSGI SMP
81118 VERTICESSGI SMP
556323 VERTICESSGI SMP
CP
U T
IME
(SE
CS
)
NUMBER OF PROCESSORS
CFX PERFORMANCE ON A VARIETY OF COMPUTER
PLATFORMS
e
ANSYS Performance on a Variety of Computers
0 1 2 3 4 5 6 7 8 9 10 11 12
Sun Enterprise 450Ultra II 400 MHz
HP J5000, PA8500 400 MHz
Compaq SP700, PIII 500 MHz
SGI 320, PIII 500 MHz
Dell Optiplex GX1, PII 450 MHz
Dell 6300, PIII 500 MHz Xeon
Compaq DS20, 500 MHz EV6
SGI O2K R10K 250 MHz
CPU TIME (HRS)
ELAPSED TIME CPU TIME
160,000 ELEMENTS WITH 2D CONTACT 2.1 GB OF OUTPUT AND 615 MB PAGE FILE
e
eComputing Architecture
Quality Tools
Remote Access from Anywhere Anytime
(Globalization)
Field and Customer Data (Services and eCommerce) W
eb-E
nabl
ed E
nviro
nmen
t
SGI SC
HP Cluster
NT Cluster
DM Cluster Load
Sha
ring
Faci
lity user-transparent work
load management
e
Key LSF Challenges
• Implicit Assumptions of A Homogeneous Environment – Uniform View of NFS – Uniform Mounting of User Home Directory – Unix Bias
• Wholesale Migration of User Environment Variables from Submit to Execution Hosts
e
A Web-Centric Job Management System
Web Clients Web Server
https NFS Job Spec. Files
Execution Hosts
Job Directories LSF Cluster
ftp NFS or SMB
LSF Proxy
e
Screen Shot of Web Interface
Job Source Files
Job Output Destination
Command to Run on Execution Host
Files Sent to Output
Destination
e
Process, Data Flow and Security for Web-Based Jobs
Web Client Apache (invoke_bsub.pl)
https Job Spec
Exec Host (lsf_exec.pl) Job Dir
LSF Cluster
ftp bsub
Authenticated access to setuid script File system
permissions
e
Normalized Job Execution Wrapper/Environment
• Standard job execution wrapper that runs on all execution hosts. • Key "ingredients": Perl (with Net::FTP package), wget, subset of
"standard" Unix utilities (e.g. /bin/sh, cp, mv, etc.) – (Note: Makes use of Cygwin tools on Windows NT. See http://
sources.redhat.com/cygwin/)
• Reads Job Specification File to determine: – Source of remote job directory – Command to run o Destination for job output file(s)
• Pre-determined top level (local) directory for jobs on each execution host.
• Remote jobs are migrated by FTP (wget) to unique sub-directories under top level.
• Specified command is run in migrated job directory. • Output files are transferred by FTP to specified destination.
e
Summary
• Leveraging Low-Cost NT Solution • Shield Users from Complexity
– productivity gain – better centralized resource management
• Enterprise Resource Planning – compute cycles – software licenses