© 2014 IBM Corporation
Platform Computing
1
IBM Platform Computing Cloud Service
Ready-to-use Platform LSF & Symphony clusters in the SoftLayer cloud
February 25, 2014
Agenda
Mapping client needs to cloud technologies
Addressing your pain points
Introducing IBM Platform Computing Cloud Service
Product features and benefits
Use cases
Performance benchmarks
HPC cloud characteristics and economics differ from those of general-purpose computing
• High-end hardware and special purpose devices (e.g. GPUs) are typically used to supply the needed processing, memory, network, and storage capabilities
• The performance requirements of technical computing and service-oriented workloads mean that performance may be impacted in a virtualized cloud environment, especially when latency or I/O is a constraint
• HPC cluster/grid utilization is usually in the 70-90% range, removing a major potential advantage of a public cloud service provider for stable workload volumes
[Figure: primary HPC workloads, divided into workloads recommended for private cloud and workloads with best potential for virtualized public & hybrid cloud]
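The utilization point above can be made concrete with a little arithmetic. The sketch below is illustrative only: the two hourly figures are hypothetical placeholders, not prices from this deck, and the model ignores staffing, power and data-transfer costs. It shows why an owned cluster that is kept 70-90% busy is hard to beat on cost for stable workloads.

```python
def effective_cost_per_used_hour(owned_cost_per_hour: float, utilization: float) -> float:
    """Cost of each node-hour actually used on an owned cluster.

    An owned node costs the same whether it is busy or idle, so the cost
    per *used* hour rises as utilization falls.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return owned_cost_per_hour / utilization


def break_even_utilization(owned_cost_per_hour: float, cloud_rate_per_hour: float) -> float:
    """Utilization above which owning is cheaper than renting by the hour."""
    return owned_cost_per_hour / cloud_rate_per_hour


# Hypothetical figures, chosen only to illustrate the arithmetic.
OWNED_COST_PER_HOUR = 1.00   # amortized cost of an owned node, busy or idle
CLOUD_RATE_PER_HOUR = 2.00   # pay-per-use rate for a comparable cloud node

if __name__ == "__main__":
    threshold = break_even_utilization(OWNED_COST_PER_HOUR, CLOUD_RATE_PER_HOUR)
    print(f"break-even utilization: {threshold:.0%}")
    for utilization in (0.30, 0.70, 0.90):
        owned = effective_cost_per_used_hour(OWNED_COST_PER_HOUR, utilization)
        cheaper = "owned" if owned < CLOUD_RATE_PER_HOUR else "cloud"
        print(f"utilization {utilization:.0%}: owned ${owned:.2f}/used hour, cheaper: {cheaper}")
```

Under these assumed figures the owned cluster wins at 70% and 90% utilization, while a lightly used one does not, which is the case where variable-cost capacity becomes attractive.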
IBM’s HPC cloud strategy provides a flexible approach to address a variety of client needs
Evolve existing infrastructure to HPC Cloud to enhance responsiveness, flexibility, and cost effectiveness.
Enable integrated approach to improve HPC cost and capability 60%
Access additional HPC capacity with variable cost model
Private Clouds
Hybrid Clouds
Public Clouds
Based on HPC Cloud’s potential impact, organizations are evolving their infrastructures to enable private cloud deployments, exploring hybrid clouds, and considering public clouds.
Are you experiencing any of these pain points?
• Unable to meet business objectives (delay to market, etc.)
• Existing resources insufficient to meet peak compute demand
– Long run times on existing cluster or grid
– No access to local technical computing resources (workstation users)
• Technical resources expensive and time consuming to acquire
• The skills/staff to architect and manage a technical computing infrastructure can be difficult to acquire
[Figure: two demand charts. Financial Services: planned daily cycle (24 x 365), demand by hour of day. Life Sciences: planned project, demand from April through June.]
IBM Platform Computing Cloud Service
Making the cloud work for you
Build
• Complete, ready to run clusters in the cloud
• Add additional capacity in hours instead of months
Manage
• Seamless workload management, on-premise and in the cloud
• Transparent user experience
Support
• 24X7 cloud operation support
• Access to technical computing expertise when you need it
Protect
• Data encryption, dedicated physical machines and network
• Security through physical isolation
Complete, end to end dynamic cloud solution
Ready to use Platform LSF & Platform Symphony clusters in the cloud
IBM Platform Computing Cloud Service (SaaS)
IBM Platform LSF
IBM Platform Symphony
SoftLayer, an IBM Company
Infrastructure
24X7 CloudOps Support
Client and ISV Applications
Dedicated physical and virtual machine infrastructure as a service
• 13+ data centers
• 17 network PoPs
• Global private network
• Bare metal and virtual machines
190,000+ SERVERS
21,000+ CUSTOMERS
22,000,000+ DOMAINS
Ready to use Platform LSF & Platform Symphony clusters in the cloud
DIFFERENTIATOR / RATING / IBM ADVANTAGES
Workload I/O intensity (rated from low intensity workloads to high intensity workloads)
• SoftLayer’s architecture outperforms equivalent AWS instances by >50% for high I/O workloads
Control (APIs, hardware / network configurability) (rated from low to high degree of control and customization)
• SoftLayer offers hundreds of hardware configurations vs. 14 for AWS
• ~2,000 APIs for SoftLayer vs. ~60 for AWS and none for RAX
Integrated platform of multiple architectures (rated from single platform to seamless integration)
• Unified integration & control panel for multiple cloud architectures
• RAX requires paid bridge, different control interfaces
[Figure: rating scales placing AWS, RAX and IBM along each differentiator]
Non-shared physical machines for added security and performance
• Dedicated and isolated compute environment
• All machine instances are dedicated to the client
• Each cluster is isolated on a VLAN
• Only the VPN gateway has an addressable interface
• All customer data at rest is encrypted on shared file systems
• When machine instances are decommissioned, the disks are scrubbed using DoD-approved methods
Optimal performance for technical computing apps
Industrial Manufacturing Benchmark – Structural Mechanics
EDA Benchmark (IBM-MESA)
Note: Benchmark results were obtained by IBM and have not yet been externally audited or validated.
Run and supported by a dedicated, 24X7 HPC Cloud Operations Team
CloudOps functions
• Pre-provisioning: Provide guidance to the client on how to enable VPN, multi-cluster settings & security settings in the client's on-premise environment
• One-time setup testing: Extensive testing of the cluster prior to release to the client
• Extensive testing of the cluster on every flex-up event prior to release to the client
• Email alerts prior to flex-down & cluster shutdown operations
• Email alerts in case of any overage (compute hours, download bandwidth)
• Provide billing details of monthly usage, including overage details
• Provide support under IBM SLA by experts highly experienced in Platform Computing products
Value: quality, peace of mind & minimum disruption to business
• Extensive quality checks ensure minimum loss of usage hours & disruptions
• Proactive alerts ensure that in-progress critical jobs are not killed in case of flex-down, cluster shutdowns and overages
• Highly trained & experienced support ensures smooth on-boarding and minimizes disruptions
Industry-leading workload management
• 20 years managing distributed scale-out systems with 2000+ customers in many industries
• High performance workload management combined with intelligent resource scheduling engine
• Unmatched scalability (small clusters to global grids) and production-proven reliability
• Heterogeneous – manages System x and Power plus 3rd party systems, virtual and bare metal, accelerators / GPU, cloud, etc.
• Shared services for both compute and data intensive workloads
• Integrated solutions with vertical reference architectures
23 of 30 largest commercial enterprises
Over 5M CPUs under management
60% of top financial services companies
IBM Platform LSF
Overview
Powerful workload management for demanding, distributed and mission-critical high performance computing environments.
Key Capabilities
• Powerful
- Policy and resource-aware scheduling
- Resource consolidation for optimal performance
- Advanced self-management
• Flexible
- Heterogeneous platform support
- Policy-driven automation
- CLI, web services, APIs
• Scalable
- Thousands of concurrent users and jobs
- Virtualized pool of shared resources
- Flexible control, multiple policies
Client Benefits
• Optimal utilization: reduced infrastructure cost
• Robust capabilities: improved productivity
• High throughput: faster time to results
IBM Platform Symphony
Overview
Low-latency grid management platform for distributed computing and analytics with sophisticated resource sharing
Key Capabilities
• Accelerates service-oriented applications
• Extreme app scalability and throughput with very low latency
• Compute and data-intensive applications on a single platform
• Sophisticated, hierarchical resource sharing
• Open and flexible: choice of OS, frameworks and languages
Client Benefits
• Increase performance and analytic result quality
• Reduce IT costs: increase utilization, simplify application onboarding, reduce administration costs
Low Latency / High throughput: Sub-millisecond, 17,000 tasks per second
Large Scale: 10k cores per application, 40k cores per grid
Efficient shared services
Heterogeneous & Open: Linux, Windows, AIX, C/C++, C#, Java, Excel, Python, R
Use case 1 – hybrid cluster
The problem
• Existing resources cannot meet peak demand
• Resources are expensive and time consuming to acquire
• Skills to architect and manage clusters are difficult to find
• Fixed or reduced budgets
• On-premise constraints in space, cooling and power
The solution
• Fully functioning IBM Platform LSF or Symphony clusters are provisioned on the SoftLayer cloud and connected to the on-premise cluster, expanding capacity as needed
• Leverage MultiCluster capability for managed forwarding of jobs from the on-premise cluster to the off-premise cluster
The value
• Access to additional compute capacity on a temporary basis as needed
• Near-zero wait times
• Reduce costs by paying for only what is used
• Pay for additional capacity as an operating expense
• Fully supported, end-to-end solution, from the on-premise to the on-cloud clusters
• Expected and reliable performance from running technical computing workloads on physical machines
• Transparent access to cloud resources; the end-user experience does not change
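The job forwarding described under "The solution" can be illustrated with a toy model. This is not LSF MultiCluster configuration or any Platform API: the `Cluster` class, the slot counts and the simple "forward when the on-premise cluster is full" rule are all hypothetical simplifications, intended only to show the idea of overflow work moving to a cloud cluster without the submitter doing anything different.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Hypothetical cluster with a fixed number of job slots."""
    name: str
    slots: int
    running: List[str] = field(default_factory=list)

    def try_run(self, job: str) -> bool:
        """Start the job if a slot is free; report whether it started."""
        if len(self.running) < self.slots:
            self.running.append(job)
            return True
        return False


def dispatch(jobs: List[str], on_premise: Cluster, cloud: Cluster) -> Tuple[Dict[str, str], List[str]]:
    """Place each job on-premise first, forward overflow to the cloud.

    Returns a job -> cluster-name mapping plus the jobs left pending
    because both clusters are full.
    """
    placement: Dict[str, str] = {}
    pending: List[str] = []
    for job in jobs:
        if on_premise.try_run(job):
            placement[job] = on_premise.name
        elif cloud.try_run(job):
            placement[job] = cloud.name
        else:
            pending.append(job)
    return placement, pending


if __name__ == "__main__":
    local = Cluster("on-premise", slots=2)
    remote = Cluster("cloud", slots=1)
    placed, waiting = dispatch(["job1", "job2", "job3", "job4"], local, remote)
    print(placed)
    print(waiting)
```

The user submits to one place in both cases; only the placement decision changes, which is the sense in which the slide calls access to cloud resources transparent.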
Use case 2 – stand-alone cluster in the cloud
The problem
• New and emerging need for technical computing
• Skills to architect and manage clusters are difficult to find
• Resources are expensive and time consuming to acquire
• Inconsistent demand does not justify the investment
The solution
• Fully functioning Platform LSF and Symphony clusters are provisioned on the SoftLayer cloud providing resources as needed
The value
• Market-leading Platform LSF and Platform Symphony software
• Access to technical computing resources on a temporary basis without the need to acquire, install and configure the infrastructure and cluster software
• Keep costs low by paying for only what is used
• Pay for capacity as an operating expense
• Fully supported solution
• Expected and reliable performance from running workloads on physical machines
Is IBM Platform Computing Cloud Service a good fit for you?
Business pain points
• Are you experiencing lost profit due to missed deadlines?
• Do you experience pressure to convert your compute environment capital expense to operational expense?
• Have you ever missed a deadline or delayed a project because technical computing resource procurement took too long?
Technology pain points
• Do your users ever scale back their analyses to lower fidelity or less accuracy in order to fit them into the local compute environment or into a time window?
• Do you regularly, occasionally, or permanently have fewer resources (CPUs, disk, memory, etc.) than you would like to have to service your users' compute demand?
• Do you experience a large variance in compute resource utilization?
• Have you reached, or will you reach, the capacity of your datacenter(s), and do you need a plan to grow beyond that capacity?
• Are your customers asking you for cloud licenses for Platform LSF or Platform Symphony?
IBM Platform Computing Cloud Service
Making the Cloud Work for You
Unmatched Expertise: Analytics, Technical Computing, Software, Services and ISV Partnerships
IBM Hybrid Cloud
Consolidation: Supporting heterogeneous IBM and non-IBM infrastructure
Cloud Leadership: Expertise from Client Engagements
powered by
On SmartCloud
Unmatched Capabilities: Policy-driven Workload Management
On Premise
Software & Systems
Thank You
SoftLayer and Amazon EC2 Products tested
Name         IaaS Provider            CPU Cores  Memory (GB)  Disk Space (GB)  Physical / Virtual  Hourly Rate (USD)
SL PM        SoftLayer                16         64           1000[1]          Physical            $1.85[2]
SL VM        SoftLayer                8          8            500[3]           Virtual             $0.88
SL PM (ded)  SoftLayer                16         64           1000[1]          Physical            $3.83[5]
EC2 CC2      Amazon EC2 (CC2)         32         60.5         3360             Virtual             $2.40[4]
EC2 2XL      Amazon EC2 (c1.xlarge)   8          7            840              Virtual             $0.58

Processors
SL Physical Machine: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
SL Physical Machine (dedicated): Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
SL Virtual Machine: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Amazon CCI2: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Amazon 2XL: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
Memory Bandwidth
[Figure: STREAM memory bandwidth (higher is better) and STREAM price performance (higher is better). COPY, SCALE, ADD and TRIAD results for SL PM, SL VM, EC2 CCI2, EC2 2XL and SL PM (ded).]
CPU Performance
[Figure: SuperPI elapsed time (lower is better) and SuperPI price-performance in throughput per dollar (higher is better), for SL PM, SL VM, EC2 CCI2, EC2 2XL and SL PM (ded).]
Network Bandwidth
[Figure: openMPI bandwidth (Mbits/s) versus message size (bytes), both on logarithmic axes, for SL VM, EC2 2XL, EC2 CCI2, SL PM and SL PM Dedicated.]
Network Latency
[Figure: openMPI latency (lower is better), two-node MPI runs on SL VM, EC2 2XL, EC2 CCI2, SL PM and SL PM (ded).]
Input / Output Performance
[Figure: I/O bandwidth in kB/sec, WRITE and READ panels (higher is better), versus I/O file size as a factor of memory size, for SL VM, EC2 2XL, EC2 CCI2, SL PM and SL PM Ded.]
Software Compilation
[Figure: software compile performance, elapsed time in seconds (lower is better), and software compile price-performance in runs / $ (higher is better), for SL VM, SL PM, EC2 2XL, EC2 CCI and SL PM Ded.]
Life Science (BWA)
Life Sciences Benchmark (BWA): elapsed time and price performance (lower is better for both)
Configuration  Elapsed time (sec)  Price performance ($ / run)
SL PM (ded)    20846.481           22.21
SL PM          26509.368           7.79
SL VM          25897.44            6.33
EC2 CCI2       22442.7             14.96
EC2 2XL        37491               6.04
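The "$ / run" figures for the three virtual configurations are consistent with a simple relation: elapsed hours multiplied by the hourly rate listed in the products-tested table. That relation is inferred from the numbers rather than stated in the deck, so the sketch below should be read as an assumption. The two SoftLayer physical machines are left out because their listed hourly rates carry footnotes and do not reproduce the plotted values directly. The "runs / $" metric used on other slides is assumed here to be the reciprocal.

```python
# Hourly rates (USD) from the products-tested table; the table labels the
# Amazon cluster-compute instance "EC2 CC2", the charts label it "EC2 CCI2".
HOURLY_RATE_USD = {"SL VM": 0.88, "EC2 CCI2": 2.40, "EC2 2XL": 0.58}

# BWA elapsed times (seconds) from the Life Sciences benchmark slide.
BWA_ELAPSED_SEC = {"SL VM": 25897.44, "EC2 CCI2": 22442.7, "EC2 2XL": 37491.0}


def cost_per_run(elapsed_sec: float, hourly_rate_usd: float) -> float:
    """Dollars spent on one run: elapsed hours times the hourly rate."""
    return elapsed_sec / 3600.0 * hourly_rate_usd


def runs_per_dollar(elapsed_sec: float, hourly_rate_usd: float) -> float:
    """Assumed reciprocal metric: how many runs one dollar buys."""
    return 1.0 / cost_per_run(elapsed_sec, hourly_rate_usd)


if __name__ == "__main__":
    for name, elapsed in BWA_ELAPSED_SEC.items():
        rate = HOURLY_RATE_USD[name]
        print(f"{name}: ${cost_per_run(elapsed, rate):.2f} per run, "
              f"{runs_per_dollar(elapsed, rate):.3f} runs per dollar")
```

The example shows why the fastest machine is not automatically the cheapest per run: EC2 2XL has the longest elapsed time of the three but the lowest cost per run, because its hourly rate is the lowest.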
EDA Benchmark (IBM-MESA)
[Figure: EDA - IBM Mesa elapsed time in seconds (lower is better) and EDA - IBM Mesa price-performance in runs / $ (higher is better), for SL PM (ded), SL PM, SL VM, EC2 2XL and EC2 CCI2.]
Provisioning Time
[Figure: provisioning time in seconds on a logarithmic scale (lower is better), for SL PM, SL VM, EC2 CCI2, EC2 2XL and SL PM Ded.]
Industrial Manufacturing – Structural Mechanics
[Figure: four panels (One Node - S4D, One Node - S6, Two Nodes - S4D, Two Nodes - S6) plotting speedup relative to EC2 2XL against number of CPUs, for SL PM, EC2 CCI2, SL VM, EC2 2XL and SL PM (ded).]
Industrial Manufacturing – CFD
[Figure: two panels (OpenFoam Speedup Backplane, OpenFoam Speedup Ethernet; higher is better) plotting speedup relative to EC2 2XL against number of cores, for SL PM (ded), SL PM, SL VM, EC2 CCI2 and EC2 2XL.]