Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
変通 [hen-tsoo] noun
1. Resourcefulness – the quality of being able to cope with a difficult situation
2. Adaptability – the ability to change (or be changed) to fit changed circumstances
3. Agility – the power of moving quickly and easily; nimbleness
INFINITELY SCALABLE CLUSTERS
Grid computing on public cloud
AGENDA
• Grid Computing Background
• On-Premise & Public Cloud
• Google Cloud Platform
• Demo
January 2017
TERMINOLOGY
• Public Cloud (AWS, Azure, Google)
• Private Cloud (Your data centre)
• High Performance Computing (HPC)
• Grid Computing
• Compute Cluster
• CPUs / Processors / Cores
• RAM and Disk Storage
• IaaS (virtual hardware and networking)
• PaaS (software services)
WHAT IS PUBLIC CLOUD?
“A service provider makes resources, such as virtual machines, applications and storage, available to the general public.”
• Utility model
• No contracts
• Shared hardware / multi-tenant
• Self managed
WHAT IS GRID COMPUTING?
Traditional Resource Limitations:
• Data Store Performance
• PC Processor / Memory / Storage
• Network Bandwidth
The researcher may wait a long time for results.
• Grid computing moves the computational work from the PC to a cluster of servers
• The cluster processes the data on behalf of the researcher and returns the results
• Processing time is reduced
• Larger datasets can be tackled
KEY CONCEPTS
[Figure: The Challenges. Axes: Number of Tasks, Size of Data. Labels: Big Data, High Throughput Computing, MapReduce, High Performance Computing]
[Figure: The Workflows. Stages: Ingest, Process, Analyse, Visualise, Store]
HARDWARE INFLEXIBILITY
• Buy 22 core processors at 2.2GHz or 6 core processors at 3.6GHz?
• Buy 8GB, 16GB or 32GB memory modules (RAM per core ratio)?
• Graphics Processing Units (GPUs)?
• How much local storage per server?
• What network devices between servers (32 or 48 port switches)?
• What size file server?
Grid usage varies depending on research priorities:
[Chart: Jobs per day (axis 0 to 120) by Date, Monday to Sunday]
EXAMPLE OF MATLAB GRID WITH PUBLIC CLOUD
- Pay only for what you use
- Scale compute resource up AND down
- Minimal capital outlay on hardware
- Experiment with grid computing platforms quickly, cheaply and with no commitment
A DAY IN A PUBLIC CLOUD CLUSTER
[Chart: Workers and Tasks in Queue (axis 0 to 180) over Time, 02:00:00 to 23:40:00]
- Cluster consisting of 32x 4 cores
- Max 128 worker nodes
- Ramps up as jobs get submitted
- Tears down nodes when jobs finish
- Minimises costs when not in use
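The ramp-up and tear-down behaviour described on this slide can be sketched as a simple scaling rule. This is a minimal illustration, not the scheduler used in the demo: the function names, the one-task-per-worker assumption and the idea of polling the queue are all hypothetical; only the cap of 128 workers comes from the slide.

```python
def desired_workers(tasks_in_queue: int, busy_workers: int, max_workers: int = 128) -> int:
    """Return how many workers should be running right now.

    Assumes (hypothetically) that each worker handles one task at a time,
    so demand is the tasks already running plus the tasks still queued.
    """
    demand = busy_workers + tasks_in_queue
    # Never exceed the cluster cap, never go below zero.
    return max(0, min(max_workers, demand))


def scaling_action(current_workers: int, tasks_in_queue: int,
                   busy_workers: int, max_workers: int = 128) -> int:
    """Return the change in worker count.

    Positive: start that many workers. Negative: tear that many down.
    """
    target = desired_workers(tasks_in_queue, busy_workers, max_workers)
    return target - current_workers


if __name__ == "__main__":
    # A burst of jobs arrives while the cluster is empty: scale up to the cap.
    print(scaling_action(current_workers=0, tasks_in_queue=200, busy_workers=0))
    # The queue has drained and only 3 workers are still busy: tear the rest down.
    print(scaling_action(current_workers=128, tasks_in_queue=0, busy_workers=3))
```

A rule like this, evaluated periodically, keeps the cluster at zero cost when no jobs are queued, which is the point the slide makes about minimising costs.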
IDEAL CLUSTER SIZE?
[Chart: Job Run time in seconds (axis 0 to 1400) against Cores (8, 16, 32, 64, 96, 128, 160, 192, 224)]
[Figure: workflow stages: Ingest, Process, Analyse, Visualise, Store]
Optimise other parts of the workflow?
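One common way to reason about ideal cluster size is Amdahl's law: if part of a job cannot be parallelised (for example ingest or store steps), adding cores only shrinks the remaining part. The sketch below is illustrative only. The single-core run time and the parallel fraction are made-up inputs, not measurements from the slide, and the function name is hypothetical.

```python
def amdahl_runtime(single_core_seconds: float, parallel_fraction: float, cores: int) -> float:
    """Estimate run time on `cores` cores under Amdahl's law.

    The serial part of the job takes the same time regardless of core count;
    only the parallel part is divided across the cores.
    """
    serial = single_core_seconds * (1.0 - parallel_fraction)
    parallel = single_core_seconds * parallel_fraction / cores
    return serial + parallel


if __name__ == "__main__":
    # Hypothetical job: 10,000 s on one core, 95% of the work parallelisable.
    for cores in (8, 16, 32, 64, 96, 128, 160, 192, 224):
        print(cores, round(amdahl_runtime(10_000.0, 0.95, cores), 1))
```

Under these assumptions the run time approaches the serial floor as cores are added, which is one reason to look at optimising the other stages of the workflow rather than only growing the cluster.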
RUNNING HYBRID CLUSTER ON IAAS
AWS vCPUs are hyper-threaded™
Each vCPU is a hyper thread of an Intel Xeon core for 2nd generation instance types (M4, M3, C4, C3, R3, HS1, G2, I2, and D2)
https://aws.amazon.com/ec2/instance-types/
Azure does not overcommit memory or cores. vCPUs are physical cores. Azure does not use hyper-threading.
https://aws.amazon.com/ec2/instance-types/
CLOUD GRID DEPLOYMENT OPTIONS
1. Infrastructure as a Service (IaaS) DIY
Spin up a compute cluster on VMs for additional capacity and new workloads
2. Burst
Use existing on premises compute cluster and burst on cloud as required
3. Software as a Service (SaaS)
Software vendors and Managed Service Providers provide their own SaaS solutions. Pay for compute and application software per hour
4. Platform as a Service (PaaS)
Cloud providers’ data analytics platform as a service: Google BigQuery & Datalab, Microsoft HDInsight, Amazon EMR
BIGQUERY – A GOOGLE CLOUD PLATFORM SERVICE
• Fully managed and serverless architecture
• Massively scalable to petabytes of data, without the need to capacity plan
• Resources are deployed as necessary in the background to run queries in seconds
• Standard SQL queries
• Table partitioning
• No indexing needed
• Simple pricing model:
  • Data storage, streaming inserts, and queries are charged
  • Data loading and exporting are free of charge
BIGQUERY TECHNICAL BACKGROUND
A “service that enables interactive analysis of massively large datasets”. BigQuery itself is built on Google's Dremel engine rather than Hadoop; the Hadoop concepts below illustrate the same distributed approach:
• Distributed File System – Stores data that’s larger than can fit on a single machine
• Map Reduce – Distributes processing across multiple systems
http://blogs.forrester.com/mike_gualtieri/13-06-07-what_is_hadoop
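The Map Reduce idea above can be sketched in a few lines. This is a single-process toy, assuming a word-count task chosen purely for illustration; it is not how Hadoop or BigQuery are implemented, and the function names are hypothetical. In a real cluster each chunk would be mapped on a different machine and the grouped values reduced in parallel.

```python
from collections import defaultdict
from functools import reduce


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all values that share the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key into a single result."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(chunks: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each string stands in for a block of data held on a different machine.
    print(word_count(["grid cloud", "cloud Grid grid"]))
```

Because each map call only sees its own chunk and each reduce call only sees one key, both phases can be spread across as many machines as there are chunks or keys.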
FINAL NOTES – DON’T FORGET SECURITY
Security considerations:
• Secure transfer and storage of data and code
• Secure remote access to cloud hosted environment
• Secure authentication
  • Windows AD Credentials
  • AWS IAM Credentials
  • Google Accounts
  • Microsoft Accounts
• Auditing (who accessed what, who changed what)
SUMMARY
• Traditional grid and HPC tools can benefit from moving into the cloud
• Vast landscape of available tools
• Off-the-shelf PaaS offerings
• Integrations and ecosystems
• Cheap and very quick to experiment
MORE INFORMATION?
[email protected]://hentsu.com
London:
1 Fore Street
London EC2Y 9DT
New York:
450 Lexington Ave
New York 10017