Transcript of Michigan!/usr/group: Compute Clusters—Building Blocks of the Public Cloud

Page 1

Innovation Intelligence®

Michigan!/usr/group:

Compute Clusters—Building Blocks of the

Public Cloud

Jeff Marraccini, Vice President, Computer Systems

[email protected]

August 2015

Page 2

About me, and yes, the disclaimer

• Work at Altair Engineering in Troy for 18 years

• My team manages a number of clusters

• I manage staff that handle our internal clusters in a 2,300+ employee company, so: my employer may not agree with all my opinions – they are my own. I am also a generalist. Check with others before spending money on a cluster.

• Like many things, there are no “one size fits all” solutions with HPC! Please research!

[Diagram: cluster building blocks – Fabric, Head Node, Storage, Exec Nodes, Visualization Nodes]

Page 3

Thank you, and seeing this stuff for real

Michigan!/usr/group contributed to my career – thank you!

Past and present members contributed to tools we use daily. Preaching to the choir here: knowledge exchange empowers us all.

Tours: I cannot show too much live while we are recording. I would be glad to give you a tour if you are in the Troy, MI area – please message me at [email protected]. You must agree not to reveal operational specifics.

Page 4

Overview of today’s talk

• Why clusters?

• Some history

• “Private cloud” clusters

• Architecture

• Failures

• The Virtual Machine era

• The Container / Docker era

• “Public cloud” clusters

• Facebook and the Open Compute Project

• Appliance Computing

• Resources to learn more

Page 5

Why clusters? And what’s the big deal?

• Mainframe costs, even today

• Individual server performance and Moore’s Law

• Networking + computers + “cluster software” = often vast power

• What do we do with these 3-5 year old computers on a 7-10 year budget cycle?

• Sony PlayStations, Apple XServes, Raspberry Pi

• Operating systems (usually) no longer as expensive as the computer

Page 6

Universities, government agencies, companies, and basements near you…

• They got us started…

• NASA Beowulf (you may be using a BSD/Linux Ethernet driver based on Donald J. Becker’s work at NASA!)

• NSA fed back scalability ideas (!!), early adopter

• Older operating systems: Tandem, Digital VAX/VMS & OpenVMS, some UNIX, Microsoft Windows Server Clusters

• Universities worldwide – open source contributions

• Military projects

• Basement clusters run by grad & undergrad students

• Lucasfilm and related special effects firms

• MASSIVE (Peter Jackson/WETA Digital!) – got us into 10GbE message passing

Page 7

What do they do?

• Scientific and engineering computing – the start of it all

• Render farms – special effects for movies, TV, commercials, games, live TV and sports overlays…

• Media conversion (YouTube!)

• Web services, E-Mail at scale

• Bitcoin and other computational currency

• Databases, “Big Data”

• Scale-out storage (EMC Isilon is an InfiniBand cluster!)

• Building and testing software (my workplace)!

• Social media (combining a lot of the above)

• Cracking passwords, encryption

• Neural networks / expert systems / IBM WATSON

Page 8

Some of the largest clusters are…

• Tens to hundreds of thousands of cores

• NSA (probably), along with other governments’ security arms

• Other classified installations

• CERN

• Research labs (NCSA near Chicago is one)

• Public clouds (Google, Amazon, Microsoft, Rackspace, IBM, others)

• Thousands to tens of thousands of cores

• Square Kilometre Array (Australia / South Africa – just got back from there)

• Weather forecasting

• Japan’s Earth Simulator project (early 2000s)

• Render “farms”

• Large organizations (corporate, universities, “smaller” public cloud providers)

• Small businesses often have dozens to hundreds of cores, and may not realize it if leasing private and/or public cloud services!

Page 9

10,000 hands working in the space of a living room

“Cluster programming is a lot like putting a large puzzle together with 10,000 hands in the space of a living room, keeping them in sync”

- Altair developer when I reported a memory leak

Page 10

Software development complexities & architecture

• Message passing (MPI) libraries achieve huge scales

• Shared memory with proprietary interconnect (some Cray, NEC, SGI Altix)

• Process migration (LinuxPMI, openMosix, some Cray, NEC, SGI Altix UV)

• systemd (with cgroups) is really nice on clusters: parallel daemon startup reduces start-up and restaging latency, and it cuts shell script complexity

• Ansible, Salt, and other configuration automation tools for sysadmin
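The slides stay at the concept level, but the scatter/compute/gather pattern that message-passing codes use can be sketched with Python's standard multiprocessing module. This is an illustrative stand-in for a real MPI library, not MPI itself; all names here are made up for the example.

```python
# Illustrative message-passing sketch (not MPI): a coordinator scatters
# chunks of work to worker processes over pipes and gathers the partial
# results, mirroring the scatter/compute/gather pattern of MPI codes.
from multiprocessing import Pipe, Process

def worker(conn):
    chunk = conn.recv()                    # receive a chunk of work
    conn.send(sum(x * x for x in chunk))   # compute and return a partial sum
    conn.close()

def scatter_gather(data, n_workers=4):
    chunks = [data[i::n_workers] for i in range(n_workers)]
    conns, procs = [], []
    for chunk in chunks:
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send(chunk)                 # scatter
        conns.append(parent)
        procs.append(p)
    results = [c.recv() for c in conns]    # gather partial results
    for p in procs:
        p.join()
    return sum(results)

if __name__ == "__main__":
    print(scatter_gather(list(range(100))))  # prints 328350 (sum of squares 0..99)
```

Real MPI implementations layer collective operations, fault handling, and fabric-aware transports (InfiniBand verbs, etc.) on top of this basic pattern.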

Page 11

“Private Cloud”

• Internal use clusters

• Sometimes accessible via remote access, Virtual Private Networks

• “Secret sauce” behind internal tools, some of which now have public cloud front ends

• Requires forging together the networking, storage, and computing teams

• Oracle 10g databases were often IT’s first exposure

• Scalable internal storage (EMC Isilon, ExaGrid, HP 3PAR, Ceph, etc.)

Page 12

High Availability Private Cluster Block Diagram

Firewall
• Protects often-unpatched cluster software and firmware
• Load balancer
• Remote access

Head Nodes
• 1
• 2
• Authentication, Scheduling, Staging, Reloading, Push notifications, Periodic Check-pointing

Switch Fabrics
• 1
• 2
• InfiniBand, 1/10/40/100Gb Ethernet, Proprietary (Cray!)

Execution Nodes
• 1 … N
• Local storage, local “scratch”

Shared Storage Pools
• Staging
• Check-points

Example: 640-core half-rack SuperMicro TwinBlade chassis with 100TB usable storage, QDR InfiniBand, ~9 kW. 2X this for high availability.
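As a back-of-envelope check on the figures above (assuming the ~9 kW draw is at full load), the density works out to roughly 14 W per core:

```python
# Back-of-envelope power density for the half-rack example:
# 640 cores drawing roughly 9 kW total.
cores, watts = 640, 9000
watts_per_core = watts / cores
print(round(watts_per_core, 1))  # about 14.1 W per core
```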

Page 13

Altair’s Internal Clusters

• We use PBS Professional for all (it is our product!)

• HyperWorks Unlimited – “cluster in a box” – many around the world, hundreds to 2048 cores, single rack or virtual clusters in public clouds

• Legacy “E-Compute” & Compute Manager (newer) – several clusters of a few hundred cores each

• HyperWorks – several hundred cores, Windows, Linux, Mac – Michigan and India, 80+ compilations (400K+ files each), thousands of tests daily

• Test clusters – 128-256 cores, often restaged, scrounged older hardware

Page 14

A regular cluster (or a basement one!)

Head Node
• Authentication, Scheduling, Staging, (Reloading, Push notifications, Periodic Check-pointing)

Cluster fabric(s)
• Ethernet switch
• InfiniBand switch
• Storage Area Network

Execution Nodes
• 1 … N (could be varying hardware)
• Local storage (maybe!)

Shared Storage Pools
• Staging
• Checkpoints (maybe!)
• Could be FreeNAS, Lustre, Isilon…

Could well ALL be running on a single virtual machine hypervisor for dev & test!

Page 15

An Engineer’s Patience

A 96-core job running on part of the cluster from the previous slide:

Req'd Req'd Elap

Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time

--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----

182651.XXXXXXXX YYYYYYYY radioss- ZZZZZZZZZZ 29362 8 96 -- -- R 36:41

node006/0*12+node007/0*12+node009/0*12+node010/0*12

+node011/0*12+node012/0*12+node013/0*12+node014/0*12

Without oversubscription, that cluster may run ten 96-core jobs at once.

Most jobs on it run longer than a day – some for a couple weeks.

We are very paranoid when someone opens the cabinet doors on it…
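The node list in the qstat output above is a PBS-style exec_host string. As a sanity check, the allocation can be tallied with a short sketch (the parsing here is a simplification of the real PBS format):

```python
# Tally cores from a PBS-style exec_host string such as
# "node006/0*12+node007/0*12": each "+"-separated entry is
# node/start_index*ncores, where ncores defaults to 1 if "*n" is absent.
def count_cores(exec_host):
    total = 0
    for entry in exec_host.split("+"):
        node, _, spec = entry.partition("/")
        _, star, n = spec.partition("*")
        total += int(n) if star else 1
    return total

alloc = ("node006/0*12+node007/0*12+node009/0*12+node010/0*12"
         "+node011/0*12+node012/0*12+node013/0*12+node014/0*12")
print(count_cores(alloc))  # 8 nodes x 12 cores = 96
```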

Page 16

The Fabric – cluster scaling and speed

• InfiniBand (10-56Gb/s, low latency)

• Myrinet (obsolete, fiber optics)

• PCIe

• Ethernet (10GbE/40GbE/100GbE)

• Proprietary (CrayLink and others)

• Virtual network switches

256-core SGI half rack, QDR InfiniBand, Nvidia GPUs, 1Gb/s Ethernet management, no HA. Surprisingly quiet in full use!

Page 17

Storage

Varying needs = varying capacities (Computational Fluid Dynamics/CFD, “crash”, chemistry, optimization, Bitcoin, hash cracking…)

Cluster storage is HARD, especially scale out – “Big Data” approaches are not good back-end storage for scientific/engineering computing (yet)

Reliability – high availability is often more than 2X the cost

Local storage limits (blades, enterprise SSD, 2.5” HDD)

Spinning it down when portions idle = complex

Page 18

Management

• Staging the nodes – potentially thousands during install and upgrades

• Herding cats = scheduling different user communities’ requirements

• Failures and recovery

• Staging jobs in/out – a CFD project may be 1TB+ of output * 200 jobs

• Push notifications, “Is it done yet?”

• Portals

• Continuous resource monitoring

• Check-pointing

• Energy efficiency
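Continuous resource monitoring ultimately reduces to comparing observed load against capacity. A toy sketch, with made-up node names and load figures:

```python
# Toy resource monitor: flag nodes whose 1-minute load average
# exceeds their core count (all node data here is illustrative;
# a real monitor would poll the nodes continuously).
def flag_overloaded(nodes):
    """nodes: dict of name -> (load_1min, core_count)."""
    return sorted(name for name, (load, cores) in nodes.items() if load > cores)

sample = {
    "node001": (11.8, 12),   # busy but within capacity
    "node002": (14.2, 12),   # oversubscribed
    "node003": (0.3, 12),    # mostly idle
}
print(flag_overloaded(sample))  # ['node002']
```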

Page 19

When it breaks

• Nodes will fail

• We have hardware failures every week; bigger clusters may have hourly failures or even more

• Check-pointing = costly in storage and processing time, see http://www.csm.ornl.gov/~engelman/publications/wang10hybrid2.pdf

• Restoring from a checkpoint may be unreliable

• Restaging

• Job migration

• Jeff’s “I meant to type 11 and typed 1” glitch

• The dreaded faulty InfiniBand cable

• “If you monitor me, my job slows down!”
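Application-level check-pointing, at its simplest, is periodic state serialization plus restart-from-last-good-state. A minimal sketch (real cluster checkpointing is far more involved, as the ORNL paper notes; the loop body here just stands in for real computation):

```python
# Minimal checkpoint/restart sketch: persist loop state every N
# iterations so a failed run can resume instead of recomputing from zero.
import json
import os

def run(total, ckpt="ckpt.json", every=100):
    start = acc = 0
    if os.path.exists(ckpt):                 # resume from the last checkpoint
        with open(ckpt) as f:
            state = json.load(f)
        start, acc = state["i"], state["acc"]
    for i in range(start, total):
        acc += i                             # stand-in for the real computation
        if (i + 1) % every == 0:             # periodic checkpoint write
            with open(ckpt, "w") as f:
                json.dump({"i": i + 1, "acc": acc}, f)
    if os.path.exists(ckpt):
        os.remove(ckpt)                      # remove checkpoint on success
    return acc

print(run(1000))  # sum of 0..999 = 499500
```

On a restart after a crash, the same call picks up from the last checkpoint file rather than iteration zero, which is exactly the trade the slide describes: storage and write time spent now to avoid recomputation later.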

Page 20

The Virtual Machine Cluster

• Great way to demo cluster software, Ansible/Salt, etc.

• SIMH & OpenVMS (Jeff’s VMS cluster on a Surface Pro 3 tablet)

• Multics may now be emulated, see http://multicians.org/

• Virtual network switches work great on multi-core hosts

• “Pull” the virtual network cable, see if the storage busts

• Test your upgrades

• Learn without spending $50,000+

• Hypervisors add I/O latency

• Fabric support is limited

• …so scalability is limited

Page 21

The Container / Docker – More than a fad

• Famous “Pets” vs “Livestock” (some call it “Cattle”) argument for application design

• Single operating system per host; the operating system ensures containers are sandboxed from each other AND they have cluster fabric access!

• Multiple containers (load balancer + web server + app server + database server + log server) may be spun up and scaled with appropriate app design

• Still have to patch the containers if there are vulnerabilities inside! Ansible, etc. useful!

Page 22

“I’m out of oomph” -> BURSTING

• “Promise” of the Public Cloud

• Credit card financed computing

• Possibly loosely coupled

• Fabric compromises

• Getting better!

[Diagram: Internal Cluster → VPN to Amazon AWS / Microsoft Azure → Cloud Execution Nodes, Cloud fabric, Cloud storage]

Page 23

Spread out clusters

• May be in the “Public Cloud” or at multiple “Private Cloud” sites (research centers, remote data centers, leased private capacity)

• Redundancy – Hadoop and derivatives quickly copy object data and store archival copies, etc.

• Scalability, 100Gb/s inter-data-center links now common

• Lots of “dark fiber” available for leasing

• Watch out for latency sensitive implementations

Page 24

Facebook and Open Compute Project

• Mainly useful for big organizations

• Power efficiency, reduce impact

• Shared power supplies

• Optimized cooling

• Storage & node spin-down

• Designed to fail and be easily serviceable

• Quick upgrades

• Scalability beyond conventional designs

• Might slow down commodity server price drops as volume decreases

• http://www.opencompute.org/

Page 25

Appliances and Platform as a Service (PaaS)

• “Cluster in a box” (well, racks!) or cloud

• Bursting

• Project-based computing

• Nimble

• Geek skills embedded

• Easy portal / front ends

Page 26

Where do we go from here?

• Public library access to Lynda.com – Amazon AWS & Microsoft Azure “Up and Running” courses

• SIMH hobbyist OpenVMS cluster: https://vanalboom.org/node/18

• OpenStack on virtual machines: http://www.openstack.org/ and http://docs.openstack.org/developer/devstack/#quick-start

• Example appliance: http://www.altair.com/hwul/

• PBS Professional, IBM LSF, Grid Engine, other cluster mgmt. software

• Ceph scalable block storage (popular with OpenStack): http://ceph.com/

• Lustre storage free software: http://wiki.lustre.org/

Aside from security, the ability to build and maintain private and public cluster systems is near the top of the pay scale in IT!