New Thor & Roxie Hardware Architecture

Page 1: New Thor & Roxie Hardware Architecture

October 4, 2017

Jonathan Burger, Lead Infrastructure Architect

New Thor & Roxie Hardware Architecture

Page 2: New Thor & Roxie Hardware Architecture

Discussion Topics

• Hive360

• UltraThor 2.0

• Roxie Fusion

Page 3: New Thor & Roxie Hardware Architecture

Hive360: A new way to HPCC Systems on Public Cloud

Page 4: New Thor & Roxie Hardware Architecture

Porting HPCC Systems Infrastructure to Cloud

Cloud Architectures

• Resources on Demand

• Automated Scaling

• HA Across Datacenters (AZs)

• Agile Workflow CI/CD

• Self Healing

• IaaS and PaaS

• Lower Cost

Traditional HPCC Systems Architecture

• Capitalized Resources

• Static Sizing

• Active/Passive DR

• Limited Workflow Tooling

• IT Supported Resources

• By-Hand Configuration

• Higher Cost

Page 5: New Thor & Roxie Hardware Architecture

Breaking the Traditional Paradigm

• Separation of the data from the compute plane

• Automatic self-configuration

• Leveraging built-in AWS services (ASG, ELB, EFS)

• Leveraging off-instance logs & metrics collection (CloudWatch / Grafana)

• Breaking the Clustered Roxie Paradigm

• API Driven to Leverage CI/CD

Page 6: New Thor & Roxie Hardware Architecture

The Design Outcome

• CloudFormation Template to Automate Deployment

• Cross-AZ Active/Active HPCC

• Instant Roxie Data Releases

• Self-Healing Thor/Roxie/Admin Infrastructure

• API Driven Scalability Up/Down

• Lower Running Cost

• CI/CD Ready
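To make the "CloudFormation template plus ASG and EFS" idea concrete, here is a minimal sketch of how such a template could be generated from Python. It is not the Hive360 template, which the slides do not show. The resource names (SharedData, ThorLaunchConfig, ThorGroup), the parameter names, and the instance type are all hypothetical; only the CloudFormation resource types and the 360-node ceiling from the Known Caveats slide are taken as given.

```python
import json


def build_template(max_thor_nodes=360):
    """Build a hypothetical CloudFormation template as a plain dict.

    All logical names and the instance type are illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch only; not the Hive360 template.",
        "Parameters": {
            "HpccAmiId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "ThorNodes": {
                "Type": "Number",
                "MinValue": 1,
                "MaxValue": max_thor_nodes,  # slide: maximum Thor size of 360 nodes
                "Default": 4,
            },
        },
        "Resources": {
            # Data plane kept off the compute instances (EFS).
            "SharedData": {"Type": "AWS::EFS::FileSystem"},
            "ThorLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "HpccAmiId"},
                    "InstanceType": "m4.xlarge",  # placeholder assumption
                },
            },
            # Compute plane: the ASG replaces failed nodes and spans the
            # subnets (one per AZ) passed in SubnetIds.
            "ThorGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "ThorLaunchConfig"},
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "MinSize": {"Ref": "ThorNodes"},
                    "MaxSize": {"Ref": "ThorNodes"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Under these assumptions, scaling up or down amounts to a stack update with a new ThorNodes value, which is one way the "API driven" bullet could be realized.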

Page 7: New Thor & Roxie Hardware Architecture

CloudFormation Interface

Page 8: New Thor & Roxie Hardware Architecture

Known Caveats

• No current production deployments

• Limit of 8 exabytes of data storage

• Limit of 175 active swarms of an existing hive

• Limited HPCC AMI Builds

• Maximum Thor size of 360 nodes

Page 9: New Thor & Roxie Hardware Architecture

UltraThor 2.0: Next Gen Thor Physical Infrastructure

Page 10: New Thor & Roxie Hardware Architecture

Major Infrastructure Design Changes

• Leverage SSD as primary storage

• Leverage NIC bonding for HA & Higher Throughput

• Leverage Off Cluster & Lowest Cost Storage for Thor Mirror

• Higher Slaves Per Node

• Lower Physical Footprint

• Lower Electrical Footprint

• Higher Room For Expansion

Page 11: New Thor & Roxie Hardware Architecture

Hardware Details

• 100 Physical Servers + 18 Servers for Mirrored Storage

• 1600 High Performance 1.2TB SSD Drives – dual LSI Perc (RAID50)

• 200 x 25Gb/sec Network Interfaces Configured in LACP Bond

• 6800 Processing Cores

• 51.2 TB of RAM

• 3 Server Racks @48U/Rack

• 6 Leaf, 2 Spine, 2 Super Spine Uplinked via 100Gb/sec x 8 Port Channels

• Storage Servers: 1.2PB of Slow Disk for Mirrored Storage

Page 12: New Thor & Roxie Hardware Architecture

Fun Facts

• Total READ/WRITE Throughput of 600GB/sec

• The equivalent of copying 100 Full Length HD Movies Per Second

• Total Bisection Network Bandwidth of 5Tb/sec

• The equivalent of downloading 31 WoW Full installs per second

• Can Sort the text of the Entire Library of Congress (@10TB) in 7 minutes

• Physical Size is ½ of Prior Design

• MTBF Calculations are 300% Better Than Prior Design

• Only 25% of Total Potential Size
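The headline numbers can be cross-checked against the Hardware Details slide with some simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and an even spread of drives, NICs, cores, and RAM across the 100 compute servers; neither assumption is stated in the deck. The function name derived_figures is illustrative.

```python
# Figures copied from the "Hardware Details" and "Fun Facts" slides.
SERVERS = 100             # compute servers (the 18 storage servers are excluded)
SSD_COUNT = 1600
SSD_TB = 1.2
NIC_COUNT = 200
NIC_GBIT = 25             # Gb/sec per interface
CORES = 6800
RAM_TB = 51.2
DISK_GBYTE_PER_SEC = 600  # total read/write throughput
SORT_TB = 10              # Library of Congress text, per the slide
SORT_MINUTES = 7


def derived_figures():
    """Return per-server and aggregate numbers implied by the slide figures.

    Assumes decimal units and an even split across the compute servers.
    """
    network_tbit = NIC_COUNT * NIC_GBIT / 1000
    return {
        "ssd_per_server": SSD_COUNT // SERVERS,
        "raw_ssd_pb": round(SSD_COUNT * SSD_TB / 1000, 2),  # before RAID50 overhead
        "nics_per_server": NIC_COUNT // SERVERS,
        "bond_gbit_per_server": NIC_COUNT // SERVERS * NIC_GBIT,
        "network_tbit": network_tbit,
        "network_gbyte": network_tbit * 1000 / 8,
        "cores_per_server": CORES // SERVERS,
        "ram_gb_per_server": round(RAM_TB * 1000 / SERVERS),
        "disk_gbyte_per_server": DISK_GBYTE_PER_SEC / SERVERS,
        "sort_gbyte_per_sec": round(SORT_TB * 1000 / (SORT_MINUTES * 60), 1),
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions the 5Tb/sec figure equals the sum of all 200 interface line rates (200 x 25Gb/sec), each server carries a 50Gb/sec bond, and the sort claim implies a sustained rate of roughly 24GB/sec, well below the quoted 600GB/sec disk throughput.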

Page 13: New Thor & Roxie Hardware Architecture

Roxie Fusion: Next Gen Roxie Infrastructure

Page 14: New Thor & Roxie Hardware Architecture

Major Infrastructure Design Changes

• Reduce Copy Times Between Environments (Cert/CT/Prod)

• Reduce Data Duplication

• Improve MTBF By Leveraging RAID & Bonding

• Reduce Technical Debt in Unused Resources

• Lower Physical Footprint

• Lower Electrical Footprint

• Better and More Environmental Isolation

Page 15: New Thor & Roxie Hardware Architecture

Roxie Fusion Approach

• Continue to Provide Environmental Isolation

• Keep Data Local To The Environment (No SAN Remote Storage)

• Share Data Between Environments to Reduce Duplication

Take really big servers and chop them into manageable pieces using containers.

Page 16: New Thor & Roxie Hardware Architecture

Roxie Fusion Highlights

• Data is pulled once and released to all environments in seconds instead of hours/days.

• Projected to reduce over 3200 physical servers from our datacenters

• Improved MTBF By Leveraging RAID & Bonding

• Projected to save $9M in capitalization every 3 years

• Better & more isolation of distinct environments

• Lowered Electrical Footprint & Cost

• Currently in Phase III POC with a full production target of Jan 2018

Page 17: New Thor & Roxie Hardware Architecture

Questions?