Baremetal Benchmarking Using Ansible Tower to Automate ATL Slide Decks... · 2019. 10. 10. ·...

Using Ansible Tower to Automate Baremetal Benchmarking Chase Hoffman MTS IT Engineer AMD Chris Janiszewski Principal Solutions Architect Red Hat



Using Ansible Tower to Automate Baremetal Benchmarking

Chase Hoffman, MTS IT Engineer, AMD

Chris Janiszewski, Principal Solutions Architect, Red Hat


Share your automation story

1. How did you get started with Ansible?

2. How long have you been using it?

3. What's your favorite thing to do when you Ansible?


What is AMD?

● 50-year-old, industry-leading semiconductor manufacturer
  ○ Server CPUs
  ○ Client CPUs
  ○ GPUs
  ○ Application Processing Units (APUs)
● Created the first 64-bit extension for x86 (it’s AMD64, NOT x86_64!)
● Proud Red Hat Partner


CPU Development Process (Oversimplified)

1. Chip design is created and simulated
2. Chip is baked into physical silicon
3. Engineers test physical chips
4. Improvements are identified and implemented
5. Steps 2-4 repeat until chips are ready for release
6. Launch a new server CPU line that sets 80 world benchmark records
7. ???
8. Present at AnsibleFest

My team


Which benchmarks?

● Stream (https://www.cs.virginia.edu/stream/)
● SPEC®
  ○ SPEC SERT™ (https://www.spec.org/sert2/)
  ○ SPEC CPU® (https://www.spec.org/cpu2017/)
  ○ SPECjbb® (https://www.spec.org/jbb2015/)
  ○ SPECpower_ssj® (https://www.spec.org/power_ssj2008/)
  ○ SPEC VIRT_SC® (https://www.spec.org/virt_sc2013/)
● FIO (https://fio.readthedocs.io/en/latest/)
● iPERF (https://iperf.fr/)
● DGEMM
● HPL (https://www.netlib.org/benchmark/hpl/)
● And many more

SPEC SERT ™ is a trademark of, and SPEC ® , SPEC CPU ® , SPECjbb ® , SPECpower_ssj ® , and SPEC VIRT_SC ® are registered trademarks of, the Standard Performance Evaluation Corporation


Challenge

● AMD’s 2nd generation EPYC™ processor (formerly codenamed “Rome”) has, as of today, 19 different SKUs
● Each one needs to be tested in multiple hardware configurations for each benchmark
● To compete in a challenging market, we are pursuing an aggressive development cycle that requires more data to be gathered faster than ever before
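To make the scale of the challenge concrete, here is a minimal sketch of how the test matrix grows. Only the count of 19 SKUs comes from the slide; the SKU labels, the three hardware configurations, and the four-benchmark subset are placeholder assumptions chosen for illustration.

```python
from itertools import product

# 19 is the SKU count given on the slide; the labels are placeholders.
skus = [f"sku-{n:02d}" for n in range(1, 20)]
# Hypothetical hardware configurations (the deck does not list them).
hardware_configs = ["config-a", "config-b", "config-c"]
# A small subset of the benchmarks named earlier in the deck.
benchmarks = ["stream", "fio", "iperf", "hpl"]


def build_test_matrix(skus, hardware_configs, benchmarks):
    """Return every (sku, hardware_config, benchmark) combination."""
    return list(product(skus, hardware_configs, benchmarks))


matrix = build_test_matrix(skus, hardware_configs, benchmarks)
print(len(matrix))  # 19 * 3 * 4 = 228 individual runs
```

Even with these small placeholder numbers, a single pass needs a couple of hundred runs, and each added configuration or benchmark multiplies the total.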


...or, as my boss wishes he phrased it

Tweet by Ryan Metz used by permission (https://twitter.com/RyanAEMetz/status/1169702852970856450?s=20 )


Hurdles

● Hardware engineers are not systems engineers
  ○ OS instances not defined as infrastructure-as-code
  ○ Documentation is aimed at hardware settings, not OS settings
● Lack of reproducibility of results
  ○ Between labs
  ○ To customers
● Not scalable/inefficient
  ○ Highly manual deployment process
  ○ All tests run by hand


Pre-Automation


Solution

● Automate all the things!
  ○ OS deployment
  ○ General Configuration
  ○ Test Deployment
  ○ Test Configuration
  ○ Test Run
  ○ Results Upload
  ○ OS destruction
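The stages above can be read as an ordered pipeline. The sketch below is illustrative only and is not the presenters' implementation: the stage names come from the slide, while `run_pipeline`, `run_stage`, and `destroy_os` are hypothetical names. It also assumes that OS destruction should run even when an earlier stage fails, which the deck does not state.

```python
# Stage names taken from the slide, in the order listed.
STAGES = [
    "os_deployment",
    "general_configuration",
    "test_deployment",
    "test_configuration",
    "test_run",
    "results_upload",
]


def run_pipeline(run_stage, destroy_os):
    """Run each stage in order, stop at the first failure, always tear down.

    run_stage(stage) and destroy_os() are caller-supplied callables
    (hypothetical hooks; in practice each might launch a playbook).
    Returns (completed_stages, failure), where failure is None or a
    (stage, message) tuple.
    """
    completed = []
    failure = None
    try:
        for stage in STAGES:
            try:
                run_stage(stage)
            except Exception as exc:
                failure = (stage, str(exc))
                break
            completed.append(stage)
    finally:
        # Assumption: the machine is released regardless of the outcome.
        destroy_os()
    return completed, failure


# Usage with a fake stage runner that fails during the benchmark run.
log = []


def fake_stage(stage):
    if stage == "test_run":
        raise RuntimeError("benchmark crashed")
    log.append(stage)


completed, failure = run_pipeline(fake_stage, lambda: log.append("os_destruction"))
print(completed)
print(failure)
print(log[-1])
```

Stopping at the first failure while still tearing down keeps a broken run from holding on to a machine that the next job needs.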


Teaming Up with Teamwork Together as a Team


The Glorious Present


Why Ansible?

● YAML is easy and human readable
● Cross-platform support
● Active ecosystem
● Lots of good training available
● Trusted by Fortune 500 companies [1]
● Easy to find employees/contractors with experience for rapid ramp up
● Easily extensible with custom modules
● Can orchestrate pretty much anything

[1] https://www.ansible.com/blog/enterprise-ansible


Why Ansible Tower?

● Allows non-Ansible-savvy users to run tests
● Multi-tenancy support
● Forces playbooks to pull from source control
● Workflow Engine allows us to easily string together multiple playbooks
● Support from a Partner in Red Hat


Why OpenStack?

● Native baremetal deployment support (Ironic)
● Hardware characteristic collection through Introspection
● Easy KVM VM deployment/management for virtual benchmarks
● Deep Ansible integration
● Companies run it and want to see benchmarks on it
● Native network configuration support (ML2 drivers)
● Support from a Partner in Red Hat


Automating Angry Bear (BMaaS)

● Platform
● Infrastructure
● Workload
● Operations

Ironic an OpenStack Community Project, used with permission from https://creativecommons.org/licenses/by/4.0/.


Use Cases

● AMD Use Case (HW Benchmarking-as-a-Service)
● Provisioning Cloud
● AI/ML/HPC
● Edge/IoT
● Rendering Farms
● No overhead environments (CPU, Network, Storage)



Future of Automating Bears

● Networking-Ansible
● Redfish
● Routed Networks
● Centralized Storage



Lessons Learned With Ansible, Part 1

● Everything should be a role - then make a playbook with one call if you need to just do one thing
  ○ Easier tracking in version control
  ○ Easier to force documentation
● We found we had to write a lot of modules
  ○ This may be because of our unique applications
● Require extensive readme.md files for each role
  ○ Forces writers to think
  ○ Easy to copy to wiki
● It is REALLY hard not to nest roles within roles
  ○ If you have to do it (and we did) use metadata extensively


Lessons Learned With Ansible, Part 2

● Referring users to the Ansible documentation generally does not work
  ○ An internal documentation site is required
  ○ We’ve found the videos linked from the Ansible site are also quite good
● Forcing Code Reviews on Pull Requests is one way to stay sane


Lessons Learned With Tower

● The user and group permission inheritance is basic, so be very deliberate in initial configuration
● Creating Workflow Templates with Surveys at each step is not intuitive in the editor - document this well to make it easier on users
● Users generally dislike the Job Template output in Tower (as well as Ansible core), so put in a lot of error checking and breakpoints to make troubleshooting easier


Lessons Learned with OpenStack

● Ironic is gaining steam as a use case, but most documentation revolves around VMs.
● Because of this not-entirely-standard use case, deployment was more complex than usual.


WHAT’S NEXT?!?!?

● Build a front end for system selection via hardware characteristics through OpenStack Introspection
  ○ Possibly CloudForms
  ○ Possibly something else
● Extend multitenancy to external users
● World Domination
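As a rough illustration of the first item, selecting systems by hardware characteristics comes down to filtering node records against minimum requirements. The sketch below is hypothetical throughout: the node records are made up, the field names are only loosely modeled on basic node properties such as CPU count, memory, and disk size, and `select_nodes` is not part of OpenStack or any front end mentioned in the deck.

```python
# Made-up node records standing in for collected hardware characteristics.
nodes = [
    {"name": "node-01", "cpus": 64, "memory_mb": 262144, "local_gb": 960},
    {"name": "node-02", "cpus": 128, "memory_mb": 524288, "local_gb": 1920},
    {"name": "node-03", "cpus": 128, "memory_mb": 262144, "local_gb": 960},
]


def select_nodes(nodes, **minimums):
    """Return the names of nodes that meet every minimum requirement.

    A node missing a requested field is treated as having a value of 0,
    so it never matches a positive minimum.
    """
    return [
        node["name"]
        for node in nodes
        if all(node.get(key, 0) >= value for key, value in minimums.items())
    ]


print(select_nodes(nodes, cpus=128))
print(select_nodes(nodes, cpus=128, memory_mb=524288))
```

A front end could collect the minimums from a form and pass the chosen node on to the deployment step.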


DISCLAIMER AND ATTRIBUTIONS

DISCLAIMER

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18

Timelines, roadmaps, and/or product release dates shown in these slides are plans only and subject to change. "Rome" and "Naples" are codenames for AMD architectures, and are not product names. GD-122

ATTRIBUTION

©2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Red Hat, the Red Hat logo and Ansible are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries. The OpenStack Word Mark is either a registered trademark/service mark or trademark/service mark of the OpenStack Foundation, in the United States and other countries, and is used with the OpenStack Foundation's permission. Ironic an OpenStack Community Project, used with permission from https://creativecommons.org/licenses/by/4.0/. No endorsement is implied. Red Hat and AMD are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

SPEC SERT ™ is a trademark of, and SPEC ® , SPEC CPU ® , SPECjbb ® , SPECpower_ssj ® , and SPEC VIRT_SC ® are registered trademarks of, the Standard Performance Evaluation Corporation.

Tweet by Ryan Metz used by permission (https://twitter.com/RyanAEMetz/status/1169702852970856450?s=20 )
