Baremetal Benchmarking Using Ansible Tower to Automate ATL Slide Decks... · 2019. 10. 10. ·...
Using Ansible Tower to Automate Baremetal Benchmarking
Chase Hoffman, MTS IT Engineer, AMD
Chris Janiszewski, Principal Solutions Architect, Red Hat
Share your automation story
1. How did you get started with Ansible?
2. How long have you been using it?
3. What's your favorite thing to do when you Ansible?
What is AMD?
● 50-year-old, industry-leading semiconductor manufacturer
○ Server CPUs
○ Client CPUs
○ GPUs
○ Accelerated Processing Units (APUs)
● Created the first 64-bit extension for x86 (it’s AMD64, NOT x86_64!)
● Proud Red Hat Partner
CPU Development Process (Oversimplified)
1. Chip design is created and simulated
2. Chip is baked into physical silicon
3. Engineers test physical chips
4. Improvements are identified and implemented
5. Steps 2-4 repeat until chips are ready for release
6. Launch a new server CPU line that sets 80 world benchmark records
7. ???
8. Present at AnsibleFest
My team
Which benchmarks?
● Stream (https://www.cs.virginia.edu/stream/)
● SPEC®
○ SPEC SERT™ (https://www.spec.org/sert2/)
○ SPEC CPU® (https://www.spec.org/cpu2017/)
○ SPECjbb® (https://www.spec.org/jbb2015/)
○ SPECpower_ssj® (https://www.spec.org/power_ssj2008/)
○ SPEC VIRT_SC® (https://www.spec.org/virt_sc2013/)
● FIO (https://fio.readthedocs.io/en/latest/)
● iPERF (https://iperf.fr/)
● DGEMM
● HPL (https://www.netlib.org/benchmark/hpl/)
● And many more
SPEC SERT™ is a trademark of, and SPEC®, SPEC CPU®, SPECjbb®, SPECpower_ssj®, and SPEC VIRT_SC® are registered trademarks of, the Standard Performance Evaluation Corporation.
Challenge
● AMD’s 2nd generation EPYC™ processor (formerly codenamed “Rome”) has, as of today, 19 different SKUs
● Each one needs to be tested in multiple hardware configurations for each benchmark
● To compete in a challenging market, we are pursuing an aggressive development cycle that requires more data to be gathered faster than ever before
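To give a feel for how quickly the run count grows, here is a minimal sketch of the test matrix. Only the SKU count (19) comes from the slides; the SKU labels, the two hardware configurations, and the five-benchmark subset are placeholders chosen for illustration.

```python
from itertools import product

# Only the number of SKUs (19) is taken from the slides.
# The labels, configurations, and benchmark subset are hypothetical.
skus = [f"sku-{i:02d}" for i in range(1, 20)]
hw_configs = ["1-socket", "2-socket"]
benchmarks = ["stream", "spec-cpu", "fio", "iperf", "hpl"]


def build_test_matrix(skus, hw_configs, benchmarks):
    """Return every (sku, hardware config, benchmark) combination to run."""
    return list(product(skus, hw_configs, benchmarks))


matrix = build_test_matrix(skus, hw_configs, benchmarks)
print(len(matrix))  # 19 * 2 * 5 = 190 runs
```

Even with these small assumed lists, a single pass is 190 runs, before any repeats for reproducibility.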
...or, as my boss wishes he phrased it
Tweet by Ryan Metz used by permission (https://twitter.com/RyanAEMetz/status/1169702852970856450?s=20 )
Hurdles
● Hardware engineers are not systems engineers
○ OS instances not defined as infrastructure-as-code
○ Documentation is aimed at hardware settings, not OS settings
● Lack of reproducibility of results
○ Between labs
○ To customers
● Not scalable/inefficient
○ Highly manual deployment process
○ All tests run by hand
Pre-Automation
Solution
● Automate all the things!
○ OS deployment
○ General configuration
○ Test deployment
○ Test configuration
○ Test run
○ Results upload
○ OS destruction
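The stages above can be sketched as an ordered pipeline. The stage names follow the slide; the `run_pipeline` function, the `run_stage` callback, and the choice to always run OS destruction even after a failure are assumptions made for this illustration, not a description of the actual AMD workflow.

```python
from typing import Callable, List

# Stage names follow the slide; everything else here is a hypothetical sketch.
STAGES = [
    "os_deployment",
    "general_configuration",
    "test_deployment",
    "test_configuration",
    "test_run",
    "results_upload",
]
TEARDOWN = "os_destruction"


def run_pipeline(run_stage: Callable[[str], None]) -> List[str]:
    """Run each stage in order and return the names of the stages that ran.

    Assumption: the OS is destroyed even if an earlier stage raises, so a
    failed benchmark does not leave a machine tied up. The original
    exception still propagates to the caller.
    """
    completed = []
    try:
        for stage in STAGES:
            run_stage(stage)
            completed.append(stage)
    finally:
        run_stage(TEARDOWN)
        completed.append(TEARDOWN)
    return completed


if __name__ == "__main__":
    # Stand-in for launching a playbook: just print the stage name.
    print(run_pipeline(lambda stage: print(f"running {stage}")))
```

In a real setup each `run_stage` call would launch a playbook or a Tower job; a team might also prefer to skip teardown on failure to keep the machine for debugging.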
Teaming Up with Teamwork Together as a Team
The Glorious Present
Why Ansible?
● YAML is easy and human readable
● Cross-platform support
● Active ecosystem
● Lots of good training available
● Trusted by Fortune 500 companies¹
● Easy to find employees/contractors with experience for rapid ramp up
● Easily extensible with custom modules
● Can orchestrate pretty much anything
¹ https://www.ansible.com/blog/enterprise-ansible
Why Ansible Tower?
● Allow non-Ansible savvy users to run tests
● Multi-tenancy support
● Forces playbooks to pull from source control
● Workflow Engine allows us to easily string together multiple playbooks
● Support from a Partner in Red Hat
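As an illustration of how a non-Ansible user's test request could reach the Workflow Engine, the sketch below builds (but does not send) a launch request for a workflow job template through Tower's REST API. The host name, template ID, token, and variable names are placeholders, and the slides do not say whether the team drives Tower this way or only through the web UI.

```python
import json
import urllib.request


def build_launch_request(tower_url, template_id, token, extra_vars):
    """Build a POST request that launches a Tower workflow job template.

    Nothing is sent here; pass the result to urllib.request.urlopen()
    against a real Tower instance to actually launch the workflow.
    """
    url = (
        f"{tower_url.rstrip('/')}"
        f"/api/v2/workflow_job_templates/{template_id}/launch/"
    )
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
req = build_launch_request(
    "https://tower.example.com",
    42,
    "not-a-real-token",
    {"benchmark": "stream", "sku": "sku-01"},
)
print(req.get_method(), req.full_url)
```

Tower generally honors `extra_vars` only when the template prompts for variables on launch or defines a matching survey, so the template configuration matters as much as the request.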
Why OpenStack?
● Native baremetal deployment support (Ironic)
● Hardware characteristic collection through Introspection
● Easy KVM VM deployment/management for virtual benchmarks
● Deep Ansible integration
● Companies run it and want to see benchmarks on it
● Native network configuration support (ML2 drivers)
● Support from a Partner in Red Hat
Automating Angry Bear (BMaaS)
● Platform
● Infrastructure
● Workload
● Operations
Ironic an OpenStack Community Project, used with permission from https://creativecommons.org/licenses/by/4.0/.
Use Cases
● AMD Use Case (HW Benchmarking-as-a-Service)
● Provisioning Cloud
● AI/ML/HPC
● Edge/IoT
● Rendering Farms
● No overhead environments (CPU, Network, Storage)
Future of Automating Bears
● Networking-Ansible
● Redfish
● Routed Networks
● Centralized Storage
Lessons Learned With Ansible, Part 1
● Everything should be a role; if you need to do just one thing, make a playbook with a single role call
○ Easier tracking in version control
○ Easier to force documentation
● We found we had to write a lot of modules
○ This may be because of our unique applications
● Require extensive readme.md files for each role
○ Forces writers to think
○ Easy to copy to wiki
● It is REALLY hard not to nest roles within roles
○ If you have to do it (and we did), use metadata extensively
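One way to enforce the readme and metadata rules is a small check that could run in CI. This is a hypothetical sketch: the `find_undocumented_roles` name and the `roles/` directory are assumptions, while `meta/main.yml` is the standard location for role metadata and dependencies.

```python
from pathlib import Path


def find_undocumented_roles(roles_dir):
    """Map each role name to the required files it is missing.

    A role passes when it has a README.md (or readme.md) and a
    meta/main.yml. Roles with nothing missing are left out of the result.
    """
    missing = {}
    role_dirs = sorted(p for p in Path(roles_dir).iterdir() if p.is_dir())
    for role in role_dirs:
        problems = []
        has_readme = any(
            (role / name).is_file() for name in ("README.md", "readme.md")
        )
        if not has_readme:
            problems.append("README.md")
        if not (role / "meta" / "main.yml").is_file():
            problems.append("meta/main.yml")
        if problems:
            missing[role.name] = problems
    return missing


if __name__ == "__main__":
    # Assumes the script runs from a repository root that contains roles/.
    for role, files in find_undocumented_roles("roles").items():
        print(f"{role}: missing {', '.join(files)}")
```

A check like this only confirms the files exist; whether the readme is "extensive" still has to be judged in code review.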
Lessons Learned With Ansible, Part 2
● Referring users to the Ansible documentation generally does not work
○ An internal documentation site is required
○ We’ve found the videos linked from the Ansible site are also quite good
● Forcing code reviews on pull requests is one way to stay sane
Lessons Learned With Tower
● The user and group permission inheritance is basic, so be very deliberate in initial configuration
● Creating Workflow Templates with Surveys at each step is not intuitive in the editor; document this well to make it easier on users
● Users generally dislike the Job Template output in Tower (as well as Ansible core), so put in a lot of error checking and breakpoints to make troubleshooting easier
Lessons Learned with OpenStack
● Ironic is gaining steam as a use case, but most documentation revolves around VMs.
● Because this use case is not entirely standard, deployment was more complex than usual.
WHAT’S NEXT?!?!?
● Build a front end for system selection via hardware characteristics through OpenStack Introspection
○ Possibly CloudForms
○ Possibly something else
● Extend multitenancy to external users
● World Domination
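The core of the first item, selecting systems by hardware characteristics, could look something like the sketch below. The node records are invented for illustration; the field names mirror common Ironic node properties (`cpus`, `memory_mb`, `cpu_arch`), but the data shape a real front end would get from Introspection is an assumption here.

```python
# Hypothetical inventory; a real front end would read this from OpenStack.
NODES = [
    {"name": "node-a", "cpu_arch": "x86_64", "cpus": 64, "memory_mb": 262144},
    {"name": "node-b", "cpu_arch": "x86_64", "cpus": 32, "memory_mb": 131072},
    {"name": "node-c", "cpu_arch": "x86_64", "cpus": 128, "memory_mb": 524288},
]


def matches(node, requirements):
    """Check one node against the requirements.

    Numeric requirements are treated as minimums; anything else must be
    an exact match. A node that lacks a requested field never matches.
    """
    for key, wanted in requirements.items():
        actual = node.get(key)
        if actual is None:
            return False
        if isinstance(wanted, (int, float)):
            if actual < wanted:
                return False
        elif actual != wanted:
            return False
    return True


def select_nodes(nodes, **requirements):
    """Return the names of all nodes that satisfy every requirement."""
    return [n["name"] for n in nodes if matches(n, requirements)]


print(select_nodes(NODES, cpus=64, memory_mb=200000))  # ['node-a', 'node-c']
```

Treating numbers as minimums keeps the query simple for users who just need "at least this much machine"; exact-match or range queries would need a richer requirement format.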
DISCLAIMER AND ATTRIBUTIONS
DISCLAIMER
The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
Timelines, roadmaps, and/or product release dates shown in these slides are plans only and subject to change. "Rome" and "Naples" are codenames for AMD architectures, and are not product names. GD-122
ATTRIBUTION
©2019 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Red Hat, the Red Hat logo and Ansible are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries. The OpenStack Word Mark is either a registered trademark/service mark or trademark/service mark of the OpenStack Foundation, in the United States and other countries, and is used with the OpenStack Foundation's permission. Ironic an OpenStack Community Project, used with permission from https://creativecommons.org/licenses/by/4.0/. No endorsement is implied. Red Hat and AMD are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
SPEC SERT™ is a trademark of, and SPEC®, SPEC CPU®, SPECjbb®, SPECpower_ssj®, and SPEC VIRT_SC® are registered trademarks of, the Standard Performance Evaluation Corporation.
Tweet by Ryan Metz used by permission (https://twitter.com/RyanAEMetz/status/1169702852970856450?s=20 )