Puppet Availability and Performance at 100K Nodes - PuppetConf 2014
puppet @ 100,000+ agents
John Jawed (“JJ”), eBay/PayPal
“but I don’t have 100,000 agents”
the issues ahead were encountered at <1,000 agents
me
responsible for Puppet/Foreman @ eBay
how I got here:
engineer -> engineer with root access -> system/infrastructure engineer
free time: PuppyConf
puppet @ eBay, quick facts
-> perhaps the largest Puppet deployment
-> more definitively, the most diverse
-> manages core security
-> trying to solve the “p100k” problems
#’s
• 100K+ agents
  – Solaris, Linux, and Windows
  – Production & QA
  – Cloud (OpenStack & VMware) + bare metal
• 32 different OS versions, 43 hardware configurations
  – Over 300 permutations in production
• Countless apps from C/C++ to Hadoop
  – Some applications over 15 years old
currently
• 3-4 puppet masters per data center
• foreman for ENC, statistics, and fact collection
• 150+ puppet runs per second
• separate git repos per environment, common core modules
  – caching git daemon used by the PPMs
nodes growing, sometimes violently
(chart: node count over time, with a linear growth trendline)
setting up puppet masters
set up a puppet master; it’s the CA too.
sign and run 400 agents concurrently: that’s less than half a percent of all the nodes you need to get through.
not exactly puppet issues
entropy unavailable
crypto is CPU-heavy (heavier than you’d ever believe)
passenger children are all busy
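A quick way to confirm the entropy problem on a CA or master host (a sketch; haveged is one common fix, and package/service names vary by distro):

    # check the kernel’s available entropy; values in the low hundreds mean
    # key generation and TLS handshakes will stall waiting on /dev/random
    cat /proc/sys/kernel/random/entropy_avail

    # one common mitigation: a userspace entropy daemon such as haveged
    yum install -y haveged && service haveged start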
OK, let’s set up separate hosts which only function as a CA
multiple dedicated CAs
much better: this distributed the CPU and I/O load and helped the entropy problem.
the PPMs can handle actual puppet agent runs because they aren’t tied up signing. Great!
wait, how do the CAs know about each other’s certs?
some sort of network file system (NFS sounds okay).
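One way to wire that up (a sketch with hypothetical hostnames; Puppet 3.x exposes a cadir setting that can point at the shared mount):

    # /etc/fstab on each CA host: mount the shared certificate store
    nfs.e.com:/export/puppet-ca  /srv/puppet-ca  nfs  defaults  0 0

    # /etc/puppet/puppet.conf on each CA host
    [master]
        ca    = true
        cadir = /srv/puppet-ca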
shared storage for the CA cluster
-> Get a list of pending signing requests (should be small!)

    # puppet cert list

…wait…wait…
optimize CAs for a large # of certs
Traversing a large # of certs is too slow over NFS.
-> Profile
-> Implement optimization
-> Get patch accepted (PUP-1665, 8x improvement)
<3 puppetlabs team
optimizing foreman
- read-heavy is fine; DBs do it well.
- read-heavy in a write-heavy environment is more challenging.
- foreman writes a lot of log, fact, and report data after each puppet run.
- the majority of requests are to get ENC data.
- use makara with PG read slaves (https://github.com/taskrabbit/makara) to scale ENC requests.
  - needs updates to the foreigner gem.
- if ENC requests are slow, puppetmasters fall over.
optimizing foreman
ENC requests load balanced to read slaves
fact/report/host info write requests sent to master
makara knows how to arbitrate the connection (great job TaskRabbit team!)
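A minimal sketch of the makara side, assuming foreman’s Rails database.yml and hypothetical hostnames (check makara’s README for the exact keys in your version):

    # config/database.yml (sketch)
    production:
      adapter: postgresql_makara
      database: foreman
      makara:
        sticky: true          # keep a client on the master right after it writes
        connections:
          - role: master
            host: pg-master.e.com
          - role: slave
            host: pg-slave1.e.com
          - role: slave
            host: pg-slave2.e.com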
more optimizations
make sure the RoR cache is set to use dalli (config.cache_store = :dalli_store), see the foreman wiki.
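For context, a sketch of where that setting lives (Rails application config; the exact file varies by foreman version, and the memcached host here is hypothetical):

    # requires the dalli gem; point it at your memcached pool
    config.cache_store = :dalli_store, 'memcached.e.com:11211',
                         { namespace: 'foreman', expires_in: 86400 }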
fact collection optimization (already upstream); without this, reporting facts back to foreman can kill a busy puppetmaster! (if you care: https://github.com/theforeman/puppet-foreman/pull/145)
<3 the foreman team
let’s add more nodes
Adding another 30,000 nodes (that’s 30% coverage).
Agent setup: pretty standard stuff, puppet agent as a service.
results
average puppet run: 29 seconds.
not horrible. but average latency is a lie, because “average” usually means the arithmetic mean (sum of N samples / N): nine 10-second runs plus one 200-second run still average out to 29 seconds.
the actual puppet run graph looks more like…
curve impossible
No one in operations or infrastructure ever wants a service runtime graph like this.
(chart: wildly spiky per-run latency against the flat mean-average line)
PPM running @ medium load

      PID USER    PR NI VIRT RES SHR  S %CPU %MEM TIME+    COMMAND
    16765 puppet  20  0 341m 76m 3828 S 53.0  0.1 67:14.92 ruby
    17197 puppet  20  0 343m 75m 3828 S 40.7  0.1 62:50.01 ruby
    17174 puppet  20  0 353m 78m 3996 S 38.7  0.1 70:07.44 ruby
    16330 puppet  20  0 338m 74m 3828 S 33.8  0.1 66:08.81 ruby
    17231 puppet  20  0 344m 75m 3820 S 29.8  0.1 70:00.47 ruby
    17238 puppet  20  0 353m 76m 3996 S 29.8  0.1 69:11.94 ruby
    17187 puppet  20  0 343m 76m 3820 S 26.2  0.1 70:48.66 ruby
    17156 puppet  20  0 353m 75m 3984 S 25.8  0.1 64:44.62 ruby
    … system processes
60 seconds later… idle

      PID USER    PR NI VIRT RES  SHR  S %CPU %MEM TIME+    COMMAND
    17343 puppet  20  0 344m 77m  3828 S 11.6  0.1 74:47.23 ruby
    31152 puppet  20  0 203m 9048 2568 S 11.3  0.0  0:03.67 httpd
    29435 puppet  20  0 203m 9208 2668 S 10.9  0.0  0:05.46 httpd
    16220 puppet  20  0 337m 74m  3828 S 10.3  0.1 70:07.42 ruby
    16354 puppet  20  0 339m 75m  3816 S 10.3  0.1 62:11.71 ruby
    … system processes
120 seconds later… thrashing

      PID USER    PR NI VIRT RES SHR  S %CPU %MEM TIME+    COMMAND
    16765 puppet  20  0 341m 76m 3828 S 94.0  0.1 67:14.92 ruby
    17197 puppet  20  0 343m 75m 3828 S 93.7  0.1 62:50.01 ruby
    17174 puppet  20  0 353m 78m 3996 S 92.7  0.1 70:07.44 ruby
    16330 puppet  20  0 338m 74m 3828 S 90.8  0.1 66:08.81 ruby
    17231 puppet  20  0 344m 75m 3820 S 89.8  0.1 70:00.47 ruby
    17238 puppet  20  0 353m 76m 3996 S 89.8  0.1 69:11.94 ruby
    17187 puppet  20  0 343m 76m 3820 S 88.2  0.1 70:48.66 ruby
    17156 puppet  20  0 353m 75m 3984 S 87.8  0.1 64:44.62 ruby
    17152 puppet  20  0 353m 75m 3984 S 86.3  0.1 64:44.62 ruby
    17153 puppet  20  0 353m 75m 3984 S 85.3  0.1 64:44.62 ruby
    17151 puppet  20  0 353m 75m 3984 S 82.9  0.1 64:44.62 ruby
    … more ruby processes
what we really want
A flat, consistent runtime curve; this is important for any production service. Without predictability there is no reliability!
consistency @ medium load

      PID USER    PR NI VIRT RES SHR  S %CPU %MEM TIME+    COMMAND
    16765 puppet  20  0 341m 76m 3828 S 53.0  0.1 67:14.92 ruby
    17197 puppet  20  0 343m 75m 3828 S 40.7  0.1 62:50.01 ruby
    17174 puppet  20  0 353m 78m 3996 S 38.7  0.1 70:07.44 ruby
    16330 puppet  20  0 338m 74m 3828 S 33.8  0.1 66:08.81 ruby
    17231 puppet  20  0 344m 75m 3820 S 29.8  0.1 70:00.47 ruby
    17238 puppet  20  0 353m 76m 3996 S 29.8  0.1 69:11.94 ruby
    17187 puppet  20  0 343m 76m 3820 S 26.2  0.1 70:48.66 ruby
    17156 puppet  20  0 353m 75m 3984 S 25.8  0.1 64:44.62 ruby
    … system processes
hurdle: runinterval
it’s near impossible to get a flat curve because agent run distribution is uneven and chaotic.
runinterval is non-deterministic: even if you manage to sync up service start times, the schedule eventually drifts back into the nebulous.
the puppet agent daemon approach is not going to work.
plan A: puppet via cron
generate the run time based on some deterministic agent data point (IP, MAC address, hostname, etc.).
i.e., if you wanted a puppet run every 30 minutes, your crontab may look like:

    08 * * * * puppet agent -t
    38 * * * * puppet agent -t
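A minimal sketch of deriving that deterministic offset from the hostname (assumes a 30-minute interval; cksum is POSIX, so it works on even the oldest agents):

    # hash the FQDN into a minute offset in [0,30), then fire at that minute
    # and at offset+30, spreading agents evenly across each hour
    OFFSET=$(( $(hostname -f | cksum | cut -d' ' -f1) % 30 ))
    printf '%d,%d * * * * puppet agent -t\n' "$OFFSET" $(( OFFSET + 30 ))

Install the printed line into the agent’s crontab at provisioning time; each host lands on a stable pair of minutes.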
plan A yields
Fewer and more predictable spikes
Improved.
But it does not scale: cron jobs make run times deterministic, but they lack even distribution.
eliminate all masters? masterless puppet
kicking the can down the road: somewhere, infrastructure still has to serve the files and catalogs to agents.
masterless puppet creates a whole host of other issues (file transfer channels, catalog compilation hosts).
eliminate all masters? masterless puppet
…and the same issues exist, albeit in different forms.
it shifts the problems to “compile interval” and “manifest/module push interval”.
plan Z: increase your runinterval
Z, the zombie apocalypse plan (do not do this!).
it just delays failure until you are no longer responsible for it (hopefully).
alternate setups
SSL termination on a load balancer: expensive. LBs are difficult to deploy and cost more (you still need failover, otherwise the LB is a SPoF!).
caching: cache is meant to make things faster, not required for things to work. if a cache is required to make a service functional, you are solving the wrong problem.
zen moment
maybe the issue isn’t about timing the agent from the host.
maybe the issue is that the agent doesn’t know when there’s enough capacity to reliably and predictably run puppet.
enforcing states is delayed
runinterval/cron jobs/masterless setups still render puppet a suboptimal solution in a state-sensitive environment (customer and financial data).
the problem is not unique to puppet: salt, CoreOS, et al. are susceptible.
security trivia
web service REST3DotOh just got compromised, allowing a sensitive file managed by puppet to be manipulated.
Q: how/when does puppet set the proper state?
the how; sounds awesome
A: every puppet run ensures that a file is in its intended state and records the previous state if it was not.
the when; sounds far from awesome
A: whenever puppet is scheduled to run next: up to runinterval minutes after the compromise, or at the next masterless push or cron job execution.
smaller intervals help but…
all the strategies have one common issue:
puppet masters do not scale with smaller intervals; smaller intervals exacerbate spikes in the runtime curve.
this needs to change
pvc
“pvc” – an open source & lightweight process for a deterministic and evenly distributed puppet service curve…
…and reactive, state-enforcing puppet runs.
pvc
a different approach that executes puppet runs based on available capacity and local state changes.
pings from an agent check whether it’s time to run puppet.
file monitoring forces puppet runs when important files change outside of puppet (think /etc/shadow, /etc/sudoers).
pvc
basic concepts:
- frequent pings to determine when to run puppet
  - tied in to backend PPM health/capacity
- frequent fact collection without needing to run puppet
- sensitive files should be subject to monitoring
  - on changes or updates outside of puppet, immediately run puppet!
- efficiency is an important factor
pvc advantages
-> variable puppet agent run timing
- allows the flat and predictable service curve (what we want).
- more frequent puppet runs when capacity is available, less frequent runs when less capacity is available.
pvc advantages
-> improves security (kind of a big deal these days)
- puppet runs when state changes rather than waiting for the next scheduled run.
- efficient: uses inotify to monitor files.
- if a file being monitored is changed, a puppet run is forced (see the sketch below).
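A minimal sketch of that reactive trigger using inotifywait from inotify-tools (pvc itself talks to the inotify API directly; the file list here is illustrative):

    # watch sensitive files and force a puppet run the moment one of them
    # is written or has its permissions changed outside of puppet
    inotifywait -m -e modify,attrib,close_write /etc/shadow /etc/sudoers |
    while read -r path events file; do
        puppet agent -t
    done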
pvc advantages
- orchestration between foreman & puppet
- controlled rollout of changes
- upload facts between puppet runs into foreman
pvc – backend
3 endpoints, all of which get the ?fqdn=<certname> parameter:
GET /host – should pvc run puppet or facter?
POST /report – raw puppet run output, and which monitored files were changed
POST /facts – facter output (puppet facts in JSON)
pvc – /host

    > curl http://hi.com./host?fqdn=jj.e.com
    < PVC_RETURN=0
    < PVC_RUN=1
    < PVC_PUPPET_MASTER=puppet.vip.e.com
    < PVC_FACT_RUN=0
    < PVC_CHECK_INTERVAL=60
    < PVC_FILES_MONITORED="/etc/security/access.conf /etc/passwd"
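A sketch of how an agent-side loop might consume that response (hypothetical glue; pvc’s real client logic lives in the repo):

    # poll /host, evaluate the PVC_* directives, and act on them
    while :; do
        eval "$(curl -s "http://hi.com./host?fqdn=$(hostname -f)")"
        if [ "$PVC_RUN" = "1" ]; then
            puppet agent -t --server "$PVC_PUPPET_MASTER"
        fi
        if [ "$PVC_FACT_RUN" = "1" ]; then
            facter --json | curl -s -X POST --data-binary @- \
                "http://hi.com./facts?fqdn=$(hostname -f)"
        fi
        sleep "${PVC_CHECK_INTERVAL:-60}"
    done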
pvc – /facts
allows collecting facts outside of the normal puppet run; useful for monitoring.
set PVC_FACT_RUN to have the agent report facts back to the pvc backend.
pvc – git for auditing
push actual changes between runs into git
- branch per host; parentless branches & commits are cheap (see the sketch below).
- easy to audit fact changes (a fact blacklist prevents spam) and changes between puppet runs.
- keeping puppet reports between runs is not helpful.
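A minimal sketch of the branch-per-host layout, assuming one audit repo and a facts.json captured per run (paths and names are hypothetical):

    # each host gets its own parentless (“orphan”) branch in the audit repo
    cd /srv/pvc-audit
    git checkout --orphan "$FQDN" 2>/dev/null || git checkout "$FQDN"
    cp "/var/tmp/pvc/$FQDN/facts.json" facts.json
    git add facts.json
    git commit -m "facts for $FQDN @ $(date -u +%Y-%m-%dT%H:%M:%SZ)"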
pvc – incremental rollouts
select candidate hosts based on your criteria and set an environment variable via the /host endpoint output:

    FACTER_UPDATE_FLAG=true

in your manifest, check the resulting fact:

    if $::update_flag {
      …
    }
example pvc.conf

    host_endpoint=http://jj.e.com./host
    report_endpoint=http://jj.e.com./report
    facts_endpoint=http://jj.e.com./facts
    info=1
    warnings=1
pvc – available on github
    $ git clone https://github.com/johnj/pvc
make someone happy, achieve:
wishlist
stuff pvc should probably have:
• authentication of some sort
• a more general backend; it’s currently tightly integrated with internal PPM infrastructure health
• whatever other users wish it had
misc. lessons learned
your ENC has to be fast, or your puppetmasters fail without ever doing anything.
upgrade ruby to 2.x for the performance improvements.
serve static module files with a caching http server (nginx).
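A hedged sketch of that static-file offload (the URL pattern assumes Puppet 3’s REST file_content API; hostnames, paths, and cache sizing are hypothetical):

    # nginx in front of the masters: cache module file content so the Ruby
    # workers spend their time compiling catalogs instead of serving files
    proxy_cache_path /var/cache/nginx/puppet levels=1:2 keys_zone=puppet_files:64m;

    upstream puppetmasters {
        server ppm1.e.com:8140;
        server ppm2.e.com:8140;
    }

    server {
        listen 8140 ssl;
        ssl_certificate     /var/lib/puppet/ssl/certs/ppm.pem;        # hypothetical paths
        ssl_certificate_key /var/lib/puppet/ssl/private_keys/ppm.pem;

        location ~ ^/production/file_content/ {
            proxy_cache puppet_files;
            proxy_cache_valid 200 5m;
            proxy_pass https://puppetmasters;
        }

        location / {
            proxy_pass https://puppetmasters;
        }
    }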