From Development to Deployment · 2014. 1. 11.
From Development to Deployment (ESaaS §12.1)
© 2013 Armando Fox & David Patterson, all rights reserved

Outline of topics
• Continuous integration & continuous deployment
• Upgrades & feature flags
• Availability & responsiveness
• Monitoring
• Relieving pressure on the database
• Defending customer data

Development vs. Deployment
Development:
• Testing to make sure your app works as designed
Deployment:
• Testing to make sure your app works when used in ways it was not designed to be used

Bad News
• “Users are a terrible thing”
• Some bugs only appear under stress
• Production environment != development environment
• The world is full of evil forces
• And idiots

Good News: PaaS makes deployment way easier
Without PaaS, you would have to:
• Get a Virtual Private Server (VPS), maybe in the cloud
• Install & configure Linux, Rails, Apache, mysqld, openssl, sshd, ipchains, squid, qmail, logrotate…
• Fix almost-weekly security vulnerabilities
• Find yourself in Library Hell
• Tune all moving parts to get the most bang for the buck
• Figure out how to automate horizontal scaling

Our goal: stick with PaaS!
Is this really feasible?
• Pivotal Tracker & Basecamp each run on a single DB (128GB commodity box <$10K)
• Many SaaS apps are not world-facing (internal or otherwise limited interest)

| PaaS handles… | We handle… |
|---|---|
| “Easy” tiers of horizontal scaling | Minimize load on database |
| Component-level performance tuning | Application-level performance tuning (e.g. caching) |
| Infrastructure-level security | Application-level security |

“Performance & security” defined
• Availability or Uptime
  – What % of time is site up & accessible?
• Responsiveness
  – How long after a click does user get response?
• Scalability
  – As # users increases, can you maintain responsiveness without increasing cost/user?
• Privacy
  – Is data access limited to the appropriate users?
• Authentication
  – Can we trust that user is who s/he claims to be?
• Data integrity
  – Is users’ sensitive data tamper-evident?
Performance · Stability · Security

Let R = RottenPotatoes app's availability, H = Heroku's availability, C = Internet connection availability, P = Armando's perception of RP availability. Which relationship among these quantities holds?
☐ P ≥ min(C, H, R)
☐ P ≤ C ≤ min(H, R)
☐ Can’t tell without additional information
☐ P ≤ C ≤ H ≤ R

Quantifying Availability and Responsiveness (ESaaS §12.2)

Availability and Response time
• Gold standard: US public phone system, 99.999% uptime (“five nines”)
  – Rule of thumb: 5 nines ~ 5 minutes/year
  – Since each nine is an order of magnitude, 4 nines ~ 50 minutes/year, etc.
  – Good Internet services get 3-4 nines
• Response time: how long after I interact with site do I perceive response?
  – For small content on fast network, dominated by latency (not bandwidth)
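
The rule of thumb above is plain arithmetic. A minimal sketch, assuming a 365-day year; the function name is illustrative and does not come from the slides:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime budget per year for an availability with `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return unavailability * MINUTES_PER_YEAR


# Each extra nine divides the budget by ten.
five_nines = downtime_minutes_per_year(5)   # roughly 5 minutes
four_nines = downtime_minutes_per_year(4)   # roughly 50 minutes
three_nines = downtime_minutes_per_year(3)  # roughly 526 minutes, under 9 hours
```

The slides round these figures to "5 minutes" and "50 minutes"; the exact values depend on the year length assumed.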

Is response time important?
• How important is response time?*
  – Amazon: +100ms => 1% drop in sales
  – Yahoo!: +400ms => 5-9% drop in traffic
  – Google: +500ms => 20% fewer searches
• Classic studies (Miller 1968, Bhatti 2000)
  – <100 ms is “instantaneous”
  – >7 sec is abandonment time
• http://code.google.com/speed
• “Speed is a feature” (Jeff Dean, Google Fellow)
* Source: Nicole Sullivan (Yahoo! Inc.), Design Fast Websites, http://www.slideshare.net/stubbornella/designing-fast-websites-presentation

Simplified (& false) view of performance
• If response times were normally distributed around the mean, ±2 standard deviations around the mean would cover about 95% of requests
• An average response time T would then mean:
  – about 95% of users get a response within T±2σ
  – about 99.7% of users get a response within T±3σ

A real example
[Figure: distribution of measured response times, with the 25th, 50th (median), 75th and 95th percentiles and the mean marked]
Courtesy Bill Kayser, Distinguished Engineer, New Relic. http://blog.newrelic.com/breaking-down-apdex Used with permission of the author.
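
To see why the bell-curve view misleads, here is a small sketch on made-up data (90 fast requests and 10 slow ones, not the New Relic measurements). The `percentile` helper uses the nearest-rank definition, which is one of several conventions:

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for integer p: the smallest sample such that
    at least p% of all samples are at or below it."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer form of ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical skewed sample: 90 requests at 100 ms, 10 requests at 2000 ms.
latencies_ms = [100] * 90 + [2000] * 10

mean = statistics.mean(latencies_ms)        # pulled upward by the slow tail
sigma = statistics.pstdev(latencies_ms)     # population standard deviation
normal_guess_95 = mean + 2 * sigma          # what the bell-curve view suggests
median = percentile(latencies_ms, 50)       # what a typical request sees
actual_95 = percentile(latencies_ms, 95)    # what the slowest 5% start at
```

On this sample the mean sits well above the median, and the mean-plus-two-sigma estimate falls short of the real 95th percentile, which is why percentiles are reported directly.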

Service Level Objective (SLO)
• Time to satisfy user request (“latency” or “response time”)
• SLO: instead of worst case or average, what % of users get acceptable performance
• Specify %ile, target response time, time window
  – e.g., 99% < 1 sec, over a 5 minute window
  – why is time window important?
• Service level agreement (SLA) is an SLO to which provider is contractually obligated
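
As a rough sketch, an SLO such as "99% < 1 sec, over a 5 minute window" could be checked as below. The request-log format (pairs of finish time and latency, both in seconds) and the function names are assumptions made for illustration:

```python
def percentile(samples, p):
    """Nearest-rank percentile for integer p."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer form of ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(requests, now, window_s=300, pct=99, target_s=1.0):
    """Check whether the `pct`-th percentile latency of requests finishing in
    the last `window_s` seconds is below `target_s`.

    `requests` is an iterable of (finish_time_s, latency_s) pairs.
    """
    recent = [latency for finished, latency in requests
              if now - window_s < finished <= now]
    if not recent:
        return True  # no traffic in the window, so nothing violated the SLO
    return percentile(recent, pct) < target_s
```

Only requests inside the window count, so a slow burst shows up in the windows that contain it instead of being diluted by hours of normal traffic; that is one reason the window length is part of the objective.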

Apdex: simplified SLO
• Given a threshold latency T for user satisfaction:
  – Satisfactory requests take t ≤ T
  – Tolerable requests take T < t ≤ 4T
  – Apdex = (#satisfactory + 0.5(#tolerable)) / #reqs
  – 0.85 to 0.93 generally “good”
• Warning! Can hide systematic outliers if not used carefully!
  – e.g. critical action occurs once in every 15 clicks but takes 10x as long => (14+0)/15 > 0.9
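
The formula and the warning can be reproduced in a few lines. This is a sketch of the definition given on the slide, with invented latencies; it is not New Relic's implementation:

```python
def apdex(latencies, threshold):
    """Apdex score for request latencies, given satisfaction threshold T."""
    if not latencies:
        raise ValueError("need at least one request")
    satisfied = sum(1 for t in latencies if t <= threshold)
    tolerating = sum(1 for t in latencies if threshold < t <= 4 * threshold)
    # Requests slower than 4T count as frustrated and contribute nothing.
    return (satisfied + 0.5 * tolerating) / len(latencies)


# Hypothetical session: 14 clicks at 0.5 s, plus one critical action that
# takes 10x as long (5.0 s), with T = 1.0 s.
clicks = [0.5] * 14 + [5.0]
score = apdex(clicks, threshold=1.0)  # (14 + 0) / 15
```

The score stays above 0.9 even though the one critical action is always frustrating, which is the systematic outlier the slide warns about.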

Apdex Visualization
[Figure] T=1500ms, Apdex = 0.7

Apdex Visualization
[Figure] T=1000ms, Apdex = 0.49

What to do if site is slow?
• Small site: overprovision
  – applies to presentation & logic tier
  – before cloud computing, this was painful
  – today, it’s largely automatic (e.g. Rightscale)
• Large site: worry
  – Overprovisioning a 1,000-computer site by 10% = 100 idle computers
• Insight: the same problems that push us out of the PaaS-friendly tier are the ones that will dog us when larger!

RottenPotatoes’ target uptime is 99.9%. Yesterday there was a one hour outage. Which statement is true?
☐ RottenPotatoes can still meet its uptime goal if there are no further outages this year
☐ If no users actually tried to get to the site during the outage, uptime wasn’t hurt
☐ There isn’t enough information to determine whether RottenPotatoes can meet its user-perceived uptime goal
☐ Because of the outage, RottenPotatoes has no hope of meeting its uptime goal this year

Continuous Integration & Continuous Deployment (ESaaS §12.3)

Releases Then and Now: Windows 95 Launch Party

Releases Then and Now
• Facebook: master branch pushed once a week, aiming for once a day (Bobby Johnson, Dir. of Eng., in late 2011)
• Amazon: several deploys per week
• StackOverflow: multiple deploys per day (Jeff Atwood, co-founder)
• GitHub: tens of deploys per day (Zach Holman)
• Rationale: risk == # of engineer-hours invested in product since last deploy!
Like development and feature check-in, deployment should be a non-event that happens all the time!

Successful Deployment
• Automation: consistent deploy process
  – PaaS sites like Heroku, CloudFoundry already do this
  – Use tools like Capistrano for self-hosted sites
• Continuous integration: integration-testing the app beyond what each developer does
  – Pre-release code checkin triggers CI
  – Since checkins are frequent, CI is always running
  – Common strategy: integrate with GitHub, e.g. https://github.com/saasbook/hw2_rottenpotatoes/admin/hooks

Why CI?
• Differences between dev & production envs
• Cross-browser or cross-version testing
• Testing SOA integration when remote services act wonky
• Hardening: protection against attacks
• Stress testing/longevity testing of new features/code paths
• Example: Salesforce CI runs 150K+ tests and automatically opens bug report when test fails

Continuous Deployment
• Push => CI => deploy several times per day
  – deploy may be auto-integrated with CI runs
• So are releases meaningless?
  – Still useful as customer-visible milestones
  – “Tag” specific commits with release names:
      git tag 'happy-hippo' HEAD
      git push --tags
  – Or just use Git commit ID to identify release

RottenPotatoes just got some new AJAX features. Where does it make sense to test these features?
☐ In CI
☐ In the staging environment
☐ All of these
☐ Using autotest with RSpec+Cucumber

Upgrades & Feature Flags (ESaaS §12.4)
Armando Fox

The trouble with upgrades
• What if upgraded code is rolled out to many servers?
  – During rollout, some will have version n and others version n+1…will that work?
• What if upgraded code goes with schema migration?
  – Schema version n+1 breaks current code
  – New code won’t work with current schema

Naïve update
1. Take service offline
2. Apply destructive migration, including data copying
3. Deploy new code
4. Bring service back online
• May result in unacceptable downtime
http://pastebin.com/5dj9k1cj

Incremental Upgrades with Feature Flags
1. Do nondestructive migration
2. Deploy method protected by feature flag
3. Flip feature flag on; if disaster, flip it back
4. Once all records moved, deploy new code without feature flag
5. Apply migration to remove old columns
http://pastebin.com/TYx5qaSB
http://pastebin.com/qqrLfuQh
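
Steps 1 to 4 can be sketched without a framework. The example below is hypothetical and is not the code behind the pastebin links: it assumes a table whose single `name` column is being split into `first_name` and `last_name`, uses plain dictionaries in place of database rows, and invents the flag name `split_name`:

```python
def add_name_columns(rows):
    """Step 1: nondestructive migration. Add new columns next to the old
    `name` column; nothing is dropped, so the old code keeps working."""
    for row in rows:
        row.setdefault("first_name", None)
        row.setdefault("last_name", None)
        row.setdefault("migrated", False)


def display_name(row, flags):
    """Step 2: the new code path is protected by a feature flag."""
    if flags.get("split_name", False):
        if not row["migrated"]:
            # Move this record to the new schema the first time it is touched.
            first, _, last = row["name"].partition(" ")
            row["first_name"], row["last_name"] = first, last
            row["migrated"] = True
        return f"{row['last_name']}, {row['first_name']}"
    return row["name"]  # old behavior, still valid because `name` was kept


def all_migrated(rows):
    """Step 4 precondition: every record has been moved to the new columns."""
    return all(row["migrated"] for row in rows)
```

Because the old column survives until step 5, flipping the flag back (step 3) returns the app to its previous behavior without running a down-migration.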

“Undoing” an upgrade
• Disaster strikes…use down-migration?
  – is it thoroughly tested?
  – is migration reversible?
  – are you sure someone else didn’t apply an irreversible migration?
• Use feature flags instead
  – down-migrations are primarily for development

Other uses for feature flags
• Preflight checking: gradual rollout of feature to increasing numbers of users
  – to scope for performance problems, e.g.
• A/B testing
• Complex feature whose code spans multiple deploys
• rollout gem (on GitHub) covers these cases and more!
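
One common way to implement gradual rollout is to hash each user into a stable bucket and enable the feature for buckets below a percentage. The sketch below illustrates that idea only; it is not the API of the rollout gem, and the feature name is made up:

```python
import hashlib


def in_rollout(feature: str, user_id: int, percent: int) -> bool:
    """Return True if `user_id` falls within the first `percent` of users
    for `feature`. The same user always lands in the same bucket."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent


# Raising the percentage only adds users; nobody who already has the
# feature loses it when the rollout widens from 10% to 50%.
early_users = [u for u in range(1000) if in_rollout("new_checkout", u, 10)]
```

Including the feature name in the hash means different features get different subsets of users, so the same people are not always the first to see every change.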

Which one, if any, is a POOR place to store the value (e.g. true/false) of a feature flag?
☐ A column in an existing database table
☐ A separate database table
☐ These are all good places to store feature-flag values
☐ A YAML file in config/ directory of app

Monitoring (ESaaS §12.5)
Armando Fox

Kinds of monitoring
• “If you’re not monitoring it, it’s probably broken”
• At development time (profiling)
  – Identify possible performance/stability problems before they get to production
• In production
  – Internal: instrumentation embedded in app and/or framework (Rails, Rack, etc.)
  – External: active probing by other site(s)

Why use external monitoring?
• Detect if site is down
• Detect if site is slow for reasons outside measurement boundary of internal monitoring
• Get user’s view from many different places on the Internet
• Example: Pingdom
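
A single external probe boils down to fetching a URL from outside the site and timing it. The sketch below uses only the Python standard library and is not how Pingdom works internally; the `slow_after` threshold and the three result labels are assumptions:

```python
import time
import urllib.request


def http_status(url, timeout):
    """Fetch `url` and return its HTTP status code.
    Raises on network errors, timeouts, and HTTP 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def probe(url, timeout=5.0, slow_after=2.0, fetch=http_status):
    """Classify one external check of `url` as 'up', 'slow', or 'down'.
    `fetch` is injectable so the logic can be exercised without a network."""
    start = time.monotonic()
    try:
        status = fetch(url, timeout)
    except Exception:  # DNS failure, refused connection, timeout, HTTP error
        return "down"
    elapsed = time.monotonic() - start
    if not 200 <= status < 400:
        return "down"
    return "slow" if elapsed > slow_after else "up"
```

A hosted service would run a check like this on a schedule from several locations, which is what gives the "user's view from many different places".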

Internal monitoring
• pre-SaaS/PaaS: local
  – Info collected & stored locally, e.g. Nagios
• Today: hosted
  – Info collected in your app but stored centrally
  – Info available even when app is down
• Example: New Relic
  – conveniently, has both a development mode and production mode
  – basic level of service is free for Heroku apps

Kinds of monitoring
[Figure]

Sampling of monitoring tools

| What is monitored | Level | Example tool | Hosted |
|---|---|---|---|
| Availability | site | pingdom.com | Yes |
| Unhandled exceptions | site | airbrake.com | Yes |
| Slow controller actions or DB queries | app | newrelic.com (also has dev mode) | Yes |
| Clicks, think times | app | Google Analytics | Yes |
| Process health & telemetry (MySQL server, Apache, etc.) | process | god, monit, nagios | No |

• Interesting: customer-readable monitoring features with cucumber-newrelic: http://pastebin.com/TaecHfND

What to measure?
• Stress testing or load testing: how far can I push my system...
  – ...before performance becomes unacceptable?
  – ...before it gasps and dies?
• Usually, one component will be bottleneck
  – a particular view, action, query, …
• Load testers can be simple or sophisticated
  – bang on a single URI over and over
  – do a fixed sequence of URIs over and over
  – play back a log file
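
The simplest kind of load tester, banging on one request over and over, can be sketched as follows. The `request` callable is a placeholder for whatever issues one request (for example, a function that fetches a single URI); this version is sequential, whereas real load-testing tools normally run many clients at once:

```python
import time


def hammer(request, times):
    """Call `request()` `times` times in a row.
    Returns (latencies_in_seconds, error_count)."""
    latencies, errors = [], 0
    for _ in range(times):
        start = time.perf_counter()
        try:
            request()
        except Exception:
            errors += 1  # a failed request still counts toward the timing
        latencies.append(time.perf_counter() - start)
    return latencies, errors
```

Feeding the returned latencies into a percentile calculation shows where performance becomes unacceptable as the request count or rate is raised.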

Longevity Bugs
• Resource leak (RAM, file buffers, sessions table) is classic example
• Some infrastructure software such as Apache already does rejuvenation
  – aka “rolling reboot”
• Related: running out of sessions
  – Solution: store whole session[] in cookie (Rails 3 does this by default)

Which is probably not a metric of high interest to you, the app operator?
☐ Maximum CPU utilization
☐ 99%ile response time
☐ Rendering time of 3 slowest views
☐ Slowest queries