The Reluctant SysAdmin : 360|iDev Austin 2010
The Reluctant SysAdmin
Managing the Server side of a Client-Server iPhone App
Jen Harvey, Voxilate (@jen_h)
360|iDev Austin, Nov 10 2010
Wednesday, November 17, 2010
A little background...
• Me: Network security background, OSS & Linux fangirl
• Currently: Co-founder of Voxilate with Steven Hugg
• Last year: Traveling the country while bootstrapping the company, building iPhone apps on the road
HeyTell
• HeyTell Voice Messenger allows users to share short voice messages & location
• Released February 2010
• Have been building, managing, deploying, re-deploying, updating, expanding, scaling on the road ever since...
• Map of travels: 360|iDev San Jose!
Current Objectives
• Keep over 1 million users happy & using our app
• Maintain respectable uptime & performance while adding new features & expanding our reach
• Get a little sleep at night
• Share what we’ve learned so that others who embark on similar journeys can also sleep!
Agenda
• Why a Server?
• Choose Your Poison
• Build It Out
• Lock It Down
• Maintain & Monitor
So...why would you want to run a server component?
Metrics!
• What metrics are valuable to you?
• Number of total users
• Number of active users per day/month/year
• Number of whatever-it-is-you-do all day (for us, submitted messages)
• Number of customers vs. users
• Busiest times of day/week/month?
grep is awesome
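For metrics that outgrow a one-line grep, a few lines of script go a long way. The sketch below is only an illustration: the log format ("timestamp user event"), the event names, and the sample lines are all made up for this example and are not HeyTell's actual logs.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format, assumed for this sketch: "<ISO timestamp> <user id> <event>"
SAMPLE_LOG = """\
2010-11-07T11:02:13 u1 message_sent
2010-11-07T11:40:55 u2 message_sent
2010-11-07T19:05:01 u1 login
2010-11-08T02:15:44 u3 message_sent
"""

def summarize(log_text):
    """Return (active users per day, event counts, events per hour of day)."""
    users_per_day = {}
    events = Counter()
    per_hour = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        stamp, user, event = parts
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        users_per_day.setdefault(when.date().isoformat(), set()).add(user)
        events[event] += 1
        per_hour[when.hour] += 1
    active = {day: len(users) for day, users in users_per_day.items()}
    return active, events, per_hour

if __name__ == "__main__":
    active, events, per_hour = summarize(SAMPLE_LOG)
    print("active users per day:", active)
    print("event counts:", dict(events))
    print("busiest hour:", per_hour.most_common(1))
```

The same three counters cover most of the questions above: active users per day, number of whatever-it-is-you-do, and busiest times.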
Track app usage & errors
• Speed customer support
• Understand how users really use your app
• Be alerted when errors occur
• Really useful for beta testing to determine app viability
Provide value-added content
• Virtual goods or in-app purchase goodies
• User-to-User or User-to-Public content sharing
• Run your own analytics or ad servers
Basic Web Server
• Informational site for game
• Customer service site
• FAQ hosting
• Note: This is not what we’re focusing on in this talk, but the info here is pretty general purpose! :)
Control your own Push Notifications
• Don’t need an external service (free)
• Can be a little painful to set up, but resources & libraries exist on web for PHP, Java, Python, Ruby...
• Additional insight when users run into Push Notification issues
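One small piece of running your own push setup is building the notification payload, which is a JSON document with an "aps" dictionary. The sketch below only builds and size-checks that JSON; it does not talk to Apple's servers. The function name and the 256-byte default cap are assumptions for illustration, so check Apple's current documentation for the real limit.

```python
import json

def build_push_payload(alert, badge=None, sound=None, max_bytes=256):
    """Build a compact JSON push payload; max_bytes is an assumed size cap."""
    aps = {"alert": alert}
    if badge is not None:
        aps["badge"] = badge
    if sound is not None:
        aps["sound"] = sound
    payload = json.dumps(
        {"aps": aps}, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    if len(payload) > max_bytes:
        # Fail early on our side rather than have the notification rejected later
        raise ValueError("payload is %d bytes, over the %d byte cap" % (len(payload), max_bytes))
    return payload

if __name__ == "__main__":
    print(build_push_payload("New voice message", badge=1, sound="default"))
```

Logging rejected or oversized payloads on your own server is one source of the "additional insight" mentioned above.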
App Store Receipt checking
[Diagram: iPhone Client, Your systems, Apple’s systems]
• Verify user is a customer before enabling feature
• To gather real-time statistics: piracy trends, conversion rates for freemium apps
#1 reason to use a server component?
Your server is your app’s engine
Image courtesy of Richard Smith/gocarts on flickr: http://flickr.com/gocarts
Choose Your Poison
We’re lucky! So many hosting options!
Cloud: Infrastructure as a Service
• Pay-as-you-go systems deployment
• Amazon Web Services (EC2, S3, RDS, ELB, ...)
• Microsoft Azure
• VMWare vCloud
• Rackspace Cloud (formerly Mosso)
• ...
Cloud: Platform as a Service
• Write your app for the platform, interact via API, provider handles scaling and administrative tasks:
• Heroku (for Ruby enthusiasts, built on EC2)
• Google App Engine (Java, Python, JRuby...)
• Engine Yard (Ruby)
• ...
Virtual Private Servers (VPS)
• You pay for a dedicated server, sometimes a VM, sometimes hardware
• Rackspace
• Slicehost
• Linode
• ...
Your Mom’s Basement
• Or your office.
• You don’t find sleep essential, do you?
• (No, really, this is fantastic if you have a large team & money to build out...but as an indie, you are likely to have neither)
Considerations
• What’s your preferred language & OS? Write and work with what you know!
• How much responsibility/flexibility/portability do you want/need to have?
• What’s your budget? GAE & AWS have free tiers to give you a taste & likely have enough horsepower to start with.
My advice: Go with what you know & feel comfortable with
We chose Amazon Web Services
• Quick & flexible & full of building blocks:
• Load balancers
• Hosted MySQL & SimpleDB
• Multiple availability zones
• Lots of h/w & memory configs
• S3 redundant storage
And...
• Great APIs: Command line tools & lots of libraries
• Can script anything or integrate w/web app
• Can do some management tasks from phone
• Huge user community - many ways to obtain support
Also...
• Quick & simple to prototype system architecture
• Easy to bring up test beds with the same configuration as production, but in a discrete & separate security group
• Published Service Level Agreement and Security Practices documentation
Cons
• Handle scaling (& everything else) yourself - just because your app is “in the cloud,” doesn’t mean it automatically scales
• Harder to set up: pre-built machine images are available, but you still need to customize & secure them
• Instances are ephemeral (but I like this because of the way it forces you to architect)
Build It Out
A note on scaling early
• Be prepared to do it
• Know it’s coming if you’re successful and architect/code with the understanding that you’re the guy/gal who’s going to have to make it work when it comes
• Don’t overarchitect early on
• Slow, hypeless ramp-up & predictable viral growth can help here
Cool! We have an Enterprise-Grade(TM) horizontal webscale scaling solution!
Uh, it’s getting corrupted every 12 hours.
SHUTDOWN EVERYTHING
Build with security in mind
• Develop & build your custom software with security in mind
• You know what anomalous behavior is/can be
• Put on the adversary’s hat - what could they do? What’s the worst outcome? Is it worth building in protection for certain scenarios?
The Voltron Principle
Individual components join to build the ultimate defender of the universe
Voltron Core
• Single Linux-based machine image we use to build everything on top of
• Document changes for future migrations (I ♥ scripts)
• On deployment, bolt on the pieces we need & config changes
• If a host goes down, we can bring up an identical host in known state in minutes, swap out their IPs and run the post-mortem once we’ve normalized
• Essential logs & configuration files periodically stored on S3
• Rotate logs frequently, especially as you grow
• Don’t store passwords or keys in configs, populate these on deploy (I abuse sed, you may use something more elegant)
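The "populate on deploy" idea can be sketched without sed. Below, the config lives in version control with a placeholder, and the real value is filled in from the environment at deploy time. The template contents, the `DB_PASSWORD` name, and the `render_config` helper are all hypothetical, chosen just for this example.

```python
import os
from string import Template

# Checked-in template: placeholders only, no real secrets (names are hypothetical)
CONFIG_TEMPLATE = """\
db_host = db.internal.example
db_user = app
db_password = ${DB_PASSWORD}
"""

def render_config(template_text, secrets):
    """Fill placeholders from `secrets`; raises KeyError if one is missing."""
    return Template(template_text).substitute(secrets)

if __name__ == "__main__":
    # At deploy time the secret comes from the environment, not from the repo
    secrets = {"DB_PASSWORD": os.environ.get("DB_PASSWORD", "changeme-placeholder")}
    rendered = render_config(CONFIG_TEMPLATE, secrets)
    print("rendered %d lines of config" % len(rendered.splitlines()))
```

Using `substitute` rather than `safe_substitute` is deliberate: a missing secret stops the deploy instead of shipping a config with a literal placeholder in it.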
[Diagram components: Load balancer, Security Infrastructure, Database, Persistent storage, Cache, Base AMI, Application Core, Notification server]
But some days...
Assume everything will fail
Ready to set up our new domain name?
Hey, do CNAMEs have a “.” at the end?
D’OH! Let’s wait 2 hours for it to expire...
Find your possible points of failure (rusty robot joints)
• DNS - if your hostname doesn’t resolve, your app can’t get home
• Are backups working?
• Storage and/or database - what happens when/if they go away?
• DDoS (intentional or not...)
• Deal with small amounts of failure gracefully (cache, limited functionality)
• Don’t put your web server & application server components on the same *anything*
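Handling small failures gracefully with a cache might look roughly like this: try the real backend first, and if it fails, serve the last good answer as long as it is not too old. The class name, the staleness window, and the toy backend are assumptions for this sketch, not a description of HeyTell's code.

```python
import time

class CachedFetcher:
    """Wrap a fetch function; fall back to a recent cached value when it fails."""

    def __init__(self, fetch, max_stale_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._max_stale = max_stale_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, time stored)

    def get(self, key):
        """Return (value, is_stale). Re-raises the error if no usable cache entry exists."""
        try:
            value = self._fetch(key)
        except Exception:
            cached = self._cache.get(key)
            if cached is not None and self._clock() - cached[1] <= self._max_stale:
                return cached[0], True  # degraded mode: serve the last good value
            raise
        self._cache[key] = (value, self._clock())
        return value, False

if __name__ == "__main__":
    backend_up = {"ok": True}

    def backend(key):  # toy backend that can be switched off
        if not backend_up["ok"]:
            raise ConnectionError("backend unavailable")
        return key.upper()

    fetcher = CachedFetcher(backend)
    print(fetcher.get("profile"))   # fresh value
    backend_up["ok"] = False
    print(fetcher.get("profile"))   # cached value, flagged as stale
```

Returning the stale flag lets the app decide whether to show limited functionality instead of an error.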
But you will, without a doubt, run into a ‘flesh wound’ issue
How you handle it is pivotal
The database is bogged down. I think this one feature is causing it.
Does anyone even know we have that feature?
That feature’s GONE!
Customer Communication == Key
• Respond to customer support emails (have cut & pastable friendly response - small team has no time for personal emails in crisis)
• You may feel like it’s the end of the world, but this, too, shall pass
Hey, guys, Justin Bieber just announced he’s using us on Twitter!
Cool. Who’s that?
Gah! Server’s melted! Users revolt!
Helpful tip for high-traffic systems
• If you’re looking to max out connections on a single Linux-based system, think about:
• Memory & file handles (see also: ulimit tweaking)
• Connection tracking as it relates to memory (look up netfilter/tcp stack tweaking)
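On the file handle side, the per-process limit that `ulimit -n` shows in a shell can also be inspected and raised from inside a server process. A minimal sketch using Python's Unix-only `resource` module follows; the helper names and the target value are made up for illustration, and the hard limit itself still has to be raised by an administrator.

```python
import resource  # standard library, Unix-only

def plan_soft_limit(soft, hard, wanted):
    """Pick a new soft limit: never lower than the current one, never above the hard cap."""
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    ceiling = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    return max(soft, ceiling)

def raise_open_file_limit(wanted):
    """Raise this process's open-file soft limit toward `wanted`; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = plan_soft_limit(soft, hard, wanted)
    if target != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target

if __name__ == "__main__":
    # Read-only check; call raise_open_file_limit(...) at server startup if needed
    print("open-file limits (soft, hard):", resource.getrlimit(resource.RLIMIT_NOFILE))
```

Each open connection costs at least one file handle, so this limit caps connections long before memory does on a default install.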
Lock it Down
Yes, security is your problem
• If you are storing users’ personal information, you are subject to laws and regulations in the US, specific states, and foreign countries
• Many jurisdictions define personal information differently
• Most regulations require a written policy and best practices for security
So what are best practices?
• Secure your perimeter
• Secure your services
• Detect, alert on, and block suspicious activity
• Protect your users and encrypt user information in transit and at rest
• Have written policies and plans
Secure Your Perimeter
• AWS has (at least) two walls
• One is its “security group” context
• One is your image’s local firewall
• Block everything by default, open only the ports you need
• No root login
• Passwordless login only (use key pairs)
Secure your services
• Services should not run as root (for ex., www-data for apache2)
• Service usernames should not have shell login access
• Monitor for security vulnerabilities & upgrade when needed
• Build security into custom software
Detect & Alert
For host-based intrusion detection - I love OSSEC:
• Quick & easy, lightweight, Open Source, free
• Alerts on logs - extensive default ruleset but can customize alerting for your specific app
• Daily Tripwire & rootkit checks
• Active response: can block IPs on suspicious behavior
Protect Your Users
• If you need to store user information, encrypt in transit and at rest
• If you need data from your systems locally, use encryption end-to-end -- down to encrypting your drive
• Use SSL in the great wide world, it’s not that hard!
Why use SSL?
• Protects your users from sending personal data over the Internet in the clear
• Protects you from neophyte reverse engineers
On Using SSL
• EC2 Load Balancer now allows SSL termination - https to the LB, http inside data center
• Small & bootstrapped like us? Use StartSSL - free certs. Go to someone like DigiCert for nifty wildcard certs once you’ve got the resources.
The CCATS Issue
• These guys rule: http://www.zetetic.net/blog/2009/08/03/mass-market-encryption-commodity-classification-for-iphone-applications-in-8-easy-steps/
• Can deploy to US & Canada immediately, then expand reach after approval
• Took us just over a month to obtain
• Check w/Apple first; may not be required anymore.
User Passwords
• Many users will use the same password for everything--banks & FourSquare.
• There’s nothing you can do about it.
• Databases full of email addresses and passwords are attractive targets for this reason
Don’t be an attractive target...don’t make personal information necessary to use the service, if at all possible
Allow mechanisms for users to update or clear their information at any time without your intervention
Whenever possible, educate users about protecting their privacy (this leads to all good - more educated users, fewer complaints, more trust, more goodwill, more/happier users!)
• Have a policy for purging data/accounts/etc. that you don’t need and follow it
• Automate this or build it into the app if you can
• Have a written policy for data breaches and intrusions
• Write down instructions for yourself-- this’ll keep you sane if you ever have a real breach or a false alarm
• Keep a list of the services you use. One quick & dirty thing to do is scrape vulnerability feeds like feed://nvd.nist.gov/download/nvd-rss.xml for your service names
• When security issues are reported and new versions released, patch out of band, test, replace (pretty easy to do with EC2!)
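The quick & dirty feed scrape can be sketched in a few lines. This version works on feed text you have already downloaded and just looks for your service names in each item's title and description. The sample feed below is invented for the example (the entry IDs are placeholders, not real advisories), and the real feed's layout may differ, which is why the code matches tags by local name only.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration; entries are placeholders, not real advisories
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>EXAMPLE-0001 (made-up entry)</title>
<description>Buffer overflow in Apache httpd module</description></item>
<item><title>EXAMPLE-0002 (made-up entry)</title>
<description>Issue in an unrelated product</description></item>
</channel></rss>"""

def local_name(tag):
    """Strip any XML namespace prefix like '{uri}item' down to 'item'."""
    return tag.rsplit("}", 1)[-1]

def matching_items(feed_xml, service_names):
    """Return [(item title, [matched service names])] for items mentioning a service."""
    root = ET.fromstring(feed_xml)
    wanted = [name.lower() for name in service_names]
    hits = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in element}
        text = (fields.get("title", "") + " " + fields.get("description", "")).lower()
        matched = [name for name in wanted if name in text]
        if matched:
            hits.append((fields.get("title", ""), matched))
    return hits

if __name__ == "__main__":
    for title, services in matching_items(SAMPLE_FEED, ["apache", "mysql"]):
        print(title, "->", services)
```

Plain substring matching is noisy, but for a short list of service names a few false alarms are cheaper than a missed advisory.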
Your users & peace of mind are totally worth it! This will save you time & sanity in the long run
Maintain & Monitor
Planning Maintenance
• If you can, use a load balancer and switch out backend servers
• Have backup systems in working state for fall-back
• Track usage statistics throughout your app’s lifetime - schedule maintenance for “slowest” time period
The User Rollercoaster (# connections/hour, GMT)
[Chart: connections per hour, 00-22 GMT, one line per day from Sunday through Saturday]
Sunday, 11:00 GMT it is, then.
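Picking the slowest slot from usage statistics is easy to automate. The sketch below averages connection counts per (weekday, hour) slot across several weeks and returns the quietest one. The function name and the sample numbers are invented for illustration; they are not the data behind the chart.

```python
from collections import defaultdict

def quietest_slot(observations):
    """observations: iterable of (weekday, hour_gmt, connection_count).

    Returns ((weekday, hour_gmt), average_count) for the slot with the lowest average.
    """
    totals = defaultdict(lambda: [0, 0])  # slot -> [sum of counts, number of samples]
    for weekday, hour, count in observations:
        slot = totals[(weekday, hour)]
        slot[0] += count
        slot[1] += 1
    averages = {slot: total / samples for slot, (total, samples) in totals.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]

if __name__ == "__main__":
    # Invented sample data: two weeks of Sunday 11:00 plus two busier slots
    sample = [
        ("Sunday", 11, 120),
        ("Sunday", 11, 100),
        ("Sunday", 12, 150),
        ("Friday", 20, 900),
    ]
    print(quietest_slot(sample))
```

Averaging over several weeks keeps one unusually quiet hour from deciding the maintenance window.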
Keep a Calendar
• Keep a calendar of important dates:
• Developer certificate expirations
• SSL certificate expiration
• APNS certificate expiration
• Domain name registry expiration
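If a calendar is easy to ignore, the same list can live in a small script that runs daily and nags you. A minimal sketch, assuming you maintain the dates by hand; the dates below are placeholders and the 30-day warning window is an arbitrary choice.

```python
from datetime import date

def expiring_soon(expirations, today, warn_days=30):
    """Return sorted [(days_left, name)] for items expiring within warn_days (or already expired)."""
    due = []
    for name, expires_on in expirations.items():
        days_left = (expires_on - today).days
        if days_left <= warn_days:
            due.append((days_left, name))
    return sorted(due)

if __name__ == "__main__":
    # Placeholder dates for illustration only
    important_dates = {
        "Developer certificate": date(2011, 3, 1),
        "SSL certificate": date(2010, 12, 1),
        "APNS certificate": date(2011, 6, 1),
        "Domain name registration": date(2010, 11, 10),
    }
    for days_left, name in expiring_soon(important_dates, today=date(2010, 11, 17)):
        print("%s: %d days left" % (name, days_left))
```

A negative number of days means the item has already expired, which sorts to the top of the list.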
Monitor Uptime
• Check out Pingdom - set thresholds to be alerted when servers are slow or inaccessible
• Configure OSSEC to alert on conditions that precipitate an “issue”
• Set alerts or automated account recharges for *everything* that could block app functionality
• Make sure someone’s always accessible
Hey, the server’s down. Where are you guys?
I’m on a BOAT!
I’m on a PLANE, yo!
HeyTell Systems Uptime
[Chart: monthly uptime percentage, Jan through Nov, y-axis from 97 to 100]
Downtime, Minutes
[Chart: monthly downtime in minutes, Jan through Nov, y-axis from 0 to 700]
99.2% uptime == 350 minutes == 5.8 hours!!
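The conversion behind that number is simple arithmetic, sketched here for a month assumed to be 30 days long (the slide does not say which month length it used, so the result is about 346 minutes rather than exactly 350). The function names are made up for this example.

```python
def downtime_minutes(uptime_percent, period_days):
    """Minutes of downtime implied by an uptime percentage over a period."""
    return (100.0 - uptime_percent) / 100.0 * period_days * 24 * 60

def uptime_percent(downtime_min, period_days):
    """Uptime percentage implied by a number of downtime minutes over a period."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (1.0 - downtime_min / total_minutes)

if __name__ == "__main__":
    minutes = downtime_minutes(99.2, period_days=30)  # 30-day month is an assumption
    print("%.1f minutes, or %.2f hours" % (minutes, minutes / 60))
```

Running the numbers in both directions is a good habit: a percentage that looks great can still hide hours of users staring at an error.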
Total Downtime: 1.5 days
Uptime: 99.48%
(uptime sounds sexy; downtime...not so much)
Managing on the Run
• Phone SSH client (CommandBot on Droid, iSSH on iPhone)
• EC2 Management client (Decaf on Droid, iAWSManager on iPhone)
• Separate Support Account email setup on phone
• Notepad app with customer support FAQ answers
Other lifesavers on the run
• Reliable 3G service
• Mobile broadband card and/or tethering setup
• Netbook or small laptop
Summary
• On hosting: Go with what you know
• Architect with failure & future scaling issues in mind
• Lock it down: Keep your data & your users safe
• Monitoring & maintenance: Make your systems work for you
• Good luck! You can do it!
References & Links
OSSEC: http://ossec.net
Pingdom uptime cheatsheet: http://royal.pingdom.com/2009/03/24/a-handy-uptime-and-downtime-conversion-cheat-sheet/
AWS Free Tier info: http://aws.amazon.com/free/
AWS Security Doc: http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper.pdf
TCP stack tweakage: http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1