Ops for Developers
mojo-lingo
![Page 1: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/1.jpg)
Ops for Developers
Or: How I Learned To Stop Worrying And Love The Shell
spkr8.com/t/13191
Friday, August 10, 12
![Page 2: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/2.jpg)
Prologue
Introductions
![Page 4: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/4.jpg)
What are my passions?
• Telephony Applications
• Information Security
• Performance and Availability Design
• Open Source
![Page 5: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/5.jpg)
What do I do?
• Today I write code and run Mojo Lingo
• But Yesterday...
![Page 6: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/6.jpg)
This was my world
![Page 7: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/7.jpg)
Ops Culture
![Page 8: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/8.jpg)
“I am allergic to downtime”
![Page 9: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/9.jpg)
It’s About Risk
• If something breaks, it will be my pager that goes off at 2am
• New software == New ways to break
• If I can’t see it, I can’t manage it or monitor it and it will break
![Page 10: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/10.jpg)
Agenda
• 9:00 - 10:30
  • Operating Systems & Hardware
  • All About Bootup
• 10:30 - 11:00: Break
• 11:00 - 12:30
  • Observing a Running System
  • Optimization/Tuning
• 12:30 - 1:30: Lunch
• 1:30 - 3:00
  • Autopsy of an HTTP Request
  • Dealing with Murphy
• 3:00 - 3:30: Break
• 3:30 - 5:00
  • Scaling Up
  • Deploying Apps
  • Audience Requests
![Page 11: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/11.jpg)
Part I
Operating Systems & Hardware
![Page 12: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/12.jpg)
OS History Lesson
BSD, System V, Linux and Windows
![Page 13: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/13.jpg)
UNICS (Sep. 1969)
Soon renamed “Unix Time Sharing System Version 1”
UNIX Time Sharing System Version 5 (Jun. 1974)
UNIX Sys III (Nov. 1981)
1BSD (Mar. 1978)
4.3BSD (Jun. 1986)
UNIX Sys V (Jan. 1983)
![Page 14: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/14.jpg)
![Page 15: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/15.jpg)
Hardware Components
![Page 16: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/16.jpg)
Common Architectures
• Intel x86 (i386, x86_64)
• SPARC
• POWER
• ARM
• But none of this really matters anymore
![Page 17: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/17.jpg)
CPU Configurations
• Individual CPU
• SMP: Symmetric Multi-Processing
• Multiple Cores
• Hyperthreading/Virtual Cores
![Page 18: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/18.jpg)
(Virtual) Memory
• RAM + Swap = Available Memory
• Swapping strategies vary across OSes
• What your code sees is a complete virtualization of this
• 32-bit x86 processes can typically “see” only about 3GB of the 4GB address space; the kernel reserves the rest
![Page 19: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/19.jpg)
Storage Types
• Local Storage (SATA, SAS, USB, Firewire)
• Network Storage (NFS, SMB, iSCSI, AOE)
• Storage Network (FibreChannel, Fabrics)
![Page 20: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/20.jpg)
Networking
• LAN (100Mb still common; 1Gbit standard; 10Gb and 100Gb on horizon)
• WAN (T-1, Frame Relay, ATM, MetroE)
• Important Characteristics
  • Throughput
  • Loss
  • Delay
![Page 21: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/21.jpg)
Part II
All About Bootup
![Page 22: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/22.jpg)
Phases
• BIOS
• Kernel Bootstrap
• Hardware Detection
• Init System
![Page 23: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/23.jpg)
System Services
• Varies by OS
• Common: SysV Init Scripts; /etc/inittab; rc.local
• Solaris: SMF
• Ubuntu: Upstart
• Debian: SysV default; Upstart optional
• OSX: launchd
• RedHat/CentOS: SysV Init Scripts
![Page 24: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/24.jpg)
SysV Init Scripts
• Created in /etc/init.d; Symlinked into runlevel directories
• Symlinks prefixed with special characters to control startup/shutdown order
  • Prefixed with “S” or “K” to start or stop service in each level
  • Numeric prefix determines order
  • /etc/rc3.d/S10sshd -> /etc/init.d/sshd
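The naming convention above can be made concrete with a small sketch. It is not taken from the talk: `parse_rc_link` and `start_order` are hypothetical helpers, and every link name other than `S10sshd` is made up for illustration. It assumes the usual pattern of `S` or `K`, then a number, then the service name.

```python
import re
from typing import NamedTuple, Optional


class RcLink(NamedTuple):
    action: str   # "start" for S links, "stop" for K links
    order: int    # numeric prefix, lower runs first
    service: str  # name of the script in /etc/init.d


# One letter (S or K), one or more digits, then the service name.
_LINK_RE = re.compile(r"^([SK])(\d+)(.+)$")


def parse_rc_link(name: str) -> Optional[RcLink]:
    """Split a runlevel symlink name such as 'S10sshd' into its parts."""
    match = _LINK_RE.match(name)
    if match is None:
        return None  # not a start/stop link (e.g. a README file)
    kind, order, service = match.groups()
    return RcLink("start" if kind == "S" else "stop", int(order), service)


def start_order(names):
    """Return the services that would be started, in startup order."""
    links = [parse_rc_link(n) for n in names]
    starts = [link for link in links if link and link.action == "start"]
    return [link.service for link in sorted(starts, key=lambda l: (l.order, l.service))]
```

Real init scripts sort the link names lexically rather than numerically, which gives the same result as long as the numeric prefixes are zero-padded to the same width.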
![Page 25: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/25.jpg)
rc.local
• Single “dumb” startup script
• Run at end of system startup
• Quick/dirty mechanism to start something at bootup
![Page 26: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/26.jpg)
/etc/inittab
• The original process supervisor
• Not (easily) scriptable
• Starts a process in a given runlevel
• Restarts the process when it dies
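The "restart it when it dies" behavior can be sketched as a tiny supervisor loop. This is an illustration only, not how init itself is implemented: `supervise`, `max_restarts`, `delay` and `CRASHING_WORKER` are names invented here, and the restart cap exists only so the example terminates.

```python
import subprocess
import sys
import time

# Hypothetical worker that always exits with status 3, standing in for a crashing daemon.
CRASHING_WORKER = [sys.executable, "-c", "import sys; sys.exit(3)"]


def supervise(cmd, max_restarts=3, delay=0.1):
    """Run cmd and restart it each time it exits.

    Returns the list of exit codes seen. A real supervisor would loop
    forever; the cap keeps this sketch finite.
    """
    exit_codes = []
    while True:
        # subprocess.call blocks until the child exits and returns its status.
        exit_codes.append(subprocess.call(cmd))
        if len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(delay)  # brief pause so a crash loop does not spin the CPU
```

The pause between restarts matters in practice: a process that dies immediately on startup would otherwise be respawned as fast as the host can fork.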
![Page 27: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/27.jpg)
Supervisor Processes
• Solaris SMF
• Ubuntu Upstart
• OSX launchd
• daemontools
![Page 28: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/28.jpg)
Ruby Integrations
• Supervisor Processes
  • Bluepill
  • God
• Startup Script Generator
  • Foreman
![Page 29: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/29.jpg)
Choosing a Boot Mechanism
• Is automatic recovery desirable? (Hint: sometimes it’s not)
• Does it integrate with monitoring?
• Is it a one-off that will get forgotten?
• Does it integrate into OS startup/shutdown?
• How much work to integrate with your app?
![Page 30: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/30.jpg)
Part III
Observing a Running System
![Page 31: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/31.jpg)
Common Tools
• top
• free
• vmstat
• netstat
• fuser
• ps
• sar (not always installed by default)
![Page 32: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/32.jpg)
Power Tools
• lsof
• iostat
• iftop
• pstree
• Tracing tools
  • strace
  • tcpdump/wireshark
![Page 33: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/33.jpg)
Observing CPU
• Go-to tools: top, ps
• CPU is not just about computation
• Most Important: %user, %system, %nice, %idle, %wait
• Other: hardware/software interrupts, “stolen” time (especially on EC2)
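On Linux, tools like top derive these percentages from the aggregate `cpu` line of `/proc/stat`, which holds cumulative tick counters. A minimal sketch of that calculation follows; the function names are hypothetical, and it assumes the first eight counters appear in the order user, nice, system, idle, iowait, irq, softirq, steal.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


# Usage on a Linux host (the aggregate line is the first line of the file):
#   with open("/proc/stat") as f:
#       before = parse_cpu_line(f.readline())
#   ...wait a few seconds, sample again as `after`...
#   print(cpu_percentages(before, after))
```

Because the counters are cumulative since boot, a single sample only gives the long-run average; the difference between two samples is what shows current behavior.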
![Page 34: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/34.jpg)
The Mystical Load Avg.
• Broken into 1, 5 and 15 minute averages
• Gives a coarse view of overall system load
• Based on the number of processes running or waiting for CPU time (on Linux, also those blocked in uninterruptible I/O)
• Rule of thumb: stay below the number of CPUs in a system (e.g. a 4 CPU host should be below a 4.00 load average)
![Page 35: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/35.jpg)
When am I CPU bound?
• 15 minute load average exceeding the number of non-HT processors
• %user + %system consistently above 90%
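The two indicators above can be written down as a simple check. This is a sketch under assumptions: `cpu_signals` is a hypothetical helper, the 90% threshold is the one from the slide, and the caller is expected to supply the physical (non-HT) core count, because `os.cpu_count()` reports logical CPUs.

```python
import os


def cpu_signals(load15, physical_cores, user_pct, system_pct, busy_threshold=90.0):
    """Evaluate both CPU-bound indicators and report them separately."""
    return {
        # 15 minute load average above the number of non-HT processors
        "load_exceeds_cores": load15 > physical_cores,
        # %user + %system above the threshold
        "cpu_busy": (user_pct + system_pct) > busy_threshold,
    }


# Usage on a Unix host: os.getloadavg() returns the 1, 5 and 15 minute averages.
#   _, _, load15 = os.getloadavg()
#   print(cpu_signals(load15, physical_cores=4, user_pct=70.0, system_pct=25.0))
```

The slide says "consistently", so a single reading that trips either signal is a prompt to keep watching, not a diagnosis.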
![Page 36: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/36.jpg)
Observing RAM
• Go-to tools: top, vmstat
• Available memory isn’t just “Free”
• Buffers + Cache fill to consume available RAM (this is a good thing!)
![Page 37: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/37.jpg)
RAM vs. Swap
• RAM is the amount of physical memory
• Swap is disk used to augment RAM
• Swap is orders of magnitude slower
• Some VM types have no meaningful swap
• Rule of thumb: pretend swap doesn’t exist
![Page 38: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/38.jpg)
Paging Strategies
• Solaris: Page in advance
• Linux: Page on demand (last resort)
• Windows: Craziness
![Page 39: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/39.jpg)
When am I memory bound?
• Free + buffers + cache < 15% of RAM
• Swap utilization above 10% of available swap (Linux only)
• Check for high disk utilization to confirm “thrashing”
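As a rough illustration, the first two thresholds can be evaluated from the contents of `/proc/meminfo` on Linux. The helper names and the `SAMPLE_MEMINFO` figures below are invented for this sketch; only the 15% and 10% thresholds come from the slide.

```python
# Made-up sample in /proc/meminfo format (values are in kB).
SAMPLE_MEMINFO = """\
MemTotal:        1000000 kB
MemFree:           50000 kB
Buffers:           20000 kB
Cached:            60000 kB
SwapTotal:        500000 kB
SwapFree:         400000 kB
"""


def parse_meminfo(text):
    """Turn /proc/meminfo text into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_signals(info, reclaimable_floor=0.15, swap_ceiling=0.10):
    """Apply the two memory-bound rules of thumb to parsed meminfo data."""
    reclaimable = info["MemFree"] + info["Buffers"] + info["Cached"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "low_memory": reclaimable < reclaimable_floor * info["MemTotal"],
        "swapping": swap_total > 0 and swap_used > swap_ceiling * swap_total,
    }


# Usage on a Linux host:
#   with open("/proc/meminfo") as f:
#       print(memory_signals(parse_meminfo(f.read())))
```

If both signals fire, checking disk utilization next (as the last bullet suggests) helps confirm whether the host is actually thrashing.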
![Page 40: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/40.jpg)
Observing Disk
• Go-to tools: iostat, top
• Disk is usually hardest thing to observe
• Better in recent Linux kernels (> 2.6.20)
![Page 41: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/41.jpg)
RAID
• Redundant Array of Inexpensive (or Independent) Disks
• Different strategies have different performance/durability tradeoffs
• RAID-0
• RAID-1
• RAID-10
• RAID-5
• RAID-6
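One way to see the capacity/durability tradeoff between these levels is to compute usable space and guaranteed fault tolerance for an array of identical disks. The `raid_summary` helper below is a hypothetical sketch, not from the talk; it says nothing about performance, which also differs between levels.

```python
def raid_summary(level, disks, disk_size_gb):
    """Return (usable_capacity_gb, disk_failures_always_survived)."""
    minimum = {"RAID-0": 2, "RAID-1": 2, "RAID-10": 4, "RAID-5": 3, "RAID-6": 4}
    if level not in minimum:
        raise ValueError(f"unknown RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} disks")
    if level == "RAID-10" and disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")

    if level == "RAID-0":   # striping only: all space, no redundancy
        return disks * disk_size_gb, 0
    if level == "RAID-1":   # every disk holds a full copy
        return disk_size_gb, disks - 1
    if level == "RAID-10":  # stripe across mirrored pairs
        # May survive more failures if they hit different pairs; 1 is guaranteed.
        return (disks // 2) * disk_size_gb, 1
    if level == "RAID-5":   # one disk's worth of parity
        return (disks - 1) * disk_size_gb, 1
    return (disks - 2) * disk_size_gb, 2  # RAID-6: two disks' worth of parity
```

For example, four 1000 GB disks give 3000 GB usable in RAID-5 but only 2000 GB in RAID-10 or RAID-6.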
![Page 42: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/42.jpg)
When am I disk bound?
• %wait is consistently above 10% to 20%
• ... though %wait can be network too
• SCSI and FC command queues are long
• Known failure mode: disk more than 85% full causes tremendous VFS overhead
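The "more than 85% full" failure mode is easy to watch for. A minimal sketch using the standard library follows; `percent_used` and `is_nearly_full` are hypothetical names, and the default threshold is the figure from the slide.

```python
import shutil


def percent_used(path="/"):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path="/", threshold=85.0):
    """True when the filesystem is fuller than the rule-of-thumb threshold."""
    return percent_used(path) > threshold


# Usage:
#   for mount in ("/", "/var"):
#       print(mount, round(percent_used(mount), 1), is_nearly_full(mount))
```

The figure may differ slightly from what `df` prints, since `df` accounts for blocks reserved for root.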
![Page 43: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/43.jpg)
Observing Network
• Go-to tools: netstat, iftop, wireshark
• Be wary of choke-points
  • Switch interconnects
  • WAN links
  • Firewalls
![Page 44: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/44.jpg)
Link Optimization
• Use Jumbo Frames for Gbit+ links
• Port aggregation for throughput:
  • Best: many-to-many
  • Good: one-to-many
  • Useless: one-to-one
    • ... but still useful for HA
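A likely reason one-to-one traffic gains nothing is that aggregated links usually pick a member link per flow by hashing the endpoints, so a single source/destination pair always lands on the same link. The sketch below assumes that behavior; real hash policies vary (MAC, IP, port), and `pick_link` and `links_used` are names invented for illustration.

```python
import zlib


def pick_link(src, dst, link_count):
    """Choose a member link for a flow by hashing its endpoints."""
    key = f"{src}->{dst}".encode()
    return zlib.crc32(key) % link_count


def links_used(flows, link_count):
    """Set of member links that a collection of (src, dst) flows would use."""
    return {pick_link(src, dst, link_count) for src, dst in flows}


# One-to-one: the same pair repeated always hashes to the same link.
one_to_one = [("10.0.0.1", "10.0.0.2")] * 100

# Many-to-many: different pairs can spread across the member links.
many_to_many = [(f"10.0.0.{s}", f"10.0.1.{d}") for s in range(1, 9) for d in range(1, 9)]
```

Here `links_used(one_to_one, 4)` always contains exactly one link, while the many-to-many flows have the chance to spread over all four. The spare links still help with availability, which matches the last bullet.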
![Page 45: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/45.jpg)
When am I network bound?
• This one is easy: 99% of the time this is link saturation
• Gotchas: which link?
• Addendum: loss/delay (especially for TCP) can wreak havoc on throughput
• ... but usually only a problem across WAN
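To get a feel for how loss and delay limit TCP, one commonly used approximation (the Mathis formula, which is not from the slides) bounds steady-state throughput by MSS/RTT times C/sqrt(loss), with C around 1.22. The sketch below applies it; the function name and the example figures are assumptions chosen for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate, c=1.22):
    """Approximate upper bound on TCP throughput in bits per second."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return (mss_bytes * 8 / rtt_seconds) * (c / math.sqrt(loss_rate))


# Hypothetical LAN path: 1 ms round trip, 0.01% loss.
lan_bps = mathis_throughput_bps(1460, 0.001, 0.0001)

# Hypothetical WAN path: 100 ms round trip, 1% loss.
wan_bps = mathis_throughput_bps(1460, 0.1, 0.01)
```

With these made-up numbers the WAN bound is about a thousand times lower than the LAN bound, regardless of link speed, which is consistent with the note that this mostly bites across WAN links.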
![Page 46: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/46.jpg)
Part IV
Optimization & Performance Tuning
![Page 47: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/47.jpg)
Hardware Options
• A.K.A. “Throw hardware at it”
• Not the first thing to try
• Are the services tuned? SQL queries, application behavior, caching options
• Is something broken, causing performance degradation?
![Page 48: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/48.jpg)
Hardware Options
• RAM is usually the single biggest performance win (cost/benefit tradeoff)
• Faster disk is next best
• Then look at CPU and/or Network
• ...but do the work to figure out why your performance is limited in the first place
![Page 49: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/49.jpg)
Kernel Tunables
• Not as necessary as in the “old days”
• Almost all settings can be adjusted at runtime on Linux, Solaris
• Most valuable settings are buffer limits or counters/timers
• There be dragons! Read carefully before twisting these knobs
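On Linux, runtime tunables are exposed as files under `/proc/sys`, with the dotted sysctl name mapping to a path. A read-only sketch follows, which is the safe way to start looking around; `sysctl_path` and `read_sysctl` are hypothetical helpers, and `net.core.somaxconn` is just one example of a buffer/queue limit.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys (Linux)."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Read the current value of a tunable without changing it."""
    return sysctl_path(name, root).read_text().strip()


# Usage on a Linux host:
#   print(read_sysctl("net.core.somaxconn"))
```

Writing to these files changes the running kernel immediately and needs root, so reading first and changing one setting at a time is the cautious approach.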
![Page 50: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/50.jpg)
Environment Settings
• ulimits
  • max files
  • stack size
  • memory limits
  • core dumps
  • others
• Still subject to system-wide (kernel) limits
![Page 51: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/51.jpg)
Environment limits
• Hard limits cannot be raised by unprivileged users
• PAM configuration may also be in effect
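A process can inspect and adjust its own limits. The sketch below uses Python's `resource` module (Unix only) for the "max files" limit; the helper names are invented here, and the check mirrors the rule that an unprivileged process may move its soft limit only up to the hard limit.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_under_hard_limit(target, hard):
    """True if an unprivileged process could set its soft limit to `target`."""
    return hard == resource.RLIM_INFINITY or target <= hard


def set_soft_open_files(target):
    """Set the soft open-files limit, refusing to exceed the hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if not fits_under_hard_limit(target, hard):
        raise ValueError(f"target {target} exceeds hard limit {hard}")
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))


# Usage:
#   soft, hard = open_file_limits()
#   print("soft:", soft, "hard:", hard)
```

Limits set this way apply to the current process and its children only; system-wide defaults come from places such as the PAM configuration mentioned above.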
![Page 52: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/52.jpg)
Application Tunables
• There are not many for C-Ruby
• JVM has many
• Mostly related to how RAM is allocated and garbage collected
• Very dependent on application
• Any time an “xVM” is involved, there is probably a tunable (JVM, CLR)
• But we are developers! Tune/profile your app before looking to the environment
![Page 53: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/53.jpg)
Performance Management Tools
• sysstat (sar)
• SNMP (and related tools like Cacti)
• Integrated Monitoring + Trending tools
  • Zabbix
  • OpenNMS
  • and a plethora of commercial tools
![Page 54: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/54.jpg)
Part V
Putting It All Together
Autopsy of a single HTTP request, end-to-end
![Page 55: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/55.jpg)
Live Demo/Whiteboard
![Page 56: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/56.jpg)
Part VI
Pulling It All Apart
Anticipating Murphy and his Law
![Page 57: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/57.jpg)
Most Common Pitfalls
• Disk Full
• DNS Unavailable/Slow
• Insufficient RAM
• Suboptimal Service Configuration
• Firewall misconfiguration
• Archaic: Network mismatch (Full/Half Duplex)
![Page 58: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/58.jpg)
DNS and Performance
• Possibly most-overlooked perf. impact
• Everything uses DNS
• If you make nothing else redundant, make this redundant!
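Since DNS latency hides inside almost every outbound call, it helps to measure it directly. The sketch below times a lookup through the system resolver; `time_lookup` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        resolver(hostname, None)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return succeeded, time.perf_counter() - start


# Usage:
#   ok, seconds = time_lookup("example.com")
#   print("resolved" if ok else "failed", f"in {seconds * 1000:.1f} ms")
```

A failed lookup that takes several seconds is often the more telling result, because that is the delay an application pays on every request while a resolver is down.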
![Page 59: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/59.jpg)
Part VII
Scaling Up
![Page 60: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/60.jpg)
Horizontal or Vertical?
• Vertical: Making one server/instance go faster
• Horizontal: Parallelizing requests to get more things done in the same amount of time
![Page 61: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/61.jpg)
Clustering
• Parallelizing requests to increase overall throughput: horizontal scaling
• Techniques to make information more available:
  • Caching (memcache, file-based caching)
  • Distribute data sets
  • Replication
![Page 62: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/62.jpg)
Distributing Data
• Replication
  • Split Reads (One writer/master; multiple slaves/readers)
  • Multiple Masters (dangerous!)
• Sharding (must consider HA)
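The split-reads arrangement can be sketched as a router that sends writes to the single master and spreads reads across the replicas. Everything here is illustrative: `ReadWriteRouter`, `route_all` and the server names are invented, and classifying statements by their first word is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Send SELECTs to replicas in round-robin order, everything else to the writer."""

    def __init__(self, writer, readers):
        if not readers:
            raise ValueError("at least one reader is required")
        self.writer = writer
        self._readers = itertools.cycle(readers)

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT":
            return next(self._readers)
        return self.writer


def route_all(router, statements):
    """Return the server chosen for each statement, in order."""
    return [router.route(sql) for sql in statements]
```

A real deployment also has to deal with replication lag: a read sent to a replica right after a write may not see that write yet.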
![Page 63: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/63.jpg)
Failover/HA
• Consistency requires concept of Quorum
• Losing partition gets killed: STONITH
• Multi-master systems ignore this at the cost of potential non-determinism
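The quorum idea reduces to a strict-majority test, sketched below under the assumption that every node has one equal vote. `has_quorum` is a hypothetical helper; real cluster managers add tie-breakers and fencing on top of this.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True if this partition holds a strict majority of the cluster."""
    if not 0 <= reachable_nodes <= cluster_size:
        raise ValueError("reachable_nodes must be between 0 and cluster_size")
    return reachable_nodes > cluster_size // 2
```

In a five-node cluster split 3/2, only the side with three nodes keeps quorum. In a four-node cluster split 2/2, neither side does, which is one reason odd cluster sizes are popular.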
![Page 64: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/64.jpg)
Tuning Services
• Some VM types (especially JVM or CLR) have tunables for memory consumption
• Databases usually have memory settings
• These can make dramatic differences
• Very workload dependent
• Deep troubleshooting: strace, wireshark
![Page 65: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/65.jpg)
Part VIII
Deploying Applications
![Page 66: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/66.jpg)
12 Factor Application
• Deployability starts with application design
• Clear line between configuration and logic
• Permit easy horizontal scaling
• Are OS-agnostic (yay Ruby!)
• Minimize differences between dev and prod
• http://12factor.net - by Heroku cofounder
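The "clear line between configuration and logic" point is commonly met by reading settings from environment variables. The sketch below shows the idea; the `Settings` fields and the variable names `DATABASE_URL`, `WORKER_COUNT` and `DEBUG` are assumptions for illustration, not something the talk prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    worker_count: int
    debug: bool


def load_settings(env=os.environ):
    """Build settings from environment variables, with defaults where sensible."""
    return Settings(
        database_url=env["DATABASE_URL"],  # required: fail fast if missing
        worker_count=int(env.get("WORKER_COUNT", "2")),
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )


# Usage:
#   settings = load_settings()
#   print(settings.worker_count)
```

Passing the environment in as a mapping keeps the same code path for dev and prod, and makes the loader easy to test with a plain dict.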
![Page 67: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/67.jpg)
Deployment Tools
• Capistrano
  • The de facto standard
  • Requires effort to set up, test
  • Requires integration with system startup
  • Most flexible
![Page 68: Ops for Developers](https://reader034.fdocuments.us/reader034/viewer/2022050816/54939b30ac7959042e8b491a/html5/thumbnails/68.jpg)
Deployment Tools
• “Move it to the cloud”
  • Heroku
  • Cloud Foundry