© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Steven Bryen - Manager, Solutions Architecture - Amazon Web Services
Chris Turvil - Head of Cloud and Platform Agility - Trainline
July 2016
Deep Dive on Amazon EC2 Instances
Amazon Elastic Compute Cloud is Big
[Diagram: EC2, with Instances, API, Networking and Purchase options]
Amazon EC2 Instances
[Diagram: host server running a hypervisor, with Guest 1, Guest 2 ... Guest n on top]
Amazon EC2 Instances History
[Timeline, 2006 to 2016, of instance types in the order shown on the slide:]
• m1.small
• m1.large, m1.xlarge
• c1.medium, c1.xlarge
• m2.xlarge
• m2.4xlarge, m2.2xlarge
• cc1.4xlarge
• t1.micro
• cg1.4xlarge
• cc2.8xlarge
• m1.medium
• hi1.4xlarge
• m3.xlarge, m3.2xlarge
• hs1.8xlarge
• cr1.8xlarge
• c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge, g2.2xlarge
• i2.xlarge, i2.2xlarge, i2.4xlarge, i2.8xlarge
• m3.medium, m3.large
• r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge
• t2.micro, t2.small, t2.medium
• c4.large, c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge
• d2.xlarge, d2.2xlarge, d2.4xlarge, d2.8xlarge, g2.8xlarge
• t2.large, m4.large, m4.xlarge, m4.2xlarge, m4.4xlarge, m4.10xlarge
What to Expect from the Session
• Getting the best system performance from your EC2 Instances
• How Amazon EC2 instances deliver performance while providing flexibility and agility
• How to make the most of advanced features for EC2 and companion services
Defining Performance
• Servers are hired to do jobs
• Performance is measured differently depending on the job
Hiring a Server
Performance Factors
Resource | Performance factors | Key indicators
CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
Memory | Memory capacity | Free memory, anonymous paging, thread swapping
Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
Disks | Input/output operations per second, throughput | Wait queue length, device utilization, device errors
Resource Utilization
• For given performance, how efficiently are resources being used?
• Something at 100% utilization can’t accept any more work
• Low utilization can indicate more resources are being purchased than needed
Example: Web Application
• MediaWiki installed on Apache with 140 pages of content
• Load increased in intervals over time
Example: Web Application
• Memory stats
Example: Web Application
• Disk stats
Example: Web Application
• Network stats
Example: Web Application
• CPU stats
Instance Selection = Performance Tuning
• Picking an instance is tantamount to resource performance tuning
• Give back instances as easily as you can acquire new ones
• Find an ideal instance type and workload combination
Delivering Compute Performance with Amazon EC2 Instances
Review: C4 Instances
• Custom Intel E5-2666 v3 at 2.9 GHz
• P-state and C-state controls
Model | vCPU | Memory (GiB) | EBS (Mbps)
c4.large | 2 | 3.75 | 500
c4.xlarge | 4 | 7.5 | 750
c4.2xlarge | 8 | 15 | 1,000
c4.4xlarge | 16 | 30 | 2,000
c4.8xlarge | 36 | 60 | 4,000
Review: T2 Instances
• Lowest cost EC2 instance at $0.013 per hour
• Burstable performance
• Fixed allocation enforced with CPU credits
Model | vCPU | CPU Credits / Hour | Memory (GiB) | Storage
t2.micro | 1 | 6 | 1 | EBS Only
t2.small | 1 | 12 | 2 | EBS Only
t2.medium | 2 | 24 | 4 | EBS Only
t2.large | 2 | 36 | 8 | EBS Only
How Credits Work
• A CPU credit provides the performance of a full CPU core for one minute
• An instance earns CPU credits at a steady rate
• An instance consumes credits when active
• Credits expire (leak) after 24 hours
[Figure: credit balance over time, with the baseline rate and the burst rate marked]
Tip: Monitor CPU credit balance
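The credit rules can be turned into a small model to see how quickly a balance drains. This is a minimal sketch, not AWS's accounting: it works in whole hours, it assumes the balance is capped at 24 hours of earnings (a simplification of the 24-hour expiry), and the function names and the `start_balance` parameter are hypothetical. The credits-per-hour figures are the ones from the T2 table.

```python
# Credits earned per hour for each T2 size (from the T2 table in this deck).
CREDITS_PER_HOUR = {"t2.micro": 6, "t2.small": 12, "t2.medium": 24, "t2.large": 36}


def baseline_percent(instance_type):
    """Sustainable CPU as a percentage of one full core.

    One credit is one core at 100% for one minute, so earning N credits
    per hour sustains N/60 of a core.
    """
    return CREDITS_PER_HOUR[instance_type] * 100 / 60


def simulate_balance(instance_type, cores_used_by_hour, start_balance=0.0):
    """Return the credit balance at the end of each hour.

    cores_used_by_hour holds the average number of full cores in use
    during each hour (0.5 means half of one core for the whole hour).
    Assumption: unused credits older than 24 hours expire, modelled here
    as a cap of 24 hours' worth of earnings.
    """
    earned = CREDITS_PER_HOUR[instance_type]
    cap = earned * 24
    balance = start_balance
    history = []
    for cores_used in cores_used_by_hour:
        spent = cores_used * 60  # core-minutes consumed this hour
        # The balance cannot go negative: at zero the instance runs at baseline.
        balance = min(cap, max(0.0, balance + earned - spent))
        history.append(balance)
    return history


if __name__ == "__main__":
    print(baseline_percent("t2.micro"))
    print(simulate_balance("t2.micro", [0.0, 0.0, 0.5]))
```

In practice the same trend is visible without a model by watching the CPU credit balance metric for the instance, which is what the tip above recommends.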
Delivering Memory Performance with Amazon EC2 Instances
Announced: X1 Instances
• Largest memory instance with 2 TB of DRAM
• Quad socket, Intel E7 processors with 128 vCPUs
Model | vCPU | Memory (GiB) | Local Storage
x1.32xlarge | 128 | 1952 | 2 x 1920 GB
Delivering I/O Performance with Amazon EC2 Instances
Review: I2 Instances
• 16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD
• 365K random read IOPS for 32 vCPU instance
Model | vCPU | Memory (GiB) | Storage (GB) | Read IOPS | Write IOPS
i2.xlarge | 4 | 30.5 | 1 x 800 SSD | 35,000 | 35,000
i2.2xlarge | 8 | 61 | 2 x 800 SSD | 75,000 | 75,000
i2.4xlarge | 16 | 122 | 4 x 800 SSD | 175,000 | 155,000
i2.8xlarge | 32 | 244 | 8 x 800 SSD | 365,000 | 315,000
Device Pass Through: Enhanced Networking
• SR-IOV eliminates need for driver domain
• Physical network device exposes virtual function to instance
• Requires a specialized driver, which means:
  • Your instance OS needs to know about it
  • EC2 needs to be told your instance can use it
Before Enhanced Networking
[Diagram: hardware (physical CPU, physical memory, network device) beneath the VMM. A driver domain holds the backend driver and the device driver. Each guest domain has a frontend driver, virtual CPU, virtual memory, CPU scheduling, sockets and the application. The packet path is marked in five numbered steps.]
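After Enhanced Networking
[Diagram: hardware (physical CPU, physical memory, SR-IOV network device) beneath the VMM, with the driver domain and two guest domains. The NIC driver now sits in the guest, alongside virtual CPU, virtual memory, CPU scheduling, sockets and the application. The packet path is marked in three numbered steps.]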
Tip: Use Enhanced Networking
• Highest packets-per-second
• Lowest variance in latency
• Instance OS must support it
• Look for SR-IOV property of instance or image
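Checking the SR-IOV property can be scripted. The sketch below assumes the response shape that boto3's `describe_instance_attribute` returns when called with `Attribute="sriovNetSupport"`; the helper name `sriov_enabled` and the instance IDs are hypothetical, and the two responses are hand-written samples rather than data fetched from EC2.

```python
def sriov_enabled(attribute_response):
    """Return True if the instance attribute reports SR-IOV support.

    EC2 reports the value "simple" when enhanced networking via SR-IOV
    is enabled for the instance; the field is empty otherwise.
    """
    value = attribute_response.get("SriovNetSupport", {}).get("Value")
    return value == "simple"


# Hand-written sample responses (hypothetical instance IDs).
enabled = {
    "InstanceId": "i-0123456789abcdef0",
    "SriovNetSupport": {"Value": "simple"},
}
not_enabled = {
    "InstanceId": "i-0fedcba9876543210",
    "SriovNetSupport": {},
}

if __name__ == "__main__":
    # With boto3 installed and credentials configured, a real response
    # would come from:
    #   ec2 = boto3.client("ec2")
    #   ec2.describe_instance_attribute(
    #       InstanceId="i-0123456789abcdef0", Attribute="sriovNetSupport")
    for response in (enabled, not_enabled):
        print(response["InstanceId"], sriov_enabled(response))
```

This only covers the "EC2 needs to be told" half of the requirement; the instance OS still needs the matching virtual function driver loaded.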
Leveraging Features of EC2 and Companion Services
Auto Recovery for Amazon EC2
• Recover instances that have become impaired due to an underlying hardware problem.
• The recovered instance keeps its instance ID, private IP, Elastic IP and metadata.
• Configured through a CloudWatch alarm with an EC2 action
Auto Recovery for Amazon EC2
• Examples of problems causing system status checks to fail:
  • Loss of network connectivity
  • Loss of system power
  • Software issues on the physical host
  • Hardware issues on the physical host
• Only supported on:
  • C3, C4, M3, M4, R3, T2 and X1 instances
  • Instances in a VPC
  • Instances with shared tenancy
  • Instances that use EBS storage exclusively
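Auto recovery is set up as a CloudWatch alarm on the system status check with the EC2 recover action attached. The sketch below only builds the parameters that boto3's `put_metric_alarm` accepts, so it runs without AWS access. The alarm name pattern, the 60-second period and the two evaluation periods are assumptions chosen for illustration, not values from this deck.

```python
def recovery_alarm_params(instance_id, region):
    """Build put_metric_alarm parameters for an EC2 auto recovery alarm."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        # System status checks cover the host-level problems listed above.
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,            # assumption: one-minute periods
        "EvaluationPeriods": 2,  # assumption: two failing periods in a row
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # The recover action is addressed by a region-specific ARN.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


if __name__ == "__main__":
    params = recovery_alarm_params("i-0123456789abcdef0", "eu-west-1")
    # With boto3 installed and credentials configured:
    #   boto3.client("cloudwatch", region_name="eu-west-1").put_metric_alarm(**params)
    print(params["AlarmActions"][0])
```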
Auto Scaling – Lifecycle Hooks
• Hold instance in Pending or Terminating state.
• Lifecycle event notifications are sent via CloudWatch Events or SNS (and can invoke Lambda).
• Default timeout is one hour.
• The action result can be CONTINUE or ABANDON. Set a default result using --default-result
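The timeout and default result are both settable when the hook is created. A minimal sketch, assuming the parameter names of boto3's `put_lifecycle_hook`; the helper `lifecycle_hook_params` and its `event` shorthand are hypothetical, and nothing here calls AWS.

```python
VALID_RESULTS = {"CONTINUE", "ABANDON"}

# Hypothetical shorthand mapped to the real lifecycle transition names.
TRANSITIONS = {
    "launch": "autoscaling:EC2_INSTANCE_LAUNCHING",
    "terminate": "autoscaling:EC2_INSTANCE_TERMINATING",
}


def lifecycle_hook_params(group, hook, event,
                          default_result="ABANDON", heartbeat_timeout=3600):
    """Build put_lifecycle_hook parameters.

    heartbeat_timeout is in seconds; 3600 matches the one-hour default.
    default_result is what happens if the timeout elapses with no
    completion call.
    """
    if default_result not in VALID_RESULTS:
        raise ValueError(f"default_result must be one of {sorted(VALID_RESULTS)}")
    return {
        "AutoScalingGroupName": group,
        "LifecycleHookName": hook,
        "LifecycleTransition": TRANSITIONS[event],
        "HeartbeatTimeout": heartbeat_timeout,
        "DefaultResult": default_result,
    }


if __name__ == "__main__":
    params = lifecycle_hook_params("my-asg", "my-launch-hook", "launch")
    # With boto3 installed and credentials configured:
    #   boto3.client("autoscaling").put_lifecycle_hook(**params)
    print(params)
```

Choosing ABANDON as the default means an instance whose setup never reports back is not put into service, which is usually the safer failure mode for launch hooks.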
Auto Scaling Lifecycle Hooks - Adding
On Launch
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name my-hook \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING
On Termination
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name my-hook \
    --auto-scaling-group-name my-asg \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING
Auto Scaling Lifecycle Hooks - Managing
Send Heartbeat to Extend
aws autoscaling record-lifecycle-action-heartbeat \
    --lifecycle-action-token bcd2f1b8-9a78-44d3-8a7a-4dd07d7cf635 \
    --lifecycle-hook-name my-launch-hook \
    --auto-scaling-group-name my-asg
Complete Action Early
aws autoscaling complete-lifecycle-action \
    --lifecycle-action-result CONTINUE \
    --lifecycle-action-token bcd2f1b8-9a78-44d3-8a7a-4dd07d7cf635 \
    --lifecycle-hook-name my-launch-hook \
    --auto-scaling-group-name my-asg
trainline
£2bn ticket sales/yr
28m visitors/mo
70% mobile
Monolithic application
One release every 6 weeks
Legacy tech
Physical data centre
Centralised Ops
Environment Snowflakes
Vision
Go faster!
Organisation
Small Cross-Functional Teams
Functional Teams
Architecture
Micro-services
Monolith
Cloud Readiness
DNS
Tech Baseline
Singletons
Security
Licenses
Ops Ready
Top 10 AWS Services
DynamoDB, Lambda, Auto-Scaling, EC2, S3, SQS, SNS, ECS, CloudFormation, Kinesis
Auto-Scaling Groups
Unit of Management
Scale-up Patching
Auto-Recovery
Lifecycle Hooks
Performance Tuning
0.3s = £8m/year
Oracle Exadata
4TB, 600MB IO/sec
Latency critical
Target < 5% slower
16 core license
[Chart: performance relative to Oracle Exadata. AWS baseline: -25% (32 core). Values after tuning (app tuning, infra tuning, database tuning): -12%, -8%, -5%. Today: +10%.]
Migration Approach
Big Bang Database
Common Infra Services Services
Continuous Delivery
Everything now Continuous Delivery
Downtime reduced 30%
Deployment Continuum
Pets, Cattle, Baked, Containers, Lambda
Environment Manager
Self-service portal & API for common infrastructure tasks
Environment Manager
Environment Abstraction over AWS
Env & Infra Config Management
Deployment & Toggling
Scaling and Patching
Compare and Synchronise
1,300 servers actively managed
20,000 deployments since Jan
40% fewer Jira tickets
Improved visibility & productivity
Open Source! (very soon)
100% AWS
250 Micro-services
70+ Releases a Week
30% Less Downtime
2hrs for new Environment
£1.2m Annual Saving
Remember to complete your evaluations!
Thank You