Post on 16-Apr-2017
Mastering PostgreSQL with AWS
Jafar Shameem, Business Development Manager, Amazon Web Services
Miles Ward, Senior Manager, Solutions Architecture, Amazon Web Services
Jay Edwards, CTO, PalominoDB
Agenda
• AWS Storage Options and EBS
• EBS Provisioned IOPS
• About Postgres
• Postgres on AWS best practices
• Lessons learned from the OFA campaign
Storage Options on AWS
Block Storage (Elastic Block Store)
Use for:
• Access to raw unformatted block level storage
• Persistent Storage
Object Storage (S3, Glacier)
Use for:
• Pictures, videos, highly durable media storage
• Cold storage for long-term archive
Amazon Elastic Block Store (EBS): Persistent Storage for EC2
• High performance block storage device
• Mount as drives to instances
• Persistent and independent of instance lifecycle
Feature: Details
• High performance file system: Mount EBS as drives and format as required
• Flexible size: Volumes from 1 GB to 1 TB in size
• Secure: Private to your instances
• Available: Replicated within an Availability Zone
• Backups: Volumes can be snapshotted for point-in-time restore
• Monitoring: Detailed metrics captured via CloudWatch
Standard and Provisioned IOPS Volume Types
Standard Volumes
• Optimized for: Workloads with low or moderate IOPS needs and occasional bursts.
• Volume attributes: Up to 1 TB, average 100 IOPS per volume. Best effort performance. Can be striped together for larger size and higher IOPS.
• Workloads: File server, Log processing, Websites, Analytics, Boot, etc.
Provisioned IOPS Volumes
• Optimized for: Transactional workloads requiring consistent IOPS.
• Volume attributes: Up to 1 TB, 4,000 IOPS per volume. Consistent IOPS. Can be striped together for larger size and higher IOPS.
• Workloads: Business applications, MongoDB, SQL Server, MySQL, Postgres, Oracle, etc.
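The note that volumes "can be striped together for larger size and higher IOPS" can be turned into a quick sizing calculation. The sketch below is only arithmetic under the per-volume limits quoted on this slide (1 TB, 4,000 IOPS); the helper name plan_stripe is hypothetical, and it ignores RAID overhead, instance bandwidth, and any IOPS-to-size ratio rules.

```python
import math

# Per-volume limits quoted on the slide (current EBS volume types may differ).
MAX_VOLUME_GB = 1024      # "Up to 1 TB"
MAX_VOLUME_IOPS = 4000    # Provisioned IOPS ceiling per volume


def plan_stripe(target_gb, target_iops):
    """Return (volume_count, gb_per_volume, iops_per_volume) for a RAID 0 stripe.

    Hypothetical helper: it only does the arithmetic and assumes capacity and
    IOPS are spread evenly across the volumes in the stripe.
    """
    by_size = math.ceil(target_gb / MAX_VOLUME_GB)
    by_iops = math.ceil(target_iops / MAX_VOLUME_IOPS)
    count = max(1, by_size, by_iops)
    return count, math.ceil(target_gb / count), math.ceil(target_iops / count)


# A 2,000 GB data set that needs 20,000 IOPS is bound by IOPS, not by size.
print(plan_stripe(2000, 20000))  # (5, 400, 4000)
```

In this example the IOPS target, not the capacity, decides the number of volumes in the stripe.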
Introducing Provisioned IOPS Volumes
❶ Select the new Provisioned IOPS volume type
❷ Specify the volume capacity
❸ Specify the number of IOs per second your application needs, up to 4,000 PIOPS per volume. The volume will deliver the specified IOs per second.
$ ec2-create-volume --size 500 --availability-zone us-east-1b --type io1 --iops 2000
What are customers running on EBS?
• Enterprises: Enterprise workloads are built on block storage (Oracle, SAP, Microsoft)
• Applications: Convenient, cost-effective, reliable file server
• Gaming/Social/Mobile/Education: Very high performance and consistent IO for NoSQL and relational DBs
• Marketing / Analytics: Fast sequential IO access
PostgreSQL
• Open-source RDBMS
• Rich features
• Extraordinary stability
• Focus on performance
• Full ACID compliance for applications requiring durability and
availability
• Robust GIS functionality
Concepts
• Master PostgreSQL host
o Accepts both reads and writes
o May have many replicas
o Records transferred to replicas using Write-Ahead Logging (WAL)
• Secondary PostgreSQL host
o Receives WAL records from the master
o Replication can be real-time or delayed
• Hot standby
o A secondary host that can receive read queries
Installation
• Start with an Amazon Machine Image (AMI) of your choice
• Launch EC2 instance and attach EBS volume to it
• Install software from ftp.postgresql.org
• Edit EC2 security group to allow ingress for port 5432
• Edit postgresql.conf for:
o listen_addresses = '*'
• For master-slave configurations:
o Set max_wal_senders > 0
Temporary data / SSD Storage
• You can create a normal tablespace on instance storage and put UNLOGGED tables in it to take advantage of the increased performance available with SSDs
• When you create a new table, query the relfilenode of the new table and back up the file system location identified by the query results into permanent storage. (Be sure to do this before you put any data in the table.)
Replication Basics
• Records are transferred to the replicas via Write-Ahead Logging (WAL)
• Replication can be real-time through "streaming replication" or delayed via "WAL archiving"
• Replication on PostgreSQL supports two levels of durability: asynchronous and synchronous. Only one replica can be in synchronous mode at a time. You may, however, provide an ordered list of candidate synchronous replicas to take over if the current synchronous replica is down.
• Since version 9.2, PostgreSQL has supported Cascading Replication
Architecture – Production Designs
• Functional Partitioning
• Vertically scale to largest EC2 instance and storage
• Tune for the available hardware
• Use replication to create multiple replicas if bound by reads
• Shard your data-sets if bound by writes
Architecture – Anti-patterns
• Vertical Scaling does not offer all the benefits of horizontal scaling
• Scaling step-by-step when you know you need a big system is not
efficient
• ACID compliance has a cost. Consider NoSQL data stores for logs or
session data
• You might not need to do everything in the DB.
Performance – Minimum production scale
• Always use Elastic Block Store (EBS)
o Significant write cache
o Superior random IO performance
o Enhanced durability compared to instance stores
Performance – Larger production scale
• Move up to higher bandwidth instance types (m1.xlarge, c1.xlarge, m2.4xlarge)
• Increase EBS volume size to > 300 GB
• Increase number of volumes in RAID set
Performance – Extra-large scale
• Leverage Cluster Compute instance types
o More bandwidth to EBS
o Ex. CC2 will make excellent primary nodes, particularly when paired with a large number of EBS volumes (>= 8)
• Improve RAID configuration with:
o effective_io_concurrency = # of stripes in RAID set
Performance – Extra-large production scale
• Can also leverage SSD instance type (hi1.4xlarge)
o 2 x 1 TB SSD storage (ephemeral storage)
o Perfect for replicas
• If replicas run on SSD instance types, disable integrity features such as fsync and full_page_writes on those hosts to improve performance
Benchmarking storage
• Sequential test example:
o dd if=/dev/zero of=<location in the disk> bs=8192 count=10000 oflag=direct
• Seek test example:
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw --file-fsync-all run
o sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup
o The --file-fsync-all option (shown in the run command) makes the test more aggressive; use it especially when comparing different filesystems (ex. ext4 vs XFS)
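To read the sequential test result, it helps to know how much data the dd example writes. The sketch below just multiplies the block size by the block count from that command; the elapsed time in the usage line is a made-up placeholder, not a measurement, and the function names are hypothetical.

```python
MIB = 1024 * 1024


def dd_total_mib(block_size_bytes, block_count):
    """Total data written by 'dd bs=<block_size_bytes> count=<block_count>', in MiB."""
    return block_size_bytes * block_count / MIB


def throughput_mib_per_s(total_mib, elapsed_seconds):
    """Average sequential throughput for a finished run."""
    return total_mib / elapsed_seconds


total = dd_total_mib(8192, 10000)          # values from the dd example above
print(total)                                # 78.125
print(throughput_mib_per_s(total, 2.5))     # 31.25, using a hypothetical 2.5 s run
```

Since the example writes under 80 MiB, a larger count is worth considering if the run finishes too quickly to give a stable number.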
Benchmarking storage through PostgreSQL
• Use pgbench
• Initialize the test data set at the desired scale:
o pgbench -i -s1000 -Upostgres database
• Run a simple test with 20 clients and 100 transactions each against the master:
o pgbench -c 20 -t 100 -Upostgres database
• Run a read-only, no-vacuum test against the slave:
o pgbench -S -n -c 20 -t 1000 -h slave -Upostgres database
• If planning on using pgpool, test against it instead of the DB
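A small calculation shows what the pgbench flags above imply. This is a sketch, assuming the standard pgbench behavior of creating 100,000 pgbench_accounts rows per unit of scale; the function names and the elapsed time are hypothetical.

```python
ROWS_PER_SCALE_UNIT = 100_000  # pgbench_accounts rows created per unit of -s


def pgbench_workload(scale, clients, tx_per_client):
    """Summarize what 'pgbench -i -s SCALE' and 'pgbench -c C -t T' will do."""
    return {
        "account_rows": scale * ROWS_PER_SCALE_UNIT,
        "total_transactions": clients * tx_per_client,
    }


def tps(total_transactions, elapsed_seconds):
    """Transactions per second for a finished run."""
    return total_transactions / elapsed_seconds


run = pgbench_workload(scale=1000, clients=20, tx_per_client=100)
print(run)  # {'account_rows': 100000000, 'total_transactions': 2000}
print(tps(run["total_transactions"], 8.0))  # 250.0, using a hypothetical 8 s run
```

At 2,000 total transactions, the write test is short relative to a 100-million-row data set, so a larger -t may give steadier results.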
Backups using EC2 snapshots
• Snapshot a mounted volume:
o SELECT pg_start_backup('label',true);
o ec2-create-snapshot -d "postgres clone" vol-24592c0e
o SELECT pg_stop_backup();
• If operating near maximum I/O capacity, it is recommended to use a replica for backups
Restores using an EC2 snapshot
• Check available snapshots
o $ ec2-describe-snapshots
• Create EBS volumes from each snapshot used to back up the DB
o $ ec2-create-volume --snapshot snap-219c1308 --availability-zone eu-west-1c
• Attach volumes to instances
o $ ec2-attach-volume -i i-96ec5edd -d /dev/sdc vol-eb1561c1
• If using a RAID set, attach the volumes in the same order for easiest re-creation of the RAID volume in the OS
• Mount the volume on the instance and assign the corresponding permissions
Tunables
• Swappiness, vm, kernel tuning
o By default, shmmax and shmall have very small values. These kernel limits must cover the shared memory that PostgreSQL requests, which is driven by shared_buffers in postgresql.conf; if shared_buffers is higher than the kernel parameters allow, PostgreSQL won't start.
o Set vm.swappiness to a value under 5, so that the kernel avoids using swap unless it is really necessary.
• File System Tuning
o XFS (nobarrier,noatime,noexec,nodiratime)
o EXT3/4
• You can use ext3 or non-journaled file systems for logs.
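The relationship between shared_buffers and the kernel limits can be worked out numerically. The sketch below assumes a PostgreSQL release that allocates shared_buffers from System V shared memory (releases before 9.3), a 4 KB page size, and a 10% headroom for PostgreSQL's other shared structures; the headroom figure and the function name are assumptions, not values from the slides.

```python
def kernel_shm_settings(shared_buffers_bytes, page_size=4096, headroom_percent=10):
    """Return (shmmax, shmall) large enough for the given shared_buffers.

    shmmax is in bytes; shmall is in pages. headroom_percent is an assumed
    allowance for shared memory used beyond shared_buffers itself.
    """
    # -(-a // b) is ceiling division on integers, so results never round down.
    shmmax = -(-shared_buffers_bytes * (100 + headroom_percent) // 100)
    shmall = -(-shmmax // page_size)
    return shmmax, shmall


GIB = 1024 ** 3
shmmax, shmall = kernel_shm_settings(4 * GIB)  # shared_buffers = 4GB
print(f"kernel.shmmax = {shmmax}")
print(f"kernel.shmall = {shmall}")
print("vm.swappiness = 1")  # any value under 5 matches the slide's advice
```

The printed lines follow the key = value form used in /etc/sysctl.conf. Check the real page size with getconf PAGE_SIZE before using the numbers.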
Tunables
• WAL
o We strongly recommend separating the data from the pg_xlog (WAL) folder. For the WAL files we strongly recommend the XFS filesystem, because of the high number of fsync calls generated.
o checkpoint_segments: the right value depends strictly on the amount of data modified on the instance. Start with a moderate value and monitor the logs for HINT messages.
o WAL segment files are 16 MB each, so they fill quickly if you have batch processes adding or modifying data. You could easily need more than 30 on a busy server.
o We recommend not using the ext3 file system if you plan to keep the WAL in the same directory as the data, because ext3 handles fsync calls inefficiently.
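When sizing the pg_xlog volume, it helps to estimate how many 16 MB segments a given checkpoint_segments value can produce. The sketch below uses the upper bound given in the PostgreSQL 9.x documentation, (2 + checkpoint_completion_target) * checkpoint_segments + 1, or checkpoint_segments + wal_keep_segments + 1 if that is larger. The default arguments are assumptions for illustration, and the function names are hypothetical.

```python
import math

SEGMENT_MB = 16  # size of one WAL segment file


def max_wal_segments(checkpoint_segments,
                     checkpoint_completion_target=0.5,
                     wal_keep_segments=0):
    """Upper bound on WAL segment files normally present in pg_xlog."""
    by_checkpoints = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    by_keep = checkpoint_segments + wal_keep_segments + 1
    return math.ceil(max(by_checkpoints, by_keep))


def pg_xlog_mb(checkpoint_segments, **kwargs):
    """Approximate peak pg_xlog size in MB for the given settings."""
    return max_wal_segments(checkpoint_segments, **kwargs) * SEGMENT_MB


print(max_wal_segments(30))  # 76
print(pg_xlog_mb(30))        # 1216
```

So the "more than 30" case mentioned above can occupy roughly 1.2 GB of WAL, which is worth allowing for when choosing the size of a dedicated WAL volume.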
Tunables
• Memory Tuning
o shared_buffers is the most important and the most difficult memory variable to tune. A quick starting point is ¼ of your RAM.
• PGTune is a Python script that recommends a configuration according to the hardware on your server.
o https://github.com/gregs1104/pgtune/archive/master.zip
Monitoring
• Use the CloudWatch service to monitor:
o checkpoint_segments warnings
o Number of connections
o Memory usage and load average
o Slow queries
o Replication lag
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
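Replication lag is the one item in this list that needs a calculation before it can be published as a custom metric. A minimal sketch follows, assuming the two WAL locations were already fetched with pg_current_xlog_location() on the master and pg_last_xlog_replay_location() on the replica, and assuming the contiguous WAL numbering used since PostgreSQL 9.3 (on older releases the result can be slightly off across log-file boundaries). The function names are hypothetical, and publishing the value to CloudWatch is left out.

```python
def lsn_to_bytes(lsn):
    """Convert a WAL location such as '16/B374D848' to an absolute byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(master_lsn, replica_lsn):
    """Bytes of WAL the replica still has to replay (never negative)."""
    return max(0, lsn_to_bytes(master_lsn) - lsn_to_bytes(replica_lsn))


# Example locations; real values come from the two SQL functions named above.
print(replication_lag_bytes("0/3000060", "0/3000000"))  # 96
print(replication_lag_bytes("1/0", "0/FFFFFFFF"))        # 1
```

A lag in bytes is easier to alarm on than a raw location string, since it can be compared directly against a threshold.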
Security
• Disk Encryption
o Filesystem or OS tools
• Row level Encryption
o pgcrypto
• SSL
• Authentication and Network
• And IAM!!
Lessons Learned
• Use the best practices mentioned earlier
• Use Provisioned IOPS
• AWS Enterprise Support is definitely worth the cost
• Inventory Management is underrated – it’s magic!
• Trusted Advisor is much better than it used to be
• AWS Product Lifecycle
o Starts off not so good
o Gets LOTS better
• Hard to keep up to date with every feature of every product
• Slides will be made available here:
o http://aws.amazon.com/ebs/webinars/
• Benchmarking Postgres with EBS 4000 IOPS/volume:
o http://palominodb.com/blog/2013/05/08/benchmarking-postgres-aws-4000-piops-ebs-instances
• Creating consistent EBS snapshots with MySQL and XFS on EC2:
o http://alestic.com/2009/09/ec2-consistent-snapshot
• Understanding Amazon EBS Availability and Performance:
o http://www.slideshare.net/AmazonWebServices/understanding-ebs-availabilityandperformance
• Benchmarking EBS performance:
o http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html
Get started on Provisioned IOPS today! aws.amazon.com/ebs
Questions: e-mail: shameemj@amazon.com