Dumb Simple PostgreSQL Performance (NYCPUG)

Dumb Simple PostgreSQL Performance Joshua D. Drake Command Prompt, Inc. United States PostgreSQL Software in the Public Interest [email protected] @cmdpromptinc +joshua

description

A 45-minute talk discussing various in-production performance enhancements for PostgreSQL. We touch on hard drives, including SSD and RAID. We also discuss memory, PostgreSQL settings, and various other topics such as DAS vs. NAS vs. SAN.

Transcript of Dumb Simple PostgreSQL Performance (NYCPUG)

Page 1: Dumb Simple PostgreSQL Performance (NYCPUG)

Dumb Simple PostgreSQL Performance

Joshua D. Drake
Command Prompt, Inc.
United States PostgreSQL
Software in the Public Interest

[email protected] @cmdpromptinc +joshua

Page 2: Dumb Simple PostgreSQL Performance (NYCPUG)

I assume you all have

An Android, iPhone or Windows (really?) Phone?

Page 3: Dumb Simple PostgreSQL Performance (NYCPUG)

A moment of silence

I would like to take a moment to observe a moment of silence in honor:

Page 4: Dumb Simple PostgreSQL Performance (NYCPUG)

Donation to PgUS

Of all of you donating to PgUS:

https://www.postgresql.us/donate

Page 5: Dumb Simple PostgreSQL Performance (NYCPUG)

Start with Hard Drives

Hard drives are the slowest part of the system.

Page 6: Dumb Simple PostgreSQL Performance (NYCPUG)

Rules of the Hard Drive

How fast the data can be retrieved or written to disk is the single largest bottleneck you will experience.

Rule #1:

Thou shall have a hardware RAID controller with BBU

Rule #2:

There are only two RAID levels 1 and 10.

Rule #3:

It is better to purchase 14 small drives than 7 big drives.

Page 7: Dumb Simple PostgreSQL Performance (NYCPUG)

BBU

Battery Backup Unit

Used on good RAID cards in case of power outage or sudden crash. Allows for storage of pending writes until the machine comes back online. A requirement if you are running any kind of cache on the RAID.

Page 8: Dumb Simple PostgreSQL Performance (NYCPUG)

RAID 1

Redundancy through use of mirror

Increased performance (sometimes) through shared or partitioned reads

If you have enough spindles, RAID 1 is great for pg_xlog.

Page 9: Dumb Simple PostgreSQL Performance (NYCPUG)

RAID 1 + 0

Minimum 4 Spindles

Increased performance through use of stripe

Increased reliability through use of mirror

Page 10: Dumb Simple PostgreSQL Performance (NYCPUG)

Hard Drive Technology

SATA

SATA is fine, but you need at least twice as many spindles to get the same performance as SAS. If you need a lot of space, the cost per megabyte can't be beat.

SAS

The workhorse of the hard drive industry. Reasonably priced and high performance.

SSD

A relative newcomer, SSD is extremely fast and fairly expensive. Higher potential for two-drive failure in RAID configurations. Ensure that it is power-failure safe. Check write lifetime.

Page 11: Dumb Simple PostgreSQL Performance (NYCPUG)

Power Loss Safe SSD

Intel

320: http://www.intel.com/content/www/us/en/solid-state-drives/ssd-320-brief.html
710: http://www.intel.com/content/www/us/en/solid-state-drives/ssd-710-brief.html
S3700: http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-dc-s3700-series.html

OCZ R Series

RM84/88: http://ocz.com/enterprise/z-drive-r4-pcie-ssd/rs-specifications

Samsung

Samsung SM1625

Crucial [1]

M500 series: http://www.micron.com/products/solid-state-storage/client-ssd#m500

[1] Best price / capacity

Page 12: Dumb Simple PostgreSQL Performance (NYCPUG)

DAS vs NAS vs SAN

●DAS is almost always faster
●DAS is almost always more cost effective
●DAS can be just as scalable (see Dell MD1220s)
●DAS can be just as manageable

●NAS is expensive
●NAS is not as reliable (for PostgreSQL) as it generally uses something like NFS
●NAS is highly configurable
●NAS is highly manageable
●NAS is a shared resource

●SAN is expensive
●Generally uses iSCSI
●Limited by network bandwidth, which is almost always slower (excluding 10Gb) than DAS
●SAN is highly configurable
●SAN is highly manageable
●SAN is a shared resource

Page 13: Dumb Simple PostgreSQL Performance (NYCPUG)

Lots of memory

●PostgreSQL is efficient.

●Memory is cheap (330.00 for 32GB)

●Most data sets are less than 4 GB.

●If you have more memory than data, your active data set can remain in the file system cache and/or shared_buffers.

Page 14: Dumb Simple PostgreSQL Performance (NYCPUG)

Processor

●PostgreSQL is process based.

●AMD shines in this arena.

Page 15: Dumb Simple PostgreSQL Performance (NYCPUG)

Upgrade to 9.2

Source: http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html

Page 16: Dumb Simple PostgreSQL Performance (NYCPUG)

Linux Kernel

If you are running kernel 3.2 through 3.8:

Upgrade, NOW!

Page 17: Dumb Simple PostgreSQL Performance (NYCPUG)

Numbers don't lie (before)

                CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:01 AM     all     30.91      0.00      5.66     40.05      0.00     23.38
08:55:02 AM     all     29.32      0.00      5.10     39.66      0.00     25.92
09:05:02 AM     all     31.71      0.00      6.24     40.99      0.00     21.06
09:15:01 AM     all     32.45      0.00      6.59     46.74      0.00     14.21
09:25:01 AM     all     20.62      0.00      5.39     60.00      0.00     14.00
09:35:01 AM     all     31.03      0.00      3.61     33.95      0.00     31.41
09:45:01 AM     all     36.54      0.00      3.22     34.13      0.00     26.11
09:55:02 AM     all     40.17      0.00      3.66     30.98      0.00     25.19
10:05:01 AM     all     33.49      0.00      3.04     32.28      0.00     31.19
10:15:01 AM     all     48.63      0.00      2.87     25.50      0.00     23.00
10:25:01 AM     all     51.34      0.00      3.56     26.06      0.00     19.04
10:35:01 AM     all     39.41      0.00      3.44     29.86      0.00     27.29
10:45:02 AM     all     36.07      0.00      8.79     30.94      0.00     24.20
10:55:03 AM     all     38.04      0.00      7.98     32.98      0.00     21.01
11:05:11 AM     all     39.25      0.00      8.81     36.75      0.00     15.19
11:15:02 AM     all     35.19      0.00      8.76     41.98      0.00     14.07
11:25:03 AM     all     38.21      0.00      9.65     38.86      0.00     13.28
11:35:02 AM     all     42.92      0.00     11.66     34.28      0.00     11.14
11:45:02 AM     all     39.40      0.00      9.96     39.03      0.00     11.61
11:55:01 AM     all     28.72      0.00      3.27     36.32      0.00     31.69

Page 18: Dumb Simple PostgreSQL Performance (NYCPUG)

Numbers don't lie (3.9.x)

                CPU     %user     %nice   %system   %iowait    %steal     %idle
08:35:02 AM     all     40.08      0.00      4.46     10.66      0.00     44.80
08:45:01 AM     all     38.80      0.00      3.94      7.96      0.00     49.29
08:55:01 AM     all     31.48      0.00      3.03      2.58      0.00     62.91
09:05:01 AM     all     32.18      0.00      3.09      3.86      0.00     60.87
09:15:01 AM     all     26.71      0.00      2.39      3.52      0.00     67.39
09:25:01 AM     all     30.49      0.00      3.10      2.80      0.00     63.61
09:35:01 AM     all     32.50      0.00      3.49      3.42      0.00     60.60
09:45:01 AM     all     36.76      0.00      3.85      6.39      0.00     53.01
09:55:01 AM     all     44.45      0.00      4.63      9.23      0.00     41.69
10:05:02 AM     all     38.39      0.00      4.28      8.60      0.00     48.72
10:15:01 AM     all     33.57      0.00      3.53      4.10      0.00     58.80
10:25:01 AM     all     29.42      0.00      2.96      3.16      0.00     64.45
10:35:01 AM     all     32.90      0.00      3.37      5.33      0.00     58.40
10:45:01 AM     all     34.56      0.00      3.62      4.32      0.00     57.50
10:55:01 AM     all     34.84      0.00      3.37      4.27      0.00     57.52
11:05:02 AM     all     38.30      0.00      4.05      7.56      0.00     50.08
11:15:01 AM     all     36.80      0.00      3.54      9.51      0.00     50.16
11:25:01 AM     all     34.79      0.00      3.82      8.17      0.00     53.21
11:35:01 AM     all     32.68      0.00      3.07      4.97      0.00     59.28
11:45:02 AM     all     31.77      0.00      3.45      6.07      0.00     58.72
11:55:01 AM     all     33.58      0.00      3.92      6.39      0.00     56.10
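To compare the two sar captures above without eyeballing columns, a small script can pull out %iowait and average it. This is only a sketch: the helper names (parse_sar_cpu, mean_iowait) are made up for illustration, and only the first two comparable rows from each capture are pasted in to keep it short.

```python
from statistics import mean

def parse_sar_cpu(text):
    """Parse `sar -u` style rows (the 'all' CPU lines) into dicts."""
    columns = ["user", "nice", "system", "iowait", "steal", "idle"]
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Data rows look like: 08:45:01 AM all 30.91 0.00 5.66 40.05 0.00 23.38
        if len(parts) != 9 or parts[2] != "all":
            continue  # skips the header line and anything unexpected
        row = {"time": f"{parts[0]} {parts[1]}"}
        row.update({name: float(v) for name, v in zip(columns, parts[3:])})
        rows.append(row)
    return rows

def mean_iowait(rows):
    """Average %iowait across the parsed samples."""
    return mean(r["iowait"] for r in rows)

# Two rows from each slide; paste the full captures for a real comparison.
BEFORE = """
                CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:01 AM     all     30.91      0.00      5.66     40.05      0.00     23.38
08:55:02 AM     all     29.32      0.00      5.10     39.66      0.00     25.92
"""
AFTER = """
                CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:01 AM     all     38.80      0.00      3.94      7.96      0.00     49.29
08:55:01 AM     all     31.48      0.00      3.03      2.58      0.00     62.91
"""

before_rows = parse_sar_cpu(BEFORE)
after_rows = parse_sar_cpu(AFTER)
print(f"mean %iowait before: {mean_iowait(before_rows):.2f}")
print(f"mean %iowait after:  {mean_iowait(after_rows):.2f}")
```

The same parser works on a whole day of sar output, as long as the lines keep the 12-hour timestamp format shown on the slides.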

Page 19: Dumb Simple PostgreSQL Performance (NYCPUG)

VM settings

●vm.dirty_background_ratio
●vm.dirty_ratio
●vm.dirty_background_bytes
●vm.dirty_bytes
●I can't say it any better than Greg Smith:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Page 20: Dumb Simple PostgreSQL Performance (NYCPUG)

PostgreSQL memory settings

● shared_buffers

● work_mem

● maintenance_work_mem

● effective_cache_size

Page 21: Dumb Simple PostgreSQL Performance (NYCPUG)

What are shared_buffers

●The working cache of all hot tuples (and index entries) within PostgreSQL.

●Pre-allocated cache (buffers).

●On Linux, raise kernel.shmmax in sysctl.conf so the allocation fits.

●Use 20% of available memory (up to 40%, as of 9.2 your mileage may vary)

●Watch out for IO Storms (extremely rare on 9.x+)
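The 20% to 40% rule of thumb is easy to turn into numbers. The sketch below is only arithmetic on the percentages from the slide; the function name and the 32 GB server are hypothetical, and the result is a starting point to test, not a recommendation.

```python
def shared_buffers_range_mb(total_ram_mb, low=0.20, high=0.40):
    """Return (starting point, upper bound) for shared_buffers in MB.

    low/high are the 20% and 40% figures from the slide.
    """
    return int(total_ram_mb * low), int(total_ram_mb * high)

ram_mb = 32 * 1024  # hypothetical server with 32 GB of RAM
start, ceiling = shared_buffers_range_mb(ram_mb)
print(f"shared_buffers = {start}MB   # starting point, 20% of RAM")
print(f"# upper bound to test toward: {ceiling}MB (40% of RAM)")
```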

Page 22: Dumb Simple PostgreSQL Performance (NYCPUG)

What is work_mem

●The working memory available for work operations (sorts) before PostgreSQL spills to temporary files on disk.
●Be aware of it, not afraid of it.
●Set a reasonable amount globally.
●Use per transaction for aggressive allocation.
●Use EXPLAIN ANALYZE to see if you are overflowing.

Page 23: Dumb Simple PostgreSQL Performance (NYCPUG)

Example EXPLAIN ANALYZE

                                QUERY PLAN
--------------------------------------------------------------------------
 Sort  (cost=0.02..0.03 rows=1 width=0) (actual time=2270.744..2588.341 rows=1000000 loops=1)
   Sort Key: (generate_series(1, 1000000))
   Sort Method: external merge  Disk: 13696kB
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.006..144.720 rows=1000000 loops=1)
 Total runtime: 3009.218 ms
(5 rows)
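The line to look for in the plan above is "Sort Method: external merge Disk: 13696kB": the sort did not fit in work_mem and went to disk. A minimal sketch for spotting that in saved EXPLAIN ANALYZE output follows; the function name is hypothetical, and it only matches the sort-spill wording shown on the slide (plus "external sort", which PostgreSQL also prints).

```python
import re

# Matches e.g. "Sort Method: external merge  Disk: 13696kB"
SPILL_RE = re.compile(r"Sort Method:\s+external (?:merge|sort)\s+Disk:\s+(\d+)kB")

def sort_spills_kb(plan_text):
    """Return the on-disk size (kB) of every sort that spilled."""
    return [int(kb) for kb in SPILL_RE.findall(plan_text)]

PLAN = """
 Sort  (cost=0.02..0.03 rows=1 width=0) (actual time=2270.744..2588.341 rows=1000000 loops=1)
   Sort Key: (generate_series(1, 1000000))
   Sort Method: external merge  Disk: 13696kB
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.006..144.720 rows=1000000 loops=1)
"""

for kb in sort_spills_kb(PLAN):
    print(f"sort spilled {kb} kB to disk; consider a larger work_mem for this query")
```

An in-memory sort generally needs more memory than the on-disk size reported, so treat the number as a lower bound when raising work_mem for the session.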

Page 24: Dumb Simple PostgreSQL Performance (NYCPUG)

What is maintenance_work_mem

The amount of memory (RAM) allowed for maintenance tasks before PostgreSQL spills to disk. Typical tasks are ANALYZE, VACUUM, CREATE INDEX, REINDEX.

Without maintenance, performance will decrease.

Page 25: Dumb Simple PostgreSQL Performance (NYCPUG)

maintenance_work_mem

Set to a reasonable amount for autovacuum.

Use SET for per session changes such as CREATE INDEX:

SET maintenance_work_mem to '1GB';
CREATE INDEX foo ON bar(baz);
RESET maintenance_work_mem;

Page 26: Dumb Simple PostgreSQL Performance (NYCPUG)

What is effective_cache_size

A hint to the PostgreSQL planner about how much of the database will be cached. This is not an allocation setting.

Page 27: Dumb Simple PostgreSQL Performance (NYCPUG)

effective_cache_size

Take into account shared_buffers

             total       used       free     shared    buffers     cached
Mem:       6126208    3168356    2957852          0     480884    1258304

% of cached + shared_buffers = effective_cache_size

● % depends on workload. Generally between 40% and 70%

● Can be used to encourage index scans; use higher than normal amounts if you have fast IO.
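The formula above can be worked through with the free output from the slide. This sketch reads it as (fraction of cached) + shared_buffers, which is one interpretation of the slide and an assumption here; the 1 GB shared_buffers value and the function name are hypothetical.

```python
def effective_cache_size_kb(cached_kb, shared_buffers_kb, fraction):
    """Estimate effective_cache_size in kB.

    Assumption: the slide's formula means fraction * cached + shared_buffers.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(cached_kb * fraction) + shared_buffers_kb

cached_kb = 1258304               # 'cached' column from the free output on the slide
shared_buffers_kb = 1024 * 1024   # hypothetical shared_buffers of 1 GB

for fraction in (0.4, 0.5, 0.7):  # the slide suggests 40% to 70% depending on workload
    size_kb = effective_cache_size_kb(cached_kb, shared_buffers_kb, fraction)
    print(f"{fraction:.0%} of cached -> effective_cache_size = {size_kb // 1024}MB")
```

Because this setting only informs the planner, trying a few values per session and comparing plans is cheap.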

Page 28: Dumb Simple PostgreSQL Performance (NYCPUG)

Let's talk I/O

log_checkpoints

checkpoint_timeout

checkpoint_completion_target

checkpoint_segments

wal_sync_method

synchronous_commit

Page 29: Dumb Simple PostgreSQL Performance (NYCPUG)

log_checkpoints

By default this is off. Turn it on to correlate checkpoints with spikes in %iowait from sar.

Page 30: Dumb Simple PostgreSQL Performance (NYCPUG)

checkpoint_timeout

The amount of time PostgreSQL will wait before it forces a checkpoint. Properly configured, it reduces IO utilization. Set to 60 minutes. It is affected by:

● checkpoint_segments
● checkpoint_completion_target

Page 31: Dumb Simple PostgreSQL Performance (NYCPUG)

checkpoint_completion_target

This parameter is used to reduce spikes in IO by completing a checkpoint over a period of time.

Do not change this parameter; increase checkpoint_timeout instead.

Page 32: Dumb Simple PostgreSQL Performance (NYCPUG)

checkpoint_segments

The number of transaction logs that will be utilized before a checkpoint is forced. Each segment is 16 MB. The default is 3. Use checkpoint_warning to see if you need more.

Change to at least 10.

Use checkpoint_warning and logging to get accurate setting.
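The segment math is worth doing before picking a number. The sketch below multiplies checkpoint_segments by the 16 MB segment size from the slide. The upper bound on WAL files uses the rule of thumb from the PostgreSQL 9.x documentation as I understand it, (2 + checkpoint_completion_target) * checkpoint_segments + 1, so treat that part as an assumption; the function names are made up.

```python
SEGMENT_MB = 16  # size of one WAL segment, per the slide

def wal_between_checkpoints_mb(checkpoint_segments):
    """WAL volume (MB) that forces a checkpoint."""
    return checkpoint_segments * SEGMENT_MB

def max_wal_files(checkpoint_segments, completion_target=0.5):
    """Approximate normal upper bound on WAL files kept in pg_xlog.

    Assumption: the 9.x documentation's rule of thumb,
    (2 + checkpoint_completion_target) * checkpoint_segments + 1.
    """
    return int((2 + completion_target) * checkpoint_segments + 1)

for segments in (3, 10):  # the default and the slide's suggested minimum
    files = max_wal_files(segments)
    print(
        f"checkpoint_segments={segments}: checkpoint forced every "
        f"{wal_between_checkpoints_mb(segments)} MB of WAL, "
        f"up to about {files} files ({files * SEGMENT_MB} MB) on disk"
    )
```

At the default of 3, a checkpoint is forced after only 48 MB of WAL, which is why a busy system checkpoints far more often than checkpoint_timeout suggests.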

Page 33: Dumb Simple PostgreSQL Performance (NYCPUG)

wal_sync_method

The type of fsync that will be called to flush file modifications to disk. Leave commented to have PostgreSQL figure it out. On Linux it should look like:

postgres=# show wal_sync_method ;
 wal_sync_method
-----------------
 fdatasync

Page 34: Dumb Simple PostgreSQL Performance (NYCPUG)

synchronous_commit

Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a "success" indication to the client.

Depends on application. Turn off for faster commits. Low risk of lost commits (but not integrity).

Required to be on if you want synchronous replication.

Page 35: Dumb Simple PostgreSQL Performance (NYCPUG)

Let's talk brains

● default_statistics_target
● seq_page_cost
● random_page_cost
● cpu_operator_cost
● cpu_tuple_cost

Page 36: Dumb Simple PostgreSQL Performance (NYCPUG)

default_statistics_target

An arbitrary value used to determine the volume of statistics collected on a relation. The larger the value, the longer ANALYZE takes, but generally the better the plan. Can be set per column.

Page 37: Dumb Simple PostgreSQL Performance (NYCPUG)

default_statistics_target

set default_statistics_target to 100;
pggraph_2_2=# analyze verbose pggraph_indexrollup;
INFO: analyzing "aweber_shoggoth.pggraph_indexrollup"
INFO: "pggraph_indexrollup": scanned 30000 of 1448084 pages, containing 1355449 live rows and 0 dead rows; 30000 rows in sample, 65426800 estimated total rows
ANALYZE

Page 38: Dumb Simple PostgreSQL Performance (NYCPUG)

default_statistics_target

set default_statistics_target to 300;
pggraph_2_2=# analyze verbose pggraph_indexrollup;
INFO: analyzing "aweber_shoggoth.pggraph_indexrollup"
INFO: "pggraph_indexrollup": scanned 90000 of 1448084 pages, containing 4066431 live rows and 137 dead rows; 90000 rows in sample, 65428152 estimated total rows
ANALYZE
pggraph_2_2=#
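The two ANALYZE runs above show the pattern: the sample is 300 rows per unit of statistics target (30000 rows at 100, 90000 rows at 300). A short sketch of that arithmetic, with hypothetical function names, makes it easy to see how small the sample is relative to the table.

```python
ROWS_PER_TARGET = 300  # ANALYZE samples 300 rows per unit of statistics target

def analyze_sample_rows(statistics_target):
    """Rows ANALYZE will sample for a given statistics target."""
    return ROWS_PER_TARGET * statistics_target

def sampled_fraction(statistics_target, estimated_total_rows):
    """Share of the table that ends up in the sample."""
    return analyze_sample_rows(statistics_target) / estimated_total_rows

total_rows = 65426800  # estimated total rows from the first ANALYZE output
for target in (100, 300):
    rows = analyze_sample_rows(target)
    print(f"target={target}: {rows} rows sampled, "
          f"{sampled_fraction(target, total_rows):.4%} of the table")
```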

Page 39: Dumb Simple PostgreSQL Performance (NYCPUG)

Increasing per column

ALTER TABLE foo ALTER COLUMN bar SET STATISTICS 120;

Page 40: Dumb Simple PostgreSQL Performance (NYCPUG)

default_statistics_target

How do I know to increase it?

Look for a large gap between estimated and actual rows. Here the Seq Scan estimates rows=52 but returns rows=3600:

 Unique  (cost=264.65..282.65 rows=100 width=2) (actual time=8.665..12.460 rows=100 loops=1)
   ->  Sort  (cost=264.65..273.65 rows=3600 width=2) (actual time=8.664..10.423 rows=3600 loops=1)
         Sort Key: one
         Sort Method: quicksort  Memory: 265kB
         ->  Seq Scan on bar  (cost=0.00..52.00 rows=52 width=2) (actual time=0.007..1.894 rows=3600 loops=1)
 Total runtime: 12.553 ms
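Checking estimated against actual rows by hand gets tedious on large plans. The sketch below scans EXPLAIN ANALYZE text, one plan node per line, and flags nodes where the two differ by a chosen factor. The function name and the factor of 10 are arbitrary choices for illustration, not a PostgreSQL threshold.

```python
import re

# One plan node per line: "... (cost=A..B rows=EST width=W) (actual time=C..D rows=ACT loops=N)"
EST_ACT_RE = re.compile(
    r"^\s*(?:->\s*)?(?P<node>.+?)\s+\(cost=\S+ rows=(?P<est>\d+) width=\d+\)"
    r"\s+\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)

def misestimates(plan_text, factor=10):
    """Return (node, estimated_rows, actual_rows) where they differ by >= factor."""
    flagged = []
    for line in plan_text.splitlines():
        m = EST_ACT_RE.match(line)
        if not m:
            continue  # Sort Key, Sort Method, Total runtime, etc.
        est, act = int(m["est"]), int(m["act"])
        ratio = max(est, act) / max(min(est, act), 1)  # avoid dividing by zero
        if ratio >= factor:
            flagged.append((m["node"], est, act))
    return flagged

PLAN = """
 Unique  (cost=264.65..282.65 rows=100 width=2) (actual time=8.665..12.460 rows=100 loops=1)
   ->  Sort  (cost=264.65..273.65 rows=3600 width=2) (actual time=8.664..10.423 rows=3600 loops=1)
         Sort Key: one
         Sort Method: quicksort  Memory: 265kB
         ->  Seq Scan on bar  (cost=0.00..52.00 rows=52 width=2) (actual time=0.007..1.894 rows=3600 loops=1)
 Total runtime: 12.553 ms
"""

for node, est, act in misestimates(PLAN):
    print(f"{node}: planner expected {est} rows, got {act}")
```

Actual rows in EXPLAIN ANALYZE are reported per loop, so a node with loops greater than 1 needs the comparison adjusted; this sketch ignores that case.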

Page 41: Dumb Simple PostgreSQL Performance (NYCPUG)

seq_page_cost

Tells the planner how expensive a sequential scan is. It directly relates to random_page_cost.

Page 42: Dumb Simple PostgreSQL Performance (NYCPUG)

random_page_cost

Tells the planner the expense of fetching a random page. If using RAID 10, the value should be inverted with seq_page_cost (1.0 vs 4.0) or at least made the same.

This can hurt data analysis queries; look into cpu_tuple_cost as well.

Page 43: Dumb Simple PostgreSQL Performance (NYCPUG)

cpu_operator_cost

Sets the planner's estimate of the cost of processing each operator or function executed during a query. The default is 0.0025.

In real world tests, a setting of 0.5 generally provides a better plan. Test using SET in a session.

SET cpu_operator_cost TO 0.5;
EXPLAIN ANALYZE SELECT ...

Page 44: Dumb Simple PostgreSQL Performance (NYCPUG)

cpu_tuple_cost

Sets the planner's estimate of the cost of processing each row during a query. The default is 0.01.

In real world tests, a setting of 0.5 generally provides a better plan. Test using SET in a session.

SET cpu_tuple_cost TO 0.5;
EXPLAIN ANALYZE SELECT ...

Page 45: Dumb Simple PostgreSQL Performance (NYCPUG)

Design

Connection Pooling

Load Balancing

Page 46: Dumb Simple PostgreSQL Performance (NYCPUG)

Connection Pooling

● Reduces CPU utilization

● Keeps relations hot (in cache)

● Pgbouncer (no ssl):

– http://pgfoundry.org/projects/pgbouncer

● Pgpool2 (SSL capable on client to pool or pool to server):

– http://www.pgpool.net

Page 47: Dumb Simple PostgreSQL Performance (NYCPUG)

Load Balancing

Hot Standby + PgPool-II

Page 48: Dumb Simple PostgreSQL Performance (NYCPUG)

Autovacuum

Just say no to disabling. If you are experiencing "performance problems" due to vacuum, you are experiencing performance problems due to a lack of management/provisioning/planning. Just say no to disabling.

Page 49: Dumb Simple PostgreSQL Performance (NYCPUG)

Questions?

Questions / Comments?

Take your best shot!

I can speak about:

Hardware

Consulting

Open Source Communities

Non-Profits

PostgreSQL

Politics