Taming Performance Variability in PostgreSQL · 2019-09-21 · Hardware Upgrade to the Rescue 6x HW...

Page 1

Taming Performance Variability in PostgreSQL

Shawn S. Kim

Page 2

PostgreSQL Execution Model

[Diagram: a client exchanges requests and responses with PostgreSQL; PostgreSQL processes P1 to P4 each issue I/O through the Linux kernel to the storage device. Label: database performance.]

* PostgreSQL’s processes

- Backend (foreground)

- Checkpointer

- Autovacuum workers

- WAL writer

- Writer

- …

Page 3

Impact of Background Tasks

Test setup (AWS Seoul region):

- Client: r5.xlarge (4 vCPUs, 32 GiB memory), SysBench 1.0.15 with --oltp-table-size=10000000 --oltp-tables-count=24 (50GB dataset)
- Server: r5d.4xlarge (16 vCPUs, 128 GiB memory), local NVMe SSD, PostgreSQL 11.3 with 32GB shared_buffers, 64GB effective_cache_size, …

Page 4

Impact of Background Tasks

Checkpoint tuning* improves average throughput

* Basics of Tuning Checkpoints (https://www.2ndquadrant.com/en/blog/basics-of-tuning-checkpoints)

4x throughput at the cost of increased recovery time

Server: r5d.4xlarge, 300GB NVMe SSD, CentOS 7, PostgreSQL v11.3 (shared_buffers=32GB, effective_cache_size=64GB) Client: r5.xlarge

Page 5

Done..?

Page 6

Performance Variability

Checkpoint tuning makes PostgreSQL unpredictable

Server: r5d.4xlarge, 300GB NVMe SSD, CentOS 7, PostgreSQL v11.3 (shared_buffers=32GB, effective_cache_size=64GB) Client: r5.xlarge

The 99th-percentile response time was collected every 5 seconds for both cases

Plot annotations: 190ms, 9ms
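The measurement described on this slide (a 99th-percentile response time per 5-second window) can be sketched as follows. This is only an illustration: the function name, the `(timestamp, latency)` input shape, and the nearest-rank percentile definition are assumptions, not details taken from the talk or from SysBench.

```python
from collections import defaultdict


def p99_per_window(samples, window_s=5.0):
    """Group (timestamp_s, latency_ms) samples into fixed windows and
    return {window_start_s: p99_latency_ms} using the nearest-rank method."""
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        window_start = int(ts // window_s) * window_s
        buckets[window_start].append(latency_ms)

    result = {}
    for window_start in sorted(buckets):
        values = sorted(buckets[window_start])
        # Nearest rank: ceil(0.99 * n), computed with integers to avoid
        # floating-point rounding surprises.
        rank = (99 * len(values) + 99) // 100
        result[window_start] = values[rank - 1]
    return result
```

Reporting one value per short window, rather than one value for the whole run, is what makes latency spikes visible as spikes instead of being averaged away.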

Page 7

Default is better in terms of variability

Page 8

What’s the Problem?

CPU utilization was collected every 5 seconds using sar

iowait is the main bottleneck for the frequent checkpoint case

Page 9

What’s the Problem?

CPU utilization fluctuates heavily even without checkpoints

CPU utilization was collected every 5 seconds using sar

Page 10

What’s the Problem?

[Diagram: the I/O path as a stack of abstraction layers. PostgreSQL (buffer cache) calls read() and write(); below it are the caching layer (page cache), the file system layer, the block layer (I/O scheduler), and the storage device (firmware scheduler). Foreground (FG) and background (BG) I/Os are mixed and reordered on the way down.]

Background I/Os interfere with foreground I/Os inside Linux

Page 11

What’s the Problem?

[Diagram: the same abstraction layers (PostgreSQL, caching layer, file system layer, block layer, storage device), with FG and BG tasks connected by the labels I/O, lock, var, wait, and wake.]

Linux makes backends wait indirectly for background I/Os

I/O Priority Inversion

Page 12

Hardware Upgrade to the Rescue

6x HW scale up mostly resolves the variability

Server: r5d.24xlarge, RAID0 (900GB NVMe SSD x4), CentOS 7, PostgreSQL v11.3 (shared_buffers=192GB, effective_cache_size=384GB) Client: r5.xlarge

6x CPU cores + 6x DRAM + 10x storage throughput = 6x infra cost

The whole dataset fits in the PostgreSQL buffer

31.37 ms vs 9.56 ms

Page 13

Software solution..?

Page 14

AppOS is a PostgreSQL extension that provides a specialized file I/O stack

Page 15

AppOS Extension

Page 16

AppOS Extension

Page 17

Performance Variability with AppOS

AppOS extension efficiently resolves the variability

Server: r5d.4xlarge, 300GB NVMe SSD, CentOS 7, PostgreSQL v11.3 (shared_buffers=32GB, effective_cache_size=64GB) Client: r5.xlarge

Stable response without extra HW

31.37 ms vs 9.56 ms vs 5.77 ms

Page 18

Performance Variability with AppOS

AppOS extension efficiently resolves the variability

Server: r5d.4xlarge, 300GB NVMe SSD, CentOS 7, PostgreSQL v11.3 (shared_buffers=32GB, effective_cache_size=64GB) Client: r5.xlarge

Page 19

Performance Variability with AppOS

AppOS extension improves average throughput as well

Server: r5d.4xlarge, 300GB NVMe SSD, CentOS 7, PostgreSQL v11.3 (shared_buffers=32GB, effective_cache_size=64GB) Client: r5.xlarge

2x throughput with AppOS extension

Page 20

AppOS Internals

[Diagram labels: _PG_init; Postgres, syscall, Linux kernel (shown three times); prehook, syscall, posthook; appos core; file I/O; on-disk access.]

Page 21

AppOS Internals

[Diagram labels: POSIX file API; Postgres context classifier (checkpointer, autovacuum, backend, …); AppOS core API; virtual file system; page cache; context-aware I/O scheduler; block I/O; I/O API; cached, direct; Linux.]

Page 22

AppOS Internals

[Diagram: two panels, each showing reads and writes between the PostgreSQL shared buffer and a page cache whose pages are marked D or C. Linux page cache: file-level read/write lock, labeled BLOCK!!!. AppOS page cache: I/O-level range lock.]

AppOS uses range locks instead of one big file lock
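The difference between a file-level lock and an I/O-level range lock can be sketched as below. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, the lock is non-blocking (a real implementation would queue waiters), and nothing here is taken from the AppOS source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int        # byte offset, inclusive
    end: int          # byte offset, exclusive
    exclusive: bool   # True for a write, False for a read


class RangeLockTable:
    """Per-file table of held byte ranges (hypothetical, for illustration)."""

    def __init__(self):
        self.held = []

    def try_lock(self, offset, length, exclusive):
        """Return a Range handle if the lock is granted, else None."""
        new = Range(offset, offset + length, exclusive)
        for r in self.held:
            overlaps = new.start < r.end and r.start < new.end
            # Only overlapping ranges can conflict, and only if at least
            # one side is a writer. Readers share overlapping ranges.
            if overlaps and (new.exclusive or r.exclusive):
                return None
        self.held.append(new)
        return new

    def unlock(self, handle):
        self.held.remove(handle)
```

With a single file-level lock, every `try_lock` call would behave as if it covered the whole file, so a reader of one 8KB page would conflict with a writer of any other page. With ranges, only I/Os that touch the same bytes conflict.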

Page 23

AppOS Internals

[Diagram: the AppOS page cache (pages marked D and C) sits above the AppOS context-aware I/O scheduler, which passes backend reads and writeback writes to the Linux kernel and controls inflight I/Os.]

Example:

checkpointer: 200 shares
autovacuum: 200 shares
backend: 500 shares
writeback: 100 shares

AppOS schedules I/Os based on context and congestion
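One way to read the share values in the example above is as proportional weights: out of 1000 total shares, backend I/O gets 50%, checkpointer and autovacuum 20% each, and writeback 10% when all of them have work queued. The sketch below uses stride scheduling plus a cap on inflight I/Os to illustrate that idea. The share numbers come from the slide; the algorithm choice, the class name, and the `max_inflight` parameter are assumptions, not the AppOS implementation.

```python
from collections import Counter, deque

# Shares taken from the slide; everything else here is hypothetical.
SHARES = {"backend": 500, "checkpointer": 200, "autovacuum": 200, "writeback": 100}
STRIDE_BASE = 1000  # divisible by every share above, so strides are integers


class ShareScheduler:
    def __init__(self, shares, max_inflight):
        # A larger share means a smaller stride, so the context is picked more often.
        self.stride = {ctx: STRIDE_BASE // s for ctx, s in shares.items()}
        self.passes = {ctx: 0 for ctx in shares}
        self.queues = {ctx: deque() for ctx in shares}
        self.max_inflight = max_inflight  # crude stand-in for congestion control
        self.inflight = 0

    def submit(self, ctx, request):
        self.queues[ctx].append(request)

    def dispatch(self):
        """Return (ctx, request) for the next I/O, or None if nothing may be sent."""
        if self.inflight >= self.max_inflight:
            return None
        ready = [ctx for ctx, q in self.queues.items() if q]
        if not ready:
            return None
        ctx = min(ready, key=lambda c: self.passes[c])  # lowest pass value goes first
        self.passes[ctx] += self.stride[ctx]
        self.inflight += 1
        return ctx, self.queues[ctx].popleft()

    def complete(self):
        self.inflight -= 1


def run_demo(total=1000):
    """Keep every context backlogged and count who gets dispatched."""
    sched = ShareScheduler(SHARES, max_inflight=8)
    for ctx in SHARES:
        for i in range(total):
            sched.submit(ctx, i)
    counts = Counter()
    for _ in range(total):
        ctx, _request = sched.dispatch()
        counts[ctx] += 1
        sched.complete()
    return counts
```

In this sketch `run_demo()` dispatches in proportion to the shares while every queue is backlogged. A context with an empty queue is skipped, so its unused share goes to the others.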

Page 24

Use Cases (1)

AppOS makes PostgreSQL more predictable

* GCP Tokyo: n1-standard-4 ~ n1-standard-96 (300GB or 2TB SSD)
* SysBench 1.0.15 (client): n1-highcpu-8 (8 cores), 50GB initial dataset
* PostgreSQL 11.4: shared_buffers 25% of mem, max_wal_size 8GB

Real-time SLA, Replication lag, Autovacuum

Page 25

Use Cases (1)

Real-time SLA

* GCP Tokyo: n1-standard-4 ~ n1-standard-96 (300GB or 2TB SSD)
* SysBench 1.0.15 (client): n1-highcpu-8 (8 cores), 50GB initial dataset
* PostgreSQL 11.4: shared_buffers 25% of mem, max_wal_size 8GB

100 ms

Failure rate: 0.14% vs. 0.002%

Page 26

Use Cases (1)

Real-time SLA

* GCP Tokyo: n1-standard-4 ~ n1-standard-96 (300GB or 2TB SSD)
* SysBench 1.0.15 (client): n1-highcpu-8 (8 cores), 50GB initial dataset
* PostgreSQL 11.4: shared_buffers 25% of mem, max_wal_size 8GB

100 ms

Costs: $35,000 vs. $2,200

Page 27

Use Cases (1)

Autovacuum

Vacuum Analyze

Page 28

Use Cases (1)

Autovacuum

Aggressive Autovacuum:

autovacuum_vacuum_cost_delay=1s

autovacuum_vacuum_cost_limit=10000

Server: r5d.4xlarge, 300GB NVMe SSD, Ubuntu 16.04, PostgreSQL v11.3 (shared_buffers=32GB, max_wal_size=8GB) Client: c5.xlarge

Page 29

Use Cases (1)

Autovacuum

13,868 TPS vs 4,483 TPS vs 12,561 TPS

Server: r5d.4xlarge, 300GB NVMe SSD, Ubuntu 16.04, PostgreSQL v11.3 (shared_buffers=32GB, max_wal_size=8GB) Client: c5.xlarge

Page 30

Use Cases (1)

Replication lag

Test setup (AWS Seoul region):

- c5.xlarge: 4 vCPUs, 8 GiB memory
- Availability Zone A: r5d.4xlarge, 16 vCPUs, 128 GiB memory, NVMe SSD
- Availability Zone B: r5d.4xlarge, 16 vCPUs, 128 GiB memory, NVMe SSD

Page 31

Use Cases (1)

Replication lag

Server: r5d.4xlarge, 300GB NVMe SSD, Ubuntu 16.04, PostgreSQL v11.3 (shared_buffers=32GB, max_wal_size=64GB) Client: c5.xlarge

Page 32

Use Cases (1)

Replication lag

Server: r5d.4xlarge, 300GB NVMe SSD, Ubuntu 16.04, PostgreSQL v11.3 (shared_buffers=32GB, max_wal_size=64GB) Client: c5.xlarge

Standby

Page 33

Use Cases (2)

AppOS makes PostgreSQL cloud storage-native

Cloud block storage, Local SSD, Atomic write support

Page 34

Use Cases (2)

Cloud block storage

[Panels: PostgreSQL; MySQL/MariaDB]

Page 35

Use Cases (2)

Cloud block storage

[Diagram: Postgres shared buffer, AppOS page cache, and cloud block storage, annotated "e.g., 8KB * 16 pages" and "e.g., 128KB block".]
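The two example sizes on this slide line up arithmetically: 16 pages of 8KB are exactly one 128KB block. The sketch below shows how page numbers could be grouped by the block that contains them. The slide does not say how AppOS uses these sizes, so the grouping logic and all names are assumptions made for illustration only.

```python
PAGE_SIZE = 8 * 1024           # example page size from the slide
BLOCK_SIZE = 128 * 1024        # example block size from the slide
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE  # 16


def group_pages_by_block(dirty_pages):
    """Map each block number to the sorted page numbers that fall inside it."""
    blocks = {}
    for page_no in sorted(set(dirty_pages)):
        blocks.setdefault(page_no // PAGES_PER_BLOCK, []).append(page_no)
    return blocks


def block_byte_range(block_no):
    """Return the (start, end) byte offsets of a block, end exclusive."""
    start = block_no * BLOCK_SIZE
    return start, start + BLOCK_SIZE
```

For example, pages 0, 1, and 15 share block 0, while page 16 starts block 1, so a cache that tracks pages per block can issue block-aligned I/O instead of sixteen separate page-sized requests.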

Page 36

Use Cases (2)

Atomic write support

Page 37

Use Cases (2)

Atomic write support

* GCP Tokyo: n1-standard-32 (32 vCPUs, 120GB memory, 500GB 16KB SSD PD)
* SysBench 1.0.15 (client): n1-highcpu-8 (8 vCPUs), 50GB initial dataset
* PostgreSQL 9.6.12: shared_buffers 25% memory, max_wal_size 2GB

4x ~ 12x with atomic write support on Google Cloud

Page 38

Use Cases (2)

Local SSD as a memory extension

[Diagram: a Postgres server with a local SSD, connected over the network to network storage.]

- Local SSD: + Fast, - Ephemeral
- Network storage: + Durable, - Slow

Page 39

Use Cases (2)

Local SSD as a memory extension

[Diagram: three write policies between the local SSD and network storage, drawn with write, ack, async write, sync, and flush arrows.]

- Write through: + Durable, - Slow write
- Write back: + Fast write, - Data loss
- Flush on sync: + Durable, + Fast write, - Slow sync
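The trade-offs of the three policies can be made concrete with a small in-memory model. This is a sketch under assumptions: the two dictionaries stand in for the local SSD and network storage, the class and method names are hypothetical, and the asynchronous copy of the write-back path is left out so the data-loss case is easy to see. It is not the AppOS implementation.

```python
from enum import Enum


class Policy(Enum):
    WRITE_THROUGH = "write through"
    WRITE_BACK = "write back"
    FLUSH_ON_SYNC = "flush on sync"


class TieredStore:
    def __init__(self, policy):
        self.policy = policy
        self.local = {}     # local SSD: fast but ephemeral
        self.remote = {}    # network storage: durable but slow
        self.dirty = set()  # blocks present only on the local SSD

    def write(self, block, data):
        self.local[block] = data
        if self.policy is Policy.WRITE_THROUGH:
            self.remote[block] = data  # ack only after the durable write
        else:
            self.dirty.add(block)      # ack now, copy to the network later
        return "ack"

    def sync(self):
        if self.policy is Policy.FLUSH_ON_SYNC:
            # The sync call pays the network cost for all pending blocks.
            for block in sorted(self.dirty):
                self.remote[block] = self.local[block]
            self.dirty.clear()
        return "ack"

    def lose_local_ssd(self):
        """Simulate the ephemeral local SSD disappearing."""
        self.local.clear()
        self.dirty.clear()

    def read(self, block):
        if block in self.local:
            return self.local[block]
        return self.remote.get(block)
```

In this model, a block written and synced under write back is gone after `lose_local_ssd()`, while the same sequence under write through or flush on sync can still be read from network storage. The difference between the latter two is where the slow network write happens: on every write, or only at sync.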

Page 40

Use Cases (3)

AppOS can seamlessly work with PostgreSQL-derived DBs

Page 41

Use Cases (3)

TimescaleDB

https://blog.timescale.com/blog/timescaledb-vs-6a696248104e

* AWS Seoul: m5.2xlarge (8 cores, 32GB mem, 1000GB 300 IOPS EBS)
* SysBench 1.0.15 (client): m5.2xlarge (8 cores), 50GB initial dataset
* PostgreSQL 9.6.13 (Timescale 1.4.0): shared_buffers 8GB, max_wal_size 8GB

Page 42

https://apposha.io