Post on 11-Dec-2018
The Best of Bruce’s Postgres Slides
BRUCE MOMJIAN
This talk has the best slides from my 25+ Postgres presentations.Creative Commons Attribution License http://momjian.us/presentations
Last updated: November, 2018
1 / 26
Postgres System Architecture
utility
Plan
Optimal Path
Query
Postmaster
Postgres Postgres
Libpq
Main
Generate Plan
Traffic Cop
Generate Paths
Execute Plan
e.g. CREATE TABLE, COPYSELECT, INSERT, UPDATE, DELETE
Rewrite Query
Parse Statement
UtilityCommand
Storage ManagersCatalogUtilities
Access Methods Nodes / Lists
Mastering PostgreSQL Administration
2 / 26
Shared Memory Creation
postmaster postgres postgres
Program (Text)
Data
Program (Text)
Data
Shared Memory
Program (Text)
Data
Shared Memory Shared Memory
Stack Stack
fork()
Stack
Inside PostgreSQL Shared Memory
3 / 26
Shared Buffers and WAL
Recovery
fsync
fsync
Query and Checkpoint Operations Transaction Durability
BackendPostgres
BackendPostgres
BackendPostgres
PostgreSQL Shared Buffer Cache Write−Ahead Log
Kernel Disk Buffer Cache
Disk Blocks
PostgreSQL Performance Tuning
4 / 26
Backend Flowchart - Magnified
utility
Plan
Optimal Path
Query
Generate Plan
Traffic Cop
Generate Paths
Execute Plan
e.g. CREATE TABLE, COPYSELECT, INSERT, UPDATE, DELETE
Rewrite Query
Parse Statement
UtilityCommand
PostgreSQL Internals Through Pictures
5 / 26
Query Processing
FindExec: found "/var/local/postgres/./bin/postmaster" using argv[ 0]./bin/postmaster: BackendStartup: pid 3320 user postgres db test socket 5./bin/postmaster child[ 3320 ]: starting with (postgres −d99 −F −d99 −v131072 −p test )FindExec: found "/var/local/postgres/./bin/postgres" using argv[ 0]DEBUG: connection: host=[local] user=postgres database=testDEBUG: InitPostgresDEBUG: StartTransactionCommandDEBUG: query: SELECT firstname FROM friend WHERE age = 33;DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs ( "firstname" "lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}DEBUG: rewritten parse tree:DEBUG: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs ( "firstname" "lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}DEBUG: plan: { SEQSCAN :startup_cost 0.00 :total_cost 22.50 :rows 10 :width 12 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 }DEBUG: ProcessQueryDEBUG: CommitTransactionCommandDEBUG: proc_exit( 0)DEBUG: shmem_exit( 0)DEBUG: exit( 0)./bin/postmaster: reaping dead processes..../bin/postmaster: CleanupProc: pid 3320 exited with status 0
PostgreSQL Internals Through Pictures
6 / 26
EXPLAIN with Constants of Various Frequencies
l | count | lookup_letter---+-------+-----------------------------------------------------------------------p | 199 | Seq Scan on sample (cost=0.00..13.16 rows=199 width=2)s | 9 | Seq Scan on sample (cost=0.00..13.16 rows=9 width=2)c | 8 | Seq Scan on sample (cost=0.00..13.16 rows=8 width=2)r | 7 | Seq Scan on sample (cost=0.00..13.16 rows=7 width=2)t | 5 | Bitmap Heap Scan on sample (cost=4.29..12.76 rows=5 width=2)f | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)v | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)d | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)a | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)_ | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)u | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)e | 2 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)i | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)k | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)
Explaining the Postgres Query Optimizer
7 / 26
Deadlocks
SELECT pg_sleep(0.500); SELECT * FROM lockview1;pid | vxid | lock_type | lock_mode | granted | xid_lock | relname
-------+-------+---------------+------------------+---------+----------+------------11306 | 2/61 | transactionid | ExclusiveLock | t | 710 |11306 | 2/61 | relation | RowExclusiveLock | t | | i_lockdemo11306 | 2/61 | relation | RowExclusiveLock | t | | lockdemo11306 | 2/61 | tuple | ExclusiveLock | t | | lockdemo11306 | 2/61 | transactionid | ShareLock | f | 711 |11642 | 3/116 | transactionid | ExclusiveLock | t | 711 |11642 | 3/116 | relation | RowExclusiveLock | t | | i_lockdemo11642 | 3/116 | relation | RowExclusiveLock | t | | lockdemo11642 | 3/116 | tuple | ExclusiveLock | t | | lockdemo11642 | 3/116 | transactionid | ShareLock | f | 710 |
Unlocking the Postgres Lock Manager
8 / 26
MVCC Behavior
INSERT
UPDATE
DELETE
old (delete)
new (insert)
ExpCre 40
ExpCre 40
47
ExpCre 64
78
ExpCre 78
UPDATE is effectively a DELETE and an INSERT.MVCC Unmasked
9 / 26
MVCC Examples
Visible
Invisible
Invisible
Create−Only
Create & Expire
Visible
Visible
Internally, the creation xid is stored in the system column ’xmin’, and expire in ’xmax’.
Invisible
ExpCre 30
ExpCre 50
CreExp
30110
ExpCre 30
75
ExpCre 30
80
ExpCre 110
For simplicity, assume all othertransactions are committed.
Open Transactions: 25, 50, 75
The highest−numberedcommitted transaction: 100
Snapshot
Sequential Scan
MVCC Unmasked
10 / 26
Heap Page Structure
Page Header Item Item Item
Tuple
Tuple Tuple Special
8K
PostgreSQL Internals Through Pictures
11 / 26
Pg_upgrade: Restore Schema In New Cluster
1
2
3 6
5
4 7
8
9
pg_class
User Tables and Indexes
15 21 27
262014
13 19 25
241812
17 2311
10 16 22
System Tables and Indexes
clog
Old Cluster
1
2
3 6
5
4 7
8
9
pg_class
User Tables and Indexes
26
System Tables and Indexes
clog
New Cluster
pg_dumpall − −schema
17 23
24
25
27
18
10
11
12
13
14
15
16
19
20
21
22
Rapid Upgrades With Pg_Upgrade
12 / 26
Pg_upgrade: Copy User Heap/Index Files
1
2
3 6
5
4 7
8
9
pg_class
User Tables and Indexes
15 21 27
262014
13 19 25
241812
17 2311
10 16 22
System Tables and Indexes
clog
Old Cluster
1
2
3 6
5
4 7
8
9
pg_class
User Tables and Indexes
26
System Tables and Indexes
clog
New Cluster
1610
11
13
12
14
15
19
20
21
22
23
24
25
27
18
17
Rapid Upgrades With Pg_Upgrade
13 / 26
Continuous Archiving
02:0
011
:00
09:0
013
:00
WAL
WALWAL
ContinuousFile System−
Level Backup Archive (WAL)
The Magic of Hot Streaming Replication
14 / 26
Point-in-Time Recovery
WAL
WAL
17:0
017
:30
17:4
017
:55
ContinuousFile System−
Level Backup Archive (WAL)
WAL
The Magic of Hot Streaming Replication
15 / 26
Streaming Replication Setup
02:0
011
:00
09:0
013
:00
WAL
WALWAL
File System−
Level Backup Server
Standby
The Magic of Hot Streaming Replication
16 / 26
Streaming Replication in Operation
archivecommand command
Primary Standby
Network
restore
/pg_wal/pg_wal
Archive
Directory
WAL
The Magic of Hot Streaming Replication
17 / 26
Read Scaling Using Pgpool & Streaming Replication
SELECTINSERT, UPDATE,
DELETE to master
host
Slave SlaveMasterreplication
to any host
replicationstreaming
��������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������
pgpool
A full copy of the data exists on every node.
PostgreSQL Replication Solutions 18 / 26
Write Scaling Using FDW-Based Sharding
������������������������������������������������������������������������������������������������������������������������������������������������������������������������
������������������������������������������������������������������������������������������������������������������������������������������������������������������������
������������������������������������������������������������������������������������������������������������������������������������������������������������������������
������������������������������������������������������������������������������������������������������������������������������������������������������������������������
�����������������������������������������������������������������������������������������������������������������������������������������������������������������
�����������������������������������������������������������������������������������������������������������������������������������������������������������������
SQL Queries
PG FDW
Foreign Server Foreign Server Foreign Server
aggregates (10)joins (9.6)
sorts (9.6)aggs on partitions (11)
The Future of Postgres Sharding
19 / 26
Database Server Hardware Priorities
I/O
CPU
Memory
Database Hardware Selection Guidelines
20 / 26
Postgres’s Central Role
Extensions
NoSQL
Postgres
WarehouseData
Foreign DataWrappers
Window Functions
Data Paritioning
Bitmap Scans Sharding
Oracle
MongoDB
Easy DDL
JSON
PL/R
ISN
PostGIS
Making Postgres Central in Your Data Center
21 / 26
Use of the Contains Operator @>
\do @>List of operators
Schema | Name | Left arg type | Right arg type | Result type | Description------------+------+---------------+----------------+-------------+-------------pg_catalog | @> | aclitem[] | aclitem | boolean | containspg_catalog | @> | anyarray | anyarray | boolean | containspg_catalog | @> | anyrange | anyelement | boolean | containspg_catalog | @> | anyrange | anyrange | boolean | containspg_catalog | @> | box | box | boolean | containspg_catalog | @> | box | point | boolean | containspg_catalog | @> | circle | circle | boolean | containspg_catalog | @> | circle | point | boolean | containspg_catalog | @> | jsonb | jsonb | boolean | containspg_catalog | @> | path | point | boolean | containspg_catalog | @> | polygon | point | boolean | containspg_catalog | @> | polygon | polygon | boolean | containspg_catalog | @> | tsquery | tsquery | boolean | contains
Non-Relational Postgres
22 / 26
Postgres System Tables
starelid
staattnum
staop
pg_statistic
oprleft
oprright
oprresult
oprcom
oprnegate
oprlsortop
oprrsortop
oprcode
oprrest
oprjoin
pg_operator
typrelid
typelem
typinput
typoutput
typbasetype
pg_type
prolang
prorettype
pg_proc
pg_rewrite
ev_class
datlastsysoid
pg_database
tgfoid
tgrelid
pg_trigger
inhrelid
pg_inherits
inhparent
pg_language
pg_namespacepg_depend pg_shadow
pg_aggregate
aggfinalfn
aggtransfn
aggfnoid
aggtranstype
castsource
casttarget
pg_cast
castfunc
pg_description
pg_constraint
contypid
pg_conversion
conproc
amopopr
amopclaid
pg_attribute
indexrelid
attnum
amopclaid
atttypid
indrelid
pg_attrdef
pg_group
adrelid
pg_index
adnum
pg_am
pg_amop
amgettuple
reltoastidxid
aminsert
reltoastrelid amcostestimate
amproc
ambeginscan
pg_amproc
amrescan
relfilenode
amendscan
relam
ammarkpos
reltype amrestrpos
pg_class
ambuild
opcdeftype
ambulkdelete
pg_opclass
attrelid
PostgreSQL Internals Through Pictures
http://www.postgresql.org/docs/current/catalogs.html23 / 26
CTEs: Mixing Modification Commands
CREATE TEMPORARY TABLE old_orders (order_id INTEGER);
WITH source (order_id) AS (DELETE FROM orders WHERE name = ’my order’ RETURNING order_id
), source2 AS (DELETE FROM items USING source WHERE source.order_id = items.order_id
)INSERT INTO old_orders SELECT order_id FROM source;
Programming the SQL Way with Common Table Expressions
24 / 26
SSL ’Verify-Ca’ Is SecureFrom Spoofing
server.crt
X
root.crt
Database
Fake PostgreSQL
Database
Server
PostgreSQLSSL verify-ca
Invalid certificate
Server(no CA signature)Client
Database
Securing PostgreSQL From External Attack
25 / 26
Conclusion: Release Dates and Sizes After 2000
version | reldate | months | relnotes | lines | change | % change----------+------------+--------+----------+---------+--------+----------7.0 | 2000-05-08 | 11 | | 383270 | 51992 | 157.1 | 2001-04-13 | 11 | | 410500 | 27230 | 77.2 | 2002-02-04 | 10 | 250 | 394274 | -16226 | -37.3 | 2002-11-27 | 10 | 305 | 453282 | 59008 | 147.4 | 2003-11-17 | 12 | 263 | 508523 | 55241 | 128.0 | 2005-01-19 | 14 | 230 | 654437 | 145914 | 288.1 | 2005-11-08 | 10 | 174 | 630422 | -24015 | -38.2 | 2006-12-05 | 13 | 215 | 684646 | 54224 | 88.3 | 2008-02-04 | 14 | 223 | 762697 | 78051 | 118.4 | 2009-07-01 | 17 | 314 | 939098 | 176401 | 239.0 | 2010-09-20 | 15 | 237 | 999862 | 60764 | 69.1 | 2011-09-12 | 12 | 203 | 1069547 | 69685 | 69.2 | 2012-09-10 | 12 | 238 | 1148192 | 78645 | 79.3 | 2013-09-09 | 12 | 177 | 1195627 | 47435 | 49.4 | 2014-12-18 | 15 | 211 | 1261024 | 65397 | 59.5 | 2016-01-07 | 13 | 193 | 1340005 | 78981 | 69.6 | 2016-09-29 | 8 | 214 | 1380458 | 40453 | 3
PostgreSQL: Past, Present, and Future
http://momjian.us/presentations 26 / 26