Database Health Check
![Page 1: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/1.jpg)
Database Server Health Check
Josh Berkus, PostgreSQL Experts Inc., pgCon 2010
![Page 2: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/2.jpg)
DATABASE SERVER HELP 5¢
![Page 3: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/3.jpg)
Program of Treatment
● What is a Healthy Database?
● Know Your Application
● Load Testing
● Doing a database server checkup
  ● hardware
  ● OS & FS
  ● PostgreSQL
  ● application
● Common Ailments of the Database Server
![Page 4: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/4.jpg)
What is a Healthy Database Server?
![Page 6: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/6.jpg)
What is a Healthy Database Server?
● Response Times
  ● lower than required
  ● consistent & predictable
● Capacity for more
  ● CPU and I/O headroom
  ● low server load
![Page 7: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/7.jpg)
[Figure: median response time (0 to 30) against number of clients (25 to 250), with lines marking Max Response Time and Expected Load]
![Page 8: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/8.jpg)
What is an Unhealthy Database Server?
● Slow response times
● Inconsistent response times
● High server load
● No capacity for growth
![Page 9: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/9.jpg)
[Figure: median response time (0 to 30) against number of clients (25 to 250), with lines marking Max Response Time and Expected Load]
![Page 10: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/10.jpg)
A healthy database server is able to maintain consistent
and acceptable response times under expected loads with
margin for error.
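The definition above can be turned into a rough check on response-time samples collected at expected load. This is only a sketch: the function name, the 25% margin and the "worst case no more than 3x the median" rule for consistency are assumptions chosen for illustration, not figures from the talk.

```python
from statistics import median


def is_healthy(response_times_ms, required_ms, margin=0.25, max_spread=3.0):
    """Rough health check for samples taken at expected load.

    margin and max_spread are illustrative assumptions, not values from the slides.
    """
    if not response_times_ms:
        raise ValueError("need at least one response time sample")
    typical = median(response_times_ms)
    worst = max(response_times_ms)
    # Acceptable: even the slowest request leaves some margin for error.
    acceptable = worst <= required_ms * (1 - margin)
    # Consistent: the slowest request is not wildly slower than the typical one.
    consistent = worst <= typical * max_spread
    return acceptable and consistent
```

For example, `is_healthy([10, 12, 11, 13, 12], required_ms=20)` passes both checks, while a sample set with one 90 ms outlier against a 100 ms requirement fails the margin check.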
![Page 11: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/11.jpg)
[Figure: median response time (0 to 30) against number of clients (25 to 250)]
![Page 12: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/12.jpg)
Hitting The Wall
![Page 13: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/13.jpg)
CPUs Floored
Average:  CPU  %user  %system  %iowait  %idle
Average:  all  69.36     0.13    24.87   5.77
            0  88.96     0.09    10.03   1.11
            1  12.09     0.02    86.98   0.00
            2  98.90     0.00     0.00  10.10
            3  77.52     0.44     1.70  20.34

16:38:29 up 13 days, 22:10, 3 users, load average: 11.05, 9.08, 8.13
![Page 15: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/15.jpg)
IO Saturated
Device:      tps  MB_read/s  MB_wrtn/s
sde       414.33       0.40      38.15
sdf      1452.00      99.14      29.00

Average:  CPU  %user  %system  %iowait  %idle
Average:  all  34.75     0.13    58.75   6.37
            0   8.96     0.09    90.03   1.11
            1  12.09     0.02    86.98   0.00
            2  91.90     0.00     7.00  10.10
            3  27.52     0.44    51.70  20.34
![Page 16: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/16.jpg)
Out of Connections
FATAL: connection limit exceeded for non-superusers
![Page 17: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/17.jpg)
How close are you to the wall?
![Page 18: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/18.jpg)
The Checkup (full physical)
1. Analyze application
2. Analyze platform
3. Correct anything obviously wrong
4. Set up load test
5. Monitor load test
6. Analyze Results
7. Correct issues
![Page 19: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/19.jpg)
The Checkup (semi-annual)
1. Check response times
2. Check system load
3. Check previous issues
4. Check for Signs of Illness
5. Fix new issues
![Page 20: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/20.jpg)
Know your application!
![Page 21: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/21.jpg)
Application database usage
Which does your application do?
✔ small reads
✔ large sequential reads
✔ small writes
✔ large writes
✔ long-running procedures/transactions
✔ bulk loads and/or ETL
![Page 22: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/22.jpg)
What Color Is My Application?
● Web Application (Web)
  ● DB smaller than RAM
  ● 90% or more simple queries
● Online Transaction Processing (OLTP)
  ● DB slightly larger than RAM to 1TB
  ● 20-40% small data write queries
  ● Some long transactions and complex read queries
● Data Warehousing (DW)
  ● Large to huge databases (100GB to 100TB)
  ● Large complex reporting queries
  ● Large bulk loads of data
  ● Also called "Decision Support" or "Business Intelligence"
![Page 26: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/26.jpg)
What Color Is My Application?
● Web Application (Web)
  ● CPU-bound
  ● Ailments: idle connections/transactions, too many queries
● Online Transaction Processing (OLTP)
  ● CPU or I/O bound
  ● Ailments: locks, database growth, idle transactions, database bloat
● Data Warehousing (DW)
  ● I/O or RAM bound
  ● Ailments: database growth, longer running queries, memory usage growth
![Page 27: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/27.jpg)
Special features required?
● GIS
  ● heavy CPU for GIS functions
  ● lots of RAM for GIS indexes
● TSearch
  ● lots of RAM for indexes
  ● slow response time on writes
● SSL
  ● response time lag on connections
![Page 28: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/28.jpg)
Load Testing
![Page 29: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/29.jpg)
[Figure: requests per second (0 to 80) by time of day, 12:00 AM to 10:00 PM]
![Page 30: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/30.jpg)
[Figure: requests per second (0 to 80) by time of day, 12:00 AM to 10:00 PM, with a region labeled DOWNTIME]
![Page 31: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/31.jpg)
When preventing downtime, it is not average load which matters, it is peak load.
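A small sketch of why the average hides the problem. The hourly request rates below are made-up sample data, and `load_summary`, `exceeds_capacity` and the capacity figure are hypothetical names and values used only for illustration.

```python
from statistics import mean


def load_summary(requests_per_second):
    """Return (average, peak, peak-to-average ratio) for a series of samples."""
    if not requests_per_second:
        raise ValueError("need at least one sample")
    avg = mean(requests_per_second)
    peak = max(requests_per_second)
    ratio = peak / avg if avg else float("inf")
    return avg, peak, ratio


def exceeds_capacity(requests_per_second, capacity):
    """Return the indexes of samples where demand is above server capacity."""
    return [i for i, rps in enumerate(requests_per_second) if rps > capacity]


# Hypothetical hourly samples (requests per second), midnight through 11 PM.
hourly_rps = [5, 4, 3, 3, 4, 8, 15, 30, 45, 55, 60, 70,
              75, 65, 60, 55, 50, 45, 35, 25, 18, 12, 8, 6]

avg, peak, ratio = load_summary(hourly_rps)
overloaded_hours = exceeds_capacity(hourly_rps, capacity=60)
```

With this data the average is 31.5 requests per second, so a server sized for 60 looks comfortable on average, yet hours 11 through 13 still exceed it.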
![Page 32: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/32.jpg)
What to load test
● Load should be as similar as possible to your production traffic
● You should be able to create your target level of traffic
  ● better: incremental increases
● Test the whole application as well
  ● the database server may not be your weak point
![Page 33: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/33.jpg)
How to Load Test
1. Set up a load testing tool
you'll need test servers for this*
2. Turn on PostgreSQL, HW, application monitoring
all monitoring should start at the same time
3. Run the test for a defined time
1 hour is usually good
4. Collect and analyze data
5. Re-run at higher level of traffic
![Page 34: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/34.jpg)
Test Servers
● Must be as close as reasonable to production servers
  ● otherwise you don't know how production will be different
  ● there is no predictable multiplier
● Double them up as your development/staging or failover servers
● If your test server is much smaller, then you need to do a same-load comparison
![Page 35: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/35.jpg)
Tools for Load Testing
![Page 36: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/36.jpg)
Production Test
1. Determine the peak load hour on the production servers
2. Turn on lots of monitoring during that peak load hour
3. Analyze results
Pretty much your only choice without a test server.
![Page 37: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/37.jpg)
Issues with Production Test
● Not repeatable
− load won't be exactly the same ever again
● Cannot test target load
− just whatever happens to occur during that hour
− can't test incremental increases either
● Monitoring may hurt production performance
● Cannot test experimental changes
![Page 38: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/38.jpg)
The Ad-Hoc Test
● Get 10 to 50 coworkers to open several sessions each
● Have them go crazy on using the application
![Page 39: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/39.jpg)
Problems with Ad-Hoc Testing
● Not repeatable
  ● minor changes in response times may be due to changes in worker activity
● Labor intensive
  ● each test run shuts down the office
● Can't reach target levels of load
  ● unless you have a lot of coworkers
![Page 40: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/40.jpg)
Siege
● HTTP traffic generator
  ● all test interfaces must be addressable as URLs
  ● useless for non-web applications
● Simple to use
  ● create a simple load test in a few hours
● Tests the whole web application
  ● cannot test database separately
● http://www.joedog.org/index/siege-home
![Page 41: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/41.jpg)
pgReplay
● Replays your activity logs at variable speed
  ● get exactly the traffic you get in production
● Good for testing just the database server
● Can take time to set up
  ● need database snapshot, collect activity logs
  ● must already have production traffic
● http://pgreplay.projects.postgresql.org/
![Page 42: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/42.jpg)
tsung
● Generic load generator in Erlang
  ● a load testing kit rather than a tool
  ● Generate a tsung file from your activity logs using pgFouine and test the database
  ● Generate load for a web application using custom scripts
● Can be time consuming to set up
  ● but highly configurable and advanced
  ● very scalable - cluster of load testing clients
● http://tsung.erlang-projects.org/
![Page 43: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/43.jpg)
pgBench
● Simple micro-benchmark
  ● not like any real application
● Version 9.0 adds multi-threading, customization
  ● write custom pgBench scripts
  ● run against real database
● Fairly ad-hoc compared to other tools
  ● but easy to set up
● ships with PostgreSQL
![Page 44: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/44.jpg)
Benchmarks
● Many “real” benchmarks available
  ● DBT2, EAstress, CrashMe, DBT5, DBMonster, etc.
● Useful for testing your hardware
  ● not useful for testing your application
● Often time-consuming and complex
![Page 45: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/45.jpg)
Platform-specific
● Web framework or platform tests
  ● Rails: ActionController::PerformanceTest
  ● J2EE: OpenDemand, Grinder, many more
    – JBoss, BEA have their own tools
  ● Zend Framework Performance Test
● Useful for testing specific application performance
  ● such as performance of specific features, modules
● Not all platforms have them
![Page 46: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/46.jpg)
Flight-Check
● Attend the tutorial tomorrow!
![Page 47: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/47.jpg)
monitoring PostgreSQL during load test
logging_collector = on
log_destination = 'csvlog'
log_filename = 'load_test_1_%h'
log_rotation_age = 60min
log_rotation_size = 1GB
log_min_duration_statement = 0
log_connections = on
log_disconnections = on
log_temp_files = 100kB
log_lock_waits = on
![Page 48: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/48.jpg)
monitoring hardware during load test
sar -A -o load_test_1.sar 30 240
iostat or fsstat or zfs iostat
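The two trailing numbers in the sar command are the sampling interval in seconds and the number of samples, so `30 240` covers two hours. A minimal sketch of that arithmetic, with hypothetical helper names, in case you want to size the run to a different test length:

```python
import math


def sar_duration_seconds(interval_s, count):
    """Total wall-clock time covered by `sar ... <interval> <count>`."""
    return interval_s * count


def sar_count_for(duration_s, interval_s):
    """Number of samples needed to cover duration_s, rounding up."""
    return math.ceil(duration_s / interval_s)


def sar_command(outfile, interval_s, count):
    """Build the argument list for the sar invocation shown on the slide."""
    return ["sar", "-A", "-o", outfile, str(interval_s), str(count)]


# The slide's command: 30-second samples, 240 of them.
two_hours = sar_duration_seconds(30, 240)
# Samples needed to cover exactly a one-hour test at the same interval.
one_hour_count = sar_count_for(3600, 30)
```

Starting sar a little before the load generator and letting it run past the end makes it easier to line its samples up with the PostgreSQL and application logs.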
![Page 49: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/49.jpg)
monitoring application during load test
● Collect response times
  ● with timestamp
  ● with activity
● Monitor hardware and utilization
  ● activity
  ● memory & CPU usage
● Record errors & timeouts
![Page 50: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/50.jpg)
Checking Hardware
![Page 51: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/51.jpg)
Checking Hardware
● CPUs and Cores
● RAM
● I/O & disk support
● Network
![Page 52: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/52.jpg)
CPUs and Cores
● Pretty simple:
  ● number
  ● type
  ● speed
  ● L1/L2 cache
● Rules of thumb
  ● fewer faster CPUs is usually better than more slower ones
  ● core != cpu
  ● thread != core
  ● virtual core != core
![Page 53: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/53.jpg)
CPU calculations
● ½ to 1 core for OS
● ½ to 1 core for software RAID or ZFS
● 1 core for postmaster and bgwriter
● 1 core per:
  ● DW: 1 to 3 concurrent users
  ● OLTP: 10 to 50 concurrent users
  ● Web: 100 to 1000 concurrent users
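The rules of thumb above can be added up into a rough core-count range. This is a sketch only: `estimate_cores` and its parameters are hypothetical names, and rounding each end of the range up to a whole core is an assumption made here.

```python
import math

# Concurrent users one core can serve, per the slide: (pessimistic, optimistic).
USERS_PER_CORE = {"DW": (1, 3), "OLTP": (10, 50), "Web": (100, 1000)}


def estimate_cores(app_type, concurrent_users, software_raid_or_zfs=False):
    """Return a (low, high) estimate of cores needed, rounded up to whole cores."""
    few, many = USERS_PER_CORE[app_type]
    # Optimistic end: half a core for the OS, half for software RAID/ZFS if used,
    # one for postmaster and bgwriter, and many users served per core.
    low = 0.5 + 1 + (0.5 if software_raid_or_zfs else 0) + concurrent_users / many
    # Pessimistic end: a full core for each overhead item, few users per core.
    high = 1 + 1 + (1 if software_raid_or_zfs else 0) + concurrent_users / few
    return math.ceil(low), math.ceil(high)


# Example: an OLTP application with 200 concurrent users on hardware RAID.
oltp_range = estimate_cores("OLTP", 200)
```

The width of the resulting range (6 to 22 cores in this example) is a reminder that the figures are starting points to confirm with a load test.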
![Page 54: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/54.jpg)
CPU tools
● sar
● mpstat
● pgTop
![Page 55: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/55.jpg)
in praise of sar
● collects data about all aspects of HW usage
● available on most OSes
  ● but output is slightly different
● easiest tool for collecting basic information
  ● often enough for server-checking purposes
  ● BUT: does not report all data on all platforms
![Page 56: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/56.jpg)
sar
CPUs: sar -P ALL and sar -u
Memory: sar -r and sar -R
I/O: sar -b and sar -d
Network: sar -n
![Page 57: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/57.jpg)
sar CPU output
Linux

06:05:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
06:15:01 AM  all  14.26   0.09     6.01     1.32    0.00  78.32
06:15:01 AM    0  14.26   0.09     6.01     1.32    0.00  78.32

Solaris

15:08:56  %usr  %sys  %wio  %idle
15:09:26    10     5     0     85
15:09:56     9     7     0     84
15:10:26    15     6     0     80
15:10:56    14     7     0     79
15:11:26    15     5     0     80
15:11:56    14     5     0     81
![Page 58: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/58.jpg)
Memory
● Only one statistic: how much?
● Not generally an issue on its own
  ● low memory can cause more I/O
  ● low memory can cause more CPU time
![Page 59: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/59.jpg)
memory sizing
[Figure: RAM divided into Shared Buffers, work_mem, maint_mem and Filesystem Cache, with three size categories for the database: In Buffer, In Cache, On Disk]
![Page 60: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/60.jpg)
Figure out Memory Sizing
● What is the active portion of your database?
  ● i.e. gets queried frequently
● How large is it?
● Where does it fit into the size categories?
● How large is the inactive portion of your database?
  ● how frequently does it get hit? (remember backups)
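One way to answer the "size categories" question is to compare the active portion of the database against shared buffers and the filesystem cache. The cut-offs below are an interpretation of the In Buffer / In Cache / On Disk categories, not something the slides spell out, and `size_category` is a hypothetical helper.

```python
def size_category(active_db_mb, shared_buffers_mb, filesystem_cache_mb):
    """Classify the active data set as 'In Buffer', 'In Cache' or 'On Disk'.

    Assumption: 'In Buffer' means it fits in shared_buffers, 'In Cache' means
    it fits in the OS filesystem cache, and anything larger is 'On Disk'.
    """
    if active_db_mb <= shared_buffers_mb:
        return "In Buffer"
    if active_db_mb <= filesystem_cache_mb:
        return "In Cache"
    return "On Disk"


# Example: 16 GB server with 4 GB shared_buffers and roughly 10 GB of OS cache.
category = size_category(active_db_mb=6000, shared_buffers_mb=4096,
                         filesystem_cache_mb=10240)
```

The same function can be run against the total database size to see how far the inactive portion pushes you toward disk, for instance during backups.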
![Page 61: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/61.jpg)
Memory Sizing
● Other needs for RAM – work_mem:
  ● sorts and aggregates: do you do a lot of big ones?
  ● GIN/GiST indexes: these can be huge
  ● hashes: for joins and aggregates
  ● VACUUM
![Page 62: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/62.jpg)
I/O Considerations
● Throughput
  ● how fast can you get data off disk?
● Latency
  ● how long does it take to respond to requests?
● Seek Time
  ● how long does it take to find random disk pages?
![Page 63: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/63.jpg)
I/O Considerations
● Throughput
  ● important for large databases
  ● important for bulk loads
● Latency
  ● huge effect on small writes & reads
  ● not so much on large scans
● Seek Time
  ● important for small writes & reads
  ● very important for index lookups
![Page 64: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/64.jpg)
I/O Considerations
● Web
  ● concerned about read latency & seek time
● OLTP
  ● concerned about write latency & seek time
● DW/BI
  ● concerned about throughput & seek time
![Page 65: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/65.jpg)
------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-
Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
32096M 79553 99 240548 45 50646 5 72471 94 185634 10 1140 1
------Sequential Output------ --Sequential Input-- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP/sec %CP
24G 260044 33 62110 17 89914 15 1167 25
6549ms 4882ms 3395ms 107ms
![Page 66: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/66.jpg)
Common I/O Types
● Software RAID & ZFS
● Hardware RAID Array
● NAS/SAN
● SSD
![Page 67: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/67.jpg)
Hardware RAID Sanity Check
● RAID 1 / 10, not 5
● Battery-backed write cache?
  ● otherwise, turn write cache off
● SATA < SCSI/SAS
  ● about ½ real throughput
● Enough drives?
  ● 4-14 for OLTP application
  ● 8-48 for DW/BI
![Page 68: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/68.jpg)
Sw RAID / ZFS Sanity Check
● Enough CPUs?
  ● will need one for the RAID
● Enough disks?
  ● same as hardware RAID
● Extra configuration?
  ● caching
  ● block size
![Page 69: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/69.jpg)
NAS/SAN Sanity Check
● Check latency!
● Check real throughput
  ● drivers often a problem
● Enough network bandwidth?
  ● multipath or fiber required to get HW RAID performance
![Page 70: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/70.jpg)
SSD Sanity Check
● 1 SSD = 4 Drives
  ● relative performance
● Check write cache configuration
  ● make sure data is safe
● Test real throughput, seek times
  ● drivers often a problem
● Research durability stats
![Page 71: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/71.jpg)
IO Tools
● I/O Tests
  ● dd test
  ● Bonnie++
  ● IOZone
  ● filebench
● Monitoring Tools
  ● sar
  ● mpstat iowait
  ● iostat
  ● on zfs: fsstat, zfs iostat
  ● EXPLAIN ANALYZE
![Page 72: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/72.jpg)
Network
● Throughput
  ● not usually an issue, except:
    – iSCSI / NAS / SAN
    – ELT & Bulk Load Processes
  ● remember that gigabit is only 100MB/s!
● Latency
  ● real issue for Web / OLTP
  ● consider putting app ↔ database on private network
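The "gigabit is only 100MB/s" reminder follows from simple arithmetic: 1 Gbit/s is 125 MB/s before any protocol overhead, and the slide's 100 MB/s is a practical figure. A sketch of what that means for a bulk load, using hypothetical helper names and decimal units (1 GB = 1000 MB):

```python
def raw_megabytes_per_second(link_gbit_per_s):
    """Theoretical ceiling of a link in MB/s (8 bits per byte, no overhead)."""
    return link_gbit_per_s * 1000 / 8


def transfer_seconds(size_gb, usable_mb_per_s=100):
    """Time to move size_gb over the network at the given usable rate.

    The default of 100 MB/s is the slide's rule of thumb for gigabit Ethernet.
    """
    return size_gb * 1000 / usable_mb_per_s


gigabit_ceiling = raw_megabytes_per_second(1)   # 125.0 MB/s on paper
bulk_load_time = transfer_seconds(500)          # seconds to move 500 GB
```

At the rule-of-thumb rate a 500 GB load needs about 5000 seconds of pure network time, which is why ETL jobs and network storage are the exceptions the slide calls out.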
![Page 73: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/73.jpg)
Checkups for the Cloud
![Page 74: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/74.jpg)
Just like real HW, except ...
● Low ceiling on #cpus, RAM
● Virtual Core < Real Core
  ● “CPU Stealing”
  ● last-generation hardware
  ● calculate 50% more cores
![Page 75: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/75.jpg)
Cloud I/O Hell
● I/O tends to be very slow, erratic
  ● comparable to a USB thumb drive
  ● horrible latency, up to ½ second
  ● erratic, speeds go up and down
● RAID together several volumes on EBS
● use asynchronous commit
  – or at least commit_siblings
![Page 76: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/76.jpg)
#1 Cloud Rule
If your database doesn't fit in RAM, don't host it on a public cloud
![Page 77: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/77.jpg)
Checking Operating System and Filesystem
![Page 78: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/78.jpg)
OS Basics
● Use recent versions
  ● large performance, scaling improvements in Linux & Solaris in last 2 years
● Check OS tuning advice for databases
  ● advice for Oracle is usually good for PostgreSQL
● Keep up with information about issues & patches
  ● frequently specific releases have major issues
  ● especially check HW drivers
![Page 79: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/79.jpg)
OS Basics
● Use Linux, BSD or Solaris!
  ● Windows has poor performance and weak diagnostic tools
  ● OSX is optimized for desktop and has poor hardware support
  ● AIX and HPUX require expertise just to install, and lack tools
![Page 80: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/80.jpg)
Filesystem Layout
● One array / one big pool
● Two arrays / partitions
  ● OS and transaction log
  ● Database
● Three arrays
  ● OS & stats file
  ● Transaction log
  ● Database
![Page 81: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/81.jpg)
Linux Tuning
● XFS > Ext3 (but not that much)
  ● Ext3 Tuning: data=writeback,noatime,nodiratime
  ● XFS Tuning: noatime,nodiratime
    – for transaction log: nobarrier
● “deadline” I/O scheduler
● Increase SHMMAX and SHMALL
  ● to ½ of RAM
● Cluster filesystems also a possibility
  ● OCFS, RHCFS
![Page 82: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/82.jpg)
Solaris Tuning
● Use ZFS
  ● no advantage to UFS anymore
  ● mixed filesystems causes caching issues
  ● set recordsize
    – 8K small databases
    – 128K large databases
    – check for throughput/latency issues
![Page 83: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/83.jpg)
Solaris Tuning
● Set OS parameters via “projects”
● For all databases:
  ● project.max-shm-memory=(priv,12GB,deny)
● For high-connection databases:
  ● use libumem
  ● project.max-shm-ids=(priv,32768,deny)
  ● project.max-sem-ids=(priv,4096,deny)
  ● project.max-msg-ids=(priv,4096,deny)
![Page 84: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/84.jpg)
FreeBSD Tuning
● ZFS: same as Solaris
  ● definite win for very large databases
  ● not so much for small databases
● Other tuning per docs
![Page 85: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/85.jpg)
PostgreSQL Checkup
![Page 86: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/86.jpg)
postgresql.conf: formulae
shared_buffers = available RAM / 4
![Page 87: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/87.jpg)
postgresql.conf: formulae
max_connections =
web: 100 to 200
OLTP: 50 to 100
DW/BI: 5 to 20
if you need more, use pooling!
![Page 88: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/88.jpg)
postgresql.conf: formulae
Web/OLTP:
work_mem = AvRAM * 2 / max_connections
DW/BI:
work_mem = AvRAM / max_connections
![Page 89: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/89.jpg)
postgresql.conf: formulae
Web/OLTP:
maintenance_work_mem = AvRAM / 16
DW/BI:
maintenance_work_mem = AvRAM / 8
![Page 90: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/90.jpg)
postgresql.conf: formulae
autovacuum = on
DW/BI & bulk loads:
autovacuum = off
autovacuum_max_workers = 1/2
![Page 91: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/91.jpg)
postgresql.conf: formulae
checkpoint_segments =
web: 8 to 16
OLTP: 32 to 64
BI/DW: 128 to 256
![Page 92: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/92.jpg)
postgresql.conf: formulae
wal_buffers = 8MB
effective_cache_size = AvRAM * 0.75
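A few of the formulae above are simple enough to compute from available RAM and application type. The sketch below covers only shared_buffers, effective_cache_size, wal_buffers and the two range-valued settings; work_mem and maintenance_work_mem are left out. `suggest_settings` and its dictionary keys are hypothetical names, and the values reflect the slide's 2010-era advice (checkpoint_segments, for instance, no longer exists in PostgreSQL 9.5 and later).

```python
# Ranges taken from the slides, keyed by application type.
MAX_CONNECTIONS = {"web": (100, 200), "oltp": (50, 100), "dw": (5, 20)}
CHECKPOINT_SEGMENTS = {"web": (8, 16), "oltp": (32, 64), "dw": (128, 256)}


def suggest_settings(available_ram_mb, app_type):
    """Return starting-point values for a handful of postgresql.conf settings."""
    if app_type not in MAX_CONNECTIONS:
        raise ValueError("app_type must be one of: web, oltp, dw")
    return {
        # shared_buffers = available RAM / 4
        "shared_buffers_mb": available_ram_mb // 4,
        # effective_cache_size = AvRAM * 0.75
        "effective_cache_size_mb": available_ram_mb * 3 // 4,
        "wal_buffers_mb": 8,
        "max_connections_range": MAX_CONNECTIONS[app_type],
        "checkpoint_segments_range": CHECKPOINT_SEGMENTS[app_type],
    }


# Example: an OLTP server with 16 GB of RAM available to PostgreSQL.
oltp_settings = suggest_settings(16384, "oltp")
```

Treat the output as a first guess to verify under a load test rather than as final values.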
![Page 93: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/93.jpg)
How much recoverability do you need?
● None:
  ● fsync=off
  ● full_page_writes=off
  ● consider using ramdrive
● Some Loss OK
  ● synchronous_commit = off
  ● wal_buffers = 16MB to 32MB
● Data integrity critical
  ● keep everything on
![Page 94: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/94.jpg)
File Locations
● Database
● Transaction Log
● Activity Log
● Stats File
● Tablespaces?
![Page 95: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/95.jpg)
Database Checks: Indexes
select relname, seq_scan, seq_tup_read,
  pg_size_pretty(pg_relation_size(relid)) as size,
  coalesce(n_tup_ins,0) + coalesce(n_tup_upd,0) + coalesce(n_tup_del,0) as update_activity
from pg_stat_user_tables
where seq_scan > 1000
  and pg_relation_size(relid) > 1000000
order by seq_scan desc limit 10;

 relname     | seq_scan | seq_tup_read | size    | update_activity
-------------+----------+--------------+---------+-----------------
 permissions |    12264 |        53703 | 2696 kB |             365
 users       |    11697 |       351635 | 17 MB   |             741
 test_set    |     9150 |  18492353300 | 275 MB  |           27643
 test_pool   |     5143 |   3141630847 | 212 MB  |           77755
![Page 96: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/96.jpg)
Database Checks: Indexes
SELECT indexrelid::regclass as index, relid::regclass as table
FROM pg_stat_user_indexes
  JOIN pg_index USING (indexrelid)
WHERE idx_scan < 100
  AND indisunique IS FALSE;

 index                  | table
 acct_acctdom_idx       | accounts
 hitlist_acct_idx       | hitlist
 hitlist_number_idx     | hitlist
 custom_field_acct_idx  | custom_field
 user_log_accstrt_idx   | user_log
 user_log_idn_idx       | user_log
 user_log_feed_idx      | user_log
 user_log_inbdstart_idx | user_log
 user_log_lead_idx      | user_log
![Page 97: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/97.jpg)
Database Checks: Large Tables
 relname           | total_size | table_size
-------------------+------------+------------
 operations_2008   | 9776 MB    | 3396 MB
 operations_2009   | 9399 MB    | 3855 MB
 request_by_second | 7387 MB    | 5254 MB
 request_archive   | 6975 MB    | 3349 MB
 events            | 92 MB      | 66 MB
 event_edits       | 82 MB      | 68 MB
 2009_ops_eoy      | 33 MB      | 19 MB
![Page 98: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/98.jpg)
Database Checks: Heavily-Used Tables
select relname,
  pg_size_pretty(pg_relation_size(relid)) as size,
  coalesce(n_tup_ins,0) + coalesce(n_tup_upd,0) + coalesce(n_tup_del,0) as update_activity
from pg_stat_user_tables
order by update_activity desc limit 10;

 relname             | size    | update_activity
---------------------+---------+-----------------
 session_log         | 344 GB  |         4811814
 feature             | 279 MB  |         1012565
 daily_feature       | 28 GB   |          984406
 cache_queue_2010_05 | 2578 MB |          981812
 user_log            | 30 GB   |          796043
 vendor_feed         | 29 GB   |          479392
 vendor_info         | 23 GB   |          348355
 error_log           | 239 MB  |          214376
 test_log            | 945 MB  |          185785
 settings            | 215 MB  |          117480
![Page 99: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/99.jpg)
Database Unit Tests
● You need them!
  ● you will be changing database objects and rewriting queries
  ● find bugs in testing … or in production
● Various tools
  ● pgTAP
  ● Framework-level tests
    – Rails, Django, Catalyst, JBoss, etc.
![Page 100: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/100.jpg)
Application Stack Checkup
![Page 101: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/101.jpg)
The Layer Cake
(Diagram: the layers of the stack and the components in each)
Application: Queries, Transactions
Middleware: Drivers, Connections, Caching
PostgreSQL: Schema, Config
Operating System: Filesystem, Kernel
Hardware: Storage, RAM/CPU, Network
![Page 102: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/102.jpg)
The Layer Cake
![Page 103: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/103.jpg)
The Funnel
(Funnel diagram with layers: Application, Middleware, PostgreSQL, OS, HW)
![Page 104: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/104.jpg)
Check PostgreSQL Drivers
● Does the driver version match the PostgreSQL version?
● Have you applied all updates?
● Are you using the best driver?
  ● There are several Python, C++ drivers
  ● Don't use ODBC if you can avoid it.
● Does the driver support cached plans & binary data?
  ● If so, are they being used?
![Page 105: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/105.jpg)
Check Caching
![Page 106: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/106.jpg)
Check Caching
● Does the application use data caching?
  ● what kind?
  ● could it be used more?
  ● what is the cache invalidation strategy?
  ● is there protection from “cache refresh storms”?
● Does the application use HTTP caching?
  ● could they be using it more?
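One way to protect against a "cache refresh storm" is to let only one caller reload an expired entry while the others wait for its result. The sketch below is a minimal in-process illustration, not a recommendation of a specific caching product: the `SingleFlightCache` class and its `loader` callback are hypothetical names, and a real deployment would more likely apply the same idea in front of a shared cache.

```python
import threading
import time


class SingleFlightCache:
    """Tiny in-process cache where only one caller reloads an expired key."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader            # function key -> value (e.g. runs a query)
        self._ttl = ttl_seconds
        self._values = {}                # key -> (value, expires_at)
        self._key_locks = {}             # key -> lock guarding reloads of that key
        self._guard = threading.Lock()   # protects the lock dictionary

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._values.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]
        with self._lock_for(key):
            # Re-check: another thread may have refreshed the key while we waited.
            entry = self._fresh(key)
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            self._values[key] = (value, time.monotonic() + self._ttl)
            return value
```

With this shape, a burst of requests for the same expired key results in a single call to the loader (and so a single database query) instead of one per request.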
![Page 107: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/107.jpg)
Check Connection Pooling
● Is the application using connection pooling?
  ● all web applications should, and most OLTP
  ● external or built into the application server?
● Is it configured correctly?
  ● max. efficiency: transaction / statement mode
  ● make sure timeouts match
![Page 108: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/108.jpg)
Check Query Design
● PostgreSQL does better with fewer, bigger statements
● Check for common query mistakes
  ● joins in the application layer
  ● pulling too much data and discarding it
  ● huge OFFSETs
  ● unanchored text searches
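To illustrate the "huge OFFSETs" item: with OFFSET the database has to walk past every skipped row, while a "fetch rows after the last id I saw" query can use an index to jump to the page. The sketch below uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; the `items` table and the function names are made up for the example, and PostgreSQL drivers typically use a different placeholder style.

```python
import sqlite3


def make_demo_db(row_count):
    """Build a throwaway in-memory table with ids 1..row_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, row_count + 1)],
    )
    conn.commit()
    return conn


def page_by_offset(conn, page_size, page_number):
    # The database must step over page_size * page_number rows before returning any.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page_size * page_number),
    ).fetchall()


def page_after(conn, page_size, last_seen_id):
    # Keyset pagination: the index on id locates the start of the page directly.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()
```

Both functions return the same rows for a given page; the second one needs the application to remember the last id of the previous page instead of a page number.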
![Page 109: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/109.jpg)
Check Transaction Management
● Are transactions being used for loops?
  ● batches of inserts or updates can be 75% faster if wrapped in a transaction
● Are transactions aborted properly?
  ● on error
  ● on timeout
  ● transactions being held open while non-database activity runs
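A small sketch of the two loop shapes discussed above: committing after every row versus wrapping the whole batch in one transaction that is rolled back as a unit on error. It uses the standard-library sqlite3 module only as a runnable stand-in; the `events` table and function names are hypothetical, and the size of any speedup depends on the database and storage, so none is claimed here.

```python
import sqlite3


def make_events_db():
    """Create a throwaway in-memory table for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def insert_one_by_one(conn, rows):
    # One transaction, and one commit, per row.
    for row in rows:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()


def insert_in_one_transaction(conn, rows):
    # A single transaction around the loop: committed once on success,
    # rolled back entirely if any statement raises.
    with conn:
        for row in rows:
            conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
```

The second form also covers the "aborted properly on error" check: a failure partway through the batch leaves no half-written rows behind and no transaction open.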
![Page 110: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/110.jpg)
Common Ailments of the Database Server
![Page 111: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/111.jpg)
Check for them, monitor for them
● ailments could throw off your response time targets
  ● database could even “hit the wall”
● check for them during health check
  ● and during each checkup
● add daily/continuous monitors for them
  ● Nagios check_postgres.pl has checks for many of these things
![Page 112: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/112.jpg)
Database Growth
● Checkup:
  ● check both total database size and largest table(s) size daily or weekly
● Symptoms:
  ● database grows faster than expected
  ● some tables grow continuously and rapidly
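Once sizes are being recorded daily or weekly, a simple linear projection gives a rough idea of how much time is left before storage runs out. The sketch below assumes the samples are (date, size in bytes) pairs collected by some monitoring job, for example from `pg_database_size()`; the function names and the sample figures in the usage note are hypothetical.

```python
from datetime import date


def daily_growth_bytes(samples):
    """Average growth per day between the first and last (date, size_bytes) sample."""
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need samples spanning at least one day")
    return (last_size - first_size) / days


def days_until_full(samples, capacity_bytes):
    """Days left at the current linear rate, or None if the database is not growing."""
    rate = daily_growth_bytes(samples)
    if rate <= 0:
        return None
    return (capacity_bytes - samples[-1][1]) / rate
```

For instance, a database that went from 100 GB to 120 GB in ten days on a 200 GB volume would project to about 40 days of headroom. Growth is rarely linear, so this is a prompt to investigate rather than a forecast.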
![Page 113: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/113.jpg)
Database Growth
● Caused By:
  ● faster than expected increase in usage
  ● “append forever” tables
  ● Database Bloat
● Leads to:
  ● slower seq scans and index scans
  ● swapping & temp files
  ● slower backups
![Page 114: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/114.jpg)
Database Growth
● Treatment:
  ● check for Bloat
  ● find largest tables and make them smaller
    – expire data
    – partitioning
  ● horizontal scaling (if possible)
  ● get better storage & more RAM, sooner
![Page 115: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/115.jpg)
Database Bloat

-[ RECORD 1 ]+----------------------
schemaname   | public
tablename    | user_log
tbloat       | 3.4
wastedpages  | 2356903
wastedbytes  | 19307749376
wastedsize   | 18 GB
iname        | user_log_accttime_idx
ituples      | 941451584
ipages       | 9743581
iotta        | 40130146
ibloat       | 0.2
wastedipages | 0
wastedibytes | 0
wastedisize  | 0 bytes
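As a quick sanity check on the record above, the wasted-space figures are consistent with each other if pages are 8192 bytes, which is PostgreSQL's default block size (an assumption here, since the block size can be changed at build time). The helper names are made up for the example.

```python
BLOCK_SIZE = 8192  # assumed: PostgreSQL's default page size in bytes


def wasted_bytes(wasted_pages, block_size=BLOCK_SIZE):
    """Convert a count of wasted pages into bytes."""
    return wasted_pages * block_size


def to_gib(n_bytes):
    """Convert bytes to binary gigabytes (the unit pg_size_pretty labels GB)."""
    return n_bytes / 1024 ** 3


print(wasted_bytes(2356903))                 # 19307749376, matching wastedbytes
print(round(to_gib(wasted_bytes(2356903))))  # 18, matching wastedsize
```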
![Page 116: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/116.jpg)
Database Bloat
● Caused by:
  ● Autovacuum not keeping up
    – or not enough manual vacuum
    – often on specific tables only
  ● FSM set wrong (before 8.4)
  ● Idle In Transaction
● Leads To:
  ● slow response times
  ● unpredictable response times
  ● heavy I/O
![Page 117: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/117.jpg)
Database Bloat
● Treatment:
  ● make autovacuum more aggressive
    – on specific tables with bloat
  ● fix max_fsm_relations/max_fsm_pages (before 8.4)
  ● check when tables are getting vacuumed
  ● check for Idle In Transaction
![Page 118: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/118.jpg)
Memory Usage Growth

00:00:01 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
01:00:00       0       0     100       0       0     100       0       0
02:00:00       0       0     100       0       0     100       0       0
03:00:00       0       0     100       0       0     100       0       0
04:00:00       0       0     100       0       0     100       0       0

00:00:01 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
01:00:00    3788     115      98       0       0     100       0       0
02:00:00   21566     420      78       0       0     100       0       0
03:00:00  455721    1791      59       0       0     100       0       0
04:00:00     908       6      96       0       0     100       0       0
![Page 119: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/119.jpg)
Memory Usage Growth
● Caused by:
  ● Database Growth or Bloat
  ● work_mem limit too high
  ● bad queries
● Leads To:
  ● database out of cache
    – slow response times
  ● OOM Errors (OOM Killer)
![Page 120: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/120.jpg)
Memory Usage Growth
● Treatment
  ● Look at ways to shrink queries, DB
    – partitioning
    – data expiration
  ● lower work_mem limit
  ● refactor bad queries
  ● Or just buy more RAM
![Page 121: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/121.jpg)
Idle Connections
select datname, usename, count(*)
from pg_stat_activity
where current_query = '<IDLE>'
group by datname, usename;

 datname | usename | count
---------+---------+-------
 track   | www     |   318
![Page 122: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/122.jpg)
Idle Connections
● Caused by:
  ● poor session management in application
  ● wrong connection pool settings
● Leads to:
  ● memory usage for connections
  ● slower response times
  ● out-of-connections at peak load
![Page 123: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/123.jpg)
Idle Connections
● Treatment:
  ● refactor application
  ● reconfigure connection pool
    – or add one
![Page 124: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/124.jpg)
Idle In Transaction
select datname, usename,
  max(now() - xact_start) as max_time, count(*)
from pg_stat_activity
where current_query ~* '<IDLE> in transaction'
group by datname, usename;

 datname | usename | max_time      | count
---------+---------+---------------+-------
 track   | admin   | 00:00:00.0217 |     1
 track   | www     | 01:03:06.0709 |     7
![Page 125: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/125.jpg)
Idle In Transaction
● Caused by:
  ● poor transaction control by application
  ● abandoned sessions not being terminated fast enough
● Leads To:
  ● locking problems
  ● database bloat
  ● out of connections
![Page 126: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/126.jpg)
Idle In Transaction
● Treatment
  ● refactor application
  ● change driver/ORM settings for transactions
  ● change session timeouts & keepalives on pool, driver, database
![Page 127: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/127.jpg)
Longer Running Queries
● Detection:
  ● log slow queries to PostgreSQL log
  ● do daily or weekly report (pgfouine)
● Symptoms:
  ● number of long-running queries in log increasing
  ● slowest queries getting slower
![Page 128: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/128.jpg)
Longer Running Queries
● Caused by:
  ● database growth
  ● poorly-written queries
  ● wrong indexes
  ● out-of-date stats
● Leads to:
  ● out-of-CPU
  ● out-of-connections
![Page 129: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/129.jpg)
Longer Running Queries
● Treatments:
  ● refactor queries
  ● update indexes
  ● make Autoanalyze more aggressive
  ● control database growth
![Page 130: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/130.jpg)
Too Many Queries
![Page 131: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/131.jpg)
Too Many Queries
● Caused By:
  ● joins in middleware
  ● not caching
  ● poll cycles without delays
  ● other application code issues
● Leads To:
  ● out-of-CPU
  ● out-of-connections
![Page 132: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/132.jpg)
Too Many Queries
● Treatment:
  ● characterize queries using logging
  ● refactor application
![Page 133: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/133.jpg)
Locking
● Detection:
  ● log_lock_waits
  ● scan activity log for deadlock warnings
  ● query pg_stat_activity and pg_locks
● Symptoms:
  ● deadlock error messages
  ● number and time of lock_waits getting larger
![Page 134: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/134.jpg)
Locking
● Caused by:
  ● long-running operations with exclusive locks
  ● inconsistent foreign key updates
  ● poorly planned runtime DDL
● Leads to:
  ● poor response times
  ● timeouts
  ● deadlock errors
![Page 135: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/135.jpg)
Locking
● Treatment
  ● analyze locks
  ● refactor operations taking locks
    – establish a canonical order of updates for long transactions
    – use pessimistic locks with NOWAIT
  ● rely on cascade for FK updates
    – not on middleware code
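The "canonical order of updates" idea can be shown without a database: if every transaction takes its locks in the same order (here, lowest id first), two opposing operations cannot each hold the lock the other one needs. The sketch below uses Python thread locks as a stand-in for row locks; the `Account` class and `transfer` function are hypothetical names for illustration.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()  # stands in for a row-level lock


def transfer(source, target, amount):
    """Move amount between accounts, always locking the lower id first."""
    if source is target:
        return  # nothing to do, and locking the same row twice would block
    first, second = sorted((source, target), key=lambda a: a.account_id)
    with first.lock:
        with second.lock:
            source.balance -= amount
            target.balance += amount
```

Without the sort, a transfer from A to B running alongside a transfer from B to A can deadlock; with it, the second caller simply waits for the first to finish. In SQL the equivalent is to update or `SELECT ... FOR UPDATE` rows in a consistent key order.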
![Page 136: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/136.jpg)
Temp File Usage
● Detection:
  ● log_temp_files = 100kB
  ● scan logs for temp files weekly or daily
● Symptoms:
  ● temp file usage getting more frequent
  ● queries using temp files getting longer
![Page 137: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/137.jpg)
Temp File Usage
● Caused by:
  ● Sorts, hashes & aggregates too big for work_mem
● Leads to:
  ● slow response times
  ● timeouts
![Page 138: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/138.jpg)
Temp File Usage
● Treatment
  ● find swapping queries via logs
  ● set work_mem higher for that ROLE, or
  ● refactor them to need less memory, or
  ● buy more RAM
![Page 139: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/139.jpg)
All healthy now?
See you in six months!
![Page 140: Database Health Check](https://reader034.fdocuments.us/reader034/viewer/2022042513/554f5726b4c905423f8b5641/html5/thumbnails/140.jpg)
Q&A
● Josh Berkus
  ● [email protected]
  ● it.toolbox.com/blogs/database-soup
● PostgreSQL Experts
  ● www.pgexperts.com
  ● pgCon Sponsor
● Also see:
  ● Load Testing (tomorrow)
  ● Testing BOF (Friday)
Copyright 2010 Josh Berkus & PostgreSQL Experts Inc. Distributable under the Creative Commons Attribution license, except for 3rd-party images, which are property of their respective owners.