Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool for DBA~

Post on 30-May-2015




How do you analyze database performance and spot signs of trouble? PostgreSQL provides many useful statistics and database activity data via system views and contrib modules, but it is difficult to understand the detailed information and to see the overall condition of the database. pg_statsinfo and pg_stats_reporter, developed by NTT Corporation, are open source statistics reporting tools for DBAs. They provide more useful statistics and visual reports. In this session, I introduce the architecture, installation, and use cases of these tools.

Transcript of Introduction of pg_statsinfo and pg_stats_reporter ~Statistics Reporting Tool for DBA~

Introduction of pg_statsinfo and pg_stats_reporter

~ Statistics Reporting Tool for DBA ~

NTT Open Source Software Center
Mitsumasa KONDO

• pg_statsinfo
  • Monitors and collects PostgreSQL statistics and activities

• pg_stats_reporter
  • Visualizes the PostgreSQL statistics and activities collected by pg_statsinfo

Software Introduced Today



[Figure: overview diagram. Labels: DB Server A, DB Server B, DB Server C; database statistics and activity; pg_statsinfo repository database (store of DB statistics); creating report; sample report created by pg_stats_reporter]


• Monitors and collects PostgreSQL statistics and activities
  • Collects statistics and activities
    • All tables in the pg_catalog schema
    • pg_log information
    • OS resources
  • Other features
    • Creating reports from the command line
    • Alert and monitoring function
    • Log management function
    • Automatic repository DB management

• Other related information
  • BSD License
  • Latest version is 2.5.0
    • …_id=1000422
  • Works on PostgreSQL 9.3!
  • The online manual is here
    • …_statsinfo-ja.html

What is pg_statsinfo ?

Collective  Database  Statistics

• Programming language
  • C

• Startup and pre-configuration
  • pg_statsinfo is started via shared_preload_libraries
  • Add the pg_statsinfo settings to postgresql.conf; pg_statsinfo then starts together with PostgreSQL as usual.

• System configuration
  • Install pg_statsinfo on the monitored instance
    • It does not need to be installed on the repository database instance
    • The monitored instance and the repository database can be the same instance

Architecture of pg_statsinfo


[Figure: architecture diagram. Labels: monitoring instance (OS resources, pg_log, database statistics); collect and send database statistics; repository database]

• Collects statistics and activities in PostgreSQL
  • All information gathered by PostgreSQL's statistics collector (e.g. pg_catalog)
    • For details of the statistics collector, please see the PostgreSQL documentation :)
      • /document/9.2/html/monitoring-stats.html
  • Takes statistics as snapshots at regular intervals
    • Default is every 10 minutes
  • Analyzes pg_log and extracts activities from the logs
    • Captures activities that are output only to pg_log
      • Checkpoint activities
      • VACUUM activities
  • Gets OS resource information from /proc
    • Samples every 5 seconds; when a snapshot is taken, the average of the sampled values is inserted
      • CPU usage (idle, iowait, system, user, load average)
      • Memory usage (memfree, buffers, cached, swap, dirty)
      • Disk usage (I/O size, I/O time, disk space used)
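The sampling scheme above (sample every 5 seconds, store the average when a snapshot is taken) can be sketched in a few lines. With the default settings, one snapshot would average 600 s / 5 s = 120 samples. This is only an illustrative Python sketch, not pg_statsinfo's actual C implementation; the function name `average_samples` and the sample layout are hypothetical.

```python
from statistics import mean

def average_samples(samples):
    """Average each metric over the samples collected between two snapshots.

    `samples` is a list of dicts, one per 5-second sample (hypothetical layout).
    """
    if not samples:
        return {}
    keys = samples[0].keys()
    return {key: mean(s[key] for s in samples) for key in keys}

# Three hypothetical CPU samples (percent values) taken 5 seconds apart.
cpu_samples = [
    {"idle": 80.0, "iowait": 10.0, "system": 4.0, "user": 6.0},
    {"idle": 70.0, "iowait": 20.0, "system": 4.0, "user": 6.0},
    {"idle": 60.0, "iowait": 30.0, "system": 4.0, "user": 6.0},
]
snapshot_row = average_samples(cpu_samples)
print(snapshot_row)
```

Storing only the average keeps the repository small, at the cost of hiding short spikes inside a snapshot interval.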

Features of pg_statsinfo 1/5

• Creates reports from the command line
  • Outputs a text-format report on the command line
    • Example: for a database administrator or SQL engineer who wants to check database statistics
  • Covers almost all report items produced by pg_stats_reporter

Features of pg_statsinfo 2/5

$ pg_statsinfo -U postgres -B 2013-10-01 -r ALL | less

Command example: create a report for all monitored instances, from 2013-10-01 to now


• Automatic repository database maintenance
  • Statistics stored in the repository database are deleted automatically
    • pg_statsinfo stores its data in tables partitioned by day
    • So old data can be deleted with TRUNCATE
    • Deletion is faster and cheaper

• Note
  • When multiple monitored instances are used, the shortest configured retention period for stored data takes priority
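The retention rule above (daily partitions, shortest configured period wins) can be illustrated as follows. This is a sketch under stated assumptions, not pg_statsinfo's code: the function names, the instance labels, and the strict "older than the cutoff" boundary are all hypothetical.

```python
from datetime import date, timedelta

def effective_retention_days(retention_by_instance):
    """Shortest configured retention period wins (per the note above)."""
    return min(retention_by_instance.values())

def partitions_to_truncate(partition_days, today, retention_days):
    """Return the daily partitions that fall before the retention window.

    The strict '<' boundary is an assumption made for this illustration.
    """
    cutoff = today - timedelta(days=retention_days)
    return [day for day in partition_days if day < cutoff]

# Hypothetical settings: instance A keeps 7 days, instance B keeps 14 days.
retention = {"A": 7, "B": 14}
days = effective_retention_days(retention)

today = date(2013, 10, 15)
partitions = [today - timedelta(days=n) for n in range(20)]  # one per day
old = partitions_to_truncate(partitions, today, days)
print(days, len(old))
```

Because each day lives in its own partition, removing old data is a matter of truncating whole partitions rather than deleting individual rows.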

Features of pg_statsinfo 3/5



[Figure: DB server A and DB server B, with retention period settings of 1 week and 2 weeks, collect and send database statistics to the repository database, which stores them]

The default retention period of stored data is 1 week.

• Log management feature
  • Makes PostgreSQL's logs easy to manage
  • Log filtering
    • A log level can be set in pg_statsinfo, so two log levels can be used
    • Example: set PostgreSQL's log level lower to keep detailed information, and set pg_statsinfo's log level higher so the daily log is easy to read
    • The log file name can be fixed (e.g. postgresql.log), which is convenient for log monitoring software
  • Multiple log outputs
    • Can output to both syslog and pg_log
  • Log level changing
    • The log level of specific log messages can be changed
    • Example: change the level of a specific log message from INFO to LOG
  • Log compression and management
    • Old logs are compressed and managed automatically
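The idea of keeping a detailed log while producing a second, more readable one can be sketched as below. This is a hypothetical illustration in Python, not pg_statsinfo's configuration or code: `filter_log` and the substring-based override rule are assumptions. The severity order follows PostgreSQL's log_min_messages ordering, with the DEBUG levels collapsed into one.

```python
# Severity order used by PostgreSQL's log_min_messages, lowest to highest
# (DEBUG levels collapsed into one for brevity).
SEVERITY = ["DEBUG", "INFO", "NOTICE", "WARNING", "ERROR", "LOG", "FATAL", "PANIC"]

def filter_log(entries, min_level, overrides=None):
    """Keep entries at or above min_level, after applying per-message level overrides.

    `overrides` maps a message substring to a replacement level
    (hypothetical rule format).
    """
    overrides = overrides or {}
    threshold = SEVERITY.index(min_level)
    kept = []
    for level, message in entries:
        for pattern, new_level in overrides.items():
            if pattern in message:
                level = new_level
        if SEVERITY.index(level) >= threshold:
            kept.append((level, message))
    return kept

# Hypothetical detailed log, as (level, message) pairs.
detailed_log = [
    ("INFO", "snapshot taken"),
    ("WARNING", "disk usage is high"),
    ("DEBUG", "parsing query"),
    ("ERROR", "relation does not exist"),
]
readable = filter_log(detailed_log, "WARNING", overrides={"snapshot": "LOG"})
print(readable)
```

Here the INFO message is promoted to LOG and therefore survives the filter, which mirrors the "change INFO to LOG" example above.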

Features of pg_statsinfo 4/5  


[Figure: flow of extracting statistics from pg_log. Labels: pg_log (CSV format); log formatting; log written by pg_statsinfo (postgresql.log)]

• Alert and monitoring function (trigger function)
  • Outputs an alert log entry when an alert threshold in the database is exceeded
    • Usage: watch the alert log with monitoring software
  • The alert function is executed at every snapshot
  • The default settings are listed below; set values appropriate for your server
    • Settings are changed with an UPDATE statement on the statsrepo.alert table

Features of pg_statsinfo 5/5  

Column name            Default  Explanation
instid                 -        Target instance ID
rollback_tps           100      Number of rollbacks per second
commit_tps             1000     Number of commits per second
garbage_size           20000    Garbage records size in the table (%)
garbage_percent        30       Garbage records percentage in the database (%)
garbage_percent_table  30       Garbage records percentage in the table (%)
response_avg           10       Average query response time (sec)
response_worst         60       Worst query response time (sec)
enable_alert           true     Enable the alert function

Alert configuration table
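To show how such thresholds behave, here is a minimal Python sketch that compares one snapshot's metrics with the default values from the table. It is an illustration only: `check_alerts` and the metric dictionary are hypothetical, and the real check runs inside pg_statsinfo against the statsrepo.alert table.

```python
# Default thresholds copied from the alert configuration table.
DEFAULT_THRESHOLDS = {
    "rollback_tps": 100,
    "commit_tps": 1000,
    "garbage_percent": 30,
    "response_avg": 10,
    "response_worst": 60,
}

def check_alerts(metrics, thresholds=DEFAULT_THRESHOLDS, enable_alert=True):
    """Return the names of metrics that exceed their thresholds."""
    if not enable_alert:
        return []
    return [
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]

# Hypothetical values measured for one snapshot.
snapshot_metrics = {"rollback_tps": 150, "commit_tps": 800, "response_worst": 75}
alerts = check_alerts(snapshot_metrics)
print(alerts)
```

Running the check once per snapshot matches the slide's statement that the alert function is executed at every snapshot.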

How to install pg_statsinfo ?

$ su
# rpm -ivh pg_statsinfo-2.50-1.pg93.rhel6.x86_64.rpm

1. Install the RPM file

# minimum configuration
shared_preload_libraries = 'pg_statsinfo'        # preload library setting
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'  # log file name setting (required)

2. Add the configuration to postgresql.conf

$ pg_ctl -D data start

3. Start PostgreSQL as usual

server starting
LOG:  loaded library "pg_statsinfo"
LOG:  pg_statsinfo launcher started
LOG:  start
LOG:  installing schema: statsinfo
LOG:  installing schema: statsrepo_partition

4. If you see these log messages, the installation succeeded!

Installation instructions for pg_statsinfo are also given in the online manual! :)

2. Confirmation of installation

3. Collect database statistics and activities (snapshot)

4. Create report

Demo of pg_statsinfo

• One snapshot is 300 kB to 800 kB in size
  • Be careful that snapshots do not fill the disk!

• The performance overhead of installing the software is almost nothing
  • But there is a little. In the DBT-2 benchmark, we measured a 2% degradation.

• To use a separate repository server, set "pg_statsinfo.repository_server" in postgresql.conf.
  • The default setting is 'host=localhost port=5432'
  • If the repository database requires a password, set it in /var/lib/pgsql/.pgpass
    • pg_statsinfo runs as the postgres user
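As a rough check on the snapshot-size tip above, the figures from the slides (300 kB to 800 kB per snapshot, one snapshot every 10 minutes, 1 week of retention) can be combined into a back-of-the-envelope estimate. This Python sketch is an approximation under those assumptions only; `repository_size_mb` is a hypothetical helper, and real repository size also depends on indexes and other overhead not modeled here.

```python
SNAPSHOT_INTERVAL_MIN = 10      # default snapshot interval from the slides
RETENTION_DAYS = 7              # default retention period from the slides
SNAPSHOT_KB_RANGE = (300, 800)  # per-snapshot size range from the slides

def repository_size_mb(instances=1):
    """Estimate (low, high) repository size in MB for the given number of instances."""
    snapshots = (24 * 60 // SNAPSHOT_INTERVAL_MIN) * RETENTION_DAYS * instances
    low, high = (snapshots * kb / 1000 for kb in SNAPSHOT_KB_RANGE)
    return low, high

print(repository_size_mb())
print(repository_size_mb(instances=3))
```

With the defaults this is 144 snapshots per day and 1008 snapshots per week for each monitored instance, so the estimate scales linearly with the number of instances and with the retention period.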

TIPS of pg_statsinfo

• Visualizes the PostgreSQL statistics and activities collected by pg_statsinfo
  • Report items
    • Transaction status
    • Database size
    • OS resources
    • Amount of WAL output
    • Replication state
    • Deadlock information
  • Successor to pg_reporter

• Additional information
  • BSD License
  • Latest version is 2.0.0
    • …_id=1000422
  • The detailed online manual is here
    • …_stats_reporter-ja.html

What is pg_stats_reporter ?

Report  of  pg_stats_reporter

• Software
  • Apache + PHP + PostgreSQL
    • PHP + PostgreSQL alone also works
    • Requires PostgreSQL 8.3 or later

• Programming language
  • PHP + JavaScript + SQL

• Libraries used
  • PHP framework
    • Smarty
  • User interface
    • jQuery, jQuery UI, tablesorter, Superfish
  • Graph drawing
    • dygraphs, jqPlot

Architecture of pg_stats_reporter  

• From a web browser
  • A report can be created in only a few clicks.

How to Create Report ? 1/2

① Select the database instance to report on

② Push the “create new report” button

③ Set the period and time of the report

Copyright (c) 2013 NTT Corp. All Rights Reserved.

• From the command line
  • Runs in PHP's standalone mode.
  • Use cases
    • Creating a report from the command line.
    • Creating reports at regular intervals with crond.
    • If you use only the command line mode, Apache is not needed
      • For example, when your security policy does not allow installing Apache
    • When reports need to be kept for a long time
      • Data in the repository database is kept only for a certain period
      • Reports that have been created are not erased.

How to Create Report ? 2/2

$ pg_stats_reporter -B 2013-10-01 -E 2013-10-08 -O report_dir
[LOG] Report file created: sample_localhost_5432_1_20131008-1419_20131008-1945.html

Command usage: create a report for 10/1 to 10/8 in report_dir
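For the crond use case, a small wrapper can compute the report period and call the command shown above. The sketch below is a hypothetical Python helper, not part of pg_stats_reporter; it uses only the -B, -E, and -O options that appear in the command example, and the 7-day default period and the function names are assumptions.

```python
import subprocess
from datetime import date, timedelta

def build_report_command(end_day, days=7, out_dir="report_dir"):
    """Build the pg_stats_reporter command line for the period ending on end_day."""
    begin_day = end_day - timedelta(days=days)
    return [
        "pg_stats_reporter",
        "-B", begin_day.isoformat(),
        "-E", end_day.isoformat(),
        "-O", out_dir,
    ]

def create_report(end_day):
    """Run the command; requires pg_stats_reporter to be installed and on PATH."""
    subprocess.run(build_report_command(end_day), check=True)

# Reproduces the period used in the command example above.
print(" ".join(build_report_command(date(2013, 10, 8))))
```

A crond entry could then call a script that runs `create_report(date.today())` once a week, so that reports outlive the repository's retention period.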

• Report index feature
  • Creates the report and an index of reports in the report directory
  • This makes it easy to browse and organize reports

How to Create Report ? 2/2


[Figure: report directory. Labels: index of reports; previously created reports (Report HTML 1, Report HTML 2); library of pg_stats_reporter]

How to install pg_stats_reporter ?

$ su
# rpm -ivh httpd-2.2.15-15.el6_2.1.x86_64.rpm \
          php-5.3.3-3.el6_2.8.x86_64.rpm \
          php-common-5.3.3-3.el6_2.8.x86_64.rpm \
          php-pgsql-5.3.3-3.el6_2.8.x86_64.rpm \
          php-intl-5.3.3-3.el6_2.8.x86_64.rpm \
          pg_stats_reporter-1.0.0-1.el6.noarch.rpm

1. Install the pg_stats_reporter RPM and its dependency RPMs

# vim /etc/pg_stats_reporter.ini
----- configuration of repository database -----
host = localhost
port = 5432
dbname = postgres
username = postgres
password =

2. Edit pg_stats_reporter.ini (the configuration file); the default settings are as shown

# service httpd start

3. Start the Apache HTTP server

4. Access the following URL


Installation instructions for pg_stats_reporter are also given in the online manual! :)

Please disable SELinux!!

2. Confirmation of installation

3. Create report

Demo of pg_stats_reporter

• Ready for Android and iPad

• It is based on the jQuery UI library, so the interface design (mostly colors) is easy to change
  • The logo image can also be changed by replacing the file

• Report items can be selected
  • If you like, list the report items you need in /etc/pg_stats_reporter.ini

• For security
  • .htaccess can be used
  • Apache's usual security techniques can be used in the same way

TIPS of pg_stats_reporter

• TPC-C benchmark software developed by Open Source Development Labs (OSDL)
  • Simulates shopping at a parts wholesaler
  • The benchmark score is calculated only from responses returned within a fixed time
    • Response time is very important!
    • An I/O-bound benchmark

• Main benchmark parameters
  • warehouse
    • Database size parameter
    • Each increment of 1 adds one hundred thousand records
    • Mainly used to adjust the database size
  • TPW
    • Transactions per warehouse
    • Clients are prepared according to the number of warehouses; the default is 10
    • With a lower TPW, the benchmark becomes CPU-bound

What is DBT-2?

• Main bottleneck
  • Random read/write
    • Almost all SQL plans are index scans
    • Random read/write performance and cache/buffer replacement performance are important
  • Parallel execution performance is also important
    • PostgreSQL is better than other RDBMSs :)

• Other features
  • SQL plans are very simple
    • Most SQL statements use only index scan access.
  • An ideal benchmark score exists
    • If the DB responds to all transactions within the time limit, the score is ideal
    • The performance limit is around the point where the database size is twice the memory size.
  • The amount of WAL output is less than with pgbench; WAL is not the bottleneck.

Transaction Tendency in DBT-2

Test server and postgresql.conf settings

Server     HP DL360 G7
CPU        Xeon E5640 2.66GHz (1P/4C)
Memory     DDR3-10600R-9 18GB
RAID card  P410i / 256MB cache
Disk       4 x 146GB (1.5krpm) RAID 1+0

max_connections = 300
shared_buffers = 2458MB
work_mem = 1MB
maintenance_work_mem = 64MB
fsync = on
wal_sync_method = fdatasync
full_page_writes = on
wal_buffers = -1
archive_mode = on
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.7
random_page_cost = 2.0
effective_cache_size = 9GB
default_statistics_target = 10
log_destination = 'syslog'
autovacuum = on

postgresql.conf (main changed parameters)

Warehouse size = 320 (database size is about 40GB) and TPW = 10
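The test settings above can be sanity-checked with simple arithmetic, using only figures stated in the slides (320 warehouses, one hundred thousand records per warehouse, about 40GB of data, 18GB of memory). The variable names in this Python sketch are chosen for illustration, and the per-warehouse size is only an approximation derived from the reported total.

```python
WAREHOUSES = 320
RECORDS_PER_WAREHOUSE = 100_000   # "one hundred thousand records" per warehouse
DATABASE_GB = 40                  # approximate size reported in the slides
MEMORY_GB = 18                    # test server memory

total_records = WAREHOUSES * RECORDS_PER_WAREHOUSE
mb_per_warehouse = DATABASE_GB * 1000 / WAREHOUSES
db_to_memory_ratio = DATABASE_GB / MEMORY_GB

print(f"records: {total_records:,}")
print(f"approx. MB per warehouse: {mb_per_warehouse:.0f}")
print(f"database size / memory: {db_to_memory_ratio:.1f}")
```

The database is a little more than twice the size of memory, which places this run near the performance limit described for DBT-2 and is consistent with the I/O-bound behavior reported in the following slides.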

Visualizing DBT-2 by pg_stats_reporter 1/5

• Transaction status
  • The transaction rate fluctuates. This is due to parts of the benchmark specification and some implementation-dependent behavior in PostgreSQL
  • Performance drops while a CHECKPOINT is executing
  • CHECKPOINTs were mainly triggered by checkpoint_timeout
    • postgresql.conf sets checkpoint_timeout = 15min and checkpoint_segments = 300

• Amount of WAL output
  • 4.6GB of WAL was output from the data load to the end of the benchmark
  • During the data load, the maximum WAL speed was 54MB/sec
  • During the benchmark run, the maximum WAL speed was 12MB/sec
    • WAL speed rises when a CHECKPOINT starts, because of full-page writes.

Visualizing DBT-2 by pg_stats_reporter 2/5

• CPU usage
  • iowait is the largest share, followed by idle (this indicates an I/O bottleneck)
  • The final part of a CHECKPOINT causes a high load average
    • This is because of the ugly consecutive fsync() calls.
    • PostgreSQL's CHECKPOINT logic is not good :(

Visualizing DBT-2 by pg_stats_reporter 3/5

• Updates and heavily accessed tables
  • HOT (Heap-Only Tuples) is working well!
  • The order_line and stock tables are accessed heavily
  • Each table's cache hit rate is very high, but… (is it really? :( )

Visualizing DBT-2 by pg_stats_reporter 4/5


• Query execution status
  • Queries with complicated filter conditions are slow
  • Unexpectedly, COMMIT takes a long time!
    • This is because the COMMIT of a long transaction needs to write a lot of WAL (WAL buffer writing)
    • The fsync() phase at the end of a CHECKPOINT makes queries slower

Visualizing DBT-2 by pg_stats_reporter 5/5

• Use direct_cp as the archive copy command
  • In archive mode, the cp command wastefully consumes a large amount of file cache, which lowers performance
  • BSD-licensed software

• Use SSDs
  • In general, the database bottleneck is random access. SSDs are 10 times faster at random access than magnetic disks (MD)
  • If you need large disks or have a limited budget, you can use a tablespace for only the hot tables; this is very efficient.

• Use a RAID card with a large cache
  • PostgreSQL's CHECKPOINT does not schedule fsync() at all. This causes very heavy disk writes and failover :(
  • A RAID card with a large cache may mitigate this a little.

For More Performance

• pg_statsinfo
  • Monitors and collects PostgreSQL statistics and activities as time series
    • BSD License
    • …_statsinfo-ja.html
  • Collects all the statistics and activities a DB admin needs
    • If you want a new kind of report, write reporting SQL against the collected information

• pg_stats_reporter
  • Visualizes the PostgreSQL statistics and activities collected by pg_statsinfo
    • BSD License
    • …_stats_reporter-ja.html
  • Useful jQuery-based interface
    • The report index feature is also useful
  • The software is easy to improve, because it is written in PHP + JavaScript
    • It is also easy to submit patches :)
