Best practices guide for IBM Power Systems solution for MariaDB

© Copyright IBM Corporation, 2014

Guidance for the installation and tuning of MariaDB running on Linux on Power featuring the new IBM POWER8 technology

Axel Schwenke, MariaDB Corporation
Sergey Vojtovich, MariaDB Foundation
Hari Reddy, IBM Systems and Technology Group ISV Enablement

December 2014

Table of contents

Abstract
Introduction
Prerequisites
Executive summary
    Storage engines
    Server architecture
Installation of MariaDB
    Installing from binary packages
    Installing from binary .tar files
    Building and installing from source
Configuring MariaDB
    Data directory
    XtraDB configuration
    Sample MariaDB configuration file (my.cnf)
Tuning of MariaDB
    MariaDB tuning
    Miscellaneous hints
Power Systems built with the POWER8 technology
Linux on Power tuning guidelines
    Simultaneous Multithreading (SMT)
    Hardware prefetch
PowerKVM
PowerKVM tuning
    Guest CPU model and topology
    Mapping of guest CPUs
        CPU placement
        CPU tuning
    I/O tuning (virtio)
        virtio
        cache
        io
    Memory tuning
    Network tuning
    Sample guest XML configuration
Summary
Acknowledgements
Resources
About the authors
Trademarks and special notices

Abstract

This white paper describes the installation and tuning of MariaDB version 10.0.14 on IBM Power Systems servers featuring the IBM POWER8 processor technology. The target audience is users and system integrators interested in using Linux on Power and MariaDB. Some familiarity with MariaDB, Linux on Power, and IBM PowerKVM might be helpful.

Introduction

This paper provides best practices guidelines for deploying MariaDB on Linux on Power featuring IBM® POWER8™ and IBM PowerKVM technology. The instructions and guidelines described in this paper are applicable to any version of MariaDB, Linux on Power distributions, PowerKVM, and any POWER8 systems capable of running Linux on Power distributions and PowerKVM. The subsequent sections provide an overview of the MariaDB architecture, the instructions to build, install, and run MariaDB on Linux on Power and PowerKVM, and guidelines to tune MariaDB and PowerKVM so that MariaDB runs efficiently on Linux on Power and PowerKVM.

Prerequisites

In addition to prior knowledge of MariaDB, basic familiarity with the commands and tools used in Linux, PowerKVM, and Linux on Power is helpful.

Executive summary

MariaDB scales extremely well on POWER8, mostly because thread synchronization costs are very low. This means that MariaDB can use many processor cores and must be configured to make the most of them. The following techniques help improve the performance of MariaDB on a Linux on Power system using the POWER8 technology:

- Partition global data structures such as the InnoDB buffer pool and the adaptive hash index. Increase max_connections, table_open_cache, and thread_cache_size.
- Configure a higher number of InnoDB background threads: read I/O threads, write I/O threads, and purge threads. If you use replication, configure the slave to use multiple threads.
- Tune the I/O subsystem; use an appropriate I/O scheduler, file system, and mount options.
- Use higher levels of simultaneous multithreading (SMT) on POWER8 to use the processor resources efficiently.
- Use PowerKVM and virtio, a virtualization standard for network and disk device drivers, to get high-performance network and disk operations.
- Configure the PowerKVM guest so that MariaDB can make use of the nonuniform memory access (NUMA) architecture of POWER8.

Introduction to MariaDB

MariaDB is a fork of the MySQL database. MariaDB Corporation (https://mariadb.com/) is the company behind MariaDB development. MariaDB aims to be a fully open source, drop-in replacement for MySQL and extends MySQL in several ways. For a feature comparison, refer to https://mariadb.com/kb/en/mariadb/mariadb-vs-mysql-features/.

Support for the POWER8 platform was added in MariaDB 10.0.14. The MariaDB 10.0.x series corresponds feature-wise to MySQL 5.6.x. Refer to Figure 1 for a representation of the MariaDB architecture.

Storage engines

The MariaDB architecture consists of the components presented in Figure 1. MySQL, and hence MariaDB, uses a design in which logical and physical operations on data are separated. The part of the server that handles physical data storage is called the storage engine. MariaDB comes with a set of standard storage engines. The storage engine application programming interface (API) allows storage engines to be added in the form of dynamic libraries. Third parties use this feature to add their own storage engines.

The most common storage engine in MySQL is InnoDB. InnoDB is a transactional engine that is fully compliant with Atomicity, Consistency, Isolation, Durability (ACID). Some years ago Percona, another MySQL offspring, started XtraDB. This is a patched version of the InnoDB engine and is at the heart of the Percona Server (refer to http://www.percona.com/software/percona-server).

MariaDB uses XtraDB by default. InnoDB and XtraDB are mutually exclusive, partly because XtraDB identifies itself as InnoDB. MariaDB allows you to choose between XtraDB and InnoDB.

Server architecture

The MariaDB server is a single process named mysqld that runs under a nonprivileged user ID (default: mysql). The mysqld process spawns a few helper threads when it is started. One additional worker thread is started for each connection that is opened by a client or an application. When the connection is closed, the thread goes to the thread cache and is eventually reused.

For applications that require a very large number (thousands) of open connections, this model does not scale well. For those cases, MariaDB offers a thread pool (which must be enabled in the configuration file). The thread pool limits the number of worker threads and multiplexes connections to threads.

Applications communicate with the MariaDB server over a socket, which can be either a UNIX® domain socket for applications running on the same server or a TCP socket on the default port 3306 when the client application resides on a separate server in a clustered environment. The protocol is 100% compatible between MariaDB and MySQL, so any application built for MySQL can work with MariaDB.

The general architecture of MariaDB is shown in Figure 1.

Figure 1: MariaDB architecture

Installation of MariaDB

This section describes the following three ways to install MariaDB on Linux on Power:

- Installation from binary packages
- Installation from the .tar files
- Building from source and installing

Installing from binary packages

The MariaDB binary packages for Linux on Power are available on the MariaDB.com website as part of the MariaDB Enterprise product suite. You can download the binary packages at: https://mariadb.com/user/register?destination=my_portal/download. At the time of writing, the latest version available is 10.0.14. These packages are available to all registered users (registration is free). This section provides the essential steps to download and install MariaDB from the binary packages.

Perform the following steps to install MariaDB from binary packages.

1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account (registration is required and is free of charge).
2. On the main download page, under MariaDB Enterprise (left column), select the MariaDB version (10.0.14 at the time of publishing this paper).
3. Select Ubuntu 14.04 as the platform.
4. Click the Ubuntu 14.04 LTS (Trusty) Packages link.
5. Run the following commands to install the MariaDB server.

    sudo apt-get install python-software-properties software-properties-common
    sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xd324876ebe6a595f
    sudo add-apt-repository 'deb http://USER:[email protected]/mariadb-enterprise/10.0/repo/ubuntu trusty main'
    sudo apt-get update
    sudo apt-get install mariadb-server

Note: USER and PASSWORD must be replaced with a valid mariadb.com account ID and password.

Installing from binary .tar files

The MariaDB binary .tar files for Linux on Power are available at the MariaDB.com website as part of the MariaDB Enterprise product suite. You can download the binary .tar files from https://mariadb.com/user/register?destination=my_portal/download. At the time of writing, the latest version available is 10.0.14. The .tar files are available to all registered users (registration is free). The essential steps to download and install MariaDB from the binary .tar files on Linux on Power are listed in this section.

1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account (registration is required and is free of charge).
2. On the main download page, under MariaDB Enterprise (left column), select the MariaDB version (10.0.14 at the time of publishing this paper).
3. Select Ubuntu 14.04 as the platform.
4. To download the .tar file, click mariadb-10.0.14-linux-ppc64le.tar.gz and save the file in /tmp.
5. Run the following commands to unpack the binary files.

    cd /usr/local
    tar xfz /tmp/mariadb-10.0.14-linux-ppc64le.tar.gz
    cd mariadb-10.0.14-linux-ppc64le

6. Follow the instructions in the file INSTALL-BINARY, which is located in this directory, to install the MariaDB server.

Building and installing from source

The MariaDB source is available at the MariaDB.com website as part of the MariaDB Enterprise product suite. You can download the source tarball from https://mariadb.com/user/register?destination=my_portal/download. At the time of writing, the latest version available is 10.0.14. The tarballs are available to all registered users (registration is free).

You need several prerequisites installed to build MariaDB from source. Besides cmake and make, you need a C/C++ toolchain; IBM Advance Toolchain 8.0 is recommended (see footnote 1). In addition, you need libncurses5-dev, libssl-dev, libaio-dev, libjemalloc-dev, and bison. cmake reports an error if anything is missing. For a detailed description of the necessary post-installation steps, see the MySQL manual: https://dev.mysql.com/doc/refman/5.6/en/source-installation.html. This section provides the essential steps to build and install MariaDB from the source .tar files.

Footnote 1: You can download and install IBM Advance Toolchain from ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/IBM%20Advance%20Toolchain%20for%20PowerLinux%20Documentation

Perform the following steps to build and install MariaDB from the source .tar files.

1. Visit https://mariadb.com/user/register?destination=my_portal/download and create a new account (registration is required and is free of charge).
2. On the main download page, under MariaDB Enterprise (left column), select the MariaDB version (10.0.14 at the time of publishing this paper).
3. Select Source Code as the platform.
4. Click mariadb-10.0.14-.tar.gz to download the .tar file.
5. Run the following command to install the prerequisites.

    sudo apt-get install cmake gcc g++ make libaio-dev libevent-dev \
        libjemalloc-dev libncurses5-dev bison libssl-dev

6. Run the following commands to build and install the MariaDB server.

    tar xfz mariadb-10.0.14-.tar.gz
    cd mariadb-10.0.14
    mkdir build
    cd build
    # Point cmake at the source tree in the parent directory
    cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local/mariadb-10.0.14 \
        -DCMAKE_BUILD_TYPE=Release
    make
    sudo make install

Configuring MariaDB

MariaDB is configured through a set of server variables. Almost all of them have safe default values, but some must be modified. This is done in a file named /etc/mysql/my.cnf. For a description of the my.cnf file, refer to: https://dev.mysql.com/doc/refman/5.6/en/option-files.html.

Many of the server variables are dynamic, which means that they can be modified with the SQL SET statement (SET GLOBAL variable=value) while the server is running, and the change takes effect immediately. Other variables must be set in the .cnf file and require a restart of the MariaDB server. Refer to https://dev.mysql.com/doc/refman/5.6/en/mysqld-option-tables.html for a description of all these variables.

Data directory

The storage engines used in MariaDB store data in files below a common directory, called the data directory. Its location is one of the most important server variables to set. The directory must exist and be owned by the user ID that started the mysqld process. The initial setup of the data directory is done with the mysql_install_db tool. This tool creates the initial system databases and accounts. The installation from the binary packages performs this setup automatically.

MariaDB works with most of the file systems in use on modern Linux distributions. The best performance is reached with ext4 or XFS. Btrfs is known to have performance problems with database workloads. It is recommended to use the relatime mount option and, for ext4, also the nobarrier option. For the block device holding the file system, it is recommended to use the deadline I/O scheduler (refer to the following listing).

The I/O scheduler used for a block device can be checked and set in sysfs:

    cat /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
    noop deadline [cfq]
    echo "deadline" > /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
    cat /sys/devices/vio/2000/host0/target0:0:0/0:0:0:0/block/sda/queue/scheduler
    noop [deadline] cfq

Listing 1: Instructions to set deadline scheduler
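The scheduler file marks the active scheduler with square brackets, as the output in Listing 1 shows. The following is a minimal sketch of how that marker could be extracted in a script. The function names active_scheduler and show_scheduler are hypothetical, and the shorter /sys/block/<device>/queue/scheduler path is an assumption about how the device is exposed on your system; the long device path from Listing 1 can be used instead.

```shell
# Print the active I/O scheduler from one line of a sysfs "scheduler" file.
# The active scheduler is the name enclosed in square brackets,
# for example "noop deadline [cfq]" yields "cfq".
active_scheduler() {
    # $1: the line read from .../queue/scheduler
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler for a block device name such as "sda".
# Assumes the device is visible under /sys/block (hypothetical helper).
show_scheduler() {
    file="/sys/block/$1/queue/scheduler"
    if [ -r "$file" ]; then
        printf '%s: %s\n' "$1" "$(active_scheduler "$(cat "$file")")"
    else
        printf '%s: no scheduler file found\n' "$1"
    fi
}
```

For example, show_scheduler sda prints the device name followed by the active scheduler. A setting written with echo does not survive a reboot, so a check like this can be useful in a startup script.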

For decent I/O performance, you must use a high-performance I/O subsystem. Good results have been seen from local Redundant Array of Independent Disks (RAID) setups using battery-backed RAM cache. Storage area network (SAN) storage also works well. Getting decent performance from network-attached storage (NAS) devices can be tricky and typically requires much tuning on the NAS side.

XtraDB configuration

Proper setting of several configuration variables is important for efficient operation of the XtraDB storage engine. Some of the variables are listed in Table 1.

XtraDB parameter: innodb_buffer_pool_size
Description: Memory buffer shared between all mysqld threads, caching InnoDB data and index pages.
Recommended value: As large as possible. Set it to 50% to 85% of available memory.

XtraDB parameter: innodb_buffer_pool_instances
Description: The InnoDB buffer pool can be partitioned to reduce mutex contention.
Recommended value: Try the number of processor cores; values above 32 seem to have little effect though.

XtraDB parameter: innodb_adaptive_hash_index_partitions
Description: The InnoDB adaptive hash index can be partitioned to reduce mutex contention.
Recommended value: Try the number of processor cores; values above 32 seem to have little effect though.

XtraDB parameter: innodb_log_file_size
Description: The size of one XtraDB redo log. A large size results in long recovery times and a small size results in slow performance, especially for write-intensive workloads.
Recommended value: The total size of all redo logs should be 5% to 20% of the buffer pool size.

XtraDB parameter: innodb_log_buffer_size
Description: This is a buffer for write operations to the InnoDB redo log. The log is flushed at every COMMIT or at least once a second. Ideally, the buffer should be big enough to hold all subsequent redo log write operations.
Recommended value: Values up to 16 MB are reasonable.

XtraDB parameter: innodb_io_capacity
Description: The number of I/O operations that InnoDB can do per second. Pure background I/O will not exceed that limit. When InnoDB is forced to use synchronous I/O, however, it will exceed that limit.
Recommended value: This should match the characteristics of your I/O subsystem.

XtraDB parameter: max_connections
Description: The maximum number of concurrent connections.
Recommended value: Depends on the application.

XtraDB parameter: table_open_cache
Description: A shared cache for table handles. If this cache is too small, it will cause severe performance loss.
Recommended value: Set to (max_connections) * (maximum number of tables used in a single operation, that is, a JOIN or a subquery).

For write workloads

XtraDB parameter: innodb_flush_neighbors
Description: InnoDB flushes dirty pages ordered by age. If a page to be flushed has dirty neighbors, those can be flushed at the same time.
Recommended value: Enable neighbor flushing on traditional disks, disable on flash storage.

XtraDB parameter: innodb_read_io_threads, innodb_write_io_threads
Description: Number of InnoDB background I/O threads.
Recommended value: Depends on the storage subsystem. If it can do multiple I/O requests in parallel, use that number.

Table 1: XtraDB configuration parameters
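The rules of thumb in Table 1 are simple arithmetic, so they can be sketched as small shell helpers. The function names and the choice of megabytes as the unit are assumptions made for this illustration; the percentages are the ranges given in the table, and the results are starting points to be validated against your own workload.

```shell
# Starting values derived from the rules of thumb in Table 1.
# All sizes are in MB. Function names are illustrative only.

# innodb_buffer_pool_size: a percentage (50 to 85) of available memory.
# $1: available memory in MB, $2: percentage to use
buffer_pool_mb() {
    echo $(( $1 * $2 / 100 ))
}

# innodb_log_file_size: size of one redo log file so that all files together
# are a given percentage (5 to 20) of the buffer pool.
# $1: buffer pool in MB, $2: percentage, $3: innodb_log_files_in_group
log_file_mb() {
    echo $(( $1 * $2 / 100 / $3 ))
}

# table_open_cache: max_connections times the maximum number of tables
# used in a single operation.
# $1: max_connections, $2: tables per operation
table_open_cache() {
    echo $(( $1 * $2 ))
}
```

For example, buffer_pool_mb 65536 50 gives 32768 (32 GB out of 64 GB of memory), and table_open_cache 1000 2 gives 2000.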

Sample MariaDB configuration file (my.cnf)

A sample MariaDB/InnoDB configuration file is shown in the following listing. This file can be used as a starting template and modified as needed.

[mysqld]
basedir = /usr/local/mariadb-10.0.14
datadir = /data/mariadb
performance-schema = false
max_connections = 1000
back_log = 150
table_open_cache = 2000
key_buffer_size = 16M
query_cache_type = 0
join_buffer_size = 32K
sort_buffer_size = 32K
innodb_file_per_table = true
innodb_open_files = 100
innodb_data_file_path = ibdata1:50M:autoextend
innodb_flush_method = O_DIRECT_NO_FSYNC
innodb_log_buffer_size = 16M
innodb_log_file_size = 4G
innodb_log_files_in_group = 2
innodb_buffer_pool_size = 32G
innodb_buffer_pool_instances = 32
innodb_adaptive_hash_index_partitions = 32
innodb_thread_concurrency = 0

# tuning for SAN storage
innodb_adaptive_flushing = 1
innodb_flush_neighbors = 1
innodb_io_capacity = 4000
innodb_io_capacity_max = 6000
innodb_lru_scan_depth = 4096
innodb_purge_threads = 2
innodb_read_io_threads = 8
innodb_write_io_threads = 16

Listing 2: Sample MariaDB/XtraDB configuration file

Tuning of MariaDB

Improving the performance of MariaDB running on Linux on Power requires tuning both the MariaDB application and the Linux on Power system. This section provides the tuning information specific to MariaDB. The section “Linux on Power tuning guidelines” covers the tuning of the underlying Linux on Power system and PowerKVM.

MariaDB tuning

Because MariaDB inherits most of its features from MySQL, most MySQL tuning guidelines apply. Check

the MySQL manual at: https://dev.mysql.com/doc/refman/5.6/en/optimization.html.

The XtraDB engine has its own tuning guide at: http://www.percona.com/doc/percona-server/5.6/.

Finally, the MariaDB knowledge base contains a MariaDB-specific tuning guide:

https://mariadb.com/kb/en/mariadb/documentation/optimization-and-tuning/

Miscellaneous hints

When running MariaDB on NUMA hardware, use of the --numa-interleave option for mysqld_safe is recommended. This option can also be set in my.cnf in a [mysqld_safe] section.

Power Systems built with the POWER8 technology

This section presents an overview of the POWER8 technology. POWER8 is a multicore, multichip (node), and multisocket processor technology. The number of chips and sockets available can vary with the model purchased. A representative layout of the POWER8 processor, which offers double the memory bandwidth of the IBM POWER7+™ processor, is shown in the following figure.

Figure 2: POWER8 processor

Linux on Power tuning guidelines

This section describes Linux on Power tuning guidelines for:

- Simultaneous multithreading
- Hardware prefetch

Simultaneous Multithreading (SMT)

When running Linux under PowerKVM, you can choose among different SMT levels for the guest. Setting higher levels of SMT in the PowerKVM guest improves system utilization and throughput. The best choice of SMT level depends on how the database is used, specifically on the concurrency of the database workload.

For mostly read-only workloads in which the application runs many SQL queries concurrently, system throughput can be maximized by configuring an SMT level of 8. If the database load has lower concurrency, especially if there are no more concurrent operations than virtual processor cores when configured in SMT4, then SMT4 is the better choice.

For database workloads with a significant amount of writes, SMT4 is the better choice in most cases.

The following figure shows the typical behavior of a two-socket, 20-core POWER8 system. With SMT4, this system has 80 virtual processor cores, and with SMT8, it has 160 virtual processor cores.

[Figure: OLTP read-only system throughput, in transactions per second, plotted against the number of client threads (1, 10, 20, 40, 80, 160, 320) for SMT4 and SMT8]

Figure 3: MariaDB read-only performance for various SMT settings and thread counts in the MariaDB server

Recommendation: Enable the PowerKVM guest to run up to eight SMT threads per virtual core, which gives the user the flexibility to adjust the SMT setting on the guest as needed. For mostly read-only workloads that exhibit high levels of concurrency, set the SMT level to 8 in the PowerKVM guests. For read-only workloads with lower concurrency and for read-write workloads, set the SMT level to 4. It is also recommended to benchmark your own workload and resource usage in order to determine the SMT setting that benefits it most.

You can use the following Linux on Power command to set SMT=8 in the KVM guest:

    ppc64_cpu --smt=8

The correct choice of the SMT level also depends on how the application uses the database. If the application can keep many database connections busy, then higher SMT levels give a benefit because MariaDB can then schedule the resulting worker threads on more virtual processors. However, if the application uses only a few connections to the database, then the additional virtual processors would not be used anyway.
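The guidance in this section can be condensed into a small decision helper. This is only a sketch: the function names are hypothetical, and the threshold (more concurrent operations than virtual processors at SMT4) is a simplification of the text above. A benchmark with your own workload remains the deciding factor.

```shell
# Number of virtual processors seen by the guest: cores * SMT level.
# $1: processor cores, $2: SMT level
logical_cpus() {
    echo $(( $1 * $2 ))
}

# Suggest an SMT level following the guidance in this section.
# $1: workload type, "read" (mostly read-only) or "write" (significant writes)
# $2: expected number of concurrently active connections
# $3: number of processor cores
recommend_smt() {
    if [ "$1" = "read" ] && [ "$2" -gt "$(logical_cpus "$3" 4)" ]; then
        echo 8
    else
        echo 4
    fi
}
```

For the 20-core system discussed above, logical_cpus 20 4 gives 80 and logical_cpus 20 8 gives 160. A mostly read-only workload with 160 busy connections yields a suggestion of SMT8, whereas a write-heavy workload yields SMT4 regardless of concurrency.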

Hardware prefetch

The Data Stream Control Register (DSCR) of IBM POWER® processors controls the aggressiveness of memory prefetching for loads and stores. The performance of MariaDB improves slightly when hardware prefetch is turned off. The commands to turn hardware prefetch on or off are shown in this section.

Recommendation: Turn hardware prefetch off before starting the MariaDB server. Preferably, hardware prefetch should be turned off on the KVM guest. Always benchmark your own workload and resource usage in order to determine the DSCR setting that benefits it most.

Use the following command to turn hardware prefetch on:

    ppc64_cpu --dscr=0

Use the following command to turn hardware prefetch off:

    ppc64_cpu --dscr=1

PowerKVM

IBM PowerKVM is a hypervisor that allows virtualization of a Linux on Power system, using the open source virtualization standard, Kernel Virtual Machine (KVM). A PowerKVM environment consists of physical hardware (node), the PowerKVM OS and hypervisor (host), and KVM guests (domains). Refer to the following figure. PowerKVM supports a variety of Linux operating systems as the guest operating system.

Figure 4: PowerKVM environment

virsh is a PowerKVM system application that provides a command-line interface (CLI) to manage the PowerKVM guests. Some examples of the virsh command are given in this section. A PowerKVM guest can be configured by entering its definition in XML format into a file and giving the XML file as a parameter to the virsh command. For more details about the virsh command, refer to “man virsh” on a PowerKVM host. The virsh command does not work on a PowerKVM guest.

The following list provides examples of the virsh command.

- To define a guest VM: virsh define p215vm135.xml
- To start a guest VM: virsh start p215vm135
- To shut down a guest VM: virsh shutdown p215vm135
- To undefine a guest VM: virsh undefine p215vm135
- To display guest information: virsh dominfo p215vm135
- To list the virtual machines: virsh list
- To dump the KVM guest configuration: virsh dumpxml p215vm135 > p215vm135-current.xml

PowerKVM tuning

The PowerKVM guest definition comprises several elements. In this section, the following elements are described:

- Guest CPU model and topology
- Mapping of guest CPUs
- I/O tuning (virtio)
- Network tuning (virtio)

For more details about the KVM tunable elements described in this section, refer to: http://libvirt.org/formatdomain.html#elements and http://wiki.libvirt.org/page/Virtio. For an excellent tutorial on these topics, refer to:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Basic%20XML%20tuning%20tips%20for%20PowerKVM%20guests

Guest CPU model and topology

The cpu element defines the PowerKVM guest CPU topology, including the maximum SMT mode, which is set by the threads parameter. Together, the three parameters sockets, cores, and threads determine the total number of vCPUs allocated to a PowerKVM guest, according to the formula given after the following example.

Recommendation: Always set the number of threads to 8 in the guest configuration setup.

Use the following topology specification to create a guest with 80 vCPUs on one socket using 10 cores and SMT=8:

<cpu>

<topology sockets='1' cores='10' threads='8'/>

</cpu>

Definition of the number of CPUs allocated to a PowerKVM guest:

Number of vCPUs = sockets * cores per socket * threads per core = 1 * 10 * 8 = 80
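The formula above can be checked mechanically before a guest is defined. The following is a minimal sketch, not part of PowerKVM or libvirt tooling: the function name vcpus_from_topology and the CPU_XML constant are hypothetical, and the sketch assumes the cpu element is available as a standalone XML string.

```python
import xml.etree.ElementTree as ET


def vcpus_from_topology(cpu_xml: str) -> int:
    """Return sockets * cores * threads for a libvirt <cpu> element.

    Hypothetical helper for illustration; it only reads the three
    topology attributes described in this section.
    """
    topology = ET.fromstring(cpu_xml).find("topology")
    if topology is None:
        raise ValueError("no <topology> element found")
    sockets = int(topology.get("sockets"))
    cores = int(topology.get("cores"))
    threads = int(topology.get("threads"))
    return sockets * cores * threads


# The topology used in this section: 1 socket, 10 cores, SMT=8.
CPU_XML = "<cpu><topology sockets='1' cores='10' threads='8'/></cpu>"

print(vcpus_from_topology(CPU_XML))  # prints 80
```

The value returned here should match the number given in the vcpu element of the same guest definition.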

Mapping of guest CPUs

IBM POWER8 processor-based servers use a non-uniform memory access (NUMA) architecture. POWER8 implements NUMA with multicore, multichip (node), and multisocket processor technology. The numactl (on host and guest) and virsh (on host only) commands provide information about the current NUMA configuration. Use the following commands to get the host and guest NUMA configurations.

numactl -H

virsh nodeinfo

In PowerKVM, the guest can be configured to be “NUMA-aware” which allows the user to closely map the

application to the hardware resources. This can improve the performance of the MariaDB application

running on a POWER8 system. The advantage of pinning the vCPUs to the host CPUs is that you can

control where the guest runs, reduce the number of scheduler switches, and improve the likelihood of finding data in the caches. Pinning guest vCPUs to host CPUs is also useful when only a subset of the host CPUs needs to be allocated to the KVM guest. This section describes two different methods of mapping the guest vCPUs to the host CPUs.

CPU placement

In this specification, the vCPUs in a guest can run on any of the host CPUs allocated to the guest.

vcpu

The content of this element, vcpu, defines the maximum number of virtual CPUs allocated for the guest

OS, which must be between 1 and the maximum number supported by the hypervisor.

cpuset

The optional attribute cpuset is a comma-separated list of physical CPU numbers that the domain

process and virtual CPUs can be pinned to by default. The pinning policy of the domain process and

virtual CPUs can also be specified separately by the cputune element and will override the specification

in the vcpu element.

placement

The optional attribute placement can be used to indicate the CPU placement mode for the domain

process. The recommended value for this attribute is ‘static’.

An example of how to set the vcpu element is given in the following listings:

numactl -H

available: 4 nodes (0-1,16-17)

node 0 cpus: 0 8 16 24 32

node 0 size: 65536 MB

node 0 free: 6668 MB

node 1 cpus: 40 48 56 64 72

node 1 size: 65536 MB

node 1 free: 8216 MB

node 16 cpus: 80 88 96 104 112

node 16 size: 65536 MB

node 16 free: 8205 MB

node 17 cpus: 120 128 136 144 152

node 17 size: 65536 MB

node 17 free: 6159 MB

node distances:

node 0 1 16 17

0: 10 20 40 40

1: 20 10 40 40

16: 40 40 10 20

17: 40 40 20 10

The host has 4 NUMA nodes, each with 5 cores and 64 GB of memory.

Listing 3: Use numactl -H on the host to get host hardware configuration
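The CPU numbers needed for a cpuset attribute can be read off the numactl output by hand, or extracted with a short script. The following sketch assumes the output format shown in Listing 3. The function name parse_numactl_nodes and the SAMPLE text are hypothetical, and the sample is an abbreviated copy of the listing.

```python
import re


def parse_numactl_nodes(text: str) -> dict:
    """Map NUMA node number -> list of CPU numbers from `numactl -H` output.

    Only the "node N cpus: ..." lines are used; all other lines are ignored.
    """
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"\s*node (\d+) cpus:(.*)", line)
        if match:
            nodes[int(match.group(1))] = [int(c) for c in match.group(2).split()]
    return nodes


# Abbreviated copy of the host output in Listing 3.
SAMPLE = """\
available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 1 cpus: 40 48 56 64 72
node 16 cpus: 80 88 96 104 112
node 17 cpus: 120 128 136 144 152
"""

nodes = parse_numactl_nodes(SAMPLE)

# Build a cpuset string from the CPUs of NUMA nodes 16 and 17.
cpuset = ",".join(str(cpu) for node in (16, 17) for cpu in nodes[node])
print(cpuset)  # prints 80,88,96,104,112,120,128,136,144,152
```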

<domain>

...

<vcpu placement='static' cpuset='80,88,96,104,112,120,128,136,144,152'>80</vcpu>

...

</domain>

Listing 4: vCPUs assigned to 10 cores on the second socket on the host

The ten physical cores (80, 88, 96, 104, 112, 120, 128, 136, 144, 152) from the two NUMA nodes (16 and 17) on the host are assigned to the guest. Based on the guest topology defined in the “Guest CPU model and topology” section, these are 10 virtual cores, each with 8 (SMT) threads, giving the guest a total of 80 vCPUs.

The vCPUs on the guest are numbered 0, 1, ..., 78, 79. The 8 vCPUs that belong to a virtual core are numbered consecutively (0..7, for example) and are co-scheduled on the same physical core on the host. Based on the placement specification given above, the scheduler is allowed to assign each set of 8 vCPUs to any of the 10 physical cores on the host.

virsh dominfo p215vm135

Id: 8

Name: p215vm135

UUID: 6801ed05-65d0-4910-b459-7e2adaf2c971

OS Type: hvm

State: running

CPU(s): 80

CPU time: 306.3s

Max memory: 131072000 KiB

Used memory: 131072000 KiB

Listing 5: Run the virsh command on the host to confirm the guest has 80 vCPUs
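The check in Listing 5 can also be scripted. The following sketch parses text in the "key: value" layout shown above and compares the reported vCPU count with the topology formula. The function name parse_dominfo is hypothetical, and the SAMPLE text is copied from the listing rather than captured from a live host.

```python
def parse_dominfo(text: str) -> dict:
    """Turn `virsh dominfo` style "key: value" lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            info[key.strip()] = value.strip()
    return info


# Copy of the output shown in Listing 5.
SAMPLE = """\
Id: 8
Name: p215vm135
UUID: 6801ed05-65d0-4910-b459-7e2adaf2c971
OS Type: hvm
State: running
CPU(s): 80
CPU time: 306.3s
Max memory: 131072000 KiB
Used memory: 131072000 KiB
"""

info = parse_dominfo(SAMPLE)
expected_vcpus = 1 * 10 * 8  # sockets * cores per socket * threads per core
print(int(info["CPU(s)"]) == expected_vcpus)  # prints True
```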

numactl -H

available: 1 nodes (0)

node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52

53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

node 0 size: 127856 MB

node 0 free: 126053 MB

node distances:

node 0

0: 10

Listing 6: numactl -H on the guest to get the guest view of the vCPUs allocated

CPU tuning

cputune

The content of this element, cputune, defines how each of the guest vCPUs can be allocated to the physical processors on the host. This allows a more fine-grained assignment of specific vCPUs than the vcpu element allows. If both the cputune and vcpu elements are specified, cputune takes precedence over vcpu.

vcpupin

The optional vcpupin element specifies which of the host's physical CPUs a domain vCPU is pinned to. Pin each vCPU "virtual core" group to the same set of host CPUs. That is, if you set SMT threads=8, pin each group of eight vCPUs to the same CPU set.

cpuset

The optional attribute cpuset is a comma-separated list of physical CPU numbers that the domain process and virtual CPUs can be pinned to by default. A pinning policy for the domain process and virtual CPUs that is specified separately in the cputune element overrides any pinning specified in the vcpu element.

emulatorpin

The optional emulatorpin element specifies which of the host's physical CPUs the "emulator" is pinned to. The emulator is the subset of a domain that excludes the vCPUs (that is, everything in the domain besides the vCPUs, such as the vhost-net process).

An example of vcpupin and emulatorpin is given in the following listing. In this mapping, each guest vCPU is restricted to a single NUMA node on the host. Benchmark results show that such an allocation scheme can result in slightly better performance.

Recommendation: Use cputune to limit the movement of vCPUs to a single NUMA node on the host.

<domain>

...

<cputune>

<vcpupin vcpu='0' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='1' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='2' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='3' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='4' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='5' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='6' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='7' cpuset='80,88,96,104,112'/>

<vcpupin vcpu='8' cpuset='80,88,96,104,112'/>

...

<vcpupin vcpu='72' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='73' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='74' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='75' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='76' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='77' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='78' cpuset='120,128,136,144,152'/>

<vcpupin vcpu='79' cpuset='120,128,136,144,152'/>

<emulatorpin cpuset='0,8,16,24,32,40,48,56,64,72,80'/>

</cputune>

...

</domain>

Listing 7: Detailed pinning of specific guest vCPUs and emulator to physical processors on the host
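Writing 80 vcpupin entries by hand is error-prone, so they can be generated. The following sketch is one possible way to do so. It assumes that the first five virtual cores (vCPUs 0 to 39) are pinned to the CPUs of NUMA node 16 and the remaining five (vCPUs 40 to 79) to those of node 17. The listing above shows only vCPUs 0 to 8 and 72 to 79, so the exact split point is an assumption, and the function name vcpupin_lines is hypothetical.

```python
def vcpupin_lines(node_cpus, cores_per_node, threads=8):
    """Generate <vcpupin> elements, one NUMA node's worth of vCPUs at a time.

    node_cpus      -- list of host CPU lists, one per NUMA node, in guest order
    cores_per_node -- virtual cores assigned to each node (assumed equal)
    threads        -- SMT threads per virtual core

    All vCPUs of a virtual core share one cpuset, so each group of
    `threads` consecutive vCPUs stays on the same set of host CPUs.
    """
    lines = []
    vcpu = 0
    for cpus in node_cpus:
        cpuset = ",".join(str(c) for c in cpus)
        for _ in range(cores_per_node * threads):
            lines.append(f"<vcpupin vcpu='{vcpu}' cpuset='{cpuset}'/>")
            vcpu += 1
    return lines


NODE16 = [80, 88, 96, 104, 112]      # host CPUs of NUMA node 16
NODE17 = [120, 128, 136, 144, 152]   # host CPUs of NUMA node 17

lines = vcpupin_lines([NODE16, NODE17], cores_per_node=5)
print(len(lines))   # prints 80
print(lines[0])     # first entry, pinned to the node 16 CPUs
print(lines[79])    # last entry, pinned to the node 17 CPUs
```

The generated lines can be pasted between the cputune tags of the guest definition, together with an emulatorpin element.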

I/O tuning (virtio)

There are a few basic tuning options you can apply to your guest disk, whether it is backed by a block

device, a raw file, or a qcow2 file.

virtio

Virtio is a virtualization standard for network and disk device drivers where just the guest's device driver

"knows" it is running in a virtual environment, and cooperates with the hypervisor. This enables guests to

get high performance network and disk operations, and gives most of the performance benefits of paravirtualization.

For best performance, the virtio model is recommended over the Power Architecture Platform Reference (PAPR) based SCSI model or an emulated device because virtio has been tuned to perform well in KVM environments. To use it, set the disk bus type to virtio.

Recommendation: In the disk element section, set bus='virtio'.

cache

The following guest disk caching modes are supported:

Cache mode            Description                                 Performance impact
cache=writethrough    Enables host cache; disables guest cache    Improves reads, impacts writes
cache=none            Disables host cache; enables guest cache    Improves writes, impacts reads
cache=writeback       Enables both host and guest caches          I/O is improved, but data might be lost

Table 2: Options for the cache parameter

The cache=none mode is usually a good choice for performance, unless read performance is critical, in which case cache=writethrough might be the better option.

Recommendation: In the disk element section, set cache='none'.

io

Setting the I/O type can help further tune the guest disk device for best performance. io=native uses kernel asynchronous I/O (AIO), whereas io=threads uses host user-space threads. Typically, io=native provides better performance.

Recommendation: In the disk element section, set io='native'.

An example of using the bus, cache, and io options in the XML specification of a guest is given in the following listing.

<disk type='block' device='disk'>

<driver name='qemu' type='raw' cache='none' io='native'/>

<source dev='/dev/mapper/mpathb'/>

<target dev='vda' bus='virtio'/>

<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>

</disk>

Listing 8: Example of guest I/O tuning
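The three disk recommendations in this section can be verified against a guest definition with a small script. The following sketch compares a disk element with the recommended values. The function name disk_tuning_issues and the RECOMMENDED table are hypothetical, and the sample is the disk element from Listing 8.

```python
import xml.etree.ElementTree as ET

# Recommended settings from this section (bus, cache, and io).
RECOMMENDED = {"bus": "virtio", "cache": "none", "io": "native"}


def disk_tuning_issues(disk_xml: str) -> list:
    """Return one message per setting that differs from RECOMMENDED."""
    disk = ET.fromstring(disk_xml)
    driver = disk.find("driver")
    target = disk.find("target")
    actual = {
        "bus": target.get("bus") if target is not None else None,
        "cache": driver.get("cache") if driver is not None else None,
        "io": driver.get("io") if driver is not None else None,
    }
    return [
        f"{name}: expected {wanted!r}, found {actual[name]!r}"
        for name, wanted in RECOMMENDED.items()
        if actual[name] != wanted
    ]


# The disk element from Listing 8.
DISK_XML = """
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/mapper/mpathb'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</disk>
"""

print(disk_tuning_issues(DISK_XML))  # prints []
```

An empty list means that the disk follows all three recommendations. A disk without an io attribute would be reported with one message.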

Memory tuning

The numatune element specifies the size, location, and allocation policy for the memory that is allocated to the guest. Its memory element specifies the host NUMA nodes that supply memory to the guest and how the memory from these nodes is allocated. MariaDB performs slightly better with the memory mode set to interleave. An example of setting the memory mode to interleave between two NUMA nodes is given in the following listing.

Recommendation: In the numatune element section, set memory mode='interleave'.

<domain>

...

<numatune>

<memory mode='interleave' nodeset='16-17'/>

</numatune>

...

</domain>

Listing 9: An example of using the numatune element
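When CPU pinning and numatune are combined, it is worth confirming that the memory nodes are the same NUMA nodes that own the pinned CPUs. The following sketch expands a nodeset string and performs that check. The function names and the HOST_NODE_CPUS table (copied from Listing 3) are hypothetical. The sketch handles only comma-separated values and ranges such as '16-17', not the full nodeset syntax that libvirt accepts.

```python
def expand_nodeset(nodeset: str) -> list:
    """Expand a nodeset such as '16-17' or '0,16-17' into node numbers."""
    nodes = []
    for part in nodeset.split(","):
        if "-" in part:
            first, last = part.split("-")
            nodes.extend(range(int(first), int(last) + 1))
        else:
            nodes.append(int(part))
    return nodes


def memory_is_local(nodeset, pinned_cpus, host_node_cpus) -> bool:
    """True if every pinned CPU belongs to a node listed in nodeset."""
    local_cpus = {
        cpu
        for node in expand_nodeset(nodeset)
        for cpu in host_node_cpus.get(node, [])
    }
    return set(pinned_cpus) <= local_cpus


# Host layout from Listing 3: NUMA node -> CPU numbers.
HOST_NODE_CPUS = {
    0: [0, 8, 16, 24, 32],
    1: [40, 48, 56, 64, 72],
    16: [80, 88, 96, 104, 112],
    17: [120, 128, 136, 144, 152],
}
PINNED = [80, 88, 96, 104, 112, 120, 128, 136, 144, 152]

print(expand_nodeset("16-17"))                           # prints [16, 17]
print(memory_is_local("16-17", PINNED, HOST_NODE_CPUS))  # prints True
```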

Network tuning

The virtio model provides better performance than the PAPR-based virtual LAN (VLAN) model or an emulated device because virtio has been tuned to perform well in KVM environments. An example of how to set the network model type to virtio is given in the following listing.

<interface type="bridge">

<source bridge='brnet1'/>

<target dev='guest001'/>

<mac address='AA:BB:CC:11:22:33'/>

<model type='virtio'/>

<alias name='net0'/>

<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>

</interface>

Listing 10: An example of network tuning

Sample guest XML configuration

This section provides a sample XML configuration file that contains the definition of a PowerKVM guest (refer to the following listing). This configuration can be used as a template and adjusted to suit a specific need. It defines a guest with the following characteristics:

The host has two sockets, each socket with two NUMA nodes and each node with 5 cores.

The guest topology defines one socket with 10 cores, and SMT is set to 8.

In this allocation, 80 vCPUs are allocated to the guest.

<domain type='kvm' id='20'>
  <name>p215vm135</name>
  <uuid>21513337-34a5-4a0b-bc83-e9e1451c3500</uuid>
  <memory unit='KiB'>131072000</memory>
  <currentMemory unit='KiB'>131072000</currentMemory>
  <vcpu placement='static' cpuset='80,88,96,104,112,120,128,136,144,152'>80</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='1' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='2' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='3' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='4' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='5' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='6' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='7' cpuset='80,88,96,104,112'/>
    <vcpupin vcpu='8' cpuset='80,88,96,104,112'/>
    ...
    <vcpupin vcpu='72' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='73' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='74' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='75' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='76' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='77' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='78' cpuset='120,128,136,144,152'/>
    <vcpupin vcpu='79' cpuset='120,128,136,144,152'/>
    <emulatorpin cpuset='80,88,96,104,112,120,128,136,144,152'/>
  </cputune>
  <numatune>
    <memory mode='interleave' nodeset='16-17'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='ppc64' machine='pseries-2.2'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <topology sockets='1' cores='10' threads='8'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/dev/VM_Storage/21513337-34a5-4a0b-bc83-e9e1451c3500-0.img'/>
      <backingStore/>
      <target dev='sda' bus='virtio'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/ubuntu-14.04.1-server-ppc64el.iso'/>
      <backingStore/>
      <target dev='sdc' bus='virtio' tray='open'/>
      <readonly/>
      <alias name='scsi0-0-0-2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/mapper/mpathb'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='scsi' index='0'>
      <alias name='scsi0'/>
      <address type='spapr-vio' reg='0x2000'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:74:89:c1'/>
      <source bridge='brenP3p5s0f0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' reg='0x1000'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='isa-serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30001000'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
      <address type='spapr-vio' reg='0x30001000'/>
    </console>
    <input type='mouse' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='usb'>
      <alias name='input1'/>
    </input>
    <input type='tablet' bus='usb'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='vga' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c422,c425</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c422,c425</imagelabel>
  </seclabel>
</domain>

Listing 11: Sample PowerKVM configuration file to set up a KVM guest with 80 vCPUs

Summary

This paper described the deployment and tuning of MariaDB on Linux on Power and PowerKVM. The

interactions between the application and the system components are very complex and the guidelines

provided in this paper simplify the task of improving the performance of MariaDB running on Linux on

Power and PowerKVM. References are provided in the following section for more detailed treatment of

these topics.

Acknowledgements

Thanks to the following people for their contributions to this paper.

Jenifer Hopper is a performance analyst in IBM Systems and Technology Linux Technology Center.

Mark Nellen is a program manager in IBM Systems and Technology Group, ISV Enablement organization.

Maya Pandya is a technology manager in IBM Systems and Technology Group, ISV Enablement organization.

Michael (Monty) Widenius is an advisor and a board member in MariaDB Corporation and is also a member of the board of directors in MariaDB.

Basu Vaidyanathan is a performance analyst in IBM Systems and Technology Group Performance Analysis organization.

Resources

The following websites provide useful references to supplement the information contained in this paper:

MariaDB Foundation

www.MariaDB.org/

MariaDB Corporation official website

www.MariaDB.com/

IBM Systems on PartnerWorld

ibm.com/partnerworld/systems

IBM Power Systems

ibm.com/systems/in/power/?lnk=mhpr

IBM Linux on Power – resources

ibm.com/systems/power/software/linux/resources.html

IBM Power Systems hardware documentation

http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp

About the authors

This paper is the result of a collaborative effort by MariaDB Corporation, MariaDB Foundation, and IBM.

Hari Reddy is a technical consultant in IBM Systems and Technology Group ISV Enablement organization.

Axel Schwenke is a performance analyst in MariaDB Corporation.

Sergey Vojtovich is a software developer in the MariaDB Foundation.

Trademarks and special notices

© Copyright IBM Corporation 2014.

References in this document to IBM products or services do not imply that IBM intends to make them

available in every country. Information is provided "AS IS" without warranty of any kind.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business

Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked

terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these

symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information

was published. Such trademarks may also be registered or common law trademarks in other countries. A

current list of IBM trademarks is available on the Web at "Copyright and trademark information" at

www.ibm.com/legal/copytrade.shtml.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information concerning non-IBM products was obtained from a supplier of these products, published

announcement material, or other publicly available sources and does not constitute an endorsement of

such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly

available information, including vendor announcements and vendor worldwide homepages. IBM has not

tested these products and cannot confirm the accuracy of performance, capability, or any other claims

related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the

supplier of those products.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive

statement of a commitment to specific levels of performance, function or delivery schedules with respect to

any future products. Such commitments are only made in IBM product announcements. The information is

presented here to communicate IBM's current investment and development activities as a good faith effort

to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled

environment. The actual throughput or performance that any user will experience will vary depending upon

considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the

storage configuration, and the workload processed. Therefore, no assurance can be given that an

individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Any references in this information to non-IBM websites are provided for convenience only and do not in

any manner serve as an endorsement of those websites. The materials at those websites are not part of

the materials for this IBM product and use of those websites is at your own risk.