OpenDBCamp Virtualization


Description

This is a presentation I gave on impulse at Open Database Camp in Sardegna, Italy last weekend, and then a bit less impulsively at the Inuits igloo. A word of caution: I included the notes because they contain some extra info, but the presentation was hacked together from several older ones (not all of them my own), so there might be some flukes in there. :)

Transcript of OpenDBCamp Virtualization

Page 1: OpenDBCamp Virtualization

Liz van Dijk - @lizztheblizz - [email protected]

VIRTUAL DATABASES? Optimizing for Virtualization

Sunday 8 May 2011

Page 2: OpenDBCamp Virtualization

THE GOAL

“Virtualized Databases suck!” - Is this really true? Does it have to be?

Sunday 8 May 2011

Databases are said to be “hard to virtualize” and to suffer decreased performance in a virtual environment. This is actually correct: dumping a native database into a virtual environment without applying any changes can cause real problems.

Page 3: OpenDBCamp Virtualization

HOW DO WE GET THERE?

1. Understand just why the virtual environment impacts performance, and take the correct steps to adapt our database to its new habitat.

2. Optimize, optimize, optimize...

Sunday 8 May 2011

Action: We have to understand why performance of databases is influenced, and how we can arm ourselves against this impact.

On the other hand, while there used to be less need for optimization in an environment where hardware was abundant, a virtual environment runs into resource contention much sooner. It’s important to keep our application as slim as possible without losing performance. In many cases, performance can be multiplied by taking a closer look at the database.

Message: Why is this interesting for you? This knowledge can convince you to make the switch to a virtual environment, trusting it won’t hurt your software’s performance, and it will help you examine your existing infrastructure and take the necessary steps to run your application as optimally as possible.

Page 4: OpenDBCamp Virtualization

THE INFLUENCE OF VIRTUALIZATION

• All “kernel” activity is more costly:
• Interrupts
• System Calls (I/O)
• Memory page management

Sunday 8 May 2011

So, let’s start with the understanding step: what could potentially slow down because of virtualization?

The 3 most important aspects are:

Interrupts - an actual piece of hardware wants attention from the CPU. Using Jumbo Frames is a very good idea in a virtual environment, because sending the same data causes fewer interrupts (1500 --> 9000 bytes per packet).

System Calls - a process wants attention from the kernel to do a privileged task, like accessing certain hardware (network/disk I/O).

Page Management - the most important one for databases: think caching. The database keeps an enormous amount of data in its own caches, so memory is being manipulated constantly. Every time something changes in this memory, the virtual host has to perform a double translation: from guest virtual memory through the VM’s page table to the physical address.

Usually, this causes the biggest performance hit when switching from native to virtual. We really have to do everything we can to minimize this problem.
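For the jumbo-frames point above, a minimal sketch on Linux (the interface name eth0 and the host db-host are assumptions, and the switch plus every endpoint on the path must be configured for 9000-byte frames as well):

    # Raise the MTU so the same amount of data generates far fewer packets (and interrupts)
    ip link set dev eth0 mtu 9000
    # Verify the new MTU
    ip link show eth0
    # Confirm 9000-byte frames pass unfragmented: 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
    ping -M do -s 8972 db-host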

Page 5: OpenDBCamp Virtualization

GENERAL OPTIMIZATION STRATEGY

Making the right hardware choices

Tuning the hypervisor to your database’s needs

Tuning the OS to your database’s needs

Squeezing every last bit of performance out of your database

Sunday 8 May 2011

Performance issues should be dealt with systematically, and we can split that process up in these 4 steps.

Page 6: OpenDBCamp Virtualization

HARDWARE CHOICES

• Choosing the right CPUs

• Intel 5500/7500 and later types (Nehalem) / all AMD quad-core Opterons (HW-assisted/MMU virtualization)

• Choosing the right NICs (VMDQ)

• Choosing the right storage system (iSCSI vs FC SAN)

Sunday 8 May 2011

CPUs --> HW virtualization (dom -1) & HAP

Best price/quality at the moment:
Opteron 6000 series - very good at datamining/decision support
Xeon 5600 series - still very good at OLTP

VMDQ = sorting/queueing offloaded to the NIC

Pages 7-9: OpenDBCamp Virtualization

CPU EVOLUTION

Sunday 8 May 2011


Page 10: OpenDBCamp Virtualization

OVERVIEW: NIC - VMDQ / NETQUEUE

Netqueue Devices | Part nr | Speed | Interface
Intel Ethernet Server Adapter X520-SR2, 2 ports | E10G42BFSR | 10Gbps | SR-LC
Intel Ethernet Server Adapter X520-DA2, 2 ports | E10G42BTDA | 10Gbps | SFP+
Intel Gigabit ET Dual Port Server Adapter, 2 ports | E1G42ET | 1Gbps | RJ-45, copper
Intel Gigabit EF Dual Port Server Adapter, 2 ports | E1G42EF | 1Gbps | fibre
Intel Gigabit ET Quad Port Server Adapter, 4 ports | E1G44ET | 1Gbps | RJ-45, copper
Intel Gigabit CT Desktop Adapter | EXPI9301CT | 1Gbps | RJ-45, copper
Supermicro Add-on Card AOC-SG-I2, 2 ports | AOC-SG-I2 | 1Gbps | RJ-45, copper
Onboard 82576 (8 virtual queues)
Onboard 82574: no IOV
Broadcom NetXtreme II Ethernet chipset | | 1-10Gbps |
All Neterion adapters | | 1-10Gbps |

Sunday 8 May 2011

Pages 11-12: OpenDBCamp Virtualization

SAN CHOICES

• iSCSI (using 10Gbit if possible)
• ESX with Hardware Initiator (iSCSI HBA)
• ESX with Software Initiator
• Initiator inside the Guest OS
• vSphere: iSCSI HBA pass-through to Guest OS

• Fibre Channel
• ESX with FC-HBA
• vSphere: FC-HBA pass-through to Guest OS

(Diagram labels: server with (hardware) iSCSI = iSCSI Target; (virtualization) server with (hardware) iSCSI = iSCSI Initiator.)

Sunday 8 May 2011

10Gbit = high CPU overhead!! We’re talking 24GHz of CPU to fill up 9Gbit. This problem can be reduced by the following technologies:
VT-d ---> moving DMA and address translation to the NIC
VMDQ/Netqueue ---> Netqueue is pretty much VMware’s implementation of VMDQ
SR-IOV ---> allowing one physical device (NIC) to present itself as multiple virtual devices


Page 13: OpenDBCamp Virtualization

GENERAL OPTIMIZATION STRATEGY

Making the right “hardware” choices

Tuning the hypervisor to your database’s needs

Tuning the OS to your database’s needs

Squeezing every last bit of performance out of your database

Sunday 8 May 2011

Pages 14-17: OpenDBCamp Virtualization

VIRTUAL MEMORY

(Diagram, built up over these slides: physical memory addresses 0xA-0xH (actual hardware), the virtual memory pages 1-12 that the OS presents to software (managed by software), the page table mapping page numbers to physical addresses, and the CPU’s TLB caching the most recent entries.)

Sunday 8 May 2011

CPUs: AMD - all quad-core Opterons; Intel - Xeon 5500, 7500, 5600.

Physical memory is divided into 4KB segments, which software presents as so-called pages: small chunks, each with its own address, which the CPU uses to find the data in physical memory.

Within an OS, a piece of software always gets a contiguous block of “virtual” memory assigned to it, even though the physical memory behind it is fragmented; keeping track of every single physical page address from the application would be a coding nightmare.

The page table exists for the CPU to walk through and make the necessary translation to physical memory. The CPU has a hardware cache that keeps track of these entries, the Translation Lookaside Buffer (TLB): an extremely fast buffer holding the most recently used translations, so the CPU can avoid walking the page table as much as possible.
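To make the TLB’s reach concrete with illustrative numbers (the entry count is an assumption, not any specific CPU’s spec):

    512 TLB entries x 4KB pages = 2MB of memory covered
    512 TLB entries x 2MB pages = 1GB of memory covered

A database buffer pool of several GB blows straight past what a 4KB-page TLB can cover, which is why the large pages discussed a few slides further matter so much.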


Pages 18-20: OpenDBCamp Virtualization

SPT VS HAP

(Diagram, built up over these slides: VM A and VM B each see their own “read-only” page table, managed by the VM OS; the hypervisor maintains the real “shadow” page table that maps each VM’s pages to physical memory 0xA-0xH; with HAP, the TLB instead holds VM-tagged entries such as A1 | 0xD and B12 | 0xB.)

Sunday 8 May 2011

In a virtual environment, where the guest OS is not allowed direct access to memory, this was solved differently. Each VM gets its own page table, but it is effectively locked/read-only: as soon as the guest changes it, a “trap” is generated and the hypervisor is forced to take over and handle the page management in its shadow page table. This causes a lot of overhead, because every single memory-management action forces the hypervisor to intervene.

As an alternative, new CPUs came to market with a modified TLB cache, able to keep track of the complete translation path (VM virtual address --> VM physical address --> host physical address).

Downside: because of this, filling the TLB got a lot more expensive; a page that is not yet in there requires walking both levels of page tables. Once the TLB is properly warmed up, though, most applications rarely have to wait for translations.


Page 21: OpenDBCamp Virtualization

HAP

Sunday 8 May 2011

As you can see, HAP in general does help improve performance, though not by a huge amount. More importantly, it opens the door to a great combination with another technique!

Pages 22-23: OpenDBCamp Virtualization

HAP + LARGE PAGES

Setting Large Pages:
• Linux - increase SHMMAX in rc.local
• Windows - grant “Lock Pages in Memory”
• MySQL (InnoDB only) - large-pages
• Oracle - ORA_LPENABLE=1 in registry
• SQL Server - Enterprise only, needs >8GB RAM; for the buffer pool, start up with trace flag -834

Sunday 8 May 2011

While using HAP, you should definitely make use of Large Pages, because filling the TLB is a lot more expensive. By using Large Pages (2MB instead of 4KB), a single TLB entry covers a LOT more memory. In combination with the bigger TLBs in the newest CPUs, this helps prevent entries from being evicted from the TLB too quickly.

Oracle: HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_HOME_NAME
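As a sketch of the Linux + MySQL route above (the page count is an assumption; size it to your InnoDB buffer pool):

    # /etc/rc.local - reserve 2MB huge pages and let the mysql group map them
    echo 2560 > /proc/sys/vm/nr_hugepages          # 2560 x 2MB = 5GB (assumption)
    echo $(id -g mysql) > /proc/sys/vm/hugetlb_shm_group
    echo 5368709120 > /proc/sys/kernel/shmmax      # SHMMAX >= the shared segment size

    # /etc/my.cnf
    [mysqld]
    large-pages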


Pages 24-29: OpenDBCamp Virtualization

VIRTUAL HBAs

• Choices (ESX)

• Before vSphere:
• BusLogic Parallel (Legacy)
• LSI Logic Parallel (Optimized)

• Since vSphere:
• LSI Logic SAS (default as of Win2008)
• VMware Paravirtual (PVSCSI)

• Thin vs Thick Provisioning (vSphere)

• Snapshots & performance do not go together

Sunday 8 May 2011

BusLogic ---> generic adapter
LSI Logic ---> optimized adapter that requires tools
LSI Logic SAS ---> presents itself as a SAS controller (necessary for Windows clustering)
PVSCSI ---> fully paravirtualized high-performance adapter, created to use iSCSI from the guest; supports command queueing
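Normally you pick the adapter in the vSphere client, but for reference the choice ends up as a line in the VM’s .vmx file; a sketch (the device numbering is an example):

    scsi0.virtualDev = "pvscsi"    # VMware Paravirtual
    # alternatives: "lsisas1068" (LSI Logic SAS), "lsilogic" (LSI Logic Parallel), "buslogic"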


Pages 30-34: OpenDBCamp Virtualization

VIRTUAL NICS

• Choices (ESX)

• Before vSphere:
• Flexible (emulation)
• E1000 (Intel E1000 emulation, default on x64)
• (enhanced) VMXNET (paravirtual)

• Since vSphere:
• VMXNET 3 (third-generation paravirtual NIC)

• Jumbo frames, NIC Teaming, VLANs

• Colocation (minimize NIC traffic by sharing a host)

Sunday 8 May 2011

Flexible ---> required for 32-bit systems; automatically turns into a VMXNET after installing VMware Tools.

VMXNET adds ‘Jumbo Frames’.

VMXNET3 adds:
• MSI/MSI-X support (if supported by the guest OS kernel)
• Receive Side Scaling (Windows 2008 only)
• IPv6 checksum & TCP Segmentation Offloading (segmentation of packets done by the NIC, not the CPU)
• VLAN Offloading
• Bigger TX/RX ring sizes
• Optimizations for iSCSI & VMotion
• Necessary for VMDq!!
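The vNIC type likewise lands in the .vmx file; a sketch (adapter numbering is an example, and the exact default when the line is omitted depends on the guest OS type):

    ethernet0.virtualDev = "vmxnet3"    # alternatives: "e1000", "vmxnet"; omit for the Flexible default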


Page 35: OpenDBCamp Virtualization

GENERAL OPTIMIZATION STRATEGY

Making the right hardware choices

Tuning the hypervisor to your database’s needs

Tuning the OS to your database’s needs

Squeezing every last bit of performance out of your database

Sunday 8 May 2011

Page 36: OpenDBCamp Virtualization

BEST OS CHOICES

• 64-bit Linux for MySQL

• MySQL 5.1.32 or later

• ... ? (discuss mode on! :) )

Sunday 8 May 2011

Modified mutexes for InnoDB = improvement of locking for multithreaded environments. This allows for much better scaling.

Page 37: OpenDBCamp Virtualization

DON’T FORGET

• VMware Tools

• Paravirtualized VMXNET, PVSCSI

• Ballooning

• Time Sync

• ... and more recent drivers

• Integration Services

• Paravirtualized Drivers

• Hypercall adapter

• Time Sync

• ... and more recent drivers

Sunday 8 May 2011

Definitely install the tools of the hypervisor in question to enable its newest functionality. This is very important if, for example, you want to overcommit memory in ESX or use paravirtualization in Linux on Hyper-V.

Page 38: OpenDBCamp Virtualization

CACHING LEVELS

• CPU

• Application

• Filesystem / OS

• RAID Controller (switch off or use a BBU!)

• Disk

Sunday 8 May 2011

CPU: just buy the right CPU.
App/FS: use the correct settings (Direct I/O).
RAID controller: use a battery-backed unit (BBU). Transactional databases put lots of random writes in the controller cache, mostly as a write buffer, and the battery makes sure they survive a power failure.
Disk: if there is cache on the disk itself, it’s best to disable it, so nothing can get stuck in it when the power drops. HP disables these by default.
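For the on-disk cache point, a minimal sketch on Linux (device names are assumptions; behind a RAID controller you would use the vendor’s CLI instead):

    # SATA: switch off the drive's volatile write cache
    hdparm -W 0 /dev/sda
    # SAS/SCSI: clear the Write Cache Enable bit
    sdparm --clear=WCE /dev/sdb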

Page 39: OpenDBCamp Virtualization

GENERAL OPTIMIZATION STRATEGY

Making the right hardware choices

Tuning the hypervisor to your database’s needs

Tuning the OS to your database’s needs

Squeezing every last bit of performance out of your database

Sunday 8 May 2011

Page 40: OpenDBCamp Virtualization

DIRECT IO

• Less page management

• Smallest cache possible vs less I/O

SQL Server: automatic
MySQL: only for use with InnoDB! - innodb_flush_method=O_DIRECT
Oracle: filesystemio_options=DIRECTIO

Sunday 8 May 2011

Though this is on by default on Windows, on Linux it should definitely be enabled. Otherwise everything that is already cached by the InnoDB buffer pool may also be cached by the filesystem cache, leaving two separate but identical caches to be maintained in memory: far too much memory management.

MySQL’s MyISAM actually depends on this filesystem cache. It expects the OS to do the brunt of the caching work itself.
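The MySQL side of this is a single my.cnf line; a minimal sketch (InnoDB only, as noted above; MyISAM should keep the filesystem cache):

    [mysqld]
    # Let InnoDB bypass the filesystem cache - the buffer pool becomes the only cache
    innodb_flush_method = O_DIRECT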

Page 41: OpenDBCamp Virtualization

GENERAL MY.CNF OPTIMIZATIONS

• max_connections (151) (File descriptors!)

• Per connection

• read_buffer_size (128K) (Full Scan)

• read_rnd_buffer_size (256K) (Order By)

• sort_buffer_size (2M) (Sorts)

• join_buffer_size (128K) (Full Scan Join)

Sunday 8 May 2011
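Pulled together as a my.cnf sketch (the values are the slide’s defaults, shown for orientation rather than as recommendations; the four per-connection buffers are allocated per connection, so their cost multiplies with max_connections):

    [mysqld]
    max_connections      = 151    # bounded by available file descriptors
    read_buffer_size     = 128K   # full table scans
    read_rnd_buffer_size = 256K   # reads after sorting (ORDER BY)
    sort_buffer_size     = 2M     # in-memory sorts
    join_buffer_size     = 128K   # full-scan joins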

Page 42: OpenDBCamp Virtualization

GENERAL MY.CNF OPTIMIZATIONS

• thread_cache (check out max_used_connections)

• table_cache (64) - table_open_cache (5.1.3x)

• Engine dependent

• open_tables variable

• opened_tables ∆ ≈ 0

• innodb_buffer_pool_size

• innodb_thread_concurrency

Sunday 8 May 2011

Try to fit max_used_connections into the thread_cache IF POSSIBLE
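A hedged way to check these rules on a live server:

    -- the Opened_tables delta should stay near 0; if it keeps climbing, raise table_open_cache
    SHOW GLOBAL STATUS LIKE 'Opened_tables';
    SHOW GLOBAL VARIABLES LIKE 'table_open_cache';

    -- try to fit Max_used_connections inside thread_cache_size
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';
    SHOW GLOBAL STATUS LIKE 'Threads_created';  -- should barely grow once the cache is warm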

Page 43: OpenDBCamp Virtualization

INDEXING

• Heaps

• Unclustered Indexes

• Clustered Indexes (InnoDB)

Sunday 8 May 2011

Page 44: OpenDBCamp Virtualization

INDEX FRAGMENTATION

• Happens with clustered indexes

• Large-scale fragmentation of the indexes could cause serious performance problems

• Fixes:

• SQL Server: REBUILD/REORGANIZE
• MySQL: ALTER TABLE tbl_name ENGINE=INNODB
• Oracle: ALTER INDEX index_name REBUILD

(Diagram: clustered index leaf level.)

Sunday 8 May 2011
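The three fixes as runnable statements (table and index names are placeholders):

    -- SQL Server: REBUILD for heavy fragmentation, REORGANIZE for light
    ALTER INDEX ix_orders_date ON orders REBUILD;
    ALTER INDEX ix_orders_date ON orders REORGANIZE;

    -- MySQL: rebuilding the table rebuilds the InnoDB clustered index
    ALTER TABLE orders ENGINE=INNODB;

    -- Oracle
    ALTER INDEX ix_orders_date REBUILD;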

Pages 45-52: OpenDBCamp Virtualization

STORAGE ENGINE INTERNALS

(Diagram, animated over these slides: the DB front end works against the buffer pool cache; updates, inserts, and deletes are written to the transaction log, and a checkpoint process later flushes the modified pages to the datafile.)

Sunday 8 May 2011

SQL Server --> Set memory options in server properties > Memory > Server memory Options


Page 53: OpenDBCamp Virtualization

DATA AND LOG PLACEMENT

Sunday 8 May 2011

This is most important for transactional databases.

As you can see, the difference between using a decent SAS disk and an SSD for the database log is negligible. There is no point sinking cash into an SSD for logs; just get a decent, fast SAS disk.

Page 54: OpenDBCamp Virtualization

SQL STATEMENT ‘DUHS’

• Every table MUST have a primary key

• If possible, use a clustered index

• Only keep regularly used indexes around (e.g. FKs)

• WHERE > JOIN > ORDER BY > SELECT

• Don’t use SELECT *

• Try not to use COUNT() (in InnoDB this is always a full table scan)

Sunday 8 May 2011
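To illustrate the last two bullets (the orders table is hypothetical):

    -- name the columns instead of SELECT *: less data through the buffer pool and the wire
    SELECT order_id, order_date FROM orders WHERE customer_id = 42;

    -- InnoDB keeps no exact row count, so this is a full table scan:
    SELECT COUNT(*) FROM orders;
    -- if an approximation will do, the table statistics are far cheaper:
    SHOW TABLE STATUS LIKE 'orders';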

Page 55: OpenDBCamp Virtualization

GENERAL OPTIMIZATION STRATEGY

Making the right hardware choices

Tuning the hypervisor to your database’s needs

Tuning the OS to your database’s needs

Squeezing every last bit of performance out of your database

Sunday 8 May 2011

Page 56: OpenDBCamp Virtualization

QUESTIONS?

I don’t have the attention span to keep up a blog :(

Results of benchmarks: http://www.anandtech.com/tag/IT

Sunday 8 May 2011