Red Hat Enterprise Linux-6-Performance Tuning Guide-En-US
5/19/2018 Red Hat Enterprise Linux-6-Performance Tuning Guide-En-US
Red Hat Subject Matter Experts
Red Hat Enterprise Linux 6
Performance Tuning Guide
Optimizing subsystem throughput in Red Hat Enterprise Linux 6
Edition 4.0
Red Hat Subject Matter Experts
Edited by
Don Domingo
Laura Bailey
Legal Notice
Copyright 2011 Red Hat, Inc. and others.
This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Java is a registered trademark of Oracle and/or its affiliates.
XFS is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.
Abstract
The Performance Tuning Guide describes how to optimize the performance of a system running Red Hat Enterprise Linux 6. It also documents performance-related upgrades in Red Hat Enterprise Linux 6. While this guide contains procedures that are field-tested and proven, Red Hat recommends that you properly test all planned configurations in a testing environment before applying them to a production environment. You should also back up all your data and pre-tuning configurations.
http://creativecommons.org/licenses/by-sa/3.0/
Table of Contents
Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
2.2. We Need Feedback!
Chapter 1. Overview
1.1. How to read this book
1.1.1. Audience
1.2. Release overview
1.2.1. New features in Red Hat Enterprise Linux 6
1.2.1.1. New in 6.6
1.2.1.2. New in 6.5
1.2.2. Horizontal Scalability
1.2.2.1. Parallel Computing
1.2.3. Distributed Systems
1.2.3.1. Communication
1.2.3.2. Storage
1.2.3.3. Converged Networks
Chapter 2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
2.2. Ticket Spinlocks
2.3. Dynamic List Structure
2.4. Tickless Kernel
2.5. Control Groups
2.6. Storage and File System Improvements
Chapter 3. Monitoring and Analyzing System Performance
3.1. The proc File System
3.2. GNOME and KDE System Monitors
3.3. Performance Co-Pilot (PCP)
3.3.1. PCP Architecture
3.3.2. PCP Setup
3.4. irqbalance
3.5. Built-in Command-line Monitoring Tools
3.6. Tuned and ktune
3.7. Application Profilers
3.7.1. SystemTap
3.7.2. OProfile
3.7.3. Valgrind
3.7.4. Perf
3.8. Red Hat Enterprise MRG
Chapter 4. CPU
Topology
Threads
Interrupts
4.1. CPU Topology
4.1.2. Tuning CPU Performance
4.1.2.1. Setting CPU Affinity with taskset
4.1.2.2. Controlling NUMA Policy with numactl
4.1.3. Hardware performance policy (x86_energy_perf_policy)
4.1.4. turbostat
4.1.5. numastat
4.1.6. NUMA Affinity Management Daemon (numad)
4.1.6.1. Benefits of numad
4.1.6.2. Modes of operation
4.1.6.2.1. Using numad as a service
4.1.6.2.2. Using numad as an executable
4.2. CPU Scheduling
4.2.1. Realtime scheduling policies
4.2.2. Normal scheduling policies
4.2.3. Policy Selection
4.3. Interrupts and IRQ Tuning
4.4. CPU Frequency Governors
4.5. Enhancements to NUMA in Red Hat Enterprise Linux 6
4.5.1. Bare-metal and Scalability Optimizations
4.5.1.1. Enhancements in topology-awareness
4.5.1.2. Enhancements in Multi-processor Synchronization
4.5.2. Virtualization Optimizations
Chapter 5. Memory
5.1. Huge Translation Lookaside Buffer (HugeTLB)
5.2. Huge Pages and Transparent Huge Pages
5.3. Using Valgrind to Profile Memory Usage
5.3.1. Profiling Memory Usage with Memcheck
5.3.2. Profiling Cache Usage with Cachegrind
5.3.3. Profiling Heap and Stack Space with Massif
5.4. Capacity Tuning
5.5. Tuning Virtual Memory
Chapter 6. Input/Output
6.1. Features
6.2. Analysis
6.3. Tools
6.4. Configuration
6.4.1. Completely Fair Queuing (CFQ)
6.4.2. Deadline I/O Scheduler
6.4.3. Noop
Chapter 7. File Systems
7.1. Tuning Considerations for File Systems
7.1.1. Formatting Options
7.1.2. Mount Options
7.1.3. File system maintenance
7.1.4. Application Considerations
7.2. Profiles for file system performance
7.3. File Systems
7.3.1. The Ext4 File System
7.3.2. The XFS File System
7.3.2.1. Basic tuning for XFS
7.3.2.2.1. Optimizing for a large number of files
7.3.2.2.2. Optimizing for a large number of files in a single directory
7.3.2.2.3. Optimising for concurrency
7.3.2.2.4. Optimising for applications that use extended attributes
7.3.2.2.5. Optimising for sustained metadata modifications
7.4. Clustering
7.4.1. Global File System 2
Chapter 8. Networking
8.1. Network Performance Enhancements
Receive Packet Steering (RPS)
Receive Flow Steering
getsockopt support for TCP thin-streams
Transparent Proxy (TProxy) support
8.2. Optimized Network Settings
Socket receive buffer size
8.3. Overview of Packet Reception
CPU/cache affinity
8.4. Resolving Common Queuing/Frame Loss Issues
8.4.1. NIC Hardware Buffer
8.4.2. Socket Queue
8.5. Multicast Considerations
8.6. Receive-Side Scaling (RSS)
8.7. Receive Packet Steering (RPS)
8.8. Receive Flow Steering (RFS)
8.9. Accelerated RFS
Revision History
Preface
1. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.
1.1. Typographic Conventions
Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.
Mono-spaced Bold
Used to highlight system input, including shell commands, file names and paths. Also used to highlight keys and key combinations. For example:
To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.
The above includes a file name, a shell command and a key, all presented in mono-spaced bold and all distinguishable thanks to context.
Key combinations can be distinguished from an individual key by the plus sign that connects each part of a key combination. For example:
Press Enter to execute the command.
Press Ctrl+Alt+F2 to switch to a virtual terminal.
The first example highlights a particular key to press. The second example highlights a key combination: a set of three keys pressed simultaneously.
If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.
Proportional Bold
This denotes words or phrases encountered on a system, including application names; dialog-box text; labeled buttons; check-box and radio-button labels; menu titles and submenu titles. For example:
Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, select the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).
To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the
Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.
The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.
Mono-spaced Bold Italic or Proportional Bold Italic
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.
To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.
Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:
Publican is a DocBook publishing system.
1.2. Pull-quote Conventions
Terminal output and source code listings are set off visually from the surrounding text.
Output sent to a terminal is set in mono-spaced roman and presented thus:
books Desktop documentation drafts mss photos stuff svn
books_tests Desktop1 downloads images notes scripts svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
		struct kvm_assigned_pci_dev *assigned_dev)
{
	int r = 0;
	struct kvm_assigned_dev_kernel *match;

	mutex_lock(&kvm->lock);

	match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
				      assigned_dev->assigned_dev_id);
	if (!match) {
		printk(KERN_INFO "%s: device hasn't been assigned before, "
		       "so cannot be deassigned\n", __func__);
		r = -EINVAL;
		goto out;
	}

	kvm_deassign_device(kvm, match);
	kvm_free_assigned_device(kvm, match);
out:
	mutex_unlock(&kvm->lock);
	return r;
}
1.3. Notes and Warnings
Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.
Note
Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.
Important
Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled Important will not cause data loss but may cause irritation and frustration.
Warning
Warnings should not be ignored. Ignoring warnings will most likely cause data loss.
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
If you experience difficulty with a procedure described in this documentation, visit the Red Hat Customer Portal at http://access.redhat.com. Through the customer portal, you can:
search or browse through a knowledgebase of technical support articles about Red Hat products.
submit a support case to Red Hat Global Support Services (GSS).
access other product documentation.
Red Hat also hosts a large number of electronic mailing lists for discussion of Red Hat software and technology. You can find a list of publicly available mailing lists at https://www.redhat.com/mailman/listinfo. Click on the name of any mailing list to subscribe to that list or to access the list archives.
2.2. We Need Feedback!
If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red Hat Enterprise Linux 6.
When submitting a bug report, be sure to mention the manual's identifier: doc-Performance_Tuning_Guide
If you have a suggestion for improving the documentation, try to be as specific as possible when describing it. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.
Chapter 1. Overview
The Performance Tuning Guide is a comprehensive reference on the configuration and optimization of Red Hat Enterprise Linux. While this release also contains information on Red Hat Enterprise Linux 5 performance capabilities, all instructions supplied herein are specific to Red Hat Enterprise Linux 6.
1.1. How to read this book
This book is divided into chapters discussing specific subsystems in Red Hat Enterprise Linux. The Performance Tuning Guide focuses on three major themes per subsystem:
Features
Each subsystem chapter describes performance features unique to (or implemented differently in) Red Hat Enterprise Linux 6. These chapters also discuss Red Hat Enterprise Linux 6 updates that significantly improved the performance of specific subsystems over Red Hat Enterprise Linux 5.
Analysis
The book also enumerates performance indicators for each specific subsystem. Typical values for these indicators are described in the context of specific services, helping you understand their significance in real-world, production systems.
In addition, the Performance Tuning Guide also shows different ways of retrieving performance data (that is, profiling) for a subsystem. Note that some of the profiling tools showcased here are documented in more detail elsewhere.
Configuration
Perhaps the most important information in this book is the set of instructions on how to adjust the performance of a specific subsystem in Red Hat Enterprise Linux 6. The Performance Tuning Guide explains how to fine-tune a Red Hat Enterprise Linux 6 subsystem for specific services.
Keep in mind that tweaking a specific subsystem's performance may affect the performance of another, sometimes adversely. The default configuration of Red Hat Enterprise Linux 6 is optimal for most services running under moderate loads.
The procedures enumerated in the Performance Tuning Guide were tested extensively by Red Hat engineers in both lab and field. However, Red Hat recommends that you properly test all planned configurations in a secure testing environment before applying them to your production servers. You should also back up all data and configuration information before you start tuning your system.
1.1.1. Audience
This book is suitable for two types of readers:
System/Business Analyst
This book enumerates and explains Red Hat Enterprise Linux 6 performance features at a high level, providing enough information on how subsystems perform for specific workloads (both by default and when optimized). The level of detail used in describing Red Hat Enterprise Linux 6 performance features helps potential customers and sales engineers understand the suitability of this platform in providing resource-intensive services at an acceptable level.
The Performance Tuning Guide also provides links to more detailed documentation on each feature whenever possible. At that detail level, readers can understand these performance features enough to form a high-level strategy in deploying and optimizing Red Hat Enterprise Linux 6. This allows readers to both develop and evaluate infrastructure proposals.
This feature-focused level of documentation is suitable for readers with a high-level understanding of Linux subsystems and enterprise-level networks.
System Administrator
The procedures enumerated in this book are suitable for system administrators with RHCE skill level (or its equivalent, that is, 3-5 years' experience in deploying and managing Linux). The Performance Tuning Guide aims to provide as much detail as possible about the effects of each configuration; this means describing any performance trade-offs that may occur.
The underlying skill in performance tuning lies not in knowing how to analyze and tune a subsystem. Rather, a system administrator adept at performance tuning knows how to balance and optimize a Red Hat Enterprise Linux 6 system for a specific purpose. This means also knowing which trade-offs and performance penalties are acceptable when attempting to implement a configuration designed to boost a specific subsystem's performance.
1.2. Release overview
1.2.1. New features in Red Hat Enterprise Linux 6
Read this section for a brief overview of the performance-related changes included in Red Hat Enterprise Linux 6.
1.2.1.1. New in 6.6
perf has been updated to version 3.12, which includes a number of new features, including:
New perf record options for statistically sampling consecutive taken branches, -j and -b. See the man page for further details: man perf-record.
Several new perf report parameters, including --group and --percent-limit, and additional options for sorting data collected with the perf record -j and -b options enabled. See the man page for further details: man perf-report.
New perf mem command for profiling load and store memory access.
Several new options in perf top, including --percent-limit and --obj-dump.
--force and --append options have been removed from perf record.
New --initial-delay option for perf stat.
New --output-filename option for perf trace.
New --group option for perf evlist.
Changes to the perf top -G and perf record -g options: these are no longer alternatives to the --call-graph option. When libunwind support is added to future versions of Red Hat Enterprise Linux, these options will enable the configured unwind method.
[1]
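The options above can be combined as follows. This is a minimal sketch, assuming perf 3.12 or later, sufficient privileges, and hardware support for branch sampling; my_workload is a placeholder for your own program:

```shell
# Record taken branches while the workload runs (-b together with
# -j any samples all branch types; needs hardware branch records):
perf record -b -j any -- ./my_workload

# Summarize, grouping events together and hiding symbols below 1%:
perf report --group --percent-limit 1

# Profile memory loads and stores with the new perf mem command:
perf mem record -- ./my_workload
perf mem report
```

Because these commands depend on the installed perf version and on CPU support, check man perf-record and man perf-report on your system before relying on any particular flag.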
1.2.1.2. New in 6.5
Updates to the kernel remove a bottleneck in memory management and improve performance by allowing I/O load to spread across multiple memory pools when the irq-table size is 1 GB or greater.
The cpupowerutils package has been updated to include the turbostat and x86_energy_perf_policy tools. These tools are newly documented in Section 4.1.4, turbostat and Section 4.1.3, Hardware performance policy (x86_energy_perf_policy).
CIFS now supports larger rsize options and asynchronous readpages, allowing for significant increases in throughput.
GFS2 now provides an Orlov block allocator to increase the speed at which block allocation takes place.
The virtual-hosts profile in tuned has been adjusted. The value for kernel.sched_migration_cost is now 5000000 nanoseconds (5 milliseconds) instead of the kernel default 500000 nanoseconds (0.5 milliseconds). This reduces contention at the run queue lock for large virtualization hosts.
The latency-performance profile in tuned has been adjusted. The value for the power management quality of service requirement, cpu_dma_latency, is now 1 instead of 0.
Several optimizations are now included in the kernel copy_from_user() and copy_to_user() functions, improving the performance of both.
The perf tool has received a number of updates and enhancements, including:
Perf can now use hardware counters provided by the System z CPU-measurement counter facility. There are four sets of hardware counters available: the basic counter set, the problem-state counter set, the crypto-activity counter set, and the extended counter set.
A new command, perf trace, is now available. This enables strace-like behavior using the perf infrastructure to allow additional targets to be traced. For further information, refer to Section 3.7.4, Perf.
A script browser has been added to enable users to view all available scripts for the current perf data file.
Several additional sample scripts are now available.
Ext3 has been updated to reduce lock contention, thereby improving the performance of multi-threaded write operations.
KSM has been updated to be aware of NUMA topology, allowing it to take NUMA locality into account while coalescing pages. This prevents performance drops related to pages being moved to a remote node. Red Hat recommends avoiding cross-node memory merging when KSM is in use.
This update introduces a new tunable, /sys/kernel/mm/ksm/merge_nodes, to control this behavior. The default value (1) merges pages across different NUMA nodes. Set merge_nodes to 0 to merge pages only on the same node.
hdparm has been updated with several new flags, including --fallocate, --offset, and -R (Write-Read-Verify enablement). Additionally, the --trim-sector-ranges and --trim-sector-ranges-stdin options replace the --trim-sectors option, allowing more than a single sector range to be specified. Refer to the man page for further information about these options.
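A minimal sketch of inspecting and setting the KSM tunable described above, assuming a kernel that exposes /sys/kernel/mm/ksm/merge_nodes; the write requires root and is best done before KSM has merged any pages:

```shell
# Show the current policy: 1 (the default) merges pages across
# NUMA nodes, 0 merges only pages that live on the same node.
cat /sys/kernel/mm/ksm/merge_nodes

# Restrict merging to within a node, avoiding remote-node access
# penalties on NUMA systems where KSM is in use:
echo 0 > /sys/kernel/mm/ksm/merge_nodes
```

On systems without this tunable (older kernels), the cat command simply fails; check for the file's existence before scripting against it.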
1.2.2. Horizontal Scalability
Red Hat's efforts in improving the performance of Red Hat Enterprise Linux 6 focus on scalability. Performance-boosting features are evaluated primarily based on how they affect the platform's performance in different areas of the workload spectrum, that is, from the lonely web server to the server farm mainframe.
Focusing on scalability allows Red Hat Enterprise Linux to maintain its versatility for different types of workloads and purposes. At the same time, this means that as your business grows and your workload scales up, re-configuring your server environment is less prohibitive (in terms of cost and man-hours) and more intuitive.
Red Hat makes improvements to Red Hat Enterprise Linux for both horizontal scalability and vertical scalability; however, horizontal scalability is the more generally applicable use case. The idea behind horizontal scalability is to use multiple standard computers to distribute heavy workloads in order to improve performance and reliability.
In a typical server farm, these standard computers come in the form of 1U rack-mounted servers and blade servers. Each standard computer may be as small as a simple two-socket system, although some server farms use large systems with more sockets. Some enterprise-grade networks mix large and small systems; in such cases, the large systems are high performance servers (for example, database servers) and the small ones are dedicated application servers (for example, web or mail servers).
This type of scalability simplifies the growth of your IT infrastructure: a medium-sized business with an appropriate load might only need two pizza box servers to suit all their needs. As the business hires more people, expands its operations, increases its sales volumes and so forth, its IT requirements increase in both volume and complexity. Horizontal scalability allows IT to simply deploy additional machines with (mostly) identical configurations as their predecessors.
To summarize, horizontal scalability adds a layer of abstraction that simplifies system hardware administration. By developing the Red Hat Enterprise Linux platform to scale horizontally, increasing the capacity and performance of IT services can be as simple as adding new, easily configured machines.
1.2.2.1. Parallel Computing
Users benefit from Red Hat Enterprise Linux's horizontal scalability not just because it simplifies system hardware administration, but also because horizontal scalability is a suitable development philosophy given the current trends in hardware advancement.
Consider this: most complex enterprise applications have thousands of tasks that must be performed simultaneously, with different coordination methods between tasks. While early computers had a single-core processor to juggle all these tasks, virtually all processors available today have multiple cores. Effectively, modern computers put multiple cores in a single socket, making even single-socket desktops or laptops multi-processor systems.
As of 2010, standard Intel and AMD processors were available with two to sixteen cores. Such processors are prevalent in pizza box or blade servers, which can now contain as many as 40 cores. These low-cost, high-performance systems bring large system capabilities and characteristics into the mainstream.
To achieve the best performance and utilization of a system, each core must be kept busy. This means that 32 separate tasks must be running to take advantage of a 32-core blade server. If a blade chassis contains ten of these 32-core blades, then the entire setup can process a minimum of 320 tasks simultaneously. If these tasks are part of a single job, they must be coordinated.
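The arithmetic above can be sketched in a few lines of shell, using the example figures from the text:

```shell
# Example figures from the text: 32 cores per blade, ten blades
# per chassis.
cores_per_blade=32
blades_per_chassis=10

# Minimum number of concurrent tasks needed to keep every core busy:
total_tasks=$(( cores_per_blade * blades_per_chassis ))
echo "$total_tasks"   # prints 320
```

On a real system, compare this figure against the core count reported by nproc to judge how many runnable tasks your own hardware can absorb.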
Red Hat Enterprise Linux was developed to adapt well to hardware development trends and ensure that businesses can fully benefit from them. Section 1.2.3, Distributed Systems explores the technologies that enable Red Hat Enterprise Linux's horizontal scalability in greater detail.
1.2.3. Distributed Systems
To fully realize horizontal scalability, Red Hat Enterprise Linux uses many components of distributed computing. The technologies that make up distributed computing are divided into three layers:
Communication
Horizontal scalability requires many tasks to be performed simultaneously (in parallel). As such, these tasks must have interprocess communication to coordinate their work. Further, a platform with horizontal scalability should be able to share tasks across multiple systems.
Storage
Storage via local disks is not sufficient in addressing the requirements of horizontal scalability. Some form of distributed or shared storage is needed, one with a layer of abstraction that allows a single storage volume's capacity to grow seamlessly with the addition of new storage hardware.
Management
The most important duty in distributed computing is the management layer. This management layer coordinates all software and hardware components, efficiently managing communication, storage, and the usage of shared resources.
The following sections describe the technologies within each layer in more detail.
1.2.3.1. Communication
The communication layer ensures the transport of data, and is composed of two parts:
Hardware
Software
The simplest (and fastest) way for multiple systems to communicate is through shared memory. This entails the usage of familiar memory read/write operations; shared memory has the high bandwidth, low latency, and low overhead of ordinary memory read/write operations.
Ethernet
The most common way of communicating between computers is over Ethernet. Today, Gigabit Ethernet (GbE) is provided by default on systems, and most servers include 2-4 ports of Gigabit Ethernet. GbE provides good bandwidth and latency. This is the foundation of most distributed systems in use today. Even when systems include faster network hardware, it is still common to use GbE for a dedicated management interface.
10GbE
Ten Gigabit Ethernet (10GbE) is rapidly growing in acceptance for high end and even mid-range servers. 10GbE provides ten times the bandwidth of GbE. One of its major advantages is with modern multi-core processors, where it restores the balance between communication and computing. You can compare a single core system using GbE to an eight core system using 10GbE. Used in this way, 10GbE is especially valuable for maintaining overall system performance and avoiding communication bottlenecks.
Unfortunately, 10GbE is expensive. While the cost of 10GbE NICs has come down, the price of interconnect (especially fibre optics) remains high, and 10GbE network switches are extremely expensive. We can expect these prices to decline over time, but 10GbE today is most heavily used in server room backbones and performance-critical applications.
Infiniband
Infiniband offers even higher performance than 10GbE. In addition to the TCP/IP and UDP network connections used with Ethernet, Infiniband also supports shared memory communication. This allows Infiniband to work between systems via remote direct memory access (RDMA).
The use of RDMA allows Infiniband to move data directly between systems without the overhead of TCP/IP or socket connections. In turn, this reduces latency, which is critical to some applications.
Infiniband is most commonly used in High Performance Technical Computing (HPTC) applications which require high bandwidth, low latency, and low overhead. Many supercomputing applications benefit from this, to the point that the best way to improve performance is by investing in Infiniband rather than faster processors or more memory.
RoCE
RDMA over Converged Ethernet (RoCE) implements Infiniband-style communications (including RDMA) over a 10GbE infrastructure. Given the cost improvements associated with the growing volume of 10GbE products, it is reasonable to expect wider usage of RDMA and RoCE in a wide range of systems and applications.
Each of these communication methods is fully supported by Red Hat for use with Red Hat Enterprise Linux 6.
1.2.3.2. Storage
An environment that uses distributed computing uses multiple instances of shared storage. This can mean one of two things:
Multiple systems storing data in a single location
A storage unit (e.g. a volume) composed of multiple storage appliances
The most familiar example of storage is the local disk drive mounted on a system. This is appropriate for IT operations where all applications are hosted on one host, or even a small number of hosts. However, as the infrastructure scales to dozens or even hundreds of systems, managing as many local storage disks becomes difficult and complicated.
Distributed storage adds a layer to ease and automate storage hardware administration as the business scales. Having multiple systems share a handful of storage instances reduces the number of devices the administrator needs to manage.
Consolidating the storage capabilities of multiple storage appliances into one volume helps both users and administrators. This type of distributed storage provides a layer of abstraction to storage pools: users see a single unit of storage, which an administrator can easily grow by adding more hardware. Some technologies that enable distributed storage also provide added benefits, such as failover and multipathing.
NFS
Network File System (NFS) allows multiple servers or users to mount and use the same instance of remote storage via TCP or UDP. NFS is commonly used to hold data shared by multiple applications. It is also convenient for bulk storage of large amounts of data.
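As a quick sketch, a client mounts an NFS export like this; the server name and export path below are hypothetical:

```shell
# Mount a (hypothetical) NFS export over TCP on a client system.
mkdir -p /mnt/shared
mount -t nfs -o tcp server1.example.com:/exports/data /mnt/shared

# To make the mount persistent across reboots, a matching line can be
# added to /etc/fstab:
# server1.example.com:/exports/data  /mnt/shared  nfs  defaults,tcp  0 0
```

Many clients can mount the same export simultaneously, which is what makes NFS suitable for shared application data.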
SAN
Storage Area Networks (SANs) use either the Fibre Channel or iSCSI protocol to provide remote access to storage. Fibre Channel infrastructure (such as Fibre Channel host bus adapters, switches, and storage arrays) combines high performance, high bandwidth, and massive storage. SANs separate storage from processing, providing considerable flexibility in system design.
The other major advantage of SANs is that they provide a management environment for performing major storage hardware administrative tasks. These tasks include:
Controlling access to storage
Managing large amounts of data
Provisioning systems
Backing up and replicating data
Taking snapshots
Supporting system failover
Ensuring data integrity
Migrating data
GFS2
The Red Hat Global File System 2 (GFS2) file system provides several specialized capabilities. The basic function of GFS2 is to provide a single file system, including concurrent read/write access, shared across multiple members of a cluster. This means that each member of the cluster sees exactly the same data "on disk" in the GFS2 file system.
GFS2 allows all systems to have concurrent access to the "disk". To maintain data integrity, GFS2 uses a Distributed Lock Manager (DLM), which only allows one system to write to a specific location at a time.
GFS2 is especially well-suited for failover applications that require high availability in storage.
For further information about GFS2, refer to the Global File System 2 guide. For further information about storage in general, refer to the Storage Administration Guide. Both are available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
1.2.3.3. Converged Networks
Communication over the network is normally done through Ethernet, with storage traffic using a dedicated Fibre Channel SAN environment. It is common to have a dedicated network or serial link for system management, and perhaps even heartbeat [2]. As a result, a single server is typically on multiple networks.
Providing multiple connections on each server is expensive, bulky, and complex to manage. This gave rise to the need for a way to consolidate all connections into one. Fibre Channel over Ethernet (FCoE) and Internet SCSI (iSCSI) address this need.
FCoE
With FCoE, standard Fibre Channel commands and data packets are transported over a 10GbE physical infrastructure via a single converged network adapter (CNA). Standard TCP/IP Ethernet traffic and Fibre Channel storage operations can be transported via the same link. FCoE uses one physical network interface card (and one cable) for multiple logical network/storage connections.
FCoE offers the following advantages:
Reduced number of connections
FCoE reduces the number of network connections to a server by half. You can still choose to have multiple connections for performance or availability; however, a single connection provides both storage and network connectivity. This is especially helpful for pizza box servers and blade servers, since they both have very limited space for components.
Lower cost
A reduced number of connections immediately means a reduced number of cables, switches, and other networking equipment. Ethernet's history also features great economies of scale; the cost of networks drops dramatically as the number of devices in the market goes from millions to billions, as was seen in the decline in the price of 100Mb Ethernet and gigabit Ethernet devices.
Similarly, 10GbE will also become cheaper as more businesses adapt to its use. Also, as CNA hardware is integrated into a single chip, widespread use will also increase its volume in the market, which will result in a significant price drop over time.
iSCSI
Internet SCSI (iSCSI) is another type of converged network protocol; it is an alternative to FCoE. Like Fibre Channel, iSCSI provides block-level storage over a network. However, iSCSI does not provide a complete management environment. The main advantage of iSCSI over FCoE is that iSCSI provides much of the capability and flexibility of Fibre Channel, but at a lower cost.
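A minimal sketch of attaching iSCSI storage on a client, using the iscsiadm tool from the iscsi-initiator-utils package; the portal address below is hypothetical:

```shell
# Install the initiator tools and start the iSCSI service.
yum install -y iscsi-initiator-utils
service iscsi start

# Discover the targets exported by a (hypothetical) storage portal.
iscsiadm -m discovery -t sendtargets -p 192.168.1.50

# Log in to the discovered targets; their LUNs then appear as ordinary
# local block devices (visible, for example, in `fdisk -l` output).
iscsiadm -m node --login
```

Once logged in, the remote LUNs are partitioned, formatted, and mounted exactly like local disks.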
[1] Red Hat Certified Engineer. For more information, refer to
http://www.redhat.com/training/certifications/rhce/.
[2] Heartbeat is the exchange of messages between systems to ensure that each system is still functioning. If a system "loses heartbeat" it is assumed to have failed and is shut down, with another system taking over for it.
Chapter 2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
Red Hat Enterprise Linux 6 supports 64-bit processors; these processors can theoretically use up to 16 exabytes of memory. As of general availability (GA), Red Hat Enterprise Linux 6.0 is tested and certified to support up to 8 TB of physical memory.
The size of memory supported by Red Hat Enterprise Linux 6 is expected to grow over several minor updates, as Red Hat continues to introduce and improve more features that enable the use of larger memory blocks. For current details, see https://access.redhat.com/site/articles/rhel-limits. Example improvements (as of Red Hat Enterprise Linux 6.0 GA) are:
Huge pages and transparent huge pages
Non-Uniform Memory Access improvements
These improvements are outlined in greater detail in the sections that follow.
Huge pages and transparent huge pages
The implementation of huge pages in Red Hat Enterprise Linux 6 allows the system to manage memory use efficiently across different memory workloads. Huge pages dynamically utilize 2 MB pages compared to the standard 4 KB page size, allowing applications to scale well while processing gigabytes and even terabytes of memory.
Huge pages are difficult to manually create, manage, and use. To address this, Red Hat Enterprise Linux 6 also features the use of transparent huge pages (THP). THP automatically manages many of the complexities involved in the use of huge pages.
For more information on huge pages and THP, refer to Section 5.2, Huge Pages and Transparent Huge Pages.
NUMA improvements
Many new systems now support Non-Uniform Memory Access (NUMA). NUMA simplifies the design and creation of hardware for large systems; however, it also adds a layer of complexity to application development. For example, NUMA implements both local and remote memory, where remote memory can take several times longer to access than local memory. This feature has performance implications for operating systems and applications, and should be configured carefully.
Red Hat Enterprise Linux 6 is better optimized for NUMA use, thanks to several additional features that help manage users and applications on NUMA systems. These features include CPU affinity, CPU pinning (cpusets), numactl and control groups, which allow a process (affinity) or application (pinning) to "bind" to a specific CPU or set of CPUs.
For more information about NUMA support in Red Hat Enterprise Linux 6, refer to Section 4.1.1, CPU and NUMA Topology.
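For illustration, the numactl utility (from the numactl package) can both report NUMA topology and bind a process to a node; the ./myapp binary below is a placeholder:

```shell
# Display the NUMA topology: nodes, their CPUs, memory sizes, and
# the relative access distances between nodes.
numactl --hardware

# Run a (placeholder) application with both its CPUs and its memory
# allocations confined to NUMA node 0, avoiding remote-memory accesses.
numactl --cpunodebind=0 --membind=0 ./myapp
```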
2.2. Ticket Spinlocks
A key part of any system design is ensuring that one process does not alter memory used by another process. Uncontrolled data change in memory can result in data corruption and system crashes. To prevent this, the operating system allows a process to lock a piece of memory, perform an operation, then unlock or "free" the memory.
One common implementation of memory locking is through spin locks, which allow a process to keep checking to see if a lock is available and take the lock as soon as it becomes available. If there are multiple processes competing for the same lock, the first one to request the lock after it has been freed gets it. When all processes have the same access to memory, this approach is "fair" and works quite well.
Unfortunately, on a NUMA system, not all processes have equal access to the locks. Processes on the same NUMA node as the lock have an unfair advantage in obtaining the lock. Processes on remote NUMA nodes experience lock starvation and degraded performance.
To address this, Red Hat Enterprise Linux implemented ticket spinlocks. This feature adds a reservation queue mechanism to the lock, allowing all processes to take a lock in the order that they requested it. This eliminates timing problems and unfair advantages in lock requests.
While a ticket spinlock has slightly more overhead than an ordinary spinlock, it scales better and
provides better performance on NUMA systems.
2.3. Dynamic List Structure
The operating system requires a set of information on each processor in the system. In Red Hat Enterprise Linux 5, this set of information was allocated to a fixed-size array in memory. Information on each individual processor was obtained by indexing into this array. This method was fast, easy, and straightforward for systems that contained relatively few processors.
However, as the number of processors for a system grows, this method produces significant overhead. Because the fixed-size array in memory is a single, shared resource, it can become a bottleneck as more processors attempt to access it at the same time.
To address this, Red Hat Enterprise Linux 6 uses a dynamic list structure for processor information. This allows the array used for processor information to be allocated dynamically: if there are only eight processors in the system, then only eight entries are created in the list. If there are 2048 processors, then 2048 entries are created as well.
A dynamic list structure allows more fine-grained locking. For example, if information needs to be updated at the same time for processors 6, 72, 183, 657, 931 and 1546, this can be done with greater parallelism. Situations like this obviously occur much more frequently on large, high-performance systems than on small systems.
2.4. Tickless Kernel
In previous versions of Red Hat Enterprise Linux, the kernel used a timer-based mechanism that continuously produced a system interrupt. During each interrupt, the system polled; that is, it checked to see if there was work to be done.
Depending on the setting, this system interrupt or timer tick could occur several hundred or several thousand times per second. This happened every second, regardless of the system's workload. On a lightly loaded system, this impacts power consumption by preventing the processor from effectively using sleep states. The system uses the least power when it is in a sleep state.
The most power-efficient way for a system to operate is to do work as quickly as possible, go into the deepest sleep state possible, and sleep as long as possible. To implement this, Red Hat Enterprise Linux 6 uses a tickless kernel. With this, the interrupt timer has been removed from the idle loop, transforming Red Hat Enterprise Linux 6 into a completely interrupt-driven environment.
The tickless kernel allows the system to go into deep sleep states during idle times, and respond quickly when there is work to be done.
For further information, refer to the Power Management Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/ .
2.5. Control Groups
Red Hat Enterprise Linux provides many useful options for performance tuning. Large systems, scaling to hundreds of processors, can be tuned to deliver superb performance. But tuning these systems requires considerable expertise and a well-defined workload. When large systems were expensive and few in number, it was acceptable to give them special treatment. Now that these systems are mainstream, more effective tools are needed.
To further complicate things, more powerful systems are being used now for service consolidation. Workloads that may have been running on four to eight older servers are now placed into a single server. And as discussed earlier in Section 1.2.2.1, Parallel Computing, many mid-range systems nowadays contain more cores than yesterday's high-performance machines.
Many modern applications are designed for parallel processing, using multiple threads or processes to improve performance. However, few applications can make effective use of more than eight threads. Thus, multiple applications typically need to be installed on a 32-CPU system to maximize capacity.
Consider the situation: small, inexpensive mainstream systems are now at parity with the performance of yesterday's expensive, high-performance machines. Cheaper high-performance machines gave system architects the ability to consolidate more services to fewer machines.
However, some resources (such as I/O and network communications) are shared, and do not grow as fast as CPU count. As such, a system housing multiple applications can experience degraded overall performance when one application hogs too much of a single resource.
To address this, Red Hat Enterprise Linux 6 now supports control groups (cgroups). Cgroups allow administrators to allocate resources to specific tasks as needed. This means, for example, being able to allocate 80% of four CPUs, 60GB of memory, and 40% of disk I/O to a database application. A web application running on the same system could be given two CPUs, 2GB of memory, and 50% of available network bandwidth.
As a result, both database and web applications deliver good performance, as the system prevents both from excessively consuming system resources. In addition, many aspects of cgroups are self-tuning, allowing the system to respond accordingly to changes in workload.
A cgroup has two major components:
A list of tasks assigned to the cgroup
Resources allocated to those tasks
Tasks assigned to the cgroup run within the cgroup. Any child tasks they spawn also run within the cgroup. This allows an administrator to manage an entire application as a single unit. An administrator can also configure allocations for the following resources:
CPUsets
Memory
I/O
Network (bandwidth)
Within CPUsets, cgroups allow administrators to configure the number of CPUs, affinity for specific CPUs or nodes [3], and the amount of CPU time used by a set of tasks. Using cgroups to configure CPUsets is vital for ensuring good overall performance, preventing an application from consuming excessive resources at the cost of other tasks while simultaneously ensuring that the application is not starved for CPU time.
I/O bandwidth and network bandwidth are managed by other resource controllers. Again, the resource controllers allow you to determine how much bandwidth the tasks in a cgroup can consume, and ensure that the tasks in a cgroup neither consume excessive resources nor are starved of resources.
Cgroups allow the administrator to define and allocate, at a high level, the system resources that various applications need (and will) consume. The system then automatically manages and balances the various applications, delivering good, predictable performance and optimizing the performance of the overall system.
For more information on how to use control groups, refer to the Resource Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
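As a hedged sketch, the libcgroup tools shipped with Red Hat Enterprise Linux 6 can create such an allocation; the group name webapp and the ./webapp command below are hypothetical:

```shell
# Install the cgroup userspace tools and mount the default hierarchies.
yum install -y libcgroup
service cgconfig start

# Create a cgroup in the cpuset controller, then restrict it to
# CPUs 0-1 and memory node 0.
cgcreate -g cpuset:/webapp
cgset -r cpuset.cpus=0-1 cpuset:/webapp
cgset -r cpuset.mems=0 cpuset:/webapp

# Launch the (hypothetical) application inside the cgroup; any child
# processes it spawns inherit the same limits.
cgexec -g cpuset:/webapp ./webapp
```

Equivalent allocations can be made permanent in /etc/cgconfig.conf so they survive a reboot.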
2.6. Storage and File System Improvements
Red Hat Enterprise Linux 6 also features several improvements to storage and file system management. Two of the most notable advances in this version are ext4 and XFS support. For more comprehensive coverage of performance improvements relating to storage and file systems, refer to Chapter 7, File Systems.
Ext4
Ext4 is the default file system for Red Hat Enterprise Linux 6. It is the fourth generation version of the EXT file system family, supporting a theoretical maximum file system size of 1 exabyte, and a single file maximum size of 16TB. Red Hat Enterprise Linux 6 supports a maximum file system size of 16TB, and a single file maximum size of 16TB. Other than a much larger storage capacity, ext4 also includes several new features, such as:
Extent-based metadata
Delayed allocation
Journal check-summing
For more information about the ext4 file system, refer to Section 7.3.1, The Ext4 File System.
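Creating and mounting an ext4 file system is straightforward; the device name below is a placeholder for a real, unused partition:

```shell
# Create an ext4 file system on an empty partition and mount it.
mkfs.ext4 /dev/sdb1
mkdir -p /mnt/data
mount -t ext4 /dev/sdb1 /mnt/data

# Confirm the mount point and file system type.
df -hT /mnt/data
```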
XFS
XFS is a robust and mature 64-bit journaling file system that supports very large files and file systems on a single host. This file system was originally developed by SGI, and has a long history of running on extremely large servers and storage arrays. XFS features include:
Delayed allocation
Dynamically-allocated inodes
B-tree indexing for scalability of free space management
Online defragmentation and file system growing
Sophisticated metadata read-ahead algorithms
While XFS scales to exabytes, the maximum XFS file system size supported by Red Hat is 100TB. For more information about XFS, refer to Section 7.3.2, The XFS File System.
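The online growing and defragmentation features can be exercised with the xfsprogs tools; the device name below is a placeholder for a real, unused partition:

```shell
# Create and mount an XFS file system.
mkfs.xfs /dev/sdc1
mkdir -p /mnt/xfsdata
mount -t xfs /dev/sdc1 /mnt/xfsdata

# Grow the file system online after the underlying volume is enlarged.
xfs_growfs /mnt/xfsdata

# Defragment files in place while the file system remains mounted.
xfs_fsr /mnt/xfsdata
```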
Large Boot Drives
Traditional BIOS supports a maximum disk size of 2.2TB. Red Hat Enterprise Linux 6 systems using BIOS can support disks larger than 2.2TB by using a new disk structure called the GUID Partition Table (GPT). GPT can only be used for data disks; it cannot be used for boot drives with BIOS; therefore, boot drives can only be a maximum of 2.2TB in size. The BIOS was originally created for the IBM PC; while BIOS has evolved considerably to adapt to modern hardware, the Unified Extensible Firmware Interface (UEFI) is designed to support new and emerging hardware.
Red Hat Enterprise Linux 6 also supports UEFI, which can be used to replace BIOS (still supported). Systems with UEFI running Red Hat Enterprise Linux 6 allow the use of GPT and 2.2TB (and larger) partitions for both boot partition and data partition.
Important: UEFI for 32-bit x86 systems
Red Hat Enterprise Linux 6 does not support UEFI for 32-bi t x86 systems.
Important: UEFI for AMD64 and Intel 64
Note that the boot configurations of UEFI and BIOS differ significantly from each other. Therefore, the installed system must boot using the same firmware that was used during installation. You cannot install the operating system on a system that uses BIOS and then boot this installation on a system that uses UEFI.
Red Hat Enterprise Linux 6 supports version 2.2 of the UEFI specification. Hardware that supports version 2.3 of the UEFI specification or later should boot and operate with Red Hat Enterprise Linux 6, but the additional functionality defined by these later specifications will not be available. The UEFI specifications are available from http://www.uefi.org/specs/agreement/.
[3] A node is generally defined as a set of CPUs or cores within a socket.
Chapter 3. Monitoring and Analyzing System Performance
This chapter briefly introduces tools that can be used to monitor and analyze system and application performance, and points out the situations in which each tool is most useful. The data collected by each tool can reveal bottlenecks or other system problems that contribute to less-than-optimal performance.
3.1. The proc File System
The proc "file system" is a directory that contains a hierarchy of files that represent the current state of the Linux kernel. It allows applications and users to see the kernel's view of the system.
The proc directory also contains information about the hardware of the system, and any currently running processes. Most of these files are read-only, but some files (primarily those in /proc/sys) can be manipulated by users and applications to communicate configuration changes to the kernel.
For further information about viewing and editing files in the proc directory, refer to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
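For example, kernel state can be read from proc with ordinary file tools, and tunables under /proc/sys can be changed at runtime (as root):

```shell
# Read kernel state: processor details and the current swappiness value.
cat /proc/cpuinfo
cat /proc/sys/vm/swappiness

# Change a tunable until the next reboot; this lowers the kernel's
# tendency to swap out application memory.
echo 10 > /proc/sys/vm/swappiness
```

Changes made this way do not persist; permanent settings belong in /etc/sysctl.conf.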
3.2. GNOME and KDE System Monitors
The GNOME and KDE desktop environments both have graphical tools to assist you in monitoring and modifying the behavior of your system.
GNOME System Monitor
The GNOME System Monitor displays basic system information and allows you to monitor system processes, and resource or file system usage. Open it with the gnome-system-monitor command in the Terminal, or click on the Applications menu, and select System Tools > System Monitor.
GNOME System Monitor has four tabs:
System
Displays basic information about the computer's hardware and software.
Processes
Shows active processes, and the relationships between those processes, as well as detailed information about each process. It also lets you filter the processes displayed, and perform certain actions on those processes (start, stop, kill, change priority, etc.).
Resources
Displays the current CPU time usage, memory and swap space usage, and network usage.
File Systems
Lists all mounted file systems alongside some basic information about each, such as the file system type, mount point, and memory usage.
For further information about the GNOME System Monitor, refer to the Help menu in the application, or to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
KDE System Guard
The KDE System Guard allows you to monitor current system load and processes that are running. It also lets you perform actions on processes. Open it with the ksysguard command in the Terminal, or click on the Kickoff Application Launcher and select Applications > System > System Monitor.
There are two tabs to KDE System Guard:
Process Table
Displays a list of all running processes, alphabetically by default. You can also sort processes by a number of other properties, including total CPU usage, physical or shared memory usage, owner, and priority. You can also filter the visible results, search for specific processes, or perform certain actions on a process.
System Load
Displays historical graphs of CPU usage, memory and swap space usage, and network usage. Hover over the graphs for detailed analysis and graph keys.
For further information about the KDE System Guard, refer to the Help menu in the application.
3.3. Performance Co-Pilot (PCP)
Performance Co-Pilot (PCP) provides tools and infrastructure for monitoring, analysing, and responding to details of system performance. PCP has a fully distributed architecture, which means it can be run on a single host, or across any number of hosts, depending on your needs. PCP is also designed with a plug-in architecture, making it useful for monitoring and tuning a wide variety of subsystems.
To install PCP, run:
# yum install pcp
Red Hat also recommends the pcp-gui package, which provides the ability to create visualizations of collected data, and the pcp-doc package, which installs detailed PCP documentation to the /usr/share/doc/pcp-doc directory.
3.3.1. PCP Architecture
A PCP deployment is comprised of both collector and monitor systems. A single host can be both a collector and a monitor, or collectors and monitors can be distributed across a number of hosts.
Collector
A collector system collects performance data from one or more domains, and stores it for analysis. Collectors have a Performance Metrics Collector Daemon (pmcd), which passes requests for performance data to and from the appropriate Performance Metrics Domain Agent, and one or more Performance Metrics Domain Agents (PMDAs), which are responsible for responding to requests about their domain (a database, a server, an application, or similar). PMDAs are controlled by the pmcd running on the same collector.
Monitor
A monitor system uses monitoring tools like pmie or pmreport to display and analyse data from local or remote collectors.
3.3.2. PCP Setup
A collector requires a running Performance Metrics Collector Daemon (pmcd) and one or more Performance Metrics Domain Agents (PMDAs).
Procedure 3.1. Setting up a collector
1. Install PCP
Run the following command to install PCP on this system.
# yum install pcp
2. Start pmcd
# service pmcd start
3. Optionally, configure additional PMDAs
The kernel, pmcd, per-process, memory-mapped values, XFS, and JBD2 PMDAs are installed and configured in /etc/pcp/pmcd/pmcd.conf by default.
To configure an additional PMDA, change into its directory (for example, /var/lib/pcp/pmdas/pmdaname), and run the Install script. For example, to install the PMDA for proc:
# cd /var/lib/pcp/pmdas/proc
# ./Install
Then follow the prompts to configure the PMDA for a collector system, or a monitor and collector system.
4. Optionally, listen for remote monitors
To respond to remote monitor systems, pmcd must be able to listen for remote monitors through port 44321. Execute the following to open the appropriate port:
# iptables -I INPUT -p tcp --dport 44321 -j ACCEPT
# service iptables save
You will need to ensure that no other firewall rules block access through this port.
A monitor requires that pcp is installed and able to connect to at least one pmcd instance, remote or local. If your collector and your monitor are both on a single machine, and you have followed the instructions in Procedure 3.1, Setting up a collector, no further configuration is needed and you can go ahead and use the monitoring tools provided by PCP.
Procedure 3.2. Setting up a remote monitor
1. Install PCP
Run the following command to install PCP on the remote monitor system.
# yum install pcp
2. Connect to remote collectors
To connect to remote collector systems, PCP must be able to contact remote collectors through port 44321. Execute the following to open the appropriate port:
# iptables -I INPUT -p tcp --dport 44321 -j ACCEPT
# service iptables save
You will also need to ensure that no other firewall rules block access through this port.
You can now test that you can connect to the remote collector by running a performance monitoring tool with the -h option to specify the IP address of the collector you want to connect to, for example:
# pminfo -h 192.168.122.30
3.4. irqbalance
irqbalance is a command line tool that distributes hardware interrupts across processors to
improve system performance. It runs as a daemon by default, but can be run once only with the
--oneshot option.
The following parameters are useful for improving performance.
--powerthresh
Sets the number of CPUs that can idle before a CPU is placed into powersave mode. If more
CPUs than the threshold are more than 1 standard deviation below the average softirq
workload, no CPUs are more than one standard deviation above the average, and more than
one irq is assigned to them, a CPU is placed into powersave mode. In powersave mode, a
CPU is not part of irq balancing so that it is not woken unnecessarily.
--hintpolicy
Determines how irq kernel affinity hinting is handled. Valid values are exact (the irq affinity
hint is always applied), subset (the irq is balanced, but the assigned object is a subset of the
affinity hint), or ignore (the irq affinity hint is ignored completely).
--policyscript
Defines the location of a script to execute for each interrupt request, with the device path
and irq number passed as arguments, and a zero exit code expected by irqbalance. The
script defined can specify zero or more key-value pairs to guide irqbalance in managing
the passed irq.
The following are recognized as valid key-value pairs.
ban
Valid values are true (exclude the passed irq from balancing) or false (perform
balancing on this irq).
balance_level
Allows user override of the balance level of the passed irq. By default the balance
level is based on the PCI device class of the device that owns the irq. Valid values
are none, package, cache, or core.
numa_node
Allows user override of the NUMA node that is considered local to the passed irq.
If information about the local node is not specified in ACPI, devices are
considered equidistant from all nodes. Valid values are integers (starting from 0)
that identify a specific NUMA node, and -1, which specifies that an irq should be
considered equidistant from all nodes.
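A policy script along these lines can be sketched as follows. This is a minimal, hypothetical example: the device path, irq number, and decision logic are illustrative assumptions, not values taken from this guide. The logic is factored into a shell function so it can be exercised directly; in practice irqbalance invokes the script itself with the two arguments.

```shell
#!/bin/sh
# Hypothetical irqbalance policy logic: given a sysfs device path and an irq
# number, print zero or more key=value pairs on stdout and exit zero.
policy_for_irq() {
    devpath=$1
    irq=$2
    if [ "$irq" = "0" ]; then
        # Exclude the timer interrupt from balancing entirely.
        echo "ban=true"
    else
        echo "ban=false"
        # Balance this irq at core granularity rather than the PCI-class default.
        echo "balance_level=core"
    fi
}

# Example invocation with an assumed device path and irq number:
policy_for_irq /sys/devices/pci0000:00/0000:00:1f.2 19
```

A script like this would be passed to the daemon with --policyscript=/path/to/script.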
--banirq
The interrupt with the specified interrupt request number is added to the list of banned
interrupts.
You can also use the IRQBALANCE_BANNED_CPUS environment variable to specify a mask of CPUs
that are ignored by irqbalance.
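The mask can be built from individual CPU numbers. This is a hedged sketch; the choice of CPUs 0 and 2 is arbitrary and for illustration only.

```shell
# IRQBALANCE_BANNED_CPUS is a hexadecimal bitmask in which bit N set means
# CPU N is ignored by irqbalance. Build a mask covering CPUs 0 and 2:
mask=$(printf '%x' $(( (1 << 0) | (1 << 2) )))
echo "$mask"    # 5
export IRQBALANCE_BANNED_CPUS=$mask
```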
For further details, see the man page:
$ man irqbalance
3.5. Built-in Command-line Monitoring Tools
In addition to graphical monitoring tools, Red Hat Enterprise Linux provides several tools that can be
used to monitor a system from the command line. The advantage of these tools is that they can be
used outside run level 5. This section discusses each tool briefly, and suggests the purposes to
which each tool is best suited.
top
The top tool provides a dynamic, real-time view of the processes in a running system. It can display
a variety of information, including a system summary and the tasks currently being managed by the
Linux kernel. It also has a limited ability to manipulate processes. Both its operation and the
information it displays are highly configurable, and any configuration details can be made to persist
across restarts.
By default, the processes shown are ordered by the percentage of CPU usage, giving an easy view
into the processes that are consuming the most resources.
For detailed information about using top, refer to its man page: man top.
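Because top is interactive by default, scripted use relies on batch mode. The following sketch (guarded in case top is unavailable) captures a single snapshot:

```shell
# -b selects batch mode (plain text output, no screen control) and -n 1
# limits top to a single iteration, which suits logging and scripting.
if command -v top >/dev/null 2>&1; then
    top -b -n 1 | head -n 12
fi
```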
ps
The ps tool takes a snapshot of a select group of active processes. By default this group is limited to
processes owned by the current user and associated with the same terminal.
It can provide more detailed information about processes than top, but is not dynamic.
For detailed information about using ps, refer to its man page: man ps.
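The snapshot can be shaped with format and sort options. A small sketch (the column choice is arbitrary):

```shell
# -e selects every process, -o sets a custom column list, and --sort=-%cpu
# orders the snapshot by descending CPU usage; head keeps the top entries.
ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 6
```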
vmstat
vmstat (Virtual Memory Statistics) outputs instantaneous reports about your system's processes,
memory, paging, block I/O, interrupts and CPU activity.
Although it is not dynamic like top, you can specify a sampling interval, which lets you observe
system activity in near-real time.
For detailed information about using vmstat, refer to its man page: man vmstat.
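For near-real-time observation, pass an interval and a count. A minimal sketch (guarded in case vmstat is not installed):

```shell
# Sample every second, three times; the first report holds averages since
# boot, and each following line covers the preceding one-second interval.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 3
fi
```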
sar
sar (System Activity Reporter) collects and reports information about today's system activity so far.
The default output covers today's CPU utilization at ten minute intervals from the beginning of the
day:
12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      0.10      0.00      0.15      2.96      0.00     96.79
12:20:01 AM     all      0.09      0.00      0.13      3.16      0.00     96.61
12:30:01 AM     all      0.09      0.00      0.14      2.11      0.00     97.66
...
This tool is a useful alternative to attempting to create periodic reports on system activity through
top or similar tools.
For detailed information about using sar, refer to its man page: man sar.
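sar can also sample live rather than replaying today's records, by passing an interval and a count. A hedged sketch (sar belongs to the sysstat package and may not be installed):

```shell
# Report CPU utilization (-u) once per second for three samples; without an
# interval argument, sar instead reports today's collected records.
if command -v sar >/dev/null 2>&1; then
    sar -u 1 3
fi
```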
3.6. Tuned and ktune
Tuned is a daemon that monitors and collects data on the usage of various system components,
and uses that information to dynamically tune system settings as required. It can react to changes in
CPU and network use, and adjust settings to improve performance in active devices or reduce power
consumption in inactive devices.
The accompanying ktune partners with the tuned-adm tool to provide a number of tuning profiles
that are pre-configured to enhance performance and reduce power consumption in a number of
specific use cases. Edit these profiles or create new profiles to create performance solutions tailored
to your environment.
The profiles provided as part of tuned-adm include:
default
The default power-saving profile. This is the most basic power-saving profile. It enables
only the disk and CPU plug-ins. Note that this is not the same as turning tuned-adm off,
where both tuned and ktune are disabled.
latency-performance
A server profile for typical latency performance tuning. This profile disables dynamic tuning
mechanisms and transparent hugepages. It uses the performance governor for p-states
through cpuspeed, and sets the I/O scheduler to deadline. Additionally, in Red Hat
Enterprise Linux 6.5 and later, the profile requests a cpu_dma_latency value of 1. In Red
Hat Enterprise Linux 6.4 and earlier, cpu_dma_latency requested a value of 0.
throughput-performance
A server profile for typical throughput performance tuning. This profile is recommended if
the system does not have enterprise-class storage. throughput-performance disables power
saving mechanisms and enables the deadline I/O scheduler. The CPU governor is set to
performance. kernel.sched_min_granularity_ns (scheduler minimal preemption
granularity) is set to 10 milliseconds, kernel.sched_wakeup_granularity_ns
(scheduler wake-up granularity) is set to 15 milliseconds, vm.dirty_ratio (virtual
memory dirty ratio) is set to 40%, and transparent huge pages are enabled.
enterprise-storage
This profile is recommended for enterprise-sized server configurations with enterprise-class
storage, including battery-backed controller cache protection and management of on-disk
cache. It is the same as the throughput-performance profile, with one addition: filesystems
are re-mounted with barrier=0.
virtual-guest
This profile is optimized for virtual machines. It is based on the enterprise-storage
profile, but also decreases the swappiness of virtual memory. This profile is available in
Red Hat Enterprise Linux 6.3 and later.
virtual-host
Based on the enterprise-storage profile, virtual-host decreases the swappiness of
virtual memory and enables more aggressive writeback of dirty pages. Non-root and
non-boot file systems are mounted with barrier=0. Additionally, as of Red Hat Enterprise Linux
6.5, the kernel.sched_migration_cost parameter is set to 5 milliseconds. Prior to Red
Hat Enterprise Linux 6.5, kernel.sched_migration_cost used the default value of 0.5
milliseconds. This profile is available in Red Hat Enterprise Linux 6.3 and later.
Refer to the Red Hat Enterprise Linux 6 Power Management Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, for further information about
tuned and ktune.
3.7. Application Profilers
Profiling is the process of gathering information about a program's behavior as it executes. You
profile an application to determine which areas of a program can be optimized to increase the
program's overall speed, reduce its memory usage, etc. Application profiling tools help to simplify
this process.
There are three supported profiling tools for use with Red Hat Enterprise Linux 6: SystemTap,
OProfile and Valgrind. Documenting these profiling tools is outside the scope of this guide;
however, this section does provide links to further information and a brief overview of the tasks for
which each profiler is suitable.
3.7.1. SystemTap
SystemTap is a tracing and probing tool that lets users monitor and analyze operating system
activities (particularly kernel activities) in fine detail. It provides information similar to the output of
tools like netstat, top, ps and iostat, but includes additional filtering and analysis options for the
information that is collected.
SystemTap provides a deeper, more precise analysis of system activities and application behavior to
allow you to pinpoint system and application bottlenecks.
The Function Callgraph plug-in for Eclipse uses SystemTap as a back-end, allowing it to thoroughly
monitor the status of a program, including function calls, returns, times, and user-space variables,
and display the information visually for easy optimization.
The Red Hat Enterprise Linux 7 SystemTap Beginners Guide includes several sample scripts that are
useful for profiling and monitoring performance. By default they are installed to the
/usr/share/doc/systemtap-client-version/examples directory.
Network monitoring scripts (in examples/network)
nettop.stp
Every 5 seconds, prints a list of processes (process identifier and command) with the
number of packets sent and received and the amount of data sent and received by the
process during that interval.
socket-trace.stp
Instruments each of the functions in the Linux kernel's net/socket.c file, and prints trace
data.
tcp_connections.stp
Prints information for each new incoming TCP connection accepted by the system. The
information includes the UID, the command accepting the connection, the process identifier
of the command, the port the connection is on, and the IP address of the originator of the
request.
dropwatch.stp
Every 5 seconds, prints the number of socket buffers freed at locations in the kernel. Use the
--all-modules option to see symbolic names.
Storage monitoring scripts (in examples/io)
disktop.stp
Checks the status of reading/writing disk every 5 seconds and outputs the top ten entries
during that period.
iotime.stp
Prints the amount of time spent on read and write operations, and the number of bytes read
and written.
traceio.stp
Prints the top ten executables based on cumulative I/O traffic observed, every second.
traceio2.stp
Prints the executable name and process identifier as reads and writes to the specified
device occur.
inodewatch.stp
Prints the executable name and process identifier each time a read or write occurs to the
specified inode on the specified major/minor device.
inodewatch2.stp
Prints the executable name, process identifier, and attributes each time the attributes are
changed on the specified inode on the specified major/minor device.
The latencytap.stp script records the effect that different types of latency have on one or more
processes. It prints a list of latency types every 30 seconds, sorted in descending order by the total
time the process or processes spent waiting. This can be useful for identifying the cause of both
storage and network latency. Red Hat recommends using the --all-modules option with this
script to better enable the mapping of latency events. By default, this script is installed to the
/usr/share/doc/systemtap-client-version/examples/profiling directory.
For further information about SystemTap, refer to the SystemTap Beginners Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/ .
3.7.2. OProfile
OProfile (oprofile) is a system-wide performance monitoring tool. It uses the processor's dedicated
performance monitoring hardware to retrieve information about the kernel and system executables,
such as when memory is referenced, the number of L2 cache requests, and the number of hardware
interrupts received. It can also be used to determine processor usage, and which applications and
services are used most.
OProfile can also be used with Eclipse via the Eclipse OProfile plug-in. This plug-in allows users to
easily determine the most time-consuming areas of their code, and perform all command-line
functions of OProfile with rich visualization of the results.
However, users should be aware of several OProfile limitations:
Performance monitoring samples may not be precise. Because the processor may execute
instructions out of order, a sample may be recorded from a nearby instruction, instead of the
instruction that triggered the interrupt.
Because OProfile is system-wide and expects processes to start and stop multiple times, samples
from multiple runs are allowed to accumulate. This means you may need to clear sample data
from previous runs.
It focuses on identifying problems with CPU-limited processes, and therefore does not identify
processes that are sleeping while they wait on locks for other events.
For further information about using OProfile, refer to the Deployment Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or to the oprofile
documentation on your system, located in /usr/share/doc/oprofile-.
3.7.3. Valgrind
Valgrind provides a number of detection and profiling tools to help improve the performance and
correctness of your applications. These tools can detect memory and thread-related errors as well as
heap, stack and array overruns, allowing you to easily locate and correct errors in your application
code. They can also profile the cache, the heap, and branch-prediction to identify factors that may
increase application speed and minimize application memory use.
Valgrind analyzes your application by running it on a synthetic CPU and instrumenting the existing
application code as it is executed. It then prints "commentary" clearly identifying each process
involved in application execution to a user-specified file descriptor, file, or network socket. The level
of instrumentation varies depending on the Valgrind tool in use, and its settings, but it is important to
note that executing the instrumented code can take 4-50 times longer than normal execution.
Valgrind can be used on your application as-is, without recompiling. However, because Valgrind
uses debugging information to pinpoint issues in your code, if your application and support libraries
were not compiled with debugging information enabled, recompiling to include this information is
highly recommended.
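A typical invocation runs the unmodified binary under one of the tools. This sketch uses /bin/true as a stand-in for a real application binary, and is guarded in case Valgrind is not installed:

```shell
# Memcheck is the default tool; --leak-check=full reports the origin of each
# leaked block. Valgrind writes its commentary to stderr by default.
if command -v valgrind >/dev/null 2>&1; then
    valgrind --tool=memcheck --leak-check=full /bin/true
fi
```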
As of Red Hat Enterprise Linux 6.4, Valgrind integrates with gdb (GNU Project Debugger) to improve
debugging efficiency.
More information about Valgrind is available from the Developer Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or by using the man
valgrind command when the valgrind package is installed. Accompanying documentation can also
be found in:
/usr/share/doc/valgrind-/valgrind_manual.pdf
/usr/share/doc/valgrind-/html/index.html
For information about how Valgrind can be used to profile system memory, refer to Section 5.3,
Using Valgrind to Profile Memory Usage.
3.7.4. Perf
The perf tool provides a number of useful performance counters that let the user assess the impact
of other commands on their system:
perf stat
This command provides overall statistics for common performance events, including
instructions executed and clock cycles consumed. You can use the option flags to gather
statistics on events other than the default measurement events. As of Red Hat Enterprise
Linux 6.4, it is possible to use perf stat to filter monitoring based on one or more
specified control groups (cgroups). For further information, read the man page: man perf-stat.
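A basic invocation counts events for a single command. This sketch is hedged: perf may be absent, and unprivileged counter access can be restricted by the kernel's perf_event_paranoid setting, so failures are tolerated here.

```shell
# perf stat runs the given command and prints counter totals (to stderr)
# when it exits; -e restricts counting to the named events.
if command -v perf >/dev/null 2>&1; then
    perf stat -e instructions,cycles /bin/true 2>&1 || true
fi
```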
perf record
This command records performance data into a file which can be later analyzed using
perf report. For further details, read the man page: man perf-record.
As of Red Hat Enterprise Linux 6.6, the -b and -j options are provided to allow statistical
sampling of taken branches. The -b option samples any branches taken, while the