
Red Hat Enterprise Linux 6 Performance Tuning Guide

Optimizing subsystem throughput in Red Hat Enterprise Linux 6

Edition 4.0

Red Hat Subject Matter Experts

Edited by Don Domingo and Laura Bailey


    Legal Notice

    Copyright 2011 Red Hat, Inc. and others.

This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-sa/3.0/). If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux is the registered trademark of Linus Torvalds in the United States and other countries.

Java is a registered trademark of Oracle and/or its affiliates.

XFS is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

    All other trademarks are the property of their respective owners.

    Abstract

The Performance Tuning Guide describes how to optimize the performance of a system running Red Hat Enterprise Linux 6. It also documents performance-related upgrades in Red Hat Enterprise Linux 6. While this guide contains procedures that are field-tested and proven, Red Hat recommends that you properly test all planned configurations in a testing environment before applying them to a production environment. You should also back up all your data and pre-tuning configurations.


    Table of Contents

Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
2.2. We Need Feedback!

Chapter 1. Overview
1.1. How to read this book
1.1.1. Audience
1.2. Release overview
1.2.1. Summary of features
1.2.1.1. New features in Red Hat Enterprise Linux 6
1.2.1.1.1. New in 6.5
1.2.2. Horizontal Scalability
1.2.2.1. Parallel Computing
1.2.3. Distributed Systems
1.2.3.1. Communication
1.2.3.2. Storage
1.2.3.3. Converged Networks

Chapter 2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
2.2. Ticket Spinlocks
2.3. Dynamic List Structure
2.4. Tickless Kernel
2.5. Control Groups
2.6. Storage and File System Improvements

Chapter 3. Monitoring and Analyzing System Performance
3.1. The proc File System
3.2. GNOME and KDE System Monitors
3.3. Built-in Command-line Monitoring Tools
3.4. irqbalance
3.5. Tuned and ktune
3.6. Application Profilers
3.6.1. SystemTap
3.6.2. OProfile
3.6.3. Valgrind
3.6.4. Perf
3.7. Red Hat Enterprise MRG

Chapter 4. CPU
Topology
Threads
Interrupts
4.1. CPU Topology
4.1.1. CPU and NUMA Topology
4.1.2. Tuning CPU Performance
4.1.2.1. Setting CPU Affinity with taskset


4.1.3. Hardware performance policy (x86_energy_perf_policy)
4.1.4. turbostat
4.1.5. numastat
4.1.6. NUMA Affinity Management Daemon (numad)
4.1.6.1. Benefits of numad
4.1.6.2. Modes of operation
4.1.6.2.1. Using numad as a service
4.1.6.2.2. Using numad as an executable
4.2. CPU Scheduling
4.2.1. Realtime scheduling policies
4.2.2. Normal scheduling policies
4.2.3. Policy Selection
4.3. Interrupts and IRQ Tuning
4.4. Enhancements to NUMA in Red Hat Enterprise Linux 6
4.4.1. Bare-metal and Scalability Optimizations
4.4.1.1. Enhancements in topology-awareness
4.4.1.2. Enhancements in Multi-processor Synchronization
4.4.2. Virtualization Optimizations

Chapter 5. Memory
5.1. Huge Translation Lookaside Buffer (HugeTLB)
5.2. Huge Pages and Transparent Huge Pages
5.3. Using Valgrind to Profile Memory Usage
5.3.1. Profiling Memory Usage with Memcheck
5.3.2. Profiling Cache Usage with Cachegrind
5.3.3. Profiling Heap and Stack Space with Massif
5.4. Capacity Tuning
5.5. Tuning Virtual Memory

Chapter 6. Input/Output
6.1. Features
6.2. Analysis
6.3. Tools
6.4. Configuration
6.4.1. Completely Fair Queuing (CFQ)
6.4.2. Deadline I/O Scheduler
6.4.3. Noop

Chapter 7. File Systems
7.1. Tuning Considerations for File Systems
7.1.1. Formatting Options
7.1.2. Mount Options
7.1.3. File system maintenance
7.1.4. Application Considerations
7.2. Profiles for file system performance
7.3. File Systems
7.3.1. The Ext4 File System
7.3.2. The XFS File System
7.3.2.1. Basic tuning for XFS
7.3.2.2. Advanced tuning for XFS
7.3.2.2.1. Optimizing for a large number of files
7.3.2.2.2. Optimizing for a large number of files in a single directory
7.3.2.2.3. Optimising for concurrency


7.3.2.2.5. Optimising for sustained metadata modifications
7.4. Clustering
7.4.1. Global File System 2

Chapter 8. Networking
8.1. Network Performance Enhancements
Receive Packet Steering (RPS)
Receive Flow Steering
getsockopt support for TCP thin-streams
Transparent Proxy (TProxy) support
8.2. Optimized Network Settings
Socket receive buffer size
8.3. Overview of Packet Reception
CPU/cache affinity
8.4. Resolving Common Queuing/Frame Loss Issues
8.4.1. NIC Hardware Buffer
8.4.2. Socket Queue
8.5. Multicast Considerations
8.6. Receive-Side Scaling (RSS)
8.7. Receive Packet Steering (RPS)
8.8. Receive Flow Steering (RFS)
8.9. Accelerated RFS

Revision History


    Preface

    1. Document Conventions

This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.

1.1. Typographic Conventions

Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.

    Mono-spaced Bold

Used to highlight system input, including shell commands, file names and paths. Also used to highlight keys and key combinations. For example:

To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.

The above includes a file name, a shell command and a key, all presented in mono-spaced bold and all distinguishable thanks to context.

Key combinations can be distinguished from an individual key by the plus sign that connects each part of a key combination. For example:

Press Enter to execute the command.

Press Ctrl+Alt+F2 to switch to a virtual terminal.

The first example highlights a particular key to press. The second example highlights a key combination: a set of three keys pressed simultaneously.

If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:

File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.

    Proportional Bold

This denotes words or phrases encountered on a system, including application names; dialog-box text; labeled buttons; check-box and radio-button labels; menu titles and submenu titles. For example:

Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, select the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).

To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.

The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.

Mono-spaced Bold Italic or Proportional Bold Italic

Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:

To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.

The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.

To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.

Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.

Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:

Publican is a DocBook publishing system.

    1.2. Pull-quote Conventions

Terminal output and source code listings are set off visually from the surrounding text.

Output sent to a terminal is set in mono-spaced roman and presented thus:

books        Desktop   documentation  drafts  mss    photos   stuff  svn
books_tests  Desktop1  downloads      images  notes  scripts  svgs

Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:

static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
                struct kvm_assigned_pci_dev *assigned_dev)
{
        int r = 0;
        struct kvm_assigned_dev_kernel *match;

        mutex_lock(&kvm->lock);

        match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
                                      assigned_dev->assigned_dev_id);
        if (!match) {
                printk(KERN_INFO "%s: device hasn't been assigned before, "
                       "so cannot be deassigned\n", __func__);
                r = -EINVAL;
                goto out;
        }

        kvm_deassign_device(kvm, match);

        kvm_free_assigned_device(kvm, match);

out:
        mutex_unlock(&kvm->lock);
        return r;
}

    1.3. Notes and Warnings

Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.

Note

Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.

Important

Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled Important will not cause data loss but may cause irritation and frustration.

Warning

Warnings should not be ignored. Ignoring warnings will most likely cause data loss.

2. Getting Help and Giving Feedback

    2.1. Do You Need Help?

If you experience difficulty with a procedure described in this documentation, visit the Red Hat Customer Portal at http://access.redhat.com. Through the customer portal, you can:

search or browse through a knowledgebase of technical support articles about Red Hat products.

submit a support case to Red Hat Global Support Services (GSS).

access other product documentation.

Red Hat also hosts a large number of electronic mailing lists for discussion of Red Hat software and technology. You can find a list of publicly available mailing lists at https://www.redhat.com/mailman/listinfo. Click on the name of any mailing list to subscribe to that list or to access the list archives.


    2.2. We Need Feedback!

If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red Hat Enterprise Linux 6.

When submitting a bug report, be sure to mention the manual's identifier: doc-Performance_Tuning_Guide.

If you have a suggestion for improving the documentation, try to be as specific as possible when describing it. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.


    Chapter 1. Overview

The Performance Tuning Guide is a comprehensive reference on the configuration and optimization of Red Hat Enterprise Linux. While this release also contains information on Red Hat Enterprise Linux 5 performance capabilities, all instructions supplied herein are specific to Red Hat Enterprise Linux 6.

    1.1. How to read this book

This book is divided into chapters discussing specific subsystems in Red Hat Enterprise Linux. The Performance Tuning Guide focuses on three major themes per subsystem:

Features

Each subsystem chapter describes performance features unique to (or implemented differently in) Red Hat Enterprise Linux 6. These chapters also discuss Red Hat Enterprise Linux 6 updates that significantly improved the performance of specific subsystems over Red Hat Enterprise Linux 5.

Analysis

The book also enumerates performance indicators for each specific subsystem. Typical values for these indicators are described in the context of specific services, helping you understand their significance in real-world, production systems.

In addition, the Performance Tuning Guide also shows different ways of retrieving performance data (that is, profiling) for a subsystem. Note that some of the profiling tools showcased here are documented elsewhere with more detail.

Configuration

Perhaps the most important information in this book is the set of instructions on how to adjust the performance of a specific subsystem in Red Hat Enterprise Linux 6. The Performance Tuning Guide explains how to fine-tune a Red Hat Enterprise Linux 6 subsystem for specific services.

Keep in mind that tweaking a specific subsystem's performance may affect the performance of another, sometimes adversely. The default configuration of Red Hat Enterprise Linux 6 is optimal for most services running under moderate loads.

The procedures enumerated in the Performance Tuning Guide were tested extensively by Red Hat engineers in both lab and field. However, Red Hat recommends that you properly test all planned configurations in a secure testing environment before applying them to your production servers. You should also back up all data and configuration information before you start tuning your system.

    1.1.1. Audience

This book is suitable for two types of readers:

System/Business Analyst

This book enumerates and explains Red Hat Enterprise Linux 6 performance features at a high level, providing enough information on how subsystems perform for specific workloads (both by default and when optimized). The level of detail used in describing Red Hat Enterprise Linux 6 performance features helps potential customers and sales engineers understand the suitability of this platform in providing resource-intensive services at an acceptable level.


The Performance Tuning Guide also provides links to more detailed documentation on each feature whenever possible. At that detail level, readers can understand these performance features enough to form a high-level strategy in deploying and optimizing Red Hat Enterprise Linux 6. This allows readers to both develop and evaluate infrastructure proposals.

This feature-focused level of documentation is suitable for readers with a high-level understanding of Linux subsystems and enterprise-level networks.

System Administrator

The procedures enumerated in this book are suitable for system administrators with RHCE [1] skill level (or its equivalent, that is, 3-5 years experience in deploying and managing Linux). The Performance Tuning Guide aims to provide as much detail as possible about the effects of each configuration; this means describing any performance trade-offs that may occur.

The underlying skill in performance tuning lies not in knowing how to analyze and tune a subsystem. Rather, a system administrator adept at performance tuning knows how to balance and optimize a Red Hat Enterprise Linux 6 system for a specific purpose. This means also knowing which trade-offs and performance penalties are acceptable when attempting to implement a configuration designed to boost a specific subsystem's performance.

    1.2. Release overview

    1.2.1. Summary of features

1.2.1.1. New features in Red Hat Enterprise Linux 6

Read this section for a brief overview of the performance-related changes included in Red Hat Enterprise Linux 6.

1.2.1.1.1. New in 6.5

Updates to the kernel remove a bottleneck in memory management and improve performance by allowing I/O load to spread across multiple memory pools when the irq-table size is 1 GB or greater.

The cpupowerutils package has been updated to include the turbostat and x86_energy_perf_policy tools. These tools are newly documented in Section 4.1.4, "turbostat" and Section 4.1.3, "Hardware performance policy (x86_energy_perf_policy)".

CIFS now supports larger rsize options and asynchronous readpages, allowing for significant increases in throughput.

GFS2 now provides an Orlov block allocator to increase the speed at which block allocation takes place.

The virtual-hosts profile in tuned has been adjusted. The value for kernel.sched_migration_cost is now 5000000 nanoseconds (5 milliseconds) instead of the kernel default 500000 nanoseconds (0.5 milliseconds). This reduces contention at the run queue lock for large virtualization hosts.

The latency-performance profile in tuned has been adjusted. The value for the power management quality of service, cpu_dma_latency requirement, is now 1 instead of 0.


Several optimizations are now included in the kernel copy_from_user() and copy_to_user() functions, improving the performance of both.

The perf tool has received a number of updates and enhancements, including:

Perf can now use hardware counters provided by the System z CPU-measurement counter facility. There are four sets of hardware counters available: the basic counter set, the problem-state counter set, the crypto-activity counter set, and the extended counter set.

A new command, perf trace, is now available. This enables strace-like behavior using perf infrastructure to allow additional targets to be traced. For further information, refer to Section 3.6.4, "Perf".

A script browser has been added to enable users to view all available scripts for the current perf data file.

Several additional sample scripts are now available.

Ext3 has been updated to reduce lock contention, thereby improving the performance of multi-threaded write operations.

KSM has been updated to be aware of NUMA topology, allowing it to take NUMA locality into account while coalescing pages. This prevents performance drops related to pages being moved to a remote node. Red Hat recommends avoiding cross-node memory merging when KSM is in use.

This update introduces a new tunable, /sys/kernel/mm/ksm/merge_nodes, to control this behavior. The default value (1) merges pages across different NUMA nodes. Set merge_nodes to 0 to merge pages only on the same node (a short sketch follows this feature list).

hdparm has been updated with several new flags, including --fallocate, --offset, and -R (Write-Read-Verify enablement). Additionally, the --trim-sector-ranges and --trim-sector-ranges-stdin options replace the --trim-sectors option, allowing more than a single sector range to be specified. Refer to the man page for further information about these options.
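As a minimal sketch of the merge_nodes tunable described above (assuming a 6.5 or later kernel that exposes this file), the setting can be inspected and changed from a root shell:

    # Check whether KSM may merge pages across NUMA nodes (1 = yes, the default):
    cat /sys/kernel/mm/ksm/merge_nodes

    # Restrict merging to pages on the same NUMA node:
    echo 0 > /sys/kernel/mm/ksm/merge_nodes

As with other sysfs tunables, the change does not persist across reboots unless it is reapplied at boot time (for example, from /etc/rc.local).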

    1.2.2. Horizontal Scalability

Red Hat's efforts in improving the performance of Red Hat Enterprise Linux 6 focus on scalability. Performance-boosting features are evaluated primarily based on how they affect the platform's performance in different areas of the workload spectrum, that is, from the lonely web server to the server farm mainframe.

Focusing on scalability allows Red Hat Enterprise Linux to maintain its versatility for different types of workloads and purposes. At the same time, this means that as your business grows and your workload scales up, re-configuring your server environment is less prohibitive (in terms of cost and man-hours) and more intuitive.

Red Hat makes improvements to Red Hat Enterprise Linux for both horizontal scalability and vertical scalability; however, horizontal scalability is the more generally applicable use case. The idea behind horizontal scalability is to use multiple standard computers to distribute heavy workloads in order to improve performance and reliability.

In a typical server farm, these standard computers come in the form of 1U rack-mounted servers and blade servers. Each standard computer may be as small as a simple two-socket system, although some server farms use large systems with more sockets. Some enterprise-grade networks mix large and small systems; in such cases, the large systems are high performance servers (for example, database servers) and the small ones are dedicated application servers (for example, web or mail servers).

This type of scalability simplifies the growth of your IT infrastructure: a medium-sized business with an appropriate load might only need two pizza box servers to suit all their needs. As the business hires more people, expands its operations, increases its sales volumes and so forth, its IT requirements increase in both volume and complexity. Horizontal scalability allows IT to simply deploy additional machines with (mostly) identical configurations as their predecessors.

To summarize, horizontal scalability adds a layer of abstraction that simplifies system hardware administration. By developing the Red Hat Enterprise Linux platform to scale horizontally, increasing the capacity and performance of IT services can be as simple as adding new, easily configured machines.

1.2.2.1. Parallel Computing

Users benefit from Red Hat Enterprise Linux's horizontal scalability not just because it simplifies system hardware administration, but also because horizontal scalability is a suitable development philosophy given the current trends in hardware advancement.

Consider this: most complex enterprise applications have thousands of tasks that must be performed simultaneously, with different coordination methods between tasks. While early computers had a single-core processor to juggle all these tasks, virtually all processors available today have multiple cores. Effectively, modern computers put multiple cores in a single socket, making even single-socket desktops or laptops multi-processor systems.

As of 2010, standard Intel and AMD processors were available with two to sixteen cores. Such processors are prevalent in pizza box or blade servers, which can now contain as many as 40 cores. These low-cost, high-performance systems bring large system capabilities and characteristics into the mainstream.

To achieve the best performance and utilization of a system, each core must be kept busy. This means that 32 separate tasks must be running to take advantage of a 32-core blade server. If a blade chassis contains ten of these 32-core blades, then the entire setup can process a minimum of 320 tasks simultaneously. If these tasks are part of a single job, they must be coordinated.

Red Hat Enterprise Linux was developed to adapt well to hardware development trends and ensure that businesses can fully benefit from them. Section 1.2.3, "Distributed Systems" explores the technologies that enable Red Hat Enterprise Linux's horizontal scalability in greater detail.

    1.2.3. Distributed Systems

To fully realize horizontal scalability, Red Hat Enterprise Linux uses many components of distributed computing. The technologies that make up distributed computing are divided into three layers:

Communication

Horizontal scalability requires many tasks to be performed simultaneously (in parallel). As such, these tasks must have interprocess communication to coordinate their work. Further, a platform with horizontal scalability should be able to share tasks across multiple systems.

Storage

Storage via local disks is not sufficient in addressing the requirements of horizontal scalability. Some form of distributed or shared storage is needed, one with a layer of abstraction that allows a single storage volume's capacity to grow seamlessly with the addition of new storage hardware.


    Management

The most important duty in distributed computing is the management layer. This management layer coordinates all software and hardware components, efficiently managing communication, storage, and the usage of shared resources.

The following sections describe the technologies within each layer in more detail.

1.2.3.1. Communication

    The communication layer ensures the transport of data, and is composed of two parts:

    Hardware

    Software

The simplest (and fastest) way for multiple systems to communicate is through shared memory. This entails the usage of familiar memory read/write operations; shared memory has the high bandwidth, low latency, and low overhead of ordinary memory read/write operations.

    Ethernet

The most common way of communicating between computers is over Ethernet. Today, Gigabit Ethernet (GbE) is provided by default on systems, and most servers include 2-4 ports of Gigabit Ethernet. GbE provides good bandwidth and latency. This is the foundation of most distributed systems in use today. Even when systems include faster network hardware, it is still common to use GbE for a dedicated management interface.

    10GbE

Ten Gigabit Ethernet (10GbE) is rapidly growing in acceptance for high end and even mid-range servers. 10GbE provides ten times the bandwidth of GbE. One of its major advantages is with modern multi-core processors, where it restores the balance between communication and computing. You can compare a single core system using GbE to an eight core system using 10GbE. Used in this way, 10GbE is especially valuable for maintaining overall system performance and avoiding communication bottlenecks.

Unfortunately, 10GbE is expensive. While the cost of 10GbE NICs has come down, the price of interconnect (especially fibre optics) remains high, and 10GbE network switches are extremely expensive. We can expect these prices to decline over time, but 10GbE today is most heavily used in server room backbones and performance-critical applications.

    Infiniband

Infiniband offers even higher performance than 10GbE. In addition to TCP/IP and UDP network connections used with Ethernet, Infiniband also supports shared memory communication. This allows Infiniband to work between systems via remote direct memory access (RDMA).

The use of RDMA allows Infiniband to move data directly between systems without the overhead of TCP/IP or socket connections. In turn, this reduces latency, which is critical to some applications.

Infiniband is most commonly used in High Performance Technical Computing (HPTC) applications which require high bandwidth, low latency and low overhead. Many supercomputing applications benefit from this, to the point that the best way to improve performance is by investing in Infiniband rather than faster processors or more memory.

    RoCE


RDMA over Converged Ethernet (RoCE) implements Infiniband-style communications (including RDMA) over a 10GbE infrastructure. Given the cost improvements associated with the growing volume of 10GbE products, it is reasonable to expect wider usage of RDMA and RoCE in a wide range of systems and applications.

Each of these communication methods is fully supported by Red Hat for use with Red Hat Enterprise Linux 6.

1.2.3.2. Storage

An environment that uses distributed computing uses multiple instances of shared storage. This can mean one of two things:

Multiple systems storing data in a single location

A storage unit (e.g. a volume) composed of multiple storage appliances

The most familiar example of storage is the local disk drive mounted on a system. This is appropriate for IT operations where all applications are hosted on one host, or even a small number of hosts. However, as the infrastructure scales to dozens or even hundreds of systems, managing as many local storage disks becomes difficult and complicated.

Distributed storage adds a layer to ease and automate storage hardware administration as the business scales. Having multiple systems share a handful of storage instances reduces the number of devices the administrator needs to manage.

Consolidating the storage capabilities of multiple storage appliances into one volume helps both users and administrators. This type of distributed storage provides a layer of abstraction to storage pools: users see a single unit of storage, which an administrator can easily grow by adding more hardware. Some technologies that enable distributed storage also provide added benefits, such as failover and multipathing.

    NFS

Network File System (NFS) allows multiple servers or users to mount and use the same instance of remote storage via TCP or UDP. NFS is commonly used to hold data shared by multiple applications. It is also convenient for bulk storage of large amounts of data.
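As a minimal illustration (the server name and export path below are hypothetical), a client mounts an NFS export like any other file system:

    # Mount a remote NFS export on a local mount point
    # (server.example.com:/export/data is a placeholder):
    mount -t nfs server.example.com:/export/data /mnt/data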

    SAN

Storage Area Networks (SANs) use either Fibre Channel or iSCSI protocol to provide remote access to storage. Fibre Channel infrastructure (such as Fibre Channel host bus adapters, switches, and storage arrays) combines high performance, high bandwidth, and massive storage. SANs separate storage from processing, providing considerable flexibility in system design.

The other major advantage of SANs is that they provide a management environment for performing major storage hardware administrative tasks. These tasks include:

    Controlling access to storage

Managing large amounts of data

    Provisioning systems

    Backing up and replicating data

    Taking snapshots

    Supporting system failover


    Ensuring data integrity

    Migrating data

    GFS2

The Red Hat Global File System 2 (GFS2) file system provides several specialized capabilities. The basic function of GFS2 is to provide a single file system, including concurrent read/write access, shared across multiple members of a cluster. This means that each member of the cluster sees exactly the same data "on disk" in the GFS2 filesystem.

GFS2 allows all systems to have concurrent access to the "disk". To maintain data integrity, GFS2 uses a Distributed Lock Manager (DLM), which only allows one system to write to a specific location at a time.

GFS2 is especially well-suited for failover applications that require high availability in storage.

For further information about GFS2, refer to the Global File System 2 guide. For further information about storage in general, refer to the Storage Administration Guide. Both are available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.

1.2.3.3. Converged Networks

Communication over the network is normally done through Ethernet, with storage traffic using a dedicated Fibre Channel SAN environment. It is common to have a dedicated network or serial link for system management, and perhaps even heartbeat [2]. As a result, a single server is typically on multiple networks.

Providing multiple connections on each server is expensive, bulky, and complex to manage. This gave rise to the need for a way to consolidate all connections into one. Fibre Channel over Ethernet (FCoE) and Internet SCSI (iSCSI) address this need.

    FCoE

With FCoE, standard fibre channel commands and data packets are transported over a 10GbE physical infrastructure via a single converged network adapter (CNA). Standard TCP/IP ethernet traffic and fibre channel storage operations can be transported via the same link. FCoE uses one physical network interface card (and one cable) for multiple logical network/storage connections.

FCoE offers the following advantages:

Reduced number of connections

FCoE reduces the number of network connections to a server by half. You can still choose to have multiple connections for performance or availability; however, a single connection provides both storage and network connectivity. This is especially helpful for pizza box servers and blade servers, since they both have very limited space for components.

Lower cost

A reduced number of connections immediately means a reduced number of cables, switches, and other networking equipment. Ethernet's history also features great economies of scale; the cost of networks drops dramatically as the number of devices in the market goes from millions to billions, as was seen in the decline in the price of 100Mb Ethernet and gigabit Ethernet devices.


Similarly, 10GbE will also become cheaper as more businesses adapt to its use. Also, as CNA hardware is integrated into a single chip, widespread use will also increase its volume in the market, which will result in a significant price drop over time.

    iSCSI

Internet SCSI (iSCSI) is another type of converged network protocol; it is an alternative to FCoE. Like fibre channel, iSCSI provides block-level storage over a network. However, iSCSI does not provide a complete management environment. The main advantage of iSCSI over FCoE is that iSCSI provides much of the capability and flexibility of fibre channel, but at a lower cost.

[1] Red Hat Certified Engineer. For more information, refer to http://www.redhat.com/training/certifications/rhce/.

[2] Heartbeat is the exchange of messages between systems to ensure that each system is still functioning. If a system "loses heartbeat" it is assumed to have failed and is shut down, with another system taking over for it.


    Chapter 2. Red Hat Enterprise Linux 6 Performance Features

    2.1. 64-Bit Support

Red Hat Enterprise Linux 6 supports 64-bit processors; these processors can theoretically use up to 16 exabytes of memory. As of general availability (GA), Red Hat Enterprise Linux 6.0 is tested and certified to support up to 8 TB of physical memory.

The size of memory supported by Red Hat Enterprise Linux 6 is expected to grow over several minor updates, as Red Hat continues to introduce and improve more features that enable the use of larger memory blocks. For current details, see https://access.redhat.com/site/articles/rhel-limits. Example improvements (as of Red Hat Enterprise Linux 6.0 GA) are:

Huge pages and transparent huge pages

Non-Uniform Memory Access improvements

These improvements are outlined in greater detail in the sections that follow.

    Huge pages and transparent huge pages

The implementation of huge pages in Red Hat Enterprise Linux 6 allows the system to manage memory use efficiently across different memory workloads. Huge pages dynamically utilize 2 MB pages compared to the standard 4 KB page size, allowing applications to scale well while processing gigabytes and even terabytes of memory.

Huge pages are difficult to manually create, manage, and use. To address this, Red Hat Enterprise Linux 6 also features the use of transparent huge pages (THP). THP automatically manages many of the complexities involved in the use of huge pages.

For more information on huge pages and THP, refer to Section 5.2, "Huge Pages and Transparent Huge Pages".
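A quick, hedged way to see how a running system is using huge pages is to read the counters in /proc/meminfo. The THP control file lives under sysfs, though its exact path varies: upstream kernels use transparent_hugepage, while Red Hat Enterprise Linux 6 kernels use the redhat_transparent_hugepage path assumed below:

    # Report huge page usage and availability:
    grep -i huge /proc/meminfo

    # Show whether transparent huge pages are enabled
    # (path is an assumption; adjust for your kernel):
    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled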

NUMA improvements

Many new systems now support Non-Uniform Memory Access (NUMA). NUMA simplifies the design and creation of hardware for large systems; however, it also adds a layer of complexity to application development. For example, NUMA implements both local and remote memory, where remote memory can take several times longer to access than local memory. This feature has performance implications for operating systems and applications, and should be configured carefully.

Red Hat Enterprise Linux 6 is better optimized for NUMA use, thanks to several additional features that help manage users and applications on NUMA systems. These features include CPU affinity, CPU pinning (cpusets), numactl and control groups, which allow a process (affinity) or application (pinning) to "bind" to a specific CPU or set of CPUs.

For more information about NUMA support in Red Hat Enterprise Linux 6, refer to Section 4.1.1, "CPU and NUMA Topology".
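As a minimal sketch of the binding features just mentioned (the program name myapp and the process ID 1234 are placeholders), numactl and taskset can confine a workload to particular CPUs and memory:

    # Run a program with its CPU and memory allocations confined to NUMA node 0
    # (myapp is a placeholder):
    numactl --cpunodebind=0 --membind=0 myapp

    # Set the CPU affinity of an already-running process to CPUs 0-3
    # (1234 is a placeholder PID):
    taskset -cp 0-3 1234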

    2.2. Ticket Spinlocks

A key part of any system design is ensuring that one process does not alter memory used by another process. Uncontrolled data change in memory can result in data corruption and system crashes. To prevent this, the operating system allows a process to lock a piece of memory, perform an operation, then unlock or "free" the memory.


One common implementation of memory locking is through spin locks, which allow a process to keep checking to see if a lock is available and take the lock as soon as it becomes available. If there are multiple processes competing for the same lock, the first one to request the lock after it has been freed gets it. When all processes have the same access to memory, this approach is "fair" and works quite well.

Unfortunately, on a NUMA system, not all processes have equal access to the locks. Processes on the same NUMA node as the lock have an unfair advantage in obtaining the lock. Processes on remote NUMA nodes experience lock starvation and degraded performance.

To address this, Red Hat Enterprise Linux implemented ticket spinlocks. This feature adds a reservation queue mechanism to the lock, allowing all processes to take a lock in the order that they requested it. This eliminates timing problems and unfair advantages in lock requests.

While a ticket spinlock has slightly more overhead than an ordinary spinlock, it scales better and provides better performance on NUMA systems.

    2.3. Dynamic List Structure

The operating system requires a set of information on each processor in the system. In Red Hat Enterprise Linux 5, this set of information was allocated to a fixed-size array in memory. Information on each individual processor was obtained by indexing into this array. This method was fast, easy, and straightforward for systems that contained relatively few processors.

However, as the number of processors for a system grows, this method produces significant overhead. Because the fixed-size array in memory is a single, shared resource, it can become a bottleneck as more processors attempt to access it at the same time.

To address this, Red Hat Enterprise Linux 6 uses a dynamic list structure for processor information. This allows the array used for processor information to be allocated dynamically: if there are only eight processors in the system, then only eight entries are created in the list. If there are 2048 processors, then 2048 entries are created as well.

A dynamic list structure allows more fine-grained locking. For example, if information needs to be updated at the same time for processors 6, 72, 183, 657, 931 and 1546, this can be done with greater parallelism. Situations like this obviously occur much more frequently on large, high-performance systems than on small systems.

    2.4. Tickless Kernel

In previous versions of Red Hat Enterprise Linux, the kernel used a timer-based mechanism that continuously produced a system interrupt. During each interrupt, the system polled; that is, it checked to see if there was work to be done.

Depending on the setting, this system interrupt or timer tick could occur several hundred or several thousand times per second. This happened every second, regardless of the system's workload. On a lightly loaded system, this impacts power consumption by preventing the processor from effectively using sleep states. The system uses the least power when it is in a sleep state.

The most power-efficient way for a system to operate is to do work as quickly as possible, go into the deepest sleep state possible, and sleep as long as possible. To implement this, Red Hat Enterprise Linux 6 uses a tickless kernel. With this, the interrupt timer has been removed from the idle loop, transforming Red Hat Enterprise Linux 6 into a completely interrupt-driven environment.

The tickless kernel allows the system to go into deep sleep states during idle times, and respond quickly when there is work to be done.
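A simple, hedged way to confirm that the running kernel was built with tickless (dynamic tick) support is to check its build configuration:

    # CONFIG_NO_HZ=y indicates a tickless (dynamic tick) kernel:
    grep CONFIG_NO_HZ /boot/config-$(uname -r)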


For further information, refer to the Power Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.

    2.5. Control Groups

Red Hat Enterprise Linux provides many useful options for performance tuning. Large systems, scaling to hundreds of processors, can be tuned to deliver superb performance. But tuning these systems requires considerable expertise and a well-defined workload. When large systems were expensive and few in number, it was acceptable to give them special treatment. Now that these systems are mainstream, more effective tools are needed.

To further complicate things, more powerful systems are being used now for service consolidation. Workloads that may have been running on four to eight older servers are now placed into a single server. And as discussed earlier in Section 1.2.2.1, "Parallel Computing", many mid-range systems nowadays contain more cores than yesterday's high-performance machines.

Many modern applications are designed for parallel processing, using multiple threads or processes to improve performance. However, few applications can make effective use of more than eight threads. Thus, multiple applications typically need to be installed on a 32-CPU system to maximize capacity.

Consider the situation: small, inexpensive mainstream systems are now at parity with the performance of yesterday's expensive, high-performance machines. Cheaper high-performance machines gave system architects the ability to consolidate more services to fewer machines.

However, some resources (such as I/O and network communications) are shared, and do not grow as fast as CPU count. As such, a system housing multiple applications can experience degraded overall performance when one application hogs too much of a single resource.

To address this, Red Hat Enterprise Linux 6 now supports control groups (cgroups). Cgroups allow administrators to allocate resources to specific tasks as needed. This means, for example, being able to allocate 80% of four CPUs, 60GB of memory, and 40% of disk I/O to a database application. A web application running on the same system could be given two CPUs, 2GB of memory, and 50% of available network bandwidth.

As a result, both database and web applications deliver good performance, as the system prevents both from excessively consuming system resources. In addition, many aspects of cgroups are self-tuning, allowing the system to respond accordingly to changes in workload.

    A cgroup has two major components:

    A list of tasks assigned to the cgroup

Resources allocated to those tasks

Tasks assigned to the cgroup run within the cgroup. Any child tasks they spawn also run within the cgroup. This allows an administrator to manage an entire application as a single unit. An administrator can also configure allocations for the following resources:

    CPUsets

    Memory

    I/O

    Network (bandwidth)

Within CPUsets, cgroups allow administrators to configure the number of CPUs, affinity for specific CPUs or nodes [3], and the amount of CPU time used by a set of tasks. Using cgroups to configure CPUsets is vital for ensuring good overall performance, preventing an application from consuming excessive resources at the cost of other tasks while simultaneously ensuring that the application is not starved for CPU time.

I/O bandwidth and network bandwidth are managed by other resource controllers. Again, the resource controllers allow you to determine how much bandwidth the tasks in a cgroup can consume, and ensure that the tasks in a cgroup neither consume excessive resources nor are starved of resources.
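As a minimal sketch of the database example above, using the libcgroup command-line tools shipped with Red Hat Enterprise Linux 6 (the group name, CPU numbers, and application path here are illustrative, not prescribed by this guide):

    # Create a cgroup with the cpuset and memory controllers attached:
    cgcreate -g cpuset,memory:/dbgroup

    # Dedicate four CPUs (0-3) and memory node 0 to the group,
    # and cap its memory use at 60 GB:
    cgset -r cpuset.cpus=0-3 dbgroup
    cgset -r cpuset.mems=0 dbgroup
    cgset -r memory.limit_in_bytes=60G dbgroup

    # Start the database application inside the cgroup
    # (/usr/bin/db-server is a placeholder):
    cgexec -g cpuset,memory:dbgroup /usr/bin/db-server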

Cgroups allow the administrator to define and allocate, at a high level, the system resources that various applications need (and will) consume. The system then automatically manages and balances the various applications, delivering good predictable performance and optimizing the performance of the overall system.

For more information on how to use control groups, refer to the Resource Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.

2.6. Storage and File System Improvements

Red Hat Enterprise Linux 6 also features several improvements to storage and file system management. Two of the most notable advances in this version are ext4 and XFS support. For more comprehensive coverage of performance improvements relating to storage and file systems, refer to Chapter 7, File Systems.

    Ext4

Ext4 is the default file system for Red Hat Enterprise Linux 6. It is the fourth generation version of the EXT file system family, supporting a theoretical maximum file system size of 1 exabyte, and single file maximum size of 16TB. Red Hat Enterprise Linux 6 supports a maximum file system size of 16TB, and a single file maximum size of 16TB. Other than a much larger storage capacity, ext4 also includes several new features, such as:

Extent-based metadata

Delayed allocation

Journal check-summing

For more information about the ext4 file system, refer to Section 7.3.1, "The Ext4 File System".
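As a small, hedged illustration (/dev/sdb1 is a placeholder device, and mkfs destroys any existing data on it), an ext4 file system is created with the features above enabled by default, which can then be verified:

    # Create an ext4 file system and list its feature flags
    # (look for "extent" in the "Filesystem features" line):
    mkfs.ext4 /dev/sdb1
    tune2fs -l /dev/sdb1 | grep -i features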

    XFS

XFS is a robust and mature 64-bit journaling file system that supports very large files and file systems on a single host. This file system was originally developed by SGI, and has a long history of running on extremely large servers and storage arrays. XFS features include:

Delayed allocation

Dynamically-allocated inodes

B-tree indexing for scalability of free space management

Online defragmentation and file system growing

Sophisticated metadata read-ahead algorithms


While XFS scales to exabytes, the maximum XFS file system size supported by Red Hat is 100TB. For more information about XFS, refer to Section 7.3.2, The XFS File System.
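
As a brief sketch (again with placeholder device and mount point names), creating an XFS file system and growing it online looks like this; note that XFS file systems can be grown while mounted, but cannot be shrunk:

# mkfs.xfs /dev/sdc1           # create an XFS file system
# mount /dev/sdc1 /mnt/xfs
# xfs_growfs /mnt/xfs          # grow the mounted file system to fill the underlying device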

Large Boot Drives

Traditional BIOS supports a maximum disk size of 2.2TB. Red Hat Enterprise Linux 6 systems using BIOS can support disks larger than 2.2TB by using a new disk structure called the GUID Partition Table (GPT). GPT can only be used for data disks; it cannot be used for boot drives with BIOS; therefore, boot drives can only be a maximum of 2.2TB in size. The BIOS was originally created for the IBM PC; while BIOS has evolved considerably to adapt to modern hardware, the Unified Extensible Firmware Interface (UEFI) is designed to support new and emerging hardware.

Red Hat Enterprise Linux 6 also supports UEFI, which can be used to replace BIOS (still supported). Systems with UEFI running Red Hat Enterprise Linux 6 allow the use of GPT and 2.2TB (and larger) partitions for both boot and data partitions.
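
For illustration (the device name /dev/sdb is a placeholder), a disk can be given a GPT disk label with parted:

# parted /dev/sdb mklabel gpt                 # write a GPT label, replacing any existing MBR
# parted /dev/sdb mkpart primary 1MiB 100%    # create one partition spanning the disk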

Important: UEFI for 32-bit x86 systems

Red Hat Enterprise Linux 6 does not support UEFI for 32-bit x86 systems.

Important: UEFI for AMD64 and Intel 64

Note that the boot configurations of UEFI and BIOS differ significantly from each other. Therefore, the installed system must boot using the same firmware that was used during installation. You cannot install the operating system on a system that uses BIOS and then boot this installation on a system that uses UEFI.

Red Hat Enterprise Linux 6 supports version 2.2 of the UEFI specification. Hardware that supports version 2.3 of the UEFI specification or later should boot and operate with Red Hat Enterprise Linux 6, but the additional functionality defined by these later specifications will not be available. The UEFI specifications are available from http://www.uefi.org/specs/agreement/.

[3] A node is generally defined as a set of CPUs or cores within a socket.


    Chapter 3. Monitoring and Analyzing System Performance

This chapter briefly introduces tools that can be used to monitor and analyze system and application performance, and points out the situations in which each tool is most useful. The data collected by each tool can reveal bottlenecks or other system problems that contribute to less-than-optimal performance.

3.1. The proc File System

The proc "file system" is a directory that contains a hierarchy of files that represent the current state of the Linux kernel. It allows applications and users to see the kernel's view of the system.

The proc directory also contains information about the hardware of the system, and any currently running processes. Most of these files are read-only, but some files (primarily those in /proc/sys) can be manipulated by users and applications to communicate configuration changes to the kernel.

For further information about viewing and editing files in the proc directory, refer to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
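
For example, kernel state can be read directly from proc files, and tunables under /proc/sys can be changed at runtime with sysctl (vm.swappiness is just one of many such parameters):

$ head -n 5 /proc/meminfo      # inspect the kernel's view of system memory
$ sysctl vm.swappiness         # read a tunable that lives under /proc/sys/vm
# sysctl -w vm.swappiness=10   # change it at runtime; equivalent to writing the proc file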

    3.2. GNOME and KDE System Monitors

The GNOME and KDE desktop environments both have graphical tools to assist you in monitoring and modifying the behavior of your system.

GNOME System Monitor

The GNOME System Monitor displays basic system information and allows you to monitor system processes, and resource or file system usage. Open it with the gnome-system-monitor command in the Terminal, or click on the Applications menu, and select System Tools > System Monitor.

GNOME System Monitor has four tabs:

    System

Displays basic information about the computer's hardware and software.

    Processes

Shows active processes, and the relationships between those processes, as well as detailed information about each process. It also lets you filter the processes displayed, and perform certain actions on those processes (start, stop, kill, change priority, etc.).

    Resources

Displays the current CPU time usage, memory and swap space usage, and network usage.

    File Systems

Lists all mounted file systems alongside some basic information about each, such as the file system type, mount point, and disk usage.

For further information about the GNOME System Monitor, refer to the Help menu in the application, or to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.


KDE System Guard

The KDE System Guard allows you to monitor current system load and processes that are running. It also lets you perform actions on processes. Open it with the ksysguard command in the Terminal, or click on the Kickoff Application Launcher and select Applications > System > System Monitor.

KDE System Guard has two tabs:

Process Table

Displays a list of all running processes, alphabetically by default. You can also sort processes by a number of other properties, including total CPU usage, physical or shared memory usage, owner, and priority. You can also filter the visible results, search for specific processes, or perform certain actions on a process.

    System Load

Displays historical graphs of CPU usage, memory and swap space usage, and network usage. Hover over the graphs for detailed analysis and graph keys.

For further information about the KDE System Guard, refer to the Help menu in the application.

    3.3. Built-in Command-line Monitoring Tools

In addition to graphical monitoring tools, Red Hat Enterprise Linux provides several tools that can be used to monitor a system from the command line. The advantage of these tools is that they can be used outside run level 5. This section discusses each tool briefly, and suggests the purposes to which each tool is best suited.

    top

The top tool provides a dynamic, real-time view of the processes in a running system. It can display a variety of information, including a system summary and the tasks currently being managed by the Linux kernel. It also has a limited ability to manipulate processes. Both its operation and the information it displays are highly configurable, and any configuration details can be made to persist across restarts.

By default, the processes shown are ordered by the percentage of CPU usage, giving an easy view into the processes that are consuming the most resources.

For detailed information about using top, refer to its man page: man top.
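
top can also run non-interactively, which is useful for capturing snapshots from scripts; a small sketch:

$ top -b -n 1 | head -n 12     # one batch-mode snapshot, trimmed to the summary and busiest tasks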

    ps

The ps tool takes a snapshot of a select group of active processes. By default this group is limited to processes owned by the current user and associated with the same terminal.

It can provide more detailed information about processes than top, but is not dynamic.

For detailed information about using ps, refer to its man page: man ps.
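
For example, one common invocation widens the snapshot to all processes and sorts by CPU usage:

$ ps aux --sort=-%cpu | head -n 6    # all processes, busiest first, top five shown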

    vmstat

vmstat (Virtual Memory Statistics) outputs instantaneous reports about your system's processes, memory, paging, block I/O, interrupts and CPU activity.


Although it is not dynamic like top, you can specify a sampling interval, which lets you observe system activity in near-real time.

For detailed information about using vmstat, refer to its man page: man vmstat.
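
For instance, to sample activity every two seconds, five times:

$ vmstat 2 5    # print memory, paging, block I/O, interrupt and CPU statistics at 2-second intervals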

    sar

sar (System Activity Reporter) collects and reports information about today's system activity so far. The default output covers today's CPU utilization at ten-minute intervals from the beginning of the day:

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM     all      0.10      0.00      0.15      2.96      0.00     96.79
12:20:01 AM     all      0.09      0.00      0.13      3.16      0.00     96.61
12:30:01 AM     all      0.09      0.00      0.14      2.11      0.00     97.66
...

This tool is a useful alternative to attempting to create periodic reports on system activity through top or similar tools.

For detailed information about using sar, refer to its man page: man sar.
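
sar can also sample live activity rather than report the daily log; for example:

$ sar -u 10 3   # report CPU utilization every 10 seconds, three times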

    3.4. irqbalance

irqbalance is a command line tool that distributes hardware interrupts across processors to improve system performance. It runs as a daemon by default, but can be run once only with the --oneshot option.

    The following parameters are useful for improving performance.

    --powerthresh

Sets the number of CPUs that can idle before a CPU is placed into powersave mode. If more CPUs than the threshold are more than 1 standard deviation below the average softirq workload and no CPUs are more than one standard deviation above the average, and have more than one irq assigned to them, a CPU is placed into powersave mode. In powersave mode, a CPU is not part of irq balancing so that it is not woken unnecessarily.

    --hintpolicy

Determines how irq kernel affinity hinting is handled. Valid values are exact (irq affinity hint is always applied), subset (irq is balanced, but the assigned object is a subset of the affinity hint), or ignore (irq affinity hint is ignored completely).

    --policyscript

Defines the location of a script to execute for each interrupt request, with the device path and irq number passed as arguments, and a zero exit code expected by irqbalance. The script defined can specify zero or more key value pairs to guide irqbalance in managing the passed irq.

    The following are recognized as valid key value pairs.

    ban

Valid values are true (exclude the passed irq from balancing) or false (perform balancing on this irq).


    balance_level

Allows user override of the balance level of the passed irq. By default the balance level is based on the PCI device class of the device that owns the irq. Valid values are none, package, cache, or core.

    numa_node

Allows user override of the NUMA node that is considered local to the passed irq. If information about the local node is not specified in ACPI, devices are considered equidistant from all nodes. Valid values are integers (starting from 0) that identify a specific NUMA node, and -1, which specifies that an irq should be considered equidistant from all nodes.

    --banirq

The interrupt with the specified interrupt request number is added to the list of banned interrupts.

You can also use the IRQBALANCE_BANNED_CPUS environment variable to specify a mask of CPUs that are ignored by irqbalance.
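
As an illustrative sketch (the mask value is an example), the following runs a single balancing pass while excluding CPU 0 from receiving interrupts:

# IRQBALANCE_BANNED_CPUS=00000001 irqbalance --oneshot   # bit 0 of the mask bans CPU 0; balance once and exit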

For further details, see the man page:

    $ man irqbalance

    3.5. Tuned and ktune

Tuned is a daemon that monitors and collects data on the usage of various system components, and uses that information to dynamically tune system settings as required. It can react to changes in CPU and network use, and adjust settings to improve performance in active devices or reduce power consumption in inactive devices.

The accompanying ktune partners with the tuned-adm tool to provide a number of tuning profiles that are pre-configured to enhance performance and reduce power consumption in a number of specific use cases. Edit these profiles or create new profiles to create performance solutions tailored to your environment.

The profiles provided as part of tuned-adm include:

    default

The default power-saving profile. This is the most basic power-saving profile. It enables only the disk and CPU plug-ins. Note that this is not the same as turning tuned-adm off, where both tuned and ktune are disabled.

    latency-performance

A server profile for typical latency performance tuning. This profile disables dynamic tuning mechanisms and transparent hugepages. It uses the performance governor for p-states through cpuspeed, and sets the I/O scheduler to deadline. Additionally, in Red Hat Enterprise Linux 6.5 and later, the profile requests a cpu_dma_latency value of 1. In Red Hat Enterprise Linux 6.4 and earlier, cpu_dma_latency requested a value of 0.

    throughput-performance

A server profile for typical throughput performance tuning. This profile is recommended if


the system does not have enterprise-class storage. throughput-performance disables power saving mechanisms and enables the deadline I/O scheduler. The CPU governor is set to performance. kernel.sched_min_granularity_ns (scheduler minimal preemption granularity) is set to 10 milliseconds, kernel.sched_wakeup_granularity_ns (scheduler wake-up granularity) is set to 15 milliseconds, vm.dirty_ratio (virtual memory dirty ratio) is set to 40%, and transparent huge pages are enabled.

    enterprise-storage

This profile is recommended for enterprise-sized server configurations with enterprise-class storage, including battery-backed controller cache protection and management of on-disk cache. It is the same as the throughput-performance profile, with one addition: file systems are re-mounted with barrier=0.

    virtual-guest

This profile is optimized for virtual machines. It is based on the enterprise-storage profile, but also decreases the swappiness of virtual memory. This profile is available in Red Hat Enterprise Linux 6.3 and later.

    virtual-host

Based on the enterprise-storage profile, virtual-host decreases the swappiness of virtual memory and enables more aggressive writeback of dirty pages. Non-root and non-boot file systems are mounted with barrier=0. Additionally, as of Red Hat Enterprise Linux 6.5, the kernel.sched_migration_cost parameter is set to 5 milliseconds. Prior to Red Hat Enterprise Linux 6.5, kernel.sched_migration_cost used the default value of 0.5 milliseconds. This profile is available in Red Hat Enterprise Linux 6.3 and later.
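
For example, profiles are listed and activated with tuned-adm:

# tuned-adm list                          # show the available tuning profiles
# tuned-adm profile enterprise-storage    # activate a profile
# tuned-adm off                           # turn off all tuning (disables tuned and ktune)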

Refer to the Red Hat Enterprise Linux 6 Power Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, for further information about tuned and ktune.

3.6. Application Profilers

Profiling is the process of gathering information about a program's behavior as it executes. You profile an application to determine which areas of a program can be optimized to increase the program's overall speed, reduce its memory usage, etc. Application profiling tools help to simplify this process.

There are three supported profiling tools for use with Red Hat Enterprise Linux 6: SystemTap, OProfile and Valgrind. Documenting these profiling tools is outside the scope of this guide; however, this section does provide links to further information and a brief overview of the tasks for which each profiler is suitable.

3.6.1. SystemTap

SystemTap is a tracing and probing tool that lets users monitor and analyze operating system activities (particularly kernel activities) in fine detail. It provides information similar to the output of tools like netstat, top, ps and iostat, but includes additional filtering and analysis options for the information that is collected.

SystemTap provides a deeper, more precise analysis of system activities and application behavior to allow you to pinpoint system and application bottlenecks.


The Function Callgraph plug-in for Eclipse uses SystemTap as a back-end, allowing it to thoroughly monitor the status of a program, including function calls, returns, times, and user-space variables, and display the information visually for easy optimization.

    For further information about SystemTap, refer to the SystemTap Beginners Guide, available from

    http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/ .
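
As a small sketch of the sort of probe SystemTap supports (this assumes the systemtap package and the debuginfo matching the running kernel are installed), the following one-liner prints the name of each process as it opens a file, until interrupted:

# stap -e 'probe syscall.open { printf("%s opened %s\n", execname(), filename) }'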

    3.6.2. OProfile

OProfile (oprofile) is a system-wide performance monitoring tool. It uses the processor's dedicated performance monitoring hardware to retrieve information about the kernel and system executables, such as when memory is referenced, the number of L2 cache requests, and the number of hardware interrupts received. It can also be used to determine processor usage, and which applications and services are used most.

OProfile can also be used with Eclipse via the Eclipse OProfile plug-in. This plug-in allows users to easily determine the most time-consuming areas of their code, and perform all command-line functions of OProfile with rich visualization of the results.

    However, users should be aware of several OProfile limitations:

Performance monitoring samples may not be precise: because the processor may execute instructions out of order, a sample may be recorded from a nearby instruction, instead of the instruction that triggered the interrupt.

Because OProfile is system-wide and expects processes to start and stop multiple times, samples from multiple runs are allowed to accumulate. This means you may need to clear sample data from previous runs.

It focuses on identifying problems with CPU-limited processes, and therefore does not identify processes that are sleeping while they wait on locks or for other events.

For further information about using OProfile, refer to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or to the oprofile documentation on your system, located in /usr/share/doc/oprofile-.
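
A minimal session with the legacy opcontrol interface might look like the following sketch (./myapp is a placeholder workload; --no-vmlinux skips kernel symbol resolution when kernel debuginfo is not installed):

# opcontrol --no-vmlinux   # profile without resolving kernel symbols
# opcontrol --reset        # clear sample data accumulated from previous runs
# opcontrol --start        # start system-wide sample collection
$ ./myapp                  # run the workload to be profiled
# opcontrol --stop         # stop collection
# opreport                 # summarize the collected samples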

    3.6.3. Valgrind

Valgrind provides a number of detection and profiling tools to help improve the performance and correctness of your applications. These tools can detect memory and thread-related errors as well as heap, stack and array overruns, allowing you to easily locate and correct errors in your application code. They can also profile the cache, the heap, and branch-prediction to identify factors that may increase application speed and minimize application memory use.

Valgrind analyzes your application by running it on a synthetic CPU and instrumenting the existing application code as it is executed. It then prints "commentary" clearly identifying each process involved in application execution to a user-specified file descriptor, file, or network socket. The level of instrumentation varies depending on the Valgrind tool in use, and its settings, but it is important to note that executing the instrumented code can take 4-50 times longer than normal execution.

Valgrind can be used on your application as-is, without recompiling. However, because Valgrind uses debugging information to pinpoint issues in your code, if your application and support libraries were not compiled with debugging information enabled, recompiling to include this information is highly recommended.
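
For example (myapp.c is a placeholder), an application compiled with debugging information can be checked with the Memcheck tool:

$ gcc -g -O1 -o myapp myapp.c                          # -g preserves the debug info Valgrind needs
$ valgrind --tool=memcheck --leak-check=full ./myapp   # report memory errors and leaks with source locations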

As of Red Hat Enterprise Linux 6.4, Valgrind integrates with gdb (GNU Project Debugger) to improve debugging efficiency.


More information about Valgrind is available from the Developer Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or by using the man valgrind command when the valgrind package is installed. Accompanying documentation can also be found in:

    /usr/share/doc/valgrind-/valgrind_manual.pdf

    /usr/share/doc/valgrind-/html/index.html

For information about how Valgrind can be used to profile system memory, refer to Section 5.3, Using Valgrind to Profile Memory Usage.

    3.6.4. Perf

The perf tool provides a number of useful performance counters that let the user assess the impact of other commands on their system:

    perf stat

This command provides overall statistics for common performance events, including instructions executed and clock cycles consumed. You can use the option flags to gather statistics on events other than the default measurement events. As of Red Hat Enterprise Linux 6.4, it is possible to use perf stat to filter monitoring based on one or more specified control groups (cgroups). For further information, read the man page: man perf-stat.

    perf record

    This command records performance data into a file which can be later analyzed using

    perf report. For further details, read the man page: man perf-record.

    perf report

This command reads the performance data from a file and analyzes the recorded data. For

    further details, read the man page: man perf-report.

    perf list

This command lists the events available on a particular machine. These events will vary based on the performance monitoring hardware and the software configuration of the system. For further information, read the man page: man perf-list.

    perf top

This command performs a similar function to the top tool. It generates and displays a performance counter profile in realtime. For further information, read the man page: man perf-top.

    perf trace

This command performs a similar function to the strace tool. It monitors the system calls used by a specified thread or process and all signals received by that application. Additional trace targets are available; refer to the man page for a full list.
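
As a brief sketch (./myapp is a placeholder command):

$ perf stat ./myapp       # run the command and print cycle, instruction and other event counts
$ perf record -g ./myapp  # record samples, with call graphs, into perf.data
$ perf report             # analyze the recorded samples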

More information about perf is available in the Red Hat Enterprise Linux Developer Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.


3.7. Red Hat Enterprise MRG

Red Hat Enterprise MRG's Realtime component includes Tuna, a tool that allows users to both adjust the tunable values on their system and view the results of those changes. While it was developed for use with the Realtime component, it can also be used to tune standard Red Hat Enterprise Linux systems.

With Tuna, you can adjust or disable unnecessary system activity, including:

BIOS parameters related to power management, error detection, and system management interrupts;

network settings, such as interrupt coalescing, and the use of TCP;

journaling activity in journaling file systems;

system logging;

whether interrupts and user processes are handled by a specific CPU or range of CPUs;

whether swap space is used; and

how to deal with out-of-memory exceptions.

For more detailed conceptual information about tuning Red Hat Enterprise MRG with the Tuna interface, refer to the "General System Tuning" chapter of the Realtime Tuning Guide. For detailed instructions about using the Tuna interface, refer to the Tuna User Guide. Both guides are available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_MRG/.


    Chapter 4. CPU

The term CPU, which stands for central processing unit, is a misnomer for most systems, since central implies single, whereas most modern systems have more than one processing unit, or core. Physically, CPUs are contained in a package attached to a motherboard in a socket. Each socket on the motherboard has various connections: to other CPU sockets, memory controllers, interrupt controllers, and other peripheral devices. To the operating system, a socket is a logical grouping of CPUs and associated resources. This concept is central to most of our discussions on CPU tuning.

Red Hat Enterprise Linux keeps a wealth of statistics about system CPU events; these statistics are useful in planning out a tuning strategy to improve CPU performance. Section 4.1.2, Tuning CPU Performance discusses some of the more useful statistics, where to find them, and how to analyze them for performance tuning.

    Topology

Older computers had relatively few CPUs per system, which allowed an architecture known as Symmetric Multi-Processor (SMP). This meant that each CPU in the system had similar (or symmetric) access to available memory. In recent years, CPU count-per-socket has grown to the point that trying to give symmetric access to all RAM in the system has become very expensive. Most high CPU count systems these days have an architecture known as Non-Uniform Memory Access (NUMA) instead of SMP.

AMD processors have had this type of architecture for some time with their HyperTransport (HT) interconnects, while Intel has begun implementing NUMA in their QuickPath Interconnect (QPI) designs. NUMA and SMP are tuned differently, since you need to account for the topology of the system when allocating resources for an application.

    Threads

Inside the Linux operating system, the unit of execution is known as a thread. Threads have a register context, a stack, and a segment of executable code which they run on a CPU. It is the job of the operating system (OS) to schedule these threads on the available CPUs.

The OS maximizes CPU utilization by load-balancing the threads across available cores. Since the OS is primarily concerned with keeping CPUs busy, it does not make optimal decisions with respect to application performance. Moving an application thread to a CPU on another socket can worsen performance more than simply waiting for the current CPU to become available, since memory access operations can slow drastically across sockets. For high-performance applications, it is usually better for the designer to determine where threads are placed. Section 4.2, CPU Scheduling discusses how to best allocate CPUs and memory to best execute application threads.
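
For instance (the CPU list, command, and PID are placeholders), the taskset utility pins a process to specific CPUs:

$ taskset -c 0,1 ./myapp       # launch a process bound to CPUs 0 and 1
$ taskset -p -c 0,1 1234       # re-pin an already-running process by PID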

    Interrupts

One of the less obvious (but nonetheless important) system events that can impact application performance is the interrupt (also known as an IRQ in Linux). These events are handled by the operating system, and are used by peripherals to signal the arrival of data or the completion of an operation, such as a network write or a timer event.

The manner in which the OS or CPU that is executing application code handles an interrupt does not affect the application's functionality. However, it can impact the performance of the application. This chapter also discusses tips on preventing interrupts from adversely impacting application performance.


    4.1. CPU Topology

4.1.1. CPU and NUMA Topology

The first computer processors were uniprocessors, meaning that the system had a single CPU. The illusion of executing processes in parallel was created by the operating system rapidly switching the single CPU from one thread of execution (process) to another. In the quest for increasing system performance, designers noted that increasing the clock rate to execute instructions faster only worked up to a point (usually the limitations on creating a stable clock waveform with the current technology). In an effort to get more overall system performance, designers added another CPU to the system, allowing two parallel streams of execution. This trend of adding processors has continued over time.

Most early multiprocessor systems were designed so that each CPU had the same logical path to each memory location (usually a parallel bus). This let each CPU access any memory location in the same amount of time as any other CPU in the system. This type of architecture is known as a Symmetric Multi-Processor (SMP) system. SMP is fine for a small number of CPUs, but once the CPU count gets above a certain point (8 or 16), the number of parallel traces required to allow equal access to memory uses too much of the available board real estate, leaving less room for peripherals.

Two new concepts combined to allow for a higher number of CPUs in a system:

    1. Serial buses

    2. NUMA topologies

A serial bus is a single-wire communication path with a very high clock rate, which transfers data as packetized bursts. Hardware designers began to use serial buses as high-speed interconnects between CPUs, and between CPUs and memory controllers and other peripherals. This means that instead of requiring between 32 and 64 traces on the board from each CPU to the memory subsystem, there was now one trace, substantially reducing the amount of space required on the board.

At the same time, hardware designers were packing more transistors into the same space by reducing die sizes. Instead of putting individual CPUs directly onto the main board, they started packing them into a processor package as multi-core processors. Then, instead of trying to provide equal access to memory from each processor package, designers resorted to a Non-Uniform Memory Access (NUMA) strategy, where each package/socket combination has one or more dedicated memory areas for high speed access. Each socket also has an interconnect to other sockets for slower access to the other sockets' memory.

As a simple NUMA example, suppose we have a two-socket motherboard, where each socket has been populated with a quad-core package. This means the total number of CPUs in the system is eight; four in each socket. Each socket also has an attached memory bank with four gigabytes of RAM,