
IBM® Cloud and Smarter Infrastructure Software

SmartCloud Orchestrator Version 2.3: Capacity Planning, Performance, and Management Guide

Document version 2.3.6

IBM SmartCloud Orchestrator Performance Team


© Copyright International Business Machines Corporation 2014. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


CONTENTS

List of Figures

Author List

Revision History

1 Introduction

2 SmartCloud Orchestrator 2.3 Overview

2.1 Functional Overview

2.2 Architectural Overview

3 Performance Overview

3.1 Sample Benchmark Environment

3.2 Key Performance Indicators

3.2.1 Concurrent User Performance

3.2.2 Provisioning Performance

4 Performance Benchmark Approaches

4.1 Monitoring and Analysis Tools

4.1.1 nmon Samples

4.2 Infrastructure Benchmark Tools

4.3 Cloud Benchmarks

5 Capacity Planning Recommendations

5.1 Cloud Capacity Planning Spreadsheet

5.2 SmartCloud Orchestrator Management Server Capacity Planning

5.3 Provisioned Virtual Machines Capacity Planning

6 Cloud Configuration Recommendations

6.1 OpenStack Keystone Cache Configuration

6.2 OpenStack Keystone Worker Support

6.3 IaaS Gateway Cluster Support

6.4 IBM Workload Deployer Configuration

6.5 Virtual Machine IO Scheduler Configuration

6.6 Advanced Configuration and Power Interface Management

6.7 Java Virtual Machine Heap Configuration

6.8 Database Configuration

7 Cloud Maintenance Recommendations

7.1 SmartCloud Orchestrator Volume Management

7.1.1 Install Time Requirements

7.1.2 Long Running System Requirements

7.2 The SmartCloud Orchestrator Database and Schema Summary

7.3 Database Management

7.3.1 DBMS Versions

7.3.2 Automatic Maintenance

7.3.3 Operating System Configuration (Linux)

7.4 Database Hygiene Overview

7.4.1 Database Backup Management

7.4.2 Database Statistics Management

7.4.3 Database Reorganization

7.4.4 Database Archiving

7.4.5 Database Maintenance Automation

8 Summary Cookbook

8.1 Base Installation Recommendations

8.2 Post Installation Recommendations

8.3 High Scale Recommendations

Appendix A: SmartCloud Orchestrator Monitoring Options

A.1 OpenStack Monitoring

A.2 SmartCloud Orchestrator Monitoring

A.3 Infrastructure Monitoring

Appendix B: OpenStack Keystone Monitoring

B.1 PvRequestFilter

B.2 Enabling PvRequestFilter

Appendix C: IaaS Gateway Cluster Enablement

References

LIST OF FIGURES

Figure 33: Database Online Backup Schedule

Figure 34: Database Incremental Backup Enablement

Figure 35: Database Online Backup Manual Restore

Figure 36: Database Online Backup Automatic Restore

Figure 37: Database Log Archiving to Disk

Figure 38: Database Log Archiving to TSM

Figure 39: Database Roll Forward Recovery: Sample A

Figure 40: Database Roll Forward Recovery: Sample B

Figure 41: Database Backup Cleanup Command

Figure 42: Database Backup Automatic Cleanup Configuration

Figure 43: Database Statistics Collection Command

Figure 44: Database Statistics Collection Table Iterator

Figure 45: Database Reorganization Commands

Figure 46: Database Reorganization Table Iterator

Figure 47: Database Archiving Impact

Figure 48: Sample Database Maintenance Schedule

Figure 49: Sample Database Maintenance Crontab Entry

Figure 50: Base Installation Recommendations

Figure 51: Post Installation Recommendations

Figure 52: High Scale Recommendations

Figure 53: OpenStack Ceilometer Metrics

Figure 54: OpenStack Ceilometer Core Metrics

Figure 55: Infrastructure Core Metrics

Figure 56: Keystone Monitoring PvRequestFilter Format

Figure 57: Keystone Monitoring PvRequestFilter Sample Output

Figure 58: Keystone Monitoring Log Messages Example

Figure 59: Keystone Monitoring Statistics Example


AUTHOR LIST

This paper is the team effort of a number of cloud performance specialists comprising the SmartCloud Orchestrator performance team. Additional recognition goes out to the entire SmartCloud Orchestrator and OpenStack development teams.

Mark Leitch (primary contact for this paper), IBM Toronto Laboratory

Nate Rockwell, IBM USA

Tiarnán Ó Corráin, IBM Ireland

Amadeus Podvratnik, Marc Schunk, Peter Altevogt, IBM Boeblingen Laboratory

Alessandro Chiantera, Giorgio Corsetti, Massimo Marra, Michele Licursi, Paolo Cavazza, Ugo Madama, IBM Rome Laboratory


REVISION HISTORY

Date | Version | Revised By | Comments
February 1st, 2014 | Draft | MDL | Initial version for review.
February 23rd, 2014 | 2.3.0 | MDL | Initial version for distribution.
February 28th, 2014 | 2.3.1 | MDL | Update based on review comments.
March 18th, 2014 | 2.3.2 | MDL | Volume management update based on SCO 2.3.0.1 delivery. Addition of monitoring points in Appendix A.
March 27th, 2014 | 2.3.3 | MDL | Added maintenance crontab samples and scripts.
April 8th, 2014 | 2.3.4 | MDL | Added IWD configuration options.
August 20th, 2014 | 2.3.5 | MDL | Added Keystone monitoring reference material.
August 28th, 2014 | 2.3.6 | MDL | Added Keystone worker, IaaS gateway cluster material.

Figure 1: Revision History


1 Introduction

Capacity planning involves the specification of the various components of an installation to meet customer requirements, often with growth or timeline considerations. A key aspect of capacity planning for cloud, or virtualized, environments is the specification of sufficient physical resources to provide the illusion of infinite resources in an environment that may be characterized by highly variable demand. This document will provide an overview of capacity planning for the IBM SmartCloud Orchestrator (SCO) Version 2.3. In addition, it will offer management best practices to achieve a well performing installation that demonstrates service stability.

SCO Version 2.3 offers end to end management of service offerings across a number of cloud technology offerings including VMware, Kernel-based Virtual Machine (KVM), IBM PowerVM, and IBM System z. A key implementation aspect is integration with OpenStack, the de facto leading open virtualization technology. OpenStack offers the ability to control compute, storage, and network resources through an open, community based architecture.

In this document we will provide an SCO 2.3 overview, including functionality, architecture, and performance. We will then offer the capacity planning recommendations, including considerations for hardware configuration, software configuration, and cloud maintenance best practices. A summary “cookbook” is provided to manage installation and configuration for specific instances of SCO.

Note: This document is considered a work in progress. Capacity planning recommendations will be refined and updated as new SCO releases are available. While the paper in general is considered suitable for all SCO Version 2.3 releases, it is best oriented towards SCO Version 2.3.0.1. In addition, a number of references are provided in the References section. These papers are highly recommended for readers who want detailed knowledge of SCO server configuration, architecture, and capacity planning.

Note: Some artifacts are distributed with this paper. The distributions are in zip format. However, Adobe protects against files with a “zip” suffix. As a result, the file suffix of each distribution is set to “zap”. To use these artifacts, simply rename the suffix from “zap” to “zip” and process as usual.
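The rename described in the note can also be scripted. The following is a minimal sketch only; the helper name `restore_zip_suffix` and the file name in the usage note are hypothetical and are not part of the distributed artifacts.

```python
from pathlib import Path


def restore_zip_suffix(path: Path) -> Path:
    """Rename a '.zap' distribution to '.zip' and return the new path."""
    if path.suffix.lower() != ".zap":
        raise ValueError(f"expected a .zap file, got: {path.name}")
    target = path.with_suffix(".zip")
    if target.exists():
        # Never silently overwrite an archive that is already in place.
        raise FileExistsError(f"refusing to overwrite {target}")
    return path.rename(target)
```

For example, `restore_zip_suffix(Path("artifacts.zap"))` would leave `artifacts.zip` in the same directory, ready to be extracted with any standard zip tool.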


2 SmartCloud Orchestrator 2.3 Overview

An overview of SCO Version 2.3 will be provided from the following perspectives:

1. Functional

2. Architectural

2.1 Functional Overview

The basic functional capability of SCO involves the management of cloud computing resources for dynamic data centers. The following figure provides a functional (service level) overview of SCO.

Figure 2: SCO Functional Overview

In a nutshell, SCO offers infrastructure, platform, and orchestration services that make it possible to lower the cost of service delivery (both in terms of time and skill) while delivering higher degrees of standardization and automation. A more detailed cloud marketplace view of the SCO solution follows.


Figure 3: SCO Cloud Marketplace View

The core functional capabilities of SCO include the following.

Workflow Orchestration. The Business Process Manager (BPM) component offers a standard library as well as a graphical editor for workflow orchestration. Overall, this provides a powerful mechanism for complex and custom business processes in the cloud context.

Pattern Management. The IBM Workload Deployer (IWD) offers sophisticated pattern support for deploying multi node applications that may consist of complex middleware. Once again, graphical editor support for pattern management is provided.

Image Management. This comprises an image construction and composition tool, as well as a Virtual Image Library (VIL) to facilitate image development and reduce image sprawl.

Service Management. Service management options are available in the SCO Enterprise edition. It provides a set of management utilities to further facilitate business process management.

Not shown in the diagram is a Scalable Web Infrastructure to facilitate cloud self service offerings. For more information please consult the SCO information center (URL). In addition, the SCO resource center is available (URL).


2.2 Architectural Overview

The following diagram shows the reference deployment topology for SCO. A description of the reference topology follows.

Figure 4: SCO Architecture Reference Topology


The reference topology is based on a core set of virtual machines:

Central Server 1. This server hosts the DB2 Database Management System (DBMS). The performance of the DBMS is critical to the overall solution and is dealt with extensively in Section 7.3.

Central Server 2. This server hosts OpenStack Keystone, providing identity, token, catalog, and policy services. In addition, it hosts the Virtual Image Library (VIL) and SCO gateway services. The most critical aspect of this server is managing the Keystone configuration as described in Section 6.1.

Central Server 3. This server hosts the IBM Workload Deployer pattern engine and the Scalable Web UI. Performance configuration of these components is described in Section 6.

Central Server 4. This server hosts the Business Process Manager engine. Performance configuration of these components is described in Section 6.

Central Server 5. This server hosts the System Automation Application Manager. This is an optional virtual machine that can be used to manage automatic start and stop orchestration of the SCO management server itself.

Associated with these core server virtual machines are a number of region servers. Region servers may represent a specific cluster or geographic zone of cloud compute nodes. Sample compute nodes are shown for VMware, KVM, and PowerVM, with associated communication paths. For example, for VMware the SCE driver is used to drive the operation of the VMware cluster. For KVM, the OpenStack control node is used to coordinate the KVM instance.

Given this is a virtual implementation, some considerations should be kept in mind:

In general, it is more difficult to manage performance in a virtual environment due to the additional hypervisor management overhead and system configuration.

Device parallelism via dedicated storage arrays/LUNs is preferred. Sample approaches, from most impactful to least impactful, are provided below.

o Separate data stores for “managed from” and “managed to” environments.

o Spread data stores across several physical disks to maximize storage capability.

o Separate data stores for image templates and provisioned images.

o Employ the “deadline” or “noop” scheduler algorithm for management server and provisioned VMs (see Section 6.5).

o Optimize base storage capability (i.e. SSD with “VMDirectPath” enablement for VMware). Servers where this may be critical, due to their dependency on disk IO capabilities, are Central Server 1 and the VMware vCenter instances.

Network optimization, for example 10GbE adoption. In addition, segment customer networks to an acceptable level to reduce address lookup impact.


3 Performance Overview

There are two distinct aspects of cloud performance:

1. Performance of the SCO management server itself. This is the primary focus of this section.

2. Performance of the provisioned server instances. This is more of a capacity planning statement, and is covered in Section 5.3.

We will provide a general overview of the Key Performance Indicators (KPIs) for the SCO management server. The following sections will describe the general benchmark environment, and the associated KPIs.

3.1 Sample Benchmark Environment

The following figure shows a sample configuration that has been used for SCO benchmarks.

Figure 5: SCO Sample Benchmark Environment


The environment is characterized by the following features, broken down in terms of the SCO management server (aka “managed from”) and the associated cloud (aka “managed to”).

Managed from:

o Server configuration:

4/5 HS22V Blades with 2 x 4 cores Intel Xeon x5570 2.93 GHz.

8 physical cores per blade, 16 logical cores when hyper-threading is enabled.

72 GB RAM per blade.

2 x Redundant 10G Ethernet Networking (Janice HSSM).

2 x Redundant 8G FC Network (Qlogic FC SM).

o Storage configuration: 1 x DS3400 with 4 Exp with 12 Disk 600 GB SAS 10K each (48 x 600 GB = 28.8 TB raw).

Managed to:

o Server configuration:

Tens of HS22V Blades with 2 x 6 cores Intel Xeon x5670 2.93 GHz.

12 physical cores per blade, 24 logical cores when hyper-threading is enabled.

72 GB RAM per blade.

2 x Redundant 10G Ethernet Networking (Janice HSSM).

2 x Redundant 8G FC Network (Qlogic FC SM).

o Storage configuration: 1 x Storwize v7000 with 3 Exp with 12 Disks 2 TB NL-SAS 7.2k each (36 x 2 TB = 72 TB raw).

o Storage access has been configured to use the multi-path access granted by Storwize. In particular, VMware ESXi servers have been configured to use all of the 8 active paths to access LUNs using a round robin policy.
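The core counts and raw storage figures quoted above follow from simple multiplication. The sketch below reproduces that arithmetic; the class and function names are illustrative, two hardware threads per core is assumed when hyper-threading is enabled, and raw capacity uses decimal units (1 TB = 1000 GB) before any RAID or spare overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blade:
    sockets: int
    cores_per_socket: int
    threads_per_core: int = 2  # assumption: hyper-threading enabled

    @property
    def physical_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def logical_cores(self) -> int:
        return self.physical_cores * self.threads_per_core


def raw_capacity_tb(enclosures: int, disks_per_enclosure: int, disk_gb: float) -> float:
    """Raw capacity in decimal terabytes, before RAID, spares, or formatting."""
    return enclosures * disks_per_enclosure * disk_gb / 1000


managed_from = Blade(sockets=2, cores_per_socket=4)   # Xeon x5570 blades
managed_to = Blade(sockets=2, cores_per_socket=6)     # Xeon x5670 blades

ds3400_tb = raw_capacity_tb(4, 12, 600)    # 48 disks of 600 GB
v7000_tb = raw_capacity_tb(3, 12, 2000)    # 36 disks of 2 TB

print(managed_from.physical_cores, managed_from.logical_cores)
print(managed_to.physical_cores, managed_to.logical_cores)
print(ds3400_tb, v7000_tb)
```

Usable capacity will be lower than these raw values once the array layout is chosen, so the numbers are best treated as an upper bound when sizing data stores.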


3.2 Key Performance Indicators

The following Key Performance Indicators are managed for SCO through a set of comprehensive benchmarks.

1. Concurrent User Performance, comprising:

a. Average response time for SCO pages related to administrative tasks.

b. Average response time for SCO pages related to end user tasks.

2. Provisioning throughput, comprising:

a. Provisioning throughput for a vSys with a single part.

b. Average service time for provisioned VMs.

3. LAMP (Linux, Apache, MySQL, Python) stack performance, comprising:

a. vApp deployment time.

b. vApp stop time.

c. vApp deletion time.

4. Bulk Windows stack performance, comprising the provisioning time for a vSys with multiple parts (15 VMs).

5. Virtual Image Library performance, comprising:

a. Registration discovery throughput.

b. Registration basic indexing throughput.

c. Image checkin time.

d. Image checkout time.

A key aspect of the benchmarks is that they are run with associated background workloads and for a long duration (e.g. weeks or months). The rationale is simple: to run benchmarks that closely emulate the customer experience and drive “real world” results (versus overly optimistic lab-based results). We will describe the concurrent user and provisioning throughput KPIs in more detail.


3.2.1 Concurrent User Performance

SCO User Interface performance is established through concurrent user benchmark tests. In order to understand the applicability of such a benchmark, it is important to understand what is meant by a concurrent user. Consider:

P = total population for an instance of SCO (including cloud administrators, end users, etc.).

C = the concurrent user population for an instance of SCO. Concurrent users are considered to be the set of users within the overall population P that are actively managing the cloud environment at a point in time (e.g. administrator operations in the User Interface, provisioning operations, etc.).

In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic that a total population of 200 users may have a concurrent user population of 40 users (i.e. 20%).
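The relationship between P and C can be expressed as a small helper for sizing exercises. This is a sketch under the assumption that a single concurrency percentage describes the installation; the 20% value is the example from the text, not a general rule, and the function names are illustrative.

```python
def concurrent_users(total_population: int, concurrency_percent: int) -> int:
    """Estimate C from P, rounding up so the estimate is never optimistic."""
    if total_population < 0:
        raise ValueError("total_population must be non-negative")
    if not 0 <= concurrency_percent <= 100:
        raise ValueError("concurrency_percent must be between 0 and 100")
    # Integer ceiling division avoids floating point rounding surprises.
    return (total_population * concurrency_percent + 99) // 100


def supported_population(concurrent: int, concurrency_percent: int) -> int:
    """Estimate the total population P that a benchmarked C corresponds to."""
    if not 0 < concurrency_percent <= 100:
        raise ValueError("concurrency_percent must be between 1 and 100")
    return concurrent * 100 // concurrency_percent


print(concurrent_users(200, 20))       # P = 200 at 20% concurrency
print(supported_population(40, 20))    # C = 40 at 20% concurrency
```

The same helper can be used in reverse when a benchmark result is stated in concurrent users and the question is how large a total user community it represents.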

For the concurrent user workload driven for SCO, there are three sets of criteria that drive the benchmark:

1. Load driving parameters.

2. Data population.

3. Load driving (user) scenarios.

Load Driving Parameters

The following load driving parameters apply.

1. User transaction rate control. The frequency that simulated users drive actions against the back end is managed via loop control functions. Closed loop simulation approaches are used where a new user will enter the system only when a previous user completes. Through the closed loop system, steady state operations under load may be driven.

2. Think times. Think times are the “pause” between user operations, meant to simulate the behavior of a human user. The think time interval used is [100%,300%] (meaning, the replay via the load driver is up to three times the rate of the scenario recording rate).

3. Bandwidth throttling. In order to simulate low speed or high latency lines, bandwidth throttling is employed for some client workloads. The throttle is set to a value that represents a moderate speed ADSL connection (cable/DSL simulation setting of 1.5 Mbps download, 384 Kbps upload).
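The three parameters above can be illustrated with a small simulation. This is a sketch, not the load driver used for the benchmarks: it assumes each replayed think time is the recorded pause scaled by a factor drawn uniformly from [1.0, 3.0], it models the throttle as payload size divided by link speed (ignoring latency and protocol overhead), and all function names and input values are hypothetical.

```python
import heapq
import random


def scaled_think_time(recorded_s: float, rng: random.Random,
                      low: float = 1.0, high: float = 3.0) -> float:
    """Scale a recorded think time by a factor drawn uniformly from [low, high]."""
    return recorded_s * rng.uniform(low, high)


def transfer_time_s(payload_bytes: int, link_bps: float) -> float:
    """Seconds to move a payload over a throttled link (no latency, no overhead)."""
    return payload_bytes * 8 / link_bps


def run_closed_loop(users: int, cycles: int, service_s: float,
                    recorded_think_s: float, seed: int = 0) -> list[int]:
    """Closed loop: a new virtual user enters only when a previous one completes.

    Returns the number of active users observed after each completion.
    """
    rng = random.Random(seed)
    # Each heap entry is the time at which one virtual user finishes its cycle.
    finish_times = [service_s + scaled_think_time(recorded_think_s, rng)
                    for _ in range(users)]
    heapq.heapify(finish_times)
    active_counts = []
    for _ in range(cycles):
        now = heapq.heappop(finish_times)  # one user completes
        next_finish = now + service_s + scaled_think_time(recorded_think_s, rng)
        heapq.heappush(finish_times, next_finish)  # a replacement enters
        active_counts.append(len(finish_times))
    return active_counts


counts = run_closed_loop(users=40, cycles=1000, service_s=2.0, recorded_think_s=5.0)
print(set(counts))                             # the population never changes
print(transfer_time_s(187_500, 1_500_000))     # download at 1.5 Mbps
print(transfer_time_s(48_000, 384_000))        # upload at 384 Kbps
```

Because the population is constant by construction, the closed loop holds the system at a steady load level, which is the property the benchmark relies on.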


Data Population Parameters

The benchmark is run against a data model that represents a large scale customer environment. The following table shows a sample configuration where the system is populated with data to represent a large number of users, active Virtual System instances, and active Virtual Machines existing prior to SCO installation. Through this approach, the workload for managing the solution is representative of some customer environments.

Benchmark Parameter | Value
Cloud Administrators | 1
Cloud Domains | 1
Tenants | 1
Users | 200
Hypervisor Types | 1 (VMware)
Cloud Groups | 1
Environment Profile | 1
Image Templates | 40 (20 Linux, 20 Windows)
vSys Patterns | 20 + 1 (20 Linux vSys patterns, 1 bulk Windows pattern)
vApp Patterns | 1 (LAMP vApp for VMware domain)
Flavors | 5 (1 flavor for RHEL, 3 flavors for Windows, 1 flavor for vApp)
Active vSys instances | 20 (1 per Linux vSys Pattern)
Standalone (Unmanaged) VMs | 400 (10 per image template: 200 Linux, 200 Windows)

Figure 6: Benchmark Data Model Population


Load Driving (User) Scenarios

The concurrent user population (i.e. C) is broken down into the following user profile distribution and scenarios.

User Profile:
Number of Users: 20 (50% overall)
User Type: End User
Task Type: VM Provisioning
Activity: vSys with single part (Linux) provisioning through Self-Service Catalog (SSC) offering on VMware.
Scenario per User:
1. Login.
2. Provision vSys single part using SSC offering.
3. Wait until available.
4. Go to the vSys instance details page.
5. Delete vSys using SSC offering.
6. Wait until deletion complete.
7. Logout.
8. Enter next cycle according to arrival rate.

User Profile:
Number of Users: 16 (40% overall)
User Type: End User
Task Type: User Management
Activity: End user operations through Self-Service Catalog (SSC) offering.
Scenario per User:
1. Login.
2. Submit SSC offering "Create User in VM", selecting one of the VMs belonging to one of the pre-populated vSys.
3. Wait until done.
4. Submit SSC offering "Delete User in VM", selecting the same VM.
5. Wait until done.
6. Logout.
7. Enter next cycle according to arrival rate.

User Profile:
Number of Users: 2 (5% overall)
User Type: Administrator
Task Type: Monitoring
Activity: Administrative operations through the IBM Workload Deployer user interface.
Scenario per User:
1. Login.
2. List hypervisors.
3. Select a hypervisor.
4. List VMs in hypervisor.
5. Show all instances.
6. Go to "My Requests".
7. Sort the requests by status.
8. View the trace log.
9. Logout.

User Profile:
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vApp (LAMP) provisioning through IBM Workload Deployer user interface on VMware.
Scenario per User:
1. Login.
2. Provision vApp using the IWD UI.
3. Wait until available.
4. Stop vApp using the IWD UI.
5. Wait until done.
6. Delete vApp using the IWD UI.
7. Wait until deletion complete.
8. Logout.
9. Enter next cycle according to arrival rate.

User Profile:
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vSys with multiple parts (bulk Windows) provisioning through Self-Service Catalog (SSC) offering on VMware.
Scenario per User:
1. Login.
2. Provision vSys bulk Windows using SSC offering.
3. Wait until available.
4. Go to vSys instance details page.
5. Delete vSys bulk Windows using SSC offering.
6. Wait until deleted.
7. Logout.
8. Enter next cycle according to arrival rate.

Figure 7: Load Driving (User) Scenarios

In overall terms, 55% of the load driving activities are driving Virtual Machine provisioning scenarios. The remaining 45% of scenarios are general administration and management tasks. For the active workload, the user operations meet the following response time thresholds.

Administrative page response times: 90% of pages < 10s, 100% of pages < 15s.

End user operations: 90% of pages < 2s, 100% of pages < 5s.

3.2.2 Provisioning Performance

Cloud provisioning is enormously complex in performance terms. Hardware configuration, user workloads, image properties, and a multitude of other factors combine to determine overall capability. SCO provisioning performance is typically measured via a closed system, defined as an isolated system where we can demonstrate a constant sustained provisioning workload. In order to achieve this, as requests complete within the system, new requests are initiated.

Figure 8: Provisioning Performance in a Closed System
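The closed-system idea can be illustrated with a minimal sketch. It assumes a hypothetical `provision()` function standing in for a real provisioning request; the function name, the simulated service time, and the counters are illustrative and not part of SCO. A fixed pool of workers keeps the number of in-flight requests constant: as soon as one request completes, the same worker initiates the next.

```python
import threading
import time


def provision(request_id):
    """Hypothetical stand-in for a real provisioning request."""
    time.sleep(0.001)  # simulated service time
    return True


def run_closed_system(concurrency, total_requests, request_fn=provision):
    """Keep up to `concurrency` requests in flight until `total_requests` complete.

    Each worker starts a new request as soon as its previous one finishes,
    which is what keeps the offered workload constant.
    """
    lock = threading.Lock()
    state = {"next_id": 0, "succeeded": 0, "in_flight": 0, "peak_in_flight": 0}

    def worker():
        while True:
            with lock:
                if state["next_id"] >= total_requests:
                    return
                request_id = state["next_id"]
                state["next_id"] += 1
                state["in_flight"] += 1
                state["peak_in_flight"] = max(
                    state["peak_in_flight"], state["in_flight"]
                )
            ok = request_fn(request_id)
            with lock:
                state["in_flight"] -= 1
                if ok:
                    state["succeeded"] += 1

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {
        "completed": state["next_id"],
        "succeeded": state["succeeded"],
        "peak_in_flight": state["peak_in_flight"],
        "elapsed_s": time.monotonic() - start,
    }
```

In a real driver, `request_fn` would submit a provisioning request and block until the instance is available (or has failed), so that completions pace new arrivals.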

The performance systems running SCO workloads run for months at a time. These systems are treated like customer systems, with 24x7 operations and field-ready maintenance approaches in place (as described in Section 7). In terms of provisioning performance, the following are sample statistics from a long-run scenario driven for a number of weeks, once a period of operational stability had been reached based on the recommendations provided in this paper.

Number of systems provisioned: 172,536 VMs.

Provisioning rate (average): 187 VMs/hour.

Service times (average): 3 minutes 28 seconds (IBM Workload Deployer with VMware linked clones).

Workflow capability: On the order of 300 workflows per hour (generally short running workflows under a minute in duration).


Success rate: 99.996%

Given that this is a sustained, continuous workload, higher peak workloads are, of course, possible. The success rate is considered especially noteworthy.
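As a rough cross-check of these statistics, the following sketch derives a few implied figures from the reported values. It assumes the average provisioning rate held for the whole run, and it applies Little's law (average in-flight requests = arrival rate × time in system) using the average service time as the time in system. These derived numbers are estimates, not measurements from the paper.

```python
TOTAL_VMS = 172_536            # number of systems provisioned
RATE_PER_HOUR = 187            # average provisioning rate (VMs/hour)
SERVICE_TIME_S = 3 * 60 + 28   # average service time: 3 minutes 28 seconds
SUCCESS_RATE = 0.99996         # 99.996%


def derived_statistics():
    """Return figures implied by the reported statistics (estimates only)."""
    run_hours = TOTAL_VMS / RATE_PER_HOUR
    return {
        # Duration implied by the total and the average rate.
        "run_hours": run_hours,
        "run_weeks": run_hours / (24 * 7),
        # Failed requests implied by the success rate.
        "implied_failures": TOTAL_VMS * (1 - SUCCESS_RATE),
        # Little's law: requests in flight = rate (per second) * service time.
        "avg_in_flight": RATE_PER_HOUR / 3600 * SERVICE_TIME_S,
    }
```

Under these assumptions the run corresponds to roughly five and a half weeks of continuous operation, with about eleven provisioning requests in flight on average and on the order of seven failed requests in total.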


4 Performance Benchmark Approaches

As part of cloud management and capacity planning, it is valuable to manage cloud benchmarks. Value propositions include:

Understanding the capability of the cloud infrastructure (and potentially poorly configured or underperforming components of the infrastructure).

Understanding the base capability of the SCO implementation and associated customization.

Understanding the long term performance stability of the system.

We will describe basic system monitoring approaches, infrastructure benchmarks, and cloud benchmarks.

4.1 Monitoring and Analysis Tools

The following table shows the core recommended monitoring and analysis tools.

Tool: pdcollect
Description: SCO log collection tool.
Documentation and recommended invocation: SCO Product Information Center

Tool: esxtop
Description: VMware performance collection tool.
Documentation: URL
Recommended invocation: esxtop -b -a -d 60 -n <number_of_samples> > <output file>

Tool: nmon
Description: nmon is a comprehensive system monitoring tool for the UNIX platform. It is highly useful for understanding system behavior.
Documentation: URL
Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>
Note: On Windows systems, Windows perfmon may be used.

Tool: db2support
Description: Database support collection tool.
Documentation: URL
Recommended invocation: db2support <result directory> -d <database> -c -f -s -l

Tool: DBMS Snapshots
Description: DBMS snapshot monitoring can offer insight into SQL workload, and in particular expensive SQL statements.
Documentation: URL

Tool: WAIT
Description: Java WAIT monitoring can provide a non-invasive view of JVM performance through accumulated Java cores and analytic tools.
Documentation and recommended invocation: URL

Figure 9: Monitoring and Analysis Tools
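When planning a collection window, the sample count follows from the window length and the sampling interval. The sketch below builds the esxtop and nmon command lines from the table for a given duration, using only the flags shown there. The helper names and output file names are illustrative assumptions.

```python
def samples_for(duration_s, interval_s):
    """Number of samples needed to cover duration_s at one sample per interval_s."""
    return -(-duration_s // interval_s)  # ceiling division


def esxtop_command(duration_s, output_file, interval_s=60):
    """Batch-mode esxtop invocation covering the requested duration."""
    n = samples_for(duration_s, interval_s)
    return f"esxtop -b -a -d {interval_s} -n {n} > {output_file}"


def nmon_command(duration_s, output_file, interval_s=60):
    """nmon invocation covering the requested duration."""
    n = samples_for(duration_s, interval_s)
    return f"nmon -T -s {interval_s} -c {n} -F {output_file}"
```

For example, `esxtop_command(24 * 3600, "esxtop.csv")` yields a one-day collection at 60-second intervals (1440 samples).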


4.1.1 nmon Samples

The following figures represent nmon samples for a 22 concurrent user scenario (based on the user profiles in Section 3.2.1).


Figure 10: nmon Samples

Analysis of the samples follows.

The samples show the summary utilization (CPU, IO) for Central Servers 1 through 4, and the Region Server.

All servers have 8 vCPUs allocated, with the exception of Central Server 4, which has 4 vCPUs.

In general, all nodes are consuming less than 1 vCPU. The exceptions are the Region Server (≈1.6 vCPUs) and Central Server 3 (≈2.4 vCPUs). This reflects an IBM Workload Deployer scenario.

For IO, the bulk of the IO workload is associated with the database node. This is not surprising, and reinforces the recommendations for IO optimization on the DBMS node.

While the summary view is valuable for an “at a glance” assessment, it is always recommended to look at the fine grained results in nmon to ensure processor utilization is healthy (e.g. minimal or no blocked processes, minimal or zero wait times, healthy multi processor utilization).


4.2 Infrastructure Benchmark Tools

The following table shows some recommended infrastructure benchmark tools.

Tool: iometer
Description: I/O subsystem measurement and characterization tool for single and clustered systems.
Documentation: URL
Recommended invocation: dynamo /m <client host name or ip>

Tool: iperf
Description: TCP and UDP measurement and characterization tool that reports bandwidth, delay, jitter, and datagram loss.
Documentation: URL
Recommended server invocation: iperf -s
Recommended client invocation #1: iperf -c <server host name or ip>
Recommended client invocation #2: iperf -c <server host name or ip> -R

Tool: UnixBench
Description: UNIX measurement and characterization tool, with reference benchmarks and evaluation scores.
Documentation: URL
Recommended invocation: ./Run

Figure 11: Infrastructure Benchmark Tools

4.3 Cloud Benchmarks

Cloud benchmarks should be based on enterprise utilization. Sample benchmarks that are easy to manage include the following.

1. Single VM deployment times.

2. Small scale concurrent VM deployment times (e.g. 10 requests in parallel).

3. REST API response times.

It is recommended to establish a small load driver, record a baseline, and then use these small benchmarks as a standard to assess ongoing cloud health. More complex benchmarks, including client request monitoring approaches, may of course be established.
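A small load driver of this kind might look like the following sketch. The `deploy_vm()` function is a hypothetical placeholder for a real deployment request, and the 25% regression tolerance is an arbitrary assumption to be tuned per installation.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def deploy_vm(index):
    """Hypothetical placeholder: replace with a real deployment request."""
    time.sleep(0.01)


def timed(fn, *args):
    """Run fn(*args) and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start


def run_benchmark(parallel_requests=10, deploy_fn=deploy_vm):
    """Issue requests in parallel and return per-request durations in seconds."""
    with ThreadPoolExecutor(max_workers=parallel_requests) as pool:
        return list(
            pool.map(lambda i: timed(deploy_fn, i), range(parallel_requests))
        )


def compare_to_baseline(durations, baseline_median_s, tolerance=0.25):
    """Flag a regression when the median exceeds the baseline by more than tolerance."""
    median = statistics.median(durations)
    return {
        "median_s": median,
        "regressed": median > baseline_median_s * (1 + tolerance),
    }
```

Running the same driver on a schedule and comparing each result against the recorded baseline gives a simple, repeatable indicator of cloud health.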

For OpenStack specific benchmarks, OpenStack Rally may be leveraged (see the References section for further detail). In addition, the Open Systems Group is involved in cloud computing benchmark standards. A report, including the IBM CloudBench tool, is available in the References section.


5 Capacity Planning Recommendations

We will provide capacity planning recommendations through three approaches.

Static planning via a spreadsheet approach.

Capacity planning for the SCO management server (aka the “managed from” infrastructure).

Capacity planning for the provisioned Virtual Machines (aka the “managed to” infrastructure).

5.1 Cloud Capacity Planning Spreadsheet

In order to provide a desired hardware and software configuration for an SCO implementation, a wide range of parameters must be understood. The following questions are usually relevant.

1. What operations are expected to be performed with SCO?

2. What are the average and peak concurrent user workloads?

3. What is the enterprise network topology?

4. What is the expected workload for provisioned virtual servers, and how do they map to the physical configuration?

5. For the provisioned servers:

a. What is the distribution size?

b. What are the application service level requirements?

A capacity planning spreadsheet is attached to this paper (“SCO Capacity Planning Profile v2.3.3.xlsx”). The spreadsheet may be used to provide a cloud profile for further sizing activities (e.g. a capacity planning activity in association with the document authors).


5.2 SmartCloud Orchestrator Management Server Capacity Planning

The SCO management server requirements are documented in the SCO Information Center (URL). The summary table is repeated here for discussion purposes (see footnote 1).

Server                      Configuration  Processor (vCPUs)  Memory (GB)  Storage (GB)
Central Server 1            Minimum        2                  6            100
Central Server 1            Recommended    4                  12           200
Central Server 2            Minimum        2                  8            141
Central Server 2            Recommended    4                  12           200
Central Server 3            Minimum        2                  4            80
Central Server 3            Recommended    4                  8            160
Central Server 4            Minimum        2                  6            50
Central Server 4            Recommended    2                  8            60
Central Server 5 (optional) Minimum        n/a                n/a          n/a
Central Server 5 (optional) Recommended    2                  4            20
Region Server               Minimum        2                  4            76
Region Server               Recommended    8                  8            160
Totals                      Minimum        10                 28           447
Totals                      Recommended    24                 52           800

Figure 12: SCO Management Server Capacity Planning
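The totals in Figure 12 can be reproduced from the per-server values, which is convenient when adjusting individual allocations. The sketch below uses the values from the table; the data structure and function name are illustrative.

```python
# (vCPUs, memory GB, storage GB) per server, taken from Figure 12.
# Central Server 5 is optional and has no minimum requirement.
MINIMUM = {
    "Central Server 1": (2, 6, 100),
    "Central Server 2": (2, 8, 141),
    "Central Server 3": (2, 4, 80),
    "Central Server 4": (2, 6, 50),
    "Region Server": (2, 4, 76),
}
RECOMMENDED = {
    "Central Server 1": (4, 12, 200),
    "Central Server 2": (4, 12, 200),
    "Central Server 3": (4, 8, 160),
    "Central Server 4": (2, 8, 60),
    "Central Server 5": (2, 4, 20),
    "Region Server": (8, 8, 160),
}


def totals(servers):
    """Sum each resource column across all servers."""
    return tuple(sum(column) for column in zip(*servers.values()))
```

Here `totals(MINIMUM)` gives (10, 28, 447) and `totals(RECOMMENDED)` gives (24, 52, 800), matching the Totals rows.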

While further qualifiers are available in the Information Center, some comments apply.

In general, the recommended vCPU and memory allocations should be met.

To determine the ratio of virtual to physical CPUs, monitoring of the production system is required. For performance verification, a 1:1 mapping is used.

For the physical mapping, it is important to distinguish between “real” cores and hyper threaded (HT) cores. External benchmarks suggest an HT core may yield 30% of the capability of a “real” core.

Footnote 1: Provided values reflect the SCO 2.3.0.1 release.


The recommended storage amounts are highly subjective. For example, the minimum recommendations are sufficient for performance verification systems driven for months (with some minor exceptions). Recommended volume management approaches are provided in Section 7.1.
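The distinction between "real" and hyper-threaded cores can be turned into a rough sizing estimate. The sketch below assumes each physical core exposes one additional HT logical core worth 30% of a real core (the figure cited above), and that the virtual to physical CPU ratio is supplied from monitoring data. Both the model and the function names are illustrative assumptions.

```python
import math

HT_CORE_FACTOR = 0.30  # relative capability of an HT core, per the cited figure


def effective_cores(physical_cores, hyperthreading=True, ht_factor=HT_CORE_FACTOR):
    """Estimate capacity in "real core" equivalents.

    Assumes one extra hyper-threaded logical core per physical core.
    """
    ht_cores = physical_cores if hyperthreading else 0
    return physical_cores + ht_cores * ht_factor


def physical_cores_needed(vcpus, vcpus_per_core=1.0, hyperthreading=True):
    """Physical cores required to back `vcpus` at the given vCPU-to-core ratio."""
    per_core = effective_cores(1, hyperthreading)
    return math.ceil(vcpus / (vcpus_per_core * per_core))
```

For the 24 recommended vCPUs at a 1:1 mapping, this model suggests 24 physical cores without hyper-threading, or 19 physical cores if HT capacity is counted at 30%.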

5.3 Provisioned Virtual Machines Capacity Planning

Managing cloud workloads is typically driven as a categorization exercise where workload “sizes” are used to determine the overall capacity requirements. A capacity planning tool is available for managing the cloud workload sizes (URL). We will provide an overview of using this tool.

The first step is to provide any relevant business value. In the absence of a defined opportunity, simple “not applicable” entries may be given (per the sample below). Once submitted, you must accept the usage agreement which will bring up the demographic page.

Figure 13: Capacity Planning Tool: Inquiry Form


The demographic page simply asks for generic information about the submitter.

Figure 14: Capacity Planning Tool: User Demographic Information

When “Continue” is selected, the systems and storage page is displayed.

Figure 15: Capacity Planning Tool: Systems and Storage


Then the target system and associated utilization and Virtual Machine requirements are selected. Note for the utilization we select 20% headroom to support peak cloud workloads.

Figure 16: Capacity Planning Tool: System and Workload Options

At this point, the virtual machine requirements may be selected. Note a number of entries may be added.

Figure 17: Capacity Planning Tool: Virtual Machine Requirements

A confirmation screen is then provided to finalize the capacity planning request.

Figure 18: Planning Tool: Confirmation Screen


The summary capacity planning recommendation is then provided. The summary details the compute node, CPU, memory, and storage requirements based on the selected configuration and associated workloads.

Figure 19: Planning Tool: System Summary


6 Cloud Configuration Recommendations

The SCO 2.3 and 2.3.0.1 offerings provide suitable configuration as part of the default installation. However, there are some specific configuration aspects that may improve the capability. The configuration points follow.

1. OpenStack Keystone cache.

2. OpenStack Keystone worker support.

3. IaaS Gateway cluster support.

4. IBM Workload Deployer configuration.

5. Virtual Machine IO scheduler.

6. Advanced Configuration and Power Interface (ACPI) management.

7. Java Virtual Machine heap.

8. Database.

6.1 OpenStack Keystone Cache Configuration

SCO is deployed with a default two gigabyte cache for the Keystone cache (aka “memcached”) configuration. The intent of the cache is to provide an in memory repository of Keystone tokens to improve system throughput, particularly under concurrent workloads.

Assuming there exists sufficient memory on the Keystone VM (Central Server 2), the recommendation is to double the cache configuration to four (4) gigabytes. Instructions on how to modify the cache setting are provided here.

An appendix is provided that offers guidance on low level Keystone monitoring to determine health and throughput capability.

6.2 OpenStack Keystone Worker Support

The initial SCO 2.3 offering contains a Keystone implementation that is characterized by a single execution thread instance. Improvements have been made to exploit multiple concurrent Keystone workers. This change is generally advised when Keystone exhibits high request latency, or is seen to consume a significant amount of a virtual CPU (e.g. > 80%). In order to exploit this support, two steps are required.

1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this paper may be contacted for further detail (this paper will be revised upon official availability).

2. Revise the configuration to exploit multiple workers. Further detail on this is provided below.


With the Keystone Worker improvement in place, the following configuration change will allow a pool of four public workers and four administrative workers. This will permit increased concurrency, at the expense of virtual CPU consumption. As a result, the virtual CPU allocation should be increased based on monitoring data. In the “4+4” worker example below, expect to increase the virtual CPU allocation by roughly two to four virtual CPUs.

Location: (Central Server 2) /etc/keystone/keystone.conf

# The number of worker processes to serve the public WSGI application
# (integer value).
public_workers=4

# The number of worker processes to serve the admin WSGI application
# (integer value).
admin_workers=4

Figure 20: Keystone Worker Configuration

6.3 IaaS Gateway Cluster Support

Similar to the Keystone worker support in the previous section, the IaaS Gateway cluster support permits the deployment of a scalable cluster of IaaS Gateway instances to drive greater concurrency and reduce latency. In order to exploit this support, two steps are required.

1. Obtain the required SCO 2.3 limited availability fix or fixpack. The authors of this paper may be contacted for further detail (this paper will be revised upon official availability).

2. Implement the cluster. See Appendix C for further details.

Similar to the Keystone worker support, the IaaS Gateway cluster will drive additional virtual CPU utilization. Monitor the system and increase the virtual CPU allocation based upon system load.

6.4 IBM Workload Deployer Configuration

The IWD component offers a number of configuration options. One specific option provides the ability to control a polling interval to refresh cloud information. Based on the size of the cloud, this configuration option should be changed.

Location: (Central Server 3) /opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport-4.0.0.1/config/vmpublish.properties

Original: RuntimeInterval=12000

Recommended: RuntimeInterval=30000

Figure 21: IWD Configuration


6.5 Virtual Machine IO Scheduler Configuration

Each Linux instance has an IO scheduler. The intent of the IO scheduler is to optimize IO performance, potentially by clustering or sequencing requests to reduce the physical impact of IO. In a virtual world, however, the operating system is typically disassociated from the physical world through the hypervisor. As a result, it is recommended to alter the IO scheduler algorithm so that it is more efficient in a virtual deployment, with scheduling delegated to the hypervisor.

The default scheduling algorithm is typically “cfq” (completely fair queuing). Alternative and recommended algorithms are “noop” and “deadline”. The “noop” algorithm, as expected, does as little as possible with a first in, first out queue. The “deadline” algorithm is more advanced, with priority queues and age as a scheduling consideration. System specific benchmarks should be used to determine which algorithm is superior for a given workload. In the absence of available benchmarks, we would recommend the “deadline” scheduler be used.

The following console output shows how to display and modify the IO scheduler algorithm for a set of block devices. In the example, the “noop” scheduler algorithm is set. Note to ensure the scheduler configuration persists, it should be enforced via the operating system configuration (e.g. /etc/rc.local).

Figure 22: Modifying the IO Scheduler
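The console output for Figure 22 is not reproduced in this text. As a hedged illustration, the sketch below reads and sets the scheduler through the standard Linux sysfs file `/sys/block/<device>/queue/scheduler`, where the active algorithm appears in square brackets (for example `noop deadline [cfq]`). The device name `sda` and the helper names are assumptions; writing the file requires root and does not persist across reboots.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse the sysfs scheduler file, e.g. 'noop deadline [cfq]'.

    Returns (active, available); the active scheduler is shown in brackets.
    """
    available, active = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def scheduler_path(device):
    """Location of the scheduler control file for a block device such as 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Select a scheduler for one block device (requires root; not persistent)."""
    path = scheduler_path(device)
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    path.write_text(name)
```

Calling `set_scheduler("sda", "deadline")` from a boot-time script is one way to reapply the setting, in line with the persistence note above.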

6.6 Advanced Configuration and Power Interface Management

The Advanced Configuration and Power Interface (ACPI) operating system support may exhibit high virtual CPU utilization and offers limited value in virtual environments. It is recommended to disable ACPI on the SCO “managed from” nodes through the following steps.

1. Disabling “kacpid”. To switch off the kernel ACPI daemon, edit “/etc/grub.conf” and append "acpi=off" to the kernel boot command line. For example:

title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35
initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

becomes:

title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35 acpi=off
initrd /boot/initramfs-2.6.32-431.el6.x86_64.img

2. Disabling the user-space ACPI daemon. To disable user-space ACPI on managed-from nodes:

chkconfig acpid off

3. Reboot the nodes.
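When several managed-from nodes need the same change, step 1 can be scripted. The following sketch operates on the text of a grub.conf file and appends `acpi=off` to each kernel line that lacks it. The function name is illustrative, and the sketch deliberately leaves reading and writing the file to the caller so the change can be reviewed first.

```python
def add_acpi_off(grub_conf_text):
    """Append acpi=off to every kernel line that does not already carry it."""
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        # Only kernel boot lines are modified; title/root/initrd lines pass through.
        if stripped.startswith("kernel ") and "acpi=off" not in stripped.split():
            line = line.rstrip() + " acpi=off"
        out.append(line)
    return "\n".join(out) + "\n"
```

The transformation is idempotent, so applying it to an already updated file leaves the file unchanged.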

6.7 Java Virtual Machine Heap Configuration

The default Java Virtual Machine (JVM) heap sizes are intended to be economical. However, where sufficient memory is available, it is recommended to increase the heap allocation. The three change sets below are recommended. They apply to Central Server 3 and, in particular, the IBM Workload Deployer instance. The IWD instance should be restarted once the changes are complete.

Location: /opt/ibm/rainmaker/purescale.app/config/overrides.config
Original: /config/zso/jvmargs = ["-Xms1024M","-Xmx1024M"]
Recommended: /config/zso/jvmargs = ["-Xms1536M","-Xmx1536M"]

Location: /etc/rc.d/init.d/iwd-utils
Original: sed -i -e 's/3072M/1024M/g' $ZERO_DIR/config/overrides.config
Recommended: sed -i -e 's/3072M/1536M/g' $ZERO_DIR/config/overrides.config

Location: /opt/ibm/rainmaker/purescale.app/config/zero.config
Original: "-Xms1024M","-Xmx1024M"
Recommended: "-Xms1536M","-Xmx1536M"

Figure 23: Java Virtual Machine Heap Change Sets

6.8 Database Configuration

SCO is deployed with a DB2 database. The performance of the database is critical to the overall capability of the solution. The following database configuration changes are recommended for a base SCO 2.3 installation. Note some configuration changes should be in place for a SCO 2.3.0.1 installation, as noted. As a result, these specific steps are optional depending on the specific version deployed.

Type: Configuration
Configuration: For each relevant database (see Section 7.2) set:
STMT_CONC = LITERALS
LOCKTIMEOUT = 60
NUM_IOCLEANERS = AUTOMATIC
NUM_IOSERVERS = AUTOMATIC
AUTO_REORG = ON
For example: db2 UPDATE DB CFG FOR OPENSTAC USING LOCKTIMEOUT 60

Type: Index Addition
Configuration: A number of OpenStack database indexes are required. Please apply the “SCO_CREATE_INDEXES.sh” script provided with this paper. Note an “SCO_DROP_INDEXES.sh” script is provided in the event it is desired to drop the indexes.

Type: Foreign Key Modification
Configuration: An OpenStack foreign key should be modified to enable cascading deletes. Please apply the “SCO_MODIFY_FKEY.sh” script provided with this paper.

Figure 24: Database Configuration Change Sets
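Because the same settings apply to each relevant database, the commands can be generated rather than typed by hand. The sketch below follows the form of the example in Figure 24 (`db2 UPDATE DB CFG FOR <database> USING <parameter> <value>`). Which databases are relevant is left to the caller, and the function name is illustrative.

```python
# Settings from the Configuration change set in Figure 24.
SETTINGS = {
    "STMT_CONC": "LITERALS",
    "LOCKTIMEOUT": "60",
    "NUM_IOCLEANERS": "AUTOMATIC",
    "NUM_IOSERVERS": "AUTOMATIC",
    "AUTO_REORG": "ON",
}


def db_cfg_commands(databases, settings=SETTINGS):
    """Return one db2 UPDATE DB CFG command per database and setting."""
    return [
        f"db2 UPDATE DB CFG FOR {db} USING {name} {value}"
        for db in databases
        for name, value in settings.items()
    ]
```

For instance, `db_cfg_commands(["OPENSTAC"])` produces five commands, including the LOCKTIMEOUT example shown in the figure. The generated commands are intended to be reviewed and then run as the DB2 instance owner.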


7 Cloud Maintenance Recommendations

We will describe recommended maintenance approaches for the SCO file system volumes and the DB2 Database Management System.

7.1 SmartCloud Orchestrator Volume Management

We will outline the SCO 2.3 volume management requirements. We will first describe the install time requirements, and then the requirements for a long running system.

7.1.1 Install Time Requirements

The following table describes the SCO volume requirements, both overall and installation time free space requirements (see footnote 2). The overall requirements are useful for initial hardware allocations. The free space requirements are part of the installer pre-requisite checks. The intent is to ensure basic system health for the minimal set of file systems (i.e. ‘/’ and ‘/home’).

Volume Requirements (GB)

Server            vCPUs  RAM (GB)  Overall           Free Space: ‘/’  Free Space: ‘/home’
Central Server 1  2      6         100               75               19
Central Server 2  2      8         141 [footnote 3]  55               30
Central Server 3  2      4         80                70               4
Central Server 4  2      6         50                40               4
Central Server 5  2      4         20                20               n/a [footnote 4]
Region Server     2      4         76                40               30

Figure 25: SCO 2.3 Volume Management: Install Time Requirements

Some comments on the installation requirements:

These are the minimum installation requirements. The minimum and recommended requirements are provided in the product information center (URL).

The root requirement excludes the home requirement.

Footnote 2: Referenced requirements are for the SCO 2.3.0.1 release.
Footnote 3: Also requires 10GB and 40GB in the /opt and /tmp file systems, respectively.
Footnote 4: Central Server 5 is an optional component. It is not managed as part of the installation pre-requisite check and is listed here for completeness.


The /home file system on Central Server 2 and the Region Server is primarily consumed by the /home/library directory of the Virtual Image Library. This path may be symbolically linked to an external volume to simplify image volume management.

It should be noted there is a gap between the overall numbers and the free space numbers reported. This is the result of the following factors.

o The overall numbers describe the volume requirements at the hardware level, prior to base operating system installation.

o The installer pre-requisite check is dealing with an installed system (i.e. post base operating system installation). As a result, approximately 6 GB is expected to be consumed by the base installation and related artifacts. Once this is factored in, the numbers align.
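The alignment can be checked directly from the Figure 25 values. The sketch below subtracts the required free space from the overall requirement for each server, and for Central Server 2 also counts the additional /opt and /tmp requirements from the footnote. Central Server 5 is omitted because it is not part of the pre-requisite check. The data layout and function name are illustrative.

```python
# server: (overall GB, free '/' GB, free '/home' GB, other required file systems GB)
INSTALL_REQUIREMENTS = {
    "Central Server 1": (100, 75, 19, 0),
    "Central Server 2": (141, 55, 30, 10 + 40),  # /opt and /tmp, per the footnote
    "Central Server 3": (80, 70, 4, 0),
    "Central Server 4": (50, 40, 4, 0),
    "Region Server": (76, 40, 30, 0),
}


def base_install_gap(overall, root_free, home_free, other=0):
    """Space left for the base operating system installation, in GB."""
    return overall - (root_free + home_free + other)


gaps = {
    server: base_install_gap(*values)
    for server, values in INSTALL_REQUIREMENTS.items()
}
```

Every entry in `gaps` evaluates to 6, which is the approximate base installation footprint described above.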

7.1.2 Long Running System Requirements

While the installation requirements are useful, the true management aspect arises from a system under load for a significant period of time. The following tables show fine grained disk requirements for systems running continuous workloads (the so called “24 x 7” workloads) for months.

Volume Size in MB

Volume     Central Server 1  Central Server 2  Central Server 3  Central Server 4  Region Server
/bin/      10                10                8                 10                10
/boot/     27                27                27                27                27
/data/     11273             -                 -                 -                 1820
/drouter/  -                 -                 23403             -                 -
/etc/      35                35                35                35                36
/home/     131153            24                1                 1                 23
/iaas/     8                 -                 -                 -                 7
/lib/      138               146               135               140               129
/lib64/    28                32                28                28                28
/opt/      2738              4075              1203              6444              611
/root/     6                 2                 2                 2                 1699
/sbin/     15                18                15                15                18
/tmp/      4                 142               27                157               67
/usr/      3250              3587              3048              3062              3556
/var/      521               399               672               186               3908

Figure 26: Long Running System Requirements: System A

Volume Size in MB

Volume     Central Server 1  Central Server 2  Central Server 3  Central Server 4  Region Server
/bin/      10                10                8                 10                10
/boot/     27                27                27                27                27
/data/     11273             -                 -                 -                 1820
/drouter/  -                 -                 37263             -                 -
/etc/      35                35                35                35                36
/home/     173042            3154              1                 1                 8240
/iaas/     8                 -                 -                 -                 7
/lib/      138               146               135               140               129
/lib64/    28                32                28                28                28
/opt/      2738              6856              1204              6503              611
/root/     2                 2                 2                 2                 447
/sbin/     15                18                15                15                18
/tmp/      67                142               90                280               152
/usr/      3250              3588              3048              3062              3556
/var/      179               13653             11406             1076              540

Figure 27: Long Running System Requirements: System B

This fine grained information is useful, but also a bit overwhelming. Let us look at a summary view relative to the installation free space requirements. Please keep in mind the free space requirements are typically 6GB off from the overall (hardware) requirement, but we consider the finer grained values more useful for comparison purposes.

Volume Management (GB)

Server            Volume   Install Free Space  System A Utilization  System B Utilization
Central Server 1  ‘/’      75                  18                    17
Central Server 1  ‘/home’  19                  128                   169
Central Server 2  ‘/’      55                  8                     24
Central Server 2  ‘/home’  30                  < 1                   3
Central Server 3  ‘/’      70                  28                    52
Central Server 3  ‘/home’  4                   < 1                   < 1
Central Server 4  ‘/’      40                  10                    11
Central Server 4  ‘/home’  4                   < 1                   < 1
Region Server     ‘/’      40                  12                    7
Region Server     ‘/home’  30                  < 1                   8

Figure 28: Long Running System Requirements Summary
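The summary values appear to follow from the fine-grained tables. As a sketch, assuming the ‘/’ figure is the sum of all listed volumes other than /home and that 1 GB is taken as 1024 MB, the Central Server 1 row for System A can be reproduced as follows. The exact method used by the authors is not stated, so this is an inference rather than a documented procedure.

```python
# Central Server 1, System A: volume sizes in MB from Figure 26.
CS1_SYSTEM_A_MB = {
    "/bin/": 10, "/boot/": 27, "/data/": 11273, "/etc/": 35, "/home/": 131153,
    "/iaas/": 8, "/lib/": 138, "/lib64/": 28, "/opt/": 2738, "/root/": 6,
    "/sbin/": 15, "/tmp/": 4, "/usr/": 3250, "/var/": 521,
}


def summarize_gb(volumes_mb):
    """Collapse per-volume MB figures into '/' and '/home' utilization in GB."""
    home_mb = volumes_mb.get("/home/", 0)
    root_mb = sum(size for name, size in volumes_mb.items() if name != "/home/")
    return {"/": round(root_mb / 1024), "/home": round(home_mb / 1024)}
```

With these inputs `summarize_gb(CS1_SYSTEM_A_MB)` returns 18 GB for ‘/’ and 128 GB for ‘/home’, matching the System A values for Central Server 1 in Figure 28.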

The summary view, in the context of the installation free space requirements, shows some surprising results.

The installation requirements are generally overstated. While there is some factoring for maintaining large installation bundles, the values ensure long term operational health (with some exceptions, described below).

For the Central Server 1, the ‘/data’ directory actually contains ~11GB which includes the RHEL ISO files required for installation.


Notable issues are highlighted in bold and orange and described below.

o The Central Server 1 ‘/home’ volume is clearly out of control on both System A and System B. This is actually an error logging issue, and is described in the following section.

o The ‘/home’ on the System B Region Server is showing greater than expected utilization. This is associated with the Virtual Image Library management and is considered within the recommended allocation.

Not all file systems are enumerated in the interests of brevity. These file systems can generally be considered noise, contributing on the order of a handful of megabytes per server. The one exception to this is the ‘/install’ file system where most notably it consumes 20 GB on the System A region server and 61 GB on the System B region server.

It should be noted these results are for a specific installation. As always, different installations may have different requirements based on usage. For example, images used for the Virtual Image Library on Central Server 2 can contribute significantly to utilization. Volume monitoring is always recommended as a best practice.

Central Server 1 Error Logging Issue

A core question is: why is the Central Server 1 ‘/home’ utilization so high? The simple answer is that, for the systems in question, a program error is generating massive log entry activity into the database. For example, the PDWDB database log entries alone are consuming 147 GB (87% of the overall space)!

Is this normal? Absolutely not. A specific program error was triggered in our environment, and suitable fixes have been put in place.

The following section provides a brief summary of the SCO 2.3 database structure, archive logging, and some recommended database management approaches (including online backup management).


7.2 The SmartCloud Orchestrator Database and Schema Summary

The SCO DB2 databases typically run under the default instance of DB2INST1. The following table summarizes the individual SCO databases.

Database: BPMDB, CMNDB, PDWDB
Schema(s): BPMUser
Comments: Business Process Manager (BPM) databases.

Database: OPENSTAC
Schema(s): CIRnnnnn, GLEnnnnn, NOAnnnnnn, SCEnnnnnn, KSDB
Comments: OpenStack database. Note the “nnnnn” schema suffix is variable per region.

Database: RAINMAKE
Schema(s): DB2INST1
Comments: IBM Workload Deployer (IWD) database. Uses the default schema for the database instance (in this case, DB2INST1).

Database: STORHOUS
Schema(s): DB2INST1
Comments: IBM Workload Deployer (IWD) database. Uses the default schema for the database instance (in this case, DB2INST1).

Figure 29: Database and Schema Summary

7.3 Database Management

Generally speaking, the “out of the box” database configuration will achieve good results for both large and small installations. The following recommendations are primarily in the area of database maintenance.

7.3.1 DBMS Versions

The following DBMS versions are recommended. All versions should be 64 bit.

Version                 Notes
DB2 10.1 fp3 or later   DB2 10.5 and later versions are not currently supported.

Figure 30: DBMS Versions

7.3.2 Automatic Maintenance

DB2 offers a number of automatic maintenance options. Automatic statistics collection (runstats) is considered a basic and necessary configuration setting, and is enabled for the product by default. Two other recommended configuration settings follow. It is expected these configuration settings will be enabled by default in future versions of the product.

1. Real time statistics. The default runstats configuration generally collects statistics at two hour intervals. The real time statistics option provides far more granular statistics collection, essentially generating statistics as required at statement compilation time.

2. Automatic reorganization. Many customers ignore database reorganization and system performance starts to decline. This can be especially critical in the cloud space. The recommendation is to enable automatic reorganization support so it is self managed by the DBMS. Further discussion of database reorganization is covered in section 7.4.3.

The following commands may be used to enable these automatic maintenance options. At the time of this writing, they are conditionally recommended. Each of these options has runtime impact and should be monitored to ensure there is no unnecessary system impact. In order to facilitate this, they should only be enabled once the system has been established and monitored. In addition, automatic reorganization is dependent on the definition of a maintenance window (see the DB2 Information Center for more detail).

update db cfg for OPENSTAC using AUTO_STMT_STATS ON

update db cfg for OPENSTAC using AUTO_REORG ON

Figure 31: Database Automatic Maintenance Configuration

7.3.3 Operating System Configuration (Linux)

The product installation guides have comprehensive instructions for Operating System pre-requisites and configuration. However, on Linux systems improper configuration is common, so we will highlight specific issues.

The first configuration point to check is the ulimit for the maximum number of open files allowed for a process (i.e. nofiles). The value for this limit should be either “unlimited” or “65536”. Refer to the DB2 documentation for this configuration setting.
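The open-files check described above can be scripted. The following is a minimal sketch, assuming it is run as the DB2 instance owner on the database server; the function name and the decision to accept any value at or above 65536 are our own assumptions, not part of the product.

```python
import resource

# Value quoted in the text above; values above it are accepted here by assumption.
RECOMMENDED_NOFILE = 65536


def nofile_ok(soft_limit):
    """Return True when the soft open-files limit meets the guidance."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # reported as "unlimited"
    return soft_limit >= RECOMMENDED_NOFILE


if __name__ == "__main__":
    # Limits of the current process, which inherits them from the login shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft=%s hard=%s ok=%s" % (soft, hard, nofile_ok(soft)))
```

The check reads the limits of the calling process only, so a DB2 instance started from a different session or init script may run with other values.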

In addition, the kernel semaphore and message queue specifications should be correct. These configuration settings are a function of the physical memory available on the machine. Refer to the DB2 documentation for these configuration settings.

7.4 Database Hygiene Overview

The database hygiene overview covers the following steps:

1. Database backup management.

2. Database statistics management.

3. Database reorganization.

4. Database archive management.

5. Database maintenance automation.

Steps make reference to recommended scheduling frequencies. The general purpose “cron” scheduling utility may be used to achieve this. However, other scheduling utilities may also be used. The key aspect of a cron’ed activity is it is scheduled at regular intervals (e.g. nightly, weekly) and typically does not require operator intervention. Designated maintenance windows may be used for these activities.

7.4.1 Database Backup Management

It is recommended that nightly database backups be taken. The following figures offer a sample database offline backup (utilizing compression), along with a sample restore.

backup db <dbname> user <user> using <password> to <backup directory> compress

Figure 32: Database Backup with Compression Command

restore db <dbname> from <backup directory> taken at <timestamp> without prompting

Figure 33: Database Offline Backup Restore

Online backups may be utilized as well. The following figure provides commands that comprise a sample weekly schedule. With the given schedule, the best case scenario is a restore requiring one image to restore (Monday failure using the Sunday night backup). The worst case scenario would require four images (Sunday + Wednesday + Thursday + Friday). An alternate approach would be to utilize a full incremental backup each night to make the worst case scenario two images. The tradeoffs for the backup approaches are the time to take the backup, the amount of disk space consumed, and the restore dependencies. A best practice can be to start with nightly full online backups, and introduce incremental backups if time becomes an issue.

(Sun) backup db <dbname> online include logs use tsm

(Mon) backup db <dbname> online incremental delta use tsm

(Tue) backup db <dbname> online incremental delta use tsm

(Wed) backup db <dbname> online incremental use tsm

(Thu) backup db <dbname> online incremental delta use tsm

(Fri) backup db <dbname> online incremental delta use tsm

(Sat) backup db <dbname> online incremental use tsm

Figure 34: Database Online Backup Schedule
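The image counts quoted for this schedule can be checked with a small sketch. It assumes a cumulative incremental depends on the most recent full backup, an incremental delta depends on the most recent backup of any type, and the schedule starts with a full backup. The function and variable names are illustrative only.

```python
# Weekly schedule from the figure above: (day, backup type).
SCHEDULE = [
    ("Sun", "full"),
    ("Mon", "delta"),
    ("Tue", "delta"),
    ("Wed", "incremental"),
    ("Thu", "delta"),
    ("Fri", "delta"),
    ("Sat", "incremental"),
]


def restore_chain(schedule, target_day):
    """Return the days whose images are needed to restore target_day, oldest first.

    Assumes the first entry of the schedule is a full backup.
    """
    days = [day for day, _ in schedule]
    idx = days.index(target_day)
    chain = []
    # Each delta depends on the backup taken immediately before it.
    while schedule[idx][1] == "delta":
        chain.append(schedule[idx][0])
        idx -= 1
    chain.append(schedule[idx][0])
    # A cumulative incremental additionally depends on the last full backup.
    if schedule[idx][1] == "incremental":
        while schedule[idx][1] != "full":
            idx -= 1
        chain.append(schedule[idx][0])
    return list(reversed(chain))


if __name__ == "__main__":
    for day, _ in SCHEDULE:
        print(day, restore_chain(SCHEDULE, day))
```

Running the sketch over every day of the week shows how the restore dependencies grow between full backups, which is the tradeoff discussed above.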

Note to enable incremental backups, the database configuration must be updated to track page modifications, and a full backup taken in order to establish a baseline.

update db cfg for OPENSTAC using TRACKMOD YES

Figure 35: Database Incremental Backup Enablement

To restore the online backups, either a manual or automatic approach may be used. For the manual approach, you must start with the target image, and then revert to the oldest relevant backup and move forward to finish with the target image. A far simpler approach is to use the automatic option and let DB2 manage the images. A sample of each approach is provided below, showing the restore based on the Thursday backup.

restore db <dbname> incremental use tsm taken at <Thursday incremental delta timestamp>

restore db <dbname> incremental use tsm taken at <Sunday full timestamp>

restore db <dbname> incremental use tsm taken at <Wednesday incremental timestamp>

restore db <dbname> incremental use tsm taken at <Thursday incremental delta timestamp>

Figure 36: Database Online Backup Manual Restore

restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta timestamp>

Figure 37: Database Online Backup Automatic Restore

In order to support online backups, archive logging must be enabled. The next subsection provides information on archive logging, including the capability to restore to a specific point in time using a combination of database backups and archive logs.

Database Log Archiving

A basic approach we will advocate is archive logging with the capability to support online backups. The online backups themselves may be full, incremental (cumulative changes since the last full backup), or incremental delta (changes since the last successful backup of any type). In order to enable log archiving to a location on disk, the following command may be used.

update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive

Figure 38: Database Log Archiving to Disk

Alternatively, in order to enable log archiving to TSM, the following command may be used (see footnote 5).

update db cfg for <dbname> using logarchmeth1 TSM

Figure 39: Database Log Archiving to TSM

Note that a “logarchmeth2” configuration parameter also exists. If both of the log archive method parameters are set, each log file is archived twice (once per log archive method configuration setting). This will result in two copies of archived log files in two distinct locations (a useful feature based on the resiliency and availability of each archive location).

Once the online backups and log archive(s) are in effect, the recovery of the database may be performed via a database restore followed by a roll forward through the logs. Several restore options were described earlier in this section. Once the restore has been completed, roll forward recovery must be performed. The following are sample roll forward operations.

5 The log archive methods (logarchmeth1, logarchmeth2) have the ability to associate configuration options with them (logarchopt1, logarchopt2) for further customization.

rollforward db <dbname> to end of logs

Figure 40: Database Roll Forward Recovery: Sample A

rollforward db <dbname> to 2012-02-23-14.21.56 and stop

Figure 41: Database Roll Forward Recovery: Sample B

It is worth noting the second example recovers to a specific point in time. For a comprehensive description of the DB2 log archiving options, consult the DB2 Information Center. A service window (i.e. stopping the application) is typically required to enable log archiving.
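When the point-in-time roll forward is scripted, the timestamp has to be rendered in the same form as Sample B. The following is a minimal sketch with illustrative helper names. Note that DB2 treats the value as UTC unless the command specifies local time, so confirm the time zone before use.

```python
from datetime import datetime


def db2_timestamp(moment):
    """Format a datetime in the YYYY-MM-DD-HH.MM.SS form used in Sample B."""
    return moment.strftime("%Y-%m-%d-%H.%M.%S")


def rollforward_command(dbname, moment):
    """Build the point-in-time roll forward command text for a database."""
    return "rollforward db %s to %s and stop" % (dbname, db2_timestamp(moment))


if __name__ == "__main__":
    print(rollforward_command("OPENSTAC", datetime(2012, 2, 23, 14, 21, 56)))
```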

Database Backup Cleanup

Unless specifically pruned, database backups may accumulate and cause issues with disk utilization or, potentially, a stream of failed backups. If unmonitored backups begin to fail, it may make disaster recovery near impossible in the event of a hardware or disk failure. A simple manual method to prune backups follows.

find /backup/DB2 -type f -mtime +7 -print0 | xargs -0 rm -f

Figure 42: Database Backup Cleanup Command
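Before deleting anything, it can be useful to list what a manual prune would remove. The following sketch is a dry run that roughly mirrors the selection made by the command above; the function name is illustrative, and the age test is a simple cutoff rather than an exact reproduction of the "-mtime" rounding rules.

```python
import os
import time


def files_older_than(root, days, now=None):
    """Yield regular files under root last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip anything that is not a plain file (for example broken links).
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                yield path


if __name__ == "__main__":
    # Dry run only: print the candidates, delete nothing.
    for candidate in files_older_than("/backup/DB2", 7):
        print(candidate)
```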

A superior approach is to let DB2 automatically prune the backup history and delete your old backup images and log files. A sample configuration is provided below.

update db cfg for OPENSTAC using AUTO_DEL_REC_OBJ ON

update db cfg for OPENSTAC using NUM_DB_BACKUPS 21

update db cfg for OPENSTAC using REC_HIS_RETENTN 180

Figure 43: Database Backup Automatic Cleanup Configuration

It is also generally recommended to have the backup storage independent from the database itself. This provides a level of isolation in the event volume issues arise (e.g. it ensures that a backup operation will not fill the volume hosting the tablespace containers, which could possibly lead to application failures).

7.4.2 Database Statistics Management

As discussed in the previous “Automatic Maintenance” section, database statistics ensure that the DBMS optimizer makes wise choices for database access plans. The DBMS is typically configured for automatic statistics management. However, it may often be wise to force statistics as part of a nightly or weekly database maintenance operation. A simple command to update statistics for all tables in a database is the “reorgchk” command.

reorgchk update statistics on table all

Figure 44: Database Statistics Collection Command

One issue with the reorgchk command is it does not enable full control over statistics capturing options. For this reason, it may be beneficial to perform statistics updates on a table by table level. However, this can be a daunting task for a database with hundreds of tables. As a result, the following SQL statement may be used to generate administration commands on a table by table basis.

select 'runstats on table ' || STRIP(tabschema) || '.' || tabname || ' with distribution and detailed indexes all;' from SYSCAT.TABLES where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');

Figure 45: Database Statistics Collection Table Iterator
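Because the schema suffix varies per region, the schema list in the iterator has to be adapted for each installation. The following sketch builds the statement text from a list of schema names. The helper name and the example schema names are hypothetical, and the added restriction to base tables (type = 'T') is our own assumption rather than part of the statement above.

```python
import re

# Only plain identifiers are accepted, so the names can be quoted safely.
_SCHEMA_RE = re.compile(r"^[A-Za-z0-9_]+$")


def runstats_generator_sql(schemas):
    """Return SQL that generates one runstats command per table."""
    for schema in schemas:
        if not _SCHEMA_RE.match(schema):
            raise ValueError("unexpected schema name: %r" % schema)
    in_list = ",".join("'%s'" % schema.upper() for schema in schemas)
    return (
        "select 'runstats on table ' || STRIP(tabschema) || '.' || tabname || "
        "' with distribution and detailed indexes all;' "
        "from SYSCAT.TABLES where tabschema in (%s) "
        "and type = 'T'" % in_list  # assumption: skip views and aliases
    )


if __name__ == "__main__":
    # Hypothetical region schema names, for illustration only.
    print(runstats_generator_sql(["CIR00001", "GLE00001", "KSDB"]))
```

The generated statement would typically be run through the DB2 command line processor, with its output saved to a file and executed as a second step.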

7.4.3 Database Reorganization

Over time, the space associated with database tables and indexes may become fragmented. Reorganizing the table and indexes may reclaim space and lead to more efficient space utilization and query performance. In order to achieve this, the table reorganization command may be used. Note, as discussed in the previous “Automatic Maintenance” section, automatic database reorganization may be enabled to reduce the requirement for manual maintenance.

The following commands are examples of running a “reorg” on a specific table and its associated indexes. Note the “reorgchk” command previously demonstrated reports a per-table indicator of which tables require a reorg. Using the results of “reorgchk”, per-table reorganization may be performed for optimal database space management and usage.

reorg table <table name> allow no access

reorg indexes all for table <table name> allow no access

Figure 46: Database Reorganization Commands

It is important to note there are many options and philosophies for doing database reorganization. Every enterprise must establish its own policies based on usage, space considerations, performance, etc. The above example is an offline reorg. However it is possible to also do an online reorg via the “allow read access” or “allow write access” options. The “notruncate” option may also be specified (indicating the table will not be truncated in order to free space). The “notruncate” option permits more relaxed locking and greater concurrency (which may be desirable if the space usage is small or will soon be reclaimed). If full online access during a reorg is required, the “allow write access” and “notruncate” options are both recommended.

Note it is also possible to use our table iteration approach to do massive reorgs across hundreds of tables as shown in the following figure. The DB2 provided snapshot routines and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status of reorg operations.

select 'reorg table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;' from SYSCAT.TABLES where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');

select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;' from SYSCAT.TABLES where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');

Figure 47: Database Reorganization Table Iterator

7.4.4 Database Archiving

Database archiving is the act of removing unnecessary or obsolete information in order to preserve optimum performance. The intent is to keep table cardinality manageable so that query performance is stable, and to minimize IO overhead. The following graph shows the real world impact of proper database archiving.

Figure 48: Database Archiving Impact

The graph shows provisioning service times pre and post archiving. For the pre archiving interval, not only are the average service times much higher (dark blue line), but the distribution of service times is much wider (series of cyan data points). Once the archiving is implemented, the service times are extremely stable with a much narrower time distribution.

In order to achieve database archiving, an archive script and associated documentation are provided with this paper (see “ArchiveScripts.zip” and footnote 6). The archiving is an OpenStack function and copies the historical content to a shadow database (implying the data is still available and online). It is recommended the database archiving be part of a scheduled maintenance activity via the crontab (see the next section for details).

6 The archive scripts are also part of the SCO 2.3.0.1 distribution.

7.4.5 Database Maintenance Automation

For standard database maintenance, it is advisable to automate the scheduling and execution of the maintenance activities via the crontab. The following table shows a sample schedule for the maintenance operations for the relevant SCO databases.

Database   Statistics   Reorgs      Archiving
STORHOUS   Sunday       Saturday
PDWDB      Tuesday      Monday
BPMDB      Wednesday    Tuesday
OPENSTAC   Monday       Sunday      Saturday
RAINMAKE   Thursday     Wednesday
CMNDB      Friday       Thursday

Figure 49: Sample Database Maintenance Schedule

The following example demonstrates maintenance activities on the OPENSTAC database. Similar examples are provided with this paper via the “CrontabScripts.zap” attachment. In general, the sample cron entries schedule activities in disjoint time windows throughout the week. This serves to provide fully online maintenance operations with minimal impact.

# Run runstats and reorgchk for the OPENSTAC database
0 2 * * Mon db2inst1 /home/db2inst1/tools/gen_runstats.sh OPENSTAC /home/db2inst1/tools

30 2 * * Sun db2inst1 /home/db2inst1/tools/gen_reorg.sh OPENSTAC /home/db2inst1/tools

Figure 50: Sample Database Maintenance Crontab Entry
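Entries of the same shape can be generated for the other databases in the sample schedule. The following sketch assumes every database reuses the 02:00 and 02:30 start times and the script locations shown for OPENSTAC; those times are an assumption for the other databases and should be staggered to suit the installation.

```python
TOOLS_DIR = "/home/db2inst1/tools"

# (database, statistics day, reorg day), taken from the sample schedule.
SCHEDULE = [
    ("STORHOUS", "Sun", "Sat"),
    ("PDWDB", "Tue", "Mon"),
    ("BPMDB", "Wed", "Tue"),
    ("OPENSTAC", "Mon", "Sun"),
    ("RAINMAKE", "Thu", "Wed"),
    ("CMNDB", "Fri", "Thu"),
]


def cron_lines(db, stats_day, reorg_day, user="db2inst1", tools=TOOLS_DIR):
    """Return the runstats and reorg crontab lines for one database."""
    return [
        "0 2 * * %s %s %s/gen_runstats.sh %s %s" % (stats_day, user, tools, db, tools),
        "30 2 * * %s %s %s/gen_reorg.sh %s %s" % (reorg_day, user, tools, db, tools),
    ]


if __name__ == "__main__":
    for db, stats_day, reorg_day in SCHEDULE:
        print("# Maintenance for %s" % db)
        for line in cron_lines(db, stats_day, reorg_day):
            print(line)
```

The lines include a user field, so they follow the system crontab layout used in the sample entry rather than a per-user crontab.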

8 Summary Cookbook

The following tables provide a cookbook for the solution implementation. The cookbook approach implies a set of steps the reader may “check off” as completed to provide a stepwise implementation of the SCO solution. The recommendations will be provided in three basic steps:

1. Base installation recommendations.

2. Post installation recommendations.

3. High scale recommendations.

All recommendations are provided in tabular format. The preferred order of implementing the recommendations is in order from the first row of the table through to the last.

8.1 Base Installation Recommendations

The base installation recommendations are considered essential to a properly functioning SCO instance. All steps should be implemented.

Identifier Description Status

B1 Perform the base SCO installation, ensuring the recommended configuration described in Section 5.2 is achieved.

A central DB2 server should be used (i.e. the region servers should not manage a local DBMS unless there are compelling geographic considerations). Where possible it is recommended to install the DBMS on bare metal, or in a DBA managed pool, to facilitate performance management.

B2 Enable the Keystone memcached implementation (Section 6.1).

B3 Enable the OpenStack Keystone worker support (Section 6.2).

B4 Enable the IaaS Gateway cluster support (Section 6.3).

B5 Optimize the IWD component (Section 6.4).

B6 Configure the Linux IO scheduler (Section 6.5).

B7 Disable the ACPI management (Section 6.6).

B8 Ensure the Java heaps are optimized (Section 6.7).

B9 Configure the central database (Section 6.8).

B10 Configure the database server Linux instance per section 7.3.3.

Figure 51: Base Installation Recommendations

8.2 Post Installation Recommendations

The post installation recommendations will provide additional throughput and superior functionality. All steps should be implemented.

Identifier Description Status

P1 Perform a set of infrastructure and SCO benchmarks to determine the viability of the installation (see Sections 4.2 and 4.3).

P2 Implement the database statistics maintenance activity per Sections 7.4.2 and 7.4.5.

P3 Implement the database reorg maintenance activity per Sections 7.4.3 and 7.4.5.

P4 Implement the database archiving maintenance activity per Sections 7.4.4 and 7.4.5.

P5 Implement a suitable backup and disaster recovery plan comprising regular backups of all critical server components (including the database and relevant file system objects). Guidelines are provided in the SCO Information Center.

Figure 52: Post Installation Recommendations

8.3 High Scale Recommendations

The high scale recommendations should be incorporated once the production installation needs to support the high water mark for scalability. All steps are optional and may be implemented over time based upon workload.

Identifier Description Status

S1 Apply the latest SCO fixpack.

S2 Monitor the performance of the installation (Section 4.1) and adjust the management server to the recommended installation values (Section 5.2) as appropriate.

S3 Optimize Central Server 1 (DBMS) performance. A basic way to achieve this is to have dedicated, high performance storage allocated to the database containers and logs.

Figure 53: High Scale Recommendations

APPENDIX A: SMARTCLOUD ORCHESTRATOR MONITORING OPTIONS

Monitoring is important to understand and ensure the health of any cloud solution. A number of monitoring approaches are available for SCO. The solutions are described via the following summary sections, broken down into three categories.

1. OpenStack monitoring via Ceilometer.

2. SCO monitoring via IBM BPM.

3. Infrastructure monitoring via IBM Tivoli Monitoring (ITM) and third party solutions.

A separate appendix is provided that is specific to OpenStack Keystone monitoring.

A.1 OpenStack Monitoring

OpenStack monitoring is provided via the Ceilometer component. Ceilometer offers a comprehensive and customizable infrastructure, including support for event and threshold management. Note that while Ceilometer is not part of the base SCO 2.3 distribution, it is a constituent of the OpenStack Grizzly base, with continued enhancement in subsequent OpenStack releases.

Ceilometer provides three distinct types of metrics:

1. Cumulative: counters that accumulate or increase over time.

2. Gauge: counters that offer discrete, point in time values.

3. Delta: differential counters showing change rates.
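The relationship between the metric types can be illustrated with a short sketch. It derives per-interval deltas and rates from a series of cumulative samples; the sample values and function names are made up for illustration and are not Ceilometer code.

```python
def deltas(samples):
    """Turn (timestamp, cumulative value) samples into per-interval deltas."""
    return [(t1, v1 - v0) for (_t0, v0), (t1, v1) in zip(samples, samples[1:])]


def rates(samples):
    """Turn (timestamp, cumulative value) samples into per-second rates."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


if __name__ == "__main__":
    # Hypothetical cumulative byte counter sampled every 10 seconds.
    cumulative = [(0, 100), (10, 150), (20, 250)]
    print(deltas(cumulative))
    print(rates(cumulative))
```

A gauge, by contrast, needs no such conversion because each sample already stands on its own.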

A vast array of metrics is provided by Ceilometer. An easy way to interactively derive the set of available metrics is to query Ceilometer directly (see the sample below). In addition, the Ceilometer documentation lists the default set, with associated attributes.

ceilometer meter-list -s openstack

Figure 54: OpenStack Ceilometer Metrics

The following table provides a core set of recommended monitoring points for OpenStack. A broader set may of course be used.

Component: Nova (Compute Node Management)
Meters: cpu_util, disk.read.requests.rate, disk.write.requests.rate, disk.read.bytes.rate, disk.write.bytes.rate, network.incoming.bytes.rate, network.outgoing.bytes.rate, network.incoming.packets.rate, network.outgoing.packets.rate
The following counters require enablement: compute.node.cpu.kernel.percent, compute.node.cpu.idle.percent, compute.node.cpu.user.percent, compute.node.cpu.iowait.percent

Component: Neutron (Network Management)
Meters: network.create, network.update, subnet.create, subnet.update

Component: Glance (Image Management)
Meters: image.update, image.upload, image.delete

Component: Cinder (Volume Management)
Meters: volume.size

Component: Swift (Object Storage Management)
Meters: storage.objects, storage.objects.size, storage.objects.containers, storage.objects.incoming.bytes, storage.objects.outgoing.bytes

Component: Heat (Orchestration)
Meters: stack.create, stack.update, stack.delete, stack.suspend, stack.resume

Figure 55: OpenStack Ceilometer Core Metrics

In addition, Ceilometer provides a REST API that allows cloud administrators to record KPIs. For instance, infrastructure metrics could be placed in Ceilometer with an HTTP POST request. As Ceilometer includes a data store, as well as some basic statistical functionality, it is a candidate for an integration point for cloud monitoring data.
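As an illustration of such a request, the following sketch builds a JSON body for one sample without sending it. The field names follow the Ceilometer v2 sample representation as we understand it, and the body would be posted to a meter-specific path such as "/v2/meters/<meter name>"; both should be verified against the API reference for the release in use. The meter name and resource identifier are hypothetical.

```python
import json

VALID_TYPES = ("gauge", "delta", "cumulative")


def sample_payload(name, volume, unit, resource_id, counter_type="gauge"):
    """Return a JSON list containing one sample for a custom meter."""
    if counter_type not in VALID_TYPES:
        raise ValueError("unknown counter type: %r" % counter_type)
    sample = {
        "counter_name": name,          # assumed field names, see lead-in
        "counter_type": counter_type,
        "counter_unit": unit,
        "counter_volume": volume,
        "resource_id": resource_id,
    }
    return json.dumps([sample])


if __name__ == "__main__":
    # Hypothetical infrastructure KPI: free space on a region server volume.
    print(sample_payload("infra.volume.free", 61.0, "GB", "region-server-b"))
```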

A.2 SmartCloud Orchestrator Monitoring

SCO monitoring should be employed to address the solution layer “above” OpenStack. The primary mechanism for SCO monitoring is enablement of the BPM performance data warehouse (relevant information is available in the References section; see also footnote 7). The performance data warehouse may be enabled via “autotracking”, which will enable both custom KPIs as well as the default total time KPIs. The core KPIs to understand BPM capability are:

BPM processes executed per second.

Average service times per BPM process.

It is important to note that given Ceilometer provides a general plugin and distribution infrastructure, it may be combined with the SCO monitoring solution. A sample approach for managing these monitoring points follows.

1. Derive a BPM plugin to retrieve raw times from the BPM performance data warehouse (PDWDB) database. The preferred method is the provided REST interface (versus direct database access).

2. Perform calculations based on the raw data. For example, converting a series of milestones into performance KPIs, or calculating statistical quantities (e.g. standard deviation, harmonic mean).

3. Push the results to Ceilometer as the meter distribution mechanism.

4. Read the results via the Ceilometer REST API and display in the visualization tool of your choice.
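Step 2 of the approach above can be sketched as follows. The function takes raw service times for one BPM process (already retrieved in step 1) and the length of the observation window, and derives the core KPIs. The function name, the input shape, and the choice of the population standard deviation are assumptions made for illustration.

```python
import statistics


def service_time_kpis(durations_s, window_s):
    """Derive throughput and service-time KPIs from raw durations in seconds."""
    return {
        "processes_per_second": len(durations_s) / window_s,
        "mean": statistics.mean(durations_s),
        "harmonic_mean": statistics.harmonic_mean(durations_s),
        "stdev": statistics.pstdev(durations_s),
    }


if __name__ == "__main__":
    # Hypothetical service times observed over a 10 second window.
    for name, value in service_time_kpis([1.0, 2.0, 4.0], 10.0).items():
        print("%s=%.4f" % (name, value))
```

The resulting values are the quantities that would be pushed to Ceilometer in step 3.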

A.3 Infrastructure Monitoring

Infrastructure monitoring can address the operating system and hypervisor health of the cloud. Available tools include IBM Tivoli Monitoring (ITM) or the open source offering Nagios. For example, ITM v6.2 provides the following infrastructure monitoring agents.

1. IBM Tivoli Monitoring Endpoint.

2. Linux OS.

3. UNIX Logs.

4. UNIX OS.

5. Windows OS.

6. i5/OS®.

7. IBM Tivoli Universal Agent.

7 It is worth noting that BPM is built on IBM WebSphere and as a result, WebSphere monitoring capabilities also apply.

8. Warehouse Proxy.

9. Summarization and Pruning.

10. IBM Tivoli Performance Analyzer.

Critical KPIs to monitor at the infrastructure level are summarized in the following table (VMware is provided as a representative hypervisor sample).

Component: Operating System
Meters:
  CPU utilization including kernel, user, IO wait, and idle times.
  Disk utilization including read/write request and byte rates.
  Network utilization including incoming and outgoing packet and byte rates.
  Volume free space across the central and region servers. Special attention should be paid to the Virtual Image Library on Central Server 2 to ensure the “/home/library” space is well managed.

Component: DBMS: ITM for DB2
Meters:
  Application IO activity workspace.
  Application lock activity workspace.
  Application overview workspace.
  Buffer Pool workspace.
  Connection workspace.
  Database workspace.
  Database Lock Activity workspace.
  Historical Summarized Capacity Weekly workspace.
  Historical Summarized Performance Weekly workspace.
  Locking Conflict workspace.
  Tablespace workspace.

Component: Application Server: ITCAM Agent for WebSphere Applications
Meters:
  WebSphere Agent Summary workspace.
  Application Server Summary workspace.

Component: J2EE: ITCAM Agent for J2EE
Meters:
  Application Health Summary workspace.

Component: HTTP: ITCAM Agent for HTTP Servers
Meters:
  Web Server Agent workspace.

Component: Hypervisor: ITM for Virtual Environments
Meters:
  Server workspace.
  CPU workspace.
  Disk workspace.
  Memory workspace.
  Network workspace.
  Resource Pools workspace.
  Virtual Machines workspace.

Component: Hypervisor: VMware esxtop sample
Meters:
  CPU: Run (%RUN), Wait (%WAIT), Ready (%RDY), Co-Stop (%CSTP).
  Network: Dropped packets (%DRPTX, %DRPRX).
  IO: Latency (DAVG, KAVG), Queue length (QUED).
  Memory: Memory reclaim (MCTLSZ), Swap (SWCUR, SWR/s, SWW/s).

Figure 56: Infrastructure Core Metrics

APPENDIX B: OPENSTACK KEYSTONE MONITORING

The Keystone component is critical to overall performance of SmartCloud Orchestrator. For example, if one component saturates Keystone, the overall throughput of the system will be impacted. This is magnified by the fact that Keystone has only a single execution thread instance. In order to understand Keystone performance, the best method is to look at the requests and responses via a proxy such as the IaaS Gateway. This provides the ability to see requests that are dropped before being processed by Keystone.

We will describe an approach for monitoring Keystone via the PvRequestFilter.

B.1 PvRequestFilter

The PvRequestFilter was designed to output request and response data into the Keystone log. When enabled it prints the data as warning messages, so it is not necessary to turn up the default debug level to generate the log messages.

The format of the messages is as follows. All fields except “<duration>” are printed out for both requests and responses. The duration of the request is printed only for the response.

WARNING [REQUEST|RESPONSE] <millisecond timestamp to identify request> <REMOTE_ADDR>:<REMOTE_PORT> <REQUEST_METHOD> <RAW_PATH_INFO> [<duration>]

Figure 57: Keystone Monitoring PvRequestFilter Format

Sample output follows.

2014-07-21 17:16:56.509 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users

2014-07-21 17:16:56.785 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users 0.276294

2014-07-21 17:16:56.807 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains

2014-07-21 17:16:56.824 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains 0.017691

2014-07-21 17:16:56.839 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects

2014-07-21 17:16:56.868 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects 0.028558

Figure 58: Keystone Monitoring PvRequestFilter Sample Output
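As an illustration of how these messages can be consumed, the following Python sketch parses records in the Figure 57 format and lists requests that have no matching response in the log. It assumes that each record occupies a single line in keystone.log and that a response repeats the timestamp, client address, method, and path of its request, as the sample output suggests. The names Record, parse_line, and unanswered are illustrative only and are not part of the distribution provided with this paper.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    kind: str                  # "REQUEST" or "RESPONSE"
    request_id: str            # millisecond timestamp that identifies the request
    client: str                # "<REMOTE_ADDR>:<REMOTE_PORT>"
    method: str                # REQUEST_METHOD
    path: str                  # RAW_PATH_INFO
    duration: Optional[float]  # seconds; printed for responses only


# Assumed layout: the filter's fields are the last fields on the log line.
PATTERN = re.compile(
    r"\b(REQUEST|RESPONSE) (\S+) (\S+:\d+) ([A-Z]+) (\S+)(?: (\d+\.\d+))?\s*$"
)


def parse_line(line: str) -> Optional[Record]:
    """Return a Record for a PvRequestFilter line, or None for any other line."""
    match = PATTERN.search(line)
    if match is None:
        return None
    kind, request_id, client, method, path, duration = match.groups()
    return Record(kind, request_id, client, method, path,
                  float(duration) if duration else None)


def unanswered(lines: Iterable[str]) -> List[Record]:
    """Return the requests for which no matching response was logged."""
    pending = {}
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        key = (record.request_id, record.client, record.method, record.path)
        if record.kind == "REQUEST":
            pending[key] = record
        else:
            pending.pop(key, None)
    return list(pending.values())
```

A request that appears in this list may simply still be in flight when the log is read, so the result is a starting point for investigation rather than proof of a failure.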


B.2 Enabling PvRequestFilter

The process to enable PvRequestFilter follows.

1. Log onto Central Server 2.

2. Extract the distribution provided with this paper (keystoneStats.zap).

3. Install the filter and back up the existing configuration:

   ./deployKeystoneFilter.sh

4. Make the following changes to the “/etc/keystone/keystone.conf” file. Note: Reversing step 2 will disable the filter.

a. Add the following lines just above the line that starts with "[filter:debug]":

   [filter:pvt]
   paste.filter_factory = keystone.contrib.pvt_filter.request:PvtRequestFilter.factory

b. Add "pvt" to three of the pipeline statements, immediately before the final entry of each:

   [pipeline:public_api]
   pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension user_crud_extension pvt public_service

   [pipeline:admin_api]
   pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension crud_extension pvt admin_service

   [pipeline:api_v3]
   pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension pvt service_v3

c. Restart the Keystone service:

   service openstack-keystone restart

d. Validate that the “/var/log/keystone/keystone.log” is producing the appropriate log messages (sample below).

e. Update the “hosts.table” file to reflect your environment.

f. Run the workload or scenario for analysis.

g. Generate the statistics for the request and response data in the “keystone.log” file (sample below):

   ./keystoneStats.sh /var/log/keystone/keystone.log > results
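The pipeline change in step 4b can be sketched in Python as follows. In a Paste Deploy pipeline the last entry is the application and every earlier entry is a filter, so the new filter is placed immediately before the last entry. The function name add_filter_before_app is hypothetical; it is not part of Keystone or of the distribution provided with this paper, and the sketch only transforms a pipeline string rather than editing keystone.conf.

```python
def add_filter_before_app(pipeline: str, filter_name: str = "pvt") -> str:
    """Insert filter_name just before the final (application) entry.

    The call is idempotent: a pipeline that already contains the filter
    is returned unchanged apart from whitespace normalization.
    """
    parts = pipeline.split()
    if not parts:
        raise ValueError("pipeline is empty")
    if filter_name in parts:
        return " ".join(parts)
    # Everything except the last entry is a filter; the last entry is the app.
    return " ".join(parts[:-1] + [filter_name, parts[-1]])
```

For example, applying the function to a pipeline that ends in "user_crud_extension public_service" yields one that ends in "user_crud_extension pvt public_service", which matches the public_api statement in step 4b.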

Figure 59: Keystone Monitoring Log Messages Example


Figure 60: Keystone Monitoring Statistics Example
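The kind of summary that can be derived from the response records is sketched below in Python. This is not a reimplementation of keystoneStats.sh, whose logic is not reproduced in this paper; it is a minimal stand-in that groups RESPONSE durations by method and path and reports the count, mean, and maximum in seconds. Collapsing 32-character hexadecimal identifiers in the path to "<id>" is an assumption made here so that calls for different users or projects are grouped together.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed layout: RESPONSE records end with "<method> <path> <duration>".
RESPONSE = re.compile(r"\bRESPONSE \S+ \S+:\d+ ([A-Z]+) (\S+) (\d+\.\d+)\s*$")
HEX_ID = re.compile(r"[0-9a-f]{32}")


def summarize(lines: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group response durations (seconds) by (method, normalized path)."""
    samples = defaultdict(list)
    for line in lines:
        match = RESPONSE.search(line)
        if match is None:
            continue  # request records and unrelated log lines are skipped
        method, path, duration = match.groups()
        samples[(method, HEX_ID.sub("<id>", path))].append(float(duration))
    return {
        key: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "max": max(values),
        }
        for key, values in samples.items()
    }
```

Sorting the result by "max" or by "count" multiplied by "mean" is one way to find the calls that contribute most to Keystone load during the analyzed scenario.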


APPENDIX C: IAAS GATEWAY CLUSTER ENABLEMENT

The following steps are required to enable the IaaS Gateway cluster.

1. Prepare the HTTP server as a load balancer.

a. Ensure the HTTP server is installed.

i. Check whether there is already an HTTP server on Central Server 2:

   service httpd status

ii. If there is already an HTTP server, stop it with the following command:

   service httpd stop

iii. If no HTTP server is installed, install one with the following command:

   yum install httpd

b. Update the “httpd.conf” with the load balancer configuration.

i. Modify the file “/etc/httpd/conf/httpd.conf” with the following changes.

1. Update the listen port to the gateway port:

   # Listen 80
   Listen 9973

2. Append the load balancer configuration to the end of the file:

   <VirtualHost *:9973>
       ProxyRequests off
       <Proxy balancer://mycluster>
           # three node gateway cluster
           BalancerMember http://127.0.0.1:12001
           BalancerMember http://127.0.0.1:12002
           BalancerMember http://127.0.0.1:12003
           Order Deny,Allow
           Deny from none
           Allow from all
           ProxySet lbmethod=byrequests
       </Proxy>
       # path of requests to balance "/" -> everything
       ProxyPass / balancer://mycluster/
   </VirtualHost>

2. Prepare the configuration file for cluster members, by performing the following commands.


   cd /etc/iaasgateway/
   cp iaasgateway.conf iaasgateway00.conf
   vi iaasgateway00.conf

   # It should look like below before applying this fix:
   [service]
   iaasgateway_listen = <central-server-2-ip>
   iaasgateway_listen_port = 9973

   # Update it to:
   iaasgateway_listen = 127.0.0.1
   iaasgateway_listen_port = 1200X
   iaasgateway_user_entry = <central-server-2-ip>
   iaasgateway_user_entry_port = 9973

   # copy configure files and update port
   cp iaasgateway00.conf iaasgateway01.conf
   sed -i 's/1200X/12001/' iaasgateway01.conf
   cp iaasgateway00.conf iaasgateway02.conf
   sed -i 's/1200X/12002/' iaasgateway02.conf
   cp iaasgateway00.conf iaasgateway03.conf
   sed -i 's/1200X/12003/' iaasgateway03.conf

3. Prepare the init scripts and update the configuration file.

   cd /etc/init.d/
   cp openstack-iaasgateway openstack-iaasgateway01
   cp openstack-iaasgateway openstack-iaasgateway02
   cp openstack-iaasgateway openstack-iaasgateway03
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway01/' openstack-iaasgateway01
   sed -i 's/iaasgateway.conf/iaasgateway01.conf/' openstack-iaasgateway01
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway02/' openstack-iaasgateway02
   sed -i 's/iaasgateway.conf/iaasgateway02.conf/' openstack-iaasgateway02
   sed -i 's/prog=openstack-iaasgateway/prog=openstack-iaasgateway03/' openstack-iaasgateway03
   sed -i 's/iaasgateway.conf/iaasgateway03.conf/' openstack-iaasgateway03

4. Start up the cluster with the following commands.

   service openstack-iaasgateway stop
   Stopping openstack-iaasgateway: [ OK ]
   service openstack-iaasgateway01 start
   Starting openstack-iaasgateway01: [ OK ]
   service openstack-iaasgateway02 start
   Starting openstack-iaasgateway02: [ OK ]
   service openstack-iaasgateway03 start
   Starting openstack-iaasgateway03: [ OK ]
   service httpd start
   Starting httpd: [ OK ]


5. Ensure the cluster startup will persist across reboots.

   # Turn the non-clustered gateway off.
   chkconfig --level 2345 openstack-iaasgateway off
   # Turn the clustered gateway on.
   chkconfig --level 2345 openstack-iaasgateway01 on
   chkconfig --level 2345 openstack-iaasgateway02 on
   chkconfig --level 2345 openstack-iaasgateway03 on
   chkconfig --level 2345 httpd on

6. Check the IaaS Gateway service status.

a. Open the following link in a browser. The response should be the same as before the cluster was enabled.

   http://<central-server-2-ip>:9973/providers

b. Check for listening ports with the following command:

   netstat -nap | grep 1200 | grep LISTEN
   tcp 0 0 127.0.0.1:12001 0.0.0.0:* LISTEN 7269/python
   tcp 0 0 127.0.0.1:12002 0.0.0.0:* LISTEN 7286/python
   tcp 0 0 127.0.0.1:12003 0.0.0.0:* LISTEN 7303/python

c. Check whether the load balancer is listening:

   netstat -nap | grep 9973 | grep LISTEN
   tcp 0 0 :::9973 :::* LISTEN 7321/httpd

d. Verify that you can log in to the SCO UI.

7. The IaaS Gateway cluster is now enabled.
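The port checks in step 6 can also be scripted. The following Python sketch attempts a TCP connection to each cluster member and reports whether something is listening. It is a minimal illustration and assumes it is run on Central Server 2, because the members listen on 127.0.0.1 only. The function names is_listening and check_cluster are hypothetical and are not part of SmartCloud Orchestrator.

```python
import socket
from typing import Dict, Iterable, Tuple


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_cluster(members: Iterable[Tuple[str, int]],
                  timeout: float = 1.0) -> Dict[str, bool]:
    """Map each "host:port" endpoint to its listening state."""
    return {
        "%s:%d" % (host, port): is_listening(host, port, timeout)
        for host, port in members
    }
```

Calling check_cluster([("127.0.0.1", 12001), ("127.0.0.1", 12002), ("127.0.0.1", 12003), ("127.0.0.1", 9973)]) should report True for every endpoint once the cluster and the load balancer are running. A successful connection shows only that a process is listening on the port; it does not confirm that the gateway responds correctly, so the browser check in step 6a is still needed.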




© Copyright IBM Corporation 2014 IBM United States of America Produced in the United States of America US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing IBM Corporation North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice. Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing IBM Corporation 4205 South Miami Boulevard Research Triangle Park, NC 27709 U.S.A. All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information is for planning purposes only. The information herein is subject to change before the products described become available. If you are viewing this information softcopy, the photographs and color illustrations may not appear.


Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Other company, product, or service names may be trademarks or service marks of others.