
PEGA Best Practices and Standards

V1.0

Created By: BPM Practice


DOCUMENT CONTROL

Role       Name
Owner      BPM Practice - Delivery Support Stream
Reviewer
Other


Table of Contents

DOCUMENT CONTROL
Table of Contents
I. Ten Guardrails for PEGA Best Practice
I.A Adopt an Iterative Approach
I.B Establish a Robust Foundation
I.C Do nothing that is Hard
I.D Limit Custom Java
I.E Build for Change ™
I.F Design Intent-driven Processes
I.G Create Easy-to-Read Flows
I.H Monitor Performance Regularly
I.I Calculate and Edit Declaratively, Not Procedurally
I.J Keep Security Object-Oriented
II. Best practices for successful performance load testing
II.A Design the load test to validate the business use. Do the load math.
II.B Validate performance for each component first
II.C Script user login only once
II.D Set realistic think times
II.E Switch off virus-checking
II.F Validate your environment first, then tune
II.G Prime the application first
II.H Ensure adequate data loads
III. Measure results appropriately
IV. Focus on the right tests
V. Best practices for customizing/updating connector rules
VI. Best practices for using the SAVE JSP tag
VII. Best JVM memory settings depend on vendor, version, and other factors


I. Ten Guardrails for PEGA Best Practice

The Ten Guardrails to Success are design guidelines and recommended best practices for Process

Commander Implementations. Following the fundamental principles promoted in the Ten Guardrails

to Success leads to rules-based applications that are well designed, straightforward to maintain, and

architected to Build for Change™.

They are keys to success with Process Commander.

I.A Adopt an Iterative Approach

Define an initial project scope that can be delivered and provide business benefit within 60-90

days from design to implementation.

Document five concrete use case scenarios up front and evaluate them at the end to

calibrate benefits.

Use your scenarios as story boards and ensure that each delivers a measurable business

benefit.

I.B Establish a Robust Foundation

Design your class structure complying with the recommended class pattern.

It should be understandable, be easy to extend, and utilize the standard work and data

classes appropriately.

Use your organization entities as a starting pattern, and then proceed with class groups.

Lead with work objects. Create the class structure and “completed work” objects early.

Position rules correctly by class and/or RuleSet.

Actively use inheritance to prevent rule redundancy.

I.C Do nothing that is Hard

Use “out of the box” functionality as much as possible, especially in the initial project release.

Avoid creating custom HTML screens or adding buttons.

Always use the “Auto Generated HTML” feature for harness sections and flow actions.

Always use the standard rules, objects, and properties. Reporting, Urgency, Work Status,

and other built-in behaviors rely on standard properties.

Never add a property to control typical work or to manage the status or timing of work.


I.D Limit Custom Java

Avoid Java steps in activities when standard Process Commander Rule types, library

functions, or activity methods are available.

Reserve your valuable time and Java skills for implementing things that do not already exist.

I.E Build for Change ™

Identify and define 10-100 specific rules that business users own and will maintain.

Activities should not be on this list. Use other rule types for business-maintained logic.

I.F Design Intent-driven Processes

Your application control structure must consist of flows and declarative rules, calling activities

only as needed.

Use flow actions to prompt a user for input.


Present fewer than five connector flow actions for any individual assignment. If you need

more than that, you need to redesign the process.

Create activity rules that implement only a single purpose to maximize reuse.

I.G Create Easy-to-Read Flows

Your flows must fit on one page and must not contain more than 15 SmartShapes (excluding

Routers, Notify shapes and Connectors) per page.

If a flow has more than 15 SmartShapes:

Create a subflow.

Use parallel flows to perform additional functions.


I.H Monitor Performance Regularly

Evaluate and tune application performance at least weekly using Performance Analyzer

(PAL) to check rule and activity efficiency.

Use PAL early to establish benchmarks. Compare these readings to follow-on readings; correct the application as required.

I.I Calculate and Edit Declaratively, Not Procedurally

Whenever the value of a property is calculated or validated, you must use declarative rules

wherever appropriate.

Create a Declare Expressions rule instead of using a Property-Set method in an activity.

Use a Declare Constraints rule instead of a Validation rule.


I.J Keep Security Object-Oriented

Your security design must be rule-based and role-driven based on who should have access

to each type of work.

Never code security controls in an activity.

Use the standard access roles that ship with Process Commander only as a starting point.

Use RuleSets to segment related work for the purpose of introducing rule changes to the

business, not as a security measure.


II. Best practices for successful performance load testing

Pegasystems has extensive performance load testing experience, based on hundreds of

implementations. The following are the ten best practices to help you plan for success in the testing

and implementation of Process Commander Solutions. These guardrails are valid regardless of which

software you use for load testing.

2.1 Design the load test to validate the business use

2.2 Validate performance for each component first

2.3 Script user login only once

2.4 Set realistic think times

2.5 Switch off virus-checking

2.6 Validate your environment first, then tune

2.7 Prime the application first

2.8 Ensure adequate data loads

2.9 Measure results appropriately

2.10 Focus on the right tests

II.A Design the load test to validate the business use. Do the

load math.

Design the load test to meet the business use of the solution. This means executing a test that is as

close as feasible to the real anticipated use of the application.

It is critical that your application performance tests are designed to mimic real-world production use. To ensure this happens, identify the right volume and the right mix of work across a business day. Always do the math, so that you understand the throughput of the tests and can say that in any n minutes the test processed y items, which represents a full daily rate of x items, or A% of the current volume of V items per day.

Calculate the work rate per workday hour. Estimate the volume of work based on the type of

use expected. For example, in a Customer Process Manager solution, there is one work-object


(interaction) per call. Using the following example you can calculate the actual load on the application

for a given business hour of use.

Assume that the expected number of users is 250, and the expected number of concurrent users

(using the application at the same time) is 175. If peak-hour throughput is not known precisely, a

good rule of thumb is that the peak-hour volume is usually 20% of the volume in a given business

day.

Assume each interaction creates on average 2 service (case) requests. If each operator can take 10 calls per hour, calculate the load rate on the application as follows:

8 hours x 10 calls x 175 users x 3 work operations (1 interaction + 2 service requests per call) = 42,000 units of work.

Therefore, 42,000 * 20% = 8,400 units during the peak hour, and 42,000 / 8 = 5,250 units per work hour.
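
As a quick sanity check, the same arithmetic can be scripted so the figures are recalculated whenever the assumptions change. The sketch below uses the illustrative numbers from this example (175 concurrent users, 10 calls per hour, 3 work objects per call, and the 20% peak-hour rule of thumb); it is not part of any Pega tooling.

// Load-math sketch with the illustrative values from the example above.
public class LoadMath {
    public static void main(String[] args) {
        int hoursPerDay = 8;
        int callsPerUserPerHour = 10;
        int concurrentUsers = 175;
        int workObjectsPerCall = 3;   // 1 interaction + 2 service requests

        int dailyUnits = hoursPerDay * callsPerUserPerHour * concurrentUsers * workObjectsPerCall;
        double peakHourUnits = dailyUnits * 0.20;              // rule of thumb: 20% of daily volume
        double perHourUnits = dailyUnits / (double) hoursPerDay;

        // Prints: daily=42000, peak hour=8400, per hour=5250
        System.out.printf("daily=%d, peak hour=%.0f, per hour=%.0f%n",
                dailyUnits, peakHourUnits, perHourUnits);
    }
}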

Using the per-hour or peak-hour rate, ensure that the pacing and workload arrival rate to the application are correct. If the duration of the test is half an hour, ensure that only half the hourly work rate is generated, not the full per-hour rate.

Always consider the use of the application in respect of work: work creation, work retrieval, and work assignment. Calculate this use profile per the example above.

Common omissions and mistakes can also cause a performance test to diverge from real-world

patterns of use. For example, two common mistakes are to use a single test operator ID or a single

customer account number. On their own or together, these two mistakes can create database access

or deadlocking issues.

Other common oversights include testing the application in "developer mode", where the operator IDs used for the virtual users have the Developer portal or rules check-out enabled. Both of these

conditions force significant server processing beyond the load that would normally be experienced if

non-developer Operator IDs are used.

Often, there is not just one but a mix of transactions that dominate the workload; they all need to be

modeled. Getting the proper proportions of transaction rates into the mix is key. If, for example,

before entering a new case users perform an average of 2.5 searches, the load test should reflect this

ratio.

Finally, do not load the application incorrectly by assuming that fewer test users doing more work per

hour provide system demand equivalent to the calculated workload. Using the above example, a

user work rate of 30 operations per hour (5,250 / 175 users) is not the same as 87.5 users doing 60


operations per hour. (This common mistake is usually made to minimize test software license costs

that are based on v-user counts.)

II.B Validate performance for each component first

Don't consider any load tests until you have run PAL on each flow first. Fix any string-test issues (a single user exercising each happy path through the application) first.

Before exercising a performance test, the best practice is to exercise the main paths through the

application (including all those to be exercised by the test script) and then take a PAL (Performance

Analyzer) reading for each path. Investigate and fix any issues that are exposed.

Before going to the next step of running the load test, repeat this exercise with additional users.

Running the same test with 10 users should indicate immediately whether the application's

performance is disproportionately worse at scale. If this is the case, investigate and fix the area of the

application that PAL data shows has the performance problem.

II.C Script user login only once

Make sure that v-users are logged in once, not once for each test iteration or interaction. Login and

logoff operations are expensive, and will skew results dramatically.

Many load tests are compromised because the test team scripts each individual virtual user to login,

do a little work, and then logoff. This script is then repeatedly executed in the load test. This is not

how real users behave, and this approach produces invalid test results because of the overhead

associated with login and logoff (for example, memory collections and recycles).

Ensure that your test includes several operator IDs. Using one Operator ID profile for every v-user will

cause choke points in the test, because access to the same resource (for example operator ID

records) creates contention.

II.D Set realistic think times

Use realistic think times and include some randomization. Include think time in the flow interactions

as well as after the end of flows. Review the results data and graphs excluding think times.

Your test scripts need to include think time to represent real human behavior and the corresponding

load of the application. Think time or "pause time" is very important in duration-based tests; otherwise

more work will arrive than is appropriate in the period of the test (see II.A above).

You can insert think time within the script steps and at the end. Scripts are typically enumerated as:


A1 + t + A2 + t + A3 + t + &

where A is an action, t is think time, and & indicates a cycle back to the beginning of the script for a

given set of interactions.

Randomizing think time with a % deviation allows the work pacing of each v-user to be slightly offset.

As a general rule of thumb, use -3% to +3% of the value of t as the deviation. This value spreads the

arrival rate of work to the application and avoids unrealistic oscillations in load (waves of work arriving

all at the same time). Use of think time allows you to simulate other arrival patterns (such as Poisson

or Erlang distributions) where appropriate.
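
A minimal sketch of this randomization, assuming a uniform distribution (most load-test tools provide equivalent pacing controls of their own):

// Sketch: base think time t with a uniform deviation of -3% to +3% (illustrative values).
public class ThinkTimeJitter {
    public static void main(String[] args) {
        double t = 10.0;                                    // base think time in seconds
        double deviation = (Math.random() * 0.06) - 0.03;   // uniform in [-0.03, +0.03]
        double thinkTime = t * (1 + deviation);
        System.out.printf("Think time for this iteration: %.2f seconds%n", thinkTime);
    }
}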

Consider think times for the different operations a user will need to perform as they interact with the

application. From the Model Human Processor (MHP) studies (Card, Moran and Newell), these

broadly fall into perception, cognitive, and motor actions. For example, they include:

reading time

determining what action to take

mouse movement and clicking

typing time

local actions that do not create a server interaction.

These all need to be considered and factored into your think time scripting.

Where think time must be calculated for a given use or load over a duration, you can use Little's Law to calculate the required values.

Consider work by virtual users (U) arriving at a rate R to the server and spending time T utilizing the server. Little's Law states that U = R*T. You can compute the service throughput of a system by dividing the number of users by the time spent in the server (R = U/T).

Now, assume that users wait a time To between requests: the think time. This value is a typical interval for users interacting with the application. So, from the rule U = R*T, you can infer that the number of users in think time will be Uo = R*To.

But the total number of users in such a case will be a combination of those who are in the server and those who are in think time. So service throughput can be expressed as R = U/(T+To), where

T is the time spent in the server (response time),

To is the average think time,

U is the number of users, and

R is the throughput.


If your system has 200 users generating 16,000 requests over 15 minutes with an average response time of 2 seconds, the throughput is R = 16,000/900 ≈ 17.78 requests per second, and the think time will be:

To = U/R - T = 200/17.78 - 2 ≈ 9.25 seconds on average
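
The same calculation, sketched in code with the figures from this example (illustrative values, not measurements):

// Little's Law sketch: To = U/R - T, using the example figures above.
public class ThinkTimeFromLittlesLaw {
    public static void main(String[] args) {
        int users = 200;
        int requests = 16000;
        int durationSeconds = 15 * 60;
        double responseTime = 2.0;                                 // T, seconds spent in the server

        double throughput = requests / (double) durationSeconds;  // R, about 17.78 requests/second
        double thinkTime = users / throughput - responseTime;     // To, about 9.25 seconds
        System.out.printf("Required average think time: %.2f seconds%n", thinkTime);
    }
}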

Using this value, you can script the interactions with the server to accommodate and create the

appropriate load in a given period.

A common mistake is assuming that you have set the scripted think times correctly without reviewing

real-world evidence. Always validate the script execution to ensure the work rate is correct. Counting

new records in specific database tables to check that over a given duration the actual counts match

expected values is a good method to validate expected work throughput.

Finally, when reviewing test response time results and test graphs, ensure think time is not included.

(This is a report setting in most load-based test tools.) A common mistake is to include think time in

average response times or end-to-end script times, thus skewing the results.

II.E Switch off virus-checking

Make sure that virus checking is not enabled on the v-user client. When virus checking runs, it

impacts any buffered I/O, and this changes the collected response times.

A best practice is to ensure that any virus checking software is not enabled on virtual user client test

injectors. During testing, the maximum available CPU should be dedicated to the client injector

processes or threads. Bottlenecks or saturation can elongate response times collected by these

processes, leading to false results and a false perception of server-side slowness.

II.F Validate your environment first, then tune

Do not try to tune the application immediately after the first test. Validate the environment first. Treat

the first run as only a test of a test.

In too many cases, a test team analyzed initial test results from the test environment and reached

premature and misleading conclusions about the design and performance of an application.

Always plan to validate and performance-tune the environment first, before attempting to spend

valuable time looking for problems in the application.

As you test, be aware of other constraints or factors that may influence the overall test results: for example, working in a shared environment at the application server, database, network, or integration


services level. Ensure you have visibility into the impact of this other use on the overall test by collecting sufficient metrics on it.

Load testing is not the time to shake out application quality issues. Make sure that the PegaRULES

log is relatively clean before attempting any load tests. If exceptions and other errors occur often

during routine processing, the load test results will not be valid.

II.G Prime the application first

Run a first use assembly (FUA) cycle before the actual test. Tune the environment as needed based

on pre-test data. Don't start cold.

Every test should have an explicit objective. Stress and load tests are often compromised because

the objective is unstated, mixed, or vague.

If the objective of a load test is to observe the behavior of an application with a known load, then

conduct this test only when the system has achieved a steady state, and review the results in that

light. Otherwise, other unknown conditions affecting performance can obfuscate the real results.

The best practice to observe the behavior and response time throughput of an application is to ensure

that sufficient caching has occurred after startup, as would occur over a period of time in the

production environment. Interpreting response time data for a short period of testing (including a

period where First Use Assembly occurs) does not provide correct insights into the performance of

the application in a production setting.

For best results, execute each flow or function at least 4 times after startup. Priming of some caches

requires 4 cycles.

II.H Ensure adequate data loads

Make sure loads are realistic and sufficient data is available to complete tests in the time period.

Transferring work is a good example.

Many performance issues first become evident in applications that have been in production for a

certain period of time. Often this is because load testing was performed with insufficient data loads.

As a result, response-time performance of the data paths appeared satisfactory during testing but degrades as production data volumes grow.


For example, on a table with a small number of records, a table scan can perform as well as a selection through an index. However, if the table grows significantly in production and a needed index is not in place, performance will seriously degrade. It is important to ensure adequate data load sizing for key work items and for attachment access, assignments, and reporting.

Calculate the amount of work that will be open and the amount resolved over a known time period,

and load the test application with adequate data to reflect your calculations.

For duration-based testing, ensure that there is sufficient work available to meet the demands of the

test. A common mistake is to run out of work and not have adequate detection mechanisms in the

test script to account for this situation. For example, running a script that emulates a user getting the

next highest priority of work, updating the item and then transferring it from their worklist to a

workbasket will require that there are enough rows in the assignment tables to support the duration of

the test. In this case, calculate total work assignment rates for the total number of users to validate

that the test will not silently fail in the background.

Also, if the application involves external database tables or custom tables within the PegaRULES

database, make sure these contain realistically large numbers of rows, with a reasonable mix. For

example, if the Process Commander application uses a lookup into a POLICY table, make sure this

table has rows for all the policies that the application will need to search for, not just a few. Data in such tables must vary: if all the policies are for one or a few customers, then doing a customer

lookup will produce an unrealistically large number of hits.

Make sure that tables that grow during testing don't grow to be unrealistically large. Too often, test

scenarios omit the "Resolve" step at the end of each test, so that the Process Commander

assignment tables grow and grow, rather than reach a steady state.


III. Measure results appropriately

Do not use average response times for transactions as the absolute unit of measure for test results.

Always consider Service Level Agreements (SLAs) in percentile terms. Load testing is not a precise

science; consider the top percentile user or requestor experience. Review results in this light.

The best practice is to determine a realistic service level agreement for end-user response time

experience. In most test tools, data points are collected from all virtual users and then averaged to

show an average response time, as measured for pre-determined scripted transactions. However, as

in the real world, some data points will be anomalous and unexplained; this is a normal aspect of

systems and especially of load testing, where an attempt to mimic or mirror a real system is not a

precise science.

These data discrepancies are sometimes normalized through standard-deviation calculations. However, few practitioners can articulate how to apply such calculations to the test results.

An easier method is to simply average the transaction data showing response times in percentile

terms. Use the test tool's reporting capabilities to provide the percentile average response time of the

virtual users' experience. Know what the overall averages are and the 90th percentile.

For transaction-intensive applications ("heads-down" use), a recommended value is the 80th percentile.

For mixed-type use applications, use the 90th percentile.

For ad hoc, infrequent use, a 95th-percentile average will provide a more statistically relevant result set than a 100th-percentile average.

Typically, measuring the expected experience of 8 or 9 out of every 10 users for an application

provides a more insightful profile of how the application will work in production than a 100th-percentile

average that includes some significant outlying response-time data points.
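
As an illustration of the percentile approach, the sketch below computes a nearest-rank percentile from a set of collected response times; real test tools report this directly and may use different interpolation. The sample values are invented.

import java.util.Arrays;

// Nearest-rank percentile of collected response times (sketch only).
public class PercentileReport {
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        double[] responseTimes = {1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.3, 2.9, 4.5, 12.0};
        // The 12.0-second outlier inflates the average but not the 90th percentile (4.5 s).
        System.out.println("90th percentile: " + percentile(responseTimes, 90.0) + " s");
    }
}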


IV. Focus on the right tests

Don't try to achieve the impossible and load test for thousands of users. Judicious use of PAL data,

load test results and basic extrapolation are first indicators of scale.

Trying to orchestrate large, multi-user, complex load tests can be daunting, logistically challenging, and time-consuming.

Pegasystems recommends that you test the application with a step approach, first testing with 50 users, then 100, 150, and 200, for example. You can then easily put the results into an Excel spreadsheet and chart them. Use Excel's built-in capability to generate an equation from the trend line and plot a model. In addition, Excel can compute the R² value.

In brief, R² is the relative predictive power of a model. It is a descriptive measure between 0 and 1; the closer it is to 1, the better the model's ability to predict. A value of R² equal to 1.0 would imply that the regression provides perfect predictions.

Using this set of data points and the regression formula, predictive values can be extrapolated for a

higher number of virtual users early in the testing cycle. Using the above example chart and a simple

predictive model, it can be seen that the expected response time for 500 users would be:

Y = 0.02 * 500^1.0507 ≈ 13.7 seconds
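
The same model can be applied programmatically once the trend-line coefficients are known. The coefficients below are the illustrative ones from the example chart, not a benchmark.

// Evaluate the example power trend line Y = a * users^b for several user counts.
public class Extrapolate {
    public static void main(String[] args) {
        double a = 0.02, b = 1.0507;   // illustrative trend-line coefficients
        for (int users : new int[] {200, 300, 500, 1000}) {
            double responseSeconds = a * Math.pow(users, b);
            System.out.printf("%d users -> %.1f seconds%n", users, responseSeconds);
        }
    }
}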

Collect data on CPU utilization, I/O volume, memory utilization and network utilization to help

understand the influences on performance.

Review the Pega Log and the Alert log after each load test. Use the Pega Log Analyzer or

PegaAES to summarize the logs.

Begin testing with HTTP transactions only (disable agents and listeners). Then test the

agents and listeners, and finally test with both foreground and background processing.

Relate the capacity of the test machines to production hardware. If the test machines have

20% of the performance of the production machines, then the test workload should be 20% of

the expected production workload. If you expect to use two or more JVMs per server in

production, use the same number when testing.


V. Best practices for customizing/updating connector rules

The Connector and Metadata wizard creates rules for connectors that communicate with external

systems. You can use the wizard to generate connectors for EJB, Java, .NET, SOAP, and SQL.

At times you may need to customize or upgrade your Connector rules. For example, a parameter in

the external service may change, requiring you to update the Connector rule.

In these cases, use these best practices to minimize issues with updating and regenerating your

Connector rules:

Don’t use RuleSet versions for generated interfaces; keep the version at 01-01-01 and lock/unlock when needed. This approach is useful because classes are part of a RuleSet but are not versioned. If, for example, a WSDL change affects the class structure, the change will not be properly reflected in a new RuleSet version. (If you need to keep multiple versions of a WSDL active in the application, you need to specify different top-level classes during interface generation.)

Don’t add properties or any rules within a class generated for the interface (example given

above). If the interface needs to be regenerated in the future, the clean-up will not complete

properly. (You cannot delete a class when there are rules using the class as the Applied to

Class).

Clean up existing generated rules before attempting to regenerate. Don’t try to regenerate an

interface if the clean-up did not complete successfully. If a class, model, property, or other rule already exists from a previous generation, it will be skipped.


VI. Best practices for using the SAVE JSP tag

When the save JSP tag (or directive) is used in a hand-crafted XML rule, HTML rule, HTML property

rule, or other stream rule, choose a value for the name attribute that is unique application-wide or

even system-wide.

The save JSP tag saves a name/value pair to a temporary scratchpad during stream processing.

As an example, suppose that one HTML rule contains the following JSP tag:

<pega:save name="result" ref=".Total" />

If a completely separate HTML rule sets the name “result” as well, a collision of values may occur.

<pega:save name="result" ref=".FinalPrice" />

If an activity calls the first HTML rule and later calls the second HTML rule, the first sets the temporary save name "result" to the value in the .Total property.

The second rule resets this temporary save name to the value in the .FinalPrice property. If

subsequent rules expect the .Total property value, the application will fail or return unexpected

results.

To avoid this issue, create specific or unique names for the save tag.

Examples

<pega:save name="TotalResult" ref=".Total" />

<pega:save name="PriceResult" ref=".FinalPrice" />

This prevents the system from overwriting values on the scratchpad and, as a result, returning

unexpected results.

NOTE: Do not use any values beginning with px, py, or pz. These are prefixes reserved for internal

Pega object names, and can also cause unpredictable results if used as a save name.


VII. Best JVM memory settings depend on vendor, version, and

other factors

The following are the minimum and maximum JVM memory settings:

minimum 256 MB

maximum 768 MB

However, in our system, we set 512 MB as the minimum and 768 MB as the maximum, because during performance testing we saw the heap size grow above 256 MB to the 512 MB level and consistently remain in that general area. So we start at 512 MB to avoid the overhead of growing the heap at run time.
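
As an illustration, these values are typically supplied as JVM startup options. The exact syntax and the appropriate values depend on your JVM vendor, version, and application server, so treat the line below as a sketch rather than a recommended configuration:

-Xms512m -Xmx768m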

Every JVM and every version of JVM is different, and each implements a slightly different garbage

collection (GC) mechanism. From vendor to vendor, things vary considerably, so it is not possible to

publish universal guidelines.

The only way to determine the optimal settings for your application is to set them, observe, make

changes, observe again, and so on.

In general, on the Sun JVM, Pegasystems in-house testing has found that if you set the max to 768

MB, things work well. The current IBM JVMs don’t work so well if you set min and max to the same

value, because of the way they do internal memory allocation. You generally want the settings to be

around 512 MB and 768 MB as a starting point.

Pegasystems performs in-house scale and overnight tests of Process Commander. The test systems

are set to 768 MB maximum memory, and a simulated load of 200 users is applied. That usually leaves a little memory to spare.

Do not overlook the max perm size setting, which is documented in the same section of the

Installation Guide as min and max memory.

Tuning these values requires some study. The JVM output can provide you with good detail about

memory and garbage collection. For Sun, enable heap printing at garbage collection; for IBM, run with verbose GC. The Monitor Servlet can also help you correlate your usage statistics with your garbage collection statistics.
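
For example, on a Sun JVM the following startup options print heap and garbage collection detail, while IBM JVMs use verbose GC. Confirm the exact flags against your JVM's documentation, as they vary by version:

Sun: -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC
IBM: -verbose:gc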

Using all those resources together begins to allow you to get a picture of your application.

Don't try to tune the JVM before you tune the application. Instead, start with the application. Get the application running with the best performance you can achieve first; only then start tuning the JVM.