Dyna Trace Whitepaper Performance

Performance Management and Diagnostics in Distributed Java and .NET Applications

>> Rapidly resolve performance problems across the software application lifecycle

Contents

EXECUTIVE SUMMARY
INTRODUCTION
APPLICATION PERFORMANCE IN HETEROGENEOUS MULTI-SERVER CLUSTERED ENVIRONMENTS
    Symptoms and Causes of Performance Problems
    Fixing Performance Problems
TRADITIONAL TOOLS FOR APPLICATION PERFORMANCE MANAGEMENT
    Developer Tools
    Administrator Tools
    Need a Better Solution
PERFORMANCE MANAGEMENT IN APPLICATION LIFE-CYCLE
    Application Performance Management Solution Requirements
DYNATRACE DIAGNOSTICS
    Efficient Diagnostics
    Out-of-the-box, Extensible Diagnostics
COMPARE YOURSELF
CONCLUSION


Executive Summary

Today’s complex mission critical applications run in heterogeneous multi-server environments. When these applications falter, business productivity grinds to a halt, users are inconvenienced – costs rise and profits fall.

Modern technologies such as Ajax, Java and .NET and approaches such as SOA, EAI, and MDA enable engineers to create and deploy applications rapidly. However, development tools generally do not enable engineers to establish a good understanding of the application’s performance characteristics, and avoid performance problems. Consequently, performance problems are discovered late in the application life-cycle and have to be corrected at considerable time and expense.

In load-test and production environments, application performance management solutions typically consist of server monitors. When performance problems occur, such monitors raise alerts but do not look deeply enough inside transaction execution to provide the information needed to diagnose the root cause. Due to their large overheads, development tools cannot be used in such environments to troubleshoot the problems. As a result, IT personnel can spend hours or days trying to reproduce and analyze these problems. Often limited by the available information, they ameliorate the situation by adding resources or tuning at the server and system layer, without resolving the underlying design or programming issue.

To eliminate wasted time and expense, IT organizations need a new class of application performance management solutions to monitor and diagnose performance problems. These solutions must provide detailed, transaction-specific diagnostic information for single and multi-server transactions. Such solutions should support the requirements of system administrators, performance analysts, testers and developers throughout the application life-cycle.

In contrast to traditional monitoring tools designed to detect the symptoms of performance problems by measuring aggregate statistics at the server level, dynaTrace Diagnostics® has been expressly designed to not only detect but also diagnose the root cause of performance problems:

• dynaTrace Diagnostics collects necessary contextual behavior data during transaction execution to construct the transaction’s execution path, known as the PurePath®.

• PurePath maps the transaction’s precise execution path, containing relevant sequence, timing, resource usage and contextual information for each method/step the transaction executes.

• If the transaction is executed on multiple servers, whether running on the same or different machines, dynaTrace Diagnostics precisely measures and reveals the PurePath through all of these servers.

• To minimize overhead and impact on application performance, dynaTrace Diagnostics’ embedded, dynamic, lightweight agents offload the data they collect and send it to a central Diagnostics Server for efficient real-time and off-line analysis.

dynaTrace Diagnostics’ unique design enables IT personnel to:

• Prevent performance problems by gaining a better understanding of the dynamic behavior of the applications during development, and

• Reduce time to repair by reconstructing the problem transaction quickly from captured data to identify its root cause – enabling repair in minutes, not hours or days.


Introduction

Today, a large number of mission-critical business processes are supported by performance-sensitive applications. Developers can rapidly create such applications without writing a lot of “infrastructure” code by using frameworks such as Java EE, .NET, Ajax and Atlas. These applications can scale quickly by accessing objects and services located on other servers through built-in remoting capabilities – allowing application deployment in a variety of distributed multi-server clustered configurations. SOA and EAI drive this trend further by leveraging existing applications and services in distributed environments.

While such frameworks speed development, they also hide inner workings that can contribute significantly to resource consumption, especially if such capabilities are misused. Consequently, mission-critical applications are often deployed with latent performance issues that surface later in production. Industry surveys reveal that:

• Among companies with $1B or more in revenues, nearly 85% experienced incidents of performance degradation [1],

• 40% of the unplanned downtime is due to application failures, and

• The cost of downtime of mission-critical applications averages over $100,000/hour [2].

Industry surveys also show that:

• IT groups spend 24% of their time resolving application slow-downs [3], and

• 80% of unplanned downtime can be mitigated by application development and operations working together [4].

Clearly, IT personnel spend too much time reacting to performance problems. Current tools are ill-suited for resolving application performance bottlenecks: development tools are inappropriate in production environments for many reasons including high overhead; monitoring tools detect but do not provide detailed diagnostic information necessary to resolve performance problems.

In order to reduce the time to resolve such problems, IT personnel need a common, easy to use, low overhead measurement and analysis system that can efficiently collect necessary and sufficiently detailed diagnostic data, and speed up root cause analysis.

In this paper, we first develop the requirements for such a system and then introduce dynaTrace Diagnostics, which has been expressly designed to detect and diagnose performance problems throughout the application life-cycle – from development through production – at a very low overhead.

[1] Jean-Pierre Garbani, “Best Practices in Problem Management”, Forrester Research, June 23, 2004.
[2] Theresa Lanowitz, “Delivering Business Value Through Software Quality”, Gartner Symposium IT Expo 2004, October 17-22, 2004.
[3] http://www.e-channelnews.com/ec_storydetail.php?ref=412807 referring to Applied Research survey commissioned by Symantec.
[4] Theresa Lanowitz, “Delivering Business Value Through Software Quality”, Gartner Symposium IT Expo 2004, October 17-22, 2004.

Performance problems are common in mission critical Java and .NET applications.

Problem resolution takes too much time and resources.


A transaction can suffer a performance problem anywhere in its execution path.

Application Performance in Heterogeneous Multi-Server Clustered Environments

As discussed, today’s mission critical applications run in heterogeneous multi-server environments. Figure 1 details an example application’s typical transaction flow, which starts at the user’s Web browser, traverses the Java SE/EE servers for authentication and Web page rendering, executes business logic in the .NET servers, accesses mainframe databases and integrates external systems through Web services.

[Diagram labels: Presentation Tier (Java SE/EE Server); Business Tier (.NET Server); Data Tier (Mainframe, RDBMS); External Web-Services]

Figure 1: A transaction in a multi-server, heterogeneous, clustered application environment

The arrows along the red lines indicate the transaction’s high level execution path. The detailed execution path enumerates the processing steps (method calls, servlet invocation, etc.) and their context through various components in sequence of the transaction’s execution.

Various performance problems can occur during this execution. These problems can lead to a variety of symptoms, some during the transaction’s execution and some well after.

Different transactions can follow different paths in a distributed system.


Addressing symptoms of performance problems does not really address the root cause.

Framework and library code can consume significant resources inadvertently.

Remoting can lead to performance problems.

Causes of Performance Problems

An application may present a variety of symptoms of performance problems due to a number of different causes. The causes include non-optimal use of pre-existing software frameworks and/or their built-in remoting capabilities, other design errors, coding errors, resource contention or inappropriate configuration settings. As illustrated in Figure 2, these causes can occur anywhere along a transaction’s execution path.

Figure 2: Typical sources of performance problems in distributed applications

Performance Implications of Frameworks

Developers create modern applications rapidly by leveraging frameworks or pre-existing infrastructure software libraries. As a result, a significant amount of library code is executed as part of each transaction’s execution path. Application performance therefore depends not only on the code that developers write specifically for the application but also on the facilities used from underlying libraries – and the hidden interactions among them. Developers need to understand the dynamic behavior of the underlying code and choose the right set of capabilities from the framework.

Performance Implications of Distributed Deployment or Remoting

For handling large transaction volumes or enhanced scalability, frameworks allow multi-tier software developed in a single application server environment to be easily deployed in multi-server distributed configurations. However, when two application tiers communicate across server and/or machine boundaries, performance can be significantly affected. Such degradation depends upon the serialization or data marshalling costs and network latencies, which in turn depend upon the number of remote calls and the data transferred per call (Figure 3). If the application is not well designed for remoting, code running on one tier can remotely access objects resident on other tiers automatically, resulting in an unexpectedly large number of remote calls or data transfers.

The performance effect of such poor design is generally not apparent during development because developers typically work with single server configurations, and even when they work with multi-server distributed configurations, they test the software at low loads. Therefore, to eliminate latent performance problems due to remoting, developers need to understand the effect of remoting by examining the dynamic interactions of the components.

[Diagram labels: Client Application, Stub, (De)serialization/(Un)marshalling, Transport, TCP/IP on the client side; Server Application, Dispatcher, Transport, TCP/IP on the server side; Total Latency made up of Conversion Latency and Network Latency]

Figure 3: Latency introduced by remoting.
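The dependence of remoting cost on the number of calls and the data per call can be illustrated with a back-of-the-envelope model. The sketch below is only an illustration under simplifying assumptions: the class name, the linear cost formula and all numeric values are hypothetical and are not measurements from any real system.

```java
/** Back-of-the-envelope model of remoting latency (all parameters are illustrative assumptions). */
final class RemotingLatencyModel {

    /**
     * Estimated total latency in milliseconds for a batch of remote calls.
     * Each call pays a fixed network round trip plus a conversion
     * (serialization/marshalling) cost proportional to the payload size.
     */
    static double totalLatencyMs(int remoteCalls, double roundTripMs,
                                 double kilobytesPerCall, double conversionMsPerKb) {
        double perCallMs = roundTripMs + kilobytesPerCall * conversionMsPerKb;
        return remoteCalls * perCallMs;
    }

    public static void main(String[] args) {
        // Chatty design: 200 fine-grained calls of 1 KB each.
        double chatty = totalLatencyMs(200, 2.0, 1.0, 0.5);
        // Coarse-grained design: the same 200 KB fetched in 2 calls of 100 KB each.
        double coarse = totalLatencyMs(2, 2.0, 100.0, 0.5);
        System.out.println("chatty: " + chatty + " ms, coarse-grained: " + coarse + " ms");
    }
}
```

With these assumed numbers the chatty design costs 500 ms and the coarse-grained design 104 ms for the same amount of data, because the fixed round-trip cost is paid once per call.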

Fixing Performance Problems

Table 1 enumerates a number of symptoms of performance problems and their probable causes. Clearly, there can be many causes for each symptom. This implies that the symptom of a performance problem does not explicitly reveal the cause of the problem. Often what is thought of as a cause is really a symptom and one may need to drill down recursively to find the root cause.

To find the root cause, it is important – and in many cases imperative – to identify the individual transaction(s) experiencing the performance problems and their execution path in the environment in which they are executing. This data must be sufficient for properly and efficiently diagnosing the problem. Without such information, performing problem diagnosis is the same as shooting in the dark and it is easy to jump to the wrong conclusions.

Finding and fixing the root cause of the problem is generally not possible without knowing the transaction’s actual execution path.


Symptom: High response time for specific transactions or most transactions
Sample causes:
    • Excessive resource consumption by transaction
    • Too much synchronization wait time
    • Too much time to get inside the connection or server pool
    • Improper settings such as pool size
    • Excessive delay for external web-services
    • Undersized system

Symptom: Erratic transaction response time
Sample causes:
    • Excessive garbage collection
    • High resource utilization
    • Erratic response of external web-services

Symptom: Application failures or time outs
Sample causes:
    • Programming errors – improper error condition handling
    • Data specific problems
    • Memory exhaustion
    • Memory leaks
    • Socket exhaustion
    • File handle exhaustion

Symptom: High CPU utilization
Sample causes:
    • Poor/inefficient algorithms
    • Poor design choices consuming significant time in underlying layers
    • Poor implementation – redundant work
    • Undersized system
    • Improper transaction routing

Symptom: High memory utilization or too frequent garbage collection
Sample causes:
    • Memory leaks
    • Objects persist for unnecessarily long time
    • Pool size too large
    • Undersized system
    • Lots of short lived objects

Symptom: High network utilization between servers
Sample causes:
    • Too many remoting calls
    • Too much data transfer per call – poor design; lack of cohesion

Symptom: High IO rate
Sample causes:
    • Too many SQL calls – improper database or query design
    • Poor/inefficient algorithms
    • Insufficient cache
    • Pool size too large for configuration leading to thrashing

Symptom: Too high synchronization delays
Sample causes:
    • Poor algorithm design – not enough parallelism
    • Excessive execution time for sub-transactions
    • Locks being held for too long

Symptom: Excessive resource consumption by transaction
Sample causes:
    • Poor algorithms
    • Poor design choices consuming significant time in underlying layers
    • Poor implementation – redundant work
    • Too many remote calls
    • Too much data transfer for remote calls
    • Objects held for too long
    • Poor SQL query and/or database design

Symptom: Long pool queue or utilization
Sample causes:
    • Too much resource consumption by transactions
    • Large transaction execution time for other reasons including too much remoting, large synchronization delays or delays for external services or database queries
    • Incorrectly sized pool

Table 1: Performance problem symptoms and typical causes


Debuggers, loggers and code profilers do not directly support analysis of distributed applications and do not work in production systems.

Traditional Tools for Application Performance Management

As noted earlier, it is important for the developers to understand the dynamic behavior and performance characteristics of their design choices in order to build a well performing application. When a transaction experiences performance problems, to really fix the problems, it is critical that IT personnel do proper diagnosis and identify the exact causes and locations of the deficiencies. Proper tools are needed to perform the job and we therefore discuss the effectiveness of traditional tools in preventing and eliminating performance problems. Traditional performance problem detection and resolution tools fall into the following broad categories:

• Developer tools including debuggers, loggers and other forms of custom instrumentation, code profilers, and

• Administrator tools, which primarily include server monitors and system utilities.

Developer Tools

Debuggers are an integral part of the developers’ tool kit. They enable developers to go through specific execution steps at a controlled pace and allow them to focus on a specific small area of interest at a time but do not significantly enhance the overall understanding of the interactions among components and layers. Debuggers are not suited for use in production or load-test environments because, for example, (a) they stall the application, making performance measurements impossible, (b) they create high overhead, (c) they require users to be expert programmers with access to source code, and (d) they let one look at only one thread at a time, and the stalled execution causes timeouts (e.g. XA transaction timeout or servlet timeout).

Creation of custom instrumentation such as using loggers or custom output routines requires access to the source code and advance knowledge of what one needs to monitor for solving the problem. While it may sometimes be appropriate to instrument the code during the development phase, it is generally not practical for solving problems found during the later phases for many reasons, including (a) source code may need to be changed to do the necessary instrumentation, (b) since custom instrumentation is often written as in-line code, it can potentially change the behavior of the code, and as a result, either mask existing problems or introduce new performance problems, (c) the analysis of the output produced by such instrumentation is generally too laborious and time consuming, and (d) correlation of log messages across transactions often requires too much effort or may even be impossible.
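A minimal sketch of what such in-line instrumentation typically looks like is shown below. The class, the method names and the logger name are hypothetical and stand in for real application code; only the standard java.util.logging API is used.

```java
import java.util.logging.Logger;

/** Hand-written timing instrumentation around a hypothetical business method. */
final class InlineInstrumentationExample {
    private static final Logger LOG = Logger.getLogger("checkout");

    /** Hypothetical business logic standing in for real application code. */
    static int computeTotal(int[] itemPrices) {
        int total = 0;
        for (int price : itemPrices) {
            total += price;
        }
        return total;
    }

    /** The same logic wrapped in in-line timing code; adding it requires editing the source. */
    static int computeTotalInstrumented(int[] itemPrices) {
        long start = System.nanoTime();
        try {
            return computeTotal(itemPrices);
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            // The log call itself runs inside the transaction and adds to its cost.
            LOG.info("computeTotal took " + elapsedMicros + " us");
        }
    }

    public static void main(String[] args) {
        System.out.println(computeTotalInstrumented(new int[] {5, 10, 20}));
    }
}
```

Even in this small case the drawbacks listed above are visible: the source had to be modified, the logging executes on the transaction's own thread, and the resulting log line carries no information that ties it to a particular transaction.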

Code profilers are useful in development to understand which pieces of code consume the most CPU and for doing some statistical code optimization. However, their lack of support for distributed and heterogeneous application environments and the limited insights they provide into the dynamic application behavior – in particular due to the statistical nature of profiler output (averages and percent distributions) and the lack of context information required to reconstruct or even understand performance problems – prevent developers from using profilers in diagnosing and resolving transaction performance problems discovered in later life-cycle phases. Further, since code profilers introduce large overheads, sometimes as large as 10x to 10,000x, they cannot be run in real load-test or production environments.

Monitoring tools do not provide sufficient detail to eliminate the root cause of the performance problems and force one to alleviate the symptoms by tuning server configuration.

Monitors do not provide sufficient data to reconstruct the problem scenario, limiting the ability to reproduce the problem, which results in a long time to repair.

Administrator Tools

Monitors and system utilities provide overall usage and performance statistics on the server at reasonable overhead. Looking at this class of data, a server administrator can potentially guess at the problem and tune the application or server configuration [5]. Even when the tuning action provides performance relief, it may not necessarily address the root cause of the problem and may shift the bottleneck to elsewhere in the system. In addition, while the aggregate data provided by a monitor may be useful to a skilled administrator in alleviating consistently or regularly recurring symptoms, it does not help with resolving problems that appear sporadically.

When monitoring utilities provide specific information about transactions, the information is generally limited and cumbersome to obtain. For example, some monitors require the user to specify the transactions to be monitored in-depth. Others provide intermediate information such as servlet response time, but do not provide contextual information about what the servlets are actually executing. Even those monitors that attempt to provide some execution context do not provide enough information to determine the root cause. And some limit the user to monitor only certain transactions under certain conditions thus eliminating true application-wide analysis.

The situation degrades when dealing with multi-server transactions because traditional monitors measure the individual transaction’s behavior within a single server and cannot track the end-to-end execution of the transaction across multiple servers [6]. Hence, an engineer has to infer transaction behavior such as transaction routing using only aggregate, statistical information (average, max, min, etc.) available at the server level. Consequently, the engineer can know the likelihood that a transaction is executed on a certain server but cannot identify the exact conditions, interactions and execution paths that led to the problem, which limits the ability to reconstruct the problem scenario. This lack of visibility into the application behavior forces the engineer to identify the root cause through trial and error, resulting in long and cumbersome repair times.
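The limitation of aggregate statistics can be shown with a small worked example. The sketch below uses made-up response times and hypothetical names; it only illustrates how a server-level average can look acceptable while one individual transaction is badly delayed.

```java
import java.util.Arrays;

/** Contrasts a server-level average with a per-transaction view (made-up data). */
final class AggregateVersusPerTransaction {

    /** 99 transactions at 100 ms plus one sporadic 5,000 ms transaction. */
    static long[] sampleResponseTimes() {
        long[] times = new long[100];
        Arrays.fill(times, 100L);
        times[42] = 5_000L;
        return times;
    }

    /** Arithmetic mean of the response times, as an aggregate monitor might report it. */
    static double averageMs(long[] responseTimesMs) {
        return Arrays.stream(responseTimesMs).average().orElse(0.0);
    }

    /** Number of individual transactions slower than the given threshold. */
    static long countSlowerThan(long[] responseTimesMs, long thresholdMs) {
        return Arrays.stream(responseTimesMs).filter(t -> t > thresholdMs).count();
    }

    public static void main(String[] args) {
        long[] times = sampleResponseTimes();
        System.out.println("average ms: " + averageMs(times));
        System.out.println("transactions over 1000 ms: " + countSlowerThan(times, 1_000L));
    }
}
```

Here the average is 149 ms, which gives no hint that one transaction took 5 seconds; only the per-transaction view reveals it, and even then it says nothing about which execution path caused the delay.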

The Need For a Better Solution

Clearly traditional tools are insufficient for understanding the performance implications of software design and the application behavior in production systems. They are, therefore, inadequate for proactively reducing the risk of performance problems or for rapidly resolving performance problems when they do occur.

[5] For example, increase the JDBC connection pool size or increase the heap size.
[6] Such monitors can monitor transactions of a multi-tier application only when all tiers run in the same server. They are unable to monitor multi-tier transactions when the tiers run on more than one server, irrespective of whether the servers run on the same physical machine or in a distributed environment.


Effective application performance management needs a life-cycle approach.

Performance Management in the Application Life-Cycle

Developers tend to work in single server environments and performance problems often go unnoticed until later in the life cycle when:

• QA finishes functional testing and starts verifying performance characteristics

• Performance analysts start performance and longevity testing to develop multi-server configuration guidelines

• Operations deploys the application into production

• Customers complain or abandon transactions after the system has gone live

Due to the complexity of performance problems, it often takes a painful amount of time before the root cause is identified. A contentious situation such as the one depicted in Figure 4 often results before the problem is resolved.

[Cartoon speech bubbles:]

“If only the developers had a clue about what the real environment is like.”

“If only QA/Ops would hire someone who can program!”

“It works in my environment.”

“I cannot reproduce the problem.”

“Come back when you can reproduce it or get more details.”

“We got a performance alert at 2 AM for the checkout transaction.”

“What more do you want? I gave you everything that the system gave me.”

“This problem has been there for 1 month! Fix it now!”

Figure 4: A consequence when performance problems are hard to diagnose

To prevent such troublesome situations and to deliver a high performance system, IT must pay attention to performance issues throughout the application life-cycle.


Figure 5 outlines performance related roles and responsibilities of different players and the information flow among them.

Phase: Development
Players: Architects, Developers
Performance engineering responsibilities:
    • Design for performance
    • Specify performance management tools
    • Understand dynamic application behavior
    • Reconstruct and fix problems

Phase: Quality Assurance
Players: Testers
Performance engineering responsibilities:
    • Discover and identify performance issues
    • Report quality trends for every application component
    • Isolate and document performance issues

Phase: Staging, Deployment
Players: System Architects, Performance Analysts
Performance engineering responsibilities:
    • Tune performance under real-world conditions
    • Optimize configuration for scalability
    • Isolate and document performance issues

Phase: Production
Players: Operations
Performance engineering responsibilities:
    • Alert on and document performance issues 24x7
    • Recognize performance trends
    • Triage performance issues
    • Minimize downtime

Performance information exchanged among the phases:
    • Monitoring and diagnostic tools
    • Recommended settings and thresholds
    • Expected performance characteristics, potential bottlenecks and key performance indicators
    • Performance and behavior changes between versions
    • Performance and scalability reports
    • Dynamic components interaction characteristics
    • Data for offline analysis and problem reconstruction

Application Performance Management Solution:
    • Instrument and measure transaction behavior.
    • Monitor fulfillment of Service Level Agreements.
    • Reveal the software’s dynamic behavior and performance implications.
    • Detect, diagnose and resolve application performance problems throughout the application life-cycle.

Figure 5: Performance roles, responsibilities and information flow during application life-cycle


Effective APM requires a common tool that all IT personnel, including developers, testers, system architects, performance analysts, administrators and operators, can use effectively.

Application Performance Management Solution Requirements

As one considers Figure 5 above and the implications of modern software development frameworks and remoting discussed earlier, it becomes evident that:

During development:

• Developers need tools to analyze the dynamic behavior of the application through its underlying layers for understanding the performance implications of design alternatives.

During post-development (QA, Staging or Pre-deployment and Operational):

• Non-development IT personnel, particularly Operations, need a tool to automatically detect performance anomalies such as SLA violations and capture all necessary and relevant information for problem reconstruction. This tool should require neither programming skills nor access to the source code.

• Non-development IT personnel need an easy to use tool for doing high-level triage and providing full in-depth code level diagnostics information to system architects and development.

• Developers need detailed data with complete transaction context and step-by-step execution details from production/load-test environments for off-line diagnosis so that they can reconstruct what happened rather than make repeated, laborious attempts to reproduce problems.

• Organizations need tight cooperation and efficient, productive communication between all stakeholders responsible for application performance. To reinforce this point, note that IT personnel from different groups interact extensively throughout the application life-cycle on performance matters (Figure 5) and that non-developers typically identify performance problems while developers resolve them.

Traditional application performance management tools do not meet the requirements for rapidly identifying, diagnosing and resolving performance problems throughout the application life-cycle. A next generation solution is needed. dynaTrace Diagnostics® is such a solution.


A transaction’s PurePath identifies the code executed by the transaction, the execution context and the server on which it was executed.

dynaTrace Diagnostics

dynaTrace Diagnostics® is an application performance management solution that fulfills the measurement and diagnosis requirements identified in this paper. Specifically designed to support the entire application life-cycle, dynaTrace’s PurePath® technology captures essential information for all transactions during their execution across multiple servers in heterogeneous distributed environments at very low overhead (Figure 6). This enables IT personnel to:

• Understand the dynamic behavior of the software so that, where possible, performance problems can be prevented, and

• Detect and diagnose performance problems so they can be quickly resolved whenever they occur.

In addition to high-level performance indicators, dynaTrace Diagnostics maps out the precise execution path - the PurePath - of each individual transaction from its entry at the first monitored server, through all other servers where it is processed, across system, technology and component boundaries.

PurePath uses KnowledgeSensors™ to capture all performance and relevant context information with minimum performance overhead.

Figure 6: dynaTrace Diagnostics visualizes traces of individual transactions in distributed heterogeneous environments

KnowledgeSensors mark a transaction’s progress along its execution path and identify all transaction entry points (e.g., Java Servlet invocations) and method calls, as well as their sequence and nesting. For each transaction, the KnowledgeSensors record performance information such as method call sequence, arguments, return values, exceptions, log messages, elapsed time and resource utilization statistics such as CPU usage, IO usage, network traffic, objects created, SQL calls, remote calls, and synchronization delays; see Figure 7 below.

PurePath allows one to reconstruct a problem scenario without trial and error.

dynaTrace Diagnostics allows you to diagnose problems efficiently, whether they are visible to the users or not.
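To make the idea of recording call sequence, nesting and elapsed time more concrete, the following is a minimal sketch of a per-transaction call tree. It is an illustration only, not dynaTrace's implementation or API: the class names (`CallTraceSketch`, `TransactionTrace`, `CallNode`), the method names in the sample and the fixed timestamps are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: illustrates recording call sequence, nesting and timing.
// It is not dynaTrace's implementation or API.
final class CallTraceSketch {

    /** One recorded method call with its arguments, timing and nested calls. */
    static final class CallNode {
        final String method;
        final String arguments;
        final List<CallNode> children = new ArrayList<>();
        long startNanos;
        long elapsedNanos;

        CallNode(String method, String arguments) {
            this.method = method;
            this.arguments = arguments;
        }
    }

    /** Collects the enter/exit events of a single transaction into a call tree. */
    static final class TransactionTrace {
        private final Deque<CallNode> stack = new ArrayDeque<>();
        private CallNode root;

        void enter(String method, String arguments, long nowNanos) {
            CallNode node = new CallNode(method, arguments);
            node.startNanos = nowNanos;
            if (stack.isEmpty()) {
                root = node;                      // transaction entry point
            } else {
                stack.peek().children.add(node);  // nested under the current call
            }
            stack.push(node);
        }

        void exit(long nowNanos) {
            CallNode node = stack.pop();
            node.elapsedNanos = nowNanos - node.startNanos;
        }

        CallNode root() {
            return root;
        }
    }

    /** Renders the call tree, one line per call, indented by nesting depth. */
    static String render(CallNode node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.method).append('(').append(node.arguments).append(") ")
          .append(node.elapsedNanos / 1_000_000L).append(" ms\n");
        for (CallNode child : node.children) {
            sb.append(render(child, depth + 1));
        }
        return sb.toString();
    }

    /** Builds a small trace with fixed timestamps so the result is deterministic. */
    static TransactionTrace sample() {
        long ms = 1_000_000L;
        TransactionTrace trace = new TransactionTrace();
        trace.enter("TransferServlet.doPost", "account=42", 0);
        trace.enter("AccountDao.load", "42", 1 * ms);
        trace.exit(4 * ms);
        trace.enter("AccountDao.update", "42", 5 * ms);
        trace.exit(9 * ms);
        trace.exit(10 * ms);
        return trace;
    }

    public static void main(String[] args) {
        System.out.print(render(sample().root(), 0));
    }
}
```

A real agent would obtain the enter and exit events through instrumentation rather than explicit calls, and would also record exceptions, log messages and resource statistics; the sketch keeps only sequence, nesting, arguments and elapsed time.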

dynaTrace Diagnostics records the PurePath for all transactions at very low overhead and sends it to the Diagnostics server for analysis. This ensures that IT:

• Has a complete record of execution for all transactions, giving 100% monitoring coverage so that no issues are missed,

• Has a record of each transaction’s execution across application tiers, servers and machines, so that every potential issue can be analyzed,

• Can actually see the transaction’s execution path, rather than guessing at it by trying to follow its execution from one server to the next,

• Can understand the dynamic behavior of the software,

• Can determine the root cause of the problem experienced by a specific transaction the first time – without having to pre-specify what transactions to monitor and then wait for the problem to recur,

• Can recreate problem scenarios, including problem transactions, from recorded data and pinpoint the exact cause of performance problems quickly, avoiding the traditional, expensive trial-and-error approach,

• Can diagnose problems in near real time or afterwards, and

• Can diagnose problems off-line without loading the production systems.

Figure 7 annotations: click on TransferFunds, 68 s; expensive component implementation, 5 s; synchronisation delay, 24 s; memory leak; 172 SQL calls, 39 s; 80000 objects over SOA calls; 100 remote calls, 10 MB transferred. Captured context: method arguments, return values, exceptions, log messages, timing and resource usage.

Figure 7: A transaction's PurePath shows where it experiences performance problems

Efficient Diagnostics

dynaTrace Diagnostics enables engineers to diagnose problems efficiently. Given PurePath information, they need not spend time trying to reproduce the problem. They can analyze the problem by performing either:

• Outside-In diagnosis: beginning with an incident of a user-visible performance problem, such as a slow-responding transaction or a user-visible exception message, and drilling down until the root cause is identified, or

• Inside-Out diagnosis: beginning with an internal measure of the performance problem, such as an exception message or a method running very slowly, identifying the associated transactions and drilling down through their PurePaths to identify the root cause.

Drill down through a transaction’s PurePath to determine the root cause of its performance problems.
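As a rough illustration of the inside-out direction, the sketch below starts from per-call timing records and finds the transactions in which a given method ran slowly. The record layout (`MethodTiming`), the transaction identifiers, the method names and the 500 ms threshold are hypothetical and chosen only for this example; they do not describe the product's data model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an inside-out lookup: slow method -> affected transactions.
final class InsideOutSketch {

    /** One timed method call, tagged with the transaction it belongs to. */
    static final class MethodTiming {
        final String transactionId;
        final String method;
        final long elapsedMillis;

        MethodTiming(String transactionId, String method, long elapsedMillis) {
            this.transactionId = transactionId;
            this.method = method;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Groups transaction ids by method, keeping only calls slower than the threshold. */
    static Map<String, List<String>> slowCallsByMethod(List<MethodTiming> timings,
                                                       long thresholdMillis) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (MethodTiming t : timings) {
            if (t.elapsedMillis > thresholdMillis) {
                result.computeIfAbsent(t.method, k -> new ArrayList<>())
                      .add(t.transactionId);
            }
        }
        return result;
    }

    /** Sample data: the same method is fast in one transaction and slow in two others. */
    static List<MethodTiming> sample() {
        List<MethodTiming> timings = new ArrayList<>();
        timings.add(new MethodTiming("tx-1", "AccountDao.load", 12));
        timings.add(new MethodTiming("tx-2", "AccountDao.load", 950));
        timings.add(new MethodTiming("tx-2", "AuditLog.write", 3));
        timings.add(new MethodTiming("tx-3", "AccountDao.load", 1200));
        return timings;
    }

    public static void main(String[] args) {
        // Prints {AccountDao.load=[tx-2, tx-3]}
        System.out.println(slowCallsByMethod(sample(), 500));
    }
}
```

The transactions returned by such a lookup are the ones whose full execution paths would then be examined to see which parameters or call sequences lead to the slow behavior.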

When certain transactions do not meet service level or performance requirements, dynaTrace Diagnostics’ intuitive console allows IT personnel to drill down through the transaction’s PurePath to identify the root cause(s) of performance problems (Figure 8) such as:

• Execution steps (e.g., method calls and servlet executions) that consume too many resources or run slowly,

• Excessively called methods or servlets, even if in framework software,

• Code that makes an excessive number of SQL calls or long running SQL calls,

• Excessive waits for resources such as execution threads or connections,

• Threads and locks causing synchronization delays,

• Components that make excessive remote calls,

• Remote calls doing excessive data transfer,

• Remote method or web-services calls taking too much time,

• Code where memory leaks occur, and

• Code where a large number of short-lived objects are created and destroyed.

Figure 8: dynaTrace Diagnostics PurePath allows engineers to reconstruct the problem by viewing the exact call sequence including performance metrics and detailed context information and unearth the root cause of the problems.


dynaTrace Diagnostics helps resolve performance problems before customers experience or report them.

When internal problems such as massive memory consumption or server crashes are encountered, SLA violations are detected, or a comparative analysis of historical data reveals potential performance issues (Figure 9), IT personnel use dynaTrace Diagnostics to identify transactions with a high contribution to the symptoms. By drilling down into these transactions’ PurePaths, they can better understand the context that leads to such high contributions. By using this context information and the ability to recreate transactions, they can quickly identify the root causes.

Figure 9: dynaTrace Diagnostics dashboard highlights problems, allows trend analysis, and historic comparisons such as of different application versions

For example, consider two situations which may not be visible – yet – to the users: an exception message and a slow running method. With dynaTrace Diagnostics:

• When an exception message is found in the logs, one can identify:

  • The transaction and its parameters that led to the exception,

  • The actual method call that generated the exception message,

  • The parameters passed to this method call as well as to all of its predecessors, including the parameters input by the user, and thus,

  • The root cause such as user error, insufficient error handling in code, other logic errors, or system conditions such as out of disk space.

• When one identifies a slow running method, one can quickly determine:

  • Whether the method runs slowly all the time or just from time to time,

  • Transactions that execute this method,

  • Transactions that execute and bog down in this method,

  • The break-down of such transactions’ execution time in this method into the time taken by underlying method calls and queries, and

  • The core method that needs to be corrected to achieve higher performance.

dynaTrace Diagnostics allows one to monitor the entire software stack – from the custom application code to the run-time environment of the virtual machine – for custom and packaged application software.

dynaTrace Diagnostics helps address the problems in the right priority, before they affect many users.
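The break-down of a slow method's time into its underlying calls can be sketched as a "self time" calculation over a recorded call tree. This is a simplified illustration under stated assumptions: nested calls are taken to run sequentially inside their parent, and the class names, method names and timings are hypothetical rather than taken from the product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a call's total time into self time and nested time.
final class TimeBreakdownSketch {

    /** A recorded call with its total elapsed time and its nested calls. */
    static final class Call {
        final String name;
        final long totalMillis;
        final List<Call> children = new ArrayList<>();

        Call(String name, long totalMillis) {
            this.name = name;
            this.totalMillis = totalMillis;
        }

        Call add(Call child) {
            children.add(child);
            return this;
        }
    }

    /** Time spent in the call itself, excluding time spent in nested calls. */
    static long selfMillis(Call call) {
        long nested = 0;
        for (Call child : call.children) {
            nested += child.totalMillis;
        }
        return call.totalMillis - nested;
    }

    /** Returns the call in the subtree with the largest self time. */
    static Call hottest(Call call) {
        Call best = call;
        for (Call child : call.children) {
            Call candidate = hottest(child);
            if (selfMillis(candidate) > selfMillis(best)) {
                best = candidate;
            }
        }
        return best;
    }

    /** Sample tree: most of the time is spent in a query two levels down. */
    static Call sample() {
        return new Call("OrderService.quote", 900)
            .add(new Call("PriceDao.query", 700)
                .add(new Call("JDBC executeQuery", 650)))
            .add(new Call("TaxRules.apply", 150));
    }

    public static void main(String[] args) {
        Call root = sample();
        System.out.println(root.name + " self time: " + selfMillis(root) + " ms");
        System.out.println("Largest self time: " + hottest(root).name);
    }
}
```

In this sample the slow-looking top-level method spends only 50 ms in its own code; the call with the largest self time is the query beneath it, which is the kind of "core method" the text refers to.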

Note that since dynaTrace Diagnostics maps the performance of individual transactions, rather than just the aggregate performance of all transactions or a class of transactions, it allows IT personnel to:

• Determine a transaction’s business value by looking at its parameters, allowing them to prioritize different incidents and focus energies on the most valuable issues.

• Address performance issues in their infancy – when they show up for a few transactions – before those affect a large number of users and have a negative impact on business.
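A small worked example shows why per-transaction data matters more than aggregates. The numbers below are invented for illustration: 100 transactions, of which 98 take 200 ms and 2 take 9000 ms, checked against an assumed service level of 2000 ms. The average is (98 × 200 + 2 × 9000) / 100 = 376 ms, well within the service level, yet two individual transactions violate it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: an acceptable average can hide individual SLA violations.
final class SlaOutlierSketch {

    /** Arithmetic mean of the response times in milliseconds. */
    static double averageMillis(long[] responseTimes) {
        long sum = 0;
        for (long t : responseTimes) {
            sum += t;
        }
        return sum / (double) responseTimes.length;
    }

    /** Indices of the transactions whose response time exceeds the service level. */
    static List<Integer> violations(long[] responseTimes, long slaMillis) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < responseTimes.length; i++) {
            if (responseTimes[i] > slaMillis) {
                result.add(i);
            }
        }
        return result;
    }

    /** Invented sample: 98 fast transactions and 2 very slow ones. */
    static long[] sample() {
        long[] times = new long[100];
        Arrays.fill(times, 200L);
        times[98] = 9000L;
        times[99] = 9000L;
        return times;
    }

    public static void main(String[] args) {
        long[] times = sample();
        System.out.println("average ms: " + averageMillis(times));           // 376.0
        System.out.println("violations: " + violations(times, 2000L));       // [98, 99]
    }
}
```

Monitoring based only on the average would report no problem here, while a per-transaction check surfaces the two outliers whose parameters can then be inspected.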

Out-of-the-box, Extensible Diagnostics

dynaTrace Diagnostics comes with an array of ready-to-use KnowledgeSensors covering the entire software stack (Figure 10), for a variety of commercial and open source:

• run-time virtual machine environments

• database access layers

• application platforms and servers

• remoting libraries

• web services stacks

• messaging libraries and frameworks.

Figure 10 content, Java stack by software layer:
• Application: Java application
• Application Platform: J2EE, JSEE
• Frameworks: Spring, Toplink, Struts, Ajax
• Messaging: IBM WebSphere MQ, IBM CICS Transaction Server, BEA T3 (RMI, JMS)
• Remoting: RMI (IIOP, JRMP, HTTP(s), T3), Visibroker, IIOP/ORBS
• WebService Stacks: IBM WebSphere, BEA WebLogic, AXIS, Web Methods, Glue
• Database Access Layer: SQL, JDBC, Hibernate
• Runtime Environment: Sun JVM, IBM JVM, BEA JRockit
• Application Server: Sun Java AS, IBM WebSphere, BEA WebLogic, JBoss, Apache Tomcat, Oracle AS, SAP Netweaver

.NET stack, in the order listed in the figure: Microsoft Windows AS; Microsoft CLR; .NET/WCF; .NET (WCF); ADO.NET, ASP.NET; Atlas; .NET; .NET application in C# or other languages; SQL, ADO.NET.

Legend: custom KnowledgeSensors; pre-built KnowledgeSensors shipped with dynaTrace Diagnostics.

Figure 10: KnowledgeSensors capture transaction execution through all software layers

(For an up-to-date list, please visit www.dynaTrace.com)


These pre-built KnowledgeSensors encapsulate deep knowledge enabling IT personnel to manage performance in their environments right out of the box – without any effort spent on customization.

For deeper insight into custom applications, such as a policy quotation system for an insurance company, developers can easily define and package KnowledgeSensors for their own applications using dynaTrace Diagnostics’ point-and-click interface, then ‘hot’ deploy them to the target environment.

For packaged applications such as SAP ERP, application developers can easily define and package KnowledgeSensors for those applications. These packages can either be shipped with the application or separately. Alternately, IT personnel at the licensed organizations, or third parties, can define KnowledgeSensors for that application without needing access to the application’s source code, package them, and then deploy on their own.

Compare For Yourself

Earlier, we asserted that dynaTrace Diagnostics is the only solution available on the market that meets the requirements for efficient performance problem detection and analysis and that can be used throughout the application life-cycle by all members of the IT team. We invite you to scan Table 2 to compare other products that you may be familiar with to dynaTrace Diagnostics.

We are confident that you will agree that dynaTrace Diagnostics fits the bill perfectly while other solutions fall significantly short.


Key Diagnostics Capabilities (columns: dynaTrace Diagnostics, other vendor)

Diagnosis Depth Requirements

Capture necessary data for each individual transaction, and not just average transaction measurements, in load testing and 24x7 production environments, enabling diagnosis of the business-critical outlier transactions.

✓

Capture all performance and contextual data that is required for reconstructing a performance problem – thus eliminating the need to reproduce it – and quickly identifying the code where the performance problem occurs. Such data should include method response times, remoting performance and payload metrics, synchronization metrics, method and Web request arguments, log messages and exceptions.

✓

Reveal the relationships among events such as exceptions, log messages, input metrics, SQL executions and performance threshold violations by associating them with transactions to identify the root-cause.

✓

Analyze transaction metrics in context of server resource metrics to determine whether the performance problem is caused by configuration issues or programming issues.

✓

Diagnose memory leaks, even in production environments.

✓

Precisely trace execution of each transaction across multiple servers (logical or physical) and clients to understand its impact on each server and application component as well as to understand implications of remoting to design high performance distributed applications using SOA, Web-Services, etc.

✓

Application Life-Cycle Requirements

Provide real-time data to Operations, down to the code level, for each and every individual transaction for high-level problem triage and to performance analysts and system architects for live root-cause analysis.

✓

Provide offline code-level diagnosis capabilities that enable developers and architects to interactively diagnose all individual transactions for reconstructing, isolating and resolving the performance problem, eliminating the need to reproduce the problem.

✓

Capture necessary performance data in QA and production environments and transfer the information to engineering for analysis, potentially on another system, eliminating the need for having developers on site to debug performance problems or for having to spend a significant amount of time on reproducing the problem.

✓

Provide automated performance comparison reports, down to individual transactions and code level, among subsequent diagnosis sessions for evaluating the success of performance tuning activities, comparing different application versions and configurations and understanding the root-cause of the differences.

✓

Enable engineers, architects and performance analysts to define measurement granularity, so that they get from QA and operations exactly what they need.

✓

Store and maintain the performance data for long term historical and trend analysis.

✓

Integrate with IDEs, automated build and test systems, load testing tools, issue-tracking systems and enterprise management systems to enhance the productivity of IT personnel throughout the application life-cycle.

✓

Deployment and Operational Requirements

Configuration-free agents for automated, centralized deployment.

✓

Centralized management of agents with automated and real-time remote configuration updates to quickly and easily adapt the depth and granularity of captured diagnostics data on the fly, without having to restart the application.

✓

Auto-discover application components for out-of-the-box diagnosis results and intuitive customization.

✓

Continuous measurement and diagnosis in load testing and 24x7 production environments through lightweight agent technology at negligible CPU overheads and flat memory usage of a few megabytes.

✓

Monitor service levels at individual transaction level and alert on violations. Automatically capture history of all transactions including deep diagnostics data for off-line root-cause analysis to eliminate the need for problem reproduction.

✓

Map transactions to requests, users and application functionality to prioritize problem resolution based on business impact.

✓

User Interface and Usability Requirements

Simple, intuitive yet comprehensive and responsive user interface that does not require detailed programming knowledge but still provides information that programming experts can use.

✓

Uses nomenclature and presents statistics that are relevant to and usable by all members of the IT team, whether they are developers, testers, system architects or server administrators.

✓

Serves as the common solution to be used by developers and non-developers for capturing, storing and analyzing performance data throughout the application life-cycle, reducing time to repair.

✓

Table 2: Comparison tool for evaluating application performance management solutions


Conclusion

Consistently delivering high application performance in today’s complex multi-server heterogeneous distributed environments is a daunting task. The ability to diagnose and resolve performance problems rapidly is critical to achieving this goal. Therefore, architects and managers should think beyond traditional performance management paradigms and establish effective systems and processes throughout the entire application life-cycle. The abilities to

• Capture, at production-safe overhead, the detailed execution information for each transaction, during its execution,

• Reliably reconstruct problem scenarios from the captured information, and

• Quickly analyze this information to determine the true root cause

are key to fixing application performance problems quickly and easily.

With its innovative PurePath instrumentation technology, low-overhead dynamic monitoring, intuitive user interface featuring end-to-end visualization and analysis capabilities, and integration with IDEs and enterprise management frameworks, dynaTrace Diagnostics truly represents the next generation of solutions explicitly designed for use by all IT personnel throughout the application life-cycle.

dynaTrace Diagnostics goes far beyond monitoring and enables IT to take productive and efficient action to fix performance problems.

dynaTrace Diagnostics enables IT to:

• Study applications’ dynamic behavior during development to eliminate redundant calls, inefficient objects and algorithms, and tune caches and configurations,

• Fix the root cause of performance problems rather than mitigate or hide them by system tuning actions alone,

• Unearth poorly performing transactions even when overall averages are within acceptable range and allow engineers to take corrective action before problems explode on a large scale,

• Focus their energies on addressing troublesome transactions or hotspots, rather than trying unnecessarily to reduce overall average response times at considerable expense,

• Identify and focus on business-critical applications or transactions rather than working harder to improve the performance of all transactions using the same server,

• Perform their investigation offline without having to pre-define what data needs to be saved and without having to spend a lot of time reproducing the problem,

• Give the recorded data to the engineers for analysis, allowing everyone to focus on their primary duties, and thus,

• Bridge the communication gap between system administrators, testers, performance analysts and developers.

Consequently, dynaTrace Diagnostics proactively averts performance problems and reduces time-to-repair. Its life-cycle-centric design enables IT personnel to work together efficiently and effectively to deliver high performance consistently in complex heterogeneous multi-server clustered systems.

We invite you to learn more at www.dynaTrace.com.


Headquarter EMEA: dynaTrace software GmbH, Freistädter Str. 313, 4040 Linz, Austria/Europe, T +43 (732) 908208, F +43 (732) 210100.008

Headquarter North America: dynaTrace software Inc, West Street 200, Waltham, MA 02451, USA, T +1 (339) 9330317, F +1 (781) 2075365, E: [email protected]

All rights reserved. dynaTrace software is a registered trademark of dynaTrace software GmbH. All other marks and names mentioned herein may be trademarks of other respective companies. (070522)
