Match 31-bit WebSphere Application Server performance with new features in 64-bit Java on System z
Solve 31-bit virtual memory crowding issues without sacrificing performance by using heap compression and large pages
Kishor Patil, Marcel Mitran, Jim Cunningham
Software developers, IBM Software Group Applications and Integration Middleware
May 2009
© Copyright IBM Corporation, 2009.
Table of contents
Abstract
Introduction
Prerequisites
    Large heap support for 64-bit Java in compressed references mode
    Large page support
Compressed references support
    Java object shape and compressed references
    Object header compression
    Object reference compression schemes
    Object reference compression for a heap size of 2 GB or less
    Object reference compression for heap sizes greater than 2 GB
    Special compression optimization for 2 GB to 6 GB heap sizes
    Compressed references support on IBM z/OS
    Compressed references support on Linux on System z:
    IBM SDK6 JVM behavior on IBM z/OS with -Xcompressedrefs option:
Large page support
    Large page support on IBM z/OS
    Large page support in Linux on System z
    Java 6 exploitation of large page support
Using verbose:gc log for compressed references and large pages
Performance analysis and guidelines
    Multithreaded Java benchmark on z/OS
    Performance projections for 64-bit compressed references with heap sizes greater than 2 GB
    DayTrader benchmark running on IBM WebSphere Application Server V7 for z/OS
    Java Heap footprint savings using compressed references
    Garbage collection time savings using compressed references
Setting Java options in WebSphere Application Server
    Converting a migrated WebSphere Application Server on IBM z/OS to run in 64-bit mode:
    Enabling compressed references mode in 64-bit WebSphere Application Server V7 on IBM z/OS:
    Enabling large page support for Java heaps in WebSphere Application Server on IBM z/OS:
    Enabling compressed references and/or large page support options in IBM WebSphere Application Server V7 on Linux on System z:
Conclusion
Resources
About the authors
    Acknowledgements:
Trademarks and special notices
Abstract
This article describes a pair of new features available in the IBM® Developer Kit for Java™ 6, 64-bit edition. The compressed references and large pages features were added to the IBM J9
Java virtual machine (JVM) and IBM Testarossa Just-in-Time (JIT) compiler to provide relief for the memory footprint growth incurred when migrating from a 31-bit JVM to a 64-bit JVM. This growth in footprint typically increases system memory requirements and also reduces
throughput. This article shows that it is possible to recover the 31-bit footprint and throughput performance using the 64-bit JVM for heap sizes up to 30 GB. We review the advantages and disadvantages of the 31-bit and 64-bit SDKs, provide a brief
implementation overview, and discuss the performance characteristics of various combinations of heap sizes and Java options.
All developers are encouraged to read through this article, but the intended audience is
enterprise application developers who are deploying Java workloads on the IBM System z10™ mainframe.
Introduction
IBM® Developer Kit for Java™ 6 offers 31-bit and 64-bit editions of the Java virtual machine (JVM) for the z/OS platform. The 31-bit edition has traditionally provided the best application performance and
footprint. As Java applications grow in complexity and scale, the limited virtual memory range (2 GB) of 31-bit address space puts pressure on Java and native heap usage resulting in out-of-memory errors. As such, there is a growing trend for adopting the 64-bit edition of the JVM. The heap relief provided by the
64-bit edition of the JVM comes at a performance and footprint cost. The overhead of using 64-bit wide object references can require up to 40% more Java heap. The inherently bigger objects also affect data locality and hence contribute to higher Translation Look-aside Buffer (TLB) and data cache miss rates,
resulting in worse application performance.
To overcome this performance bottleneck, IBM has introduced large page support in the latest IBM System z® servers (IBM System z10) and the compressed references feature in the IBM 64-bit SDK for z/OS,
Java Technology Edition, V6 (SDK6). This article describes some best practices for taking advantage of the new hardware and the IBM SDK6 to improve 64-bit footprint and performance.
Prerequisites
Large heap support for 64-bit Java in compressed references mode
- IBM 64-bit SDK for z/OS, Java Technology Edition, V6, March, 2009 Maintenance Rollup APAR PK82091
- IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 6, SR4
- IBM z/OS V1R7 or later or 64-bit Linux on System z
- For best performance, IBM System z hardware using the IBM System z10 processors or later
- IBM z/OS support APAR OA26294
- IBM WebSphere Application Server V7.0.0.3 includes the prerequisite Java release
- Java command line option -Xcompressedrefs to enable this feature
Large page support
- IBM 64-bit SDK for z/OS, Java Technology Edition, V6, September, 2008 Maintenance Rollup APAR PK65878
- IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5
- IBM® 31-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5
- IBM z/OS V1R9 or later or 64-bit Linux on System z™, Kernel 2.6.25 or later (SUSE SLES10 SP2 or RHEL 5) running in an LPAR
- IBM System z™ hardware using the IBM System z10 processors or later
- IBM z/OS support APAR OA20902 (only needed for IBM z/OS V1R9)
- IBM z/OS support APAR OA25485
- IBM WebSphere Application Server version V7.0.0.1 includes the prerequisite Java release
- Java command line option -Xlp to enable this feature
Compressed references support
Many workloads using the 31-bit JVM are reaching the limits of the 31-bit virtual memory range. Moving these workloads to the 64-bit JVM is desirable to reduce heap constraints, but comes at the cost of increased
footprint and reduced throughput.
Some workloads have shown an increase of up to 45% in average object size, as the object header and references double in width. Keeping the same heap size as in 31-bit mode usually results in more frequent
garbage collection. In some cases, the application may experience an out-of-memory error if garbage collection cannot satisfy the application memory requirements. Using a larger heap addresses these issues, but results in a significant real memory footprint. Data locality is significantly reduced because the data
cache can hold fewer objects, resulting in a higher rate of data cache and TLB misses. As such, application performance is typically worse. For instance, in a multithreaded benchmark, even with a moderately bigger Java heap, we observed a 19% performance gap between the 31-bit and 64-bit JVMs.
Similarly, for a WebSphere Application Server banking application, we observed an 8% performance gap.
Java object shape and compressed references
The figure below describes an object with two references for a 31-bit, 64-bit, and 64-bit compressed references IBM SDK6 runtime environment:
Figure 1. Java object shapes in 31-bit and 64-bit Java VMs
Each object has two parts: the object header, and object instance fields. The object header contains a reference to the class, 32-bit flags, and a monitor word, which holds the thread ID for any owning lock. Padding may be required to enforce alignment constraints.
As Figure 1 shows, the 64-bit object requires twice the memory used by the 31-bit object. When the 64-bit JVM with compressed references is used, the 64-bit object size is reduced back to the same size as the 31-bit object.
Match 31-bit WebSphere Application Server performance with new features in 64-bit Java on System z © Copyright IBM Corporation, 2009
Object header compression
The clazz field is a reference to memory that contains class-related data such as static fields, reference to class loader, and the virtual function table. In compressed references mode, the class data is allocated
below the 2 GB virtual address, so it can fit into 32 bits. Similarly, the monitor field is compressed to 32 bits by allocating thread data below the 2 GB virtual address. Since all the fields in the object header require 32 bits, there exists no padding in the object header.
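The size relationship described above can be checked with a little arithmetic. The following is a minimal sketch, not the JVM's actual layout code: it assumes the header fields appear in the order class pointer, 32-bit flags, monitor word, that each field is aligned to its own width, and it ignores any trailing padding needed to align the whole object. The class and method names are hypothetical.

```java
final class ObjectShapeSketch {

    /** Rounds offset up to the next multiple of alignment (a power of two). */
    static int align(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Bytes used by the header plus the reference fields.
     * refBytes is the width of the class pointer, the monitor word and
     * each object reference: 4 for 31-bit and 64-bit compressed
     * references, 8 for plain 64-bit (assumed layout, see lead-in).
     */
    static int objectBytes(int refBytes, int referenceFields) {
        int offset = 0;
        offset = align(offset, refBytes) + refBytes; // class pointer
        offset = align(offset, 4) + 4;               // 32-bit flags
        offset = align(offset, refBytes) + refBytes; // monitor word (padded in 64-bit)
        for (int i = 0; i < referenceFields; i++) {
            offset = align(offset, refBytes) + refBytes; // object reference field
        }
        return offset;
    }

    public static void main(String[] args) {
        System.out.println("31-bit:            " + objectBytes(4, 2) + " bytes");
        System.out.println("64-bit:            " + objectBytes(8, 2) + " bytes");
        System.out.println("64-bit compressed: " + objectBytes(4, 2) + " bytes");
    }
}
```

Under these assumptions, the object with two references occupies twice as many bytes in plain 64-bit mode as in 31-bit or compressed references mode, which matches the relationship described for Figure 1.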
Object reference compression schemes
The object data includes instance fields such as integers, floats, doubles, chars, bytes, and object references. In compressed references mode, 64-bit object references are compressed to 32-bit values by using one of several compression schemes. The different schemes represent a trade-off between the
maximum Java heap size and the path-length incurred for compressing and decompressing the references.
The user specifies the maximum Java heap size with the -Xmx option. Note that even if the application
specifies a smaller starting heap using -Xms, the compression scheme is chosen based on the maximum heap rather than the starting heap.
Object reference compression for a heap size of 2 GB or less
For heap sizes of 2 GB or less, the Java heap is allocated in a virtual address range below 2^32. Within
this address range, the most significant 32 bits of the Java object reference are zeroes, hence only the low word is needed to represent the object reference.
Although a theoretical 4 GB heap can be allocated in this range, operating system restrictions limit the
size of the Java heap to 2 GB. For Linux® on System z, the amount of heap that can be allocated depends on the specifics of the Linux distribution. More details on this will follow in a separate section.
On IBM System z, SDK6 uses 32-bit instructions to store or compare object references. When an object
reference needs to be de-referenced, the high 32 bits need to be cleared to build a well-formed 64-bit pointer. This decompression is achieved by zero-extending the compressed 32-bit value when loading it out of the Java heap.
On the IBM System z9® processor and earlier models, the zero-extension and subsequent de-reference may result in an address generation interlock (AGI) pipeline stall. The IBM System z10 pipeline implements a bypass to remove the stall for this event, thus providing the best performance for 64-bit
compressed references. As mentioned earlier, the System z10 processor also includes large page support, another key feature for 64-bit performance.
Object reference compression for heap sizes greater than 2 GB
Java heaps that are greater than 2 GB cannot be allocated entirely below virtual address 2^32, which means that the
most significant 32 bits of the references may be non-zero. However, since all objects in the 64-bit Java heap are aligned at 64-bit boundaries, the least-significant 3 bits of the virtual address are always zero. The IBM JVM compresses the object references by right-shifting the address by 1, 2, or 3 bits, depending
upon where the top of the Java heap falls, which in turn depends on the maximum requested heap size.
Table 1 below summarizes the maximum heap sizes, and respective shift amounts for each reference compression scheme.
Max heap size (-Xmx)    Top of the heap located    Shift amount used for object reference compression*
2 GB or less            below 2^32                 0
6 GB or less            below 2^33                 1
14 GB or less           below 2^34                 2
30 GB or less           below 2^35                 3
Greater than 30 GB      above 2^35                 Only supported without compressed references
Table 1. Supported shifting modes for compressed references
* Linux on System z kernel and other programs may fragment the virtual memory ranges below 2^35.
This may force the Java heap to be located in a higher virtual memory range resulting in smaller thresholds for maximum heap and different shift values than stated in table 1.
The compression scheme is determined at JVM startup and is dependent on the virtual address range
where the top of the Java heap falls. The same compression scheme is applied to all object references for the life of that JVM.
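The thresholds in Table 1 follow from where the top of the heap lands. The sketch below is an illustration only, not the JVM's selection logic: it assumes the heap starts at the 2 GB (2^31) bar, as in the z/OS case described later, and picks the smallest shift for which the top of the heap is at or below 2^(32 + shift). The class and method names are hypothetical.

```java
final class ShiftSelectionSketch {
    static final long GB = 1L << 30;

    /**
     * Smallest shift (0..3) for which every address below heapTop
     * fits in 32 bits after shifting, or -1 if no such shift exists.
     * heapTop is the exclusive end address of the heap.
     */
    static int shiftForHeapTop(long heapTop) {
        for (int shift = 0; shift <= 3; shift++) {
            if (heapTop <= (1L << (32 + shift))) {
                return shift;
            }
        }
        return -1; // heap reaches above 2^35: compressed references not possible
    }

    /** Shift for a heap of maxHeapBytes placed at heapBase (assumed contiguous). */
    static int shiftForMaxHeap(long heapBase, long maxHeapBytes) {
        return shiftForHeapTop(heapBase + maxHeapBytes);
    }

    public static void main(String[] args) {
        long base = 2 * GB; // assumption: heap allocated directly above the 2 GB bar
        for (long gb : new long[] {2, 6, 14, 30, 31}) {
            System.out.println("-Xmx" + gb + "g -> shift "
                    + shiftForMaxHeap(base, gb * GB));
        }
    }
}
```

With a base other than 2^31 (for example, on Linux on System z when the low address range is fragmented), the same function yields smaller heap thresholds for each shift, which is the effect the table footnote describes.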
Compression is applied on an object reference before it is stored in another object’s instance field. As
such, the reference is appropriately shifted right and then only the least significant 32 bits of the shifted value are stored.
When an object reference is read from the heap for the purpose of being de-referenced, it is read as a 32-
bit zero-extended compressed value into a 64-bit register. The value is then shifted left appropriately. The result is a fully formed 64-bit virtual address that can be de-referenced.
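The store and load sequences described above can be mimicked with ordinary integer arithmetic. This is a minimal sketch of the technique only; the real JVM and JIT compiler emit machine instructions for it, and the class and method names here are hypothetical.

```java
final class CompressedRefSketch {

    /** Shift right, then keep only the least significant 32 bits. */
    static int compress(long address, int shift) {
        return (int) (address >>> shift);
    }

    /** Zero-extend the 32-bit value into 64 bits, then shift left. */
    static long decompress(int compressed, int shift) {
        return (compressed & 0xFFFFFFFFL) << shift;
    }

    public static void main(String[] args) {
        // An 8-byte-aligned address just above 30 GB, so below 2^35.
        long address = 0x780000008L;
        int shift = 3;

        int stored = compress(address, shift);
        long restored = decompress(stored, shift);

        System.out.printf("address  = 0x%x%n", address);
        System.out.printf("stored   = 0x%x%n", stored);
        System.out.printf("restored = 0x%x%n", restored);
    }
}
```

The round trip is lossless only because the address is below 2^(32 + shift) and its low `shift` bits are zero. With a shift of 0, compression reduces to taking the low word and decompression to the zero-extension described for heaps of 2 GB or less.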
The additional shifting operations add path-length that represents a performance cost to the Java
application. This extra path-length is paid in exchange for reduced footprint and improved data locality. In some applications, the benefits of the reduced footprint and increased locality will provide a net gain in overall performance.
Special compression optimization for 2 GB to 6 GB heap sizes
For the shift-by-1 compression scheme (2 GB to 6 GB Java heap), the IBM JVM on System z can, in many cases, remove the cost of the shift operation by exploiting z-architecture’s base/index register
memory references. Explicit shifting is still required for array accesses and the GC runtime. As such, the performance of the shift-by-1 compression scheme is still worse than that of the shift-by-0 compression scheme. However, shift-by-1 does typically perform better than shift-by-2 or shift-by-3 compression
schemes.
Compressed references support on IBM z/OS
APAR OA26294 provides a direct assembler interface to z/OS Real Storage Manager (RSM), which allows memory allocations in the 2 GB (2^31) to 32 GB (2^35) virtual address range. The IBM JVM uses
this API to allocate the Java heap in this virtual address range. It is noted that memory allocated using the RSM API can be backed by either 4 KB pages or large pages.
Compressed references support on Linux on System z:
The size and location of the Java heap on Linux on System z will depend on the Linux distribution and
application configuration. In this section, some tricks and tips for allocating the optimal Java heap in the 0 to 2^32 virtual address range are provided.
For Linux on System z, 64-bit executables are loaded at virtual address 2^31 by default. The Java heap
must be contiguous, hence Java is limited to approximately 2 GB below the 2^31 bar. The linker is responsible for selecting the base address at which the application is placed. By creating a custom Java launcher, the default linker script can be modified to use a different base address. The default linker
script can be captured by running "ld --verbose" as follows:
% ld --verbose >myLinkerScript
The first line after the SECTIONS statement in myLinkerScript should read:
PROVIDE (__executable_start = 0x80000000); . = 0x80000000 + SIZEOF_HEADERS;
The 0x80000000 value can be changed to construct a modified linker script that can be used to link a
custom Java launcher. The new value must be less than 2^32, hence it is best to move the __executable_start to the lower end of the address range. The system memory map should be
inspected to find the right virtual address from which the largest contiguous virtual address range can be
made available for the Java heap. The system memory map is found in /proc/XXXX/maps, where XXXX is a process ID (PID) of the Java process.
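Inspecting the memory map by hand can be tedious, so the search for the largest free range can be sketched in code. The following is an illustration under stated assumptions: it relies only on each line of the maps file beginning with a hexadecimal `start-end` range and on the lines being sorted by address, and the sample lines in `main` are synthetic, not taken from a real system. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class MapsGapSketch {

    /**
     * Returns {gapStart, gapLength} for the largest unmapped virtual
     * address range below limit, given lines in /proc/<pid>/maps format.
     */
    static long[] largestGapBelow(List<String> mapsLines, long limit) {
        long bestStart = 0, bestLen = 0, cursor = 0;
        for (String line : mapsLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] range = line.trim().split("\\s+")[0].split("-");
            long start = Long.parseUnsignedLong(range[0], 16);
            long end = Long.parseUnsignedLong(range[1], 16);
            // The gap before this mapping ends at its start, capped at limit.
            long gapEnd = Long.compareUnsigned(start, limit) < 0 ? start : limit;
            if (gapEnd - cursor > bestLen) {
                bestStart = cursor;
                bestLen = gapEnd - cursor;
            }
            if (Long.compareUnsigned(end, limit) >= 0) {
                cursor = limit; // nothing left to examine below limit
                break;
            }
            cursor = Math.max(cursor, end);
        }
        if (limit - cursor > bestLen) { // free space after the last mapping
            bestStart = cursor;
            bestLen = limit - cursor;
        }
        return new long[] {bestStart, bestLen};
    }

    public static void main(String[] args) {
        // Synthetic example: executable at the default 0x80000000 base.
        List<String> sample = Arrays.asList(
                "80000000-80010000 r-xp 00000000 5e:01 1234 /opt/java/bin/java",
                "80010000-80020000 rw-p 00010000 5e:01 1234 /opt/java/bin/java",
                "3ff00000000-3ff00100000 rw-p 00000000 00:00 0");
        long[] gap = largestGapBelow(sample, 1L << 32);
        System.out.printf("largest gap: start=0x%x length=%d MB%n",
                gap[0], gap[1] >> 20);
    }
}
```

To run it against a live process, the lines could be read with `Files.readAllLines(Paths.get("/proc/XXXX/maps"))`. Note that the sketch treats address 0 as usable, which the kernel may not permit, so the reported range is an upper bound on what the Java heap can actually obtain.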
IBM SDK6 JVM behavior on IBM z/OS with –Xcompressedrefs option:
When the -Xcompressedrefs option is specified with an -Xmx value of 30 GB or less, the JVM tries to allocate the Java heap in the 2^31 to 2^35 virtual address range. The shift amount for object reference compression is selected automatically based on where the top of the Java heap is located. The verbose garbage
collection log can be used to find the shift value used by the JVM; details follow in a later section. As discussed earlier, application performance depends on the shift value. If the requested heap cannot be allocated below virtual address 2^35, the 64-bit JVM fails to start (it does not
automatically switch to default mode).
Large page support
Virtual memory provides the illusion that applications can allocate and use as much memory as a pointer can address. As such, a 31-bit application can use up to 2 GB (2^31) of virtual memory, while a 64-bit
application can use 2^64 bytes of virtual memory.
The amount of real storage on a system can be much smaller than the amount of virtual memory used. To provide the illusion of large virtual storage, the operating system keeps track of virtual memory ranges
and dynamically maps them to real or absolute real storage ranges using Dynamic Address Translation (DAT) structures. To improve the performance of virtual-to-real address translation, a special hardware table, called the Translation Look-aside Buffer (TLB), is used to cache recently used virtual-to-real
mappings.
If an application requires a large virtual footprint, the TLB may not provide sufficient addressability to map the application’s working set. In the event that a translation lookup is not found in the TLB, a full lookup in
the DAT structures is required, which can degrade application performance significantly. As the TLB is a fixed-size hardware buffer, the maximum amount of memory that the TLB can map is defined by the page size. In this respect, most standard hardware and operating systems use 4 KB pages. However, with
growing application footprints, support for larger page sizes has recently emerged.
Large page support on IBM z/OS
The IBM System z10 processors introduced support for 1 MB pages. IBM z/OS provides an assembler interface for allocating virtual memory using large pages in z/OS V1R9 through APAR OA20902 and
APAR OA25485. The large pages are defined at z/OS system Initial Program Load (IPL) using the LFArea=xxxxxxG keyword in IEASYSxx. Currently, large pages are fixed (not swappable) and are backed by real memory. It is therefore advisable to size the large page area so that other application
code using normal pages can run without encroaching on the large pages. If the system is constrained for 4 KB pages while large pages are still available, z/OS converts the available large pages into 4 KB pages. Should the need for large pages arise later, z/OS tries to coalesce the previously demoted large pages
(1 MB pages) to satisfy the request, and may do so by swapping out some 4 KB pages. However, large pages that are already allocated cannot be converted to 4 KB pages, because they are not swappable.
Currently, IBM z/OS only supports the allocation of large pages above the 2 GB virtual memory bar; hence they are not available to 31-bit applications. Of note as well, IBM z/OS does not currently support locating 64-bit executable code above 2 GB virtual memory address and as such application executable
code may not reside in memory backed by large pages. However, 64-bit applications can still gain performance value by allocating data in virtual memory backed by large-pages in the following two ways:
1. Since each 1 MB large page represents 256 times more virtual memory than a 4 KB page, fewer
TLB entries are required to represent the data footprint of the application. This more efficient use of TLB resources will result in fewer TLB misses for data accesses.
2. As a result of reducing the number of TLB entries required for the data footprint of the application,
more TLB entries are available for the executable code, hence TLB misses on instruction fetches can be reduced or eliminated.
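The factor of 256 in the first point is simple arithmetic, sketched below for a 2 GB Java heap. This is an illustration only: it counts the pages (and therefore the distinct translations) needed to cover a heap, and says nothing about the actual number of TLB entries in any processor. The class and method names are hypothetical.

```java
final class TlbReachSketch {
    static final long KB = 1L << 10;
    static final long MB = 1L << 20;
    static final long GB = 1L << 30;

    /** Number of pages of pageSize bytes needed to cover the given bytes. */
    static long pagesNeeded(long bytes, long pageSize) {
        return (bytes + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        long heap = 2 * GB;
        long small = pagesNeeded(heap, 4 * KB);
        long large = pagesNeeded(heap, 1 * MB);
        System.out.println("4 KB pages for a 2 GB heap: " + small);
        System.out.println("1 MB pages for a 2 GB heap: " + large);
        System.out.println("ratio: " + (small / large));
    }
}
```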
In a controlled environment, our experiments showed a 7% performance improvement from large pages for a Java multithreaded benchmark on z/OS. The current limitations of large page support described above are
subject to change in future z/OS releases.
Large page support in Linux on System z
Large pages can also be exploited on Linux on System z (kernel level 2.6.25 or later). However, there are many differences in setup, applicability, and performance when compared with z/OS.
The Linux on System z kernel supports 2 MB large pages (vs. 1 MB pages on z/OS). It uses the hugetlbfs API to emulate large pages. When the Linux on System z kernel is running in an LPAR on the IBM System z10 hardware, it emulates the 2 MB large pages by using two real large pages. When
running on older hardware, or when running on IBM z/VM, the Linux on System z kernel uses software simulation to provide the same support but with little performance benefit.
In a controlled environment, our experiments showed a 2% performance improvement from large
pages for a Java multithreaded benchmark on Linux on System z.
Linux on System z does not have any restrictions on virtual memory ranges for large page exploitation, so 31-bit applications can also benefit from large pages.
Linux on System z can also use large pages for executable code, so processor stalls due to TLB misses on instruction fetches can be reduced or eliminated.
Java 6 exploitation of large page support
IBM 64-bit SDK for z/OS, Java Technology Edition, V6 has recently introduced support for optionally
allocating Java heap and internal data for the virtual machine using large pages. IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5 and IBM® 31-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5 also support using large
pages.
By default, the JVM allocates normal 4 KB pages. The user may request that the JVM allocate the Java heap using large pages by specifying the -Xlp option. If the system does not have large pages enabled,
or does not have the required number of large pages available to satisfy the allocation request, the JVM falls back to using normal 4 KB pages. The -verbose:gc option can be used to determine which page size was used.
The performance benefit from large pages depends on the application characteristics. If the application was experiencing TLB misses with normal pages, large pages can provide a significant performance boost. Java applications that allocate many objects and cause frequent garbage collection
typically benefit from large pages. The data access pattern can also affect the benefits of large pages.
Using verbose:gc log for compressed references and large pages
IBM J9 garbage collector verbose logs now report the compressed references and large pages modes used by the JVM instance. Verbose logging is enabled by specifying the -verbose:gc command
line option.
In the above garbage collector verbose log, the compressedRefs attribute indicates that the JVM is using compressed references mode. The compressedRefsShift attribute indicates that the compression
shift amount is zero, so the Java heap was allocated below virtual address 2^32.
The pageSize attribute indicates that 1 MB (large) pages are used to back the Java heap.
Performance analysis and guidelines
The performance of compressed references and large pages was measured with several benchmarks on IBM
System z10 processor-based systems. Results from these measurements are presented in this section. The following charts display performance improvements that are based on measurements obtained using standard IBM benchmarks in a controlled environment. The actual throughput that any user application
will experience depends on the user system configuration, workload, and the application characteristics. Therefore, there is no assurance that an individual user can achieve throughput improvements that are equivalent to the performance ratios stated here. Users may experience significantly better or worse
application performance.
Both compressed references and large pages work to reduce latencies in data access. The two features are complementary to each other but attempt to address the same performance bottleneck. More
specifically, the compressed references feature compresses heap pointers so that more objects fit in the data caches, TLB and pages on the system. Large page support increases the addressable area of the TLB entries, thus increasing the TLB capacity and reducing the number of misses.
Multithreaded Java benchmark on z/OS
This benchmark was run on a 16-way System z10 dedicated z/OS LPAR with 16 GB of memory. The application spawns an increasing number of worker threads to measure scalability of the system. The overall throughput increases as the number of worker threads increases. When reaching system
capacity, the throughput remains fixed and becomes independent of any new worker threads. This benchmark generates a lot of objects and hence drives a lot of garbage collection. As such, improvements in data locality and improvements in GC performance are well showcased by this
benchmark.
Figure 2. Multithreaded benchmark performance comparison: 64-bit compared to 31-bit performance (z10 16-way, z/OS 1.9, Java 6 SR3). Bars shown: 64-bit 2 GB, -19%; 64-bit 2 GB with large pages, -12%; 64-bit 2 GB with compressed references, +1%; 64-bit 2 GB with compressed references and large pages, +5%.
It is noted that the maximum Java heap available to the 31-bit JVM when running this workload is 1450 MB.
The 64-bit edition of IBM SDK6 can allocate a bigger heap than 1450 MB, but even with increased heap
size of 2 GB, it experiences a 19% drop in throughput when compared with the 31-bit edition. This drop in throughput is due to a 40% increase in heap footprint. The significant loss in data locality due to larger object size results in an increase in data cache and TLB misses in the application code and during
garbage collection.
The effect of TLB misses can be reduced by using large pages to back the Java heap. The 64-bit edition of Java 6 with a 2 GB heap backed by large (1 MB) pages reduces the performance gap to 12%.
The loss of data locality can be reduced by using heap compression. When running in compressed references mode with a 2 GB heap backed by normal 4 KB pages, the 64-bit edition outperforms the 31-bit edition by 1%.
When both features are combined, the 64-bit edition running in compressed references mode with a 2 GB heap backed by large (1 MB) pages outperforms the 31-bit edition by 5% because of the combined effect of improved data locality and fewer TLB misses.
Performance projections for 64-bit compressed references with heap sizes greater than 2 GB
To measure the overhead of the shift-based compression schemes, performance measurements were done by fixing the Java heap to 2 GB (to allow for a direct comparison to shift-by-0 results). The
multithreaded Java benchmark measurements showed that the shift-by-2 and shift-by-3 compression schemes add an overhead of 3.12% over the shift-by-0 scheme. For shift-by-1, which benefits from an
optimization that exploits the memory reference instructions available on the z-architecture, shifting adds only 2.24% overhead over the shift-by-0 scheme.
However, increasing the heap size beyond 2 GB may reduce garbage collection overhead to the extent
that the extra overhead of shift-by-{1,2,3} compression scheme will be an acceptable trade-off.
This paper focuses on scenarios where moderate increase in heap size (to 2 GB or less) is tolerated to bridge the performance gap between 64-bit and 31-bit. The performance discussions on large heaps
beyond 2 GB, and small heaps less than 2 GB (same as 31-bit) are limited in order to reduce the scope of this paper.
DayTrader benchmark running on IBM WebSphere Application Server V7 for z/OS
The DayTrader benchmark is a stock trading 3-tier application running on IBM WebSphere Application
Server. It is an IBM variant of Apache DayTrader [http://cwiki.apache.org/GMOxDOC20/daytrader.html].
Figure 3. DayTrader benchmark performance comparison
The benchmark was run in a 3-tier setup consisting of a dedicated 12-way System z10 LPAR running WebSphere Application Server Version 7.0.0 on z/OS, a dedicated 8-way System z10 LPAR running IBM DB2® on z/OS, and two client machines driving an automated workload of 2000 users performing various trading activities.
The baseline measurement is the 31-bit WebSphere Application Server V7 with a 900 MB heap. A 900 MB heap is the typical maximum 31-bit Java heap when running WebSphere Application Server on z/OS.
The 64-bit WebSphere Application Server V7 can allocate a much bigger Java heap than 900 MB. With a 2 GB Java heap, the 64-bit WebSphere Application Server V7 lags behind the 31-bit edition by 7.83%.
[Figure 3 chart: DayTrader 1.2 Java performance, 64-bit performance compared to the 31-bit base, z10 12+8 3-tier hardware configuration. Values shown: 64-bit 2 GB: -7.83%; 64-bit 2 GB + CR: -4.85%; 64-bit 2 GB + LP: -2.46%; 64-bit 2 GB + CR + LP: 0.00%. CR = compressed references, LP = large pages.]
When we ran the 64-bit WebSphere Application Server V7 in compressed references mode with a 2 GB heap, the throughput gap was reduced to 4.85%.
When we used large pages to back the 2 GB heap but ran the 64-bit WebSphere Application Server V7 in default mode, the throughput gap was reduced to 2.46%.
When the 2 GB heap is combined with both compressed references and large pages, the 64-bit WebSphere Application Server V7 matches the performance of the 31-bit WebSphere Application Server V7.
Java Heap footprint savings using compressed references
To understand fully the performance improvements offered by compressed references, it is important to understand their effect on garbage collection. The earlier section entitled “Java Object Shape and Compressed References” gave an example of the reduction in size of 64-bit objects when using compressed references. Figure 4 demonstrates this saving by showing the amount of garbage collected per request for the DayTrader benchmark.
[Figure 4 chart: JVM heap footprint savings with compressed references. Y axis: garbage (KB) per request. Values shown: 31-bit: 134; 64-bit: 193; 64-bit + CR: 131.]
Figure 4. Amount of garbage collected per request for the DayTrader benchmark
Figure 4 shows that, using a 900 MB heap, the amount of garbage collected grows from 134 KB per request in 31-bit mode to 193 KB (+44%) in standard 64-bit mode. With 64-bit compressed references, however, the amount falls back to roughly the 31-bit level (131 KB). Each value is computed by dividing the amount of heap memory freed during a given time interval by the total number of requests completed during that interval. Verbose GC logging must be enabled to obtain the information necessary for this analysis.
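The per-request calculation described above can be sketched as follows. This is a minimal illustration, not a tool used by the authors: the class and method names are hypothetical, and the interval totals in main are invented solely so that the result lands on the 193 KB value from Figure 4. In practice the freed-memory total would be summed from the verbose GC log for the chosen interval.

```java
// Illustrative sketch only; names and the sample interval totals are assumptions.
final class GarbagePerRequest {

    // Heap memory freed during an interval, divided by requests completed in that interval.
    static double kbPerRequest(long bytesFreed, long requestsCompleted) {
        if (requestsCompleted <= 0) {
            throw new IllegalArgumentException("requestsCompleted must be positive");
        }
        return (bytesFreed / 1024.0) / requestsCompleted;
    }

    // Percentage change of a value relative to a baseline.
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        long requests = 10_000;                     // hypothetical request count
        long bytesFreed = 193L * 1024 * requests;   // hypothetical total freed by GC

        System.out.printf("%.1f KB per request%n", kbPerRequest(bytesFreed, requests));
        // Growth from the 31-bit value (134 KB) to the standard 64-bit value (193 KB).
        System.out.printf("%.1f%% change versus 31-bit%n", percentChange(134.0, 193.0));
    }
}
```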
Garbage collection time savings using compressed references
The previous section showed the footprint savings realized using 64-bit compressed references. While this allows for increased scalability by freeing space for more objects in an equivalent JVM heap size, it does not tell the complete story of garbage collection efficiency. Perhaps an even more important metric is the amount of time spent in garbage collection. Figure 5 shows GC time relative to the amount of time required for GC in 31-bit mode using a 900 MB JVM heap size. The time interval used is the same as in the previous section on JVM footprint savings.
[Figure 5 chart: Garbage collection time savings with compressed references. Y axis: time relative to 31-bit with a 900 MB heap. Values shown: 31-bit (900 MB heap): 1.00; 64-bit standard (900 MB heap): 1.92; 64-bit compressed (900 MB heap): 1.44; 64-bit standard (2048 MB heap): 0.90; 64-bit compressed (2048 MB heap): 0.63.]
Figure 5. Time spent doing garbage collection for the DayTrader benchmark
Figure 5 shows that the amount of time spent doing garbage collection using standard 64-bit mode is almost double (1.92x) that of 31-bit mode. Using 64-bit compressed references mode, this time is reduced to less than 1.5x (1.44x). As expected, GC time is reduced even further by increasing the JVM heap size to 2048 MB, but this is due solely to less frequent GC cycles. The magnitude of the improvement from compressed references mode with a 2048 MB heap is similar to that in the 900 MB case.
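As a quick check on the claim that the improvement is of similar magnitude at both heap sizes, the relative GC times from Figure 5 can be converted into percentage reductions. The sketch below is only that arithmetic; the class and method names are hypothetical.

```java
// Illustrative arithmetic on the Figure 5 values; names are assumptions.
final class GcTimeComparison {

    // Percentage reduction in GC time when moving from standard to compressed references.
    static double reductionPercent(double standard, double compressed) {
        return (1.0 - compressed / standard) * 100.0;
    }

    public static void main(String[] args) {
        // Relative GC times from Figure 5 (31-bit with a 900 MB heap = 1.00).
        double reduction900 = reductionPercent(1.92, 1.44);
        double reduction2048 = reductionPercent(0.90, 0.63);

        System.out.printf("900 MB heap:  %.1f%% less GC time%n", reduction900);
        System.out.printf("2048 MB heap: %.1f%% less GC time%n", reduction2048);
    }
}
```

By this measure compressed references remove about a quarter of the GC time with the 900 MB heap and about 30% with the 2048 MB heap.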
Setting Java options in WebSphere Application Server
Converting a migrated WebSphere Application Server on IBM z/OS to run in 64-bit mode:
With WebSphere Application Server V7, the default mode when configuring new servers is 64-bit. The configuration tool still allows new servers to be configured as 31-bit, but that selection must be made explicitly.
However, when existing servers are migrated, their addressing mode is preserved; that is, a 31-bit server will be migrated as a 31-bit server, and a 64-bit server will be migrated as a 64-bit server.
To switch the addressing mode of a specific server to 64-bit:
Navigate to the Application Server Settings page in the Administrative Console: Servers > Server Types > WebSphere application servers > server_name.
Check the Run in 64-bit JVM mode check box.
Recycle the server to make the change effective.
Refer to the WebSphere Application Server 7 Information Center topic entitled "Converting a migrated server to run in 64-bit mode" for considerations when switching addressing modes.
Enabling compressed references mode in 64-bit WebSphere Application Server V7 on IBM z/OS:
To enable a 64-bit JVM to run in compressed references mode, you need to specify a new environment variable in the WebSphere Application Server configuration:
In the administrative console, click: Servers > Server Types > WebSphere application servers > server_name.
Click the Configuration tab, and then under the Server Infrastructure section, click Java and process management > Process definition > servant. Then, in the Additional Properties section, click Environment entries.
Add or update the environment entry for IBM_JAVA_OPTIONS as follows.
If you see an existing environment entry named IBM_JAVA_OPTIONS, edit it to append the Java option -Xcompressedrefs to the existing value.
Otherwise, click New to create a new environment entry.
Fill in the following values in their respective fields of the form:
Name: IBM_JAVA_OPTIONS
Value: -Xcompressedrefs
Description: Enable 64-bit Compressed References mode
Click Apply to update the WebSphere Application Server environment.
Restart WebSphere Application Server so that it starts in compressed references mode.
The above procedure updates the ‘was.env’ file in the WebSphere Application Server configuration directory. The change applies the setting to all regions (servant, control, and adjunct).
Note that supplying -Xcompressedrefs as a generic JVM argument will cause WebSphere Application Server to fail to start with an unsupported Java option error. If the application requires a Java heap bigger than 30 GB, 64-bit default mode should be used instead.
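After the restart, one way to confirm that the option was at least passed to a region is to look for it from inside the JVM. The sketch below is a hedged illustration, not an IBM-provided check: the class and method names are hypothetical, and whether an option supplied through IBM_JAVA_OPTIONS also appears in the runtime's input-argument list is an assumption, which is why the environment variable is examined as well. It shows only that the option was requested, not that the JVM is actually running in compressed references mode.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical.
final class CompressedRefsCheck {

    static final String OPTION = "-Xcompressedrefs";

    // True if the option appears as a whole token in the JVM arguments or in the
    // whitespace-separated IBM_JAVA_OPTIONS value (which may be null).
    static boolean isRequested(List<String> jvmArgs, String ibmJavaOptions) {
        if (jvmArgs.contains(OPTION)) {
            return true;
        }
        if (ibmJavaOptions == null) {
            return false;
        }
        return Arrays.asList(ibmJavaOptions.trim().split("\\s+")).contains(OPTION);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String envOptions = System.getenv("IBM_JAVA_OPTIONS");
        System.out.println("Compressed references requested: " + isRequested(jvmArgs, envOptions));
    }
}
```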
Enabling large page support for Java heaps in WebSphere Application Server on IBM z/OS:
To use large pages with WebSphere Application Server version 7.0.0, large pages must first be set up on the IBM z/OS system running on an IBM System z10 processor. Instructions for doing this are outlined in the documentation for IBM z/OS support APAR OA25485.
Since large pages are only available above the 2^31 virtual address on IBM z/OS, WebSphere Application Server needs to run in 64-bit mode.
To use large pages for the Java heap, the -Xlp option must be specified in the WebSphere Application Server configuration as a generic JVM argument. In the administrative console for WebSphere Application Server, click:
Servers > Server Types > WebSphere application servers > server_name.
Click the Configuration tab, and then under Server Infrastructure section, click:
Java and process management > Process definition > servant > Java virtual machine.
Enter the -Xlp command line argument in the Generic JVM arguments field.
When using WebSphere Application Server version 7.0.0.1 on IBM z/OS, -Xlp may be applied to the WebSphere Application Server configuration options for the adjunct, servant, and control regions separately by following the same procedure as above.
These changes can be confirmed by checking the files adjunct.jvm.options, servant.jvm.options, and control.jvm.options in the WebSphere Application Server configuration directory.
If only a limited number of large pages is available, it is recommended that large pages be applied to the servant region first.
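When deciding which regions can be given large pages, it helps to know how many 1 MB pages a heap of a given size consumes. The following is a minimal sketch of that arithmetic; the class and method names and the number of available pages are hypothetical, and it says nothing about how the JVM behaves when too few large pages are available.

```java
// Illustrative arithmetic only; names and the available-page count are assumptions.
final class LargePageBudget {

    static final long LARGE_PAGE_BYTES = 1L << 20; // 1 MB large pages, as described in this paper

    // Number of large pages needed to back a heap of the given size, rounded up.
    static long pagesNeeded(long heapBytes) {
        return (heapBytes + LARGE_PAGE_BYTES - 1) / LARGE_PAGE_BYTES;
    }

    public static void main(String[] args) {
        long heapBytes = 2L << 30; // 2 GB heap, as used in the measurements
        long available = 1024;     // hypothetical number of large pages configured on the system

        long needed = pagesNeeded(heapBytes);
        System.out.println("Large pages needed for the heap: " + needed);
        System.out.println("Enough large pages available: " + (available >= needed));
    }
}
```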
Enabling compressed references and/or large page support options in IBM WebSphere Application Server V7 on Linux on System z:
Both command line options, -Xcompressedrefs and -Xlp, can be specified as generic JVM arguments to enable compressed references and large pages in WebSphere Application Server running on Linux on System z.
Conclusion
As applications continue to grow in complexity and scale, the limitations of the 31-bit address space on
z/OS and Linux on System z are becoming more apparent. IBM WebSphere Application Server version 7.0.0.3 uses IBM 64-bit SDK for z/OS, Java Technology Edition, V6, March, 2009 Maintenance Rollup. The latter supports a pair of features for improving the performance of 64-bit Java. Compressed
references and large pages are complementary features that attempt to alleviate the performance bottleneck resulting from the increased footprint when using 64-bit Java. More specifically, compressed references compresses heap pointers so that more objects fit into the hardware data caches, translation
look-aside buffer (TLB), and pages available on the system. Large pages increase the addressable area of the TLB, thus reducing the number of page-table accesses.
By exploiting compressed references, customer applications running in 64-bit mode can now achieve a
Java heap footprint similar to that observed in 31-bit mode. Additionally, these customers can now move the Java heap above the 2^31 virtual address, which frees up below-the-bar storage for other uses, while allowing the Java heap to be backed with large pages. Combining the advantages of larger Java heaps,
heap compression, and large pages, customer applications could observe significantly improved throughput, sometimes out-performing 31-bit performance.
The two flagship performance features of IBM 64-bit SDK for z/OS, Java Technology Edition, V6, (March,
2009 Maintenance Rollup), showcase how IBM is exploiting the latest IBM System z hardware, driving changes in the z/OS operating system, adding intrinsic innovations in compiler technology, and making them available to IBM customers through changes in middleware such as WebSphere Application Server.
Resources
These Web sites provide useful references to supplement the information contained in this document:
WebSphere Compressed References Technology white paper
ftp://ftp.software.ibm.com/software/webserver/appserv/was/WAS_V7_64-bit_performance.pdf
Translation Look Aside Buffer (TLB)
http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer
Large page support on Linux
http://linuxgazette.net/155/krishnakumar.html
SHARE presentation on z/OS Real Storage Manager (RSM) Large Page Support
http://ew.share.org/proceedingmod/abstract.cfm?abstract_id=19388&conference_id=20
IBM System z10 support for large pages
http://www.research.ibm.com/journal/rd/531/tzortzatos.pdf
IBM z/OS APAR OA20902 for large page support
http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=large+page+access&uid=isg1OA20902&loc=en_US&cs=utf-8&lang=en
IBM z/OS APAR OA25485 for large page support
http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=large+page+access&uid=isg1OA25485&loc=en_US&cs=utf-8&lang=en
IBM z/OS APAR OA26294 for large compressed references heap support
http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=OA26294&uid=isg1OA26294&loc=en_US&cs=utf-8&lang=en
IBM Developer Kits for Java download
https://www.ibm.com/developerworks/java/jdk/
Apache DayTrader
http://cwiki.apache.org/GMOxDOC20/daytrader.html
About the authors
Kishor Patil is a software developer with the IBM Testarossa JIT compiler team at IBM Toronto Lab. You can reach Kishor at [email protected].
Marcel Mitran is a technical manager with the IBM Testarossa JIT compiler team at the IBM Toronto Lab. You can reach Marcel at [email protected].
Jim Cunningham is a performance analyst with IBM WebSphere team at IBM Poughkeepsie Lab. You can reach Jim at [email protected].
Acknowledgements:
The authors would like to thank and acknowledge the following individuals from IBM for their contributions to this paper:
TR-JIT team: Derek Inglis, Joran Siu and Levon Stepanian
Java Performance team: Clark Goodrich, James Perlik and John Rankin
WebSphere Application Server on z/OS team: Mike Cox, Colette Manoni, and William Scott
z/OS RSM team: Elpida Tzortzatos
Trademarks and special notices
© Copyright IBM Corporation 2009.
References in this document to IBM products or services do not imply that IBM intends to make them
available in every country.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked
terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other
countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Information is provided "AS IS" without warranty of any kind.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly
available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the
supplier of those products.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending
upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the
ratios stated here.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part
of the materials for this IBM product and use of those Web sites is at your own risk.