Java Internals V1.0


Course Details
Objectives
  Understand the JVM
  Understand garbage collection
  Understand the JVM parameters
  Learn how to analyze garbage collection logs

Intended audience
  Project Managers
  Architects
  Performance engineers
  Testers
  ELT
Pre-requisites
  Basic concepts of performance engineering
  Operating system basics
  Web architecture basics
This course deals with Java internals and explains how the JVM works.

Introduction
Java is a programming language originally developed by James Gosling at Sun Microsystems.
Java is a general-purpose, concurrent, class-based, object-oriented language that is specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere".
Advantages of Java
  Simple: Java was designed to be easy to use and is therefore easier to write, compile, debug, and learn than other programming languages. Java is much simpler than C++ because Java uses automatic memory allocation and garbage collection, whereas C++ requires the programmer to allocate memory and to collect garbage.
  Object-oriented: Programming in Java is centered on creating objects, manipulating objects, and making objects work together. This allows you to create modular programs and reusable code.
  Platform-independent: One of the most significant advantages of Java is its ability to move easily from one computer system to another. The ability to run the same program on many different systems is crucial to World Wide Web software, and Java succeeds at this by being platform-independent at both the source and binary levels.

Introduction (continued)
  Distributed: Distributed computing involves several computers on a network working together. Java is designed to make distributed computing easy with the networking capability that is inherently integrated into it. Writing network programs in Java is like sending and receiving data to and from a file.
  [Diagram: three programs running on three different systems, communicating with each other to perform a joint task]
  Interpreted: An interpreter is needed in order to run Java programs. The programs are compiled into Java Virtual Machine code called bytecode. The bytecode is machine-independent and is able to run on any machine that has a Java interpreter. With Java, the program need only be compiled once, and the bytecode generated by the Java compiler can run on any platform.
  Secure: Java is one of the first programming languages to consider security as part of its design. The Java language, compiler, interpreter, and runtime environment were each developed with security in mind.
  Robust: Robust means reliable, and no programming language can really assure reliability. Java puts a lot of emphasis on early checking for possible errors, as Java compilers are able to detect many problems that would first show up during execution time in other languages.
  Multithreaded: Multithreading is the capability of a program to perform several tasks simultaneously. In Java, multithreaded programming is integrated into the language, while in other languages, operating system-specific procedures have to be called in order to enable multithreading. Multithreading is a necessity in visual and network programming.

Java Internals: IBM JDK 1.6

JVM
The IBM Virtual Machine for Java (JVM) is a core component of the Java Runtime Environment (JRE) from IBM. The JVM is a virtualized computing machine that follows a well-defined specification for the runtime requirements of the Java programming language.
The JVM is called "virtual" because it provides a machine interface that does not depend on the underlying operating system and machine hardware architecture.
Java programs are compiled into bytecodes (class files), which are then executed in the JVM.
A JVM implementation is specific to an operating system and hardware combination.
All JVMs:
  Execute code that is defined by a standard known as the class file format (bytecode)
  Provide fundamental runtime security such as bytecode verification
  Provide intrinsic operations such as performing arithmetic and allocating new objects

Java Application Stack

Components of JVM
The JVM API encapsulates all the interaction between external programs and the JVM.
The diagnostics component provides Reliability, Availability, and Serviceability (RAS) facilities to the JVM. The IBM Virtual Machine for Java is distinguished by its extensive RAS capabilities. The JVM is designed to be deployed in business-critical operations and includes several trace and debug utilities to assist with problem determination.
The memory management component is responsible for the efficient use of the Java heap.

Components of JVM (continued)
The class loader component is responsible for supporting Java's dynamic code loading facilities. The dynamic code loading facilities include:
  Reading standard Java .class files
  Resolving class definitions in the context of the current runtime environment
  Verifying the bytecodes defined by the class file to determine whether the bytecodes are language-legal
  Initializing the class definition after it is accepted into the managed runtime environment
  Various reflection APIs for introspection on the class and its defined members
The interpreter is the implementation of the stack-based bytecode machine that is defined in the JVM specification. The bytecodes define the logic of the application. The interpreter can switch between running bytecodes and handing control to the platform-specific machine code produced by the JIT compiler. (The Just-In-Time (JIT) compiler is a component of the Java Runtime Environment. It improves the performance of Java applications by compiling bytecodes to native machine code at run time.)
The platform port layer is an abstraction of the native platform functions that are required by the JVM. Other components of the JVM are written in terms of the platform-neutral platform port layer functions. Porting the JVM to a further platform requires providing implementations of the platform port layer facilities.

Classloader
Class loading loads, verifies, prepares and resolves, and initializes a class from a Java class file.
  Loading involves obtaining the byte array representing the Java class file.
  Verification of a Java class file is the process of checking that the class file is structurally well-formed and then inspecting the class file contents to ensure that the code does not attempt to perform operations that are not permitted.
  Preparation involves the allocation and default initialization of storage space for static class fields. Preparation also creates method tables, which speed up virtual method calls, and object templates, which speed up object creation.
  Initialization involves the processing of the class's class initialization method, if defined, at which time static class fields are initialized to their user-defined initial values (if specified).
The parent-delegation model requires that any request for a class loader to load a given class is first delegated to its parent class loader before the requested class loader tries to load the class itself.
The JVM has three class loaders, each possessing a different scope from which it can load classes:
  Bootstrap - responsible for loading only the classes that are from the core Java API
  Extensions - responsible for loading standard extension packages in the extensions directory
  Application - responsible for loading classes from the local file system, and will load files from the CLASSPATH
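
The parent chain described above can be inspected from ordinary Java code. The following is a minimal sketch using only the standard ClassLoader API; the class name LoaderChain and the helper depth are illustrative names, not part of the course material. The bootstrap loader is conventionally represented as null, so core classes such as String report a null class loader.

```java
class LoaderChain {
    /** Counts the loaders from the given loader up to, but not including, the bootstrap loader. */
    static int depth(ClassLoader loader) {
        int n = 0;
        while (loader != null) {
            n++;
            loader = loader.getParent(); // each loader delegates to this parent first
        }
        return n;
    }

    public static void main(String[] args) {
        // Walk from the loader of this class towards the bootstrap loader.
        ClassLoader cl = LoaderChain.class.getClassLoader();
        while (cl != null) {
            System.out.println(cl);
            cl = cl.getParent();
        }
        // Core API classes are loaded by the bootstrap loader, reported as null.
        System.out.println("String loaded by: " + String.class.getClassLoader());
    }
}
```

The exact loader names printed depend on the JVM vendor and version.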

JIT Compiler
The Just-In-Time (JIT) compiler is a component of the Java Runtime Environment which improves the performance of Java applications by compiling bytecodes to native machine code at run time.
The JIT compiler is enabled by default, and is activated when a Java method is called.
When a method has been compiled, the JVM calls the compiled code of that method directly instead of interpreting it.
Methods are not compiled the first time they are called. For each method, the JVM maintains a call count, which is incremented every time the method is called. The JVM interprets a method until its call count exceeds a JIT compilation threshold.
To help the JIT compiler analyze the method, its bytecodes are first reformulated in an internal representation called trees, which resembles machine code more closely than bytecodes. Analysis and optimizations are then performed on the trees of the method. At the end, the trees are translated into native code.
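
The call-count mechanism can be pictured with a toy model. This is only a sketch of the idea, not how the IBM JIT is implemented: the class CallCountSketch, the method invoke, and the threshold value of 3 are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CallCountSketch {
    static final int THRESHOLD = 3; // hypothetical value, chosen for illustration

    private final Map<String, Integer> callCounts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    /** Returns how the named method would be run on this call in the toy model. */
    String invoke(String method) {
        if (compiled.contains(method)) {
            return "compiled"; // already compiled: call the native code directly
        }
        int count = callCounts.merge(method, 1, Integer::sum);
        if (count > THRESHOLD) {
            compiled.add(method); // count exceeded the threshold: compile now
            return "compiled";
        }
        return "interpreted";
    }
}
```

With this model, the first three calls to a method are interpreted and every later call runs the compiled version.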

JIT Compiler (continued)
The compilation consists of the following phases:
  Inlining - The trees of smaller methods are merged, or "inlined", into the trees of their callers.
  Local optimizations - Analyze and improve a small section of the code at a time (for example, register usage and local data flow optimization).
  Control flow optimizations - Analyze the flow of control inside a method and rearrange code paths to improve their efficiency.
  Global optimizations - Work on the entire method at once (for example, synchronization optimization and GC optimization).
  Native code generation - The trees of a method are translated into machine code instructions; some small optimizations are performed that are specific to the platform's architectural characteristics.
All phases except native code generation are cross-platform code.
The compiled code is placed into a part of the JVM process space called the code cache; the location of the method in the code cache is recorded, so that future calls to it will call the compiled code.
The JVM process consists of the JVM executable files and a set of JIT-compiled code that is linked dynamically to the bytecode interpreter in the JVM.

Remote Method Invocation
Java Remote Method Invocation (Java RMI) enables you to create distributed Java technology-based applications that can communicate with other such applications.
Methods of remote Java objects can be run from other Java virtual machines (JVMs), possibly on different hosts.
The RMI implementation consists of three abstraction layers:
  The Stub and Skeleton layer, which intercepts method calls made by the client to the interface reference variable and redirects these calls to a remote RMI service.
  The Remote Reference layer, which understands how to interpret and manage references made from clients to the remote service objects.
  The Transport layer, which is based on TCP/IP connections between machines in a network. It provides basic connectivity, as well as some firewall penetration strategies.

Remote Method Invocation (continued)
RMI uses object serialization to marshal and unmarshal parameters and does not truncate types, supporting object-oriented polymorphism.
The RMI registry is a lookup service for ports.
Distributed garbage collection:
  The RMI subsystem implements reference counting-based Distributed Garbage Collection (DGC) to provide automatic memory management facilities for remote server objects.
  When the client creates (unmarshals) a remote reference, it calls dirty() on the server-side DGC. The call returns a lease guaranteeing that the server-side DGC will not collect the remote object for a certain time.
  After the client has finished with the remote reference, it calls the corresponding clean() method. The call indicates that the server does not need to keep the remote object alive for this client.
RMI provides an easy way to distribute objects, but does not allow for interoperability between programming languages.

CORBA
The Common Object Request Broker Architecture (CORBA) is an open, vendor-independent specification for distributed computing. It is published by the Object Management Group (OMG).
CORBA enables objects on various platforms and operating systems to interoperate, using the Internet Inter-ORB Protocol (IIOP).
RMI-IIOP is an extension of traditional Java RMI that uses the IIOP protocol. This protocol allows RMI objects to communicate with CORBA objects. Java programs can therefore interoperate transparently with objects that are written in other programming languages, provided that those objects are CORBA-compliant.
Objects can still be exported to traditional RMI (JRMP) and the two protocols can communicate.
In RMI (JRMP), the server objects are called skeletons; in RMI-IIOP, they are called ties. Client objects are called stubs in both protocols.

Java Native Interface
The Java Native Interface (JNI) establishes a well-defined and platform-independent interface between Java code and native code.
Native code can be used together with Java in two distinct ways: as "native methods" in a running JVM, and as the code that creates a JVM using the "Invocation API".
  Native methods - Java native methods are declared in Java, implemented in another language (such as C or C++), and loaded by the JVM as necessary.
  Invocation API - The aspect of the JNI used for creating the JVM is called the JNI Invocation API.

Java Internals: IBM JDK 1.6
Memory Management

Memory Management
Memory management contains the Garbage Collector and the Allocator. It is responsible for allocating memory in addition to collecting garbage.

Java Heap

The heap is a contiguous area of storage that is obtained from the operating system at JVM initialization.
  heapbase is the address of the start of the heap.
  heaptop is the address of the end of the heap.
  heaplimit is the address of the top of the currently used part of the heap. heaplimit can expand and shrink.
The -Xmx option controls the size from heapbase to heaptop.
The -Xms option controls the initial size from heapbase to heaplimit.
Default value for -Xmx
  Windows: half the real storage, with a minimum of 16 MB and a maximum of 2 GB - 1
  OS/390 and AIX: 64 MB
  Linux: half the real storage, with a minimum of 16 MB and a maximum of 512 MB - 1
Default value for -Xms
  Windows, AIX, and Linux: 4 MB
  OS/390: 1 MB
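
The effect of -Xmx and -Xms can be observed from inside a running program. The sketch below uses only the standard java.lang.Runtime API; the class name HeapSizes and the helper mb are illustrative. The mapping in the printed labels is approximate: maxMemory() reflects the limit set by -Xmx, and totalMemory() reflects the heap currently committed, which starts near -Xms.

```java
class HeapSizes {
    /** Formats a byte count as whole megabytes. */
    static String mb(long bytes) {
        return (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (limit set by -Xmx):       " + mb(rt.maxMemory()));
        System.out.println("committed heap (starts near -Xms):  " + mb(rt.totalMemory()));
        System.out.println("free within committed heap:         " + mb(rt.freeMemory()));
    }
}
```

Running it as, for example, java -Xms64m -Xmx256m HeapSizes and then with different values shows how the reported figures follow the options.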

Object
Layout of an object on the heap:
  size + flags slot
    The main purpose of this slot is to contain the length of the object.
    The size + flags slot is four bytes on 32-bit architecture and eight bytes on 64-bit architecture.
  mptr
    The mptr slot is four bytes on 32-bit architecture and eight bytes on 64-bit architecture.
  locknflags
    Its main use is to contain data for the LK component when locking. (The LK component handles locking in the JVM.)
  Object data
    This is where the object data starts, the layout of which is object-dependent.

The size + flags, mptr, and locknflags are sometimes known collectively as the header.

Object: size + flags
The bottom three bits are not used for the size, so the Garbage Collector uses them for some flags to indicate different states of the object.
As the size of objects is limited, the top two bits can be used for flags.
  Bit 1 has several purposes. It is the swapped bit, and is used during compaction. Bit 1 is also the multipinned bit. It is used to indicate that this object has been pinned multiple times. During a garbage collection cycle, the multipinned bit is removed and restored to allow the other uses of this multipurpose bit.
  Bit 2 is the dosed bit. The dosed bit is set on if the object is referenced from the stack or registers (root objects). Such an object cannot be moved in this garbage collection cycle, because the Garbage Collector cannot fix up the reference: it might not be a real reference but an integer that happens to have the same value as the address of an object on the heap.
  Bit 3 is the pinned bit. Pinned objects cannot be moved, usually because they are referenced from outside the heap. Examples of this are Thread and ClassClass objects.
  Bit 31 in 32-bit architecture, or bit 63 in 64-bit architecture, is the flat locked contention (flc) bit and is used by the locking (LK) component.
  Bit 32 in 32-bit architecture, or bit 64 in 64-bit architecture, is the hashed bit and is used to denote an object that has returned its hashed value. This is required because the hash value is the address of the object and the Garbage Collector needs to maintain this if it moves the object.

Object: mptr and locknflags
The mptr has one of two functions:
  If this is not an array, the mptr points to the method table, from where the Garbage Collector can get to the class block. In this way, the Garbage Collector can tell of what class an object is an instantiation. The method table and class block are allocated by the class loader (CL) component and are not in the heap.
  If this is an array, the mptr contains a count of how many array entries are in this object.
Locknflags also contains these flags:
  Bit 2 is the array flag. If this bit is set on, the object is an array and the mptr field contains a count of how many elements are in the array.
  Bit 3 is the hashed and moved bit. If this bit is set on, it indicates that this object has been moved after it was hashed, and that the hash value can be found in the last slot of the object.
The locknflags slot is four bytes on 32-bit architecture and eight bytes on 64-bit architecture, although only the lower four bytes are used.

Object Allocation
Object allocation is driven by requests by applications, class libraries, and the JVM for storage of Java objects, which can vary in size and require different handling.
Every allocation requires a heap lock to be acquired to prevent concurrent thread access.
To optimize this allocation, particular areas of the heap are dedicated to a thread, known as the TLH (thread local heap), and that thread can allocate from its TLH without having to lock out other threads. This technique delivers the best possible allocation performance for small objects. Objects are allocated directly from a thread local heap.
All objects less than 512 bytes (768 bytes on 64-bit JVMs) are allocated from the cache.
Large Object Allocation
All objects of 64 KB or more are termed large from the VM perspective.
In practice, objects of 10 MB or more in size are usually considered large.
The Large Object Area (LOA) is 5% of the active heap by default.
The allocator first tries to place any object in the free list of the main heap. If there is not enough contiguous space in the main heap to satisfy an allocation request for an object of 64 KB or more, the object is allocated in the Large Object Area (wilderness).
Objects smaller than 64 KB can only be allocated in the main heap and never in the Large Object Area.
The LOA boundary is calculated when the heap is initialized, and recalculated after every garbage collection.
The size of the LOA can be controlled using command-line options: -Xloainitial (default 0.05, or 5%), -Xloaminimum (default 0), and -Xloamaximum (default 0.5, or 50%). The options take values between 0 and 0.95 (0% through 95% of the current tenure heap size).

Types of Allocation
Cache allocation is specifically designed to deliver the best possible allocation performance for small objects.
Objects are allocated directly from a thread local allocation buffer that the thread has previously allocated from the heap. A new object is allocated from the end of this cache without the need to grab the heap lock; therefore, cache allocation is very efficient.
The criterion for using cache allocation is:
  Use cache allocation if the size of the object is less than 512 bytes, or if the object can be contained in the current cache block.
The cache block is sometimes called a thread local heap (TLH).

Heap lock allocation occurs when the allocation request cannot be satisfied in the existing cache, that is, when the request is 512 bytes or larger and cannot be contained in the existing cache. Heap lock allocation requires a lock, and is avoided if possible by using the cache instead.
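
The choice between the two allocation paths can be written as a small decision function. This is a sketch that takes the cache criterion exactly as stated on the Types of Allocation slide (smaller than 512 bytes, or fits in the current cache block); the class AllocationPathSketch, the enum Path, and the parameter names are hypothetical, and the 768-byte limit for 64-bit JVMs is not modelled.

```java
class AllocationPathSketch {
    enum Path { CACHE, HEAP_LOCK }

    static final int SMALL_LIMIT = 512; // bytes, the 32-bit figure from the slides

    /**
     * @param size        requested object size in bytes
     * @param freeInCache bytes still free in the thread's current cache block (TLH)
     */
    static Path choose(int size, int freeInCache) {
        if (size < SMALL_LIMIT || size <= freeInCache) {
            return Path.CACHE;     // no heap lock needed
        }
        return Path.HEAP_LOCK;     // must take the heap lock and search the free list
    }
}
```

For example, a 1000-byte request goes to the cache when 2000 bytes remain in the current cache block, and to heap lock allocation when only 500 bytes remain.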

System Heap

The system heap contains only objects that have a life expectancy of the life of the JVM.
The objects that are in this heap are the class objects for system and shareable middleware and application classes.
The Garbage Collector never collects the system heap because all objects that are in the heap are either reachable for the lifetime of the JVM, or, in the case of shareable application classes, have been selected to be reused during the lifetime of the JVM.
The system heap is a chain of noncontiguous areas of storage. The initial size of the system heap is 128 KB in 32-bit architecture, and 8 MB in 64-bit architecture. If the system heap fills, it obtains another extent and chains the extents together.

Reachable Objects & Free List

Reachable Objects
The active state of the JVM is made up of the set of stacks that represents the threads, the statics that are inside Java classes, and the set of local and global JNI references.
All functions that are invoked inside the JVM itself cause a frame on the C stack. This information is used to find the roots.
These roots are then used to find references to other objects. This process is repeated until all reachable objects are found.
Free List
The head of the list is in global storage and points to the first free chunk that is on the heap.
Each chunk of free storage has a size field and a pointer that points to the next free chunk.
The free chunks are in address sequence. The last free chunk has a NULL pointer.
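
The free list structure described above can be sketched as a singly linked list kept in address order. The slides do not say how a chunk is chosen for an allocation, so the first-fit search below is an assumption; the class FreeListSketch and its members are illustrative names, and addresses are plain numbers rather than real heap pointers.

```java
class FreeListSketch {
    /** One chunk of free storage: a size field and a pointer to the next chunk. */
    static final class Chunk {
        long address;
        long size;
        Chunk next; // null on the last free chunk

        Chunk(long address, long size) {
            this.address = address;
            this.size = size;
        }
    }

    Chunk head; // points to the first free chunk on the heap

    /** Inserts a free chunk, keeping the list in address sequence. */
    void add(long address, long size) {
        Chunk c = new Chunk(address, size);
        if (head == null || address < head.address) {
            c.next = head;
            head = c;
            return;
        }
        Chunk p = head;
        while (p.next != null && p.next.address < address) {
            p = p.next;
        }
        c.next = p.next;
        p.next = c;
    }

    /** First-fit allocation (an assumption): returns the address, or -1 if no chunk fits. */
    long allocate(long size) {
        Chunk prev = null;
        for (Chunk c = head; c != null; prev = c, c = c.next) {
            if (c.size < size) {
                continue; // too small, try the next chunk
            }
            long address = c.address;
            if (c.size == size) {
                // Exact fit: unlink the chunk from the list.
                if (prev == null) head = c.next; else prev.next = c.next;
            } else {
                // Take the front of the chunk and leave the remainder on the list.
                c.address += size;
                c.size -= size;
            }
            return address;
        }
        return -1; // in the real collector this is an allocation failure
    }
}
```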

Alloc bits and mark bits
These two bit vectors indicate the state of objects that are on the heap. Because all objects that are on the heap start on an 8-byte boundary, both vectors have one bit to represent eight bytes of the heap. Therefore, each of these vectors is 1/64 of the heap size.
When objects are allocated in the heap, a bit is set on in allocbits to indicate the start of the object.
During the mark phase, a bit is set on in markbits to indicate the start of a live object.
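
The 1/64 figure follows from one bit per eight bytes of heap and eight bits per byte of vector. A short sketch of that arithmetic, with illustrative names (BitVectorSize, vectorBytes, bitIndex) that do not come from the JVM:

```java
class BitVectorSize {
    /** Size in bytes of one bit vector (allocbits or markbits) for a heap of the given size. */
    static long vectorBytes(long heapBytes) {
        long bits = heapBytes / 8; // one bit per 8-byte heap slot
        return bits / 8;           // eight bits per byte, so heapBytes / 64
    }

    /** Index of the bit that represents the object starting at objectAddress. */
    static long bitIndex(long heapBase, long objectAddress) {
        return (objectAddress - heapBase) / 8;
    }
}
```

For a 64 MB heap, each vector therefore occupies 1 MB.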

Java Internals: IBM JDK 1.6
Garbage Collection

Terminology
Pinned Objects
  Pinned objects are those that cannot be moved because the JNI has given native code direct access to the contents of the object, for example an array or ClassClass objects.
Dosed Objects
  All references from a stack or registers to an object cause the garbage collector not to move that object in the compaction phase. Such objects, which are temporarily fixed in position, are referred to as dosed objects.
  When the method calls are completed and the references from that method's frame on the stack are cleared, the object can be moved.
Dark Matter
  Any piece of storage that is more than 512 bytes is treated as free space and is available to mutators or object allocators. Other chunks that are less than 512 bytes are termed dark matter, and are not available as free space.

Garbage Collection Basics
Garbage collection is performed when there is:
  An allocation failure in the heap lock allocation
  A specific call to System.gc
Garbage collection has three phases:
  Mark
  Sweep
  Compaction (optional)
Garbage collection is a stop-the-world (STW) operation, because all application threads are stopped while the garbage is collected.
GC occurs in the thread that handled the request:
  The requested object allocation that caused the allocation failure
  The programmatically requested GC
On heap lock allocation failure, the Garbage Collector runs if at least 30% of the heap has been allocated since the last garbage collection (the 30% figure can be changed with the -Xminf parameter) and the size of the allocation request is less than 64 KB.
There are two types of Garbage Collector:
  Mark and Sweep Collector
  Generational Collector

Mark Sweep Collector
Obtain locks and suspend threads
Mark phase
  Process of identifying all objects reachable from the root set
  All live objects are marked by setting a mark bit in the mark bit vector
  Reference handling
  Enqueuing of finalizers
Sweep phase
  The sweep phase identifies all the objects that have been allocated, but are no longer referenced
Compaction (optional)
  Once garbage has been removed, the collector considers compacting the resulting set of objects to remove spaces between them
Release locks and resume threads

Mark Phase
The mark phase is the process of identifying all objects reachable from the root set.
The Garbage Collector scans each thread stack to identify the slots that can be potential pointers to objects.
Objects that are referenced in this way are known as roots, and have their dosed bit set on to indicate that they cannot be moved.
All live objects are marked by setting a mark bit in the mark bit vector.
Parallel Mark
  The majority of garbage collection time is spent marking objects. Therefore, a parallel version of Garbage Collector mark has been developed.
  The time spent marking objects is decreased through the addition of helper threads and a facility that shares work between those threads.
  A single application thread is used as the master coordinating thread, often known as the main GC thread. This thread has the responsibility for scanning C stacks to identify root pointers for the collection.
  A platform with N processors also has N-1 new helper threads that work with the master thread to complete the marking phase of garbage collection.
  The default number of threads can be overridden with the -Xgcthreads parameter followed by n (for example, -Xgcthreads4), where n represents the number of threads.
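
Marking is a graph traversal from the roots. The following is a single-threaded sketch under simplifying assumptions: objects are identified by small integer indexes rather than heap addresses, and refs[i] lists the objects that object i refers to. The class MarkSketch and its method are illustrative, and the work stack stands in for the work-sharing facility used by the real parallel mark.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

class MarkSketch {
    /** Returns a mark bit vector with one bit set for each object reachable from the roots. */
    static BitSet mark(int[][] refs, int[] roots) {
        BitSet markBits = new BitSet(refs.length);
        Deque<Integer> work = new ArrayDeque<>();
        for (int root : roots) {
            if (!markBits.get(root)) {
                markBits.set(root);
                work.push(root);
            }
        }
        while (!work.isEmpty()) {
            int obj = work.pop();
            for (int child : refs[obj]) {
                if (!markBits.get(child)) { // mark each object only once
                    markBits.set(child);
                    work.push(child);
                }
            }
        }
        return markBits;
    }
}
```

In a graph where object 0 is the only root, 0 refers to 1, 1 refers to 2, and objects 3 and 4 refer only to each other, the result marks 0, 1, and 2; the cycle between 3 and 4 stays unmarked because nothing reachable points to it.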

Mark Phase (continued)
Concurrent Mark
  Concurrent mark gives reduced garbage collection pause times when heap sizes increase.
  It starts a concurrent marking phase before the heap is full. In the concurrent phase, the Garbage Collector scans the roots by asking each thread to scan its own stack. These roots are then used to trace live objects concurrently. Tracing is done by a low-priority background thread and by each application thread when it does a heap lock allocation.
  A stop-the-world (STW) collection is started when one of the following occurs:
    Allocation failure
    System.gc
    Concurrent mark completes all the marking that it can do
  The -Xgcpolicy: parameter controls concurrent mark:
    gencon - Requests the combined use of concurrent and generational GC to help minimize the time that is spent in any garbage collection pause.
    optthruput - Disables concurrent mark. This is the default setting.
    optavgpause - Enables concurrent mark.
    subpool - Disables concurrent mark. It uses an improved object allocation algorithm to achieve better performance when allocating objects on the heap. This option might improve performance on SMP systems with 16 or more processors. The subpool option is available only on AIX, Linux PPC and zSeries, z/OS, and i5/OS.

Sweep Phase
The sweep phase identifies the intersection of the allocbits and markbits vectors; that is, objects that have been allocated but are no longer referenced.
In the bitsweep technique, the Garbage Collector examines the markbits vector directly and looks for long sequences of zeros, which probably identify free space. When such a long sequence is found, the Garbage Collector checks the length of the object at the start of the sequence to determine the amount of free space that is to be released. If this amount of free space is greater than 512 bytes plus the header size, this free chunk is put on the freelist.
The small areas of storage that are not on the freelist are known as "dark matter", and they are recovered when the objects that are next to them become free, or when the heap is compacted.
The markbits are copied to the allocbits so that on completion, the allocbits correctly represent the allocated objects that are on the heap.
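
The split between freelist chunks and dark matter can be illustrated with a simplified sweep. This sketch does not scan a bit vector as the real bitsweep does; it assumes the live objects are already known as (start, size) pairs sorted by address, and it ignores the header size in the threshold. The class SweepSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SweepSketch {
    static final long MIN_FREE_BYTES = 512; // header size ignored in this sketch

    /**
     * @param live     live objects as {start, size} in bytes, sorted by start address
     * @param heapSize total heap size in bytes
     * @return free chunks as {start, size}; smaller gaps are left out as dark matter
     */
    static List<long[]> sweep(long[][] live, long heapSize) {
        List<long[]> freeList = new ArrayList<>();
        long cursor = 0; // end of the previous live object
        for (long[] obj : live) {
            addIfLargeEnough(freeList, cursor, obj[0] - cursor);
            cursor = obj[0] + obj[1];
        }
        addIfLargeEnough(freeList, cursor, heapSize - cursor);
        return freeList;
    }

    private static void addIfLargeEnough(List<long[]> freeList, long start, long size) {
        if (size > MIN_FREE_BYTES) {
            freeList.add(new long[] {start, size});
        }
        // Gaps at or below the threshold stay off the list: dark matter.
    }
}
```

With live objects at 0, 128, and 2048 (64 bytes each) in a 4096-byte heap, the 64-byte gap after the first object becomes dark matter, and the two larger gaps are placed on the freelist.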

Sweep Phase (continued)
Parallel Sweep
  Parallel Bitwise Sweep improves sweep time by using all available processors.
  In Parallel Bitwise Sweep, the Garbage Collector uses the same helper threads that are used in Parallel Mark, so the default number of helper threads is also the same.
  The heap is divided into sections. The number of sections is significantly larger than the number of helper threads. The number of sections is calculated as follows:
    32 x the number of helper threads, or the maximum heap size / 16 MB, whichever is larger
  The helper threads take a section at a time and scan it, performing a modified bitwise sweep. The results of this scan are stored for each section. When all sections have been scanned, the freelist is built.
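
The section calculation can be checked with a few lines of code. The class and method names below are illustrative; the formula is the one given on the slide.

```java
class SweepSections {
    static final long SECTION_UNIT = 16L << 20; // 16 MB

    /** Number of heap sections used by parallel bitwise sweep. */
    static long sections(int helperThreads, long maxHeapBytes) {
        return Math.max(32L * helperThreads, maxHeapBytes / SECTION_UNIT);
    }
}
```

For a 1 GB maximum heap, three helper threads give 96 sections (32 x 3 exceeds 1 GB / 16 MB = 64), while a single helper thread gives 64 sections.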

Sweep Phase (continued)
Concurrent Sweep
  Like concurrent mark, concurrent sweep gives reduced garbage collection pause times when heap sizes increase.
  Concurrent sweep starts immediately after a stop-the-world (STW) collection.
  The mark map used for concurrent mark is also used for sweeping.
  The concurrent sweep process is split into two types of operations:
    Sweep analysis: Sections of data in the mark map (mark bit array) are analyzed for ranges of free or potentially free memory.
    Connection: The analyzed sections of the heap are connected into the free list.
  To enable concurrent sweep, use the -Xgcpolicy: parameter with the value optavgpause. It becomes active along with concurrent mark. The modes optthruput, subpool, and gencon do not support concurrent sweep.

Compaction Phase
When objects are freed by garbage collection, the heap becomes fragmented. This fragmentation can cause a state in which enough free space is still available in the heap, but the free space is not contiguous, so it cannot be used for further object allocations.
Compaction defragments the Java heap. The process of compaction is complicated because, if any object is moved, the Garbage Collector must change all the references that exist to it.
[Figure legend: pinned or dosed objects]

Compaction Phase (continued)
Compaction occurs if any one of the following is true and -Xnocompactgc has not been specified:
  -Xcompactgc has been specified.
  Following the sweep phase, not enough free space is available to satisfy the allocation request.
  A System.gc() has been requested, and the last allocation failure garbage collection did not compact or -Xcompactexplicitgc has been specified.
  At least half the previously available memory has been consumed by TLH allocations (ensuring an accurate sample) and the average TLH size falls below 1024 bytes.
  The scavenger is enabled, and the largest object that the scavenger failed to tenure in the most recent scavenge is larger than the largest free entry in tenured space.
  The heap is fully expanded and less than 4% of old space is free.
  Less than 128 KB of the active heap is free.

Compaction Avoidance
Compaction avoidance focuses on correct object placement and is done using a concept called wilderness preservation.
Wilderness preservation attempts to keep a region of the heap in an unused state by focusing allocation activity elsewhere.
The wilderness (Large Object Area) is consumed only when necessary to satisfy a large allocation, or when not enough allocation progress has been made since the previous garbage collection.
The wilderness is allocated at the end of the active part of the heap. Its initial size is 5% of the active part of the heap, and it expands and shrinks depending on usage.

Incremental Compaction
The process of compaction can cause a considerable increase in the pause time of a garbage collection cycle (for example, 40 seconds for a 1 GB heap). Long pause times are unacceptable for real-world applications.
Incremental compaction is a way of spreading compaction work across garbage collection cycles, thereby reducing pause times.
In incremental compaction, the Garbage Collector splits the heap into sections and compacts each section in the same way in which it does a full compaction. That is, the Garbage Collector moves all the moveable objects down the heap.
This action retrieves all the dark matter and leaves large areas of free space.
Incremental compaction has two main steps:
  Identify and remember all references that point into the compaction region; this action is done during the mark phase. At the end of this stage, all free space that is in the sections can be identified.
  Compute the new locations of objects and move them in the compaction region. Then set up pointers to those objects.

Incremental Compaction (continued)
The individual sections on which incremental compaction runs are of fixed size, which constrains the time required for compaction.
Incremental compaction is done only if the heap size is greater than a minimum value (128 MB).
Incremental compaction operates in a cycle. An incremental compaction cycle is a series of successive garbage collection cycles that incrementally compacts the whole heap, a region at a time. The compaction spans multiple garbage collection cycles, therefore spreading compaction time over multiple garbage collections and reducing pause times.
Incremental compaction is ON by default. (-Xpartialcompactgc enables incremental compaction; -Xnopartialcompactgc disables it.)

Reference Objects
A reference object encapsulates a reference to some other object, which is called the referent. Reference objects enable all references to be handled and processed in the same way. Therefore, two separate objects are created on the heap: the object itself and a separate reference object.
Objects that are associated with a finalizer are 'registered' with the Finalizer class on creation. The result is the creation of a Final Reference object that is associated with the Finalizer queue and that refers to the object that is to be finalized.
A Reference Queue is a simple data structure onto which the garbage collector places reference objects when the reference field is cleared (set to null).
Soft and weak references are automatically cleared by the garbage collector if the referent objects are not strongly reachable.
Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector as they are enqueued. An object that is reachable via phantom references remains so until all such references are cleared or themselves become unreachable.
The Garbage Collector is required to clear all soft references before throwing an OutOfMemoryError.
Soft references are considered young until they have spanned 32 GC cycles (age = 32), and they are not eligible for collection while they are young. The -Xsoftrefthreshold parameter can be used to adjust the frequency of collection.

Reference Objects
Going from strongest to weakest, the different levels of reachability reflect the life cycle of an object:
Strongly reachable: an object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly created object is strongly reachable by the thread that created it.
Softly reachable: an object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference. Soft reference objects are cleared at the discretion of the garbage collector in response to memory demand.
Weakly reachable: an object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly reachable object are cleared, the object becomes eligible for finalization.
Phantom reachable: an object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.
Unreachable: finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways.
During garbage collection, the referent field is not traced during the marking phase. When marking is complete, the references are processed in sequence:
Soft: soft references are for implementing memory-sensitive caches.
Weak: weak references are for implementing canonicalizing mappings that do not prevent their keys (or values) from being reclaimed.
Final.
Phantom: phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.
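
The reference types described above are exposed through the standard java.lang.ref package. The following is a minimal sketch, not taken from the course material: the class and method names (ReferenceDemo and its helpers) are illustrative assumptions. It only exercises behaviour that does not depend on when the collector actually runs.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class; only the java.lang.ref types are real API.
final class ReferenceDemo {

    /** A weak reference still returns its referent while a strong reference exists. */
    static boolean weakKeepsReferentWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<Object>(strong);
        return weak.get() == strong;
    }

    /** PhantomReference.get() returns null by specification, even for a live referent. */
    static boolean phantomGetIsAlwaysNull() {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object strong = new Object();
        PhantomReference<Object> phantom = new PhantomReference<Object>(strong, queue);
        return strong != null && phantom.get() == null;
    }

    /** Clearing and enqueueing by hand mimics what the collector does for a dead referent. */
    static boolean manualClearAndEnqueue() {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object referent = new Object();
        SoftReference<Object> soft = new SoftReference<Object>(referent, queue);
        soft.clear();                       // the reference no longer points at the referent
        boolean enqueued = soft.enqueue();  // the reference object is placed on its queue
        return referent != null && enqueued && soft.get() == null && queue.poll() == soft;
    }

    public static void main(String[] args) {
        System.out.println("weak keeps referent: " + weakKeepsReferentWhileStronglyHeld());
        System.out.println("phantom get() is null: " + phantomGetIsAlwaysNull());
        System.out.println("manual clear + enqueue: " + manualClearAndEnqueue());
    }
}
```

In a real application the collector, not the program, clears and enqueues the references; a thread polling the queue then performs the cleanup.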

Heap Expansion & Shrinkage
Heap expansion occurs after garbage collection and after all the threads have been restarted, but while the HEAP_LOCK is still held. The active part of the heap is expanded up to the maximum if any one of the following is true:
The Garbage Collector did not free enough storage to satisfy the allocation request.
Free space is less than the minimum free space, which you can set by using the -Xminf parameter. The default is 30%.
More than the maximum time threshold is being spent in garbage collection, set using the -Xmaxt parameter. The default is 13%.
The amount by which the heap is expanded is rounded to the nearest 512-byte boundary on 32-bit JVMs or the nearest 1024-byte boundary on 64-bit JVMs.
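
The expansion conditions above can be read as a simple predicate. The sketch below is an illustration only, using the default thresholds quoted in the text; the class name, method names and the interpretation of "nearest boundary" as round-half-up are assumptions, not the JVM's actual implementation.

```java
// Hypothetical model of the expansion decision; not IBM JVM code.
final class HeapExpansionPolicy {
    static final double MIN_FREE_FRACTION = 0.30;    // -Xminf default from the text
    static final double MAX_GC_TIME_FRACTION = 0.13; // -Xmaxt default from the text

    /** True if any one of the three expansion conditions holds. */
    static boolean shouldExpand(boolean allocationSatisfied,
                                double freeFraction,
                                double gcTimeFraction) {
        return !allocationSatisfied
                || freeFraction < MIN_FREE_FRACTION
                || gcTimeFraction > MAX_GC_TIME_FRACTION;
    }

    /** Rounds a byte count to the nearest multiple of boundary (512 or 1024 in the text). */
    static long roundToBoundary(long bytes, long boundary) {
        return ((bytes + boundary / 2) / boundary) * boundary;
    }

    public static void main(String[] args) {
        System.out.println(shouldExpand(true, 0.25, 0.05)); // free space below -Xminf
        System.out.println(roundToBoundary(1000, 512));
    }
}
```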

Heap Expansion & Shrinkage
Heap shrinkage occurs after garbage collection, while all the threads are still suspended. Shrinkage does not occur if any one of the following is true:
The Garbage Collector did not free enough space to satisfy the allocation request.
The maximum free space, which can be set by the -Xmaxf parameter (default is 60%), is set to 100%.
The heap has been expanded in the last three garbage collections.
The collection was triggered by System.gc() and the amount of free space at the beginning of the garbage collection was less than -Xminf (default is 30%) of the live part of the heap.
If none of the above is true and more than -Xmaxf free space exists, the Garbage Collector calculates by how much to shrink the heap to bring it down to -Xmaxf free space, without going below the initial (-Xms) value. This value is rounded to the nearest 512-byte boundary on 32-bit JVMs or the nearest 1024-byte boundary on 64-bit JVMs.
A compaction occurs before the shrink if all the following are true:
A compaction was not done on this garbage collection cycle.
No free chunk is at the end of the heap, or the size of the free chunk that is at the end of the heap is less than 10% of the required shrinkage amount.
The Garbage Collector did not shrink and compact on the last garbage collection cycle.
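
The shrink calculation can be worked through with a little arithmetic: keep the live data, and size the heap so that free space is exactly the -Xmaxf percentage, never going below -Xms. The sketch below covers only this sizing step; the other conditions listed above, and the boundary rounding, are left out. All names are hypothetical.

```java
// Hypothetical model of the shrink sizing step; not IBM JVM code.
final class HeapShrinkPolicy {

    /**
     * Returns how much to shrink the heap so that free space drops to maxFreePercent,
     * without going below initialSize (-Xms). All sizes use the same unit.
     */
    static long shrinkAmount(long totalSize, long freeSize, int maxFreePercent, long initialSize) {
        if (maxFreePercent >= 100) {
            return 0; // -Xmaxf set to 100% disables shrinkage
        }
        if (freeSize * 100 <= totalSize * maxFreePercent) {
            return 0; // not more than -Xmaxf free space, nothing to do
        }
        long liveSize = totalSize - freeSize;
        // live / target = (100 - maxFree)%  =>  target = live * 100 / (100 - maxFree)
        long targetTotal = liveSize * 100 / (100 - maxFreePercent);
        targetTotal = Math.max(targetTotal, initialSize);
        return Math.max(0, totalSize - targetTotal);
    }

    public static void main(String[] args) {
        // 1000 MB heap, 800 MB free, -Xmaxf 60%, -Xms 100 MB
        System.out.println(shrinkAmount(1000, 800, 60, 100) + " MB");
    }
}
```

With 200 MB live and a 60% free-space target, the heap is sized at 200 / 0.4 = 500 MB, so the example shrinks by 500 MB.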

Generational Concurrent GC
A generational garbage collection strategy is well suited to an application that creates many short-lived objects (typically transactional applications). It can be enabled using -Xgcpolicy:gencon.
The Java heap is split into two areas, a new (or nursery) area and an old (or tenured) area. Objects are created in the new area and, if they continue to be reachable for long enough, they are moved into the old area. Objects are moved when they have been reachable for enough garbage collections (known as the tenure age).
The new area is split into two logical spaces: allocate and survivor.
Objects are allocated into the allocate space. When that space is filled, a garbage collection process called scavenge is triggered.
During a scavenge, reachable objects are copied either into the survivor space or, if they have reached the tenure age, into the tenured space.

Generational Concurrent GC
When all the reachable objects have been copied, the spaces in the new area switch roles. The new survivor space is now entirely empty of reachable objects and is available for the next scavenge.

Tenure age is a measure of the object age at which an object should be promoted to the tenure area.
This age is dynamically adjusted by the JVM and reaches a maximum value of 14.
An object's age is incremented on each scavenge. A tenure age of x means that an object is promoted to the tenure area after it has survived x flips between survivor and allocate space. The threshold is adaptive and adjusts the tenure age based on the percentage of space used in the new area.
Tenured space is concurrently traced with a similar approach to the one used for -Xgcpolicy:optavgpause.
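
The scavenge, flip and tenure-age mechanics can be pictured with a toy model. This is a sketch under simplifying assumptions (a fixed tenure age, reachability set by hand, lists instead of memory regions); every name in it is hypothetical and it is not how the JVM is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a gencon-style nursery; illustration only.
final class ScavengeSketch {
    static final class Obj {
        final String name;
        boolean reachable = true; // set by hand here; a real collector traces references
        int age = 0;
        Obj(String name) { this.name = name; }
    }

    List<Obj> allocateSpace = new ArrayList<Obj>();
    List<Obj> survivorSpace = new ArrayList<Obj>();
    final List<Obj> tenuredSpace = new ArrayList<Obj>();
    final int tenureAge;

    ScavengeSketch(int tenureAge) { this.tenureAge = tenureAge; }

    Obj newObject(String name) {
        Obj o = new Obj(name);
        allocateSpace.add(o);
        return o;
    }

    void scavenge() {
        for (Obj o : allocateSpace) {
            if (!o.reachable) {
                continue; // unreachable objects are simply not copied
            }
            o.age++;
            if (o.age >= tenureAge) {
                tenuredSpace.add(o);  // old enough: promote
            } else {
                survivorSpace.add(o); // still young: copy to survivor space
            }
        }
        allocateSpace.clear();
        // Flip: the survivor space becomes the allocate space; the old allocate space is now empty.
        List<Obj> tmp = allocateSpace;
        allocateSpace = survivorSpace;
        survivorSpace = tmp;
    }

    public static void main(String[] args) {
        ScavengeSketch heap = new ScavengeSketch(2);
        heap.newObject("a");
        heap.newObject("b").reachable = false;
        heap.scavenge();
        heap.newObject("c");
        heap.scavenge();
        System.out.println("tenured=" + heap.tenuredSpace.size()
                + " nursery=" + heap.allocateSpace.size());
    }
}
```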

Java Internals
Sun JDK 1.5

JVM
Sun's Java Virtual Machine (JVM) is a core component of the Java Runtime Environment (JRE) from Sun Microsystems. The JVM is a virtualized computing machine that follows a well-defined specification for the runtime requirements of the Java programming language.
The JVM is called virtual because it provides a machine interface that does not depend on the underlying operating system and machine hardware architecture.
Java programs are compiled into bytecode (class files), which is then executed in the JVM.
A JVM implementation is specific to an operating system and hardware combination.
All JVMs:
Execute code that is defined by a standard known as the class file format (bytecode).
Provide fundamental runtime security such as bytecode verification.
Provide intrinsic operations such as performing arithmetic and allocating new objects.

Sun's JVM is called the HotSpot JVM.
It has two flavors: the Client VM for client-side applications and the Server VM, tuned for server applications.

Memory Model
Previous versions of the JVM, such as the Classic VM, used indirect handles to represent object references. This makes relocating objects easier during garbage collection, but it represents a significant performance bottleneck, because accesses to the instance variables of objects require two levels of indirection.
In the Java HotSpot VM, no handles are used by Java code. Object references are implemented as direct pointers, which provides C-speed access to instance variables. When an object is relocated during memory reclamation, the garbage collector is responsible for finding and updating all references to the object in place.
Layout of an Object on the Heap
An object contains the following components: the object data and a header.
The Java HotSpot VM uses a two-machine-word object header.
The first header word contains information such as the identity hash code and GC status information.
The second header word is a reference to the object's class.
Only arrays have a third header word, for the array size.
Reflective data are represented as objects (for example, Class). This enables the same GC to collect such objects.

Memory Model
Native Thread Support (including Preemption and Multiprocessing)
Per-thread method activation stacks are represented using the host operating system's stack and thread model.
Both Java programming language methods and native methods share the same stack, allowing fast calls between native code and Java code.
Fully preemptive Java threads are supported using the host operating system's thread scheduling mechanism.
A major advantage of using native OS threads and scheduling is the ability to take advantage of native OS multiprocessing support transparently.

Memory management is the process of recognizing when allocated objects are no longer needed, deallocating (freeing) the memory used by such objects, and making it available for subsequent allocations.
Explicit Memory Management
Allocating and freeing memory is the programmer's responsibility.
Common errors:
Dangling references: an object is removed from the heap but a pointer to the object is not removed.
Application memory leaks (for example, de-allocating only the first element of a linked list, causing the other elements to go out of reach).
Automatic Memory Management
Performed by a program called the garbage collector.
Garbage collection avoids the dangling reference problem, because an object that is still referenced somewhere will never be garbage collected and so will not be considered free. Garbage collection also solves the space leak problem, since it automatically frees all memory that is no longer referenced.

Memory Management

Garbage Collection Basics
Garbage collection is performed when there is:
An allocation failure in the heap lock allocation.
A specific call to System.gc() (this can be disabled using the -XX:+DisableExplicitGC parameter).
The garbage collector is responsible for:
Allocating memory.
Ensuring that any referenced objects remain in memory.
Recovering memory used by objects that are no longer reachable from references in executing code.
Garbage collection has three phases: mark, sweep, and compaction (optional).
GC occurs in the thread that handled the request which triggered GC.
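
The mark and sweep phases listed above can be sketched on a toy object graph. The example below is a teaching model only: the "heap" is a list of nodes, the roots are passed in explicitly, and all names are hypothetical. A real collector works on raw memory and thread stacks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy mark-sweep over an explicit object graph; illustration only.
final class MarkSweepSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<Node>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark phase: flag every node transitively reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> work = new ArrayDeque<Node>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n.marked) {
                continue; // already visited; this also handles cycles
            }
            n.marked = true;
            work.addAll(n.refs);
        }
    }

    /** Sweep phase: drop unmarked nodes, reset marks on survivors, return the number freed. */
    static int sweep(List<Node> heap) {
        int freed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext();) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                it.remove();
                freed++;
            }
        }
        return freed;
    }

    static int collect(List<Node> heap, List<Node> roots) {
        mark(roots);
        return sweep(heap);
    }
}
```

Note that the two unreachable nodes in a cycle (each referencing the other) are still reclaimed, which is why tracing collectors do not suffer from the cycle problem of reference counting.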

Garbage Collector
Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection.
While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.

Infant Mortality
In the graph, the sharp peak at the left represents objects that can be reclaimed (i.e., have "died") shortly after being allocated. Iterator objects, for example, are often alive for the duration of a single loop. Efficient collection is made possible by focusing on the fact that a majority of objects "die young".

HotSpot Generations
Memory in the Java HotSpot virtual machine is organized into three generations: a young generation, an old generation, and a permanent generation.
The young generation consists of an area called Eden plus two smaller survivor spaces.
Most objects are initially allocated in Eden.
The survivor spaces hold objects that have survived at least one young generation collection and have thus been given additional chances to die before being considered old enough to be promoted to the old generation.
At any given time, one of the survivor spaces holds such objects, while the other is empty and remains unused until the next collection.

HotSpot Generations
User Heap (Young and Tenured Generation)
The sizes of the initial heap and maximum heap are calculated based on the size of the physical memory.
If phys_mem is the size of the physical memory on the platform, the initial heap size will be set to phys_mem / DefaultInitialRAMFraction. DefaultInitialRAMFraction is a command line option with a default value of 64. The maximum heap size will be set to phys_mem / DefaultMaxRAMFraction. DefaultMaxRAMFraction has a default value of 4.
The minimum and maximum heap size can be set using the -Xms and -Xmx parameters respectively.
System Heap (Permanent Generation)
Permanent generation space is reserved for long-term objects (mostly Class objects that are part of the native JVM or created by the application as loaded by ClassLoaders).
The initial and maximum size of the permanent generation can be set using the -XX:PermSize= and -XX:MaxPermSize= parameters respectively (the default maximum is 64M).
The space occupied by the permanent generation is in addition to the space used by the user heap (for example, PermGen = 128 MB, user heap = 512 MB, total RAM occupied = 640 MB).
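
The default sizing rules above are plain divisions, which the following sketch works through. The class and method names are hypothetical, and the fractions are the defaults quoted in the text; actual JVMs may apply further limits that are ignored here. Runtime.maxMemory() is real API and reports the maximum heap the running JVM will attempt to use.

```java
// Worked arithmetic for the default heap sizes quoted in the text; illustration only.
final class DefaultHeapSizing {
    static final long DEFAULT_INITIAL_RAM_FRACTION = 64;
    static final long DEFAULT_MAX_RAM_FRACTION = 4;

    static long initialHeapBytes(long physMemBytes) {
        return physMemBytes / DEFAULT_INITIAL_RAM_FRACTION;
    }

    static long maxHeapBytes(long physMemBytes) {
        return physMemBytes / DEFAULT_MAX_RAM_FRACTION;
    }

    /** The permanent generation is counted on top of the user heap. */
    static long totalFootprintMb(long userHeapMb, long permGenMb) {
        return userHeapMb + permGenMb;
    }

    public static void main(String[] args) {
        long fourGb = 4L << 30;
        System.out.println("initial = " + (initialHeapBytes(fourGb) >> 20) + " MB");
        System.out.println("maximum = " + (maxHeapBytes(fourGb) >> 20) + " MB");
        System.out.println("this JVM's max heap = " + (Runtime.getRuntime().maxMemory() >> 20) + " MB");
    }
}
```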

Fast Allocation
Allocations from large contiguous blocks are efficient, using a simple bump-the-pointer technique: the end of the previously allocated object is always tracked.
For multithreaded applications, allocation operations need to be multithread-safe. If global locks were used to ensure this, then allocation into a generation would become a bottleneck and degrade performance.
Thread-Local Allocation Buffers (TLABs)
TLABs improve multithreaded allocation throughput by giving each thread its own buffer.
Because only one thread can allocate into each TLAB, allocation can take place quickly using the bump-the-pointer technique, without requiring any locking.
A thread acquires a TLAB at its first object allocation after a GC scavenge.
The TLAB is released when it is full (or nearly so), or when the next GC scavenge occurs.
TLABs are allocated only in Eden, never from the survivor spaces or the old generation.
The size of the TLAB can be specified using the -XX:TLABSize flag. The initial size of a TLAB is computed as:
init_size = size_of_eden / (allocating_thread_count * target_refills_per_epoch)
allocating_thread_count is the expected number of threads that will be actively allocating during the next epoch (an epoch is the mutator time between GC scavenges).
target_refills_per_epoch is the desired number of TLAB allocations per thread during an epoch.
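
Bump-the-pointer allocation and the TLAB sizing formula can be sketched in a few lines. This is a model only: addresses are plain long values, the class name and methods are hypothetical, and nothing here reflects HotSpot's actual data structures.

```java
// Toy thread-local allocation buffer using bump-the-pointer; illustration only.
final class TlabSketch {
    private final long end; // first address past the buffer
    private long top;       // end of the previously allocated object

    TlabSketch(long start, long sizeBytes) {
        this.top = start;
        this.end = start + sizeBytes;
    }

    /** Returns the address of the new object, or -1 if the buffer cannot hold it. */
    long allocate(long sizeBytes) {
        if (top + sizeBytes > end) {
            return -1; // buffer exhausted: a real thread would request a new TLAB
        }
        long address = top;
        top += sizeBytes; // bump the pointer; no lock needed, the buffer has one owner
        return address;
    }

    /** init_size = size_of_eden / (allocating_thread_count * target_refills_per_epoch) */
    static long initialSize(long edenBytes, int allocatingThreadCount, int targetRefillsPerEpoch) {
        return edenBytes / ((long) allocatingThreadCount * targetRefillsPerEpoch);
    }

    public static void main(String[] args) {
        TlabSketch tlab = new TlabSketch(1000, 64);
        System.out.println(tlab.allocate(24));
        System.out.println(tlab.allocate(24));
        System.out.println(tlab.allocate(24)); // does not fit in the remaining 16 bytes
    }
}
```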

Types of Collector
Garbage Collection Types
When the young generation fills up, a young generation collection (sometimes referred to as a minor collection) of just that generation is performed.
When the old or permanent generation fills up, what is known as a full collection (sometimes referred to as a major collection) is typically done. All generations are collected as part of a full collection.

Types of Collector
Serial Collector: Young Generation
With the serial collector, both young and old collections are done serially (using a single CPU), in a stop-the-world fashion.
Live objects from the Eden space and the From survivor space are copied to the To survivor space.
Objects that are too large to fit comfortably in the To survivor space are copied directly to the old generation.
Once GC is complete, both Eden and the From survivor space are empty. Only the To survivor space contains live objects. At this point, the survivor spaces swap roles.

Types of Collector
Serial Collector: Old Generation
The old and permanent generations are collected via a mark-sweep-compact collection algorithm.
In the mark phase, the collector identifies which objects are still live. The sweep phase sweeps over the generations, identifying garbage.
The collector then performs sliding compaction, sliding the live objects towards the beginning of the old generation space and leaving any free space in a single contiguous chunk at the opposite end.
The serial collector is automatically chosen as the default garbage collector on machines that are not server-class machines.
The serial collector can be chosen explicitly using the -XX:+UseSerialGC parameter.
The serial collector is the collector of choice for most applications that run on client-style machines and that do not have a requirement for low pause times.
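
Sliding compaction can be pictured on an array in which null marks a dead slot. The sketch below is a simplification under that assumption (equal-sized slots, no pointer fix-up) with hypothetical names; it shows only the sliding idea, with live entries keeping their relative order and free space ending up in one contiguous run.

```java
// Toy sliding compaction over an array of slots; illustration only.
final class SlidingCompactionSketch {

    /** Slides non-null (live) entries toward index 0, preserving order; returns the live count. */
    static int compact(Object[] heap) {
        int free = 0; // next slot a live object can slide into
        for (int scan = 0; scan < heap.length; scan++) {
            if (heap[scan] != null) {
                if (scan != free) {
                    heap[free] = heap[scan];
                    heap[scan] = null;
                }
                free++;
            }
        }
        return free; // everything from this index onward is one contiguous free chunk
    }

    public static void main(String[] args) {
        String[] heap = {"a", null, "b", null, null, "c"};
        int live = compact(heap);
        System.out.println(live + " live: " + java.util.Arrays.toString(heap));
    }
}
```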

Types of Collector
Parallel Collector
The parallel collector, also known as the throughput collector, was developed in order to take advantage of available CPUs rather than leaving most of them idle while only one does garbage collection work.
The young generation parallel collector uses a parallel version of the young generation collection algorithm utilized by the serial collector. It is still a stop-the-world, copying collector.
Performing the young generation collection in parallel, using many CPUs, decreases garbage collection overhead and hence increases application throughput.
Old generation garbage collection for the parallel collector is done using the same serial mark-sweep-compact collection algorithm as the serial collector.
The parallel collector is automatically chosen as the default garbage collector on server-class machines.
The parallel collector can be explicitly chosen using the -XX:+UseParallelGC parameter.
The parallel collector is the choice for applications that do not have pause time constraints, since infrequent but potentially long old generation collections will still occur.

Types of Collector
Parallel Compacting Collector
Young generation garbage collection for the parallel compacting collector is done using the same algorithm as that for young generation collection using the parallel collector.
With the parallel compacting collector, the old and permanent generations are collected in a stop-the-world, mostly parallel fashion with sliding compaction.
The collector utilizes three phases: marking, summary, and compaction. First, each generation is logically divided into fixed-size regions.
Marking phase
The initial set of live objects directly reachable from the application code is divided among the garbage collection threads, and then all live objects are marked in parallel. As an object is identified as live, the data for the region it is in is updated with information about the size and location of the object.
Summary phase
The summary phase operates on regions, not objects.
It examines the density of the regions, starting with the leftmost one, until it reaches a point where the space that could be recovered from a region and those to the right of it is worth the cost of compacting those regions.
The regions to the right of that point will be compacted, eliminating all dead space. The new location of the first byte of live data for each compacted region is calculated and stored.
The summary phase is currently implemented as a serial phase.

Types of Collector
Compaction phase
The garbage collection threads use the summary data to identify regions that need to be filled, and the threads can independently copy data into the regions. This produces a heap that is densely packed on one end, with a single large empty block at the other end.
As with the parallel collector, the parallel compacting collector is beneficial for applications that are run on machines with more than one CPU. The parallel operation of old generation collections reduces pause times and makes the parallel compacting collector more suitable than the parallel collector for applications that have pause time constraints.
The parallel compacting collector might not be suitable for applications run on large shared machines (such as SunRays), where no single application should monopolize several CPUs for extended periods of time.
In such cases, the number of threads to be used can be controlled using the -XX:ParallelGCThreads= parameter.
The parallel compacting collector can be explicitly chosen using the -XX:+UseParallelOldGC parameter.

Types of Collector
Concurrent Mark-Sweep (CMS) Collector
Young generation collections do not typically cause long pauses. However, old generation collections, though infrequent, can impose long pauses, especially when large heaps are involved. To address this issue, the HotSpot JVM includes a collector called the concurrent mark-sweep (CMS) collector, also known as the low-latency collector.
The CMS collector collects the young generation in the same manner as the parallel collector.
The old generation is collected in the following phases:
Initial mark: identifies the initial set of live objects directly reachable from the application code.
Concurrent mark: marks all live objects that are transitively reachable from this set. This happens while the application is running.
Remark: finalizes marking by revisiting any objects that were modified during the concurrent marking phase. Efficiency is increased by running multiple threads.
Concurrent sweep: reclaims all the garbage that has been identified.

Types of Collector
The CMS collector is the only collector that is non-compacting. Hence the free space is not contiguous, and free lists have to be maintained.
Floating garbage: objects that become garbage during the mark phase will not be reclaimed until the next old generation collection.
Fragmentation: the garbage collector tracks popular object sizes, estimates future demand, and may split or join free blocks to meet demand.
Because it is not a stop-the-world process, the CMS collector starts at a time based on statistics regarding previous collection times and how quickly the old generation becomes occupied. The CMS collector will also start a collection if the occupancy of the old generation exceeds a threshold called the initiating occupancy. The value of the initiating occupancy is set by the command line option -XX:CMSInitiatingOccupancyFraction=n, where n is a percentage of the old generation size. The default is 68.
For machines with a small number of CPUs, the CMS collector can be used in a mode in which the concurrent phases are done incrementally. The work done by the collector is divided into small chunks of time that are scheduled between young generation collections.
The CMS collector can be used if the application needs shorter garbage collection pauses and can afford to share processor resources with the garbage collector while the application is running (for example, web servers).
The CMS collector can be explicitly chosen using the -XX:+UseConcMarkSweepGC parameter.
The CMS incremental mode can be explicitly chosen using the -XX:+CMSIncrementalMode parameter.

Reference Objects
Soft references are cleared less aggressively in the server virtual machine than in the client. The rate of clearing can be slowed by increasing the -XX:SoftRefLRUPolicyMSPerMB= parameter.
SoftRefLRUPolicyMSPerMB is a measure of the time that a soft reference survives for a given amount of free space in the heap. The default value is 1000 ms per megabyte. This can be read to mean that a soft reference will survive (after the last strong reference to the object has been collected) for 1 second for each megabyte of free space in the heap.
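
The "1 second per megabyte of free space" reading can be turned into a small calculation. The sketch below models only the rule as stated in the text; the class and method names are hypothetical, and how the JVM actually measures free space and elapsed time is not covered.

```java
// Arithmetic model of the SoftRefLRUPolicyMSPerMB rule as described in the text.
final class SoftRefPolicySketch {

    /** How long a soft reference is expected to survive for a given amount of free heap. */
    static long survivalMillis(long freeHeapMb, long msPerMb) {
        return freeHeapMb * msPerMb;
    }

    /** True once the reference has been only softly reachable for longer than its allowance. */
    static boolean shouldClear(long millisSoftlyReachable, long freeHeapMb, long msPerMb) {
        return millisSoftlyReachable > survivalMillis(freeHeapMb, msPerMb);
    }

    public static void main(String[] args) {
        // 100 MB free with the default of 1000 ms per MB
        System.out.println(survivalMillis(100, 1000) + " ms");
    }
}
```

With 100 MB free and the default setting, the allowance is 100 seconds; raising the parameter lengthens it, which slows the rate of clearing.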

Tuning Garbage Collector
Maximum Pause Time Goal
The maximum pause time goal is specified with the command line option -XX:MaxGCPauseMillis=n, where n is the desired maximum pause time in milliseconds.
Heap size and other garbage collection-related parameters will be adjusted in an attempt to keep garbage collection pauses shorter than n milliseconds. This may reduce the overall throughput of the application.
By default, no maximum pause time goal is set.
Throughput Goal
The throughput goal is measured in terms of the time spent doing garbage collection and the time spent outside of garbage collection (application time). The time spent in garbage collection is the total time for all generations.
The goal is specified by the command line option -XX:GCTimeRatio=n. This sets the ratio of garbage collection time to application time to 1:n, so garbage collection is allowed 1 / (1 + n) of the total time. For example, -XX:GCTimeRatio=19 sets a goal of 5% of the total time for garbage collection.
If the goal is not met, the sizes of the generations are increased in an effort to increase the time the application can run between collections.
The default goal is 1% (i.e. n = 99).
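
The relationship between -XX:GCTimeRatio=n and the share of total time allowed for garbage collection is the formula 1 / (1 + n) from the text. A minimal sketch with hypothetical helper names:

```java
// Worked arithmetic for the throughput goal; illustration only.
final class GcTimeRatio {

    /** Fraction of total run time allowed for GC when -XX:GCTimeRatio=n is used. */
    static double gcFractionOfTotal(int n) {
        return 1.0 / (1 + n);
    }

    /** Inverse mapping; integer arithmetic, exact only when percent divides 100 evenly. */
    static int ratioForGcPercent(int percent) {
        return 100 / percent - 1;
    }

    public static void main(String[] args) {
        System.out.println(gcFractionOfTotal(19) * 100 + "% of total time");
        System.out.println("n for a 1% goal: " + ratioForGcPercent(1));
    }
}
```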

Tuning Garbage Collector
Footprint Goal
If the throughput and maximum pause time goals have been met, the garbage collector reduces the size of the heap until one of the goals (invariably the throughput goal) cannot be met. Then, the goal that is not being met is addressed.
Goal Priorities
The parallel collector prioritizes the goals in the following order: maximum pause time, throughput, footprint.
The statistics (e.g., average pause time) are updated at the end of each collection.
The collector then checks whether the goals are met and, if not, makes any needed adjustments to the size of a generation.
Explicit garbage collections (calls to System.gc()) are ignored in terms of keeping statistics and making adjustments to the sizes of generations.

Tuning Garbage Collector
Adjusting the Size of Generations
Growing and shrinking the size of a generation is done in increments that are a fixed percentage of the size of the generation.
By default, a generation grows in increments of 20% and shrinks in increments of 5%.
The growth percentage is adjusted using -XX:YoungGenerationSizeIncrement= for the young generation and -XX:TenuredGenerationSizeIncrement= for the tenured generation.
The shrink percentage is adjusted using the -XX:AdaptiveSizeDecrementScaleFactor=n parameter. If the size of an increment for growing is X percent, the size of the decrement for shrinking will be X / n percent.
At startup, a supplemental percentage is added to the growth percentage. The supplement decays with the number of collections. Its intent is to increase startup performance. No supplement is used for the shrink percentage.
Maximum pause time goal
If the goal is not met, the size of only one generation is shrunk at a time.
If it is not met for both generations, the generation with the larger pause time is shrunk first.
Throughput goal
If the goal is not met, the sizes of both generations are increased. Each is increased in proportion to its respective contribution to the total garbage collection time. For example, if young generation collection time is 25% of the total collection time and the growth percentage is 20%, then the young generation is increased by 5%.
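
The growth and shrink percentages can be checked with a short calculation. The sketch below uses hypothetical helper names; the scale factor of 4 in the usage is an assumption chosen because it reproduces the default figures quoted above (20% growth, 5% shrink).

```java
// Worked arithmetic for generation resizing; illustration only.
final class GenerationSizing {

    /** Growth applied to one generation: base increment scaled by its share of total GC time. */
    static double proportionalGrowthPercent(double baseGrowthPercent, double shareOfTotalGcTime) {
        return baseGrowthPercent * shareOfTotalGcTime;
    }

    /** Shrink decrement: X / n, where X is the growth percent and n the decrement scale factor. */
    static double shrinkPercent(double growthPercent, double decrementScaleFactor) {
        return growthPercent / decrementScaleFactor;
    }

    public static void main(String[] args) {
        // Young generation accounts for 25% of GC time, base growth 20%
        System.out.println(proportionalGrowthPercent(20.0, 0.25) + "% growth");
        // Assumed scale factor of 4, matching the 20% / 5% defaults in the text
        System.out.println(shrinkPercent(20.0, 4.0) + "% shrink");
    }
}
```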

Key settings related to GC

Java Internals
IBM JDK 1.6
Verbose GC

Verbose GC
The -verbose:gc option can be used to understand what is happening during GC.

Indicates that a garbage collection was triggered on the heap. type="global" indicates that the collection was global (mark, sweep, possibly compact). The id attribute gives the occurrence number of this global collection. The totalid attribute indicates the total number of garbage collections (of all types) that have taken place. intervalms gives the number of milliseconds since the previous global collection.

Verbose GC

Shows the number of objects that were moved during compaction and the total number of bytes these objects represented. The reason for the compaction is also shown. In this case, the compaction was forced, because -Xcompactgc was specified on the command line. This line is displayed only if compaction occurred during the collection.
Lists the number of class loaders unloaded in this garbage collection and how many actual classes were unloaded by that operation. timevmquiescems is the number of milliseconds that the GC had to wait for the VM to stop so that it could begin unloading the classes, and timetakenms is the number of milliseconds taken to perform the actual unload. This tag is only present if a class unloading attempt was made.

Indicates that during the handling of the allocation (but after the garbage collection), a heap expansion was triggered. The area expanded, the amount by which the area was increased (in bytes), its new size, the time taken to expand, and the reason for the expansion are shown.

Verbose GC

Provides information relating to the number of Java Reference objects that were cleared during the collection. In this example, no references were cleared.

Provides information detailing the number of objects containing finalizers that were enqueued for VM finalization during the collection. The number of objects is not equal to the number of finalizers that were run during the collection, because finalizers are scheduled by the VM.

Provides information detailing the times taken for mark, sweep and compact phases along with the total time taken. When compaction was not triggered, the number returned for compact is zero.

Indicates the status of the tenured area following the collection.
Shows the occupancy levels of the different heap areas before the garbage collection, for both the small object area (SOA) and the large object area (LOA).

Verbose GC

-verbose:gc output when System.gc() is executed

Verbose GC

Indicates that a System.gc() has occurred. The id attribute gives the number of this System.gc() call; in this case, this is the first such call in the life of this VM. timestamp gives the UTC timestamp when the System.gc() call was made. intervalms gives the number of milliseconds that have elapsed since the previous System.gc() call. In this case, because this is the first such call, the number returned is zero.

Shows the amount of time taken to obtain exclusive VM access. An optional line might occasionally be displayed to inform you that the following garbage collection was queued because the allocation failure was triggered while another thread was already performing a garbage collection.

Shows the total amount of time taken to handle the System.gc() call (in milliseconds).

Verbose GC

-verbose:gc output when a scavenge GC occurs
Indicates that a garbage collection has been triggered. The type="scavenger" attribute indicates that the collection is a scavenger collection.

Indicates that the scavenger failed to move some objects into the old or tenured area during the collection. The output shows the number of objects that were not moved, and the total bytes represented by these objects. If is shown, the scavenger failed to move or flip certain objects into the survivor space.

Verbose GC

Shows the number of objects that were flipped into the survivor space during the scavenger collection, together with the total number of bytes flipped.

Shows the percentage of the tilt ratio following the last scavenge event and space adjustment. The scavenger redistributes memory between the allocate and survivor areas using a process called tilting. Tilting controls the relative sizes of the allocate and survivor spaces, and the tilt ratio is adjusted to maximize the amount of time between scavenges.

Shows the number of objects that were moved into the tenured area during the scavenger collection, together with the total number of bytes tenured.

Shows the amount of free and total space in the nursery area after a scavenge event. The output also shows the number of times an object must be flipped in order to be tenured. This number is the tenure age, and is adjusted dynamically.
Shows the total time taken to perform the scavenger collection, in milliseconds.

Verbose GC

-verbose:gc output when an allocation failure occurs in the new area (nursery)

Verbose GC

Indicates that an allocation failure has occurred when attempting to allocate to the new area. The id attribute shows the index of the type of allocation failure that has occurred. timestamp shows a local timestamp at the time of the allocation failure. intervalms shows the number of milliseconds elapsed since the previous allocation failure of that type.

Shows the number of bytes requested by the allocation that triggered the failure. Following the garbage collection, freebytes might drop by more than this amount. The reason is that the free list might have been discarded or the Thread Local Heap (TLH) refreshed. and The first set of and tags show the status of the heaps at the time of the allocation failure that triggered garbage collection. The second set of tags shows the status of the heaps after the garbage collection has occurred. The third set of tags shows the status of the different heap areas following the successful allocation.

Verbose GC

-verbose:gc output when allocation failure occurs in old area (tenured)

Concurrent Garbage Collection
Concurrent kickoff
The output below is produced when the concurrent mark process is triggered.

Concurrent Garbage Collection
Allocation failures
The output below is produced when concurrent mark is halted. It shows that, as a result of the allocation failure, concurrent mark tracing was aborted.

Shows that concurrent mark tracing was halted as a result of the allocation failure. The tracing target is shown, together with the amount that was performed, both by mutator threads and the concurrent mark background thread. The number of cards cleaned during concurrent marking is also shown, with the free-space trigger level for card cleaning. Card cleaning occurs during concurrent mark after all available tracing has been exhausted.

Concurrent Garbage Collection
If concurrent mark completes all tracing and card cleaning, a concurrent collection is triggered.


Concurrent Garbage Collection

The target amount of tracing is shown, together with the amount that took place (both by mutator threads and helper threads). Information is displayed showing the number of cards in the card table that were cleaned during the concurrent mark process, and the heap occupancy level at which card cleaning began.

Shows that the full concurrent sweep of the heap was completed. The number of bytes of the heap swept is displayed with the amount of time taken, the amount of bytes swept that were connected together, and the time taken to do this.

Shows that final card cleaning has been triggered. The number of cards cleaned is displayed, together with the number of milliseconds taken to do so.

Java Internals - Sun JDK 1.5 - Verbose GC

Verbose GC
The GC details can be printed using -XX:+PrintGC, -XX:+PrintGCTimeStamps, and -XX:+PrintGCDetails.
Aging information of objects in the young generation can be logged by using the -XX:+PrintTenuringDistribution switch.

Verbose GC - CMS Collector

[GC 39.910: [ParNew: 261760K->0K(261952K), 0.2314667 secs] 262017K->26386K(1048384K), 0.2318679 secs]
Young generation (ParNew) collection.

[GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs]
This is the initial marking phase of CMS where all the objects directly reachable from roots are marked. This is a stop-the-world process.

[CMS-concurrent-mark-start] & [CMS-concurrent-mark: 0.521/0.529 secs]
Marking of live objects. Concurrent mark is a concurrent phase performed with all other threads running.

[CMS-concurrent-preclean-start] & [CMS-concurrent-preclean: 0.017/0.018 secs]
Precleaning is also a concurrent phase. This phase identifies the objects in CMS heap which were updated by promotions from the young generation or new allocations, or were updated by mutators while the concurrent marking phase is ON.

[GC40.704: [Rescan (parallel) , 0.1790103 secs] [weak refs processing, 0.0100966 secs][1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs]
Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also processes reference objects.

[CMS-concurrent-sweep-start] & [CMS-concurrent-sweep: 0.126/0.126 secs]
Sweeping of dead/non-marked objects. Sweeping is a concurrent phase performed with all other threads running.

[CMS-concurrent-reset-start] & [CMS-concurrent-reset: 0.127/0.127 secs]
Reset phase re-initializes the CMS data structures so that a new cycle may begin at a later time.

Wherever the time is specified as x/y secs, x denotes the CPU time and y denotes the wall time (includes the yield to other threads also).
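
The young generation (ParNew) line above packs several numbers into one record. As a rough illustration of how to read it, the sketch below pulls the fields out with a regular expression. The class name ParNewLogLine and its fields are hypothetical helpers written for this example, not part of any JDK tool, and the pattern only covers the exact line shape shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: parses one ParNew line in the
// -XX:+PrintGCDetails layout quoted in the slide above.
class ParNewLogLine {
    // Groups: 1 = timestamp, 2-4 = young before/after/capacity (KB),
    // 5-7 = whole heap before/after/capacity (KB), 8 = total pause (secs).
    private static final Pattern LINE = Pattern.compile(
        "\\[GC (\\d+\\.\\d+): \\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\] "
        + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final double timestampSecs;
    final long youngBeforeKb;
    final long youngAfterKb;
    final long heapBeforeKb;
    final long heapAfterKb;
    final double pauseSecs;

    private ParNewLogLine(Matcher m) {
        timestampSecs = Double.parseDouble(m.group(1));
        youngBeforeKb = Long.parseLong(m.group(2));
        youngAfterKb = Long.parseLong(m.group(3));
        heapBeforeKb = Long.parseLong(m.group(5));
        heapAfterKb = Long.parseLong(m.group(6));
        pauseSecs = Double.parseDouble(m.group(8));
    }

    static ParNewLogLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a ParNew line: " + line);
        }
        return new ParNewLogLine(m);
    }

    // KB that left the young generation but are still in the heap,
    // which means they were promoted to the old generation.
    long promotedKb() {
        return (youngBeforeKb - youngAfterKb) - (heapBeforeKb - heapAfterKb);
    }

    public static void main(String[] args) {
        ParNewLogLine e = parse("[GC 39.910: [ParNew: 261760K->0K(261952K), 0.2314667 secs] "
            + "262017K->26386K(1048384K), 0.2318679 secs]");
        System.out.println("young freed KB = " + (e.youngBeforeKb - e.youngAfterKb));
        System.out.println("promoted KB    = " + e.promotedKb());
        System.out.println("pause secs     = " + e.pauseSecs);
    }
}
```

For the sample line, the young generation drops by 261760K while the whole heap drops by only 235631K, so the difference (26129K) is what was promoted to the old generation.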


2008, Cognizant Technology Solutions. Confidential

HotSpot GC Log Samples - 1
Young Generation Too Small
Overall Heap Size: 32 MB
Young Heap Size: 4 MB

Increasing young generation size
Overall Heap Size: 32 MB
Young Heap Size: 8 MB


HotSpot GC Log Samples - 2

Small Tenured Generation size
Overall Heap Size: 32 MB
Young Heap Size: 8 MB
Major collection pause: 0.13 secs
Major collections occur every: 10 secs

Large Tenured Generation size
Overall Heap Size: 64 MB
Young Heap Size: 8 MB
Major collection pause: 0.21 secs
Major collections occur every: 30 secs
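
One way to compare the two sets of major collection figures above is to express each as the share of run time spent paused. The sketch below does that arithmetic; the class and method names are hypothetical, and it assumes exactly one pause of the quoted length in each quoted interval.

```java
// Hypothetical helper for illustration: turns "pause length" and
// "time between major collections" into a fraction of run time.
class GcOverhead {
    // Assumes one pause of pauseSecs in every intervalSecs of wall-clock time.
    static double pauseFraction(double pauseSecs, double intervalSecs) {
        if (intervalSecs <= 0) {
            throw new IllegalArgumentException("intervalSecs must be positive");
        }
        return pauseSecs / intervalSecs;
    }

    public static void main(String[] args) {
        // Figures taken from the two log samples quoted above.
        System.out.println("0.13 s every 10 s -> " + 100 * pauseFraction(0.13, 10) + " %");
        System.out.println("0.21 s every 30 s -> " + 100 * pauseFraction(0.21, 30) + " %");
    }
}
```

The first sample works out to about 1.3% of run time in major pauses and the second to about 0.7%, so the longer individual pause still costs less overall because it happens a third as often.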

Java Internals - IBM JDK 1.5 - OutOfMemory

OutOfMemoryError
An OutOfMemoryError exception results from running out of space on the Java heap or the native heap.
An OutOfMemoryError does not necessarily indicate a memory leak; it may only mean that the steady state of memory use that is required is higher than that available.
The first step is to determine which heap is being exhausted and increase the size of that heap.
If the problem is occurring because of a real memory leak, increasing the heap size does not solve the problem, but does delay the onset of the OutOfMemoryError exception or error conditions. That delay can be helpful on production systems.
The maximum size of an object that can be allocated is limited only by available memory. The maximum number of array elements supported is 2^31 - 1. In reality, such huge arrays may run into issues due to unavailability of memory. These limits apply to both 32-bit and 64-bit JVMs.

Java Heap Exhaustion
The Java heap becomes exhausted when garbage collection cannot free enough objects to make a new object allocation.
Java heap exhaustion can be identified from the -verbose:gc output by garbage collection occurring more and more frequently, with less memory being freed.
If the Java heap is being exhausted, and increasing the Java heap size does not solve the problem, the next stage is to examine the objects that are on the heap, and look for suspect data structures that are referencing large numbers of Java objects that should have been released.
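
The heap exhaustion pattern described above (collections arriving closer together while each one frees less) can be checked mechanically once the timestamps and freed byte counts have been extracted from the -verbose:gc output. The sketch below is a deliberately strict heuristic written for illustration; the class name, method name, and the rule that every step must shrink are assumptions, not something the JVM or any IBM tool defines.

```java
// Hypothetical heuristic for illustration: flags a run of GC events where
// each collection comes sooner than the last AND frees less than the last.
class HeapExhaustionCheck {
    static boolean looksExhausted(long[] gcTimesMs, long[] freedBytes) {
        if (gcTimesMs.length != freedBytes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        if (gcTimesMs.length < 3) {
            return false; // need at least two intervals to see a trend
        }
        for (int i = 2; i < gcTimesMs.length; i++) {
            long previousInterval = gcTimesMs[i - 1] - gcTimesMs[i - 2];
            long interval = gcTimesMs[i] - gcTimesMs[i - 1];
            if (interval >= previousInterval) {
                return false; // collections are not getting more frequent
            }
        }
        for (int i = 1; i < freedBytes.length; i++) {
            if (freedBytes[i] >= freedBytes[i - 1]) {
                return false; // collections are not freeing less memory
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample data: intervals shrink and freed bytes shrink.
        long[] times = {0, 1000, 1800, 2300, 2500};
        long[] freed = {500_000, 400_000, 300_000, 150_000, 40_000};
        System.out.println("exhaustion suspected: " + looksExhausted(times, freed));
    }
}
```

Real logs are noisier than this, so a practical check would probably tolerate occasional reversals; the strict version is kept here only because it is easy to reason about.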

OutOfMemoryError
Native Heap Exhaustion
Native memory OutOfMemoryError exceptions might occur when loading classes, starting threads, or using monitors.
Native heap exhaustion can be monitored using the svmon snapshot output on AIX and the -Xdump:heap option.
The java.lang.OutOfMemoryError: Failed to create a thread message occurs when the system does not have enough resources to create a new thread.
There are two possible causes:
There are too many threads running and the system has run out of internal resources to create new threads.
The system has run out of native memory to use for the new thread. Threads require native memory for internal JVM structures, a Java stack, and a native stack.
To correct the problem, either:
Increase the amount of native memory available by lowering the size of the Java heap using the -Xmx option.
Lower the number of threads in your application.

OutOfMemoryError
Information required for diagnosing the OutOfMemory condition:
The error itself, with any message or stack trace that accompanied it.
-verbose:gc output. (Even if the problem is determined to be native heap exhaustion, it can be useful to see the verbose gc output.)
As appropriate:
The Heapdump output
The javacore.txt file, which contains the threads and their stack traces along with other information such as locks, monitors, deadlocks, storage memory, shared classes, classloaders and classes

Generating Dumps
Generating Heap Dumps
To generate a Java core dump, system core dump, heap dump and snap dump at a user signal, the dump agents must be configured through JVM options as follows:
-Xdump:java+heap+system+snap:events=user
When the JVM process command window is available, generate dumps as follows:
Windows - Press CTRL+Break in the command window to generate the dumps
Linux or AIX - Press CTRL+\ in the shell window
When the JVM process command window is not available, generate dumps as follows:
Windows - Use the SendSignal utility
Linux or AIX - Use the kill -3 command
Heap dump format
-Xdump:heap:opts=PHD (default)
-Xdump:heap:opts=CLASSIC
-Xdump:heap:opts=PHD+CLASSIC
JAVA_DUMP_OPTS =
condition can be: ANYSIGNAL / DUMP / ERROR / INTERRUPT / EXCEPTION / OUTOFMEMORY
dumptype can be: ALL / NONE / JAVADUMP / SYSDUMP / HEAPDUMP / CEEDUMP (z/OS specific)

The following environment variables can also be used.
The value set for JAVA_DUMP_OPTS takes the highest precedence.

Java Internals - Sun JDK 1.5 - OutOfMemory

OutOfMemoryError
An OutOfMemoryError does not necessarily imply a memory leak.
Can be thrown if the heap is not sized properly.
Can be thrown if the external handles are not closed properly (Ex. DB connections, file handles, references to EJB objects in a remote JVM etc.)

Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
Thrown if the permanent space is not sized properly.

Memory Leak
An object can hold a reference that prevents a class from being collected even though it's no longer in active use.
Usually, it is due to a class stored in the shared directory (the lib directory that stores common libraries used by all applications running on the server).
For tracing class loading, use the -XX:+TraceClassLoading and -XX:+TraceClassUnloading parameters, and find the classes that were loaded but not unloaded.
The -XX:+TraceClassResolution parameter will help us track the class resolution.

OutOfMemoryError When There's Still Memory Available
Thrown when the JVM is unable to find contiguous space for allocating an object.
Identify which of the generations is relatively empty (Ex. Applications dealing with large objects will have a relatively empty young generation).
An initial fix could be to use the -XX:NewRatio parameter so as to provide more space to the old generation.
A permanent fix will be to reduce the size of the objects created by the application.
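
Alongside the class loading trace flags mentioned above, the loaded and unloaded class counts can also be read from inside the process through the standard java.lang.management API (available since Java 5). The sketch below is a minimal illustration; the class name ClassLoadStats and the output format are made up for this example.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: reports class loading counters for the current JVM.
class ClassLoadStats {
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return "loaded now=" + bean.getLoadedClassCount()
            + ", loaded total=" + bean.getTotalLoadedClassCount()
            + ", unloaded total=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Logging this snapshot periodically gives a coarse signal: a "loaded now" count that keeps climbing while "unloaded total" stays flat suggests classes are being retained, which is the situation the trace flags then help to pin down.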

OutOfMemoryError
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
Thrown if there is not enough memory to create a thread.
Each thread takes about half a megabyte for its stack (Ex. For a 2 GB process [2 GB includes user and system heap], a maximum of roughly 4,000 threads can be created, assuming all the memory is used for threads).
The operating system has limitations on the number of threads that can be created by a process.

Heap could be fragmented
The heap could have been used by application objects and hence there might not be enough contiguous space for creating a thread. In such cases, increase the heap size.

To find the root cause of out-of-memory errors:
Looking at the stack trace will help in identifying whether the exception is because of a large object.
Looking at the GC logs will help in identifying the heap growth pattern. If there had been continuous growth in the heap till the exception and the GC is unable to collect enough memory, then there could be a memory leak. Otherwise, it might be a memory requirement.
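
The thread ceiling quoted above is simple division of the process address space by the per-thread stack size. The sketch below works it through, assuming a stack of exactly 512 KB and that all 2 GB goes to thread stacks; the class and method names are hypothetical.

```java
// Illustrative arithmetic only: upper bound on threads if the whole
// address space were spent on thread stacks.
class ThreadBudget {
    static long maxThreads(long addressSpaceBytes, long stackBytesPerThread) {
        if (stackBytesPerThread <= 0) {
            throw new IllegalArgumentException("stack size must be positive");
        }
        return addressSpaceBytes / stackBytesPerThread;
    }

    public static void main(String[] args) {
        long twoGb = 2L * 1024 * 1024 * 1024; // 2 GB process
        long halfMb = 512L * 1024;            // assumed stack size per thread
        System.out.println(maxThreads(twoGb, halfMb)); // prints 4096
    }
}
```

In practice the usable number is lower, because the Java heap, loaded code, and native libraries share the same address space.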

Heap Dump
For Memory Leak analysis:
Identify the frequency of the memory leak from the start of the server (Ex. 8 hours after the server restart).
Use the -XX:+HeapDumpOnOutOfMemoryError parameter to get the heap dump when OutOfMemory happens.
Use the Heap Analysis Tool (HAT), the jconsole management tool, and the jmap tool with the histo option to understand the objects in the heap.
Restart the application and generate periodic heap dumps using /bin/jmap -dump:live,file=heap.dump.out,format=b till OutOfMemory occurs.
Analyze the heap dumps to identify the potential objects that could have caused the memory leak.
Thread dumps can be generated using the kill -3 command for Unix and the SendSignal utility for Windows (for applications running in a command window, Ctrl+\ for Unix and Ctrl+Break for Windows).
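
When sending a signal to the process is inconvenient, a rough equivalent of a thread dump can be produced from inside the application with Thread.getAllStackTraces(), which has been part of the standard library since Java 5. The sketch below is a minimal illustration; the class name and output layout are made up here, and the result carries less detail (for example, no lock information) than a signal-triggered dump.

```java
import java.util.Map;

// Illustrative sketch: builds a plain-text listing of every live thread
// and its current stack, using only the standard Thread API.
class InProcessThreadDump {
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```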

Another option for generating a heap dump on Unix is to generate a core dump using the gcore [-pgF] [-o filename] [-c content] command. jmap can then be used to extract the heap dump from the core file.

Java Internals - IBM JDK 1.6 - Tools

Tools

Health Center - http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
Using Health Center will enable us to:
Identify if native or heap memory is leaking
Discover which methods are taking most time to run
Pin down I/O bottlenecks
Visualize and tune garbage collection
View any lock contentions
Analyse unusual WebSphere Real Time events

Memory Analyzer - http://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/
Using Memory Analyzer will enable us to:
Diagnose and resolve memory leaks involving the Java heap
Derive architectural understanding of your Java application through footprint analysis
Improve application performance by tuning memory footprint and optimizing Java collections and Java cache usage
Produce analysis plug-ins with capabilities specific to your application

Tools

Garbage Collection and Memory Visualizer - http://www.ibm.com/developerworks/java/jdk/tools/gcmv/
Using Garbage Collection and Memory Visualizer will enable us to:
Monitor and fine tune Java heap size and garbage collection performance
Check for memory leaks
Size the Java heap correctly
Select the best garbage collection policy

Dump Analyzer - http://www.ibm.com/developerworks/java/jdk/tools/dumpanalyzer/
Dump Analyzer will help us quickly diagnose typical problems such as:
Out of memory
Deadlocks
Java Virtual Machine (JVM) or Java Native Interface (JNI) crashes

IBM Thread and Monitor Dump Analyzer for Java - http://www.alphaworks.ibm.com/tech/jca
Analyzes each thread and provides diagnostic information, such as current thread information, the signal that caused the javacore, Java heap information (maximum Java heap size, initial Java heap size, garbage collector counter, allocation failure counter, free Java heap size, and allocated Java heap size), number of runnable threads, total number of threads, number of monitors locked, and deadlock information.

Tools

Diagnostics Collector - http://www.ibm.com/developerworks/java/jdk/tools/diagnosticscollector/
Use the Diagnostics Collector to:
Automatically capture diagnostic information associated with a problem event
Avoid having to use the jextract tool to obtain platform specific information associated with the system dump
Reduce manual work to collect dumps
Save searching for Java dumps
Allow easier management of dump files
Capture problem context information as well as dump files
Avoid ulimit problems and overlooked settings that disable dumps

Java Internals - Sun JDK 1.5 - Tools

Tools
If the application crashes because of an application or JRE bug, these are the options and tools that can be used to obtain additional information (either at the time of the crash or later using information from the crash dump).

Tools
Tools that can help in scenarios involving a hung or deadlocked process

Tools
Tools that can help in monitoring

Tools
Other Tools & Options


Memory Leak Graph - Sample
The GC graph below depicts the pattern for a memory leak (Tool used: GCViewer)

Java Internals - Native OS Tools

Native Tools - Linux

Native Tools - Windows

Native Tools - Solaris

Java Internals - Appendix

Appendix

SUN JDK 1.5
For the complete set of JVM parameters, refer to http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
Reference: http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html

IBM JDK 1.6
Reference: http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/diag/preface/jvm_meaning.html

The LK component handles locking in the JVM. A lock prevents more than one entity from accessing a shared resource. Each object in Java has an associated lock (gained by using a synchronized block or method). In the case of the JVM, threads compete for various resources in the JVM and locks on Java objects.

A monitor is a special kind of locking mechanism that is used in the JVM to allow flexible synchronization between threads. For the purpose of this section, read the terms monitor and lock interchangeably.

To avoid having a monitor on every object, the JVM usually uses a flag in a class or method block to indicate that the item is locked. Most of the time, a piece of code will transit some locked section without contention. Therefore, the guardian flag is enough to protect this piece of code. This is called a flat monitor. However, if another thread wants to access some code that is locked, a genuine contention has occurred. The JVM must now create (or inflate) the monitor object to hold the second thread and arrange for a signaling mechanism to coordinate access to the code section. This monitor is now called an inflated monitor.
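
The difference between an uncontended and a contended lock can be seen from the application side with an ordinary synchronized block. The sketch below is only an illustration of the two situations described above: with one thread the lock is never contended, and with several threads it is. Whether and when the JVM actually inflates the monitor is an internal decision that this code cannot observe; the class and method names are hypothetical.

```java
// Illustrative sketch: the same synchronized counter used with and
// without contention. Monitor inflation itself is not visible here.
class ContendedCounter {
    private final Object lock = new Object();
    private int count;

    void increment() {
        synchronized (lock) { // each object has an associated lock
            count++;
        }
    }

    int get() {
        synchronized (lock) {
            return count;
        }
    }

    // Runs the given number of threads, each incrementing the shared counter.
    static int runWithThreads(int threads, int incrementsPerThread) {
        ContendedCounter counter = new ContendedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("1 thread : " + runWithThreads(1, 100_000)); // no contention
        System.out.println("4 threads: " + runWithThreads(4, 100_000)); // lock is contended
    }
}
```

Both runs produce the exact expected totals because the lock serializes the increments; only the second run gives the JVM a reason to coordinate waiting threads.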
