Case 3 inspecting cause of failure - a car maker plm


onTune case study: how customers use onTune in complex environments, and how they benefit from it.


Page 1: Case 3   inspecting cause of failure - a car maker plm

© Copyright TeemStone Corporation 2014

Easiest way to manage your critical systems

Case 3 – Inspecting Cause of Failure

A Car Maker PLM

Page 2: Case 3   inspecting cause of failure - a car maker plm


Inspecting periodic PLM service HANG

Issue

PLM services hung for several minutes at 13:26 every day.

The hang occurred only on PLM DB#2. PLM DB#1, its mutual backup server in the active-active pair, was unaffected.

Cause

Three Oracle Java processes requested a large memory allocation, about 3 GB, at 13:26.

The resulting heavy paging out drove CPU I/O wait very high, and the server hung temporarily.

Inspecting

Inspect CPU I/O wait, paging out, and memory usage, which may reveal the cause of the problem.

Then inspect memory usage per process in detail, second by second.

Memory usage of the processes that may have caused the problem

Name  User    PID    PPID   Argument                                                         Max   Max Time
java  oracle  13718  13712  /ora_crs/app/grid/jdk/jre/bin/IA64W/java -DORACLE_HOME=/ora_crs  1027  2013-08-27 13:28
java  oracle  13704  13396  /ora_crs/app/grid/jdk/bin/IA64W/java -classpath /ora_crs/app/gr  1028  2013-08-27 13:28
java  oracle  13678  13676  /ora_engine/app/oracle/product/11.2/jdk/jre/bin/IA64N/java -cp   1054  2013-08-27 13:27
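The Max and Max Time columns can be derived from timestamped samples of each process's memory usage. The following is a minimal sketch in Python, not onTune's implementation. The sample tuples, the function name `peak_per_process`, the non-peak values, and the assumption that the Max column is in megabytes are all hypothetical. Under that unit assumption, the three peaks from the table add up to roughly 3 GB.

```python
from datetime import datetime

def peak_per_process(samples):
    """Return {pid: (max_value, time_of_max)} from (time, pid, value) samples."""
    peaks = {}
    for ts, pid, value in samples:
        # Keep the first time at which each process reached its highest value.
        if pid not in peaks or value > peaks[pid][0]:
            peaks[pid] = (value, ts)
    return peaks

# Hypothetical samples; only the peak values and peak times come from the table.
samples = [
    (datetime(2013, 8, 27, 13, 25), 13718, 210),
    (datetime(2013, 8, 27, 13, 28), 13718, 1027),
    (datetime(2013, 8, 27, 13, 25), 13704, 198),
    (datetime(2013, 8, 27, 13, 28), 13704, 1028),
    (datetime(2013, 8, 27, 13, 25), 13678, 305),
    (datetime(2013, 8, 27, 13, 27), 13678, 1054),
]

peaks = peak_per_process(samples)
total = sum(value for value, _ in peaks.values())  # 1027 + 1028 + 1054
total_gb = total / 1024  # assumes the Max column is in MB
```

Keying the result by PID rather than by process name matters here, because all three suspect processes are named java.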

Page 3: Case 3   inspecting cause of failure - a car maker plm


Inspecting periodic PLM service HANG

[Graphs: CPU I/O Wait, Paging Out, Memory Usage]

Aug. 27, 13:15 ~ 13:42, PLM DB#2 (abnormal state)

At 13:26, the CPU I/O wait, paging-out, and memory usage graphs all show abnormal peaks.
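One simple way to locate such a peak in a sampled metric is to compare each sample with the median of the series. The sketch below is only an illustration: the per-minute I/O wait percentages, the threshold factor, and the function name `find_peaks` are hypothetical and do not come from the slides or from onTune.

```python
from statistics import median

def find_peaks(series, factor=3.0):
    """Return the timestamps whose value exceeds factor times the series median."""
    baseline = median(value for _, value in series)
    return [ts for ts, value in series if value > factor * baseline]

# Hypothetical CPU I/O wait samples (percent) around the incident window.
io_wait = [
    ("13:24", 2.0),
    ("13:25", 3.0),
    ("13:26", 61.0),
    ("13:27", 48.0),
    ("13:28", 4.0),
    ("13:29", 2.5),
]

abnormal = find_peaks(io_wait)
```

The median is used as the baseline because, unlike the mean, it is barely affected by the spike itself.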

Page 4: Case 3   inspecting cause of failure - a car maker plm


Inspecting periodic PLM service HANG

[Graphs: CPU I/O Wait, Paging Out, Memory Usage]

Aug. 27, 13:15 ~ 13:42, PLM DB#1 (healthy state)

The mutual backup server, PLM DB#1, is healthy over the same period.

Page 5: Case 3   inspecting cause of failure - a car maker plm


Inspecting periodic PLM service HANG

[Graph: Memory Usage]

Aug. 28, 12:52 ~ 13:32, memory usage graph

During this period, the EM monitoring agent that had caused the problem was stopped.

After the EM agent on PLM DB#2 was stopped, memory usage returned to normal.