EPM Infrastructure: An Investigation

EPM Infrastructure: An Investigation. Prepared by: Alan Ramirez ([email protected]), Infrastructure Engineer, Edgewater Ranzal. Session ID#: 10125. @alanr723. You don’t know what you don’t know. When EPM is slow, where do you go? Infrastructure Insight and Workflow. REMINDER: Check in on the COLLABORATE mobile app.

Transcript of EPM Infrastructure: An Investigation

Page 1: EPM Infrastructure: An Investigation

REMINDER: Check in on the COLLABORATE mobile app

EPM Infrastructure: An Investigation

Prepared by: Alan Ramirez ([email protected]), Infrastructure Engineer, Edgewater Ranzal

You don’t know what you don’t know. When EPM is slow, where do you go? Infrastructure Insight and Workflow.

Session ID#: 10125

@alanr723

Page 2: EPM Infrastructure: An Investigation

Edgewater Ranzal

Business Analytics Solutions Provider Using Oracle EPM and BI Technologies

1,700+ Oracle EPM & BI projects successfully delivered since our founding in 1996

100% Focus on Oracle EPM/OBIEE

Product Experts across the full EPM/BI Suite: Planning, HFM, HSF, HPCM, Essbase, OBIEE, DRM, FDMEE

Oracle ACEs Across EPM/BI Platform (Planning/BI, HFM & FDMEE)

Exalytics – Installation, Configuration & Benchmarking Services

Infrastructure Services – Design, Configuration, Performance Tuning, Upgrades & Patching

Support Services – Remote Help Desk, Level 1 Support, Patch Release Support

Page 3: EPM Infrastructure: An Investigation

Presenter Information

Alan Ramirez, Infrastructure Engineer

■ Employed with Ranzal for 3 years
■ Over 11 years of Oracle EPM/Hyperion experience
  ▪ Started on Essbase 6.5, Planning 3.5.1, HFM 4.0, Reports 7.x
  ▪ Experience: Software development, DBA, Infrastructure, QA/CM
■ Adept on all platforms, particular fondness for Linux
  ▪ Red Hat Certified Engineer (RHEL 5)
  ▪ Exalytics Certified Specialist; experienced with X2-4, X3-4, X4-4
■ Core tenet: Systems approach as a science, not a black box
  ▪ No recurring reboots - strive for stability from understanding
  ▪ Uptime is revered, restarts are for evaluation, not resolution
  ▪ Deliver quality through stability
  ▪ Customer service and documentation

Page 4: EPM Infrastructure: An Investigation

Agenda

■ Overview
■ Getting Started
■ Troubleshooting Workflow
■ System Startup
■ Patching
■ Stability
■ Comparing Environments
■ Virtualization
■ Real Life Examples
■ Questions

Page 5: EPM Infrastructure: An Investigation

Overview

■ Glimpse into the workings of an Infrastructure Engineer that specializes in EPM

■ Goals:
  ▪ Exposure to an approach
  ▪ Awareness of various faculties of the product
  ▪ Demonstrate a high-level troubleshooting workflow
  ▪ Examples of a simple Infrastructure review

Page 6: EPM Infrastructure: An Investigation

Where to start?

How to get your bearings.

Put me on any one of your EPM servers, and I’ll figure out the rest.

Page 7: EPM Infrastructure: An Investigation

Deployment Report

■ All servers are connected to a common set of database tables collectively referred to as the EPM Registry

■ Survey the entire environment from any EPM server
  ▪ All hostnames and configured products – architecture diagram
  ▪ RDBMS flavor, hostname, and connection strings
  ▪ WebLogic configuration
  ▪ History of interaction with EPM Registry
    — Clean and simple vs repetition and manual registry changes
    — Were web apps (JVMs) redeployed recently?
    — Were any other changes made recently to the config?

Page 8: EPM Infrastructure: An Investigation

Servers

Hostnames

OS

Specs (can be inaccurate)

Product distribution

- Web servers

- JVMs

- Services

- App servers

Page 9: EPM Infrastructure: An Investigation

Database: Platform, names, schemas, port number, etc.

Page 10: EPM Infrastructure: An Investigation

Directories, User Providers, System Accounts

Page 11: EPM Infrastructure: An Investigation

EPM Deployment History Report (11.1.2.3)

Page 12: EPM Infrastructure: An Investigation

Troubleshooting Workflow

With the lay of the land, we can start digging in.

Page 13: EPM Infrastructure: An Investigation

Logs

■ Diagnostics
■ Start with Web Tier
  ▪ ORACLE_EPM_INSTANCE/diagnostics/logs/services
  ▪ ORACLE_EPM_INSTANCE/diagnostics/logs/starter
  ▪ MW_HOME/user_projects/domains/EPMSystem/servers/server/logs
■ Services Tier - R&A Services, EPMA (Dimension) Server
  ▪ ORACLE_EPM_INSTANCE/diagnostics/logs/product
■ Application Server logs
  ▪ Essbase.log, HsvEventLog.log, Interop.log
■ Event Viewer

Page 14: EPM Infrastructure: An Investigation

Logs Step 1 - services directory: EPM_INSTANCE_HOME\diagnostics\logs\services

■ Directly relates to NT services

■ Typically the start of my workflow

■ Each service has a sysout and a syserr log

■ Sysout most useful

■ Syserr rarely has timestamps

Page 15: EPM Infrastructure: An Investigation

Understanding WebLogic State

When WebLogic completes its startup process, it writes out:
<Notice> <WebLogicServer> <BEA-000360> <Server started in RUNNING mode>

Believe it or not, you do get used to reading these. You become familiar with what good logs look like and can quickly evaluate whether things are healthy. Most often I tail the last 50-100 lines of each log, but it is not uncommon to browse entire logs quickly, looking for patterns.
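The tail-and-check routine can be sketched in a few lines of Python. This is only an illustration, not a tool from the session: the function names are made up here, and it assumes the logs are plain-text files ending in `.log` inside a single directory. The only string taken from the slide is the BEA-000360 message ID.

```python
from collections import deque
from pathlib import Path

# Message ID of the "Server started in RUNNING mode" notice quoted above.
RUNNING_MARKER = "<BEA-000360>"


def tail(path, n=100):
    """Return the last n lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=n))


def reached_running(lines):
    """True if any of the given lines carries the RUNNING notice."""
    return any(RUNNING_MARKER in line for line in lines)


def survey(log_dir, n=100):
    """Map each *.log file in log_dir to whether its tail shows the RUNNING notice."""
    return {
        path.name: reached_running(tail(path, n))
        for path in sorted(Path(log_dir).glob("*.log"))
    }
```

A server that has been up for a long time may have logged the notice well before the last 100 lines, so a `False` here means "look closer", not "the server is down".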

Page 16: EPM Infrastructure: An Investigation

Logs Step 2 - domain logs: MW_HOME\user_projects\domains\EPMSystem\servers\FoundationServices0\logs

■ Under the WebLogic domain is a directory for each Managed Server

■ Each Server directory contains a logs dir

■ More detailed than services logs

■ Logs for various sub threads

Page 17: EPM Infrastructure: An Investigation

Logs Step 3 – main logs directory: EPM_INSTANCE_HOME\diagnostics\logs

■ The services tier logs here
  ▪ Reporting & Analysis Agent
  ▪ EPMA
  ▪ HSF

Page 18: EPM Infrastructure: An Investigation

Logs Step 4 – App Server Logs

■ Event Viewer
  ▪ Application Log
  ▪ System Log
■ HFM
  ▪ FM Error Log Viewer
  ▪ HsvEventLog.log
■ Essbase
  ▪ Essbase.log
  ▪ Appname.log

Page 19: EPM Infrastructure: An Investigation

System Startup

Startup times and Starter logs

Page 20: EPM Infrastructure: An Investigation

Start EPM System

o Many wrote their own scripts in 11.1.2.1 and earlier (net start, sc, psexec)
o 1h 45m for a triple-redundancy customer with 62 services prompted me to study and refine. Reduced to 25m with what became a standard for our team
o Much improved starting with 11.1.2.2
o Additional tweaks get 11.1.2.3 up in <2 mins

■ 11.1.2.1
  ▪ Sequential due to dependencies
  ▪ Startup type: Manual
  ▪ Single-threaded startup
  ▪ Typical 20-30 mins
■ 11.1.2.2
  ▪ Parallel startup
  ▪ No dependencies
  ▪ Startup type: Automatic or Manual is fine
  ▪ Typical 8-15 mins
■ 11.1.2.3 & 11.1.2.4
  ▪ Same as 11.1.2.2, but faster
  ▪ Typical 2-7 mins

Page 21: EPM Infrastructure: An Investigation

Starter logs: EPM_INSTANCE_HOME\diagnostics\logs\starter

■ Only created when using built-in scripts
■ Quick confirmation that all services started successfully
■ Analyze Pass column to be sure all are good
■ Review of history can evidence health or even frustration

Page 22: EPM Infrastructure: An Investigation

Patching

Opatch: EPMSystem11R1; oracle_common; odi

Page 23: EPM Infrastructure: An Investigation

Patching – What version are you on?

“We’re on the ‘502’ version of EPM.”

o Each product has its own code line and version number
o 500 patch was a giant patch covering IE 10 support
  - HUB 500 was all products except Essbase suite
  - Separate patches for Essbase 500, EAS 500, APS 500, etc.
o Back to individual version numbers per product

■ Nov 2013:
  ▪ HSS 11.1.2.3.001
  ▪ HFM 11.1.2.3.100
  ▪ Essbase 11.1.2.3.003
■ Mar 2014:
  ▪ HUB 11.1.2.3.500
  ▪ Essbase 11.1.2.3.500
■ Dec 2014:
  ▪ HSS 11.1.2.3.502
  ▪ HFM 11.1.2.3.502
  ▪ Essbase 11.1.2.3.505
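Because each product carries its own version number, "what version are you on?" is really a per-product comparison. Here is a minimal sketch of that comparison, using the Dec 2014 numbers from the slide as the baseline. The function names and the `installed` inventory are hypothetical, made up for illustration.

```python
def parse_version(text):
    """Turn '11.1.2.3.502' into (11, 1, 2, 3, 502) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Baseline taken from the Dec 2014 column of the slide.
DEC_2014 = {"HSS": "11.1.2.3.502", "HFM": "11.1.2.3.502", "Essbase": "11.1.2.3.505"}


def products_behind(installed, baseline=DEC_2014):
    """Return the products whose installed version is older than the baseline."""
    return sorted(
        name
        for name, target in baseline.items()
        if parse_version(installed.get(name, "0")) < parse_version(target)
    )


# Hypothetical inventory: HFM was left at its Nov 2013 level.
installed = {"HSS": "11.1.2.3.502", "HFM": "11.1.2.3.100", "Essbase": "11.1.2.3.505"}
print(products_behind(installed))
```

Comparing integer tuples rather than strings avoids ranking "11.1.2.3.100" above "11.1.2.3.0502"-style values by accident.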

Page 24: EPM Infrastructure: An Investigation
Page 25: EPM Infrastructure: An Investigation
Page 26: EPM Infrastructure: An Investigation

Patching - Opatch

■ A Java-based utility from Oracle that assists with the exercise of applying and rolling back patches to Oracle software

■ Multiple Oracle homes, which Opatch directory?
  ▪ EPMSystem11R1 – Oracle EPM System products
  ▪ oracle_common – ADF/Jdeveloper components
  ▪ odi – Oracle Data Integrator (FDMEE) component
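Since each Oracle home keeps its own patch inventory, a review has to query all three. The sketch below only builds the command lines and does not run them. It assumes each home sits directly under the Middleware home with its own `OPatch` directory, and that the launcher is called `opatch` (on Windows it would be `opatch.bat`). Some installations also need a JRE or JDK location passed to OPatch; check the patch readme for that.

```python
from pathlib import Path

# The three Oracle homes named on the slide.
ORACLE_HOMES = ("EPMSystem11R1", "oracle_common", "odi")


def lsinventory_commands(middleware_home):
    """Build one 'opatch lsinventory -oh <home>' command per Oracle home."""
    commands = []
    for name in ORACLE_HOMES:
        oracle_home = Path(middleware_home) / name
        launcher = oracle_home / "OPatch" / "opatch"  # assumed layout
        commands.append([str(launcher), "lsinventory", "-oh", str(oracle_home)])
    return commands


# Hypothetical Middleware home, for illustration only.
for command in lsinventory_commands("/u01/Oracle/Middleware"):
    print(" ".join(command))
```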

Page 27: EPM Infrastructure: An Investigation

Patching – PSEs vs PSUs

■ PSE: Patch Set Exception is a singular, one-off patch that typically addresses a specific issue
■ PSU: Patch Set Update is a collection, or grouping, of PSEs that have been regression tested together
■ Do not apply all available PSEs, but instead maintain latest PSUs
■ PSUs are released on an approximately quarterly release schedule

Page 28: EPM Infrastructure: An Investigation

Available Patch Sets and Patch Set Updates for EPM Products (Doc ID 1400559.1)

OBIEE 11g: Bundle Patches (Doc ID 1488475.1)

Page 29: EPM Infrastructure: An Investigation

Stability

Stability, Performance, Expectations

Page 30: EPM Infrastructure: An Investigation

Stability

■ How often do you restart services?
■ How about rebooting servers?
■ History
  ▪ Consistency of process, logs over time, routines…
  ▪ Evaluate Starter logs
■ Some services are susceptible to abuse
  ▪ Financial Reporting
  ▪ Planning – web forms, SmartView
  ▪ EAS
■ Essbase – often don’t realize there are issues
  ▪ xcp files
  ▪ Graceful shutdowns – check both Essbase and app logs
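Essbase leaves `.xcp` exception files behind when something terminates abnormally, so counting and dating them is a quick way to see how long issues have gone unnoticed. This sketch walks a directory tree and lists them newest first. The function name is made up here, and the root directory to search is left to the caller because it varies by install.

```python
from datetime import datetime
from pathlib import Path


def find_xcp_files(root):
    """Return (modified_time, path) for every *.xcp file under root, newest first."""
    found = [
        (datetime.fromtimestamp(path.stat().st_mtime), path)
        for path in Path(root).rglob("*.xcp")
    ]
    return sorted(found, key=lambda item: item[0], reverse=True)
```

Lining these timestamps up against the restart history in the Starter logs shows whether restarts were hiding crashes.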

Page 31: EPM Infrastructure: An Investigation

Stability - Planning

■ Heap dumps enabled on OutOfMemory condition can show exactly what was going on when the JVM ran out of memory
  ▪ Large/bad webforms
  ▪ SmartView retrieves
    — Large hit to JVM if suppression options are disabled
    — Query below would have tried to produce > 28 million cells
■ Essbase Governor
  ▪ QRYGOVEXECTIME
  ▪ QRYGOVEXECBLK
■ Planning Governor
  ▪ ERROR_THRESHOLD_NUM_OF_CELLS=175,000
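With suppression off, the size of a retrieve is simply the product of the member counts on the rows and columns, which is how an innocent-looking query reaches tens of millions of cells. Here is a rough sketch of that arithmetic. The member counts are hypothetical and are not the query from the session; only the 175,000 threshold comes from the slide.

```python
from math import prod

# Planning governor value shown on the slide.
ERROR_THRESHOLD_NUM_OF_CELLS = 175_000


def estimated_cells(row_member_counts, column_member_counts):
    """Cells in an unsuppressed grid: every row combination times every column combination."""
    return prod(row_member_counts) * prod(column_member_counts)


def exceeds_threshold(cells, threshold=ERROR_THRESHOLD_NUM_OF_CELLS):
    """True if the grid is larger than the governor allows."""
    return cells > threshold


# Hypothetical grid: 2,000 accounts x 1,200 entities on rows, 12 periods on columns.
cells = estimated_cells([2000, 1200], [12])
print(cells, exceeds_threshold(cells))
```

In this made-up case the grid works out to 28,800,000 cells, more than 160 times the threshold.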

Page 32: EPM Infrastructure: An Investigation

Stability - WebLogic

■ STUCK threads?
■ Long running task – any task where execution runs longer than a predefined (default 10min) threshold
  ▪ Not intelligent
  ▪ Tunable, increase to 20 mins?
  ▪ Need an in-depth understanding of the application
■ Causes
  ▪ SmartView retrieves
  ▪ Planning form resultset too large
  ▪ Bad user sessions (Click the ‘x’ instead of proper logouts)
  ▪ User Behavior: IE “Not Responding”, Close browser and retry
  ▪ WebLogic Connection Pool too small
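A quick way to see whether STUCK threads are a pattern or a one-off is to count them per thread in a managed server log. The sketch below assumes the log marks them with the text `[STUCK] ExecuteThread: '<n>'`. That format is an assumption to verify against your own logs, and the sample line in the code is illustrative, not copied from a real system.

```python
import re
from collections import Counter

# Assumed marker format for stuck-thread messages; confirm against a real log.
STUCK = re.compile(r"\[STUCK\] ExecuteThread: '(\d+)'")


def stuck_threads(lines):
    """Count STUCK reports per execute-thread number."""
    counts = Counter()
    for line in lines:
        match = STUCK.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Illustrative line only.
sample = ["<Error> <WebLogicServer> <[STUCK] ExecuteThread: '12' for queue: 'default' has been busy>"]
print(stuck_threads(sample))
```

A few threads reported many times suggests long-running requests, while many different threads reported once each points more toward a shared bottleneck such as an exhausted connection pool.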

Page 33: EPM Infrastructure: An Investigation

Comparing Environments: Are they the same?

Eliminate as many variables as possible.

Page 34: EPM Infrastructure: An Investigation

Grading Environments – Many Criteria

■ Architecture
  ▪ Server Specifications
  ▪ VMware Infrastructure
  ▪ Storage Infrastructure
  ▪ EPM product distribution
■ Opatches
■ Web tier
  ▪ JVM heap settings
  ▪ Connection pools
■ App tier
  ▪ Tuning values
  ▪ Log sizes and rotations
■ RDBMS
  ▪ Statistics/Indexes
■ Performance
  ▪ Resource dedication (virt. only)
  ▪ Power Plan
  ▪ CPU
  ▪ Storage
  ▪ AV On Access Exclusions
  ▪ Windows TCP/IP tuning
■ Networking
  ▪ hosts file, name resolution, TCP/IP settings
  ▪ Topology, hops, subnets
  ▪ FQDN
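Once the criteria are collected per environment, the comparison itself is mechanical: list every setting that differs or exists on one side only. This is a minimal sketch with hypothetical setting names and values; how the values get collected is outside its scope.

```python
def compare_environments(left, right):
    """Return {setting: (left_value, right_value)} for every setting that differs."""
    differences = {}
    for key in sorted(set(left) | set(right)):
        a, b = left.get(key), right.get(key)
        if a != b:
            differences[key] = (a, b)
    return differences


# Hypothetical values, for illustration only.
prod_env = {"jvm_heap_max": "4096m", "power_plan": "Balanced", "vcpus": 8}
test_env = {"jvm_heap_max": "4096m", "power_plan": "High performance", "vcpus": 8, "av_exclusions": True}

for setting, (prod_value, test_value) in compare_environments(prod_env, test_env).items():
    print(f"{setting}: PROD={prod_value} TEST={test_value}")
```

A setting missing on one side shows up as `None`, which is often the more interesting finding.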

Page 35: EPM Infrastructure: An Investigation

Sample Infrastructure Review

Review Summary of 26 major criteria across all Production EPM servers

CUSTOMER: American multinational food and beverage company

Considering correctness, stability, performance, what kind of shape is my EPM environment in?

Page 36: EPM Infrastructure: An Investigation

Virtualization

EPM can be virtualized very successfully when properly understood.

Page 37: EPM Infrastructure: An Investigation

Virtualization of Oracle EPM

■ Primary advantage of a typical virtualization strategy is to reduce capital and operating costs via server consolidation
  ▪ Obtain greater densities with small/medium servers (2-4 vCPUs, 4GB)
  ▪ Common to see 20-25 active machines on a single host
  ▪ Medium sized host: 16 cores, 64GB memory
■ Heavy footprint of EPM does not permit anywhere near the same degree of server consolidation
■ Reserve 100% of resources to achieve a 1:1 ratio physical to virtual
■ Highly sensitive to even low latency
■ Does NOT respond well in environments that are oversubscribed
  ▪ Overcommitment
  ▪ Ballooning
  ▪ Compression
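The consolidation numbers above translate directly into oversubscription ratios. Here is a small worked sketch using the low end of the slide's figures: 20 VMs at 2 vCPUs and 4GB each, on a host with 16 cores and 64GB. The function name is made up here.

```python
def oversubscription(vm_sizes, host_cores, host_memory_gb):
    """Return (vCPU-to-core ratio, VM-memory-to-host-memory ratio) for a host.

    vm_sizes is a list of (vcpus, memory_gb) tuples, one per virtual machine.
    """
    total_vcpus = sum(vcpus for vcpus, _ in vm_sizes)
    total_memory = sum(memory for _, memory in vm_sizes)
    return total_vcpus / host_cores, total_memory / host_memory_gb


# 20 small VMs (2 vCPUs, 4GB) on a 16-core, 64GB host.
cpu_ratio, memory_ratio = oversubscription([(2, 4)] * 20, 16, 64)
print(cpu_ratio, memory_ratio)
```

That typical host runs at 2.5 vCPUs per core and 1.25x its physical memory. The 1:1 target for EPM means keeping both ratios at or below 1.0.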

Page 38: EPM Infrastructure: An Investigation

Real Life Examples

Each environment is unique and presents a new set of challenges.

■ Proactive DBA Killing Pools
■ Profile Limits Essbase
■ Factory BIOS Config
■ Teaming NICs

Page 39: EPM Infrastructure: An Investigation

Story #1 – Proactive DBA

Customer: Medical Center for Private Research University

Issue:
■ 12 hours to load and consolidate May data
■ Repeatedly restarting EPM because they don’t know what else to do
■ No idea how to approach. Network! Storage? Hard drives! Oh my!

Page 40: EPM Infrastructure: An Investigation

Story #1 – Proactive DBA

■ Analyzed 6 days of logs across 11 WebLogic JVMs in 2 environments

■ All WLS connection pools drop simultaneously every 5 hours
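Spotting a fixed interval like this across days of logs comes down to measuring the gaps between matching events. The sketch below takes the timestamps of the pool-drop events, however they were extracted, and reports the most common gap rounded to the minute. The timestamp format and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def dominant_interval(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the most common gap between consecutive events, rounded to the minute."""
    times = sorted(datetime.strptime(stamp, fmt) for stamp in timestamps)
    gaps = [
        timedelta(minutes=round((later - earlier).total_seconds() / 60))
        for earlier, later in zip(times, times[1:])
    ]
    if not gaps:
        return None
    return Counter(gaps).most_common(1)[0][0]


# Hypothetical event times with a few seconds of jitter.
events = ["2015-04-01 00:00:10", "2015-04-01 05:00:02", "2015-04-01 10:00:20"]
print(dominant_interval(events))
```

A clean, repeating gap that matches across every JVM is a strong hint that something external and scheduled is responsible, rather than the application itself.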

Page 41: EPM Infrastructure: An Investigation

Story #1 – Proactive DBA in the way

■ Root Cause:
  ▪ 6-8 months prior, connections did not appear to be properly closed when EPM System was stopped
  ▪ Frequent restarts as connections continue to grow
  ▪ As a result, the DBA implemented a connection cleanup routine to kill idle sessions
  ▪ This routine was prematurely terminating valid database pool connections held by the application servers

Page 42: EPM Infrastructure: An Investigation

Story #2 – Can’t connect to Essbase

Customer: Global Satellite Services Provider

ISSUE:
■ EssbaseCluster-1 could not be expanded in EAS
■ Not all Essbase applications could be started, only some
  ▪ Error 1013000 loading application: Serious Error(1013000)
  ▪ Unable to Create Request Server Thread
■ They had tried restarting services, EAS, Essbase, etc.
■ Cannot start additional Essbase applications

But then,
■ I stopped two apps, and was able to start one of the apps that didn’t previously start – suggestive of resource limits

Page 43: EPM Infrastructure: An Investigation

Story #2 – Can’t connect to Essbase

■ User profile settings too restrictive (Linux security: limits.conf)
■ Essbase server cannot create additional processes
  ▪ Not possible to start additional applications
  ▪ Cannot open additional connections from EAS to Essbase

BEFORE

AFTER
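One way to confirm what limits an account actually receives is to read them from inside a process started as that account. This is a minimal sketch using Python's standard `resource` module (Unix only). It reports the soft and hard limits for process count and open files, which correspond to the `nproc` and `nofile` items in limits.conf. The function name is made up here.

```python
import resource


def process_limits():
    """Return the (soft, hard) limits that apply to the current process."""
    checks = (
        ("max user processes", resource.RLIMIT_NPROC),
        ("open files", resource.RLIMIT_NOFILE),
    )
    return {label: resource.getrlimit(key) for label, key in checks}


for label, (soft, hard) in process_limits().items():
    # resource.RLIM_INFINITY means "unlimited".
    print(f"{label}: soft={soft} hard={hard}")
```

Run it as the same operating system user that starts Essbase, since the limits apply per account and login session.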

Page 44: EPM Infrastructure: An Investigation

Story #3 – Intel SpeedStep

Customer: American multinational financial services corporation

■ Two Exalytics servers: PROD is much slower
■ Studied network, storage throughput tests, evaluated I/O
■ Cannot find anything, until I decided to check core count
■ cat /proc/cpuinfo
  ▪ Noticed one degraded CPU frequency
  ▪ Rechecked and it was fine, rechecked again to find lower speeds
  ▪ Enter SpeedStep: power saving via stepping down clock speed
■ Resolution: Disable SpeedStep in BIOS
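The manual check against /proc/cpuinfo can be scripted so it is easy to repeat, which matters here because the frequency only dipped intermittently. The sketch below parses the `cpu MHz` lines and flags cores running noticeably below the fastest one. The 90% tolerance is an arbitrary assumption, and comparing against the fastest observed core rather than the rated speed is a simplification.

```python
def core_frequencies(cpuinfo_text):
    """Extract the 'cpu MHz' value for each logical CPU from /proc/cpuinfo text."""
    frequencies = []
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "cpu MHz":
            frequencies.append(float(value))
    return frequencies


def throttled_cores(frequencies, tolerance=0.9):
    """Return indexes of cores running below tolerance x the fastest observed core."""
    if not frequencies:
        return []
    fastest = max(frequencies)
    return [i for i, mhz in enumerate(frequencies) if mhz < fastest * tolerance]


def check_host(path="/proc/cpuinfo"):
    """Read the live file on a Linux host and report throttled core indexes."""
    with open(path) as handle:
        return throttled_cores(core_frequencies(handle.read()))
```

Because the symptom came and went, sampling `check_host()` several times over a few minutes is more telling than a single reading.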

Page 45: EPM Infrastructure: An Investigation

Story #4 – WebLogic Won’t Start

Customer: Travel Technology company

■ No managed WebLogic servers would start

■ Admin Server would not start

■ WebLogic logs showed trying to listen on a certain IP address, but that IP address no longer exists

■ The IP address was that of the backup network

■ Disabling that NIC allowed WLS to start

■ Further research determined that HP (hosting provider) had teamed the NICs
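A listen address that no longer exists on the host can be confirmed without WebLogic by trying to bind a socket to it. This sketch is only a quick probe: it treats any bind failure as "not usable", which could also mean a permissions problem. The function name is made up here.

```python
import socket


def address_is_local(ip, port=0):
    """True if this host can bind a TCP socket to the given IP address.

    Port 0 lets the operating system pick any free port, so the check is about
    the address itself and not about a port already being in use.
    """
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((ip, port))
        except OSError:
            return False
    return True


print(address_is_local("127.0.0.1"))
```

Running it against the address configured as the server's listen address shows whether the operating system still owns that IP after a network change such as NIC teaming.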

Page 46: EPM Infrastructure: An Investigation

Questions?

If there are no questions or comments, then I didn’t do my job today.

Page 47: EPM Infrastructure: An Investigation

Contact Information

Edgewater Ranzal
108 Corporate Park Drive, Suite 105
White Plains, NY 10604
Tel (914) 253-6600
Email: [email protected]

Company Contact
Robin Ranzal Knowles, President

Alan Ramirez
Infrastructure
Edgewater Ranzal
[email protected]
@alanr723

Thank you for attending!

Page 48: EPM Infrastructure: An Investigation

Please complete the session evaluation

We appreciate your feedback and insight

You may complete the session evaluation either on paper or online via the mobile app