EPM Infrastructure: An Investigation

  • date post

    15-Jul-2015
  • Category

    Technology

Transcript of EPM Infrastructure: An Investigation

  • REMINDER: Check in on the COLLABORATE mobile app

    EPM Infrastructure: An Investigation

    Prepared by: Alan Ramirez, aramirez@ranzal.com

    Infrastructure Engineer, Edgewater Ranzal

    You don't know what you don't know. When EPM is slow, where do you go? Infrastructure Insight and Workflow.

    Session ID#: 10125

    @alanr723

  • 1,700+ Oracle EPM & BI projects successfully delivered since our founding in 1996

    100% Focus on Oracle EPM/OBIEE

    Product Experts across the full EPM/BI Suite: Planning, HFM, HSF, HPCM, Essbase, OBIEE, DRM, FDMEE

    Oracle ACEs Across EPM/BI Platform (Planning/BI, HFM & FDMEE)

    Exalytics Installation, Configuration & Benchmarking Services

    Infrastructure Services: Design, Configuration, Performance Tuning, Upgrades & Patching

    Support Services: Remote Help Desk, Level 1 Support, Patch Release Support

    Business Analytics Solutions Provider Using Oracle EPM and BI Technologies

    Edgewater Ranzal

  • Presenter Information

    Alan Ramirez, Infrastructure Engineer

    Employed with Ranzal for 3 years

    Over 11 years of Oracle EPM/Hyperion experience

    Started on Essbase 6.5, Planning 3.5.1, HFM 4.0, Reports 7.x

    Experience: Software development, DBA, Infrastructure, QA/CM

    Adept on all platforms, particular fondness for Linux

    Red Hat Certified Engineer (RHEL 5)

    Exalytics Certified Specialist; experienced with X2-4, X3-4, X4-4

    Core tenet: Systems approach as a science, not a black box

    No recurring reboots - strive for stability from understanding

    Uptime is revered, restarts are for evaluation, not resolution

    Deliver quality through stability

    Customer service and documentation

  • Agenda

    - Overview

    - Getting Started

    - Troubleshooting Workflow

    - System Startup

    - Patching

    - Stability

    - Comparing Environments

    - Virtualization

    - Real Life Examples

    - Questions

  • Overview

    Glimpse into the workings of an Infrastructure Engineer that specializes in EPM

    Goals:

    - Exposure to an approach

    - Awareness of various faculties of the product

    - Demonstrate a high level troubleshooting workflow

    - Examples of a simple Infrastructure review

  • Where to start?

    How to get your bearings.

    Put me on any one of your EPM servers, and I'll figure out the rest.

  • Deployment Report

    All servers are connected to a common set of database tables collectively referred to as the EPM Registry

    Survey the entire environment from any EPM server:

    - All hostnames and configured products (architecture diagram)

    - RDBMS flavor, hostname, and connection strings

    - WebLogic configuration

    - History of interaction with EPM Registry

    Clean and simple vs repetition and manual registry changes

    Were web apps (JVMs) redeployed recently?

    Were any other changes made recently to the config?

  • Servers

    Hostnames

    OS

    Specs (can be inaccurate)

    Product distribution

    - Web servers

    - JVMs

    - Services

    - App servers

  • Database: Platform, names, schemas, port number, etc.

  • Directories, User Providers, System Accounts

  • EPM Deployment History Report (11.1.2.3)

  • Troubleshooting Workflow

    With the lay of the land, we can start digging in.

  • Logs

    Diagnostics: start with the Web Tier

    ORACLE_EPM_INSTANCE/diagnostics/logs/services

    ORACLE_EPM_INSTANCE/diagnostics/logs/starter

    MW_HOME/user_projects/domains/EPMSystem/servers/server/logs

    Services Tier - R&A Services, EPMA (Dimension) Server

    ORACLE_EPM_INSTANCE/diagnostics/logs/product

    Application Server logs: Essbase.log, HsvEventLog.log, Interop.log

    Event Viewer
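    The locations above can be collected in one place so the same review order is followed on every server. This is a minimal sketch, not an Oracle utility: the function name, the example root paths, and the choice of FoundationServices0 as the managed server are assumptions made for illustration.

```python
from pathlib import Path


def epm_log_dirs(epm_instance, mw_home, managed_server):
    """Build the log directories named on the slide, in review order."""
    logs = Path(epm_instance) / "diagnostics" / "logs"
    domain = Path(mw_home) / "user_projects" / "domains" / "EPMSystem"
    return {
        "services": logs / "services",
        "starter": logs / "starter",
        "domain": domain / "servers" / managed_server / "logs",
    }


if __name__ == "__main__":
    # Hypothetical roots; substitute the real ORACLE_EPM_INSTANCE and MW_HOME.
    dirs = epm_log_dirs("/u01/epm/instance1", "/u01/Middleware", "FoundationServices0")
    for tier, path in dirs.items():
        state = "found" if path.is_dir() else "missing"
        print(f"{tier:10} {state:8} {path}")
```

    Running it on a server prints which of the expected directories actually exist, which is a quick first check before opening any log.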

  • Logs Step 1 - services directory: EPM_INSTANCE_HOME\diagnostics\logs\services

    Directly relates to NT services

    Typically the start of my workflow

    Each service has a sysout and a syserr log

    Sysout most useful

    Syserr rarely has timestamps

  • Understanding WebLogic State

    When WebLogic completes its startup process, it writes out:

    Believe it or not, you do get used to reading these. You become familiar with what good logs look like and can quickly judge whether things are healthy. Most often I tail the last 50-100 lines of each log, but it is not uncommon to browse entire logs quickly, looking for patterns.
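    The tail-and-scan habit can be sketched in a few lines. The helper names are hypothetical, and the default marker text is an assumption about the WebLogic startup message, since the slide's screenshot is not part of this transcript; pass whatever string your own healthy logs contain.

```python
from collections import deque


def tail_lines(path, count=100):
    """Return the last `count` lines of a log file without loading all of it."""
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=count))


def looks_started(lines, marker="Server started in RUNNING mode"):
    """True if any line contains the startup marker (marker text is an assumption)."""
    return any(marker in line for line in lines)
```

    A typical use is `looks_started(tail_lines(log_path, 100))` against each managed server's log after a restart; a False result is a prompt to read the log, not a diagnosis.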

  • Logs Step 2 - domain logs: MW_HOME\user_projects\domains\EPMSystem\servers\FoundationServices0\logs

    Under the WebLogic domain is a directory for each Managed Server

    Each Server directory contains a logs dir

    More detailed than services logs

    Logs for various sub threads

  • Logs Step 3 - main logs directory: EPM_INSTANCE_HOME\diagnostics\logs

    The services tier logs here:

    Reporting & Analysis Agent

    EPMA

    HSF

  • Logs Step 4 - App Server Logs

    Event Viewer: Application Log, System Log

    HFM: FM Error Log Viewer, HsvEventLog.log

    Essbase: Essbase.log, Appname.log

  • System Startup

    Startup times and Starter logs

  • Start EPM System

    - Many wrote their own scripts in 11.1.2.1 and earlier (net start, sc, psexec)

    - 1h 45m for a triple-redundancy customer with 62 services prompted me to study and refine; reduced to 25m with what became a standard for our team

    - Much improved starting with 11.1.2.2

    - Additional tweaks get 11.1.2.3 up
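    To put the startup figures quoted above in proportion, the arithmetic works out as follows. Only the two durations come from the slide; the variable names are illustrative.

```python
# Durations quoted on the slide, converted to minutes.
before_minutes = 1 * 60 + 45   # 1h 45m
after_minutes = 25

saved_minutes = before_minutes - after_minutes
reduction_pct = 100 * saved_minutes / before_minutes
speedup = before_minutes / after_minutes

print(f"Saved {saved_minutes} min ({reduction_pct:.0f}% shorter, {speedup:.1f}x faster)")
```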

  • Starter logs: EPM_INSTANCE_HOME\diagnostics\logs\starter

    - Only created when using built-in scripts

    - Quick confirmation that all services started successfully

    - Analyze the Pass column to be sure all are good

    - Review of history can evidence health or even frustration

  • Patching

    Opatch

    EPMSystem11R1; oracle_common; odi

  • Patching: What version are you on?

    "We're on the 502 version of EPM."

    - Each product has its own code line and version number

    - The 500 patch was a giant patch covering IE 10 support

      - HUB 500 was all products except the Essbase suite

      - Separate patches for Essbase 500, EAS 500, APS 500, etc.

    - Back to individual version numbers per product

    Nov 2013: HSS 11.1.2.3.001, HFM 11.1.2.3.100, Essbase 11.1.2.3.003

    Mar 2014: HUB 11.1.2.3.500, Essbase 11.1.2.3.500

    Dec 2014: HSS 11.1.2.3.502, HFM 11.1.2.3.502, Essbase 11.1.2.3.505
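    Because each product carries its own version number, a single answer such as "502" does not describe the whole environment. The sketch below uses the Dec 2014 version strings from the slide; the function names and the idea of recording versions in a dictionary are assumptions made for illustration.

```python
def parse_version(text):
    """Split a dotted version such as '11.1.2.3.502' into comparable integers."""
    return tuple(int(part) for part in text.split("."))


def patch_levels(versions):
    """Map each product to the last component of its version number."""
    return {product: parse_version(v)[-1] for product, v in versions.items()}


def is_uniform(versions):
    """True only if every product reports the same patch level."""
    return len(set(patch_levels(versions).values())) == 1


# Dec 2014 levels as listed on the slide.
dec_2014 = {"HSS": "11.1.2.3.502", "HFM": "11.1.2.3.502", "Essbase": "11.1.2.3.505"}

if __name__ == "__main__":
    print(patch_levels(dec_2014))
    print("single version answer is accurate:", is_uniform(dec_2014))
```

    Comparing parsed tuples rather than strings also orders releases correctly, for example 11.1.2.3.100 sorts before 11.1.2.3.502.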

  • Patching - Opatch

    A Java-based utility from Oracle that assists with the exercise of applying and rolling back patches to Oracle software

    Multiple Oracle homes, which Opatch directory?

    - EPMSystem11R1: Oracle EPM System products

    - oracle_common: ADF/JDeveloper components

    - odi: Oracle Data Integrator (FDMEE) component

  • Patching: PSEs vs PSUs

    - PSE: a Patch Set Exception is a singular, one-off patch that typically addresses a specific issue

    - PSU: a Patch Set Update is a collection, or grouping, of PSEs that have been regression tested together

    - Do not apply all available PSEs; instead maintain the latest PSUs

    - PSUs are released on an approximately quarterly schedule

  • Available Patch Sets and Patch Set Updates for EPM Products (Doc ID 1400559.1)

    OBIEE 11g: Bundle Patches (Doc ID 1488475.1)

  • Stability

    Stability, Performance, Expectations

  • Stability

    How often do you restart services? How about rebooting servers?

    History: consistency of process, logs over time, routines. Evaluate Starter logs

    Some services are susceptible to abuse:

    - Financial Reporting

    - Planning web forms, SmartView

    - EAS

    Essbase: often you don't realize there are issues

    - xcp files

    - Graceful shutdowns: check both Essbase and app logs

  • Stability - Planning

    Heap dumps enabled on OutOfMemory condition can show exactly what was going on when the JVM ran out of memory

    - Large/bad webforms

    - SmartView retrieves

    Large hit to JVM if suppression options are disabled

    Query below would have tried to produce > 28 million cells

    Essbase Governor

    QRYGOVEXECTIME

    QRYGOVEXECBLK

    Planning Governor

    ERROR_THRESHOLD_NUM_OF_CELLS=175,000
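    A rough feel for when a form or retrieve crosses the Planning governor can be had by multiplying rows by columns. This is a sketch under assumptions: the 175,000 figure is the value shown above, while the row and column counts are hypothetical numbers chosen only to reach the 28 million cell scale mentioned on the slide, and the strict greater-than comparison is a guess at how the limit is applied.

```python
PLANNING_CELL_THRESHOLD = 175_000  # value shown on the slide


def estimated_cells(rows, columns):
    """Cells a grid would hold if nothing is suppressed."""
    return rows * columns


def exceeds_threshold(rows, columns, threshold=PLANNING_CELL_THRESHOLD):
    """True if the unsuppressed grid is larger than the governor value."""
    return estimated_cells(rows, columns) > threshold


if __name__ == "__main__":
    rows, columns = 70_000, 400  # hypothetical grid dimensions
    cells = estimated_cells(rows, columns)
    print(f"{cells:,} cells; over threshold: {exceeds_threshold(rows, columns)}")
```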

  • Stability - WebLogic

    STUCK threads?

    - Long running task: any task where execution runs longer than a predefined threshold (default 10 min)

    - Not intelligent

    - Tunable, increase to 20 mins? Need an in-depth understanding of the application

    Causes

    - SmartView retrieves

    - Planning form resultset too large

    - Bad user sessions (clicking the X instead of logging out properly)

    - User behavior: IE Not Responding, close browser and retry

    - WebLogic connection pool too small
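    The "not intelligent" point is easier to see in code: the check is purely elapsed time against a threshold, so a legitimate long job is flagged exactly like a hung one. This is an illustration of that rule, not WebLogic's implementation; the task names and durations are invented.

```python
DEFAULT_STUCK_SECONDS = 10 * 60  # the 10-minute default mentioned on the slide


def flag_stuck(elapsed_by_task, threshold_seconds=DEFAULT_STUCK_SECONDS):
    """Return task names whose elapsed seconds exceed the threshold, sorted."""
    return sorted(
        name for name, elapsed in elapsed_by_task.items()
        if elapsed > threshold_seconds
    )


# Hypothetical running tasks and their elapsed time in seconds.
tasks = {
    "smartview_retrieve": 1500,
    "planning_form_load": 45,
    "nightly_consolidation": 900,
}

if __name__ == "__main__":
    print("10 min threshold:", flag_stuck(tasks))
    print("20 min threshold:", flag_stuck(tasks, 20 * 60))
```

    Raising the threshold to 20 minutes stops flagging the 15-minute job but still catches the 25-minute retrieve, which is why tuning it requires knowing how long legitimate work in the application actually takes.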

  • Comparing Environments: Are they the same?

    Eliminate as many variables as possible.

  • Grading Environments: Many Criteria

    Architecture: server specifications, VMware infrastructure, storage infrastructure, EPM product distribution

    Opatches

    Web tier: JVM heap settings, connection pools

    App tier: tuning values, log sizes and rotations

    RDBMS: statistics/indexes

    Performance: resource dedication (virt. only), power plan, CPU, storage, AV on-access exclusions, Windows TCP/IP tuning

    Networking: hosts file, name resolution, TCP/IP settings, topology, hops, subnets, FQDN
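    Once the criteria are recorded per environment, finding the differences is mechanical. The sketch below assumes each environment's settings have already been gathered into a dictionary; the setting names and values are hypothetical examples, not recommended values.

```python
def diff_settings(env_a, env_b):
    """Return {setting: (value_a, value_b)} for every setting that differs.

    A setting missing from one environment is reported with None on that side.
    """
    names = sorted(set(env_a) | set(env_b))
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in names
        if env_a.get(name) != env_b.get(name)
    }


# Hypothetical values gathered from two environments.
production = {"jvm_heap_mb": 4096, "connection_pool": 150, "power_plan": "High performance"}
test_env = {"jvm_heap_mb": 2048, "connection_pool": 150, "av_exclusions": True}

if __name__ == "__main__":
    for name, (prod_value, test_value) in diff_settings(production, test_env).items():
        print(f"{name}: prod={prod_value} test={test_value}")
```

    Settings that match drop out of the report, so what remains is the list of variables still to eliminate.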

  • Sample Infrastructure Review

    Review Summary of 26 major criteria across all Production EPM servers

    CUSTOMER: American multinational food and beverage company

    Considering correctness, stability, performance, what kind of shape is my EPM environment in?

  • Virtualization

    EPM can be virtualized very successfully when properly understood.

  • Virtualization of Oracle EPM Primary advantage of a typical virtualization strategy is to reduce