Data Federation Service Sizing and Tuning Companion Guide

SAP BusinessObjects Business Intelligence platform 4
Document Version: - 2013-12-03


  • Table of Contents

    1 Introduction to sizing and tuning recommendations for the data federation service
    2 Configuration and methodology used for recommendations
    3 Factors to be considered for performance
    4 Memory-consuming versus non-memory-consuming queries
    5 Efficiency of the reporting database and its network speed: recommendations
    6 Query complexity: recommendations
    7 Memory size: recommendations
    8 Disk size: recommendations
    9 CPU size: recommendations
    10 Tuning parameters

    2013 SAP AG or an SAP affiliate company. All rights reserved.

  • 1 Introduction to sizing and tuning recommendations for the data federation service

    This document provides complementary sizing and tuning information if you are using multisource-enabled universes in your SAP BusinessObjects Business Intelligence 4 (BI 4) platform deployment.

    For information on sizing the BI 4 suite of services, see the latest version of the SAP BusinessObjects BI 4 Sizing Guide at http://www.sap.com/bisizing.

    You will also need to refer to the Data Federation Administration Tool Guide on the SAP Help Portal.

    Disclaimer

    This document demonstrates how someone might perform sizing and tuning of the data federation service for a BI 4 system. The methodology and recommendations offered here are examples of the tasks and thinking involved. The performance and functioning of an actual system may vary for many reasons. The examples offered here should not be considered a guarantee of success of a particular deployment. The software behavior and characteristics described in this document are subject to change in future versions without prior notice.

    This document is not part of the official SAP product documentation and is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except for damages caused by SAP intentionally or through gross negligence.


  • 2 Configuration and methodology used for recommendations

    Configuration

    In the Central Management Console (CMC), the Data Federation Service is replaced by a new server that is separate from the default Adaptive Processing Server. This is done by:

    - Stopping the default Adaptive Processing Server (whose kind is Adaptive Processing Server, and whose name is .AdaptiveProcessingServer).
    - Removing the Data Federation Service from the default Adaptive Processing Server.
    - Creating a new, empty server:
      - The category of this new server is Data Federation Services.
      - The only service added to this server is Data Federation Service.
      - The name given to this server is .Data Federation Server.

    From here on, the term Data Federation Server is used to refer to this new server.

    The resulting configuration has the following services (all other services are stopped in the CMC):

    - Central Management Server
    - Input File Repository
    - Output File Repository
    - Data Federation Service as created above

    See the related topic on tuning parameters for the applied parameter values.

    Methodology

    Sizing is obtained by injecting SQL queries directly into the data federation query server through a direct internal connection. In BI 4 platform environments, this internal connection is used by data federation clients through CORBA.

    The load simulation system allows multiple queries to be injected simultaneously. It can therefore simulate a load on the Data Federation Server in many different ways, varying the number of injection points, the connection and pooling policies, and the processing and validation of query results. Response times and system measures are recorded for many different loads.

    Sizing is done with internal data federation test connectivities, so the Connection Server sizing is not covered here and does not interfere with the sizing done for data federation. However, based on measurements done on test samples, the impact of the Connection Server is negligible.

    Related Information

    Tuning parameters


  • 3 Factors to be considered for performance

    The factors that impact the performance of the query server in the Data Federation Server are the following, in decreasing order of importance:

    1. Efficiency of the reporting database and its network speed
    2. Query complexity
    3. Memory size
    4. Disk speed
    5. CPU speed

    Sizing and tuning recommendations are given for each factor.


  • 4 Memory-consuming versus non-memory-consuming queries

    Some of the sizing and tuning recommendations depend on the type of query. Queries can be memory-consuming or non-memory-consuming:

    - Memory-consuming queries contain joins that are executed by the data federation service. Such joins can be found in the Query Plan tab of the Data Federation Administration Tool.

      Examples of memory-consuming queries:
      - Queries where actual federation is done, with several sources providing data.
      - Queries on an SAP NetWeaver BW data source.

    - Non-memory-consuming queries are any queries that are not memory-consuming. Non-memory-consuming queries do not consume resources other than network bandwidth.

      Examples of non-memory-consuming queries:
      - Queries on one source only and where all optimization techniques can be pushed to the source databases.
      - Table scans.


  • 5 Efficiency of the reporting database and its network speed: recommendations

    The Data Federation Server will transfer potentially large chunks of data from the data sources, so the following recommendations should be taken into account:

    - Install the Data Federation Server on network nodes that are as close as possible to the data sources.
    - Use the fastest possible networks between the Data Federation Server and the data source. A network speed of 1 GB/sec or more is the recommended value.
    - If possible, tune the source databases, keeping in mind that:
      - The Data Federation Server does not update the data sources.
      - Large chunks of data may be transferred at once.


  • 6 Query complexity: recommendations

    Query optimization parameters

    Queries are optimized by the Data Federation Server. The following classes of parameters have an impact on the optimization techniques:

    - Parameters that control semi-join optimizations
    - Parameters that control merge-join or order-based optimizations

    See the Data Federation Administration Tool Guide for a detailed description of the optimization techniques that are implemented with these classes of parameters.

    Source database capabilities

    The Data Federation Server optimization takes the source database capabilities into account when computing the optimal query plan, in order to push as many operations as possible to the source databases. Default capabilities are provided and have been verified for functional correctness. Some capabilities may be adjusted to improve performance.

    Statistics

    The Data Federation Server optimizer computes better query plans when site-dependent details are known. Statistics must be generated before the optimizer can use them to compute the optimal query plan. Refer to the Data Federation Administration Tool Guide for how to generate or refresh statistics.

    It is suggested to refresh the statistics used by the Data Federation Server as often as those of the source databases.


  • 7 Memory size: recommendations

    Non-memory-consuming queries should not be considered for sizing the memory usage of the Data Federation Server. If the Data Federation Server is going to process non-memory-consuming queries only, it may be left within the default Adaptive Processing Server. See the related topic for a definition of non-memory-consuming queries.

    Coarse-grain evaluation

    If you have no information about the types of the queries to be processed by the Data Federation Server, use the following guidelines:

    - In the Central Management Console, create a Data Federation Server that is separate from the default Adaptive Processing Server.
    - Reserve at least 512 MB of RAM for the Data Federation Service. This amount is required for the Data Federation Server to start.
    - The Data Federation Server should be given as much memory as possible on the hardware system where it is installed. If the Data Federation Server is alone on this system, it should be given no more than 75% of the memory, so that the JVM that is executing the Data Federation Server runs smoothly.

    Detailed evaluation

    The Data Federation Server has an internal paging mechanism that allows execution to continue by allocating disk space even if all memory space in the JVM heap has been used. When disk space is used, execution slows down but is not blocked. Use the following procedure to compute the memory size to be used for best performance.

    1. Define the number of queries that you want to run concurrently on the system. See the related topic on CPU size for suggestions.

    2. For the most frequently executed queries, estimate the memory size Mem(Q) with the following formula:

       Mem(Q) = MAX_CONCURRENT_MEMORY_CONSUMING_OPERATORS x max(Number of rows) x sum(Row size)

       where the maximum and the sum are taken over all subqueries of Q.

       This estimate considers a size for every operator that makes up the query that is the product of the following two numbers:

       - The largest number of rows returned by any subquery. This estimate is conservative because filtering or reduction may be done in the optimization process. Consider this estimate a worst-case scenario.
       - An equivalent row size that is the sum of the row sizes of all subqueries, since, at worst, all the attributes of all subqueries must be kept for processing to be done.


       Note
       In many cases, this estimate is extremely conservative because it does not take into account the following factors:

       - The actual number of memory-consuming operators can be less than the maximum.
       - Some optimization may be applied to drastically reduce the number of rows that is passed from operator to operator (for example, filtering or semi-join).
       - The optimizer may arrange the ordering of joins.

       In some rare instances, this estimate is optimistic because it makes no assumption about the nature of the operators. Some operators may have very large memory footprints (for example, unions and Cartesian products).

       Following is an example of calculating the memory size Mem(Q) using Query #10 of the TPC-H standard benchmark. This query can be executed with tables from 4 different data sources. On a 2.5 GB TPC-H database, the values for the 4 generated subqueries are:

       Subquery   Number of rows   Row size (bytes)
       1                      25                217
       2                 300,000                860
       3                 114,544                204
       4               1,292,156                215

       With MAX_CONCURRENT_MEMORY_CONSUMING_OPERATORS = 3:

       Mem(Q10) = 3 x 1,292,156 x 1,496 = 5.4 GB

       Here 1,292,156 is the largest number of rows among the subqueries, and 1,496 bytes is the sum of the four row sizes.

       Note
       By experiment, this query only requires 380 MB because of the filtering that can be done by the optimizer.

    3. Multiply the memory size found in step 2 by the number of concurrent queries defined in step 1. This size must fit into the fraction of the JVM memory reserved for query execution (the default is 80% of the entire JVM memory).

    4. Add the 512 MB minimal requirement for the Data Federation Server to start. This gives the amount of memory for the Data Federation Server.

    The hardware that runs the Data Federation Server should be able to run the Data Federation Service JVM without paging the JVM process. An additional 30% of RAM on top of the JVM minimum memory size is suggested, so that the JVM and the OS have ample space to co-exist.
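
    The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical and not part of any SAP tool, the deployment of 4 concurrent queries is made up, and dividing the execution memory by the 80% executor fraction is one reading of step 3 rather than a confirmed rule.

```python
# Minimal sizing sketch; all names are illustrative, not part of any SAP API.

MB = 1024 ** 2
GB = 1024 ** 3


def estimate_query_memory(subqueries, max_operators):
    """Worst-case Mem(Q) in bytes.

    subqueries: (row_count, row_size_bytes) pairs, one per subquery.
    max_operators: value of MAX_CONCURRENT_MEMORY_CONSUMING_OPERATORS.
    """
    largest_row_count = max(rows for rows, _ in subqueries)
    equivalent_row_size = sum(size for _, size in subqueries)
    return max_operators * largest_row_count * equivalent_row_size


def size_jvm(mem_per_query, concurrent_queries,
             executor_fraction=0.80, startup_bytes=512 * MB):
    """JVM size in bytes for steps 3 and 4.

    Assumption: the memory for all concurrent queries is divided by the
    executor fraction so that it fits into that share of the JVM.
    """
    execution_bytes = mem_per_query * concurrent_queries          # step 3
    return execution_bytes / executor_fraction + startup_bytes    # step 4


def host_ram(jvm_bytes, headroom=0.30):
    """RAM to plan for the host: JVM size plus the suggested 30% on top."""
    return jvm_bytes * (1 + headroom)


# TPC-H Query #10 subqueries from this guide: (rows, row size in bytes).
q10 = [(25, 217), (300_000, 860), (114_544, 204), (1_292_156, 215)]
q10_worst_case = estimate_query_memory(q10, max_operators=3)
print(f"Mem(Q10) worst case: {q10_worst_case / GB:.1f} GB")

# Hypothetical deployment: 4 concurrent queries at the measured 380 MB each.
jvm = size_jvm(380 * MB, concurrent_queries=4)
print(f"JVM size: {jvm / MB:.0f} MB, host RAM: {host_ram(jvm) / MB:.0f} MB")
```

    Using the measured 380 MB instead of the 5.4 GB worst case is a design choice for this example only. Where no measurement is available, the worst-case estimate is the safer input.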

    Related Information

    Memory-consuming versus non-memory-consuming queries
    CPU size: recommendations


  • 8 Disk size: recommendations

    Disks are used in the following 2 cases. In all other cases, disk sizing is not relevant for the Data Federation Service.

    - Paging mechanism: As seen in the recommendations for memory size, the Data Federation Server has an internal paging mechanism that allows execution to continue by allocating disk space even if all memory space in the JVM heap has been used. The memory size estimation formula can be used to predict the size that is needed on disk to page the execution of a query.

    - Store nodes: The optimizer may generate query execution plans with a potentially large number of operators. As seen in the example in the recommendations for memory size, memory consumption is highly dependent on the number of operators. To limit the memory needs, the number of operators is limited to a predefined value. When this value is reached, store nodes are introduced in the plan to use disks for storing intermediate results. These intermediate results are subparts of the data computed in the memory size estimation formula.

    Note
    The use of store nodes is not reflected in the diskUsed metric that is presented in the query monitoring part of the Data Federation Administration Tool. This metric only considers disk space that may be used because of the JVM being too small.

    Related Information

    Memory size: recommendations


  • 9 CPU size: recommendations

    Coarse-grain evaluation

    A suggested starting value is to have the number of concurrent memory-consuming queries equal to the number of CPUs that are available for data federation. This can be all the CPU cores in a machine if the machine is dedicated to data federation or a subset of all the CPU cores if the machine is also used for other tasks.

    Detailed evaluation

    The coarse-grain evaluation implies that all the CPUs available for data federation will be fully used. This assumption can be refined if the type of queries is known.

    The execution of a query over time by the query server can be roughly divided into two parts of varying size and resource consumption: a data transfer part and a processing part.

    The optimizer pushes as much optimization as possible to the data sources, thus reducing network traffic, and RAM and CPU load on the Data Federation Server system. Thus, queries can be divided into non-memory-consuming queries, and two types of memory-consuming queries. For a definition of memory-consuming versus non-memory-consuming queries, see the related topic.

    - Non-memory-consuming queries consume almost no resources. There is no processing part that consumes CPU, except basic SQL processing. For instance, very complex SQL functions may still need to be processed.
    - Some memory-consuming query plans require all the data from a source to be available before processing starts, in which case:
      - The data transfer part is wait time and does not consume any CPU resource. Its duration depends on the capabilities and speed of the network and the reporting database. In a networked environment, it is strongly suggested to put the Data Federation Server geographically close to the reporting database to minimize this time.
      - The processing part can use an entire CPU if available.
    - Some memory-consuming queries can process data as it comes into the Data Federation Server, in which case:
      - The data transfer part can use an entire CPU if available.
      - The processing part can use an entire CPU if available.

    For example, each of the TPC-H queries, when run alone, uses between 20% and 80% of one CPU. Averaged over all the TPC-H queries, each run alone, the usage is 30% of one CPU.

    The Data Federation Server scales as follows. Suppose that a query uses 40% of one CPU, which means that the data transfer part does not consume any CPU and lasts 60% of the query response time. If the number of concurrent memory-consuming queries is set to the number of CPUs, all the queries executed simultaneously will consume 40% of the entire system, provided that enough RAM is available at all times and no query needs to go to disk.

    In this case, it is possible to increase the number of concurrent memory-consuming queries further. Suggested values are between 1.2 and 1.5 times the number of CPUs available in the system for data federation, provided that the memory sizing can follow.


    Note
    This increases throughput (the number of queries per unit of time) and CPU usage, but it does not improve the best response time, because more queries run concurrently.
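
    As a small illustration of the CPU guidance, the sketch below derives candidate values for MAX_CONCURRENT_MEMORY_CONSUMING_QUERIES from the number of CPUs reserved for data federation. The function name is hypothetical, and rounding down to a whole number of queries is an assumption of this example, not a rule from this guide.

```python
import math
from fractions import Fraction


def suggested_concurrent_queries(cpus_for_federation):
    """Return (start, low, high) candidate values.

    start: coarse-grain value, one memory-consuming query per CPU.
    low, high: 1.2x and 1.5x the CPU count, rounded down (assumption).
    """
    start = cpus_for_federation
    # Exact fractions avoid floating-point surprises when rounding down.
    low = math.floor(cpus_for_federation * Fraction(6, 5))
    high = math.floor(cpus_for_federation * Fraction(3, 2))
    return start, low, high


# Hypothetical machine with 8 cores dedicated to data federation.
print(suggested_concurrent_queries(8))
```

    Any value above the coarse-grain starting point only makes sense if the memory sizing has been redone for the higher number of concurrent queries.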

    Related Information

    Memory-consuming versus non-memory-consuming queries


  • 10 Tuning parameters

    Adjust the following parameters to control memory usage of the Data Federation Server.

    Operating system tuning

    CPU and memory tuning: Since the Data Federation Server is Java code executing within a JVM, the operating system should be tuned for the best JVM performance.

    Disk tuning: The fastest possible disk mechanism (striped disks, SAN) should be used to get the best performance. The location of the temporary disk space is within the BI platform installation folder. If possible, the file system should be tuned keeping the following factors in mind:

    - Operations are Write + Read-after-write (no file update).
    - Objects are short-lived (only temporary files).
    - Most I/O operations are Java I/O buffered operations.

    Central Management Console parameters

    For all queries (memory-consuming and non-memory-consuming queries), set the Thread Pool Size of the Data Federation Service to the number of queries that is expected to run at the same time.

    When the Data Federation Service is processing memory-consuming queries, set the JVM size to the value that has been defined during sizing. This is done by adjusting the Xmx and Xms parameters on the command line of the service.
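
    As an illustration, the helper below formats a JVM size as heap options. -Xms and -Xmx are the standard JVM options for initial and maximum heap size. The helper name is hypothetical, the 2412 MB value is only an example, and setting both options to the same value is a common practice assumed here, not a recommendation taken from this guide.

```python
import math


def jvm_heap_flags(jvm_mb):
    """Format a JVM size in MB as -Xms/-Xmx options (rounded up to a whole MB)."""
    size_mb = math.ceil(jvm_mb)
    # Assumption: initial and maximum heap are set to the same value.
    return f"-Xms{size_mb}m -Xmx{size_mb}m"


# Hypothetical JVM size obtained from a sizing exercise.
print(jvm_heap_flags(2412))
```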

    Data Federation Service Parameters

    The following parameters can be adjusted in the Data Federation Administration Tool:

    - MAX_CONCURRENT_MEMORY_CONSUMING_QUERIES: Maximum number of memory-consuming queries that are running simultaneously at any time in the query server. Additional queries are queued and will be executed on a first come, first served basis. This parameter has a high impact on memory and CPU usage.

    - MAX_CONCURRENT_MEMORY_CONSUMING_OPERATORS: Maximum number of operators that are allowed for executing a query. If more operators are required, a store node is included in the query plan to split the query into areas where the number of operators is not exceeded. This parameter has a high impact on memory and CPU usage.

    - EXECUTOR_BUFFER_OVERHEAD: Correction factor that is applied to the buffer size estimation (when retrieving data, buffers can be larger than the anticipated size). When sizing, this correction is applied to the Rowsize value.

    - EXECUTOR_TOTAL_MEMORY: Fraction of the JVM that is used for executing the queries.

    - EXECUTOR_STATIC_MEMORY: Fraction of the JVM that is reserved statically within the fraction of the JVM that is used for executing the queries. This guarantees a minimal space for each concurrent query to start.


      Note that this space is divided by the maximum number of memory-consuming queries, so if errors are seen at query start with a high number of memory-consuming queries, this parameter should be increased.
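
    The effect of dividing the static space can be illustrated with a short calculation. The sketch assumes that EXECUTOR_STATIC_MEMORY is expressed as a fraction of the executor share of the JVM, which is only one possible reading of the description above. The function name and all values in the example are hypothetical.

```python
def static_memory_per_query_mb(jvm_mb, executor_total, executor_static, max_queries):
    """Static space guaranteed to each concurrent query, in MB.

    Assumption: executor_static is a fraction of the executor share,
    and the executor share is executor_total times the JVM size.
    """
    executor_mb = jvm_mb * executor_total
    static_mb = executor_mb * executor_static
    return static_mb / max_queries


# Hypothetical 4096 MB JVM, 80% executor share, 25% of it reserved statically.
for max_queries in (8, 16):
    per_query = static_memory_per_query_mb(4096, 0.80, 0.25, max_queries)
    print(f"{max_queries} queries: {per_query:.1f} MB each")
```

    Doubling the maximum number of memory-consuming queries halves the space guaranteed to each one, which is why a larger static fraction may be needed when that maximum is high.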


  • www.sap.com/contactsap

    2013 SAP AG or an SAP affiliate company. All rights reserved.

    No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.

    Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

    These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

    SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.

    Please see http://www.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
